A Comprehensive Approach for the Conceptual Modeling of Genomic Data

  • Conference paper
Conceptual Modeling (ER 2022)

Abstract

The human genome is traditionally represented as a DNA sequence of three billion base pairs. However, its intricacies are captured by many more complex signals, representing DNA variations, the expression of gene activity, or DNA’s structural rearrangements; a rich set of data formats is used to represent such signals. Different conceptual models explain such elaborate structure and behavior. Among them, the Conceptual Schema of the Human Genome (CSG) provides a concept-oriented, top-down representation of the genome behavior – independent of data formats. The Genomic Conceptual Model (GCM) instead provides a data-oriented, bottom-up representation, targeting a well-organized, unified description of these formats. We hereby propose to join these two approaches to achieve a more complete vision, linking (1) a concepts layer, describing genome elements and their conceptual connections, with (2) a data layer, describing datasets derived from genome sequencing with specific technologies. The link is established when specific genomic data types are chosen in the data layer, thereby triggering the selection of a view in the concepts layer. The benefit is mutual, as data records can be semantically described by high-level concepts and exploit their links. In turn, the continuously evolving abstract model can be extended thanks to the input provided by real datasets. As a result, it will be possible to express queries that employ a holistic conceptual perspective on the genome, directly translated onto data-oriented terms and organization. The approach is here exemplified using the DNA variation data type but is applicable to all genomic information.
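The link between the two layers can be pictured with a small sketch. It is only an illustration under stated assumptions, not the authors' implementation: the concept names, the `VIEWS` mapping, the `dna_variation` label, and the `select_view` function are all hypothetical and do not come from CSG or GCM.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A genome element in the (hypothetical) concepts layer."""
    name: str
    related: tuple[str, ...] = ()  # names of conceptually connected elements


# Concepts layer: illustrative elements and connections (assumed, not from CSG).
CONCEPTS = {
    "Variation": Concept("Variation", ("Gene",)),
    "Gene": Concept("Gene", ("Transcript",)),
    "Transcript": Concept("Transcript", ("Protein",)),
    "Protein": Concept("Protein"),
}

# Views: which concepts become relevant once a data type is chosen (assumed).
VIEWS = {
    "dna_variation": ("Variation", "Gene"),
}


@dataclass
class Dataset:
    """A dataset in the (hypothetical) data layer."""
    name: str
    data_type: str
    records: list[dict] = field(default_factory=list)


def select_view(dataset: Dataset) -> list[Concept]:
    """Choosing a data type in the data layer selects a view in the concepts layer."""
    return [CONCEPTS[name] for name in VIEWS.get(dataset.data_type, ())]


if __name__ == "__main__":
    demo = Dataset(name="demo", data_type="dna_variation")
    for concept in select_view(demo):
        print(concept.name, "->", ", ".join(concept.related) or "(none)")
```

In this sketch, a data record inherits its semantic description from the concepts in the selected view, and a data type without a registered view returns an empty list, which would signal that the abstract model needs extending.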

A. Bernasconi and A. García S. should be regarded as joint first authors.

Acknowledgement

This research is funded by the ERC Advanced Grant 693174 GeCo (Data-Driven Genomic Computing), INNEST/2021/57, and MICIN/AEI/10.13039/501100011033.

Author information

Corresponding author

Correspondence to Anna Bernasconi.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bernasconi, A., García S., A., Ceri, S., Pastor, O. (2022). A Comprehensive Approach for the Conceptual Modeling of Genomic Data. In: Ralyté, J., Chakravarthy, S., Mohania, M., Jeusfeld, M.A., Karlapalem, K. (eds) Conceptual Modeling. ER 2022. Lecture Notes in Computer Science, vol 13607. Springer, Cham. https://doi.org/10.1007/978-3-031-17995-2_14

  • DOI: https://doi.org/10.1007/978-3-031-17995-2_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17994-5

  • Online ISBN: 978-3-031-17995-2

  • eBook Packages: Computer Science, Computer Science (R0)
