Life Science Workflow Services (LifeSWS): Motivations and Architecture

  • Chapter
Transactions on Large-Scale Data- and Knowledge-Centered Systems LV

Abstract

Data-driven science requires manipulating large datasets from various data sources through complex workflows based on a variety of models and languages. With the increasing number of big data sources and models developed by different groups, it is hard to relate models and data and to use them in unanticipated ways for specific data analyses. Current solutions are typically ad hoc, specialized for particular data, models, and workflow systems. In this paper, we focus on data-driven life science and propose an open service-based architecture, Life Science Workflow Services (LifeSWS), which provides data analysis workflow services for life sciences. We illustrate our motivations and rationale for the architecture with real use cases from life science.

Acknowledgement

This work was carried out in the context of the HPDaSc associated team between Inria and Brazil. Some of us are supported by CNPq research productivity fellowships. C. Pradal is supported by the MaCS4Plants CIRAD network, initiated from the AGAP Institute and AMAP joint research units, and by the EU’s Horizon 2020 research and innovation program (IPM Decisions project No. 817617, BreedingValue project No. 101000747).

Corresponding author

Correspondence to Reza Akbarinia.

Copyright information

© 2023 The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature

About this chapter

Cite this chapter

Akbarinia, R. et al. (2023). Life Science Workflow Services (LifeSWS): Motivations and Architecture. In: Hameurlain, A., Tjoa, A.M. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems LV. Lecture Notes in Computer Science, vol 14280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-68100-8_1

  • DOI: https://doi.org/10.1007/978-3-662-68100-8_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-68099-5

  • Online ISBN: 978-3-662-68100-8

  • eBook Packages: Computer Science, Computer Science (R0)
