Life Science Workflow Services (LifeSWS): Motivations and Architecture

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 14280)

Abstract

Data-driven science requires manipulating large datasets from various sources through complex workflows based on a variety of models and languages. As the number of big data sources and of models developed by different groups grows, it becomes hard to relate models and data and to use them in unanticipated ways for specific data analyses. Current solutions are typically ad hoc, specialized for particular data, models, and workflow systems. In this paper, we focus on data-driven life science and propose an open service-based architecture, Life Science Workflow Services (LifeSWS), which provides data analysis workflow services for life sciences. We illustrate our motivations and rationale for the architecture with real use cases from life science.



Acknowledgement

This work is within the context of the HPDaSc associated team between Inria and Brazil. Some of us are supported by CNPq research productivity fellowships. C. Pradal has support from the MaCS4Plants CIRAD network, initiated from the AGAP Institute and AMAP joint research units, and EU’s Horizon 2020 research and innovation program (IPM Decisions project No. 817617, BreedingValue project No. 101000747).

Author information

Authors and Affiliations

Inria, Univ Montpellier, CNRS, LIRMM, Montpellier, France
Reza Akbarinia, Christophe Botella, Alexis Joly, Florent Masseglia, Esther Pacitti, Christophe Pradal & Patrick Valduriez
Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
Marta Mattoso
CEFET/RJ, Rio de Janeiro, Brazil
Eduardo Ogasawara
Fluminense Federal University, Rio de Janeiro, Brazil
Daniel de Oliveira
LNCC, Petrópolis, Brazil
Fabio Porto
New York University, New York, USA
Dennis Shasha
CIRAD, AGAP Institute, Univ Montpellier, INRAE, Institut Agro Montpellier, Montpellier, France
Christophe Pradal

Corresponding author

Correspondence to Reza Akbarinia.

Editor information

Editors and Affiliations

Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
Technische Universität Wien, Wien, Austria
A Min Tjoa

About this chapter

Cite this chapter

Akbarinia, R. et al. (2023). Life Science Workflow Services (LifeSWS): Motivations and Architecture. In: Hameurlain, A., Tjoa, A.M. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems LV. Lecture Notes in Computer Science, vol 14280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-68100-8_1

DOI: https://doi.org/10.1007/978-3-662-68100-8_1
Published: 28 September 2023
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-68099-5
Online ISBN: 978-3-662-68100-8
eBook Packages: Computer Science, Computer Science (R0)
