Abstract
This paper addresses the challenges of research data management, with a focus on High Performance Computing (HPC) and simulation data. The main challenges discussed are the Big Data qualities of HPC research data, technical data management, and organizational and administrative issues. From these challenges, requirements for feasible HPC research data management are derived, and an alternative data life cycle is proposed. The requirements analysis includes recommendations based on a modified OAIS architecture: to meet the HPC requirement of a scalable system, metadata and data must not be stored together. Metadata keys are defined and organizational actions are recommended. Moreover, the paper introduces the role of a Scientific Data Manager, who is responsible for the institution’s data management and takes stewardship of the data.
Notes
1. For example, the data produced by the CERN/LHC experiments is distributed to data centers all over Europe. See: https://home.cern/about/computing/worldwide-lhc-computing-grid, last accessed Nov 28th 2016.
2. http://handle.net/, last accessed Nov 26th 2016.
3. http://www.pidconsortium.eu/, last accessed Nov 26th 2016.
4. There are online tools available for specifying a DMP, such as: https://dmponline.dcc.ac.uk/, last accessed Nov 25th 2016.
Acknowledgments
We would like to thank Wanda Spahn for proofreading.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Schembera, B., Bönisch, T. (2017). Challenges of Research Data Management for High Performance Computing. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L., Karydis, I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham. https://doi.org/10.1007/978-3-319-67008-9_12
DOI: https://doi.org/10.1007/978-3-319-67008-9_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67007-2
Online ISBN: 978-3-319-67008-9
eBook Packages: Computer Science (R0)