Abstract
The volume of data, its heterogeneity, and the speed at which it is generated keep increasing, and current systems are unable to provide on-demand, real-time data access. In traditional data integration approaches such as ETL (extract, transform, load), physically loading data into data stores that use different technologies is becoming costly, time-consuming, and inefficient, and has turned into a bottleneck. Data virtualization has recently been used to accelerate data integration and addresses these challenges by delivering a unified, integrated, and holistic view of trusted data on demand and in real time. This paper gives an overview of traditional data integration and its limitations. We discuss data virtualization, its core capabilities and features, how it can complement other data integration approaches, and how it improves traditional data architecture paradigms.
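The abstract contrasts two approaches. ETL physically loads data into a separate store. Data virtualization instead delivers a unified view on demand. The Python sketch below illustrates that contrast under stated assumptions. It federates two hypothetical sources, an in-memory SQLite table and a CSV export, at query time rather than copying them. The class name `VirtualCustomerView`, the table layout, and the sample records are invented for illustration and are not taken from the paper. Real data virtualization platforms add query optimization, caching, security, and governance layers that this sketch does not model.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a relational database with customer records.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])

# Hypothetical source 2: a CSV export with orders, stored in another format.
orders_csv = "customer_id,amount\n1,120.0\n2,75.5\n1,30.0\n"


class VirtualCustomerView:
    """Hypothetical unified view over both sources.

    Nothing is copied into a separate store: every call to rows() reads
    the sources at request time and combines them in memory.
    """

    def __init__(self, db, read_orders):
        self.db = db
        self.read_orders = read_orders  # callable returning the current CSV text

    def rows(self):
        # Fetch customer names from the relational source.
        names = dict(self.db.execute("SELECT id, name FROM customers"))
        # Aggregate order amounts per customer from the CSV source.
        totals = {}
        for rec in csv.DictReader(io.StringIO(self.read_orders())):
            cid = int(rec["customer_id"])
            totals[cid] = totals.get(cid, 0.0) + float(rec["amount"])
        # Join the two sources on the customer id.
        return [(names[cid], total) for cid, total in sorted(totals.items())]


view = VirtualCustomerView(crm, lambda: orders_csv)

# Materialized copy taken once, standing in for data loaded by an ETL job.
etl_snapshot = view.rows()

# Update a source; the virtual view reflects it on the next query.
crm.execute("UPDATE customers SET name = 'Grace' WHERE id = 2")
print(etl_snapshot)  # [('Ada', 150.0), ('Lin', 75.5)]
print(view.rows())   # [('Ada', 150.0), ('Grace', 75.5)]
```

In this sketch, the snapshot keeps the stale customer name. The view returns the updated name without any reload step. The trade-off is that every query must reach the live sources, which is why production virtualization layers typically rely on pushdown and caching.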
Notes
1. The European Union's General Data Protection Regulation.
2. Open Data Protocol.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Akermi, M., Hadj Taieb, M.A., Ben Aouicha, M. (2023). Data Virtualization Layer Key Role in Recent Analytical Data Architectures. In: Abraham, A., Pllana, S., Casalino, G., Ma, K., Bajaj, A. (eds) Intelligent Systems Design and Applications. ISDA 2022. Lecture Notes in Networks and Systems, vol 716. Springer, Cham. https://doi.org/10.1007/978-3-031-35501-1_42
DOI: https://doi.org/10.1007/978-3-031-35501-1_42
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35500-4
Online ISBN: 978-3-031-35501-1
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)