×

A computational framework based on the dynamic pipeline approach. (English) Zbl 07870771

Summary: Stream processing has inspired new computational approaches to facilitate effectiveness and efficiency. One such approach is the dynamic pipeline, which serves as a powerful computational model for stream processing. It is particularly well suited for solving problems that require incremental generation of results, making it an approach for scenarios where real-time analysis and responsiveness are critical. This paper aims to address a family of problems using the Dynamic Pipeline approach, and as a first step, we provide a comprehensive characterization of this problem family. In addition, we present the definition of a Dynamic Pipeline framework. To demonstrate the practicality of this framework, we present a proof of concept through its implementation and perform an empirical performance study. To this end, we focus on solving the problem of enumerating or listing the weakly connected components of a graph within the proposed framework. We provide two implementations of this algorithm to demonstrate the computational power and continuous behavior of the Dynamic Pipeline framework. The first implementation serves as a baseline for our experiments, representing an ad hoc solution based on the Dynamic Pipeline approach. In contrast, the second implementation is built on top of the developed framework. The observed results strongly support the suitability and effectiveness of the Dynamic Pipeline framework for implementing graph stream processing problems, especially those where continuous and real-time result generation is essential.

MSC:

68-XX Computer science
Full Text: DOI

References:

[1] Acosta, M.; Vidal, M.-E.; Sure-Vetter, Y., Diefficiency metrics: measuring the continuous efficiency of query processing approaches, (International Semantic Web Conference, 2017, Springer), 3-19
[2] Ahmed, M.; Seraj, R.; Islam, S. M.S., The k-means algorithm: a comprehensive survey and performance evaluation, Electronics, 9, 8, 1295, 2020
[3] Bender, E. A., Partitions of multisets, Discrete Math., 9, 4, 301-311, 1974 · Zbl 0289.05014
[4] Blizard, W. D., The development of multiset theory, Mod. Log., 1, 4, 319-352, 1991 · Zbl 0744.03054
[5] Buisson, J.; Andre, F.; Pazat, J.-L., Performance and practicability of dynamic adaptation for parallel computing, (2006 15th IEEE International Conference on High Performance Distributed Computing, 2006), 331-332
[6] Dean, J.; Ghemawat, S., MapReduce: simplified data processing on large clusters, Commun. ACM, 51, 1, 107-113, 2008
[7] Dunning, T.; Friedman, E., Streaming Architecture: New Designs Using Apache Kafka and MapR Streams, 2016, O’Reilly Media, Inc.
[8] Fowler, M., Domain-Specific Languages, 2010, Pearson Education
[9] Gallardo, M.-d.-M.; Panizo, L., Trace analysis using an event-driven interval temporal logic, (International Symposium on Logic-Based Program Synthesis and Transformation, 2019, Springer), 177-192 · Zbl 1502.68183
[10] Gedik, B., Generic windowing support for extensible stream processing systems, Softw. Pract. Exp., 44, 9, 1105-1128, 2014
[11] Gordon, M. I.; Thies, W.; Amarasinghe, S., Exploiting coarse-grained task, data, and pipeline parallelism in stream programs, ACM SIGOPS Oper. Syst. Rev., 40, 5, 151-162, 2006
[12] Gross, J. L.; Yellen, J., Handbook of Graph Theory, 2003, CRC Press
[13] Harris, T.; Marlow, S.; Peyton-Jones, S.; Herlihy, M., Composable memory transactions, (Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005), 48-60
[14] Herath, J.; Yamaguchi, Y.; Saito, N.; Yuba, T., Dataflow computing models, languages, and machines for intelligence computations, IEEE Trans. Softw. Eng., 14, 12, 1805-1828, 1988 · Zbl 0745.68020
[15] Howard, W. A., The formulae-as-types notion of construction, (Hindley, J. R.; Seldin, J. P., To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus, and Formalism, 1980, Academic Press)
[16] Hueske, F.; Kalavri, V., Stream Processing with Apache Flink: Fundamentals, Implementation, and Operation of Streaming Applications, 2019, O’Reilly Media
[17] Im, S.; Moseley, B.; Sun, X., Efficient massively parallel methods for dynamic programming, (Hatami, H.; McKenzie, P.; King, V., Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, 2017, ACM), 798-811 · Zbl 1370.68316
[18] Karloff, H.; Suri, S.; Vassilvitskii, S., A Model of Computation for MapReduce, (Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, 2010, ACM), 938-948 · Zbl 1288.68247
[19] Khezr, S. N.; Navimipour, N. J., Mapreduce and its applications, challenges, and architecture: a comprehensive review and directions for future research, J. Grid Comput., 15, 295-321, 2017
[20] B. Le Goff, P. Le Guernic, J. Araoz Durand, Semi-granules and schielding for off-line scheduling, Rapports de recherche- INRIA.
[21] Ledgard, H. F.; Marcotty, M., A genealogy of control structures, Commun. ACM, 18, 11, 629-639, 1975 · Zbl 0313.68024
[22] Lee, I.-T. A.; Leiserson, C. E.; Schardl, T. B.; Zhang, Z.; Sukha, J., On-the-fly pipeline parallelism, ACM Trans. Parallel Comput., 2, 3, 1-42, 2015
[23] Lee, K.-H.; Lee, Y.-J.; Choi, H.; Chung, Y. D.; Moon, B., Parallel data processing with mapreduce: a survey, ACM SIGMOD Rec., 40, 4, 11-20, 2012
[24] Leskovec, J.; Krevl, A., SNAP datasets: Stanford large network dataset collection, June 2014
[25] Li, J.; Maier, D.; Tufte, K.; Papadimos, V.; Tucker, P. A., Semantics and evaluation techniques for window aggregates in data streams, (Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, 2005), 311-322
[26] Haskell, S. Marlow, Async library, 2021
[27] Marlow, S., Parallel and concurrent programming in Haskell, (Zsók, V.; Horváth, Z.; Plasmeijer, R., CEFP 2011. CEFP 2011, LNCS, vol. 7241, 2012, O’Reilly Media, Inc.), 339-401
[28] Marlow, S.; Maier, P.; Loidl, H.-W.; Aswad, M. K.; Trinder, P., Seq no more: better strategies for parallel Haskell, ACM SIGPLAN Not., 45, 11, 91-102, 2010
[29] Marlow, S.; Newton, R.; Peyton Jones, S., A monad for deterministic parallelism, ACM SIGPLAN Not., 46, 12, 71-82, 2011
[30] Navarro, A.; Asenjo, R.; Tabik, S.; Cascaval, C., Analytical modeling of pipeline parallelism, (2009 18th International Conference on Parallel Architectures and Compilation Techniques, 2009, IEEE), 281-290
[31] Nguyen, T. T.; Weidlich, M.; Yin, H.; Zheng, B.; Nguyen, Q. H.; Factcatch, Q. V.H. Nguyen, Incremental pay-as-you-go fact checking with minimal user effort, (Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020), 2165-2168
[32] Pasarella, E.; Vidal, M.-E.; Zoltan, C., Comparing MapReduce and pipeline implementations for counting triangles, Electron. Proc. Theor. Comput. Sci., 237, 20-33, 2017
[33] Röger, H.; Mayer, R., A comprehensive survey on parallelization and elasticity in stream processing, ACM Comput. Surv., 52, 2, 1-37, 2019
[34] Royo-Sales, J. P.; Pasarella, E.; Zoltan, C.; Vidal, M.-E., Towards a dynamic pipeline framework implemented in (parallel) Haskell, (PROLE2021, 2021, SISTEDES)
[35] Smith, D. R., The design of divide and conquer algorithms, Sci. Comput. Program., 5, 37-58, 1985 · Zbl 0554.68021
[36] Tanenbaum, A. S.; Austin, T., Structured Computer Organization, 2013, Pearson
[37] Ukey, N.; Yang, Z.; Yang, W.; Li, B.; Li, R., knn join for dynamic high-dimensional data: a parallel approach, (Bao, Z.; Borovica-Gajic, R.; Qiu, R.; Choudhury, F.; Yang, Z., Databases Theory and Applications, 2024, Springer Nature Switzerland: Springer Nature Switzerland Cham), 3-16
[38] Van Dongen, G.; Van den Poel, D., Evaluation of stream processing frameworks, IEEE Trans. Parallel Distrib. Syst., 31, 8, 1845-1858, 2020
[39] Yazdanpanah, F.; Alvarez-Martinez, C.; Jimenez-Gonzalez, D.; Etsion, Y., Hybrid dataflow/von-neumann architectures, IEEE Trans. Parallel Distrib. Syst., 6, 25, 1489-1509, 2014
[40] Zoltan, C.; Pasarella, E.; Araoz, J.; Vidal, M.-E., The dynamic pipeline paradigm, (PROLE2019, 2019, SISTEDES)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.