×

Partial sums on the ultra-wide word RAM. (English) Zbl 1533.68041

Summary: We consider the classic partial sums problem on the ultra-wide word RAM model of computation. This model extends the classic \(w\)-bit word RAM model with special ultrawords of length \(w^2\) bits that support standard arithmetic and Boolean operation and scattered memory access operations that can access \(w\) (non-contiguous) locations in memory. The ultra-wide word RAM model captures (and idealizes) modern vector processor architectures.
Our main result is a new in-place data structure for the partial sum problem that only stores a constant number of ultrawords in addition to the input and supports operations in doubly logarithmic time. This matches the best known time bounds for the problem (among polynomial space data structures) while improving the space from superlinear to a constant number of ultrawords. Our results are based on a simple and elegant in-place word RAM data structure, known as the Fenwick tree. Our main technical contribution is a new efficient parallel ultra-wide word RAM implementation of the Fenwick tree, which is likely of independent interest.

MSC:

68P05 Data structures
68Q09 Other nonclassical models of computation

Software:

heapsort

References:

[1] Bille, P.; Gørtz, I. L.; Skjoldjensen, F. R., Partial sums on the ultra-wide word ram, (Proc. 16th TAMC (2020)) · Zbl 1533.68042
[2] Fredman, M. L., The complexity of maintaining an array and computing its partial sums, J. ACM, 29, 1, 250-260 (1982) · Zbl 0477.68046
[3] Fredman, M.; Saks, M., The cell probe complexity of dynamic data structures, (Proc. 21st STOC (1989)), 345-354
[4] Fenwick, P. M., A new data structure for cumulative frequency tables, Softw. Pract. Exp., 24, 3, 327-336 (1994)
[5] Hon, W.-K.; Sadakane, K.; Sung, W.-K., Succinct data structures for searchable partial sums with optimal worst-case performance, Theor. Comput. Sci., 412, 39, 5176-5186 (2011) · Zbl 1226.68032
[6] Husfeldt, T.; Rauhe, T., New lower bound techniques for dynamic partial sums and related problems, SIAM J. Comput., 32, 3, 736-753 (2003) · Zbl 1046.68055
[7] Husfeldt, T.; Rauhe, T.; Skyum, S., Lower bounds for dynamic transitive closure, planar point location, and parentheses matching, (Proc. 5th SWAT (1996)), 198-211 · Zbl 0873.68096
[8] Raman, R.; Raman, V.; Rao, S. S., Succinct dynamic data structures, (Proc. 7th WADS (2001)), 426-437 · Zbl 0997.68520
[9] Pǎtraşcu, M.; Demaine, E. D., Logarithmic lower bounds in the cell-probe model, SIAM J. Comput., 35, 4, 932-963 (2006), announced at SODA 2004 · Zbl 1122.68044
[10] Dietz, P. F., Optimal algorithms for list indexing and subset rank, (Proc. 1st WADS (1989)), 39-46 · Zbl 0766.68057
[11] Yao, A. C., On the complexity of maintaining partial sums, SIAM J. Comput., 14, 2, 277-288 (1985) · Zbl 0564.68072
[12] Bille, P.; Christiansen, A. R.; Prezza, N.; Skjoldjensen, F. R., Succinct partial sums and Fenwick trees, (Proc. 24th SPIRE (2017)), 91-96 · Zbl 1454.68027
[13] Bille, P.; Christiansen, A. R.; Cording, P. H.; Gørtz, I. L.; Skjoldjensen, F. R.; Vildhøj, H. W.; Vind, S., Dynamic relative compression, dynamic partial sums, and substring concatenation, Algorithmica, 80, 11, 3207-3224 (2018), announced at ISAAC 2016 · Zbl 1410.68415
[14] Hampapuram, H.; Fredman, M. L., Optimal biweighted binary trees and the complexity of maintaining partial sums, SIAM J. Comput., 28, 1, 1-9 (1998) · Zbl 0914.68041
[15] Fredman, M. L., A lower bound on the complexity of orthogonal range queries, J. ACM, 28, 4, 696-705 (1981) · Zbl 0468.68049
[16] Frandsen, G. S.; Miltersen, P. B.; Skyum, S., Dynamic word problems, J. ACM, 44, 2, 257-271 (1997) · Zbl 0890.68060
[17] Burkhard, W. A.; Fredman, M. L.; Kleitman, D. J., Inherent complexity trade-offs for range query problems, Theor. Comput. Sci., 16, 3, 279-290 (1981) · Zbl 0522.68090
[18] Ben-Amram, A. M.; Galil, Z., Lower bounds for dynamic data structures on algebraic RAMs, Algorithmica, 32, 3, 364-395 (2002) · Zbl 1050.68025
[19] Ben-Amram, A. M.; Galil, Z., A generalization of a lower bound technique due to Fredman and Saks, Algorithmica, 30, 1, 34-66 (2001) · Zbl 0992.68034
[20] Ryabko, B. Y., A fast on-line adaptive code, IEEE Trans. Inf. Theory, 38, 4, 1400-1404 (1992) · Zbl 0775.94080
[21] Miltersen, P. B., Cell probe complexity-a survey, (Proc. 19th FSTTCS (1999)), 2
[22] Brodnik, A., Searching in constant time and minimum space (minimae res magni momenti sunt) (1995), University of Waterloo, Ph.D. thesis
[23] Brodnik, A.; Carlsson, S.; Fredman, M. L.; Karlsson, J.; Munro, J. I., Worst case constant time priority queue, J. Syst. Softw., 78, 3, 249-256 (2005)
[24] Hagerup, T., Sorting and searching on the word ram, (Proc. 15th STACS (1998)), 366-398
[25] Leben, R.; Miletic, M.; Špegel, M.; Trost, A.; Brodnik, A.; Karlsson, J., Design of high performance memory module on PC100, (Proc. ECSC (1999)), 75-78
[26] Brodnik, A.; Karlsson, J.; Munro, J. I.; Nilsson, A., An \(O(1)\) solution to the prefix sum problem on a specialized memory architecture, (Proc. 4th IFIP TCS (2006)), 103-114
[27] Farzan, A.; López-Ortiz, A.; Nicholson, P. K.; Salinger, A., Algorithms in the ultra-wide word model, (Proc. 12th TAMC (2015)), 335-346 · Zbl 1460.68014
[28] Stephens, N., The ARM scalable vector extension, IEEE Micro, 37, 2, 26-39 (2017)
[29] Reinders, J., AVX-512 Instructions (2013), Intel Corporation
[30] Lindholm, E.; Nickolls, J.; Oberman, S.; Montrym, J., NVIDIA Tesla: a unified graphics and computing architecture, IEEE Micro, 28, 2, 39-55 (2008)
[31] Chen, T.; Raghavan, R.; Dale, J. N.; Iwata, E., Cell broadband engine architecture and its first implementation—a performance view, IBM J. Res. Dev., 51, 5, 559-572 (2007)
[32] Williams, J. W.J., Algorithm 232: heapsort, Commun. ACM, 7, 347-348 (1964)
[33] Munro, J. I.; Suwanda, H., Implicit data structures for fast search and update, J. Comput. Syst. Sci., 21, 2, 236-250 (1980) · Zbl 0446.68045
[34] Salowe, J.; Steiger, W., Simplified stable merging tasks, J. Algorithms, 8, 4, 557-571 (1987) · Zbl 0641.68092
[35] Franceschini, G.; Muthukrishnan, S.; Pǎtraşcu, M., Radix sorting with no extra space, (Proc. 15th ESA (2007)), 194-205 · Zbl 1151.68391
[36] Chan, T. M.; Chen, E. Y., Optimal in-place algorithms for 3-d convex hulls and 2-d segment intersection, (Proc. 25th SOCG (2009)), 80-87 · Zbl 1388.68284
[37] Ladner, R. E.; Fischer, M. J., Parallel prefix computation, J. ACM, 27, 4, 831-838 (1980) · Zbl 0445.68066
[38] Blelloch, G. E., Prefix sums and their applications, (Synthesis of Parallel Algorithms (1990))
[39] Larsen, K. G.; Pagh, R., I/o-efficient data structures for colored range and prefix reporting, (Proc. 23rd SODA (2012)), 583-592 · Zbl 1422.68049
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.