Abstract
The throughput of a parallel execution of a DSP algorithm is limited by the iteration bound, which is the minimum period between the starts of consecutive iterations. It is given byT i∞=max (T i/D i), whereT i andD i are the total time of operations and the number of delays in loopi, respectively. The execution throughput of a DSP algorithm can be increased by reducing theT is, and this reduction can be realized by taking as many operations as possible out of loops without changing the semantic of the calculation. Since many DSP algorithms extensively use the four basic arithmetic operations, a simple and effective way of doing this reduction is to apply commutativity, associativity and distributivity on these operations. This paper presents an optimization technique, calledLoop Shrinking, which reduces the iteration bound by using the above method. Loop Shrinking is based on a heuristic method which is time-efficient for simple cases but can also tackle complex examples. An implementation of Loop Shrinking is presented in this article. The results show that it can yield a reduction in the iteration bound near or equal to careful hand-tuning.
Similar content being viewed by others
References
A.V. Aho, R. Sethi and J.D. Ullman, Compilers: Principles, Techniques and Tools, Reading, MA: Addison-Wesley, 1988.
G. De Michelli and D.C. Ku, “HERCULES—a system for high-level synthesis,”Proc. of the 25th ACM/IEEE Design Automation Conference, 1988, pp. 483–488.
T. Tanaka, T. Kobayashi and O. Karatsu, “HARP: fortran to silicon,”IEEE Transactions on Computer-Aided Design, vol. 8, 1989. pp. 649–660.
S.H. Lee and T.P. Barnwell III, “Optimal multiprocessor implementations from a serial algorithm specification,”Proc. ICASSP-88, vol. 3, 1988, pp. 1694–1697.
J. Bhasker and H. Lee, “An optimizer for hardware synthesis,”IEEE Design and Test of Computers, vol. 7, 1990, pp. 20–36.
R. Camposano and W. Rosenstiel, “Synthesizing circuits from behavioral descriptions,”IEEE Transactions on Computer-Aided Design, vol. 8, 1989, pp. 171–180.
H. Trickey, “Flamel: A high-level hardware compiler,”IEEE Transactions on Computer-Aided Design, vol. CAD-6, 1987, pp. 259–269.
W. Rosenstiel,Optimizations in High Level Synthesis, Microprocessing and Microprogramming (18), North Holland, 1986, pp. 347–352.
R. A. Walker and D.E. Thomas, “Design representation and transformation in The System Architect's Workbench,”Proc. of ICCAD-87, 1987, pp. 166–169.
M.C. McFarland, A.C. Parker and R. Camposano, “The high-level synthesis of digital systems,”Proceedings of the IEEE, vol. 78, 1990, pp. 301–317.
K.K. Parhi and D.G. Messesrschmitt, “Static rate-optimal scheduling of iterative data-flow programs via optimum unfolding,”IEEE Transactions on Computers, vol. 40, 1991, pp. 178–195.
D.A. Schwartz and T.P. Barnwell III, “Cyclo-static multiprocessor scheduling for the optimal realization of shift-invariant flow graphs,”Proc. ICASSP-85, 1985, pp. 1384–1387.
M. Potkonjak and J. Rabaey, “Optimizing resource utilization using transformations,”Proc. of ICCAD-91, pp. 88–91.
R. Hartley and A. Casavant, “Tree-height minimization in pipelined architectures,”Proc. of ICCAD-89, pp. 112–115.
Y. Muraoka,Parallelism Exposure and Exploitation in Programs, Ph.D. Dissertation, Univ. of Illinois at Urbana-Champaign, Dept. of Comp. Science, 1971.
S. Kung, “On supercomputing with systolic/wavefront array processors,”Proceedings of the IEEE, vol. 72, 1984, pp. 867–884.
P.H. Winston,Artificial Intelligence, Reading, MA: Addison-Wesley, 1984.
Y. Miyanaga, Y. Yokoyama and K. Tochinai, “Automatic design system of parallel/pipelined VLSI architecture for adaptive signal processing,”Proceedings of the ISMM International Conference on Parallel and Distributed Computing, and Systems, 1990, pp. 24–28.
M. Tatibana, Y. Miyanaga and K. Tochinai, “Automatic design system of parallel VLSI architectures using neural network,”Proceedings of the Fourth Japanese-Sino Sapporo International Conference on Computer Applications, pp. 187–190.
H. Forren and D.A. Schwartz, “Transforming periodic synchronous multiprocessor programs,”Proc. ICASSP-S7, 1987, pp. 1406–1409.
Y. Miyanaga, N. Nagai and K. Nagata, “Parallel procesing methods for parametric modeling of stochastic signals,”Transactions of IEICE, vol. J 70-A, pp. 1395–1405.
K. Ito and H. Kunieda, “VLSI system compiler for digital signal processing: modularization and synchronization,”IEEE Transactions on Circuits and Systems, vol. 38, 1991, pp. 423–433.
S.Y. Kung, H.J. Whitehouse and T. Kaliath,VLSI and Modern Signal Processing, Englewood Cliffs, NJ: Prentice Hall, 1985, pp. 258–264.
F.B. Maciel, Y. Miyanaga and K. Tochinai, “Two optimization techniques for high-level synthesis,” Proceedings of the 1991 Joint Convention, The Hokkaido Chapters of the IEICE, 1991, p. 314.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Maciel, F.B., Miyanaga, Y. & Tochinai, K. An optimization technique for lowering the iteration bound of DSP programs. J VLSI Sign Process Syst Sign Image Video Technol 5, 273–282 (1993). https://doi.org/10.1007/BF01581301
Received:
Revised:
Published:
Issue Date:
DOI: https://doi.org/10.1007/BF01581301