Abstract
Reconfigurable computing seeks a balance between the high efficiency of custom computing and the flexibility of general-purpose computing. This paper presents the implementation techniques used in LEAP, a coarse-grained reconfigurable array. It proposes a speculative execution mechanism for dynamic loop scheduling, with the goal of one iteration per cycle, together with implementation techniques that decouple synchronization between the token generator and the collector. The paper also introduces techniques for exploiting both intra-iteration and inter-iteration data dependences, aided by two instructions dedicated to data reuse across loop-carried dependences. Experimental results show that the number of memory accesses is, on average, 3% of that of a RISC processor simulator with no memory optimization. In a practical image matching application, the LEAP architecture achieves a speedup of about 34 times in execution cycles compared with general-purpose processors.
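The general idea of reusing data across loop-carried dependences to cut memory accesses can be illustrated with a small software sketch. The following is not LEAP's mechanism or instruction set, which the abstract does not detail; it is a generic illustration, assuming a 3-point sliding sum as the loop body, with hypothetical function names and register names (`r0`, `r1`, `r2`). It only counts loads and does not reproduce the 3% figure reported above.

```python
def stencil_naive(a):
    """3-point sliding sum; every operand is loaded from memory in each iteration."""
    loads = 0
    out = []
    for i in range(2, len(a)):
        x0, x1, x2 = a[i - 2], a[i - 1], a[i]  # three loads per iteration
        loads += 3
        out.append(x0 + x1 + x2)
    return out, loads


def stencil_reuse(a):
    """Same sum; values loaded by earlier iterations stay in registers and are shifted."""
    if len(a) < 3:
        return [], 0
    r0, r1 = a[0], a[1]  # two loads before the loop starts (hypothetical registers)
    loads = 2
    out = []
    for i in range(2, len(a)):
        r2 = a[i]  # the only new load in this iteration
        loads += 1
        out.append(r0 + r1 + r2)
        r0, r1 = r1, r2  # carry values forward to the next iteration
    return out, loads


if __name__ == "__main__":
    data = list(range(10))
    print(stencil_naive(data)[1], stencil_reuse(data)[1])
```

For an input of n elements (n at least 3), the naive loop performs 3(n - 2) loads while the reuse version performs n, so the ratio approaches one third as n grows. Wider reuse windows would lower the ratio further.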
Additional information
Supported by the National Natural Science Foundation of China (Grant Nos. 60633050, 60621003) and the National High Technology Research and Development Program of China (Grant No. 2007AA01Z06).
About this article
Cite this article
Dou, Y., Wu, G., Xu, J. et al. A coarse-grained reconfigurable computing architecture with loop self-pipelining. Sci. China Ser. F-Inf. Sci. 52, 575–587 (2009). https://doi.org/10.1007/s11432-008-0146-6
DOI: https://doi.org/10.1007/s11432-008-0146-6