×

Reciprocal and reciprocal square root units with operand modification and multiplication. (English) Zbl 1103.68306

Summary: Reciprocals and reciprocal square roots are used in several digital signal processing, multimedia, and scientific computing applications. This paper presents high-speed methods for computing reciprocals and reciprocal square roots. These methods use a table lookup, operand modification, and multiplication to obtain an initial approximation. This is followed by a modified Newton-Raphson iteration, which improves the accuracy of the initial approximation. The initial approximation and Newton-Raphson iteration employ specialized hardware to reduce the delay, area, and power dissipation. The application of these methods is illustrated through the design of reciprocal and reciprocal square root units for operands in the IEEE single precision format. These designs are pipelined to produce a new result every clock cycle.

MSC:

68M07 Mathematical problems of computer architecture
Full Text: DOI

References:

[1] H. Kwan, R.L. Nelson, and E.E. Swartzlander, Jr., ”Cascaded Implementation of an Iterative Inverse-Square-Root Algorithm with Overflow Lookahead,” in Proceedings of the 12th Symposium on Computer Arithmetic, 1995, pp. 114–123.
[2] T.R. Halfhill, ”Beyond MMX x86 Extensions,” Byte, vol. 22, no. 12, 1997, pp. 87–92.
[3] V.K. Jain, G.E. Perez, and J.M. Wills, ”Novel Reciprocal and Square-Root VLSI Cell: Architecture and Application to Signal Processing,” in International Conference on Acoustics, Speech and Signal Processing, vol. 2, 1991, pp. 1201–1204.
[4] P. Soderquist and M. Leeser, ”Area and Performance Tradeoffs in Floating-Point Divide and Square Root Implementations,” ACM Computing Surveys, vol. 28, no. 3, 1996, pp. 518–564. · doi:10.1145/243439.243481
[5] AltiVec Technology Programming Environments Manual. Motorola, Inc., 1998.
[6] S. Oberman, F. Weber, N. Juffa, and G. Favor, ”AMD 3DNow! Technology and the K6-2 Microprocessor,” in Proceedings of Hot Chips 10, 1998, pp. 245–254.
[7] P. Montuschi and M. Messalama, ”Optimal Absolute Error Starting Value for Newton-Raphson Calculation of Square Root,” Computing, vol. 46, 1991, pp. 67–86. · Zbl 0719.65041 · doi:10.1007/BF02239012
[8] I. Koren, Computer Arithmetic Algorithms. Prentice Hall, 1993. · Zbl 0994.68186
[9] M.D. Ercegovac and T. Lang, Division and Square Root: Digit-Recurrence Algorithms and Implementations. Kluwer Academic Publishers, 1994. · Zbl 0806.68005
[10] S. Majerski, ”Square-Rooting Algorithms for High-Speed Digital Circuits,” IEEE Transactions on Computers, vol. C-34, 1985, pp. 724–733. · Zbl 0565.68040 · doi:10.1109/TC.1985.1676618
[11] M.D. Ercegovac and T. Lang, ”Radix-4 Square Root Without Initial PLA,” IEEE Transactions on Computers, vol. C-39, 1990, pp. 1016–1024. · doi:10.1109/12.57040
[12] V.K. Jain and L. Lin, ”Square-Root, Reciprocal, Sine/Cosine, Arctangent Cell for Signal and Image Processing,” in International Conference on Acoustics, Speech and Signal Processing, vol. 2, 1994, pp. 521–524.
[13] J. Cao and B. Wei, ”High-Performance Hardware for Function Generation,” in Proceedings of the 13th Symposium on Computer Arithmetic, 1997, pp. 184–189.
[14] M.J. Schulte and J.E.E. Swartzlander, ”Exact Rounding of Certain Elementary Functions,” in Proceedings of the 11th Symposium on Computer Arithmetic, 1993, pp. 138–145.
[15] N. Takagi, ”Generating a Power of an Operand by a Table Look-Up and a Multiplication,” in Proceedings of the 13th Symposium on Computer Arithmetic, 1997, pp. 126–131.
[16] IEEE Standard 754 for Binary Floating Point Arithmetic. IEEE, 1985.
[17] M.J. Schulte, J.E. Stine, and K.E. Wires, ”High-Speed Reciprocal Approximations,” in Proceedings of the Thirty First Asilomar Conference on Signals, Circuits and Systems, 1998, pp. 1178–1182.
[18] M.J. Schulte and K.E. Wires, ”High-Speed Inverse Square Roots,” in Proceedings of the 14th IEEE Symposium on Computer Arithmetic, IEEE Computer Society Press, 1999, pp. 124–131.
[19] M.J. Schulte and E.E. Swartzlander, Jr., ”Truncated Multiplication with Correction Constant,” in VLSI Signal Processing, VI, 1993, pp. 388–396.
[20] E.J. King and E.E. Swartzlander, Jr., ”Data-Dependent Truncated Scheme for Parallel Multiplication,” in Proceedings of the Thirty First Asilomar Conference on Signals, Circuits and Systems, 1998, pp. 1178–1182.
[21] M.J. Schulte, J.G. Jansen, and J.E. Stine, ”Reduced Power Dissipation Through Truncated Multiplication,” in IEEE Alessandro Volta Memorial International Workshop on Low Power Design, 1999.
[22] L. Dadda, ”Some Schemes for Parallel Multipliers,” Alta Frequenza, vol. 34, 1965, pp. 349–356.
[23] T.C. Chen, ”A Binary Multiplication Scheme Based on Squaring,” IEEE Transactions on Computers, vol. C-20, 1971, pp. 678–680. · Zbl 0217.53605 · doi:10.1109/T-C.1971.223325
[24] R.H. Strandberg, L.G. Bustamante, V.G. Oklobdzija, M. Soderstrand, and J.C. LeDuc, ”Efficient Realizations of Squaring Circuit and Reciprocal used in Adaptive Sample Rate Notch Filters,” Journal of VLSI Signal Processing, vol. 14, no. 3, 1996, pp. 303–309. · doi:10.1007/BF00929623
[25] K.E. Wires, M.J. Schulte, L.P. Marquette, and P.I. Balzola, ”Combined Unsigned and Two’s Complement Squarers,” in Conference Record of the Thirty-Third Asilomar Conference on Signals, Circuits, and Systems, IEEE Press, 1999, pp. 1215–1219.
[26] K. Bickerstaff, M.J. Schulte, and E.E. Swartzlander, Jr, ”Parallel Reduced Area Multipliers,” Journal of VLSI Signal Processing, vol. 9, 1995, pp. 181–192. · doi:10.1007/BF02407084
[27] M.J. Flynn, ”On Division by Functional Iteration,” IEEE Transactions on Computers, vol. C-19, 1970, pp. 702–706. · Zbl 0199.51705 · doi:10.1109/T-C.1970.223019
[28] W. James and P. Jaratt, ”The Generation of Square Roots on a Computer With Rapid Multiplication Compared with Division,” Mathematics of Computation, vol. 19, 1965, pp. 497–500. · Zbl 0133.39607 · doi:10.1090/S0025-5718-1965-0178587-3
[29] C.V. Ramamoorthy and J. Goodman, ”Some Properties of Iterative Square-Rooting Methods Using High-Speed Multiplication,” IEEE Transactions on Computers, vol. C-21, 1972, pp. 837–847. · Zbl 0235.68021 · doi:10.1109/TC.1972.5009039
[30] M.J. Schulte and E.E. Swartzlander, Jr, ”Hardware Designs for Exactly Rounded Elementary Functions,” IEEE Transactions on Computers, vol. C-44, 1994, pp. 964–973. · Zbl 1073.68504 · doi:10.1109/12.295858
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.