Simple yet efficient parallel signed multiplier design using
radix-8 structure
N.V.V.K. Boppana and S. Ren
The continued quest for finding a low-power and high-performance
hardware algorithm for signed number multiplication led to designing a
simple and novel radix-8 signed number multiplier with 3-bit grouping
and partial product reduction performed using magnitudes of the
multiplicand and the multiplier. The pre-computation stage constitutes
magnitude calculation and non-trivial computations required to generate
partial products. A new partial product reduction strategy is deployed
in the design to improve speed at low cost. 8 × 8, 16 × 16, 32 × 32,
and 64 × 64 designs are presented for the proposed architectures.
Performance results include area, power, delay, and power-delay-product
of synthesized and post-layout designs using 32 nm CMOS technology with
1.05 V supply voltage.
Introduction: Multiplication is the most frequently used arithmetic
operation after addition and subtraction. Digital signal processors
(DSPs) use multipliers in computationally intensive applications such as
filtering, convolution, fast Fourier transform (FFT), and audio/video
codecs. High-performance computer hardware for scientific computing,
including CPUs and graphics processing units (GPUs), relies heavily on
these fundamental digital arithmetic operations. DSPs spend most of
their time multiplying and require significant chip area for multipliers
to meet performance requirements. Multipliers are often a dominant
factor in the critical path delay, which in turn affects throughput in
pipelined designs, and they consume significant power in applications
such as multimedia and DSP. Demand has been increasing for low-power
portable computing and communication devices, such as smart watches,
Internet of Things (IoT) devices, mobile phones, laptops, and PCs, which
run signal processing algorithms and other multiplication-intensive
algorithms.
Modified Booth Multiplier: The conventional radix-4 Booth multiplier
algorithm employs the Booth encoding scheme to halve the number of
partial products (PP), and hence reduces computational latency (D),
design area (A), and overall power consumption (P). A modified Booth
multiplier design of size 8 × 8 is presented in [1], with a minimized
Booth encoder, the adder/subtractor block replaced by a 9-bit wide 2:1
multiplexer (MUX) in the first stage, and the full-length
adder/subtractor and MUX blocks replaced by 9-bit wide blocks in later
stages to obtain a low power-delay product (PDP). The Booth multiplier
was further optimized in [2] to achieve low cost and high performance by
further optimizing the binary two's complement (B2C) and the Booth
encoder, and by deploying parallel addition to reduce the 3-stage PP
reduction to 2 stages.
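As a behavioural illustration of why radix-4 Booth encoding halves the PP count, the sketch below recodes an n-bit two's complement multiplier into n/2 signed digits and sums the resulting partial products. It shows only the textbook recoding, not the optimized encoders of [1] or [2]; the function names are hypothetical and chosen for this example.

```python
def booth_radix4_digits(b: int, n: int) -> list[int]:
    """Recode an n-bit two's complement multiplier (n even) into n/2
    radix-4 Booth digits, each in {-2, -1, 0, 1, 2}, LSB digit first."""
    if n % 2:
        raise ValueError("n must be even for radix-4 recoding")
    u = b & ((1 << n) - 1)  # raw n-bit pattern of b
    # bits[0] is the implicit b_{-1} = 0; bits[j + 1] holds bit b_j.
    bits = [0] + [(u >> i) & 1 for i in range(n)]
    digits = []
    for i in range(0, n, 2):
        # Overlapping triplet (b_{i+1}, b_i, b_{i-1}).
        digits.append(bits[i] + bits[i + 1] - 2 * bits[i + 2])
    return digits


def booth_multiply(a: int, b: int, n: int) -> int:
    """Sum the n/2 partial products digit * a * 4^k."""
    return sum((d * a) << (2 * k)
               for k, d in enumerate(booth_radix4_digits(b, n)))
```

For an 8-bit multiplier this yields four partial products instead of eight; each is 0, ±a, or ±2a, so no multiple harder than a shift and a negation is needed.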
Proposed radix-8 based multiplier: A new, simple yet efficient,
radix-8 structure based parallel signed multiplier is presented in this
letter; it is designed, synthesized, and assessed for performance
using 32 nm CMOS technology at a 1.05 V supply voltage. The design of
the proposed radix-8 architecture is inspired by the low-cost 64-bit
digital comparator design presented in [3], with the redundant
computations taken out and performed at the beginning in the form of
XOR-XNOR to reduce P, D, and A. In the proposed architecture, the
magnitudes of the multiplicand (A) and multiplier (B) are computed,
resulting in Ap (or X) and Bp, respectively. The redundant non-trivial (NT) operations are computed in
the precomputation stage, shown in Fig. 1, to reduce the number of PPs
and thereby the number of additions required for the PP reduction. This
is achieved by grouping 3 bits at a time from the LSB side towards the
MSB side and by taking the non-trivial computations out of the PPs,
which are a combination of trivial and non-trivial values. As shown in
Table 1, trivial computations are just shifts or no operation;
non-trivial computations include computing 3X (P), 5X (Q), and 7X (R)
in the precomputation stage. The fourth NT value, 6X, does not need any
separate computation but can be generated by left shifting the 3X value
from the NT block by one bit position. For higher multiplier sizes, the
proposed design is progressively more efficient compared to reported
state-of-the-art modified Booth multiplier designs.
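The magnitude-based radix-8 scheme described above can be sketched behaviourally as follows. This is a minimal software model under stated assumptions, not the letter's hardware: the way 3X, 5X, and 7X are formed here (shift plus add or subtract), the plain summation standing in for the PP reduction strategy, and the function names are all illustrative choices. The selection of multiples per 3-bit group follows the trivial/non-trivial split described in the text.

```python
def radix8_group_count(n: int) -> int:
    """Number of 3-bit groups (partial products) for an n-bit operand."""
    return -(-n // 3)  # ceil(n / 3)


def radix8_signed_multiply(a: int, b: int, n: int) -> int:
    """Multiply two n-bit two's complement integers via their magnitudes,
    using 3-bit grouping of the multiplier magnitude."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operands must fit in n-bit two's complement")

    negative = (a < 0) != (b < 0)  # sign of the product
    x, bp = abs(a), abs(b)         # magnitudes Ap (X) and Bp

    # Precomputation of the non-trivial multiples (assumed formation).
    p = x + (x << 1)   # 3X
    q = x + (x << 2)   # 5X
    r = (x << 3) - x   # 7X

    # Multiples 0..7 of X: trivial ones are zero or shifts of X,
    # and 6X is 3X shifted left by one bit position.
    multiples = [0, x, x << 1, p, x << 2, q, p << 1, r]

    total = 0
    for k in range(radix8_group_count(n)):
        digit = (bp >> (3 * k)) & 0b111       # 3-bit group, LSB first
        total += multiples[digit] << (3 * k)  # weighted partial product
    return -total if negative else total
```

Under this grouping a 64-bit multiplier produces 22 partial products, compared with 32 for radix-4 Booth encoding, at the cost of precomputing the three non-trivial multiples once per multiplication.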