Performance Estimation of Low Power and Area-Efficient Parallel Pipelined FFT

Open Access | Jun 2025

Figures & Tables

Fig. 1. FFT architecture with BI scheme.

Fig. 2. Encoder module.

Fig. 3. A 3×3 multiplication using a Vedic multiplier.

Fig. 4. Simulation waveform – BI scheme.

Fig. 5. Performance analysis of BIM-FFT for different FPGA families.

Fig. 6. Delay-FFT – Delay analysis for different FPGA family.

Fig. 7. Physical design of parallel and pipelined CTS-BIS FFT.

A = 101: the 1s outnumber the 0s (1 > 0), so A is inverted to 010,
B = 111: the 1s outnumber the 0s (1 > 0), so B is inverted to 000

S0 = a0b0 => 0 × 0 = 0 (CS = 0)
C1S1 = a0b1 + a1b0 => 0 × 0 + 1 × 0 = 01 (CS = 1)
C2S2 = 0 + a1b1 + a0b2 + a2b0 => 1 + 1 × 0 + 0 × 0 + 0 × 0 = 01 (CS = 1)
C3S3 = 0 + a1b2 + a2b1 => 1 + 1 × 0 + 0 × 0 = 01 (CS = 1)
C4S4 = C3 + a2b2 => 0 + 0 × 0 = 0 (CS = 0)

Total computational cost = 0+1+1+1+0 = 3

Performance analysis of the proposed BIM-FFT

| Family | Registers | LUTs | Slices |
| --- | --- | --- | --- |
| Spartan-6 - xc6slx9 | 1179/11440 | 1126/5720 | 475/1430 |
| Virtex-4 - xc4vsx55 | 1246/49152 | 1618/49152 | 1187/24576 |
| Virtex-5 - xc5vsx50t | 1239/32640 | 1548/32640 | 710/8160 |
| Virtex-6 - xc6vlx75t | 106/93129 | 266/46560 | 95/11640 |
| Zynq - xc7z010 | 106/35200 | 265/17600 | 85/4400 |

A=a2a1a0,
B=b2b1b0

S0 = a0b0
C1S1 = a0b1 + a1b0
C2S2 = C1 + a1b1 + a0b2 + a2b0
C3S3 = C2 + a1b2 + a2b1
C4S4 = C3 + a2b2
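
The column-wise scheme above can be sketched in Python. This is a minimal illustration rather than the authors' hardware design: the function name `vedic_multiply_3x3` is hypothetical, and the sketch carries the whole column carry forward as an integer (a column sum can need more than two bits), whereas the equations write each carry as a single symbol.

```python
def vedic_multiply_3x3(a: int, b: int) -> int:
    """Multiply two 3-bit unsigned values column by column (illustrative sketch)."""
    if not (0 <= a < 8 and 0 <= b < 8):
        raise ValueError("operands must be 3-bit unsigned values")

    # Index 0 is the least significant bit: a_bits = [a0, a1, a2].
    a_bits = [(a >> i) & 1 for i in range(3)]
    b_bits = [(b >> i) & 1 for i in range(3)]

    result = 0
    carry = 0
    for col in range(5):  # columns S0 .. S4
        total = carry
        # Sum every partial product a_i * b_j with i + j == col.
        for i in range(3):
            j = col - i
            if 0 <= j < 3:
                total += a_bits[i] & b_bits[j]
        result |= (total & 1) << col  # sum bit of this column
        carry = total >> 1            # carry into the next column

    # The final carry becomes the most significant product bit.
    result |= carry << 5
    return result


print(format(vedic_multiply_3x3(0b101, 0b111), "06b"))
```

For A = 101 and B = 111 the call returns the ordinary product of 5 and 7.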

Performance analysis of the proposed DPR-FFT architecture with frequency and delay

| Family | Frequency [MHz] | Delay [ns] | Power [mW] |
| --- | --- | --- | --- |
| Spartan-6 - xc6slx9 | 335.768 | 2.97 | 64.15 |
| Virtex-4 - xc4vsx55 | 190.81 | 5.241 | 76.49 |
| Virtex-5 - xc5vsx50t | 188.79 | 5.297 | 75.38 |
| Virtex-6 - xc6vlx75t | 246.77 | 4.052 | 79.01 |
| Zynq - xc7z010 | 261.55 | 3.823 | 64.23 |
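
For the Virtex and Zynq rows, the delay column agrees with the clock period implied by the frequency column (period in ns = 1000 / frequency in MHz). The short sketch below checks that relation; the helper name `clock_period_ns` is hypothetical, and the numbers assume the frequency and delay values are read from the table rows as 190.81/5.241, 188.79/5.297, 246.77/4.052 and 261.55/3.823.

```python
def clock_period_ns(frequency_mhz: float) -> float:
    """Return the clock period in nanoseconds for a frequency given in MHz."""
    if frequency_mhz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / frequency_mhz


# (family, frequency in MHz, reported delay in ns), as read from the table.
REPORTED = [
    ("Virtex-4", 190.81, 5.241),
    ("Virtex-5", 188.79, 5.297),
    ("Virtex-6", 246.77, 4.052),
    ("Zynq", 261.55, 3.823),
]

for family, freq, delay in REPORTED:
    period = clock_period_ns(freq)
    print(f"{family}: period = {period:.3f} ns, reported delay = {delay} ns")
```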

Let
A = 101,
B = 111

S0 = a0b0 => 1 × 1 = 1 (CS = 1)
C1S1 = a0b1 + a1b0 => 1 × 1 + 1 × 1 = 10 (CS = 2)
C2S2 = C1 + a1b1 + a0b2 + a2b0 => 1 + 0 × 1 + 1 × 1 + 1 × 1 = 11 (CS = 3)
C3S3 = C2 + a1b2 + a2b1 => 1 + 0 × 1 + 1 × 1 = 10 (CS = 2)
C4S4 = C3 + a2b2 => 0 + 1 × 1 = 01 (CS = 1)

where CS is the computational cost of each step.
Total computational cost = 1+2+3+2+1 = 9

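Performance comparison for FFT

| Device | Methodology | LUTs | Flip-flops | Slices | Frequency [MHz] | Delay [ns] | Power [mW] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ASAP 7 nm | R2MDC | 12686 | 7416 | 4005 | 105.234 | 2.835 | 128 |
| | Proposed | 9548 | 7239 | 3634 | 188.790 | 1.914 | 75.38 |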

Different inversion schemes

| Input pattern | Inversion scheme | Output pattern |
| --- | --- | --- |
| 11111110 | 1s: 7, 0s: 1 (1 > 0): full inversion | 00000001 |
| 10101001 | 1s: 4, 0s: 4 (1 = 0): odd inversion | 00000011 |
| 01010110 | 1s: 4, 0s: 4 (1 = 0): even inversion | 00000011 |
| 00000001 | 1s: 1, 0s: 7 (0 > 1): no inversion | 00000001 |
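
The four schemes in this table can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the authors' encoder: the names `apply_scheme` and `choose_scheme` are hypothetical, bit positions are counted from the left starting at 1 (which matches the odd and even rows), and the tie-break between odd and even inversion (pick whichever leaves fewer 1s) is an assumption, since the table does not state how the choice is made.

```python
def apply_scheme(pattern: str, scheme: str) -> str:
    """Invert the bits of `pattern` selected by `scheme`.

    Positions are counted from the left, starting at 1 (assumption
    inferred from the odd and even rows of the table).
    """
    selectors = {
        "none": lambda pos: False,
        "full": lambda pos: True,
        "odd": lambda pos: pos % 2 == 1,
        "even": lambda pos: pos % 2 == 0,
    }
    flip = selectors[scheme]
    return "".join(
        ("1" if bit == "0" else "0") if flip(pos) else bit
        for pos, bit in enumerate(pattern, start=1)
    )


def choose_scheme(pattern: str) -> str:
    """Pick an inversion scheme from the counts of 1s and 0s."""
    ones = pattern.count("1")
    zeros = len(pattern) - ones
    if ones > zeros:
        return "full"
    if zeros > ones:
        return "none"
    # Tie: assumed rule, choose the partial inversion that leaves fewer 1s.
    return min(("odd", "even"), key=lambda s: apply_scheme(pattern, s).count("1"))


for word in ("11111110", "10101001", "01010110", "00000001"):
    scheme = choose_scheme(word)
    print(word, scheme, apply_scheme(word, scheme))
```

A real bus-invert encoder would also transmit the chosen scheme on extra control lines so the receiver can undo the inversion; that part is omitted here.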

High-speed multipliers comparison

| Multiplier | Power [mW] | Area (LUT) | Delay [ns] | FFT suitability | Utilization |
| --- | --- | --- | --- | --- | --- |
| Booth | 2.35 | 150 | 12.5 | Moderate | Moderate area and speed |
| Wallace tree | 2.10 | 180 | 10.3 | High | High-speed, but complex logic increases area |
| Dadda | 2.25 | 160 | 9.8 | High | Similar to Wallace |
| Baugh-Wooley | 2.40 | 170 | 11.0 | Moderate | Best for 2’s complement |
| Approximate LOA | 1.50 | 120 | 8.5 | Medium | Power-efficient, may introduce slight computational error |

Language: English
Page range: 134 - 140
Submitted on: Mar 18, 2024
Accepted on: May 7, 2025
Published on: Jun 17, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: Volume open

© 2025 P Surya, C Arunachalaperumal, S Dhilipkumar, published by Slovak Academy of Sciences, Institute of Measurement Science
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.