

International Research Journal of Modernization in Engineering Technology and Science Volume:03/Issue:06/June-2021 Impact Factor- 5.354 www.irjmets.com

# A REVIEW ON IMPLEMENTATION OF FFT USING RTL

# Chandan Patel M \*1, Prahallada Raja Kg\*2, Pramodh R\*3, Ramkumar R\*4, Dr. Suchitra M\*5

\*1,2,3,4Research Scholar, Department Of Electronics And Communication Engineering, Vidyavardhaka College Of Engineering, Mysore, Karnataka, India. \*5Professor, Department Of Electronics And Communication Engineering, Vidyavardhaka College Of Engineering, Mysore, Karnataka, India.

# **ABSTRACT**

Within the Cooley-Tukey family of algorithms, the radix-2 decimation-in-time (DIT) Fast Fourier Transform (FFT) has the simplest structure. The Discrete Fourier Transform (DFT) is used to convert a signal from the time domain to the frequency domain, and the FFT is widely used in digital signal processing algorithms. FFT computations appear in numerous applications, for instance OFDM, noise reduction, digital audio and video broadcasting, and image processing. The FFT is an expensive operation in terms of multiply-accumulate (MAC) operations. For FFTs with many points and a large number of samples, the MAC requirements cannot be met by efficient hardware such as a DSP. A good solution is to use a processor with dedicated hardware to perform the FFT efficiently at a high sample rate, while the DSP handles the less demanding parts of the processing. This approach is used to design butterflies for FFTs of various point sizes. This paper gives a literature review of the design and power measurement of 16/32-point FFTs using Verilog HDL. Simulation and synthesis of the design are done using Cadence tools.

Keywords: Cooley-Tukey, Radix-2, FFT, DFT, DSP, Butterflies, Simulation, Synthesis, Cadence.

#### I. INTRODUCTION

There are several ways to calculate the Discrete Fourier Transform (DFT): first, by solving simultaneous linear equations or by the correlation method; second, by using the Fast Fourier Transform (FFT). Although the FFT gives the same result as the other approaches, it is far more efficient and often reduces computation time by a factor of hundreds. Without the FFT, many DSP techniques would not be practical. The FFT requires only a few lines of code, yet it is one of the more intricate algorithms in DSP. FFTs are used in numerous fields such as communications, image processing, radar signal processing, sonar, and biomedicine. Direct implementation of the DFT has $O(N^2)$ complexity, which the FFT reduces to $O(N\log_2 N)$. The Cooley-Tukey algorithm, commonly known as the Fast Fourier Transform, is a family of algorithms for faster computation of the DFT. The importance of the DFT in practical applications is due to a large extent to the existence of computationally efficient algorithms, known as Fast Fourier Transforms (FFTs), for computing the DFT. There are many distinct FFT algorithms that draw on a wide range of mathematics, from simple complex-number arithmetic to group theory and number theory.
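To make the complexity comparison concrete, the following minimal Python sketch evaluates the DFT directly from its definition and tabulates the number of complex multiplications needed by the direct method ($N^2$) and by a radix-2 FFT ($(N/2)\log_2 N$). The function names `direct_dft` and `multiplication_counts` are illustrative assumptions and do not come from any of the surveyed designs.

```python
import cmath


def direct_dft(x):
    """Evaluate X(k) = sum_n x(n) * W_N^(n*k) directly; O(N^2) multiplications."""
    n_points = len(x)
    result = []
    for k in range(n_points):
        acc = 0j
        for n in range(n_points):
            # W_N^(n*k) = exp(-j*2*pi*n*k/N) is the twiddle factor.
            acc += x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
        result.append(acc)
    return result


def multiplication_counts(n_points):
    """Return (direct DFT, radix-2 FFT) complex-multiplication counts.

    Assumes n_points is a power of two.
    """
    stages = n_points.bit_length() - 1  # log2(N) for a power of two
    return n_points * n_points, (n_points // 2) * stages


if __name__ == "__main__":
    for n_points in (16, 32, 1024):
        direct, fast = multiplication_counts(n_points)
        print(f"N={n_points}: direct={direct}, radix-2 FFT={fast}")
```

For the 16- and 32-point sizes considered in this review, the counts are 256 versus 32 and 1024 versus 80 complex multiplications respectively, and the gap widens quickly as $N$ grows.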

# II. METHODOLOGY

# 2.1. ASIC DESIGN FLOW

ASICs (application-specific integrated circuits) are silicon chips created for specific applications. In other words, an ASIC is a chip designed to perform a particular operation, as opposed to a general-purpose integrated circuit. An ASIC is not software programmable to perform any other task. Their performance has made it attractive to design ASIC processors and to evaluate their capabilities extensively. The ASIC design flow provides the flexibility to model hardware accelerators that can become part of the chip. The ASIC flow comprises several steps, such as logical and physical synthesis, static timing analysis, formal verification, and design for testability, which provide details on area, power, and timing and cover the steps needed to design a block so that it can be integrated into the chip alongside other blocks. In this regard, the paper provides a brief literature survey of the proposed work, its operation with respect to Synopsys's EDA tools, and the dependencies between these tools.



#### 2.2. RADIX-2 FAST FOURIER TRANSFORM

Radix-2 algorithms are the simplest FFT algorithms. The radix-2 decimation-in-time (DIT) FFT splits the DFT computation into even- and odd-indexed parts, so that every output bin is computed from shorter DFTs of different combinations of input samples. This is shown in equation (1), where the first sum runs over the even-indexed samples and the second over the odd-indexed samples:

$$X(k) = \sum_{m=0}^{(N/2)-1} x(2m)W_N^{2mk} + \sum_{m=0}^{(N/2)-1} x(2m+1)W_N^{(2m+1)k} \tag{1}$$

Let us consider the computation of the $N = 2^v$ point DFT by means of the divide-and-conquer approach. We divide the data sequence of $N$ points into two sequences of $N/2$ points, made up of the even- and odd-indexed samples of $x(n)$, whose $N/2$-point DFTs are $F_1(k)$ and $F_2(k)$ respectively.

Since $F_1(k)$ and $F_2(k)$ are periodic with period $N/2$, we have $F_1(k+N/2) = F_1(k)$ and $F_2(k+N/2) = F_2(k)$. In addition, the factor $W_N^{k+N/2} = -W_N^{k}$. Hence (1) can be expressed as:

$$X(k) = F_1(k) + W_N^{k}F_2(k), \quad k = 0, 1, \ldots, N/2-1 \tag{2}$$

$$X(k+N/2) = F_1(k) - W_N^{k}F_2(k), \quad k = 0, 1, \ldots, N/2-1 \tag{3}$$

The decimation of the data sequence can be repeated until the resulting sequences are reduced to one-point sequences [6, 8]. In order to be consistent with the common notation, we define:

$$G_1(k) = F_1(k), \quad k = 0, 1, \ldots, N/2-1 \tag{4}$$

$$G_2(k) = W_N^{k}F_2(k), \quad k = 0, 1, \ldots, N/2-1 \tag{5}$$

Then the DFT $X(k)$ may be expressed as:

$$X(k) = G_1(k) + G_2(k), \quad k = 0, 1, \ldots, N/2-1 \tag{6}$$

$$X(k+N/2) = G_1(k) - G_2(k), \quad k = 0, 1, \ldots, N/2-1 \tag{7}$$

Observe that the basic computation is to take two complex numbers, i.e., the pair $(a, b)$, multiply $b$ by the twiddle factor $W_N^{k}$, and then add the product to and subtract it from $a$ to form two new complex numbers $(A, B)$. This basic computation, shown in Fig. 1, is called a butterfly. Computing the $N$-point DFT with the DIT FFT algorithm requires $(N/2)\log_2 N$ complex multiplications and $N\log_2 N$ complex additions.



Fig 1: Basic butterfly in DIT FFT algorithm
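The butterfly recursion in equations (2) to (7) can be sketched in a few lines of Python. This is a minimal software model for checking results, assuming the input length is a power of two. The function name `fft_dit` is hypothetical, and the sketch is not the RTL of any surveyed design.

```python
import cmath


def fft_dit(x):
    """Recursive radix-2 DIT FFT; len(x) must be a power of two."""
    n_points = len(x)
    if n_points == 1:
        return list(x)  # a one-point DFT is the sample itself
    f1 = fft_dit(x[0::2])  # F1(k): DFT of the even-indexed samples
    f2 = fft_dit(x[1::2])  # F2(k): DFT of the odd-indexed samples
    half = n_points // 2
    out = [0j] * n_points
    for k in range(half):
        g1 = f1[k]                                             # equation (4)
        g2 = cmath.exp(-2j * cmath.pi * k / n_points) * f2[k]  # equation (5)
        out[k] = g1 + g2                                       # equation (6)
        out[k + half] = g1 - g2                                # equation (7)
    return out


if __name__ == "__main__":
    print(fft_dit([1, 2, 3, 4]))
```

Each pass through the loop body is one butterfly: one complex multiplication by the twiddle factor followed by one addition and one subtraction, which is where the $(N/2)\log_2 N$ multiplication count comes from.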

#### 2.3. THE 32 POINT FFT

Figure 2 shows the 32-point DIF FFT hardware architecture, which comprises five stages. Each stage contains a radix-2 butterfly unit with several registers, the number of which depends on the number of input signals. Between successive stages, the twiddle factors are multiplexed onto the output of the previous stage.



**Figure 2:** The Five stages of 32-point FFT architecture




Take the first stage as an example, which carries out its computation in two phases. In the first phase, the first 16 input signals, that is, the first half of the input signals, are shifted in and stored in register 0 to register 15. In the second phase, the remaining 16 input signals are sent to the first stage. The first input signal is combined with the seventeenth input signal in the radix-2 butterfly by addition and subtraction. The result of the subtraction is then multiplied by the twiddle factor. The results of the addition are stored and wait to be processed in the next stage.
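As a rough behavioural model of the five-stage structure, the Python sketch below applies one radix-2 DIF stage at a time. In the first stage of a 32-point transform, input `n` is paired with input `n + 16`, the sum is passed on unchanged, and the difference is multiplied by the twiddle factor. The names `dif_stage` and `fft_dif` are assumptions made for illustration, and the register-level timing of the hardware is not modelled.

```python
import cmath


def dif_stage(data, span):
    """Apply one radix-2 DIF stage; `span` is the distance between butterfly inputs."""
    n_block = 2 * span
    out = list(data)
    for start in range(0, len(data), n_block):
        for n in range(span):
            a = data[start + n]
            b = data[start + n + span]
            twiddle = cmath.exp(-2j * cmath.pi * n / n_block)
            out[start + n] = a + b                      # sum is passed on unchanged
            out[start + n + span] = (a - b) * twiddle   # difference is rotated
    return out


def fft_dif(x):
    """Radix-2 DIF FFT built from log2(N) stages; returns natural-order output."""
    n_points = len(x)  # assumed to be a power of two
    stages = n_points.bit_length() - 1
    data = [complex(v) for v in x]
    span = n_points // 2
    while span >= 1:
        data = dif_stage(data, span)
        span //= 2
    # DIF leaves the results in bit-reversed order; undo that here.
    result = [0j] * n_points
    for i, value in enumerate(data):
        rev = int(format(i, f"0{stages}b")[::-1], 2)
        result[rev] = value
    return result


if __name__ == "__main__":
    samples = list(range(32))
    first = dif_stage(samples, 16)
    print(first[0], first[16])  # x[0] + x[16] and (x[0] - x[16]) * W^0
```

For 32 points the `while` loop runs five times, with spans 16, 8, 4, 2, and 1, matching the five stages of Figure 2.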

#### 2.4. VERILOG IMPLEMENTATION

Verilog HDL is a widely used hardware description language standardized by the IEEE. It is extensively adopted in the design of digital integrated circuits. Verilog is intended for use in simulation, timing analysis, test analysis, and logic synthesis. Verilog HDL enables engineers to design at different levels of abstraction, such as register-transfer level, gate level, and switch level. Verilog is used as input to synthesis programs that generate a gate-level description of the circuit. We use the Cadence Genus tool to synthesize the Verilog code and obtain the various reports on area, power, timing, and gate count.

#### 2.5. THE PROPOSED DESIGN FLOW

The flow of the VLSI design from RTL through logic synthesis to generation of the gate-level netlist is shown in Figure 3.

The RTL stage consists of blocks for performing the FFT, written in Verilog code. The behaviour and function of the design are defined in Hardware Description Language (HDL) form during this stage. The RTL consists of the main block for the FFT and many sub-blocks, all of which are instantiated within the main block. The next stage is logic synthesis, which translates the Verilog RTL code into an optimized gate-level netlist.



Figure 3: ASIC Design Flow

**Logic-level optimization** consists of structuring and flattening. The structuring process is constraint-based and is best applied to non-critical timing paths. It adds intermediate variables and logic structure to a design, which results in a smaller area.

**In gate-level optimization**, the mapping process uses gates from the target technology library to produce a gate-level implementation of the design that meets timing and area requirements.

**In compilation**, the RTL design can be compiled in two ways during synthesis: top-down compile and bottom-up compile. The top-down approach can be used for a design that does not contain memories.

The design consists of combinational logic cones that are bounded by key points. To compare two designs, the logic cones must be aligned through the process of key point mapping.

**In mapping**, all the key points of the golden design are matched with their counterparts in the revised design so that they can be compared later [4].

**In the fabric creation and design mapping flows**, the backend flow is a modified ASIC flow. We partitioned the backend flow into two sub-flows [5].




Dynamic power can be reduced in many ways, including logic synthesis and restructuring to reduce switching activity, gate sizing, technology mapping, retiming, voltage scaling, and the use of dual supply voltages. The interest in threshold gates as logic primitives comes from the fact that they are computationally more powerful than the standard AND/OR logic primitives. A threshold function can be implemented like any logic function, i.e., as a network of logic primitives or as a pull-up network and a pull-down network [6].
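To illustrate why a threshold gate is more expressive than a single AND or OR primitive, the following Python sketch models a threshold gate as a weighted sum compared against a threshold. The same gate, with only the threshold changed, realizes a 3-input AND, a 3-input OR, and a 3-input majority function, whereas the majority function needs several AND/OR gates. The function names, unit weights, and thresholds are illustrative assumptions and are not taken from [6].

```python
from itertools import product


def threshold_gate(inputs, weights, threshold):
    """Output 1 when the weighted sum of the binary inputs reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def and3(a, b, c):
    return threshold_gate((a, b, c), (1, 1, 1), 3)  # all three inputs must be 1


def or3(a, b, c):
    return threshold_gate((a, b, c), (1, 1, 1), 1)  # any single input suffices


def majority3(a, b, c):
    return threshold_gate((a, b, c), (1, 1, 1), 2)  # at least two inputs are 1


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        # AND/OR realization of majority for comparison: ab + bc + ca.
        reference = (a & b) | (b & c) | (c & a)
        print(a, b, c, and3(a, b, c), or3(a, b, c), majority3(a, b, c), reference)
```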

# III. LITERATURE SURVEY

In paper [1], "High-Performance Low-Power FFT Cores", the authors note that the power consumption of ICs has been attracting increasing attention. Many techniques have been explored to improve the energy efficiency of digital signal processing units such as Fast Fourier Transform (FFT) processors, which are widely used in traditional research fields such as satellite communications and in emerging areas of consumer electronics such as wireless communications. The paper presents efficient FFT cores based on a parallel architecture. Various combinations of low-power hybrid techniques are used to reduce power consumption, e.g., multiplier-less units that replace the complex multipliers in FFTs, low-power commutators based on an advanced interconnection, and parallel pipelined architectures. Several FFT cores are implemented and evaluated to determine their power and area performance.

In study [2], "Design of Parallel FFT Architecture Using Cooley Tukey Algorithm", a parallel FFT architecture is proposed to offer efficient throughput and lower energy consumption with the help of the Cooley-Tukey algorithm for radix 8. In this algorithm, a DFT of size N is split into smaller DFTs of size N/2, and the split is repeated until the final DFT units are reached, separating the DFT into even-indexed and odd-indexed terms. The computation time, given by the predefined formula $N\log_2 N$, is reduced by the use of a parallel architecture, which helps perform a number of operations simultaneously. As less time is required, the energy efficiency is increased. The aim of the paper is to examine throughput and efficiency using the Cooley-Tukey algorithm for a larger radix. A recent trend for this algorithm is implementation on FPGAs, because they can perform signal processing tasks in parallel, perform pipelining, and speed up the computation of iterative algorithms. The main advantage of the Cooley-Tukey algorithm is that it reduces arithmetic computations and enables fast calculation. Because the algorithm divides the DFT into smaller ones, it can often be combined with other algorithms at the same time.

In paper [3], the radix-2 algorithm is used for FFT implementation. The authors propose a pipelined architecture, also referred to as a cascaded FFT architecture, in which every stage has its own processing element. The proposed design functions as a fixed- and variable-iteration FFT processor and offers high speed, flexibility, and scalability. It supports any N-point FFT; the paper covers sizes from 16 points to 2048 points. It meets the requirements of various wireless standards. ROM is used for storing the twiddle factors. The 64-point FFT processor requires 254 clock cycles, which is quite large. The computation time of the processor is nevertheless low, on account of the radix-2 algorithm.
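As a rough illustration of what a twiddle-factor ROM of the kind used in [3] might contain, the Python sketch below generates the $N/2$ twiddle factors $W_N^k$ of a radix-2 FFT and quantizes them to signed fixed point. The 16-bit word with 15 fractional bits, the saturation of $+1.0$ to the largest positive code, and the function name `twiddle_rom` are all assumptions made for illustration; the actual word length and encoding used in [3] are not stated here.

```python
import math


def twiddle_rom(n_points, frac_bits=15):
    """Return N/2 (real, imag) twiddle factors W_N^k as signed fixed-point integers."""
    scale = 1 << frac_bits
    max_val = scale - 1  # largest positive code, e.g. 32767 for 15 fractional bits
    rom = []
    for k in range(n_points // 2):
        angle = -2.0 * math.pi * k / n_points  # W_N^k = exp(-j*2*pi*k/N)
        # Saturate so that +1.0 still fits in the signed word.
        real = min(round(math.cos(angle) * scale), max_val)
        imag = min(round(math.sin(angle) * scale), max_val)
        rom.append((real, imag))
    return rom


if __name__ == "__main__":
    for address, (real, imag) in enumerate(twiddle_rom(16)):
        print(f"{address:2d}: {real:6d} {imag:6d}")
```

A table like this could be exported as a memory initialization file for the ROM in the RTL, with one entry per butterfly twiddle address.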

In paper [4], a comparison of 32-point and 64-point FFTs is carried out using the radix-2 algorithm. The author used the Decimation-In-Time (DIT) algorithm for radix-2. The FFT points are divided first into 2, then into 4, then into 8, then into 16, and so on. Because of this division of the FFT points, the number of stages required to compute the butterflies is greater, and hence the delay is larger. The delay reported for the 32-point FFT is 31.522 ns and the delay reported for the 64-point FFT is 30.412 ns. The 32-point FFT requires 5 stages and the 64-point FFT requires 6 stages, so the 64-point FFT is expected to incur more delay.

Paper [5], "Design and Simulation of 32-Point FFT Using Radix-2 Algorithm for FPGA Implementation", notes that the Fast Fourier Transform (FFT) is one of the important operations in the field of digital signal and image processing. Some of the important applications of the FFT are signal analysis, sound filtering, data compression, partial differential equations, multiplication of large numbers, and image filtering. The FFT is an efficient way to compute the discrete Fourier transform (DFT). The article focuses on the development of the FFT based on the radix-2 DIT (decimation-in-time) algorithm and uses VHDL as the design entry language. Synthesis is carried out with the Xilinx tool. The synthesis results show that, in terms of speed, the computation used to calculate the 32-point FFT is efficient. The proposed 32-point FFT module has been simulated and synthesized. The RTL block is obtained for the decimation-in-time radix-2 FFT algorithm. After simulating the 32-point FFT block, the RTL view of the butterfly structure is obtained.

Paper [6], "Design of Fast Fourier Transform using Processing Element for Real Valued Signal", proposes an architecture for inverse fast Fourier transform (IFFT) computation for real-valued signals. The proposed method is based on a modified radix-2 algorithm, which removes the redundant operations from the flow graph. The new flow graph has only real data paths instead of the complex data paths of a conventional flow graph. A new processing element (PE) is proposed that has two radix-2 butterflies and can process four input signals in parallel. A simple and convenient memory-addressing scheme is provided to ensure continuous operation of the FFT processor. The addressing scheme is also used to support multiple parallel processing elements. Because the proposed PE processes four inputs in parallel, it reduces the number of computation cycles and increases speed compared with previous work. The number of computation cycles is reduced further by increasing the number of PEs. As redundant operations are removed, the cost is also reduced.
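The redundancy that real-valued designs exploit can be seen from the conjugate symmetry of the DFT of a real sequence, $X(N-k) = X^*(k)$: only bins $0$ to $N/2$ need to be computed, and the rest follow by conjugation. The Python sketch below demonstrates this property with a direct DFT. It is an illustration of the symmetry only, not the processing element or flow graph of [6], and the function names are hypothetical.

```python
import cmath


def half_spectrum(x):
    """Compute only bins 0..N/2 of the DFT of a real sequence x (N even)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
            for n in range(n_points))
        for k in range(n_points // 2 + 1)
    ]


def full_spectrum_from_half(half, n_points):
    """Rebuild bins N/2+1..N-1 using X(N-k) = conj(X(k))."""
    full = list(half)
    for k in range(n_points // 2 + 1, n_points):
        full.append(half[n_points - k].conjugate())
    return full


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0]
    half = half_spectrum(samples)
    print(full_spectrum_from_half(half, len(samples)))
```

Roughly half of the output bins are obtained without any further multiplications, which is the kind of saving a real-valued flow graph turns into reduced hardware.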

Paper [7], "Design and Implementation of Parallel FFT on CUDA", applies the parallel divide-and-conquer technique to the Fast Fourier Transform (FFT), which is used in image processing and scientific computing. To obtain higher performance, the authors exploit Compute Unified Device Architecture (CUDA) technology and contemporary graphics processing units (GPUs). They improve the standard FFT technique in two respects: multithreaded parallelism and the memory hierarchy. They also provide parallelism optimization strategies for growing data volumes and predict what would happen if the data volume grows further. The results show that the parallel FFT algorithm is more efficient than the traditional FFT approach.

In article [8], a modified single-path delay feedback (SDF) architecture is presented for FFT implementation. A mixed decimation-in-frequency (DIF)/decimation-in-time (DIT) FFT algorithm is computed with this architecture. Because the DIT FFT algorithm computes the last stage and the DIF FFT algorithm computes the other stages, the input and output data appear in normal order, and no external clock is necessary to change the input or output order. In terms of throughput, latency, and hardware complexity, this design is compared with the radix-4 DIF SDF and radix-4 multipath delay commutator (MDC) architectures using a 64-point FFT.

As a result, the proposed design has significantly lower hardware complexity than the radix-4 MDC while maintaining the same throughput and latency, and it achieves significantly lower latency than the radix-4 SDF architecture with a reasonable increase in hardware complexity.

In paper [9], the authors present a multimode, memory-based Fast Fourier Transform (FFT) processor for a medical system intended for Fourier-domain optical coherence tomography (FD-OCT) that supports wireless display features based on multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM). The presented FFT processor supports 2-stream 4096/2048/1024-point FFTs for FD-OCT and 1- to 4-stream 128/64-point FFTs for OFDM applications. Using efficient four-bank single-port SRAM operating with a four-word data width, the presented architecture provides data access over sixteen memory paths. Following a proposed FFT factorization that makes the use of hardware-efficient multiply-and-accumulate units feasible, the proposed processor enables high-throughput multimode FFT operations in an energy- and area-efficient manner.

In paper [10], the authors propose a novel scalable FFT processor design capable of handling different transform sizes and data streams for multi-standard MIMO processing. The basic building blocks are split-radix 2/4/8 FFT modules in a modified MDC (multipath delay commutator) structure, a pre-ordering data buffer, a radix-2 butterfly module, and a twiddle factor module. With proper data-flow reconfiguration plus two alternative operating frequencies, the processor can support 64-, 128-, 512-, and 1024-point transforms over 1 to 4 MIMO streams. Hardware utilization remains 100% in the different operating modes. Implementation results in the 0.18 µm TSMC process show that the design has a core size of approximately 5.5 mm² and a power consumption of 320 mW at 100 MHz.

In paper [11], an FFT processor suitable for the IEEE 802.16e (WiMAX) OFDM mode is presented. FFT/IFFT processors are important in OFDM transceivers, where they typically consume considerable power and occupy a large area. The proposed FFT processor combines the pipelined architecture and the memory-based architecture so that it can operate at the sample rate and thereby achieve power efficiency. The processor is based on the multipath delay commutator architecture with high-radix arithmetic units and two main memories for input buffering, intermediate storage, and output reordering. A proposed conflict-free memory addressing scheme enables continuous-flow FFT processing. The simulation results show that a power saving of 29% is achieved.

In paper [12], a cost-efficient FFT processor employing a two-way pipelined, separated memory structure is presented. Based on this architecture, a memory organization scheme has been devised that allows single-port SRAM to be used. In addition, a mixed-radix butterfly unit is used, which enables the processor to operate in multiple modes. Compared with other architectures, the proposed design effectively reduces area. A test chip for DVB-T/H was implemented with this architecture and fabricated in a 0.18 µm one-poly six-metal CMOS process. The core area of the chip is 2.83 mm², with a power dissipation of 25.8 mW at 20 MHz.

Paper [13] presents the design and implementation of the FFT using VHDL code. It explains four different types of FFT methods, namely Cooley-Tukey, Good-Thomas, radix-2, and Rader, and describes the effective use of VHDL code for the implementation of the radix-2 FFT structure. The paper presents the FFT butterfly operation, which is the main unit that speeds up the whole process. This is achieved by correctly drawing the timing diagrams based on the radix-2 equations, which are explained in detail.

Paper [14] focuses on the hardware implementation of a low-power, multiplier-less, radix-4 single-path delay commutator pipelined FFT processor architecture of sizes 16, 64, and 256 points. The paper also reports power, area, and timing results for the FFT. The low-power butterfly takes its inputs from memory and computes the FFT. The pipelined FFT is an implementation of the Cooley-Tukey FFT algorithm. The tool used is Cadence RTL Compiler, targeting the TSMC 0.18 µm CMOS technology library. The blocks of the FFT are simulated and the outputs observed using the ModelSim tool in Verilog. This type of FFT processor architecture is used only for smaller FFTs.
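A multiplier-less design replaces multiplication by a constant twiddle coefficient with a fixed pattern of shifts and additions. The Python sketch below shows the idea for the constant $1/\sqrt{2}$, approximated by $181/256$. The 8-bit coefficient, the choice of constant, and the function name `times_inv_sqrt2` are illustrative assumptions; the coefficient encoding actually used in [14] is not given in this review.

```python
def times_inv_sqrt2(x):
    """Approximate x / sqrt(2) for an integer x using only shifts and adds.

    181 / 256 = 0.70703125 approximates 1 / sqrt(2) = 0.70710678...
    Since 181 = 128 + 32 + 16 + 4 + 1, the product needs four adders
    and no multiplier.
    """
    product = (x << 7) + (x << 5) + (x << 4) + (x << 2) + x  # x * 181
    return product >> 8  # divide by 256 (arithmetic shift, rounds down)


if __name__ == "__main__":
    for value in (256, 1000, -1000):
        print(value, times_inv_sqrt2(value))
```

In hardware the shifts are free wiring, so the cost of the operation is only the adders, which is what makes this approach attractive for low-power butterflies.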

The research work [15] aims to implement a 32-bit DIT radix-2 FFT using the butterfly technique. The radix-2 32-bit DIT FFT recursively divides a DFT into two half-length DFTs of the even- and odd-indexed time samples. The butterfly operation is faster because the outputs of the small transforms are reused to compute multiple outputs, so the total computation cost becomes lower. The 32-bit radix-2 DIT FFT is synthesized on the Spartan 3E starter kit. The design was developed using a hardware description language (VHDL/Verilog) in Xilinx 14.2 for the xc3s500E device.

#### IV. CONCLUSION

This paper reviews the different architectures and implementations of the FFT. It describes the effective use of Verilog HDL code for the implementation of the radix-2 FFT algorithm, and the waveform results of the various stages have been obtained successfully. The accuracy of the results can be increased with the help of efficient coding in Verilog HDL. The derived equations determine the accuracy of the results obtained from the butterfly diagram.

# V. REFERENCES

- [1] Wei Han, Ahmet T. Erdogan, Tughrul Arslan, and Mohd. Hasan, "High-Performance Low-Power FFT Cores", ETRI Journal, Volume 30, Number 3, June 2008.
- [2] Ruchira Shirbhate, Tejaswini Panse, and Chetan Ralekar, "Design of Parallel FFT Architecture Using Cooley Tukey Algorithm", IEEE ICCSP 2015 conference.
- [3] Deepak Revanna, Omer Anjum, Manuele Cucchi, Roberto Airoldi, and Jari Nurmi, "A Scalable FFT Processor Architecture for OFDM Based Communication Systems", IEEE, 2013.
- [4] K. Sowjanya and B. Leele Kumari, "Design and Performance Analysis of 32 and 64 Point FFT using RADIX-2 Algorithm", Proceedings of AECE-IRAJ International Conference, 14th July 2013, Tirupati, India.
- [5] Afreen Fatima, "Designing and Simulation of 32 Point FFT Using Radix-2 Algorithm for FPGA", Department: Electronics, Affiliated To: JNTU (HYD).
- [6] Ankush R. Kuralkar, "Design of Fast Fourier Transform using Processing Element for Real Valued Signal".
- [7] Xueqin Zhang, Kai Shen, Chengguang Xu, and Kaifang Wang, "Design and Implementation of Parallel FFT on CUDA", 2013.
- [8] Seungbeom Lee and Sin-Chong Park, "Modified SDF architecture for mixed DIF/DIT FFT", IEEE International Symposium on Circuits and Systems, 2007.
- [9] Song-Nien Tang, Fu-Chiang Jan, Hui-Wen Cheng, Ching-Kai Lin, and Guo-Zua Wu, "Multimode memory-based FFT processor for wireless display FD-OCT medical systems".
- [10] Yin-Tsung Hwang, Ying-Ji Chen, and Wei-Da Chen, "Scalable FFT kernel designs for MIMO OFDM based communication systems", 2007.
- [11] Pei-Yun Tsai, Tsung-Hsueh Lee, and Tzi-Dar Chiueh, "Power-efficient continuous-flow memory-based FFT processor for WiMax OFDM mode", 2006.
- [12] Hao Xiao, A Pan, Yun Chen, and Xiaoyang Zeng, "Low-cost reconfigurable VLSI architecture for fast fourier transform", 2013.
- [13] S. J. Vaughan-Nichols, "OFDM: Back to the Wireless Future", Computer, 35(12), 19–21, 2002.
- [14] Han W., Arslan T., Erdogan A., and Hasan M., "Low Power Commutator for Pipelined FFT Processors", IEEE International Symposium on Circuits and Systems, 2005.
- [15] J. Ankesh, V. Muthusubramanian, and P. Shanthi, "Analysis and design of a high speed continuous-time ΔΣ modulator using the assisted Opamp technique", IEEE Journal of Solid-State Circuits, July 2012.