Small Logo

SummaryArchitectureDynamic RangePerformanceScalingPrecisionScalable ASICLTEReferences

 

Fast Fourier Transform

Architecture

The architecture is based on a new matrix equation form of the DFT which is derived using decimation in time and frequency.  For a transform length M, this form is

 

Y=Wb•CM1 X

Z = CM2 Yt 

                                                                                 

 where Wb is an M/4 x M/4  coefficient matrix, CM1 is an M/4 x 4 coefficient matrix, X is a 4 x M/4 input matrix, CM2 is a 4 x M/4 coefficient matrix, Z is a 4 x M/4 output matrix containing the transform outputs,  and “” means element-by-element multiply.  Two computational advantages arise as a result of this new matrix from.  First, the matrix products C1X and C2Yt involve only exchanges of real and imaginary parts plus additions because the elements of CM1 and CM2 contain only ±1 or ±j.  Second, the usual Z=CX form of the DFT requires M2 complex multiplications, vs. the (M/4)2 in above.

Functionally, the implementation of this matrix form can be mapped to a simple systolic array containing two adder arrays for the matrix multiplies and a linear array of multipliers to do the element-by-element multiplies as shown in Fig. 1(a) and (b).  Here, CM1 and Z are stored internally and X and CM2 flow into the array in systolic fashion from the boundaries.  The length of the array is M/4, so it scales with transform size in the north-south direction in the figure.  Each PE contains either a complex adder or multiplier, a couple of registers and a small memory and communicates locally with its neighbor.  This simplicity in is contrast with typical pipelined FFT circuits which have complex permutation circuits or commutators, large butterflies, irregularities from stage to stage and global connections.

 

architecture

Fig.1 (a) Functional operation of base-4 architecture and (b) circuit implementation for M=16 showing inputs at different times t and internal matrix values.

 

The transform length is computed using the array in Fig. 1 to perform the well-known row/column factorization, N= Nr Nc, where N is the desired transform length and Nc and Nr are the number of columns/rows.  This approach requires calculation of two sets of DFTs, Nc  transforms of length Nr (referred to as “column” transforms) and Nr  transforms of length Nc  (referred to as “row” transforms).  In between column and row transforms it is necessary to multiply each of the N points by a corresponding twiddle factor,Wnk , n=0,1,..,Nc-1,  k=0,1,..Nr-1.  (Without the twiddle multiplication a 2‑D DFT is performed.)  The column and row DFTs are performed sequentially using the same hardware array with M=Nr (Nr is the number of real multipliers).