Fast Fourier Transform
Architecture
The architecture is based on a new matrix equation form of the DFT which is derived using decimation in time and frequency. For a transform length M, this form is
Y=Wb•CM1 X
Z = CM2 Yt
Functionally, the implementation of this matrix form can be mapped to a simple systolic array containing two adder arrays for the matrix multiplies and a linear array of multipliers to do the element-by-element multiplies as shown in Fig. 1(a) and (b). Here, CM1 and Z are stored internally and X and CM2 flow into the array in systolic fashion from the boundaries. The length of the array is M/4, so it scales with transform size in the north-south direction in the figure. Each PE contains either a complex adder or multiplier, a couple of registers and a small memory and communicates locally with its neighbor. This simplicity in is contrast with typical pipelined FFT circuits which have complex permutation circuits or commutators, large butterflies, irregularities from stage to stage and global connections.

Fig.1 (a) Functional
operation of base-4 architecture and (b) circuit implementation for
M=16 showing inputs at
different times
t and
internal matrix values.
The transform length is computed using the array
in Fig. 1 to perform the well-known row/column factorization,
N=
