Fast Fourier Transform
3GPP LTE SC-FDMA
The LTE protocol proposes an uplink that departs significantly from the LTE downlink (and WiMAX uplink/downlink) in that it is single carrier (3GPP TS 36.211 v8.1.0 2007-11). In this case a DFT precedes the IFFT. Its size is determined by the number of resource blocks (6≤#RB≤110) and is restricted to values whose only prime factors come from the set {2,3,5}.
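As a rough illustration of the size restriction, the sketch below enumerates candidate DFT sizes. It assumes 12 subcarriers per resource block (from the LTE specification, not stated in the text above) and that the {2,3,5} restriction applies to the resource-block count. The function names and the default upper bound of 100 resource blocks are hypothetical choices for this example. The bound was picked because it reproduces the sizes listed in the cycle-count table later in this section.

```python
SUBCARRIERS_PER_RB = 12  # assumption: taken from 3GPP TS 36.211, not from the text above


def is_235_smooth(n):
    """Return True if n is a positive integer with no prime factors other than 2, 3, 5."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


def sc_fdma_dft_sizes(max_rb=100):
    """List DFT sizes 12*RB for every allowed resource-block count RB <= max_rb."""
    return [SUBCARRIERS_PER_RB * rb
            for rb in range(1, max_rb + 1)
            if is_235_smooth(rb)]


sizes = sc_fdma_dft_sizes()
print(len(sizes), sizes[0], sizes[-1])
```

With the default bound this yields 34 sizes running from 12 to 1200. Raising `max_rb` to 110 adds one more size, 1296.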
To implement the single-carrier frequency division multiple access (SC-FDMA) DFTs, the same architecture as for the FFTs can be used because it is programmable. For example, to do the "scalable" FFTs you just write a different program for each FFT size, rather than the usual approach of picking an FFT size off a particular hardware pipeline stage. The SC-FDMA DFTs can be programmed in much the same way.
The programs for the SC-FDMA DFTs would all use the well-known "row/column factorization" method again for computing the transform. Here, the transform size is N=
The only programming difference between the FFTs for OFDM and the SC-FDMA DFTs is that the FFTs use base-4 processing for both column and row DFTs, whereas the SC-FDMA DFTs use the base-2 through base-6 forms.
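The row/column factorization mentioned above can be sketched in software as follows. This is a minimal pure-Python illustration of the general technique, not the circuit's actual program. The function names, the index mapping, and the choice of which dimension is called "rows" are assumptions made for this example. The small row and column DFTs are computed directly here, whereas the hardware would use its base-2 through base-6 forms.

```python
import cmath


def direct_dft(x):
    """O(n^2) reference DFT, used here for the small row and column transforms."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def dft_row_column(x, n_rows, n_cols):
    """DFT of length n_rows * n_cols via row DFTs, twiddles, then column DFTs."""
    n = n_rows * n_cols
    if len(x) != n:
        raise ValueError("len(x) must equal n_rows * n_cols")

    # Step 1: view the input as an n_rows x n_cols array with
    # a[r][c] = x[r + n_rows * c], and take a length-n_cols DFT of each row.
    rows = [direct_dft([x[r + n_rows * c] for c in range(n_cols)])
            for r in range(n_rows)]

    # Step 2: multiply element (r, kc) by the twiddle factor exp(-2*pi*i*r*kc/n).
    for r in range(n_rows):
        for kc in range(n_cols):
            rows[r][kc] *= cmath.exp(-2j * cmath.pi * r * kc / n)

    # Step 3: take a length-n_rows DFT of each column and store the
    # result at output index kc + n_cols * kr.
    out = [0j] * n
    for kc in range(n_cols):
        col = direct_dft([rows[r][kc] for r in range(n_rows)])
        for kr in range(n_rows):
            out[kc + n_cols * kr] = col[kr]
    return out


# Example: a 60-point transform factored as 12 x 5.
x = [complex(i % 7, -(i % 3)) for i in range(60)]
y = dft_row_column(x, 12, 5)
ref = direct_dft(x)
print(max(abs(a - b) for a, b in zip(y, ref)) < 1e-8)
```

The comparison against the direct DFT checks only that the factorization is numerically equivalent. It says nothing about cycle counts, which depend on how the row and column passes are scheduled on the hardware.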
Throughput cycle counts (cycles per DFT) for the SC-FDMA DFTs are shown below for a circuit consisting of approximately 2000 4-input LUT/FF pairs, 13 complex multipliers, and 13 memory blocks. Speeds of well over 400 MHz have already been demonstrated. (The cycle counts for the SC-FDMA DFTs come from a high-level simulation, so they are estimates.) The same circuit can do all the FFTs as well.
| DFT Size N | Cycle Count | DFT Size N | Cycle Count | DFT Size N | Cycle Count |
| 1200 | 4805 | | | 192 | 469 |
| 1152 | 3461 | 540 | 1265 | 180 | 365 |
| 1080 | 2345 | 480 | 1045 | 144 | 373 |
| 972 | 3893 | 432 | 1381 | 120 | 270 |
| 960 | 3045 | 384 | 853 | 108 | 221 |
| 900 | 1805 | 360 | 905 | 96 | 197 |
| 864 | 1913 | 324 | 833 | 72 | 221 |
| 768 | 2469 | 300 | 730 | 60 | 145 |
| 720 | 3125 | 288 | 661 | 48 | 101 |
| 648 | 1481 | 240 | 565 | 36 | 36 |
| 600 | 1330 | 216 | 482 | 24 | 29 |
| 576 | 1157 | | | 12 | 17 |
Note that the signal-to-quantization-noise ratio (SQNR) of our architecture is much higher for a given word length than that of other fixed-point and block-floating-point designs. Our 85-90 dB SQNR for 16 bits (see "Dynamic Range" tab) is higher than what LTE needs, so a smaller word length might be used, in which case all the resource/power numbers above would scale down and the clock speed would go up.
From the table it can be seen that the worst-case DFT (1200 points) is 3730 cycles (the largest FFT, 2048 points, is 8357 cycles). At 426 MHz, this corresponds to 8.7 and 19.6 usec, or 28.3 usec total.
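The conversion from cycle counts to time used in the paragraph above is simply cycles divided by clock rate. A minimal sketch, using the cycle counts and the 426 MHz clock quoted in that paragraph (the helper name `cycles_to_usec` is hypothetical):

```python
def cycles_to_usec(cycles, clock_mhz):
    """Convert a cycle count to microseconds at the given clock rate in MHz."""
    return cycles / clock_mhz


CLOCK_MHZ = 426  # clock rate quoted in the text

dft_usec = cycles_to_usec(3730, CLOCK_MHZ)  # 1200-point DFT figure from the text
fft_usec = cycles_to_usec(8357, CLOCK_MHZ)  # 2048-point FFT figure from the text
total_usec = dft_usec + fft_usec

print(f"DFT {dft_usec:.2f} us, FFT {fft_usec:.2f} us, total {total_usec:.2f} us")
```

The same helper can be applied to any other entry in the cycle-count table to estimate its transform time at a given clock rate.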
