FFT Circuitry for a *4G* Age

Centar LLC is a provider of fast Fourier transform (FFT) intellectual property (IP) for use in FPGA- and ASIC-based embedded applications. It has developed a novel parallel, matrix-based formulation of the discrete Fourier transform (DFT) that decomposes it into structured sets of *b*-point DFTs. All FFT circuits are constructed from synchronous, fine-grained, locally connected, regular arrays of small processing elements (PEs), each consisting of a few registers, some multiplexers and an arithmetic element. Salient features of this technology are:

- **Speed**: The only FPGA FFT circuits with clock rates above 500 MHz using 65 nm technology (e.g., Intel Stratix III).
- **Throughput**: Data rates as high as ~10G complex samples per second.
- **Dynamic Range**: A combined block-floating-point and floating-point architecture means smaller word lengths can be used for post-processing operations such as equalization.
- **Programmability**: Easy customization of FFT properties, functionality and I/O interface.
- **Non-powers-of-two transform sizes**: A single ROM can store the control parameters needed to support any number of FFTs of any size.
- **Scalability**: Faster transforms can be implemented without architectural changes by increasing the array size along one dimension or by duplicating the array structure.
- **Power**: Interconnects are entirely local, reducing parasitic routing capacitance to keep power dissipation low and speed high.
- **Cyclic Prefix**: The circuit architecture is designed to support any prefix value (most FFT circuits require additional circuitry to perform this function).
- *Floating-Point (NEW)*: IEEE754 single-precision floating-point, fixed-size Stratix streaming circuits use far less of the FPGA fabric (a 1024-point FFT uses half as many ALMs as Intel's equivalent). Also, Centar's new Arria 10 circuits run at a 571 MHz sampling rate and, with hardwired floating-point DSPs, use only 2234 ALMs for an entire 1024-point transform!

## Tools

Because an algorithm specification can always be mapped onto a large number of (parallel) circuit architectures, Centar has developed an automated CAD tool, the Symbolic Parallel Algorithm Development Environment (SPADE), to make the best choices. SPADE is the only such tool in existence that can find latency-optimal circuits.

## Products

**Power-of-two FFT**: Fixed and variable (run-time selectable) sizes, 16 to 16,384 points.

Category | Intel | Centar | Intel | Centar | Intel | Centar | Intel | Centar | Intel | Centar | Centar |
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
Transform size | 256 pts (fixed-point) | 256 pts (fixed-point) | 1024 pts (fixed-point) | 1024 pts (fixed-point) | 256 pts (IEEE754) | 256 pts (IEEE754) | 1024 pts (IEEE754) | 1024 pts (IEEE754) | | | |
ALMs/M9Ks | 4414/38 | 4024/31 | 4770/38 | 4357/31 | 10834/54 | 7834/30 | 13559/87 | 7186/62 | 4852/20 | 4106/30 | 2251/62 |
FFT time (µs) | 0.68 | 0.48 | 2.72 | 1.92 | 0.86 | 0.60 | 3.6 | 2.7 | 2.37 | 1.75 | 1.79 |
SQNR (dB) / mean error | 87.8 | 86.7 | 81.3 | 82.9 | 2.4E-7 | 3.1E-8 | 2.9E-7 | 4.2E-8 | | | |
Energy (µJ/FFT) | 1.29 | 1.12 | 6.36 | 4.31 | | | | | | | |

Example: comparative metrics for "streaming" circuits targeted to Stratix III, IV and Arria 10 FPGAs.
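The "FFT time" row can be read as an equivalent streaming sample rate. A minimal sketch, assuming (the table does not say so explicitly) that "FFT time" is the time needed to stream one N-point block through the core; the helper name `samples_per_second` is hypothetical:

```python
# Convert the "FFT time (µs)" row of the comparison table into a sample rate.
# ASSUMPTION: "FFT time" is the time to stream one N-point block through the core.


def samples_per_second(n_points: int, fft_time_us: float) -> float:
    """Sample rate implied by processing n_points in fft_time_us microseconds."""
    return n_points / (fft_time_us * 1e-6)


# (label, transform size, FFT time in µs), copied from the table above.
table_entries = [
    ("Intel, 256 pts fixed-point", 256, 0.68),
    ("Centar, 256 pts fixed-point", 256, 0.48),
    ("Intel, 1024 pts fixed-point", 1024, 2.72),
    ("Centar, 1024 pts fixed-point", 1024, 1.92),
    ("Intel, 1024 pts IEEE754", 1024, 3.6),
    ("Centar, 1024 pts IEEE754", 1024, 2.7),
]

for label, n, t_us in table_entries:
    rate_msps = samples_per_second(n, t_us) / 1e6
    print(f"{label:30s} {rate_msps:7.1f} Msamples/s")
```

Under that assumption, Centar's 256- and 1024-point fixed-point times both work out to about 533 Msamples/s. If the core also accepts one sample per clock cycle, that figure equals the clock rate, in line with the >500 MHz claim.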

**Non-power-of-two FFT**: The reachable transform sizes for this class of designs are those that can be factored into a composite of small integers up to about 10. For example, the LTE SC-FDMA requirements would use the integers {2, 3, 5} to compute all 35 transform sizes, that is, sizes of the form $N = 2^{n} 3^{m} 5^{q}$, where *n*, *m* and *q* are non-negative integers. These can be fixed-size or variable and selectable at run time.
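The 35 LTE SC-FDMA sizes can be listed explicitly. A minimal sketch, assuming the 3GPP TS 36.211 convention (not stated on this page) that the DFT size is 12 subcarriers times a resource-block count $M_{RB} = 2^{a} 3^{b} 5^{c} \le 110$; the function names are hypothetical:

```python
# Enumerate LTE SC-FDMA DFT sizes.
# ASSUMPTION: size = 12 * M_RB, with M_RB a {2,3,5}-smooth number and M_RB <= 110.


def is_235_smooth(k: int) -> bool:
    """True if k has no prime factors other than 2, 3 and 5."""
    for p in (2, 3, 5):
        while k % p == 0:
            k //= p
    return k == 1


def lte_dft_sizes(max_rb: int = 110) -> list:
    """All 12 * M_RB sizes with a {2,3,5}-smooth resource-block count."""
    return [12 * rb for rb in range(1, max_rb + 1) if is_235_smooth(rb)]


sizes = lte_dft_sizes()
print(len(sizes), sizes[0], sizes[-1])  # 35 12 1296
```

Since $12 = 2^2 \cdot 3$, every size in the list is itself of the form $2^{n} 3^{m} 5^{q}$.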

Design | FPGA | LUTs | Registers | Block RAM (9/18K) | Fmax (MHz) | Average throughput over resource-block (RB) sizes (cycles) | Throughput (normalized) |
--- | --- | --- | --- | --- | --- | --- | --- |
Centar | Virtex-6 | 2915 | 2581 | 19 | 401 | 16.6N | 1 |
Xilinx | Virtex-6 | 3849 | 4326 | 10 | 407 | 23.4N | 0.72 |
Centar | Stratix III | 3816 | 3188 | 29 | 400 | 16.6N | 1 |
Intel | Stratix III | 2600 | N.A. | 17 | 260 | 32.9N | 0.31 |

Example: LTE SC-FDMA relative performance, averaged over 35 different DFT sizes.
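Composite sizes such as $N = 2^{n} 3^{m} 5^{q}$ can be computed from small *b*-point DFTs using the textbook mixed-radix Cooley-Tukey FFT: twiddle-factor multiplications plus one *b*-point DFT per output group. This is a sequential software sketch of that standard technique, not Centar's matrix-based array formulation. The function names and the `max_radix=10` cutoff (mirroring "small integers up to about 10") are illustrative choices:

```python
import cmath


def dft(x):
    """Direct O(b^2) DFT; used here for the small b-point transforms."""
    b = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / b) for t in range(b))
        for k in range(b)
    ]


def pick_radix(n, max_radix=10):
    """Return the smallest factor b of n with 2 <= b <= max_radix, else None."""
    for b in range(2, max_radix + 1):
        if n % b == 0:
            return b
    return None


def mixed_radix_fft(x, max_radix=10):
    """Recursive decimation-in-time Cooley-Tukey FFT for composite lengths."""
    n = len(x)
    b = pick_radix(n, max_radix)
    if b is None or b == n:
        # n == 1, n is a prime <= max_radix, or n has no factor <= max_radix:
        # fall back to a direct DFT.
        return dft(x)
    m = n // b
    # Transform the b interleaved subsequences x[r], x[r+b], x[r+2b], ... (length m).
    sub = [mixed_radix_fft(x[r::b], max_radix) for r in range(b)]
    out = [0j] * n
    for k in range(m):
        # Apply twiddle factors W_N^(r*k), then one b-point DFT over r.
        col = [sub[r][k] * cmath.exp(-2j * cmath.pi * r * k / n) for r in range(b)]
        for s, value in enumerate(dft(col)):
            out[k + s * m] = value  # X[k + s*m]
    return out


if __name__ == "__main__":
    x = [complex(t % 5, (3 * t) % 7) for t in range(60)]
    err = max(abs(a - c) for a, c in zip(mixed_radix_fft(x), dft(x)))
    print(f"max |FFT - DFT| for N = 60: {err:.2e}")
```

Each recursion level peels off one small radix *b*. For example, a 1296-point LTE transform ($1296 = 2^4 \cdot 3^4$) is built entirely from 2- and 3-point DFTs.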