Linear Algebra

Summary

Meeting latency and throughput requirements is a critical concern in many embedded signal and image processing applications. For example, future 4G wireless applications will need to make extensive use of matrix computations in multiple-input-multiple-output (MIMO) protocols, estimation, multi-user detection, tracking, equalization, and adaptive antenna systems.

The architecture described here is unique in that it can perform a wide variety of such linear algebraic operations, e.g., matrix multiplication and addition, factorization, and least squares. It is based on a scalable algorithm (Faddeev) and is suitable for building a parameterized design implementation. It uses a regular array of relatively simple processing elements (PEs), has nearest-neighbor connections, requires only two different types of PEs, and has a simple control scheme with a single global clock.
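To illustrate why a single algorithm can cover so many operations, the following is a minimal software sketch of the Faddeev algorithm, not the hardware design itself. It assumes the standard formulation: the compound matrix [[A, B], [-C, D]] is reduced by row operations until the -C block is zero, at which point the lower-right block holds D + C A^-1 B. The function name `faddeev` and the use of exact `Fraction` arithmetic are choices made for this illustration only.

```python
from fractions import Fraction


def faddeev(A, B, C, D):
    """Return D + C * A^-1 * B via elimination on [[A, B], [-C, D]].

    A is n x n, B is n x m, C is p x n, D is p x m (lists of rows).
    Illustrative sketch only; exact rational arithmetic is assumed.
    """
    n = len(A)
    # Build the compound matrix: top rows [A | B], bottom rows [-C | D].
    M = [[Fraction(x) for x in ra + rb] for ra, rb in zip(A, B)]
    M += [[Fraction(-x) for x in rc] + [Fraction(x) for x in rd]
          for rc, rd in zip(C, D)]

    for k in range(n):
        # Choose a nonzero pivot among the remaining rows of A only.
        p = next((r for r in range(k, n) if M[r][k] != 0), None)
        if p is None:
            raise ValueError("A is singular")
        M[k], M[p] = M[p], M[k]
        # Zero column k in every row below the pivot, including the -C rows.
        for r in range(k + 1, len(M)):
            f = M[r][k] / M[k][k]
            if f != 0:
                M[r] = [a - f * b for a, b in zip(M[r], M[k])]

    # The -C block is now zero; the lower-right block is the result.
    return [row[n:] for row in M[n:]]


I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]

# Matrix product: A = I, D = 0 gives C * B.
product = faddeev(I2, [[1, 2], [3, 4]], [[5, 6], [7, 8]], Z2)

# Matrix inverse: B = I, C = I, D = 0 gives A^-1.
inverse = faddeev([[2, 1], [1, 3]], I2, I2, Z2)

# Linear solve: B = b, C = I, D = 0 gives x with A x = b.
solution = faddeev([[2, 1], [1, 3]], [[3], [5]], I2, [[0], [0]])
```

Different choices of the four blocks select the operation: the identity for A gives a multiply-accumulate, the identity for B and C gives an inverse, and a right-hand-side column for B gives a linear solve. This is the sense in which one elimination procedure serves many matrix computations.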

Use of this "systolic" class of architectures has not been widespread in the past, in part because programmable hardware that supported this computing paradigm was not cost-effective to build. However, the Faddeev-based designs are simple, regular, and localized, which makes ASIC implementations much easier. Also, modern FPGAs, which are built by tiling identical memory and logic blocks together with supporting mesh interconnection networks and hardwired multipliers, are ideally matched to this class of architectures.