## Author

### P. V. Ananda Mohan

CDAC, Bangalore

In an earlier version of our paper [1], a general introduction to Residue Number System based arithmetic [2] [3] [4] [5] [6] has been presented. In this continuation article, we give a tutorial on the design of a RNS based processor for specific application in Digital filtering. The most popular application of RNS is in the design of FIR (Finite Impulse Response) filters.

In the case of FIR filters (see Figure 1), we need to perform several multiplications of present sample and previous input samples *x(n)* with fixed coefficients *h(n)* defining the impulse response of the filter and accumulate the results of these multiplications to obtain *y(n)*:

Note that N is the number of taps of the FIR filter.

We need to convert input samples to RNS form whenever sample is available and these residues need to be stored in a RAM. The binary to RNS conversion takes a time of Δ_{FC}. (FC stands for Forward Converter). The coefficients which are in binary form also need to be converted into residue form and stored in ROM. This is done only once, after the FIR filter is designed using some DSP filter design programs like in MATLAB tool box. We need to read the RAM and ROM contents and multiply the residues in separate *i* Residue processors. This needs modulo multipliers to compute (a×b) modulo *m _{i}* where

*m*is

_{i}*i*-th modulus. After each multiplication, we need to accumulate the residues in modulo

*m*adders. This MAC (Multiply and Accumulate) operation needs to be carried out

_{i}*N*times for a

*N*-tap FIR filter. After this computation we need to convert the resulting residues corresponding to

*y(n)*into conventional binary form using a RNS to Binary converter. Since in each sample clock period,

*N*modulo multiplications and modulo accumulations (additions with previous stored sum) need to be carried out, we need a computation time of N× Δ

_{MACR}where subscript R stands for residue Multiply and Accumulate (MAC) operation. The total computation time is thus Δ

_{FC}+ N×Δ

_{MACR}+ Δ

_{RC}where Δ

_{FC}is the time or forward conversion and Δ

_{RC}is the time for reverse conversion. This may be compared with computation using conventional DSP which is N×Δ

_{MAC}. Typically modern DSPs have 16 bit × 16 bit multipliers and an accumulator of width 40 bits. This means that 256 MAC operations can be performed very accurately and after the computation the result can be truncated or rounded off and taken as 16 bit word. To be competitive, RNS processors also shall have similar dynamic range for the accumulated result of 40 bits whereas the operands to be multiplied are 16 bits. We start with this design problem.

*Choice of Moduli set*

For having a 40 bit dynamic range, we can choose RNS of several moduli. If we consider three moduli set {2^{n}-1, 2^{n}, 2^{n}+1}, we have dynamic range slightly more than 40 bits by choosing *n* = 14. In other words we have three processors working with bit lengths 14, 14 and 15 bits for the three moduli {2^{14}-1, 2^{14}, 2^{14}+1}. You can see the decrease in word length from 40 bits to 14 bits for each accumulator. If we consider this to be large, we can choose a four moduli set among several options available for example

{2^{n}-1, 2^{n}, 2^{n}+1, 2^{n+1}-1} with *n* = 10 i.e. the moduli are {1023,1024,1025, 2047} which reduces the word length of each residue processor from 14 bits to 10 bits. Alternatively, we can choose a five moduli set

{2^{n}-1, 2^{n}, 2^{n}+1, 2^{n+1}-1, 2^{n-1}-1} with *n* = 8 i.e. the moduli are {255,256,257,511,127}. Evidently, the word length is 8 to 9 bits. Note that the moduli need to be mutually prime. We need thus modulo adders of 8, 8, 9, 9 and 7 bits respectively for the five moduli set. We also need modulo multipliers for these word lengths. We will now describe each of the building blocks.

**Figure 1: A N-tap FIR filter**