

Volume 8, No. 10, October 2020. International Journal of Emerging Trends in Engineering Research. Available online at http://www.warse.org/IJETER/static/pdf/file/ijeter1208102020.pdf; https://doi.org/10.30534/ijeter/2020/1208102020

# An Efficient FIR Filter Design using DA based Speculative Residue and Reverse Computation RNS system

Reddy Hemantha. G<sup>1</sup>, Varadarajan. S<sup>2</sup>, Giriprasad. M. N<sup>3</sup>

<sup>1</sup> Research Scholar, Department of ECE, JNT University, Andhra Pradesh, India; hemanthag75@gmail.com

<sup>2</sup> Professor, Department of ECE, SVU College of Engineering, SV University, Andhra Pradesh, India; varadasouri@gmail.com

<sup>3</sup> Professor, Department of ECE, JNTUA College of Engineering, Andhra Pradesh, India; mahendragiri1960@gmail.com

### ABSTRACT

The finite impulse response (FIR) filter is prominently used in many digital signal processing (DSP) systems for various applications. In this paper, we present a high-performance residue number system (RNS) based FIR filter design for ECG signal classification. In general, RNS offers significant benefits for FIR implementation through its inherent parallelism and data partitioning. However, increased bit width causes a considerable performance trade-off due to residue computation and reverse conversion. This paper proposes optimized RNS arithmetic that combines distributed arithmetic (DA) based residue computation during RNS multiplication with speculative, delay-optimized reverse computation, to mitigate the trade-off that FIR filters exhibit as filter length grows. The proposed RNS design uses the built-in block RAMs available in FPGA devices to perform the reverse conversion. A distinctive feature of our FIR filter implementation with the optimized RNS core is that it minimizes hardware complexity overhead while improving operating speed. Fetal ECG signal detection is first carried out to validate the functionality of the FIR filter core, and FPGA hardware synthesis is then carried out for various input word sizes and FIR lengths. The experimental results show that the trade-off of conventional RNS FIR filters over filter length is narrowed, along with a considerable complexity reduction, by the proposed optimized RNS system.

**Key words:** DA arithmetic, FIR filter, RNS system, Speculation.

# **1. INTRODUCTION**

In recent years, with the growing demand for sensor computing and wireless communication devices, designing compact processing units with low hardware cost has become a major concern for digital designers. Real-time digital signal processing (DSP) systems are prominently used in sensor networks and communications. Among the various processing units, the finite impulse response (FIR) filter is widely considered one of the primary components in DSP applications, and many works have investigated complexity reduction, high throughput, and low-power FIR design. The use of sensor devices in wireless body area networks makes existing FIR architectures less suitable for sensor node deployments. On the other hand, high-precision filter coefficients with improved frequency response have gained much attention for accurate filtering. Both hardware complexity reduction and high filter order must therefore be taken into account in FIR filter design. Because the filter length is extended to improve coefficient precision for a wide range of applications, it is nearly impossible to design a unified FIR structure without incurring hardware complexity overhead. Moreover, a higher-order FIR filter causes a linear increase in hardware complexity along with an associated operating speed constraint. Many existing FIR implementations simply treat this hardware penalty as tolerable, while several prominent solutions reduce the underlying hardware utilization with appropriate optimization techniques. The residue number system (RNS) is one such technique, and two methodologies are widely adopted to implement RNS computation: (i) LUT-based models and (ii) conventional binary modules.

Lookup-table-based RNS implementation performs well when the moduli are small, whereas binary RNS systems perform better with large moduli, which can accommodate large input operands. Because FIR filters require high-precision coefficients for improved accuracy, the latter approach is the more suitable way to incorporate the RNS system. Recently, [1] proposed a binary serial implementation of the RNS system for an adaptive FIR filter to eliminate complex scaling.
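To illustrate why LUT-based RNS favors small moduli, the following minimal Python sketch (an illustration only, not the design of [2]) replaces a modular multiplier with a full two-operand table for one channel. The helper names `build_mul_lut`, `lut_mul`, and `lut_bits` are hypothetical. The table holds m × m words of ⌈log2 m⌉ bits, so its storage grows roughly quadratically with the modulus, whereas an arithmetic (binary) modular multiplier scales with the operand bit width instead.

```python
def build_mul_lut(m):
    """Precompute |a * b|_m for every pair of residues a, b in [0, m)."""
    return [[(a * b) % m for b in range(m)] for a in range(m)]

def lut_mul(lut, a, b):
    """Modular multiplication as a single table read (no multiplier)."""
    return lut[a][b]

def lut_bits(m):
    """Storage for a full two-operand table: m * m words of ceil(log2 m) bits each."""
    word_bits = max(1, (m - 1).bit_length())  # bits needed to hold a residue in [0, m)
    return m * m * word_bits

lut7 = build_mul_lut(7)
print(lut_mul(lut7, 5, 6))  # 2, since 30 mod 7 = 2

# Table storage grows roughly with m**2, which is why LUTs favor small moduli.
for m in (7, 31, 257):
    print(m, lut_bits(m))
```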

LUT-based RNS arithmetic is developed in [2] to exploit the 6-input look-up tables (LUTs) that FPGAs provide as logic blocks. In this design, each LUT module supports a wide range of parameter configurations to accommodate different numbers of taps and coefficient word lengths.

[3] uses a binary-coded format to compute the residues and thermometer-code-encoded residues to compute the modular inner products. This distributed arithmetic (DA) approach involves no carry propagation in accumulation and uses pre-computed LUT blocks to achieve high operating speed with low hardware complexity in FIR filter design. [4] integrates an RNS accumulator with a high-performance radix-4 Booth multiplier to achieve flexibility and low complexity in FIR filter design. [5] proposes an error-compensated FIR filter using a redundant residue number system with decomposed residue-to-binary converters. There, soft errors caused by single-event upsets (SEUs) during MAC computation are mitigated by using appropriate down-conversion moduli sets in the FIR computation. [6] proposes a modular multiplication scheme without a binary-to-residue converter to reduce hardware complexity and power consumption. This method also includes a pre-loaded product block to minimize the computational cost and delay of partial-product generation for each FIR tap. [7] proposes novel end-around carry (EAC) units that simplify modular addition, addressing the performance trade-off that arises in RNS FIR filters as the number of taps increases. [8] uses constant shifting accumulation to mitigate the path delay limitations of the reverse conversion process in RNS arithmetic.
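Since [7] builds on end-around carry (EAC) addition, the following minimal sketch shows only the basic EAC identity for the modulus 2^n − 1, not the deferred-EAC architecture of [7]. The carry-out of an n-bit addition has weight 2^n ≡ 1 (mod 2^n − 1), so it is fed back into the least significant bit instead of being discarded. The function name `add_mod_2n_minus_1` is hypothetical.

```python
def add_mod_2n_minus_1(a, b, n):
    """Add two residues modulo 2**n - 1 using an end-around carry."""
    mask = (1 << n) - 1          # 2**n - 1, the modulus (all ones in n bits)
    s = a + b                    # plain n-bit add; bit n is the carry-out
    s = (s & mask) + (s >> n)    # carry weight 2**n = 1 (mod 2**n - 1): wrap it around
    # The all-ones word is a second encoding of zero for this modulus.
    return 0 if s == mask else s

n = 3
m = (1 << n) - 1  # 7
# Exhaustive check against ordinary modular addition for n = 3.
print(all(add_mod_2n_minus_1(a, b, n) == (a + b) % m for a in range(m) for b in range(m)))  # True
print(add_mod_2n_minus_1(5, 6, 3))  # 4, since 11 mod 7 = 4
```

A single wrap is enough here: when a carry occurs, the remaining low bits are at most 2^n − 4, so adding the carry back cannot overflow again.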

Several works have shown that the residue number system (RNS) can achieve a notable reduction in hardware utilization compared with other optimization approaches. In this paper, we propose a DA-based speculative RNS MAC unit for FIR filter design with the following advantages:

- The DA-based residue computation unit offers considerable complexity reduction over a wide range of moduli sets.
- It can be used for higher-order FIR design to resolve the trade-off constraints of the conventional RNS system.
- Speculative reverse conversion at the final stage improves accumulation speed through direct RAM-based data access.

This paper is organized as follows. Section 2 describes the RNS system, including moduli conversion and memory-efficient reverse conversion. Section 3 explains the speculation and DA model used for the RNS MAC process and the associated FIR structure. Section 4 presents the experimental results, and Section 5 concludes the paper.

# **2. RNS SYSTEM**

In residue number system (RNS) arithmetic, computation is carried out over a predefined moduli set whose elements are pairwise relatively prime integers. The range of operands that the RNS can accommodate without truncating results is determined by the moduli values and the number of moduli. For a dynamic range $M$, the RNS represents results uniquely in the range $[0, M-1]$, irrespective of the arithmetic operation performed. In addition, each modulus and its associated computations form an isolated channel, and all channels operate in parallel, which considerably reduces path propagation delays. In the proposed method, this path delay is further optimized using prefix-topology-based accumulation within the RNS system, as shown in Figure 2.
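A minimal sketch of the channel-wise behavior described above, assuming the moduli set {7, 8, 9} that appears in Table 1 (dynamic range M = 504). Each operand is converted to residues, and additions and multiplications run independently in each channel, with no carries passing between channels. The helpers `to_rns`, `rns_add`, and `rns_mul` are hypothetical names; `math.prod` requires Python 3.8 or later.

```python
from itertools import combinations
from math import gcd, prod

MODULI = (7, 8, 9)   # pairwise coprime moduli (assumed, as in the 8-bit row of Table 1)
M = prod(MODULI)     # dynamic range: results must lie in [0, M - 1] = [0, 503]

assert all(gcd(a, b) == 1 for a, b in combinations(MODULI, 2))

def to_rns(x, moduli=MODULI):
    """Forward conversion: r_i = |x|_{m_i}."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    # Each channel is computed independently; no carry crosses channels.
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def rns_mul(a, b, moduli=MODULI):
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))

x, y = 17, 23
print(to_rns(x), to_rns(y))           # (3, 1, 8) (2, 7, 5)
print(rns_mul(to_rns(x), to_rns(y)))  # (6, 7, 4), the residues of 391
print(rns_mul(to_rns(x), to_rns(y)) == to_rns(x * y))  # True
```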

### 2.1 Moduli conversion

Consider the moduli set $\{m_1, m_2, \ldots, m_n\}$ and the associated residues $\{r_1, r_2, \ldots, r_n\}$ of an input operand $X$. They are related by the following equations:

$$r_i = |X|_{m_i}, \qquad |X|_M = \left| \sum_{i=1}^{n} r_i \left| M_i^{-1} \right|_{m_i} M_i \right|_M \tag{1}$$

where $M = \prod_{i=1}^{n} m_i$ is the product of all moduli and $M_i = M/m_i$. In residue form, $X$ can be decomposed as:

$$X = \{r_1, r_2, \ldots, r_n\} = \{r_1, 0, \ldots, 0\} + \{0, r_2, \ldots, 0\} + \cdots + \{0, 0, \ldots, r_n\} = X_1 + X_2 + \cdots + X_n \tag{2}$$

The post-processing step, called reverse conversion, computes each $X_i$ as follows:

$$X_i = r_i \times \{0, \ldots, 0, 1, 0, \ldots, 0\} = r_i \times \bar{X}_i \tag{3}$$

where $\bar{X}_i$ is chosen such that $|\bar{X}_i|_{m_i} = 1$ and $|\bar{X}_i|_{m_j} = 0$ for $j \neq i$. A residue $r_i$ and its multiplicative inverse $r_i^{-1}$ modulo $m_i$ (which exists when $\gcd(r_i, m_i) = 1$) are related by:

$$\left| r_i \times r_i^{-1} \right|_{m_i} = 1 \tag{4}$$

Since $M_i$ is defined as $M/m_i$, the same relation applied to $M_i$ gives:

$$\left| \left| M_i^{-1} \right|_{m_i} M_i \right|_{m_i} = 1 \tag{5}$$

Because all $m_i$ are pairwise relatively prime, each $M_i$ is coprime to $m_i$, so the inverses $|M_i^{-1}|_{m_i}$ exist and:

Reddy Hemantha. G et al., International Journal of Emerging Trends in Engineering Research, 8(10), October 2020, 7412-7417

$$\bar{X}_i = \left| M_i^{-1} \right|_{m_i} M_i \tag{6}$$

$$X = \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} r_i \left| M_i^{-1} \right|_{m_i} M_i \tag{7}$$

The moduli and the input operands are selected in advance, according to the required dynamic range, so that every result lies in $[0, M-1]$. Finally, reducing both sides of Eq. (7) modulo $M$ yields Eq. (1).
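The derivation above translates directly into the following minimal Python sketch of CRT-based reverse conversion, again assuming the moduli {7, 8, 9}. It computes $M_i = M/m_i$ and the inverse $|M_i^{-1}|_{m_i}$ with Python's built-in three-argument `pow` (a negative exponent requires Python 3.8 or later), forms the basis elements $\bar{X}_i$ of Eq. (6), and reduces the sum of Eq. (7) modulo $M$. The name `crt_reverse` is hypothetical.

```python
from math import prod

MODULI = (7, 8, 9)   # assumed moduli set
M = prod(MODULI)     # 504

# Precompute the basis elements Xbar_i = |M_i^{-1}|_{m_i} * M_i  (Eq. 6).
BASIS = []
for m in MODULI:
    Mi = M // m
    inv = pow(Mi, -1, m)   # modular inverse; exists because the moduli are pairwise coprime
    BASIS.append(inv * Mi)

def crt_reverse(residues):
    """Sum of r_i * Xbar_i (Eq. 7), reduced modulo M to give |X|_M (Eq. 1)."""
    return sum(r * b for r, b in zip(residues, BASIS)) % M

print(BASIS)                   # [288, 441, 280]
print(crt_reverse((6, 7, 4)))  # 391
```

Each basis element is 1 in its own channel and 0 in the others (for example, 288 mod 7 = 1 while 288 mod 8 = 288 mod 9 = 0), which is exactly the $\{0, \ldots, 1, \ldots, 0\}$ vector of Eq. (3).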

### 2.2 Memory efficient post computation

After the residue computation, which follows conventional arithmetic operations, reverse conversion is performed as a post-computation step to convert the residue number back to an integer value. The range of values involved depends on the size of each modulus. Here, all possible results are pre-computed and stored in memory, so that reverse conversion reduces to a table access. This lowers the hardware complexity of the reverse conversion unit, since these memory units are mapped onto dedicated on-chip block RAMs during hardware synthesis. Compared with computing the conversion in logic, this on-chip memory access minimizes both the computation cost and the access time, with the shortest critical path, as shown in Figure 1.



Figure 1: RAM based RNS reverse conversion processing unit
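The sketch below models memory-based reverse conversion as a single ROM addressed by the concatenated residue bits. This memory organization is an assumption for illustration; the text does not specify how the table is addressed. With moduli {7, 8, 9} the residues need 3, 3, and 4 address bits, giving a 1024-word table of which 504 words are used. The contents are filled offline by brute force here; in hardware they would be the initialization contents of an on-chip block RAM. The names `ADDR_BITS`, `address`, `rom`, and `rom_reverse` are hypothetical.

```python
from math import prod

MODULI = (7, 8, 9)   # assumed moduli set
M = prod(MODULI)     # 504
ADDR_BITS = tuple((m - 1).bit_length() for m in MODULI)  # (3, 3, 4)

def address(residues):
    """Concatenate the residue fields into one ROM address (r_1 in the MSBs)."""
    addr = 0
    for r, bits in zip(residues, ADDR_BITS):
        addr = (addr << bits) | r
    return addr

# Offline step: store every representable result at the address of its residues.
rom = [0] * (1 << sum(ADDR_BITS))   # 1024 words; unused addresses stay 0
for x in range(M):
    rom[address(tuple(x % m for m in MODULI))] = x

def rom_reverse(residues):
    """Reverse conversion is a single memory read."""
    return rom[address(residues)]

print(len(rom), rom_reverse((6, 7, 4)))  # 1024 391
```

The table size is the product of the per-channel field sizes, so this flat organization is practical only for small moduli; larger sets would need a different partitioning.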

# **3. FIR FILTER BASED ON RNS**

### **3.1 DA based residue computation using approximated speculation**

Including speculation during accumulation reduces the overall path delay through appropriate carry approximation. Here it is used to perform multiplication as a sequence of additions, as in [10]. The multiplier-less DA arithmetic uses suitable pre-processing units to shorten the carry propagation path, which keeps the critical path delay constant through precomputation. Accumulation speed is increased without any inner-stage pipelining, while retaining the inherent low complexity. A MAC unit built with this speculative, delay-optimized accumulator meets the requirements of FIR filter design and narrows the performance penalty gap that arises as taps are added. Moreover, the speculative units compute the accumulation in identical blocks, unlike conventional parallel-prefix methods, which leads to significant path delay optimization. Implementing the FIR filter using the residue number system (RNS) with DA-based arithmetic therefore offers improved data rate through inherent parallelism, modularity, and path-optimized speculation. The entire MAC operation is performed in RNS for each FIR tap, as shown in Figure 2. MAC is the core operation of FIR filters, and combining the speculation features of prefix topology with the RNS system is beneficial for high-performance implementation [9].
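To make the DA principle concrete, here is a minimal bit-serial distributed arithmetic sketch for one RNS channel, assuming unsigned input samples of a fixed bit width and fixed coefficients. This is a generic textbook DA formulation, not the exact speculative datapath of the proposed design, and the names `build_da_lut` and `da_inner_product` are hypothetical. For K taps, a 2^K-entry table stores the modular sum of every subset of coefficients; in each step, one bit from every sample forms the table address, and the accumulator is doubled and updated modulo m (MSB first), so no multiplier is needed.

```python
def build_da_lut(coeffs, m):
    """Entry j holds |sum of coeffs[k] for every bit k set in j|_m."""
    K = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (j >> k) & 1) % m
            for j in range(1 << K)]

def da_inner_product(xs, coeffs, m, width):
    """|sum_k coeffs[k] * xs[k]|_m computed bit-serially, MSB first, without multipliers."""
    lut = build_da_lut(coeffs, m)
    acc = 0
    for b in reversed(range(width)):
        # Bit b of every input sample forms the table address.
        addr = sum(((x >> b) & 1) << k for k, x in enumerate(xs))
        acc = (2 * acc + lut[addr]) % m   # shift-and-add, reduced modulo m at each step
    return acc

coeffs = [3, 5, 2, 7]
xs = [200, 17, 99, 255]   # 8-bit unsigned samples (illustrative values)
for m in (7, 8, 9):
    # The DA result and the direct modular inner product agree in every channel.
    print(m, da_inner_product(xs, coeffs, m, 8),
          sum(c * x for c, x in zip(coeffs, xs)) % m)
```

The identity behind it is $\sum_k c_k x_k = \sum_b 2^b \sum_k c_k x_{k,b}$, where $x_{k,b}$ is bit $b$ of sample $k$; the inner sum is exactly one table entry.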

# **4. EXPERIMENTAL RESULTS**

Experiments are carried out over different moduli sets to validate the performance of the proposed RNS FIR design. The design is written in Verilog HDL, its functionality is verified with ModelSim simulation, and its metrics are evaluated through FPGA hardware synthesis in Quartus II, as shown in Table 1 and Table 2. The results show that the speculative DA model gives a considerable path delay reduction along with significant resource savings. Hardware complexity is measured as logic element (LE) utilization during synthesis, and it shows the advantage of the speculative DA-based RNS system across the FIR filter configurations evaluated.

### 4.1 Fetal ECG detection using extended tap unit

A fetal ECG signal detection method is used here to validate the functionality and efficiency of the FIR filter design with extended length. The fetal detection process uses high-precision filter coefficients for an improved frequency response and accurate filtering. As shown in Figure 3, the decomposed output samples are lower in magnitude, which distinguishes them from the ECG patterns.
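The functional check described above can be mimicked in software. The minimal sketch below, assuming small non-negative integer samples and coefficients and the moduli set {31, 32, 33} (M = 32736), runs a direct-form FIR independently in each residue channel, reverse-converts every output with the CRT, and compares the result against ordinary integer convolution. It uses a toy pulse, not ECG data; signed, high-precision coefficients would need a larger dynamic range and a signed-residue encoding, which are not modeled here. All function names are hypothetical.

```python
from math import prod

MODULI = (31, 32, 33)   # assumed moduli set (16-bit row of Table 1)
M = prod(MODULI)        # 32736

def crt(residues):
    """CRT reverse conversion (Eqs. 6-7, then reduction modulo M)."""
    total = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        total += r * pow(Mi, -1, m) * Mi
    return total % M

def fir_channel(samples, coeffs, m):
    """Direct-form FIR computed entirely modulo m (one RNS channel)."""
    cm = [c % m for c in coeffs]
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(cm):
            if n - k >= 0:
                acc = (acc + c * (samples[n - k] % m)) % m   # modular MAC
        out.append(acc)
    return out

def rns_fir(samples, coeffs):
    channels = [fir_channel(samples, coeffs, m) for m in MODULI]
    return [crt(rs) for rs in zip(*channels)]

def fir_reference(samples, coeffs):
    return [sum(c * samples[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(samples))]

coeffs = [1, 2, 3, 2, 1]
samples = [0, 10, 40, 120, 255, 120, 40, 10, 0]   # toy pulse, not ECG data
print(rns_fir(samples, coeffs) == fir_reference(samples, coeffs))  # True
```

The comparison holds only while every true output stays below M; here the largest output is 1325, well inside the dynamic range.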

### 4.2 Delay optimization

The major limitations that come with an improved FIR impulse response are mitigated by carry-approximated accumulation and speculation-driven reverse conversion in RNS multiplication. Moreover, the errors that propagate through the stage-wise shifting of the DA-based FIR computation are handled by an error correction unit. Both error correction and reverse conversion play a prominent role in overall performance, reducing both complexity and critical path delay, as shown in Figure 4. The benefit of DA-based residue computation, in terms of both complexity reduction and performance retention, increases with FIR order, as shown in Figure 5.
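The carry-approximation and error-correction idea can be illustrated with a generic block-speculative adder, assumed here for illustration and not necessarily the exact scheme of [9] or [10]. Each block guesses its carry-in by examining only a short window of lower-order bits. The guess can be wrong only when every bit in that window propagates ($a_i \oplus b_i = 1$), so a conservative detector flags those blocks and a slow correction path recomputes them with the full carry chain. The names `carry_into`, `speculative_add`, `block`, and `window` are hypothetical.

```python
def carry_into(a, b, pos, lo=0):
    """Carry into bit `pos` when bits [lo, pos) of a and b are added with carry-in 0."""
    mask = (1 << pos) - (1 << lo)
    return ((a & mask) + (b & mask)) >> pos

def speculative_add(a, b, width=16, block=4, window=2):
    """Block-speculative addition; returns (corrected, speculative, flagged_blocks)."""
    bmask = (1 << block) - 1
    spec = result = 0
    flagged = []
    for s in range(0, width, block):
        lo = max(0, s - window)
        a_blk, b_blk = (a >> s) & bmask, (b >> s) & bmask
        guess = carry_into(a, b, s, lo)                # fast: looks back `window` bits only
        spec |= ((a_blk + b_blk + guess) & bmask) << s
        window_mask = (1 << s) - (1 << lo)
        # Error detector: the guess can only be wrong if all window bits propagate
        # and the window does not already reach bit 0.
        if lo > 0 and ((a ^ b) & window_mask) == window_mask:
            flagged.append(s)
            carry = carry_into(a, b, s)                # correction: full carry chain
        else:
            carry = guess
        result |= ((a_blk + b_blk + carry) & bmask) << s
    return result, spec, flagged

r, spec, flagged = speculative_add(0x00FF, 0x0001)
print(hex(r), hex(spec), flagged)  # 0x100 0xf0 [4, 8]
```

In this example the speculative result 0xf0 is wrong because the carry ripples through eight bits; the detector flags the blocks at bits 4 and 8, and the corrected result is the exact sum 0x100.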







Figure 3: Fetal ECG detection reconstructed output

| # | Fmax | Restricted Fmax | Clock Name | Note |
|---|------|-----------------|------------|------|
| 1 | 71.25 MHz | 71.25 MHz | clk | |
| 2 | 156.69 MHz | 110.08 MHz | Moduliconversion:Cin15 r1[0] | limit due to low minimum pulse width violation (tcl) |
| 3 | 231.16 MHz | 96.62 MHz | Moduliconversion:Cin18 r2[1] | limit due to hold check |

Figure 4: Fmax report.

| Input word length | Moduli set $\{2^n-1, 2^n, 2^n+1\}$ | RNS with speculation and conventional residue computation: Area (LEs) | RNS with speculation and conventional residue computation: Fmax | RNS with speculation and DA-based residue computation: Area (LEs) | RNS with speculation and DA-based residue computation: Fmax |
|---|---|---|---|---|---|
| 8 bit | (7, 8, 9) | 4281 | 57.3 MHz | 4088 | 75.48 MHz |
| 16 bit | (31, 32, 33) | 14374 | 24.96 MHz | 12165 | 29.78 MHz |

Table 1: Performance trade-off comparison over input word length
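The moduli sets in Table 1 are instances of the special set $\{2^n-1, 2^n, 2^n+1\}$ with n = 3 and n = 5, whose dynamic range is $M = (2^n-1) \cdot 2^n \cdot (2^n+1) = 2^{3n} - 2^n$, i.e., 504 and 32736. The minimal sketch below shows a standard property of this set, assumed here rather than taken from the paper's design: residues can be formed from n-bit chunks of the input using shifts and additions, because $2^n \equiv 1 \pmod{2^n-1}$ and $2^n \equiv -1 \pmod{2^n+1}$. The names `special_moduli` and `forward_special` are hypothetical.

```python
def special_moduli(n):
    """The moduli set (2**n - 1, 2**n, 2**n + 1)."""
    return ((1 << n) - 1, 1 << n, (1 << n) + 1)

def forward_special(x, n):
    """Residues of x >= 0 for (2**n - 1, 2**n, 2**n + 1) built from n-bit chunks of x."""
    mask = (1 << n) - 1
    chunks = []
    while x:
        chunks.append(x & mask)   # n-bit chunks, least significant first
        x >>= n
    r_low = chunks[0] if chunks else 0   # |x|_{2^n}: just the low n bits
    r_minus = sum(chunks) % mask         # 2^n = 1 (mod 2^n - 1): add all chunks
    # 2^n = -1 (mod 2^n + 1): alternate the signs of the chunks.
    alt = sum(c if i % 2 == 0 else -c for i, c in enumerate(chunks))
    r_plus = alt % (mask + 2)
    return r_minus, r_low, r_plus

for n in (3, 5):
    moduli = special_moduli(n)
    M = moduli[0] * moduli[1] * moduli[2]
    print(n, moduli, M, M == (1 << 3 * n) - (1 << n))  # 504 for n = 3, 32736 for n = 5

print(forward_special(391, 3))  # (6, 7, 4)
```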

Table 2: Performance analysis of speculative DA-based RNS FIR design.

| FIR length | RNS with speculation and conventional residue computation: Area (LEs) | RNS with speculation and conventional residue computation: Fmax | RNS with speculation and DA-based residue computation: Area (LEs) | RNS with speculation and DA-based residue computation: Fmax |
|---|---|---|---|---|
| 4 tap | 2096 | 63.46 MHz | 2009 | 73.2 MHz |
| 16 tap | 8623 | 57.3 MHz | 8193 | 71.25 MHz |



(a) Hardware complexity overhead

(b) Performance penalty gap

Figure 5: Improved performance trade-off comparison of DA-based arithmetic in RNS over FIR length.

# **5. CONCLUSION**

This paper focused on implementing high-end FIR filters using optimized RNS units for fetal ECG signal detection. The hardware synthesis results show that each level of optimization in the RNS computation directly affects the hardware utilization and performance retention of the FIR filter design. Both the RAM-based speculative reverse conversion and the DA-based residue computation reduce path delay in the RNS system, which narrows the performance penalty gap in FIR filter design. By incorporating an optimized RNS MAC and a memory-efficient reverse converter unit, this work maintains consistent performance metrics as the number of FIR taps increases.

# REFERENCES

- [1] G. L. Bernocchi, G. C. Cardarilli, A. D. Re, A. Nannarelli, and M. Re, "Low-power adaptive filter based on RNS components," in *International Symposium on Circuits and Systems*, New Orleans, IEEE, May 2007, pp. 3211-3214.
- [2] S. Pontarelli, G. C. Cardarilli, M. Re, and A. Salsano, "Optimized implementation of RNS FIR filters based on FPGAs," *Journal of Signal Processing Systems*, vol. 67, no. 3, pp. 201-212, 2012.
- [3] C. H. Vun, A. B. Premkumar, and W. Zhang, "A new RNS based DA approach for inner product computation," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 60, no. 8, pp. 2139-2152, 2013.
- [4] J. B. Pari and S. P. Rani, "Reconfigurable architecture of RNS based high speed FIR filter," 2014.
- [5] Z. Luan, X. Chen, N. Ge, and Z. Wang, "Simplified fault-tolerant FIR filter architecture based on redundant residue number system," *Electronics Letters*, vol. 50, no. 23, pp. 1768-1770, 2014.
- [6] R. Srinivasa, Kotha, and S. K. Sahoo, "An approach for fixed coefficient RNS-based FIR filter," *International Journal of Electronics*, vol. 104, no. 8, pp. 1358-1376, 2017.
- [7] A. Belghadr and G. Jaberipur, "FIR filter realization via deferred end-around carries modular addition," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 65, no. 9, pp. 2878-2888, 2018.
- [8] P. Patronik and S. J. Piestrak, "Design of RNS reverse converters with constant shifting to residue data path channels," *Journal of Signal Processing Systems*, vol. 90, no. 3, pp. 323-339, 2018.
- [9] G. R. Hemantha, S. Varadarajan, and M. N. Giriprasad, "DA based systematic approach using speculative addition for high speed DSP applications," *International Journal of Engineering & Technology*, vol. 7, no. 2, pp. 197-199, 2018.
- [10] G. R. Hemantha, S. Varadarajan, and M. N. Giriprasad, "FPGA implementation of speculative prefix accumulation-driven RNS for high-performance FIR filter," in *Innovations in Electronics and Communication Engineering*, Singapore: Springer, 8 February 2019, pp. 365-375.