# Counter Based Numerically Controlled Oscillator – A New Architecture



Shachi P<sup>1</sup>, R. Mishra<sup>2</sup>, J. Ravi Kumar<sup>3</sup>

<sup>1</sup> National Institute of Technology, Warangal, India, <u>shachip88@gmail.com</u>
 <sup>2</sup> ISRO Satellite Centre, Vimanapura PO, Airport Road, Bangalore, India, <u>rahulm@isac.gov.in</u>
 <sup>3</sup> National Institute of Technology, Warangal, India, <u>ravikumar@nitw.ac.in</u>

**Abstract** - This paper presents a new architecture for a numerically controlled oscillator (NCO) that differs from the conventional LUT (Look Up Table)-based architecture. The key difference from other well-known approaches is the use of non-uniform sampling intervals, chosen so that the amplitude values of the sinusoidal signal are uniformly spaced. The advantages of this method are that the output frequency range can be increased without altering the design or recomputing any stored values, and that the number of samples per signal cycle remains constant irrespective of the output frequency. The approach saves hardware and is comparable to other approaches in the accuracy of the reproduced signal.

**Key words:** Counter, LUT, Non-uniform sampling, Quantization Error, SFDR (Spurious Free Dynamic Range).

# **INTRODUCTION**

Many methods have been proposed for implementing a numerically controlled oscillator (NCO), e.g., the LUT (Look Up Table)-based NCO, the CORDIC (COordinate Rotation DIgital Computer) algorithm based NCO and so on [1]. A conventional LUT-based NCO is shown in Fig.1. The basic principle of the conventional method is that the phase accumulator output advances in equal increments to produce the desired frequency [2]. The sampling is uniform in conventional methods and the values loaded into the LUT are the corresponding amplitude values. In the LUT-based approach, the output frequency is decided by the Frequency Control Word (FCW), whose width in turn decides the number of output frequencies that can be synthesized. The number of possible output frequencies is fixed for a design and, to change it, the values have to be recomputed. As the frequency increases, the number of samples per cycle decreases, which increases the quantization noise and hence decreases the SNR (Signal to Noise Ratio).

# THE CONTRIBUTIONS OF THIS WORK

A new method for the NCO is proposed here with the objectives of reducing the architectural complexity and maintaining a constant number of samples irrespective of the output frequency. This approach reduces the rigidity in increasing the number of possible output frequencies. Here, we present the simulation results of the counter-based technique and briefly compare our approach with various other NCO approaches.

Fig 1: Conventional LUT-based NCO

#### THE NEW ARCHITECTURE

Consider one quadrant of the sine wave with eight sampling intervals with unit amplitude. In the new architecture, non-uniform sampling is employed so as to get amplitude values uniformly spaced as shown in Fig.2.

A generalized block diagram of the proposed architecture with 8 samples per quadrant is shown in Fig.3. It mainly consists of an LUT and three counters – Counter1, Counter2 and Counter3. The inputs are frequency control word (FCW) and master clock ( $f_{clk}$ ) and the output is digitized sine amplitude values.



Fig 2: Quarter wave plot with the new approach



Fig 3: Block diagram of proposed NCO architecture

Suppose counter1 is L bits wide, counter2 is K bits wide (K chosen such that the sum of the LUT values is  $2^{M}$ ) and counter3 is (N+1) bits wide (1 bit used for signed representation). So there are  $2^{N}$  values in the LUT, each M bits wide, and the digitized sinusoidal output has  $(4*2^{N})$  samples. The functions of the LUT and the counters are detailed in the following text.

### **LUT:**

This contains the binary values proportional to the sampling intervals. The LUT word is K-bit wide and totally  $2^{N}$  values are stored. The values are computed as follows:

$$t = \sin^{-1}a \tag{1}$$

$$\Delta t = t_{i+1} - t_i \tag{2}$$

$$b = \frac{2^M * \Delta t}{\Delta t_{max}} \tag{3}$$

Where,

- $t_i$ – Sampling instant corresponding to the i<sup>th</sup> sample
- $\Delta t$ – Sampling interval
- $\Delta t_{max}$ – Maximum sampling interval
- $a$ – Amplitude value

The binary value corresponding to *b* is loaded into LUT. The address to the LUT is generated by count\_out of counter3 and the word at corresponding address is the word\_out.
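Equations (1) to (3) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: it assumes the amplitude levels are $a_i = i/2^N$ for $i = 0, \ldots, 2^N$, applies Eq. (3) literally (normalising by $\Delta t_{max}$) and rounds to the nearest integer. The function name `lut_values` and the rounding rule are assumptions, and the text does not settle whether the words should instead be scaled so that they sum to $2^M$.

```python
import math

def lut_values(n_bits, m_bits):
    """Return 2**n_bits interval words following Eqs. (1)-(3).

    Assumes uniformly spaced amplitudes a_i = i / 2**n_bits on [0, 1].
    """
    levels = 2 ** n_bits
    # Eq. (1): sampling instants at which the sine reaches each amplitude
    t = [math.asin(i / levels) for i in range(levels + 1)]
    # Eq. (2): non-uniform sampling intervals between consecutive instants
    dt = [t[i + 1] - t[i] for i in range(levels)]
    dt_max = max(dt)
    # Eq. (3): scale so the largest interval maps to 2**m_bits (rounding assumed)
    return [round((2 ** m_bits) * d / dt_max) for d in dt]

words = lut_values(n_bits=3, m_bits=5)
print(words)
```

The words grow towards the peak of the sine, where the waveform is flattest and each amplitude step therefore takes the longest time.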

### **Counters:**

The counters are edge-triggered. Each has two inputs, the clock and the count value loaded on reset, and two outputs, count\_out and a flag that is set when count\_out reaches "0".

#### 1. Counter1

This is a down-counter. The inputs are frequency control word (FCW) and master clock frequency  $f_{clk}$  and output is flag1 which is the clock driving the counter2. The flag1 is set when the count\_out goes "0".

#### 2. Counter2

This is a down-counter. The inputs are LUT word\_out and flag1 and output is flag2 which is the clock driving counter3.

#### 3. Counter3

This is an up-counter which counts from "0" to "7". The input is flag2, used as the clock, and the output is count\_out, which gives the sine amplitude values.

For full sine wave, counter3 is made up-down counter which counts from "0" to "7" and back to "0". A simple logic is used to get negative values. Once the counter comes back to "0", a 'sign' flag is set to take negative values of the counter3 count\_out values. Hence in alternate cycles (from 0 to 7 and back to 0), the 'sign' flag keeps toggling.
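The interplay of the three counters can be illustrated with a small behavioural model. The sketch below is not the VHDL design; it is a hedged Python approximation that assumes counter1 divides the master clock by (FCW + 1), that counter2 holds each level for the number of flag1 pulses given by the LUT word, and that every quadrant visits all LUT addresses (so the top level is visited once on the way up and once on the way down). The name `counter_nco` and the LUT contents are hypothetical.

```python
from itertools import islice

def counter_nco(lut, fcw):
    """Yield one signed output level per master-clock tick, indefinitely.

    lut -- dwell counts per level (hypothetical values, see lead-in)
    fcw -- frequency control word; counter1 divides the clock by fcw + 1
    """
    sign = 1
    up = list(range(len(lut)))
    while True:
        # counter3: count up through the LUT addresses, then back down
        for addr in up + up[::-1]:
            # counter2: loaded with the LUT word, counts flag1 pulses down to 0
            for _ in range(lut[addr]):
                # counter1: one flag1 pulse every (fcw + 1) master-clock ticks
                for _ in range(fcw + 1):
                    yield sign * addr
        # the 'sign' flag toggles after each up-and-down sweep
        sign = -sign

lut = [1, 1, 2, 4]          # illustrative dwell counts, sum = 2**3
fcw = 1
period = (fcw + 1) * 4 * sum(lut)
samples = list(islice(counter_nco(lut, fcw), 2 * period))
print(period, samples[:period])
```

Under these assumptions one output cycle takes (FCW + 1) times four times the sum of the LUT words in master-clock ticks, and the number of distinct samples per cycle does not depend on FCW.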

The maximum possible output frequency is,

$$f_{out\_max} = \frac{f_{clk}}{4 * 2^M} \tag{4}$$

The range of output frequencies is  $(f_{out\_max}/2^L)$  to  $f_{out\_max}$ . The resolution of the output frequency is expressed in terms of the time period,  $(1/f_{clk})$  s, i.e., the output frequencies are  $f_{out\_max}$ ,  $f_{out\_max}/2$ ,  $f_{out\_max}/3$  and so on.

The frequency control word (FCW) for desired frequency  $f_{out}$  is calculated as follows:

$$FCW = \left(\frac{f_{out\_max}}{f_{out}}\right) - 1 \tag{5}$$
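As a worked example of Eqs. (4) and (5), the following Python sketch computes the FCW for a 10 kHz target with a 10 MHz master clock. Rounding the FCW to the nearest non-negative integer is an assumption made here, and the helper names are illustrative.

```python
def f_out_max(f_clk, m_bits):
    """Eq. (4): highest output frequency for 2**m_bits clock counts per quadrant."""
    return f_clk / (4 * 2 ** m_bits)

def fcw_for(f_out, f_clk, m_bits):
    """Eq. (5), rounded to the nearest non-negative integer (rounding assumed)."""
    return max(0, round(f_out_max(f_clk, m_bits) / f_out - 1))

def f_out_actual(fcw, f_clk, m_bits):
    """Invert Eq. (5): frequency actually produced by an integer FCW."""
    return f_out_max(f_clk, m_bits) / (fcw + 1)

f_clk = 10e6
for m_bits in (6, 7, 8):
    fcw = fcw_for(10e3, f_clk, m_bits)
    print(m_bits, fcw, f_out_actual(fcw, f_clk, m_bits))
```

For M = 6, 7 and 8 this gives FCW = 3, 1 and 0 respectively, each producing 9765.625 Hz, the nearest achievable frequency to the 10 kHz target.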

#### SIMULATION AND TEST RESULTS

The NCO is simulated using MATLAB Simulink. The Simulink model is shown in Fig.4. The model is simulated for an output frequency of 10 kHz using the following design values:

- Master clock frequency = 10 MHz
- Number of samples per quadrant = 32
- LUT word-length = 5 bits
- FCW = 1

The simulation output and the output spectrum of the NCO Simulink model are shown in Fig.5 and Fig.6 respectively. The SFDR of the designed NCO model is 42 dBc. Fig.5 (a) shows the generated sinusoidal signal together with its analog version, and Fig.5 (b) shows the error between these two signals. The RMS quantization error is 0.02024. For comparison, the output spectrum of a sampled version of a continuous sinusoidal signal of frequency 10 kHz is shown in Fig.7.



Fig 4: Simulink model of the NCO



Fig 5: Simulink model output (a) Sine output, (b) Error plot



Fig 6: Output spectrum of Simulink model of the NCO



Fig 7: The output spectrum of a sampled sinusoidal signal without noise

The simulation is also carried out using Altera Quartus II, with the NCO design implemented using VHDL, choosing Cyclone II FPGA as the target device. The total logic elements used are 64/4608 (1%). The simulation result is shown in Fig. 8.



Fig 8: Simulation result of the NCO design using Altera Quartus II (a) Actual output (b) Zoomed-in version of (a)

#### PERFORMANCE EVALUATION

The proposed method has several advantages. The number of samples is fixed irrespective of the desired frequency; the architecture is simple, since only counters and an LUT are used; and there is no redundancy, as all the stored values are used for every frequency. The output frequency range can be changed simply by changing the size of counter1. The LUT size need not be altered unless the required accuracy changes, and the LUT values need not be recomputed.

However, the master clock frequency is scaled down by  $4*2^{M}$  to obtain the maximum possible output frequency, which restricts this architecture to low-frequency applications. Reducing the quantization error requires a longer LUT word-length, which in turn lowers the highest output frequency possible for a given clock frequency.

The performance evaluation metrics used are RMS (Root Mean Square) quantization error and SFDR (Spurious Free Dynamic Range). SFDR is the ratio of the RMS value of the signal to the RMS value of the worst spurious signal regardless of where it falls in the frequency spectrum. The worst spur may or may not be a harmonic of the original signal. SFDR is an important specification in communication systems because it represents the smallest value of signal that can be distinguished from a large interfering signal (blocker). The quantization error gives the measure of noise power in the output and hence the SNR can be calculated.
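Both metrics can be computed directly from a sampled record. The sketch below is a minimal illustration, not the Simulink spectrum-scope measurement used in this work: it assumes the record contains a whole number of signal cycles (so no window is applied), takes the largest non-DC bin as the carrier and the second largest as the worst spur, and uses a synthetic test signal with a hypothetical third-harmonic distortion.

```python
import cmath
import math

def rms_error(signal, reference):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(signal, reference)) / len(signal))

def sfdr_dbc(signal):
    """SFDR in dBc from a direct DFT over bins 1 .. n/2 - 1 (DC excluded).

    Assumes the record holds a whole number of cycles, so no window is needed.
    """
    n = len(signal)
    mags = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        mags.append(abs(acc))
    ranked = sorted(mags, reverse=True)
    carrier, worst_spur = ranked[0], ranked[1]
    return 20 * math.log10(carrier / worst_spur)

n = 64
ideal = [math.sin(2 * math.pi * 4 * i / n) for i in range(n)]
# Hypothetical distortion: a third harmonic at 1 % of the carrier amplitude
test = [s + 0.01 * math.sin(2 * math.pi * 12 * i / n) for i, s in enumerate(ideal)]
print(round(sfdr_dbc(test), 2), round(rms_error(test, ideal), 5))
```

For this synthetic input the spur sits 40 dB below the carrier, so the function returns 40 dBc; the amplitude ratio and the RMS ratio give the same value in decibels.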

With the parameters set to the following values, the SFDR and quantization error are evaluated:

- Master clock frequency = 10 MHz
- Desired output frequency = 10 kHz
- Spectrum scope specifications to view the output spectrum: FFT length = 128 points, buffer size = 128 bits

The results for different clock counts per quadrant  $(2^{M})$  are tabulated in Table 1, Table 2 and Table 3.

**Table 1**: SFDR and quantization error evaluation for FCW = 3, total clock counts per quadrant =  $2^{6}$ 

| No. of samples per quadrant | SFDR (in dBc) | RMS quantization error |
|-----------------------------|---------------|------------------------|
| 8                           | 28            | 0.07612                |
| 16                          | 37            | 0.03704                |
| 32                          | 30            | 0.03787                |

**Table 2**: SFDR and quantization error evaluation for FCW = 1, total clock counts per quadrant =  $2^7$ 

| No. of samples per quadrant | SFDR (in dBc) | RMS quantization error |
|-----------------------------|---------------|------------------------|
| 8                           | 27            | 0.07636                |
| 16                          | 34            | 0.03574                |
| 32                          | 42            | 0.02024                |
| 64                          | 28            | 0.0399                 |

**Table 3**: SFDR and quantization error evaluation for FCW = 0, total clock counts per quadrant =  $2^8$ 

| No. of samples per quadrant | SFDR (in dBc) | RMS quantization error |
|-----------------------------|---------------|------------------------|
| 8                           | 24            | 0.07883                |
| 16                          | 32            | 0.03853                |
| 32                          | 40            | 0.01927                |
| 64                          | 42            | 0.01279                |

From the above tables, we can conclude that for a particular LUT word-length there is an optimum number of samples that gives the maximum SFDR and the least quantization error. If the number of samples is less than this optimum, the quantization error is larger because there are fewer quantization levels. If the number of samples exceeds the optimum with M held constant, the quantization error also increases, because the LUT values can no longer represent the sampling intervals accurately.

Considering that the number of samples per quadrant is  $2^{N}$  (i.e., the total number of samples is  $4*2^{N}$ ) and the total clock counts per quadrant (i.e., the sum of the values in the LUT) is  $2^{M}$ , it is observed that the maximum SFDR is achieved when  $M = \log_{2}(4 * 2^{N})$ .
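The condition simplifies to a fixed offset between the two exponents:

$$M = \log_{2}(4 \cdot 2^{N}) = N + 2$$

This can be checked against the tabulated SFDR maxima: for  $M = 6$  the best SFDR occurs at 16 samples per quadrant ( $N = 4$ ), for  $M = 7$  at 32 samples ( $N = 5$ ), and for  $M = 8$  at 64 samples ( $N = 6$ ).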

Other cost functions for a sinusoidal signal, namely the RMS value, the average value over a half-cycle and the variance, are also evaluated. The standard deviation (SD) is a measure of how far the signal fluctuates from the mean. The variance (square of the SD) represents the power of this fluctuation. The term RMS value is frequently used in electronics. By definition, the standard deviation only measures the AC portion of a signal, whereas the RMS value measures both the AC and DC components. If a signal has no DC component, its RMS value is identical to its standard deviation. Since the mean value of the sinusoidal signal over a period is zero, the RMS value is equal to the standard deviation. The statistics related to the sinusoidal signal are tabulated in Table 4.
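The ideal values of these statistics for a unit-amplitude sine wave can be reproduced numerically. The sketch below is illustrative only: the choice of 1000 samples per period is an assumption, and it evaluates an ideal sine, not the NCO output.

```python
import math
import statistics

n = 1000
full = [math.sin(2 * math.pi * i / n) for i in range(n)]   # one full period
half = full[: n // 2]                                       # positive half-cycle

rms = math.sqrt(sum(x * x for x in full) / n)
average_half_cycle = sum(half) / len(half)
sd = statistics.pstdev(full)          # population standard deviation
variance = statistics.pvariance(full)

print(round(rms, 4), round(average_half_cycle, 4), round(sd, 4), round(variance, 4))
```

Because the mean over a full period is zero, the computed SD coincides with the RMS value, as argued above.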

#### **COMPARISON WITH OTHER METHODS**

A comparative study of various methods (relation between system clock frequency  $f_s$ , frequency control word FCW and the output clock frequency  $f_o$ ) is given below, from which one can deduce what is the maximum possible output frequency given the master clock frequency.

LUT based NCO – output frequency is (where, W is FCW bit-length),

$$f_o = \frac{f_s * FCW}{2^W} \tag{6}$$

Proposed method –  $((FCW + 1) * 4 * 2^M)$  clock cycles to produce one cycle of the sinusoidal signal, consistent with Eqs. (4) and (5), with

$$f_o = \frac{f_s}{(FCW + 1) * 4 * 2^M} \tag{7}$$

Taylor series method [3] – FCW clock cycles to produce one cycle of the sinusoidal signal, with

$$f_o = \frac{f_s}{FCW} \tag{8}$$

The Simulink model is simulated with a 7-bit counter2, an LUT word-length of 5 bits and 32 samples per quadrant, so as to obtain an SFDR close to the values reported for previous methods. The objective here is to show that the hardware requirement is lower than that of other methods. Table 5 compares the hardware requirement of various methods for achieving an SFDR of around 45 dBc.

Table 4: Statistical data of the output sine wave of unit amplitude

|                 | RMS value | Average value | SD     | Variance |
|-----------------|-----------|---------------|--------|----------|
| Ideal           | 0.7071    | 0.6366        | 0.7071 | 0.4999   |
| Proposed method | 0.6966    | 0.6270        | 0.6966 | 0.4770   |

Table 5: FPGA implementation synthesis report [4]

| Approach                          | Slice registers | Slice LUTs | SFDR (dBc) |
|-----------------------------------|-----------------|------------|------------|
| Standard NCO                      | 31              | 60         | 45.11      |
| NCO with phase dithering          | 40              | 61         | 48.98      |
| NCO with Taylor series correction | 30              | 116        | 46.99      |
| Proposed method                   | 24              | 44         | 42         |



Fig 9: Modified Block Diagram of the NCO

# FUTURE SCOPE

In the proposed method, the frequency resolution is not constant; a method to keep it constant is proposed below for further implementation. The modified block diagram is shown in Fig. 9. The resolution and the output frequency in this case are the same as in the conventional LUT-based method, i.e.,

$$Resolution = \frac{f_s}{2^W} \tag{9}$$

$$f_o = \frac{f_s * FCW}{2^W} \tag{10}$$

Where,

- $W$ – FCW dimension (in bits)
- $f_s$ – Master clock frequency
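Equations (9) and (10) can be evaluated numerically as follows. The word length W = 16 and the 10 kHz target are illustrative choices, not values from this work, and rounding the FCW to the nearest integer is an assumption.

```python
def resolution(f_s, w_bits):
    """Eq. (9): frequency step of a W-bit frequency control word."""
    return f_s / 2 ** w_bits

def output_frequency(fcw, f_s, w_bits):
    """Eq. (10): output frequency for a given frequency control word."""
    return f_s * fcw / 2 ** w_bits

f_s, w_bits = 10e6, 16            # W = 16 is an illustrative choice
fcw = round(10e3 / resolution(f_s, w_bits))
print(resolution(f_s, w_bits), fcw, output_frequency(fcw, f_s, w_bits))
```

Here the frequency step is constant across the whole tuning range, in contrast to the period-based steps of the counter-based scheme.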

Also, considering the limitation on the maximum output frequency in the proposed method when the number of samples is kept constant, future work can be directed towards a combination of the conventional LUT-based method and the proposed method.

# REFERENCES

- [1] S. Kadam, D. Susidaran, A. Amjud, L. Johnson and M. Soderstrand, "Comparison of Various Numerically Controlled Oscillators," in *Circuits and Systems, MWSCAS - 2002*, 2002, pp. 200-202.
- [2] Analog Devices, "A Technical Tutorial on Digital Signal Synthesis," 1999. [Online]. Available: http://www.analog.com/static/imported-files/tutorials/450968421DDS\_Tutorial\_rev12-2-99.pdf.
- [3] M. Kampik and G. Popek, "Low-Spur Numerically Controlled Oscillator Using Taylor Series Approximation," in *XI International PhD Workshop OWD 2009*, 17-20 October 2009.
- [4] G. Wang, "An FPGA-based Spur-reduced Numerically Controlled Oscillator," in *International Conference on System Science and Engineering*, Dalian, China, June 30-July 2, 2012, pp. 187-192.