Dual-Tone Multiple Frequency Detection and Estimation

Introduction

Overview

When making a touch-tone phone call in the United States, your phone will generate a dual-tone corresponding to a given key-pad press. The corresponding key-pad to dual-tone pairs consist of a four by four grid representing 16 possible symbols. The corresponding row of low-frequencies, lf, is 687, 770, 852, and 941 Hz. The corresponding column of high-frequencies, hf, is 1209, 1336, 1477, and 1633 Hz. The key 1, on a touch-tone phone will consists of the lf-hf pair [lf(1),hf(1)] or [687 Hz, 1209 Hz]. A more complete description on the DTMF standard is given by Stevenson. You can also refer to Wikipedia for additional information. In this paper, we will focus on detecting dual-tones with a minimum tone duration, td, of 40 milliseconds, and a minimum tone-pause duration, tp, of 40 milliseconds. There is no additional synchronization constraint on maintaining a fixed spacing or duration of the tones. We will evaluate the performance on detecting the tones by measuring the number of correct tones detected under various signal-to-noise ratios (SNRs). As an additional test we will add speech in to the signal in order to simulate digits and cause a false detection.

In the first part of this paper, we will show how to generate dual-tones using a marginally stable IIR filter. A brief theoretical derivation will be given for generating the dtmf-tones. A more detailed discussion will be given on the detection algorithm with supporting theoretical references for the reader. Next we will present results showing that our DTMF detector exhibits excellent voice rejection performance while correctly detecting the dual-tones in signal to noise ratio levels as low as -3 dB. Our results will show we can detect tones with SNR's as low as -10 dB; however, not with 100% reliability. A discussion of the results will proceed, with concluding remarks. A detailed program listing is included with all available source code for further reference.

Derivation of DTMF Generator

see derivation.pdf

Derivation of DTMF Detector

see derivation.pdf

Presentation of Results

Baseline Results

In order to evaluate our DTMF detector, we used our keypad generator, to test various dual-tone combinations. What follows is a brief summary of testing in order to focus our discussion on the unknown-dual-tones (UDTs) we are required to identify for this project. Near 100% detection was obtained for asynchronous tones patterns with SNRs nearing 0 dB. All sample tones were generated in which each tones frequency would vary as a uniform random variable +/- 1.5% of center frequency. The UDTs we were given, appear to have a tone duration of 80 milliseconds and a pause duration of 120 milliseconds. The following results are for a randomly generated tone with a td of 80 milliseconds, and a tp of 120 milliseconds. The following table lists the test-tone generated, the SNR, the additional voice corruption, the tones detected, and percent correct. A tone missed will be filled in with a X in the tones detected column. A link to corresponding filtered PSD plots, and final detection plots are included in the table as well.
Test TonesSNR (dB)Voice Corruption Tones DetectedPercent Correct PSD PlotFinal Detection Plot
115599##00**863 Inf male_voice.au 115599##00**863100 Figure 1 Figure 1a
115599##00**863 -0.1 none 115599##00**863100 Figure 3 Figure 3a
115599##00**863 -0.1 male_voice.au 115599##00**863100 Figure 2 Figure 2a
115599##00**863 -3.1 none 115599##00**863100 Figure 4 Figure 4a
115599##00**863 -3.1 male_voice.au 115599##00**863100 Figure 5 Figure 5a
115599##00**863 -3.1 female_voice.au 115599##00**863100 Figure 6 Figure 6a
115599##00**863 -4.9 male_voice.au 1X55X9##00**86386 Figure 7 Figure 7a
115599##00**863 -10.0 male_voice.au X1X59XX#X0**X6X53 Figure 8 Figure 8a

Unknown Dual-Tone (UDT) Results

The following table summarizes our UDT detection results.
Unknown Tones FileSNR (dB)Voice Corruption Tones Detected PSD PlotFinal Detection Plot
1.au Inf none 2264810 Figure 9-1 Figure 9-1a
2.au Inf yes 12196318308 Figure 9-2 Figure 9-2a
3.au 20 none 12196315480 Figure 9-3 Figure 9-3a
4.au 20 yes 2342591 Figure 9-4 Figure 9-4a
5.au 10 none 8621*# Figure 9-5 Figure 9-5a
6.au 10 yes 05448757 Figure 9-6 Figure 9-6a
7.au 0 none 05551212 Figure 9-7 Figure 9-7a
8.au 0 yes 8318314 Figure 9-8 Figure 9-8a
9.au -10 none no-tones Figure 9-9 Figure 9-9a
10.au -10 yes 8318314 Figure 9-10 Figure 9-10a

Discussion of Results

Baseline Results Discussion

Overall we see that the DTMF detector works exceptionally well. This performance can be attributed to the following sections of the DTMF detector. First, filtering the data, in order to allow only the critical tones through allows a classical PSD estimation method to be implemented. Using the Welch method for PSD estimation allows us to further reduce the variance in our Spectral estimation and smooth the remaining noise. Furthermore, using a hamming window in the Welch method allows us to set an efficient threshold level in order to separate tones on the time axis. Lastly using a maximum-likelihood technique to separate equally likely tones from more likely noise is accomplished by adjusting a threshold level for reliable tone detection and noise rejection. Finally, taking modest 4 millisecond overlapping steps (32 samples) of a 32 millisecond wide window (256 samples) gave us an efficient step size and frequency resolution to separate tones 40 milliseconds wide with 40 millisecond spacing.

We reliably detected all tones for SNR's as low as -3.1 dB. We saw a graceful degradation in performance as the SNR was decreased to -10 dB. In which the detector was still able to correctly detect 53% of the tones, while making no incorrect detections. Performance was essentially the same, regardless of voice corruption for our base-line tests down to SNR levels as low as -10 dB. We did notice that at low SNR levels around -10 dB, a strong voice component during a valid tone could cause a false-tone translation. We did notice that using the differential technique, in which we subtracted the noise power of the second tone harmonics, we could eliminate these false detections; however, at a loss of detection range, for asynchronous randomly generated tones. For the UDT results, the tones detected were the same with or without differential detection. Keeping a modest threshold detection level of 3 dB, allowed us to eliminate the need for additional filtering of the harmonic bands, while allowing reliable performance.

Unknown Dual-Tone Results Discussion

Due to the strong baseline performance results, we have a high confidence in files 1.au through 8.au in being properly decoded. These tones also matched our initial visual estimate, based on the PSD plots. Likewise, the 10.au result matches 8.au which was also a sample test file given without noise corruption. The result for 9.au makes sense, since tones can't be heard on the file nor seen on the PSD plot, Figure 9-9. Comparing Figure 9-10 to 9-9, clearly, no dial-tone is present. It also looks like the random "dark-spots" in the filter bands match the pattern generated by the background laughing and talking in Figure 9-4. It seems quite likely that our detector successfully detected nothing in 9.au. To further our confidence we successfully detected nothing, when opening the voice corruption files male_voice.au and female_voice.au. The corresponding PSD's, fig2-male_voice.png, fig2-female_voice.png, illustrate how they get energy into the filter bands; however, without exceeding the threshold time width in both a corresponding low and high frequency band for incorrect detection, fig4-male_voice.png, fig4-female_voice.png.

General Discussion

In the initial, paper we had suggested ways to improve performance by optimizing the FFT element in a separate C-program. Although, we initially had calculated the individual frequency bands power levels using a filter bank, we were missing the big picture. Furthermore when we wanted to start looking at the harmonics for voice detection, it became clear that an efficient FFT was worth completing. Profound performance improvements were made, such that a 4096 point FFT can be completed in 1 millisecond. The first improvement was taking the highly Matlab inefficient looping code and placing it in a Matlab executable. The second performance improvement was eliminating any repeated file input-output calls to the dsp_fft routine. Since most these computers run on the network afs space file i/o operations are time-consuming. Lastly, pre-calculating the appropriate shuffle index for the FFT eliminates any other time lags in execution. The remaining discussion is related to problems developing the mex file (feel free to skip), which we believe is a problem IT can address (but we haven't had a chance to address).

The problem, is that the gnu-runtime-library for the Linux-cluster PC's, ironically, will crash the executable if an assert is called from the mexErrMsg command. There also seems some memory management issues as well. No core dumps occurred using the DTMF generator on the Linux machines. However, when the spec_plot command was called, which uses our mex-runtime-code, matlab would occasionaly assert a stack violation and crash. No crashes have occured on the Sun workstations when using this mex-file. See The Mathworks for more details. Regardless, we got the mex code up-and running reliably on the Sun work-stations, and it runs great on the Linux PC's as long as you don't call the mex function incorrectly (which you shouldn't have to since it is filtered by the dsp_fft.m function). Also if you want to load a new version of an already existing mex executable into Matlab, shut Matlab down first and save yourself the inevitable core-dump/crash of Matlab.

Although, the program is essentially running real-time. It can run much quicker, as can be seen by the monumental improvement with the FFT code. The big time lag is in the inefficient filter implementation. It is a big loop which could be better allocated to a mex file. Also, double-precision is completely unnecessary and could be eliminated. I'm sure other refinements could be made and implemented as well. Such as simplifying it to a multi-stage filter-bank design problem. Although, we cautiously chose to use a linear-phase bandpass filter in order to avoid a non-uniform group delay, and causing tone mis-alignment on the time axis. There are modest filtering schemes which use low-order IIR filtering schemes such as those discussed by Popovic for example.

Conclusion

  • This paper has successfully demonstrated a novel way to reliably detect-tones without relying on information from voice harmonics.
  • Using sufficient filtering, an efficient power-spectral-density, PSD, estimation method combined with a maximum-likelihood threshold detector can successfully detect tones with SNRs as poor as -3 dB for DTMF detection.
  • For this project, we have most likely detected all appropriate tones and rejected all noise and voice disturbances generated.
  • One efficient PSD estimator for DTMF detection, is the classical Welch estimator with the use of a Hanning window to reduce the effect of data on the edges and determine tone spacing.
  • When using networked computers for running Matlab, try to minimize file i/o.

    Program Listing

    All necessary files can be extracted from the following file. dsp_proj2_src.tar.gz

    DTMF Generator Code

    DTMF Detector Code

    Analysis Code


    © 2004 by Nicholas Kottenstette