When listening to music, a listener may want to hear vocals that are being drowned out by a strong bass section. This can be accomplished by attenuating the low-frequency bass section while amplifying the higher-frequency vocal section, a process known as audio equalization. This paper shows how to implement a digital audio equalizer. The equalizer is implemented as a linear-phase finite impulse response (FIR) filter, designed with a Hamming window, that shapes the frequency response of a piece of music.
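As a rough illustration of the windowed FIR approach, the following Python sketch builds one linear-phase filter by summing ideal band-pass responses, scaling each by its band gain, and applying a Hamming window. The band edges (geometric means of neighbouring center frequencies), the 44.1 kHz sample rate, the 512-tap length, and all function names are assumptions made for this example; they are not taken from the paper's own code.

```python
import math

def hamming(n_taps):
    """Hamming window of length n_taps."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_taps - 1))
            for n in range(n_taps)]

def ideal_bandpass(f_lo, f_hi, fs, n_taps):
    """Truncated ideal band-pass impulse response, centred for linear phase."""
    mid = (n_taps - 1) / 2.0
    h = []
    for n in range(n_taps):
        t = n - mid
        if t == 0:
            h.append(2.0 * (f_hi - f_lo) / fs)
        else:
            h.append((math.sin(2 * math.pi * f_hi * t / fs)
                      - math.sin(2 * math.pi * f_lo * t / fs)) / (math.pi * t))
    return h

def design_equalizer(centers_hz, gains_db, fs, n_taps):
    """Sum gain-weighted band-pass responses, then apply a Hamming window."""
    # Assumed band edges: 0 Hz, geometric means between centers, and Fs/2.
    edges = [0.0]
    for a, b in zip(centers_hz, centers_hz[1:]):
        edges.append(math.sqrt(a * b))
    edges.append(fs / 2.0)
    window = hamming(n_taps)
    h = [0.0] * n_taps
    for k, gain_db in enumerate(gains_db):
        gain = 10.0 ** (gain_db / 20.0)  # dB to linear amplitude
        band = ideal_bandpass(edges[k], edges[k + 1], fs, n_taps)
        for n in range(n_taps):
            h[n] += gain * band[n] * window[n]
    return h

def magnitude_db(h, f_hz, fs):
    """Magnitude of the filter's frequency response at f_hz, in dB."""
    w = 2 * math.pi * f_hz / fs
    re = sum(c * math.cos(w * n) for n, c in enumerate(h))
    im = -sum(c * math.sin(w * n) for n, c in enumerate(h))
    return 20 * math.log10(math.hypot(re, im))

if __name__ == "__main__":
    fs = 44100  # assumed sample rate
    centers = [150, 300, 600, 1200, 2400, 4800, 9200, 18400]
    gains = [10, 10, 10, 10, 10, -10, -10, -10]
    h = design_equalizer(centers, gains, fs, 512)
    for f in (1000.0, 15000.0):
        print(f, round(magnitude_db(h, f, fs), 2))
```

Filtering a recording then amounts to convolving the audio samples with `h`. A longer filter narrows the transition bands at the cost of more computation, which matters most for the closely spaced low-frequency bands.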
Reverberation, or the reflection of sound, occurs when sound waves bounce back from surfaces that do not fully absorb the audio energy. The time a sound takes to decay in a room to 60 dB below its original level is known as the reverberation time (RT). The spacing between the initial sound and its reflections sets the reverberation density (RD), expressed here in echoes per second. These two parameters are enough to simulate the reverberation of various rooms. We will create an artificial reverberation (AR) unit that combines the direct input signal with the output of an FIR filter, which generates "early echoes", and a delayed infinite impulse response (IIR) filter, as described by Stevenson.
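To make the roles of RT and RD concrete, here is a minimal Python sketch of one possible reverberation unit: the direct signal, a few FIR "early echo" taps, and a delayed feedback comb (IIR) section whose loop delay is 1/RD seconds and whose loop gain is chosen so that the echoes fall by 60 dB after RT seconds. The early-echo tap times and gains, the 40 ms late delay, and the function names are assumptions for illustration; the structure described by Stevenson may differ in its details.

```python
import math

def feedback_gain(delay_samples, fs, rt):
    """Loop gain g such that g**(rt*fs/delay_samples) equals 10**-3 (-60 dB)."""
    return 10.0 ** (-3.0 * delay_samples / (rt * fs))

def reverberate(x, fs, rd, rt,
                early_taps=((0.011, 0.6), (0.023, 0.4)),  # assumed (seconds, gain)
                late_delay_s=0.04):                       # assumed onset of late echoes
    """Direct sound + FIR early echoes + delayed feedback comb (IIR)."""
    delay = max(1, round(fs / rd))           # echo spacing in samples
    g = feedback_gain(delay, fs, rt)
    late_start = round(late_delay_s * fs)
    n_out = len(x) + late_start + round(rt * fs)  # leave room for the tail
    y = [0.0] * n_out
    late = [0.0] * n_out
    for n in range(n_out):
        direct = x[n] if n < len(x) else 0.0
        early = 0.0
        for t, a in early_taps:
            k = n - round(t * fs)
            if 0 <= k < len(x):
                early += a * x[k]
        k = n - late_start
        fed = x[k] if 0 <= k < len(x) else 0.0
        fb = late[n - delay] if n >= delay else 0.0
        late[n] = fed + g * fb               # IIR recursion: one echo per loop
        y[n] = direct + early + late[n]
    return y

if __name__ == "__main__":
    fs = 8000
    impulse_response = reverberate([1.0], fs, rd=20, rt=0.5)
    print(len(impulse_response), feedback_gain(400, fs, 0.5))
```

With this choice of loop gain, a large RT or a large RD pushes the feedback coefficient toward 1.0, so the echoes decay more slowly per loop.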
150Hz | 300Hz | 600Hz | 1.2kHz | 2.4kHz | 4.8kHz | 9.2kHz | 18.4kHz | equalized audio | analysis plot |
---|---|---|---|---|---|---|---|---|---|
0dB | 0dB | 0dB | 0dB | 0dB | 0dB | 0dB | 0dB | audio1.au | Figure 1 |
+10dB | -10dB | +10dB | -10dB | +10dB | -10dB | +10dB | -10dB | audio2.au | Figure 2 |
+10dB | +10dB | +10dB | +10dB | +10dB | -10dB | -10dB | -10dB | audio3.au | Figure 3 |
+10dB | +10dB | +10dB | -10dB | -10dB | -10dB | -10dB | -10dB | audio4.au | Figure 4 |
-10dB | -10dB | -10dB | -10dB | -10dB | -10dB | -10dB | +10dB | audio5.au | Figure 5 |
-10dB | -10dB | -10dB | +10dB | -10dB | -10dB | -10dB | -10dB | audio6.au | Figure 6 |
RD (echoes/sec) | RT (sec) | reverberation audio | analysis plot |
---|---|---|---|
10 | 0.42 | male1.au | Figure 7 |
30 | 0.42 | male2.au | Figure 8 |
100 | 0.42 | male3.au | Figure 9 |
330 | 0.42 | male4.au | Figure 10 |
19 | 0.45 | male5.au | Figure 11 |
19 | 1.0 | male6.au | Figure 12 |
19 | 1.92 | male7.au | Figure 13 |
19 | 5.0 | male8.au | Figure 14 |
19 | 1.92 | weakest_link1.au | Figure 15 |
19 | 5.0 | guitar1.au | none |
100 | 5.0 | guitar2.au | Figure 16 |
325 | 5.0 | guitar3.au | none |
1000 | 5.0 | guitar4.au | none |
Overall, the equalizer works very well. Even at the lowest frequencies, it achieved 5 dB of separation between the first, second, and third bands (see Figure 2). A rectangular window would have done better at the lower frequencies; however, the resulting ripples were unacceptable. We could also increase the length of the filter to get narrower transition bands and more closely approximate the desired low-frequency filter shape. The Hamming window provided a good compromise: a slight increase in transition bandwidth in exchange for a substantial reduction in ripple. The baseline plot, Figure 1, like the others, shows a zero at the Nyquist frequency Fs/2. This is due to using an even-length, symmetric linear-phase FIR filter realization instead of an odd-length one (which would have had no forced zeros at either 0 or Fs/2 Hz). The spectrograms in Figure 2 clearly show the effectiveness of the filter: the bands set at -10 dB appear as distinct stripes as wide as the desired frequency bands. Figures 3 and 4 show the effects of using a low-pass filter. The first low-pass filter, shown in Figure 3, lets a listener hear the xylophones, horns, and tubas; after filtering out frequencies above 800 Hz (see Figure 4), however, the sound of the tubas is most distinct. Applying a high-pass filter at the highest band, as shown in Figure 5, actually produced a fairly equalized response of the music: note how all the regions of the spectrogram up to 15 kHz are red. All frequency content above 15 kHz is absent from this original piece of music; it must have been lost in the MP3 encoding of the original piece. The figure also shows how strong the low-frequency components in this recording are. Finally, Figure 6 illustrates a band-pass filter at the 1.2 kHz center frequency, in which the horn section is more distinct.
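The zero at Fs/2 follows directly from the symmetry of an even-length linear-phase FIR filter. At the Nyquist frequency the response is the alternating sum of the coefficients, and in an even-length symmetric filter each coefficient is paired with an equal one that carries the opposite sign, so the sum cancels. The short Python check below uses made-up coefficient values purely for illustration.

```python
def response_at_nyquist(h):
    """H(e^{j*pi}) = sum of h[n] * (-1)**n, the response at Fs/2."""
    return sum(c * (-1) ** n for n, c in enumerate(h))

# Hypothetical symmetric filters, chosen only to show the cancellation.
even_length = [1.0, 3.0, 3.0, 1.0]        # h[n] == h[N-1-n], N even
odd_length = [1.0, 3.0, 5.0, 3.0, 1.0]    # h[n] == h[N-1-n], N odd

print(response_at_nyquist(even_length))   # pairs cancel
print(response_at_nyquist(odd_length))    # middle tap is left unpaired
```

An odd-length symmetric filter keeps an unpaired middle tap, so its response at Fs/2 is not forced to zero.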
The AR filter worked as specified. One slight modification to the original signal was needed before processing, however. Since the IIR filter has a coefficient quite close to 1.0 for large reverberation times, any DC offset in the signal causes the output to drift substantially. This was corrected by subtracting out the DC offset before processing. Other low-frequency components in some recordings still caused noticeable drift (see Figure 14 at 1.3 seconds). The most effective pre-processing for the audio would be to apply a high-pass filter with a passband from 5 Hz to Fs/2; this could be done by modifying the equalizer code to implement the desired filter. With short RTs of 0.42 to 0.45 seconds, Figures 7-11 give us a good sense of how the RD scales in the first 0.160 seconds of the AR response. We can hear and see how increasing the RD makes the harmonics of a person's voice sound more "artificial". With an RD of 30, the male voice sounds like a voice reflecting off a wall in a gymnasium. An RD of 100 sounds like a voice reflecting off the walls of a tunnel. With an RD greater than 300, it is not clear what room could produce that effect! Increasing the RT gave more energy to the response. We concluded that an RD of around 19 gave a suitable voice enhancement, and simulated various "rooms" with that RD. The living room (see Figure 12) had an RT of 1.0 seconds. The best concert hall (see Figure 13) had an RT of 1.92 seconds. The cathedral (see Figure 14) had an RT of 5.0 seconds. In my biased opinion, the male voice sounded best in the cathedral. We also got an improved response when processing the weakest link audio clip with an RD of 19 and an RT of 1.92 seconds (see Figure 15). Finally, varying the RD for the guitar sound clip resulted in some surprising sound effects.
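The drift problem can be quantified with a small Python sketch. It assumes the IIR section is a simple feedback comb, y[n] = x[n] + g·y[n-D], whose gain at 0 Hz is 1/(1-g), and that g is chosen for a 60 dB decay after RT seconds with echoes spaced 1/RD seconds apart. The function names and the test signal are hypothetical and are not taken from the paper's code.

```python
import math

def remove_dc(x):
    """Subtract the mean so the signal has no DC offset."""
    mean = sum(x) / len(x)
    return [s - mean for s in x]

def comb_dc_gain(g):
    """Gain at 0 Hz of the assumed feedback comb y[n] = x[n] + g*y[n-D]."""
    return 1.0 / (1.0 - g)

if __name__ == "__main__":
    # Assumed loop gain for the "cathedral" setting: RD = 19, RT = 5.0 s.
    g = 10.0 ** (-3.0 / (19 * 5.0))
    print("loop gain:", g, "DC gain:", comb_dc_gain(g))

    # Hypothetical test signal: a sine wave riding on a 0.1 offset.
    x = [0.1 + 0.5 * math.sin(2 * math.pi * n / 50) for n in range(1000)]
    cleaned = remove_dc(x)
    print("mean before:", sum(x) / len(x), "mean after:", sum(cleaned) / len(cleaned))
```

Under these assumptions the cathedral setting amplifies any offset by roughly a factor of 14. Mean subtraction removes only the constant offset, which is consistent with the observation that other low-frequency components still caused drift.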