Ever heard of “multiband dynamics processing” (MBDP)? It’s a technique that modifies the volume of an audio signal across frequency bands, often with the effect of improving echo cancellation. But it’s not without its drawbacks. Namely, it doesn’t always cleanly separate signals into their component frequencies and it tends to use fixed bands, which in practice impact loudness and bass response and cause unpleasant distortions.
Fortunately, scientists at Amazon’s Hardware Technology and Architecture group have made headway in addressing MBDP schemes’ technical limitations. In a newly published paper (“Reconfigurable Multitask Audio Dynamics Processing Scheme”) scheduled to be presented at the International Conference on Acoustics, Speech and Signal Processing later this year, they detail a novel, compact model design that not only enhances loudness and bass, but significantly improves performance on speech recognition tasks.
The system began shipping in Alexa-enabled devices in 2017, they say.
As senior research scientist Jun Yang explains, MBDP has two primary functions: compression, or keeping the ratio of an audio signal’s maximum and minimum volumes within a certain range, and peak limiting, or cutting off volume spikes that can cause distortion (a phenomenon known as “brownout”). Amazon’s system features a configurable design comprising several filters, which can be applied simultaneously or individually to an incoming signal.
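To make those two functions concrete, here is a minimal Python sketch of a static compression curve and a simple block-based peak limiter. It is an illustration only, not Amazon's implementation: the function names, the -20 dB threshold, the 4:1 ratio and the ceiling of 1.0 are all assumptions chosen for the example, and production limiters typically smooth their gain changes over time rather than scaling a whole block at once.

```python
def compress_level_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static compression curve (threshold and ratio are assumed values).

    Levels at or below the threshold pass unchanged. Above it, every
    `ratio` dB of extra input produces only 1 dB of extra output, which
    narrows the gap between the loudest and quietest passages.
    """
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def limit_peaks(samples, ceiling=1.0):
    """Simplified peak limiter for one block of samples.

    If the largest absolute sample exceeds the ceiling, the whole block
    is scaled down just enough for that peak to meet the ceiling.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak <= ceiling:
        return list(samples)
    gain = ceiling / peak
    return [s * gain for s in samples]
```

With these assumed defaults, an input at 0 dB comes out of the compressor at -15 dB, and a block whose loudest sample is 2.0 is scaled by 0.5 so that it just touches the ceiling.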
The incoming signal is first split into two parts. One passes to two sequential high-pass filters that filter out frequencies below a cutoff, and the other moves through a pair of sequential low-pass filters that filter out frequencies above the same cutoff. The signal from the high-pass filters might be split and passed to separate filters an arbitrary number of times before moving through an “all-pass” filter that synchronizes all the bands. Then, the signal in each band passes to a compressor and then a limiter, at which point the frequency-specific signals are recombined and passed to a full-band limiter.
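The first stage of that chain, the two-way split, can be sketched in a few lines of Python. The article does not specify the filter design, so this example assumes second-order Butterworth sections built from the widely used “Audio EQ Cookbook” biquad formulas; cascading two of them per band gives what is commonly called a Linkwitz-Riley crossover. The function names, the 1 kHz cutoff and the 16 kHz sample rate in the usage note are hypothetical, and the further band splits, all-pass alignment, per-band compressors and limiters are left out.

```python
import math


def butterworth_biquad(kind, cutoff_hz, sample_rate):
    """Second-order Butterworth low- or high-pass coefficients (assumed design)."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    q = 1.0 / math.sqrt(2.0)  # Butterworth quality factor
    alpha = math.sin(w0) / (2.0 * q)
    if kind == "low":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    else:
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    a0 = 1 + alpha
    # Normalise so the leading feedback coefficient is 1.
    return [x / a0 for x in b], [-2 * cos_w0 / a0, (1 - alpha) / a0]


def run_biquad(samples, b, a):
    """Filter a list of samples with one biquad section (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def split_two_bands(samples, cutoff_hz, sample_rate):
    """Return (low, high): each band passes through two identical sections."""
    bands = []
    for kind in ("low", "high"):
        b, a = butterworth_biquad(kind, cutoff_hz, sample_rate)
        bands.append(run_biquad(run_biquad(samples, b, a), b, a))
    return bands[0], bands[1]


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

Calling `split_two_bands(tone, 1000.0, 16000)` on a 100 Hz tone should leave nearly all of the energy in the low band. One reason to cascade two identical sections per band is that the low and high outputs then sum back to a signal with a flat magnitude response, so recombining the bands does not color the sound at the crossover frequency.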
These and other techniques reduce total harmonic distortion and clipping (which occurs when the amplifier outputs a voltage beyond a safe range, distorting the electrical signal) while preserving the overall loudness and bass response of the audio signal. Additionally, they ensure that the loudspeaker producing the audio stays in its “linear dynamic range”: in other words, that the sound pressure level doesn’t exceed the threshold at which it will begin to cause distortion.
In experiments, Jun Yang and colleagues found that people reported audio filtered with the reconfigurable MBDP scheme to be much “better” and “louder” than samples processed using traditional MBDP schemes. Moreover, analyses showed that the system increased bass response by about five decibels, and when tested on an Echo speaker, it significantly reduced the number of false rejects (instances in which the Echo failed to recognize the wake word) at higher output volumes.