Microphone Array
To increase a microphone’s directionality and signal-to-noise ratio (SNR), we can add one or more microphones to form a microphone array. Arrays are used mainly in higher-end products because design complexity and cost are relatively high. Increased directionality lets the system focus on the one direction where the sound source is, which eliminates or attenuates noise from other directions. Increased SNR yields a cleaner signal, and the cleaner the signal, the easier it is for downstream signal processing algorithms to use it. Those algorithms are typically used in voice recognition and other Human Machine Interface (HMI) applications.
Certainly, we can increase a single microphone’s directionality and SNR without the help of arrays, but it is not easy. A single component’s directionality is achieved by opening holes on the back and side of the microphone component. This configuration lets sounds arriving from many directions interfere with and cancel each other out, while the sound from the desired direction is less disturbed and thus kept intact. The downside is that such microphones are usually larger and hard to fit flexibly into a mechanical design: venting holes of adequate size and number have to be placed precisely, otherwise the design will likely fail. A single microphone’s SNR can also be improved, but only up to a practical maximum of about 65 to 70 dB. Above this threshold, engineering costs rise rapidly because it is difficult to control thermal noise, acoustic noise and electrical noise.
Thus, directionality and SNR are best attained by using arrays. The focus then shifts to the placement of the microphones, including the spacing between them. Array software design and the integration of hardware and software are also important. A successful array implementation can use beamforming to enhance directionality and a noise reduction mechanism to raise SNR.
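As a rough illustration of how an array can raise SNR, the Python sketch below averages several simulated microphone channels. It assumes the wanted signal is already time-aligned across the microphones and that each microphone adds independent Gaussian noise; under those assumptions the gain approaches 10*log10(N) dB for N microphones. The function names, the test tone and the noise level are illustrative choices, not taken from any particular product, and real arrays usually fall short of this ideal figure.

```python
import math
import random


def snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise."""
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum((n - s) ** 2 for s, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


def averaging_gain_db(num_mics, num_samples=20000, noise_std=0.5, seed=0):
    """SNR gain of the averaged array output over a single microphone."""
    rng = random.Random(seed)
    # Assumed test signal: a plain sine tone, identical at every microphone.
    clean = [math.sin(2 * math.pi * 0.01 * k) for k in range(num_samples)]
    # Each microphone adds its own independent noise (an idealisation).
    mics = [[s + rng.gauss(0.0, noise_std) for s in clean]
            for _ in range(num_mics)]
    averaged = [sum(samples) / num_mics for samples in zip(*mics)]
    return snr_db(clean, averaged) - snr_db(clean, mics[0])


if __name__ == "__main__":
    for n in (2, 4):
        print(n, "mics: simulated gain", round(averaging_gain_db(n), 2),
              "dB, ideal", round(10 * math.log10(n), 2), "dB")
```

Averaging is the simplest possible combiner; it is used here only because it makes the noise-reduction effect easy to see.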
In principle, when sound arrives from a certain direction, every microphone in the array captures some of it. The key is the difference between those signals. Each microphone is at a different distance from the source, so each receives a signal of different magnitude and phase. Based on this information, the algorithm can calculate where the sound comes from. As soon as the source is located, the algorithm can reduce sound reception from other directions. This mechanism requires all microphones to have closely matched parameters, including frequency response, phase response, group delay and so on. If the parameters are not uniform, for example if sensitivity varies a lot, then the measured signal might not represent the true signal. The algorithm might then misinterpret the signal magnitude and miscalculate the distance between the microphone and the source. In addition, each microphone’s delay has to be minimal and identical across the array to facilitate signal processing.
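The idea of locating a source from the difference between microphone signals can be sketched for the two-microphone case. The Python example below is a minimal illustration, not a production algorithm: it finds the lag that maximises the cross-correlation of the two channels, then converts that lag to an arrival angle under a far-field assumption. The speed of sound (343 m/s), the 48 kHz sample rate, the 50 mm spacing and all function names are assumptions made for the example.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)


def estimate_delay_samples(first, second, max_lag):
    """Lag in samples at which `second` best matches `first`.

    A positive result means `second` received the sound later than `first`.
    """
    n = len(first)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(first[i] * second[i + lag]
                    for i in range(max(0, -lag), min(n, n - lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(delay_samples, sample_rate_hz, spacing_m):
    """Far-field angle from straight ahead (0 = equal distance to both mics)."""
    path_difference = SPEED_OF_SOUND * delay_samples / sample_rate_hz
    ratio = max(-1.0, min(1.0, path_difference / spacing_m))
    return math.degrees(math.asin(ratio))


def simulate_pair(delay_samples, num_samples=2000, seed=1):
    """Hypothetical test data: the same noise burst, delayed at mic 2."""
    rng = random.Random(seed)
    source = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    delayed = [0.0] * delay_samples + source[:num_samples - delay_samples]
    return source, delayed


if __name__ == "__main__":
    mic1, mic2 = simulate_pair(3)
    lag = estimate_delay_samples(mic1, mic2, max_lag=10)
    print("lag:", lag, "samples")
    print("angle:", round(arrival_angle_deg(lag, 48000, 0.05), 1), "degrees")
```

Because the estimate depends entirely on the relative timing of the two channels, any unmatched delay between the microphones shifts the result directly, which is why identical microphone delay matters.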
Usually two microphones are sufficient under budget and mechanical constraints. More than three microphones are used in mid-to-high-end products like earphones, laptops and smart speakers. With only two microphones, the distance between them is the most critical parameter. With larger arrays, the array geometry is also important. To maintain signal integrity, digital microphones (DMIC) are sometimes necessary instead of the more traditional analogue microphones (AMIC). For example, microphones in laptops are usually located in the middle of the upper rim of the screen. Travelling from the top to the bottom of the screen, the signal picks up switching noise from the display. The larger the screen, the longer the path and the more noise is picked up. In this scenario, DMIC is typically preferred because digital PDM signals are much more immune to analogue noise.
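One reason the spacing between two microphones is so critical can be shown with a common rule of thumb that is not stated in the text above: to avoid spatial aliasing, the spacing is usually kept below half the wavelength of the highest frequency of interest. The Python sketch below applies that rule, assuming a speed of sound of 343 m/s; the function names are illustrative.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)


def max_unaliased_frequency_hz(spacing_m):
    """Highest frequency whose half wavelength still exceeds the spacing."""
    return SPEED_OF_SOUND / (2.0 * spacing_m)


def max_spacing_m(highest_frequency_hz):
    """Largest spacing that keeps the given frequency free of aliasing."""
    return SPEED_OF_SOUND / (2.0 * highest_frequency_hz)


if __name__ == "__main__":
    print(round(max_unaliased_frequency_hz(0.04), 1), "Hz for 40 mm spacing")
    print(round(max_spacing_m(8000) * 1000, 1), "mm for an 8 kHz upper limit")
```

A wider spacing gives larger time differences at low frequencies but lowers this limit, so the choice is a trade-off rather than a single correct value.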
If we know in advance where the sound source is, then we do not need many microphones. In this case, the simplest and most convenient configuration is end-firing: one microphone in front (closer to the user) and the other behind it. The sound reaches the back microphone with a slight delay, and from that delay the algorithm works out where the sound comes from. End-firing is commonly used in intercoms and kiosks with voice interaction, where the user is evidently right in front of the product. Another application is automotive use, where the driver and passenger locations are predetermined. Moving from one microphone to an end-firing pair may increase SNR by up to 2 dB.
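The effect of the front-to-back delay can be sketched with a two-microphone delay-and-sum beam steered toward the user. The Python example below is an idealised far-field model with perfectly matched microphones; the 20 mm spacing, the 343 m/s speed of sound and the function name are assumptions for illustration. It shows that sound from the front adds up fully, while sound from directly behind cancels at the frequency where the spacing equals a quarter wavelength.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)


def endfire_response(angle_deg, frequency_hz, spacing_m):
    """Normalised gain of a two-mic delay-and-sum beam steered to 0 degrees.

    angle_deg is measured from the array axis: 0 = in front (user side),
    180 = directly behind.
    """
    omega = 2 * math.pi * frequency_hz
    axis_delay = spacing_m / SPEED_OF_SOUND  # front-to-back travel time
    # The back microphone hears a plane wave later by axis_delay * cos(angle).
    back_mic = cmath.exp(-1j * omega * axis_delay
                         * math.cos(math.radians(angle_deg)))
    # The front microphone is delayed electronically by axis_delay so that
    # sound from 0 degrees lines up with the back microphone.
    front_mic = cmath.exp(-1j * omega * axis_delay)
    return abs(front_mic + back_mic) / 2


if __name__ == "__main__":
    spacing = 0.02  # 20 mm, an assumed value
    quarter_wave_hz = SPEED_OF_SOUND / (4 * spacing)
    for angle in (0, 90, 180):
        gain = endfire_response(angle, quarter_wave_hz, spacing)
        print(angle, "degrees:", round(gain, 3))
```

At other frequencies the rear rejection of this simple beam is weaker, which is one reason practical end-fire designs add further processing on top of it.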
In some scenarios the mechanical enclosure is very narrow and thin, leaving no space for an end-firing arrangement. In this case, we are forced to choose broadside-firing, where the microphones are lined up in a row facing the user. Most of the time the distances from the individual microphones to the sound source are very similar, which lowers the algorithm’s efficacy. This issue is common in panel products, such as a flat-panel TV on the wall. A possible solution is to use more than two microphones, for example four, in which case the SNR may be increased by 4 to 7 dB.
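To see why similar distances limit a broadside array, the arrival-time difference between two adjacent microphones can be computed for a far-field source. The Python sketch below assumes a speed of sound of 343 m/s and a 50 mm spacing; the function name is illustrative. For a user directly in front the difference is zero, and it stays small for modest off-axis angles.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)


def broadside_delay_us(spacing_m, angle_deg):
    """Arrival-time difference between two adjacent mics, in microseconds.

    angle_deg is measured from the array normal: 0 = directly in front.
    """
    path_difference = spacing_m * math.sin(math.radians(angle_deg))
    return path_difference / SPEED_OF_SOUND * 1e6


if __name__ == "__main__":
    for angle in (0, 10, 30, 90):
        print(angle, "degrees:", round(broadside_delay_us(0.05, angle), 1), "us")
```

For comparison, one sample period at an assumed 48 kHz sample rate is about 20.8 microseconds, so near-frontal sources produce differences of roughly one sample or less with this spacing.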
A smart speaker needs to receive sound from all 360 degrees around it, so the usual configuration is circular. The array is typically placed on top of the product, as far away from the loudspeaker driver as possible to prevent interference. The circular array should be at least 40 mm in diameter, which might result in a 2 to 5 dB increase in SNR.
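A circular geometry can be sketched by placing the microphones evenly on a circle and computing when a far-field sound from a given azimuth reaches each one. In the Python example below, the four-microphone count, the 343 m/s speed of sound and the function names are assumptions; only the 40 mm diameter comes from the text. A beamformer would compensate these relative delays to listen toward the chosen azimuth.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)


def circular_positions(num_mics, diameter_m):
    """(x, y) coordinates in metres of mics spaced evenly on a circle."""
    radius = diameter_m / 2
    return [(radius * math.cos(2 * math.pi * m / num_mics),
             radius * math.sin(2 * math.pi * m / num_mics))
            for m in range(num_mics)]


def arrival_delays_s(positions, azimuth_deg):
    """Arrival time at each mic relative to the array centre, in seconds.

    Negative values mean the mic hears the far-field source earlier than
    the centre of the array would.
    """
    ux = math.cos(math.radians(azimuth_deg))
    uy = math.sin(math.radians(azimuth_deg))
    return [-(x * ux + y * uy) / SPEED_OF_SOUND for x, y in positions]


if __name__ == "__main__":
    mics = circular_positions(4, 0.040)
    for azimuth in (0, 45, 90):
        delays_us = [round(d * 1e6, 1) for d in arrival_delays_s(mics, azimuth)]
        print(azimuth, "degrees:", delays_us, "us")
```

Because the microphones are spread evenly around the circle, the pattern of delays simply rotates with the source azimuth, which is what makes this layout suitable for 360-degree pickup.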
A smart soundbar usually receives sound from only a 180-degree range, because soundbars mostly sit against a wall, leaving only one side facing the sound source. In such scenarios, the array is placed on the top, in the middle of the soundbar, ideally with a diameter of at least 60 mm. This can contribute an SNR increase of 5 to 7 dB.
A smart TV likewise usually receives sound from only a 180-degree range, because TVs mostly sit against a wall, leaving only one side facing the sound source. In such scenarios, the array is placed on the top, in the middle of the TV, ideally with a diameter of at least 70 mm. This can contribute an SNR increase of 2 to 4 dB.
A wall-mounted tablet also usually receives sound from only a 180-degree range, because it sits against the wall, leaving only one side facing the sound source. In such scenarios, the array is placed on the top, in the middle of the tablet, ideally with a diameter of at least 70 mm. This can contribute an SNR increase of 2 to 4 dB.
Please note that we can also measure frequency response, SNR and directionality of microphones or arrays.