Intelligibility in Video Conferencing Systems

by Daniel Fattorini, MInstSCE

The concepts and science of speech intelligibility have long been established in the fields of Public Address and Voice Alarm. Although complex, this science has provided a useful benchmark, both for setting standards and for judging system performance. With the current rapid uptake of Unified Communications and Video Conferencing, the concept of intelligibility has become increasingly relevant to a new field.

As someone who has worked in the field of video communication since its infancy, I’ve seen the audio-visual industry focus on the visible. It’s an easier sell for integrators to push for a bigger screen or a sharper camera. Yet audio is the unsung hero of such communication; a face-to-face conversation is all but useless without clear, intelligible speech. Recently, advanced microphone array systems and digital signal processors have brought improvements, yet in many ways these have just muddied the waters. Increasingly, the best technology is used to compensate for negative factors such as incorrect microphone placement or poor room acoustics, rather than to take a system from good to even better.

This situation makes it more important than ever to be able to define and measure intelligibility in video conferencing systems. At present, handover to a client is generally subjective, with approval based on the client’s perceived experience. While a happy client should always be the goal, objective criteria to work towards would be hugely beneficial. They would allow engineers to work to a standard rather than rely on experience alone, and would let those with the highest standards in the industry promote quality and excellence.

Working out how this might be achieved starts with considering the factors involved. The net output of such a system results from the microphones and their placement, background noise, reverberation, the signal path, processing and conversion. On top of these, the quality of compression over the point-to-point network path to the far site inevitably has an impact. All of these factors then come into play again when the signal is reproduced at the far end.

How then do we go about measuring this?

If STIPA testing in PA/VA is taken as the starting point for a measurement, the big difference is that video communication is two-way. Even if you can measure from a signal in one room to a microphone in another, how would you isolate that measurement to prove the performance of one room? Right now I have only ideas and concepts with which to answer these questions. One possibility is to measure the incoming and outgoing characteristics of one room only. Another is to share a common test destination, in the form of either an “ideal room” or a system in an anechoic chamber.

There is, however, a definite need to push towards a standard measurement in this field. Achieving one would require the work of a team, not just one person, so this is something of a call to arms for those keen to pursue such a standard. It would benefit those in our industry who work to the highest quality and put an end to some of the subjectivity that currently reigns.

Daniel Fattorini, MInstSCE is the Audio Visual Services Director for TransAct Technology Solutions.