I Can’t Heeeaaaar Yoouuuuuu
The Processing of Sounds in Noisy Environments
You are in a busy coffee shop on a Monday morning, and your friend across the table is trying to talk to you. You see their lips moving, but any meaning is drowned out by a barista yelling about a jalapeño cheddar bagel for some other patron. Gracie Abrams blasts from the speakers (which always seem too loud?). Before you’ve even had time to process what they’ve said, you automatically respond:
“Huh?”
Nailed it.
If you are anything like me, you are familiar with this exact situation and have wondered to yourself: “Why is hearing so messy?” Yet despite comical examples like this one, the brain is actually quite capable of making sense of speech amidst a sea of noise. We can listen to our professors in lecture halls filled with the sounds of typing and coughing, pick our names out of a noisy crowd, and follow along with conversations even at the loudest dinner table. In fact, we often take for granted the brain’s remarkable ability to understand words in noisy environments.
The function and mechanisms of this kind of complex hearing are relentlessly studied by teams of auditory neuroscientists all over the world. In this endeavor, they employ precise tools to probe the human brain and uncover how a series of firing neurons can give rise to our complex perceptual reality. How these firing neurons allow us to hear in noisy, challenging environments is central to an investigation by Dr. Sonia Yasmin conducted during her time at Western University.
Dr. Yasmin first performed an electroencephalography (EEG) study, in which small electrodes capable of detecting the electrical activity of firing neurons were placed on participants’ scalps while they listened to spoken-word stories from a popular podcast. Concurrently, “babble noise” - jumbled audio from many overlapping conversations - played at volumes ranging from barely audible to incredibly loud. The EEG allowed the researchers to track participants’ brain responses as they listened. In a second experiment, the EEG was set aside: instead, at random intervals, participants were asked to type out the last sentence they had heard, word for word, to evaluate whether they were accurately hearing the stories. The percentage of words they typed correctly gave the researchers their estimate of “speech intelligibility”.
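For the curious, here is a minimal sketch of how an intelligibility score like that can be computed - a simple exact-word match written purely for illustration, not the paper’s actual scoring procedure (which may, for instance, be more forgiving of typos):

```python
# Hypothetical sketch of a word-level "speech intelligibility" score:
# the percentage of words from the heard sentence that show up in what
# the participant typed back. The study's actual scoring rules may differ.

def intelligibility(target: str, response: str) -> float:
    """Percentage of target words that appear in the typed response."""
    target_words = target.lower().split()
    response_words = set(response.lower().split())
    if not target_words:
        return 0.0
    hits = sum(word in response_words for word in target_words)
    return 100.0 * hits / len(target_words)

# Example: one of six words is misheard, giving roughly 83% intelligibility.
print(intelligibility("the barista yelled about a bagel",
                      "the barista yelled about a beagle"))
```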
In the first experiment, the EEG data were used to generate what are called Temporal Response Functions (TRFs): mathematical models that describe how the brain responds to a stimulus. When you stare into a light for too long and close your eyes, you are left with a brief afterimage that you can continue to “see”. TRFs are like afterimages within the brain that unfold over hundreds of milliseconds, and analyzing them tells us more about the activity occurring in the brain as it processes sound. The “bigger” a TRF is (higher amplitude), the more strongly the brain is reacting to the sound, and the longer the TRF takes (longer latency), the slower the brain is in processing it.
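If you’d like to peek under the hood, a TRF can be thought of as a set of weights from a regularized linear regression: the EEG at each moment is modelled as a weighted sum of a sound feature (say, the speech envelope) at a range of preceding time lags. The toy sketch below illustrates that general idea with simulated data and a single EEG channel; it is not the study’s actual analysis pipeline.

```python
# Toy sketch of a Temporal Response Function (TRF) as ridge regression of EEG
# onto time-lagged copies of a stimulus feature. Data here are simulated
# placeholders, not real recordings.

import numpy as np

def estimate_trf(stimulus, eeg, n_lags, ridge=1.0):
    """Return TRF weights, one per time lag (the 'afterimage' over time).

    stimulus : 1-D array (samples,), e.g. a speech envelope
    eeg      : 1-D array (samples,), one EEG channel
    n_lags   : length of the TRF window, in samples
    """
    # Design matrix: column k holds the stimulus delayed by k samples.
    X = np.column_stack([np.roll(stimulus, k) for k in range(n_lags)])
    X[:n_lags, :] = 0  # discard samples that wrapped around in np.roll
    # Closed-form ridge solution: (X'X + ridge*I)^-1 X'y
    return np.linalg.solve(X.T @ X + ridge * np.eye(n_lags), X.T @ eeg)

# Usage with fake data: the "EEG" responds to the envelope 20 samples later.
rng = np.random.default_rng(0)
envelope = rng.standard_normal(10_000)
eeg = 0.5 * np.roll(envelope, 20) + rng.standard_normal(10_000)
trf = estimate_trf(envelope, eeg, n_lags=64)
print("peak lag (samples):", int(np.argmax(np.abs(trf))))  # close to 20
```

In this picture, a larger peak in the recovered weights corresponds to a higher-amplitude TRF, and a peak that sits at a later lag corresponds to a longer latency.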
The researchers generated two TRFs for each noise condition. Acoustic TRFs show how the brain responds to changes in the sound itself (e.g., speech volume and rhythm). Semantic TRFs, on the other hand, show how the brain reacts to changes in meaning, such as when there is an unexpected word in a sentence. For example, a semantic TRF might show only a small signal when you hear “the President of the United States”, but a large signal (a stronger “afterimage”) if you misheard it as “the Princess of the United States”, which would be unexpected. The researchers then combined the acoustic and semantic TRFs from the first experiment with the speech intelligibility estimates from the second, comparing them across the different levels of background noise to evaluate how the brain handles real speech in messy, noisy environments.
Figure 1: Amplitudes and Latencies of Reported TRFs from Yasmin et al. 2023
Here we see the amplitude (left) and latency (right) of the semantic (blue) and acoustic (red) TRFs generated from the EEG data of the first experiment, compared to the speech intelligibility estimates (yellow) at various levels of noise. The first thing you might notice is that speech intelligibility - that is, participants’ ability to hear and repeat the spoken words from the audio stories - remains remarkably stable through most noise conditions. It isn’t until the loudest condition that participants had trouble reporting what they heard. These results highlight the brain’s remarkable ability to perceive and understand speech in all but the loudest environments. But the question remains - how?
The researchers’ TRF analyses might shed some light on that. First, the acoustic TRF amplitude follows an inverted U-shaped pattern, peaking at moderate levels of noise. This likely reflects an increase in listening effort, with more cognitive resources dedicated to hearing at these moderate levels. When speech is clear, this effort is unnecessary; in the noisiest environments, the brain runs out of resources and “gives up”, and our ability to understand speech drops steeply. The semantic TRF, meanwhile, is stable throughout except at the highest noise level, indicating that the brain is constantly trying to build meaning from context. Even in moderate noise, when we can only catch, say, 80% of the spoken words, our brains are still able to reconstruct the missing 20% by filling in the gaps with context clues. Lastly, the changes in latency reflect the fact that when listening is harder, the brain takes more time to process the sound: longer afterimages to make sure our listening brain “gets the picture”.
All this is to say: no matter how many baristas are yelling, or how loudly Gracie Abrams might be blaring over that speaker, know that your brain is working overtime, hearing what it can and filling in the gaps, so you can still make sense of what your friend is trying to tell you. Even when your first instinct is to blurt out “Huh?”
Original Article: Yasmin, S., Irsik, V. C., Johnsrude, I. S., & Herrmann, B. (2023). The effects of speech masking on neural tracking of acoustic and semantic features of natural speech. Neuropsychologia, 186, 108584. https://doi.org/10.1016/j.neuropsychologia.2023.108584