TToW #4: Auditory Scene Analysis

Timbre Term of the Week #4: Auditory Scene Analysis


This is a spectrograph, a way of visualizing sound in which the y axis represents frequency, the x axis represents time, and the darkness or colour of each point represents how much energy is present at that frequency and moment. Looking from bottom to top shows how the sound energy is distributed on the continuum from low to high, and looking from left to right shows how that distribution changes over time.
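For readers curious about how such an image is computed, here is a minimal sketch of the usual short-time Fourier transform approach, assuming a mono signal stored as a NumPy array. The signal is cut into short overlapping windows, each window is turned into a spectrum, and the spectra are laid side by side so that rows correspond to frequency (the y axis) and columns to time (the x axis). The window length, hop size, Hann window, and two-tone test signal are illustrative assumptions, not the settings used to make the image discussed here.

```python
import numpy as np

def spectrogram(signal, sample_rate, window_size=256, hop=128):
    """Return (freqs, times, power) from a short-time Fourier transform.

    power[i, j] is the energy at frequency freqs[i] in the frame starting at
    times[j]: rows play the role of the y axis, columns the role of the x axis.
    """
    window = np.hanning(window_size)  # taper each slice to reduce spectral leakage
    n_frames = 1 + (len(signal) - window_size) // hop
    frames = np.stack(
        [signal[j * hop : j * hop + window_size] * window for j in range(n_frames)],
        axis=1,
    )  # shape: (window_size, n_frames)
    power = np.abs(np.fft.rfft(frames, axis=0)) ** 2  # energy per frequency bin
    freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate)
    times = np.arange(n_frames) * hop / sample_rate
    return freqs, times, power

# Hypothetical test signal: one second of a 440 Hz tone, then one second at 1000 Hz.
sr = 8000
t = np.arange(sr) / sr
signal = np.concatenate([np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 1000 * t)])

freqs, times, power = spectrogram(signal, sr)
peak = freqs[np.argmax(power, axis=0)]  # loudest frequency in each time frame
print(peak[0], peak[-1])
```

The darkest band in the resulting image would sit near 440 Hz in the first half and near 1000 Hz in the second. Longer windows sharpen frequency detail at the cost of blurring events in time, which matters for a passage as dense as the one described below.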

This particular spectrograph represents a 15-second excerpt from ACTOR partner Robert Normandeau’s large-scale, multimovement electroacoustic composition Clair de Terre (1999). The movement from which this excerpt is taken is called “Micro-montage,” and the music lives up to its title: many brief sound events are juxtaposed or superimposed in a short span of time. Listen to this clip and you will hear a large, almost overwhelming number of sound events from many different sources, deployed so rapidly that it can be hard to keep track of them all.

Which raises an interesting question: how do we keep track of them all? How does the auditory system make sense of the amazingly complex, ever-changing air vibrations that reach the ear? In this excerpt, I can hear crashing chords, whistling wind, chirping birds, a revving motorcycle, and many more sound sources that I vaguely recognize but am hard-pressed to name. Although this particular panoply occurs in a piece of electroacoustic music, the experience of being bombarded with many different sounds is familiar from everyday life, and it recalls the phrase the American psychologist William James used for an infant’s first impressions of the world: “one great blooming, buzzing confusion.” How does the auditory system separate all of these sources into discrete perceptual units?

The process of parsing the incoming sound signal into a meaningful representation of the environment is called auditory scene analysis. A complete explanation is beyond the scope of any blog post (Auditory Scene Analysis, the 1990 book in which Albert Bregman gave the idea its fullest exposition, is nearly 800 pages long!), but we can introduce some basic concepts here.

As the spectrograph above makes clear, we often process many incoming frequencies at the same time, and the auditory system must decide which ones belong together (integration) and which should be kept apart (segregation). For example, on a noisy city street, some of the sound components reaching your ears at any given moment may belong to a motorcycle driving by, others to ambient traffic noise, and still others to the voices of people on the sidewalk next to you; your auditory system works out which is which. The auditory system must also group incoming sound components into units that are delimited in time (segmentation), such as musical notes, and decide which of these units to link into extended sequences such as melodies. This sequential grouping is called auditory streaming.

Complicated though the task may be, relatively few principles guide the auditory system through it. The main ones are:

·      Harmonicity: Frequencies that are whole-number multiples of a common fundamental, and are therefore related by simple integer ratios, tend to group together. For example, if the auditory scene contains frequencies at 110 Hz, 220 Hz, and 330 Hz (n, 2n, and 3n, where n = 110 Hz), the auditory system will tend to fuse them into a single complex sound, whereas frequencies at 110 Hz, 201 Hz, and 350 Hz, which are not related by simple ratios, are less likely to fuse. For a more detailed explanation, see “Timbre Term 1: Partial.”

·      Onset synchrony: Sound components that begin at nearly the same moment, within a window of about 30 milliseconds, tend to group together.

·      Frequency comodulation: Sound components that get higher or lower in parallel, as the partials of a note played with vibrato do, tend to group together.

·      Amplitude comodulation: Sound components that get louder or softer in parallel tend to group together.

·      Source location: Sound components that originate from the same physical location in space tend to group together.
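To make the first two cues concrete, here is a toy grouping procedure. It is a deliberately simplified illustration, not Bregman’s model and not a description of what the ear actually computes: it treats the scene as a list of pure-tone components with known frequencies and onsets, splits them using the 30 ms onset window from the list, and then groups simultaneous components by harmonicity, taking each group’s lowest frequency as its candidate fundamental. The `Component` class, the 3% tolerance, and the order in which the two cues are applied are assumptions made for the example; the comodulation and location cues are left out.

```python
from dataclasses import dataclass

@dataclass
class Component:
    freq_hz: float   # frequency of a single sinusoidal component
    onset_ms: float  # moment at which the component begins

ONSET_WINDOW_MS = 30.0     # onset-synchrony window from the list above
HARMONIC_TOLERANCE = 0.03  # assumption: allowed mistuning, as a fraction of the nearest harmonic

def is_harmonic_of(freq, f0, tol=HARMONIC_TOLERANCE):
    """True if freq lies close to a whole-number multiple of f0."""
    ratio = freq / f0
    n = round(ratio)
    return n >= 1 and abs(ratio - n) <= tol * n

def group_by_onset(components, window=ONSET_WINDOW_MS):
    """Split components into sets whose onsets fall within `window` of the set's first onset."""
    groups = []
    for c in sorted(components, key=lambda c: c.onset_ms):
        if groups and c.onset_ms - groups[-1][0].onset_ms <= window:
            groups[-1].append(c)
        else:
            groups.append([c])
    return groups

def group_by_harmonicity(components):
    """Greedily add each component to the first group whose lowest frequency it is a harmonic of."""
    groups = []
    for c in sorted(components, key=lambda c: c.freq_hz):
        for g in groups:
            if is_harmonic_of(c.freq_hz, g[0].freq_hz):
                g.append(c)
                break
        else:  # no existing group fits, so this component starts a new one
            groups.append([c])
    return groups

def scene_analysis(components):
    """Apply onset synchrony first, then harmonicity within each synchronous set."""
    return [hg for og in group_by_onset(components) for hg in group_by_harmonicity(og)]

scene = [
    Component(110, 0), Component(220, 5), Component(330, 12),       # harmonic and synchronous
    Component(110, 500), Component(201, 510), Component(350, 520),  # synchronous but inharmonic
    Component(440, 1000), Component(880, 1200),                     # harmonic but 200 ms apart
]
for group in scene_analysis(scene):
    print([c.freq_hz for c in group])
```

Run on the frequencies from the harmonicity example, the sketch fuses 110, 220, and 330 Hz into one unit but keeps 110, 201, and 350 Hz apart, and it separates 440 and 880 Hz because their onsets are too far apart even though the two are harmonically related. In real listening the cues compete and cooperate rather than applying in a fixed order; the sequence here is only a simplification.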

For the most part, we are unaware that this process is happening and take it for granted. But before the auditory scene makes it into your conscious awareness, an amazing feat of pre-attentive analysis has already converted the dizzying complexity of air vibrations around you into a coherent picture of the world.