Sound localization

From Wikipedia, the free encyclopedia

Sound localization refers to a listener's ability to identify the location or origin of a detected sound in direction and distance. It may also refer to the methods in acoustical engineering to simulate the placement of an auditory cue in a virtual 3D space (see binaural recording).

The sound localization mechanisms of the human auditory system have been extensively studied. The human auditory system uses several cues for sound source localization, including time and level differences between the two ears, spectral information, timing analysis, correlation analysis, and pattern matching.

These cues are also used by animals, but there may be differences in usage, and there are also localization cues which are absent in the human auditory system, such as the effects of ear movements.

Sound localization by the human auditory system

Lateral information (left, ahead, right)

For determining the lateral input direction (left, front, right) the auditory system analyzes the following ear signal information:

For frequencies below 800 Hz, mainly interaural time differences (phase delays) are evaluated; for frequencies above 1600 Hz, mainly interaural level differences are evaluated. Between 800 Hz and 1600 Hz there is a transition zone where both mechanisms play a role.

Evaluation for low frequencies

For frequencies below 800 Hz the dimensions of the head (ear distance 21.5 cm, corresponding to an interaural time delay of 625 µs) are smaller than half the wavelength of the sound waves, so the auditory system can determine phase delays between the two ears without ambiguity. Interaural level differences are very low in this frequency range, so a precise evaluation of the input direction on the basis of level differences is nearly impossible. As the frequency drops below 80 Hz it becomes difficult or impossible to use either time difference or level difference to determine a sound's lateral source, because the phase difference between the ears becomes too small for a directional evaluation.
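The figures in this paragraph can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a speed of sound of 343 m/s (a value not given in the text) and treating the ears as two points 21.5 cm apart; the function names are illustrative only.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 degrees Celsius


def max_itd_seconds(ear_distance_m, c=SPEED_OF_SOUND_M_S):
    """Largest possible interaural delay: sound travelling from one ear to the other."""
    return ear_distance_m / c


def half_wavelength_m(frequency_hz, c=SPEED_OF_SOUND_M_S):
    """Half of the wavelength of a tone at the given frequency."""
    return c / frequency_hz / 2.0


ear_distance_m = 0.215  # ear distance quoted in the text

print("max ITD (microseconds):", round(max_itd_seconds(ear_distance_m) * 1e6))
for frequency_hz in (80, 800, 1600):
    print(frequency_hz, "Hz -> half wavelength (cm):",
          round(half_wavelength_m(frequency_hz) * 100, 1))
```

With these assumptions the maximum delay comes out within a few microseconds of the 625 µs quoted above, and the half wavelength at 800 Hz is almost exactly the ear distance, which is why phase comparison stays unambiguous below that frequency.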

Evaluation for high frequencies

For frequencies above 1600 Hz the dimensions of the head are greater than the wavelength of the sound waves. An unambiguous determination of the input direction based on interaural phase is not possible at these frequencies. However, the interaural level differences become larger, and these level differences are evaluated by the auditory system. Group delays between the ears can also be evaluated, and this is more pronounced at higher frequencies: if there is a sound onset, the delay of this onset between the two ears can be used to determine the input direction of the corresponding sound source. This mechanism becomes especially important in reverberant environments. After a sound onset there is a short time frame in which the direct sound has reached the ears but the reflected sound has not. The auditory system uses this short time frame to evaluate the sound source direction, and keeps this detected direction as long as reflections and reverberation prevent an unambiguous direction estimate.
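As an illustration of the level cue described above, an interaural level difference can be expressed in decibels as the ratio of the signal levels at the two ears. This is a minimal sketch that assumes the two ear signals are available as lists of samples and uses the RMS value as the level measure; the function names are illustrative and not taken from the text.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def interaural_level_difference_db(left, right):
    """Level of the left-ear signal relative to the right-ear signal, in dB.

    Positive values mean the left ear receives the stronger signal.
    """
    return 20.0 * math.log10(rms(left) / rms(right))


# Toy signals: the right ear receives the same waveform at half the amplitude,
# as it might when the head shadows a source on the left.
left_ear = [1.0, -1.0, 1.0, -1.0]
right_ear = [0.5, -0.5, 0.5, -0.5]
print(round(interaural_level_difference_db(left_ear, right_ear), 2))
```

A halving of amplitude corresponds to a level difference of about 6 dB.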

The mechanisms described above cannot be used to differentiate between a sound source ahead of the hearer or behind the hearer; therefore additional cues have to be evaluated.

Sound localization in the median plane (front, above, back, below)

The human outer ear, i.e. the structures of the pinna and the external ear canal, forms direction-selective filters. Depending on the sound input direction in the median plane, different filter resonances become active. These resonances implant direction-specific patterns into the frequency responses of the ears, which can be evaluated by the auditory system (directional bands). Together with other direction-selective reflections at the head, shoulders and torso, they form the outer ear transfer functions.

These patterns in the ear's frequency responses are highly individual, depending on the shape and size of the outer ear. If sound is presented through headphones, and has been recorded via another head with different-shaped outer ear surfaces, the directional patterns differ from the listener's own, and problems will appear when trying to evaluate directions in the median plane with these foreign ears. As a consequence, front–back permutations or inside-the-head-localization can appear when listening to dummy head recordings.

Distance of the sound source

The human auditory system has only limited possibilities to determine the distance of a sound source. In the close-up-range there are some indications for distance determination, such as extreme level differences (e.g. when whispering into one ear) or specific pinna resonances in the close-up range.

The auditory system uses these cues to estimate the distance to a sound source:

Signal processing

Sound processing of the human auditory system is performed in so-called critical bands. The hearing range is segmented into 24 critical bands, each with a width of 1 Bark or 100 Mel. For a directional analysis the signals inside the critical band are analyzed together.
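The mapping from frequency to critical-band number can be sketched with a closed-form approximation. The formula below is the commonly cited approximation attributed to Zwicker and Terhardt; it is not given in the text above, so treat it as an assumption, and the function name is illustrative.

```python
import math


def hz_to_bark(frequency_hz):
    """Approximate critical-band rate (in Bark) for a frequency in Hz.

    Assumed formula: 13 * atan(0.00076 f) + 3.5 * atan((f / 7500)^2).
    """
    return (13.0 * math.atan(0.00076 * frequency_hz)
            + 3.5 * math.atan((frequency_hz / 7500.0) ** 2))


for frequency_hz in (100, 1000, 4000, 15500):
    print(frequency_hz, "Hz ->", round(hz_to_bark(frequency_hz), 1), "Bark")
```

Under this approximation a frequency of about 15.5 kHz lands near 24 Bark, consistent with the 24 critical bands mentioned above.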

The auditory system can extract the sound of a desired sound source out of interfering noise. So the auditory system can concentrate on only one speaker if other speakers are also talking (the cocktail party effect). With the help of the cocktail party effect sound from interfering directions is perceived attenuated compared to the sound from the desired direction. The auditory system can increase the signal-to-noise ratio by up to 15 dB, which means that interfering sound is perceived to be attenuated to half (or less) of its actual loudness.
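The step from a decibel figure to "half (or less) of its actual loudness" can be made explicit. The sketch below assumes the common psychoacoustic rule of thumb that a 10 dB reduction in level roughly halves perceived loudness; the rule and the function name are not taken from the text above.

```python
def perceived_loudness_ratio(attenuation_db):
    """Approximate loudness ratio after attenuating a sound by attenuation_db.

    Assumption: every 10 dB of attenuation halves the perceived loudness.
    """
    return 2.0 ** (-attenuation_db / 10.0)


for attenuation_db in (5, 10, 15):
    print(attenuation_db, "dB ->", round(perceived_loudness_ratio(attenuation_db), 2))
```

Under this rule, 10 dB of attenuation gives half the loudness and 15 dB gives roughly one third, which matches the "half (or less)" wording.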

Localization in enclosed rooms

In enclosed rooms, not only the direct sound from a sound source arrives at the listener's ears, but also sound that has been reflected off the walls. For sound localization the auditory system analyzes only the direct sound, which arrives first, and not the reflected sound, which arrives later (law of the first wave front). Sound localization therefore remains possible even in an echoic environment.

To determine the time periods in which the direct sound prevails and which can be used for directional evaluation, the auditory system analyzes loudness changes in different critical bands and also the stability of the perceived direction. If there is a strong loudness attack in several critical bands and the perceived direction is stable, this attack is in all probability caused by the direct sound of a source that is newly entering or changing its signal characteristics. The auditory system uses this short time period for directional and loudness analysis of the sound. When reflections arrive slightly later, they do not increase the loudness inside the critical bands as strongly, and the directional cues become unstable because sound from several reflection directions is mixed. As a result, the auditory system does not trigger a new directional analysis.

This first detected direction from the direct sound is taken as the sound source direction until other strong loudness attacks, combined with stable directional information, indicate that a new directional analysis is possible (see Franssen effect).
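The gating behaviour described in this section can be sketched as a small state machine. This is a toy model under stated assumptions, not a description of the actual neural mechanism: it collapses the critical bands into a single level per frame, and the frame format, the 6 dB onset threshold and the 5 degree stability threshold are all hypothetical.

```python
def track_direction(frames, onset_db=6.0, max_spread_deg=5.0):
    """Hold the direction found at the last strong, directionally stable onset.

    frames: iterable of (level_db, direction_deg, spread_deg) tuples, one per
    short analysis frame. spread_deg is a hypothetical measure of how unstable
    the directional cue is within the frame.
    """
    held_direction = None  # nothing has been localized yet
    previous_level = None
    history = []
    for level_db, direction_deg, spread_deg in frames:
        rise = 0.0 if previous_level is None else level_db - previous_level
        if rise >= onset_db and spread_deg <= max_spread_deg:
            held_direction = direction_deg  # trigger a new directional analysis
        history.append(held_direction)
        previous_level = level_db
    return history


frames = [
    (40.0, 0.0, 30.0),   # background noise, no clear direction
    (60.0, -30.0, 2.0),  # direct sound: strong onset, stable direction
    (61.0, 10.0, 25.0),  # reflections: small rise, unstable direction
    (60.0, 40.0, 30.0),  # reverberation
    (72.0, 45.0, 3.0),   # a new source starts
]
print(track_direction(frames))
```

In this example the direction found at the first onset is kept through the reflection frames and is replaced only when the second strong, stable onset arrives.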


Animals

Since most animals also have two ears, many of the effects of the human auditory system can also be found in animals. Interaural time differences (interaural phase differences) and interaural level differences therefore play a role in the hearing of many animals. The influence of these effects on localization, however, depends on head size, ear distance, ear position and ear orientation.

Lateral information (left, ahead, right)

If the ears are located at the side of the head, lateral localization cues similar to those of the human auditory system can be used: evaluation of interaural time differences (interaural phase differences) for lower frequencies and evaluation of interaural level differences for higher frequencies. The evaluation of interaural phase differences is useful as long as it gives unambiguous results. This is the case as long as the ear distance is smaller than half the wavelength (at most one wavelength) of the sound waves. For animals with a larger head than humans, the evaluation range for interaural phase differences is shifted towards lower frequencies; for animals with a smaller head, this range is shifted towards higher frequencies.

The lowest frequency which can be localized depends on the ear distance. Animals with a greater ear distance can localize lower frequencies than humans can. For animals with a smaller ear distance the lowest localizable frequency is higher than for humans.
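The dependence of the phase-evaluation range on head size follows directly from the half-wavelength condition stated above. The sketch below assumes a speed of sound of 343 m/s; the two non-human ear distances are hypothetical round numbers chosen for illustration, not measurements of any species.

```python
def phase_unambiguous_limit_hz(ear_distance_m, speed_of_sound_m_s=343.0):
    """Frequency at which the ear distance equals half the wavelength.

    Below this frequency, interaural phase differences are unambiguous.
    """
    return speed_of_sound_m_s / (2.0 * ear_distance_m)


examples = {
    "hypothetical small head (5 cm)": 0.05,
    "human (21.5 cm, from the text)": 0.215,
    "hypothetical large head (50 cm)": 0.50,
}
for label, ear_distance_m in examples.items():
    print(label, "->", round(phase_unambiguous_limit_hz(ear_distance_m)), "Hz")
```

A larger ear distance lowers the limit and a smaller one raises it, in line with the shift described above.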

If the ears are located at the side of the head, interaural level differences appear for higher frequencies and can be evaluated for localization tasks. For animals with ears at the top of the head, no shadowing by the head will appear, and therefore there will be much smaller interaural level differences that could be evaluated. Many of these animals can move their ears, and these ear movements can be used as a lateral localization cue.

Sound localization in the median plane (front, above, back, below)

For many mammals there are also pronounced structures in the pinna near the entry of the ear canal. As a consequence, direction-dependent resonances can appear, which could be used as an additional localization cue, similar to the localization in the median plane in the human auditory system. There are additional localization cues which are also used by animals.

Head tilting

For sound localization in the median plane (elevation of the sound), two detectors positioned at different heights can also be used. In animals, however, rough elevation information is gained simply by tilting the head, provided that the sound lasts long enough to complete the movement. This explains the innate behavior of cocking the head to one side when trying to localize a sound precisely. Instantaneous localization in more than two dimensions from time-difference or amplitude-difference cues requires more than two detectors.

Localization with one ear (flies)

The tiny parasitic fly Ormia ochracea has become a model organism in sound localization experiments because of its unique ear. The animal is too small for the time difference of sound arriving at the two ears to be calculated in the usual way, yet it can determine the direction of sound sources with exquisite precision. The tympanic membranes of opposite ears are directly connected mechanically, allowing resolution of sub-microsecond time differences[2][3] and requiring a new neural coding strategy.[4] Ho[5] showed that the coupled-eardrum system in frogs can produce increased interaural vibration disparities when only small arrival time and sound level differences were available to the animal’s head. Efforts to build directional microphones based on the coupled-eardrum structure are underway.

Bi-coordinate sound localization in owls

Most owls are nocturnal or crepuscular birds of prey. Because they hunt at night, they must rely on non-visual senses. Experiments by Roger Payne[6] have shown that owls are sensitive to the sounds made by their prey, not the heat or the smell. In fact, the sound cues are both necessary and sufficient for an owl to localize mice from a distant perch. For this to work, the owls must be able to accurately localize both the azimuth and the elevation of the sound source.


Owls living above ground must be able to determine the necessary angle of descent, i.e. the elevation, in addition to azimuth (horizontal angle to the sound). This bi-coordinate sound localization is accomplished through two binaural cues: the interaural time difference (ITD) and the interaural level difference (ILD), also known as the interaural intensity difference (IID). The ability in owls is unusual; in ground-bound mammals such as mice, ITD and ILD are redundant cues for azimuth.

ITD occurs whenever the distance from the source of sound to the two ears is different, resulting in differences in the arrival times of the sound at the two ears. When the sound source is directly in front of the owl, there is no ITD, i.e. the ITD is zero. In sound localization, ITDs are used as cues for location in the azimuth. ITD changes systematically with azimuth. Sounds to the right arrive first at the right ear; sounds to the left arrive first at the left ear.

In mammals there is a level difference in sounds at the two ears caused by the sound-shadowing effect of the head. But in many species of owls, level differences arise primarily for sounds that are shifted above or below the elevation of the horizontal plane. This is due to the asymmetry in placement of the ear openings in the owl's head, such that sounds from below the owl reach the left ear first and sounds from above reach the right ear first.[7] IID is a measure of the difference in the level of the sound as it reaches each ear. In many owls, IIDs for high-frequency sounds (higher than 4 or 5 kHz) are the principal cues for locating sound elevation.

Parallel processing pathways in the brain

The axons of the auditory nerve originate from the hair cells of the cochlea in the inner ear. Different sound frequencies are encoded by different fibers of the auditory nerve, arranged along the length of the auditory nerve, but codes for the timing and level of the sound are not segregated within the auditory nerve. Instead, the ITD is encoded by phase locking, i.e. firing at or near a particular phase angle of the sinusoidal stimulus sound wave, and the IID is encoded by spike rate. Both parameters are carried by each fiber of the auditory nerve.[8]

The fibers of the auditory nerve innervate both cochlear nuclei in the brainstem, the cochlear nucleus magnocellularis (mammalian anteroventral cochlear nucleus) and the cochlear nucleus angularis (see figure; mammalian posteroventral and dorsal cochlear nuclei). The neurons of the nucleus magnocellularis phase-lock, but are fairly insensitive to variations in sound pressure, while the neurons of the nucleus angularis phase-lock poorly, if at all, but are sensitive to variations in sound pressure. These two nuclei are the starting points of two separate but parallel pathways to the inferior colliculus: the pathway from nucleus magnocellularis processes ITDs, and the pathway from nucleus angularis processes IID.


Figure: Parallel processing pathways in the owl's brain for time and level in sound localization.

In the time pathway, the nucleus laminaris (mammalian medial superior olive) is the first site of binaural convergence. It is here that ITD is detected and encoded using neuronal delay lines and coincidence detection, as in the Jeffress model; when phase-locked impulses coming from the left and right ears coincide at a laminaris neuron, the cell fires most strongly. Thus, the nucleus laminaris acts as a delay-line coincidence detector, converting distance traveled to time delay and generating a map of interaural time difference. Neurons from the nucleus laminaris project to the core of the central nucleus of the inferior colliculus and to the anterior lateral lemniscal nucleus.

In the sound level pathway, the posterior lateral lemniscal nucleus (mammalian lateral superior olive) is the site of binaural convergence and where IID is processed. Stimulation of the contralateral ear excites and that of the ipsilateral ear inhibits the neurons of the nuclei in each brain hemisphere independently. The degree of excitation and inhibition depends on sound pressure, and the difference between the strength of the inhibitory input and that of the excitatory input determines the rate at which neurons of the lemniscal nucleus fire. Thus the response of these neurons is a function of the difference in sound pressure between the two ears.
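The excitation-minus-inhibition relationship described above can be illustrated with a toy rate model. This is a sketch under strong simplifying assumptions: the linear gain, the saturation rate and the use of dB levels as input strengths are all hypothetical and not taken from the text.

```python
def lemniscal_rate(contra_db, ipsi_db, gain=2.0, max_rate=100.0):
    """Toy firing rate (spikes per second) of a level-difference neuron.

    The contralateral ear excites, the ipsilateral ear inhibits, and the rate
    follows the difference, clipped to the range 0..max_rate.
    """
    drive = gain * (contra_db - ipsi_db)
    return min(max_rate, max(0.0, drive))


print(lemniscal_rate(60.0, 50.0))  # contralateral ear louder: neuron fires
print(lemniscal_rate(50.0, 60.0))  # ipsilateral ear louder: neuron is silent
print(lemniscal_rate(80.0, 70.0))  # same difference at a higher overall level
```

In this model the output depends only on the difference between the two levels, not on the overall level, which is the property the paragraph attributes to these neurons.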

The time and sound-pressure pathways converge at the lateral shell of the central nucleus of the inferior colliculus. The lateral shell projects to the external nucleus, where each space-specific neuron responds to acoustic stimuli only if the sound originates from a restricted area in space, i.e. the receptive field of that neuron. These neurons respond exclusively to binaural signals containing the same ITD and IID that would be created by a sound source located in the neuron’s receptive field. Thus their receptive fields arise from the neurons’ tuning to particular combinations of ITD and IID, simultaneously in a narrow range. These space-specific neurons can thus form a map of auditory space in which the positions of receptive fields in space are isomorphically projected onto the anatomical sites of the neurons.[9]
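The idea that a space-specific neuron responds only to a particular combination of ITD and IID can be sketched as a product of two tuning curves. The Gaussian shape, the multiplicative combination and the tuning widths below are illustrative assumptions, not claims taken from the text.

```python
import math


def space_specific_response(itd_us, iid_db, best_itd_us, best_iid_db,
                            itd_width_us=20.0, iid_width_db=3.0):
    """Toy response (0..1) of a neuron tuned to one ITD and IID combination."""
    itd_term = math.exp(-0.5 * ((itd_us - best_itd_us) / itd_width_us) ** 2)
    iid_term = math.exp(-0.5 * ((iid_db - best_iid_db) / iid_width_db) ** 2)
    return itd_term * iid_term  # both cues must match for a strong response


# A hypothetical neuron preferring ITD = 50 microseconds and IID = 5 dB.
print(space_specific_response(50.0, 5.0, 50.0, 5.0))    # both cues match
print(space_specific_response(50.0, -5.0, 50.0, 5.0))   # only the ITD matches
print(space_specific_response(-50.0, 5.0, 50.0, 5.0))   # only the IID matches
```

Because the two terms are multiplied, a match in only one cue produces almost no response, which gives the neuron a receptive field restricted in both azimuth and elevation.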

Significance of asymmetrical ears for localization of elevation

The ears of many species of owls are asymmetrical. For example, in barn owls (Tyto alba), the placement of the two ear flaps (opercula) lying directly in front of the ear canal opening is different for each ear. This asymmetry is such that the center of the left ear flap is slightly above a horizontal line passing through the eyes and directed downward, while the center of the right ear flap is slightly below the line and directed upward. In two other species of owls with asymmetrical ears, the saw-whet owl and the long-eared owl, the asymmetry is achieved by different means: in saw-whet owls, the skull is asymmetrical; in the long-eared owl, the skin structures lying near the ear form asymmetrical entrances to the ear canals, which is achieved by a horizontal membrane. Thus, ear asymmetry seems to have evolved on at least three different occasions among owls. Because owls depend on their sense of hearing for hunting, this convergent evolution in owl ears suggests that asymmetry is important for sound localization in the owl.

Ear asymmetry causes sound originating from below eye level to sound louder in the left ear, and sound originating from above eye level to sound louder in the right ear. Asymmetrical ear placement also causes IID for high frequencies (between 4 kHz and 8 kHz) to vary systematically with elevation, converting IID into a map of elevation. Thus, it is essential for an owl to have the ability to hear high frequencies. Many birds have the neurophysiological machinery to process both ITD and IID, but because they have small heads and low frequency sensitivity, they use both parameters only for localization in the azimuth. Through evolution, the ability to hear frequencies higher than 3 kHz, the highest frequency of owl flight noise, enabled owls to exploit elevational IIDs, produced by small ear asymmetries that arose by chance, and began the evolution of more elaborate forms of ear asymmetry.[10]

Another demonstration of the importance of ear asymmetry in owls is that, in experiments, owls with symmetrical ears, such as the screech owl (Otus asio) and the great horned owl (Bubo virginianus), could not be trained to locate prey in total darkness, whereas owls with asymmetrical ears could be trained.[11]

Neural interactions

In vertebrates, interaural time differences are known to be calculated in the superior olivary nucleus of the brainstem. According to Jeffress,[12] this calculation relies on delay lines: neurons in the superior olive which accept innervation from each ear with different connecting axon lengths. Some cells are more directly connected to one ear than the other, and are thus specific for a particular interaural time difference. This theory is equivalent to the mathematical procedure of cross-correlation. However, because Jeffress' theory is unable to account for the precedence effect, in which only the first of multiple identical sounds is used to determine the sounds' location (thus avoiding confusion caused by echoes), it cannot fully explain the response.
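The cross-correlation view of the delay-line model can be made concrete: each candidate lag plays the role of one coincidence-detector cell, and the lag with the largest correlation is taken as the interaural time difference. The sketch below is a plain discrete cross-correlation in pure Python, not a model of real neurons; the 48 kHz sample rate and the toy click signals are assumptions.

```python
def best_lag(left, right, max_lag):
    """Lag (in samples) that maximizes the cross-correlation of two ear signals.

    A positive result means the right-ear signal lags the left-ear signal.
    """
    best, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):  # one "coincidence cell" per lag
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best, best_score = lag, score
    return best


SAMPLE_RATE_HZ = 48000  # assumed sample rate
click = [1.0, -0.5, 0.25, -0.125]
left_ear = [0.0] * 50
left_ear[10:14] = click
right_ear = [0.0] * 50
right_ear[13:17] = click  # same click, arriving 3 samples later

lag_samples = best_lag(left_ear, right_ear, max_lag=10)
print(lag_samples, "samples =", lag_samples / SAMPLE_RATE_HZ * 1e6, "microseconds")
```

Like the Jeffress model itself, this procedure has no notion of which sound arrived first, so an echo that is as strong as the direct sound would compete with it; this is the limitation with respect to the precedence effect noted above.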

Neurons sensitive to ILDs are excited by stimulation of one ear and inhibited by stimulation of the other ear, such that the response magnitude of the cell depends on the relative strengths of the two inputs, which in turn depend on the sound intensities at the ears.

In the auditory midbrain nucleus, the inferior colliculus (IC), many ILD-sensitive neurons have response functions that decline steeply from maximum to zero spikes as a function of ILD. However, there are also many neurons with much shallower response functions that do not decline to zero spikes.

Processing of head-related transfer functions for biological sound localization occurs in the auditory cortex.