The Sound Attributes

(Mario Bon – May 2012)



Since loudspeakers are placed in a room, we are interested in the sound that a pair of speakers produces in the "actual use condition": in our home, with our music, connected to our amplifier and player. With regard to the speakers, the problems are how to choose them and how to arrange the room to achieve the desired result. The technical tests published by hi-fi magazines should help us choose speakers (and other devices) according to our needs. But how well do technical tests represent sound quality? Or: what is the probability of obtaining, in our living room, the sound quality described by reviewers?

The answer involves sound quality, measurements and room acoustics.


We will see that, due to some important limitations regarding loudspeakers, the most we can hope to determine is not the certainty but the probability of obtaining the desired sound quality at home.


In Figure 1 there are three sets: Devices, Measurements and Perceptions. The devices are players, amplifiers, speakers, etc. It seems natural to read Figure 1 from left to right:


- first pick a device (an amplifier)

- then execute a series of tests (frequency response, distortion, power, etc.)

- finally, relate the test results to the sound quality.


This method does not work completely because the correspondence between sound quality and measurement results is not strong enough. When the measurement results are very poor, the sound quality is usually very poor too, but even in this case there may be exceptions (such as a zero-feedback single-ended triode amplifier).


Let us try to read Figure 1 from right to left:


- first determine the "attributes of sound perception" or "attributes of sound"

- then identify the physical quantities that represent them (and design the tests to quantify them)

- finally, choose the suitable tests for the device under test.


Figure 1: Correspondence between Devices, Measurements and Perceptions (Amar G. Bose, Technology Review, Volume 75, Number 7, June 1973 and Number 8, July/August 1973)


The "attributes of sound" are adjectives we use to describe "the sound."


In the article "Subjective Rank-Orderings and Acoustical Measurements for Fifty-Eight Concert Halls", Beranek explains that, historically, the sound attributes were selected through listening sessions of music recorded in empty theaters (in order to characterize the sound of the theaters). It is important to note, however, that the attributes were defined by listening through a pair of stereo speakers. With a perfect recording, the attributes therefore describe the sound of the speakers.


Beranek defined a large number of attributes. Over the years other authors reduced the number to as few as three or four (Ando). Today, ISO standards define measurement procedures for five quantities directly related to the attributes of sound. This suggests that the path opened by Beranek is sound and widely shared.


With regard to the sound produced by a pair of speakers, the attributes are (in this order): Warmth, Listening Fatigue, Loudness, Clarity, Spaciousness and Brightness. To fully characterize the speaker, the "interface properties" must be added. Interfacing is not a characteristic of the sound produced by the speaker, but it matters when the speaker is placed in a room and connected to an amplifier.


Since the attributes of sound should correspond to measurements of physical quantities, it is important that they be, as far as possible, independent of each other: one (or more) measurements may correspond to a single attribute, but the opposite should not happen. It would be interesting to spend more time on the orthogonality of the attributes (a relevant methodological aspect). Here it is enough to say that the attributes should be evaluated in the order in which they were listed, and that each attribute is a necessary condition for the following one.
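The idea that each attribute is a necessary condition for the next one can be sketched in a few lines of Python. This is only an illustration: the 0 to 10 vote scale (higher is better, also for Listening Fatigue), the pass mark of 6 and the function name are assumptions, not part of any published procedure.

```python
# Attribute order as listed in the text. The vote scale and the pass mark
# are illustrative assumptions.
ATTRIBUTE_ORDER = ["Warmth", "Listening Fatigue", "Loudness",
                   "Clarity", "Spaciousness", "Brightness"]


def evaluate_in_order(votes, pass_mark=6):
    """Walk the attributes in order and stop at the first one that fails.

    Returns (list of attributes that passed, first failing attribute or None).
    """
    passed = []
    for name in ATTRIBUTE_ORDER:
        if votes[name] < pass_mark:
            return passed, name
        passed.append(name)
    return passed, None


# Hypothetical votes: Loudness fails, so the later attributes are not judged
# even though their individual votes are high.
votes = {"Warmth": 8, "Listening Fatigue": 7, "Loudness": 5,
         "Clarity": 9, "Spaciousness": 8, "Brightness": 7}
passed, blocker = evaluate_in_order(votes)
print("passed:", passed)
print("blocked by:", blocker)
```

In this sketch a weak early attribute blocks the judgement of the later ones, which is one possible reading of "necessary condition".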


Before describing the individual attributes, remember that the direct sound is the sound that reaches the listener without undergoing any reflection. The reflected sound, instead, reaches the listener after at least one reflection. To measure the direct sound it is sufficient to place a microphone in front of the speaker (in an anechoic room). To measure the reflected sound, many measurements in the space around the speaker are needed. From this point of view the attributes can be divided into one-dimensional (direct sound: Warmth, Listening Fatigue, Loudness, Clarity) and three-dimensional (reflected sound: Spaciousness and Brightness).

Let us now see the attributes one by one.



Warmth is related to the balance (or timbre) of the direct sound. Too much bass makes the sound "warmer"; an excess of treble makes it "cold". The right Warmth also requires an appropriate extension of the frequency response (minimum and maximum frequency reproduced). The measurement that represents Warmth is the on-axis frequency response of the loudspeaker, taken at an appropriate distance.
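As a rough illustration of how the bass/treble balance could be read from an on-axis response, the sketch below fits a straight line to level (dB) against frequency (in octaves). The function name and the sample data are hypothetical, and a single slope figure is a simplifying assumption, not an established Warmth metric.

```python
import math


def spectral_tilt_db_per_octave(freqs_hz, levels_db):
    """Least-squares slope of level (dB) versus log2(frequency).

    A negative slope means the level falls towards the treble ("warmer"),
    a positive slope means it rises towards the treble ("colder").
    """
    xs = [math.log2(f) for f in freqs_hz]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(levels_db) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, levels_db))
    return sxy / sxx


# Hypothetical on-axis data, one point per octave from 100 Hz to 12.8 kHz,
# falling by 1 dB per octave.
freqs = [100 * 2 ** i for i in range(8)]
levels = [90.0 - i for i in range(8)]
print(round(spectral_tilt_db_per_octave(freqs, levels), 3), "dB/octave")
```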


In actual use conditions the listener receives the direct sound first and, after a short interval called the ITG (Initial Time Gap), the first reflected sound. The direct sound is the same in every room. The reflected sound depends on several factors:


- the dispersion of the loudspeaker,

- the characteristics of the room,

- the relative position between speaker and listener in the room.


The effects of the direct sound and of the reflected sound are "separable", as is easy to verify by moving the speakers to different rooms (or to different positions in the same room). The direct sound is always the same, while the overall sound quality changes; this change is due to the different "boundary conditions".
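The Initial Time Gap introduced above can be estimated from the geometry of the room. The sketch below is a minimal example in two dimensions (plan view), assuming a single reflection from one flat side wall and using the mirror-image source construction. The speed of sound, the coordinates and the function name are assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def initial_time_gap_ms(speaker, listener, wall_x):
    """ITG in milliseconds for one reflection off the wall at x = wall_x.

    speaker and listener are (x, y) positions in metres (plan view).
    The reflected path length equals the distance from the mirror image
    of the speaker (reflected across the wall) to the listener.
    """
    image = (2.0 * wall_x - speaker[0], speaker[1])
    direct_path = math.dist(speaker, listener)
    reflected_path = math.dist(image, listener)
    return 1000.0 * (reflected_path - direct_path) / SPEED_OF_SOUND


# Hypothetical layout: side wall at x = 0, listener at (2, 3).
listener = (2.0, 3.0)
near_wall = initial_time_gap_ms((1.0, 0.0), listener, wall_x=0.0)
far_from_wall = initial_time_gap_ms((1.5, 0.0), listener, wall_x=0.0)
print(f"speaker 1.0 m from wall: ITG = {near_wall:.2f} ms")
print(f"speaker 1.5 m from wall: ITG = {far_from_wall:.2f} ms")
```

In this layout, moving the speaker away from the wall lengthens the ITG while the direct path changes only slightly, which illustrates how the "boundary conditions" change with placement.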


Listening fatigue

Listening Fatigue is a consequence of the work the brain does to understand a sound message that is unclear or ambiguous. It is caused by any phenomenon that masks the original signal: long reverberation, non-linear distortion, a boost at high frequencies or a frequency range that is too limited (intercom, megaphone, etc.). If the Warmth is decent, Listening Fatigue is caused largely by the non-linear distortion of the speaker system (drivers and crossover).


Non-linear distortion can be "audible" or "inaudible". Audible distortion may be "tolerable" or "intolerable". Tolerable distortion, in turn, may be "stationary" or "shape distortion" (distortion that rounds or clips the peaks of the signal).



The "inaudible" and "intolerable" cases are the least important, since both situations are immediately apparent. Tolerable distortion defines a wide "gray area" characterized by an infinite number of shades. The Listening Fatigue produced by a speaker also depends on the musical genre.


Consider the piece "Also sprach Zarathustra". At the beginning you hear only the sustained 32 Hz C; then the trumpets enter, and so on. With a two-way system using a 5" woofer, the 32 Hz C demands a large cone excursion from the woofer, which causes strong intermodulation distortion in the midrange. The same speaker may reproduce a string quartet beautifully.


The same passage reproduced by a three-way system will sound much better, because the excursion of the woofer does not interfere with the work of the midrange driver. For these reasons, measuring harmonic distortion is not enough: we must also measure intermodulation distortion.


Unfortunately, with speakers you cannot measure distortion in "actual operating conditions" using music as the stimulus (which is possible with amplifiers). What we can do is create a stimulus (a test signal) that is as close as possible to music. The present trend is to use multi-tone stimuli (the superposition of a number of pure sine waves). This measurement (which captures distortion under stationary conditions) shows that, with a "complicated" stimulus, intermodulation distortion is predominant and depends heavily on the level and extension of the low frequencies. For the moment we leave aside "shape distortion", which, fortunately, is easily tolerated with good-quality recordings (if the peaks are short and separated in time).
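The principle of a multi-tone measurement can be sketched in Python. The stimulus is a sum of sine waves placed on exact DFT bins; after it passes through a non-linear device, any power found away from the stimulus bins is distortion (harmonics and intermodulation products). The "driver" below is a hypothetical memoryless polynomial, chosen only to produce distortion products. It does not model the excursion-dependent behaviour of a real woofer, and the bin numbers and coefficients are arbitrary assumptions.

```python
import math


def multitone(bins, n_samples):
    """Sum of equal-amplitude sines on integer DFT bins, normalized to peak 1."""
    x = [sum(math.sin(2 * math.pi * k * i / n_samples) for k in bins)
         for i in range(n_samples)]
    peak = max(abs(v) for v in x)
    return [v / peak for v in x]


def bin_power(x, k):
    """Mean-square power carried by DFT bin k (0 < k < n/2) of a real signal."""
    n = len(x)
    re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
    return 2.0 * (re * re + im * im) / (n * n)


def hypothetical_driver(x, a2=0.05, a3=0.02):
    """Illustrative non-linearity: y = x + a2*x^2 + a3*x^3 (an assumption)."""
    return [v + a2 * v * v + a3 * v ** 3 for v in x]


def distortion_ratio(y, bins):
    """Power outside the stimulus bins (DC excluded) over power inside them."""
    n = len(y)
    total = sum(v * v for v in y) / n
    dc = sum(y) / n
    wanted = sum(bin_power(y, k) for k in bins)
    return max(0.0, total - dc * dc - wanted) / wanted


TONES = [3, 50, 71]            # one low tone and two midrange tones (bins)
stimulus = multitone(TONES, 512)
response = hypothetical_driver(stimulus)

print("stimulus distortion:", distortion_ratio(stimulus, TONES))
print("response distortion:", distortion_ratio(response, TONES))
# Bin 47 = 50 - 3 is an intermodulation product, absent from the stimulus.
print("power at bin 47:", bin_power(response, 47))
```

Placing the tones on exact bins keeps the analysis free of spectral leakage, so no window function is needed in this sketch.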


The distortion measurement, rather than classifying speakers as "good" or "bad", identifies the musical genre best suited to a particular speaker.



Loudness is represented by the maximum SPL produced by the loudspeaker. If the speaker does not have enough Loudness, it will not be able to reproduce the dynamics of the recording, and the weaker sounds (the details) will be masked by environmental noise. In simple terms, if the Loudness is not enough, you cannot turn up the volume as needed (or as you like). For this reason too, the first requirement of a listening room is silence. Before changing speakers or amplifier you should try to reduce the environmental noise (for example with double-glazed windows). This reduces the required Loudness and also saves on heating costs. Another quantity directly related to Loudness is the volume displacement (the amount of air "moved" by the loudspeaker). Note that dynamics are not a quality of the speaker but of the recording: the speaker must have enough Loudness to reproduce the dynamics of the recording.
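The two quantities mentioned above can be put into numbers with a short sketch. The first function assumes, as a simplification, that the quietest passage of the recording must sit exactly at the level of the room noise. The second uses the standard low-frequency point-source approximation for a sealed piston radiating into half space, p_peak = rho * (2*pi*f)^2 * Vd / (2*pi*r). The air density, the driver data and the function names are assumptions for illustration only.

```python
import math

AIR_DENSITY = 1.18        # kg/m^3, assumed
REFERENCE_PRESSURE = 20e-6  # Pa, the usual 0 dB SPL reference


def required_max_spl(noise_floor_db, dynamic_range_db):
    """Peak SPL needed so that the quietest passage is not below the noise."""
    return noise_floor_db + dynamic_range_db


def displacement_limited_spl(cone_area_m2, peak_excursion_m, freq_hz,
                             distance_m=1.0):
    """Maximum SPL allowed by the volume displacement Vd = area * excursion.

    Assumes a sealed box acting as a point source in half space, which is
    reasonable only at low frequencies.
    """
    volume_displacement = cone_area_m2 * peak_excursion_m
    omega = 2.0 * math.pi * freq_hz
    p_peak = (AIR_DENSITY * omega ** 2 * volume_displacement
              / (2.0 * math.pi * distance_m))
    p_rms = p_peak / math.sqrt(2.0)
    return 20.0 * math.log10(p_rms / REFERENCE_PRESSURE)


# Hypothetical small woofer: 133 cm^2 cone area, 5 mm peak excursion.
spl_32 = displacement_limited_spl(0.0133, 0.005, 32.0)
spl_64 = displacement_limited_spl(0.0133, 0.005, 64.0)
print(f"limit at 32 Hz: {spl_32:.1f} dB, at 64 Hz: {spl_64:.1f} dB")
print("needed in a 40 dB room:", required_max_spl(40, 60), "dB")
print("needed in a 30 dB room:", required_max_spl(30, 60), "dB")
```

Under these assumptions each halving of frequency costs about 12 dB of displacement-limited output, and every decibel of noise removed from the room lowers the required Loudness by the same amount.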



Clarity has two aspects:


- Horizontal Definition (ability to distinguish notes played quickly)

- Vertical Definition (ability to distinguish different instruments playing at once).


To evaluate Clarity subjectively, just listen to a piece of music with very fast passages and/or with many different instruments playing together.


Clarity depends on the correct reproduction of transients (or on the phase response). Cabinet resonances, cone break-up and edge diffraction all lower the Clarity. Clarity also depends on how each driver is used within the speaker system. The measurement related to Clarity is the waterfall (or the wavelet analysis).

In a minimum-phase system, according to linear systems theory, Warmth and Clarity are two sides of the same coin. Unfortunately, even under the best conditions, a speaker cannot be made completely minimum-phase (because of edge diffraction).
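The waterfall (cumulative spectral decay) mentioned above can be sketched as follows: the impulse response is analysed repeatedly, each time starting a little later, so that the spectrum shows how quickly energy dies away. For brevity this example follows a single frequency bin (one "ridge" of the waterfall) and uses a synthetic decaying sine as a stand-in for a cabinet or cone resonance. The resonance model, the decay constants and the function names are assumptions.

```python
import cmath
import math


def hypothetical_resonance(n_samples, bin_index, decay_samples):
    """Synthetic impulse response: an exponentially decaying sine wave."""
    return [math.exp(-i / decay_samples)
            * math.sin(2 * math.pi * bin_index * i / n_samples)
            for i in range(n_samples)]


def slice_level_db(impulse, bin_index, start):
    """Level (dB) at one DFT bin using only the samples from 'start' onward."""
    n = len(impulse)
    acc = sum(impulse[i] * cmath.exp(-2j * math.pi * bin_index * i / n)
              for i in range(start, n))
    return 20.0 * math.log10(abs(acc))


def waterfall_ridge(impulse, bin_index, starts):
    """Levels at successive start times, relative to the first slice."""
    reference = slice_level_db(impulse, bin_index, starts[0])
    return [slice_level_db(impulse, bin_index, s) - reference for s in starts]


starts = list(range(0, 129, 16))   # start offsets in samples
slow = waterfall_ridge(hypothetical_resonance(256, 32, 30.0), 32, starts)
fast = waterfall_ridge(hypothetical_resonance(256, 32, 10.0), 32, starts)

for s, a, b in zip(starts, slow, fast):
    print(f"start {s:3d}: slow resonance {a:7.1f} dB, fast resonance {b:7.1f} dB")
```

A resonance that rings for a long time shows up as a ridge that decays slowly. A complete waterfall repeats this calculation for every frequency bin and usually applies a smooth window to each slice.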




Spaciousness represents the ability of a pair of speakers to reproduce the sound event in three dimensions (width, height and depth). In order to perceive Spaciousness, the Warmth, Listening Fatigue, Loudness and Clarity must be excellent; above all, however, the spatial information must be present in the recording.


First of all, the virtual center channel must be in the center of the stereo pair.

This happens if the two speakers are equal (same sensitivity and frequency response) and mirrored.


This ensures the symmetry of the direct sound field and the correct reconstruction of the horizontal distribution of the virtual sources between the speakers. The reconstruction of depth depends on the recording, while the reconstruction of the height of the virtual sources is typical of the speaker used (the stereo recording contains no information on the height of the sources).


If the Spaciousness is not as it should be, it means that Warmth, Listening Fatigue or Clarity is not optimal (assuming that the room and the position in the room are not a limitation).


Speakers can be classified, in terms of radiation, into five categories:


- Omnidirectional

- Dipole

- Direct radiation with auxiliary sources

- Direct radiation

- Directional (line array, horn)


Each of these proposes a different model of Spaciousness. For example, omnidirectional speakers tend to expand the sound stage beyond the distance between the left and right speakers.

The goal is not to reproduce an arbitrary model of space but the one present in the recording. However, many people prefer dipole or omnidirectional systems even though they are very different from those used to monitor the recordings. Spaciousness cannot be measured directly, but two necessary conditions for achieving it can be stated: the speakers of a pair must be mirrored, and the left and right speakers must be "identical" (same sensitivity and frequency response).
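The second condition (left and right speakers "identical") lends itself to a simple check on measured data. The sketch below compares two on-axis responses point by point. The 1 dB tolerance, the sample values and the function name are illustrative assumptions, not a published criterion.

```python
def pair_is_matched(left_db, right_db, tolerance_db=1.0):
    """Compare two responses sampled at the same frequencies.

    Returns (True if every point agrees within the tolerance, worst deviation).
    Because absolute levels are compared, a sensitivity mismatch is detected
    as well as a difference in the shape of the response.
    """
    worst = max(abs(l - r) for l, r in zip(left_db, right_db))
    return worst <= tolerance_db, worst


# Hypothetical measurements (dB SPL) at three frequencies.
left = [88.0, 89.0, 90.0]
right_close = [88.5, 89.0, 90.5]
right_off = [88.0, 91.0, 90.0]

print(pair_is_matched(left, right_close))
print(pair_is_matched(left, right_off))
```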


With regard to horizontal and vertical dispersion, Toole recommends maintaining a smooth frequency response (free of peaks and dips) in the off-axis region as well. This ensures the homogeneity of the spectral content of the first lateral reflections. At the same time Toole says that the faults of a speaker can be mitigated by limiting the lateral reflections (making the side walls sound-absorbing).


This is one of the aspects that prevent a satisfactory prediction of the quality of reproduction in a domestic environment. And this is why all we can hope to say is that, if certain conditions are met, the probability of obtaining good reproduction in a normal environment is higher or lower.


Some advantage, in terms of both Spaciousness and Clarity, is obtained by using auxiliary sources, provided that they do not degrade the direct sound.



Brightness is a quality that depends on the persistence of high frequencies in the room. One way to achieve it is to exploit the non-homogeneous distribution of absorbent material in the room through the use of auxiliary sources (located on the rear panel of the speaker). One of the first to adopt this solution, though for different purposes, was Amar G. Bose with the 901 Series (9 wideband drivers: one on the front panel and 8 facing the rear wall). The aim was to increase the Listener Envelopment, even at the cost of a poor impulse response. Brightness depends heavily on the room and is not easy to predict. Subjectively, we can evaluate the effect of an auxiliary source by comparing the reproduction with and without it. Whatever we do, we must preserve the integrity of the direct sound.



The "interface" is not an attribute of sound, but it is an important feature of the speaker that contributes to its quality. The most important aspects of the interface are those "upstream", toward the amplifier (such as electrical impedance and sensitivity), and those "downstream", toward the listening environment. Each speaker has its own characteristics, which may make its pairing with a given environment more or less favorable. For example, a loudspeaker in a sealed box is "simpler" to set up than a reflex system. For brevity, on the subject of electrical impedance, let us recall the old DIN 45500 standard, which sets the lowest impedance allowed for a loudspeaker. It would be enough to respect it.


The definition of the attributes of sound provides the "road map" for evaluating the quality of a loudspeaker system: it fixes the correspondence with measurement results and limits the number of adjectives used to describe the sound. It is not difficult to give each attribute a vote and obtain a numerical evaluation. This is not enough, because hearing is a holistic experience while measurements are deterministic.

In the deterministic world 1+1=2; in the holistic world 1+1=3 or more. Measurements evaluate quantities one by one. Our auditory system evaluates sound as a whole. This "whole" is the message carried by the sound. So there is an infinite number of different ways to recognize the "message".


For example, we can recognize a voice from a wonderful loudspeaker, a small radio or an intercom. The definition of the attributes of sound is a step forward but, at the moment, any measurement must be validated by a listening session. The opposite is also true.