SHS 311 Exam 3 Review

Psychoacoustic Tests

Comodulation Masking Release (CMR) and Auditory Streaming. In CMR, the ability to detect a signal tone (S) in a target band of noise (TB) is improved by adding a cue band of noise (CB) with the same temporal modulation as the target band. The comodulated cue band may provide additional information about when the noise level is low, making it easier to detect the signal.

Masking

In the everyday world, many sound sources exist at or nearly at the same time, and the sound from one source can interfere with our ability to process the sound from another source. Masking is one aspect of that interference. Masking is usually defined as the increase in the detection threshold of one sound (the SIGNAL) due to the presence of one or more other sounds (the MASKER or MASKERS). The detection threshold of the signal is determined without the masker (in quiet), and then in the presence of the masker. If the presence of the masker increases the detection threshold of the signal over that obtained in quiet, the masker has masked the signal: the signal is harder to hear because the masker interferes with it. Masking can also be measured as the level of a masker required to barely mask a short-duration, soft signal, or for the listener to obtain a threshold level of signal detection performance. With this measure, more masking means a lower masker level is required to mask the signal. Note that in the measurement of a masking pattern, the masker is fixed, while the signal is varied in amplitude and frequency. However, in the measurement of a tuning curve, the signal is fixed, while the masker is varied in amplitude and frequency.

Auditory Streaming

In this demonstration, a 400-Hz tone (tone A) is alternated with a tone of a higher frequency (tone B: 504 or 713 Hz). The tones are presented in the temporal sequence shown in the figure. For each comparison you are to decide whether the perception is one of a single source whose pitch changes in a "galloping" manner or of two sources, each with its own pitch, running side by side. If the frequency difference is small, you perceive a single stream; if the frequency difference is large, you perceive two streams. Slower rate = 1 source; faster rate = 2 sources.

Psychometric Function

Psychometric functions relate a measure of listener's performance (e.g., percent of time that a stimulus is detected) to a measure of the stimulus (e.g., stimulus intensity level). Thresholds are derived from psychometric functions as indicated above (e.g., the level at which the stimulus is detected 50% of the time).

Receiver Operating Characteristic ROC

The Hits vs. False Alarms are plotted on an ROC curve. Each ROC curve represents a different Sensitivity, while points on an ROC curve represent different Response Bias. Thus, a measure, like the area under an ROC curve, is a measure of Sensitivity that does not change with Bias.
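
As a rough illustration of the area measure, the sketch below applies the trapezoidal rule to a few (false-alarm rate, hit rate) points. The points and the function name `roc_area` are invented for illustration and are not data from the course.

```python
def roc_area(points):
    """Trapezoidal area under an ROC curve.

    points: iterable of (false_alarm_rate, hit_rate) pairs.
    """
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical observer: each point reflects a different response bias.
observer = [(0.0, 0.0), (0.1, 0.5), (0.3, 0.8), (0.6, 0.95), (1.0, 1.0)]
# Chance performance: hits equal false alarms (the diagonal).
chance = [(0.0, 0.0), (1.0, 1.0)]

print(round(roc_area(observer), 4))
print(roc_area(chance))
```

Moving along one curve (changing bias) does not change the area; only a different curve (different sensitivity) does.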

Theory of Signal Detection (TSD)

The Theory of Signal Detection, perhaps the most successful theory of decision making, generates a method that can produce a measure of sensitivity (e.g., a threshold) that is not influenced by Bias. Let's change the Method of Constant Stimuli: half of the trials (randomly selected) contain no signal, while the other half contain signal. The trials without signal are "catch" trials that can be used to detect response bias.

Threshold Increases with Short Duration

The auditory system seems to be a constant-energy detector. Energy = Power x Duration, so in decibels: Energy (dB) = Power (dB) + 10 log10(Duration). When duration is halved, 10 log10(Duration) is reduced by 3 dB, thus Power must be increased by 3 dB to keep Energy constant. The energy detector seems to have a time window of ~300 ms.
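
The constant-energy trade can be checked numerically. This is a minimal sketch that assumes both durations fall inside the ~300 ms integration window; the function name `power_change_db` is hypothetical.

```python
import math

def power_change_db(old_duration_ms, new_duration_ms):
    """dB change in power needed to keep energy constant
    when duration changes (both durations assumed within the window)."""
    return -10.0 * math.log10(new_duration_ms / old_duration_ms)

# Halving the duration calls for about +3 dB of power.
print(round(power_change_db(200, 100), 2))
# A tenfold reduction in duration calls for +10 dB.
print(round(power_change_db(300, 30), 2))
```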

Objective Psychophysics Measures

The change that lifts the sensation over consciousness.

Auditory Filter Banks

The filters have the same shape on the log frequency scale (left) but not on the linear frequency scale (right). The bandwidth in Hz increases with center frequency.

Auditory Distance Estimation

The level of sound decreases as the distance between the source and ear increases. Under ideal conditions, far away sounds are less intense than near sounds. However, the sound level can only be a cue for distance if one knows a priori what the level of the sound source is. For example, if one knows how loud a person is talking and the sound level decreases, it is reasonable to assume that the person has moved further away. The other possible cue for distance is based on the observation that a near sound source in a reverberant room will reach the ear without too much contribution from the reflection. On the other hand, a far away sound will reach the ear along with a lot of the reflected sound. If the ratio of the direct to reverberant sound level is high, the sound source is likely to be close. Research shows that distance judgments are better when reflections occur. Overall, judging the distance of a sound source is much worse than judging either its azimuth or vertical location.

Difference threshold:

The minimum change from a constant stimulus parameter value needed to produce a noticeable perceptual difference. In audition, the threshold measure applies to sound features such as frequency, intensity, amplitude modulation, minimal audible angle, etc.

Absolute threshold

The stimulus parameter value that an observer can barely detect. In audition, the threshold measure applies to sound features such as frequency, intensity, amplitude modulation, minimal audible angle, etc.

Shift of the Residue Pitch

(2) The pitch of a complex sound is also not determined by the constant spacing between spectral components. The "shift of the residue" stimulus is created by shifting each harmonic of a 100-Hz fundamental frequency by 25 Hz. The new stimulus has a component spacing of 100 Hz and a fundamental frequency of 25 Hz. However, the pitch of this stimulus is 104 Hz (the pitch shift is only 4 Hz).
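
A small sketch of the stimulus arithmetic: the component spacing stays at 100 Hz while the true fundamental (greatest common divisor) drops to 25 Hz. The choice of harmonics 4 through 8 is an assumption made for illustration; the card does not say which harmonics were used.

```python
from functools import reduce
from math import gcd

F0_HZ = 100
SHIFT_HZ = 25
harmonic_numbers = range(4, 9)  # hypothetical choice of harmonics

# Shift every harmonic of 100 Hz by 25 Hz.
shifted = [n * F0_HZ + SHIFT_HZ for n in harmonic_numbers]
# Spacing between neighbouring components is unchanged.
spacing = {b - a for a, b in zip(shifted, shifted[1:])}
# The fundamental is the greatest common divisor of the components.
fundamental = reduce(gcd, shifted)

print(shifted, spacing, fundamental)
```

Neither value matches the reported 104-Hz pitch, which is the point of the demonstration.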

Iterated Ripple Noise Pitch

(3) The periodicity of the temporal envelope is also not required for pitch perception of a complex sound. The iterated ripple noise (IRN) has no periodic structure in the time domain (a), while its spectrum (b) has noisy peaks at 500 Hz, 750 Hz, 1000 Hz, etc. Yet this stimulus has a 250-Hz pitch, because all these spectral peaks are harmonics (integer multiples) of 250 Hz.

Noise Masked Tonal Thresholds

1) Masked thresholds increase with increasing signal (tonal) frequency. 2) Each 10-dB increase in the spectrum level (No) of the noise results in about a 10-dB increase in masked threshold. In general, if the noise spectrum level is increased by x dB, the signal energy must also be increased by x dB to be detected. As such, the dB difference between signal energy (E) and noise spectrum level (No) remains constant for signal detection: 10 log10(E) - 10 log10(No) = Constant, i.e., 10 log10(E/No) = Constant.

Two Types of Localization Tasks

1. Identification - how well does the perceived location align with the presented location of a sound source? Perceived location of a broadband noise as a function of the presented location. For a broadband noise containing a range of frequencies, both ILDs and ITDs are useable cues. The straight line represents perfect performance. Sound localization of normal hearing listeners in the azimuth plane is pretty good. 2. Discrimination - How well can subjects detect a small change in the sound location? To determine the smallest separation between two otherwise identical sound sources that can be discriminated. This smallest discriminable separation is called the Minimal Audible Angle (MAA).

Cancellation Method

A clever technique called the cancellation method can be used to measure the amplitude, frequency, and starting phase of such audible distortion products. In order for the cancellation tone (CT) to cancel the cubic difference tone (CDT, 2f1-f2), the CT must have the same frequency and level as the CDT and be 180° out of phase with it. If the cubic difference tone disappears when the CT has a frequency of 400 Hz, a level of 30 dB SPL, and a starting phase of 60 degrees, then the cubic difference tone must have a frequency of 400 Hz, a level of 30 dB SPL, and a starting phase of 240 degrees (180° + 60° = 240°).
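
The logic of the example can be sketched in a few lines: infer the CDT parameters from the cancellation tone, then confirm that two equal-amplitude sinusoids 180° apart sum to (numerically) zero. The function names are hypothetical, and the tones are idealized unit-amplitude sinusoids.

```python
import math

def cdt_from_cancellation(ct_freq_hz, ct_level_db, ct_phase_deg):
    """Infer CDT frequency, level, and phase from the tone that cancels it."""
    return ct_freq_hz, ct_level_db, (ct_phase_deg + 180.0) % 360.0

def residual_peak(freq_hz, phase_a_deg, phase_b_deg, samples=1000):
    """Peak magnitude of the sum of two unit sinusoids over one period."""
    peak = 0.0
    for i in range(samples):
        t = i / (samples * freq_hz)
        a = math.sin(2 * math.pi * freq_hz * t + math.radians(phase_a_deg))
        b = math.sin(2 * math.pi * freq_hz * t + math.radians(phase_b_deg))
        peak = max(peak, abs(a + b))
    return peak

freq, level, phase = cdt_from_cancellation(400, 30, 60)
print(freq, level, phase)
print(residual_peak(400, 60, phase) < 1e-9)   # cancelled
print(residual_peak(400, 60, 60) > 1.9)       # in phase: the sum doubles instead
```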

Auditory Filters

A general theory has been proposed to understand and explain the masking data, regardless of the measurements and maskers used. For the detection of a signal, the listener monitors the output of an "internal auditory filter" centered on the signal frequency. The amount of masking for the signal is determined by the amount of masker power coming through the "internal auditory filter". Signal detection performance is determined by the signal-to-masker ratio at the output of the "internal auditory filter". In other words, the amount of masking is set after the masker and signal have passed through the internal auditory filter.

Loudness Recruitment

A person with hearing loss would require a higher sound level to just detect the sound than a person with normal hearing. For the person with hearing loss, less intense sounds are difficult to hear, while more intense sounds are just as loud as for a person with normal hearing. Loudness recruitment (curve B) is the steeper-than-normal (curve A) growth of loudness with increased sound level. The dynamic range over which loudness changes is reduced due to recruitment, which has important implications for hearing aid fitting. Loudness recruitment is usually associated with Outer Hair Cell loss or damage. People with hearing loss experience it as a sudden jump in loudness once the sound level rises above their threshold.

Ambiguity of ITDs for High Frequencies

A sinusoid with a frequency of 1666 Hz is presented from 75° on the right. The left-ear signal arrives later than the right-ear signal by 0.6 msec, which is one period of 1666 Hz. After the first period, the left- and right-ear signals look the same (in phase), which may cause confusion that the sound is directly in front. Similar ambiguity exists for all frequencies above 1666 Hz.

Vertical Localization Performance

Acuity (sharpness in hearing) is not as great in judging vertical locations as it is for judging azimuth locations.

Loudness Scaling

Another method to measure loudness is a Scaling procedure. A subject is told that a 40-dB SPL, 1000-Hz tone (40 Phons) should be assigned a loudness number of 1. In each trial, the subject is presented with the standard tone followed by a comparison tone of a different level. The task is to indicate the loudness of the comparison tone with a number: if it is "n" times as loud as the standard tone, it is assigned a loudness number of "n". Loudness measured on such a ratio scale is in a unit called the Sone (Sones = number of times louder than the standard). Example: If a 1000-Hz Comparison tone of 50-dB SPL is judged to be twice as loud as the 1000-Hz, 40-dB SPL Standard tone, this 1000-Hz Comparison tone of 50-dB SPL is said to have a loudness of 2 Sones. For sound levels above 40 dB SPL (or 40 Phons), the loudness doubles when the level is increased by 10 dB. For sound levels below 40 dB SPL (or 40 Phons), the loudness decreases more rapidly as the level decreases. How much does the physical level of a sound have to change in order for its loudness to be doubled?
- Twice in micropascals (6-dB difference, e.g., from 34 to 40 dB SPL)
- Twice in decibels (e.g., from 20 to 40 dB SPL)
- 10-dB difference (e.g., from 30 to 40 dB SPL)
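
The doubling rule above 40 Phons can be written as a one-line conversion. This is a sketch of that rule only: the function name is hypothetical, and levels below 40 Phons are rejected because the card says loudness falls off more rapidly there.

```python
def sones_from_phons(phons):
    """Loudness in sones for loudness levels of 40 phons and above,
    using the rule that loudness doubles for every 10-phon increase."""
    if phons < 40:
        raise ValueError("doubling rule described here applies at 40 phons and above")
    return 2.0 ** ((phons - 40.0) / 10.0)

for level in (40, 50, 60):
    print(level, sones_from_phons(level))
```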

Missing Fundamental Pitch

Arriving at a theory of pitch perception has involved eliminating one or more of the peripheral codes that can account for pitch perception. (1) Physical presence of the fundamental frequency is not critical for pitch perception of a complex sound. The pitch of a harmonic complex tone is that of the fundamental frequency even when the fundamental and other lower harmonics are removed from the stimulus. Lower harmonics are not critical. We may perceive pitch based on the lowest frequency component available.

Spectral Spread with Short Duration

As the duration decreases below 30 msec, the spectral spread is so great that some energy falls out of the frequency region of detection. Thus, the power of the sound must be increased to maintain enough energy within the frequency region of detection. Need more sound power when duration decreases to have enough energy in the frequency region of detection.

Forward Masked Tuning Curve

By using forward masking, the possibility for the signal and masker to physically interact is eliminated, since only one tone is presented at a time. Forward masked psychophysical tuning curves might better reveal what tuning actually occurs in the auditory nerve than simultaneous masked psychophysical tuning curves. Indeed, forward masked psychophysical tuning curves are about as narrow (sharp) as those measured neurally, while simultaneous masked psychophysical tuning curves are wider than neural tuning curves.

Musical Definitions

Circularity in pitch perception: notes separated by an octave (a ratio of 2) are perceived with a high degree of similarity. They are said to have the same pitch chroma but different pitch heights. Within an octave, there are 12 notes with different pitch chromas but the same pitch height. The notes (C, D, E, F, G, A, B, and the flats and sharps) are logarithmically spaced by a frequency difference of one semitone (a ratio of 2^(1/12)). Thus, the set of 12 pitch chromas repeats at different pitch heights for each new octave. Each semitone contains 100 cents (i.e., 1200 cents to an octave).
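
The semitone and cent definitions translate directly into a logarithmic interval measure. A minimal sketch; the 440-Hz reference is just an example value, and the function name is hypothetical.

```python
import math

SEMITONE_RATIO = 2 ** (1 / 12)

def cents_between(f1_hz, f2_hz):
    """Interval from f1 to f2 in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f2_hz / f1_hz)

print(round(cents_between(440.0, 880.0), 6))                  # one octave
print(round(cents_between(440.0, 440.0 * SEMITONE_RATIO), 6))  # one semitone
```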

Audiogram

Commonly used in Audiology. Threshold is determined relative to the average threshold at each frequency for young, healthy listeners with "normal" hearing. The unit is dB HL rather than dB SPL. Assume that the normal hearing threshold at 500 Hz is 4 dB SPL. 0 dB HL in the audiogram at 500 Hz means that the person has the normal threshold, and can detect a 4 dB SPL 500 Hz tone. 20 dB HL in the audiogram at 500 Hz means that the person has 20 dB hearing loss, and can only detect a 24 dB SPL 500 Hz tone. This audiogram shows high-frequency hearing loss in both ears.
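
The dB HL to dB SPL relation in this example is a simple offset by the normal threshold at that frequency. A sketch using the 4 dB SPL value assumed above for 500 Hz; the function name is hypothetical.

```python
def hl_to_spl(threshold_db_hl, normal_threshold_db_spl):
    """Convert an audiogram threshold (dB HL) to dB SPL at one frequency."""
    return threshold_db_hl + normal_threshold_db_spl

NORMAL_500_HZ_DB_SPL = 4  # value assumed in the example above

print(hl_to_spl(0, NORMAL_500_HZ_DB_SPL))
print(hl_to_spl(20, NORMAL_500_HZ_DB_SPL))
```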

Pitch of Complex Sounds

Complex sounds with temporal regularity (e.g., periodicity) and/or intense spectral components often evoke a pitch perception. For example, the following two sounds both generate a 100-Hz pitch. What determines pitch? Temporal periodicity or spectral structure?

Cones of Confusion

Cones of Confusion are sets of source locations that produce the same ILDs and ITDs even though the sources are at different positions in 3-D space. These equivalent locations trace out a cone-shaped surface in 3-D space.

Measuring Critical Bands

Critical bands are often measured using the "notched noise" technique. Signal tone is placed at the spectral center of a band-reject noise masker. Signal threshold is measured as a function of the width of the spectral notch (gap). Wide notches would provide very little noise power in the critical band filter, so signal thresholds would be low. As the notch is made more narrow, more noise power would come through the filter and signal thresholds would rise. Narrower notches = higher signal thresholds

Pitch Due to Nonlinearity

Due to the nonlinear processing of peripheral auditory system, audible difference tones (such as the cubic difference tone, CDT) may be generated when two tones of different frequencies are presented at the same time at a high intensity level.

Equal Loudness Contours

Each contour (curve) represents the level (in dB SPL) and frequency of a Comparison tone that is judged equally loud to a Standard 1000-Hz tone presented at 20, 40, 60, 80, or 100 dB SPL (labeled as Phons on the figure). A Comparison tone at 100 Hz is judged to be equally loud to the 40-dB SPL (40 Phons) 1000-Hz Standard tone when the Comparison tone has a level of 52 dB SPL. But a 7000-Hz Comparison tone must have a level of 50 dB SPL to be judged equally loud to the 40-dB SPL (40 Phons) 1000-Hz Standard tone. In both cases, loudness is 40 Phons (every point on a contour has the Phon value of the fixed Standard tone). Thus, loudness depends on both the frequency and the level of a sound. The effect of frequency on loudness is greater for soft tones (e.g., at 20 Phons) than for loud tones (e.g., at 80 Phons). The Phon is a measure of Loudness Level. A sound that is "n" Phons loud is equally loud as a 1000-Hz tone presented at "n" dB SPL. Depending on the frequency, the actual physical level (dB SPL) of an "n"-Phon sound can be less than, equal to, or greater than "n" dB SPL, as determined by the "n"-Phon equal-loudness contour. In this demo, there are three comparison tones: 150 Hz, 1000 Hz, and 7000 Hz. The standard tone is 1000 Hz. For each condition, the comparison tone is paired five times with the standard tone. The level of the comparison tone starts 5 dB below that of the 1000-Hz standard tone and then increases by 5 dB for each of the remaining four comparisons. You are to decide for which pairing (1-5) the comparison and standard tones appear to be equally loud. (See slide 6 of the Loudness PPT.)

MAP Procedure

Example: At Calibration, 1 volt of input to the headphone produces 200 micropascals of pressure measured by the microphone connected to a coupler inside the manikin. Given this example calibration, if a listener's threshold for detecting a sound is 0.1 volts going to the headphones (i.e., 1/10 of 1 volt), then his/her threshold in terms of sound pressure is 1/10 of 200 micropascals (i.e., 20 micropascals).
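
The calibration arithmetic, plus the conversion to dB SPL, can be sketched as follows. The 20-micropascal reference for dB SPL is the standard one; the function names are hypothetical.

```python
import math

REFERENCE_PRESSURE_UPA = 20.0  # standard dB SPL reference, in micropascals

def threshold_pressure_upa(threshold_volts, upa_per_volt):
    """Sound pressure at threshold, given the headphone calibration."""
    return threshold_volts * upa_per_volt

def db_spl(pressure_upa):
    """Sound pressure level re 20 micropascals."""
    return 20.0 * math.log10(pressure_upa / REFERENCE_PRESSURE_UPA)

pressure = threshold_pressure_upa(0.1, 200.0)
print(round(pressure, 6), round(db_spl(pressure), 6))
```

With this calibration, a 0.1-volt threshold corresponds to 20 micropascals, which is 0 dB SPL.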

Pitch of Harmonic Complex Tone

Excitation pattern shows the average output of each auditory filter as a function of the center frequency. Low-frequency harmonics create clear peaks and are thus resolved in the excitation pattern, while high-frequency harmonics are unresolved. Spectral pitch: the excitation peaks from low-frequency resolved harmonics may be combined to calculate the F0. Periodicity pitch: the F0 may also be extracted by pooling the temporal periodicity information from all channels (including temporal fine structures from low-frequency resolved harmonics and periodic temporal envelopes from high-frequency unresolved harmonics).

Pitch of a Pure Tone

For a pure tone, pitch is closely related to frequency. A frequency change produces energy in different spectral regions and a frequency change alters the period of the time domain waveform. The changes in spectrum and waveform are equivalent to each other via the Fourier Analysis. Thus, pitch may be due to a change in spectrum and/or a change in the period of the sound in the time domain waveform.

Sound Sources in the Azimuth Plane

For a sound coming from one side of a listener, the sound will reach the ear closest to the sound source before it reaches the other ear, producing an Interaural (between ears) Time Difference (ITD). The sound will also be less intense at the "far" ear (mainly due to the "head shadow"), producing an Interaural Level Difference (ILD).

Interaural Time and Phase Differences

For tonal stimuli, an ITD also yields an Interaural Phase Difference (IPD). The equation for obtaining the IPD from an ITD for tonal signals is: IPD = (ITD / Period) x 360°. Conversely, ITD = (IPD / 360°) x Period. In the 1666-Hz example, the ITD equals the period of the tone, so IPD = (0.6 / 0.6) x 360° = 360°. The left-ear signal is shifted 360° relative to the right-ear signal. Suppose that for a 500-Hz tone a discriminable interaural time difference (ITD) is 100 µsec; what is the interaural phase difference (IPD)? Period of 500 Hz = 2 msec = 2000 µsec, so IPD = (100 / 2000) x 360° = 0.05 x 360° = 18°.
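
Both conversions are easy to check in code. A minimal sketch with hypothetical function names, working in microseconds and degrees.

```python
def ipd_degrees(itd_us, freq_hz):
    """Interaural phase difference (degrees) for a tone, from its ITD."""
    period_us = 1e6 / freq_hz
    return itd_us / period_us * 360.0

def itd_microseconds(ipd_deg, freq_hz):
    """Interaural time difference (microseconds) for a tone, from its IPD."""
    period_us = 1e6 / freq_hz
    return ipd_deg / 360.0 * period_us

print(round(ipd_degrees(100, 500), 6))       # the 500-Hz worked example
print(round(itd_microseconds(18, 500), 6))   # and back again
print(round(ipd_degrees(600, 1666), 1))      # close to a full cycle
```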

HRTFs

Head-related Transfer Function

Localization in the Vertical Plane

ILDs and ITDs cannot be used to determine where a sound source is vertically. Consider the three sound sources in this figure. They are all equidistant from the two ears, so each produces a zero ILD and ITD. Some cues other than ILD and ITD must assist us in making these vertical location judgments.

ILDs in the Azimuth Plane

ILDs are produced by the inverse square law and head shadow. The head shadow accounts for almost all of the ILD. Head shadow is generated for sounds with a wavelength similar to or shorter than the head size. The diameter of the human head is about 18 cm. If the speed of sound (c) is 350 meters/sec and wavelength (λ) = c/f, then f = c/λ = (350 m/s) / (0.18 m) ≈ 1944 Hz. So, a sound with a frequency of 1944 Hz or higher would produce a sound shadow on the other side of the head.
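
The cutoff calculation generalizes to other head sizes or sound speeds. A sketch with a hypothetical function name, using the card's values as defaults.

```python
def head_shadow_cutoff_hz(head_diameter_m=0.18, speed_of_sound_m_s=350.0):
    """Frequency whose wavelength equals the head diameter (f = c / wavelength)."""
    return speed_of_sound_m_s / head_diameter_m

print(round(head_shadow_cutoff_hz()))           # the card's example
print(round(head_shadow_cutoff_hz(0.15)))       # a smaller head shifts the cutoff up
```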

ITDs in the Azimuth Plane

ITDs are produced by an interaction of the sound waveform and the head size in terms of the time it takes for sound to travel around the head from one ear to the other. ITD is about 0.6 msec for a sound location of 75° and does not change too much across frequencies.

Differences Between MAF and MAP

In MAP, sound is directly sent into the ear canal, while in MAF, sound still needs to go through head diffraction and ear canal resonance. As such, there is a difference between the measured MAP and MAF thresholds. The difference is on average ~6-dB (the missing 6 dB).

Localization in Reverberation

In a reverberant environment, each reflection is like a different sound source at a different spatial location. Surprisingly, we are not often confused as to where a sound source is located even in a very reflective environment.

Complex Maskers

In the everyday world, the masker is likely to be a complex sound rather than a sinusoidal tone. In order to measure masking for complex maskers, a white, Gaussian noise is often used as the masker.

Temporal Masking

It is more difficult to detect the signal when the signal is presented near the beginning or end of the masker than when the signal is in the middle of the masker. Masking in the sense of elevated signal detection thresholds can occur even when the masker and signal do not occur at the same time. The amount of backward masking declines faster than that of forward masking as the temporal gap between the signal and masker increases. Demo: a 10-ms, 1000-Hz signal is masked by a 250-ms broadband noise masker in 3 temporal conditions: simultaneous, forward, and backward masking. The signal level is initially 25 dB greater than the noise spectrum level and gradually decreases in 5-dB steps over the stimuli. The noise level is fixed. (The example is in the slides.)

Localization and Lateralization

"LATERALIZATION" is used to describe the perception of sounds presented over headphones and "LOCALIZATION" the perception of sounds coming from actual sound sources. In the previous experiments, as well as in the real world, sounds are presented from different locations in the free field. ITDs and ILDs will always occur together, which does not allow us to study ILDs and ITDs separately. One could send sounds to headphones and vary the ILD independently of the ITD or vice versa. However, this often leads to an auditory image in the head located somewhere on a line (a lateral line) running from one ear to the other. The sounds do not appear to be localized out in the 3-D space where real sound sources are located.

Subjective Attributes of Sound

Level, frequency, and phase are the physical attributes of sound, while loudness, pitch, and timbre are subjective attributes of sound. Subjective attributes are not perfectly correlated (related) to a single physical attribute. While intensity (physical) changes lead to loudness (subjective) changes, frequency and duration may also cause a sound's loudness to change. The same variability exists for pitch and timbre. Subjective attributes of sound can be measured by scaling and matching psychophysical procedures. Unlike discrimination measures which are objective in the sense that there is a "correct" or "incorrect" response (e.g., two tones are either the same or different in frequency), subjective measures are not objective (there is no "correct" answer). Thus, it is crucial that subjective measures be as reliable and as valid as possible.

Masking Level Difference (MLD)

MLD = signal detection level in the MoSo or MmSm condition minus the signal detection level when the signal and masker have different IPDs (e.g., in the MoSπ or MπSo condition). Example: signal detection level for MoSo = 70 dB; signal detection level for MoSπ = 55 dB; MLD = 70 - 55 = 15 dB. Spatial release from masking also occurs for actual sound sources in the free field.

Methods for Objective Psychophysics

Method of Limits: the signal level is decreased in successive steps and then the signal level is increased in successive steps. The threshold level is the mean of the last "no"/"yes" level and the first "yes"/"no" level. Method of Adjustment: the listener adjusts the signal level until it is just barely detectable. This level is the threshold level.

Pitch of Complex Sounds

Most sounds (e.g., musical notes and speech sounds) are complex (i.e., a sum of simple, tonal, or sinusoidal sounds). These complex sounds often have a pitch. For a harmonic complex tone with frequencies that are integer multiples of the fundamental frequency (F0), pitch is closely related to the F0, even if the F0 is not physically present or perceivable.

Theory of Pitch Perception

Neither of the two pitch mechanisms shown on the previous slide can account for all of the experimental data on pitch perception of complex sounds. Although research on pitch perception has a long history (over 100 years), it has proved difficult to develop a unified theory of pitch perception that would enable one to predict the perceived pitch of a complex sound based on the temporal and/or spectral structure of the sound. There is not yet a solid, generally accepted theory.

Loudness Matching

One method to measure loudness is a Matching procedure. A subject is asked to adjust the level of a Comparison tone at a particular frequency so that it is perceived to be equal in loudness to a 1000-Hz Standard tone of a particular level. The level of the Standard tone is used to define the loudness of the equally loud Comparison tone. Example: If a 7000-Hz Comparison tone of 50-dB SPL is judged to be equally loud to a 1000-Hz, 40-dB SPL Standard tone, this 7000-Hz Comparison tone of 50-dB SPL is said to have an "equivalent" loudness of 40 phons. Phons is a loudness unit.

How TSD works: Percent Correct P[C]

P[C] = (Hit Rate + Correct Rejection Rate) / 2, if there are equal numbers of trials with and without the signal. P[C] using the TSD method is an improvement over the Classical Methods, but it is still not an un-confounded measure of Sensitivity.
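
A sketch of the P[C] calculation from raw trial counts. The counts are invented for illustration and the function name is hypothetical; the formula assumes equal numbers of signal and catch trials, as stated above.

```python
def percent_correct(hits, misses, false_alarms, correct_rejections):
    """P[C] as the mean of the hit rate and the correct-rejection rate."""
    hit_rate = hits / (hits + misses)
    cr_rate = correct_rejections / (false_alarms + correct_rejections)
    return (hit_rate + cr_rate) / 2.0

# Hypothetical session: 50 signal trials and 50 catch trials.
print(round(percent_correct(hits=40, misses=10, false_alarms=15, correct_rejections=35), 6))
```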

Simultaneously Masked Tuning Curve

Psychophysical tuning curves are seen as an analogy to auditory nerve (AN) tuning curves. In physiological measure of AN tuning curves, one tone is used to generate AN responses. However, in psychophysical measure of tuning curves, two tones (masker and signal) are used. When the two tones occur at the same time in simultaneous masking, they can interact and produce neural outcomes that are not present in the physiological measure (especially those from the nonlinear processing in inner ear and auditory nerve). This may explain the difference between psychophysical and physiological tuning curves.

Spatial Release from Masking

Signal detection is easier when the signal and masker are spatially separated than when they come from the same location. This is referred to as "spatial release from masking," in that spatially separating the sources reduces the amount of masking one source provides for the other. A classical method to study spatial release from masking is to manipulate the interaural phase difference (IPD) of the signal (S) and masker (M) separately:
- Monotic (sound to one ear): Sm and Mm
- Diotic (same sound to both ears): So and Mo, with a 0° IPD
- Dichotic (different sounds to the two ears): Sπ and Mπ, with a 180° IPD

MAA in the Azimuth Plane

Small differences in the sound source location are difficult to discern if the sources are off to one side (with MAAs greater than 7 degrees). When the sources are right in front, a 1-2 degree change in location is discriminable. Performance is worse in the middle frequency region, where neither ILDs nor ITDs are good cues.

Critical Bands

So far the bandwidth of the noise masker has been broad (covering the entire audio range). However, we don't need all of the spectral content in noise to generate the same amount of masking. Actually, signal detection threshold does not change when the noise bandwidth is reduced, until a critical bandwidth is reached. When the noise bandwidth is narrower than the critical bandwidth, the signal becomes easier to detect. Signal is easier to detect when the noise bandwidth is narrower than the critical bandwidth. One way to explain these types of critical band results is to imagine that the signal is processed in a critical band that is like a bandpass filter centered on the frequency of the signal (an internal critical band filter), and the power of the noise coming through this critical band filter determines the masked threshold. So a wideband noise, wider than the width of the internal filter, produces maximum masking since the filter output is "full." An even wider noise would not further increase thresholds. A noise whose bandwidth is narrower than the internal filter will produce less masking, since the filter output for noise is not "full."

dB A

The 40-Phon equal loudness contour is used in the pressure measurement of sounds with a broad-band spectrum (e.g., environmental noise). Suppose a sound has a flat spectrum that ranges from 10 to 30,000 Hz. Not all of these spectral components are equally loud. Based on the equal loudness contour, components that are very low or very high in frequency are softer than those in the middle frequency range. A microphone would measure the total pressure of this sound and all spectral components would contribute equally to the total pressure measurement. In order to reflect what we actually hear, one would want to discount the contributions of very low and very high spectral components, since they are either inaudible or very low in loudness. To do this, we turn the 40-Phon curve upside down and use it as a Filter to first filter the broadband sound before measuring its total pressure. In this way, low and high spectral components would have been attenuated before they were added to the total pressure. The measured level of the 40-Phon filtered sound would more closely represent what we perceive. The level measured after a sound is filtered by a filter equal to the inverse of the 40-Phon equal loudness contour is in the unit of dBA or dB on the A scale.

Complex Perception

The auditory periphery encodes the time-frequency-level aspects of sound. There is no peripheral information about what or where the sound sources are. Thus, perception of sound sources must be calculated at the level of the brain stem and cortex using peripheral information and listening experience. When sound from multiple sources reaches the ear drum, our auditory system needs to organize the sound into perceptually meaningful streams. This process was named Auditory Scene Analysis by psychologist Albert Bregman and involves three key aspects: segmentation, segregation, and integration. The peripheral auditory processing could provide the following grouping and segregation cues to the central auditory system to aid it in determining what sources may have produced the sound:
- Spatial Separation. Example: spatial release from masking.
- Spectral Separation. Example: place coding of pitch.
- Harmonic Relationship. Example: harmonic complex tones (synthetic listening vs. analytic listening).
- Spectral Profile. Signal amplitude as a function of frequency.
- Temporal Separation. Different sound sources start and stop at different times.
- Temporal Continuity. The same sound source (e.g., an instrument) may play many notes over time. The smooth changes in fundamental frequency maintain a continuity across the notes.
- Temporal Modulations. Actual sounds are often modulated in amplitude, and the AM pattern is distinct to the sound source.

Dynamic Range of Hearing

The dynamic range of hearing is the range from the threshold of detection to the upper limit of pain or discomfort. The thresholds of detection are frequency dependent, and those for pain or discomfort are less so. As a result, the dynamic range of hearing is frequency dependent, ranging from 70 dB (at very low or high frequencies) to 130 dB (around 1000 Hz).

Precedence Effects

The fact that we can process what and where a sound source is in a reverberant environment is usually referred to as the EFFECTS OF PRECEDENCE. The idea is that the first sound to reach a listener most often comes directly from the source, not a reflection, and as such the first sound takes precedence in our perception. (The direct sound takes the lead!) Precedence effect: the perceived spatial location is dominated by the location of the first-arriving sound, and the spatial location information of later-arriving sounds is suppressed. Fusion: The direct sound and its reflections are perceived as a single sound source. Localization Dominance: The fused sound source is perceived to be located at or near the location of the original sound source (not at the reflection). Discrimination Suppression: Spatial information about the reflected sounds is suppressed relative to that of the source. In other words, the spatial information of the reflections is suppressed more than that of the direct sound from the source.

Masking Pattern

The masker is a 1000 Hz tone, and the signals are of different frequencies. The masked signal detection threshold is measured using psychophysical methods. For the signal at 700 Hz, the listener monitors the output of the auditory filter centered at 700 Hz. The masker level is attenuated by 40 dB by this filter (point a). For the signal at 900 Hz, the listener monitors the output of the auditory filter centered at 900 Hz. The masker level is attenuated by 10 dB by this filter (point b), and so on. The right panel thus shows the spread of excitation of the masker across frequency. Assume the signal needs to be as strong as the masker to be detected. The right panel then also indicates the masked signal detection threshold as a function of signal frequency (i.e., the masking pattern). In the measured masking pattern, the masker was fixed in frequency (1200 Hz) and level (80 dB SPL) and the signals were of different frequencies. Note that the signal was of a longish duration (500 msec). This way of measuring masking has several confounding factors. Beats of various types, and difference tones caused by the nonlinearity of peripheral processing, produce cues in addition to those of the signal tone alone and so confound the results. The regions of signal frequency where beats and combination tones occur are indicated on the figure.

Calculating Critical Bandwidth

The masking data from the notched-noise test can be used to estimate the shape & bandwidth of the critical band. 2 ways to calculate the critical bandwidth: 1. Consider the critical band as a band-pass filter as shown in the Figure. Bandwidth can be calculated at 3-dB attenuation or half-power points (blue dotted lines). 2. Create a rectangular filter having the same total power and peak spectrum level as the measured critical band. The bandwidth of such a rectangular filter is called the Equivalent Rectangular Bandwidth or ERB (green dotted lines).
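Both bandwidth definitions on this card can be computed from a sampled filter shape. The sketch below is only an illustration: the rounded-exponential filter shape, its sharpness value p = 25, and the function names are assumptions chosen to have a filter to measure, not data from the notched-noise figure.

```python
import math

def erb_hz(freqs_hz, power):
    """Equivalent Rectangular Bandwidth: area under the power response
    (trapezoidal rule) divided by the peak power."""
    area = 0.0
    for i in range(1, len(freqs_hz)):
        width = freqs_hz[i] - freqs_hz[i - 1]
        area += 0.5 * (power[i] + power[i - 1]) * width
    return area / max(power)

def half_power_bandwidth_hz(freqs_hz, power):
    """3-dB bandwidth: span of the samples at or above half the peak power."""
    half = max(power) / 2.0
    passing = [f for f, p in zip(freqs_hz, power) if p >= half]
    return max(passing) - min(passing)

# Hypothetical filter: rounded exponential centered at 1000 Hz.
center_hz, p = 1000.0, 25.0
freqs = [float(f) for f in range(0, 2001)]  # 1-Hz steps
weights = []
for f in freqs:
    g = abs(f - center_hz) / center_hz      # normalized frequency deviation
    weights.append((1.0 + p * g) * math.exp(-p * g))

print(round(erb_hz(freqs, weights)))                   # ERB in Hz
print(round(half_power_bandwidth_hz(freqs, weights)))  # 3-dB bandwidth in Hz
```

For this assumed shape the ERB comes out somewhat wider than the 3-dB bandwidth, which shows that the two definitions generally give different numbers for the same filter.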

Pitch Scaling

The mel scale for pitch perception is derived in a similar way to the sone scale for loudness perception. Reference: a pitch of 1000 mels is assigned to a 1000-Hz tone at 40 dB above the listener's threshold. If a comparison tone has a pitch that is "n" times that of the reference tone, it is assigned "n" times 1000 mels. Above 500 Hz, 4 octaves in Hz correspond to 2 octaves in mels, so in that range one octave in mels spans roughly two octaves in Hz. For comparison, the reference for the sone loudness scale is a 1000-Hz tone at 40 dB SPL.

Minimum Audible Field (MAF)

The minimum detectable level of a sound is measured at the position of the listener's head in a free field with negligible (not noticeable) reflections from surfaces. Typically, the sound is assumed to come from directly in front of the listener. Example 1: At calibration, 0.5 volts of input to the loudspeaker produces 500 micropascals of sound pressure measured by the mic at the position of the head. Given this calibration, if a listener's threshold for detecting a sound is 0.05 volts going to the loudspeaker (i.e., 1/10 of 0.5 volts), then his/her threshold in terms of sound pressure is 1/10 of 500 micropascals (i.e., 50 micropascals). Example 2: The voltage during testing (0.1 V) is 1/10 of the voltage during calibration (1 V). Thus, the sound pressure during testing is also 1/10 of the sound pressure during calibration (200 micropascals), and the detection threshold is 1/10 × 200 = 20 micropascals.
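The calibration arithmetic on this card is a simple proportional scaling, sketched below. The function names are hypothetical, and the conversion to dB SPL (re 20 micropascals) is an added step that the card's examples do not perform.

```python
import math

def threshold_pressure_upa(cal_volts, cal_pressure_upa, threshold_volts):
    """Scale the calibrated pressure by the ratio of threshold voltage
    to calibration voltage (assumes the loudspeaker is linear)."""
    return cal_pressure_upa * (threshold_volts / cal_volts)

def db_spl(pressure_upa, ref_upa=20.0):
    """Convert pressure in micropascals to dB SPL (reference 20 micropascals)."""
    return 20.0 * math.log10(pressure_upa / ref_upa)

# First example on the card: 0.5 V -> 500 uPa at calibration, threshold at 0.05 V.
p1 = threshold_pressure_upa(0.5, 500.0, 0.05)
# Second example: 1 V -> 200 uPa at calibration, threshold at 0.1 V.
p2 = threshold_pressure_upa(1.0, 200.0, 0.1)
print(round(p1, 3), round(db_spl(p1), 1))
print(round(p2, 3), round(db_spl(p2), 1))
```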

Minimum Audible Pressure (MAP)

The minimum detectable level of a sound is measured inside the ear canal, close to the ear drum. Typically, the sound is delivered by headphones. MAP is often measured separately for each ear.

Pitch Matching

The pitch of a complex sound is often determined in a pitch matching procedure. The complex sound would serve as the standard sound. Comparison sounds for pitch matching are usually either sinusoidal sounds or sounds with a strong periodicity like a train of clicks. The period or frequency of the comparison sound is adjusted until the comparison sound is perceived to have the same pitch as the standard sound. The frequency of the comparison sound is used to indicate the pitch of the standard complex sound.

Musical Notations

The relationship among musical note, cents, and frequency (Hz) for the equal temperament scale of musical pitch. One octave has 12 semitones, and one semitone is 100 cents, so 2 semitones is 200 cents. With 264 Hz as the reference frequency (F0), D4 is 2 semitones above it: 264 × 2^(2/12) ≈ 296 Hz, which is the next frequency down the list. Notes within the same octave have the same pitch height but different chromas.
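The note-to-frequency arithmetic on this card can be checked with a short sketch. The function names are hypothetical; the only inputs taken from the card are the 264-Hz reference and the 200-cent step to D4.

```python
import math

def freq_from_cents(ref_hz, cents):
    """Frequency that lies `cents` above a reference in equal temperament
    (1200 cents per octave, 100 cents per semitone)."""
    return ref_hz * 2.0 ** (cents / 1200.0)

def cents_between(f1_hz, f2_hz):
    """Interval from f1 up to f2, in cents."""
    return 1200.0 * math.log2(f2_hz / f1_hz)

print(round(freq_from_cents(264.0, 200.0)))   # D4, two semitones above the reference
print(round(freq_from_cents(264.0, 1200.0)))  # one octave above the reference
```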

Psychophysical Tuning Curve

The signal is a 1000 Hz tone, and the maskers are of different frequencies. The masker level required to mask the signal is measured using psychophysical methods. In this task, the listener attends to the output of only one auditory filter, the one centered at the signal frequency. If the masker level is equal to the signal level, the masking is not enough, because the masker is attenuated by the auditory filter while the signal is not. Thus, the masker level needs to be higher. For example, if the filter attenuation at 1.2 kHz is 20 dB, the masker level must be increased by 20 dB to produce enough masking. The black curve on the left is the tuning curve, which is the flipped-over auditory filter shape. The frequencies of the three tonal signals were 300, 1000, and 3000 Hz. The signal was a short-duration tone (20 msec) at a low level (20 dB SL). The psychophysical tuning curve method avoids all of the possible confounds of the masking pattern method: The 20-ms signal duration is too short (less than one beat period) for beats to occur (best beats occur for rates below 50 Hz, i.e., beat periods longer than 20 ms). The low level of the signal means that nonlinear distortion frequencies (harmonics and difference tones) will not be audible. In the demonstration, a 20-ms, 1000-Hz signal is presented at the middle of a 250-ms tonal masker at each of four masker frequencies. The masker level is initially 3 dB less than the signal level and increases by 3 dB in each successive stimulus. The signal level is fixed. Count how many times you can detect the signal in addition to the masker.
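The "flip-over" relationship between the auditory filter and the tuning curve can be sketched as follows. This is a simplified illustration that assumes the signal is just masked when the masker, after attenuation by the filter, equals the signal level. Only the 20-dB attenuation at 1200 Hz comes from the card; the other attenuation values, the 20-dB signal level, and the function name are hypothetical.

```python
def tuning_curve(signal_level_db, filter_attenuation_db):
    """Masker level needed at each masker frequency so that the masker,
    after attenuation by the filter centered on the signal, reaches the
    signal level. filter_attenuation_db maps masker frequency (Hz) to
    attenuation (dB)."""
    return {freq: signal_level_db + atten
            for freq, atten in filter_attenuation_db.items()}

# Attenuation of the 1000-Hz auditory filter at several masker frequencies.
# Only the 1200-Hz value (20 dB) is from the card; the rest are made up.
attenuation = {800: 30.0, 900: 10.0, 1000: 0.0, 1100: 8.0, 1200: 20.0}

curve = tuning_curve(20.0, attenuation)
for freq in sorted(curve):
    print(freq, curve[freq])
```

The required masker level is lowest at the signal frequency and rises wherever the filter attenuates more, so the curve is the filter shape turned upside down.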

Neural Basis for Pitch of a Pure Tone

The spectral and temporal properties of frequency are preserved in neural coding at the auditory periphery. At low frequencies, either period of neural firing (temporal) or which fibers discharge (spectral or place) could be the basis for pitch, while at high frequencies, only the spectral or place difference (which fibers discharge) could be used to differentiate one frequency (pitch) from another.

Spectral Notches in HRTFs

The spectral notches are generated by the interaction between direct and reflected sound waves as they enter the ear canal. A notch occurs when the direct wave is out of phase with the reflected wave at a given frequency.

Limitation of Classical Methods

The threshold is supposed to represent a measure of the sensitivity (e.g., of the auditory system) to the changed stimulus parameter (e.g., stimulus level or frequency). However, the classical methods are prone to a possible confounding effect. Expectations, experience, instructions, and other situations that are not associated with sensitivity can affect the listener's tendency to choose one response or the other, and thus affect the measured threshold. This is called response bias.

Individual Differences in HRTFs

The torso, head, and pinna of individuals differ a great deal. Thus, it is not surprising that the HRTFs also vary as shown below. The HRTFs were obtained for four different subjects for a sound source located at 45° elevation and 0° azimuth. Red curve is for the right ear and the blue curve for the left ear.

Duplex Theory of Sound Localization

The ways in which ILDs and ITDs differ physically as a function of frequency, together with the fact that localization errors occur at the middle frequencies, lead to the Duplex Theory of Sound Localization: high-frequency sounds are located in the azimuth plane based on ILDs, and low-frequency sounds are located in the azimuth plane based on ITDs. The dividing line between low and high frequencies is between 1200 and 1600 Hz.

Timbre

There are several other subjective dimensions of sound in addition to loudness and pitch. The major one is timbre. Timbre does not have a direct definition. Its definition is: "If two sounds have the same perceived pitch, loudness, and duration and are perceived as different, the perceptual difference is TIMBRE." So the difference between the sound of a cello and a violin when they both generate a note of the same pitch, loudness, and duration is a Timbre difference. Sometimes a timbre difference is considered a difference in Sound Quality. Timbre is sometimes rated on a scale of bright to dull; a violin has a brighter sound than a cello.

Localization in the Azimuth Plane

This figure shows the average percent errors in locating the source of tonal signals in the azimuth plane as a function of the tonal frequency. Note that the localization errors increase around the mid-frequency range, where ITDs are starting to cause phase ambiguity and ILDs are not strong enough to produce an appreciable sound shadow (i.e., neither ILDs nor ITDs provide good cues for sound localization).

Thresholds of Hearing

Thresholds of Hearing determine the least intense sound level required to detect the presence of sound as a function of the sound's frequency. The basic measurements in hearing are the Thresholds of Hearing, which form the basis for the main audiometric test of hearing, The Audiogram.

Theory of Pitch Perception

There is no simple rule or process that allows one to predict the pitch of any complex sound based on a physical description of the time waveform or frequency domain spectrum. Thus, there is not a well-established theory of pitch perception that can account for the pitch of all sounds. At the present time, the theories that do the best job of describing the data are those based on analyzing the temporal fine structure of sound. This suggests that the neural basis for pitch perception might be the ability of neurons to fire in synchrony with the fine-structure changes in the sound pressure waveform.

Measuring Noise Amplitude

Total Power (TP) and Spectrum Level (No) are the two measures that can be used for any broadband stimuli with fairly flat amplitude spectra, such as white noise. Total Power is like the area of a rectangle (No x BW), so: TP = No x BW In dB: 10log(TP) = 10log(No) + 10log(BW); TPdB = NodB + 10log(BW). If TP and BW are known, No = TP / BW; NodB = TPdB - 10log(BW). Example: No is 50 dB and the BW is between 500 and 1500 Hz. Thus, BW is 1000 Hz and TPdB = 50 + 10log(1000) = 50 + 30 = 80 dB.
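The two dB relationships on this card translate directly into code. The sketch below reproduces the card's worked example; the function names are hypothetical.

```python
import math

def total_power_db(spectrum_level_db, bandwidth_hz):
    """TPdB = NodB + 10*log10(BW), for noise with a flat spectrum."""
    return spectrum_level_db + 10.0 * math.log10(bandwidth_hz)

def spectrum_level_db(total_db, bandwidth_hz):
    """NodB = TPdB - 10*log10(BW), the inverse relationship."""
    return total_db - 10.0 * math.log10(bandwidth_hz)

bandwidth = 1500.0 - 500.0               # band from 500 to 1500 Hz
tp = total_power_db(50.0, bandwidth)     # spectrum level of 50 dB
print(round(tp, 1))                      # total power in dB
print(round(spectrum_level_db(tp, bandwidth), 1))  # recovers the spectrum level
```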

Spatial Dimensions

We are able to locate the source of sound in 3D space (i.e., near-far, left-right, and up-down) often without even being aware that we do so. Sound per se provides no information about space as its only parameters are frequency, amplitude, and phase or time. A parameter for space (e.g., distance) is not a variable of sound. Spatial location of sound has to be internally computed by central neurons based on localization cues. These cues exist because sound interacts with obstacles (e.g., head) on its path from the source to the ear.

Objective Psychophysics Requires

a listener to make a decision as to whether or not a stimulus is physically changed.

Lateralization: IPD and ILD Thresholds

a) Just noticeable changes in IPD (∆IPD) as a function of base IPD (different curves) and frequency. ∆IPD increases for frequencies above 1200 Hz, consistent with the idea that ITD or IPD cues are not good for localization at high frequencies. b) Just noticeable changes in ILD (∆ILD) as a function of base ILD (different curves) and frequency. ∆IPD and ∆ILD increase when sound sources are off to one side, consistent with the MAA results: the MAA is larger off to the side (7 degrees or greater, according to slide 12 in Spatial Hearing).

lower threshold

it takes less to detect a sound

higher threshold

it takes more to detect a sound

Pitch

the perceptual property of sounds that allows their ordering from low to high on a frequency-based scale. Pitch is a fundamental element of music perception. For example, we recognize melodies based on the direction and size of pitch changes between musical notes. Pitch is also very important for speech perception. For example, pitch is used to identify speech prosodies (questions vs. statements), voice characteristics (age, gender, and emotion, etc.), and lexical tones (e.g., those in Chinese).

Psychophysics

the study of the relationship between the physical variables of environmental events/objects and the sensations and perceptions that they elicit. "The art of psychophysics is to formulate a question that is precise and simple enough to obtain a convincing answer." For instance, Do you hear a sound? Are the sounds the same or different in pitch? How much does the loudness change?

