Acoustic Phonetics #2


Two Sources of Turbulence

(1) Channel turbulence: --Produced when high-velocity airflow leaves a narrow channel and hits inert outside air. --Air speeds up as it flows through a narrow channel (constriction); contact between the rapidly moving air and the inert / slow air outside the channel increases random particle motion = turbulent noise. --Narrower channel = faster = louder turbulence noise. (2) Obstacle turbulence: --Produced when a stream of air hits and bounces off an obstacle in the vocal tract. --Generates more turbulence & louder (higher-amplitude) noise than channel turbulence. --Characterizes the sibilant (hissing) fricatives, which take advantage of obstacle turbulence from the teeth. --Possibly also /f/ and /v/, with the upper lip as obstacle; even so, /f/ and /v/ aren't considered sibilants.

Types of Hearing Loss

(1) Conductive hearing loss = An issue in transmission of sound at the level of the outer or middle ear. • Could be caused by fluid in the ear due to ear infection, stiffening of the ossicle chain due to aging, or other factors. (2) Sensorineural hearing loss = An issue in the function of the inner ear, especially the hair cells.

Constrictions in the Vocal Tract (Perturbation Theory)

(1) Labial Constriction --Constrictions in the labial region are at VELOCITY ANTINODES for both F1 and F2 = DECREASE resonant frequencies • F1 = Velocity Antinode = Decreases • F2 = Velocity Antinode = Decreases --------------------------------- (2) Palatal Constriction --Constrictions in the palatal region are at a velocity node for F2 and near a velocity antinode for F1. • F1 = Velocity Antinode = Decreases = Lowers • F2 = Velocity Node = Increases = Raises EXAMPLE: High Front Vowels: /i/ /ɪ/ --F1 is low --F2 is high SO, very far apart peaks in LPC spectrum & the dark areas are far apart on spectrogram (Constriction near the palate) --------------------------------- (3) Pharyngeal Constriction --Constrictions in the pharyngeal region (pharynx) are near a velocity antinode for F2 and near a velocity node for F1. • F1 = Velocity Node = Increases = Raises • F2 = Velocity Antinode = Decreases = Lowers EXAMPLE: Low Back Vowels: /ɑ/ /ɔ/ --F1 is high --F2 is low SO, very close together peaks in LPC spectrum & the dark areas are also very close together on spectrogram (Constriction in the pharynx) --------------------------------- (4) Velar Constriction --Constrictions in the velar region are near a velocity antinode for F2; in between nodes and antinodes for F1 • F1 = Velocity Antinode = Decreases = Lowers • F2 = Velocity Antinode = Decreases = Lowers EXAMPLE: High Back Vowels: /u/ /ʊ/ --F1 is low --F2 is low SO, the peaks are semi-spaced out in LPC spectrum & the dark areas are also kind-of apart but still kind-of close (Constriction near the velum)

Parts of the Cochlear Implant

(1) Microphone (converts sound to electrical energy), signal processor, radio transmitter. Worn external to the body. (2) Receiver implanted in mastoid bone (3) Tiny electrode array wound as far into the cochlea as possible. • However, only the first turn of the cochlea can be reached with current technology. (4) 12-22 electrodes stimulate points along auditory nerve.

Voiced Initial Stop

(1) No prevoicing (2) Still see a burst, then the duration between a burst and beginning of vowel is short Burst intensity has low energy which reflects how much pressure is built up when VF are adducted

Acoustic Characteristics of Nasals

(1) Periodic Voicing --Continuous voicing in the absence of pressure buildup (2) Low frequency first formant (aka nasal resonance) --Due to long length of coupled oral and nasal cavities (3) Lower amplitude than adjacent vowels --Due to DAMPING caused by soft walls of the nasal cavity (4) Antiformants = missing resonant frequency

Voiceless Aspirated Initial Stop

(1) No prevoicing (2) Long duration between burst and beginning of vowel (long VOT, filled with aspiration noise) Burst intensity has a lot of energy, which reflects how much pressure is built up when VF are abducted

Wavelength depends on....

(1) The frequency of the wave (inverse relationship) (2) The velocity of wave transmission, a property of the medium (direct relationship) **the speed of wave transmission is dependent on the medium (solid, liquid, how much moisture is in the air, temperature will affect it) ---BUT We will treat velocity as a constant (c), the speed of sound in air at room temperature. (c = 34,400 cm/sec)

How does an acoustic resonator amplify and filter sound? (resonance in a container / closed tube)

(1) When a compression wave hits a hard surface (wall of a container), it is reflected back (2) Reflected wave encounters the oncoming wave (3) if the sound MATCHES a natural resonant frequency of the container = reflected wave encounter another pressure peak = summed together = constructive interference = sound will be amplified Resonant frequencies = amplified OR (3) if the sound does NOT match the natural resonant frequency of the container = reflected wave encounters a pressure trough = cancels each other out = destructive interference = sound is reduced Non-resonant frequencies = attenuated ("filtered out") = troughs = destructive interference
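
The constructive vs. destructive interference described above can be checked numerically. This is only a minimal sketch: the function name `peak_of_sum` is made up for illustration, and the "oncoming" and "reflected" waves are idealized as two equal-amplitude sine waves.

```python
import math

def peak_of_sum(phase_offset, n_samples=1000):
    """Largest absolute value of sin(t) + sin(t + phase_offset) over one cycle."""
    peak = 0.0
    for i in range(n_samples):
        t = 2 * math.pi * i / n_samples
        total = math.sin(t) + math.sin(t + phase_offset)
        peak = max(peak, abs(total))
    return peak

# Reflected peak meets an oncoming peak (waves in phase): amplitudes add.
in_phase = peak_of_sum(0.0)
# Reflected peak meets an oncoming trough (half a cycle apart): they cancel.
out_of_phase = peak_of_sum(math.pi)

print(f"in phase: {in_phase:.2f}, out of phase: {out_of_phase:.2f}")
```

In phase, the summed wave has twice the amplitude of either wave alone (amplified); half a cycle out of phase, the sum is essentially zero (attenuated).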

Sound wave movement / resonance in a tube with one open end

(1) When a sound pressure peak hits the open end of a tube, it reflects back with a change in polarity ("anti-reflection" = the sign changes) • Meaning that a peak becomes a trough & compression becomes rarefaction, and vice versa (2) The pressure peak disperses into the open air (leaves the tube) and leaves behind a region of rarefaction. (3) The wave propagates back down the tube as a rarefaction instead of a compression. (4) When the rarefaction hits the closed end, it reflects with no change in polarity (as we saw in the tube closed at both ends). (5) When it returns to the open end, it changes polarity again (rarefaction at the end of the tube sucks in air from outside) (6) Now the wave reflects as a compression. (7) This means that the wave has to travel the length of the tube a minimum of four times (out and back twice) in order for a reflected peak to meet an oncoming peak. (8) The wavelength of the lowest-frequency sound that will resonate in a tube with one open end is four times (4x) the length of the tube. = 1/4 of the wave fits in the tube

Steps of resonance in tube with open end

(1) no sound. pressure is neutral (2) region of compression in the beginning (3) compression travels down the tube (4) compressed air reaches open end of tube & pushes out. it leaves behind a region of rarefaction (5) the region of rarefaction travels back up the tube (6) rarefaction gets to the closed end of tube & bounces off as a rarefaction still !! = no change in polarity (7) region of rarefaction travels down the tube (8) rarefaction gets to open end & pushes out. it changes into a region of compression AGAIN (9) the process repeats itself, as the region of compression travels up the tube again. ****only when it reaches open end = there is a change in polarity ****for resonance to be created = 4x the length of the tube : out & back & out & back = 1/4 of the wave fits in the tube

Formant Transitions & Alveolar Stops

** F1 increases across all stops ** Length of front cavity is small, so f2 is high --> In alveolar stops [t/d], a short front cavity gives rise to a high F2. --> Starts at around 1800 Hz in an average vocal tract !!!!! **** -- Note that F2 transition can be either rising or falling depending on F2 of following vowel. • Rises in transition to front vowel (high F2). • Falls in transition to a back vowel (low F2).

/s/ vs /ʃ/ on a FFT Spectrum

** Look for the highest point / peak in the spectrum /s/ --> the peak is at a higher frequency, near 8000 Hz /ʃ/ --> the peak is at a lower frequency, near 3000 Hz

Standing Wave

***2 sine waves of the same frequency pass each other as they travel in opposite directions; the combined pattern appears to stand still and oscillate in place instead of traveling. • Multiple reflections of the wave are self-reinforcing --they appear to stay still & oscillate between high & low pressure • Creates ongoing alternation between high and low pressure at the ends of the tube

Identifying i vs j on a Spectrogram

*Both have low f1, and high f2 with large space between them [aja] —> glides are shorter in duration, & vocal tract is more constricted = lower intensity = very little energy [aia] —> vowels are elongated & the intensity is greater

Identifying Glides on a Spectrogram [wi] vs [ju]

*How the formants change heading into vowel: [ju] —> F1 stays the same, F2 starts high then lowers. /u/ has a low F2 [wi] —> F1 stays the same, F2 starts low then raises. /i/ has a high F2

Wave movement in closed container

*Suppose we have a tube closed at both ends, with a speaker in one end. --The speaker plays a pulse of sound --This creates a pressure peak that begins at the speaker end and travels to the other end, leaving a region of rarefaction behind it. (each pressure peak has a region of rarefaction behind it!!!) --When it hits the wall, the pressure peak reflects off and travels back to the speaker. --Hits the speaker end and starts over again. ***composed of alternating regions of compression and rarefaction (rarefaction = lessening of density) (1) sound is introduced with neutral pressure (2) pressure peak moved to middle of tube, not showing rarefaction yet (3) peak is at the end, compression at the end, rarefaction closest to speaker (4) neutral atmospheric pressure / cancelling out (5) rarefaction reached the far end

Identifying Liquids on a Spectrogram

*focus on the formant transitions as it goes into vowel [led] —> fairly abrupt formant transitions, F3 is higher [red] —> has a dramatically low F3 height, more gradual formant transitions

VOT and Place of Articulation

-- Different places of articulation have different average VOT values. -- This has to do with the subglottal/ supraglottal pressure differential • More anterior constriction = larger volume above the glottis = lower average pressure. • Easier for vocal folds to start vibrating after release. -- Therefore, more anterior stops (e.g., [b]) have shorter VOTs. VOT duration is lowest/ shortest for labial constriction = more anterior VOT duration is greatest / longest for velar constriction = more posterior Voiced have shorter VOT than voiceless.

Formant Transitions & Velar Stops

-- In velar stops [k/g] , the major cue is smaller space between the second formant F2 and the third formant F3. ** F1 increases across all stops -- F2 is relatively high, F3 is relatively low. *Second formant and third formant come together ** Two formants come together in a characteristic "VELAR PINCH"

Types of VOT

-- Zero VOT >> Onset of vocal fold vibration is roughly simultaneous with stop release >> Voiced Stops >> Stop is released & BRIEF interval between burst and start of VF vibration "ada" -- Positive VOT >> There is a period of delay between stop release and onset of vocal fold vibration. >> Voiceless Stops "ata" >> Stop is released (burst) but a LAG before VF begin vibrating -- Negative VOT (prevoicing) >> Vocal fold vibration begins BEFORE the closure is released. >> VF are vibrating the entire time = before / during the closure & before the burst >> Voiced stops in intervocalic position *flat line = not vibrating *Jagged = vibrating Vf
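
As a rough illustration of the three VOT types, VOT can be computed as the time of voicing onset minus the time of the stop release (burst). The function names and the 10 ms tolerance below are assumptions made for this sketch, not measured category boundaries.

```python
def vot_ms(burst_time_ms, voicing_onset_ms):
    """VOT = voicing onset time minus stop release (burst) time, in ms."""
    return voicing_onset_ms - burst_time_ms

def classify_vot(vot, tolerance_ms=10.0):
    """Label a VOT value; tolerance_ms is an arbitrary illustrative cutoff."""
    if vot < -tolerance_ms:
        return "negative VOT (prevoicing)"  # voicing starts before the release
    if vot <= tolerance_ms:
        return "zero VOT"                   # voicing roughly simultaneous with release
    return "positive VOT"                   # lag between release and voicing

# Hypothetical measurements (ms from the start of a recording).
print(classify_vot(vot_ms(100, 160)))  # voicing begins 60 ms after the burst
print(classify_vot(vot_ms(100, 105)))  # voicing begins 5 ms after the burst
print(classify_vot(vot_ms(100, 20)))   # voicing begins 80 ms before the burst
```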

Basilar Membrane of Cochlea

---Basilar membrane is flipper-shaped: • Base is narrow and stiff; vibrates less easily = harder to set into vibration • Apex / tip is wide and floppy; vibrates more easily. • Base = high-frequency sounds • Apex = low-frequency sounds ---Different sections along basilar membrane have different natural resonant frequencies. • Stiff base is most responsive to high-frequency sounds. • After vibrating base, high-frequency energy dissipates and does not travel to apex. • Low-frequency sounds travel to vibrate wide, floppy apex. ---Basilar membrane effectively performs Fourier analysis on the sound signal: it takes an incoming sound made up of many frequencies & decomposes this complex sound into its frequency components.
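
To make "decomposes a complex sound into its frequency components" concrete, here is a small sketch of a discrete Fourier transform written in plain Python. The signal (a 200 Hz component plus a weaker 1000 Hz component), the sample rate, and all names are assumptions chosen for illustration; this shows the mathematical operation only and is not a model of the cochlea.

```python
import cmath
import math

SAMPLE_RATE = 4000  # samples per second (assumed)
N = 400             # 0.1 s of signal, so analysis bins are 10 Hz apart

def complex_wave(i):
    """A complex sound: 200 Hz at amplitude 1.0 plus 1000 Hz at amplitude 0.5."""
    t = i / SAMPLE_RATE
    return math.sin(2 * math.pi * 200 * t) + 0.5 * math.sin(2 * math.pi * 1000 * t)

signal = [complex_wave(i) for i in range(N)]

def dft_magnitudes(samples):
    """Amplitude of each frequency bin from 0 Hz up to (not including) Nyquist."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        total = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        mags.append(2 * abs(total) / n)
    return mags

bin_hz = SAMPLE_RATE / N
components = [(k * bin_hz, round(m, 3))
              for k, m in enumerate(dft_magnitudes(signal)) if m > 0.01]
print(components)  # (frequency in Hz, amplitude) of each component found
```

The analysis recovers the two sine waves that were mixed together, each with its original amplitude.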

Open vs Closed Tube

---In a tube with two closed ends, both ends are NODES for particle displacement (particles can't move far because of the wall of the container). --The lowest-frequency sound that fits in a closed tube has a wavelength that is 2x the length of the tube = 1⁄2 a wave fits in the tube • In a tube with one open end, the open end must be an ANTINODE for particle displacement because nothing is constraining the particles (MAXimum airflow, particles move freely & pressure is neutral at the open end). • The lowest-frequency sound that fits in a tube with one open end has a wavelength that is four times the length of the tube = 1/4 of the wave fits in the tube. • Also called a quarter-wave resonator ***The first resonance of a tube with one OPEN end has a LOWER frequency (LONGER wavelength) than the first resonance of a tube of the same length closed at both ends.
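
A quick worked comparison, using frequency = c / wavelength with c = 34,400 cm/sec. The 10 cm tube length and the function names are arbitrary choices for this sketch.

```python
C = 34400.0  # speed of sound in air at room temperature, cm/sec

def lowest_resonance_closed(length_cm):
    """Tube closed at both ends: wavelength = 2 x length (half-wave)."""
    return C / (2 * length_cm)

def lowest_resonance_one_open_end(length_cm):
    """Tube with one open end: wavelength = 4 x length (quarter-wave)."""
    return C / (4 * length_cm)

tube_cm = 10.0  # hypothetical tube length
closed_hz = lowest_resonance_closed(tube_cm)
open_hz = lowest_resonance_one_open_end(tube_cm)
print(closed_hz, open_hz)  # 1720.0 860.0
```

For the same length, the tube with one open end resonates at half the frequency (twice the wavelength) of the tube closed at both ends.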

Formant Transitions & Labial Stops

--> Labial stops [p, b]: Lip closure adds length to the front cavity. --> Large cavity resonates at a low frequency. ** F1 increases across all stops --> Labial stops feature a low F2 height (usually 600-800 Hz) that will generally rise in transition to following vowel. *** If you see ALL FORMANTS RISING, a labial stop is a good bet. -- (Reverse the logic for a coda stop - F2 falls in the transition from a vowel to a labial stop in coda position).

Vocalic vs. Consonantal Sounds

--A vowel is produced with the vocal tract mostly open. --A consonant is produced with a major obstruction in the vocal tract (articulators block airflow either partly or completely).

Vowel Formants

--Air-filled containers resonate at different frequencies based on their volume. • A larger volume resonates at a lower frequency than a smaller volume (ex: a cello resonates lower than a violin) --Vowel formants are affected primarily by resonance in two "containers": The oral cavity and the pharyngeal cavity (throat). --point of max constriction = narrowest area · anything anterior to point = oral cavity · anything posterior to point = pharyngeal cavity • Front cavity = the space in front of the point of maximum constriction between tongue and palate. --In a neutral configuration (schwa vowel /ə/), oral and pharyngeal portions of vocal tract have roughly equal volumes. • Because the oral and pharyngeal cavities have similar dimensions in neutral vowels, we can model the vocal tract like a tube with one open end (higher formants are odd-numbered multiples of the first formant). --For vowels other than ə, we consider oral and pharyngeal resonances separately. *there are mainly two formant frequencies. F3 doesn't change much for vowels, but does for consonants
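
The "odd-numbered multiples" pattern for the neutral vowel can be worked out with the quarter-wave formula Fn = (2n - 1) x c / (4 x length). The 17.5 cm vocal tract length is an assumption for this sketch (a commonly used textbook value for an adult male), and the function name is made up.

```python
C = 34400.0  # speed of sound in air at room temperature, cm/sec

def neutral_tube_formants(length_cm, count=3):
    """Resonances of a tube with one open end: odd multiples of c / (4 * length)."""
    return [(2 * n - 1) * C / (4 * length_cm) for n in range(1, count + 1)]

# Assumed vocal tract length of 17.5 cm (illustrative, not a measurement).
formants = neutral_tube_formants(17.5)
print([round(f) for f in formants])  # [491, 1474, 2457]
```

Under these assumptions, F2 is 3 x F1 and F3 is 5 x F1, which is close to the often-quoted 500, 1500, 2500 Hz pattern for schwa.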

Round Vowels

--In English, most back vowels are rounded, and ALL front vowels are unrounded!!! --Rounding the lips has the effect of lowering all formant frequencies. • Increases the effective length of the vocal tract --> the distance from glottis to end of vocal tract is extended --The distinction between front and back vowels is based on F2 frequency. --Front vowels have a short front cavity and high F2. --Back vowels have a long front cavity and a low F2. **Lip rounding increases the length of the oral (front) cavity and therefore lowers its resonant frequency (F2) even further. ***So rounding makes a back vowel "more back" = easier to hear the contrast between front and back vowels. **When women/children imitate an adult male voice, they lower their f0 and round their lips!!!!

Coarticulation

--In speech, multiple articulatory movements take place simultaneously. Impossible to break acoustic signal into discrete segments like IPA symbols. --Overlapping production --Producing next sound while we are still finishing first sound (anticipating) • Coarticulation refers to the fact that adjacent sounds overlap in speech production, influencing each other's articulatory and acoustic properties. EXAMPLES: --"Sue" and "see," paying attention to the shape of your lips. • If an [s] comes before a round vowel, lip rounding starts during the fricative. • If the following vowel is not round, [s] is pronounced with retracted lips. --"key" and "coo" paying attention to the placement of your tongue. • When [k] occurs before a front vowel, the tongue hits a more anterior point on the palate than before a back vowel.

Voicing Contrast in Final Position

--In utterance-final position, VOT is not a cue to voicing (because there is no following vowel). -- Duration of preceding vowel is the primary cue in this context: longer before a voiced stop. -- Can also look for voicing bar during final voiced stop closure, but often very faint or absent. [bæg] --silent gap with faint voicing bar --the æ is long in duration before a voiced stop [bæk] --silent gap with NO voicing bar --the æ is short in duration before a voiceless stop **Longer vowel duration before voiced stops in final position **Voicing bar during voiced stops in final position **Shorter vowel duration before voiceless stops in final position **No voicing bar for voiceless stops in final position

Perturbation Theory

--More accurate model of resonance of vowels in the vocal tract --> Resonances are affected by the introduction of points of constriction (narrowing). --> Location of a constriction in the vocal tract determines whether resonant frequencies will increase or decrease. --> They will increase or decrease relative to the tube they come from • This concept is called "perturbation": The constriction / narrowing "perturbs" airflow !!!!!!!!!! ------------------------------------- Basic Idea #1: Vocal tract resonances (formants) are the result of standing waves in the vocal tract. • These standing waves have velocity antinodes and velocity nodes. Basic Idea #2: A constriction near a velocity ANTINODE = open end = DECREASES the resonant frequency (ex: lip rounding) Basic Idea #3: Constriction near a velocity NODE = closed end = INCREASES a resonant frequency (ex: glottis)
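
Basic Ideas #2 and #3 can be sketched numerically for an idealized uniform tube that is closed at the glottis and open at the lips. In that tube, velocity varies as a sine and pressure as a cosine along its length, and one common way to state the rule is that a constriction raises a resonance where pressure dominates and lowers it where velocity dominates. The function name, the uniform-tube idealization, and the positions used for "pharynx" (1/3 of the way from glottis to lips) and "palate" (2/3) are all assumptions made for illustration.

```python
import math

def formant_shift(position, n):
    """Direction in which a constriction moves formant n.

    position: fraction of the tube length, 0.0 = glottis, 1.0 = lips.
    n: formant number (1 = F1, 2 = F2, ...).
    """
    phase = (2 * n - 1) * math.pi * position / 2
    # pressure ~ cos(phase), velocity ~ sin(phase);
    # pressure^2 - velocity^2 = cos(2 * phase)
    balance = math.cos(2 * phase)
    if balance > 1e-9:
        return "raises"      # nearer a velocity node (pressure maximum)
    if balance < -1e-9:
        return "lowers"      # nearer a velocity antinode
    return "no change"       # exactly halfway between node and antinode

# Assumed, idealized constriction positions.
for place, pos in [("lips", 1.0), ("palate", 2 / 3), ("pharynx", 1 / 3)]:
    print(place, "F1", formant_shift(pos, 1), "/ F2", formant_shift(pos, 2))
```

With these idealized positions, the sketch gives lips: F1 and F2 both lower; palate: F1 lowers, F2 raises; pharynx: F1 raises, F2 lowers.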

Obstruents during Sound Production

--Obstruents are sometimes described as "pressure consonants" because they require a buildup of pressure in the oral cavity (intraoral pressure). • In stops, pressure must build up to create a burst noise when released. • In fricatives, pressure must build up to force a stream of air through a narrow space between articulators. --If there is muscle weakness affecting ability to form a seal with the lips/tongue, pressure consonants will sound imprecise. ---> **Dysarthria (weakness): consonants will sound underarticulated because obstruents require a strong seal • Obstruents also require a strong seal at the velopharyngeal valve; otherwise pressure is lost to the nasal cavity.

Nasals on a Spectrogram

--Periodic voicing and formants --Low intensity (less dark formants) --Low F1 (200-300 Hz): energy is concentrated in low frequencies --Antiformant = white region / blank space --Periodic voicing bar at bottom --Changes abruptly from vowel to nasal (doesn't smoothly segue): there is a sudden, large change in the resonating characteristics of the vocal tract as soon as the velum is lowered, which couples the nasal cavity to the pharyngeal space and adds it to the resonator. --Though the formant transition is abrupt, there is still coarticulation: the velum starts to lower in anticipation of the upcoming nasal sound = anticipatory nasalization

Glottal Fricative /h/

--Primary constriction / narrowing for /h/ is in the glottis --Vocal folds are partially adducted, but not to the point of contact or vibration. --Air becomes turbulent when it moves through the narrowed glottis; this turbulent noise, not vocal fold vibration, generates the sound. --Sound source is still located at the glottis, so it is not really a supraglottal sound source. --The position of the supraglottal articulators is not constrained during an [h] sound, so the tongue automatically moves into the configuration for the vowel that comes after [h] (beginning the next sound while finishing the first one, which is why /h/ occurs so often in word-initial position). --Acoustic signature looks different depending on what sound comes after because of coarticulation = sounds are overlapping. ***Basically a voiceless version of the following vowel

LPC Spectrum (Linear Predictive Coding)

--Show the spectrum of the voice AFTER FILTERING through the vocal tract. --FILTERING INPUT of vocal tract x-axis = frequency y-axis = loudness --No time axis. --Formants appear as PEAKS in the spectrum instead of bars on a spectrogram. formants = peaks harmonics = individual lines ------------------------------- High Front Vowels: /i//ɪ/ --F1 is low --F2 is high SO, very far apart peaks in LPC spectrum & the dark areas are far apart on spectrogram (Constriction near the palate) ----------------------------- Low Back Vowels: /ɑ/ /ɔ/ --F1 is high --F2 is low SO, very close together peaks in LPC spectrum & the dark areas are also very close together on spectrogram (Constriction in the pharynx) --------------------------- High Back Vowels: /u/ /ʊ/ --F1 is low --F2 is low SO, the peaks are semi-spaced out in LPC spectrum & the dark areas are also kind-of apart but still kind-of close (Constriction near the velum) ------------------------------- Low Front Vowels: /æ/ --F1 is high --F2 is mid to high SO, the peaks are semi-spaced out in LPC spectrum & the dark areas are also kind-of apart but still kind-of close (BUT more spaced out than high back vowels)

/h/ on a Spectrogram

--Shows faint bands of energy in the same location as the formants of the following vowel. --Turbulent noise produced in the glottis passes through the WHOLE length of the vocal tract --The noise is filtered in the same way as in the production of the vowel sound -- /h/ in he • Fairly even distribution of energy • No striations corresponding with vocal folds opening and closing • No clear formants -- /h/ in ha • lots of intensity, as it corresponds with a low back vowel -- /h/ in who • Increases in intensity that correspond with formants of the following vowel

Sociophonetics

--So far we have been focusing on the linguistic information conveyed by the acoustics of speech, e.g. [i] vs [u]. --However, the acoustic signal also carries indexical information about properties of the speaker. --Gender, age, race, culture/region of origin, and even sexual orientation can often be identified with above-chance accuracy based on information in the speech signal alone. --Some of these differences reflect physical characteristics of speakers (e.g., average differences in vocal tract size between male and female talkers). --Others are learned characteristics shared across members of a community.

Gender-affirming communication training

--Some transgender individuals elect to work with speech-language pathologists to achieve a speech/voice presentation congruent with their gender identity. --Voice training for trans speakers may place emphasis on f0. --However, modifying pitch alone may not suffice for speakers to achieve the desired gender percept (Mount & Salmon, 1988). --Considered best practice to target both f0 and resonant frequencies. • For example, retracting the lips can raise formants (especially F2) to be closer to average values for cisgender female speakers.

Turbulence Noise

--Supraglottic sound is produced when the vocal tract is constricted tightly enough to produce turbulence noise. --Turbulence is directly proportionate to velocity --When flow is slow, it is usually smooth. --When velocity increases, fast-moving air strikes inert air, causing an increase in irregular/ random particle motion (turbulence). --Recall that velocity of flow increases in a narrow/constricted portion of the vocal tract. --Narrower channel = faster = louder turbulence noise. • Faster the flow, the more likely turbulence is to occur • Particle velocity increases as it goes into a narrow channel

The Ear

--The ear converts vibratory sound energy into electrochemical energy in the auditory nervous pathways. (1) outer ear (2) middle ear (3) inner ear --> takes sound from its vibratory energy form (pressure wave) and converts it into electrochemical energy --The tympanic membrane oscillates in response to the pressure changes associated with sound waves. ---> tympanic membrane = eardrum --Energy is reflected as sound passes from air to fluid as the medium of transmission; lever action of the ossicle chain in the middle ear amplifies sound to offset this loss.

How do we produce different sounds / what makes distinct sounds / how do they sound different?

--The sound produced by VF vibration is filtered by resonance in the vocal tract!!! EXAMPLE: You can keep your f0 constant but change from [i] to [u] !!! = creates differences in vowel quality = differences in formant frequencies *The source frequencies don't change, but their amplitudes do: we can change the resonant frequencies by manipulating the articulators WITHOUT changing f0. All of the source frequencies (i.e., harmonics) are still there, but some frequencies increase in amplitude and others decrease in amplitude. As we articulate & manipulate the vocal tract & change mouth shape = changing resonant characteristics of the vocal tract = how we communicate different speech sounds

Variation across speakers

--Vowel formant data from Peterson & Barney (1952). --Note overlap in vowel categories as produced by 76 speakers (men, women, children). --General clustering by category (vowel quality) --However: "Two talkers can produce the same phonetic segment with different acoustic patterns and different segments with the same acoustic pattern." --For example, the [ɛ] region contains a lot of [æ] tokens: acoustically identical vowels can be [ɛ] for one speaker but [æ] for another. --Vowel perception depends less on the absolute height of formant frequencies than on relative distances between formant frequencies. Because of this variation, we must use relative, rather than absolute, formant values. --For example: A vowel with a low F1 plus high F2 is perceived as [i], independent of the exact heights of individual formants.

Constriction & Frequency

--What happens when a constriction is introduced somewhere along the vocal tract? • A constriction at the lips is close to a velocity maximum (V) for all resonant frequencies. • Lip rounding has the effect of lowering all formant frequencies. • If the constriction is near a velocity maximum (V) = velocity antinode && zero pressure (pressure node) (ex: lip rounding).... --> Existing energy is primarily KINETIC rather than potential --> Constriction IMPEDES particle movement Think traffic: • Going from 3 lanes to 2 lanes (constriction) --> narrow space means that cars (air particles) cannot move as fast --> Frequency DECREASES *****All resonant frequencies have a velocity antinode (velocity maximum) at the LIPS because it is the open end !!! SO, a constriction / narrowing at the lips (max velocity) from lip rounding results in a LOWER resonant frequency • If the constriction is near a pressure maximum (P) = pressure antinode && zero velocity (velocity node) --> Existing energy is primarily POTENTIAL rather than kinetic --> Constriction increases stiffness of air as a transmitting medium --> Little particle movement at this point; the constriction compresses the air further --> ENHANCES particle movement -- Think crowd of people: • Getting "squeezed" together will force some people to move. --> Frequency INCREASES *****All resonant frequencies have a velocity NODE (velocity minimum) at the GLOTTIS because it is the CLOSED end !!!

Standing Waves in the Vocal Tract

--When vocal tract is resonating, there will be places of different extremes --The first three resonant frequencies (F1, F2, F3) of the "neutral tube" vocal tract --F3 that has a higher frequency, there are MORE alternations between P & V in the vocal tract The "P" and "V" refer to points of MAXIMUM pressure and velocity!!! P = antinode for pressure V = antinode for velocity **All three resonances have a... -Velocity antinode (V) = open end (where the node at open end is pressure) -Pressure antinode (P) = closed end (where the node at closed end is velocity) *****All resonant frequencies have a velocity antinode (velocity maximum) at the LIPS because it is the open end !!! SO, a constriction / narrowing at the lips (max velocity) from lip rounding, results in LOWER frequency
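
The locations of the V and P points for F1, F2, and F3 follow from the quarter-wave pattern of the "neutral tube". The sketch below assumes an idealized uniform tube, closed at the glottis (position 0) and open at the lips (position 1); the function name is made up for illustration.

```python
from fractions import Fraction

def velocity_extremes(n):
    """Velocity nodes and antinodes for resonance n of the neutral tube.

    Positions are fractions of the tube length (0 = glottis, 1 = lips).
    Velocity nodes are also the pressure antinodes (P); velocity
    antinodes (V) are also the pressure nodes.
    """
    m = 2 * n - 1  # 1, 3, 5 quarter-wavelengths fit in the tube
    nodes = [Fraction(2 * k, m) for k in range(n)]
    antinodes = [Fraction(2 * k + 1, m) for k in range(n)]
    return nodes, antinodes

for n in (1, 2, 3):
    nodes, antinodes = velocity_extremes(n)
    print(f"F{n}: P at", [str(x) for x in nodes],
          "| V at", [str(x) for x in antinodes])
```

Every resonance has a P point at the glottis (0) and a V point at the lips (1), and each higher formant adds one more alternation between P and V along the tube.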

Non-Point Vowels

--Will be intermediate in F1 and F2. Examples: [ʊ], [ɛ]

Terms associated with Vocal Tract Filter

--base resonant frequency --shape of vocal tract --formants --vowel quality (i, u, a) --resonant frequencies

Second Formant Frequency (F2) of the Vocal Tract in Vowels

--determined by backness = size / length of the front cavity • Front cavity = the space in front of the point of maximum constriction between tongue and palate. When this cavity is small, it resonates at a high frequency. --front vs back vowels *FRONT = anterior tongue position = higher F2 --Front vowels have a short front cavity and high F2. *BACK = posterior tongue position = lower F2 --Back vowels have a long front cavity and a low F2. ****The more front the vowel, the higher the F2 frequency. ***Low front vowels have lower F2 than high front vowels ----tongue body is displaced back into the pharynx in low vowels. This also opens up a larger front cavity. --The more front the vowel, the smaller the oral cavity is and the larger the pharyngeal cavity is --> the smaller the oral cavity, the higher the frequency --The more back the vowel, the larger the oral cavity is and the smaller the pharyngeal cavity is --> the larger the oral cavity, the lower the frequency --front vowels have a higher F2 frequency --Lip rounding increases the length of the oral cavity and therefore lowers its resonant frequency (F2).

First Formant Frequency (F1) of the Vocal Tract in Vowels

--determined by height = volume of the pharyngeal cavity --how tightly the vocal tract is constricted. --high vs low vowels *high vowels = low F1 *low vowels = high F1 ***When the volume of the pharyngeal cavity is small, it resonates at a high frequency. **small cavity = resonate at high frequency F1 increases as vowels get lower. ******* · The higher the tongue (small oral cavity and large pharyngeal cavity), the lower F1 frequency. --> tongue is raised up to be near palate & is tightly constricting the vocal tract --> tightly constricted vocal tract = large pharyngeal space · The lower the tongue (large oral cavity and small pharyngeal cavity), the higher the F1 frequency --> the tongue is lowered, and leaves a space between the cavities (not that constricted) --low vowels (æ, ɑ) have a higher F1 frequency --A low tongue/jaw height forces tongue body back, creating a smaller volume in the pharyngeal cavity. To match the articulatory vowel space, F1 is plotted inversely on the y-axis

Terms associated with Vocal Source

--pitch --harmonics --fundamental frequency --rate of vocal fold vibration

Sound Medium

--the velocity (speed) of transmission of the pressure wave is a property of the medium, not the wave itself. --depends on the gas or other medium through which it travels -------------------------- --Sound travels faster through a more elastic substance. --Solids are more elastic than liquids, which are more elastic than gases. --sound travels fastest in solids ------------------------------- **sound moving through air depends on TEMPERATURE The speed of sound in air is around 331 m/s at 0 degrees C. Sound travels faster at higher temperatures !!!!!!! *ex: Speed of sound in air at room temperature (20 degrees C) = 344 m/s, or 34,400 cm/sec. • We will call this value c. c = speed of sound = 34,400 cm/sec • λ = c/frequency (wavelength = speed of sound / frequency)
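
A short worked example of λ = c/frequency with c = 34,400 cm/sec. The function name and the example frequencies are chosen only for illustration.

```python
C = 34400.0  # speed of sound in air at room temperature, cm/sec

def wavelength_cm(frequency_hz):
    """Wavelength in cm: lambda = c / frequency."""
    return C / frequency_hz

for f in (100, 1000):
    print(f"{f} Hz --> {wavelength_cm(f):.1f} cm")
```

A 100 Hz sound has a wavelength of 344.0 cm, and a 1000 Hz sound has a wavelength of 34.4 cm: higher frequency = shorter wavelength (inverse relationship).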

Glides vs Vowels on Spectrogram

/j/ looks like /i/ —> f1 is low f2 is high -- tongue is high and front (really wide separation between the formants) /w/ looks like /u/ —> f1 is low f2 is low — lip rounding, high back tongue position *Spectrogram is less dark in higher frequencies for glides compared to vowels * When [j] comes before [i], there is hardly any change in formant heights. * When [w] comes before [i], there is a dramatic transition in F2 height (low for [w], high for [i]).

Articulation of /r/

/r/ is produced with lip rounding Two major ways to produce the anterior constriction for [ɹ]: • Retroflex [ɹ]: Tongue tip is raised and curled to alveolar ridge. • Bunched [ɹ]: Tongue tip is lowered and tongue body is bunched near the palate. -- Narrowing of pharyngeal space (tongue root retraction) is present in both variants. -- These two tongue positions have very similar acoustic consequences. ** BUT Bunched vs retroflex is an oversimplification. Individual speakers vary: May use retroflex [r], bunched [r], or something in between. May use different tongue shapes in different phonetic contexts (e.g., onset versus coda).

Sibilants /s/ vs /ʃ/ on a Spectrogram

/s/ --Because it is skinnier / shorter in duration --It is darker in higher frequencies /ʃ/ --Because it is fatter / longer in duration --It looks more evenly distributed (darker throughout) --More concentration / darker in lower frequencies --Ends earlier

Identifying Z vs S (fricatives) on a Waveform & Spectrogram

/s/ —> Waveform: Aperiodic (random), Longer Duration, Little Intensity /s/ —> Spectrogram: No voicing bar, large white gap /z/ —> Waveform: Periodic (recurring peaks), Shorter Duration /z/—> Spectrogram: Voicing bar

Word on a Spectrogram (Identifying phonemes) — ∧kaŋ

1 — vowel —> greater intensity 2 — stop gap —> region of silence (no energy, no voicing bar) 3 — VOT —> between release burst and start of vowel (has some aspiration noise) 4 — vowel —> greater intensity 5 — nasal —> antiformant (regions of white = region of no acoustic energy), more energy is concentrated in low frequencies, intensity is low

Transitions for Labial, Alveolar, and Velar Stops on a Spectrogram

3 characteristics to remember: (1) Labial: all formants rise into the following vowel (2) Alveolar: F2 starts off at about 1800 Hz and rises or falls based on the next vowel (3) Velar: F2 and F3 pinch together (velar pinch)

Resonance & Source-Filter Theory Review

A tube with one open end is a QUARTER wave resonator, which means that the base resonant frequency has a wavelength 4x the length of the tube Which multiples of the base resonant frequency will appear as higher resonant frequencies in a Quarter Wave Resonator? ODD NUMBERED MULTIPLES ONLY Other whole-number multiples of the base resonant frequency will not resonate because they do not meet the requirement to have a NODE for particle displacement at the CLOSED end of the tube (particles don't really move) and an ANTINODE for particle displacement at the OPEN end of the tube (particles move freely) Peaks in the vocal tract filter function are resonant frequencies / formants Harmonics that don't line up with peaks get attenuated Not changing pitch, just changing the resonant characteristics /I/ vs /a/: different vowel sounds without changing fundamental frequency Saying /a/ at different pitches --> can change frequencies. Formants are straight across. Individual horizontal lines. Source --pitch --harmonics --fundamental frequency --rate of vocal fold vibration Filter --base resonant frequency --shape of vocal tract --formants --vowel quality (i, u, a) --resonant frequencies

Identifying Stops on a Spectrogram

A. /b/ = All formants rising B. /d/ = Can either rise or fall. Usually looks like /b/ C. /g/ = Velar pinch. F2 and F3 coming really close together

Identifying p, b, sp on a Waveform (Stop Voicing / VOT)

A. /b/ VOT is almost 0, Very short duration B. /sp/ = Evenly distributed energy, with a large gap (VOT) until the vowel C. /p/ It has a long VOT that has energy

Identifying p, b, sp on a Spectrogram (Stop Voicing / VOT)

A. /b/ Very little intensity, Very short duration B. /sp/ Intense aperiodic energy, very large gap until vowel C. /p/ Energy fades into vowel. Hard to see where it stops

Antiformant / Antiresonances for Nasals

Absence of energy = antiformants = created by closed off oral cavity Resonant frequencies that ATTENUATE sound instead of amplifying it.!!!!!! Created by side-branch resonators such as closed-off oral cavity ("cul-de-sac"). A frequency near the resonant frequency of the side branch will create a "trapped" resonance that does not contribute to the sound output of the system.

/s/ vs /ʃ/

Acoustic and perceptual differences between [s] and [ʃ] are greater than one might expect based on difference in constriction location alone. Why? [ʃ] but not [s] is produced with lip rounding, which further lengthens the front cavity/lowers the resonant frequencies. This pairing of a primary cue (difference in constriction location) with an additional articulatory difference (lip rounding) is sometimes termed acoustic enhancement. Lip rounding enhances the acoustic articulation of /ʃ/ = further lengthens front cavity = further makes lower frequency = further enhances acoustic difference between /s/ and /ʃ/ • Same thing we saw for front versus back vowel contrasts.

Identifying stops on a spectrogram

All formants rising for /b/ —> F1 is rising. ALL three rising transitions (labial) Velar pinch for /g/ —> F1 is rising. F2 and F3 are pinching (velar) /d/ = F1 is rising. Slight falling transition for F2 & F3 so no velar pinch (alveolar)

Low to Rising F1 (Transitions)

All places of articulation feature a LOW F1 that rises in the transition to the following vowel (and falls in the transition from a vowel to any stop). WHY? --> Recall that a tightly constricted vocal tract yields a low F1. --> To produce any stop, the vocal tract must be tightly constricted. --> Therefore, F1 rises in the transition from any stop to a vowel. *** ALL F1 transitions have a rising contour.

Antiformant of Nasals

Antiformants are created by resonance in the closed-off oral cavity. The length of the closed oral cavity differs between labial, alveolar, and velar nasals. Labial nasal = m = length of side branch is long as it extends from lips to velum Alveolar nasal = n = shorter than m as it is from alveolar ridge to velum Velar nasal = ŋ = really short SO, antiformant for m = long-side branch = lowest & antiformant for ŋ = short side branch = highest • Longest for [m], shortest for [ŋ] Therefore, [m, n, ŋ] have different antiformants. • Lower for [m] than [n], and lower for [n] than [ŋ].

Biofeedback with staRt

CSL software costs $2K ($5K with hardware), which is out of reach for most SLPs. staRt: A low-cost, user-friendly app to make biofeedback more widely accessible. Collaboration between BITS lab and researchers and students from Music Technology, NYU Langone, and NYU Tandon.

Gliding

Children often replace liquids with glides, which have similar acoustic properties and are easier to articulate. • [w] is the most common substitution for [ɹ]. = Read —> weed • Either [w] or [j] may substitute for [l]. = Lamp —> wamp Please —> pwease

Function of Coarticulation

Coarticulation makes articulation faster and more efficient !!! -- Overlapping production of sounds means more sounds can be produced in a second. -- Recall that the average rate of speech for a typical adult is 5-5.5 syllables per second, which works out to around 15 phonemes per second. That's fast! --With coarticulation, you get clues to the identity of a sound both during the sound (INTERNAL CUES) and during transitions to/from adjacent sounds. --Coarticulation SPREADS OUT information, giving the listener more chance to DECODE the signal. --Thus, coarticulation is more efficient from the point of view of perception as well as articulation.

Affricates Acoustics

Compared to fricatives, affricates have a shorter rise time (time from onset to maximum amplitude). Attributable to initiation with stop burst. Silent gap is not that silent because you still see voicing bar in /dʒ/ = voiced affricate You see tiny ups and Downs between vowel and consonant = periodicity = not completely flat leading up to it

Transitions

Component of a stop consonant -- Formant transitions: Changes in vocal tract resonance as it moves from one shape to another (e.g. from stop closure to vowel). -- Transition period is brief, usually around 50 milliseconds. --Shape of formant transitions provides primary cue to place of articulation for stop consonants (our topic for next time)

The Release Burst

Component of a stop consonant It is an explosion of APERIODIC sound immediately following silent gap !!!! When constriction is released, pressurized air rushes out and hits inert air in front of the constriction, generating turbulence noise. Looks and sounds like fricative noise (aperiodic energy spanning a WIDE range of frequencies) but with much shorter duration than fricatives. • 10-35 ms, but may be even shorter

Voice Onset Time (VOT)

Component of a stop consonant It is the duration of interval from the stop burst to the start of vocal fold vibration. The duration of time from the stop release to the onset of vocal fold vibration for the following vowel. May look like a brief period of silence or may feature aspiration noise VOT is not the same thing as the silent gap! VOT = AFTER stop release/burst, before start of vowel VOT is the MAIN marker of the voiced/voiceless CONTRAST / DISTINCTION for prevocalic stops in English.

Silent Gap

Component of a stop consonant Period of complete closure; no sound is escaping as pressure builds up behind constriction. Vocal tract is completely closed off, no sound is being emitted, and pressure is building up • May not be distinguishable from preceding/following silence in utterance-initial or -final position. In "pan" the silent gap isn't distinguishable from the other silence before it. So in order to see a stop in word initial position, you need to add "a" before it

Components of a Stop Consonant

Components of the acoustic signal of a stop in syllable-initial position: • (a) The silent gap • (b) The release burst • (c) The voice onset time (VOT) • (d) The formant transition(s)

Bursts in Stop Consonants

Different stop places of articulation differ in the acoustic properties of the burst. • However, this is NOT the best/most reliable cue to stop place of articulation (that will be formant transitions). The sound produced by the stop burst resonates in the vocal tract ANTERIOR to the constriction (like fricatives) Size of this resonating cavity will determine spectral properties of the burst. --> For labial stops, there is no real cavity for burst noise to resonate; low-intensity energy is distributed EVENLY across frequencies. --> For an alveolar stop, burst noise resonates in a small front cavity. Like alveolar fricatives, energy is concentrated at HIGH frequencies. --> Velar stop bursts have a longer front cavity. Energy is concentrated in the MIDDLE frequency range.

Nodes and antinodes in a longitudinal standing wave

Displacement node: --pressure oscillates between compression and rarefaction (alternate between high & low pressure) Displacement antinode: --particles move freely --pressure stays neutral.

Limitations of auditory coding by Cochlear Implants

Electrodes may not be matched to the "expected" frequency place in the cochlea Each electrode carries information about a relatively broad range of frequencies A single electrode may stimulate a broad range of nerve fibers.

Identifying vowels on a spectrogram

Emphasizing resonant frequencies = WIDEBAND = can see the difference between vowel quality! We can identify vowels on a spectrogram using the relative HEIGHTS of F1 and F2 Wideband: --time = x-axis --frequency = y-axis --At the bottom of the spectrogram, there's a really dark area. This is a CONCENTRATED BAND OF ENERGY!! It is the fundamental frequency / voicing bar / vocal source!!! --------------------------------- **High vowels = low F1. **Low vowels = high F1. *ʊ has a slightly higher F1 than the other high vowels that are supposed to have a low F1 because of its tongue height position *mid vowels are somewhere in-between, but closer to low vowels **A relatively high F1 frequency is still relatively low, since it obviously can't go above F2 ---------------------------- **Front vowels = high F2. **Back vowels (especially rounded) = low F2.

Acoustics of Vowels Review

F1 = volume of pharyngeal cavity F1 is higher for low vowels F2 = volume of oral cavity Posterior / back vowel = longer front cavity = lower F2 In a tube with one open end, the closed end is a NODE with respect to velocity (up against wall, can't move) and an ANTINODE with respect to pressure (fluctuating between high and low pressure) The open end is an ANTINODE with respect to velocity and a NODE with respect to pressure. (Perturbation Theory) Constriction near a VELOCITY antinode should LOWER the resonant frequency relative to its value in a neutral tube. (Perturbation Theory) Constriction near a PRESSURE antinode should RAISE the resonant frequency relative to its value in a neutral tube.

Formant Frequencies

F1: INVERSELY related to tongue height • High vowels = low F1 • Low vowels = high F1 F2: DIRECTLY related to tongue frontness • Back vowels = low F2 • Front vowels = high F2

Acoustics of /l/

F3 does not fall in [l] like in [r]. F3 remains unchanged for /l/ = No lowering of F3. F3 Looks like a flat line Abrupt transitions from vowel to consonant Can also have anti formant = blank space Looks like a nasal on a spectrogram - abrupt formant transitions, possibility of antiformants (pocket of air above tongue).

Fricative vs Affricate on a Spectrogram

Fricative = continuous energy Affricate = large white space, then some noise

Fricative vs Affricate on a Waveform

Fricative = low intensity, continuous aperiodic noise Affricate = total flat line, followed by intense energy

Formant Transitions for Fricatives

Fricatives & affricates contain internal (intrinsic) cues to place of articulation, making the formant transitions LESS important. Rising F1. Falling F2 and F3 from ʃ into aɪ

Resonant Frequency of A Closed Tube

If a certain frequency has a wavelength that fits evenly into a container, whole-number multiples of that frequency will also create resonance !!!!!!! **The lowest resonant frequency is the longest wave that fits in the tube. lowest resonant frequency = base resonant frequency. **The longest wave has a wavelength (λ) that is 2 times the length of the tube (λ = 2L) R = resonance λ = wavelength L=length of tube c = speed of sound = 34,400 cm/sec F = c / λ R = c / 2L **the higher resonant frequencies have shorter wavelengths.
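
The formula R = c / 2L and its whole-number multiples can be worked through as a short Python sketch. The 20 cm tube length and the function name are illustrative assumptions; only c = 34,400 cm/sec and the formula come from the card.

```python
C_CM_PER_SEC = 34_400  # speed of sound, as given on the card


def half_wave_resonances(length_cm: float, count: int = 3) -> list[float]:
    """Resonances of a tube whose longest standing wave is 2L: R_n = n * c / (2 * L)."""
    if length_cm <= 0:
        raise ValueError("tube length must be positive")
    return [n * C_CM_PER_SEC / (2 * length_cm) for n in range(1, count + 1)]


# Assumed example length of 20 cm: every whole-number multiple of the base resonates.
print(half_wave_resonances(20))  # [860.0, 1720.0, 2580.0]
```

Each higher resonance is a whole-number multiple of the base value, so its wavelength is the corresponding fraction of 2L.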

Source-Filter Theory

In speech.....source + filter = output the vocal folds vibrate, producing sound (source of energy) AND the vocal tract acts as a resonator that amplifies some frequencies and attenuates others (filter) (Johannes Mueller, 1848) filter = vocal tract source = vocal fold vibration filter is independent of source When a complex source wave is passed through a resonator (filter): • Frequency of components is determined by source • Amplitude of the components is determined by filter If you force air through larynx / sound produced by larynx in isolation = squawking (doesn't sound like speech at all) SO, sound is not just the VF vibrating, but needs the resonance to filter it!! larynx produces a whole range of noises but only some are being filtered by resonance, which is what we hear

The Cochlea

Inner ear Cochlea = snail-shaped ---In the cochlea, vibrations trigger action potentials in hair cells that are transmitted to the auditory nerve. ---When hair cells encounter vibrations = trigger / release action potential that transmits to the auditory nerve, then to the brain ---Vibratory waves entering cochlea set fluid inside into vibration. ---Basilar membrane is a flexible structure that vibrates with these waves. • It vibrates in synchrony • Like a series of tuning forks

Light vs Sound

Light is typically described in terms of wavelength (λ = v/f) Sounds are typically described in terms of frequency. **But wavelength determines whether a sound will "fit" into a container and produce a standing wave.

Nasal Resonance

Long tube = resonate at low frequency Short tube = resonate at high frequency Since you're closing off the oral cavity for nasals, the pharynx to the nose is the longest tube we've dealt with thus far. • Conjoined pharynx and nasal cavity create a LONG tube (~20-21 cm) SO, we expect F1 to be super low, which can overlap with f0, so you wouldn't be able to tell where the voicing bar ends and F1 begins • F1 may be as low as 200-300 Hz; may overlap with f0.

Acoustics of /r/

Low F3 is the acoustic hallmark of [ɹ]!!! intensity is lower in [ɹ] than surrounding vowels (vocal tract is more constricted, so less energy escapes) /r/ has a Clear acoustic signature that pertains to F3 --> It plunges down Low F3 Low intensity F2 is intermediate in height. Third formant, F3, is lowered dramatically, bringing F2 and F3 close together

Perceiving / Locating Nasals

Nasal murmurs do NOT provide strong place cues They don't have strong intrinsic cues, so the formant transitions become important !!! Transitions are a more reliable cue to nasal place than the intrinsic cue of antiformant frequency !! Once you add a vowel before it, it becomes easier to hear!! *** Nasal stops are also associated with formant transitions in neighboring vowels, just like we saw for oral stops. Repp (1986): Listeners were only 72% accurate in discriminating [n] vs [m] in isolation BUT!! 95% accurate with the addition of the first 10 msec of the following vowel.

Obstruents & Sonorants

OBSTRUENTS: stops, fricatives, affricates. Vocal tract is constricted which creates turbulent noise which is a supraglottal sound source. This supraglottal may be paired with a glottal sound source for voiced obstruents. SONORANTS: nasals, liquids, glides. More vowel-like. Vocal tract is less constricted = pretty open. Degree of constriction is not enough to make turbulence. Airflow is relatively unobstructed. There is not an aerodynamic/ pressure issue. These are ALL VOICED.

Affricates on a Spectrogram

On a spectrogram, looks like a combination of a stop (silent gap) and a fricative. --Fricative portion of affricate is usually shorter --Absence of voicing bar in /tʃ/ --Waveform is very flat throughout stop

Identifying Vowels on a LPC Spectra

Peaks = formants, resonant frequency of vocal tract A. /a/ (low back vowel) = F1 is high, and F2 is low = very close together peaks B. /u/ (high, back vowel) = F1 is low, and F2 is low = not close, not far apart peaks C. /i/ (high, front vowel) = F1 is low, and F2 is high = very far apart peaks

Perturbation Theory & /r/

Perturbation theory predicts extreme lowering of F3 of /r/ Explains how the articulatory characteristics of [ɹ] lead to low F3 = constriction at V antinode In perturbation theory, a constriction near a velocity antinode results in a decrease in formant frequency relative to values for a neutral tube. • [ɹ] is produced with constrictions at the lips, palate/postalveolar region, and pharynx. • The standing wave for the third vocal tract resonance (F3) has a velocity antinode at each of these points. • Three constrictions at V antinodes lead to extreme lowering of F3.

Standing Waves

Pressure and velocity are 90 degrees out of phase. ---When particles are stationary, pressure is maximized (oscillates between compression and rarefaction). --When particles move freely, pressure is neutral. --As a result, a node for pressure is an antinode for velocity, and vice versa (antinode takes on extreme values)

What makes sounds different / What creates such distinct sounds from [i] to [u] while keep fundamental frequency constant?

RESONANCE *All of the source frequencies (i.e., harmonics) are still there, but some frequencies increase in amplitude and others decrease in amplitude. • This creates differences in vowel quality (e.g., [i] vs [u]).

Sampling Rates & Fricatives

Recall that digital sampling rate limits the range of frequencies that can be represented. • Frequencies above the Nyquist frequency (equal to one-half the sampling rate) are not represented accurately. • High-frequency energy in fricatives requires a high sampling rate. • Phone systems often sample at 8 kHz, which cuts off all frequencies > 4 kHz • A phone sampling rate of 8000 Hz means the Nyquist frequency is 4 kHz --> A lot of energy is above 4 kHz for fricatives, which makes it hard to hear them --> Speech is still intelligible on phone, but fricatives are distorted --> With a lower sampling rate, we lose higher frequencies • OK for most sounds (e.g., vowels), but can distort fricatives
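
The Nyquist rule on this card can be sketched in Python. The 8000 Hz phone rate comes from the card; the function names, the 44,100 Hz comparison rate, and the 6000 Hz and 500 Hz test frequencies are hypothetical values chosen only to illustrate the cutoff.

```python
def nyquist_hz(sampling_rate_hz: float) -> float:
    """The Nyquist frequency is one-half the sampling rate."""
    return sampling_rate_hz / 2


def is_represented(frequency_hz: float, sampling_rate_hz: float) -> bool:
    """True if the frequency is not above the Nyquist frequency for this sampling rate."""
    return frequency_hz <= nyquist_hz(sampling_rate_hz)


PHONE_RATE_HZ = 8000        # phone sampling rate from the card
HIGH_RATE_HZ = 44_100       # assumed higher-quality rate, for comparison only
FRICATIVE_ENERGY_HZ = 6000  # hypothetical high-frequency fricative energy
VOWEL_ENERGY_HZ = 500       # hypothetical low-frequency vowel energy

print(nyquist_hz(PHONE_RATE_HZ))                           # 4000.0
print(is_represented(FRICATIVE_ENERGY_HZ, PHONE_RATE_HZ))  # False
print(is_represented(FRICATIVE_ENERGY_HZ, HIGH_RATE_HZ))   # True
print(is_represented(VOWEL_ENERGY_HZ, PHONE_RATE_HZ))      # True
```

The low-frequency vowel energy survives the phone rate while the high-frequency fricative energy does not, which matches the card's point that fricatives are the sounds most affected.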

Vibration / Oscillation

Repetitive movement around a central point, like the swinging of a pendulum. Oscillation can be free or forced. free = set in motion and it goes back & forth by itself based on physical qualities forced = keep giving extra energy. have to keep pushing it so it stays in motion ex: pushing child on a swing = forced oscillation ***If you apply force exactly at the natural PEAK of each oscillation..... --the energy will build up --amplitude increases --frequency stays the same ***If you apply force at a higher or lower frequency... --there is no buildup of energy

Formant Transition

Resonant frequencies change as the vocal tract moves from one configuration to another. In the speech stimulus, the rapid shift in frequency that precedes a formant. In a CVC syllable, the first and last parts of the vowel will reflect movement to or from the position for the adjacent consonant. Example: Formants are moving as the vocal tract moves from its configuration from p to a in POP *** Changing regions of vowel formants are FORMANT TRANSITIONS. *** Part that does not change is the STEADY STATE • Transition duration is short, ~50 ms. ***REMEMBER Formant transition provides PRIMARY CUE to place of articulation for stops - resonant characteristics tell where the articulators are moving from/to.

Fundamental Frequency vs Resonant Frequency

SOURCE: **can change fundamental frequency but still have the same vowel (ah, ahh, ahhhh) -- All 3 perceived as same vowel FILTER: **can change resonant frequency and have different vowels (e, i, u) --The first resonant frequency of the vocal tract is NOT the same thing as the fundamental frequency of vocal fold vibration (or the first harmonic). --The source and filter are distinct and independent of one another. Formants are the resonant frequencies of the vocal tract. They depend on the shape of the vocal tract (filter). They do not depend on the fundamental frequency (f0) of the sound source!
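
A small Python sketch can make the independence concrete: the harmonics move when f0 changes, while the formant values stay fixed. The formant values below are assumed round figures for a neutral 17 cm vocal tract, and the f0 values (120 Hz and 220 Hz) and function names are hypothetical, chosen only for illustration.

```python
# Assumed formants for a neutral vocal tract (filter); these do not depend on f0.
FORMANTS_HZ = (506, 1518, 2529)


def harmonics(f0_hz: float, max_hz: float = 3000) -> list[float]:
    """Source: harmonics are whole-number multiples of the fundamental frequency."""
    return [f0_hz * k for k in range(1, int(max_hz // f0_hz) + 1)]


def nearest_harmonic(formant_hz: float, f0_hz: float) -> float:
    """The harmonic closest to a formant peak, i.e. the one the filter boosts most."""
    k = max(1, round(formant_hz / f0_hz))
    return k * f0_hz


for f0 in (120, 220):  # hypothetical low and high pitch
    boosted = [nearest_harmonic(f, f0) for f in FORMANTS_HZ]
    print(f"f0 = {f0} Hz: formants stay {FORMANTS_HZ}, boosted harmonics {boosted}")
```

The same three formants are used at both pitches; only which harmonics sit closest to them changes, which is why the vowel is still heard as the same vowel.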

How are source and filter independent?

SOURCE: **can change fundamental frequency but still have the same vowel (ah, ahh, ahhhh) -- All 3 perceived as same vowel FILTER: **can change resonant frequency and have different vowels (e, i, u) // you can change vowel quality without changing f0 frequency

Antiformant on a Spectrogram

Show up on spectrogram as areas of especially low intensity (white regions). white / blank areas

Spectra for Sibilants & Non-Sibilants

Sibilants have clear energy peaks in high frequencies. --->> clear peak in spectra --->> There is some frequency being amplified --->> Location of peak differs: /s/ has a peak at a higher frequency than /ʃ/ because it has a shorter cavity in front of the constriction ------------------------------- Non-sibilants have relatively flat spectra; energy declines in high frequencies. --->> spectrum is pretty flat --->> /f/ energy declines in high frequencies

Sonorant Summary

Sonorant consonants resemble vowels (periodic voicing, formant structure) but with generally lower intensity. (1) Nasals • Low frequency due to large/long resonator • Especially low intensity: damping due to large surface area of resonator • Anti-formants are "trapped" resonances that are subtracted from the acoustic output (2) Glides • Similar to their vowel counterparts in production and acoustics • Differentiated primarily by duration and intensity (3) Liquids • Rhotics are characterized by particularly low F3.

Identifying Consonants

Sonorant --> Regularly repeating striations / peaks, but energy / intensity is less than neighboring vowels. Has a voicing bar Stop --> No acoustic energy. Virtual absence of sound. Blank space. Looks like a flat line Fricative --> fairly low intensity energy. More concentrated in higher frequencies Affricate --> a blank space / flat line / silence, FOLLOWED by a burst of high intensity, aperiodic energy, very concentrated in higher frequencies

Sound Resonance

Sound at the natural resonant frequency of an object will create a larger-amplitude oscillation than other frequencies. Even a fairly low-intensity sound at the RIGHT frequency can produce a large-amplitude vibration. ex: Breaking a glass by sustaining a high note: --tap glass and try to repeat / match that frequency generated --if you sustain this high note, the glass will eventually break because you are producing the same resonant frequency of the glass

Resonance as a Filter

Sound of a vibrating string has multiple frequency components (harmonics). --sound vibrating produces a SERIES of harmonic frequencies, not just one frequency Frequencies close to a resonant frequency of the body of the instrument are AMPLIFIED. Frequencies DISTANT from the resonant frequency are ATTENUATED = reduced = troughs. • The larger the volume of the resonator, the lower the resonant frequency. • Cello vs violin **(violin = the smaller the volume, the higher the resonant frequency) • Blowing across a bottle that is mostly full vs mostly empty **(as the bottle gets fuller, the frequency gets higher!!)

Stops Articulation

Stop consonants: A complete constriction is formed in the oral cavity (with simultaneous closure at the velopharyngeal port), and pressure builds up behind it. Primary sound source is created as pressurized air is released from behind a constriction in the oral cavity. When constriction is released, air typically rushes out in a brief burst of turbulence noise ("release burst"). HOWEVER • Not all stops have oral release. • Before a nasal with same place of articulation (e.g., hidden), a stop may be released by lowering the velum and letting pressure out through the nose. • Unreleased stops are the norm for the first in a sequence of two consonants (e.g., hot dog, Batman) • Final stops may also be unreleased (e.g., that's a wrap, how about that). • Glottal stop: Airflow is briefly blocked by complete adduction of the vocal folds.

Resonant Frequency of an Open Tube

The lowest resonant frequency is the longest wave that fits in the tube. The length of the longest wave is 4 times the length of the tube (λ = 4L).
**The higher resonant frequencies (R2, R3, etc.) are ODD-numbered multiples of the base resonant frequency.
***Only the odd-numbered multiples have a displacement node at the closed end and a displacement antinode at the open end, and therefore create resonance. (Even-numbered multiples fit in tube BUT do NOT meet this criterion and therefore do not create resonance.) EXAMPLE: (2 * the base resonant frequency) will NOT resonate in an open tube.
-----------------------------
Resonant frequencies of an OPEN tube that is 20 cm long:
F = c / λ, R = c / 4L
R = resonance, λ = wavelength, L = length of tube, c = speed of sound = 34,400 cm/sec
First Resonance (1/4 wave fits in the tube): R1 = (1 x 34,400) / (4 x 20) = 430 Hz
Second Resonance (3/4 wave fits in the tube): R2 = (3 x 34,400) / (4 x 20) = 3 x R1 = 1290 Hz
Third Resonance (5/4 wave fits in the tube): R3 = (5 x 34,400) / (4 x 20) = 5 x R1 = 2150 Hz
----------------------------
What is the lowest resonant frequency of SCHWA /ə/, for a tube (vocal tract) that is 17 cm long? R = c / 4L
First Resonance (1/4 wave fits in the tube): R1 = (1 x 34,400) / (4 x 17) ≈ 506 Hz
--Higher resonances:
Second Resonance (3/4 wave fits in the tube): R2 = (3 x 34,400) / (4 x 17) = 3 x R1 ≈ 1518 Hz
Third Resonance (5/4 wave fits in the tube): R3 = (5 x 34,400) / (4 x 17) = 5 x R1 ≈ 2530 Hz
**the longer the vocal tract, the lower the frequency
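
The quarter-wave arithmetic in this card can be reproduced with a short Python sketch. The formula R = c / 4L, the odd-numbered multiples, c = 34,400 cm/sec, and the 20 cm and 17 cm lengths all come from the card; the function name is an illustrative assumption.

```python
C_CM_PER_SEC = 34_400  # speed of sound, as given on the card


def quarter_wave_resonances(length_cm: float, count: int = 3) -> list[float]:
    """Resonances of a tube closed at one end: R_n = (2n - 1) * c / (4 * L).

    Only odd-numbered multiples (1, 3, 5, ...) of the base resonance appear.
    """
    if length_cm <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * C_CM_PER_SEC / (4 * length_cm) for n in range(1, count + 1)]


print(quarter_wave_resonances(20))                      # [430.0, 1290.0, 2150.0]
print([round(r) for r in quarter_wave_resonances(17)])  # [506, 1518, 2529]
```

For the 17 cm tube, computing each resonance directly gives about 2529 Hz for R3. Multiplying the already-rounded R1 (506 Hz) by 5 gives 2530 Hz, so the two approaches differ only by rounding.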

Nasal Murmur

The sound produced during a nasal stop the sound emitted during the production of a nasal

Articulation of Nasals

The velum is lowered. Open VP port. • The velum or soft palate is a muscular structure that can be raised to contact the posterior pharyngeal wall. • When the velopharyngeal port (VP port) is closed, air is forced to exit through the oral cavity. • If airflow is blocked or forced through a narrow channel, turbulence noise may be generated. • When the velum is lowered, oral and nasal cavities are coupled; sound from glottis can resonate in both spaces. • If the oral vocal tract is open, a nasal vowel is produced; if it is closed, a nasal stop is produced. • With velum lowered (VP port open), oral and nasal cavities join together, creating a significantly longer/larger resonator. Unlike other sonorants, vocal tract is FULLY constricted/closed (labial, alveolar, or velar place). • But unlike stops, air pressure does not build up behind the oral closure. • Air flows freely through nasal cavity. • Voicing is not impeded.

VOT vs Silent Gap

Though they both look like periods of silence... • Silent gap: BEFORE stop release/burst. • VOT: AFTER stop release/burst

Diphthong

Three English diphthongs do not have a corresponding monophthong: • [aɪ] • [ɔɪ] • [aʊ] Two diphthongs are allophones in complementary distribution with a monophthong (diphthong occurs in a stressed or final syllable, monophthong elsewhere): • [eɪ] ~ [e] • [oʊ] ~ [o] Acoustically identical vowels, but for one speaker it's an e but for another it's an æ *****Each diphthong has a characteristic F1-F2 pattern reflecting movement between two vowel targets.

Release Burst on a Waveform & Spectrogram

VOICELESS Stops = (p, t, k) ex: ata WAVEFORM: --Very large intensity after the silent gap / flat line --Longer duration than voiced --This reflects aerodynamics = vocal folds are abducted (open) = more air getting through = more pressure SPECTROGRAM: --Easy to see where it begins but hard to see where it ends --More concentrated in higher frequencies --Very dark at the top of spectrogram --The burst ends where the energy stops being concentrated in high frequencies at the top; it is followed by the voice onset time interval, which has aspiration noise that is more evenly distributed ------------------------- VOICED Stops = (b, d, g) example: ada WAVEFORM: --Very small intensity after the silent gap / flat line --Shorter duration --Lower intensity --Reflects aerodynamics --> need pressure to create turbulent noise. Not enough air flow = not enough pressure. Vocal folds are adducted (closed) so LESS air is getting through --This reflects the fact that the adducted vocal folds impede airflow, restricting the buildup of intraoral pressure. SPECTROGRAM: --Shorter in duration --Concentration is at the top

Silent Gaps on a Waveform & Spectrogram

VOICELESS Stops = (p, t, k) ex: ata WAVEFORM: --Voice onset time is characterized by aspiration --Long duration (Closure appears to be longer) --Period of low energy that looks like a FLAT LINE and then ends abruptly --The almost-flat line has random, aperiodic fluctuations in energy = random noise SPECTROGRAM: --Low acoustic energy --NO voicing bar --Just a big blank space ------------------------- VOICED Stops = (b, d, g) example: ada WAVEFORM: --Still random noise but slightly reliable peaks and troughs that correspond with vocal folds vibrating --Has periodic fluctuations --Shorter in duration SPECTROGRAM: --Very faint energy in lowest frequencies --Has a voicing bar = not completely blank

Oral vs Nasal Sounds

Velum raised = oral sound Velum lowered = nasal sound

Voiced & Voiceless Fricatives

Voiced and voiceless fricatives look similar apart from presence/absence of voicing bar (low-frequency energy corresponding with the fundamental frequency and lowest harmonics). ------------------------------ VOICELESS fricatives (/f/ /θ/ /s/ /ʃ/ /h/): --Spectrogram: No energy in lowest frequency --Waveform: No regular up and down pattern / just noise. >> Voiceless fricatives also tend to be longer in duration than voiced fricatives. -------------------------------- VOICED fricatives: --Spectrogram: Has the Voicing Bar at the bottom (visible energy in lowest frequency) --Waveform: there is a recurring pattern during duration of fricative. >> Noise is present but superimposed on the pattern >> Gets louder as person transitions out of saying this fricative >> Striations reflecting vocal fold opening/closing also visible in voiced fricatives. ** Look for periodicity in the waveform during the fricative, as well as a voicing bar. --Voiced fricatives are rare and we tend to devoice them because it takes a lot of effort --Airflow is lower for voiced fricatives • Voiced fricatives in English are [z, ʒ, v, ð]. • Voiced fricatives have two sound sources: --- (1) Voicing: Sound source at the glottis --- (2) Frication: Turbulence noise at the constriction

Voicing of Consonants

Voiceless = produced without vibration of the vocal folds. Voiced = produced with vibration of the vocal folds. But actually, there are many acoustic cues to voicing. The PRIMARY cue to the voiced-voiceless contrast in English is VOT (Voice Onset Time). VOT = Duration of interval from release of stop to onset of periodic vocal fold vibration for the following vowel. VOT can be a positive number, a negative number, or zero.

Voiceless & Voiced Stops in regards to Silent Gap

Voiceless stops during silent gap = (p, t, k) = --True silent interval --Longer in duration Voiced stops during silent gap = (b, d, g) = --Shorter in duration --Low intensity --Low frequency energy --Has voicing bar --NOT truly silent • Vocal folds keep vibrating during pressure buildup stage. • With vocal tract closed, only a small amount of sound gets through.

Vowel Classification

Vowels are categorized in two major dimensions based on the placement of the tongue: • Tongue height • Tongue advancement or backness --on the vowel quadrilateral, height = y-axis backness = x-axis Secondary dimensions of vowel categorization: • Lip rounding • (Tense/lax)

F1 and F2 of the Vocal Tract in Vowels

We described resonance in terms of two tubes (pharyngeal cavity and front cavity). F1 = resonance in pharyngeal cavity F2 = resonance in front cavity BUT, they can swap cavities sometimes... it is not always 100% • Cavities are coupled, not separate • Not always true that F1 goes with the pharyngeal space and F2 with the front cavity - formants can "swap cavity affiliation"

Biofeedback (FFT Spectrum)

When a child is stuck on a sibilant distortion, use the real-time FFT spectrum from the CSL Sona-Match software. -- Sibilant distortions can be challenging to remediate. • If they are palatalizing /s/, the spectrum looks like an /ʃ/ (energy peak too low) --->>> Goal: shift the peak / concentration of energy to the higher frequencies, as in /s/ -- Biofeedback: contrast the speaker's real-time spectrum with a trace of the desired acoustics. --Could also use for lateralized or dentalized productions, which have a flatter frequency spectrum than [s].

Antinodes

Locations of maximum displacement in a standing wave: where peak aligns with peak or trough with trough, the waves add together, so the standing wave swings to its extreme positive & negative values.

Nodes

Locations of zero particle displacement in a standing wave: where peaks align with troughs, the waves cancel, so these points are stationary (displacement = 0).

Spectrogram Reading of Diphthongs

[aɪ] --starts out: • high F1 • low F2 --goes to: • low F1 • high F2 **Formants start out close together then separate and become far apart [oʊ] --starts out: • medium F1 • low F2 **Formants both get lower (sliding down) COMPARED TO: [æ] (regular monophthong) --steady state --lines are pretty straight --not changing quality • high F1 • high F2

Acoustic Resonator

air inside a partly or fully enclosed container is set into vibration. Body of musical instrument, pipe organ; VOCAL TRACT example: --Sound is created by plucking/bowing a string. --This sets air inside the hollow body of the instrument into vibration. --Resonance in the container AMPLIFIES the sound --Sound of instrument is much louder than the sound of plucking the string by itself. --Resonance will also filter the sound.

Resonator

an object set into vibration by another vibration. • Resonators do not initiate sound energy; they respond to it !! ex: pictures rattling on the wall = resonator

Which vowels have a low F2

back vowels (u, ʊ, o, ɔ, ɑ)

Tacoma-Narrows Bridge

example of resonance: The Tacoma-Narrows Bridge vibrating in sympathetic resonance with the wind. Wind-driven vibration matched the natural resonant frequency of the bridge = --the energy will build up --amplitude increases, eventually stretching the bridge to the point of breaking !!! *Why soldiers NOW break step when marching across a bridge!

Which vowels have a high F2?

front vowels (i, ɪ, e, ɛ, æ)

Which vowels have a low F1

high vowels (i, ɪ, u, ʊ)

Formants vs Harmonics

how dark = formants; individual lines = harmonics; perfectly straight, individual lines = not changing in pitch • Two narrow-band spectrograms: ---Speaker A = different vowels, alternating between /i/ and /ɑ/ vowels at a constant pitch. • Formants (property of the vocal tract filter) move up and down. Harmonics (property of the vocal source) stay steady. ---Speaker B = different pitch of same vowel, saying /ɑ/ with up-and-down gliding of pitch. • This time, the harmonics go up and down, and the formants stay steady. This constitutes a double dissociation demonstrating independence of source and filter: ---A single vowel sound can be produced at different pitches (f0). ---Different vowel sounds can be produced with the same pitch. ---These would not be possible if the source determined the formant frequencies, or vice versa.

Wavelength and Frequency

inversely related!!!! As wavelength gets smaller, the frequency gets higher: the higher the frequency, the shorter the wavelength (waves emitted fast so the distance between peaks is small). As wavelength gets longer, the frequency gets lower. Frequency = speed of sound / wavelength F = c / λ
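As a quick numeric check of F = c / λ, here is a minimal Python sketch. The function name is hypothetical, and the speed of sound (34,400 cm/sec) is an assumption borrowed from the closed-tube card in this set.

```python
# Assumed speed of sound in air, in cm/sec (same value as the closed-tube card).
SPEED_OF_SOUND_CM_PER_S = 34_400


def frequency_from_wavelength(wavelength_cm, c=SPEED_OF_SOUND_CM_PER_S):
    """Return the frequency in Hz of a wave with the given wavelength in cm."""
    if wavelength_cm <= 0:
        raise ValueError("wavelength must be positive")
    return c / wavelength_cm


# Halving the wavelength doubles the frequency (inverse relationship).
for wavelength_cm in (344, 86, 43):
    hz = frequency_from_wavelength(wavelength_cm)
    print(f"{wavelength_cm} cm -> {hz:.0f} Hz")
```

Running the loop prints one frequency per wavelength, with the shortest wavelength giving the highest frequency.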

Resonance in Speech

The vocal tract is not a tube closed at both ends: mouth = open end of vocal tract, glottis = closed end of vocal tract

Liquids

l, r • Produced with a tongue constriction that does not generate frication noise. • No turbulence noise in supraglottic vocal tract • Air flows smoothly; vibrating vocal folds provide sound source. • Laterals and rhotics are grouped together because they pattern together phonotactically -- ex: may form a cluster: [prV] or [plV]; [Vrp], [Vlp] • Other languages: -- Allophonic relationship (e.g., Korean: rhotic in onset, lateral in coda) -- In free variation (e.g., Japanese)

Nasals & Intensity

large surface area = absorption of sound = nasals are damped in intensity On a spectrogram, nasals look like vowels with a very low intensity. Reasons for loss of intensity: • Large resonator has large surface area to absorb sound. • Soft, mucus- and cilia-lined nasal cavities are particularly absorbent. • Absorption (rather than reflection) of sound = damping; **Nasals are highly damped. Because of their low intensity, nasal sounds are easily confusable with each other.

Approximants

liquids and glides

Which vowels have a high F1

low vowels (æ, ɑ)

Resonant frequencies of closed tube

lowest resonant frequency = base resonant frequency
F = c / λ
R = c / 2L
R = resonance, λ = wavelength, L = length of tube, c = speed of sound = 34,400 cm/sec
Resonant frequencies of closed tube, for a tube that is 20 cm long:
First Resonance (1⁄2 wave fits in the tube): R1 = 1 x (34,400 / (2 x 20)) = 860 Hz
Second Resonance (whole wave fits in the tube): R2 = 2 x (34,400 / (2 x 20)) = 1720 Hz
Third Resonance (1.5 waves fit in the tube): R3 = 3 x (34,400 / (2 x 20)) = 2580 Hz
Resonant frequencies of closed tube, for a tube that is 10 cm long: R1 = 1720 Hz R2 = 3440 Hz R3 = 5160 Hz
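The closed-tube arithmetic on this card can be reproduced with a short Python sketch. The function name and default arguments are hypothetical; the formula (each resonance is a whole-number multiple of c / 2L) and c = 34,400 cm/sec come from the card itself.

```python
def closed_tube_resonances(length_cm, count=3, c=34_400):
    """Resonances of a tube closed at both ends: R_n = n * c / (2 * L).

    length_cm: tube length in cm; c: speed of sound in cm/sec.
    """
    if length_cm <= 0:
        raise ValueError("tube length must be positive")
    base = c / (2 * length_cm)  # lowest (base) resonant frequency
    return [n * base for n in range(1, count + 1)]


print(closed_tube_resonances(20))  # 20 cm tube
print(closed_tube_resonances(10))  # 10 cm tube: half the length, double the frequencies
```

Halving the tube length doubles every resonance, which is why the 10 cm values are twice the 20 cm values.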

Hypernasal Voice Quality

may appear as a continuous band of low-frequency energy on the spectrogram.

Types of Resonators

mechanical and acoustic

Sonorant Consonants

nasals, liquids, glides. In a sonorant consonant, the vocal tract is not constricted tightly enough to create turbulence noise. The only sound source for a sonorant consonant is the vibration of the vocal folds (glottal sound source). Sound of vocal fold vibration will resonate in the vocal tract (filter). Like vowels, sonorant consonants have measurable formants on a spectrogram. Often have LOWER INTENSITY than vowels because there is still a major obstruction blocking the escape of sound.

Formants

resonant frequencies of the vocal tract (R turns into F). f0 is NOT part of the same series as F1, F2, and F3. In a neutral tube, the higher formants are odd-numbered multiples of the base resonant frequency. The first 3 formants are the ones we use to see contrast in speech!!

Input Spectrum

spectrum of sound going into a system (source-filter theory) • Here is what the spectrum of the sound source looks like after passing through the vocal tract filter function. peaks in the vocal tract filter function are resonant frequencies / formants source + filter = output Source Function = glottal pulses Transfer Function = vocal tract filter OUTPUT = radiated signal: Y-Axis = response amplitude X-Axis = frequency

Wavelength

the horizontal distance from any starting point to an equivalent point in the next cycle of the wave. Wavelength = lambda (λ). It depends on both the frequency of the wave (FREQUENCY) and the velocity of transmission of the pressure wave (SPEED), which is a property of the medium that the wave is moving through, NOT the wave itself. λ = v/f

Mechanical Resonator

the object itself is set into vibration. Tuning fork, pendulum, string of a guitar

Standing Wave occurs when....

two sine waves of the same frequency encounter each other while traveling in opposite directions. WHEN the reflected pressure peak hits the speaker at the same time a new peak is being generated --The two peaks will sum together --The pattern will repeat itself, but stronger ***Can arise when a wave hits a boundary and reflects back, as in an acoustic resonator.
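To make the idea concrete, here is a minimal Python sketch that adds two sine waves of the same frequency traveling in opposite directions and measures how far the summed wave swings at a few positions. All names, the unit wavelength, and the unit frequency are assumptions chosen for illustration.

```python
import math


def standing_wave(x, t, wavelength=1.0, frequency=1.0):
    """Sum of two equal sine waves traveling in opposite directions."""
    k = 2 * math.pi / wavelength   # spatial frequency (radians per unit length)
    w = 2 * math.pi * frequency    # angular frequency (radians per unit time)
    rightward = math.sin(k * x - w * t)
    leftward = math.sin(k * x + w * t)
    return rightward + leftward


def peak_amplitude(x, steps=200):
    """Largest absolute value of the summed wave at position x over one period."""
    return max(abs(standing_wave(x, i / steps)) for i in range(steps))


# Positions are given as fractions of one wavelength.
for x in (0.0, 0.25, 0.5):
    print(f"x = {x}: peak amplitude ~ {peak_amplitude(x):.3f}")
```

In this sketch the positions 0.0 and 0.5 stay essentially at zero (nodes), while 0.25 swings to about twice the amplitude of either wave alone (an antinode).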

Sound

vibratory energy (oscillating particles create alternating regions of high and low pressure) Sound energy can set other objects into vibration. Ex: loud stereo from car driving by can make pictures on wall rattle = --force generated by pressure wave of sound is what sets the object into vibration

Glides

w, j very short vowels that pattern as consonants. (Occur at edges of syllable, not as nuclei). Also called "semivowels." Due to short duration (around 75 ms), generally LACK a clear steady-state interval. May be difficult to distinguish a glide from a vowel. Short duration Low intensity More tightly constricted than in production of a vowel

Resonance

when a reflected wave adds together with a new wave; resonance filters & amplifies sound. Sound produced by VF vibration is FILTERED by resonance in the vocal tract. Amplitude of the oscillation will only build up if force is applied at the natural resonant frequency of the system = when it reaches the peak, you push it, so it keeps going higher (like on a swing). If you DON'T apply the force at the natural peak = no buildup of amplitude. Every object has its own natural resonant frequency / every object oscillates after a force is applied !!!! It is influenced by --Mass --Length --Tension or stiffness --Shape ex for mass: the smaller the tuning fork, the higher the frequency ex for volume: the smaller the volume, the higher the resonant frequency ex: the emptier the bottle = more air volume = lower frequency ex for length: the shorter the rope on a swing, the higher the frequency of oscillation

Constriction near a V (velocity antinode)

will decrease the resonant frequency relative to its value in the neutral tube.

Constriction near a P (pressure antinode)

will increase the resonant frequency relative to its value in the neutral tube

"Come get a banana" on a Spectrogram

— Glottal pulses are irregular at the end because this is glottal fry / creaky voice. Usually comes at the end of a sentence. — Final lengthening —> last syllable gets dragged out & is long in duration

Identifying Vowels on Spectrogram

—Always a voicing bar —important are the F1 & F2 for vowel identification A. /a/ (low back vowel) because F1 is high, and F2 is low = very close together, look like they merge together (intensity looks evenly distributed, all grey) B. /i/ (high, front vowel) because F1 is low, and F2 is high = very far apart (large whiteish gap between the formants) C. /u/ (high, back vowel) because F1 is low, and F2 is low (looks similar to /a/, but still has a small whiteish gap)

Identifying Sibilants [s, ʃ ] on a Spectrogram

--Energy is more concentrated in higher frequencies --Greater intensity

Identifying Non Sibilants [f, θ] on a Spectrogram

—Lighter in intensity —Almost evenly distributed

Identifying Nasals on a Waveform

—Periodic peaks —Low peaks/ height/ intensity compared to vowels

Identifying Diphthongs on a Spectrogram

—formants that aren't straight & are rising or falling [aɪ] [æ] [oʊ] A. [æ] because the formants are in a steady state. Look parallel to each other. Straight across B. [aɪ] Because the formants are close but then separating from each other. Look like < C. [oʊ] Because the formants are sloping down

/s/ vs /ʃ/ on a Spectrogram

• Acoustic energy for /s/ is concentrated at very high frequencies (about 4000 Hz and up). --Little energy in low frequencies --High energy in higher frequencies for /s/ = * Increasing energy with increasing frequencies = peak frequency is near the top of the range (~ 8000 Hz). --------------------------------- • Concentration of energy for ʃ extends into lower frequencies (around 2000 Hz). --Lowest frequencies don't have energy --Then a peak in energy --Then energy falling off after = peak frequency is around 3000 Hz.

Spectrogram of Sibilants vs Nonsibilants

• Acoustically, sibilants have more intense energy (darker) concentrated at higher frequencies. Greater intensity in energy = sibilants Higher frequency = greatest concentration of energy = sibilants ---------------------------- • Nonsibilant fricatives have lower-intensity energy that is evenly distributed across frequencies. Uniform in the distribution of energy (evenly distributed) = non sibilants Lower intensity = non sibilants

Other Factors to Consider with Cochlear Implants

• Age of implantation ---Earlier age of implantation is generally associated with better speech recognition outcomes. ---Even with very young implantation, outcomes still vary across individuals for reasons that are not fully understood. • Unilateral vs bilateral implantation ---Better sound localization with two CIs than one (Litovsky et al., 2004, 2006) • Deaf identity and belonging to Deaf community

Cochlear Implant Simulations

• Aim to replicate the experience of listening with a cochlear implant. • Speech is filtered into a fixed number of frequency bands. • Then the overall amplitude envelope in each band is multiplied by a band of noise. • Can use sine waves instead of noise as carrier signal. Qualitative Difference: --Sine wave carrier = more melodic --Noise Carrier = more noisy

Articulation of Fricatives

• Articulators are brought close together but without complete occlusion: NEED to bring articulators close together without fully contacting / closing off the vocal tract entirely. • Requires fairly precise articulatory control • Must have narrow enough constriction to generate frication (turbulent air) • But not so great a constriction that air is fully stopped. Children have trouble with this, so they often end up replacing fricatives with stops.

Pressure & Velocity in a tube with one open end

• At the closed end: --Pressure reaches its maximal values (pressure antinode). --MAX PRESSURE --Air particles cannot move back and forth, so velocity is zero (velocity node). --MIN VELOCITY = particles can't move freely • At the open end: --Pressure is neutral (pressure node). --MIN PRESSURE --Air particles are rushing out and back in, so there is a velocity maximum (velocity antinode). --Air in tube is interfacing with outside air = particles can move out of tube then rush back in --MAX VELOCITY

How Sound is Represented in Cochlear Implant

• Audio input is passed through a series of bandpass filters (12-22 bands) • Cochlear implant represents changes in amplitude within each band using electrical current to the corresponding electrode. • Listener can hear that sound within a given broad frequency band is going up and down in amplitude in a certain way over time. • All other frequency information is lost. --can be represented in terms of waveform or spectrogram. In the figure: top is a narrowband spectrogram; bottom is the same signal after filtering into 22 frequency bands. ---> limited frequency range and resolution = looks pixelated. Broad regions of energy concentration are the same across the signals. Fine-grained frequency information (e.g., individual harmonics) is not preserved in cochlear implant context. Relative to typical hearing, cochlear implant has both reduced range and frequency resolution. high intensity = red, low intensity = blue. Each electrode covers a fairly wide band of frequencies. Pitch range is optimized for speech; higher and lower frequencies (as in music) are not accommodated. Sounds may shift up or down ***Cochlear implant doesn't really differentiate between frequencies higher or lower than normal speech = music is not well represented • Increasing the number of channels can increase the frequency resolution = finer resolution --A single channel = not a lot of intelligibility. However, adding more electrodes (beyond 22) will not automatically improve performance. Activation may spread between electrodes, distorting the signal.

Biofeedback with CSL Sona-Match

• CSL Sona-Match can be used to provide biofeedback for [r] misarticulation. • Create an [r] template representing correct production (low F3). • Cue client to line up their formants with peaks in template.

Properties of Consonants

• Consonants are categorized by the properties of manner, place, and voicing. (1) Place: Where in the vocal tract the obstruction is created. • Labial, dental, alveolar, palatal, velar, glottal (2) Manner: How the airstream is modified as it moves through the vocal tract. (sonorant = resonants & obstruents = non-resonants) (3) Voicing: With/without vocal fold vibration

Place of Articulation for Fricatives

• Constriction further back --> longer tube, lower resonant frequencies EXAMPLE: /ʃ/ is a postalveolar fricative (further back). /ʃ/ has a longer front cavity that resonates at a lower frequency. ---------------------------------- • Constriction further forward --> shorter tube, higher resonant frequencies EXAMPLE: /s/ is an alveolar fricative (more anterior). /s/ has a short cavity in front of the constriction. /s/ = shorter resonating cavity & higher resonant frequencies

Fricatives on a Spectrogram

• Energy extends into higher frequencies for fricatives than other sounds. • This makes them some of the easiest sounds to identify in a spectrogram. • When focused on fricatives, may want to set max frequency in Praat to 8000Hz (vs default 5000 Hz).

Aspiration of Stops

• For sounds with a long VOT (voiceless), air is flowing rapidly through abducted (open) vocal folds and unobstructed vocal tract. • Fast-moving air creates weak turbulence noise in the glottis. -- Sounds like an /h/ fricative --Called ASPIRATION NOISE • Voiceless stops in English are typically aspirated. • In certain contexts (ex: in an s-stop cluster), they become UNaspirated ↓ -- Try holding a tissue in front of your mouth while saying /p/ in pie versus /p/ in spy. --If you say spy, the tissue doesn't move • Voiced sounds do not have aspiration noise. • Vocal folds are adducted when stop is released, so there is no rush of airflow.

Fricative Acoustics

• Frication noise is white noise (aperiodic, distributed fairly evenly over a wide range of frequencies). • Within this band of noise, some frequencies will be amplified or attenuated, depending on position of articulators - topic of our next lecture. Wideband spectrogram --> vowels show vertical striations (glottal pulses) and dark formant bands --BUT for fricative consonants, random energy. --Frequencies tend to be higher than for vowels because the resonating cavity is shorter. --See white noise --More intense in high frequencies --Less intense in low frequencies. Set the maximum frequency higher to see the fricative.

Cochlear Implants

• If there is sensorineural hearing loss but auditory nerve is intact, hearing may be partially restored using a cochlear implant. --If hair cells aren't firing action potential, then they insert the electrode array • Electrode array inserted into the cochlea stimulates auditory nerve fibers directly. • Outcomes vary widely in terms of speech perception and production. • Best outcomes are usually seen in young children or postlingually deafened individuals (i.e., who developed speech in the context of typical hearing).

Recap of VOT & Voicing

• In English, for singleton stops in word-initial position..... • Voiceless stops have a long POSITIVE VOT. -- Range is around 40 to 100 ms = long duration • Voiced stops have a near-ZERO VOT. -- Range is around 0 to 20 ms = short duration • In intervocalic position, voiced stops may have NEGATIVE VOT. ------------------------- Most English "voiced" stops do not match our initial definition of voicing ("produced with the vocal folds vibrating"), since phonation does not occur until after the stop is released. A more phonetically accurate description of English stops in initial position: • Voiceless aspirated [pʰ] • Voiceless unaspirated [p], [b̥] Examples: peach [pʰ], beach [b̥] = no voicing during the closure in either. speech = Voiceless stops are produced without aspiration in an s-stop cluster. • If you cut the [s] out of speech, it sounds like beach. • Intervocalic stops may show true voicing (i.e., phonation during closure). ----------------------- Different languages mark the voiced-voiceless contrast in different ways. -- In French and Spanish (among others): • Voiceless stop: Near-zero VOT, no aspiration • Voiced stop: Negative VOT. -- Truly voiced = vibrating throughout entirety of stop (Armenian -- "bag") = have voicing bar throughout = negative VOT -- Voiceless stops produced by a native speaker of French or Spanish may sound voiced to English-speaking listeners. • And vice versa, voiced stops produced by English speakers may sound voiceless to speakers of French or Spanish.
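The VOT ranges on this card can be turned into a rough lookup, sketched below in Python. The function name and category labels are hypothetical. The cutoffs (negative, 0 to 20 ms, 40 ms and up) follow the approximate ranges on the card; treating 20 to 40 ms as "ambiguous" and treating everything above 40 ms as long-lag are assumptions, since the card does not say how to handle those values.

```python
def describe_vot(vot_ms):
    """Roughly label a stop by its voice onset time (in milliseconds)."""
    if vot_ms < 0:
        # Phonation starts before the release: true voicing during closure.
        return "negative VOT (voicing during closure)"
    if vot_ms <= 20:
        # Near-zero VOT: English "voiced" stops, French/Spanish voiceless stops.
        return "short-lag (near-zero VOT, unaspirated)"
    if vot_ms >= 40:
        # Long positive VOT: English voiceless aspirated stops.
        return "long-lag (long positive VOT, aspirated)"
    # Assumption: the card gives no label for the 20-40 ms gap.
    return "ambiguous (between the short-lag and long-lag ranges)"


for vot in (-80, 10, 30, 70):
    print(f"{vot:>4} ms: {describe_vot(vot)}")
```

The same short-lag value can correspond to different phonemes across languages, which is why this sketch labels the acoustic pattern rather than "voiced" or "voiceless".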

Vowel Backness

• In a front vowel, the tongue body is advanced to a position near the palate/alveolar ridge. (anterior area of oral cavity) i, ɪ, e, ɛ, æ • In a back vowel, the tongue is backed to a position near the soft palate. (posterior area of oral cavity). mass of tongue is retracted. u, ʊ, o, ɔ, ɑ • Intermediate vowels are central vowels. /ə, ʌ, ɚ, ɝ/

Resonance in a closed tube

• In a tube closed at both ends.... --Pressure peak has to go to the end and come back to the speaker just as the new pulse is beginning --This means that the length of the whole wave is 2x the length of the tube -- ...or you could say that 1⁄2 a wave fits in the tube *sine wave of a sound goes out, then reflects, and goes back to beginning = wave is 2x length of tube *****At the ends of the tube, particles are stationary (displacement node), while pressure alternates between high and low values. *****In the middle, particles move the most (displacement antinode) while pressure remains neutral.

Aperiodic Sounds

• In an aperiodic sound, pressure values vary rapidly and randomly over time • Sounds like noise, not melodic • Random and non repeating. Not forming a pattern. Just different peaks and troughs • No bands representing concentration of energy (no formants or harmonics) just see energy randomly distributed throughout a range of frequencies ---Spectrum and waveform look similar for aperiodic --Waveform of white noise (aperiodic) shows no repeating pattern. --Spectrum of white noise shows energy distributed across a wide range of frequencies, with no clear structure (e.g., harmonics). -->> emphasizes randomness in frequency domain. -->>No particular frequency where energy is concentrated. -->> Fluctuations in frequency. -->> ABSENCE of large spikes / peaks which would represent harmonics or formants. -->> May be represented with a straight line

Vowel Height

• In high vowels, the tongue body is RAISED close to the palate. (i, ɪ, u, ʊ) • In low vowels, the jaw is more open and the tongue is lower than at rest position (æ, ɑ). • Vowels of intermediate configuration are mid vowels. (neither raised nor lowered) [e] [ɛ] [o] [ɔ] [ʌ] [ə]

Liquids in Development

• Liquids are some of the latest-emerging speech sounds in English. • Smit et al. (1990): Age of mastery for [ɹ] -- 90% of children can correctly pronounce /r/ at 8 years old • Crowe & McLeod (2020) revised this estimate down but still place [ɹ] and [l] in the latest-emerging class of consonants. • Possible explanation in articulatory complexity: Two major lingual constrictions instead of one (plus lip rounding in [ɹ]).

Nonsibilant Fricatives

• Nonsibilant fricatives [f, v, θ, ð] have lower-intensity energy that is evenly distributed across frequencies. Labiodental & interdental fricatives = Evenly distributed energy. Lower energy = dealing MOSTLY with channel turbulence = amplitude of noise is lower than in obstacle turbulence. Labiodental = somewhat obstacle turbulence: teeth are already part of the constriction, SO for /f/ and /v/ the stream of air is hitting the upper lip. Hitting a soft obstacle instead of a hard obstacle = lower in intensity. Lower overall energy, and more evenly distributed instead of being concentrated in one place --Lower-intensity energy: Channel turbulence (and possibly obstacle turbulence striking the upper lip) creates lower-amplitude noise than obstacle turbulence. --Wide range of frequencies: The place of articulation for the nonsibilant fricatives is so far forward that there is "no appreciable resonating cavity" in front of the constriction. Uniform in the distribution of energy (evenly distributed) = non sibilants Lower intensity = non sibilants

Gender Differences in Formants

• On average, vowel formants are about 20% lower in male speakers than female speakers. • Reflects differences in average vocal tract length.

Acoustics of Sibilant Fricatives

• Sibilants [s, z, ʃ, ʒ] have more acoustic energy concentrated at higher frequencies than other fricatives. • More energy: Obstacle turbulence creates higher-amplitude noise than channel turbulence. • Higher frequencies: Sibilants have a small, short resonating cavity that produces high-frequency resonance. Greater intensity in energy = sibilants Higher frequency = greatest concentration of energy = sibilants

WHEN do resonators vibrate?

• Some resonators will vibrate in response to more frequencies than others. • The range of frequencies that a given resonator will transmit is its BANDWIDTH. • A resonator that responds to a NARROW range of frequencies is called sharply / narrowly tuned = Vibrates in response to very LIMITED RANGE frequencies. • A resonator with a WIDER bandwidth is broadly tuned = Vibrates in response to MANY frequencies.

Sonorant vs. Obstruent

• Sonorant consonants are produced with fairly free airflow. The sound source for sonorant consonants is vocal fold vibration. -- Sonorants are the vowel-like consonants. • For Obstruents, airflow through the oral cavity is restricted to a point where turbulence noise is generated. --Obstruent consonants feature a supraglottal sound source. --Voiced obstruents have both a glottal sound source (vocal folds vibrating) and a supraglottal sound source (turbulence noise).

Acoustics of Sonorants and Obstruents

• Sonorants (liquids, glides, nasals) are the "vowel-like" consonants. --Complex periodic (nearly periodic) waveform --Nearly periodic = melodic quality --In a spectrogram, sonorant consonants will appear similar to vowels BUT intensity will be lower because vocal tract is more closed off for these consonants • Obstruents are the "noisy" consonants --Have an aperiodic (random, non-repeating) waveform --Aperiodic = noisy quality

Source vs Filter

• Source: Vocal fold vibration --f0 = rate of vocal fold vibration --f0 determines frequencies of harmonics. --Perceived as: vocal pitch • Filter: Vocal tract --Formants = resonances of the vocal tract --Shape of vocal tract (transfer function) determines relative amplitudes of harmonics --Perceived as: differences in vowel quality

Spectrograms

• Spectrograms can be manipulated to emphasize either the harmonics (source) or formants (filter). • Narrowband spectrogram emphasizes individual harmonics. • Wideband spectrogram smears across harmonics, making formants readily visible = bigger regions where harmonics are being smeared together = formants = easier to see on Wideband

Obstruents

• Stop (plosive): Airflow is obstructed completely, causing pressure to build up. • Fricative: Air becomes turbulent as it is forced through a narrow channel. • Affricate: Like a combination of a stop and a fricative; complete obstruction released into a narrow channel.

Biofeedback for Sonorants

• The "Real-Time LPC Response" function of the CSL Sona-Match program shows the spectrum of a vowel/sonorant consonant being produced. • Spectrum changes in real time as speaker produces different sounds. • Clinician can make a template to act as target for biofeedback.

Affricates Articulation

• The affricate sounds make up their own manner class. • Affricates are a combination of a stop plus a fricative at the same place of articulation (homorganic). • Begins like a stop, with a buildup of intraoral pressure behind a complete closure. • However, this pressure is released into a narrow constriction (frication channel) and not in a single explosive burst. • English has only two affricates, [tʃ] and [dʒ].

Longer vs Shorter Vocal Tracts

• The average adult cisgender male has a vocal tract of 17 cm. --So schwa will have resonances (formants) at approximately 500 Hz, 1500 Hz, and 2500 Hz. **taller = the longer the vocal tract = the larger the volume = the lower the frequency • Females tend to have shorter vocal tracts of about 14 cm --On average, resonances are around 600 Hz, 1800 Hz, and 3000 Hz. **the shorter the vocal tract, the higher the frequency
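The approximate formant values on this card can be checked with a short Python sketch. It assumes the standard quarter-wavelength model for a neutral tube closed at one end (glottis) and open at the other (mouth), where the resonances are odd multiples of c / 4L. That formula is not written out on the card, so treat it as an assumption, along with the function name and c = 34,400 cm/sec.

```python
def neutral_tube_formants(length_cm, count=3, c=34_400):
    """Formants of a neutral tube closed at one end: F_n = (2n - 1) * c / (4 * L).

    length_cm: vocal tract length in cm; c: speed of sound in cm/sec.
    """
    if length_cm <= 0:
        raise ValueError("vocal tract length must be positive")
    base = c / (4 * length_cm)  # lowest resonance (quarter-wavelength fits the tube)
    return [(2 * n - 1) * base for n in range(1, count + 1)]


for length_cm in (17, 14):
    rounded = [round(f) for f in neutral_tube_formants(length_cm)]
    print(f"{length_cm} cm vocal tract: {rounded} Hz")
```

Under these assumptions the 17 cm tube comes out near 500, 1500, and 2500 Hz, and the 14 cm tube near 600, 1800, and 3000 Hz, in line with the rounded values on the card.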

Vocal Tract Resonance

• The frequencies that resonate (i.e., that are amplified) are called formants. --acoustic resonators have multiple resonant frequencies. When the vocal tract resembles a neutral tube (as for /ə/), the higher resonances of the vocal tract (formants) are odd-numbered multiples (1, 3, 5) of the base resonant frequency. --A representation of the resonant frequencies of the vocal tract, termed the VOCAL TRACT FILTER FUNCTION !! Y-axis = response amplitude X-axis = frequency attenuate = troughs amplify = peaks --each formant allows certain frequencies to pass and blocks others **frequencies that are not close to any resonant frequency = blocked!! **frequencies that are close = amplified BUT the vocal tract filter is NOT extremely specific / NOT narrow = broad & fairly large amount of frequencies will be amplified ***The vocal tract is a VARIABLE resonator: Its shape changes constantly over the course of speech!!! • Move the tongue forward/back, up/down; round or retract the lips; close or open the velopharyngeal port As the shape of the vocal tract changes, so do the resonant frequencies/ formants. • When vocal tract is not shaped like a neutral tube, higher formants will be present but will not have a 1:3:5 relationship to base resonance We identify different vowels and sonorant consonants based on their different formant frequencies.

Articulation of /l/

• The lateral/alveolar liquid.
• "Lateral" describes this sound because air flows freely around the lateral edges of the tongue (between the sides of the tongue and the cheeks).
• The midline / center of the tongue is raised, but there is still an opening along the sides for air to flow freely.
• The tongue is raised to contact the alveolar ridge.
• In some speakers, the tongue will touch or protrude between the teeth.

Sibilant Fricatives

• The sibilant fricatives in English are [s, z, ʃ, ʒ].
-- The tongue forms a channel: edges of the tongue = raised; middle of the tongue = lowered.
-- A high-pressure stream of air is accelerated down the lowered middle of the tongue and directed at the lower incisors.
-- The stream of air striking the incisors (the obstacle) causes obstacle turbulence = particularly loud, higher-amplitude noise.
-- Sibilants have a constriction around the alveolar ridge = short, small resonating cavity = more concentrated energy.
-- The distorted quality of [s] sounds produced by children with missing teeth may reflect the absence of obstacle turbulence. A child who has lost their two front teeth may have trouble pronouncing /s/.

The Vocal Source

• The source (for voiced sounds) is the spectrum produced by vocal fold vibration.
• f0 depends on the rate of vibration of the vocal folds.
• Harmonics occur at every whole-number multiple of f0.
-- The first harmonic (f0) is NOT the same thing as the first resonant frequency.
• Harmonics decrease in amplitude as they increase in frequency (they are shaped by filtering).
• Harmonics more tightly spaced = f0 is lower.
• Harmonics more spread apart = f0 is higher.
• All of these frequencies enter the vocal tract.
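The relationship between f0 and harmonic spacing can be illustrated with a short sketch. The function name and the two example f0 values are hypothetical, chosen only to show that the spacing between harmonics equals f0.

```python
def harmonics(f0_hz, count):
    """Return the first `count` harmonics: whole-number multiples of f0."""
    return [n * f0_hz for n in range(1, count + 1)]


low_voice = harmonics(100, 5)   # lower f0: harmonics 100 Hz apart (tightly spaced)
high_voice = harmonics(200, 5)  # higher f0: harmonics 200 Hz apart (spread out)

print(low_voice)
print(high_voice)
```

In both lists the first entry is f0 itself, and each later harmonic sits exactly one f0 above the previous one.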

The Vocal Tract Filter

• The vocal tract above the glottis acts as an acoustic resonator that filters the sound source.
• It AMPLIFIES harmonics that are close to a resonant frequency of the vocal tract while ATTENUATING harmonics that do not resonate.
• Because of this filtering by resonance, the sound that emerges from the mouth differs from the sound source created at the glottis.
-- Filtering is not just attenuation or removal: resonance "echoes" sound, and some sine waves add together (constructive interference).
-- Compare a sound wave at the glottis and at the lips: the waveform changes during passage through the vocal tract, because the tract has acted as a filter.
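As a rough illustration of "harmonics close to a resonant frequency", the sketch below finds the harmonic that lies nearest to each resonance, i.e. the one that would sit closest to a peak of the filter. The f0 of 110 Hz and the function name are hypothetical; the resonances are the schwa values for a 17 cm tract. It does not model actual gain values.

```python
def nearest_harmonic(f0_hz, resonance_hz):
    """Return (harmonic number, frequency) of the harmonic closest to a resonance."""
    # round() picks the nearest whole-number multiple; harmonic numbers start at 1.
    n = max(1, round(resonance_hz / f0_hz))
    return n, n * f0_hz


f0 = 110  # hypothetical speaking pitch in Hz
for resonance in (500, 1500, 2500):
    n, freq = nearest_harmonic(f0, resonance)
    print(f"resonance {resonance} Hz -> harmonic {n} at {freq} Hz")
```

Harmonics far from all three resonances (for example the first harmonic at 110 Hz) fall in the troughs of the filter and are attenuated instead.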

Vocal Tract

• The vocal tract is NOT like a tube closed at both ends.
• The vocal tract is like a tube closed at one end (glottis) and open at the other (lips).
• Resonance behaves differently in this case because the wave is interacting with an opening instead of bouncing off a closed end.
-- The vocal tract can be modeled as an air-filled tube with one open end, BUT this is only true in its neutral state, when it resembles a simple tube (its simplest configuration: [ə], schwa = most neutral).
-- We will modify our model for other vowels, where the vocal tract has a more complex shape.
-- A tube with one open end is a quarter-wave resonator: the lowest resonant frequency has a wavelength four times (4x) the length of the tube.
• Resonant frequencies of the vocal tract are called FORMANTS.
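The quarter-wave relationship can be written out as a short worked equation. The symbols are not in the original card: c is the speed of sound (assumed here to be about 340 m/s), L is the tube length, and λ₁ is the wavelength of the lowest resonance.

```latex
\[
\lambda_1 = 4L
\quad\Longrightarrow\quad
F_1 = \frac{c}{\lambda_1} = \frac{c}{4L},
\qquad
F_n = (2n - 1)\,\frac{c}{4L}, \quad n = 1, 2, 3, \dots
\]
\[
L = 0.17\ \text{m}:\qquad
F_1 = \frac{340\ \text{m/s}}{4 \times 0.17\ \text{m}} = 500\ \text{Hz},
\quad F_2 = 3F_1 = 1500\ \text{Hz},
\quad F_3 = 5F_1 = 2500\ \text{Hz}
\]
```

This holds only for the neutral, schwa-like tube; once the tract has a more complex shape the formants no longer follow the 1:3:5 pattern.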

Fricative Acoustics

• Turbulence provides the source of fricative noise.
-- Fricatives: turbulent noise generated at some constriction.
-- Voiced fricatives have an additional source at the glottis.
• Obstacle turbulence is louder than channel turbulence.
-- Sibilants feature obstacle turbulence, so they are particularly high in intensity.
• The filter for fricative turbulence noise depends on the place of articulation; the area in front of the constriction acts as the resonating cavity.
--> Labials: essentially no filter (flat spectrum)
--> Sibilants: short filter, emphasizing higher frequencies
--> Back fricatives: longer, more vowel-like filter

Silent gap and burst in final position

• Very hard to see on a spectrogram.
• The final burst will be absent if the stop is unreleased.

Fricative Resonance

• When the sound source is supraglottal, does resonance ("the filter") still apply?
---> Yes. The constriction acts as the sound source. If there is a cavity in front of the constriction, sound will resonate in it.
• The frequency spectrum of a fricative depends on the length of the cavity in front of the constriction (shorter = higher).
-- Fricatives have a supraglottal sound source, so we are dealing with resonance in a shorter cavity.
-- The same rule applies as for whole vocal tracts (a longer tube, such as a longer male vocal tract, gives lower frequencies):
** Turbulent noise resonating in a longer, larger chamber = lower frequency.
** Turbulent noise resonating in a shorter chamber = higher frequency.
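The "shorter = higher" rule can be sketched by treating the front cavity as a small quarter-wave tube. That modeling choice, the speed-of-sound value, and the two cavity lengths are all assumptions for illustration; they are not measurements of any particular fricative.

```python
# Assumed speed of sound: roughly 340 m/s, in cm/s to match cavity lengths in cm.
SPEED_OF_SOUND_CM_PER_S = 34_000


def front_cavity_resonance_hz(cavity_length_cm):
    """Lowest resonance of the front cavity, assumed to act as a quarter-wave tube."""
    return SPEED_OF_SOUND_CM_PER_S / (4 * cavity_length_cm)


# Hypothetical cavity lengths, chosen only to contrast short vs. long.
hypothetical_cavities_cm = {"short front cavity": 2.0, "longer front cavity": 6.0}

for label, length_cm in hypothetical_cavities_cm.items():
    print(f"{label} ({length_cm} cm): {front_cavity_resonance_hz(length_cm):.0f} Hz")
```

Under these assumptions the 2 cm cavity resonates at 4250 Hz and the 6 cm cavity at about 1417 Hz, matching the direction of the rule: the further back the constriction, the longer the front cavity and the lower the noise spectrum.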

Aerodynamics of Voiced Fricatives

• Within and across languages, voiced fricatives are less common than voiceless fricatives.
• Voiced fricatives also tend to undergo devoicing, e.g. [dɔgz̥].
• WHY? Aerodynamics!
-- For the vocal folds to vibrate, subglottal pressure must exceed supraglottal pressure, but the fricative constriction raises supraglottal pressure.
-- High-velocity airflow is needed to produce turbulence, but vibrating vocal folds impede the flow of air.
-- You can compare airflow in voiced vs. voiceless fricatives by producing [f] vs. [v] with a piece of paper in front of your mouth.

Light and Dark /l/

• [l] has light and dark allophones.
• Light [l] is produced with the tongue tip raised and the tongue back low.
• Dark or velarized [ɫ] is produced with both the tongue tip and the tongue back raised.
• Light [l] is characterized by a higher F2 than dark [ɫ] (equivalently, dark [ɫ] has a lower F2).
• Complementary distribution: light [l] occurs in syllable-initial position, dark [ɫ] in syllable-final position.
-- leap, [lip]
-- peel, [piɫ]

