Speech Science
Characteristics of an Approximant
- Limited articulatory constrictions that alter resonant frequencies - Classification based on syllable position - Formant transitions typically faster than for vowels
What two articulatory behaviors help lower the spectral peak for SH?
- Lip rounding - Curling the tongue up
liquids
- Retroflex /r/ & lingual-alveolar /l/
Characteristics of a Sonorant
- Similar to vowels - Free airflow; articulation shapes the vocal tract - Characterized mainly by formant frequencies - Periodic laryngeal source (all voiced)
In English what is the approximate dividing line between VOT values for voiced and voiceless stops?
- Under 25 ms for voiced stops - Over 30 ms for voiceless stops
What happens to vowels surrounding nasal consonants? What are the consequences for intelligibility?
- Vowels affected by nasalization tend to have formants lower in intensity with wider bandwidths - Consequence: Formants are less distinct (hard to identify vowels)
What are the features of a simple periodic sound (pure tone)?
- a sine-shaped waveform - a single frequency - a certain amplitude - a certain wavelength
What Determines Center Frequency?
-Mass reduces high-frequency energy - Stiffness reduces low-frequency energy - The combination determines the optimal frequency
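As a rough illustration of the mass/stiffness trade-off, the sketch below uses the ideal mass-spring formula f = (1/2π)·√(k/m). The formula and the numbers are textbook assumptions chosen for illustration, not measurements of any speech structure: adding mass lowers the natural frequency, adding stiffness raises it.

```python
import math


def natural_frequency(stiffness: float, mass: float) -> float:
    """Natural frequency (Hz) of an ideal mass-spring system.

    Assumed model: f = sqrt(k / m) / (2 * pi), with k in N/m and m in kg.
    """
    if stiffness <= 0 or mass <= 0:
        raise ValueError("stiffness and mass must be positive")
    return math.sqrt(stiffness / mass) / (2 * math.pi)


base = natural_frequency(stiffness=1000.0, mass=0.01)
heavier = natural_frequency(stiffness=1000.0, mass=0.04)  # 4x the mass -> half the frequency
stiffer = natural_frequency(stiffness=4000.0, mass=0.01)  # 4x the stiffness -> double the frequency
print(round(base, 1), round(heavier, 1), round(stiffer, 1))
```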
Three Categories of VOT
-Voicing lead (prevoicing) -Zero onset/short lag -Long lag VOT
Resonance of Cavities depends on...
-Volume of cavity -Area of any aperture -Surface characteristics of the cavity -Coupling factors between different cavities
Sampling
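-You're measuring the signal's voltage at regular time intervals. Sampling theory (the Nyquist-Shannon theorem) guarantees you get back the same signal you put in, provided the sampling rate is more than twice the highest frequency in the signal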
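A minimal sketch of the sampling rule, assuming an ideal sampler: a pure tone above half the sampling rate (the Nyquist frequency) is not simply lost but shows up at a wrong, folded frequency (aliasing). The function names are illustrative, not from any particular analysis program.

```python
def nyquist_frequency(sample_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent without aliasing."""
    return sample_rate_hz / 2


def apparent_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a sampled pure tone appears (aliasing folds it below Nyquist)."""
    folded = signal_hz % sample_rate_hz          # wrap into one sampling period
    return min(folded, sample_rate_hz - folded)  # reflect about the Nyquist frequency


# With a 10 kHz sampling rate, /s/-type noise energy at 8 kHz would appear at 2 kHz.
print(nyquist_frequency(10000))         # 5000.0
print(apparent_frequency(8000, 10000))  # 2000
```

In practice, recording systems apply an anti-aliasing low-pass filter before sampling so that energy above the Nyquist frequency never reaches the sampler.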
affricates
-combination of stop and fricative
sonorant or resonant
-consonant is vowel-like -voiced
fricatives
-continuants -produced with continuous airflow through the oral cavity -audible turbulence, with or without voicing
resonance
-every object has a natural resonant frequency influenced by its shape and size -depends on shape of vocal tract
antiformants
-frequencies of sound energy loss -frequencies at which sound is not transmitted through the vocal tract -sound is 'trapped' in the back cavity
fricative cues
-frication -transitions -voicing
Glide cues
-gradual formant transition that is quicker than that of diphthongs
obstruent
-has a supraglottic sound source -the vocal tract is constricted tightly enough to produce turbulence noise
s, z
-high energy noise with most of the energy lying in the high frequencies (above 4,000 Hz) -Front cavity is long enough to introduce a significant resonance
Time Domain
-instantaneous amplitude across time -amplitude is Y axis (ordinate) -time is X axis (abscissa) (easy to tell if it's a square wave, a sine wave, or another shape)
ʃ,ʒ
-intense noise spectra with most of the energy lying in the mid to high frequencies (above 2,000 Hz) -Front cavity has significant resonance effect
liquids
-lateral and rhotic -steady state -transition
approximant
-liquids and glides are the two types of non-nasal sonorants -liquids: /l/ /r/ -glides: /j/ /w/
h
-low energy, flat, diffuse spectrum -the whole vocal tract filters the noise, so vowel-like formant patterns often are evident in the radiated noise
nasals
-murmur -transitions
strident/sibilant fricatives
-produced with an especially loud noise -tongue forms narrow channel -stream of air strikes incisors -/s/ and /z/
assimilation
-a sound takes on ("rubs off") properties of neighboring sounds that it would not otherwise have -NOT THE SAME AS COARTICULATION
consonant
-sound produced with a major obstruction in the vocal tract -one or more sound sources
vowel
-sound produced with the vocal tract mostly open -always voiced
affricate cues
-stop gap -frication
/v/
Place: labiodental Manner: fricative Voice: voiced Aperiodic Turbulent, & Periodic Laryngeal Source, Muscles: Orbicularis Oris Inferior, Manner-presence of aperiodic noise, Place- low, Voice- Presence of Phonation
/j/
Place: palatal Manner: glide Voice: voiced Periodic Laryngeal Source, Muscles: Genioglossus, Damping, Formant transitions
/r/
Place: palatal Manner: liquid Voice: voiced, Periodic Laryngeal Source, Muscles: Superior Longitudinal Muscle, Orbicularis Oris, Rapid formant changing in F2 and F3 Formant changes and damping
liquids
/r, l/ semivowels characterized by rapid movements and formant structure; F3 distinguishes the phonemes (F3 is much lower for /r/, lying close to F2); antiformants for /l/
Consonants: Liquids
/r/ /l/
What are some examples of liquids
/r/, /l/
Liquids
/r/, /l/ Transitions similar to vowels
sibilants
/s, z, ʃ, ʒ/: intense noise, differentiated among themselves by voicing and noise spectrum; voicing pulses for /z ʒ/, no pulses for /s ʃ/
What are the sibilants and how do they look on a spectrogram?
/s, z, ʃ, ʒ/; energy is concentrated above F3, with alveolars (/s, z/) even higher than palatals (/ʃ, ʒ/)
Which phonemes have steep, high-frequency spectral peaks
/s/ /z/ /ʃ/ /ʒ/ (sibilants); posterior
Difference between /s/ and /z/
/s/ is voiceless & longer, and /z/ is voiced & shorter. -Both have energy concentrated above 3500 Hz, and dark noise.
/n/ has a very similar formant patterning to
/t/ and /d/
What phonemes are non-resonant affricates?
/tʃ/ /dʒ/
Consonants: Affricates
/tʃ/ /dʒ/
/w/ shares similar formant structure to which vowel?
/u/
High, back, rounded vowel
/u/
low F1 and low F2
/u/ has...?
glides
/w, j/ also called approximants and semivowels, gradual articulatory motion. narrow but not closed vocal tract, have formants
Consonants: Glides
/w/ /j/
Semivowel Glides
/w/ and /j/ -/w/ = /b/ + /u/ + transition to vowel -/j/ = /d/ + /i/ + transition to vowel
What are some examples of glides
/w/, /j/
Examples of tense vowels
/ɝ/, /u/, /o/, /i/, /e/
Examples of lax vowels
/ɪ/, /ə/, /ʊ/, /ɛ/, /æ/, /ɚ/
Difference between the strong, voiceless fricatives /ʃ/ and /s/
/ʃ/ starts at a lower frequency (about 2000 Hz); /s/ starts around 3500 Hz. -Both have dark noise and long durations.
central
/ʌ/ is long
Difference between /ʒ/ (voiced) and /ʃ/ (voiceless)
/ʒ/ has shorter duration, /ʃ/ has longer duration -both have greatest energy above 2000 Hz and dark noise.
/w/
Place: labio-velar Manner: glide Voice: voiced Periodic Laryngeal Source, Muscles: Styloglossus, Orbicularis Oris, Damping, Formant transitions
affricates
/ʧ ʤ/ described as a combination of stop and fricative: complete obstruction in the vocal tract, intraoral pressure builds up, then the release generates fricative noise (this distinguishes them from stops)
/ŋ/
Place: velar Manner: nasal Voice: voiced Periodic Laryngeal Source, Muscles: Levator Palatini, Palatoglossus, highest/longest variable,
Difference between /θ/ and /ð/ (ex: Thaw and That)
/θ/ has a trailing tail and longer duration, /ð/ has shorter duration -both have light noise and energy above 2000 Hz
4 Ways a System Loses Energy/Why is there damping or decrease in amplitude?
1) Friction: air molecules rub against each other as well as the walls of the vocal tract resonator (vocal tract tissues). 2) Absorption: energy is lost/transferred to another structure (vocal tract tissues). 3) Radiation: air molecules escape from the tube and are lost. Escape from nose/mouth. 4) Gravity: exerts a force on the air molecules that opposes the inherent vibratory forces (inertia).
Characteristics of the Vocal Tract Resonator
1) Quarter-wave resonator: open at one end (mouth/nose) and closed at the other (vocal folds/glottis). The driving source can be the vocal folds or a source within the tract. Responds with greatest amplitude to some frequencies while it attenuates other frequencies. 2) Series of connected air-filled containers. 3) Irregular shape: acts like a nonuniform resonator, broadly tuned and responding to a wide range of frequencies around its resonant frequency. 4) Variable resonator: the frequency of response changes when the vocal tract changes shape, which depends on the sound being made; different configurations give different resonant frequencies.
Acoustic correlates of a stop
1) silent gap 2) release burst 3) voice onset time 4) formant transition
Calculate Formants
1) Take the length of the vocal tract (e.g., 17 cm). 2) Find the wavelength of the lowest resonant frequency (lambda = 4L for a quarter-wave resonator; 4(17) = 68 cm). 3) Use this to find the frequency (lambda = c/f, so f = c/lambda; 34000 cm/s / 68 cm = 500 Hz). 4) Calculate the higher resonant frequencies (a tube closed at one end has only odd-number multiples, so multiply by 3 for the 2nd formant and by 5 for the 3rd): 2nd = 1500 Hz, 3rd = 2500 Hz.
The lowest resonant freq (or 1R) has a wavelength that is __ times the length of the vocal tract, while the ____ resonant freq (or 3R) has a frequency that is 3 times the lowest frequency. Finally, the ____ resonant frequency (or 5R) has a frequency that is five times the lowest frequency
4, second, third
F2 equation
485.7 x 3 = 1,457 Hz
F3 equation
485.7 x 5 ≈ 2,429 Hz
wavelength
4 x 17.5 cm = 70 cm
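The formant calculation steps above can be sketched as a short function, assuming an ideal uniform tube closed at the glottis and open at the lips, and the speed of sound of 34,000 cm/s used on these cards. Real vocal tracts are not uniform, so the results are only the neutral-vowel approximation.

```python
def quarter_wave_formants(tract_length_cm: float, n: int = 3,
                          speed_of_sound_cm_s: float = 34000.0) -> list[float]:
    """First n resonances of a uniform tube closed at one end and open at the other.

    F_k = (2k - 1) * c / (4 * L): only odd multiples of the lowest resonance occur.
    """
    wavelength_cm = 4 * tract_length_cm          # lowest resonance: lambda = 4L
    lowest_hz = speed_of_sound_cm_s / wavelength_cm  # f = c / lambda
    return [(2 * k - 1) * lowest_hz for k in range(1, n + 1)]


print(quarter_wave_formants(17.0))                            # [500.0, 1500.0, 2500.0]
print([round(f, 1) for f in quarter_wave_formants(17.5)])     # [485.7, 1457.1, 2428.6]
```

The 17.5 cm case reproduces the F2 and F3 cards (about 1,457 Hz and 2,429 Hz); a longer tract lowers every formant.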
If 3 pure tones at 50 Hz, 150 Hz, and 250 Hz were added together to make a complex wave, the frequency of the complex wave would be:
50 Hz
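Why the answer is 50 Hz: a sum of sinusoids repeats at the greatest common divisor (GCD) of its component frequencies. This sketch assumes whole-number frequencies in Hz; the second example shows that the GCD can be a frequency that is not itself one of the components.

```python
from functools import reduce
from math import gcd


def fundamental_of_sum(component_freqs_hz: list[int]) -> int:
    """F0 of a complex wave built from sinusoids with integer frequencies (Hz).

    The complex wave repeats at the greatest common divisor of the components.
    """
    return reduce(gcd, component_freqs_hz)


print(fundamental_of_sum([50, 150, 250]))  # 50
print(fundamental_of_sum([200, 300]))      # 100, even though no 100 Hz component is present
```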
What are the Hz for Formants I-III?
500 Hz for formant I; 1500 Hz for formant II; 2500 Hz for formant III; 3500 Hz for formant IV (high pitch)
Pinna resonates at what freq?
5kHz
Radiation from lips (R): gain
6 dB/octave gain
Radiated acoustic pressure wave (P): loss
6 dB/octave roll-off
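A sketch of how dB/octave slopes combine. The +6 dB/octave radiation gain comes from these cards; the -12 dB/octave glottal source slope is a commonly cited textbook value and is an assumption here. Adding the two gives the -6 dB/octave roll-off of the radiated wave.

```python
import math


def level_change_db(slope_db_per_octave: float, f_low_hz: float, f_high_hz: float) -> float:
    """Level change between two frequencies for a constant dB/octave slope."""
    octaves = math.log2(f_high_hz / f_low_hz)  # one octave = a doubling of frequency
    return slope_db_per_octave * octaves


SOURCE_SLOPE = -12.0    # assumed glottal source spectrum slope (dB/octave)
RADIATION_SLOPE = 6.0   # radiation gain at the lips (dB/octave)
net_slope = SOURCE_SLOPE + RADIATION_SLOPE  # -6 dB/octave in the radiated wave

print(net_slope)                                 # -6.0
print(level_change_db(net_slope, 100.0, 800.0))  # about -18 dB over 3 octaves
```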
/g/
Place: velar Manner: stop Voice: voiced Periodic Laryngeal Source, Muscles: Styloglossus, Palatoglossus, Mylohyoid, Levator Palatini, Manner- silent or near silent closure interval; transient release burst, Place- F2 transitions, frequency of most intense portion of release burst, Voice- +phonation, -phonation, presence or absence of aspiration, VOT/F1 onset, closure duration (medial position), preceding vowel duration (final position),
Assimilation
A sound becomes like its neighbor; one articulator is involved. Partial Assimilation: no change in phonemic category (EX: see powerpoint). Complete Assimilation: phonemic class changes (EX: velarization of /n/ before /k/ in "ten cards"). Can be seen in acoustics, speech movements, and muscle activity. Examples of types of assimilation: look at powerpoint.
The speech production system consists of ____.
A sound source and filter (resonator). Source excites the resonator (there are many different sources)
Consonant
A speech sound produced with one or more areas of the vocal tract narrowed by some degree of constriction (partial or complete); less energy, greater meaning, some are voiced
/k/
Place: velar Manner: stop Voice: voiceless Aperiodic Laryngeal Source, Muscles: Styloglossus, Palatoglossus, Mylohyoid, Levator Palatini, Manner- silent or near silent closure interval; transient release burst, Place- F2 transitions, frequency of most intense portion of release burst, Voice- +phonation,-phonation, presence or absence of aspiration, VOT/F1 onset, closure duration (medial position), preceding vowel duration (final position)
F2 transitions
Plosives - EX: /b/: F2 starts at a low frequency; the formant structure is like /a/, low & low. EX: /d/: F2 starts at a mid frequency. EX: /g/: F2 starts at a high frequency. Where the F2 transition starts and ends tells us about place of articulation. The burst release (concentration of energy) is also a place cue.
Average Duration
Plosives: 10 msec. Affricates: 75-130 msec. Fricatives: 130 msec.
on a spectrum display, a voiced fricative shows... -vertical striations due to the opening and closing of the vocal folds -a voice bar at low frequencies - frication noise with a frequency range comparable to that of its voiceless cognate - frication noise somewhat weaker than that of its voiceless cognate - all of the above - none of the above
all of the above
oscilloscope
electronic test instrument allows you to see how voltages vary over time (amplitude measure)
in oral speech sounds, soft palate is ___
elevated against posterior pharyngeal wall
the levator veli palatini muscle is involved in the - of the soft palate
elevation
analyze and give feedback for consonantal and vocalic /r/. demonstration of RTGram. and you can analyze with Praat
how to use acoustics in tx:
What are the spectrographic differences between open and close vowels?
http://www.u.arizona.edu/~ohalad/Phonetics/notes/Formants%20Spectrograms%20and%20Vowels.PDF
How do different sound classes appear on a spectrogram?
https://home.cc.umanitoba.ca/~krussll/phonetics/acoustic/spectrogram-sounds.html
Which item below is not used for measuring the glottal waveform
Spectrogram (transillumination, inverse filter, pseudo-infinite length tube, and electroglottograph all do)
These 2 instruments used together would provide you with measurements of fundamental frequency and vocal fold contact patterns
Speech Analysis using a microphone, PRAAT/ Multi-Speech and EGG
speech rate vs articulation rate
Speech rate includes pauses, whereas articulation rate excludes them
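A minimal sketch of the difference, assuming rates are counted in syllables per second and that pause time has already been measured; the sample numbers are hypothetical.

```python
def speech_rate(syllables: int, total_time_s: float) -> float:
    """Syllables per second over the whole sample, pauses included."""
    return syllables / total_time_s


def articulation_rate(syllables: int, total_time_s: float, pause_time_s: float) -> float:
    """Syllables per second with pause time removed from the denominator."""
    speaking_time_s = total_time_s - pause_time_s
    if speaking_time_s <= 0:
        raise ValueError("pause time must be shorter than total time")
    return syllables / speaking_time_s


# Hypothetical sample: 30 syllables in 10 s, of which 2.5 s were pauses.
print(speech_rate(30, 10.0))             # 3.0
print(articulation_rate(30, 10.0, 2.5))  # 4.0
```

Because pauses are excluded, articulation rate is always at least as high as speech rate for the same sample.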
Assimilation
Speech sounds become like neighboring sounds; involves an alteration in the movement of a single articulator
Obstruents
Speech sounds produced with an obstruction of some sort in the airway. Stops, fricatives, affricates.
Resonants/Sonorants
Speech sounds produced with continuous, non-turbulent airflow. Nasals, liquids, glides, vowels. Typically voiced.
Continuants
Speech sounds without complete obstruction, but with a continuous airflow. Fricatives, nasals, liquids, glides. Voiced or voiceless.
This portable device is used to measure vital capacity
Spiropet/ spirometer
The vowel /ɑ/ modeled as two tubes.
The back of /ɑ/ (toward the glottis) is a thin tube and the front (toward the lips) is a fatter tube, because the tongue is retracted into the pharyngeal cavity, taking up space so the back tube is narrow.
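A minimal sketch of the two-tube idea, assuming the two tubes are acoustically decoupled and each behaves as a quarter-wave resonator: the narrow back (pharyngeal) tube is closed at the glottis and opens into the wide tube, and the wide front (oral) tube acts as closed at the narrow junction and open at the lips. The tube lengths are illustrative assumptions. Real tubes interact, which pushes F1 and F2 of /ɑ/ apart, so this only shows why both land near 1000 Hz.

```python
def quarter_wave_resonances(length_cm: float, n: int, c_cm_s: float) -> list[float]:
    """First n resonances of a tube closed at one end: (2k - 1) * c / (4L)."""
    return [(2 * k - 1) * c_cm_s / (4 * length_cm) for k in range(1, n + 1)]


def two_tube_resonances(back_cm: float, front_cm: float, n_per_tube: int = 2,
                        c_cm_s: float = 34000.0) -> list[float]:
    """Decoupled two-tube approximation: pool and sort each tube's resonances."""
    back = quarter_wave_resonances(back_cm, n_per_tube, c_cm_s)    # narrow pharyngeal tube
    front = quarter_wave_resonances(front_cm, n_per_tube, c_cm_s)  # wide oral tube
    return sorted(back + front)


print(two_tube_resonances(8.5, 8.5))                          # equal halves: both lowest at 1000 Hz
print([round(f) for f in two_tube_resonances(9.0, 8.0)])      # unequal halves: two peaks near 1000 Hz
```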
Stops: Aspiration
breathy noise generated following the release of a voiceless stop consonant as air passes between the vocal folds as they begin to adduct for following vowel
What sound is considered a high, unrounded front vowel? what are the articulatory configurations associated with this vowel
/i/: tongue high and front, lips unrounded; acoustically, low F1 and high F2
consonants
identified as continuant, sonorant, strident, based upon closure and distribution of energy at higher frequencies
phonemic analysis of vowels
if 2 words are identical except for a single vowel or consonant, the differing sounds are said to be different phonemes (ex: bed/bad, beet/bit)
What is delayed auditory feedback?
if speakers hear their own speech played back under a slight time delay, fluent speech can become disfluent, with syllables repeated or prolonged (disfluent speakers can become more fluent as well)
When are VOT values positive?
if onset of phonation follows stop release ex. if phonation begins 75 ms after release of stop, VOT value = +75
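A sketch of the VOT arithmetic on this card together with the three VOT categories listed earlier. Times are assumed to be in milliseconds, measured by hand from a waveform or spectrogram; the 25 ms short-lag ceiling follows the English voiced/voiceless boundary given earlier and is an assumption of this sketch, not a universal cut-off.

```python
def vot_ms(release_time_ms: float, voicing_onset_ms: float) -> float:
    """VOT = onset of phonation minus stop release (negative means voicing lead)."""
    return voicing_onset_ms - release_time_ms


def vot_category(vot: float, short_lag_max_ms: float = 25.0) -> str:
    """Classify a VOT value as voicing lead, zero/short lag, or long lag."""
    if vot < 0:
        return "voicing lead"
    if vot <= short_lag_max_ms:
        return "short lag"
    return "long lag"


# Hypothetical timestamps: release at 100 ms, phonation begins at 175 ms.
print(vot_ms(release_time_ms=100.0, voicing_onset_ms=175.0))  # 75.0
print(vot_category(vot_ms(100.0, 175.0)))                     # long lag
print(vot_category(vot_ms(100.0, 60.0)))                      # voicing lead
```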
/θ/
Place: interdental Manner: fricative Voice: voiceless Aperiodic Turbulent, Muscles: Superior Longitudinal, Manner- presence of aperiodic noise, Place- low, Voice- Absence of Phonation
during a non-resonant affricate, what happens
airflow is interrupted by a stop closure; the articulator then pulls away slowly from the closure, forming a constriction that produces turbulent noise
egressive
airflow is...what?
open glottis = ?
airstream is only audible at the point of constriction = voiceless fricative
effects of narrowing the vocal tract at different locations
all formants are lowered by labial constriction
intonation ___ -is the rise and fall of fundamental frequency in an utterance -is perceived as the melody of speech -conveys meaning (declarative vs interrogative utterances) -conveys non-linguistic information about the state of the speaker's mind and/or emotions - all of the above
all of the above
in general, the first formant F1 is ___ in consonants than in vowels, because consonants involve ___
lower; more oral constriction
For formant 1, narrowing or rounding the lips will cause it to ______ in frequency and opening it will cause it to ____ in frequency
lower; raise. Narrowing or rounding the lips lowers F1; opening the lips raises it. Lowering the tongue raises F1; raising the tongue lowers F1.
F3
lowered by lip or mid tongue constriction, raised by anterior or posterior constriction
F2
lowered by lip or posterior constriction, raised by anterior constriction....the more front the sound, the higher and the more back, the lower
Sonorants
nasals, glides, and liquids, Characteristics:Similar to vowels, Free airflow; articulation shapes the vocal tract, Characterized mainly by formant frequencies, Periodic laryngeal source (all voiced)
Sonorants
nasals, glides, liquids
Resonant consonants are:
nasals, liquids, glides- similar to vowels - aka sonorants- more open vocal tract allowing resonant energy
Phonation Threshold pressure
minimum amount of transglottal pressure needed to set the vocal folds into vibration; in healthy adults this number is low
Node
minimum vibratory amplitude; formant frequency is raised by constriction; minimum volume velocity or maximum pressure
Sonorants: Semivowels/ glides
more constricted than vowels gradual articulator movement to following vowel good formant structure requires movement
What are sounds in English more susceptible to?
more susceptible to duration fluctuations than sounds in other languages (learned vs. psychologic difference? no one is sure what explanation for this is true)
during the close phase, a nasal has ___ energy than a stop because ___
more; air and sound may exit via the nostrils
where does perception for hearing occur
mostly in brain
where does sensation for hearing occur
mostly in inner ear
dysarthria
motor speech disorders due to neurological damage: rate changes, neglected articulatory adjustments, diminished acoustic contrasts, incomplete stop closures, and interrupted timing and sequencing
Stops: Formant transitions
movement from stop to following vowel (or from following vowel to stop)
different sized resonating cavities
movement of the articulators creates what?
spatiotemporal dynamics
movement of the articulators relative to some frame of reference
What unit is VOT measured in?
ms
flaps
much shorter in duration than regular stops and voicing may not be distinguished
what is meant when said speech is redundant
multiple cues for same thing
manner
murmur is a cue for ______ of articulation
Semi vowel
named because their articulations and acoustic features resemble those of vowels. The articulators used in their production form only minimum constrictions in the vocal tract, they are characterized by formant structures similar to those of vowels and diphthongs. The _____ are subdivided into two manners of articulation glides and liquids.
Fricatives
narrow constriction; pushing air through it generates aperiodic noise; longer duration than stops; voiceless = aperiodic noise only; voiced = aperiodic noise plus periodic vocal fold vibration
a tuning fork is a
narrowly tuned resonator with very low damping
the consonant manner class(es) that have phonation as their only sound source are
nasals and semivowels
nasals involve resonance within the
nasal cavity
500
nasal murmur is dominated by low-frequency energy -often below ____ Hz
sonorants
nasal, liquids and glides tend to look both vowel and consonant like
velum
nasals are produced by lowering the _____, allowing a coupling of the oral and nasal cavities
vowels
nasals have highly damped formants, meaning broad bandwidths, however, nasal bandwidths are narrower than ___
Why is coordination of the articulatory and laryngeal system necessary?
necessary to create voicing differences that are reflected in VOT
If change is required, feedback will be what?
negative
the fact that no word in English can start with "tl" is an example of a - sequence constraint
negative
Prevoicing
negative VOT: voicing begins before the release of the stop
Is there one set of accepted distinctive features?
no
do labiodental, dental, and glottal fricatives have a narrow spectrum?
no
Partial Assimilation
no change in phonemic category
Is accurate tactile feedback essential to accurate speech production?
no one knows for sure
voiceless stop equals
no periodic vocal fold vibration
What is partial assimilation?
no phonemic change occurs in the sound, only a phonetic change
voiceless
no voice bar at the bottom
What does aspiration diffuse?
noise energy generated at larynx (or lower)
photoglottography
a bright light is placed against the neck just below the cricoid cartilage; a probe passed into the pharynx acts as a photosensor; the amount of light passing through the glottis is proportional to glottal area
How are non resonant consonants characterized?
characterized by more restricted airflow than for semivowels or nasals
True
children under 12 years demonstrate spatiotemporal patterns similar to, but not as stable as, those of adults. True or False?
tracheostomies
children with this have limited opportunity to hear much speech and limited ability to produce speech
front vowels
clearly separated F1 and F2
back vowels
close F1 and F2, but separated F2 and F3
Frication
noise energy: noise is generated as air is forced through a narrow constriction
peaked spectrum
noise for stop /k/ generated in the mid-frequency range of 1.5 - 4 kHz
falling spectrum
noise for stop /p/ generated in the low frequency range of 500-1500 Hz
rising spectrum
noise for stop /t/ generated in the high frequency range above 4 kHz
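A sketch that maps the frequency of the most intense part of a stop burst onto the three spectrum shapes on these cards. The frequency boundaries come from the cards; reducing a burst to a single peak frequency is a simplifying assumption, and real bursts also vary with the following vowel.

```python
def burst_spectrum_shape(peak_hz: float) -> tuple[str, str]:
    """Return (spectrum shape, example stop) for a burst's strongest frequency.

    Card boundaries: falling 500-1500 Hz (/p/), peaked 1.5-4 kHz (/k/),
    rising above 4 kHz (/t/).
    """
    if peak_hz < 1500:
        return ("falling", "/p/")
    if peak_hz <= 4000:
        return ("peaked", "/k/")
    return ("rising", "/t/")


for peak in (800, 2500, 5000):
    print(peak, burst_spectrum_shape(peak))
```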
What makes sounds audible in nonresonants?
noise in the speech signal makes the sounds audible regardless of whether phonation accompanies the articulation or not; thus a single articulation can produce 2 different speech sounds, one phonated and one unphonated
What is frication?
noise random vibrations
cue to fricative
noise; transition to next vowel
2 types of sounds that are not perceived categorically
non speech sounds and vowels in isolation
rate
normal, fast or slow
sound pressure level in dB
not an accurate reflection of the sensitivity of the human ear (our perception of intensity - the loudness of a sound)
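The SPL formula itself, as a sketch: dB SPL = 20·log10(p / p_ref), with the standard reference pressure p_ref = 20 µPa. The result is a purely physical measure; it does not account for the ear's frequency-dependent sensitivity, which is why equal dB SPL values need not sound equally loud.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # 20 micropascals, the standard reference for dB SPL


def db_spl(pressure_pa: float) -> float:
    """Sound pressure level in dB re 20 uPa: 20 * log10(p / p_ref)."""
    if pressure_pa <= 0:
        raise ValueError("pressure must be positive")
    return 20 * math.log10(pressure_pa / REFERENCE_PRESSURE_PA)


print(round(db_spl(20e-6), 1))  # 0.0 (the reference pressure)
print(round(db_spl(0.02), 1))   # 60.0 (1000 times the reference pressure)
```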
initial cluster with stops
not as much aspiration
How are speech sounds produced in reality?
not produced one at a time, independently of each other. They are produced in a context and are altered by neighboring sounds
stable
relationship between the vowel formants is stable or unstable?
stress
relative syllable weight based on F0, duration, intensity (amplitude), and formants
rapid articulatory movements
relatively fast formant transitions (mostly F1)
What are the formant configurations for cardinal vowel [a]?
relatively high first formant; second formant just over it and slightly higher
What are the formant configurations for cardinal vowel [i]?
relatively low first formant; second formant is high.
what are the formant configurations for cardinal vowel [u]?
relatively low first formant; second formant is low as well
During nasal sounds, the levator palatini muscle is
relaxed
Stops: burst release
release of pressure behind the constriction; articulatory release should be very fast and the burst is short (5-40 msec); more intense for voiceless than voiced stops; burst spectrum varies with place of articulation; burst spectrum is also influenced by the following vowel (coarticulation); if release is too slow, it could sound like a fricative (fast = stop)
F2 in fricatives
relevant for distinguishing labio-dental and dental fricatives
Wide Bandwidth
resolves time information well (formant structure) but frequency information poorly
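A rough sketch of the time/frequency trade-off, assuming the common rule of thumb that analysis bandwidth is roughly the inverse of window duration (the exact constant depends on the window shape). The 300 Hz and 45 Hz bandwidths are typical wideband and narrowband settings assumed for illustration: the short wideband window blurs individual harmonics into formant bands but tracks rapid changes, while the long narrowband window resolves harmonics.

```python
def approx_window_ms(bandwidth_hz: float) -> float:
    """Approximate analysis window duration (ms) for a given bandwidth (Hz).

    Rule of thumb only: duration is roughly 1 / bandwidth.
    """
    if bandwidth_hz <= 0:
        raise ValueError("bandwidth must be positive")
    return 1000.0 / bandwidth_hz


for label, bandwidth in (("wideband", 300.0), ("narrowband", 45.0)):
    print(label, round(approx_window_ms(bandwidth), 1), "ms")
```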
formant
resonance of vocal tract
tube length and location/degree of constriction affect
resonances
the vocal tract naturally resonates at these frequencies
resonant (formant) frequencies
two (general) consonant types include
resonant and non-resonant
nasal cavities form a ______ during nasal sounds
resonant chamber
The vocal tract naturally resonates at certain frequencies. They are termed _____ or _____ frequencies
resonant or formant
when the styloglossus muscle contracts the tongue is
retracted
language specific
rising does not always mean question, other languages have tone on syllable as a contrastive feature
stop release
rising pressure behind the obstruction is rapidly released producing a 20-30ms transient three types: transient, frication, and aspiration
in active theories what happens if there is a large error signal
runs it through the process again and uses context
high amplitude fricatives
/s/ /z/ /ʃ/ /ʒ/
sibilant/strident
s, z
Strong Fricatives
s, z, ʃ, ʒ -All have dark random energy -Place of articulation determines frequency range -Voicing determines the duration and existence of voice bars or striations
one of the functions of the orbicularis oris is to
close the lips, push them forward (rounding), and help produce bilabial sounds and rounded vowels like "ooh"
Is the VP port open or closed during stops
closed
glottis
closed end of the tube (vocal tract)
the three phases of the glottal cycle are
closed, opening, closing (order matters)
stop consonants involve these three distinct components
closure, burst and transition
Consonants can be affected by
coarticulation
productive correlate of parallel processing of sounds is
coarticulation
two articulation are moving at the same time for different phonemes
coarticulation
What creates syllables?
coarticulation of vowels and consonants binding together
interaction of cues
combination of burst and vowel contributes to perception of place of articulation
Voiced obstruents...
combine periodic and aperiodic sources
the stop release burst of /k/ would be expected to have a ___ spectrum because there is ___ cavity anterior to the closure point.
compact; a large
the velar stops have a release burst with a ___ spectrum, and labial stops have a release burst with a ___ spectrum.
compact; diffuse falling
The type of assimilation while saying 'ten cards' Be specific in describing what happens to /n/
complete assimilation: /n/ becomes /ŋ/ due to the following velar stop
Stops: Stop Gap
complete constriction of the supralaryngeal vocal tract (VP port closed); a voice bar may appear during the gap if the stop is voiced
Stops
completely blocking air behind articulators (vocal tract is completely closed) open tract then burst of air comes out ALL ORAL NOT NASAL
speech
complex sequencing of phonemes in rapid succession
turbulence
complex, unpredictable air flow
one of the functions of the buccinator muscle is to
compress the cheeks
How are models of speech production often generated?
computer or mechanically generated
Resonances are dependent on the ____ of the resonator.
configuration/shape
fricatives
consonant produced with a narrow constriction through which air escapes with a continuous noise
This sound class involves a constriction within the vocal tract
consonants
2 types of sounds that are perceived categorically
consonants and vowels in context
to provide for constant airflow during speech production we must maintain - in the lungs
constant pressure
Consonants are produced with more _____________ in the vocal tract than vowels
constriction
Upstream
constriction near the glottis
Downstream
constriction near the mouth
Speech sounds vary in duration due to what?
context
rate
contextual timing characteristics of speech production (not individual sounds)
What sounds are longer than stops?
continuant consonants (fricatives, nasals, semivowels) are longer, including the duration of stop closure
after the corticospinal tract of the pyramidal system crosses over, innervation is considered to be mainly ipsi-/contra-lateral
contra-lateral
Consonant production involves a ____ within the vocal tract
constriction
the pyramidal system is comprised of three tracts which are..
corticospinal, corticobulbar, and corticopontine
murmur
coupling of the oral and nasal cavities causes a nasal _____
segmental shortening due to rate of speech
same pattern but reduced strength
different source frequency but same filter
same vowel at different pitch
implication of independence of source and filter
same vowel can be produced at different fundamental frequencies and different vowels produced at same fundamental frequencies
____ formants change due to the size of the oral cavity
second
place cues for stops 2
second formant (F2) transitions (and even F3)
some people produce sounds correctly even if they can't perceive them correctly. Who?
second language learners
rate of speech
segments compress as rate increases
What makes suprasegmental features different from the segmental features of speech?
segments of speech are vowels and consonants, suprasegmentals are prosodic features of speech that tend to occur simultaneously with 2 or more segmental phonemes
give two of the types of information that a person has stored for every word they know
semantics (meaning of words) and phonetics (sequencing of sounds)
/w/ /j/ /r/ and /l/ are generally classified as
semivowels
which consonant manner class is most intense acoustically?
semivowels
Range of Human Hearing
sensitivity to sounds depends on both the amplitude and frequency of a sound.
DURATION: Syllabification
separate every syllable
tongue, jaw, and velum
set the filter for how you want to resonate a particular vowel-- set it by using what 3 things?
in terms of speech perception which is perceptual invariance
several sets of cues may be heard as same consonant
stress
creating meaning through emphasis: F0 contour, intensity, duration. Many people use a combination of the three; used to change the meaning of a sentence based on which word is stressed
Voiced stops beginning a word are usually produced with a long/short delay of voicing onset for the next vowel
short
voiceless stops that follow an (s) in the same word (e.g. spy) are usually produced with a long/short delay of voicing onset for the next vowel
short
affricates
short stop + extra short fricative
fricatives
should show air and may look like a stop
sound spectrogram
shows bands of harmonics over time: the narrowband and the wideband
any one given person can change the fundamental frequency of their voice by the relative contraction of which muscles?
crico-thyroid, vocalis
DURATION: Ataxia
damage to the cerebellum; affects prosody
parallel processing of sounds
decoding more than one sound at a time
2 cues to nasality
decrease in intensity; weakness of upper formants
What does falling intonation result from?
decreased cricothyroid activity or from decreased subglottal pressure at the end of the breath for this utterance
in switching from breathing for life to breathing for speech, the number of breaths per minute increases/decreases
decreases
Increasing lung volume..
decreases lung pressure
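The inverse relation on this card is Boyle's law (P1·V1 = P2·V2 at constant temperature for a fixed amount of air). The sketch below uses hypothetical, arbitrary-unit values to show that enlarging the lungs drops alveolar pressure below atmospheric, so air flows in.

```python
def pressure_after_volume_change(p1: float, v1: float, v2: float) -> float:
    """Boyle's law at constant temperature: P1 * V1 = P2 * V2, so P2 = P1 * V1 / V2."""
    if v1 <= 0 or v2 <= 0:
        raise ValueError("volumes must be positive")
    return p1 * v1 / v2


# Hypothetical values: lung pressure starts equal to atmospheric (1000 arbitrary units)
# and lung volume grows from 3.0 to 3.5 litres during inspiration.
atmospheric = 1000.0
p_after = pressure_after_volume_change(atmospheric, v1=3.0, v2=3.5)
print(round(p_after, 1))       # lower than atmospheric
print(p_after < atmospheric)   # True: air flows in until the pressures equalize
```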
Speed of sound is influenced by
density and temperature
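Temperature is the easiest factor to illustrate. The sketch uses the standard linear approximation for dry air, c ≈ 331.3 + 0.606·T m/s (T in °C); it ignores humidity and is only an approximation. Warm air in the vocal tract gives roughly 354 m/s (about 35,400 cm/s), in the same range as the 34,000 cm/s used in the formant calculations on these cards.

```python
def speed_of_sound_m_s(temp_c: float) -> float:
    """Approximate speed of sound in dry air: c = 331.3 + 0.606 * T (T in deg C)."""
    return 331.3 + 0.606 * temp_c


room = speed_of_sound_m_s(20.0)  # room-temperature air
body = speed_of_sound_m_s(37.0)  # air warmed to body temperature
print(round(room, 1), round(body, 1))
```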
direct realist theory of speech perception
derives from the visual perception theory. • Perception: what the listener hears ("the object"), not the actual acoustic event • Perception consists of a single step from acoustic signal to perception
what is the general function of the extrinsic muscles of the tongue
determines the position of the tongue in the oral cavity
what is the general function of the intrinsic muscles of the tongue
determines the shape of the tongue in the oral cavity
The primary muscle of inspiration
diaphragm
final consonants with stops
difference is seen in the duration of the vowel
F2 transitions are moving in __________directions
different
fricative spectra shows
diffuse energy in non sibilants and concentrated energy in sibilants
the stop release burst of /t/ would be expected to have a ___ spectrum because there is ___ cavity anterior to the closure point.
diffuse rising; a small
Manipulating rate of change differentiates semi-vowels and ____
diphthongs.
What sounds are intrinsically long?
diphthongs and tense vowels
What is proprioceptive feedback?
direct feedback from the muscles; senses velocity and direction of movement and position of the articulators and other speech organs
Glottal source - F0 + harmonics Vocal tract - formants
do not confuse this:
when the rectus abdominis contracts, the rib cage is pulled down/up
down
when the hyoglossus muscle contracts the tongue is pulled
downward
when the risorius muscle contracts
draws back the angle of the mouth
An aperiodic sound can be created
due to partial adduction of the vocal folds, at various locations along the supraglottal vocal tract, and through forcing the airstream through a constriction
cue to place of production for nasals
duration of F2 transition to adjacent vowel
2 cues to voicing in affricates
duration of closure, duration of preceding vowel
partially voiced
during closure
When is there a groove formed along the tongue midline?
during sibilants
When does aspiration occur
during the release burst, but it is not the same thing as the release burst; it contributes to voiceless stop bursts seeming more intense. The turbulent airflow occurs at the larynx, not at the articulators. *Release burst and aspiration occur at the same time
Spirantization
during the stop gap, sound occurs because the articulators do not close completely, so the stop sounds like a fricative; another way for air to leak through is through the velopharyngeal port
diphthongs
each of these has a characteristic F1-F2 pattern. the actual value of formant frequencies is variable across individuals and within individuals across speaking contexts
standing wave patterns
each resonant pattern is a (blank)=formant
syllable timed
each syllable has about the same duration and vowels do not get reduced (ex: Spanish)
cite three characteristics of muscle that are used in the evaluation of muscle function
strength, range of motion, motor control
Any syllable can be spoken with greater/lesser ______ depending on the meaning demanded by context
stress
What are the suprasegmental features of speech?
stress, intonation (pitch) , and duration (length of time)
the suprasegmental features of speech are
stress, intonation, duration
Suprasegmentals include:
stress, intonation, rhythm and juncture
in heteronym pairs, the different meanings are indicated by ___
stressing the first syllable in nouns and the second syllable in verbs
the strident/sibilant fricatives are ____ than the nonstrident fricatives because ___
stronger; the teeth form an obstacle to the airflow and there is a resonating cavity
What speech disorders involve a breakdown in the rhythm of speech
stuttering/dysarthria
clear speech
effort to be highly intelligible: slower rate, avoidance of articulatory modification, greater intensity of consonants, greater F0 variability, precise timing
the external muscles of the larynx are innervated by the - branch of CN X (vagus)
superior
This task requires a client to say a sound for as long as possible at a comfortable pitch and loudness level
sustained phonation/ max phonation time
No VOT when a stop is in the ___________________ position. It uses the preceding vowel duration instead
syllable-final (postvocalic, VC)
liquids may function as _______, while glides never do
syllable nuclei
What are suprasegmental features overlaid on?
syllables, words, phrases, and sentences
videokymography
television technology, limits scanning of the endoscopic image to rapid repetition of a single line: drawing the glottis in the horizontal plane over time. research tool, identifies glottal configuration
F2
tells you how fronted or how backed the tongue is in the mouth while producing vowel
F1
tells you how high or how low the tongue is in the mouth while producing vowel
when several impulses travel down the same axon towards the synapse with another neuron the effect is - summation
temporal
coarticulation
temporal overlap of articulatory movements for different phonemes
Co-Articulation
temporal overlap of articulatory movements for different phones.
Which are longer- tense or lax vowels
tense
longer
tense vowels are what?
what do spatial target models say?
that a speaker can still produce a sound accurately even in the face of disruption of underlying muscle activity : "motor equivalence"
What has further evidence said about DAF?
that audition does operate as a feedback system for speech control, but no one knows if the feedback is necessary for speech
What is the overall conclusion regarding the vocal tract normalization theory and simple target theory?
that even though a comprehensive and tested theory is not available, we can assume for the moment that the formant frequencies are the most important cues to vowel perception from the sound signal. That is, the simple target theory (though imprecise at times) will lead to the best and most practical results when we try to interpret spectrographic images clinically.
What is problem of the simple target theory?
that formants 1 and 2 are not as reliable and consistent as would seem at first sight. (graphs of vowels circled)
What does rise-fall intonation curve mean?
that most often the pitch rises during the first part of the utterance and falls at the end; this signals to the listener that it is their turn to talk
What do acoustic-auditory models tell us?
that there can be variation in the articulation of sounds, such as vowel formants, but the listener will still recognize the sound accurately
lower
the acoustic energy for /l/is primarily in the ______ frequencies -resembles a nasal
Murmur
the acoustic pattern associated with nasal radiation of acoustic energy
Voice Onset Time (VOT)
the amount of time between the burst release and the onset of voicing.
the weakening of the spectrum of nasals above 300 Hz is due to
the antiresonances of the nasal passageways, sinuses, and blocked oral cavity
When does aspiration occur in english
the beginning of a word and the beginning of stressed syllables
oropharynx
the bend at this, doesn't matter acoustically
Why is partial assimilation okay?
the brain is flexible and thus you don't have to hit every exact spot
formants
the characteristic resonances
perseveratory (progressive) coarticulation
the current speech sound is influenced by the properties of a sound realized previously -ex. dogs or cats
What is VOT
the duration of the period of time between the release of a stop and the beginning of vocal fold vibration
1000
the first antiformant for /m/ occurs at around ______ Hz.
F1, F2, F3, F4, F5
the first four or five are relevant for speech, and for specification of a vowel, only the first three are relevant
If an area of maximum velocity (v) is penetrated, the formant moves ______ in frequency.
the formant moves DOWNWARD in frequency.
F3
the frequency of this is quite low, making it difficult to distinguish between the second and third formants
acoustic targets
the goal of articulatory movement may be a specific acoustic event. Supporting this theory: limitations in acoustic feedback (hearing impairment) negatively affect speech production
VF are touching
the highest amplitude is when?
What is primary stress?
the highest level of stress, usually seen on the second syllable in a word
How is a greater intensity for the heavily stressed syllable attained?
the increased vocal fold tension that yields a higher F0 value also leads to greater excursion of the vocal folds from rest, causing greater amplitude of the stressed syllable
VOT (voice onset time)
the interval between the release of the stop and the onset of vocal fold vibration
The [l] sound is fairly comparable to the [r] in most respects except?
the l sound is fairly comparable to the r in most respects but fails to reveal the drop in F3 characteristic for the [r]. the l has relatively closed sound, appears softer on spectrograms.
15cm
the length of a female vocal tract (then F1=34,000/ (15x4) = 566.67 Hz)
In general, the longer a tube...
the lower its lowest resonant frequency
What is fundamental frequency
the lowest frequency of pattern repetition
note under vocal tract transfer function
the more widely spaced harmonics of the higher -F0 sound of the female voice compared to the male voice
aBduction is
the movement of the vocal folds away from the midline
aDduction is
the movement of the vocal folds towards the midline
doesn't
the nasal cavity does or doesn't change its articulatory posture?
What is lexical stress?
the pattern of stress within words; can count the number of syllable nuclei in an utterance to determine the number of syllables in a word (also differentiates nouns from verbs like PERmit vs perMIT and EXtract vs exTRACT)
Changing F0 is the what?
the pitch pattern or intonation contour of a sentence
Why does /sh/ have lower overall frequencies than /s/?
the point of articulation is further back in the mouth than for /s/, giving a longer resonating cavity anterior to the constriction and those lower overall frequencies (2000 Hz and above) - there is also lip rounding and protrusion in the production of this sound, causing a longer oral cavity and lower frequencies (think back vowels)
What are the cues for voicing of stop plosives?
the presence of a voicing bar (phonation itself). the duration of VOT, VTT (or stopgap if it happens in the middle of a word). presence of aspiration (VL consonants only). duration of preceding vowels.
Describe voicing of fricatives
the presence of phonation during fricative noise. Relative duration of the noise segment and bordering vowels: the vowel is prolonged before /z/; /z/ itself, however, is shorter than the /s/ phoneme.
When are vowels longer?
when they occur before voiced consonants ("leave") than they are when they occur before voiceless consonants ("leaf") - they are also longer before continuants than stops ("leaf" vs "leap")
Place of Articulation
• Bilabial • Tongue + fixed point of articulation • Pharynx/glottis
harmonics
these are in between (not the same amplitude or intensity), lots of frequencies that have a roll-off in intensity
six tense
these are long vowels: /i/, /e/, /ɝ/, /u/, /o/, /ɚ/
harmonics
these are related to fundamental frequency because they are whole number multiples of the fundamental
three lax
these are short vowels: /ɪ/, /ə/, /ɛ/, /ʊ/ and /ɔ/
What is a non-phonemic diphthong?
these diphthongs are simply stressed versions of existing pure vowels, e.g., /oʊ/ and /eɪ/; diphthongization is an allophonic change rather than a contrastive variation.
formants
these do not have set relationship because they are different for the different vowels.
lips
these give a gain of 6 dB per octave toward the high frequencies, a boost that helps you hear them better
vowels
these have formants and sonorants
obstruents
these include stops, fricatives, and affricates
acoustic filters
these let through (pass) energy or reduce (attenuate) energy
intrinsic muscles of the tongue
these muscles originate within the tongue itself (inside the oral cavity)
formants
these particular areas are strengthened/amplified; we need 3 because of /r/, since we have to see F3 dipping down toward F2 to confirm /r/
Why are fricatives considered continuants?
these sounds can be prolonged
1. pitch, 2. loudness, and 3. duration
these three things contribute to the perception of stress, helping to differentiate the meaning of similar words
1. nasal airflow and 2. nasalance
these two things are highly dependent upon the stimulus material (whether it contains nasal consonants)
What are semi-vowels?
they are consonants that reveal similarities with vowels. They are entirely dependent on resonance. they have in common with diphthongs that they contain some kind of change or transformation in the first 2 formants ([r] sound has change in 3rd formant). However, as a rule, they can't occur independently without some bordering vowel; the element of change occurs more quickly than those characteristics of diphthongs. manipulating the rate of change differentiates semi vowels and diphthongs.
What do suprasegmentals do?
they play a role in the process of understanding speech- they enable listeners to interpret a speakers intention
What is resonance quality (voice quality)?
thin/oral resonance, muffled, or "back in the throat" resonance.
Give other examples of complete assimilation.
think, bank, anger
stop gap
this 50-150 ms event corresponds to the complete closure of the vocal tract (silence) and minimum radiated acoustic energy; the neck acts like a low-pass filter
What is an appropriate response to the following question? That's NOT your green book?
this IS my green book
disadvantage
this artificially constrains theories to a specific approach or outlook
wideband
this band shows formants better
narrowband
this band shows harmonics better
frication
this comes after the burst
boundary condition
this exists at the lips between the vocal tract and the atmosphere
consonant
this has a locus that vowels move to or from
voice bar
this has a range- for women: 250-under 60
advantage
this helps to understand theories
What is an appropriate response to the following question? WHOSE green book is this?
this is MY green book
Perception of nasality
this is a complex phenomenon that is difficult to measure
palatoglossus (glossopalatine)
this is an extrinsic muscle of the tongue that contracts to raise the root of the tongue or with SG or GG, create groove in back of tongue
genioglossus
this is an extrinsic muscle of the tongue that has posterior fibers contract to push tongue out or against front teeth and anterior fibers that contract to retract tongue. the anterior + posterior fibers contract to pull tongue downward
styloglossus
this is an extrinsic muscle of the tongue that is the antagonist of the genioglossus. It contracts to pull the tongue up and back and may pull the sides of the tongue up
complex wave
this is contained within the pressure wave
cut off frequency
this is defined as the frequency at which the amplitude of the frequency component is decreased by 3 dB (half of its power)
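A quick numeric check of why the cutoff is called the "half-power" or "3 dB" point: halving the power and reducing the amplitude to about 0.707 of its value are the same change in decibels. This is a minimal Python sketch using the standard 10·log10 (power) and 20·log10 (amplitude) decibel formulas; the function names are illustrative, not from any library.

```python
import math


def power_ratio_to_db(ratio: float) -> float:
    """Decibel change for a power (intensity) ratio: 10 * log10(ratio)."""
    return 10 * math.log10(ratio)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Decibel change for an amplitude (pressure) ratio: 20 * log10(ratio)."""
    return 20 * math.log10(ratio)


# At the cutoff frequency the power passed by the filter is halved...
half_power_db = power_ratio_to_db(0.5)
# ...which is the same as the amplitude falling to 1/sqrt(2), about 0.707.
same_point_db = amplitude_ratio_to_db(1 / math.sqrt(2))

print(f"half power:        {half_power_db:.2f} dB")
print(f"amplitude x 0.707: {same_point_db:.2f} dB")
```

Both lines print -3.01 dB, which is where the "3 dB down" wording comes from.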
WHAT is that?
this is my green BOOK
17.5 cm
this is the average length of the male vocal tract
acoustic analysis
this is the cheapest way to see what the tongue is doing (but the rest are fun too..)
fundamental frequency
this is the lowest (most natural) frequency
mouth opening
this is when the jaw rotates downward and translates downward and forward
vertical
this muscle of the tongue contracts to flatten tongue
transverse
this muscle of the tongue contracts to narrow and/or elongate tongue
inferior longitudinal
this muscle of the tongue contracts to shorten tongue and bring apex and lateral margins DOWNWARD
superior longitudinal
this muscle of the tongue contracts to shorten tongue, bring apex and lateral margins UPWARD
vocal tract
this resonates the source signal by allowing certain frequencies to pass through the filter with greater amplitude than other frequencies
vocal tract
this runs from above the larynx to the lips and/or the nose
transfer function
this specifies the vowel
What are the most effective resonators of aperiodic noise?
those immediately anterior to the constrictions and occlusions in the oral cavity
1. presence or absence of voicing, 2. place of articulation, 3. manner of articulation
three features of phonetic description of consonants
stop
three phases of a ___: closure, release, transition
1. vocalic, 2. glide, and 3. consonantal
three sets of landmarks
lowest (dotted) line/0 dBHL
threshold of hearing/Minimal Audible Field
the external thyroarytenoid (or muscularis) muscles connect from the - to the -
thyroid, muscular process of the arytenoids
the vocalic (internal thyroarytenoid) muscles connect from the - to the -
thyroid, vocal process of the arytenoids
waveform is
time and amplitude
the voice onset time is the
time between stop release and the beginning of phonation
VOT
time from release of stop closure to onset of voicing
intrinsic
timing of movements of articulators is an (intrinsic or extrinsic) characteristic of the relationship among different muscles for a given movement
juncture
timing we give a group of phonemes to relay a message. where we put in breaks changes the meaning "an aim" and "a name" have same phonemes but different break
place cues for stops 3
to a very limited extent the duration of the VOT for initial stops
Why does the VP port close during non-resonant consonants
to ensure all airflow is directed at the oral cavity
What is the goal of target models?
to reach the spatial target, the brain's internalized spatial representation of the vocal tract areas in which the articulators move
What is Delayed Auditory Feedback seen as?
to some, it's seen as proof that speech is a servomechanism and auditory feedback is the main control
the hypoglossal nerve innervates all the muscles of the
tongue (except the palatoglossus, which is supplied by CN X)
F2 corresponds to
tongue advancement/ size of oral cavity
F1corresponds to
tongue height/ mouth opening
This is how vowels are classified
tongue height/advancement/tension
[u]
tongue high - low first formant tongue back - low second formant
[a]
tongue low - high first formant tongue central - second formant slightly higher
explain what is happening with the articulators during the production of /tu/
tongue moving back, while lips are moving forward
Describe basilar membrane
tonotopic response (from thin and stiff at base to wide and floppy at apex)
how is /l/ produced?
tongue-tip contact with the alveolar ridge, sides of tongue down: lateral
the contraction of the lateral crico-arytenoid muscles has the effect of drawing the vocalic muscles toward/away from the midline
toward
sensory feedback
transfer of a portion of the system's output back to the input for regulation and error correction
place
transitions are cues for ___ of articulation
t/f voiced fricatives often lose their voicing when they are produced in word-final position
true
t/f on side view the articulation of (L) looks like the articulation of (d)
true
t/f the articulation of the consonant (r) as produce in various dialects and by various
true
t/f the presence of voicing during the production of voiced stops is highly variable
true
t/f upper motor neurons exist entirely within the CNS
true
When is the rise-fall intonation curve true?
true of declarative sentences and those that do not have yes/no answers
the source for voiceless fricatives involves ___, while the filter involves___
turbulent air flow through a constriction; the cavity in front of the constriction
noise
turbulent airflow is what?
For formant II there are _____ the number of (p) points and (v) points.
twice the number. This formant is said to be mostly responsive to tongue front to back positions.
Coarticulation
two articulators are moving at the same time for different phonemes. Occurs due to the temporal overlap between articulatory gestures for vowels and consonants. Example: 'two' /tu/: the /t/ is moved back in the mouth and the lips are protruded during /t/
What is coarticulation?
two articulators are moving at the same time for different phonemes
1. executive 2. effector
two levels of motor programs
1. open, 2. closed
two types of feedback control loops for sensory feedback
1. front. 2. back
two types of tongue advancement (see book for examples of vowels in vocal tract)
Diphthongs
two vowels within the same syllable nucleus, smooth glide from one vowel to the next; onglide/offglide; each has a characteristic F1-F2 pattern
sinewave speech
type of speech where waves track the center frequencies of F1, F2, and F3 of a naturally produced sentence. no consonants except spaces.
release bursts are
typically stronger for voiceless than voiced stops
What are the limits for human pitch detection and discrimination?
uncertain and unreliable above about 5 kHz
short lag
under 20 ms
when the external intercostals contract the rib cage is pulled down/up
up
how are fricatives both periodic and aperiodic
upper vocal tract is aperiodic and lower vocal tract is periodic
a strong release burst is typical of __ stops because the ___ at the time of stop release
voiceless; vocal folds are apart
During closure the only possible source of voicing is shown as a
voicing bar
Zero onset/short-lag
voicing begins at or very shortly after burst release. Vocal folds adducted by the time the stop is released. Silent closure; phonation begins at release or just after.
Voice Leading
voicing begins before burst release. Vocal folds approximated throughout stop closure, and phonation occurs during stop closure.
pre-voicing
voicing begins just before release
simultaneous
voicing begins upon release
Long-lag VOT
voicing begins well after release. Vocal folds adduct after the stop is released. Voicing is delayed; the stop is aspirated
fully voiced
voicing no stopping
Eddies
volumes of air that perform rotation of aperiodic, high frequency fluctuations in pressure and velocity
two cues to juncture
vowel lengthening; silence
What provides information regarding the changes that occur due to tongue position and oral cavity size
vowel quadrilateral
the f2 "transition locus" of a stop is the hypothetical starting point of f2 of a ___ following a stop, and it gives information about ____
vowel; place of articulation of the stop
relative to one another
vowels are perceived as... what?
vowel quadrilateral
vowels form approximate shape of (blank)
What are segments
vowels, consonants, semivowels- which speech is composed
The majority of our prosody is what we do with ____.
vowels. It is largely determined by what we do with vowels and their phonation.
Give examples of semi-vowels
w, j, r, l
Which two semi-vowels are considered glides? the tongue positions for these are almost identical to what two vowels?
w, j; [u] and [i] vowels respectively. when they occur close to these vowels, acoustically there appears to be very little change (although perceptually there is). only 1F and 2F matter for perception here.
strongly aspirated
way after release
always
we almost always or never devoice final affricates and fricatives in real life
evidence that categorical perception is learned
we best identify categories in our language
air flow
we need (blank) to initiate phonation when the vocal folds are adducted to create air pressure changes that flow into the oral and nasal cavities or just to send air into the cavities without phonation where modifications of the pressure wave occur
VOT 20 ms or less (english)
we perceive voiced stops /b/
VOT 25 ms or more (english)
we perceive voiceless stops /p/
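The two VOT cards describe a perceptual boundary, which can be sketched as a simple threshold rule. This is only an illustration: the thresholds are the ones on the cards, treating the 20-25 ms gap as "ambiguous" is an assumption of this sketch, and in practice the boundary shifts somewhat with place of articulation. Negative VOT (prevoicing) falls on the voiced side.

```python
def perceived_voicing(vot_ms: float) -> str:
    """Rough English voicing category for a word-initial stop, given its VOT in ms.

    Thresholds follow the study cards (voiced at 20 ms or less, voiceless at
    25 ms or more); labelling the 20-25 ms region "ambiguous" is an assumption.
    """
    if vot_ms <= 20:
        return "voiced"      # e.g. heard as /b/
    if vot_ms >= 25:
        return "voiceless"   # e.g. heard as /p/
    return "ambiguous"


for vot in (-60, 10, 22, 70):
    print(vot, perceived_voicing(vot))
```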
Nasals have ____ formant patterns
weak
jaw movement from mandibular teeth
what do the bottom two pairs of paths represent under x-ray microbeam?
1. motion of the tongue tip, 2. blade, and 3. dorsum
what are the top three pairs of paths under x-ray microbeam
consonants
what has less energy and may have 2 sources of sound? vowels or consonants
find stopgap in the picture of the spectrogram
what is /a/?
Manner of Articulation
• Degree of constriction and its effect on the airflow • Complete or transient cessation of airflow • Constriction with continuous airflow
Frication vs Aspiration
• Frication noise - vocal tract • Aspiration noise - vocal folds
contact quotient
what percentage of the cycle are the VF closed? CQ = (contact phase / vibratory cycle) × 100%; normal is 40-60%; varies with voice quality (a louder voice keeps the folds closed longer)
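The contact quotient formula above is a single ratio; this sketch applies it to a made-up measurement. The 200 Hz fundamental and 2.5 ms contact phase are hypothetical numbers chosen for round arithmetic, and the 40-60% range is the "normal" range stated on the card.

```python
def contact_quotient(contact_ms: float, period_ms: float) -> float:
    """CQ = (contact phase / vibratory cycle) x 100%."""
    if period_ms <= 0:
        raise ValueError("period must be positive")
    if not 0 <= contact_ms <= period_ms:
        raise ValueError("contact phase must lie within one cycle")
    return contact_ms / period_ms * 100


# Hypothetical measurement: F0 = 200 Hz, so one cycle lasts 1/200 s = 5 ms,
# and the folds are in contact for 2.5 ms of that cycle.
period_ms = 1000 / 200
cq = contact_quotient(2.5, period_ms)
in_typical_range = 40 <= cq <= 60   # "normal is 40-60%" per the card
print(f"CQ = {cq:.0f}%, within 40-60%: {in_typical_range}")
```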
vowels, diphthongs, approximants
what sounds have formants?
true
what we hear is linear, including syllables, sounds, and words? true or false?
What is anticipatory assimilation?
when a sound is influenced by a following sound
What is carry-over assimilation?
when a sound is influenced by a preceding sound
A formant is defined as:
A resonance of the vocal tract
shorter
lax vowels are what?
Acoustic Characteristics of Affricates
Silent Gap Release burst
Acoustic correlates of affricates
Silent gap and frication.
assess reaction to auditory stimuli
how to test infants?
What sounds are intrinsically short?
lax vowels (/a/ vs /i/)
Turning the treble control down on a radio is a type of
low-pass filtering
The higher the harmonic, the ___ the energy (dB)
lower
the motor branches of cranial nerves are considered upper/lower motor neurons
lower
men's speech
lower F0, more closed, shallower spectral tilt, more power/amplitude
amplitude
lower frequencies have highest energy
a motor unit is comprised of
lower motor neuron, muscle fibers, neuromuscular junction
What are the 2 types of assimilation
partial and complete
exhalation for life is a passive/active process
passive
Immobile Articulators
• Alveolar ridge • Hard palate • Teeth
Compare speech intelligibility of speaker who are deaf from birth to those individuals who acquire deafness as adults.
"children learn languages easier"
What is an example of assimilation of manner of production?
"educate" used to be articulated with a stop-glide sequence; now the more common movement of the tongue back toward the palate as the stop is released generates the affricate /dʒ/ (palatalization) (stop goes to affricate)
In regards to limitations of the simple target theory, explain how "ideal frequency" targets for vowels often aren't achieved.
"ideal frequency" targets for vowel formants, often, aren't achieved in natural speech production (yet, these incomplete productions do not seem to affect speech perception).
place cues for fricatives (spectrum) s
"s" has a relatively high peak (4500-8000 Hz); the peak energy of fricatives for males tends to be lower than for females.
place cues for fricatives (spectrum) sh
"sh" has a peak lower than "s" (around 2500-4500Hz) males typically lower than females
What is example of coarticulation?
"two": tongue is reaching alveolar ridge for /t/ at the same time the lips are rounding for /u/
Assimilation and Coarticulation are differentiated in terms of:
# of articulators and # of speech sounds involved in each effect
Formant/Resonant Frequency Equation
F(n) = (2n - 1) × c / (4L), where n = resonance (formant) number, c = velocity of sound (34,000 cm/s), L = length of tube
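The cards model the vocal tract as a uniform tube closed at the glottis and open at the lips, which resonates at odd multiples of the quarter-wavelength frequency. This Python sketch applies that formula to the 15 cm and 17.5 cm tract lengths given on other cards. It assumes a perfectly uniform tube, so it describes only a neutral, schwa-like configuration, not other vowels.

```python
SPEED_OF_SOUND_CM_S = 34_000  # value used on the cards


def tube_resonances(length_cm: float, count: int = 3,
                    c: float = SPEED_OF_SOUND_CM_S) -> list[float]:
    """First `count` resonances (Hz) of a uniform tube closed at one end:
    F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, count + 1)]


male = tube_resonances(17.5)    # roughly 486, 1457, 2429 Hz
female = tube_resonances(15.0)  # roughly 567, 1700, 2833 Hz

for label, formants in (("male 17.5 cm", male), ("female 15 cm", female)):
    print(label, [round(f) for f in formants])
```

The first values match the worked examples on the cards (34,000/70 ≈ 485.7 Hz and 34,000/60 ≈ 566.67 Hz), and the shorter tube gives higher resonances, as the "longer tube, lower resonance" card says.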
Silence
(Exception is Brownian Motion) As long as the air pressure is steady at the atmospheric level on both sides of the ear drum, a listener hears nothing.
Diffuse falling spectrum?
(High amplitude/low frequency or low amplitude/high frequency) - Bilabial
Pitch Contour
(Intonation) Change in Fundamental Frequency over time
/f/
(Nonsibilant), Place: labiodental Manner: fricative Voice: voiceless, Aperiodic Turbulent Friction, Muscles: Orbicularis Oris Inferior, Manner- presence of aperiodic noise, Place- low, Voice- Absence of Phonation
/z/
(Sibilant) Place: alveolar Manner: fricative Voice: voiced Aperiodic + Periodic Laryngeal Source, Muscles: Superior Longitudinal, LCA is active, Manner- presence of aperiodic noise, Place- high, Voice- Presence of Phonation
/s/
(Sibilant) Place: alveolar Manner: fricative Voice: voiceless Aperiodic Turbulent Friction, Muscles: Superior Longitudinal, Manner-presence of aperiodic noise, Place- high, Voice- Absence of Phonation
/ʒ/
(Sibilant) Place: palatal Manner: fricative Voice: voiced Aperiodic Turbulent Friction & Periodic Laryngeal Source, Muscles: Intrinsic Laryngeal Muscle, Manner- presence of aperiodic noise, Place- high, Voice- Presence of Phonation
Digital Resonance
-Based on arithmetic -Moving average filter is good example of low-pass filter
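To see why a moving average acts as a low-pass filter, it helps to feed it the slowest and fastest signals a digital system can carry. This is a minimal sketch with a hypothetical two-sample window: a constant signal passes through untouched, while a signal that flips sign every sample averages to zero.

```python
def moving_average(signal: list[float], width: int) -> list[float]:
    """Average each run of `width` consecutive samples (a simple digital low-pass filter)."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return [sum(signal[i:i + width]) / width
            for i in range(len(signal) - width + 1)]


slow = [1.0] * 8        # constant signal: the lowest possible frequency (0 Hz)
fast = [1.0, -1.0] * 4  # alternates every sample: the highest frequency the samples can hold

smoothed_slow = moving_average(slow, 2)
smoothed_fast = moving_average(fast, 2)
print(smoothed_slow)  # passed unchanged (all 1.0)
print(smoothed_fast)  # cancelled out (all 0.0)
```

Frequencies in between are attenuated by intermediate amounts, and widening the window lowers the effective cutoff.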
Electrical Resonance
-Based on capacitance, inductance, & resistance -Traditional bass and treble controls
/ʃ/
(Sibilant) Place: palatal Manner: fricative Voice: voiceless Aperiodic Turbulent Friction, Muscles: Intrinsic Laryngeal Muscle, Manner- presence of aperiodic noise Place- high, Voice- Absence of Phonation
/tʃ/
(Stop & Fricative) Place: palatal Manner: affricate Voice: voiceless Transient Aperiodic & Continuous Aperiodic, Muscles: Styloglossus, Superior Longitudinal, Manner- presence of a silent closure interval, transient release burst; rapid rise/fall time; presence of non-transient fricative noise, Place- F2 transition to/from neighboring sounds, Voice- presence vs. absence of phonation; duration of fricative noise; duration of preceding vowel
/dʒ/
(Stop & Fricative), Place: palatal Manner: Affricate Voice: Voiced Transient Aperiodic, &, Continuous Aperiodic & Periodic Laryngeal Source, Muscles: Styloglossus, Superior Longitudinal, Muscles for Phonation, Manner-presence of a silent closure interval, transient release burst; rapid rise/fall time; presence of non-transient fricative noise, Place- F2 transition to/from neighboring sounds, Voice- Presence versus absence of phonation; duration of fricative noise; duration of preceding vowel.
Which wave to the right has energy at only one frequency?
(The sine wave pic)
Which wave to the right has the largest RMS amplitude?
(The square wave)
Which wave to the right is aperiodic?
(The white noise wave)
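The figure these three cards refer to is not reproduced here. Assuming the sine and square waves have the same peak amplitude, this sketch shows why the square wave has the larger RMS amplitude: it sits at its peak for the whole cycle, whereas a sine wave's RMS is only its peak divided by sqrt(2).

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square amplitude: square, average, then take the square root."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


N = 1000  # samples spanning exactly one period; peak amplitude 1.0 for both waves
sine = [math.sin(2 * math.pi * k / N) for k in range(N)]
square = [1.0 if k < N // 2 else -1.0 for k in range(N)]

print(f"sine RMS   = {rms(sine):.3f}")    # about 0.707 (1/sqrt(2))
print(f"square RMS = {rms(square):.3f}")  # 1.000
```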
Carry-over assimilation
(left to right): sounds influenced by preceding sounds. Examples: cats ends in /s/; dogs ends in /z/
Diffuse rising burst spectrum?
(low frequency/amplitude or high frequency/amplitude) - Alveolar
What is the Vocal tract Normalization hypothesis?
(re: how perception overcomes inconsistencies of vowel production.) This theory states that our perceptual system pays particular attention to the occurrence of the so called "point vowels" (cardinal vowels) which represent the most extreme configurations for a given individual vocal tract and uses them as anchoring points for judging other vowels. In a sense, the listener "calibrates" perception for the individual variations of formant patterns for vowel production. perhaps, also, visual cues or non speech vocal tract sounds are taken into consideration.
Anticipatory assimilation
(right to left)- sounds influenced by following sounds. Example 'avec' /avek/ vs. 'avec vous' /aveg
Frication
*turbulent noise of a sound* The hissing element of a speech sound, such as an affricate.
Double Incoherent Pressure (dB SPL)
+3
Double Intensity (IL)
+3
Double Coherent Pressure (SPL)
+6
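The three "doubling" rules follow from two decibel formulas: 10·log10 for power-like quantities and 20·log10 for pressure-like ones. This sketch assumes two identical sources and uses the standard result that incoherent sources add in power (their RMS pressures combine as a root-sum-of-squares), while coherent, in-phase sources add in pressure.

```python
import math


def db_for_power_ratio(ratio: float) -> float:
    return 10 * math.log10(ratio)


def db_for_pressure_ratio(ratio: float) -> float:
    return 20 * math.log10(ratio)


# Doubling intensity (a power quantity): +3 dB
double_intensity = db_for_power_ratio(2)

# Two identical coherent (in-phase) sources: pressures add directly, 1 + 1 = 2
double_coherent = db_for_pressure_ratio(2)

# Two identical incoherent sources: RMS pressures combine as sqrt(p1**2 + p2**2),
# so the total is sqrt(2) times one source, not 2 times
p = 1.0
double_incoherent = db_for_pressure_ratio(math.sqrt(p**2 + p**2) / p)

print(f"{double_intensity:.2f} {double_coherent:.2f} {double_incoherent:.2f}")
```

This prints 3.01 6.02 3.01, matching the +3, +6 and +3 on the cards.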
spectrography
- /b/ is mostly in the formant transition - depends on adjoining sounds - depends on position in word - silent interval - coarticulation spread across many phonemes
How many sources does a voiced fricative have? What are they?
- 2 - Upper vocal tract and quasi periodic vibrations of the vocal folds
Burst release (concentration of energy)
- Also known as a stop release/stop production. Oral release yields a transient noise source. - Is a concentration of energy which appears in spectrograms as a vertical spike following the silent gap, is somewhat more intense and thus more conspicuous for the voiceless than for the voiced stops. - These are very brief but often cover a broad range of frequencies with varying intensity.
Semi-vowel
- Articulations and acoustic features resemble those of vowels. - Their production involves only minimal constrictions in the vocal tract. - Characterized by formant structures similar to those of vowels and diphthongs. - Subdivided into two manners of articulation: glides and liquids.
Characteristics of an Obstruent
- Blocked or restricted airflow - Aperiodic sound sources in upper vocal tract - May be voiced or voiceless
Describe the touch receptors on the tongue.
- Can feel touch on 2 separate points on the tongue tip which are only 1-2 mm apart - need to be 1 cm apart on the back or lateral margins of the tongue - superior surface of the tongue is more sensitive
Upper Airway/Vocal Tract
- Closed at one end (glottis) and open at the other end (lips) - average male = 17.5 cm in length
Voiceless fricative the source is described anatomically as? Acoustically as?
- Constriction formed by supra glottal articulators (tongue, palate, lips, teeth) - Continuous aperiodic sound
How do consonants compare to vowels in frequency and amplitude? What are the implications of that for speech perception?
- Consonants are higher in frequency and lower in amplitude - Because consonants carry much of the information in speech, high-frequency NIHL can impede speech perception more than low-frequency losses
Resonances frication noise and aspiration noise?
- Frication: Higher resonances - Aspiration: Lower resonances
F1 equation
F1 = 34,000 cm/s ÷ (4 × 17.5 cm) = 34,000/70 ≈ 485.7 Hz
Spectral roll-off (tilt)
- amplitude decreases 12 dB for every octave increase in frequency (this causes the slope in the harmonic series) - steepness of the closing phase reflects how rapidly the vocal folds close (slope) - spectral roll-off is a function of the speed of vocal fold closure
evidence for motor
- categorical perception - VOT experiments - place of articulation experiments - strict dividing line
Formant
- characteristic resonance of a particular vocal tract - peaks, frequencies with greatest amplitudes
Consonants
- constricted vocal tract - may have alternative sound source - voiced and voiceless
Glottal Source Characteristics
- from vocal fold vibration - consists of harmonics
low frequencies
- human ear is less sensitive - lack of sensitivity becomes more pronounced for softer sounds
Acoustic Characteristics of Radiated Sound
- lip radiation provides a gain (increase in intensity) of about 6 dB per octave - net result: a 6 dB per octave roll-off of the radiated acoustic spectrum
What are the features of complex sounds?
- natural sounds are usually complex - every complex sound = composed of simple periodic sounds (Fourier analysis) - Complex periodic - frequencies of the contributing simple periodic sounds are always whole number multiples of the lowest frequency - these whole number multiples are often called harmonics
cues for voiced vs. voiceless
- position in the syllable matters - duration of the preceding vowel - aspiration (ex: big/bic)
Vowels
- produced by relatively open vocal tract - nucleus of a syllable - all vowels are voiced
WaveSurfer identification of VOT
- silence before the release - the release (burst) - the onset of voicing (good F2) - the duration from the release to the onset of voicing
stress timed
- stressed syllables last longer - vowels in unstressed syllables are reduced - ex: English (similar to mora-timed)
What is the frequency composition of aperiodic sounds?
- the 'whole number multiples' rule does not apply - instead, frequency components are randomly distributed - there is no F0
Tube/Formant Resonances
- the tube will resonate best (the natural resonant frequency) at a frequency that has a wavelength that is 4x the length of the tube
What is the upper airway (a tube-like structure) responsible for?
- transforming the source sound into speech - the transfer function specifies different vowels
Place of articulation
-Bilabial -Tongue + fixed point of articulation (e.g., lingua-alveolar) -Pharynx/glottis (/h/ is a glottal sound, but it is different from breathing because the glottis is partially open, whereas in breathing it is completely open)
Highlighted equation under maximal gain
-12 dB/octave roll-off at the source (larynx) + 6 dB/octave gain at the lips = -6 dB/octave roll-off of the radiated acoustic spectrum
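A minimal Python sketch of the roll-off arithmetic above, assuming idealized straight-line slopes: a component's level relative to F0 is the slope multiplied by the number of octaves above F0 (log2 of the frequency ratio). The function and constant names are illustrative, and the 100 Hz F0 is an arbitrary example value.

```python
import math

SOURCE_SLOPE = -12.0    # glottal source roll-off, dB per octave
RADIATION_SLOPE = 6.0   # lip-radiation gain, dB per octave
NET_SLOPE = SOURCE_SLOPE + RADIATION_SLOPE  # slope of the radiated spectrum


def level_re_f0(freq_hz: float, f0_hz: float, slope_db_per_octave: float) -> float:
    """Level (dB) of a component relative to F0 for a constant dB/octave slope."""
    octaves_above_f0 = math.log2(freq_hz / f0_hz)
    return slope_db_per_octave * octaves_above_f0


f0 = 100.0  # F0 itself is the 0 dB reference
for harmonic in (2, 4, 8):
    f = harmonic * f0
    print(f"{f:.0f} Hz: source {level_re_f0(f, f0, SOURCE_SLOPE):.1f} dB, "
          f"radiated {level_re_f0(f, f0, NET_SLOPE):.1f} dB")
```

Each doubling of frequency costs 12 dB at the source but only 6 dB in the radiated output, because the +6 dB/octave lip gain offsets half of the loss.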
Sampling Theorem
-A signal may be represented exactly if it is sampled at more than twice its highest frequency; half the sampling rate is called the Nyquist frequency -It is guaranteed that, on playback, all distortion will lie above the Nyquist frequency -So, if the signal is low-pass filtered at the Nyquist frequency on playback, it will be reproduced perfectly (in terms of frequency, though not always amplitude)
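A minimal Python sketch of why the sampling theorem matters, assuming a pure sinusoid and no anti-aliasing filter. A component above the Nyquist frequency does not vanish when sampled. It "folds back" and shows up at a false lower frequency (aliasing). The function names and the 10 kHz rate are illustrative.

```python
def nyquist_frequency(sample_rate_hz: float) -> float:
    """Half the sampling rate: the highest frequency that can be represented."""
    return sample_rate_hz / 2.0


def alias_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a sampled sinusoid after folding into 0..fs/2."""
    f = signal_hz % sample_rate_hz       # samples cannot distinguish f from f + k*fs
    return min(f, sample_rate_hz - f)    # fold about the Nyquist frequency


fs = 10_000.0
print(nyquist_frequency(fs))          # 5000.0
print(alias_frequency(3_000.0, fs))   # 3000.0 (below Nyquist: unchanged)
print(alias_frequency(7_000.0, fs))   # 3000.0 (above Nyquist: aliases to 3 kHz)
```

This is why the low-pass filter comes before the analog-to-digital converter. Once a 7 kHz component has been sampled at 10 kHz, it cannot be told apart from a genuine 3 kHz component.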
In what way is a sine wave simple??
-All energy is at one frequency!
Complex wave
-Anything that's not a sine wave! -Expressed as a sum of sine waves, each with its own amplitude, frequency, and phase -Fourier analysis converts a time-domain complex wave to the frequency domain
articulatory sequence
-Articulators move to produce consonants -Movements overlap (coarticulation) -Usually vowel and consonant movements overlap -Sometimes consonant and consonant movements overlap
Description of Pitch Contours
-Average F0: males 120 Hz, females 225 Hz -Sentence-level variation (F0 declines as you run out of air) -Linguistic variations used to express linguistic meaning and intent -F0 used to express emotional states
Moving Average Filter
-Based on a sliding window of averaged samples -The original signal is a list n samples long -The filtered signal is a list n - w + 1 samples long (w = window length) -Each sample in the filtered signal is the mean of w consecutive samples from the original list
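A minimal Python sketch of the moving average filter described above, assuming no padding at the edges. Each output is the mean of one full window of w consecutive samples, so a list of n samples yields n - w + 1 outputs. The function name is illustrative. A running sum is used so each step adds the newest sample and drops the oldest, rather than re-summing the whole window.

```python
def moving_average(samples: list[float], window: int) -> list[float]:
    """Mean of each run of `window` consecutive samples (no edge padding)."""
    if not 1 <= window <= len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    running = sum(samples[:window])
    out = [running / window]
    for i in range(window, len(samples)):
        running += samples[i] - samples[i - window]  # add newest, drop oldest
        out.append(running / window)
    return out


print(moving_average([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
```

Averaging smooths rapid sample-to-sample changes, so the filter behaves as a simple low-pass filter. A longer window smooths more strongly.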
dB HL
-Based on threshold of audibility of typical individual at particular frequencies. -Basis of the audiogram
Analog to Digital conversion and Processing of Speech
-Beginning: going from air pressure, to digits, back to air pressure -A low-pass (anti-aliasing) filter guarantees that nothing gets through that is higher in frequency than the system can handle -Analog to digital: Nyquist/sampling theorem -Optional processing: speeding up, slowing down. Commercials that speed up the fine print at the end do this.
Primary Spectral Energy
-Bilabial: 500-1500 Hz -Alveolar: above 4000 Hz -Velar: 1500-4000 Hz
Spectrum of Release Burst + Aspiration
-Bilabials (p,b): energy broadly distributed across all frequencies or concentrated in lower frequencies -Alveolars (t, d): rising spectral envelope -Velars (k,g): Mid-frequency range contains the most energy -Try saying the plosives in a whisper to hear the frequency of the burst. •Pitch of /t/ is highest and /p/ is lowest. •Variable intensity with /t/ loudest
Utterance Levels
-Breath Group (Fo will go down, 1 curve) -Phrase Level (2 curves, up-down, up-down) -Word Level (multiple curves) -Phoneme/Syllable Level (breaks in curve)
RMS level is the foundation of other amplitude measures
-The decibel is the common unit in speech and hearing -The decibel is a logarithmic scale -We hear a vast range of levels -The decibel is a ratio, or comparison -dB SPL is referenced to 20 micropascals RMS
Analog Representation of Speech
-Continuous (every time has a value) -Simple Equipment -Trouble with noise and distortion -Difficult to maintain -Inflexible
Physical Characteristics of Cavity Resonators
-Inverse relationship between cavity length and its resonant frequencies (a longer cavity has lower resonances) -The basis of the relationship is the cavity length relative to the wavelength of sound -Wavelength (λ) is defined as the distance traveled by a periodic wave during one cycle (one period of the fundamental frequency)
Digital Representation of Speech
-Discrete (only specific times have values) -Complex Equipment -Noise and distortion as low as desired -Easy to Maintain -Flexible
Pascal and Micropascals
-Ear can hear air pressure vibrations between 20 micropascals and 20 pascals -Analogous to Alternating Current (AC) -Vibration measure is RMS
VOT
-Exists on a continuum -F0 is also an acoustic cue -F0 tends to go down in anticipation of closure for voiced and voiceless stops -After voiceless stops, F0 is elevated momentarily. F0 remains flat after voiced stops
Acoustic Analysis
-F0 is produced by the vocal folds -F0 is the lowest frequency -VFs also produce harmonics
/l/
-F1 ≈ 360 Hz -F2 ≈ 1300 Hz -F3 ≈ 2700 Hz
Sound Pressure dB SPL
-Force per unit area, perpendicular to the direction of the sound -Standard reference level is 20 µPa (2 × 10^-5 Pa, or 0.00002 Pa) -Most meaningful for SLPs and audiologists -Easiest to measure (only a microphone is required) -dB SPL = 20 log10(P/Pref), where Pref = 20 µPa
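A minimal Python sketch of the dB SPL formula, assuming pressures are RMS values in pascals. The function names are illustrative. The inverse function shows how to go back from a level to a pressure.

```python
import math

P_REF_PA = 20e-6  # 20 micropascals, the dB SPL reference pressure


def db_spl(pressure_pa: float) -> float:
    """Sound pressure level: 20 * log10(P / Pref)."""
    return 20.0 * math.log10(pressure_pa / P_REF_PA)


def pressure_from_db_spl(level_db: float) -> float:
    """Inverse of db_spl: the RMS pressure (Pa) for a given level."""
    return P_REF_PA * 10 ** (level_db / 20.0)


print(round(db_spl(0.25), 2))    # 81.94 (the 0.25 Pa example)
print(round(db_spl(20e-6), 2))   # 0.0 (the reference pressure itself)
```

The factor is 20 rather than 10 because intensity is proportional to pressure squared, and log10(P^2) = 2 log10(P).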
Lingual-Alveolars (s,z)
-Greater degree of constriction (narrower channel) yields higher energy and higher frequency noise. -Energy concentrated above speaker's F4
Human's toleration of sound
-The intensity ratio between the faintest sound we can hear and the loudest we can tolerate is about 1 to a trillion (1 × 10^12) -Since 10 log10(10^12) = 120, about 120 dB is the approximate range of intensity that human hearing can perceive and tolerate. The eardrum can rupture if exposed to 160 dB of sound!
Simple-Complex Waveform
-It is relatively easy to add sine waves to make a complex wave -more difficult to extract component sine waves from a complex -Basically what we do when we listen to speech
Spectrogram Info
-Low vowels ( ɑ, ɔ ) have a high F1 -High vowels ( i, u ) have a low F1 -Front vowels ( i, ɪ, ɛ ) have a large distance between F1 & F2 -Back vowels ( ɑ, ɔ, o, u ) have a close F1 and F2.
At what frequency does the external auditory meatus resonate?
3450 Hz (3000-4000 Hz).
Digital representation of Speech
-Most accessible representation of speech is the air-pressure waveform -most powerful tools for speech analysis, synthesis, manipulation, and training are computers -First step=convert an air-pressure waveform to computer readable digital waveform
Lingual-Palatal (ʃ, ʒ)
-Narrow constriction (further back than lingual-alveolars) -High frequency noise (not as high as alveolars) -Energy concentrated above speaker's F3
silence (stop gap)
-Occlusion to release -Voiceless stops: complete silence -Voiced stops • varying amount of silence (depending upon transglottal flow) • Voicing is low amplitude due to damping. • Seen as voice bar on spectrogram
Tube closed at one end and open at the other (/ə/)
-One quarter wavelength fits -So do three quarters, five quarters, and so on (odd quarter wavelengths) -Formula: Fn = (2n - 1)c / 4L, where c is the speed of sound and L is the tube length
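A minimal Python sketch of the quarter-wave resonator formula, assuming a uniform tube closed at the glottis and open at the lips (the schwa model), with c ≈ 34,000 cm/s. The function name is illustrative.

```python
SPEED_OF_SOUND_CM_S = 34_000.0  # approximate speed of sound in air (cm/s)


def quarter_wave_formants(length_cm: float, count: int = 3,
                          c: float = SPEED_OF_SOUND_CM_S) -> list[float]:
    """Resonances of a uniform tube closed at one end: Fn = (2n - 1) * c / (4L)."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, count + 1)]


for f in quarter_wave_formants(17.5):  # 17.5 cm average adult male vocal tract
    print(f"{f:.1f} Hz")               # 485.7 Hz, 1457.1 Hz, 2428.6 Hz
```

The resonances are odd multiples of the lowest one (1:3:5). A shorter tract, such as the 14-15 cm adult female range, raises every resonance in proportion.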
De-Constructing Speech Sounds
-Primary de-construction tool is Fourier analysis -Any complex, periodic wave can be broken down into a sum of simple sinusoids with appropriate frequencies, amplitudes and phases. -These component waves may be combined and displayed as a spectrum
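A minimal Python sketch of Fourier de-construction, using a direct discrete Fourier transform (DFT) written out by hand rather than a library FFT. The example builds one period of a complex wave: an F0 component with amplitude 1.0, plus a third harmonic with amplitude 0.5 and an arbitrary phase. The DFT then recovers each component's amplitude. The 64-sample length and the function name are illustrative. Because the window is exactly one period, bin k corresponds to the k-th harmonic.

```python
import cmath
import math


def dft_magnitudes(samples: list[float]) -> list[float]:
    """Single-sided amplitude of each DFT bin k = 0 .. N/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        scale = 1.0 / n if k in (0, n // 2) else 2.0 / n
        mags.append(abs(acc) * scale)
    return mags


N = 64
wave = [math.sin(2 * math.pi * t / N)
        + 0.5 * math.sin(2 * math.pi * 3 * t / N + 1.0)   # harmonic 3, phase 1 rad
        for t in range(N)]
spectrum = dft_magnitudes(wave)
print([round(a, 3) for a in spectrum[:5]])  # ≈ [0.0, 1.0, 0.0, 0.5, 0.0]
```

The phase of the third harmonic does not change its magnitude. This is why the amplitude spectrum alone cannot recreate the wave, and the phase spectrum is also needed.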
Coherent Sounds
-Pure tones or sounds that are mostly composed of a pure tone -Speakers wired in parallel -Amplifiers -Noise-canceling headphones
RMS measures and Electricity
-RMS corresponds to the equivalent DC amplitude (the DC value that delivers the same power) -120 V AC household voltage = 120 V RMS, which is about 170 V peak (peak = RMS × √2 for a sine wave)
White Noise Wave
-Random waveform -Equal energy at all frequencies -Often created by air turbulence -No fundamental frequency, no harmonics, aperiodic -In the time domain it's a bunch of squiggles; in the frequency domain it's a flat line at the amplitude
Crest Factor
-Ratio of peak value to RMS -For the same peak, speech has a lower RMS than a sine wave, so speech has a higher crest factor
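A minimal Python sketch of RMS and crest factor, assuming one exact period of a sampled sine wave with a 170 V peak (the household-voltage example). The function names are illustrative.

```python
import math


def rms(samples: list[float]) -> float:
    """Root mean square: square, average, then take the square root."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def crest_factor(samples: list[float]) -> float:
    """Ratio of peak absolute value to RMS."""
    return max(abs(x) for x in samples) / rms(samples)


N = 1000
sine = [170.0 * math.sin(2 * math.pi * t / N) for t in range(N)]  # one period
print(round(rms(sine), 1))           # 120.2 (about 120 V RMS)
print(round(crest_factor(sine), 3))  # 1.414 (√2 for any sine wave)
```

Applying the same two functions to a recorded speech waveform would give a crest factor well above 1.414, because speech has occasional high peaks relative to its average level.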
Frequency Domain
-Requires two graphs: Amplitude and Phase -Amplitude most important, phase required for completeness (to recreate wave) -Time is discarded (infinitely repeating signal assumed)
How do you know whether the sampling rate is fast enough to represent all the "bumps" in the waveform?
-Sample at more than twice the highest frequency in the signal
Digitization Parameters
-Sampling rate: determines the frequency range that can be represented (up to half the sampling rate) -Bits of quantization: determine amplitude resolution
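A minimal Python sketch of how bits of quantization set amplitude resolution. With b bits there are 2^b amplitude steps. The ratio of full scale to one step, in dB, is 20 log10(2^b), or about 6 dB per bit. The function names and the bit depths shown are illustrative.

```python
import math


def quantization_levels(bits: int) -> int:
    """Number of distinct amplitude values available."""
    return 2 ** bits


def dynamic_range_db(bits: int) -> float:
    """Ratio of full-scale amplitude to one quantization step, in dB."""
    return 20.0 * math.log10(2 ** bits)


for b in (8, 12, 16):
    print(b, quantization_levels(b), round(dynamic_range_db(b), 1))
```

Each added bit doubles the number of levels and adds about 6 dB. A 16-bit system therefore covers roughly 96 dB.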
Tube Models of the Vocal Tract
-Simple tubes may be used to model the vocal tract -Bends in the vocal tract have little effect on resonance -Simplest tubes provide a reasonable model of the schwa vowel
Complex Waves Summary
-The simplest unit of sound is the sine wave. Sine waves may be combined to create any periodic sound, such as the glottal source. -White noise is aperiodic and can be generated by air turbulence in the mouth. -Source-filter theory of speech production: models speech as a combination of a sound source and a filter
Acoustic Cue for manner
-Slow formant transition (75-250 ms) •compared to diphthongs 350 ms
Square wave
-Some complex waves are typified by the combination of F0 and specific harmonics. -Formed by F0 PLUS ODD HARMONICS
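A minimal Python sketch of building a square wave from F0 plus odd harmonics. It assumes the standard Fourier-series amplitudes, where harmonic n has amplitude 1/n, scaled by 4/π. The function name, the 100 Hz F0, and the harmonic counts are illustrative.

```python
import math


def square_wave_sample(t: float, f0: float, n_harmonics: int) -> float:
    """Fourier-series approximation: F0 plus odd harmonics with amplitude 1/n."""
    total = 0.0
    for n in range(1, 2 * n_harmonics, 2):   # n = 1, 3, 5, ... (odd only)
        total += math.sin(2 * math.pi * n * f0 * t) / n
    return 4.0 / math.pi * total


# A quarter of the way through a 100 Hz cycle the sum approaches +1;
# at three quarters it approaches -1.
print(square_wave_sample(0.0025, 100.0, 50))
print(square_wave_sample(0.0075, 100.0, 50))
```

Adding more odd harmonics makes the corners sharper. Adding any even harmonics would break the symmetry of the square shape.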
Sound Power
-Sound energy per unit time (usually in watts) generated by a sound source -Standard reference level is 10^-12 Watts -Rate of flow of energy
Sound Intensity dB IL
-Sound power per unit area (usually in W/m^2) -Rate of flow of energy -dB IL = 10 log10(I/Iref), where Iref = 10^-12 W/m^2
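A minimal Python sketch of the dB IL formula, assuming intensities in W/m^2 and Iref = 10^-12 W/m^2. The function name is illustrative. The two checks reproduce the 10^12 → 120 dB hearing range and the "half power ≈ -3 dB" rule.

```python
import math

I_REF_W_M2 = 1e-12  # reference intensity (W/m^2)


def db_il(intensity_w_m2: float) -> float:
    """Intensity level: 10 * log10(I / Iref)."""
    return 10.0 * math.log10(intensity_w_m2 / I_REF_W_M2)


print(round(db_il(1.0), 1))              # 120.0 (intensity ratio of 10^12)
print(round(10 * math.log10(0.5), 2))    # -3.01 (halving intensity)
```

The factor here is 10, not 20, because intensity is already a power quantity. dB IL and dB SPL give the same number for the same sound under ordinary conditions in air.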
Turbulence noise production
-The constriction functions as a nozzle -Air exiting the constriction forms a jet -As the jet mixes with the surrounding air, turbulence is generated -Turbulence is associated with eddies
Compression & Rarefaction
-The force of an impulse or vibration pushes out against air molecules. -This carries compression away from the source- leaving rarefaction in its wake. -If air had no elasticity, the effect of an impulse would stop at its point of initiation.
Why do objects make the sounds that they make?
-They are mechanical systems. -They have properties that create vibration (Periodic & aperiodic) -Periodic vibration usually involves three properties of the object (Mass, resistance, stiffness).
Traditional Description of Vowels
-Tongue Height (High/Mid/Low) -Tongue Advancement (Front/Back) -indistinct articulation
covert contrast
-the speaker produces a measurable, reliable distinction between two sounds, but listeners do not readily perceive the contrast the child produces -speaker has more knowledge of sound contrast than we think based on transcription alone--shown by instrumental analysis -need to be measured multiple times
Affricate
-tʃ, dʒ -have a stop gap followed by intense frication
When you strike a mechanical system it creates....
-velocity, acceleration and displacement (which are based on the physical properties mass, resistance, and stiffness.)
nasals
-The velum is lowered, so sound and air can pass into the nasal cavity, creating these sounds -All voiced
v, ð
-voiced -low energy, diffuse spectrum -voicing feature adds a prominent concentration of energy in the low frequency region.
f, θ
-voiceless -low energy, diffuse spectrum -high frequency
Glide
-w, j -F1 starts very low and then rises to the F1 of the following sound -F2 and F3 also begin similar to those of /i/ and /u/ and shift toward the F2 and F3 values of the following sound
five lax vowels
/ɪ/, /ɛ/, /ʊ/, /ə/, and /ɝ/
r
/r/ has the lowest F3 of any English sound
Low, back vowel
/a/
high F1 and low F2
/a/ has ...?
high F1 and high F2
/ae/ has...?
three neutral
/ae/, /a/, and /ɑ/
nonsibilants
/f v θ ð h/: less noise energy than sibilants; formant transitions are the primary acoustic cue; voiced nonsibilants have quasi-periodic pulses; noise spectra are fairly flat and diffuse and are expected to have antiformants due to the narrow constriction
fricatives
/f, v, s, z, ʃ, ʒ, θ, ð, h/: lower intensity than vowels, aperiodic features, wide frequency range, no clear formant frequencies; voiced fricatives have vertical striations
What phonemes are non-sibilant
/f/ /v/ /θ/ /ð/
Which phonemes have flat, low-frequency spectral peaks
/f/ /v/ /θ/ /ð/: nonsibilants; anterior
Consonants: Fricatives
/f/ /v/ /θ/ /ð/ /s/ /z/ /ʃ/
Give an example of how a single articulation can produce 2 different speech sounds.
/f/ and /v/ are articulated the same way, but there is phonation present during the production of /v/ and not during the production of /f/
Difference between the weak, voiceless fricatives /θ/ and /f/
/f/: energy concentrated above about 500 Hz; /θ/: energy concentrated above about 2000 Hz, with a trailing tail. - Both have weak noise and longer duration.
Consonants: Glottals
/h/
what is the glottal fricative
/h/
what is the voiceless version of preceding or following vowel
/h/
Glottal Fricatives
/h/: vocal folds narrowed (partially open) but not vibrating - Frequency range depends on the surrounding sounds - Low-amplitude acoustic energy on its own - Can be completely coarticulated; its spectrum depends on the vowel being produced (e.g., acoustic energy in the vowel range of /i/ or /u/) - Place: glottis
/j/ has similar formant structure to which vowel?
/i/
High, front unrounded vowel
/i/
The glide /j/ is initiated as a vowel-like sound that is similar to the vowel...
/i/
low F1 and high F2
/i/ has...?
Name the point vowels and include a description of their general formant pattern in relation to this front/high/tense vowel
/i/, /a/, /u/ - Low F1, High F2 for /i/, higher F1 lower F2 for /a/; Higher F1, Lower F2 for /u/
seven tense vowels
/i/, /e/, /ae/, /u/, /o/, /ɔ/, and /ɑ/
/ng/ has a very similar formant patterning to
/k/ and /g/
nasals
/m n ng/ low intensity and frequency is about 300 Hz, short duration, *voiced with vertical striations, low intensity formants (antiformants)
Which nasal is the lowest in frequency and shortest in duration?
/m/
Consonants: Nasals
/m/ /n/ /ŋ/
nasals
/n, m, ŋ/ •Occlude oral cavity & open velopharyngeal port; all nasals = voiced •Formant structure & can be syllabic, like vowels, but have significant constriction. •Acoustic evidence for manner: - Nasal murmur = very low F1 (250-500 Hz) (large nasal resonating space & narrow opening) - Low energy of all formants (high damping) - F2, F3 vary
Which nasal is higher in frequency and slightly longer in duration
/n/
Which nasal is highest in frequency and longest in duration?
/ng/
voicing
/p t k/ with a VOT of 25-80ms (mean of 45ms) are phonetically distinguished from /b d g/ with a VOT of -20 to 20ms (mean of 10ms) by what?
Consonants: Stops
/p/ /b/ /g/ /t/ /k/ /d/
On which phonemes does aspiration occur in English?
/p/ /t/ /k/
/m/ has a very similar formant patterning to
/p/ and /b/
The brief cessation of airflow emitted from the vocal tract underlies the acoustic period of silence characteristics of
/p/, /t/, and /k/.
Language Specific for English Stops
/p/,/t/,/k/
What are the three acoustic characteristics of stressed syllables?
1. Higher F0 for the heavily stressed syllable (higher pitch) 2. Greater duration of the stressed syllable (longer) 3. Greater intensity for the heavily stressed syllable (louder)
Acoustic Features of Suprasegmentals
1. Pitch (Fo) and Pitch Variation = Intonation 2. Loudness (intensity) and Loudness Variation = Stress 3. Duration (length) 4. Pausing (rhythm perception of the brain; makes you uncomfortable when disrupted) 5. Patterns -Tonation -Intonation (change in pitch) -Stress/Emphasis -Duration -Rate
Acoustic Production/Perception Cues
1. Place of Valving: where the constriction is 2. Degree of Valving: size of the space air goes through 3. Duration of Valving: how long air is pushed through 4. Voicing Overlay: combination of aperiodic and periodic sounds 5. Formant Transitions 6. Rise Time
Four Acoustic characteristics in temporal order?
1. Silent closure 2. Transient aperiodic release burst 3. Continuous aperiodic frication 4. Continuous aperiodic aspiration
Stop Features
1. Stop Gap 2. Noise Burst 3. Aspiration 4. Voice Onset Time 5. Transitions 6. Stop Gap 7. Release (+/-)
Perception of Transition Duration
1. Stop: 40-60 msec 2. Glide: 60-100 msec. 3. Vowel + Vowel >100 msec.
What Three forces act on the vocal folds for vibration
1. Stress 2. Strain 3. Shearing
Cues to Nasal Manner (Nasal Murmur)
1. Voiced 2. Low intensity (softer than the neighboring vowels; meaning lower amplitude) 3. Relatively steady state formants 4. Low frequency resonance (usually below 500 Hz but often below 300 Hz)
Sound Visualization
1. Waveform (amplitude/time) 2. Spectrum (amplitude/frequency) 3. Spectrogram (frequency/time/intensity)
Nasal Acoustic Information
1. Weak Formants -Anti-Resonances -Nasal Cavity Damping (absorption of sound makes the bandwidths wider and less distinct) 2. Nasal Murmur -Extremely low formants (below 500 Hz F1) 3. Place of Articulation: F2 and F3 -Wide formants; blend into each other -F2 transition similar to stops 4. Vowel Coloring: blend of a vowel and the consonant after it (one sound takes on the characteristics of another) 5. Nasality (resonance)= sound coming out of nasal cavity 6. Nasal Emission (flow) = air coming out of nasal cavity
What are the several categories that models fall into?
1. a strong linguistic basis and emphasis 2. the goal of speech production is to attain one or more targets 3. a focus on the role of timing in speech 4. a focus on the role of feedback in speech
Explain one famous experiment by Liberman et al. that tested categorical perception. "In counterbalanced fashion," two separate procedures were tested:
1. An identification task, using either forced choice (multiple choice) or open response sets. 2. A discrimination task: which is most similar to X, A or B? The results showed critical (categorical) boundaries only between certain stimuli in the sets, not between the majority of them, suggesting crucial points where perception "flip-flops" between different speech sounds. This result was the same regardless of task order, prior knowledge of the stimuli, or forced-choice/open-set format. Other possible variations in categorical perception involve distinctions in manner and voicing.
What are the other classifications of assimilation?
1. anticipatory (or right-to-left) assimilation 2. carry over (or left-to-right) assimilation 3. assimilation of manner of production
What 4 kinds of info are available to a speaker for feedback?
1. auditory 2. tactile 3. proprioceptive 4. central neural
What does changing F0 do in terms of intonation?
1. expresses differences in attitude ("that's a pretty picture.") 2. use of rising intonation can turn a sentence into a question ("today is Tuesday")
manner cues for fricatives
1. have aperiodic noise component 2. duration of the noise is relatively longer than aspiration noise after voiceless stops or the fricative noise in affricates
What are the 4 places at which constrictions are created for fricatives?
1. labiodental 2. linguadental 3. alveolar 4. palatoalveolar
Describe different examples of Linguistically Oriented Models.
1. A model of speech production using phonetic and physiologic data together 2. Examined spectrograms, derived a set of 12 features of speech (e.g., voiced and voiceless), and assigned each speech sound one specific feature: a "distinctive feature analysis" 3. Redesigned the distinctive feature system in articulatory terms: sounds were labeled "rounded", "high tongue", etc., generating a set of 27 features based on cavity, manner of articulation, and source 4. Linking speech perception to production: speech sounds are encoded in the acoustic signal according to how they are produced (the first to really propose this)
What are the 2 differences between non resonant consonants vs resonant consonants?
1. nonresonants lack formant structure and openness of resonants 2. audible noise is present in nonresonants
place cues for affricates
1. palatal (post alveolar) 2. stop burst and formant transition (F2)
What are the essential components of the motor theory of speech perception?
1. perception makes use of articulatory knowledge of speech production 2. speech makes use of special perceptual properties. for example, categorical perception (speech sounds are based on critical perception boundaries; here we see a significant parallel with the quantal theory). another special property would be to produce speech.
What are the contributing cues for manner in stop plosives?
1. Presence of a spike (across the entire frequency domain); by itself it is perceived as a "pop" or "click": aperiodic, very short noise ("pop talk") 2. Presence of a burst following the spike (which, in the case of [p], is a minimal form of aspiration that resonates right between the lips when they open) 3. Presence of a VOT, VTT, or stop gap, i.e., a stop in the middle of a sound sequence (either long or short, depending on plus or minus voicing) 4. Short duration of F2 adjustments in vowels; the abruptness of the plosion makes the F2 adjustments brief and quick in nature
Spectrogram
3D visualization of sound (frequency, time, intensity)
Speech production systems may be decomposed into what?
1. sound source and 2. filter
manner cues for affricates
1. Stop gap (silence) followed by a burst, then a sharply rising fricative noise 2. Both the rise time (amplitude) and duration (length) of a full fricative are about double those of the fricative portion of an affricate.
voicing cues for fricatives
1. voiced fricatives have a periodic component (F0) fundamental frequency 2. voiceless fricatives are not associated with voicing.
voicing cues for final stops
1. voicing during closure is the most salient cue for stop voicing in the final position; voiced stops have voicing during the stop gap 2. duration of the stop gap; final voiced stops have a longer stop gap than their voiceless counter parts 3. length of the preceding vowel; vowels are shorter before a voiceless stop than before a voiced stop at the same speaking rate 4. F1 falls at the end of the vocalic portion of a voiced stop
voicing cues for medial stops
1. Voicing during closure is the most salient cue for stop voicing in medial position; voiced stops have voicing during the stop gap 2. Duration of the stop gap: unstressed medial voiced stops have shorter stop gaps than their voiceless counterparts 3. Length of the preceding vowel: typically, vowels are shorter before a voiceless stop than before a voiced stop at the same speaking rate 4. F1 transition for voiced stops
voicing cues for initial stops
1. Initial voiceless stops have longer VOT than voiced stops 2. Low vs. high starting position of F1: initial voiceless stops have a higher starting F1 than voiced stops 3. Relatively small vs. relatively large F1 change: initial voiceless stops have a smaller F1 change than voiced stops 4. Voicing during the stop gap can also be a cue; initial voiced stops can have voicing during the stop gap (prevoicing)
In breathing for speech, the inhalation phase takes about ___ percent of the duration of the whole breathing cycle
10
If you have the harmonic 200, 300, and 400 then the fundamental frequency is...
100 Hz (the spacing between adjacent harmonics equals F0)
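A minimal Python sketch of finding F0 from a set of harmonics, assuming exact harmonic frequencies in whole hertz. The greatest common divisor gives F0. Subtracting adjacent harmonics gives the same answer for 200, 300, and 400 Hz, but the GCD still works when some harmonics are missing. The function name is illustrative.

```python
from functools import reduce
from math import gcd


def fundamental_from_harmonics(harmonics_hz: list[int]) -> int:
    """F0 as the greatest common divisor of harmonic frequencies (integer Hz)."""
    return reduce(gcd, harmonics_hz)


print(fundamental_from_harmonics([200, 300, 400]))  # 100
print(fundamental_from_harmonics([300, 500, 700]))  # 100 (only odd harmonics present)
```

In the second case, the spacing between harmonics is 200 Hz, but F0 is still 100 Hz. This is the situation of the square wave, which contains only odd harmonics.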
general roll-off
12 dB per octave
Triangular wave source (Sound From Larynx) (U); loss
12 dB/octave roll-off
doubling of frequency
12 dB/octave roll-off, so we lose 12 dB per octave
F2 locus for /d/ (average adult males)
1800 Hz; the direction of the F2 transition depends on the following vowel's F2 (*F2 locus: the apparent starting point of the F2 transition, from the release toward the following vowel)
When was Delayed Auditory Feedback discovered?
1950
1. bite-block, 2. artificial palate, 3. sudden occlusion of the airway, and 4. sudden mechanical perturbation to the jaw or lip
4 examples of perturbation studies
Syllable initial prevocalic stop CV
1st Closure (Stop gap)- vocal tract closes (no sound) (voiced may have vocal fold vibration) (exhaling) 2nd Release of articulatory constriction- air pressure comes out fast (release quickly), perceived as a noise burst (aspirated or unaspirated) 3rd transition- transition to the vowel (formant transition)
Syllable final postvocalic stop VC
1st transition- vocal tract is moving constriction 2nd closure- (stop gap)- may release stop or not 3rd release (noise burst) or no release noise burst
1. path, 2. trajectory
2 components of spatial dimension
electroglottography
One electrode on each side of the thyroid cartilage; one electrode emits a low current that is transmitted to the other electrode when the vocal folds are in contact. The signal reflects vocal fold contact area; air is an insulator, so current won't pass when the folds are apart.
Coarticulation
2 or more articulators move at the same time to produce 2 or more phonemes
1. coarticulation, 2. suprasegmental factors
2 reasons there is not a one-to-one relationship between the acoustic features and the perceived consonant?
closed glottis = ?
2 sources of sound: the periodic sound of phonation and the aperiodic sound of the airstream passing through a constriction = voiced fricative
Closed Glottis
2 sources of sound: periodic sound of phonation and aperiodic sound of airstream passing through constriction
What is the reference value of dB SPL?
20 micro Pascals.
What is the frequency of a nasal murmur?
200-300 Hz
what are our most sensitive frequencies?
2000-5000 Hz
The eardrum strengthens the signal by about ____ dB
25 dB
What is the amount of time for voice onset?
25 ms
1. acoustic targets, 2. articulatory gestures, 3. aerodynamic pressures
3 theorized output targets
Overall the middle ear provides a boost of perceived sound level by about ____ dB
30 dB
F2 locus for /g/ (average adult males)
3000 Hz; the F2 transition always falls to the vowel's F2
The speed of sound in air is approximately...
34,000 cm per second, 1125 feet per second, 767 miles per hour
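A classroom had two identical window air conditioners. The sound pressure level from each of them measured alone was 73 dB. What is the level with both running?
76 dB (doubling the number of equal, incoherent sources adds about 3 dB)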
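A minimal Python sketch of combining sound levels, assuming the sources are incoherent (uncorrelated noise, like two air conditioners). Decibels cannot be added directly. The sketch converts each level to a relative intensity, sums the intensities, and converts back. The function name is illustrative.

```python
import math


def combine_levels_db(levels_db: list[float]) -> float:
    """Total level of incoherent sources: add intensities, not decibels."""
    total_intensity = sum(10 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_intensity)


print(round(combine_levels_db([73.0, 73.0]), 1))  # 76.0
```

Two equal sources give 10 log10(2) ≈ 3 dB more than one. This rule applies only to incoherent sources. Identical coherent sources that are in phase would double the pressure and add about 6 dB.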
F2 locus for /b/ (average adult males)
800 Hz; the F2 transition always rises to the vowel's F2
How many dB SPL is a .25 Pa sound?
81.94 dB SPL. Math: 20 × log10(0.25/0.00002) = 20 × log10(12500) = 20 × 4.097 ≈ 81.94 dB SPL
Voiced VOT
< 20 msec (may be between 1 and 19 msec) - May be 0 - May be negative: voicing begins before the articulatory release (prevoicing) - VOT may not be measurable if voicing is continuous throughout the stop gap - A manner and voicing cue
Timing for VOT
< 20 msec voiced > 20 msec voiceless
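A minimal Python sketch of measuring and classifying VOT. It assumes the release (burst) time and the voicing onset time have already been marked, for example in WaveSurfer, in seconds. The 20 ms boundary follows the card above. Other cards in this set place the boundary at 25-30 ms, so it is a parameter. The function names and the example times are illustrative.

```python
def vot_ms(release_s: float, voicing_onset_s: float) -> float:
    """Voice onset time in ms; negative when voicing starts before the release."""
    return (voicing_onset_s - release_s) * 1000.0


def classify_stop(vot: float, boundary_ms: float = 20.0) -> str:
    """Label a stop as voiced or voiceless from its VOT (approximate boundary)."""
    return "voiced" if vot < boundary_ms else "voiceless"


long_lag = vot_ms(release_s=0.512, voicing_onset_s=0.570)    # about 58 ms
prevoiced = vot_ms(release_s=0.300, voicing_onset_s=0.250)   # about -50 ms
print(round(long_lag), classify_stop(long_lag))     # 58 voiceless
print(round(prevoiced), classify_stop(prevoiced))   # -50 voiced
```

Because VOT exists on a continuum, any single boundary is only an approximation. Tokens near the boundary are where other cues, such as F1 onset and F0, matter most.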
voiced
<20 ms
energy=
=amplitude (more precisely, energy and intensity grow with the square of amplitude)
voiceless
>25 ms
Fourier Analysis (prism)
A prism can subdivide light into its component frequencies.
3dB Down Point/Half-Power Point
A reduction in intensity of one half is equal to a decrease of about 3dB. The frequency at which the intensity is 3 dB less than the peak intensity of the resonant frequency.
Stop on Spectrogram
A Complete STOP in airflow (Complete Obstruction of Airflow) Obstruction is followed by a large wide-band burst of energy as the obstruction is released
An acoustic tube model of the vowel /i/ has formant frequencies determined by the resonances of what type of tubes?
A Helmholtz resonator plus one tube open at both ends plus one tube closed at both ends
Aspiration
A brief hiss of air following the burst - Present for voiceless stops (pie), not for voiced stops (buy) - No aspiration in s‐clusters, likely due to persistence of lack of voicing (stop) - Release of a voiceless stop may occur with or without aspiration (pie vs apple)
Anti resonance
A filtering effect of the vocal tract characterized by loss of acoustic energy in a particular frequency region.
What is a spectrum?
A graph of amplitude over frequency - it provides an analysis for a single point in time (line spectrum)
What is a limitation to the Peterson and Barney study?
A limitation of the Peterson and Barney study is that formant frequencies were calculated from isolated and deliberate productions. It also showed that even in these unnatural conditions, some vowels still have overlapping formant configurations.
How are thresholds of hearing and equal loudness scales determined?
A logarithmic scale compresses the range
Sound intensity (dB IL)
A measure of power, doesn't depend on size or shape of environment, (dB IL equation)
How to produce a fricative
A narrowing of the vocal tract. Air moves faster through narrow areas because of the constriction.
What is a quasi-periodic tone?
A pattern that repeats itself at almost regular intervals
What is periodicity?
A pattern that repeats itself.
Aspiration
A period of voicelessness after stop release
How does perception overcome the challenges of these inconsistencies of vowel production ( re: limitations of simple target theory)?
A possible explanation for the successful consistent perception of otherwise variable stimuli is the "vocal tract normalization" hypothesis by Liberman.
Unreleased stop
A stop without the release burst, usually at the end of a word
Speech models fall into 4 categories:
A strong linguistic basis and emphasis The goal of speech production is to attain one or more targets A focus on the role of timing in speech A focus on the role of feedback in speech
Spectrogram
A type of short-term running spectrum in which sounds are analyzed in a 3D pattern of time, frequency, and amplitude. Shows the pattern of energy in phonemes. Intensity is shown in different shades of gray. Darkest areas = higher intensity or amplitude; region of energy. Shows the acoustic correlates of information sources in speech.
In the vowel perception theory, what is the only certainty?
A vowel's identity is somehow coded in its formant configuration, i.e., the relative frequency positions of formants 1 and 2 (much of the following applies to semivowels as well). Precisely how we extract vowel identity from these formants (through hearing or in a neurological sense) is not entirely known. There are many discrepancies with respect to this so-called "simple target theory".
Voiceless consonants have...
APERIODIC laryngeal source; supraglottal noise sources
What is the threshold of hearing in dB SPL?
About 0 dB SPL
Affricates
Acoustic Cues: 1. Duration: 75-130 msec. 2. Rise Time: 33 msec. -Transitions: 73-150 msec. -Spectrograms: look like fricatives, but shorter -Prevocalic is shorter than postvocalic (e.g. "judge")
Vocal Tract Length
Adult male: 17-18cm Adult female: 14-15cm Young child: 6-8cm and shape differs
Offglide
After transition-relatively steady state formants
Vocal Tract Transfer Function
Air particles vibrate most effectively at the open end of the tube (air moves freely), and least effectively at the closed end of the tube • The open end will have a velocity maximum (pressure minimum) • The closed end will have a velocity minimum (pressure maximum)
Egressive
Airflow is an outward flow from the lungs
Open Glottis
Airstream is only audible at the point of constriction
Output Spectrum
All spectrums put together
Coarticulation
Simultaneously articulating more than one phoneme. Anticipatory (forward) & retentive (backward)
Air Pressure
Alternating pressure transfers energy much better than static pressure
Which places of articulation of fricatives have high energy formants
Alveolar and postalveolar
Which places of articulation of fricatives have narrower bands of high-frequency energy?
Alveolar and postalveolar
Describe the sensory receptors on the alveolar ridge.
The alveolar ridge has more sensory receptors than the posterior part of the palate
In which stops is the front cavity short, producing high-frequency energy?
Alveolar stops
What stop has a high F2 (1800 Hz) and F3?
Alveolar stops
What is the Y axis on a spectrum?
Amplitude
What is the Y axis on a waveform?
Amplitude
Describe the movement of an affricate.
An alveolar closure is made (like for /t/ or for /d/), then the closure is released and the tongue retracted to the postalveolar position on the palate, with the same shaping as for production of /sh/ and /zh/
What are acoustic cues?
An aspect of the acoustic signal that has a role in distinguishing between one phoneme and another.
Compression, pathologies, loudness during adduction...
affect resistance. In a healthy cycle, this is predictable.
Name given to a sharp reduction in amplitude when sound is absorbed by the shunt?
Anti-Formants
How is resonance influenced by our "variable resonator?"
Any cavity tends to produce a standing longitudinal waveform as its resonance. This waveform (and its odd multiples) possesses a number of "critical points" for resonance; resonance is changed when a structure is deliberately moved into one of those spots. These are the points of "maximum pressure" or "maximum velocity of movement." If something blocks or narrows a point of maximum pressure (p), that particular formant frequency rises. If an area of maximum velocity (v) is narrowed, the formant moves downward in frequency.
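A minimal sketch of the pressure/velocity rule, assuming an idealized uniform tube closed at the glottis and open at the lips. Real vocal tracts are not uniform. The function name `constriction_effect` is hypothetical. Comparing squared cosine and squared sine is a simplified stand-in for "pressure-dominated" versus "velocity-dominated" points.

```python
import math


def constriction_effect(n: int, x_frac: float) -> str:
    """Predict whether narrowing the tube at x_frac raises or lowers formant n.

    Idealized uniform tube: closed at the glottis (x_frac = 0), open at the
    lips (x_frac = 1). For resonance n, pressure varies as cos(k*x) and
    volume velocity as sin(k*x), with k*L = (2n - 1) * pi / 2.
    """
    kx = (2 * n - 1) * math.pi / 2 * x_frac
    pressure = math.cos(kx) ** 2  # relative pressure (potential) energy at x
    velocity = math.sin(kx) ** 2  # relative velocity (kinetic) energy at x
    if math.isclose(pressure, velocity, abs_tol=1e-9):
        return "no change"
    # Narrowing near a pressure maximum raises the formant;
    # narrowing near a velocity maximum lowers it.
    return "raises" if pressure > velocity else "lowers"


print(constriction_effect(1, 0.0))    # raises: F1 pressure maximum at the glottis
print(constriction_effect(1, 1.0))    # lowers: F1 velocity maximum at the lips
print(constriction_effect(2, 2 / 3))  # raises: front-of-mouth narrowing (as for /i/)
print(constriction_effect(2, 1 / 3))  # lowers: pharyngeal narrowing (as for /a/)
```

The F1 results match the later card stating that F1's critical pressure point is at the glottis and its maximum-velocity point is at the oral opening.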
Release Burst
Aperiodic sound following silent gap
Lateral Liquid
Area behind tongue acts as shunt resonator = antiformants. (Whenever there's an extra cavity, you get antiformants) Energy largely in lower frequencies. Great deal of variability in formants, F2 influenced by surrounding vowels.
Diphthongs: Offglide
Articulatory ending point of the diphthong
Diphthongs: Onglide
Articulatory starting point of the diphthong
Steady-State Formants
Articulatorily, there is little or no movement.
formants
As pitch changes, the harmonics move through the (blank)
Speaking Rate
As you speak slower, the duration of your sounds gets longer (vowels more than consonants) 1. Segmental Duration: you're just making sounds longer, but the rhythmic pattern is maintained 2. Pause Duration: also slows down speech, but not fluidly; doesn't sound natural 3. Syllable Deletion: dropping sounds to speed up speech; less intelligible 4. Undershoot: you don't reach the full target sound but approximate it
aspirated
after release
Aspiration during VOT
Audible release of air between the noise burst (plosive) and the following vowel -VOT for aspirated consonants is typically longer; sounds breathy
4 Kinds of Information Available for Feedback
Auditory, tactile, proprioceptive, central neural
~9 cm
Average length of a child's vocal tract
~15 cm
Average length of the female vocal tract
~17.5 cm
Average length of the male vocal tract
good
BETWEEN categories of sounds, discrimination is good or poor?
Antiformants
Bands of frequencies with damped acoustic energy
Frequency Domain (Spectrum)
Based on Fourier Transform
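A minimal sketch of moving from the time domain (waveform) to the frequency domain (spectrum) with a naive discrete Fourier transform. It uses only the standard library, whereas real analysis tools use the fast Fourier transform. The sample rate, duration, and two-component test signal are assumptions, chosen so that each frequency bin lands on a whole number of hertz.

```python
import cmath
import math

RATE = 1000  # samples per second (assumed)
N = 200      # number of samples (0.2 s), so each bin is RATE / N = 5 Hz wide


def dft_magnitudes(samples: list[float]) -> list[float]:
    """Naive DFT: amplitude of each frequency bin from 0 Hz up to the Nyquist frequency."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        mags.append(abs(total) * 2 / n)  # scaled so a sine of amplitude A reads about A
    return mags


# Time domain: a 100 Hz component (amplitude 1.0) plus a 200 Hz component (amplitude 0.5).
signal = [math.sin(2 * math.pi * 100 * t / RATE) + 0.5 * math.sin(2 * math.pi * 200 * t / RATE)
          for t in range(N)]

# Frequency domain: the spectrum shows one line per component.
mags = dft_magnitudes(signal)
peaks_hz = [k * RATE / N for k, m in enumerate(mags) if m > 0.1]
print(peaks_hz)  # [100.0, 200.0]
```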
Mechanical Resonance
Based on Mass, Stiffness, and Resistance Mass & stiffness results in a "natural frequency" Resistance results in a decay of amplitude over time called "damping" Tuning fork has excellent resonance characteristics It has one natural frequency and slow decay Energy imparted to a tuning fork results in free vibratory motion at the natural frequency or resonant frequency Other objects have varying quality resonance curves
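A minimal sketch of how mass, stiffness, and resistance shape resonance. It uses the textbook mass-spring model with viscous damping, and the masses, stiffnesses, and resistances are hypothetical. Stiffness raises the natural frequency and mass lowers it. Resistance sets how quickly the amplitude decays (damping).

```python
import math


def natural_frequency_hz(mass_kg: float, stiffness_n_per_m: float) -> float:
    """Natural (resonant) frequency of an ideal mass-spring system: sqrt(k / m) / (2 * pi)."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2 * math.pi)


def damped_amplitude(a0: float, resistance_kg_per_s: float, mass_kg: float, t_s: float) -> float:
    """Envelope of free vibration with viscous damping (resistance): A0 * exp(-r * t / (2m))."""
    return a0 * math.exp(-resistance_kg_per_s * t_s / (2 * mass_kg))


# Hypothetical values: quadrupling stiffness doubles the natural frequency,
# while quadrupling mass halves it.
base = natural_frequency_hz(0.01, 1000.0)
print(round(base, 1))                                       # 50.3 (Hz)
print(round(natural_frequency_hz(0.01, 4000.0) / base, 3))  # 2.0
print(round(natural_frequency_hz(0.04, 1000.0) / base, 3))  # 0.5

# Low resistance (tuning-fork-like) decays slowly; high resistance decays quickly.
for r in (0.01, 1.0):
    print(r, [round(damped_amplitude(1.0, r, 0.01, t), 3) for t in (0.0, 0.01, 0.02)])
```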
your practice
Based on theory: if you think speech is special, would you use nonspeech sounds? If you think general auditory skills are most important, maybe you would use nonspeech sounds. Based on evidence: new info on mirror neurons; new info on McGurk effects in autism. Based on clinical experience: integrate theory, evidence, and practice.
Onglide
Before the transition: relatively steady-state formants
Release burst/stop burst
Brief, transient aperiodic noise burst following the silent gap. 10-30 ms (longer for voiceless stops). Vertical line extending into high frequencies. Usually seen in positions other than final. Bursts of voiceless stops are longer than those of voiced stops and include aspiration (noise generated by turbulence as air moves through the glottis while the folds are starting to close for the following voiced sound).
Rhotic Liquid
Bunched or retroflex, retroflex production results in a slight lowering of F3, bringing it closer to F2.
4 months
By what age can infants discriminate basic contrasts of their native language?
CAP vs CAB
CAB is longer because of continuous vocal fold vibration, but there is no VOT for either. *The previous sound helps determine what comes next
What is happening: 'cats' is produced with a word-final /s/, but 'dogs' is produced with a word-final /z/
Carry-over (left to right) assimilation
Categorical vs. Continuous Speech Perception
Categorical Perception of Consonants: there are limits to their acoustic boundaries -VOT could vary between -20 and 20 ms and we wouldn't hear a difference. Continuous Perception of Vowels: there is no single place that is one absolute vowel sound; it is a continuum that can blend from one category into another
2nd Acoustic Cue of Stops: Release of Burst
Caused by turbulent air. Observed in spectrogram as gray area.
Complete Assimilation
Change from an allophone of one phoneme to an allophone of another phoneme
Formant transition
Changes in formant frequencies that occur during the transition from one speech sound to another
Liquids
Clear tongue positions, F3 is low compared to other sounds
How do you change from a narrow-band to a wide-band spectrogram in Praat?
Click Spectrum > Spectrogram settings and change the window length by a factor of 10 (wide-band = 0.005 s, narrow-band = 0.05 s)
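A rough sketch of why the two window lengths on this card look so different. It assumes the rule of thumb that a window of duration T resolves about 1/T Hz. Praat's exact analysis bandwidth depends on its window shape, so treat these numbers as approximate. The speaker's F0 of 120 Hz is hypothetical.

```python
def analysis_bandwidth_hz(window_s: float) -> float:
    """Rule of thumb: an analysis window of duration T resolves roughly 1 / T Hz."""
    return 1.0 / window_s


def resolves_harmonics(window_s: float, f0_hz: float) -> bool:
    """Harmonics show as separate horizontal bands only if their spacing (F0) exceeds the resolution."""
    return f0_hz > analysis_bandwidth_hz(window_s)


F0 = 120.0  # hypothetical speaker's fundamental frequency in Hz
for label, window in (("wide-band", 0.005), ("narrow-band", 0.05)):
    print(label, round(analysis_bandwidth_hz(window)), resolves_harmonics(window, F0))
# wide-band 200 False   -> harmonics blur together; formant bands stand out
# narrow-band 20 True   -> individual harmonics are visible
```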
Two Occlusions of Stops
Close of VP port, Closure of tongue/lips (bilabial, lingua-alveolar, lingua-palatal)
Affricates
Combination of stops and fricatives
Most of the sounds we hear come from multiple incoherent sources. This means that they:
Come from independent sources
Most of the sounds we hear come from multiple incoherent sources. This means that they: -are binaural -come from independent sources -do not add logarithmically -are samples of "frozen noise" -are the result of turning up a volume control
Come from independent sources
How to produce a liquid
L: the tongue may contact the alveolar ridge; air comes out laterally around the tongue; vocal folds vibrate. R: the tongue is bunched back or pointed backward, and air flows around the tongue. Both look similar to vowels, with formant transitions to surrounding vowels.
How to produce a stop
Complete blockage of the vocal tract; pressure builds up, then it bursts and is released. Voiced stops: vocal folds vibrate
How to produce affricates
Complete obstruction of the airway. Pressure builds then longer burst of energy/noise.
Stop Gap
Completely obstructing the flow of air that is coming through the vocal tract -Produced in syllable initial position, final position, or continuous speech/syllables -Articulator is pushed up against another one to constrict the air (and sound) from coming out -50-150 msec. -Voiced Plosives: voice carries over from the voiced consonants around it (Voice Bar present) -Voiceless Plosives: "buttercup" would have no voice bar, but we're lazy and say "buddercup"
Voicing?
Complex periodic
What are acoustic speech sound characteristics
Complex periodic: vowels. Random: voiceless fricatives. Complex periodic + random: voiced fricatives. Transient: plosives (burst). Quiescent: plosives (closure)
For which types of sounds are harmonics present?
Complex periodic sound
What are non-resonant consonants
Consonants that have restricted air flow
Features of Consonants
Constriction of airflow (valve) = turbulence -Aperiodic: disturbed airflow or noise -The cavity anterior to the point of constriction determines the acoustic spectrum of the sound -Characteristics depend on acoustic production and perception cues
Perturbation
Constriction of vocal tract
Noise?
Continuous aperiodic
What are formants?
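Created by the filter (the vocal tract) as a result of RESONANCE. They are not multiples of the fundamental frequency: for a uniform tube closed at one end, they fall at odd-number multiples of the tube's lowest resonance (e.g., roughly 500, 1500, and 2500 Hz for an adult male vocal tract)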
Oral Constriction
Created by the position of the tongue relative to the oral cavity space, often the hard palate
How do you make a diphthong sound like a pure vowel?
Decrease the transition time between vowels
place
Depending on the ______ of articulation, certain frequency ranges will be given more amplitude and others will be attenuated.
Neural Processing and Suprasegmentals
Depends on language and task: what is the listener trying to process? -Left Hemisphere = syntactic information -Right Hemisphere = prosodic information at phrase and sentence level -Superior temporal and inferior frontal cortex -Tonal languages (e.g., Mandarin) have suprasegmentals at the word, phrase, and sentence level that carry significant syntactic information
Tense Vowels are produced with greater muscle constriction; produced at the extremes of articulatory posture, with tongue higher in oral cavity; tense vowels are longer; lax vowels are shorter
Difference between tense and lax vowels
Males have harmonic lines closer together because they have a lower F0. Females have more widely spaced harmonics (wider gap between lines) because they have a higher F0; females have shorter vocal folds
Differences between male & female harmonic structures
Contrastive Stress
Differentiate between two words that differ only by a syllable
Stridents
Directing air flow against a surface, more intense acoustic energy
Shearing
Displacement along the vocal folds when they come back together (Displacement is both lateral/medial and anterior/posterior)
Harmonic Spacing
Distance between harmonics on a spectrum
Which item below is not an advantage of digital over analog representation of sound? -Flexibility, once the speech has been encoded -Absolutely perfect copies of original recording -Virtually perfect recordings -Ease of cataloging (tagging, etc.) -Distortion is impossible
Distortion is impossible
Antiformants/Nasals
Divergence of air into the oral cavity introduces ______ into the picture, as some of the sound energy is trapped within the oral cavity. Opposite of formants. ______ act as stop-band filters, damping the harmonic frequencies. Look like weak-intensity formants. Frequencies of the _____ depend on where the oral blockage occurs.
Acoustic features of Voiceless Stops
Do not have continued phonation throughout the period of closure. In the pre-vocalic position, this continued-phonation feature is not usually present
Transfer Function
Doesn't represent sound. Represents frequency response of vocal tract. Shows formants. It's the filter.
What is the X axis on a spectrogram?
Duration
What is the X axis on a waveform?
Duration
What is the definition of a period?
Duration of a cycle/ time it takes for a cycle to complete
Formant transition
During the formation (closing phase) of a stop occlusion and just after an occlusion is released, the rapid movements of the articulators cause sudden changes in the resonance peaks of the vocal tract. These changes occur during the transition from one speech sound to another. The rapid change in frequency of a formant for a vowel immediately before or after a consonant. The F2 transition is a very important acoustic cue to the *place of articulation* of a consonant. The F1 transition signals information about the *manner of articulation* of a consonant. Changes in formant frequencies that occur during the transition from one speech sound to another (that's the simpler definition from the book)
How is a complex sound displayed in a spectrum?
Each line represents a harmonic. Horizontal axis = frequency; vertical axis = amplitude (dB)
3 Connected Tubes
Each tube has its own natural/resonant frequency and, therefore, responds better to a different range of frequencies. The resonant frequency of the entire system is different from each of the separate tubes.
What differentiates diphthongs from monophthongs on a spectrogram?
Even where vowels are perceived as being steady-state monophthongs, the acoustic representation often indicates some articulatory movement. Although diphthongs typically show marked shifts in the spectrographic patterns, this varies depending on lots of things. Therefore, it's not always easy to make a categorical distinction between monophthongs and diphthongs.
-Introducing antiformants and dampening acoustic energy -Introduction of noise from turbulent nasal airflow emissions -Decreasing intra-oral air pressure, thereby decreasing clarity of consonant production
Excessive nasal resonance can decrease intelligibility by:
Fricative at 1200 Hz?
F & TH
Diphthongs and semivowels are characterized by some form of change in formants
F1, F2, F3
Difference between /f/ and /v/
/f/'s duration is long; /v/'s duration is shorter. -Both have light noise, starting anywhere above 500 Hz
What is the difference between F0 and the formants?
F0 is a source (glottis) characteristic. Formant frequencies are contributions of the filter (vocal tract).
Independence
F0 is the rate of vocal fold vibration. The source function and the transfer function are relatively independent of one another: source and filter aren't necessarily connected; you can change one without changing the other. Harmonic spacing will change with varying F0.
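A toy sketch of source-filter independence. Everything numeric here is an assumption: formants at 500/1500/2500 Hz with 100 Hz bandwidths, and a source whose harmonic amplitudes fall as 1/n² (about -12 dB per octave). The point is structural: `transfer_gain` never looks at F0. Changing F0 changes only which frequencies the harmonics land on and their spacing.

```python
FORMANTS_HZ = (500.0, 1500.0, 2500.0)  # assumed formants of a neutral vocal tract
BANDWIDTH_HZ = 100.0                   # assumed bandwidth of each resonance


def transfer_gain(freq_hz: float) -> float:
    """Toy filter: one resonance peak per formant. It never looks at F0."""
    half = BANDWIDTH_HZ / 2
    return sum(1.0 / (1.0 + ((freq_hz - f) / half) ** 2) for f in FORMANTS_HZ)


def source_amplitude(harmonic_number: int) -> float:
    """Toy glottal source: amplitude falls as 1 / n**2 (about -12 dB per octave)."""
    return 1.0 / harmonic_number ** 2


def output_spectrum(f0_hz: float, max_hz: float = 3000.0) -> list[tuple[float, float]]:
    """Output = source amplitude x filter gain at each harmonic (k * F0)."""
    spectrum = []
    k = 1
    while k * f0_hz <= max_hz:
        freq = k * f0_hz
        spectrum.append((freq, source_amplitude(k) * transfer_gain(freq)))
        k += 1
    return spectrum


for f0 in (100.0, 200.0):
    spectrum = output_spectrum(f0)
    below_1k = [(f, a) for f, a in spectrum if f < 1000]
    strongest = max(below_1k, key=lambda pair: pair[1])[0]
    print(f"F0 = {f0:.0f} Hz: harmonic spacing {spectrum[1][0] - spectrum[0][0]:.0f} Hz, "
          f"strongest harmonic below 1 kHz at {strongest:.0f} Hz")
```

With F0 = 100 Hz, a harmonic sits right on the 500 Hz resonance. With F0 = 200 Hz, the nearest harmonic (400 Hz) gets the boost instead. The filter itself is unchanged.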
intonation
F0 varies over longer stretches of speech (global vs. local)
What is the equation to calculate fundamental frequency?
F0 = 1/p, where p is the period in seconds
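A minimal sketch of F0 = 1/p and its inverse, which also illustrates the inverse relationship between period and frequency. The function names are illustrative, not from any library.

```python
def f0_from_period(period_s: float) -> float:
    """F0 = 1 / p: fundamental frequency in Hz from the period in seconds."""
    return 1.0 / period_s


def period_from_f0(f0_hz: float) -> float:
    """p = 1 / F0: the period shortens as the frequency rises."""
    return 1.0 / f0_hz


print(f0_from_period(0.01))   # 10 ms period -> 100.0 Hz
print(f0_from_period(0.005))  # 5 ms period -> 200.0 Hz
print(period_from_f0(250.0))  # 250 Hz -> 0.004 s
```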
vocal tract transfer function for male
F0 = 100 Hz; harmonics at 200, 300, 400 Hz, and so on (closely spaced, since harmonic spacing equals F0)
vocal tract transfer function for female
F0 = 200 Hz; harmonics at 400, 600 Hz, and so on (wider spacing)
What acoustic information is necessary to perceive vowels
F1 and F2
What formant frequencies are essential for the acoustic analysis of vowels?
F1 and F2
front vowels
F1 and F2 are far apart, and F2 and F3 are close together
Semivowel Liquids
F1 and F2 are similar -Formant Structure /r/ 1. F3 Transition: rapidly falls and then rises for a VCV 2. Dark /r/ - CV "root" = Posterior Tongue 3. Light /r/ - VC "early" = Palatal Tongue -Formant Structure /l/ 1. Complex (lateral emission of air) 2. Similar to /r/ without lowering of F3 because you are splitting airflow: creates anti-resonants, wider bandwidths and a murmur
VC for /w/
F1 decreases and F2 decreases
VC for /j/
F1 decreases and F2 increases
When vocal tract closes in preparation for final stop production...
F1 falls
manner
F1 formant transition tells about blank?
How are vowels distinguished by the frequency position of mainly F1 and F2?
F1 frequency rises with increasing openness of the vowel: the higher the vowel, the lower the first formant. F2 frequency rises with increasing frontness of the vowel. F1: opener vowel = higher frequency; F2: fronter vowel = higher frequency
CV for /j/
F1 increases and F2 decreases
CV for /w/
F1 increases and F2 increases
glides
F1 profile and reduction in amplitude
When vocal tract is opened after initial stop production...
F1 rises
CV: formant transitions
F1 transition always moves from low frequency up to the vowel's F1. The direction of the F2 transition is sensitive to the place of articulation of the stop consonant
rises
F1 usually ___ for stops
Vowels
F1, F2, & F3 peaks shift: every person has them in a different place for the same sound. We know which vowels are which because of the relationship among F1, F2, and F3. We don't look at absolute frequency values but at the spectral envelope, i.e., the relationship between formants.
F1, F2, and F3 placement characteristics
F1: oral cavity; F2: tongue shape; F3: tongue tip
/i/ typical formant pattern
F1 = 270 Hz, F2 = 2290 Hz
Typical formant pattern for /u/
F1 = 300 Hz, F2 = 870 Hz
Typical formant pattern for /a/
F1 = 730 Hz, F2 = 1090 Hz
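A minimal sketch of using the typical F1/F2 values on these cards as vowel targets. It labels a measured (F1, F2) pair with the nearest target. Plain Euclidean distance in hertz is a simplifying assumption. As the cards note, speakers differ, so practical systems normalize formants or compare relationships between them rather than raw values.

```python
import math

# Typical formant values from the cards above (Hz).
VOWEL_TARGETS = {
    "i": (270.0, 2290.0),
    "u": (300.0, 870.0),
    "a": (730.0, 1090.0),
}


def nearest_vowel(f1_hz: float, f2_hz: float) -> str:
    """Label a measured (F1, F2) pair with the closest stored vowel target."""
    return min(VOWEL_TARGETS, key=lambda v: math.dist((f1_hz, f2_hz), VOWEL_TARGETS[v]))


print(nearest_vowel(290.0, 2200.0))  # i: low F1 (high vowel), high F2 (front vowel)
print(nearest_vowel(680.0, 1150.0))  # a: high F1 (open vowel)
print(nearest_vowel(320.0, 950.0))   # u: low F1, low F2 (back vowel)
```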
This vowel formant is influenced by tongue advancement/location of constriction
F2
bilabials (CV)
F2 and F3 don't rise as steeply
velars (CV)
F2 and F3 start together
place
F2 plus F3 formant transitions tell about?
changing the place of articulation of a nasal changes the ____
F2 transition locus
The lateral [l] and rhotic [r] are characterized by relatively neutral position of formant ____
F2.
The defining formant characteristic of /l/ and /r/ includes
F3
The acoustic difference between /r/ and /l/ resides mainly in the...
F3 formant
What is the difference between axes on line spectra and waveforms?
Line spectra show amplitude by freq, waveforms show amp by time
What differentiates the [l] and [r] is the presence or absence of a sharp drop/rise in formant _______. Explain the kind of movements that need to occur for r-coloring.
F3. [r] is the only sound/phoneme for which this drop is clearly seen. It is established by moving the tongue tip or the back of the tongue into critical spots for affecting the third formant. There are two such spots, resulting in retroflex or bunched production of this sound. It is not enough to simply hold those parts of the tongue in these spots; there has to be movement to or from them (or both) for the F3 drop to occur and produce the subsequent "r-coloring."
the liquids /l/ and /r/ differ on the basis of ___, which is low in ___ and high in ____.
F3; /r/; /l/
T or F: Resonances (formants) do not characterize the acoustic signal for any consonant sounds...
FALSE
T or F: Voice onset time is the time elapsed between the onset of articulatory closure and the release of a stop...
FALSE
Center Frequency
FC. Greatest vibratory response/natural frequency. Depends on characteristics of resonator.
Lower Cutoff Frequency
FL. Frequency below FC. Noted by line; left of line = unresponsive
Upper Cutoff Frequency
FU. Frequency above FC. Noted by line; right of line = unresponsive
What is the passive theory (sensory only) by Fant?
Fant: specialized, innate (built-in) sensory filtering mechanisms are responsible for perception, based mostly on acoustic distinctive feature theory. A common pool of distinctive features for speech production and perception is presumed to play a role; these features are thought to be located in the linguistic center of the brain. Presumably, there are innate acoustic templates for matching, plus feature detectors. The perceptual features of speech are detected more or less automatically; listeners do not need to refer to production to perceive speech.
Development of Suprasegmentals
Fetus: responds to sound stimuli and prosody during 3rd trimester -0-6 months: response to biologically driven needs -2-3 months: linguistic discrimination emerges based on adult prosody -6 months: production of wide range of suprasegmentals -6-12 months: learned prosodic patterns of pitch, rhythm and pausing -12+ months: integration into adult-like patterns
constriction
Filtering is imposed by the cavity in front of the _________ and in certain conditions by the cavity behind the _______.
Wide-Band Filter
Passes a wider range of frequencies at once than a narrow-band filter
___ formants change due to mouth openings and tongue position
First
Diphthongs are characterized by what?
First 3 formants
Fricatives Production
Narrow constriction but not complete occlusion.
Fricative on Spectrogram
Narrowing, obstructed (not completely) Tend to Have Higher Frequency Variation
subglottal pressure can increase ____
Fo
Research indicates that the greatest cue for stress is _____ followed by _____ and then ______
Fo, duration, amplitude
4
For a tube closed at one end and open at the other, the tube will resonate best (the natural resonant frequency) at a frequency that has a wavelength that is how many times the length of the tube?
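A minimal sketch of this quarter-wavelength rule, applied to the average vocal tract lengths given on the earlier cards (about 17.5, 15, and 9 cm). It assumes an idealized uniform tube and a speed of sound of 35,000 cm/s, a common textbook approximation for warm, moist air. Real vocal tracts are not uniform, so these are neutral-vowel estimates only.

```python
SPEED_OF_SOUND_CM_S = 35_000.0  # assumed speed of sound in cm/s


def tube_formants(length_cm: float, count: int = 3) -> list[float]:
    """Resonances of a uniform tube closed at one end and open at the other.

    The lowest resonance has a wavelength 4x the tube length, so
    F1 = c / (4L); higher resonances fall at odd multiples of F1.
    """
    f1 = SPEED_OF_SOUND_CM_S / (4 * length_cm)
    return [(2 * n - 1) * f1 for n in range(1, count + 1)]


for label, length in (("male", 17.5), ("female", 15.0), ("child", 9.0)):
    print(label, [round(f) for f in tube_formants(length)])
# male [500, 1500, 2500]
# female [583, 1750, 2917]
# child [972, 2917, 4861]
```

Shorter tubes give higher resonances across the board. This is why children's formants sit well above adults' formants for the same vowel.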
F1 increases and relationship to F2 is unclear
For vowel height, what happens as the back vowels become more open (low)?
F1 increases and F2 decreases
For vowel height, what happens as the front vowels become more open (low)?
Antinode
For what is formant frequency lowered by constriction?
Node
For what is formant frequency raised by constriction?
Node
For what is minimum volume velocity or maximum pressure?
Antinode
For what is volume velocity maximum or pressure minimum
Stress
Force when the vocal folds come back together; the actual contraction of the muscle
For formant 1, where are its critical pressure point and its critical max velocity point?
Formant 1 has its critical pressure point at the glottis; and its critical max velocity point at the oral opening.
Which formant is said to be mostly responsive to tongue front to back positions?
Formant 2. Tongue front makes F2 go up; tongue back makes F2 go down.
Which formant results in the perception of the presence of the [r] sound?
Formant 3. Formant 3 identifies that an [r] happened. In a retroflex [r], the tongue tip curls up behind the alveolar ridge; in a bunched [r], the back of the tongue moves close to the velum.
2 Rules of Perturbation
Formant Frequency is raised by constriction at the nodes & Oral Cavity. Formant Frequency is lowered by constriction at the antinodes & Pharyngeal cavities
Which is the loudest formant?
Formant I (lower frequency)
Cues to Nasal Place
Formant transitions (mainly F2) provide the place cue for nasals (as demonstrated by Malécot's 1956 experiment)
Nonresonants lack...
Formant structure and openness of resonants
Shift
Formant transition
Stop-Glide-Dipthong Series
Formant transitions go from short to medium to long.
There is no end of combinations of formants ___ & ___ for vowels in the mouth and vocal tract.
Formants I and II.
Formants 1, 2, 3
Formants responsible for differentiating vowels
What is the X axis on a spectrum?
Frequency
What is the Y axis on a spectrogram?
Frequency
Spectrum
Frequency (x) by Amplitude (y)
The sound spectrum shows..
Frequency and amplitude. DOES NOT SHOW: wavelength, phase, period
What is meant by place theory?
Frequency is encoded in the ear according to location
Spectrograms
Frequency, Time, Amplitude
Frication
Fricative noise. Has a narrow spectrum. It's concentrated at different frequencies depending on the specific consonant.
In what consonants do articulators form constrictions and occlusions within the vocal tract that generate aperiodicity (as in noise) when air flows through them?
Fricatives
The nonresonant consonants of English are the ____, the ____, and the _____.
Fricatives, affricates, stops
Fricative Spectral Energy
Front cavity will act as an amplifier-filter; as you change its shape and length, it will amplify different frequencies (acoustic features of the front cavity make the difference in sound) 1. Strident (Concentrated Spectrum) -Large front cavity (s, z, sh, dz) -Those that have concentrated energy; smaller frequency range -Greater power 2. Non-Strident (Diffuse Spectrum) -Small front cavity (f, v, th) -Energy is spread out over a much wider band -Less power
What is the difference between fundamental frequency and pitch?
Fundamental Frequency is the rate at which a system vibrates, while pitch is the perception of fundamental frequency.
What changes the harmonic series?
Fundamental frequency
What is the difference between fundamental frequency and formant frequency?
Fundamental frequency is the rate at which the vocal folds vibrate; formant frequencies are determined by the size and shape of the resonating cavities
Period of glottal wave (area over time) depends on ___.
Gender and age
normative data
Given that formant frequency depends upon vocal tract length & resonating cavity size, no absolute values for F1, F2, & F3 exist.
Formant Transitions
Going from a plosive shape to a vowel shape -Voicing starts, vocal folds start vibrating, bending of formants to get to steady state Acoustic Features (Consonant-to-Vowel formant blending) F1= Manner of Production/Degree of Constriction F2= Place of Articulation -Tell you what consonant and vowel are being produced -Stops will always bend up when looking left to right -The starting point for F2 will be pointing to the spectral energy for that consonant (e.g. /b/ will be between 500-1500 Hz)
Front vowels have a ___ f2
High
Where do release bursts occur?
Initial and medial stops; bursts are longer for voiceless and shorter for voiced sounds
Primary resonance for nasal consonants?
Nasal Murmur (about 250 Hz)
White noise has all the following characteristics except:
Harmonic-Based
Voiceless consonants
Has a supraglottal noise sources; aperiodic laryngeal source (noise, aspiration)
Stops
Have the greatest amount of breath stream obstruction
Combined Tubes: /i/
Helmholtz resonator plus a tube open at both ends and a tube closed at both ends
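A minimal sketch of the Helmholtz resonator part of this model. It uses the standard idealized formula f = (c / 2π) · sqrt(A / (V · l)), ignoring end corrections. The speed of sound (35,000 cm/s) and the cavity dimensions are assumptions. The sketch shows the two dependencies named earlier in this deck: a larger cavity volume lowers the resonance, and a larger aperture area raises it.

```python
import math

SPEED_OF_SOUND_CM_S = 35_000.0  # assumed speed of sound in cm/s


def helmholtz_hz(volume_cm3: float, neck_area_cm2: float, neck_length_cm: float) -> float:
    """Resonant frequency of an idealized Helmholtz resonator (end corrections ignored)."""
    return SPEED_OF_SOUND_CM_S / (2 * math.pi) * math.sqrt(
        neck_area_cm2 / (volume_cm3 * neck_length_cm))


# Hypothetical cavity dimensions.
print(round(helmholtz_hz(50.0, 0.5, 1.0)))   # reference cavity
print(round(helmholtz_hz(100.0, 0.5, 1.0)))  # double the volume -> lower resonance
print(round(helmholtz_hz(50.0, 1.0, 1.0)))   # double the aperture area -> higher resonance
```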
When Does Subglottal Pressure Affect Intensity?
High Frequencies only
High vowels have a ___ tongue body or a ___ F1
High, low
Acoustic Characteristics of Stressed Syllables
Higher F0 Greater duration Greater intensity
Properties of Stress
Higher Fundamental Frequency Intensity (Louder) Longer in Duration
Primary Stress
Highest level of stress, usually seen on the second syllable
The second phase of stop production is called the...
Hold or closure/stop gap
Maximum Flow Declination Rate
How quickly the vocal folds close (the downward slope) ~People who have problems with this are more likely to have a higher declination rate ~It closes faster because of the Bernoulli effect and the mass model
What are feedback mechanisms of speech important for?
How a speaker controls production of speech, like how much does the speaker monitor his actions or how does a speaker produce speech with little or no feedback regarding speech output?
Lateral X-ray Image, Axial CAT Scan, Ultrasound, Computed tomography (CT), Magnetic resonance imaging (MRI)
Instruments used to visualize the vocal tract
Harmonics
Integer multiples of the fundamental frequency: H1 = F0, H2 = 2F0, H3 = 3F0
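A minimal sketch of the harmonic series, using round-number F0 values of 100 Hz and 200 Hz as hypothetical male and female examples. The spacing between harmonics always equals F0. So a lower F0 packs more harmonics below any given frequency, which is why a low-F0 voice is richer in harmonics.

```python
def harmonics(f0_hz: float, count: int) -> list[float]:
    """Harmonic n sits at n * F0, so H1 = F0, H2 = 2F0, H3 = 3F0, ..."""
    return [n * f0_hz for n in range(1, count + 1)]


print(harmonics(100.0, 5))  # [100.0, 200.0, 300.0, 400.0, 500.0]
print(harmonics(200.0, 5))  # [200.0, 400.0, 600.0, 800.0, 1000.0]

# How many harmonics fall at or below 1000 Hz for each F0?
for f0 in (100.0, 200.0):
    print(f0, len([h for h in harmonics(f0, 50) if h <= 1000.0]))  # 10 vs 5
```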
Attenuation Rate/Roll-Off Rate/Rejection Rate/Slope
How rapidly the resonator's response decreases in intensity for frequencies away from its center frequency. Measured in dB/octave. Less than 18 dB/octave = shallow; 90 dB/octave or more = steep.
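A minimal sketch tying together the center frequency (FC), the lower and upper cutoffs (FL, FU), and the roll-off rate. It assumes an idealized band-pass resonator: flat between FL and FU, then losing a constant number of dB per octave beyond the nearer cutoff. Real resonance curves bend smoothly. The cutoff values are hypothetical, and bandwidth is taken as FU - FL.

```python
import math


def attenuation_db(freq_hz: float, lower_cutoff_hz: float, upper_cutoff_hz: float,
                   rolloff_db_per_octave: float) -> float:
    """Idealized band-pass resonator: 0 dB between FL and FU, then a constant
    roll-off (dB/octave) for every octave beyond the nearest cutoff."""
    if freq_hz < lower_cutoff_hz:
        octaves = math.log2(lower_cutoff_hz / freq_hz)
    elif freq_hz > upper_cutoff_hz:
        octaves = math.log2(freq_hz / upper_cutoff_hz)
    else:
        return 0.0
    return -rolloff_db_per_octave * octaves


FL, FU = 400.0, 600.0  # hypothetical cutoffs around a center frequency FC of 500 Hz
print(FU - FL)                               # bandwidth: 200.0 Hz
print(attenuation_db(500.0, FL, FU, 12.0))   # inside the pass band: 0.0
print(attenuation_db(1200.0, FL, FU, 12.0))  # one octave above FU, shallow slope: -12.0
print(attenuation_db(1200.0, FL, FU, 96.0))  # same frequency, steep slope: -96.0
```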
25 ms
If VOT is 25 ms or greater= the plosive is voiceless. If VOT is less than 25 ms, the plosive is voiced.
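A minimal sketch of this card's rule as a categorical classifier, using the 25 ms boundary stated above. A single fixed boundary is a simplification: in practice it shifts, for example with place of articulation and with language. Stepping along a VOT continuum shows the categorical pattern, where the label stays put across -20 to 20 ms and then flips.

```python
VOICING_BOUNDARY_MS = 25.0  # boundary taken from the card above


def classify_stop(vot_ms: float) -> str:
    """Map a continuous VOT value (ms) onto one of two categories."""
    return "voiceless" if vot_ms >= VOICING_BOUNDARY_MS else "voiced"


# A continuum from voice leading (negative VOT) to long lag, in 10 ms steps.
for vot in range(-20, 61, 10):
    print(vot, classify_stop(vot))
```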
What produces nasal resonance?
Nasal cavities
Aperiodic
If a wave does not repeat, it is represented like any other data. The x axis is usually time or distance, can only draw specific conclusions about the part of the wave we measured
Periodic
If a wave repeats, it's more efficient to show just one cycle of the wave and indicate that it repeats; the x-axis is degrees or radians, and we can draw bigger conclusions about the wave
If something blocks or narrows maximum pressure (p), this will _____ that particular formant frequency.
If something blocks or narrows maximum pressure (p) this will RAISE that particular formant frequency.
What is the difference between conductive and sensorineural hearing loss?
If the hearing loss is just a conductive loss, then bone conduction thresholds should be the same as for normal hearing. If bone conduction thresholds are raised, there is impairment of the cochlea, auditory nerve or the higher auditory nervous system.
Nasals Description
Nasal consonants are produced with nasal radiation of sound
The tube will resonate best (the natural resonant frequency) at a frequency that has a wavelength that is 4x the length of the tube
Important "rule" for a tube closed at one end (when the tube will resonate best)
damped
In nasals, the acoustic transmission tends to be heavily _____ due to the increased length and absorptive nature of the nasal cavity
What are the physical properties of a sound?
Intensity/amplitude Duration/time Frequency
Place Cues for Liquids
In order to distinguish /r/ from /l/ the first 3 formants are needed, but F3 can separate /r/ from /l/ (/r/ has a low F3 of about 1500 Hz)
Extrinsic Muscle Activity
Indicates glottal inefficiency
What is the relationship between period and frequency?
Inverse relationship- the shorter the period the higher the frequency
Relationship of Volume and Pressure
Inversely related
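A minimal sketch of the inverse relationship using Boyle's law: P1 × V1 = P2 × V2 at constant temperature. The numbers are hypothetical and in arbitrary units. In speech breathing, enlarging the lungs lowers the pressure inside them (air flows in), and compressing them raises it (air flows out).

```python
def boyle_pressure(p1: float, v1: float, v2: float) -> float:
    """Boyle's law at constant temperature: P1 * V1 = P2 * V2, so P2 = P1 * V1 / V2."""
    return p1 * v1 / v2


print(boyle_pressure(100.0, 4.0, 2.0))  # halving the volume doubles the pressure: 200.0
print(boyle_pressure(100.0, 2.0, 4.0))  # doubling the volume halves the pressure: 50.0
```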
no
Is every /d/ the same? (yes or no)
The above plot shows the sum of two sine waves, one at 100 Hz and the second at 200 Hz. The resulting wave:
Is periodic
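A minimal sketch checking this card's answer numerically. For whole-number component frequencies, the sum repeats at their greatest common divisor. Here gcd(100, 200) = 100 Hz, so the sum repeats every 10 ms. Equal amplitudes for the two sine waves are an assumption.

```python
import math


def two_tone(t_s: float) -> float:
    """Sum of a 100 Hz and a 200 Hz sine wave (equal amplitudes assumed)."""
    return math.sin(2 * math.pi * 100 * t_s) + math.sin(2 * math.pi * 200 * t_s)


period_s = 1 / math.gcd(100, 200)         # 0.01 s, i.e. a 100 Hz repetition rate
times = [i / 10_000 for i in range(200)]  # 0 to 19.9 ms in 0.1 ms steps
is_periodic = all(math.isclose(two_tone(t), two_tone(t + period_s), abs_tol=1e-9) for t in times)
print(period_s, is_periodic)  # 0.01 True
```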
What is descriptive research?
It describes variables of importance, describes their differences, or describes their relationships (spectrography very helpful)
Describe the pascal to dB SPL conversion formula.
It is a logarithmic scale of relative amplitude: dB SPL = 20 log10(P / P_ref), where P is the measured pressure in pascals and P_ref = 20 µPa.
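A minimal sketch of the pascal-to-dB SPL conversion and its inverse, using the 20 µPa reference. The function names are illustrative. A pressure equal to the reference gives 0 dB SPL, and each tenfold increase in pressure adds 20 dB.

```python
import math

P_REF_PA = 20e-6  # reference pressure: 20 micropascals


def pa_to_db_spl(pressure_pa: float) -> float:
    """dB SPL = 20 * log10(P / P_ref)."""
    return 20 * math.log10(pressure_pa / P_REF_PA)


def db_spl_to_pa(level_db: float) -> float:
    """Inverse: P = P_ref * 10 ** (dB / 20)."""
    return P_REF_PA * 10 ** (level_db / 20)


print(round(pa_to_db_spl(20e-6), 1))  # 0.0: the reference itself
print(round(pa_to_db_spl(0.02), 1))   # 60.0: 1000 times the reference pressure
print(round(pa_to_db_spl(1.0), 1))    # 94.0
```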
What was another study about VOT by Eimas et al on categorical perception?
It was a study conducted on very young babies who do not know about speech yet; it was to prove that such babies are biologically ready to have categorical perception for voiced voiceless (with VOT around 20ms).
Why was 20 micropascals chosen?
It was estimated that a sound of this size was at the threshold of normal human hearing.
Why is aspiration useful?
It's a way to make voiced/voiceless distinctions when examining the acoustics of stops
What suprasegmental feature is demonstrated: A name vs. An aim
Juncture
When given a pair of phonemes (e.g., /p/ and /b/) be able to identify the acoustic cue that differentiates the two.
KNOW THE CONSONANT CHART!
VOT and Age
Kids produce distinct VOTs around 11 years old. Before then, short lag. Elderly have more variability.
What stop has a lower F2 (800 Hz) and F3?
Labial
Which places of articulation of fricatives have broad frequency bands
Labio- and linguadentals
Which places of articulation of fricatives have very low energy formants
Labio- and linguadentals
Constrictions in Fricatives
Labiodental Linguadental Alveolar Palatal
what places of articulation make up non-sibilants
Labiodentals and linguadentals
Lip movement, jaw movement
Later descriptions
If the vocal folds close faster, slope: energy:
Less steep (spectral) slope; greater energy
glides (semivowels)
Lingual-palatal /j/ & labio-velar (bilabial) /w/
Approximants
Liquids and glides, Characteristics: Limited articulatory constrictions that alter resonant frequencies, Classification based on syllable position, Formant transitions typically faster than for vowels
In order to properly model the vocal tract with acoustic tubes, the wavelength must be _____ with respect to the length of the tubes.
Long
Frication
Long duration of energy, continuous sound, complex aperiodic waves
Vowel on Spectrogram
Look for the formants Vowels are always voiced
Diphthong on Spectrogram
Looks like Pac-Man, kind of
Nasal Formant
Low energy from the nose. A very low-frequency, high-intensity component of a nasal. Nasals have additional formants above this (called nasal formants, N1...), but the antiformants are more important in characterizing a nasal.
Pharyngeal (Glottal) /h/
Low energy, broad spectrum
What are the formant values for /i/ and /j/
Low F1, high F2
Which voice will be richer in harmonics? One with a low or high fundamental frequency.
Low fundamental frequency
Sonorant
Nasals, liquids, and glides which are similar to vowels. They are characterized by free airflow, articulation shapes vocal-tract cavities, formant frequencies, and periodic laryngeal source meaning they are all voiced.
Source = vocal fold vibration; examples: all vowels, many consonants
Nearly periodic complex waves source and examples
To produce a vowel..
Need relatively open vocal tract, vocal folds must vibrate, and tongue is in a certain position in the oral cavity/may or may not have lip rounding.
Types of Filters
Low-pass: lets low frequencies through; amplitude is high in the lower range. High-pass: lets higher frequencies through; amplitude is higher in the higher range. Band-pass: lets middle frequencies through; amplitude is lower at the lower/higher ends. Band-stop: lets low and high frequencies through but not middle ones; amplitude is highest at the lower/higher ends. (Amplifies the ones let through and weakens the ones not let through.)
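A minimal sketch of the four filter types as idealized "brick-wall" filters. Each function just answers whether a frequency gets through. Real filters roll off gradually at some attenuation rate in dB/octave rather than cutting off sharply. The 500 Hz and 2000 Hz cutoffs are hypothetical.

```python
def low_pass(freq_hz: float, cutoff_hz: float) -> bool:
    """Passes frequencies at or below the cutoff."""
    return freq_hz <= cutoff_hz


def high_pass(freq_hz: float, cutoff_hz: float) -> bool:
    """Passes frequencies at or above the cutoff."""
    return freq_hz >= cutoff_hz


def band_pass(freq_hz: float, low_hz: float, high_hz: float) -> bool:
    """Passes only the middle band between the two cutoffs."""
    return low_hz <= freq_hz <= high_hz


def band_stop(freq_hz: float, low_hz: float, high_hz: float) -> bool:
    """Blocks the middle band; passes the low and high ends."""
    return not band_pass(freq_hz, low_hz, high_hz)


# Hypothetical cutoffs: 500 Hz and 2000 Hz.
for f in (250.0, 1000.0, 4000.0):
    print(f, low_pass(f, 500.0), high_pass(f, 2000.0),
          band_pass(f, 500.0, 2000.0), band_stop(f, 500.0, 2000.0))
```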
Higher the vowel
Lower F1
Increased oral cavity length
Lower F2
Which voice is richer in harmonics? Male, female, or a child.
Male
How do you tell the difference between male and female /i/
Male and female speakers will have the same spectral envelope for the same sound, just different frequencies
Simple Harmonic Motion creates sounds that:
May be plotted as a sine wave
Glides
May involve a gliding motion from a partly constricted state to a more open state -Palatal Glide /j/ -Labio-Velar Glide /w/
Sound Pressure (dB SPL)
Measure of sound pressure at a location, i.e., the level at the receiver (what the ear or a microphone transducer picks up). dB SPL = 20 log10(P / P_ref), where P is the RMS pressure and P_ref = 20 µPa.
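A short sketch of the dB SPL equation, assuming the standard 20 µPa reference and an RMS pressure given in pascals; the function name is hypothetical.

```python
import math

P_REF_PA = 20e-6  # standard reference pressure for dB SPL: 20 micropascals

def db_spl(pressure_pa: float) -> float:
    """Sound pressure level (dB SPL) from an RMS pressure in pascals."""
    return 20.0 * math.log10(pressure_pa / P_REF_PA)

for p in (20e-6, 0.02, 1.0):
    print(f"{p:g} Pa -> {db_spl(p):.1f} dB SPL")
```

The reference pressure itself comes out as 0 dB SPL, and every tenfold increase in pressure adds 20 dB.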
-Airflow as it is emitted from the nasal cavity -Nasalance
Measuring factors contributing to nasality:
The amplitude of a sound wave is typically measured using the RMS or Root Mean Square method. This has the effect of:
Measuring the equivalent steady (static) pressure that would deliver the same power as the fluctuating wave
Elastic Medium
Medium must be able to keep the pressure disturbance going beyond the initial point of change
Stops (/p/, /b/, /t/, /d/, /k/, /g/)
Place: bilabial, alveolar, and velar Voicing: voiced and voiceless Manner: complete blockage of airflow, rapid pressure change
/h/
Place: glottal Manner: fricative Voice: voiceless Aperiodic Turbulent Friction, Muscles: Posterior Cricoarytenoid (abducts the vocal folds), Manner- presence of aperiodic noise, Place- low, Voice- No voice cue
F3
Most important in distinguishing rounding of the lips. If American English has a round/unround pair at a certain height, this might be more important.
Nasal formants
Most intense, lowest frequency
Diphthongs
Move from one steady state to another. Movement from the characteristic formants of one pure vowel to another. (Resonance characteristics change during production.) Longer in duration than monophthongs/glides. Onglide/offglide: relatively steady state formants of the onglide, then transition.
Does resonance change from one nasal to the next?
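Not much. The walls of the nasal cavity (bone and cartilage) cannot be adjusted, so the nasal resonances stay fairly constant; the nasals differ mainly in their antiformants, which depend on where the oral cavity is closed.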
Partial Assimilation
No phonemic change occurs in the sound, only a phonetic change
/r/
No tongue tip contact with alveolar ridge, often retroflexed, often has lip rounding
Is there pressure build up for vowels?
No, relatively open vocal tract positions so there is no pressure build up around a closure (no distinctive elements of noise, like consonants). Air just goes through.
Can consonants stand by themselves and be meaningful?
No, they nearly never can stand by themselves and be meaningful.
Aspiration
Noise generated by turbulence as air moves through the glottis during the time in which the folds are starting to close for the following voiced sound
What is a burst?
Noise produced at the place of articulation (when the closure is released)
Given that formant frequency depends upon vocal tract length and resonating cavity size, no absolute values for F1, F2, and F3 exist
Normative data on formant frequencies
Suprasegmentals
Not defined by individual speech sounds. Duration: juncture and length of phonemes
What is juncture?
Spacing in speech (we don't speak with a lot of breaks)/ the relationship between sounds within words or between words within continuous speech
Turbulent airflow
Obstacle disturbs the flow of air.
Acoustic effect of introducing an obstruction into a turbulent airstream, as when we produce fricatives?
The obstruction increases the degree of turbulence and the intensity of the sound
Nasals
Occluded oral cavity, the nasal cavity has very different acoustic features -Built up pressure behind the point of occlusion, the sound and pressure will back up and be diverted through the nasal cavity -Split airflow and sound= Anti-Resonances --suppress harmonics -Total constriction in oral cavity -Can serve as the syllable nucleus (e.g. "button")
Formant Transitions
Occur from a voiced sound preceding a stop or from a voiced sound post-stop, or both
Tube Resonance/Resonant Frequency of a Tube Characteristics
Occurs in any tube/pipe that contains air. A tube's resonant frequency is related to its physical characteristics: tube length, tube geometry, status of the tube ends.
frication
On a spectrogram, ______ looks like a wide band of energy distributed over a broad range of frequencies.
weak
On spectrograms, antiformants look like extremely ___ intensity formants
filtering
Once turbulence is generated, the noise energy is subjected to ______ by the vocal tract
Consonants
One or more areas of constriction of vocal tract Source of sound • Voicing • Turbulent airflow • Or both • Less energy, greater meaning Consonants differ greatly • Degree of constriction • Presence or absence of noise • Nasality
Consonant production
One or more areas of the vocal tract are narrowed by some degree of constriction. All consonants have a manner, place, and voice.
An acoustic tube model of the vowel /a/ has formant frequencies determined by the resonances of what types of tubes?
One or more tube(s) open only at one end.
Tubes Closed at Both Ends
One-half wavelength fits, as do a full wavelength and one and a half wavelengths. Formula: Fn = n × c / (2L), where n = 1, 2, 3..., c = speed of sound, L = tube length
The velopharyngeal port must be ___ during the production of nasal consonants.
Open
Open Tube
Open at both ends. The end pressures are atmospheric/ambient pressure. Half a wave fits inside the tube = Half-Wave Resonator. Areas of greatest pressure are somewhere in the tube, never at the ends. Node will always be at the ends. Starts with half a wave (1/2) for F0. Then increases by 1/2 each resonant frequency: 2nd resonance = one whole wave (2/2), 3rd resonance = 1.5 waves (3/2) and so on.
Closed Tube
Open at one end and closed at other. Closed end contains antinode/greatest pressure. Pressure at open end is ambient pressure. 1/4 wave fits inside the tube = Quarter-Wave Resonator. Starts with 1/4 wave for F0, then increases by 2/4 wave each time: 2nd resonance = 3/4 wave, 3rd resonance = 5/4 wave and so on. F0 has wavelength 4x the length of the tube. Higher resonance frequencies are odd number multiples of F0; harmonics for the resonator are odd number multiples.
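The two resonator formulas can be compared side by side. This is a sketch assuming the 34,000 cm/sec speed of sound used elsewhere in these notes and a uniform tube; the function names are hypothetical.

```python
SPEED_OF_SOUND_CM_S = 34_000  # value used in these notes

def half_wave_resonances(length_cm, count=3, c=SPEED_OF_SOUND_CM_S):
    """Tube open at both ends (or closed at both ends): Fn = n * c / (2L)."""
    return [n * c / (2 * length_cm) for n in range(1, count + 1)]

def quarter_wave_resonances(length_cm, count=3, c=SPEED_OF_SOUND_CM_S):
    """Tube closed at one end, open at the other: Fn = (2n - 1) * c / (4L)."""
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, count + 1)]

# A 17.5 cm neutral vocal tract (closed at the glottis, open at the lips)
print([round(f) for f in quarter_wave_resonances(17.5)])  # [486, 1457, 2429]
print([round(f) for f in half_wave_resonances(17.5)])     # [971, 1943, 2914]
```

The quarter-wave resonances are odd multiples of the lowest one (1 : 3 : 5), while the half-wave resonances are all whole-number multiples (1 : 2 : 3). Shortening the tube raises every resonance, which is the "length rule" for formants.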
Vertical Phase Difference
The vocal folds open from inferior to superior (the lower edge opens before the upper edge)
Differences Between Vowels and Consonants
Open vs. closed vocal tract; aperiodic noise source for most consonants; consonants (generally) cannot serve as the syllable nucleus
PRTU
Order of the graphs for acoustic characteristics
/ð/
Place: interdental Manner: fricative Voice: voiced Periodic Laryngeal Source, Muscles: Superior Longitudinal, Manner- presence of aperiodic noise Place- low, Voice- Presence of Phonation
Open Loop Feedback
Output is preprogrammed, no feedback needed
lip rounding
Over-simplified categorization of place of articulation
Fricative place of articulation
Oversimplified categorization of place of articulation. Acoustic evidence for place of articulation: fricative noise spectrum, formant transitions
Source filter theory
P(f) = U(f) · T(f) · R(f), where P = spectrum of the sound pressure wave exiting the lips, U = glottal volume velocity, T = transfer function of the vocal tract, R = radiation characteristics at the lips
• P=spectrum of the sound pressure wave exiting the lips • U=glottal volume velocity • T= transfer function of the vocal tract • R=radiation characteristics at the lips
P(f)=U(f)*T(f)*R(f) means...
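Because the terms multiply, they add when expressed in decibels. The sketch below assumes a flat vocal-tract transfer function T(f) (no formants) so that only the source slope (about -12 dB per octave, per these notes) and the radiation boost (about +6 dB per octave) show; the function name and parameters are hypothetical.

```python
import math

def harmonic_levels_db(f0_hz, n_harmonics, source_slope=-12.0, radiation_slope=6.0):
    """Level of each harmonic at the lips, in dB relative to the first harmonic.

    Assumes T(f) is flat, so P = U * T * R reduces to adding the source
    and radiation slopes in dB.
    """
    levels = []
    for n in range(1, n_harmonics + 1):
        octaves_above_f0 = math.log2(n)  # harmonic n lies log2(n) octaves above F0
        source_db = source_slope * octaves_above_f0
        radiation_db = radiation_slope * octaves_above_f0
        levels.append((n * f0_hz, source_db + radiation_db))
    return levels

for freq, level in harmonic_levels_db(100, 4):
    print(freq, round(level, 1))
```

The net result is a fall of about 6 dB per octave; a real T(f) would add peaks at the formant frequencies on top of this slope.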
Relationship between frequency and period
P = 1/f (reciprocal relationship), e.g., a 200 Hz wave has a period of 1/200 = 0.005 seconds, and a 2000 Hz wave has a period of 1/2000 = 0.0005 seconds
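The reciprocal relationship works in both directions; this is a minimal sketch with hypothetical function names.

```python
def period_s(frequency_hz: float) -> float:
    """Period in seconds: P = 1 / f."""
    return 1.0 / frequency_hz

def frequency_hz(period_seconds: float) -> float:
    """Frequency in Hz: f = 1 / P."""
    return 1.0 / period_seconds

print(period_s(200))      # 0.005
print(period_s(2000))     # 0.0005
print(frequency_hz(0.005))
```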
Voiced consonants have...
PERIODIC laryngeal source
Lexical Stress
Pattern of stress within words
In the plot shown, the solid line labeled "A" indicates the _______ pressure
Peak
Closed Loop Feedback
Performance of system is fed back in for check
3rd Acoustic Cue of Stops Voice Onset Time
Period of time between release of stop and the onset of voicing
1st Acoustic Cue of stop: Silence
Period of silence from the stoppage of airflow (closure) until the stop is released
Semivowels
Periodic sound wave; open voicing and semi-constricted -Subdivisions: glides and liquids -Characteristics: 1. Constriction interval <100 msec. or 40-50 msec. for less carefully articulated speech 2. Initial articulatory position (can be in front or have a vowel in front of it) 3. Rapid transition to vowel (60-100 msec.) similar to a diphthong -Perception: 1. Formant Transitions (F2, sometimes F3) 2. Location of semi-constriction
/b/
Place: bilabial Manner: stop Voice: voiced Periodic Laryngeal Source, Muscles: Orbicularis Oris, Levator Palatini, Manner-silent or near silent closure interval; transient release burst, Place- F2 transitions, frequency of most intense portion of release burst, Voice- +phonation, -phonation, presence or absence of aspiration, VOT/F1 onset, closure duration (medial position), preceding vowel duration (final position)
/p/
Place: bilabial Manner: stop Voice: voiceless Transient Aperiodic Laryngeal Source, Muscles: Orbicularis Oris, Levator Palatini, Manner- silent or near silent closure interval; transient release burst, Place- F2 transitions, frequency of most intense portion of release burst, Voice- +phonation, -phonation, presence or absence of aspiration, VOT/F1 onset, closure duration (medial position), preceding vowel duration (final position)
Source Spectrum
Phonation from the Larynx
The _______ is minimum amount of pressure needed to sustain vocal fold vibration
Phonation threshold pressure (PTP)
What are the 4 different ways to measure fundamental frequency?
Pitch Contour Analysis Voice Report NarrowBand Spectrogram Waveform
Syllable stress as a result of these 3 things
Pitch (F0), loudness (intensity), and length (duration)
Cognates
Place and manner of articulation are the same. Voicing varies.
/l/
Place: alveolar Manner: liquid Voice: voiced Periodic Laryngeal Source, Muscles: Levator Palatini, Palatoglossus, Rapid formant changes; damping
/n/
Place: alveolar Manner: nasal Voice: voiced Periodic Laryngeal Source, Muscles: Levator Palatini, Palatoglossus, Place- high average duration
/d/
Place: alveolar Manner: stop Voice: Voiced Periodic Laryngeal Source, Muscles: Superior Longitudinal Muscle, Levator Palatini, Manner- silent or near silent closure interval; transient release burst, Place- F2 transitions, frequency of most intense portion of release burst, Voice +phonation, -phonation, presence or absence of aspiration, VOT/F1 onset, closure duration (medial position), preceding vowel duration (final position)
/t/
Place: alveolar Manner: stop Voice: voiceless Aperiodic Laryngeal Source, Muscles: Superior Longitudinal Muscle, Vertical muscle fibers, Levator Palatini, Manner- silent or near silent closure interval; transient release burst, Place- F2 transitions, frequency of most intense portion of release burst, Voice- +phonation, -phonation, presence or absence of aspiration, VOT/F1 onset, closure duration (medial position), preceding vowel duration (final position)
/m/
Place: bilabial Manner: nasal Voice: voiced Periodic Laryngeal Source, Muscles: Levator Palatini, Palatoglossus, Anti-resonance; nasal murmur;, F2 transitions, lowest and highest
Formant Transitions
Movement of the formants from the onglide to the offglide
Descriptors of Consonants
Presence or absence of voicing Place of Articulation Manner of Articulation
The Pascal (Pa) is a unit of
Pressure
Noise Burst
Pressure release escaping the point of constriction 1. Manner Cues: -Duration: 5-40 msec. (average 10 msec.) -Rise Time: 10 msec. 2. Place Cues: Primary Spectral Energy (varies with vowel context) -Frequencies determined by where they are produced in the oral cavity
What are harmonics?
Frequency components produced by the source at whole-number multiples of the fundamental frequency
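Listing the harmonics below a fixed frequency also shows why a low-F0 voice is richer in harmonics: more multiples of F0 fit in the same range. The F0 values here are illustrative assumptions, and the function name is hypothetical.

```python
def harmonics(f0_hz: int, max_freq_hz: int) -> list[int]:
    """All harmonics (whole-number multiples of F0) up to max_freq_hz."""
    return [n * f0_hz for n in range(1, max_freq_hz // f0_hz + 1)]

low_voice = harmonics(100, 1000)   # e.g., a typical adult male F0
high_voice = harmonics(300, 1000)  # e.g., a typical child F0
print(low_voice)        # [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
print(len(high_voice))  # 3
```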
Tense Vowels
Produced with greater muscle contraction and with tongue positions closer to the periphery of the vowel quadrilateral
Vowel Production
Produced with more open vocal tract
Monophthongs
Produced with relatively constant tongue positions. Steady state vowels. Wide, dark stripes. Concentrated intense energy around harmonics that are amplified near resonances. Corner vowels: tongue as far from the neutral position as possible.
Sound Spectrography
Provides a spectral picture of the acoustic wave
Linguistic Stress
Putting more emphasis (meaning) on certain portions of the utterance than others -Needs more physiological emphasis (more power) -Acoustic Features of stressed syllables/words: 1. Higher F0 2. Greater Amplitude 3. Longer Duration -Levels of Stress: syllable level or sentence/phrase level -Pauses, Duration and Utterance Junctures: --Occlusion duration signals syllable, word or utterance juncture
What is the speech perception theory that is related to it all but may not be a theory of speech perception per se?
Quantal Theory, by Kenneth Stevens. He looked at the languages of the world and what they have in common. Languages appear to use acoustic features associated with certain critical regions where we form consonants (this hypothesis apparently does not apply as effectively to vowels). Certain articulatory changes produce little acoustic change, whereas other minimal adjustments have major acoustic consequences. The quantal theory predicts that the languages of the world have largely formed around these so-called "quantal changes" in the acoustic speech signal. The theory doesn't carefully separate acoustic and perceptual features; it tends to focus on acoustic changes. Our hearing system seems to work better in the frequency ranges where these quantal changes occur, which provides evidence for the theory.
Digital Recording Problems
Sampling rate: aliasing, reduced bandwidth. Quantization: peak clipping, reduced dynamic range, quantization noise.
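Two of these problems can be estimated with simple arithmetic. The sketch assumes ideal sampling with no anti-aliasing filter and linear PCM quantization; the function names are hypothetical.

```python
import math

def alias_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a sampled sinusoid actually appears (folded into 0..fs/2)."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

def dynamic_range_db(bits: int) -> float:
    """Approximate dynamic range of linear PCM: 20*log10(2**bits), about 6 dB per bit."""
    return 20 * math.log10(2 ** bits)

fs = 44_100
print(fs / 2)                             # Nyquist frequency: 22050.0
print(alias_frequency(30_000, fs))        # a 30 kHz tone shows up at 14100 Hz
print(round(dynamic_range_db(16), 1))     # 96.3
print(round(dynamic_range_db(8), 1))      # 48.2
```

Any component above half the sampling rate folds back into the recorded band (aliasing), and each bit removed costs roughly 6 dB of dynamic range.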
subglottal pressure
air pressure immediately below the glottis
Glides
Quick tongue movements, short duration, characterized by formant transitions
How can you identify vowels on a spectrogram?
RATIO of formants
In the plot shown, the dashed line labeled "B" indicates the _________ pressure.
RMS
Transient Aperiodic Waves
Source: rapid pressure change; some consonants, such as /p/ and /b/ (stops)
Source spectrum and resonant spectrum are NOT related
Reason formants DO NOT change on spectrogram when the fundamental frequency changes
consonant clusters
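Languages differ: English allows clusters such as word-initial /st/ and /spr/, whereas Spanish does not allow word-initial /s/ + consonant clusters (though it does have clusters such as /tr/ and /pl/)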
equation under standing wave patterns
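Recall that λ = c/f, so f = c/λ. For a tube closed at one end (quarter-wave resonator), λ = 4L/(2n − 1), giving Fn = (2n − 1)c / 4L, where Fn = frequency of the nth formant, n = formant number, c = velocity of sound (34,000 cm/sec), L = vocal tract length (17.5 cm). For L = 17.5 cm this gives roughly 486, 1457, and 2429 Hz.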
Voice bar
Reflects the energy of the F0 of voicing. A dark band at the bottom of a spectrogram.
F2
Related to the length of the oral cavity. If the lips are rounded, the oral cavity is extended. F2 is higher when the oral cavity is shorter. Where tongue is in mouth is where the split is. Tongue advancement: higher frequency when tongue is front and lower frequency when tongue is back.
F1
Related to the volume of the pharyngeal cavity as well as how tightly the vocal tract is constricted. Tongue height: lower frequency when tongue is high and higher frequency when tongue is low.
Transglottal pressure
Relative difference between the pressure above and below the vocal folds ~The "driving" pressure ~needed to keep VF in Vibration
S/Z Ratio Reliability
Reliability is questionable because the task measures how the voice is elicited, not the quality of the voicing
Output Function
Represents the sound as it leaves the lips. The original glottal spectrum has been filtered (the transfer function applied), and the radiation characteristic at the lips adds about +6 dB per octave, so the output spectrum falls about 6 dB per octave instead of the source's 12 dB per octave. Formant frequencies stay the same, but the amplitudes of the frequency components differ because the filter attenuates some and emphasizes others. In short: how the sound generated by the vocal folds is modified by the resonances of the vocal tract.
Why are fricatives thought to be the most precisely articulated sounds, and the most often distorted in disorders like dysarthria?
They require a very small, precisely controlled degree of constriction to be properly articulated, which may demand a finer degree of motor control
Silent Gap/stop gap
Silence prior to release. Voiceless stop- initial position=cannot be seen on spectrogram. Other positions=visible as a blank space between the preceding sound and stop. Voiced stop-voice bar is sometimes apparent.
Graphical representation of the frequency and intensity of the sound pressure wave as a function of time
Spectrogram
What is one obvious practical problem with the vocal tract normalization theory?
Research has demonstrated that we can hear vowels correctly even if a speaker has not used any point vowels up to that point. Of course, it may still be possible that the so-called speaker normalization process draws on parts of speech other than the point vowels, for example coughing, throat clearing, or consonants; the physical appearance of a speaker could also somehow be a factor.
What Determines Damping?
Resistance!
What are formants?
Resonances of the vocal tract, seen as the peaks on a spectrum. The lowest-frequency peak = F1, the next peak up = F2, the third = F3
Which anatomical structure do we ascribe formant frequency?
Resonating cavities in the vocal tract
Source= vocal folds and vocal tract, what is: resonator, sound, manner and examples
Resonator= vocal tract, sound= mixed periodic and aperiodic, manner= voiced stops, voiced fricatives, and voiced affricate, examples= /b/ /g/ /z/ /v/
Source= Vocal folds what is: Resonator, sound, manner, and examples
Resonator= vocal tract, sound= periodic, manner= vowel, diphthongs, semivowels, nasals, examples= /i/ /u/ /ai/ /ou/ /w/ /j/ /m/ /n/
Source= vocal tract, what is: resonator, sound, manner, and examples
Resonators= vocal tract, sound= aperiodic, manner= stops, fricatives, affricates, examples= /p/, /s/, /k/, /f/
Falling Intonation
Results from decreased cricothyroid activity; seen as product of running speech
Rising Intonation
Results from increased vocal fold tension, which results from increased cricothyroid muscle activity
In general, the male glottal waveform:
Results in a spectrum with more high-frequency energy than that of the female
What involves numerous accented or stressed portions that "occur with some regularity, regardless of tempo (fast or slow) or tempo changes within the pattern (accelerate, retard)"?
Rhythm
During normal inhalation your...(anatomy)
Rib cage goes up and out while your abdomen pushes out
RMS
Root mean square -Most meaningful measure of sound amplitude -for a sine wave it equals 0.707 of the peak value -for any other wave it must be calculated, either sample by sample or symbolically
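A sample-by-sample RMS calculation, sketched in Python: square each sample, average, then take the square root. The sampled sine wave is an illustrative assumption.

```python
import math

def rms(samples: list[float]) -> float:
    """Root mean square: sqrt of the mean of the squared samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

# One full cycle of a sine wave with peak amplitude 1.0, sampled 1000 times
sine = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
print(round(rms(sine), 3))  # 0.707

# A square wave with the same peak has an RMS equal to its peak
print(rms([1.0, -1.0, 1.0, -1.0]))  # 1.0
```

The 0.707 factor only holds for a sine wave; other waveforms need the full calculation, as the square-wave case shows.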
Fricative at 4,000 Hz?
S
When given a phoneme be able to identify the acoustic cues for manner of articulation
SEE ACOUSTIC CUES IN CONSONANT CHART
When given a phoneme be able to identify the acoustic cues for place of articulation
SEE ACOUSTIC CUES IN CONSONANT CHART
When given a phoneme be able to identify the acoustic cues for voicing
SEE ACOUSTIC CUES IN CONSONANT CHART
Fricative at 2,000 Hz?
SH
Cul-De-Sac Resonance
Same as nasal resonance but in oral cavity
Spatial Target Models
Say that a speaker can still produce a sound accurately even in the face of disruption
Glides
Semivowels. Different from diphthongs: more rapid Formant transitions than diphthongs. No steady-state portion. Transitions are very short and really just look like movement from one sound to another. Lips typically rounded for the labiovelars, lengthens vocal tract, increases volume, which lowers all formants. F1 for both glides starts very low. /j/ is like /i/ and /w/ is like /u/.
Uniform/Symmetric Resonator
Sharply/narrowly tuned transmits (responds to) a narrow range of frequencies. Narrowly tuned resonator responds slowly to driving frequencies (amplitude grows slowly until it reaches its greatest levels). Sharply tuned resonator is lightly damped (once forced into vibration, takes a long time to fade away).
Resonance Curve/Filter Curve/Transfer Function
Shows the response of the resonator at different frequencies. Response is greatest at or near the object's natural frequency.
The above plot:
Shows three periods of the wave
Acoustic Features of Stops
Silent Gap Noise burst at moment of release Rise time and fall time First formant frequency changes as a result of articulation and coarticulation
Fricatives
Sound Source: Spectral Characteristics of the airflow past the point of constriction -Manner of Production: produced through aperiodic random airflow -Acoustic Cues: 1. Duration: average 130 msec. 2. Rise Time: approx. 76 msec.
Nasal
Sounds produced with an open velopharyngeal port, hence nasal emission of the airstream.
Radiant Spectrum
The lips radiate higher frequencies more efficiently, so the radiated sound gains energy toward the high frequencies (about 6 dB per octave)
Glide
Sounds whose production requires the tongue to move quickly from one relatively open position to another in the vocal tract; one of the manners of consonant articulation. EX: /w/ or /j/
Transient aperiodic waves
Source: rapid pressure change Some consonants, such as /p/ and /b/ (stops)
Continuous aperiodic waves
Source: turbulent flow through a supraglottal constriction (noise) Many consonants, such as /s/ and /f/ (fricatives)
Nearly Periodic Complex Waves
Source: vocal fold vibration All vowels, many consonants
Sound Propagation
Sound propagates through air as a longitudinal wave. The speed of sound is determined by the properties of the air, not by the frequency or amplitude of the sound.
R-Colored Vowels
Some of the /r/ sound is heard in the following vowels
tensed vowels
Some vowels are longer or shorter than other vowels depending on context, but without context, some vowels are intrinsically longer, which ones?
If the vocal folds close slower, slope: energy:
Steeper spectral slope (harmonics fall off faster) and less energy, especially in the high frequencies, so the voice has lower amplitude and cannot be as loud
The places of articulation of the nasal consonants are identical to those of the...
Stop Consonants
Supraglottal noise sources include...
Stop bursts and frication
What articulatory feature is cued by differences in the VOT (special parameter)?
Stop voicing Long VOT = [ -v ] Short VOT = [ +v ]
Consonants
Stops, fricatives, nasals, affricates, liquids, glides
Suprasegmental Features of Speech
Stress, Intonation, and Duration
Articulators
Structures used to produce sounds of speech
Continuous Aperiodic Waves
Source: turbulent flow through a supraglottal constriction (noise); many consonants (/s/, /f/)
Glottal (Source) Spectrum
Successive harmonics lose amplitude at rate of 12 dB per octave. At the level of the larynx. Glottis/vocal folds --> source
true
T/F at any one instant in time, the vocal tract shows adjustments for more than one sound
false (transitions are not important for sibilant perception, but they are important for nonsibilants)
T/F formant transition location depend on the articulation, so the transitions are important perceptually for sibilants
true
T/F there is no fixed transition pattern for perception
What is an appropriate response to the following question? WHICH one of those green books is yours?
THIS is my green book
T or F: A phonemic change results from a complete assimilation
TRUE
T or F: Affricates are characterized by the acoustic features of both stops and fricatives...
TRUE
T or F: All fricatives are characterized by the use of an aperiodic source of sound
TRUE
T or F: Consonants can be produced using periodic sources of sound, aperiodic sources of sound, or a combination of both types of sources...
TRUE
T or F: The glide consonants of English are articulated with rapid articulatory movements that cause changes in formant frequencies.
TRUE
T or F: The second formant transition to or from neighboring vowels provides information about stop place of articulation...
TRUE
Glottal Volume Velocity
It takes a while for the vocal folds to open; once they are open, air flows quickly through before they snap shut. ~Airflow through the glottis
Tense/Lax Duration
Tense- longer in duration Lax- shorter in duration
Where has it been suggested that the feedback system for speech is housed?
The CNS
Density
The amount of mass per unit volume
How does tongue position interact with F1 and F2?
Using [i], [ɑ], and [u] for reference, approximate positions of F1 and F2 can be estimated for other vowels.
The open end (lips), in terms of particle velocity; in pressure terms the lips are a node (ambient pressure)
The antinode is what end of the tube?
Antinode
The point in a standing wave where the amplitude of vibration is largest. A pressure antinode is where pressure varies the most (maximum pressure).
Node
The point in a standing wave where the amplitude of vibration is smallest. A pressure node is where pressure varies the least (it stays near ambient).
The narrow-band spectrogram below shows the following information regarding the utterance:
The articulators were changing position and the fundamental frequency was falling.
What is wavelength?
The distance a sound wave travels during a full cycle
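Wavelength follows from λ = c/f. This sketch assumes the 34,000 cm/sec speed of sound used elsewhere in these notes; the function name is hypothetical.

```python
SPEED_OF_SOUND_CM_S = 34_000  # value used in these notes

def wavelength_cm(frequency_hz: float, c: float = SPEED_OF_SOUND_CM_S) -> float:
    """Distance (cm) a sound wave travels during one full cycle: lambda = c / f."""
    return c / frequency_hz

print(wavelength_cm(1000))  # 34.0
print(wavelength_cm(500))   # 68.0
```

Higher frequencies have shorter wavelengths, which is why a longer tube (holding a longer wavelength) resonates at lower frequencies.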
In regards to limitations of the simple target theory, explain dynamic nature of vowel sounds.
The dynamic nature of vowel sounds in context: the beginnings and endings of vowels show transitions typical of the preceding and following consonants, yet these transitions do not throw off perception. In other words, what moment during a vowel's production defines its identity? Are we responding to the center frequencies? Where would we measure formant frequencies?
2000
The first antiformant for /n/ occurs at around ______ Hz
3000
The first antiformant for /ŋ/ occurs at around _______ Hz
Cutoff Frequencies
The frequency at which a resonant system's response has dropped to half power (about 3 dB below the peak); beyond it, the system is considered effectively unresponsive.
How does aspiration happen?
The glottis is open at the moment of stop release, allowing the breath stream to flow freely into the upper vocal tract without phonation
Voice Onset Time (VOT)
The interval of time between the release of a stop consonant and the onset of voicing
The "length rule"
The longer the length of the tract, the lower all the average formants will be
What is fundamental frequency (f0)?
The lowest frequency contained in a complex periodic sound
spectral roll-off
The lowest harmonics have the highest amplitudes, while the higher harmonics have progressively lower amplitudes.
The first formant of a vowel is always created by:
The lowest resonance of any tube or combination of tubes used to model the vocal tract
What is acoustic reflex?
The middle ear provides protection in the form of the acoustic reflex: the stapedius muscle contracts and stiffens the movement of the stapes (separately, the eustachian tube equalizes pressure). The reflex reduces transmission by about 10 dB below 1000 Hz when the level reaches about 85-90 dB.
What are the spectrographic differences between front and back vowels?
The more front the vowel, the higher the second formant: front vowels have a higher F2, back vowels a lower F2. The closer F1 and F2 are to each other, the more back the vowel is.
The closed end (glottis), in terms of particle velocity; in pressure terms the glottis is an antinode (greatest pressure)
The node is what end of the tube?
one quarter
Using the ____ ________ wavelength relationship, we can determine the lowest resonance for the /s/
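A sketch of the quarter-wavelength estimate for a fricative, treating the cavity in front of the constriction as a tube closed at the constriction and open at the lips. The front-cavity lengths below are hypothetical round numbers chosen for illustration, not measured values.

```python
SPEED_OF_SOUND_CM_S = 34_000  # value used in these notes

def quarter_wave_lowest_hz(front_cavity_cm: float) -> float:
    """Lowest resonance of a quarter-wave resonator: F = c / (4L)."""
    return SPEED_OF_SOUND_CM_S / (4 * front_cavity_cm)

# Hypothetical front-cavity lengths: short for /s/, longer for /ʃ/
# (a more posterior constriction plus lip rounding lengthens the front cavity)
print(round(quarter_wave_lowest_hz(2.1)))  # about 4048 Hz, near the /s/ peak
print(round(quarter_wave_lowest_hz(4.0)))  # 2125 Hz, near the /ʃ/ peak
```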
What is Rhythm?
The pattern of stress on a series of syllables / speaking w/o pauses in speech, occurring with some regularity
The place of fricatives
The presence of a dominant, relatively high-frequency spectral peak distinguishes sibilants from nonsibilants. Fricatives are differentiated by the use of single or double constrictions, and their places of articulation differ between anterior and posterior. In intensity, the sibilants are louder than the nonsibilants.
Suprasegmental Features of Speech
The prosody/melodies of natural speech -Segments: individual sound categories of our language (phonemes) -Combine phonemes through blending and transitions to create meaningful utterances 1. Linguistic information and structure: -Question vs. Statement (intonation) -Noun vs. Verb (stress) 2. Psychosocial: intent and affect 3. Speech Rhythm: based on variations in stress locations and pauses
Vocal fundamental frequency?
The rate at which the vocal folds vibrate
Transducers
Used to measure and move energy. Transforms signals from one form of energy to another. -Microphone: A transducer that changes a sound signal into an electrical signal. (Or, converts air pressure into voltage variation) -Loudspeaker: A transducer that changes an electrical signal into a sound signal
Relationship of formants and vowels
The relationship among formants helps you distinguish one vowel from another. High front vowel /i/ has the most space between F1 and F2 and low back vowel /a/ has the least amount of space between F1 and F2.
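The F2 - F1 spacing can be computed directly. The formant values below are approximate adult male averages as commonly cited from Peterson and Barney (1952); treat them as illustrative, since individual speakers vary widely.

```python
# Approximate adult male F1/F2 averages (Hz); "a" stands for /a/ (IPA [ɑ])
FORMANTS_HZ = {
    "i": (270, 2290),
    "a": (730, 1090),
    "u": (300, 870),
}

def f1_f2_spacing(formants: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Distance between F2 and F1 for each vowel."""
    return {vowel: f2 - f1 for vowel, (f1, f2) in formants.items()}

spacing = f1_f2_spacing(FORMANTS_HZ)
print(spacing)  # {'i': 2020, 'a': 360, 'u': 570}
print(max(spacing, key=spacing.get), min(spacing, key=spacing.get))  # i a
```

The largest spacing belongs to the high front vowel /i/ and the smallest to the low back vowel /a/, matching the pattern described above.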
How are Formants and Tract Length related?
The resonances of a tube are related to the length of that tube. Longer tubes respond to lower frequencies; shorter tubes respond to higher frequencies.
Formants
The resonant frequencies of the vocal tract
Formants
The resonant frequencies of the vocal tract. The first 2-3 are most important and occur below 5000Hz. They're related to the volumes of the oral and pharyngeal spaces. Containers with large volumes respond to lower driving frequencies & vice versa.
Hooke's Law
The restoring force is proportional to the magnitude of displacement/distance
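Hooke's law, combined with mass, sets the natural frequency of a simple mass-spring system, which is the same mass-versus-stiffness tradeoff behind a resonator's center frequency. A minimal sketch with hypothetical function names and illustrative values for the stiffness k and mass m.

```python
import math

def restoring_force(k: float, x: float) -> float:
    """Hooke's law: F = -k * x (the force opposes the displacement)."""
    return -k * x

def natural_frequency_hz(k: float, m: float) -> float:
    """Mass-spring natural frequency: f = (1 / (2*pi)) * sqrt(k / m)."""
    return math.sqrt(k / m) / (2 * math.pi)

print(restoring_force(10.0, 0.5))               # -5.0
print(round(natural_frequency_hz(100.0, 1.0), 2))  # 1.59
# Raising stiffness raises the frequency; adding mass lowers it
print(natural_frequency_hz(400.0, 1.0) > natural_frequency_hz(100.0, 1.0))  # True
print(natural_frequency_hz(100.0, 4.0) < natural_frequency_hz(100.0, 1.0))  # True
```

Quadrupling the stiffness (or quartering the mass) doubles the natural frequency, because frequency depends on the square root of k/m.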
Nasal Murmur
The sound generated while the oral cavity is closed at the place of articulation and the nasal sound is being radiated exclusively from the nostrils. Happens during entire sound/spectrogram-entire area of nasal energy.
What does a spectrogram display?
The spectrum of frequencies in a sound or other signal as they vary with time
Fall Time
The speed with which the acoustic signal falls to minimum intensity for a syllable-final stop
Rise Time
The speed at which the maximum intensity of the acoustic signal is achieved for a syllable-initial stop
Utterance Level Declination in Fo
The tendency for Fo to decrease over the course of an utterance 1. Statement = Falling F0 (can have different pitch contours based on emotionality and/or intent) 2. Question = Falling + Rising Fo (doesn't always have to be rising, can override the intonation pattern) 3. Within utterance of Fo Contour Variations -Individual syllables may receive a slight upward inflection, whereas the overall pitch of the utterance decline or remain relatively flat -Typically associated with the length and complexity of the utterance as well as intent
Elasticity
The tendency of a volume of mass to return to its original volume following compression
Voice Onset Time
The time between the release of articulatory blockage to the beginning of vocal fold vibration of the following vowel; coordination between laryngeal and articulatory systems
Voice Onset Time
The time between the release of the plosive (burst) and the onset of voicing for the following vowel -Primary acoustic cue for perception of voicing 1. Voiceless: 40-80 msec. 2. Voiced: -10 to 20 msec. -Influenced by rate and phonetic context -Development: perception at birth, production 15-20 months
Frication
The turbulent noise of a sound: the hissing element of a speech sound such as a fricative or affricate. Sibilant frication noise is stronger than that of nonsibilants.
is not
There (IS or IS NOT?) a one-to-one relationship between the acoustic features and the perceived consonant?
1. tongue height, 2. tongue advancement
There are two ways to describe vocal tract articulatory posture by:
Acoustic-Auditory Target Models
There can be variations on the articulation of sounds, like vowel formants, but the listener can still recognize the sound accurately
In regards to limitations of the simple target theory, explain interpersonal variability.
There is interpersonal variability in vocal tract size and shape (male/female/children's vocal tracts are physically different which leads to different formant frequencies for vowels). Often all formant frequencies are shifted because of a vocal tract size difference; also, proportional differences are possible, depending on the relative size of the oral cavity and the length of the pharyngeal cavity.
How does pitch relate to frequency?
They are related but not the same: pitch is the perceived (psychological) correlate of frequency, mainly of F0.
Voicing Cues for Liquids and Glides
They both have f0 (because both are voiced)
F1 & F2 Plots for men, women, and children
Formant frequencies get higher from men to women to children (shorter vocal tracts), so the plotted vowel space shifts toward higher frequencies.
Are semi-vowels considered consonants or vowels?
They have some aspect of vowels, but they are considered consonants.
While producing a vowel, if a speaker raises his or her fundamental frequency, what happens to the formant frequencies?
They stay the same.
If two spectra with equally-spaced harmonics were measured from two different vowels, and the first spectrum tilted down less than the second one, which one would have the higher pitch?
They would have the same pitch: pitch follows the harmonic spacing (F0), not the spectral tilt.
What is an appropriate response to the following question? Is that your RED book?
This is my GREEN book
o Nearly periodic complex waves o Continuous aperiodic waves o Transient aperiodic waves
Three sources of speech sounds
Waveform axis
Time (x) by Amplitude (y)
Voice Onset Time
Time between the release burst and the start of vocal fold vibration for the vowel following a stop. 4 possible VOT values: 1) Prevoicing (VOT lead): voicing begins before the release burst. 2) Simultaneous voicing: voicing onset and release burst occur at the same time (voiced stops). 3) Short-lag VOT: voicing onset occurs just after the release burst (voiced stops). 4) Long-lag VOT: voiceless stops; larger VOT.
Voice Onset Time
Time from release of stop closure (marked by the burst) to onset of voicing. Includes burst + frication + aspiration. Longer for voiceless stops.
Why is the dBA measurement used?
To take into account the fact that our hearing sensitivity differs across frequencies (the A-weighting de-emphasizes low frequencies, to which the ear is less sensitive)
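The dBA idea can be made concrete with the analytic A-weighting curve standardized for sound level meters (IEC 61672). The Python sketch below is only an illustration, not a meter implementation: it returns the gain in dB that the A filter applies at one frequency, using the standard's analytic constants; the +2.00 dB term normalizes the curve to roughly 0 dB at 1 kHz.

```python
import math

def a_weighting_db(f_hz: float) -> float:
    """Approximate A-weighting gain (dB) at frequency f_hz (analytic curve)."""
    f2 = f_hz ** 2
    r_a = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # Convert the amplitude ratio to dB and normalize to ~0 dB at 1 kHz
    return 20.0 * math.log10(r_a) + 2.00

for f in (100, 1000, 4000):
    print(f, round(a_weighting_db(f), 1))
```

Low frequencies are attenuated heavily (about -19 dB at 100 Hz), while the 1-4 kHz region is left near 0 dB or slightly boosted, which mirrors the ear's sensitivity.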
/g/
Tongue dorsum mid back (Formants), moves to high back and stops the air (Stop gap). Air pressure builds up and is released (Burst); air flows briefly through the narrow opening before the dorsum moves far from the palate (Frication); air continues to flow (Aspiration). Tongue dorsum moves to low back (Transition), and the vowel space resonates the source (Formants).
/n/
Tongue high and front (Coarticulation), tongue root into pharynx (Vowel formants). Tongue apex moves to the alveolar ridge, stopping oral airflow; tongue dorsum moves a bit forward (Transition). Velum lowers (VP port opens) and the nasal cavity resonates the source (Damping of signal). Tongue dorsum moves back and root moves into pharynx (Transition). Vowel space resonates source (Formants).
[i]
Tongue high -> low first formant; tongue front -> high second formant
/s/
Tongue high in front, low in back (Formants). Tongue apex rises to constrict air at alveolar ridge (Noise). Tongue apex lowers a little (Transition) to high front. Vowel space resonates source (Formants).
/ʧ/
Tongue high in front, low in back (Formants). Tongue apex rises to stop air at alveolar ridge (Stop gap). Tongue taps a bit releasing air (Multiple Bursts). Apex drops and constricts air (space resonates Frication Noise). Tongue apex barely moves down (not much Transition) to high front. Vowel space resonates source (Formants)
How do changes in the Vocal Tract affect vowels?
Tongue position changes for each vowel, and the filter characteristics change accordingly: changing the vocal tract shape changes the filter, which changes the output.
Tongue height and tongue advancement
Traditional description of vowel formation
Transfer Function
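Describes how the vocal tract (filter) transfers energy at each frequency, i.e., how much it passes or attenuates each frequency; its peaks are the resonances, so it is sometimes used as another name for resonance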
A transducer is a device that:
Transforms energy from one form to another: converts electrical vibration to air pressure vibration (e.g., a loudspeaker) and converts air pressure vibration to electrical vibration (e.g., a microphone).
• Source=rapid pressure change; some consonants such as /p/, /b/; stop consonants
Transient aperiodic waves source
Vocal Folds don't have to close completely to vibrate
True
True or false: vowels are the loudest sounds with longest durations
True. Vowels are the loudest sounds with the longest durations. Vowels function as the nuclei of syllables. Phonation energy comes out easily and efficiently for vowels.
Diameter of Tubes
Tubes may vary in diameter along their length - tube resonances (F1-F3) change - some frequencies are boosted or reduced
The resonance of a tube is based on one-quarter, three-quarter, five-quarter (and further odd-numbered) wavelength resonances for which types of tubes?
Tubes open at one end and closed at the other
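The odd-quarter-wavelength rule gives F_n = (2n - 1)c / 4L. A minimal Python sketch, assuming a speed of sound of 35,000 cm/s and a 17.5 cm uniform tube as a rough stand-in for an adult male vocal tract (both values are assumptions, not from the cards):

```python
def quarter_wave_resonances(length_cm: float, n_formants: int = 3,
                            c_cm_per_s: float = 35000.0) -> list[float]:
    """Resonances of a uniform tube closed at one end and open at the other.

    The tube length holds 1/4, 3/4, 5/4 ... of a wavelength, so only odd
    multiples of c / (4L) resonate: F_n = (2n - 1) * c / (4L).
    """
    return [(2 * n - 1) * c_cm_per_s / (4 * length_cm)
            for n in range(1, n_formants + 1)]

print(quarter_wave_resonances(17.5))  # [500.0, 1500.0, 2500.0]
print(quarter_wave_resonances(14.0))  # [625.0, 1875.0, 3125.0]
```

The shorter 14 cm tube raises every resonance, and lengthening the tube (as lip rounding does) lowers them, which is the tube-length relationship the later cards describe.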
3 phases of stop articulation
1. Occlusion (closure) 2. Intraoral pressure build-up 3. Release of pent-up air pressure in the oral cavity
Incoherent Sounds
Two or more independent sources: -Blenders, lawnmowers, window air conditioners, cars, etc... -Most common sound sources
Liquid
Two semivowels (/l/ and /r/) produced with relatively prominent sonority and with some degree of lateral emission of air; one of the consonant manners of articulation. Also called approximants or laterals.
Underlying Waveforms
Two sine waves extracted from complex wave
Combined tubes: /a/
Two tubes closed at one end and open at the other
o Air particles vibrate most effectively at the open end of the tube, and least effectively at the closed end of the tube
Uniform tube model- at what ends do air particles vibrate most and least effectively?
o The open end of the tube (lips) will have a velocity maximum (pressure minimum) o The closed end of the tube (glottis) will have a velocity minimum (pressure maximum)
Uniform tube model- what happens to velocity and pressure at the open end (lips) and closed end (glottis)
Decibel
Unit for comparing the intensity of two different sounds; not a unit of absolute measurement. -Often made absolute by comparing to a barely audible reference pressure of 0.00002 Pa (20 µPa) -This is dB SPL -Logarithmic scale, ratio measure.
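Because the decibel is a ratio, dB SPL is just the ratio to the 20 µPa reference. For sound pressure the factor is 20 (for intensity it would be 10, since intensity grows with pressure squared). A small Python sketch of both the relative and the referenced forms:

```python
import math

P_REF_PA = 20e-6  # 0.00002 Pa (20 µPa), the reference pressure for dB SPL

def db_spl(pressure_pa: float) -> float:
    """Sound pressure level re 20 µPa: 20 * log10(p / p_ref)."""
    return 20.0 * math.log10(pressure_pa / P_REF_PA)

def db_difference(p1_pa: float, p2_pa: float) -> float:
    """Relative level of two sounds in dB; no fixed reference needed."""
    return 20.0 * math.log10(p1_pa / p2_pa)

print(round(db_spl(0.1), 1))              # 74.0
print(round(db_difference(2.0, 1.0), 1))  # 6.0 (doubling pressure)
```

The 0.1 Pa case reproduces the "about 74 dB SPL" answer given later in this deck.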
What are some methods used to interfere with articulatory movements?
Use of: bite blocks to interfere with jaw movements, metal plates to interfere with labial closure, palatal prostheses to alter alveolar ridge
Place Cues for Glides
Usually the first two formants are sufficient to differentiate between /w/ and /j/ (/j/ has a high F2, similar to the one characteristic of /i/, while /w/ has a low F2 like /u/)
Mass Model
VF are compressed and spring back
Special name of period from stop release burst to the onset of voicing for a following vowel?
VOT - Voice onset time
What is simultaneous voicing?
VOT = Zero, Voiced Stops in English (B, D, and G)
If onset precedes stop release...
VOT is negative
If onset of phonation follows stop release...
VOT is positive
What is prevoicing VOT lead?
VOT measure is negative; vocal folds are vibrating before articulatory release
auditory cues
VOT, F1 cutback
Stops: VOT and Voicing
VOT: time interval from release of articulatory constriction to start of vocal fold vibration for the following vowel. VOT represents articulator-laryngeal coordination.
High, front, tense, unrounded
VPM for /i/
High, back, tense, rounded
VPM for /u/
Low, front, lax, unrounded
VPM for /æ/
Low, back, tense, unrounded
VPM for /ɑ/
Low-mid, front, lax, unrounded
VPM for /ɛ/
High, front, lax, unrounded
VPM for /ɪ/
High, back, lax, rounded
VPM for /ʊ/
Compact spectrum with a peak at about 1.5-2 kHz?
Velar
Which fricatives have a nearly flat spectrum?
Velar fricatives
Which stops have converging F2 (3500 Hz) and F3?
Velar stops (the F2-F3 convergence is the "velar pinch")
Nasals
Velum is lowered; nasal cavity is coupled to the vocal tract
Voicing Bars
Vertical bars on spectrogram that represent voicing
glottal pulses
Vertical lines (striations) in the voiced section of the spectrogram; each marks one glottal pulse, so counting them per second gives the fundamental frequency in cycles per second
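Counting glottal pulses per second amounts to taking the reciprocal of the average pulse-to-pulse period. A short Python sketch, where the pulse times are hypothetical values read off a spectrogram:

```python
def f0_from_pulse_times(pulse_times_s: list[float]) -> float:
    """Estimate F0 (Hz) from the times (s) of successive glottal pulses.

    Each vertical striation is one cycle, so F0 is 1 / mean period.
    """
    if len(pulse_times_s) < 2:
        raise ValueError("need at least two pulses")
    periods = [b - a for a, b in zip(pulse_times_s, pulse_times_s[1:])]
    mean_period = sum(periods) / len(periods)
    return 1.0 / mean_period

# Hypothetical striations 8 ms apart
print(round(f0_from_pulse_times([0.000, 0.008, 0.016, 0.024, 0.032])))  # 125
```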
How to produce glides
Vocal fold vibration while moving from one vocal tract position to another.
Source
Vibrating object, impulse, or other means of causing the initial condensation or rarefaction
Compare wide vs narrow-band
Wide: dark, easy to find bands of energy. Narrow: more faded, shows component harmonics, time resolution isn't good
Bandwidth
Width of dark frequency area on spectrogram
Nearly Periodic Complex Waves
Vocal fold vibration: all vowels, many consonants
Stop Gaps
Total or near‐total absence of energy. Voiceless stops: • vocal folds are abducted (open) • complete silence. Voiced stops (shorter gap): • Varying amount of silence (depending on transglottal flow) • Voicing is low amplitude due to damping • Seen as "voice bar" on spectrogram
To which anatomical structure do we ascribe fundamental frequency?
Vocal folds
Fricative /h/
Vocal folds are brought close together (but not closed) so that sufficient airflow generates aperiodic noise. Similar to the state of the glottis for whispering. Spectrum is highly dependent on the following vowel.
Glottal Area Function
Vocal folds are typically abducted (open) for about 60% of each cycle. A _____ shows the state of the glottis from cycle to cycle: opening, open, closing, closed. It does not show acoustic energy, only the opening of the glottis over time. The closing phase takes longer. Use it to see an atypical open-to-closed ratio. It can also give the frequency of vocal fold vibration.
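The open-to-closed ratio on a glottal area function is often summarized as an open quotient: the fraction of the cycle during which the glottis is open. A minimal Python sketch, assuming a hypothetical area trace sampled uniformly over whole cycles; the toy cycle below is made up and simply chosen to be open for 6 of 10 samples:

```python
def open_quotient(area: list[float], threshold: float = 0.0) -> float:
    """Fraction of samples in which the glottis is open (area > threshold)."""
    if not area:
        raise ValueError("empty area function")
    open_samples = sum(1 for a in area if a > threshold)
    return open_samples / len(area)

# One idealized cycle: opening, open, closing, then closed (hypothetical data)
cycle = [0.2, 0.6, 1.0, 1.0, 0.6, 0.2, 0.0, 0.0, 0.0, 0.0]
print(open_quotient(cycle))  # 0.6
```

The closed quotient is simply 1 minus this value; a threshold above zero can be used if the trace never reaches exactly zero area.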
How to produce nasals
Vocal folds vibrate and the velopharyngeal (VP) port must be open, with an obstruction in the oral cavity. Both oral and nasal cavities resonate, but sound exits only through the nasal cavity. /m/: obstruction at lips; /n/: obstruction at alveolar ridge; /ŋ/: obstruction at velum
Vowels
Vocal sound produced by relatively free passage of the airstream through the larynx and oral cavity; the nucleus of a syllable; voiced; greater energy but less meaning than consonants
Resonant Spectrum
Vocal tract configuration that allows for resonance. NOT SOUND
Filter Characteristics
Vocal tract filter is frequency dependent - Allows certain frequencies to pass through the filter with greater amplitude than other frequencies - Frequencies can get intensified
voiceless fricatives
Vocal tract source may be the sole sound source.
What is VOT?
Voice Onset Time - the delay in onset of voicing (relevant in aspiration)
Voice onset time in terms of consonants
Voiced: less than 20 ms; Voiceless: more than 25 ms. VOT is the time between the release of the consonant and the onset of voicing.
If VOT is less than 25 ms, the plosive is...
Voiced
Fricatives
Voiced: /ð/ /v/ /ʒ/ /z/ Voiceless: /θ/ /f/ /ʃ/ /s/ (each series ordered from weak to strong)
If VOT is 25 ms or greater, the plosive is...
Voiceless
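The VOT cards above can be combined into one small rule: compute VOT as voicing onset minus release burst, then compare it with a boundary. This Python sketch uses the 25 ms boundary from these cards as an adjustable parameter (other cards in this deck quote 20 or 30 ms); the times in the example are hypothetical:

```python
def vot_ms(release_burst_ms: float, voicing_onset_ms: float) -> float:
    """VOT = voicing onset - release burst (negative means prevoicing)."""
    return voicing_onset_ms - release_burst_ms

def classify_vot(vot: float, boundary_ms: float = 25.0) -> str:
    """Label an English stop by VOT; boundary_ms is an adjustable assumption."""
    if vot < 0:
        return "lead (prevoiced): heard as voiced"
    if vot == 0:
        return "simultaneous voicing: heard as voiced"
    if vot < boundary_ms:
        return "short lag: heard as voiced"
    return "long lag: heard as voiceless"

for burst, onset in [(100.0, 90.0), (100.0, 100.0), (100.0, 105.0), (100.0, 160.0)]:
    v = vot_ms(burst, onset)
    print(v, classify_vot(v))
```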
Which sounds have greater intraoral pressure?
Voiceless sounds have greater intraoral pressure than voiced sounds
The job of a microphone is to convert air pressure variation into:
Voltage variation
Dialect cues
Vowel Duration and diphthongization
Formant Transitions
Vowel production begins while the stop is being released. Superimposed on transient noise. About 50ms. Usually easier to detect for voiced stops because of the continuity of voicing energy between the stop and vowel. The slope depends on the place of articulation and vocal tract positioning for the following sound.
vowels form the approximate shape; tongue height and advancement are not completely accurate
Vowel quadrilateral (what forms its approximate shape, and what is not completely accurate?)
Diphthongs
Vowels that change resonance characteristics during production
What are the only sounds that can occur alone by themselves and still be meaningful (syllabic) in some contexts?
Vowels. Vowels in isolation can occur; hard to do with consonants.
Sources of energy
Vowels: vocal tract is relatively open; periodic; the sound source is the vocal folds. Consonants: voiceless (aperiodic): the sound source is the vocal tract; voiced (periodic and aperiodic): the vocal folds and the vocal tract.
poor
WITHIN categories of sounds, discrimination is good or poor?
Coordinating & Sequencing Articulator Movements
Waveform and kinematic data (x-ray microbeam) segmented into "units," demonstrating underlying kinematic activity of just a single point on the tongue, and how the movement does not closely correspond to the segmentation of the waveform
Waveform vs Wideband Spectrogram
Waveform: time and amplitude. Wideband spectrogram: time and frequency
How are vibrations in the air transmitted to the cochlea?
Waves in air --> oscillation of bones --> waves in fluid --> neural impulses to the brain
1. nearly periodic complex waves, 2. continuous aperiodic waves, 3. transient aperiodic waves
What are the three sources of speech sounds?
Wide bandwidth
Which bandwidth shows formant structure?
Narrow
Which bandwidth shows harmonic structure?
Where the different harmonics and formants are
What can a spectrogram show you?
-Provides detailed characteristics of the harmonics or formants -Provides spectral picture of the acoustic wave
What does a spectrogram do?
lower
Does lip rounding lower or raise all the formants? (It makes the vocal tract longer.)
Strain
What happens when the vocal folds are stretched: a change in tissue length in the direction of the applied force
The first
What harmonic has the lowest frequency?
ending point
What is an off glide
starting point
What is an onglide
The glottis (node)
What is the closed end of the tube
The lips (antinode)
What is the open end of the tube
Front to backness (is the tongue in front, middle, back of oral cavity) (horizontal on quadrilateral)
What is tongue advancement?
Whether vowels are open (low) or closed (high) (vertical on quadrilateral)
What is tongue height?
Wide bandwidth
With what bandwidth do you see a number of harmonics?
Narrow bandwidth
With what bandwidth do you see one harmonic?
Give a second example of partial assimilation.
When a front vowel follows /k/ (as in "key"), the tongue is usually farther forward on the palate than it would be when a back vowel follows /k/ (as in caught)
Anticipatory
When a sound is influenced by a following sound
Carry-Over
When a sound is influenced by a preceding sound
What's an example of carry-over assimilation?
When a voiceless sound following a voiceless sound remains voiceless ("cats"), but a voiceless sound preceded by a voiced sound becomes voiced ("dogs")
Lombard Effect
When in a noisy setting, people will speak louder so they can hear themselves and -as a result- can be heard by others. ~is used to treat Parkinson's speech
What is categorical perception?
When listeners are: - able to categorise the stimuli consistently - unable to discriminate between stimuli in the same category
Retroflex
When the tip of the tongue is curled up and back
Shunt/side-branch resonator
When the VP port is opened, the nasal cavities are acoustically coupled to the rest of the tract, and energy in the oral cavity is at a dead end: the oral cavity becomes this side branch, which contributes antiformants. The branch is longest for /m/ (first antiformant ~1000 Hz), shorter for /n/ (~2000 Hz), and shortest for /ŋ/ (~3000-5000 Hz).
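Why a longer side branch gives a lower antiformant can be sketched by treating the occluded oral cavity as a closed quarter-wave resonator, so that f = c / 4L and L = c / 4f. This is a deliberate simplification (real branches are not uniform tubes), and the 35,000 cm/s speed of sound is an assumption:

```python
def side_branch_length_cm(antiformant_hz: float,
                          c_cm_per_s: float = 35000.0) -> float:
    """Length of a closed side branch whose lowest antiresonance is at
    antiformant_hz, under a simple quarter-wave model: L = c / (4f)."""
    return c_cm_per_s / (4.0 * antiformant_hz)

# Antiformant values from the card above
for sound, af in [("/m/", 1000.0), ("/n/", 2000.0), ("/ŋ/", 3000.0)]:
    print(sound, round(side_branch_length_cm(af), 1), "cm")
```

The implied branch lengths shrink from /m/ to /n/ to /ŋ/, matching the order of closure points from the lips back to the velum.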
Does tube length affect formant/resonant frequencies?
When tube length changes, formant frequency changes (frequency at the peaks)
lips- the open end of the tube
Which part of the tube will have a velocity maximum (pressure minimum)?
glottis- the closed end of the tube
Which part of the tube will have a velocity minimum (pressure maximum)?
What are the frequencies contained in a complex periodic wave?
Whole-number multiples of a lowest fundamental frequency.
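The harmonic series can be listed directly as n × F0. A short Python sketch (the 125 Hz F0 and 1 kHz ceiling are arbitrary example values); note that the first harmonic is F0 itself, the lowest-frequency harmonic:

```python
def harmonic_frequencies(f0_hz: float, max_hz: float) -> list[float]:
    """Frequencies in a complex periodic wave: n * F0 for n = 1, 2, 3, ..."""
    freqs = []
    n = 1
    while n * f0_hz <= max_hz:
        freqs.append(n * f0_hz)
        n += 1
    return freqs

print(harmonic_frequencies(125.0, 1000.0))
# [125.0, 250.0, 375.0, 500.0, 625.0, 750.0, 875.0, 1000.0]
```

Contrast this with the quarter-wave tube, whose resonances fall only at odd multiples of c / 4L.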
Measuring Amplitude
Why? -Equipment: Avoid distortion and to calibrate. -Humans: Hearing damage, comfort, and diagnosis. How? -Static Pressure -Dynamic Pressure -Microphone & Meter
Spectrogram: Filters Wide-Band & Narrow-Band
Wide-band (broad-band) filter: 300-500 Hz. Energy from several adjacent harmonics is added together. F0 = count the number of individual vertical lines per unit time. F1 & F2 visible as dark concentrations of energy. Short analysis time (about 3-5 ms), so excellent time resolution; can determine F0. Narrow-band filter: 45-50 Hz, so frequency resolution is much finer; can see which harmonic components are active/amplified and identify individual harmonics, but cannot isolate each cycle of vibration. Longer analysis time (about 20 ms). Excellent harmonic resolution.
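The trade-off above follows a common rule of thumb: analysis time is roughly the reciprocal of the filter bandwidth. A Python sketch under that approximation (the 120 Hz F0 in the example is an assumed value). The two checks are mirror images: a window shorter than one period is the same condition as a bandwidth wider than F0:

```python
def approx_window_ms(filter_bandwidth_hz: float) -> float:
    """Rule of thumb: analysis window (ms) ~ 1 / filter bandwidth."""
    return 1000.0 / filter_bandwidth_hz

def resolves_glottal_pulses(filter_bandwidth_hz: float, f0_hz: float) -> bool:
    """True if the window is shorter than one glottal period, so each
    pulse shows up as its own vertical striation (wide-band view)."""
    return approx_window_ms(filter_bandwidth_hz) < 1000.0 / f0_hz

def resolves_harmonics(filter_bandwidth_hz: float, f0_hz: float) -> bool:
    """True if the filter is narrower than the harmonic spacing (F0),
    so individual harmonics appear as horizontal bands (narrow-band view)."""
    return filter_bandwidth_hz < f0_hz

for bw in (300.0, 45.0):
    print(bw, round(approx_window_ms(bw), 1),
          resolves_glottal_pulses(bw, 120.0), resolves_harmonics(bw, 120.0))
```

With a 300 Hz filter the window is about 3.3 ms and pulses are resolved; with a 45 Hz filter it is about 22 ms and harmonics are resolved instead, consistent with the card's 3-5 ms and 20 ms figures.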
Are vowels carriers of prosody?
Yes (F0/frequency, intensity, duration)
Can formants occur simultaneously? Do they mix well at the same points?
Yes, all formants occur simultaneously, while a number of their (p) and (v) points overlap with their own kind respectively. However, they do not mix well at the same point which is the reason why formants occur only as odd multiples in the acoustic resonator of the vocal tract; most other resonators occur in simple multiples.
Are vowels necessary for consonants?
Yes, consonants are superimposed onto vowels at the beginning or ending of a vowel
Are resonant poles different than harmonics?
Yes, they go up in odd multiples, whereas harmonics/overtones are simple multiples.
Is resonance present regardless of whether you voice or whisper?
Yes, voice just makes more energy.
Are vowels always voiced?
Yes; non-nasal (unless nasal assimilation)
Are vowels the dominant speech sounds perceptually and physically/physiologically?
Yes; they contain much sound energy.
For [l] and [r] what is formant configuration
[l] - f2 is middle; f3 is high [r] - f2 is middle; f3 dips
affricate
___ acoustic features: rise time, duration of frication, relative amplitude in third formant region, stop gap
nasal
___ acoustics: many spectral peaks, but most have low amplitude. antiformants. nasal formant. highly damped formants
fricative
___ articulation: narrow constriction in the vocal tract, when air flow rate is high, turbulence results, perceived as turbulent noise, relatively long duration compared to stops
voiced
___ stop shows vertical striations during the period of closure and remnants of formant frequencies
voiceless
___ stops (highest intraoral pressure) follow the stop release with frication (aspiration)
coupling
______ of the oral and nasal cavities also causes antiresonances or antiformants
energy
______ will be longer in duration because fricatives are continuous sounds.
antiformants
______ can occur when the vocal tract is bifurcated or radically constricted
nasalance
a ratio of the nasal energy to the overall combined nasal and oral energy as measured from the acoustic pressure waveform
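Nasalance is N / (N + O), usually expressed as a percentage. A minimal Python sketch, assuming the nasal and oral acoustic energies have already been measured on separate channels (the input values here are hypothetical):

```python
def nasalance_percent(nasal_energy: float, oral_energy: float) -> float:
    """Nasalance = nasal / (nasal + oral) * 100."""
    total = nasal_energy + oral_energy
    if total <= 0:
        raise ValueError("total energy must be positive")
    return 100.0 * nasal_energy / total

print(round(nasalance_percent(0.2, 0.8), 1))  # 20.0
print(round(nasalance_percent(3.0, 1.0), 1))  # 75.0
```

A high value means a large share of the acoustic energy exits through the nose, as in hypernasal speech.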
What are sibilants characterized as
a "distinctive" hissing noise
Stress serves as what?
a "pointer" telling the listener which information is most important in an utterance
low
a ___ F3 is a distinctive property of rhotic sounds - both consonant /r/ and r-colored central vowels
aspiration (coarticulation effect)
a brief hiss that sometimes occurs after voiceless stops, never after voiced stops. This is likely a function of the transition of the vocal folds from no voicing to voicing (vocal folds moving back to phonation position)
vowels
a category of speech sounds produced with unobstructed vocal tract, usually produced with vocal excitation but not always (whisper), excitation due to glottal vibrations
Why is feedback so important in children?
a child trying to say a certain word tries it out, senses articulator movements and positions, gets tactile and acoustic results, and compares the output of the word with the stored sound pattern of the adult production
resonance
a cold will disrupt (blank) and soft palate problems could affect it.
cleft palate
a craniofacial abnormality that arises when the palatine bones fail to fuse completely during gestation -results in abnormal airflow and resonances (hypernasality) -bilateral is most traumatic and requires the greatest surgical repair -multiple surgeries are needed because the seam can fall apart as the child grows -have to wait at least 3 months to let structures settle
Frequency range in a noise burst release
a cue to place of articulation
the nasal murmur is due to
a formant resonance of the vocal tract from larynx to nostrils
How is a higher F0 for the heavily stressed syllable attained?
a function of increased vocal fold tension, increased expiratory effort leads to increased subglottal pressure and then to extra effort in the larynx
Opening the VP port creates:
a large resonating cavity (resonator)
effector
a level of motor programs for enactment
executive
a level of motor programs for information processing
voiceless stops typically have
a long voice onset time and a strong release burst
mandible
a major regulator of oral cavity opening during speech
Diphthongs are characterized by what? and what are the 3 stages?
a more or less gradual change in resonance due to fairly slow changes in tongue position (or shape) and mouth opening (rounding). There appear to be 3 stages in a diphthong: 1. on-glide: a relatively steady period during which there is no change; 2. glide: a gliding/sliding pattern of formants F1 and F2; 3. off-glide: a relatively steady state during which there is no change. The off-glide is shorter in duration than the on-glide.
Falling intonation is seen as what?
a natural product of running speech
What is aspiration?
a period of voicelessness after stop release, seen in the three English voiceless stops
Feedback seems nonessential for who?
adolescents and adult speakers, speech has already fully developed
What is a servomechanism?
a self-regulation machine where the device output is fed back into the system
vocalic
a set of landmarks defined by the frequency and amplitude of F1
voiced stops typically have
a short voice onset time and a weak release burst
Fricative
a sound source is created by a severe constriction within the vocal tract, rather than just at one end of the vocal tract.
nearly periodic complex waves
a source of speech sounds that has vocal fold vibration--all vowels and many consonants
transient aperiodic waves
a source of speech sounds that has rapid pressure change--some consonants such as stops like /p/, /b/
continuous aperiodic waves
a source of speech sounds that has turbulent flow through a supraglottal constriction (noise)--many consonants, such as fricatives like /s/, /f/
What is an affricate?
a stop with a fricative release through a narrow constriction
unit of analysis
a theoretical issue composed of sound, syllable, word, and gesture
dynamic systems
a theoretical issue: explain the mechanism that constrains the potentially infinite number of degrees of freedom of speech production system to a few useful degrees of freedom.
output targets
a theoretical issue: perhaps the CNS has some goal or target output for which it controls muscle activity during speech
coarticulation
a theoretical issue: the adjustment of articulator movements to target more than one speech sound simultaneously. - temporal coordination of multiple articulators
motor programs
a theoretical issue: a pre-structured set of central commands capable of carrying out a movement. sensory feedback is an integral part
Oral release of a stop yields
a transient noise source/ release-burst
Vowels
a vocal sound produced by relatively free passage of the air stream through the larynx and oral cavity; these have an open vocal tract; the nucleus of a syllable (unless word has syllabic consonant)
SVS coarticulation (silence)
a vowel in isolation shows this?
What does pitch rise at the end of an utterance signal?
a yes/no question
What sound is considered a low, back vowel? What are the articulatory configurations associated with this vowel?
a, High F1, Low F2
What sound is considered a high, back, rounded vowel? What are the articulatory configurations associated with this vowel?
u: tongue high and back, lips rounded; Low F1, Low F2
The loudest vowel that we have is
a (/ɑ/), c (backwards c, /ɔ/)
What are the cardinal vowels?
a, i, u
What three point vowels assist in understanding the extreme positions in articulatory and vowel formant frequency data
a, u, i
VOT for voiceless stops
Vocal folds are abducted and not vibrating during the voiceless stop gap; after release they must adduct and build up pressure, then release it; it takes at least 20 msec to do this before vibration begins
What is the dB SPL of .1 Pa?
about 74 dB SPL
What are we trying to explain in speech perception theories?
about speech perception in general, about the perception of specific sounds (vowels, consonants, etc), and about speech in naturally occurring speech or contrived controlled situations.
What is the frequency for most of the energy in /s/?
above 4000 Hz
suprasegmentals
above the segments (prosody)
harmonic doubling
abrupt appearance of a harmonic series at 1/2 F0 (in between the harmonics)
What do the walls of the nasal cavity do to sound?
absorb sound
Venturi Effect
acceleration of air through a narrow channel
which cranial nerve innervates the sternocleidomastoid muscle
accessory
assimilation
acoustic effect of coarticulation: a change in a speech sound's acoustic features because of context (e.g., in "dogs," s = /z/)
Nasal phonemes have a strong resonance around 200-300 Hz called a
acoustic murmur (nasal murmur)
duration
acoustic result of tense-lax dimension
What do active speech perception theories have in common?
active theories have in common that they somehow use info about features of speech-production process in explaining speech perception.
During nasal sounds, the Palatoglossus muscle may _____
actively lower velum
When two waveforms are played at the same time they create a new waveform by
adding the waveforms
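Adding waveforms means adding their instantaneous amplitudes point by point (superposition); the reverse operation is the "underlying waveforms" card, where two sine waves are extracted from a complex wave. A Python sketch with two arbitrary example components, 100 Hz and 200 Hz:

```python
import math

def sine(freq_hz: float, amp: float, t: float) -> float:
    """Instantaneous amplitude of a sine wave at time t (s)."""
    return amp * math.sin(2 * math.pi * freq_hz * t)

def complex_wave(t: float) -> float:
    """Sum of a 100 Hz and a 200 Hz component, added sample by sample."""
    return sine(100.0, 1.0, t) + sine(200.0, 0.5, t)

fs = 8000  # samples per second (arbitrary choice for this sketch)
samples = [complex_wave(n / fs) for n in range(80)]  # 10 ms = one 100 Hz cycle
print([round(s, 3) for s in samples[:5]])
```

The combined wave repeats every 10 ms, the period of its lowest component, so its F0 is 100 Hz.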
roundness
adds to length of vocal tract
formant transitions may vary on....
adjacent vowels
the typical falling intonation contour near the end of a declarative statement - results from the economy of expiratory effort - is due to falling subglottal pressure as air is exhaled - may be accompanied by a switch to the pulse register - is described by "breath group" theory - all of the above
all of the above
Formant 3 has a lot more (p) and (v) points that offer more opportunities to manipulate them which allows what?
allows for more opportunity to make those effects.
What is assimilation?
alteration in the movement of a single articulator
higher (4 kHz - 12 kHz compared to 3 kHz)
alveolar sibilants have ___ frequency energy range than palatal sibilants. but spectral irregularities aren't important in perception
the affricates in english consist of a(n)
alveolar stop followed by a short palatal fricative
What places of articulation make up sibilants?
alveolars and postalveolars
place cues for fricatives (amplitude)
the amplitude of strident fricatives (/ʃ/, /s/, /z/, /ʒ/) relative to the vowel is greater than that of other fricatives
What are stop-plosives?
an abrupt sudden release of air.
What kind of movement is VOT?
an acoustic movement that describes coordination between laryngeal and articulatory systems
biphonation
a second set of F0 and its harmonics, giving a double series
some fricatives have just
aperiodic, supraglottal noise with no periodic source
maximum sensitivity of the human ear for sound
approximately 3,000 to 4,000 Hz
Axial CAT Scan
approximately the level of the 4th cervical vertebra; bone (high density) is white; soft tissue (low density) is shades of gray; air is black
What are the acoustic features of affricates?
are a combo of stop and fricative features 1. silent gap, with and without phonation 2. release burst as in stops with extended duration of aperiodic frication noise (as seen in fricatives)
stop consonants
are characterized by a closure in the vocal tract that is released rapidly
The sound at the lips contains the same harmonics as the glottal source, but...
are shaped by the transfer function, so the harmonics have modified amplitudes
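The source-filter idea on this card can be sketched numerically: keep the source harmonic frequencies, and multiply each harmonic's amplitude by the filter gain at that frequency. Everything in this Python sketch is an assumption for illustration: a source roll-off of 1/n² (about -12 dB per octave), a hypothetical transfer function built from simple resonance peaks at 500, 1500, and 2500 Hz, and a 100 Hz bandwidth for each peak:

```python
import math

def resonance_gain(f_hz: float, formants_hz: list[float],
                   bandwidth_hz: float = 100.0) -> float:
    """Hypothetical transfer function: product of simple second-order
    resonance magnitudes, each equal to 1 at 0 Hz and peaking near fc."""
    gain = 1.0
    for fc in formants_hz:
        gain *= fc ** 2 / math.sqrt((fc ** 2 - f_hz ** 2) ** 2
                                    + (bandwidth_hz * f_hz) ** 2)
    return gain

def output_spectrum(f0_hz: float, n_harmonics: int,
                    formants_hz: list[float]) -> list[tuple[float, float]]:
    """Same harmonic frequencies as the source; amplitudes = source * gain."""
    spectrum = []
    for n in range(1, n_harmonics + 1):
        f = n * f0_hz
        source_amp = 1.0 / n ** 2  # assumed -12 dB/octave glottal roll-off
        spectrum.append((f, source_amp * resonance_gain(f, formants_hz)))
    return spectrum

for f, amp in output_spectrum(100.0, 30, [500.0, 1500.0, 2500.0]):
    print(f"{f:6.0f} Hz  {20 * math.log10(amp):7.1f} dB")
```

The output contains exactly the source harmonics (100, 200, 300 Hz, ...), but harmonics near 500 and 1500 Hz stand out as local peaks: the formants.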
What is assimilation of manner of production?
articulators are placed in a different location resulting in a different manner of sound
offglide
articulatory ending point of the diphthong
formant transitions
articulatory movement from stop to vowel entails a formant movement, important for perception, 50ms in duration *as vocal tract changes, formant freq change
onglide
articulatory starting point of the diphthong
F1 increase and F2 decreases/relationship to F2 is unclear
as the front vowels become more open (low) then what?
What describes how speech sounds become like neighboring sounds
assimilation
What are the two context effects?
assimilation and coarticulation
The movement of one articulator is characteristic of ____, whereas the simultaneous movement of two articulators is characteristic of _____.
assimilation; coarticulation
place of articulation
associated with formant transitions (especially F2) but can be difficult to see on spectrogram
nasal murmur
associated with lower amplitudes and resonant frequencies compared to surrounding vowels
voiceless unaspirated
at release
Where does fricative noise originate?
at the articulatory constriction
the voice bar in voiced stops arises because
at the time of stop closure; the pressure in the vocal tract is 0 cm H2O, so phonation can continue until the pressure builds up
What is central neural/internal feedback?
audition and action are external feedback systems, their information is delivered to external receptors
the contraction of the posterior crico-arytenoid muscles has the effect of drawing the vocalic muscles toward/away from the midline
away
average VOT for initial stops (voiced)
b 1msec d 5msec g 21 msec
evidence that categorical perception is innate
babies and monkeys have it
glides w and j are distinguished
based on F2, j has high F2 and w has low F2
liquids r and l are distinguished
based on F3, r has lower F3 and l has higher F3
When given a phoneme (i.e. /b/ or /p/)
be able to specify place, voice, and manner.
When given a phoneme (i.e. /p/, /t/, /k/)
be able to specify the associated acoustic cues.
Why are semivowels and nasals resonants?
because they are characterized by a relatively free flow of air and thus formant structure (nonresonants have little or no formant structure)
VOT (begins and ends)
begins at the release of the articulatory constriction and ends at the start of vocal fold vibration. There is no VOT when the stop is in syllable-final postvocalic position (VC), so the preceding vowel duration is used instead
tri-top theory
Berkowitz. Audition, tactile, visual (tri); top-down: using theory to drive clinical practice. If a theory can't explain what is seen in the clinic, it's not perfect; special populations are special
100 - 8,000 Hz
best part of curve for speech and hearing
what are the points of constriction for /s/ and /z/
between the alveolar ridge and the tongue, and the opening between the upper and lower incisors
Lateral X-ray Image
bone (high density) is white; soft tissue (low density) is gray; air is black; there is a lack of depth in the image, all structures appear on the same plane
What does the end of a declarative sentence mean?
both a decrease in F0 and intensity - heavy sigh, pitch falls as lung volume decreases
The eardrum is a ______ band resonator
Broadband resonator. At low frequencies it moves as a whole; at higher frequencies only smaller sections vibrate.
vocal tract is a
broadly tuned filtered resonator that is highly damped
which of the following is not true? nasals are characterized by ____ -strong, low frequency nasal murmur (formant) of the nasal passages - build-up of pressure and release burst - complete blockage of the oral cavity - weak formants (anti formants) due to the absorption of sound by the oral and nasal passages - formant transitions as cues for place of articulation - all of the above
build-up of pressure and release burst
vocal fold vibration
air pressure builds up below the vocal folds and blows them apart; they come back together because of the Bernoulli effect and elastic recoil
two major tongue constrictions
bunched (tongue bunched up toward the hard palate) and retroflex (tongue curling up toward the palate); a pharyngeal constriction is ALSO needed
final stops may or may not have
burst
burst spectra for alveolar
burst has most of the energy around 3000-4000 Hz (for a male speaker) and an uptilt
burst spectra for labial
burst has most of the energy under 600 Hz and an overall downtilt
burst spectra for velar
burst is linked to the F2 of the following vowel and is usually a few hundred hertz higher than the F2 of the following vowel (tend to have narrow spectral peaks)
what is the analysis by synthesis theory
by Kenneth Stevens. sensory only. s-s. in speech perception, auditory patterns that are recognized are compared/matched with self-generated auditory models (of how the listener would produce these same acoustic patterns.) hears "beatcha" and applies their own phonological knowledge to infer that the speaker means "beat you". in contrast, the motor theory of speech perception explains that the patterns are motor in nature. in this theory the patterns used in perception are entirely acoustic. this theory was no longer used by Stevens in favor of the later development of his quantal theory.
How is restricted airflow in the nonresonants created?
by articulators forming constrictions in the vocal tract, thus aperiodic noise is created as the airflow passes through
How has proprioception for speech been explored?
by interfering with articulator positions and movements and studying compensatory strategies
Fn=
c/λ (speed of sound divided by wavelength)
consonant vowel formant transitions
can be hard to see depending on vowel, and rate of transition
turbulence
can occur alone or in addition to vocal fold vibration
With regard to speech, define categorical perception.
categorical perception is the phenomenon that we can discriminate speech sound differences only as well as we can identify them (i.e., give them a speech sound label). The prediction of abrupt (categorical) perception shifts was tested using "pattern playback" stimuli, which allowed one acoustic characteristic to be varied systematically along a continuum.
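The identification-discrimination link can be illustrated numerically. The sketch below is a hypothetical model, not data: it assumes a logistic identification function along a VOT continuum (boundary near 25 ms, slope chosen arbitrarily) and uses the classic label-based prediction for ABX discrimination, P(correct) = 0.5 × (1 + (pA - pB)²), under which listeners can tell two stimuli apart only to the extent that they label them differently.

```python
import math

def identification(vot_ms, boundary_ms=25.0, slope=0.5):
    """Hypothetical logistic identification function: probability that a
    stimulus with this VOT is labeled voiceless (/p/).
    boundary_ms and slope are illustrative assumptions, not measured values."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))

def predicted_abx(vot_a, vot_b):
    """Label-based ABX prediction: P(correct) = 0.5 * (1 + (pA - pB) ** 2).
    Identical labeling probabilities give chance performance (0.5)."""
    pa, pb = identification(vot_a), identification(vot_b)
    return 0.5 * (1.0 + (pa - pb) ** 2)

# Two pairs with the same 20 ms physical step:
within = predicted_abx(-10, 10)   # both fall inside the /b/ category
across = predicted_abx(15, 35)    # the pair straddles the assumed boundary
print(f"within-category: {within:.2f}, across-boundary: {across:.2f}")
```

With these assumed parameters, the within-category pair is predicted at chance (about 0.50) and the cross-boundary pair near 0.99, the abrupt shift that categorical perception predicts for equal physical steps.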
consonants
category of speech sounds usually produced with a vocal tract obstruction, with or without voicing
the release burst of stops is filtered by the ___
cavity anterior to the closure point
What is the difference between central neural/internal feedback and proprioceptive feedback?
central feedback is generated by muscle activity, but does not tell about muscle activity itself, as proprioceptive feedback does
the non-muscular part of the diaphragm is called the
central tendon
Vocal folds...
convert aerodynamic energy to acoustic energy
What is complete assimilation?
change from an allophone of one phoneme to an allophone of another phoneme
assimilation
change in the articulation of a speech sound that makes it more similar to the articulation of a neighboring sound
most important cue for the perception of stress/intonation
changes in fundamental frequency (F0)
VOT very well studied
changes with prosody and development (VOT is shorter in children); speech disorders: VOT is often affected, often with continuous voicing
independence of source and filter
characteristics of source and filter can vary independently without affecting the characteristics of the other
stops
characterized by a complete closure somewhere in the vocal tract (/p, b, t, d, k, g/); relatively short in duration, low intensity, wide frequency range (high, mid, or low); 2 acoustic cues: gap and release
Vowels have more sound ___________ than consonants
energy
place cues for stops 1
energy peak in the spectrum of the burst; applies mostly to non-final stops
actual glottal volume velocity
energy roll-off is a function of the speed and completeness of vocal fold closure. In general, the low frequency harmonics dominate.
Fletcher & Munson
equal-loudness contours in phons (graph): two points on the same curve are equally loud; the highest curve = dangerously loud (IRB issues); the curves represent equivalent loudness level (intensity as perceived by the average ear); always labeled with frequency, intensity, and phons
place cues for fricatives (formant transitions)
transitions, especially F2, are relevant for distinguishing labiodental from dental fricatives (F2 generally starts lower for labiodental fricatives than for dental ones)
shape
every vowel has a different (blank)
aerodynamic targets
evidence from studies of individuals with velopharyngeal incompetence or hearing impairment. aerodynamic stability is an important regulating factor
perturbation studies
examine effect of disturbance to speech production system. It can be anticipated/unanticipated, transient/static, biomechanical/acoustic/aerodynamic alteration
speech science
explore timing and contact patterns for sounds -not amount of pressure -not sounds with no contact
The manner of fricatives is the presence of a relatively _____ period of noise.
extended period of noise (frication).
1. genioglossus, 2. styloglossus, 3. palatoglossus, 4. hyoglossus
extrinsic muscles of the tongue (4)
low amplitude fricatives
f, v, θ, ð, h
nonsibilants/nonstridents
f, th, v
Weak Fricatives
f, v, θ, ð -All have medium-gray to nearly invisible random energy -Place of articulation determines frequency range -Voicing determines the duration and existence of voice bars or striations
diphthongs acoustic properties
f1 and f2 are touching but near end start to separate
f2 decreases in frequency as
f1 increases
the glides /w/ and /j/ differ on the basis of ___, which is high in ___ and low in ___.
f2; /j/; /w/
different types of intonation
falling: statement, declarative sentence; rising: yes/no question; level: unfinished sentence, to be continued
t/f sibilant fricatives have less energy than non-sibilant fricatives
false
t/f the left and right lung have the same number of lobes
false
t/f the trigeminal cranial nerve has only sensory branches
false
anticipatory
features of a sound appear earlier than the sound itself; forward coarticulation, e.g., in /æm/ the vowel is nasalized
retentive
features of a sound carry over to the next one; backward coarticulation, e.g., in /no/ the vowel is nasalized
A wide-band spectrogram is best for showing the...
filter characteristics of the speech
the vocal tract acts like a
filter reshaping the spectrum of the source
in what part of a syllable is the /l/ considered dark
final
the many-to-one configuration of neurons to muscle affords us
fine motor control
transitions
following the steady state, formant frequencies change direction according to the following sound
p b
for /__ __/ F2 and F3 rise slightly
k g
for /__ __/ F2 and F3 separate steeply and rapidly
t d
for /__ __/ F2 falls and F3 rises slightly
Give an example of short lag.
for /b/, the vocal folds begin adduction at the same time as the labial occlusion; adduction continues during the hold phase, and at the moment of release the folds are still adducted, ready for phonation. For /b/, there is a short (0-10 ms) positive VOT value
When is fricative energy very low?
for /f/, /v/, and voiced and voiceless /th/ (/ð/, /θ/), due to the lack of a resonating cavity anterior to the point of constriction
how loud a sound feels depends on
how much intensity it has (dB SPL, on the y-axis) and what frequency it is (Hz, on the x-axis)
Give an example of long lag.
for /p/, the vocal folds begin adduction at some point during the hold phase, so the glottis is still open at the moment of release; VF adduction is not complete until sometime after the stop release, during articulation of the following vowel. For /p/, there is a long (about 60-70 ms) positive VOT value
When is fricative energy high?
for /s/ and /z/, there is a high frequency, high energy noise
closed end of the tube (glottis)
for our uniform tube model, air particles vibrate least effectively where?
open end of the tube (lips)
for our uniform tube model, air particles vibrate most effectively where?
Why don't you hear aspiration in voiced stops?
for the voiced stops, /b,d,g/, the glottis is closed at the moment of stop release, forcing the breath stream to set the vocal folds into vibration, sending vibrating air (phonation) into the upper vocal tract
/i ɪ e æ ɛ/.
for these vowels, many people use the genioglossus to lower the tongue, others lower the jaw
"schwa" (upside down e)
for this neutral vowel sound we model the vocal tract like a uniform tube
/i/
for this sound, the genioglossus pulls the back and root of the tongue toward the front (this works because the tongue is a muscular hydrostat)
/u/
for this sound, the styloglossus pulls the tongue up and back. The hyoglossus pulls the tongue down and back. So the tongue squishes up and back.
short
for voiceless fricatives the front cavity is so ______ it has little filtering effect on the noise energy
concentrations
for ð, there are other __________ of energy at 1500, 2500, and 4000 Hz.
strain gauge
force of lip movement
transitions
formant ______ occur much as they do for other consonants -when oral articulation changes, bends in the formant patterns are seen
children's speech
formant frequencies lower with age as the vocal tract lengthens, with the most dramatic change at puberty; productions become faster and more reliable, and nasalization is reduced
major cue to tell you which diphthong was produced
formant glide, especially the rate of change
What are glides characterized by?
formant structures similar to vowels
shorter
formant transitions are often _____ for /l/ than for /r/
/b/, /d/, /g/, /p/, /t/, /k/
formant transitions in the high and low vowels preceding and following what? (6)
1. silence, 2. burst noise, 3. voice onset time, 4. post-stop vowel formant transition
four acoustic cues
What are the cues for place in stop plosives?
frequency locus of the burst (a brief moment during aspiration), best visible when the initial plosive is voiceless (VL): burst high = alveolar (t, d); burst middle/spread out = velar (k, g); burst low = bilabial (p, b)
which is stronger frication or non-sibilants
frication (noise)
Voiceless fricatives
frication noise is sole source
on a spectrum displayed, a voiceless fricative shows
frication noise with a frequency range comparable to that of its voiced cognate
initial affricate may look like
fricative
____ have little to no formant structure and often are the result of turbulent airflow through constrictions within the vocal tract
fricatives
The nonresonant consonants of English are the
fricatives, the affricates, and the stops.
consonants range
from vowel-like sounds with relatively open vocal tract EX: glides [w][j] to sounds produced with severe vocal tract constriction [s][t][f]
F2
front vowels have high F2 and back vowels have low F2
the precentral gyrus (motor strip) is located on the - lobe
frontal
tongue muscles
functionally: apex, lamina, dorsum, root; intrinsic and extrinsic muscles
What is the vibrating frequency of the vocal folds
fundamental frequency
intonation
fundamental frequency across the phrase/sentence; tendency for declination; rises for yes/no questions; conveys emotions
Wideband (or Broadband)
generates a display of formants; vertical striations correspond to individual glottal pulses; broad bands of energy show formants, and the center of each band of energy is the estimated frequency of the formant; blank spaces indicate silence; filter bandwidth set at 300-500 Hz. Does NOT resolve energy to show individual harmonics (multiple adjacent harmonics fall within one filter). Used to obtain information about the timing of changes in the vocal tract (e.g., VOT, formant center frequencies)
Narrowband
generates a display of harmonics; narrow, horizontal bands represent harmonics of the glottal source; darker bands represent harmonics closest to the resonance peaks of the vocal tract; blank spaces indicate aperiodicity; filter bandwidth set between 30 and 50 Hz. Used to measure fundamental frequency and intonation; NOT used for making temporal measurements (duration and VOT)
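The wideband/narrowband trade-off follows from filter bandwidth. The sketch below assumes a rough rule of thumb, that the effective analysis window lasts about 1/bandwidth, and that a filter separates harmonics only when its bandwidth is narrower than the harmonic spacing (which equals F0). The function names and the 120 Hz F0 are illustrative.

```python
def window_ms(bandwidth_hz):
    """Approximate analysis-window duration for a filter bandwidth,
    using the rule of thumb duration ~ 1 / bandwidth (an assumption)."""
    return 1000.0 / bandwidth_hz

def resolves_harmonics(bandwidth_hz, f0_hz):
    """Harmonics are spaced F0 apart, so they show up as separate bands
    only if the analysis filter is narrower than F0."""
    return bandwidth_hz < f0_hz

for label, bw in [("wideband", 300.0), ("narrowband", 45.0)]:
    print(label, round(window_ms(bw), 1), "ms;",
          "resolves harmonics at F0 = 120 Hz:", resolves_harmonics(bw, 120.0))
```

A short window (about 3 ms at 300 Hz) captures individual glottal pulses and rapid events like VOT but smears harmonics together; a long window (about 22 ms at 45 Hz) separates harmonics but blurs timing.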
a common name for consonants (j) as in you and (w) as in we is...
glide
/w/ and /j/ are
glides
F2 transitions
glides
semi-vowels
glides /j w/ function as onglides to vowels, function as consonants but have open vocal tract- so sometimes V and sometimes C
F2 and F3
glide acoustics: distinguished among themselves by ___; /w/: F1 and F2 are low; /j/: low F1 and high F2
Sonorants
glides or semivowels (/w/, /j/) and liquids (/r/, /l/); voiced; similar to vowels
F2 = ____ ; F3 = ____
glides, liquids
What type of phonation onset has the shortest vocal rise time?
glottal attack
high frequencies
gradually disappear with age and go away first
glottogram
graph of laryngeal waveform, PGG or EGG, useful info about VF closure events, little info about open phase, periodicity, amp and shape of waveform measured qualitatively
greater energy
greater displacement of air particles =
women's speech
greater harmonic spacing: formant frequencies may be more difficult to estimate, H1 and H2 amplitudes are 6 dB stronger, higher F0 and more open, steeper spectral tilt, lower power
voiceless stops in English typically have voice onset time values of
greater than 25 msec
what are cilia?
hair cells
A spectrum of sound provides information about individual
harmonics
vocal tract resonates more strongly toward
harmonics within tube bandwidths
Voiced consonants
has a periodic laryngeal source
What does F1 do after stop
has a rising transition
stop consonants
have a brief stop gap, complete occlusion of airflow
what are non-sibilants
have a relatively flat spectrum
manner
how the airstream is modified as it passes through the vocal tract
What is auditory feedback?
hearing one's own speech, air and bone conduction
Ultrasound
helps visualize tongue movement
For [j], f2 is ____
high
Large resonator yields ____ damping
high
When /l/ is syllable-final, the tongue dorsum is:
high
vowel in meat
low F1, large gap to F2, and F2 close to F3
vowel in a lot
high f1, close f2, gap f3
vowel in cat
high f1, mid f2 and mid F3
F1
high for low vowels and low for high vowels
/i/ distinctive feature
high frequency energy from resonance within oral cavity- small oral cavity, large pharyngeal cavity
Sibilant
high frequency fricative speech sounds; to be a ___ , it needs more energy.
Sibilant
high frequency fricative speech sounds; to be a sibilant, it needs more energy. These are /s/, /z/, /ʃ/ and /ʒ/
What are sibilants and what is the volume
high frequency spectral peak. the sibilants are louder than non-sibilants.
VP port closure is moderate for
high vowels
low
high vowels have a (blank) F1
how is /w/ produced
high, back tongue position, rounded lips
How is /j/ produced
high, front tongue position
Which stop plosives are high and which ones are low when initial plosive is VL?
high- t, d middle- k,g low- p,b
in vowels that follow the plosive...
high- velar k,g mid- alveolar t,d low- bilabial p,b
infant's speech
higher F0 and formant frequencies; intonation is rise-fall, flat, or fall (not consistent); phonation types: harmonic doubling, biphonation, vocal tremor, noise components, nasalization
spectral roll-off
higher frequencies lose energy
the /s/ has a ____ - frequency spectral peak than the /sh/ because the /s/ is produced with a ___ cavity anterior to the constriction
higher; smaller
thyroarytenoids; use the LCA and interarytenoids to close the VF, use the PCA to open the VF
how do we stop air from coming out?
four
how many effective degrees of freedom during speech are there?
The sampling theorem (also known as the Nyquist theorem) is fundamental to digital audio. It proves that...
if the sampling rate is at least twice the highest frequency in a sound, and that if proper low-pass filtering is done, the output sound will be identical to the input sound
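The theorem can be checked with a little arithmetic. This sketch (function names are illustrative) computes the minimum sampling rate for a given highest frequency and shows what happens without the low-pass (anti-aliasing) filter: a component above half the sampling rate "folds back" and appears at a false, lower frequency.

```python
def min_sampling_rate(highest_hz):
    """Nyquist criterion: sample at (at least) twice the highest frequency."""
    return 2.0 * highest_hz

def alias_frequency(signal_hz, fs_hz):
    """Apparent frequency of a sinusoid sampled at fs_hz with no
    anti-aliasing filter: the frequency folds around fs_hz / 2."""
    folded = signal_hz % fs_hz
    return min(folded, fs_hz - folded)

print(min_sampling_rate(10_000))       # speech energy up to 10 kHz -> 20 kHz
print(alias_frequency(4_000, 10_000))  # below fs/2: kept at 4000 Hz
print(alias_frequency(7_000, 10_000))  # above fs/2: appears at 3000 Hz
```

This is why recording systems low-pass filter the signal below half the sampling rate before digitizing.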
large
if the tongue is high front, such as eee- tiny front, (blank) open space in back of tongue
smaller
if the tongue is low back, such as ahhh- makes pharynx what?
close
if tongue height is high it is (blank)
open
if tongue height is low it is (blank)
When are VOT values negative?
if voicing onset precedes stop release
tense
if you can end a word with a vowel it is what?
low pass
if you want to focus on hearing the low frequencies
broad band spectrograms are shown to
illustrate change in vowel duration that depends on the voicing feature of the postvocalic consonant
anticipatory (forward)
in "sweet" start lip rounding in /s/
Speech sounds vary intrinsically in what?
in duration
In what is VOT measured?
in initial stops, with 4 categories of values
Where can intonation patterns be seen?
in phrase, word, or sentences
When does tactile feedback occur?
in speech via the articulators contacting one another, or air pressure changes in the glottis or subglottal region
How do speech production models tend to be expressed?
in terms of natural language: verbal descriptions, charts, definitions, and rules
How are assimilation and coarticulation differentiated?
in terms of: - number of articulators involved in each effect - number of speech sounds involved in each effect
anticipatory (forward)
in the "am" start lowering velum in /a/
Give an example of partial assimilation.
in the phrase "eat the cake", the /t/ is produced in the linguadental rather than the alveolar position because of the following voiced /th/ (/ð/); the new /t/ is an allophone of the original phoneme, so the change is phonetic, not phonemic
Give an example of complete assimilation.
in the phrase "ten cards", /n/ is produced with the tongue dorsum on the velum, in preparation for the velar sound /k/ - this tongue position will actually produce the nasal /ng/ which has a lingua-velar place of articulation, thus there is complete assimilation of /n/ to /ng/
Where is aperiodic noise created?
in the vocal tract
Where is pre voicing VOT lead usually seen?
in voiced stops in Spanish, French, and Italian
maximal gain
increase in intensity = 6 dB (10 log10 4 ≈ 6)
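The card's arithmetic (10 log10 4 ≈ 6 dB) generalizes to any intensity ratio. A small sketch, with illustrative function names: intensity ratios use 10·log10, and sound-pressure ratios use 20·log10 because intensity is proportional to pressure squared, so a 4× intensity change and a 2× pressure change are the same 6 dB.

```python
import math

def intensity_db(ratio):
    """Level change in dB for an intensity (power) ratio I2 / I1."""
    return 10.0 * math.log10(ratio)

def pressure_db(ratio):
    """Level change in dB for a sound-pressure ratio p2 / p1
    (intensity is proportional to pressure squared, hence 20 * log10)."""
    return 20.0 * math.log10(ratio)

print(round(intensity_db(4), 2))  # 6.02
print(round(pressure_db(2), 2))   # 6.02
```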
Decreasing the opening of the oral cavity results in..
increase pitch
stress
increased effort during the production of a syllable, increased intensity, fundamental frequency, and duration emphasis on syllable or word in a sentence/ compound word
rate
increased speech rate tends to have a direct correlation with movement velocity of orofacial structures
Amplitude of harmonics decreases as frequency ___.
increases
contracting the diaphragm increases/decreases the volume of the chest cavity
increases
F1
increases as the jaw opens; as the mouth gets wider, F1 increases
What does rising intonation result from?
increases in vocal tension that makes folds vibrate faster, which is a result of increased cricothyroid muscle activity
the relationship between volume and pressure is such that when volume - , pressure - and vise versa
increases, decreases
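The inverse volume-pressure relationship is Boyle's law (p1·V1 = p2·V2 at constant temperature). The sketch below uses arbitrary, unitless numbers chosen for illustration, not physiological measurements: when diaphragm contraction enlarges the chest cavity, pressure inside drops, and air flows in.

```python
def boyle_pressure(p1, v1, v2):
    """Boyle's law at constant temperature: p1 * v1 = p2 * v2, so p2 = p1 * v1 / v2."""
    return p1 * v1 / v2

# Arbitrary units: volume grows by 25% (e.g., diaphragm contracts during inhalation).
p_after = boyle_pressure(p1=100.0, v1=1.0, v2=1.25)
print(p_after)  # 80.0 -> pressure falls as volume rises
```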
What increases pitch?
increasing length, decreasing mass, increasing tension NOT: Increasing mass
spatiotemporal index
index of consistency of movement across 10 repetitions of an utterance: the sum of the standard deviations of the 10 normalized displacement records, computed at 50 equally spaced points
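A sketch of how such an index can be computed. The card only says "sum of the SDs at 50 equally spaced points"; the normalization steps here (z-scoring each record's amplitude and linearly resampling each record to a common time base) are assumptions about how the records are made comparable, and the function name is hypothetical.

```python
import numpy as np

def spatiotemporal_index(records, n_points=50):
    """Hypothetical STI sketch. `records` is a list of 1-D displacement traces
    (e.g., lower-lip position), one per repetition; the card uses 10."""
    normalized = []
    for rec in records:
        rec = np.asarray(rec, dtype=float)
        # Assumed amplitude normalization: z-score each record
        # (a perfectly flat record would divide by zero).
        z = (rec - rec.mean()) / rec.std()
        # Assumed time normalization: resample onto n_points equally spaced points.
        old_t = np.linspace(0.0, 1.0, len(z))
        new_t = np.linspace(0.0, 1.0, n_points)
        normalized.append(np.interp(new_t, old_t, z))
    stacked = np.vstack(normalized)          # shape: (repetitions, n_points)
    # SD across repetitions at each point (population SD, ddof=0), then summed.
    return float(stacked.std(axis=0).sum())

base = np.sin(np.linspace(0.0, 2.0 * np.pi, 200))
rng = np.random.default_rng(0)
noisy = [base + rng.normal(0.0, 0.2, base.size) for _ in range(10)]
print(round(spatiotemporal_index([base] * 10), 6), round(spatiotemporal_index(noisy), 2))
```

Identical repetitions give an index of (essentially) zero; the more the repetitions differ, the larger the sum, so a higher STI means less consistent movement.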
False
infant perceptual system is the same as the core perceptual system of adults. True or False
Feedback is incredibly important for who?
infants and younger children
The number of ways of producing vowel like sounds is virtually _____.
infinite.
What is tactile feedback?
information received from touch; stimulation of touch receptors
What is internal feedback?
information within the brain about motor commands before the motor response (the brain telling the muscles what to do); the information loop is entirely in the brain
the term "relaxation pressure" refers to the pressure that is created when we relax the muscles used for
inhalation
In what part of a syllable is the /l/ considered light
initial
auditory templates and feature detectors
input/cue is matched to templates
The oral and pharyngeal cavity during the phoneme /a/ is larger or smaller than during /i/?
larger
Mirror, flexible and rigid are types of
laryngoscopes
The fundamental and its harmonics decrease in ____ as they increase in _____
intensity/frequency
What are the 3 limitations of the simple target theory?
interpersonal variability, dynamic nature of vowel sounds, and ideal frequency target for vowel formants
What suprasegmental deals with pitch change
intonation
1. superior longitudinal, 2. inferior longitudinal, 3. transverse, 4. vertical
intrinsic muscles of the tongue (4)
quantal theory
invariance in the signal that we attend to; grew out of distinctive features. When you cross a quantal boundary, the sound changes (e.g., from vowel to fricative to stop), and this is what we are attuned to
cite two components of a sound system
inventory, constraints, phonological rules
Complete assimilation
involves a change outside of the phonemic category. Example: "ten cards": the /n/ in "ten" is articulated with the dorsum of the tongue on the velum in anticipation of /k/. This produces the velar nasal /ng/, which is a different phoneme than /n/.
Partial assimilation
involves a phonetic change (one allophone to another). Example: "eat" vs. "eat the cake". The tongue tip typically makes contact with the alveolar ridge for /t/ in "eat". In "eat the cake" the tongue is held in the position for the /ð/ in "the". Therefore /t/ has been assimilated to the place of articulation of /ð/
what is the motor theory of speech perception? (liberman et al)
involves a sensory and motor component. s-m this theory is a kind of an umbrella theory with multiple components. the first one is the most important. there does not appear to be a 1-1 relationship between acoustic signals, and our speech perception processes. cues are determined by their specific production context. we are aware of how the speaker's vocal tract is used in such contexts. this knowledge helps in comprehension. so we encode and decode audible speech rather than encipher/decipher it. speech is an acoustic code
The cerebellum is/is not part of the CNS
is
the spinal cord is/is not part of the CNS
is
What is diphthongization?
is a process that occurs on any pure vowel if it is spoken in a stressed diphthong like manner. it represents an example of allophonic, rather than contrastive variation.
F0 Declination
is the Tendency for F0 to Decrease Over the Course of the Utterance
what happens to the intraoral pressure during release of a stop
it drops
when /h/ is voiced -
it is embedded between voiced segments
What happens to the intraoral pressure during closure of a stop
it rises
How is Rhythm defined
it is defined according to the timing of syllables and the timing of the spaces between them
durational cues
juncture cues (breaks/no breaks) ex: keeps talking/keeps stalking pre-boundary lengthening: at semantic/phrase boundaries, longer duration of final (few) syllables than if they occurred in mid-sentence
Liquids
l (lateral) and r (rhotic); have resonant frequencies (formants) that change fairly rapidly (faster than for diphthongs)
what stop has no constriction in front, so all frequencies
labial stops
what is the place of articulation that stops can be made?
labial, alveolar, velar
What are the four places of articulation
labiodental, linguadental, alveolar, and post alveolar
The vowel bounds are strongly subject to:
languages, dialectical variations, phonetic contexts, and even individual speaking styles (oral vs throaty resonance). Perception of vowels is able to see through this variability and interpret them correctly.
Does /u/ have a large or small oral and pharyngeal cavity
large
Active (motor) theories of speech perception
leans toward the motor theory of speech perception. Active theories in general have the best opportunity to explain the phenomenon of invariant perception despite the apparent lack of stable acoustic characteristics in the actual speech signal (e.g., allophonic variation and coarticulation). Sounds that are perceived as the same may differ depending on the speaker, the phonetic context, and the specific mutual influences between sounds; a BLUR characteristic.
primary fricative energy can be calculated based on
length of the cavity anterior to the constriction: f = 34,400 cm/s ÷ (4 × L), where L is the front-cavity length in cm
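The formula treats the front cavity as a quarter-wave resonator, which also explains why /s/ (smaller front cavity) has a higher spectral peak than /ʃ/. The cavity lengths below (1.5 cm and 2.5 cm) are illustrative assumptions, not measurements.

```python
SPEED_OF_SOUND_CM_S = 34_400  # value used on the card

def front_cavity_resonance(length_cm):
    """Quarter-wave estimate of the primary fricative peak: f = c / (4 * L)."""
    return SPEED_OF_SOUND_CM_S / (4.0 * length_cm)

# Assumed front-cavity lengths: /s/ shorter than /ʃ/ (lip rounding also lengthens /ʃ/).
for sound, length in [("/s/", 1.5), ("/ʃ/", 2.5)]:
    print(sound, round(front_cavity_resonance(length)), "Hz")
```

A shorter cavity gives a higher resonance, matching the card that /s/ has a higher-frequency peak than /ʃ/ because its anterior cavity is smaller.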
2 cues to voicing in fricatives
length of noise, duration of preceding vowel, length of voicing bar
Pitch is determined by (on the level of the vocal folds)
length, mass and tension of vocal folds
Consonants have_____, but _____ than vowels
less energy, more meaning
Nonstridents
less noisy, have acoustic energy across a very broad range of frequencies (can't say specific range) difficult to distinguish non stridents not because of voicing, but because the filters are so similar in length (interdental and labiodental)
What muscle is active during oral speech sounds
levator veli palatini
F3 is ____ for /l/
level
when the superior longitudinal muscle of the tongue contracts the tip of the tongue
lifts upward
To what extent can the Central Neural/Internal Feedback Hypothesis be tested?
limited by current investigation techniques, which do not allow for safe investigation (unethical because it would require interrupting a person's brain processes)
H&H theory (hyper & hypo)
Lindblom; explains variability in production: speaker and listener work together (signal, linguistic knowledge, contextual knowledge); hyper and hypo: if your signal is too messy, clean it up, and vice versa
What do linguistically oriented models use?
linguistic and phonetic analysis to describe speech, such as the use of the International Phonetic Alphabet
Vocal tract during /u/ is lengthened due to
lip rounding and protrusion
What's an example of articulatory blockage?
lips closed
What areas have very large amounts of touch receptors?
lips, alveolar ridge, and especially the tongue
/r/ and /l/ are
liquids
F3 transitions
liquids
Approximates
liquids and glides
What is the formant pattern of fricatives
little or no formant structure compared to vowels/semivowels
major cue that tells you which vowel was produced
location of formants
major cue to identifying which semivowel was produced
location of formants; formant transition to/from vowels
cue to place of production for stops
location of most intense frequencies in burst
cue to place for fricatives
location of the most intense frequencies
voiceless stops beginning a word are usually produced with a long/short delay of voicing onset for the next vowel
long
everything else being equal, the difference between male and female frequencies of glottal cycles is due to the fact that men have shorter/longer vocal folds, resulting in higher/lower frequencies
longer, lower
Speech is what type of wave
longitudinal
post vocalic nasals
lose intensity during nasal
Back vowels have a ___ F2
low
For [w], f2 is ____
low
Is the amplitude high or low for a nasal murmur?
low
when /l/ is syllable-initial: tongue dorsum is
low
f3 is ____ for /r/
low (will drop)
vowel in sit
low F1, gap high F2 and close F3
high damping
low energy of all formants
describe the formants of /w/ and /u/
low f1 and low f2
vowel in bought
low f1 close f2 and gap high f3
vowel in ooze
low f1, close f2, gap with high F3
vowel in bet
low f1, gap high F2 and close f3
vowel in book
low f1, small gap f2 and high f3
vowel in got
high F1, close to F2, gap to F3
nasal formant
low frequency around 300 Hz, but the highest energy; consonant energy is reduced because sound is absorbed by the soft, moist tissues of the nasal cavity; higher formants have reduced energy; location changes with place of articulation
During stop gap for voiced
low in amplitude can be voiced all the way through or partially
VP port is looser for
low vowels
high
low vowels have a (blank) F1
Low vowels have a ____ tongue body, or a ___ F1
low, high
which of the following are examples of coarticulation? - lowering the velum a bit early when saying /an/ - tapping the tongue on the palate during production of /l/ - lip-rounding during the /s/ in /su/ - lowering the velum a bit early when saying /an/ and tapping the tongue on the palate during production of /l/ - tapping the tongue on the palate during production of /l/ and lip-rounding during the /s/ in /su/ - lowering the velum a bit early when saying /an/ and lip-rounding during the /s/ in /su/ - all of the above
lowering the velum a bit early when saying /an/ and lip-rounding during the /s/ in /su/
oral cavity during /a/ is increased by
lowering the tongue passively (by lowering the jaw) and actively (by depressing the tongue)
Lip protrusion ____ formants- why?
lowers, it elongates the vocal tract
quarter wave resonator
lowest resonance of neutral vocal tract has a wavelength that is 4 times the length of the tube
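For a uniform tube closed at the glottis and open at the lips, the lowest resonance has λ = 4L, and the higher resonances are odd multiples of it: F_n = (2n - 1)·c / (4L). The sketch assumes c = 34,400 cm/s (the value used elsewhere in this deck) and a 17.2 cm tube, a commonly used round figure for an adult male vocal tract.

```python
def neutral_tube_formants(length_cm, n_formants=3, c_cm_s=34_400):
    """Quarter-wave resonator: F_n = (2n - 1) * c / (4 * L),
    i.e., odd multiples of the lowest resonance."""
    return [(2 * n - 1) * c_cm_s / (4.0 * length_cm) for n in range(1, n_formants + 1)]

print([round(f) for f in neutral_tube_formants(17.2)])  # [500, 1500, 2500]
# A longer tube (e.g., lip protrusion lengthening the tract) lowers every formant.
print([round(f) for f in neutral_tube_formants(18.5)])
```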
F1
lowest resonant frequency of the vocal tract
What are the nasals?
m, n, ng
aerodynamic stability
maintenance of stable air pressure and airflow
What are the 3 ossicles and what dB boost do they give?
malleus, incus, and stapes + 5 dB
What is experimental research?
manipulate experimental variables. (e.g. independent variable or experimental treatment) while controlling the conditions of the study. (pattern playback strategies are very helpful).
What is a phonemic change?
manipulating formants I and II: vowels, diphthongs, semivowel behaviors; tongue height; tongue advancement; lip opening (rounding, unrounding); manipulating formant 3: [r] sound as opposed to [l]; nasality: couple or decouple the nasal cavity; cul-de-sac resonance
semivowels are divided into
manner classes
Plosive
manner of consonant articulation made by sudden release of air impounded behind an occlusion in the vocal tract. Used synonymously with "stop".
Assimilation can occur according to
manner, place, and voicing
What has been found about /u/?
many have found that the lips begin to round for /u/ well before the actual vowel is to be produced
What does (p) stand for?
maximum pressure
What does (v) stand for?
maximum velocity
Antinode
maximum vibratory amplitude; formant frequency is lowered by constriction; maximum volume velocity or minimum pressure
Stops: Stop Gap voiced
may have voicing through all or part of the stop will be low in amplitude can see periodic vocal fold vibration from both wave form and wide band spectrum
phons
measure of human loudness/sensation
nasalance
measurement is ratio of nasal and oral sound pressure, trace looks like air pressure/flow waveform , different norms for different passages
electromyography
measurement of electrical activity in muscles caused by synaptic transmission from a motor neuron; measures the muscle action potential (MAP); performed by comparing the electrical signal as it passes from one electrode to another electrode
nasometer
measures nasal resonance via nasal and oral microphones partitioned by a sound-separating plate
What are the average fundamental frequencies (f0) for men, women, and children?
men- 138; women- 270; children- ~ 403 for F1
direct realism
not really a theory you perceive changes in your environment when they are relevant to you; the actual objects of perception are directly perceived we are active perceivers, constantly learning still gesture based
schemata
novel sound production learned through motor programs, which are enhanced through repetition
Semivowels are considered consonants because they occur on the _____ of words
periphery (margins)
semivowels never act as the ____ of the syllable, on the ____
nucleus/ periphery example: you - open and resonant, but next to vowel.
Frequency refers to
number of cycles per second
Suprasegmentals depend on:
numerous physical changes, covariation of several acoustic variables, and degree of contrast between variables across several syllables
obstruents
obstruct vocal tract - Tongue moves from vowel to obstruent - Tongue moves from obstruent to vowel - Locus (target) frequencies of obstruent
which of the following, if any, is not true about vowel neutralization or "vowel reduction"? vowel reduction ___ - involves keeping the articulators in a more central position than would normally be assumed in producing that vowel - results in formant values that are closer to those of the schwa - can be done without loss of intelligibility, as long as the speaker is careful to meet the perceptual needs of the listener - occurs more commonly on stressed syllables - all of the above - none of the above
occurs more commonly on stressed syllables
The vocal tract resonates at odd or even multiples
odd
triangular wave
odd harmonics only; 12 dB per octave roll-off. We know that the glottal source is not a triangular wave: it has both odd and even harmonics
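The triangular wave's properties can be verified from its Fourier series: only odd harmonics are present, and their amplitudes fall as 1/n², which works out to 40·log10(2) ≈ 12 dB per octave. The function names below are illustrative.

```python
import math

def triangle_harmonic_amplitudes(n_max):
    """Relative Fourier amplitudes of a triangle wave: odd harmonics only,
    amplitude proportional to 1 / n**2 (fundamental = 1)."""
    return {n: 1.0 / n**2 for n in range(1, n_max + 1, 2)}

def slope_db_per_octave(n_low, n_high, amps):
    """Average spectral roll-off between two harmonics, in dB per octave."""
    drop_db = 20.0 * math.log10(amps[n_low] / amps[n_high])
    octaves = math.log2(n_high / n_low)
    return drop_db / octaves

amps = triangle_harmonic_amplitudes(9)
print(sorted(amps))                               # [1, 3, 5, 7, 9]
print(round(slope_db_per_octave(1, 9, amps), 1))  # 12.0
```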
higher resonances are at
odd numbered multiples of the lower resonance
sensory integration
often missed in these theories
consonants
one or more areas of relative constriction of the vocal tract; source of sound: ±voiced and ±turbulent airflow (e.g., /s/)
consonant
one or more areas of vocal tract narrowing by some degree of constriction (partial or complete)
why aren't place cues for affricates discussed?
only 1 place for production of English affricates
Nasals
the only phonemes in which sound exits through the nasal cavity. Occlude the oral cavity and open the velopharyngeal port. ALL NASALS ARE VOICED. Nasal resonance is constant.
F2 transitions and stops
only thing that changes is the start
What is short lag?
onset of vocal fold vibration follows shortly after release burst, voiced stops in English, range from -20 ms to +20 ms
Nasal sounds require a ____ VP port
open (lowered velum)
What are the two kinds of feedback systems?
open and closed loop
lips
open end of the tube (vocal tract)
vowels vs consonants
open versus closed vocal tract is seen as amount of energy passing through vocal tract ex-vowels have more energy
Most speech sounds are
oral
The vowel quadrilateral can be visualized within the
oral cavity
VP port needs to be tighter for
oral obstruents (require airtight seal)
during the production of nasal consonants the airflow is blocked in the - cavity but allowed to go through the - nasal cavity
oral, nasal
What condition may result if eustachian tube fails to permit pressure equalization when opened
otitis media
name 3 parts of the ear
outer, middle, inner
What is an open loop system?
output is preprogrammed, no feedback needed
long lag
over 40 ms
Rising intonation can do what?
override the natural inclination toward falling pitch to express excitement, ask a question, etc
average VOT for initial stops (voiceless)
/p/ 58 ms, /t/ 70 ms, /k/ 80 ms
the postcentral gyrus (sensory strip) is located in the ___ lobe
parietal
What is a burst filtered by?
part of vocal tract in front of constriction
Aperiodic sound sources can be generated by
partial adduction of the VFs, Various locations along supraglottal VT, Forcing the airstream through a constriction
What do the passive theories of speech perception say?
passive theories assume perception remains in the sensory processing domain entirely, somewhat like perception "falls into place automatically" without our active participation. don't need to refer to production to perceive speech- Fant.
x-ray microbeam
paths of tongue and jaw movements for /kæk/ in normal loud speech. The upper trace represents the midsagittal contour of the palate.
intonation
pattern of fundamental frequency change in the production of an utterance, production of a statement or question, pitch rise and fall plus stress
juncture
pauses in speech stream
formant
peaks of energy on spectral slice
landmark detections using points of minimal and maximal change
perception is based on what?
when learning ones 1st language what comes first
perception of phoneme production of others
What is a closed loop system?
performance of system is fed back in for check
simple harmonic motion
periodic movement in which the restoring force is proportional to the displacement, producing a sinusoidal vibration pattern
vowels
periodic source, which means they are voiced and produced by VF
vocal tremor
periodic variation of frequency and amplitude
Voiced stops can cause small periodic sound because of
periodic vocal fold vibration
the spinal and cranial nerves are part of the ___ nervous system
peripheral
formants
peaks in the spectral slice are (blank) because they are the ones that have the highest energy
How is aperiodic noise created?
phonated or unphonated breath stream is sent through constrictions formed in the vocal tract and the combination of strong airflow and narrow constriction makes the airflow turbulent and creates frication
temporal complexity
phonemes become shorter when syllable length increases, speech rate becomes a primary consideration, acoustic cues are not tightly bound, blending occurs so rate is met
What are the 2 kinds of diphthongs?
phonemic and non-phonemic
Complete Assimilation
phonemic class changes EX: velarization of /n/ before /k/ in "ten cards"
What 3 ways can vowels be analyzed?
phonemic distinction, articulatory properties, and acoustic characteristics
What is a phonemic diphthong?
phonemic or contrastive diphthongs are unique and independent sounds; they are relatively long in duration; for example, the vowels in cow, boy, and I. Some also acknowledge the /ɪu/ diphthong (typical of UK English and some words on the East Coast).
Degree of VP closure varies with _____
phonetic context
What is the perceptual correlate of frequency
pitch
We hear increases in ______, _____ and _____ with stress
pitch (Fo), intensity (Amplitude) and length (duration)
Suprasegmentals involve variations in
pitch, loudness, and duration
Increases in this can also increase fundamental frequency (F0)
subglottal pressure
Direction of F2 transition is a cue for
place of articulation
Consonant classification occurs along the following three dimensions
place of articulation, manner of articulation, voicing
phonetic cues
place, manner, and voicing specific to language
Consonants defining features
place of articulation, manner of articulation, voicing
This is how consonants are classified
place, manner, voicing
what aspects of consonants are perceived categorically
place, manner, voicing
Audibly released stops are also called
plosives
semivowels are considered consonants due to their
position of occurrence
If everything is okay, feedback will be what?
positive
What are the points of constriction for /ʃ/ and /ʒ/?
posterior to the alveolar region; lips are rounded and protruded
in unreleased final stops, the ___ will be shorter if the stop is ____.
preceding vowel; voiceless
On perception of fricatives, list some important aspects that affect it
presence of fricative noise (manner); intensity (sibilants vs. non-sibilants); spectral cues for place; no reliable information about the remaining necessary distinctions
cue to affricate
presence of silence; followed by burst, followed by noise
3 cues to voicing in stops
presence/absence of voicing bar
gating task
present listeners with word fragments of progressively increasing length, in steps of 50 ms
aerodynamic measures
pressure and flow measured in the oral cavity to estimate the pressure/flow at the level of the vocal folds. PAS= phonatory aerodynamic system
Nasals
primary resonator is the pharynx-nasal cavity, whose shape cannot be altered; the oral cavity is a dead-end resonator; antiformants: reduced energy in a frequency range, whose location results from the place of articulation in the dead-end resonator; voiced and low in amplitude; nasal murmur: low-frequency formant (approx. 300 Hz), higher formants significantly damped; formant transitions similar to stops; surrounding vowels nasalized; complete closure of the oral cavity
vocal fold vibration
primary source of sound for speech is the.....
1. lack of invariance, 2. relevant unit of perceptual analysis, 3. lack of segmentation, 4. perceptual normalization, 5. specialization of speech perception, 6. contextual effects
problems in speech perception (6)?
normalization
process of simplification by smoothing out "noise": variability that we can ignore without loss of information
plosives
produced with a period of complete contact between two articulators that briefly stops airflow
tense vowels
produced with greater muscle contraction and are produced at the extremes of articulatory posture, with tongue higher in oral cavity
F3
productions of /r/ are often conspicuous in spectrograms by virtue of the marked changes in ___ between /r/ segments and adjacent sounds.
one way that speakers signal the end of a phrase is by
prolonging the final phonemes
suprasegmentals
prosody; overlays the sequence of connected speech; expresses subtle differences in meaning: stress, intonation, rate
when the inferior longitudinal muscle of the tongue contracts, the tip of the tongue
pulls downward
passive theories of speech perception is
purely sensory
when the posterior part of the genioglossus muscle contracts the tongue is
pushed forward
Rising intonation often implies a
question
Sonorants: liquids
quick articulator movement good formant structure sustainable
The elements of change in semi-vowels occur more ____ than those characteristic for diphthongs.
quickly.
rapid opening or closing gesture
rapid rise/fall in intensity
dendrites are the part of a neuron that ___ impulses from other neurons
receive
the internal muscles of the larynx are innervated by the ___ branch of CN X (vagus)
recurrent
PAS (phonatory aerodynamic system)
red lines show when the VFs are closed (valleys on the diagram); orange lines show pressure (peaks occur while the VFs are closed, at the valleys)
VOT and rate of speech
reduced contrast at faster rates
definition for active theories of speech
refers to own production when trying to perceive
For diphthongs, research suggests that it is the direction and steepness of the change that matters most, and that it is not very important that the onglides and offglides actually hit the perceptual target frequencies.
shows that our system for speech perception is very flexible in coding what is intended rather than what occurs in a physical sense; supports the motor theory of speech perception.
Are the sounds /s/ and /z/ sibilant or non-sibilant?
sibilant
are /ʃ/ and /ʒ/ sibilant or non-sibilant?
sibilant
A ____ fricative noise is stronger than in _____.
sibilant; non-sibilants
fricative energy varies between
sibilants vs non sibilants
Speech Banana
significant link between hearing and speech
The brief cessation of airflow emitted from the vocal tract underlies the acoustic period of ____ characteristic of /p/, /t/, and /k/.
silence
2 cues to 'stop'
silence of closed phase; burst after silence
the stages in a voiceless stop consonant followed by a vowel, in sequence, are
silence, release burst, frication, aspiration, phonation
What can account for perceptual differences in juncture
silence, vowel-lengthening, presence/absence of phonation or aspiration
closure may be completely
silent
coarticulation
simultaneously articulating more than one phoneme. this is important in the perception of certain consonants
What are complex waves composed of?
sine waves of different frequencies
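To make the card concrete, here is a minimal sketch that builds a periodic complex wave by adding sine waves. The component frequencies, amplitudes, and the 8000 Hz sample rate are hypothetical values chosen for illustration; because every component is a whole-number multiple of 100 Hz, the sum repeats every 10 ms.

```python
import math

def complex_wave(t, components):
    """Value of a complex wave at time t (seconds), built as a sum of sine waves.

    components: list of (frequency_hz, amplitude, phase_radians) tuples.
    """
    return sum(a * math.sin(2 * math.pi * f * t + ph) for f, a, ph in components)

# Hypothetical example: a 100 Hz fundamental plus its 2nd and 3rd harmonics,
# with decreasing amplitudes.
components = [(100.0, 1.0, 0.0), (200.0, 0.5, 0.0), (300.0, 0.25, 0.0)]

sample_rate = 8000  # samples per second (assumed)
# One 10 ms period of the wave (8000 / 100 = 80 samples).
samples = [complex_wave(n / sample_rate, components) for n in range(sample_rate // 100)]
```

Shifting `t` by one fundamental period (0.01 s) returns the same value, which is what makes the wave periodic.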
nasal stops
some linguists call- /b, m/, /d, n/, /g, ŋ/ what?
what are anti-resonances
sound absorbed
what is measured in decibels
sound intensity
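A decibel is a logarithmic ratio against a reference. The sketch below uses the standard references (10^-12 W/m² for intensity, 20 µPa for pressure); intensity ratios use 10·log10 and pressure ratios use 20·log10, because intensity is proportional to pressure squared.

```python
import math

I_REF = 1e-12  # W/m^2, standard reference intensity
P_REF = 20e-6  # Pa, standard reference pressure (20 micropascals)

def intensity_level_db(intensity_w_m2):
    """Intensity level (dB IL): 10 * log10(I / I_ref)."""
    return 10 * math.log10(intensity_w_m2 / I_REF)

def sound_pressure_level_db(pressure_pa):
    """Sound pressure level (dB SPL): 20 * log10(p / p_ref)."""
    return 20 * math.log10(pressure_pa / P_REF)
```

For example, doubling intensity adds about 3 dB, while doubling pressure adds about 6 dB.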
what occurs in the inner ear
sound is changed from mechanical energy to vibrations in fluid then to electric impulses
what is the difference between sensation and perception in terms of hearing
sound is received vs meaningful awareness
What is meant by sound magnitude?
sound loudness
nasal murmur
sound of a nasal, acoustic waveform of nasal consonants
Fricative Sound
sound produced by forcing the airstream through a narrow articulatory constriction; one of the consonant manners of articulation.
what occurs in the middle ear
sound waves are changed from acoustic to mechanical energy
what generally occurs in the outer ear
sound waves are conducted to the middle ear
categorical perception
sounds perceived with abrupt shifts between groups
What are the basics of source filter theory?
source = the sound generated by the vibrating vocal folds; filter = the role of the vocal tract in modifying this sound. The vocal tract is a fleshy tube, the shape of which can be altered by actions of the speech organs. A sound is created by the vibrating vocal folds and is then modified by passing through the tube.
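The source-filter idea can be sketched in a few lines: a periodic source is passed through a cascade of resonators, one per formant. This is an illustration under stated assumptions, not a model of real phonation: the source is a plain impulse train (a real glottal pulse has a sloping spectrum), and each formant is a two-pole digital resonator with Klatt-style coefficients. The formant frequencies and bandwidths in the usage line are hypothetical values for a neutral vowel.

```python
import math

def impulse_train(f0, sample_rate, n_samples):
    """Stand-in for the glottal source: one unit impulse every 1/f0 seconds."""
    period = int(round(sample_rate / f0))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator_coeffs(freq, bandwidth, sample_rate):
    """Two-pole resonator coefficients (Klatt-style); a is set for unity gain at 0 Hz."""
    t = 1.0 / sample_rate
    c = -math.exp(-2 * math.pi * bandwidth * t)
    b = 2 * math.exp(-math.pi * bandwidth * t) * math.cos(2 * math.pi * freq * t)
    a = 1.0 - b - c
    return a, b, c

def resonate(signal, freq, bandwidth, sample_rate):
    """Filter the signal: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    a, b, c = resonator_coeffs(freq, bandwidth, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def gain_at(probe_hz, freq, bandwidth, sample_rate):
    """Magnitude of one resonator's frequency response at probe_hz."""
    a, b, c = resonator_coeffs(freq, bandwidth, sample_rate)
    w = 2 * math.pi * probe_hz / sample_rate
    z1 = complex(math.cos(w), -math.sin(w))  # z^-1 evaluated on the unit circle
    return abs(a / (1 - b * z1 - c * z1 * z1))

def source_filter(f0, formants, sample_rate=10000, n_samples=1000):
    """Source (impulse train) passed through the filter (cascade of formant resonators)."""
    signal = impulse_train(f0, sample_rate, n_samples)
    for freq, bw in formants:
        signal = resonate(signal, freq, bw, sample_rate)
    return signal

# Hypothetical neutral-vowel filter: formants at 500, 1500, 2500 Hz.
vowel = source_filter(100, [(500, 60), (1500, 90), (2500, 120)])
```

Changing `formants` while keeping `f0` fixed changes the vowel quality but not the pitch, which is the point of separating source from filter.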
Source is to ___ as filter is to ___.
source = vocal fold vibration filter = vocal tract
/w/
the source for this is vocal fold vibration
/z/
the source for this is the same as for /s/ (turbulent airflow), but with vocal fold vibration added
when several impulses travel down different axons at the same time towards the synapse with another neuron the effect is ___ summation
spatial
articulatory gestures
speaker has internal cognitive map of spatial targets of the vocal tract that directs articulator movement.
What are target models?
speakers attempt to hit a series of targets to correspond to the sound they are trying to produce
What was found when looking into compensatory strategies of proprioceptive intervention?
speakers were found to compensate immediately when disrupted, and speech often can continue on normally ex. can still produce /u/ while holding lips back from rounding
association areas are areas that are not dedicated to any
specific function
What are the two acoustic cues to place of articulation of fricatives
spectrum and intensity
False. maturation occurs well into adolescence
speech motor control is mature by age 12. True or False?
Speech production is useless without what?
speech perception
general auditory theory
speech perception is a generic listening skill that we practice a lot and get really good at; there is nothing special about it; perception drives production
constriction
speech production is an aerodynamic phenomenon based on airflow
coarticulation
speech sounds are not produced in isolation but in context of syllables, words, and phrases, individual sounds lose distinctiveness, vocal tract adjusting for more than one sound
Consonants
speech sounds characterized by obstruction of the vocal tract compared to vowels.
the speed of formant transitions is related to the ___ and depends on ___ of articulation
speed of the movements of the articulators; manner
What muscle stiffens in acoustic reflex?
stapedius muscle.
falling intonation often implies a
statement
the speech sound(s) that involve(s) a transient aperiodic sound source is/are the
stop and affricate
what do F2 and F3 depend on
stop and vowel
release of the closure
stop burst
vocal tract closure
stop gap
affricates have
stop gap followed by frication
Affricates involve a sequence of articulatory shapes. This sequence is the articulation of a ___ followed by the articulation of a ___
stop, fricative
Affricates
stop-fricative sequence approx. palatal place of articulation (noise energy in same range as palatal fricatives) short rise time of acoustic energy may/may not see distinct release if release not distinct, may perceptually confuse with fricative
stops/plosives
stopping of airflow and then an explosion/burst; we have pairs of voiced/voiceless stops in English; we have a glottal stop, but it is not a phoneme, it is an allophonic variation; bilabials: /p/ /b/; alveolars: /t/ /d/; velars: /k/ /g/
In what manner of articulation is there complete articulatory closure in the oral cavity?
stops
Nasal sounds have similar constriction sites as ____ within the oral cavity
stops
which manner class(es) involve(s) the build-up of pressure within the vocal tract?
stops and fricatives
which manner class(es) has/have the fastest formant transitions?
stops and nasals
Non-resonant are
stops, fricatives, affricates- dissimilar to vowels
Obstruents
stops, fricatives, and affricates
What are the non resonant consonants?
stops, fricatives, and affricates (all have more restricted airflow)
Obstruent
stops, fricatives, and affricates, which have blocked or restricted airflow, have aperiodic sound sources in the upper vocal tract, and can be voiced or voiceless.
Obstruents
stops, fricatives, and affricates, Characteristics: Blocked or restricted airflow, Aperiodic sound sources in upper vocal tract, May be voiced or voiceless, Supraglottal noise sources, Stop bursts, Frication
coarticulation
the process by which adjacent sounds influence each other's articulatory and acoustic properties -overlapping/simultaneous production of more than one speech gesture -undershoot of the ideal articulatory target for a sound in isolation -normal speech approx 5 syllables per second -typical/normal/expected -NOT THE SAME AS ASSIMILATION
anticipatory (regressive) coarticulation
the properties of an upcoming target influence the realization of the current speech sound -ex. key and coo
A watt is:
the rate energy is transmitted (A measure of sound intensity)
What is the perceptual/ Acoustic cue for a diphthong
the rate of change between formant 1 and formant 2
cavity
the resonating space
Delayed Auditory Feedback may also be a result of what?
the result of forcing a speaker to attend to auditory feedback info which can conflict with articulatory movement info
path
the sequence of positions in space occupied by the articulator
Steady state
the set of formants that characterize a prolonged /l/ or /r/ -may not be evident in all productions
higher the voice
the shorter the vocal tract, the (blank)
What is the Simple Target Theory?
the simple target theory states that vowels are identified perceptually by their formant frequencies (norms for formants 1 and 2 were determined for all vowels by Peterson and Barney in 1952). The theory implies that for a vowel to be decoded, all we need to be aware of is the formant frequencies (I and II).
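Read literally, the simple target theory amounts to a nearest-template classifier in the F1/F2 plane. The sketch below assumes Euclidean distance in Hz (the theory does not specify a metric) and uses only three corner vowels with the adult-male averages commonly cited from Peterson and Barney (1952). As the clinical cards note, real norms depend on age and gender, so a single table like this is a simplification.

```python
import math

# Adult-male average formants (Hz) commonly cited from Peterson & Barney (1952);
# only three corner vowels are included in this sketch.
VOWEL_NORMS = {
    "i": (270, 2290),  # as in "heed"
    "ɑ": (730, 1090),  # as in "hod"
    "u": (300, 870),   # as in "who'd"
}

def identify_vowel(f1, f2, norms=VOWEL_NORMS):
    """Return the vowel whose (F1, F2) norm is closest (Euclidean distance, assumed)."""
    return min(norms, key=lambda v: math.dist((f1, f2), norms[v]))
```

For example, `identify_vowel(280, 2250)` returns `"i"`.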
/s/
the source for this is turbulent airflow off of the alveolar ridge. the tongue tip is up.
Give an example of assimilation.
the speaker takes a "shortcut" and does not hit every articulatory position - one sound is produced in another, similar location to make articulation more efficient
What is the unit of stress?
the syllable
What is coarticulation due to?
the temporal overlap in articulatory gestures for vowels and consonants ("stoop" - lip rounding starts at the /s/)
What is voice onset time?
the time between the release of the articulatory blockage and the beginning of vocal fold vibration for the following sound
Voice Onset Time
the time from the release of the stop to the onset of voicing (vocal fold vibration) important cue to identity of stop consonants have a burst, only for stops useful for pre-stress syllable position (initial stops) voiced stops are shorter than voiceless can measure closure duration
VOT definition
the time interval between the burst and the onset of voicing
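VOT is a simple subtraction once the burst and the voicing onset are marked on a spectrogram. The category cut-offs below (negative = prevoicing, up to +20 ms = short lag, above 40 ms = long lag) are drawn from other cards in this set, which do not fully agree with each other, so they are exposed as parameters rather than fixed rules; values between the cut-offs are labeled ambiguous.

```python
def voice_onset_time_ms(burst_ms, voicing_onset_ms):
    """VOT = voicing onset minus release burst; negative values mean prevoicing."""
    return voicing_onset_ms - burst_ms

def vot_category(vot_ms, short_lag_max=20.0, long_lag_min=40.0):
    """Classify a VOT value (ms) using adjustable, assumed cut-offs."""
    if vot_ms < 0:
        return "prevoiced (voice lead)"
    if vot_ms <= short_lag_max:
        return "short lag"
    if vot_ms > long_lag_min:
        return "long lag"
    return "ambiguous"
```

With these cut-offs, the average initial /t/ VOT of 70 ms falls in the long-lag category.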
trajectory
the timing of the sequence of positions
Vowel Transitions
the transition between the vowel and consonant and the consonant and vowel. Used to identify the place of articulation for the consonant.
vowel transitions
the transition between the vowel and the consonant (VC) or the consonant and the vowel (CV)--this is an example of coarticulation
the frication noise of the /h/ is produced by ___ and is filtered by ___
the vocal folds held near the midline; the entire vocal tract
formant
the vocal tract has an infinite number of (blank)
rule of standing wave patterns
the vocal tract will resonate only at odd-numbered multiples of the lowest frequency
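The odd-multiple rule comes from treating the vocal tract as a uniform tube closed at the glottis and open at the lips (a quarter-wave resonator): F_n = (2n - 1)·c / 4L. The sketch assumes c = 34,000 cm/s, the same value used in the child vocal-tract card.

```python
SPEED_OF_SOUND_CM_S = 34000  # approximate speed of sound in the vocal tract (cm/s)

def quarter_wave_formants(length_cm, n_formants=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances of a uniform tube closed at one end and open at the other:
    F_n = (2n - 1) * c / (4 * L), i.e. odd multiples of the lowest resonance."""
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, n_formants + 1)]
```

A 17 cm tract gives 500, 1500, and 2500 Hz; a 9 cm child tract gives an F1 of about 944 Hz, showing why a shorter tract has higher formants.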
3 and 3
there are 6 possible degrees of freedom. how many rotational and how many translational?
Describe the place feature of nasals
there are two essential rules, direction and duration. Direction deals with adjustments of F2 to or from the vowel; what is high or low depends on the vowel. Duration is how long it takes to make the F2 adjustments: the longest adjustment is for /ŋ/ (linguavelar transitions are hardest to make, so they take longer), medium long for /n/, and shortest for /m/ (the bilabial transition is the shortest).
Describe the manner feature of a nasal acoustically
there is a presence of a nasal murmur and an occurrence of antiresonance (a perception of muffling, or damping). a nasal consonant means that the nasal cavity is coupled to the vocal tract; there is a cul de sac resonance in the back portion of the oral cavity behind the closure. the entire oral cavity is closed and creates a cul de sac resonance
When the jaw is more open...
there is more pharyngeal constriction
formants
these are concentrations of acoustic energy and act like band pass filters.
What following procedures would help you optimize results in interpreting spectrographic images clinically re: vowels?
use norms that are appropriate for the individual client (consider gender and age when judging a production). When possible, use the client as his/her own control for comparison (use a lucky, chance, good production as a model in therapy). Measure formant frequencies during "steady states", that is, in cases where vowels last long enough for steady states to be available. Train generalization of targets to a variety of contexts (don't assume a production should be identical across phonetic contexts); for example, in some contexts it is normal for vowels to "neutralize". Basically, in all applications use information about F1 and F2 as the primary source of vowel feedback.
What is contrastive stress?
used to differentiate between two words that differ only by a syllable ("I told you to REceive the guests, not DEceive them."); contrast may only be implied ("This is my red bike."); weakly stressed syllables are as important as strong ones for contrast
How are descriptive and experimental studies related?
usually a new field of study initially engages in descriptive studies, which produce the first insights and hypotheses for experimental studies.
1. differences in physical properties of the larynx and vocal tract, 2. age, 3. gender, 4. habits of articulation, 5. suprasegmental features/speaking rate
variability among speakers may be due to what? (5)
In which stops is the constriction further back, giving a longer front cavity and mid-frequency energy?
velar stops
In oral sounds, what port is closed?
velopharyngeal port
hypernasality
velopharyngeal incompetence, increase in formant bandwidths, decrease in overall vowel energy, introduction of a nasal formant, rise in F1 and lowering of F2 and F3, presence of antiformants
When analyzing Waveforms, this provides information related to amplitude (___ displacement)
vertical displacement
nasal murmur
very low F1 (250-500 Hz) large nasal resonating space and narrow opening
This measure is obtained by having a client maximally exhale following a maximal inhalation
vital capacity
Harmonics
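whole-number multiples of the fundamental frequency produced by vocal fold vibration (the source); resonances of the vocal tract are formants, not harmonics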
What is long lag?
vocal fold vibration is delayed for a long time after articulatory release, voiceless stops, range from 25-100 ms
Why is aperiodic noise heard when glottis is closed?
vocal folds = vibrating, vibrating vocal folds because they're closed, airstream passing through constriction
noise components
vocal folds don't close all the way,
the space called "glottis" is anatomically defined by
vocal folds in the front, arytenoids and cricoid in the back
Slope of the source spectrum (spectral roll off) varies by ___.
vocal intensity (talk louder, vocal folds close faster)
frequency dependent
vocal tract filter is what?
formant frequency
vocal tract length affects (blank)?
9 cm
vocal tract length of a child (then F1 = c/4L = 34,000 cm/s ÷ (4 × 9 cm) ≈ 944.44 Hz)
Formants
vocal tract resonances
voiced fricatives
vocal tract source may be the secondary sound source
each arytenoid cartilage has two processes called...
vocal and muscular
syllable- final /r/ is often _____ or realized as an __________
vocalized/ extension of the preceding vowel
This is the duration of the period of time between the release of a stop and the beginning of vocal fold vibration
voice onset time
Voiced Sounds
voice onset time is less than 50 ms
Voiceless sounds
voice onset time is 50 ms or more
Consonants produced with a periodic glottal tone are
voiced
glottal sound
voiced
the speech sound in english that involves a combination of complex periodic and continuous aperiodic sources is the ___
voiced fricative
Periodic source + Continuous noise=
voiced fricatives
Periodic source + transient aperiodic source at release =
voiced stops and voiced affricate
Burst releases are stronger for _____ stops; intraoral pressure during closure is greater because the glottis is open
voiceless
Consonants produced with no periodic glottal tone are
voiceless
supraglottic sound
voiceless
a sound that has sustained noise and no "voice bar" in the spectrogram is likely to be a ___
voiceless fricative
Stops: Stop Gap voiceless
voiceless should be silent
Burst release is more intense for voiceless stops than voiced stops why?
voiceless stops: vocal folds abducted (glottis open, more air); voiced stops: vocal folds adducted
Stridents
voiceless: just aperiodic noise as the source; voiced: aperiodic noise plus periodic vocal fold vibration. The difference between alveolar and palatal is where the noise energy is: alveolar, about 4-8 kHz; palatal, about 2.5-8 kHz. Noise energy goes up to 8000 Hz.
a long interval of frication and aspiration occurs in ___ stops because the ___ at the time of stop release.
voiceless; vocal folds are apart
What happens when pitch rise is used with an incomplete utterance?
when used with an incomplete utterance like "let me see...", the conversational partner is less likely to interrupt than if pitch fell at the end
favorite frequencies
when we change the space between jaw, lips, etc. we are changing the what? these are also called formants
bottom
where do you look for dampening--bottom or top? It holds nasal murmur and such
Restoring Force produces changes in the Momentum of the system:
where the restoring force is lowest the momentum (velocity) of the system is highest. Where the restoring force is highest the momentum (velocity) is lowest
vowel transition
where vowel overlaps with previous and following consonant
Narrow Bandwidth
which bandwidth resolves frequency information well (harmonic structure) but time information poorly
lower frequencies
which frequencies have greater energy?
the higher frequency harmonics
which frequency harmonics are resonated more?
because going from totally closed to open
why does f1 rise for all the stops?
wide band vs narrow band frequencies
wide band allows 300 Hz (3 harmonics at a time) and narrow band only allows 50 Hz (1 harmonic at a time)
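The "3 harmonics at a time" figure follows from dividing the analysis bandwidth by the harmonic spacing (F0), assumed here to be 100 Hz. The second function uses the rough rule that the effective analysis window lasts about 1/bandwidth; the exact constant depends on the window shape, so treat it as an order-of-magnitude estimate that shows the time/frequency trade-off.

```python
def harmonics_per_band(analysis_bandwidth_hz, f0_hz):
    """Approximate number of harmonics (spaced f0 apart) inside one analysis filter."""
    return analysis_bandwidth_hz / f0_hz

def approx_window_ms(analysis_bandwidth_hz):
    """Rough rule of thumb: effective window duration is about 1 / bandwidth."""
    return 1000.0 / analysis_bandwidth_hz
```

With F0 = 100 Hz, a 300 Hz filter spans about 3 harmonics with a window of roughly 3 ms (good time detail), while a 50 Hz filter spans less than one harmonic spacing with a window of roughly 20 ms (good harmonic detail, blurred timing).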
bandwidth
wideness of the band is called a (blank)
In the process of speech production, the lexicon supplies the
words
sequence of segments (each of which consists of a bundle of binary distinctive features)
words are represented in memory as what?
higher, because the center frequencies for /s/ are higher than for /ʃ/
would the high-pass filter for /s/ be lower or higher than for /ʃ/?
Do voiced obstruents combine periodic and aperiodic sources?
yes
Can we produce a retroflex back R (instead of bunched)?
yes, by hitting right spot with something.
what are the 3 types of VOT
zero VOT, Positive VOT, Negative VOT
Norms of Conversational Speech
~ Average: 7 cm H2O ~Range: 3-12 cm H2O ~7-10 cm H2O is normative for speech
S/Z Ratio
~ Ratio of how long a person can sustain /s/ to how long they can sustain /z/ ~ the durations should be equal (ratio near 1). If greater than 1, the glottis is not closing completely ~ 1.4 = Pathology
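The S/Z ratio is a single division of two maximum sustained durations. This sketch takes the 1.4 pathology cut-off from the card as an adjustable default and treats ratios at or above it as flagged.

```python
def sz_ratio(s_seconds, z_seconds):
    """Maximum sustained /s/ duration divided by maximum sustained /z/ duration."""
    if z_seconds <= 0:
        raise ValueError("z duration must be positive")
    return s_seconds / z_seconds

def flag_sz(ratio, threshold=1.4):
    """True when the ratio reaches the (assumed, adjustable) 1.4 cut-off."""
    return ratio >= threshold
```

For example, 25 s of /s/ against 15 s of /z/ gives a ratio of about 1.67, which is flagged.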
Vocal Fold Histology/ Viscosity
~ Layers are: 1. Epithelial 2. Superficial 3. Intermediate 4. Deep 5. Thyroarytenoid muscle as you go down, it becomes harder to vibrate Superficial layer is the most susceptible to damage b/c the easier the vibration the more susceptible to damage Lower viscosity means it moves more, so there is more damage
Smitheran and Hixon
~ Tests glottal resistance ~ Measured in cm H2O/L/s ~ Have the client produce a syllable train ~ Not exact or specific, but can tell you when there is a problem
Laryngeal Airway Resistance
~ measure of the amount of resistance the VF offer airflow ~Measures glottal efficiency
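Laryngeal airway resistance is pressure divided by flow, which is where the cm H2O/L/s unit comes from. A minimal sketch, assuming the pressure and flow estimates have already been obtained (for example from a syllable train); the 0.25 L/s flow in the example is a hypothetical value, and 7 cm H2O is the conversational average from the norms card.

```python
def laryngeal_airway_resistance(pressure_cm_h2o, flow_l_per_s):
    """Resistance = estimated translaryngeal pressure / airflow, in cm H2O/(L/s)."""
    if flow_l_per_s <= 0:
        raise ValueError("airflow must be positive")
    return pressure_cm_h2o / flow_l_per_s
```

With 7 cm H2O and 0.25 L/s, the resistance is 28 cm H2O/L/s.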
Transglottal flow
~How much air goes past a point in a certain amount of time
Horizontal Phase Difference
~Open posterior to anterior ~close anterior to posterior
Vocal Rise Time
~Where to take the acoustic measure ~ The time it takes a sound to go from its onset to a steady state
Phonemes we need to know
·/s/ 3500 Hz+, dark noise, longer duration. ·/z/ 3500 Hz+, dark noise, shorter duration. ·/ʃ/ 2000 Hz+, dark noise, longer duration. ·/ʒ/ 2000 Hz+, dark noise, short duration. ·/f/ 500 Hz+, light noise, longer duration. ·/v/ 500 Hz+, light noise, short duration. ·/θ/ 2000 Hz+, trailing tail, light noise, short duration. ·/ð/ 2000 Hz+, light noise, short duration. ·/r/ (or the other two) F3 below 2000 Hz, clear formants and striations. ·/i/ F1 & F2 very far apart, clear formants and striations. ·/ə/ Short in duration, clear formants and striations.
voicing cues for affricates
ʤ will be the only one voiced showing F0 and periodicity
What is the formula for wavelength?
λ = c/f
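A quick worked version of λ = c/f, assuming a speed of sound in air of about 340 m/s (the same value as the 34,000 cm/s used elsewhere in this set):

```python
SPEED_OF_SOUND_M_S = 340.0  # approximate speed of sound in air (m/s), assumed

def wavelength_m(frequency_hz, c=SPEED_OF_SOUND_M_S):
    """Wavelength (m) = speed of sound / frequency."""
    return c / frequency_hz
```

A 100 Hz tone has a wavelength of about 3.4 m and a 1000 Hz tone about 0.34 m: higher frequency, shorter wavelength.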
Review of terms.....
• Power = Total output of sound source in all directions • Intensity = Rate of energy flow through a unit area • Pressure = Force applied perpendicular to the surface of an object • Watt = Rate energy is transmitted • Pascal = Force divided by area, 1 Pa = 1 N/m²
Variability in Voice Onset Time
• Prevoicing: voicing begins just before release (negative) • Simultaneous: voicing begins on release (near 0) • Voicing begins after air is released (positive)
Acoustic Cues for Stops
• Stop gap • Burst • Voice onset time • Post‐stop vowel formant transition
Mobile Articulators
• Tongue • Mandible • Velum • Lips • Pharynx • Larynx
Release Burst
• Transient burst noise on release of the stop gap and impounded air • Duration approximately 10-30 ms for voiced stops and slightly longer for voiceless cognates • Observed in waveform as sudden change in amplitude • Observed in spectrogram as gray broadband
Resonances required for vocal tract model
• Tube open at one end closed at other - Quarter wave resonance • Tube open at both ends -Half wave resonance • Tube closed at both ends -Half wave resonance • Helmholtz resonance - "Jug" resonance
Why is digitizing speech SO NEAT
• Virtually perfect recordings • Absolutely perfect copies • Ease of cataloging and retrieving sound • Flexibility -- once speech is digitally encoded
formant transitions
•Articulation is seldom steady state for long •Articulators moving from one sound to another - coarticulation happens
vowel nasalization
•Coarticulatory effect •Portion of vowel closest to nasal consonant becomes nasalized. • Antiformants and formants (zeros and poles) • Acoustic evidence for nasalization - Visible lack of harmonic energy - F1 raised - Damping of energy (lower spectral peaks) for F1, F2, F3
manner of articulation
•Complete, transient cessation of airflow •Constriction with continuous airflow •Fricatives constrict the air all the way from the glottis with continuous airflow
liquids
•Formant transition similar to vowels, with a steady state portion (depending upon context) •/l/ complex due to lateral emission - Formants & antiformants - Antiformants (zeros) =dampen energy - Arise from division of airflow in vocal tract - (similar to homorganic /n/) • /r/ - - F3 decreases (mid-palatal constriction) (VC)
Suprasegmental
•Frequency, duration, amplitude •Frequency of voicing •Duration of voicing •Amplitude of voicing
Periodic Complex Waves
•Fundamental frequency plus: •Some combination of harmonics •Harmonics are frequencies related to the fundamental by a ratio of whole numbers. •For example: 1:2, 1:3, 1:4... •If this is the case, the frequencies are said to be in harmonic relationship with one another.
approximants
•Glides (Semivowels) -Lingual-palatal /j/ & bilabial /w/ •Liquids -Retroflex /r/ & lingual-alveolar /l/ •Constriction insufficient for Venturi effect & frication noise •But are consonants despite being relatively unconstricted & presence of formants, but cannot be syllabic nuclei •Central stream of airflow except /l/ •Like fricatives, lip rounding/protrusion important feature for some (not just /w/ but also /j/ & sometimes /r/ & /l/)
Dimensions and Common Units:
•Magnitude/Amplitude: Pressure (Pascals, Pa) •Frequency: Repetitions per second of wave period (Hertz, Hz) •Period: Duration of single repetitions (Seconds, Milliseconds) •Phase: Location in cycle expressed in circular scale (Degrees) •Wavelength: Length of one repetition in a medium
nasal airflow and acoustics
•Regulated by the velopharyngeal port •Excessive nasal resonance in the acoustic signal is perceived as hypernasality
intra-oral air pressure
•The air pressure within the oral cavity • Dependent upon - Degree of constriction of the phoneme - Intensity
Burst and Aspiration Noise of Voiceless Stop
•The voiced/voiceless categorization is not always straightforward. • VOT= time from release of stop closure to onset of voicing. Can be variable - Pre-voicing: voicing begins just before release - Simultaneous: voicing begins upon release - Voicing begins after air is released • <20 ms = voiced >25 ms = voiceless - Prevoiced, short lag, long lag - Under 20 ms is short lag, while over 40 ms is long lag
Labio-dental /f,v/ & lingual-dental /θ, ð/
•small anterior resonating cavity •Broad constriction •Low energy, broad spectrum