Speech Science Instrumentation Weeks 4-7

Example of stress

"Declarative vs. interrogative" The BROWNS are the best. The Browns are the BEST. You are going to wear THAT. YOU are going to wear that.

Biofeedback with LPC Spectra:

"Real-Time LPC Response" function of CSL Sona-Match creates a spectrum that changes in real time as speaker produces different sounds. Clinician can make a template to act as target for biofeedback. Contexts for use: Speakers with hearing loss Children with articulation disorder or CAS ESL speakers

Cues for voicing

(Cues that help you perceive voiced and voiceless sounds) VOT (voice onset time): most important for the voiced-voiceless contrast; the duration from the release of the stop (burst) to the onset of voicing for the vowel. Other factors: Presence/absence of a voicing bar during the silent gap (present for voiced, absent for voiceless consonants). Loudness of the consonant burst (louder for voiceless). Fundamental frequency at onset of the following vowel (higher for voiceless). Duration of the preceding vowel (longer before a voiced stop than before a voiceless stop).

Falsetto register

- 275 to 620 Hz in men and 490 to 1130 Hz in women - lots of tension provided by CT muscle - folds appear long, stiff, thin, and sharp around the edges - cover is lax, fold ligament is tense - extreme tension prevents folds from adducting completely causing a breathy quality - vocal ligament and body do not vibrate as fully as in modal and pulse registers

Identification vs. discrimination functions:

- Close to the category boundary: overlap of identification function and discrimination scores

Ways the acoustic signal of a glide is distinguishable from a vowel

- duration is longer in transition for vowel to vowel than glide to vowel (glides are heard between 60-100 ms) - glides have more constriction in the vocal tract - formant transitions are more rapid in glides than in diphthongs - no steady state portion of the formant as is seen in vowels

The voiced voiceless distinction in different languages

- English: voiceless = long voicing lag, voiced = short lag - French: prevoicing - Italian, Spanish, and Thai: short VOT for voiceless and no aspiration

Modal register

- full participation of cover and body - shorter than nonvibrating position - rapid onset, brief open phase, longer closing phase, short closed phase - range of 75 to 450 Hz in men and around 130 to 520 Hz in women - PTP = 2-7.5cm H2O

Pulse register

- lowest range of F0 that speakers can produce. - Males and females generate similar F0s, averaging around 48 Hz - folds are short and thick - false folds may come into contact with the true folds - medial edges of the folds loosely adducted - PTP = 2-5.5cm H2O - folds are fully adducted for 90% of cycle, with glottal opening and closing = 10%

Two reasons nasal sounds have a lower intensity than vowels

- mucosal lining of nostrils and antiresonances created by the trapped air in the oral cavity dampen the sound

VOTs for voiced stops

-20 ms to +20 ms

ban vs. bad

/a/ in ban is nasalized in anticipation of the n

not

/o/ would be nasalized due to effects from the /n/ (carryover)

VOT in development

Big differences between adult and child VOT. Using the right VOT to mark voiced-voiceless contrast requires very precisely coordinated control of laryngeal and articulatory structures. Control of voicing tends to emerge late in development; not mastered until after around 6 years of age (Baken, 1996). Voicing neutralization errors are very commonly heard in child speech. Some errors may actually be instances of covert contrast (Macken & Barton, 1980).

Example: /r/-/l/ distinction in native speakers of Japanese:

Black dots = American listeners Curve is more significant (can hear a clear difference) Open dots = Japanese listeners Less significant curve (not as clear of a difference to them) Discrimination function (shown by two numbers on the x axis and characteristic shape) Americans have a steeper angle as to when they were clearly different Japanese do not have a clear distinction and would say the sounds are the same

Second formant frequency

Frequency of the second formant, F2, is determined by the length of the front cavity. We define the front cavity to include the space in front of the point of maximum constriction between tongue and palate. When this cavity is small, it resonates at a high frequency. Therefore, a more anterior tongue position is associated with a higher F2 frequency. The more front the vowel, the higher the F2 frequency. Low front vowels have lower F2 than high front vowels: Recall that tongue body is displaced back into the pharynx in low vowels, increasing the volume of the front cavity.

Duration

Generally changing length of vowel May be volitional or involuntary

Nasal murmur

Generated because there are extra resonances, since sound is being filtered through two passages (sides of nasal cavity + nostrils) instead of one. Strong, low-frequency energy, with high intensity. Nasal cavities do not change size or shape, so resonant frequencies for these sounds are essentially the same (some differences due to different lengths of the closed oral tract).

Biofeedback Using the CSL:

Caution: Different-sized vocal tracts will have different resonant frequencies. Your client's vocal tract may be very different in size from yours, especially in the case of a child client. A template showing your formants may not be a good target for your client. You need to find or make a template from a speaker whose vocal tract is similar in size to your client's. Can ask me for help finding an appropriate template.

Nasal murmur

Characterized by: - nasal formant - antiformants Oral cavity becomes a shunt resonator and creates antiformants by trapping sound in the oral cavity (not allowing it to resonate through the nasal cavity). Nasal formant is low in frequency due to the coupling of space of pharyngeal and nasal cavities

A population to use biofeedback with

Children and adults with hearing loss typically have difficulty producing speech sounds that are not readily visible.

Conclusions: Categorical Perception:

Hear speech categories. CP reflects general auditory sensitivities. Depends on memory and other processes. Speech and hearing evolved together.

Keep vs. kop

High front vowel of /i/ in keep influences the high /k/ sound, whereas /a/ influences low /k/ sound (anticipatory)

Stress

Higher muscle activity Increase in subglottal air pressure Increase in effort, intensity, pitch, duration, and formant pattern More clues to identity of phoneme or word

As vowels get lower in tongue height,

F1 increases

Falsetto register

Falsetto register is produced when the vocal folds are elongated and tightly stretched. Rate of vibration/pitch/F0 is high. At high level of tension, only the cover and not the body of the folds can vibrate--> less complex pattern of vibration. Produces a thinner ("flute-like") sound. Vocal folds may not meet in the middle; can create a breathy quality when air escapes audibly. Falsetto register requires a higher subglottal pressure than modal voice.

Biofeedback for /r/ Misarticulation:

Find or create an /r/ template for your client to use. Cue client to "change up" position of articulators to see if he/she can line up formants with peaks in template. Does pitch matter?- no

Vowel formants

First two formants are most important for speech perception. These formants (F1 and F2) are affected by resonance in two "containers": the pharyngeal cavity and the oral (front) cavity. Remember: Air-filled containers resonate at different frequencies based on their volume. Large = low frequency; small = high frequency Change the size of these containers to change vowel quality.

What sounds are not well visualized with EPG

For mid and low vowels, the tongue does not make contact against another articulator, so these sounds are not well visualized with EPG

Stop Consonants: Aspiration:

For sounds with a long VOT (voiceless), air is flowing rapidly through abducted vocal folds and unobstructed vocal tract. Fast-moving air creates weak turbulence noise in the glottis. Sounds like an /h/ fricative Called aspiration noise Aspiration is less compatible with sounds with short or negative VOT (voiced). Vocal folds are already close together when stop is released, impeding the rush of airflow.

Coronal view of tongue with ultrasound

Look at lateral bracing and central groove- important for production of /r/ and strident fricatives

Six instrumental measures Zemlin (1998) suggests collecting in an initial evaluation of a voice client

1. Maximum phonational frequency range (MPFR): children, 2 octaves; adults, 2.5-3 octaves. 2. Speaking F0 (SFF): children, 300 Hz; adult females, 200 Hz; adult males, 100 Hz. 3. Maximum phonation time (MPT): children, 10 seconds; adults, 15-25 seconds. 4. Minimum-maximum intensity at various F0 levels (dynamic range): ability to vary intensity by 20-30 dB in the mid-frequency range. 5. Periodicity of vibration: jitter less than 1%. 6. Noise generated by turbulent airflow (HNR): normal amount of additive noise in signal.
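
Two of the measures above are simple calculations, sketched here as a study aid. This assumes the common "local jitter" definition (mean absolute difference between consecutive glottal periods, divided by the mean period); the function names and the sample values are hypothetical and not taken from Zemlin (1998).

```python
import math

def octave_range(f_low_hz, f_high_hz):
    """Frequency range in octaves; each octave is a doubling of frequency."""
    return math.log2(f_high_hz / f_low_hz)

def jitter_percent(periods_ms):
    """Local jitter (assumed definition): mean absolute difference between
    consecutive periods, divided by the mean period, as a percentage."""
    diffs = [abs(b - a) for a, b in zip(periods_ms, periods_ms[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods_ms) / len(periods_ms)
    return 100.0 * mean_diff / mean_period

# Hypothetical values for illustration only.
print(octave_range(100.0, 400.0))               # lowest and highest F0 in Hz
print(jitter_percent([5.00, 5.02, 4.99, 5.01, 5.00]))  # periods in ms
```

A result under 1% from the second function would fall inside the jitter guideline listed in this card.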

Using EPG for intervention

Most often used for children with functional articulation disorder, children with repaired cleft lip/palate. CLEFTNET = UK network providing access to EPG technologies. Preliminary descriptive studies have shown promising results. However, high-quality evidence demonstrating the efficacy of EPG therapy in controlled studies has not yet been collected. Breakdown of etiologies treated with EPG biofeedback in existing literature. Source: Speech Science Research Centre, Queen Margaret University College

Antiformants (antiresonances)

Frequencies where acoustic energy has been damped Look like very weak formants Happens because nasal cavity is very sound absorbent Nasal and oral cavities resonate together; this results in a loss of energy (resonances of each can cancel each other out) Sound waves traveling through are damped at the frequencies of the antiformants Nasals are highly damped; this weakens intensity of formants of surrounding vowels Very complex sounds

Liquids /l,r/

Similar to glides in: speed of articulatory movement; degree of oral tract constriction; action of the VFs (vibrating). /l/: Tongue tip contacts alveolar ridge, but openings remain on sides of tongue. F1 is relatively low (oral cavity constriction); F2 and F3 are similar to those for midrange vowels. Rapid formant transitions. /r/: In English, F3 drops sharply to about 1,300 Hz (for males). Other /r/-colored sounds (like "schwar") also show this lowering of F3. F1 and F2 are similar to /l/.

articulatory undershoot

Situation in which articulators fail to achieve the appropriate articulatory target. (too little tongue contact)- common for Parkinson's Disease- can be misread in typical speakers as a fast speaking rate

Consonants:

Sound classes: divisions of speech sounds based on their phonetic properties (= a type of consonant). Sonorants: produced solely by VF vibration (vowels, diphthongs, nasals, liquids, glides). Obstruents: characterized by aperiodic noise within the oral cavity (stops, fricatives, affricates).

Discrimination functions

Were those the same or different? Two tokens presented, you say same or different Two pairs of numbers on the x axis (first is first presented, second is second presented) Pairs of 0-5, 5-10, 10-15- response would be "same" ("ba"- lower VOT) Pair of 15-20, 20-25, 25-30 = "different" (close to the category boundary) Pair 30-35, 35-40 = "same"- ("pa"- higher VOT)

Simultaneous onset VOT

articulatory release and voicing happen at the same time

Boundaries shift with the

context

Vowel perception is not categorical, but

continuous

Hypernasality on a spectrogram

darker band of low energy- the nasal murmur

VOT not a useful cue in utterance final position because

final stops are unreleased in English- primary acoustic cue is length of vowel - vowel is longer before voiced consonant, and shorter before unvoiced

Coarticulation

individual segments of words or individual phonemes influencing the phonemes they are adjacent to - two or more articulators move at virtually the same time to produce two or more different phonemes simultaneously

"Speech can't be totally '?', because different languages have different ??."

innate, phonetic inventories

Distinguishing acoustic characteristic of /r/

lowering of F3

Aspiration

Noise caused by air escaping through the VFs; occurs in voiceless stops after the burst. Not compatible with short VOTs. (A burst is present in both voiced and voiceless sounds.)

Speakers who have difficulty changing pitch

pathologies or autism

Duty cycle of vocal folds

phases of a vocal fold vibratory cycle: phases include the interval during which the vocal folds begin to close, the interval during which they are maximally closed, the interval during which they begin to open, and the interval during which they are maximally open.

Burst

short (10-30 ms in duration) Random activity along multiple frequencies

Covert contrast

subphonemic difference between two sounds that is not perceptible to adults The child produces a measurable, reliable distinction between two sounds/categories. However— Child does not realize contrast in the same way as adult speakers Adult listeners do not readily hear the contrast as child produces it. A sound intermediate between two sounds

Segmentation problem

the listener's problem of dividing the almost continuous sounds of speech into separate phonemes and words

VOT is defined as

time between the release of the articulatory blockage and the start of phonation (coordination between laryngeal and articulatory systems)

Negative VOT

vocal folds are vibrating before articulatory release takes place

Glides (/w,j/)

/w/ Shares articulatory similarities to the vowel /u/ Lips very rounded, more than for /u/ Back tongue constriction, but tongue can be more forward than /u/, depending on the following vowel F1 and F2 begin at values similar to those for /u/ /j/ Shares articulatory similarities to the vowel /i/ Oral cavity constriction, like /i/, but can vary somewhat depending on the following vowel F1 and F2 begin at values similar to those for /i/ For both: All formants (F1-F3) quickly shift toward the formants of the following vowel

Methods used to study speech perception in infants- Development in Perception Early Capacities, Rapid Change- Vihman

1. HAS high amplitude sucking technique 2. visually reinforced head turn (response to auditory stimulus)

Categorical perception functions (2)

1. identification 2. discrimination

Three acoustic properties that distinguish stressed from unstressed syllable

1. increase in F0 2. Increase in intensity 3. Increase in duration

Three ways the infant directed pattern (expanded vowel triangle) facilitates speech language acquisition

1. increases the acoustic distance between vowels and makes them more distinct from one another 2. hyper-articulation of vowels produces distinct phonetic categories 3. greater variety of instances representing each vowel category without overlapping the categories

Two examples of sound contrasts perceived categorically

1. place of articulation for stops and fricatives 2. VOT

Intonation contours

1. questions: rising F0 at end of utterance 2. statement: falling F0 at end of utterance 3. wh-questions: rising and then falling F0. Issues with intonation contours are commonly seen in childhood apraxia of speech.

Features of nasals

antiformants nasal murmur

Vowel Classification:

Vowels are categorized in two major dimensions based on the placement of the tongue: Tongue height Tongue advancement or backness

Current research regarding relationship between speech production and perception in infants

1. The speech stream is decoded by the auditory system. 2. The infant's initial auditory biases are only gradually shaped into phonetic categories derived from the particular affordances of the ambient language. 3. Neither 'learning' nor 'maturation' need be invoked to account for sensitivity to speech sounds in the first six months of life. If a child has a hearing loss and, as a result, his or her receptive and expressive language skills are delayed, then it is expected that the child will also present with reading difficulties. Even a mild hearing loss can have an effect (Wake, Hughes, Poulakis, Collins, & Rickards, 2004).

VOTs for voiceless stops

25ms-100ms
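
As a study aid, the VOT ranges listed in these notes (-20 to +20 ms for voiced stops, 25 to 100 ms for voiceless stops) can be turned into a small lookup. This is only a sketch: the function names are hypothetical, the cutoffs are simply the card values, and real category boundaries shift with context and language.

```python
def vot_ms(release_time_s, voicing_onset_s):
    """VOT in ms: time from articulatory release to the start of phonation.
    Negative when voicing starts before the release."""
    return (voicing_onset_s - release_time_s) * 1000.0

def classify_stop_by_vot(vot):
    """Label a stop using the ranges from these notes (an assumption,
    not a clinical rule). Values outside both ranges stay unlabeled."""
    if -20 <= vot <= 20:
        return "voiced"
    if 25 <= vot <= 100:
        return "voiceless"
    return "ambiguous"

# Hypothetical measurements read from a waveform, in seconds.
print(classify_stop_by_vot(vot_ms(0.100, 0.160)))  # long voicing lag
print(classify_stop_by_vot(vot_ms(0.100, 0.090)))  # voicing leads the release
```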

Formant transitions

A formant transition is a slope: $m = \Delta F / \Delta t$, the change in formant frequency over the change in time. In equation form: $m = (f_1 - f_2)/(t_1 - t_2)$, where $f_1$ and $t_1$ are the frequency and time at the start of the transition, and $f_2$ and $t_2$ are the frequency and time at the beginning of the vowel steady state. The slope index is expressed in hertz per millisecond (Hz/ms), so times read off in seconds must be converted to milliseconds.
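
The slope formula in this card can be checked with a short calculation. This is a minimal sketch; the function name and the F2 values are hypothetical, and times are assumed to be in milliseconds so that the result comes out in Hz/ms.

```python
def formant_transition_slope(f_start_hz, f_steady_hz, t_start_ms, t_steady_ms):
    """Slope in Hz/ms: (frequency at start - frequency at steady state)
    divided by (time at start - time at steady state)."""
    return (f_start_hz - f_steady_hz) / (t_start_ms - t_steady_ms)

# Hypothetical rising F2 transition: 1200 Hz at 0 ms to 1800 Hz at 50 ms.
rising = formant_transition_slope(1200.0, 1800.0, 0.0, 50.0)
# Hypothetical falling F2 transition over the same interval.
falling = formant_transition_slope(1800.0, 1200.0, 0.0, 50.0)
print(rising, falling)
```

A positive slope means the formant rises into the vowel; a negative slope means it falls.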

How is EGG measure collected?

A high-frequency signal of very low current is generated and passed through two surface electrodes held in place with a Velcro band at either side of the person's thyroid cartilage. Because tissue conducts electricity well, when the vocal folds are closed, the resistance is low and the current passes easily from one electrode to the other. However, when the vocal folds are open, there is a relatively large body of air between them, creating more resistance to the flow of current from one electrode to the other. The changing of resistance as the glottis opens and closes is displayed on a screen as a waveform, with time along the horizontal axis and amplitude of electrical voltage along the vertical axis. The waveform is called the Lx wave; it reflects the surface area of contact of the vocal folds (Figure 5.2). As the vocal folds close during vibration, the resistance to the electrical current decreases and the amplitude of the waveform increases. As the vocal folds separate during vibration, the resistance to the electrical current increases and the amplitude of the waveform decreases. Thus, the Lx waveform produces a record of vocal fold vibration during phonation.

Electropalatography (EPG):

A way to study the placement of the tongue during articulation. Speaker is fitted with a pseudopalate covered in electrodes. Electrodes respond when contacted by speaker's tongue.

Evaluating and Treating Prosody:

Abnormal patterns of stress or intonation can be found in various clinical populations: Speakers who have difficulty with motor coordination to time stress correctly (e.g., acquired apraxia of speech, childhood apraxia of speech) Speakers who have difficulty changing pitch or producing a normal pitch range (vocal pathologies like nodules or paralysis; dysarthria, e.g., secondary to Parkinson's disease) Autism spectrum disorder Also used in accent modification for L2 speakers of English

Tallal's temporal processing deficit hypothesis

According to Tallal et al. (1996), children who are language-learning impaired often cannot identify fast elements, such as formant transitions or spectral information that are embedded in ongoing speech. If a child has difficulty perceiving the phonemes of the language, then the child does not have access to complete language input from the environment. This can interfere with higher-order language functions, such as phonological processing skills, morphology, syntax, and semantics.

Acoustic characteristics of diphthongs

Like vowels, characterized by first three formants. Characterized by formant transitions: changes in formant frequency caused by shifting the articulators. Have a steady-state portion, a formant transition, and then a second steady-state portion.

Lack of acoustic invariance

Acoustic signal is highly variable due to: Different speakers (interspeaker variability) Different movements of the same speaker (intraspeaker variability) Different speaking rates Different phonetic contexts

Affricates

Affricates = stop + fricative Stop closure is shorter Fricative turbulent noise is shorter Characteristics of both Turbulence in course of release

Why would you have categorical perception?

Allow listeners to "ignore" irrelevant variations in speech signal (Would make speech difficult to decode)

Vowels vs. consonants

American English two major classes Vowels Consonants How do they differ? In articulation Vowels: relatively open VT; low constriction Consonants: much greater degree of constriction: narrow or complete closure in the VT

Sagittal view of tongue with ultrasound

Anterior vs. posterior (/k/ vs. /t/)

Types of coarticulation

Anticipatory (backward effect/regressive) Right to left: speech sound is influenced by a following segment E.g., spoon Carryover (forward effect/progressive) Left to right: speech sound is influenced by a preceding segment E.g., dogs

Accents/ESL

Are there characteristics that are helpful in identifying different languages? Can look at temporal features, intonation patterns, and frequency characteristics across several languages (Arslan & Hansen, 1997). Helpful features were VOT and word-final stop closure duration. In ESL clients: can practice vowel reduction, etc., via spectrogram; can practice pitch contour change via pitch contour; can practice duration effects via waveform/spectrogram.

Coarticulation makes it easy!

Articulatory task solved No more movement than necessary Blueprint for movement at faster rates

Articulatory undershoot/reduced vowel space

Articulatory undershoot/reduced vowel space area is common in speakers with Parkinson's disease. Speech of PD patients is often perceived as abnormally rapid, even when measurements reveal that rate is within normal limits. Explanation: In typical speakers, vowel space is compressed in rapid speech. We learn to associate articulatory undershoot with rapid rate. It is essential to know whether a given patient's intelligibility problem is due to rate or undershoot—use instrumental measures!
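
One instrumental measure of undershoot is vowel space area. The sketch below assumes a triangular vowel space defined by the (F1, F2) values of three corner vowels and uses the standard coordinate formula for the area of a triangle. The function name and all formant values are hypothetical illustrations, not patient data or published norms.

```python
def vowel_space_area(corner_i, corner_a, corner_u):
    """Area (Hz^2) of the triangle whose corners are (F1, F2) pairs,
    computed with the coordinate (shoelace) formula."""
    (x1, y1), (x2, y2), (x3, y3) = corner_i, corner_a, corner_u
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0

# Hypothetical (F1, F2) values in Hz for /i/, /a/, /u/.
typical = vowel_space_area((270, 2290), (730, 1090), (300, 870))
# Hypothetical centralized vowels, as might be seen with undershoot.
reduced = vowel_space_area((320, 2000), (650, 1200), (350, 1000))
print(typical, reduced, reduced / typical)
```

Comparing the area against a separate rate measurement (for example, syllables per second) is one way to tell rate and undershoot apart.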

Covert contrast

Black line is the release burst. A normal speaker will have a VOT of 0-20 ms for voiced sounds and 30-100 ms for voiceless sounds. A speaker with no contrast will have 0 ms for both voiced and voiceless sounds (they can't hear the difference between the two) and should probably receive therapy. Covert contrast means that there is a difference between the voiced and voiceless sounds, but they both happen before the burst (so they will both sound voiced) and the difference is too small to hear; this will clear up on its own (doesn't require therapy). Covert contrast: The child produces a measurable, reliable distinction between two sounds/categories. However, the child does not realize the contrast in the same way as adult speakers, and adult listeners do not readily hear the contrast as the child produces it. A sound intermediate between two sounds. Covert contrast has been found for a wide range of child processes, e.g., voicing neutralization (Tyler, Figurski, & Langsdale, 1993), velar fronting, and palatal fronting. Since errors involving covert contrast are likely to resolve on their own, it may be more efficient to focus therapy on errors that do not involve covert contrast. Better treatment outcomes for children with covert contrast rather than no contrast. Covert contrast occurs in children because there is less laryngeal control. The child does not realize the contrast is different.

Breathiness and inefficiency

Breathiness is an inefficient form of phonation, with a limited dynamic range because less subglottal pressure builds up when the vocal folds do not adduct properly. In addition, a person with a breathy voice often uses three to four times the normal amount of air per second during phonation.

Biofeedback for Intonation:

CSL Real-Time Pitch can be used to train intonation patterns. Open clinician's model of intonation in top window. Speaker tries to replicate intonation pattern with real-time pitch track in bottom window. Praat can be used in a similar way, but it lacks the real-time component. Intonation pattern of a second language can be one of the most challenging aspects to master. Hardison (2004): English speakers learning French practiced intonation with real-time visual biofeedback. Compared visual representation of own pitch contour versus native speaker productions using software similar to CSL. Hardison (2004) reported significant gains in observers' ratings of prosodic appropriateness following three weeks of training.

Speech (speaking rate)

Can be calculated in syllables or words per second. Differences occur across and within speakers. Effect: at a fast rate, movements of articulators overlap more in time; durations of some sounds reduce more than others; durational proportions change.

Articulation disorders

Can use biofeedback in therapy for developmental articulation disorders Highly variable based on many factors Motor impairment Cognitive contribution

Hearing impairments

Can vary greatly within same disorder (not untrue for other disorders). May observe reduction in range of F0. Highly variable. Amplitude: high/low. For hearing/CAPD/comprehension-impaired patients: cues in signal may be heard but not processed; dependent on fewer cues; may miss cues in colloquial/informal/conversational

Stops

Caused by a temporary blockage of air in the vocal tract. Four characteristic features: 1. Silent gap: time during which articulators are blocking off the oral cavity and oral pressure is building behind the block. For voiceless stops: nothing on spectrogram; no VF vibration. For voiced stops: might see a voice bar (low-frequency energy corresponding to VF vibration). 2. Release burst: articulators release the block and air forcefully escapes from the oral cavity. Can be energy at a broad range of frequencies. Very short period: 10 to 30 ms. Longer and more forceful for voiceless stops due to aspiration. 3. Formant transitions: for voiced stops, formants are superimposed on the transient noise; for voiceless stops, no formants over the noise. 4. Voice onset time: time between the release of the articulatory blockage (beginning of the burst) and the onset of VF vibration for the following vowel. Longer for voiceless stops than for voiced. An important cue for perception of voiced vs. voiceless stops.

Fricatives

Caused by turbulent airflow due to air being forced through a small constriction. Energy over a broad range of frequencies. Energy is continuous (less transient than stops). Aperiodic sound, BUT the aperiodic sound is resonated as it passes through the vocal tract. Strident sounds: /s, z, ʃ/. Sound resonates in the front cavity (between the constriction and the lips). Very small front cavity = very high frequencies will be amplified. Frequencies above 4,500 Hz are most amplified for /s/ and /z/.

Ultrasound for intervention

Children as young as 6 can be taught to identify the surface of the tongue on the ultrasound image. Ultrasound has been used to cue appropriate tongue placement in speakers with hearing loss (Bernhardt et al., 2003). More recently, ultrasound imaging has been used for hard-to-treat articulatory errors, especially /r/ misarticulation (Adler-Bock et al., 2007; McAllister, Byun & Hitchcock, in prep). /r/ has a complex articulatory configuration. Crucial tongue constrictions are concealed inside oral cavity. /r/ features lip rounding plus two major tongue constrictions: Anterior tongue is raised near the palate. Tip can be bunched or curled back (retroflexed). Tongue root constriction narrows the pharyngeal cavity. Treatment typically focuses on anterior constriction, but new evidence suggests that pharyngeal narrowing may be the most critical component (Hamilton et al., 2012; Klein et al., submitted). Ultrasound biofeedback treatment: Familiarize child with tongue shape for /r/. Present an image of the target tongue shape. Child attempts to match target while viewing ultrasound image. Important to find tongue shape that is most natural for child's vocal tract. Use video capture software so you can determine whether tongue shape during closest /r/ approximations is closer to retroflex or bunched. Can trace best approximation on sheet protector to use as template.

Juncture

Combination of changes in stress and duration can cause change in meaning E.g., a name vs. an aim, contest can mean either "games" or a "challenge" depending on context

Case Study 1:

DG is a 10-year-old male diagnosed with cerebral palsy. Speech intelligibility is decreased. You suspect that breathy vocal quality and reduced vocal volume are contributing to his intelligibility issues. What data would you collect to enhance your understanding of DG's speech difficulty? Identify both the task you would use (e.g., sustained vowel, reading sample, etc.) and the measurement you would take (e.g., average F0, etc.). How would you track DG's progress over the course of intervention?

EPG

Differences in place and manner are clearly revealed with EPG. Can be used diagnostically and for biofeedback intervention.

Obstruent and sonorant consonants

Different degrees of constriction yield different manner classes within the obstruent category. Stops: Complete cessation of airflow with a turbulent release. Fricatives: Continuous turbulent airflow. Affricates: Stop closure is released into fricative airflow.

Transitions Between Sounds: Diphthongs:

Diphthong /a/ => /I/. Note the diphthong is produced slowly enough that a steady state for each vowel is visible. Speech at normal rates may show only transition.

Coarticulation

Do we produce phones in isolation? Do we preplan individual phones, one at a time, during speech production? This would be physiologically impossible. "cat"≠[k]+[æ]+[t] We speak at approximately 170 words per minute. If each word has three to four phones, this is about 10 phones per second. Coarticulation: The influences of the articulation of one sound on the articulation of other sounds in the same utterance. Different contexts --> different articulatory movements --> different acoustic characteristics. The sound remains the same phoneme in different contexts.
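
The "about 10 phones per second" figure follows from the numbers in this card. A minimal worked check, with a hypothetical function name:

```python
def phones_per_second(words_per_minute, phones_per_word):
    """Convert a speaking rate in words per minute to phones per second."""
    return words_per_minute * phones_per_word / 60.0

# Figures from the card: about 170 words per minute, 3 to 4 phones per word.
low = phones_per_second(170, 3)
high = phones_per_second(170, 4)
print(round(low, 1), round(high, 1))
```

The two results bracket the card's estimate of roughly 10 phones per second.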

Nasalized vowels

Downward movement of the velum starts before the oral tract is occluded. Velum moves down about 100 ms before the oral occlusion and goes up about 100 ms after the release of the occlusion. Vowels preceding and following nasals are nasalized for about 100 ms. This adds extra resonances and antiresonances to portions of the vowel. E.g., the "a" in man and mat are nasalized

Voicing: Other cues

Duration of preceding vowel (longer before a voiced stop) is the best cue of voiced/voiceless contrast for a stop in word-final position.

Biofeedback with EPG:

EPG can also be used to provide biofeedback intervention. Client and clinician both wear pseudopalates. EPG readouts are displayed on parallel screens. Clinician cues patient to match clinician's pattern of tongue-palate contact for a particular target sound. Goal is to make correct production automatic, so it can be maintained without EPG feedback.

Instrumental Assessment of Voice Quality: Electroglottography:

Electroglottography (EGG) is a noninvasive way to assess the degree of contact between vibrating vocal folds. Draws on principles of electrical conductivity. Very low current is run between electrodes on either side of the thyroid cartilage. Human tissue conducts electricity more efficiently than air. More current is transmitted when vocal folds are closed. Less current is transmitted when folds are open. Changes in resistance are displayed as a waveform (the Lx wave). Peaks on the Lx waveform correspond with points of maximum closure. Troughs correspond with open vocal folds.

Electropalatography

Electropalatography (EPG) uses a system consisting of electrodes that are mounted on a thin acrylic plate called a pseudopalate. A pseudopalate is custom-made to cover a speaker's hard palate and upper teeth. A surface electrode is attached to the person's wrist. This electrode generates a small, safe charge that the speaker does not feel. A current flows to the pseudopalate electrodes when the speaker's tongue makes contact with them. The palatometric contacts can be relayed to a monitor to provide visual feedback of the patterns of contact between the tongue and palate (Figure 7.1). Contact patterns are visualized in real time so this technology provides instantaneous feedback regarding the speaker's articulatory

Consonants vs. vowels

How do they differ? In articulation Vowels: relatively open VT; low constriction Consonants: much greater degree of constriction: narrow or complete closure in the VT In acoustics Vowels: More sonorous; characterized by periodic VF vibration, harmonic structure, and formants Greater duration, greater amplitude Consonants: Less sonorous; while there may be VF vibration, not always characterized by harmonic structure and formants (they have other defining features) Aperiodic noise caused by pressure changes in the oral cavity can also be a source for the sound (instead of or in conjunction with VFs) Shorter duration, less amplitude

Inverse function

Identification function: As the percent "pa" response increases, the VOT increases As the "ba" response increases, the VOT decreases Green line is "ba" response, red is "pa" response Where the two lines cross is where it's hard to distinguish which it is (category boundary) Category boundary in this example is 25 ms
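The crossover described above can be located numerically. The following is a minimal sketch, assuming the "ba" response is simply 100 minus the "pa" response, so the two curves cross where "pa" reaches 50%. The VOT steps and response percentages are hypothetical values chosen for illustration, not data from the original figure.

```python
def category_boundary(vot_ms, percent_pa):
    """Return the VOT (ms) at which the "pa" identification curve crosses 50%.

    Uses linear interpolation between adjacent VOT steps.
    Returns None if the curve never crosses 50%.
    """
    points = list(zip(vot_ms, percent_pa))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= 50 <= y1 and y1 != y0:
            return x0 + (50 - y0) * (x1 - x0) / (y1 - y0)
    return None


# Hypothetical identification data: percent "pa" responses at each VOT step.
vot_steps = [0, 10, 20, 30, 40, 50]
pa_responses = [0, 5, 20, 80, 95, 100]

boundary = category_boundary(vot_steps, pa_responses)
print(boundary)  # 25.0
```

With these made-up responses the boundary falls at 25 ms, matching the example on the card.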

When conducting acoustic biofeedback with a child

It is important to consider that vocal tract shapes differ between adults and children. The pitch is not important.

Intonation

Imposed on respiratory cycle Less breath support available at end Raising pitch at end —> more VF tension Changing intonation can change meaning and may signal attitude and feelings Rising inflection can signal difference between statement and question

Round vowels

In English, most back vowels are rounded, while all front vowels are unrounded. The distinction between front and back vowels is based on F2 height. Front vowels have a short front cavity and a high F2; back vowels have a long front cavity and a low F2. Rounding the lips makes the front cavity longer and F2 even lower. So rounding makes a back vowel "more back"; easier to hear the difference between front and back vowels.

Identifying high front vowels

In a high front vowel like /i/ or /ɪ/: Tongue is high (large pharyngeal volume), so F1 is low. Tongue is anterior (small volume in front cavity), so F2 is high. If F1 and F2 are very far apart, think high front vowel.

Low back vowels

In a low back vowel like /ɑ/ or /ɔ/: Tongue is low, so F1 is high. Tongue is back, so F2 is low. If F1 and F2 are very close together, think low back vowel.

Cross-Language analysis of phonetic units in language addressed to infants- Kuhl et al.

In the early months of life, infants acquire information about the phonetic properties of their native language simply by listening to adults speak. The acoustic properties of phonetic units in language input to young infants in the United States, Russia, and Sweden were examined. In all three countries, mothers addressing their infants produced acoustically more extreme vowels than they did when addressing adults, resulting in a "stretching" of vowel space. The findings show that language input to infants provides exceptionally well-specified information about the linguistic units that form the building blocks for words.

"Theories" accounting for cross-language differences in perception:

Infants learn the phonetic contrasts of their native language from scratch. "Forget" the ones that they don't hear around them. Boundaries between phonetic categories shift depending on what they hear. Infants lose the ability to discriminate nonnative contrasts Adults maintain the ability to hear some nonnative distinctions People can "relearn" nonnative distinctions

Multiphasic closure- pulse register

Instead of opening and closing the glottis once per cycle, the vocal folds may approximate and separate partially once, twice, or three times before completely adducting

Instrumental assessment of voice quality

Instrumental Assessment of Voice Quality: HNR: Harmonics-to-noise ratio (HNR): Recall that a harmonic is a whole-number multiple of the fundamental frequency. Harmonics are the product of periodic vibration of the vocal folds. The human voice also features some aperiodic noise (irregular vibration, noise of air escaping). HNR compares the loudness of the harmonics of the vocal source versus extraneous noise. Higher = better. Usually reported in dB.

Vocal fry in the news

Is use of vocal fry increasing, especially among young women? 'Vocal fry' creeping into U.S. speech Vocal fry a new language fad mainly among college females Vocal fry and young women: Are they trying to sound like Ke$ha and Britney? 'Vocal fry': The new craze in talking inspired by Britney Spears, Ke$ha and Kim Kardashian Bad science reporting! Original study finds only that "vocal fry register may be common in some adult SAE speakers." No comparison of males vs. females, old vs. young speakers. Is vocal fry increasing among young females? Maybe, but we won't know until someone studies this question systematically.

Diagnostic Use of EPG:

It can be hard to tell where child is placing the tongue during incorrect production of a sound. Child speech errors are often described in terms of neutralization or substitution (e.g., children with velar fronting neutralize /t/-/k/ contrast). However... Covert contrast (review): Child speaker produces a measurable, reliable distinction between two sounds, but adult listeners do not readily perceive the contrast the child produces. Child may have more knowledge of a sound contrast than we think based on transcription alone. Important to consider level of knowledge when setting treatment targets. Gibbon (1999): EPG display shows covert contrast between alveolar and velar targets in a child with alveolar backing. Target /d/ (top) and target /g/ (bottom) were both transcribed [g].

Distributional learning

Learners are hypothesized to obtain information about which sounds are contrastive in their native language from the distributions of sounds they hear

Lexical vs. sentential stress

Lexical: PREsent and preSENT Sentential: YOU are going to wear that vs. you are going to wear THAT

Psychiatric disorder

Limited investigation Reduced F0 range Reduced volume May help identify

See vs. sue

Lip rounding is present for the /u/ vowel in sue and is also present during the /s/ (anticipatory). In see, the lips spread during the /s/ in anticipation of /i/.

Nasals /m, n, ŋ/:

Nasals are also sonorant. Oral cavity is completely occluded. Velum (soft palate) is lowered; there is movement.

Acoustic properties of breathy voice

Noise in the voice, acoustically called additive noise or spectral noise, can be heard as breathiness, hoarseness, roughness, or any combination of these perceptual attributes. A small amount of noise in the glottal source may contribute to a slightly breathy quality. - less periodic acoustic signal - noticeable noise above 2 kHz

Acoustic properties of rough/hoarse voice

Noise in the voice, acoustically called additive noise or spectral noise, can be heard as breathiness, hoarseness, roughness, or any combination of these perceptual attributes. A small amount of noise in the glottal source may contribute to a slightly breathy quality. More turbulence might be heard as huskiness, whereas a great deal of turbulence is usually perceived as a hoarse voice. - Vibration of vocal folds is excessively aperiodic - amount of spectral noise resulting from turbulent air flow through the glottis - noise of hoarse voice is more prevalent below 1 kHz

Nonstridents

Nonstridents: /f, v/ Less intense energy than strident sounds No front resonating cavity Low intensity energy spread out across a broad range of frequencies (no specific frequency range) Fricatives can be voiced or voiceless: voicing shows as a voice bar at very low frequencies.

Some Diphthongs of English:

Notice: Initial steady-state portions for each diphthong pair are similar and resemble the corresponding vowel. Formant transitions occur, and the second steady-state portions of each pair resemble different target vowels.

Measuring vowel formants

Patients with motor speech disorder often show a high degree of articulatory undershoot: Their articulator movements fall short of normal target location (Ziegler et al., 1988). Area of the vowel space is reduced. Imprecise vowels can have significant negative impact on intelligibility Measure vowel formants (F1, F2) to compare a patient's vowel space against typical expectations.
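One way to put a number on "area of the vowel space is reduced" is to compute the area of the triangle formed by the corner vowels in the F1/F2 plane. The sketch below uses the shoelace formula for that. All formant values are hypothetical placeholders, not norms or patient data, and the function names are illustrative only.

```python
def vowel_space_area(formants):
    """Area (Hz^2) of the polygon whose corners are (F1, F2) pairs.

    Uses the shoelace formula; corners must be listed in order
    around the polygon.
    """
    total = 0.0
    n = len(formants)
    for i in range(n):
        f1_a, f2_a = formants[i]
        f1_b, f2_b = formants[(i + 1) % n]
        total += f1_a * f2_b - f1_b * f2_a
    return abs(total) / 2


# Hypothetical (F1, F2) values in Hz for the corner vowels /i/, /a/, /u/.
typical_speaker = [(300, 2300), (750, 1200), (350, 900)]
undershoot_speaker = [(350, 2000), (650, 1300), (400, 1100)]

typical_area = vowel_space_area(typical_speaker)
undershoot_area = vowel_space_area(undershoot_speaker)
print(typical_area, undershoot_area)  # 287500.0 117500.0
```

In this made-up comparison the speaker with articulatory undershoot has corner vowels closer to the center of the space, so the computed area is smaller.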

Categorical Perception:

People tend to hear speech categories, rather than the small acoustic variations across exemplars of the same phoneme. When two people say /p/- the two sounds will have variability on a spectrogram, even though perceptually they sound the same When it passes a certain value, we categorize it as a different sound (there's a specific point where the sound switches over) Commonly examined using two tasks: Speech discrimination tasks Speech identification tasks Pat Kuhl's TED Talk https://www.ted.com/talks/patricia_kuhl_the_linguistic_genius_of_babies

Assimilation

Phoneme is "changed" into a different phoneme due to the influence of the phonetic context A substitution of another phoneme for an intended phoneme (or phonemes)

Classification of consonants

Place Manner Voicing

Vowel space on formant axis

Plotting F1 vs. F2 (assuming bottom left corner is 0) F1 on y axis (constriction/height, high to low) F2 on x axis (front to back, high to low) What is the similarity between the two figures?

Suprasegmentals

Prosody Includes different parameters <— not independent of each other Intonation Stress Duration

Vocal fry register

Pulse/fry register is produced when the vocal folds are at their shortest length. Rate of vibration/pitch/F0 is low. Vocal folds are slack and hang loosely together. Vocal folds remain closed for around 90% of each cycle of vibration (versus around 50% in modal voice). Air can only "bubble up" irregularly between margins of the folds. Pulse register is produced at a lower subglottal pressure than modal voice. Normal speakers use fry register at the end of a sentence, when lung volume is low (→ low subglottal pressure). Speakers with less flexible vocal folds (little mucosal wave) may need to use fry register to produce voice. Vocal fold nodules or other damage Older speakers Is vocal fry pathological? Lots of debate on this topic. Vocal fry can be a marker of pathology (e.g., nodules), but it is not intrinsically pathological. No evidence that vocal fry is damaging to the vocal folds.

Voice disorder

Quantify fluctuations: Pitch—Jitter Amplitude—Shimmer Harmonic-to-noise ratio (HNR) Pitch range Amplitude range Using Praat: Voice Profile
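Jitter and shimmer both quantify cycle-to-cycle fluctuation, so the same calculation can serve for either. The sketch below assumes one common "local" definition: the mean absolute difference between consecutive cycles, divided by the mean value, expressed as a percentage. The period and amplitude values are invented for illustration, and analysis software may use other variants of these measures.

```python
def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference as a percent of the mean.

    Pass glottal periods to get local jitter (%),
    or per-cycle peak amplitudes to get local shimmer (%).
    """
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100 * mean_diff / mean_value


# Hypothetical measurements from four consecutive cycles.
periods_ms = [10.0, 10.2, 9.8, 10.0]
amplitudes = [1.00, 0.95, 1.05, 1.00]

jitter_percent = local_perturbation(periods_ms)
shimmer_percent = local_perturbation(amplitudes)
print(round(jitter_percent, 2), round(shimmer_percent, 2))
```

A perfectly steady voice would give 0% on both measures; larger values indicate more cycle-to-cycle fluctuation.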

Clinical Application: Electroglottography in PD:

Ramig and Dromey (1996) used EGG to compare outcomes of two treatments for dysphonia in Parkinson's (longer closed interval = improved vocal fold adduction): Respiratory treatment (RT), a program to stimulate activity of the respiratory musculature to generate increased volumes and pressures for speech LSVT, which targets increased vocal fold adduction as well as respiration Finding: VF adduction increased in the LSVT treatment group but not in the RT group. Recommend treating both respiratory and phonatory aspects.

Diphthongs

Rapid blending of two separate vowel sounds to create one sound Quality change resembles a gliding movement Involves a quick shifting of the articulators /aɪ /, / aʊ /, / ɔɪ /: Lip posture Tongue advancement Tongue height

Acoustic differences between infant and adult directed speech (Kuhl et al)

Rather, formant frequencies were selectively increased or decreased to achieve an expansion of the acoustic space encompassing the vowel triangle. American English mothers showed increased F2 in /i/, decreased F2 in /u/, and increased F1 and F2 in /a/ (Fig. 1A). As expected, significant increases in fundamental frequency and vowel duration were observed in infant-directed (ID) speech in all languages.

Dysarthria

Remember there can be tremendous variation as well Type of dysarthria will impact types of cues (e.g., flaccid vs. spastic) E.g., Parkinson's Continuous voicing Increased rate In patients with motor speech disorders Pitch declination may be faster Sentence/utterance boundary signals (final lengthening, final pitch, new breath group, etc.) may be deceptive May not coordinate need for respiratory support with upcoming planned utterance

Why is it not possible to identify a vowel based on the absolute magnitude for the formant frequencies

Research has shown that the formants of a particular vowel are not all equal in terms of how much they contribute to a listener's perception of that vowel.

Case Study 2

SA is a 63-year-old male with Parkinson's disease, diagnosed in 2009. Chief complaints: Reduced vocal loudness Difficulty changing pitch and loudness Breathy vocal quality What data would you collect to determine whether SA's complaints are quantitatively accurate? Identify both the task you would use (e.g., sustained vowel, reading sample, etc.) and the measurement you would take (e.g., average F0, etc.). How would you track SA's progress over the course of intervention?

Ultrasound Imaging for Diagnosis and Intervention:

Sagittal view of tongue (side view) can be used to visualize anterior-posterior contrasts (e.g., /t/ versus /k/). Coronal view can be used to look at lateral bracing, central groove. Central groove is important for correct production of strident fricatives and may also be present in /r/. Database of typical articulation seen with ultrasound

F0 as a Marker of Stress:

Sentential stress: indicates what is new or important information in the sentence

Ultrasound

Sound waves are reflected and present as a moving image For speech production purposes, the transducer is held underneath the speaker's chin to produce images of the tongue. Two different views can be obtained. The sagittal view displays a side view of the tongue showing tongue height, tongue advancement, and tongue slope; the coronal view visualizes a cross section of the tongue from one side to the other, providing information regarding midline grooving or depression/elevation of the sides of the tongue (Bernhardt, Gick, Bacsfalvi, & Adler-Bock, 2005; Bernhardt, Bacsfalvi, Gick, Radanov, & Williams, 2005).

Voice quality

Speech pathologists are often called on to indicate whether or not a patient has a normal voice quality. There is no single accepted definition of normal voice quality, which varies with age, gender, language, etc. Still, most clinicians use perceptual judgment to label different voice qualities such as breathy or rough/hoarse. Remember that perceptual impressions of vocal quality are subjective and unreliable.

Ultrasound Imaging of Speech:

Spoken and Written Language Lab (Buchwald) Biofeedback Intervention for Speech Lab (McAllister Byun) Ultrasound is another way to view articulator movements that are concealed inside the oral cavity (Stone 1991). Transducer emitting ultrasound waves is placed in contact with the skin below the mandible. Reflection of sound waves occurs when a border between media of two different densities is crossed. E.g., tissue of tongue versus air above tongue Video display allows real-time visualization of articulator movements (tongue only).

Strident vs nonstrident fricatives

Strident: intense spectral energy above 2000-3000 Hz, longer Nonstrident: low intensity spectral energy over wide band of frequencies (picture is je vs. th)

The Effectiveness of Oral Resonance Therapy on the Perception of Femininity of Voice in Male-to-Female Transsexuals Lisa Carew, Georgia Dacakis, and Jennifer Oates

Ten male-to-female transsexuals participated in five sessions of oral resonance voice therapy targeting lip spreading and forward tongue carriage. Acoustic analysis of recordings made pre- and posttherapy found that participant formant frequency values (F1, F2, and F3, from the vowels /a/, /i/, and /[/), as well as fundamental frequency (F0), underwent a general increase posttherapy. F3 values, in particular, increased significantly posttreatment. Trends in listener ratings of these recordings showed that the majority of participants were perceived to sound more feminine following treatment. Participants' self-ratings of their voices pre- and posttreatment also indicated that participants perceived their voices as sounding more feminine and that they were more satisfied with their voices following treatment. The present study supports the findings of previous studies that have demonstrated that resonance characteristics in male-to-female transsexuals can be changed to more closely approximate those of females through oral resonance therapy. This intervention study also demonstrates that a spontaneous increase in F0 is achieved during the course of therapy. Further, this study provides preliminary evidence to suggest that oral resonance therapy may be effective in increasing femininity of voice in male-to-female transsexual clients.

Place of articulation and formant transition

The F2 transition changed in small, evenly spaced steps from a low-onset frequency and rising transition (typical of a labial stop), through intermediate-onset frequencies (typical of an alveolar stop), to a relatively high-onset frequency and falling transition (typical of a velar stop).

How can EGG be used in diagnosis/treatment of phonatory disorders?

The Lx waveform can be interpreted in several ways. First, by counting the peaks in a specific interval of time in the Lx waveform, it is possible to determine the speaker's F0 very precisely. So, for example, a greater number of cycles per second can indicate that the person is using falsetto. A reduced number of cycles per second can indicate pulse register. Second, by evaluating the shape of the waveform, one can make judgments about the way in which the vocal folds are opening and closing. A longer-than-normal separation time between the folds may indicate breathiness resulting from a greater volume of air passing through the glottis. A longer-than-normal closed time may indicate a hyperfunctional, pressed quality resulting from excessive medial compression. An even, regular pattern of the cycles in the waveform reflects periodic opening and closing of the folds, whereas an irregular pattern shows less periodic vibration, which may sound perceptually like hoarseness.
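The peak-counting idea can be sketched in a few lines. This example stands in a clean 120 Hz sine wave for the Lx signal, which is an idealization: a real Lx recording is noisier and would likely need smoothing before peaks are counted. The sample rate, duration, and function names are assumptions made for illustration.

```python
import math


def count_peaks(samples):
    """Count local maxima: samples higher than the previous one and
    at least as high as the next one."""
    return sum(
        1
        for prev, cur, nxt in zip(samples, samples[1:], samples[2:])
        if prev < cur >= nxt
    )


def estimate_f0(samples, sample_rate_hz):
    """Estimate F0 (Hz) as peaks per second over the whole signal."""
    duration_s = len(samples) / sample_rate_hz
    return count_peaks(samples) / duration_s


# Idealized stand-in for an Lx wave: 1 second of a 120 Hz sine.
sample_rate_hz = 12000
lx_wave = [
    math.sin(2 * math.pi * 120 * n / sample_rate_hz)
    for n in range(sample_rate_hz)
]

f0 = estimate_f0(lx_wave, sample_rate_hz)
print(f0)  # 120.0
```

Each peak corresponds to one point of maximum vocal fold closure, so peaks per second gives cycles per second.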

First formant frequency

The frequency of the first formant, F1, is determined by the volume of the pharyngeal cavity. When the volume of the pharyngeal cavity is small, it resonates at a high frequency. Therefore, a low tongue/jaw position is associated with the highest F1 frequency. The lower the vowel, the higher the F1 frequency F1 increases as vowels get lower

Harmonics to Noise Ratio (HNR)

The higher the value is, the more the harmonic components of the voice predominate over the noise. The lower the HNR is, the more noise that exists in the voice. An HNR of 0 dB reflects equal energy in the periodic and aperiodic components of the vocal signal (Verdonck-de Leeuw, Festen, & Mahieu, 2001)
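Because HNR is reported in dB, it can be written as ten times the base-10 logarithm of the ratio of harmonic energy to noise energy. The short sketch below shows why equal energy gives 0 dB. The energy values are arbitrary illustrative numbers, and extracting the harmonic and noise energies from a real recording is a separate step not shown here.

```python
import math


def hnr_db(harmonic_energy, noise_energy):
    """Harmonics-to-noise ratio in dB from two energy values."""
    return 10 * math.log10(harmonic_energy / noise_energy)


equal_energy = hnr_db(1.0, 1.0)      # equal periodic and aperiodic energy
mostly_harmonic = hnr_db(100.0, 1.0)  # harmonics carry 100x the noise energy
mostly_noise = hnr_db(1.0, 10.0)      # noise carries 10x the harmonic energy
print(equal_energy, mostly_harmonic, mostly_noise)
```

A positive value means the harmonics dominate; a negative value means the noise carries more energy than the harmonics.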

Phonation Threshold Pressure (PTP)

The minimum amount of Ps needed to set the vocal folds into vibration

Categorical perception

The way that many consonants are perceived is different from how vowels are perceived because many consonants are perceived categorically. That is, if a series of consonant sounds was heard by a listener, with the sounds differing in one acoustic aspect by small equal steps, the listener would perceive some of the sounds as the same phoneme until a boundary was reached. On the other side of this boundary, known as the crossover, the listener would hear the other sounds as a different phoneme (Figure 8.6).

Identifying vowels on a spectrum

We can identify vowels on a spectrogram using the relative heights of F1 and F2. F3 also plays a role, but it is less important.

Voice registers

Three major patterns of vocal fold vibration are used in speech. Modal voice register (normal vocal quality) Falsetto register (also called loft register) Pulse register (also called vocal/glottal fry or creaky voice) Pulse is associated with low pitch/F0, falsetto with high pitch/F0. Speakers vary in how much they use nonmodal voice registers.

What does EGG measure?

Transducers on throat record electrical activity to evaluate vocal fold function

Is categorical perception unique to human beings and to speech?

Unique to humans- No! Chinchillas also have categorical perception of VOT Chart on the left: identification task Unique to speech sounds- No! (3rd chart below) Categorical perception of nonspeech sounds Changes in frequency, intensity, time

Configuration of vocal folds for breathy voice

Vocal folds that do not adduct as tightly as they should are said to be hypoadducted In this case, there is too little muscle force, so the vocal folds do not offer enough resistance to the flow of air. Air escapes between the vocal folds without being converted into acoustic energy. The loss of air creates turbulence as it passes through the vocal folds and adds a noisy, breathy quality to the vocal tone.

Electroglottography

Vocal quality can be characterized based on the distance between peaks in the waveform. In a typical voice, the open and closed portions of the cycle have roughly equal duration. If speaker cannot form a complete seal between the vocal folds (breathy voice quality), peaks will be abnormally far apart Loud or strained voices may show atypically long closed interval. Which Lx wave is normal and which is breathy?
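The balance between the closed and open portions of the cycle can be summarized as a single proportion, often called the closed quotient (a term not used on this card). The sketch below computes it from the closed and open durations of one cycle. The durations are hypothetical and chosen only to mirror the typical, breathy, and long-closed patterns described in these notes.

```python
def closed_quotient(closed_ms, open_ms):
    """Proportion of one vibratory cycle during which the folds are closed."""
    return closed_ms / (closed_ms + open_ms)


# Hypothetical per-cycle durations in milliseconds.
typical = closed_quotient(4.0, 4.0)      # roughly equal open and closed phases
breathy = closed_quotient(2.0, 6.0)      # long open phase
long_closed = closed_quotient(7.2, 0.8)  # folds closed for most of the cycle
print(typical, breathy, round(long_closed, 2))
```

Values near 0.5 match the typical pattern, lower values match a breathy pattern, and values approaching 0.9 match the long closed interval reported for pulse register.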

Positive VOT

Voicing happens after the articulatory release (short lag = 20 ms, long lag = 100 ms)

Perception in phonological disorder:

What role does perception play in speech sound disorder (SSD) in children? As a group, children with disordered speech production show below-average perceptual abilities (Edwards et al., 2002). Some children with SSD show disordered production with normal perception, but most show some deficits in both production and perception (Shiller et al., 2010). The contrasts that children have difficulty perceiving tend to be the same contrasts they produce incorrectly (Locke, 1980). E.g. a child who neutralizes /t/-/k/ will have difficulty perceiving /t/-/k/, but not necessarily /l/-/r/, etc. Speech Assessment and Interactive Learning System, or SAILS (Rvachew, 1994): Presents recorded words produced by adult and child speakers. Child points to a picture of the word for a correct production and to a picture of an X for an error. Children make greater gains in therapy when their regular training is enhanced with SAILS perceptual training. Efficacy of SAILS demonstrated in 3 randomized control trials (Rvachew, 1994; Rvachew et al., 2004; Wolfe et al., 2003). Consider incorporating SAILS (available as a free download) into your treatment plan for a child with speech sound disorder.

Assessing voice quality

Zemlin (1998) identified six measurable parameters that contribute to normal voice quality. Try to assess these properties in your initial evaluation of a voice client. SFF (average/habitual pitch) Maximum phonational frequency range Maximum phonation time Dynamic range Jitter Harmonics-to-noise ratio

Three types of VOT

Zero, negative, positive
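The three types follow directly from the sign of the VOT, taken as the time of voicing onset minus the time of the stop release. A minimal sketch, assuming both times are measured in milliseconds from the same reference point (the function names and example times are illustrative only):

```python
def vot_ms(release_time_ms, voicing_onset_ms):
    """Voice onset time: voicing onset relative to the stop release."""
    return voicing_onset_ms - release_time_ms


def vot_type(vot):
    """Label a VOT value as negative, zero, or positive."""
    if vot < 0:
        return "negative"  # voicing starts before the release
    if vot == 0:
        return "zero"      # voicing starts at the release
    return "positive"      # voicing starts after the release


# Hypothetical measurements: (release time, voicing onset) in ms.
examples = [(100, 40), (100, 100), (100, 120), (100, 200)]
labels = [vot_type(vot_ms(release, onset)) for release, onset in examples]
print(labels)  # ['negative', 'zero', 'positive', 'positive']
```

In practice, measured values are rarely exactly zero, so a real analysis would likely treat a small range around zero as the zero category.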

