core 103: human speech test #2

motor theory

"a speech is special theory"Liberman & Mattingly (1985) Perception crucially refers to the production process: Units of speech perception are the action events of speech production. Specialized for language specifically Brain uses cues in the speech signal to decode cognitive units used for speech production Percept is maximally consistent with all the available evidence for the causal events. Visual and auditory information are integrated into a single amodal phonetic percept. Speech perception is not explained by principles that apply to the perception of sounds in general. Biological specialization for phonetic gestures prevents listeners from hearing the signal as ordinary sound, But enables them to use the systematic, special relation between signal sound to perceive the overlapping and continuous articulatory gestures. The puzzles of lack of segmantability and lack of invariance

2. Modularity Theory

"it's not just for speech": Speech is special, but so are some other neural functions Speech processes occur within an anatomically & functionally distinct circuit (dedicated connections among neurons) whose sole purpose is to decipher and generate speech.

How is a 20 ms difference in VOT perceived?

From 10 to 20 ms and from 40 to 60 ms VOT, listeners can't hear a difference; BUT from 20 to 40 ms, they can clearly hear the difference between [ba] and [pa]. Humans have good between-category discrimination and poor within-category discrimination.

mutual intelligibility

If person A understands person B (and vice versa), they are speaking the same language: mutual intelligibility.

Invariant Property Models

(the "look-harder" model): maybe there is invariance but we are looking at the wrong thing. If you look carefully enough, it is possible to find some invariant auditory property associated with a phonetic feature. Even though the particular frequencies associated with specific peaks (formants) may vary across vowel contexts, the overall distribution of energy in the spectrum remains a constant shape for each place of articulation. This yields somewhat accurate classification (e.g., 60-80%). *But when (variable) formant transitions are pitted against the postulated "invariant" properties, listeners' classification is determined by the variable (formant) cue (Walley & Carrell, 1985) → people respond based on the formant pattern, so the "invariant" property is not the dominating feature in recognition. *Also, the baby would have to infer categories from the variable formant transitions, a large learning leap.

results:

*Babies love new things, so they get bored once familiar with what they are hearing: THEY LEARNED SOMETHING (looking time is longer for the nonwords). BABIES DON'T ATTEND TO THE WORDS THEY KNOW BUT TO THE WORDS THEY DON'T KNOW. Babies and stress: are babies just memorizing acoustic strings? Let's manipulate stress to change the acoustics: BIgamuNEpoku. They recognize a physical thing that they haven't heard before as a familiar sequence; it's really something more than just the sound; it's a LINGUISTIC thing.

Phonetic percept in the McGurk effect:

- auditory and visual stimuli are seamlessly integrated into a SINGLE PERCEIVED EVENT: effortless, mandatory, automatic, unconscious; an amodal phonetic percept [without any awareness as to where the information came from] → the linguistic percept is not specific to one of your senses (auditory, visual). In SWS, speech lacks all of the cues of normal speech except "coherence" → no one speaks like that. Thus, there is no way to define phonetic categories (consonants and vowels) in exclusively auditory terms.

Types of word execution errors

1) Metathesis/exchange 2) Anticipation (appears too early) and perseveration (appears too late)

Auditory theories:

1. Dual-Stage Models 2. Invariant Property Models 3. Phonetic categories == Auditory categories

Auditory theory accounts

1. Dual-Stage Models: Acoustics => (sorted into) => Phonetics (language). Why are different auditory percepts categorized the same? 2. Invariant Property Models: more study will eventually find some invariant acoustic/auditory property associated with every phonetic category. 3. Purely Auditory Models: phonetic category boundaries are determined by the mammalian auditory system; top-down phenomena and human-specific phenomena must be explained post hoc.

But speech does seem special in many ways...

1. Experiments show that boundaries of phonetic categories are not fixed (i.e., not purely auditory discontinuities); experience with speech (re)organizes phonetic categories, which alter with learning, context, word status, and speech rate. Many ad hoc assumptions would be required to explain speech perception via auditory properties alone. [Goldstein] 2. Responses to the very same stimuli are different when they are perceived as speech vs. non-speech events: e.g., sine wave speech shows different discrimination patterns depending on whether it is perceived as speech or not.

Task 1: Identification aka classification/labeling

1. Forced choice identification. Subjects have to categorize a sound using one of (e.g., two) choices (e.g., [pa] or [ba]). They will do so nonlinearly. Speech signals mean something: they require two parties to communicate (not like dimming your lights).

HAS

1. Habituation Phase: the same stimulus plays whenever the infant sucks. 2. Switch the stimulus at a predetermined threshold (once bored, you switch what they're hearing): a different stimulus from the Habituation Phase. Then either: dishabituation occurs => increased sucking rate (stimulus perceived as new/different); or continued same/decreased sucking rate (stimulus perceived as same/boring). Can't do identification with babies; can only do discrimination.

Challenges for motor theory

1. Non-human acoustic processing (non-humans show the same sensitivities as humans). 2. People who are pathologically incapable (from birth) of controlling their articulators are able to perceive speech. 3. Pre-babbling infants can perceive phonetic distinctions as adults do. 4. Even if a child could develop a mapping from the sounds that she produces to her own gestures, it is not clear how this mapping could be applied to the speech she hears adults produce, since (on this theory) establishing phonetic categories on the basis of acoustic properties alone is not possible.

Conditioned headturn paradigm:

1. Repeated speech sound (e.g., [k'a]). 2. Eventually, the sound is switched (e.g., [g'a]). 3. Seconds later, a toy turns on and attracts attention => reward (babies love it and look at the cool toy). 4. Infants learn that the sound switch means the toy is about to turn on. 5. Infants become conditioned to turn their head when they hear this change BEFORE the toy turns on. *Babies learn to anticipate the reward upon hearing the switch.*

TWO criteria for CP (to determine if you have CP)

1. The perception of a continuum in terms of categories. AND 2. The reliance on those categories in order to determine whether two stimuli are identical. i.e., the probabilities from the category identification task can completely predict discrimination (tell-apart) performance.
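
Concretely, criterion 2 is often tested with the classic prediction from Liberman and colleagues' work: if listeners rely only on category labels, predicted ABX accuracy for a pair is chance (0.5) plus half the squared difference of the two stimuli's identification probabilities. A minimal sketch (the labeling probabilities are invented illustration values):

```python
def predicted_abx_accuracy(p1, p2):
    """Predicted ABX accuracy if discrimination relies only on category
    labels: chance (0.5) plus half the squared difference between the
    probabilities of labeling each stimulus as, say, /pa/."""
    return 0.5 + 0.5 * (p1 - p2) ** 2

# Hypothetical identification probabilities along a VOT continuum:
print(predicted_abx_accuracy(0.95, 0.98))  # within category: ~0.50 (chance)
print(predicted_abx_accuracy(0.10, 0.90))  # across the boundary: 0.82
```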

Organization of the lexicon: Activation of word execution: Knowledge of words includes

1. Units of Articulation (sounds) 2. Features of Meaning 3. Grammatical components A word's production is triggered when its sound-meaning network is sufficiently activated. Sometimes this activation is mis-timed

Where is there invariance?

1. We have cognitive tasks or goals that are stable across tokens (instances) of speech units. Where is the invariance in [di] [da] and [du]? → all have the tongue at the alveolar ridge. [bad] [pad] and [bingo] → they all start with a bilabial consonant (lips touch). 2. There are lawful relationships between formant structure (vocal tract resonances) and gestures (vocal tract actions). We are all on Earth, talking in air: a particular vocal tract shape has particular resonant frequencies; that is the way things are. *Both are reliable, so linguists who take this perspective have used them to solve the puzzle of lack of invariance.

What are the building blocks (units) of words? And what evidence tells us so?

1. gesture 2. segment 3. unit: syllable 4. stress foot 5. speech errors

Theories that claim to explain Categorical Perception, Duplex Perception, the McGurk effect, and pop-out phenomena like sine wave speech

1. motor theory 2. modularity theory 3. direct perception

What gives rise to dialects:

1. time 2. isolation (Geographic (islands), Social Segregation)

Words can be described by:

1. Which gestures are included in the word, and what is the organization of those gestures. 2. Same gestures, different organization: e.g., act, task, cat. What do you know about how to say a word? You can use this as an example.

Task 2: Discrimination

2. ABX discrimination. Subjects are given two stimuli—A and B—which have different values along the continuum in question. Subjects are then given X, which matches either A or B (a repeat of one of them), and they have to indicate which it matches. If subjects cannot tell the difference between A and B, then they have only a 50% chance of matching X correctly; i.e., if they are correct only 50% of the time, they are guessing randomly (at chance). If subjects perceive A and B as different things, then they are very good at matching X.

Headturn Preference Procedure (HPP)

7-month-old infants remember patterns after very little experience. Provides a way of breaking into the system of words. Allows the infant to learn additional cues specific to his or her language: stress patterns, phonotactic patterns. Babies are not memorizing acoustic strings but recognizing the transitional consistency among the syllables: doing something abstract. They are good at picking up the patterns of speech around them.

coda

= consonants after the vowel

onset

= consonants before the vowel

rime

= vowel + following consonants

This suggests:

=> gradient errors. Many errors involving individual gestures will go unrecorded in databases of speech errors, since these databases use (segmental) transcription of perceived speech, which limits our ability to identify them.

Cue

A bottom-up (e.g., acoustic) dimension or pattern that informs perception Every potential cue is an actual one. No specific cue is necessary to the percept of a given category. Cues may engage in "trading relations" with one another.

Artificial language learning: an experiment paradigm

A made-up language with a small number of "words" (fixed syllable sequences "CV1-CV2-CV3"; C = consonant, V = vowel). Word sequences form repeating patterns. Words are presented in varying order, with no pauses. Let the infant hear speech in the artificial language long enough to test whether they detect the patterns.

deletion

A phonological pattern involving segments: French Consonant Deletion

Expletive Infixation

A source of evidence for the importance of the foot in phonological structure comes from the phenomenon of expletive infixation. Speakers who were never taught the pattern insert systematically, which tells us something about (tacit) knowledge of our language. Expletive infixation: the within-word insertion of 'f_ _ _ _ _ _' (in American dialects) or 'bloody' (in British and Australian dialects). fan- f_ _ _ _ _ _ -tastic (****ing, freaking, effing, *******). This phenomenon was extensively researched and presented in a 1982 article by John McCarthy in the academic journal Language.

The gestures of speech have a lawful relationship (albeit complex!) to the speech signal.

Acoustic patterning in the speech signal helps us to—requires us to—reconstruct the physical events (movements of the articulators) that create speech. The brain perceives by using sensory signal cues to reconstruct ecologically important events in the world.

Phonetic percept

Acoustic patterns seem to be perceived as speech just when the brain can interpret the pattern as resulting from linguistically significant vocal tract gestural actions.

What is stable across all this variability?

The acoustic signal can vary within and across speakers and across contexts (coarticulation). How do you ignore or compensate for the variation, and pay attention to the aspects of the signal that are meaningful?

Co-modulation (Louis Goldstein)

Acoustics that move together, go together (are neuro-cognitively bound) => a coherent unit of action is cognitively recognized by acoustics that move together (share temporal dynamics). The coherence of a signal confirms that it was produced by a vocal tract and allows its interpretability. *You can't take speech, chop it up, re-order it, and still understand it → coherence is important for understanding.
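
A toy numerical sketch of "move together, go together" (all signal parameters here are invented for the demo): two frequency bands sharing one amplitude envelope have highly correlated envelopes, while a band with different temporal dynamics does not.

```python
import numpy as np

t = np.linspace(0, 1, 8000)                        # 1 s at a toy 8 kHz rate
env = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))        # ~4 Hz, near syllable rate
band1 = env * np.sin(2 * np.pi * 500 * t)          # co-modulated band
band2 = env * np.sin(2 * np.pi * 1500 * t)         # co-modulated band
other_env = 0.5 * (1 + np.sin(2 * np.pi * 7 * t + 1.0))
band3 = other_env * np.sin(2 * np.pi * 2500 * t)   # different dynamics

def envelope(x, win=200):
    """Crude amplitude envelope: rectify, then moving-average smooth."""
    return np.convolve(np.abs(x), np.ones(win) / win, mode="same")

print(np.corrcoef(envelope(band1), envelope(band2))[0, 1])  # near 1.0
print(np.corrcoef(envelope(band1), envelope(band3))[0, 1])  # much lower
```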

Additions and deletions

Additions: spic and span => spic and splan. Omissions/deletions: chrysanthemum plants => chrysanthemum pants.

Language specific effects

Adult listeners with different native languages exhibit categorical perception based on their own language's categories. Japanese adults lose sensitivity in discriminating between [r] and [l]. Japanese babies have this discrimination but lose it over time.

Harmony

All of a word or a certain part of a word must agree in the presence of some particular gesture.

Anticipation (appears too early) and perseveration (appears too late)

Anticipation: splicing from one tape => splacing from one tape; the gates closed => the goats closed. *Not only are the vowels anticipated, they are in stressed syllables. Perseveration (less common than anticipation): praised the man => praised the pan; phonological rule => phonological fool.

Identification

Assign a category Forced choice (forcing them to choose between two categories you gave them) Open response (tell them to write down what they heard)

why?

Babies absorb the statistics of a language. Bilinguals must keep two sets of language statistics at a time. English-learning babies who were exposed to Mandarin for twelve sessions were as good as babies from Taiwan who had been hearing it for 12 months. This means that babies can take the statistics from the language that is in front of them. It takes a human being for a baby to absorb their language statistics (not just audio or TV): INTERACTION IS KEY. (For how long do they have to be exposed? → no direct answer.) (What about babies born deaf?)

Babies and segmentation

Babies are very good at picking up on the patterns of speech around them. They learn about other cues (e.g., stress, phonotactics) once they can segment some words. ....all underway by 9 months of age!

What about babies?

By 8 months, infants segment words from continuous speech pretty well (but they don't say their first words until about 12 months). By 4 months they can know their name. What they hear in the first year is incredibly important; their vocabulary is very small.

English Phonotactics

C = consonant, V = vowel, in [sCiVCi]

Top-down effects on CP

CP is affected by knowledge of the language (language-specific boundaries are exhibited) and by knowledge of words (Ganong Effect).

Experiment: Recognizing patterns

Can infants remember patterns in speech? In this experiment, the baby hears 2 minutes of speech; test whether the infant detects the patterns. Experimental technique: Preferential Listening (what do they prefer to listen to). Familiarization: two minutes of the artificial language. Test: one light flashes; when the baby looks at the light, a test word is played continuously until the baby looks away. The experimenter records how long the baby looked at the light.

Placement of the expletive is constrained

Can't get => fantas- f_ _ _ _ _ _ -tic. Can't get => a- f_ _ _ _ _ _ -bsolutely.

Range of CP phenom

Categorical Perception can be observed (perhaps with training; perhaps without discrimination results) for: speech in non-humans; non-speech sounds in non-humans; non-speech in humans. Does this argue against "speech is special"?

Why is lack of invariance particularly a puzzle for consonants?

Coarticulation: every formant transition depends on both the consonant and the vowel. Example: about 8 vowels and 30 consonants yield about 500 transition patterns (8 × 30 = 240 C+V plus 240 V+C ≈ 480). Many of the patterns "look" VERY similar. Also dependent on where the sound is in the word, e.g., [p] vs. [ph], and they vary with rate of speech, speaker, etc.

A CP experiment with these stimuli:

Combine a duplex perception experiment with a categorical perception experiment (base in one ear, different transitions in the other). Duplex percept: subject hears a syllable and a chirp. Then... manipulate just the subject's instructions:

Discreteness

Complex messages are built of smaller parts. Message →phrases→words →segments...

A phonological pattern involving syllable onsets: Language games

Consider English Pig Latin: "Speak pig latin" → "eakspay igpay atinlay". PATTERN: move the syllable onset to the end of the word and add "ay". NOTE: "latin" becomes "atinlay", not "alayintay": Pig Latin does not target all syllable onsets, just the first onset of the word. *Language games are another piece of evidence.
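
The pattern is simple enough to state as a function. A sketch (simplified: the onset is taken to be all initial consonant letters, and vowel-initial words just get "ay," one of several conventions):

```python
VOWELS = set("aeiou")

def pig_latin(word):
    """Move the first syllable onset to the end of the word and add 'ay'."""
    for i, letter in enumerate(word):
        if letter in VOWELS:
            return word[i:] + word[:i] + "ay"
    return word + "ay"  # no vowel found: nothing to move

print(" ".join(pig_latin(w) for w in "speak pig latin".split()))
# -> eakspay igpay atinlay
```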

How are segments affiliated with syllables?

Consider: V1CV2 or V1CCV2. Across all languages: consonants prefer to syllabify rightward (i.e., as syllable onsets): VCV => V.CV, as long as the language permits words to begin with those segments. *Can't push [tl] rightward (English words can't begin with [tl]); but unless prevented, consonants like to go rightward. *Languages prefer a single-consonant onset.

*maybe differentiate place by looking at slope and intercept BUT

Consonant perception example: [d]. There is no one-to-one relationship between formant transition patterns and a specific consonant like [d]... they don't all have the same slope.

Diversity in American English consonant production

Consonant variation Consonant omission Consonant metathesis Code Switching

In this pattern:

Consonants can't both be nasal → smim, smone are not possible in English (we knew this, but didn't know we knew it): a velum-lowering gesture is what these sounds have in common. Consonants can't both be velar → skik: a tongue-body-raising gesture is what these sounds have in common. Consonants can't both be labial → smiff: a lip-constriction gesture is what these sounds have in common. But SPAM (both labial) and SKANK, SKAG, SKUNK (both velar) are real words that violate these patterns.

Phonological constraints on errors

Constraint: only units of a similar type exchange with each other: cone phall for phone call (the onsets have exchanged). But we DON'T GET lone caph for phone call. Why? What phonological unit is participating in the error?
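
A toy illustration of the constraint (onset-finding here is simplified to initial consonant letters of the spelling, an assumption for the demo): exchanging like units, onset for onset, yields the attested error rather than the unattested onset-for-coda mix.

```python
def split_onset(word):
    """Split a word into (onset, rest); onset = initial consonant letters."""
    i = 0
    while i < len(word) and word[i] not in "aeiou":
        i += 1
    return word[:i], word[i:]

def exchange_onsets(w1, w2):
    """Exchange the onsets of two words: the attested error type."""
    (o1, r1), (o2, r2) = split_onset(w1), split_onset(w2)
    return o2 + r1, o1 + r2

print(exchange_onsets("phone", "call"))  # ('cone', 'phall'), not 'lone caph'
```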

Parallel Transmission

Continuous and overlapping articulations mean that... Parallel transmission (at the physical signal level): the acoustic consequences of a constriction are spread through the acoustic output during its formation and release: one 'sound' at many points in time. And: acoustic information about multiple linguistic units is being transmitted at any given moment in time: many 'sounds' at one point in time.

One way of perceiving:

Continuous perception: equal physical differences are equally perceptible. Consider a dimmer switch and brightness, or amplitude and the loudness of an acoustic signal.

Another way of perceiving:

Continuous signal => percept as discrete categories → categorical. When we look at a rainbow, we tend to see about seven distinct bands of color, yet we know from physics that the wavelength of light that meets the eye changes smoothly from the top to the bottom of the rainbow.

Articulatory data on speech errors: Experiment: repetition of phrases with "alternating" consonants

Control: "cop cop" Tongue Twister: "cop top": as sped up, error builds up and tongue rear is intruding in the articulation In some repetitions of cop top: an intrusive tongue-rear raising gesture occurs during the /t/ → happened gradually When large, /k/ perceived But smaller ones not really noticed

But for word execution

Data is (usually) recorded after the fact, not online/in real time → someone made an error and then someone wrote it down. IPA or orthography limits what we record. We only remember certain types of errors. We tend not to notice errors, especially when a correction occurs mid-word.

Categorical Perception [CP]

Differences between objects that belong in different categories are accentuated or recognized, and Differences between objects that fall into the same category are deemphasized or unavailable/imperceptible. Categorical perception is another way to help deal with lack of invariance

CP in non-human animals

Do other creatures show categorical perception for human speech signals? Chinchillas show non-linear perception for VOT with (close to) the human infant boundary. Perhaps related to the mammalian auditory system? BUT significant training was required, there are no discrimination results for chinchillas, and we have no complete picture of the chinchillas' assessment of their world. Crickets: non-mammalian. Chirp frequencies are meaningful and important for crickets (threat vs. possible mate). Nonlinear identification function: CP is present in crickets. BUT crickets' auditory abilities are not directly related to the system that has evolved in humans; they show that "they are solving the same kinds of perceptual problems we are." Does this mean that crickets are doing the same thing we are? NO! Animals solve problems just as we do; they are finding a solution to a problem in front of them.

Is speech special?

Duplex Perception

categorical perception in infants: H(igh) A(mplitude) S(ucking) experiment paradigm

Each time the infant sucks, they receive a speech stimulus (e.g., they hear "ba"). When infants get bored with what they're hearing, they suck less often. Sucking rate is the dependent measure → it depends on the stimuli they are hearing.

Puzzles: Lack of segmentability

Edges of words and sounds are not provided in the speech signal.

Example:

English has no contrast between dental and retroflex stops (both made with the tongue tip), BUT infants being raised in English-speaking communities can distinguish pairs like these. Babies come with tools to step into categorization (a sort of unlearning as they learn one language) → these babies unlearn the difference if learning English.

An experiment on speech errors

Errors involving segments are the most common (recorded) type of word execution error... evidence for the segment unit. But what if this is just due to how errors are recorded by listeners? If you make a gestural error, the listener perceives (and records) a different segment.

*Nearness is a biasing factor*

Errors seem to get less frequent at longer "distances" between the interacting elements. An anticipation error at a distance: "make-up I'm not packing up" => [maek up]... What might have made this vowel error particularly possible in this sentence: 1) the presence of the word "packing" (meaning); 2) sound → production of articulation: stress; 3) usage/function. Can speech errors tell us about the "window" of speech planning? How far in advance do we prepare our word selections? The planning increments aren't fixed, but errors tell us about the scope and when we send chunks off to be executed.

Word perception

Experimental evidence exists for both top-down and bottom-up processing. While we detect a variety of sensory signals, in language we perceive and process abstract units.

T/F: there is a single dialect of English

F: There is not really "a" singular dialect → not a fixed set of attributes; the name refers to a common form.

T/F: borders, politics give rise to new dialect

F: they do not

Other modules with these properties:

Face identification. Emotion → important because it is important to know when someone is mad, so you can leave. Depth perception (e.g., the Necker cube must be seen in 3D). Color?

T/F: Formant transitions are NOT context-specific

False: formant transitions for the same perceived consonant differ as a function of the vowel context. There is no simple one-to-one relation between the sound (formant transitions) and the perception (consonant).

Find pauses and known words

Find pauses. Find known words: cocktail party phenomenon (you can pick out your name being said in a loud room even when you can't follow the other conversations → you can pick out your name because you hear it a lot). "a tax" vs. "attacks" → not right all the time, but a strategy.

A phonological pattern involving feet: Nursery rhymes, verse

The foot unit is used frequently and naturally in language games and verse. Example: English nursery rhymes: lines of four feet (while the syllable count doesn't seem to matter): Eenie meenie minie moe. In order to complete the beat, you may need to add an extra beat: hickory dickory dock *snap*. Limericks: five lines → A: the first, second, and fifth rhyme with one another and have three feet; B: the third and fourth lines rhyme with each other and have two feet.

Experiment part 2:

For "sh": tongue body raising, lip rounding (2 gestures) For "s'": just tongue tip raising (1 gesture) Control: "sop sop" Tongue twister: "sop shop" During the "s," see active lip rounding and the raising of the tongue body (only one of them intrudes, not both at the same time!) => sub-segmental (gestural) errors Evidence for independent participation of individual gestures in errors. *Evidence for independent participation*

Phonological Units and Phonological Patterning

For each of the units of the internal structure of words that scientists have argued are important, why do they think this? What we know (even though we don't know we know it): it is Kit-Kat, not Kat-Kit: the vowel that is front comes before the vowel that is back (we knew this, but we didn't know we knew it).

Acoustic signatures of articulation are not absolute: Acoustic signatures of consonant place are not fixed

Formants & stop place of articulation in [CV] Imagine a line from F2 at start of vowel to F2 at plateau/middle of vowel [p/b] F2 Slope is rising and Intercept is low [t/d] F2 Slope is (flat?) & Intercept is intermediate [k/g] F2 Slope is falling & Intercept is high

Perceiving vowels

Formants are critical to vowel perception. But... the absolute frequencies of F1, F2, F3, etc., cannot be what determines the vowel you hear: different people have formants at different frequency configurations (they have different resonant frequencies). Maybe relative values are important: maybe not the formants themselves but their ratio, or the spacing of the first and second formants. Hypothesis: the ratio of F1 and F2 is a cue to vowel identity. Sometimes described as the difference between F1 & F2 (F2-F1), or formant spacing. Somewhat constant, even with variations across utterances and speakers. Computing F1/F2 ratios helps, but we still need to learn the vowel space for individual speakers.
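
A toy comparison (the formant values are rough textbook-style numbers for the vowel [i], used only for illustration): absolute F1 and F2 differ a lot between two hypothetical speakers, while the F2/F1 ratio stays nearly constant.

```python
# Hypothetical formant measurements (Hz) for the vowel [i]:
speakers = {"adult male": (270, 2290), "child": (370, 3200)}

for name, (f1, f2) in speakers.items():
    print(f"{name}: F1={f1}, F2={f2}, F2-F1={f2 - f1}, F2/F1={f2 / f1:.2f}")
# adult male: F1=270, F2=2290, F2-F1=2020, F2/F1=8.48
# child: F1=370, F2=3200, F2-F1=2830, F2/F1=8.65
# Absolute frequencies shift a lot; the ratio is much more stable, but
# listeners still need to learn each speaker's vowel space.
```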

Variability between speakers

Fundamental frequency Formant frequencies (different vocal tracts) Speaking rate Accents Other: Maybe I have a cold and someone doesn't

Variability within a speaker (social factors, emphasis)

Fundamental frequency Speaking rate Emotional state Environmental circumstance Health (sick with a cold)

Another phonological pattern involving gestures

Harmony (found in many languages)

We simultaneously use many imperfect strategies

Hearing distinct words is possible because of experience with our language. What strategies do we use to find word edges?

Evidence

How might we diagnose linguistic/cognitive units? Examples: linguistic intuitions (ex. Invention of writing systems) sound patterns within words word formation processes (new words) verse & rhythmic patterns (poetry, song) language games (in order to play the game, need to manipulate the building blocks) speech errors (how can I describe when things go wrong)

Where is insertion permitted? What is the pattern of where insertion is allowed?

Hypothesis 1: the expletive gets inserted immediately before a stressed syllable (FAILS: e.g., no insertion is possible in "cat" even though its syllable is stressed). Hypothesis 2: the expletive gets inserted before the last stressed syllable. Re the stressed-syllable hypothesis, consider... [among]: no insertion possible.

2. segment

Hypothesis: Gestures cohere into larger units => segment Basically the units we've been denoting with IPA symbols

Defining characteristics of CP

Identification shifts rapidly at linguistic category boundary. Equal acoustic steps aren't equally perceptible. Poor within category discrimination. Good between category discrimination.

Assessing Categorical Perception

Identification vs. discrimination functions (across a category boundary, listeners are really good at telling stimuli apart).

Preemptiveness of Phonetic Module:

If a listener can hear it as speech, they must hear it as speech.

Stop consonants and formants

If the consonant is a stop, then there is silence during the closure (i.e., the 'formants' are not audible). So how do you tell one stop from another? Voicing, and the formant transitions preceding and following the closure (we hear different stops because of the sounds that come before and after the silence).

Prediction

If we present infants with a stream of speech containing patterns, they should be able to remember and recognize the patterns. How could we test this? Artificial language experiments.

Lexical effects on CP

Impact of lexical knowledge on perception of words when stimuli are acoustically ambiguous. Listeners are biased to hear real/known words: Ganong Effect

Outcome of infant experiment

In both change conditions, the acoustic differences in VOT are identical. If CP is learned and not innate, the conditions will all be the same and the babies will suck at the same rate whether or not the change crosses the categorical boundary; if innate, they will suck more when the VOT changes categorically. Infants aren't equally sensitive to each 20 msec change: it matters which one. Conclusion: infants as young as 1 month perceive VOT changes categorically.

Thompson Salish (Nthlakampx) contrast

An Indigenous language spoken in British Columbia (Canada). Has a contrast between velar and uvular ejectives: [k'] and [q']. The acoustic difference is in the formant transitions (why?).

Losing categories

Infants are born Language-Ready, able to distinguish all the categories that any language uses. But experience matters too: infants start to lose the ability to perceive some non-native categories between 10 and 12 months. This perceptual attunement to NATIVE contrasts predicts VOCABULARY at age 2: the more non-native sensitivity they lose, the more vocabulary they have by age 2.

General take-away message from infant experiments

Infants are born able to distinguish the categories used by human languages: "language ready." Supports innateness of CP, but experience matters too: a bit before the first birthday, these abilities decline as babies tune in to the ambient language. A combination of innate, co-evolved properties of the auditory system and of speech-production action-acoustics.

More on infant category formation—What else can infants do?

Infants can learn to categorize stimuli by vowel type and ignore talker-specific variation

Innate VOT boundaries found for infants

Infants' abilities do not depend on the language spoken in the infants' environment. Cross-linguistically, infants show a categorical boundary around 30 msec VOT. Universal across languages, even when the ambient language does not have the contrast being tested. Spanish has an unusual VOT boundary just below zero ms (~ -4 ms). Nevertheless, Spanish-exposed infants are sensitive to the usual +30 msec boundary, even though it's not a boundary in their language. Their later Spanish "boundary shift" apparently is learned through exposure to their language. Thus, the infant results cannot be due to learning.

Are modules Innate or Learned?:

Innate: perhaps mental processing modules evolved in response to selection pressures. But critics disagree: "the 'modules' result from the brain's developmental plasticity and ... they are adaptive responses to local conditions, not past evolutionary environments." (David Buller, philosopher)

Are they gone forever? What happens to the nonnative distinctions that we lose in infancy and childhood?

It turns out that they aren't. 1. Infants' ability to retain discriminability for non-native speech contrasts is promoted by exposure, but only if the contrasts are spoken by a present speaker, not merely a pre-recorded video (it has to be interactive and social). 2. People can (re)learn distinctions. 3. International adoptees: an advantage for the previously learned, but lost, first language (and disadvantages for the adoptee's learning of their new, now sole language). 4. And some nonnative distinctions that infants make, adults continue to be able to make (e.g., we can tell the difference between clicks even though they aren't in our language). *Before puberty, one can learn a language unaccented; after, one can speak the language, but with an accent.

Segmentation is based on:

Knowledge about the linguistic system for that language. Knowledge of physical properties of articulation

Are faces special too?

Known faces are perceived categorically. Is this because humans are specialized to do this, or because we are very experienced with faces? Perhaps perceptual expertise is the endpoint of a normal learning trajectory (for speech, there is fair support that babies follow one). But we have a lot of experience with other things too, like cars, and they don't all show this pattern. People CAN show CP in some other tasks in which they've become experts.

What are the puzzles of speech perception:

Lack of Segmentability and Lack of Invariance

*Perception of lexical tone by infants who were learning Chinese and infants who were learning English

Lexical tones—actual speech sounds. Pure tones—the same shape differences (rising/steady/falling). 6 months: both Chinese- and English-learning infants could discriminate lexical tones as well as non-speech tones. 9 months: English-learning infants were worse at lexical tone, while Chinese-learning infants maintained their ability to discriminate lexical tones.

Evidence for phonetic module? Duplex perception has been viewed as evidence for distinct phonetic and auditory modules.

Liberman & Mattingly:"In duplex perception, a single acoustic stimulus is processed simultaneously by the phonetic and auditory modules to produce the perception of two distal objects." If the intensity of the isolated transition is lowered Still capable of disambiguating the speech percept Evidence against 'speech is special': duplex perception in non-speech - record metal door slam Separate low and high frequencies Dichotically: subjects can hear both sounds: dialotically

silence does not equal

Linguistic Boundary

Sine Wave Speech: sine waves are used to replace the formants. SWS demonstrates that:

Linguistic categories are not exclusively based on acoustic properties; the signal is used as one source of information. Acoustic patterns seem to be perceived as speech just when the brain can interpret the pattern as resulting from linguistically significant vocal tract gestural actions (following the patterns that the vocal tract makes). Sound + 1. EXPECTATION 2. OTHER (these frame and shape how we interpret a sound → sound and expectation are inseparable).

Code Switching

Linguists use "code-switching" to mean: switching between languages/dialects in the middle of a speech exchange with another speaker who commands both languages/dialects. But in lay use, the term refers to the command and selection of different dialects for different circumstances or social purposes; a linguist might say "style switching" or "variety switching."

Errors and the mental lexicon; Mental lexicon is organized in terms of:

Meaning: properties of the world, categories, qualities Articulation/Sound Grammar/Usage Speakers don't always say what they intend to say.

Consonant metathesis

Metathesis: switching the order of two sounds. ask => aks

Another phonological pattern involving segments

Metathesis (switching the order of two sounds): the exchange in temporal order of two segments.

Metathesis/exchange

Metathesis: an exchange of two segments (or units). a hunk of jeep for a heap of junk; odd hack for ad hoc (the vowels are metathesized); chet the seck; lay the weed; show snovel (what unit type is exchanged here?? → the onset is metathesized).

Mirror neurons

Mirror neurons fire when an individual executes an action and when he observes another individual performing that same action. First discovered in macaques in the 1990s; posited in human homologue areas (2004). Speculated as relevant to empathy (why is it that when you're sad, I am sad) and theory of mind (e.g., autism: how do you know that I know what you know). Considered a plausible component of imitation. Neuroscientists have proposed that the mirror mechanism is the basic mechanism from which language evolved (the basis for language evolution): a system that matches observed events to similar, internally generated actions, forming a link between the actor (e.g., message sender) and observer (message receiver). *Imitation and vocal learning: such an observation/execution matching system could provide a bridge from 'doing' to 'communicating,' as the link between actor and observer becomes a link between the sender and the receiver of each message.

what is another example of language variation

Monophthongization or "smoothing" of diphthongs; vowel merger; vowel quality variation (qualitative variation).

Activation thresholds

Networks (words or sub-word units) can be activated (or inhibited). Speech is triggered when activation passes some threshold. Mistiming can occur when spill-over activation from similar neighbors co-occurs—the gang effect (activation may pass threshold in an incorrect location).
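
A toy numerical sketch of the threshold idea (the node names, activation values, and threshold are all invented): spill-over from primed, similar neighbors can push the wrong unit past threshold first, the gang effect.

```python
THRESHOLD = 1.0

def total_activation(direct, spillover):
    """Activation = direct (intended) input + spill-over from active,
    similar neighbors (the gang effect)."""
    return direct + sum(spillover)

# Intended "get one"; the prime "damp rifle" pre-activates the /w/ onset
# via the meaning neighborhood of "wet gun" (cf. Motley & Baars 1976).
candidates = {
    "g- (intended)": total_activation(0.9, []),
    "w- (primed)": total_activation(0.2, [0.5, 0.4]),
}
for onset, act in candidates.items():
    print(onset, act, "fires" if act >= THRESHOLD else "stays below threshold")
```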

Word segmentation

No 'spaces'/silences between words or between speech sounds; discontinuities in the acoustics are not necessarily spaces between words. Thus segmentation is not (obviously) inherent in the speech signal itself.

Is perception Vertical (a cognitive act like perception is a match to a modality)?

No → sound is part of what we are using but it isn't all that we are using

A more modern interpretation

The objects of speech perception (the things decoded from the signal) are the intended vocal tract tasks or gestures—linguistically significant actions—of the speaker → goal-directed: the task is, e.g., 'touch your lips together to make this sound,' not 'move your lip up 3 mm.' Phonetic categories, or speech production building blocks, are also these goal-directed articulatory actions: gestures. But how do we identify them from the signal? How do we connect the signal to production?

Sub-syllabic constituents

Onset + rime (coda): sp + ace (coda [s]); sn + ow (no coda); gr + eat (coda [t]); f + ood (coda [d])

A better account of expletive insertion with reference to foot structure

PATTERN: the expletive must be inserted between two feet. F-ing insertion is evidence for the foot as a phonological unit. Even casual or slang word forms (any word formation pattern or process!) can be a source of evidence for cognitive units.
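
A sketch of the pattern as a function, assuming each word arrives already parsed into feet (the foot parses in the examples are the illustrative ones from these notes):

```python
def infix_expletive(feet, expletive="freaking"):
    """Insert the expletive at each legal site: between two feet."""
    return ["".join(feet[:i]) + "-" + expletive + "-" + "".join(feet[i:])
            for i in range(1, len(feet))]

print(infix_expletive(["fan", "tastic"]))  # ['fan-freaking-tastic']
print(infix_expletive(["cat"]))            # []  (one foot: no insertion site)
```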

CP for emotion

People show a between-category advantage for emotion: e.g., along an angry-to-fearful continuum, volunteers asked to categorize discriminate best across the category boundary. Hypothesis: faces, voices, body postures (or some combination) encode discrete emotional meaning and are: Homologous: evolutionarily conserved among species. Innate: biologically derived. Universal: culturally similar among humans.

The Perceptual Imperative

Perception is a "pattern-making" process with two functions: To structure and organize the distal stimulus, which in principle is always ambiguous. Incoming sense data can be validly processed multiple ways, and ambiguity arises when alternative meanings are possible. To attach to it a meaningful response, like " I see X and not Y", and if ambiguity arises when alternative meanings are possible, to select one.

Why is CP important in cognition

Perception is not veridical: our perceptual systems can transform relatively continuous (linear) sensory signals/stimuli into relatively distinct (nonlinear) internal representations/abstract categories. Different input signals can have equivalent perceptual consequences. Categorical Perception is yet another way to help deal with the lack of invariance. CP can provide us with equivalence classes, the beginning of proto-symbolic thought; particularly useful when we need to make connections between things that have different apparent forms. CP is the first stage of this process of responding to the essential, rather than superficial, aspects of an entity.

If the percept is abstract (not a match with the signal), what is in fact its nature...

Perceptual representations ≠ auditory signals Possible perceptual object: the phonetic gesture.

Phonetic categories == Auditory categories

Phonetic category boundaries depend directly on discontinuities in the mammalian auditory system. Implication: for every cue and cue context known, the articulatory maneuvers must be designed to produce just those acoustic patterns that fit the sensitivities of the general auditory system. "Thus, this last auditory theory is auditory in two ways: speech perception is governed by auditory principles, and so, too, is speech production." (Liberman & Mattingly) An alternative (Liberman, Mattingly, Fowler, Goldstein, and others): speech perception is not explained (solely) by principles that apply to the perception of sounds in general → these auditory theories say nothing specific about speech; they all have to do with audition in general.

*Similarity is a biasing factor*

Phonetic similarity: wait a minute => mait a winute; form-persuasive garments => porm fersuasive... Meaning similarity: Motley and Baars (1976) found that a word pair like "get one" will more likely slip to "wet gun" if the word pair presented before it is "damp rifle" (priming with this pair).

Linguistic variation among dialects include:

Phonetic: pronunciation Grammatical: e.g. verb tense Lexical: vocabulary (words or phrases)

Biasing material

Phonologically (sound & rhythm) and semantically (meaning) biasing material, along with proximity, show Gang effects: Having multiple biases is stronger than just one

Phonology

Phonology: What is the sound structure of words? Building blocks of words—phonological units What kinds of evidence for linguistic units can we observe?

Evidence for gestural units?

Phonotactics

CP for known face identity (guess in the middle if it is one man or another)

Poor within-category discrimination, better between-category discrimination.

Categorical Perception And VOT

Poor within-category discrimination, e.g.: VOTs of -10, 0, +10 all sound like /ba/; VOTs of +40, +50, +70 all sound like /pa/. Good between-category discrimination: +40 is /pa/, +20 is /ba/. The duration difference between -10 ms & +10 ms, and between +50 ms & +70 ms, is the same as the difference between +20 ms & +40 ms, but the latter pair of VOTs sound different and the first two pairs do not.
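
A minimal sketch of these judgments (idealized: the +30 ms value is the approximate English boundary cited elsewhere in these notes, and labeling is treated as all-or-none):

```python
BOUNDARY_MS = 30  # approximate English /ba/-/pa/ VOT boundary

def identify(vot_ms):
    """Idealized categorical labeling of a single VOT value."""
    return "pa" if vot_ms > BOUNDARY_MS else "ba"

def sounds_different(vot1_ms, vot2_ms):
    """Under strict CP, a pair is discriminable only if its members
    receive different category labels."""
    return identify(vot1_ms) != identify(vot2_ms)

# Each pair differs by exactly 20 ms:
print(sounds_different(-10, 10))  # False: both /ba/, within category
print(sounds_different(50, 70))   # False: both /pa/, within category
print(sounds_different(20, 40))   # True: crosses the boundary
```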

Problems with detecting known words

Posit boundaries around words you know. ?????????????BONE???????? I picked up the dog bone... I picked up the trombone. "Us poke can cent tense off in contains men knee words that were knot in ten did tube bee herd" = "A spoken sentence often contains many words that were not intended to be heard." Words often contain strings that are themselves word-like.

Affixes

Prefix un+lock (before root) Suffix lock+ able (after root) un+lock+able

Infants lose the ability to discriminate nonnative contrasts*

Progressive change (8-10 months, then 10-12 months) in English-learning infants: from 6-8 months, they can clearly tell the difference between [k'i] and [q'i]; at 8-10 months, they start losing the ability to tell the difference; by 10-12 months, they lose it. But Salish babies can still hear the difference at 11-12 months and beyond, because it is their language and they have more experience with it.

Evidence for the syllable

Reduplication

Bottom Line

Regardless of whether the speech perception system is "special," relies on general perceptual mechanisms, or is special in the sense that it is evolutionarily/ecologically vital...

Other Constraints

The result of an error typically: obeys the language's sound sequence rules (phonotactics and syllabification); maintains the rhythmic properties of the target; yields (or at least is heard as) a real word.

Innate capacity + tuning

Some language-specific learning/tuning Children can learn categories that don't exactly match the ones newborns exhibit. Adults can learn new language categories. Learned boundaries are somewhat plastic in context (e.g., Ganong effect).

Evidence of onsets & rimes

Some speech errors involve misordering syllable onsets or rimes.

Binding: binding together the signal and the gesture (Louis Goldstein and Poeppel & Assaneo)

The speech acoustic signal shows certain prominent frequencies (roughly the syllable rate) → the vocal tract operates with a certain rhythmicity. The moving vocal tract also shows similar prominent frequencies of rapid shape changes (the same range of rhythm that our vocal tract likes). Interestingly, these frequencies coincide with the frequencies observed for neural ensembles in the motor and perceptual brain regions → this comodulation and resonance to neural frequencies is important to how we do speech perception.

5. Speech Errors: "Psychological reality"

Speech errors can inform scientists as to the structure of words/language. If a particular unit/category participates in an error, this is taken to be evidence for the 'psychological reality' of that unit/category: cognitively active or functional.

Speech errors and linguistic units

Speech errors are NOT RANDOM: words are structured, so speech errors happen systematically (around the same places). They are systematic and constrained, and they shed light on characteristics of error-free speech production.

Possible theoretical approach

Speech perception and production are innately linked, have co-evolved, and are neuro-cognitively specialized Perception involves [somehow] abstract actions ("gestures") of the vocal tract. Motor Theory: speech is disambiguated because speech perception directly activates the speech production process (à la Liberman) Current Co-modulation account: Acoustics that pattern together—move together in time—'cohere' perceptually into the causal actions. (L. Goldstein)

Speech is encoding meaning

Speech perception is a biological/evolutionary specialization/adaptation that enables listeners to use the systematic, special relation between the signal sound and the vocal tract actions that produced it. The motor system for producing articulation and the perceiving system evolved in parallel and are sensitive to co-species action-sound patterning.

3. Direct Perception

Speech perception is not 'more special' (i.e., not different) than perception of other important environmental events (speech is another instance of understanding what our senses tell us about the actions in our environment). The demonstrations of Categorical Perception and Duplex Perception in other domains weaken the view that "speech is special" (the door-slam example shows that duplex perception, fusion of separated pieces of the auditory signal, is not unique to speech). Our minds seem to be able to automatically reconstruct the distal (out in the world) cause of a sensory signal => Direct Perception.

Variability for a given segment

Speech rate; informational role (e.g., emphasis, phrase grouping); coarticulation with neighboring sounds: "key" vs. "coo", where the release of the [k] sounds different.

How is spoken language different than written language?

Speech: auditory (but language can be visual!), instantiated as body movement in space, unfolds in time (we don't teleport our tongue from one place to another), overlapped (in time), edges of words and sounds not apparent, lack of invariance. Writing: visual, symbolic, atemporal, discrete, edges of words & letters marked.

what are speech errors and what are they not

Spontaneously and inadvertently produced errors (when something comes out of your mouth that you didn't mean): misspeaking from the intended utterance. Not... intentionally produced word-plays or puns; not malapropisms, where the speaker has the wrong beliefs about the meaning of a word ("ACLU, honey, that's an 'anachronism'"); not performance disfluencies (filled pauses, "uhs"); not euphemism or hyperbole; not falsehoods or misstatements.

Why might Categorical Perception be useful?

Stable perception of a variable signal. Allows listeners to "ignore" irrelevant (i.e., within category) variations in speech signal (when within a category) Provides good discrimination between categories that must be recognized as distinct in order to encode information (i.e., contrast). Apparently abstract perception in high-level cognition can be grounded in perception and action of the physical world. EMBODIMENT

Dual-Stage Models

Stage 1: Acoustic: consonants in different contexts are represented by distinct sound percepts. Stage 2: Phonetic: these are then sorted (matched/classified) into categories. But/quandaries: Why are the different auditory percepts categorized the same (the lack of invariance)? Why are the auditory percepts largely inaccessible?

Problems with working from pauses

Start at the edges of utterances, where there are pauses, and work inwards. "Herearetheworksonthis" → "Here are the works on this" or "He arthur works on this".
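
A sketch of why known-word lookup underdetermines the parse (the mini-lexicon and test string are invented for the demo): exhaustive dictionary matching over an unsegmented string can return more than one complete segmentation.

```python
LEXICON = {"a", "an", "ice", "nice", "cream"}

def parses(s):
    """All ways to cut an unsegmented string into known words."""
    if not s:
        return [[]]
    out = []
    for i in range(1, len(s) + 1):
        if s[:i] in LEXICON:
            out += [[s[:i]] + rest for rest in parses(s[i:])]
    return out

for words in parses("anicecream"):
    print(" ".join(words))
# a nice cream
# an ice cream
```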

Statistical learning Theory:

Statistical learning: infants remember patterns of sound occurrence in language. Initially, they segment based on simple patterns (consonants and vowels); then they expand their notice to other types of patterns: stress, phonotactics.
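
The statistic usually invoked here is the transitional probability between adjacent syllables, TP(x→y) = count(xy) / count(x). A sketch over a made-up Saffran-style stream (the three "words" and the stream length are invented for the demo):

```python
import random
from collections import Counter

random.seed(0)
words = ["bidaku", "padoti", "golabu"]               # hypothetical CV-CV-CV words
stream = [random.choice(words) for _ in range(300)]  # varying order, no pauses
sylls = [w[i:i + 2] for w in stream for i in (0, 2, 4)]

pair_counts = Counter(zip(sylls, sylls[1:]))
first_counts = Counter(sylls[:-1])

def tp(x, y):
    """Transitional probability of hearing syllable y right after x."""
    return pair_counts[(x, y)] / first_counts[x]

print(tp("bi", "da"))  # 1.0: within-word transition, fully predictable
print(tp("ku", "pa"))  # ~0.33: spans a word boundary, far less predictable
```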

Foot

Stressed syllable + the following syllable if it is unstressed. Note that a single unfooted (unstressed) syllable may occur word-initially. Ex. [happy], ba[nana], a[bound], [ban][danna]. *Unstressed vowels in English are often schwa [ə].*
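
A sketch of this parse, assuming syllables arrive already marked for stress (1 = stressed, 0 = unstressed; the stress markings in the examples are the ones implied above):

```python
def parse_feet(sylls):
    """Foot = stressed syllable plus a following unstressed syllable;
    a leftover unstressed syllable is left unfooted."""
    out, i = [], 0
    while i < len(sylls):
        syll, stressed = sylls[i]
        if stressed and i + 1 < len(sylls) and not sylls[i + 1][1]:
            out.append("[" + syll + sylls[i + 1][0] + "]")
            i += 2
        elif stressed:
            out.append("[" + syll + "]")
            i += 1
        else:
            out.append(syll)  # unfooted unstressed syllable
            i += 1
    return "".join(out)

print(parse_feet([("ha", 1), ("ppy", 0)]))              # [happy]
print(parse_feet([("ba", 0), ("na", 1), ("na", 0)]))    # ba[nana]
print(parse_feet([("ban", 1), ("da", 1), ("nna", 0)]))  # [ban][danna]
```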

4. Stress Foot: Stress

A stressed syllable is more prominent than neighboring syllables, e.g., in English: louder, longer, pitch change. In English: all words must be stressed somewhere, and stresses recur at regular intervals.

T/F: Many (but not all languages) stress one or more syllables in a word.

T: stress has to be a part of the information we know about a language.

T/F: Linguistic variation is a feature, not a bug

T: we will never find a language where everyone speaks in the exact same way

Experimental Paradigms: Assessing Categorical Perception

Task 1: Identification aka classification/labeling Task 2: Discrimination

Discrimination

Tell two things apart (same/different judgment); what matches what.

The speech is special hypothesis

The "speech is special" Hypothesis claims that speech processing: A specialized neural module/function to process speech. Uses a specialized neural module/function that has evolved to process speech - Speech processing doesn't share the neural machinery of other perceptual processing. - Doesn't (solely) on the neural machinery of other perceptual processing Consider Sine Wave Speech...

Consequently: Link is innate

The basis of phonetic categories must rely on an infant being able to apprehend the gestural structure of speech—i.e., the possible articulator actions. This ability is innate (not learned) and specialized for language specifically: the brain comes hardwired at birth to recover these speech actions from speech acoustics.

Direct Perception:

The brain is adept at recovering distal events—sources of sensory signals—directly, rapidly, and automatically from the signal, rather than the processing of the signal itself being available to cognition.

What about consonants?

The formant transitions from consonant-to-vowel [or vowel-to-consonant] contain most of the information for consonant place identification.

Misperception of long lyrics. Why?

The patterns we are used to hearing are radically altered by the beat and the pitch. Also, once you have the perception primed (know it one way), a path your brain is used to walking is hard to undo.

Context-dependence of acoustic information (aka lack of invariance)

The same acoustic pattern will have different linguistic phonetic classifications depending on the context in which it is occurring.

The Motor Theory of speech perception (Liberman)

The theory: speech production and perception are innately (at birth) linked in a specialized (evolved) neuro-cognitive "module" responsible for both perception and production. [Original theory, 1980s] The objects of speech perception are represented as motor commands for vocal tract actions: the acoustic signal is automatically perceived as motor commands for gestures.

What does 'standard' mean in Standard American English

The word 'Standard' here has the meaning 'everyday' rather than 'correct' Widely accepted & understood; educated The language of 'print, school, and media'

3. Unit: Syllable

There is evidence that segments cohere into larger units—syllables

Puzzle: Lack of invariance

There's no one sound pattern that tells you that a signal was a particular segment or word. A word's sound differs from occasion to occasion Perception of same consonant can have different formant transitions (di different from du) No simple one-to-one relation between the sound (formant transitions) and the perception (consonant)

But is there a SWS analogue in vision? (is speech special)

This dramatic change in perception—as in certain visual illusions and SWS—is an example of "perceptual insight" or a pop-out effect. Usually you can't flip back. Attempts to experimentally demonstrate that speech is special (properties of speech are not observed in other auditory domains).

What does it mean when BABIES TURN THEIR HEADS BEFORE THE TOY LIGHTS UP.

This means that they CAN hear the category difference!

T/F: CP boundary shifts to favor known words

True

T/F: The standard language may not be the prestige dialect

True

T/F: perception is GENERALLY not vertical

True. Depth perception (Necker cube) → there is no depth here, yet I can reconstruct depth.

T/F: Finding known words depends on the words you know!

True. Memory and input affect initial word learning.

T/F: The representation of speech as a sequence of symbols led scientists to expect that it would be possible to chop the acoustic signal up into chunks such that each chunk corresponded to one of the phonemes.

True, but it turns out not to be possible. The symbolic representation of [di], [da], and [du] indicates two units each; in the physical signal it is not possible to separate the stop from the vowel.

Evidence for subsyllabic constituents from linguistic universals

Two of the most common cross-linguistic constraints on word structure: Words with syllable onsets are more common (preferred) over those without. Words with single-consonant syllable onsets are more common (preferred) over those with multi-consonant onsets. These suggest that human language makes reference to the structural unit of syllable onset.

Systematicities

Typically sounds in similar syllable positions interact: onsets are involved in errors with other onsets, codas with other codas, rimes with other rimes, vowels with other vowels. Typically stressed elements interact. Typically elements similar in some way (stress, syllable structure) interact. Typically elements interact with 'nearby' elements.

Syllable

Unit of linguistic organization that (usually) contains a vowel preceded and followed by zero or more consonants. Syllables are important

Sound patterning within and between words

We can use information we know about how sounds pattern in our language: [English] Phonotactics [tl] sound sequence not allowed BUT "Atlanta" [English] Stress Stress often indicates a preceding boundary TWenty CHOcolate PEAnuts But not always the case: "eiLEEN atTEMPTS guiTAR"

How the mind reconstructs the world

We perceive the disparity in time of a sound hitting our two ears as the location of a source (not as a disparity in time) → we turn our head left because the source came from the left, not because of a sensory percept. We perceive binocular disparity in vision as distance (not as two slightly different images hitting our eyes). We perceive a quickly growing object as approaching, not as a shape getting larger ("looming"). Location, depth, looming, and speech are all ecologically important events; all are examples of causal events. The location of a sound source has a lawful relationship to the signal disparity at the two ears; the distance of an object has a lawful relationship to the image disparity at the two eyes; the approaching object has a lawful relationship to the size change of the image on the retina.

So how do we do word segmentation?

We simultaneously use many imperfect strategies: find pauses and find known words (see the sketch below).
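A minimal sketch of the "known words" strategy alone, with a toy three-word lexicon assumed for illustration; real listeners combine this with stress, phonotactics, and pauses:

```python
def segment(signal: str, lexicon: set[str]) -> list[str] | None:
    """Greedily recover a word sequence from an unsegmented 'signal'
    using only known words (one of the many imperfect strategies)."""
    if not signal:
        return []
    # Try longer words first, backtracking if a parse dead-ends.
    for end in range(len(signal), 0, -1):
        if signal[:end] in lexicon:
            rest = segment(signal[end:], lexicon)
            if rest is not None:
                return [signal[:end]] + rest
    return None  # segmentation fails: the strategy is imperfect

lexicon = {"twenty", "chocolate", "peanuts"}
print(segment("twentychocolatepeanuts", lexicon))
# ['twenty', 'chocolate', 'peanuts']
```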

1. gesture

What is the smallest —atomic— building block? Gesture: a vocal tract action organized to achieve a specific linguistic goal or task. Example: lip aperture closure. TASK → 'get the lips closed'. Nothing is systematic about how far the lips move (not 'move the lip up 10 mm'); what is systematic is the lip closing itself.

how is it organized

What types of information/knowledge are involved? What 'linkages' among the types of information bind together to form a word's collection of qualities?

Stop categories & Voice Onset Time (VOT)

What would VOT perception look like if VOT were perceived along a continuum (ba→pa: as you increase VOT from [ba], the sound becomes more [pa]-like)? Identification would be a straight line (red on the chart): with every +10 ms of VOT, listeners would be proportionally more likely to hear [pa]. *BUT IN REALITY, EVEN THOUGH THE ACOUSTIC CHANGE IS LINEAR, PEOPLE DON'T PERCEIVE IT LINEARLY; THEY PERCEIVE IT CATEGORICALLY.* The identification function is not a line: it is flat within categories and jumps abruptly at the boundary.
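A minimal numerical sketch of the contrast between the hypothetical linear percept and the categorical one actually observed; the 30 ms boundary and the slope are illustrative assumptions, not measured values:

```python
import numpy as np

vot = np.arange(0, 70, 10)  # VOT continuum in ms: 0, 10, ..., 60

# If perception were continuous, %[pa] would rise linearly with VOT.
linear_pa = vot / vot.max()

# What listeners actually show: an abrupt, S-shaped (logistic) function
# with a category boundary near 30 ms (illustrative parameters).
boundary_ms, slope = 30.0, 0.4
categorical_pa = 1 / (1 + np.exp(-slope * (vot - boundary_ms)))

for v, lin, cat in zip(vot, linear_pa, categorical_pa):
    print(f"VOT {v:2d} ms: linear p(pa)={lin:.2f}  categorical p(pa)={cat:.2f}")
```

The categorical column stays near 0, jumps across the 20-40 ms region, and stays near 1, matching the observation that a 20 ms VOT difference is only heard when it straddles the boundary.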

What is heard

When heard alone (one ear only): the base is ambiguous between [da] and [ga]; the transition is a chirp. When heard dichotically, you simultaneously hear both: a fused syllable in the ear to which the base is presented ([da] or [ga] depending on the transition) AND a chirp. Duplex perception: you hear two things absolutely distinctly!!! Not simplex or triplex.

When is word segmentation hard?

When our knowledge is limited/impaired, and when the signal is limited/impaired: masks block our view of the speaker and muffle the sound. *Vision is not required, but if any is provided we will use it!* In languages you don't know, you can't tell where one word starts and another ends. Difficult listening conditions and unpredictability also make segmentation hard.

Manipulate just a subject's instructions:

When subjects are instructed to attend to the speech, they perceive the syllables categorically: chirps are not discriminable within categories, only across. When instructed to attend to the chirp, not the speech, they perceive the chirps continuously: ALL chirps are discriminable. => The exact same stimuli (the exact same continuously changing formant transition) can be perceived categorically or continuously depending only on the instructions given to the subject.

Formant transitions

When the vocal tract shape is changing, its resonant frequencies—formants—are changing. So when a constriction is being formed (or released), the formants are moving => formant transitions. Formant transitions: movement of vowel formants as a stop closure is released or formed. The first formant (F1) exhibits a rising transition after the release of a constriction → it tells you a constriction is happening. F2 & F3 vary according to place of articulation & the overlapping vowel(s) → more important in this case. *F2 transitions help cue place of articulation.* With the same [a] vowel, changing only the transitions (F2) makes listeners perceive different consonants (p vs. t vs. k). But everything is context-specific.

Tongue twisters

Why are tongue twisters fun/hard? How would you invent a particularly 'good' tongue twister? Leverage multiple biases. *Why are tongue twisters hard?* All the tongue-tip sounds are near each other in "she sells seashells by the seashore"; similarity in onsets and a common foot structure in "peter piper picked a pepper"; [l] and [r], both liquids in English, share articulatory similarity in "red leather, yellow leather".

Ganong Effect:

Word status affects the perception of ambiguous words/sounds, including category boundaries. Is it... a specialized speech perception system? A general auditory system for the representation of sound? A perceptual system specialized for conspecific (same-species) signals? Etc.

pattern:

Word-final consonants (of some words) are omitted before a consonant, but not before a vowel or glide (i.e., not when the next word starts with a vowel or glide).

Building blocks of words

Words are composed of smaller units that themselves do not have meaning: cat [kæt], tack [tæk], act [ækt]. To make words, we recombine these smaller units according to the 'rules' of our language.
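A toy sketch of this recombination idea, using the three transcriptions from the card above and a hypothetical three-entry lexicon assumed purely for illustration:

```python
from itertools import permutations

# The three segments of "cat"; recombining them yields other English words.
segments = ("k", "æ", "t")

# Toy lexicon for this illustration (not a real dictionary API).
lexicon = {("k", "æ", "t"): "cat", ("t", "æ", "k"): "tack", ("æ", "k", "t"): "act"}

for order in permutations(segments):
    word = lexicon.get(order)
    status = f"word: {word}" if word else "not an English word (or violates phonotactics)"
    print("".join(order), "->", status)
```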

Segmentability

Words are composed of smaller units which themselves do not have meaning. We can recombine these smaller units into new words.

Multiple "semi-reliable" cues that proficient language users use:

Words you know in your language; stress patterns of your language; phonotactics of your language. NOTE: Most segmentation strategies rely on experience with your language, which is why segmentation is hard for non-native listeners.

an example of a shift/syllable variation is:

a chain shift

people who don't use SAE are simply using

a distinct form/variant of the language; not an incorrect English but rather an alternative English

dialects are...

a language variety

the discussion of language v. dialect is often...

a political, social, historical, or orthographic argument/distinction rather than a scientific one

Duplex perception:

the same acoustic signal can be perceived as both a speech event and a non-speech sound at the same time. Reasonable conclusion => there are two modules that produce simultaneous representations: SOUND and SPEECH

What are the cognitive objects of speech perception:

auditory theories v. motor theories

Properties of a Module:

cognitively impenetrable: can't reflect on formants, VOT, etc. → can't penetrate the process that led to the speech percept. ecologically relevant/functionally specialized: ecological → important evolutionarily to the success of the species. Speech is ecologically critical; whatever function the module serves matters in the organism's environment. → Main difference between this and motor theory: there are other ecologically relevant abilities, e.g., being able to recognize faces we know (face recognition). anatomically distinct: a special part of the brain is used for this. automatic/mandatory/preemptive: once you have heard it as sine wave speech, you can't hear it as you originally did anymore (innate??)

Speech movements are

continuous in time. Consequently, the acoustic signal is also continuous, and acoustic information is 'spread out' in time. One 'sound' at many points in time.

What information does this system use to recover gestures?

cues

decoding a "speech signal" involves

decoding "units" of some sort: structural/combinatorial elements

duplex perception: Dichotic presentation:

each ear simultaneously hears different sounds (one thing in one ear, something else in the other).

Sound + Top-down information:

expectations and knowledge (things we learned about our world)

For equal steps of variation

for non-speech acoustic signals, many more stimuli can be discriminated than can be identified. BUT for speech signals, listeners can't discriminate much better than they can identify. Proposal: listeners perceive the units of contrast, not the non-contrastive variation.
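A minimal sketch of this identification/discrimination link, using the classic Haskins-style prediction that discrimination follows labeling; the identification probabilities below are illustrative numbers shaped like textbook categorical-perception data, not real measurements:

```python
import numpy as np

# Identification probabilities p([pa]) at each step of a VOT continuum
# (illustrative values: flat within categories, abrupt at the boundary).
p_pa = np.array([0.02, 0.03, 0.10, 0.90, 0.97, 0.98, 0.99])

# One common textbook formulation for predicting two-step ABX
# discrimination from labeling alone:
#   P(correct) = 0.5 + 0.5 * (p_i - p_j)**2
# Chance (0.50) within a category, well above chance across the boundary.
for i in range(len(p_pa) - 2):
    pred = 0.5 + 0.5 * (p_pa[i] - p_pa[i + 2]) ** 2
    print(f"pair {i}-{i+2}: predicted discrimination {pred:.2f}")
```

Within-category pairs come out near 0.50 (chance) and cross-boundary pairs near 0.88, reproducing the good between-category / poor within-category pattern.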

Reduplication

form new word by repeating part of original word

In Speech percept:

formant transitions produce categorical perception (with the base). In Non-Speech percept: the isolated formant transition is perceived as a continuously varying 'chirp'.

distinct forms of language vary by:

geographic region; social groupings such as ethnicity and race, across communities with cultural commonalities; generation — people talk like their peers, not like their parents. *But dialects are never purely any one of these; people have multiple identities.*

Blends (A single word from combining pieces of two words)

'hilarics' from 'hysterics' and 'hilarity'. Often combinations of the onset of one word with the rime of another: 'in the sleast' for 'in the slightest'/'least'. Not everything goes → these are not random, willy-nilly errors.

Two terms associated with Categorical Perception

identification and discrimination

Discrimination in non-speech

when perceiving continuously, many stimulus differences can be perceived

A phonological pattern involving feet

infixes

Coarticulation means

invariance is not a property of the system: the tongue shape is not the same for [idi], [ada], and [udu]. The [d] is being coarticulated with its neighbors, and this is not just a random relationship: you can see the influence of the coarticulated vowel in the shape of the tongue.

Whether speech is "special" or relies on general perceptual mechanisms,

its organization helps to deal with variability in the speech signal and in listening conditions

Gesture

linguistically significant, coordinated vocal tract action

Language:

made up of a group of related dialects and their associated accents; typically mutually intelligible. How dialects are grouped into languages can be a geopolitical decision rather than a linguistic one.

What do we know about stops after [s] that make this error plausible?

stops that would otherwise be aspirated become unaspirated in a cluster with [s] ('this guy' v. 'disguise')

The basis for imitation:

newborns can imitate certain facial movements (the newborn can organ-match: when it sees a tongue protrusion or a slow lip closing, it can imitate it; when it sees an active tongue tip, it does the same). The newborn is able to connect what she sees and hears with what she does (even if she doesn't make the sound).

Segment boundaries are

not implicit in the acoustic signal

infixes

occur internally in a base word, rather than at either end of the base. => Found in languages including Tagalog, Arabic, Malay, and some dialects of Mandarin.

Segmentability is...

only abstract: our knowledge of words must be abstract, since segments are not separable chunks in the physical signal.

mental lexicon

our complete knowledge of words we know

speech movements are also

overlapping (co-occur in time). This means that at any point in the speech stream information is being transmitted about multiple words. Multiple 'sounds' at any one point in time

A phonological pattern involving gestures

phonotactics

McGurk effect

You see a video of a man apparently saying "ba", "fa", ..., but when you close your eyes, you hear only "ba" repeated. Perception depends on multiple sensory modalities: the brain must integrate all sources of information about the distal event (a syllable or linguistic gesture) to arrive at the best possible match for the information it is receiving. If the face is still in your visual field, you'll get the McGurk effect. You can get the McGurk effect across male and female talkers, but not from familiar voices you know. You can even get the McGurk effect by touch. Our speech function makes use of all types of relevant information, regardless of the modality. This has been interpreted as further evidence of a phonetic or linguistic module that has as its perceptual object the phonetic gesture.

Sound + Bottom-up information:

sensory/signal information (information directly from the senses and encoded in the physical signal) → sound is also included

The expectations (situational) are very important

sound + EXPECTATION (situational, grammatical, statistical). Grammatical: if we add -ly to something, we know it's an adverb; if I say "It's a nice...", you know the phrase will end in a noun. Statistical: patterns we find in situations.

double negatives are an example of

nonstandard American English — but a systematic, rule-governed pattern, not an error

All dialects—all languages— are

systematic and rule-governed (not just random)

coarticulation

temporal overlap: more than one constriction is being executed simultaneously. Consonants and vowels (beginning a syllable) overlap in time for a substantial portion of their articulations. In fact, so do neighboring consonant articulations.

Idiolect:

the combination of dialect plus accent for one particular speaker

accent:

the manner of pronunciation of a speaker (a word common across general American English may be pronounced differently). Everyone speaks with an accent!

Priming

the phenomenon whereby the act of processing an utterance with a particular form facilitates processing a related form

Dialect:

the types and meanings of the words available to a speaker (lexical) and the range of grammatical patterns into which they can be combined (grammar) A dialect may be associated with more than one accent.

The speech system is organized:

to allow efficient learning, to be robust to variability in the speech signal and in listening conditions, and to link the events of production with the outcomes of perception.

If gestures are units of speech production, why are gestural errors so rare?

Maybe they're not! A gestural error will generally result in an apparent segmental error. Voicing metathesis: 'glear plue sky' for 'clear blue sky' → could be a voicing error. 'pig and vat' for 'big and fat' → evidence for a segment error OR a voicing error. Velum-lowering metathesis: 'Cedars of Lamadon' for 'Cedars of Lebanon' (the sounds contrast in whether there is a velum-lowering gesture). Or the error may be un-transcribable. Or imperceptible.

a chain shift

when one part of the system starts moving, there is a shift across the vowel space (vowels shift across the vowel chart). The 'on'-line dividing the North and Mid-Atlantic: is 'on' pronounced like 'Don' (to the North) or like 'Dawn' (to the South)?

vowel merger:

when two categories in one variety are realized (pronounced) as one category in another variety. [ɛ] and [ɪ] may not contrast before nasals: pen = pin; Wendy = windy.

errors can occur in

word selection from the lexicon (saying 'cat' instead of 'dog'): often substitution of a word with a similar meaning/grammatical type and/or similar sounds or stress pattern; and word execution (sound selection): the right word, but produced incorrectly.

