NLP Midterm

__ indicates not syntactically correct

*

Two types of constraints on parse?

1) input sentence 2) grammar

properties of markov model?

1) limited horizon (only a fixed number of previous states/words matter when predicting the current one) 2) time-invariant (the transition probabilities do not change over time)

how many words in English language?

2 million

context free grammar

4-tuple (N: non-terminals, Sigma: terminals, R: rules, S: start symbol)

What percentage of the most commonly used words are ambiguous?

40%

__ indicates not semantically correct

?

negation probability

P(!A) = 1 - P(A)

disjunction/union probability

P(A u B) = P(A) + P(B) - P(A, B)

if events are mutually exclusive (A intersect B = ∅), P(A u B) =

P(A) + P(B)

Bayes Theorem

P(A|B) = (P(B|A) * P(A)) / P(B)
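
A quick worked example in Python; the spam-filter numbers below are made up purely for illustration:

```python
# Hypothetical numbers: the prior P(spam), likelihood P("free" | spam),
# and evidence P("free") are all assumptions for illustration.
p_spam = 0.2               # P(A): prior
p_free_given_spam = 0.3    # P(B|A): likelihood
p_free = 0.1               # P(B): evidence

# Bayes: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(p_spam_given_free)   # 0.6
```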

P(sentence) =

Pr(w1, w2, ..., wn) = Pr(w1) * Pr(w2 | w1) * Pr(w3 | w1, w2) * ... * Pr(wn | w1, w2, ... wn-1)
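
A minimal chain-rule sketch; each conditional probability below is a made-up placeholder, keyed as (history, word):

```python
# Toy table of P(word | history); the values are assumptions.
cond_prob = {
    ((), "the"): 0.2,
    (("the",), "dog"): 0.05,
    (("the", "dog"), "barked"): 0.1,
}

def sentence_prob(words):
    # multiply P(w_i | w_1 ... w_{i-1}) across the sentence
    p = 1.0
    for i, w in enumerate(words):
        p *= cond_prob[(tuple(words[:i]), w)]
    return p

print(sentence_prob(["the", "dog", "barked"]))  # 0.2 * 0.05 * 0.1 = 0.001
```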

shift-reduce parsing

a bottom-up parser that tries to match the RHS of a production until it can build a complete S

constituent

a word or group of words that function as a single unit within a hierarchical structure; constituents can be substituted for each other

coordination ambiguity

adj noun and noun -- does the adj apply to both nouns? eg. old men and women

Laplacian smoothing

aka add-one smoothing; P(wi | wi-1) = (c(wi-1, wi) + 1) / (c(wi-1) + V) where V is the vocabulary size (number of distinct words)
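
A minimal add-one smoothing sketch; the toy corpus is made up, and V is taken to be the number of distinct word types in it:

```python
from collections import Counter

# Made-up toy corpus for illustration.
tokens = "the cat sat on the mat the cat slept".split()
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)
V = len(unigrams)  # vocabulary size (distinct word types)

def laplace_bigram_prob(prev, word):
    # P(word | prev) = (c(prev, word) + 1) / (c(prev) + V)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

print(laplace_bigram_prob("the", "cat"))  # seen bigram: (2+1)/(3+6)
print(laplace_bigram_prob("the", "dog"))  # unseen bigram still gets mass
```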

part of speech ambiguity

ambiguity caused by a word having multiple parts of speech eg. round

phonetic ambiguity

ambiguity caused by mispronunciation eg. Joe's finger got number

word error rate

another language model evaluation method; number of insertions, deletions, and substitutions normalized by sentence length
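
A minimal WER sketch using word-level edit distance, normalizing by the reference length (one common convention):

```python
def word_error_rate(reference, hypothesis):
    # Levenshtein distance over words: insertions + deletions + substitutions,
    # normalized by the length of the reference sentence.
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat mat"))  # 2/6
```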

CKY Parsing

bottom-up parser with a dynamic programming table; requires the grammar to be in Chomsky Normal Form; complexity O(n^3)
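
A minimal CKY recognizer sketch; the toy grammar (already in Chomsky Normal Form) and the sentence are assumptions for illustration:

```python
binary_rules = {          # A -> B C, stored as (B, C) -> {A}
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
lexical_rules = {         # A -> word
    "the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"},
}

def cky_recognize(words):
    n = len(words)
    # table[i][j] = set of non-terminals that derive words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical_rules.get(w, set()))
    for span in range(2, n + 1):          # O(n^3): spans x starts x splits
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # split point
                for B in table[i][k]:
                    for C in table[k][j]:
                        table[i][j] |= binary_rules.get((B, C), set())
    return "S" in table[0][n]

print(cky_recognize("the dog saw the cat".split()))  # True
```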

dynamic programming parsing

building a parse for a substring (i, j) based on all parses (i, k) and (k, j) that are included in it

IPA Chart

chart created by the International Phonetic Association to describe the sounds of spoken language

constituent

continuous, non-crossing subdivisions of a sentence; a word is a constituent, as is anything you could find on the LHS of a rule in a CFG

long distance dependencies

dependencies that span many (6+) words in a sentence

nested sentences

eg. "I don't recall whether I took the dog out" = "I don't recall" + "whether" + "I took the dog out"

subjectivity

eg. Joe believes that stocks will rise

center embedding

embedding a phrase in the middle of another phrase of the same type; one of the issues with context free grammars -- they can recursively generate center-embedded sentences too deep for humans to understand and have no mechanism for bounding the recursion

viterbi algorithm

find most likely sequence of states (usually POS tag) given words; uses dynamic programming and backpointers
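
A minimal Viterbi sketch over a toy two-state HMM; every probability below is a made-up placeholder:

```python
states = ["Noun", "Verb"]
start_p = {"Noun": 0.6, "Verb": 0.4}
trans_p = {"Noun": {"Noun": 0.3, "Verb": 0.7},
           "Verb": {"Noun": 0.8, "Verb": 0.2}}
emit_p = {"Noun": {"dogs": 0.4, "bark": 0.1},
          "Verb": {"dogs": 0.05, "bark": 0.5}}

def viterbi(words):
    # v[t][s] = probability of the best state sequence ending in s at time t
    v = [{s: start_p[s] * emit_p[s].get(words[0], 0.0) for s in states}]
    back = [{}]  # backpointers for recovering the best path
    for t in range(1, len(words)):
        v.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: v[t - 1][p] * trans_p[p][s])
            v[t][s] = v[t - 1][prev] * trans_p[prev][s] * emit_p[s].get(words[t], 0.0)
            back[t][s] = prev
    # trace backpointers from the best final state
    best = max(states, key=lambda s: v[-1][s])
    path = [best]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

print(viterbi(["dogs", "bark"]))  # ['Noun', 'Verb']
```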

The Noisy Channel Model

framework used in spellcheckers and translators to guess the original/intended text that has been garbled; eg. guess the original English sentence given its French translation. returns argmax P(e | f) = argmax P(f | e) P(e) = the probability of the French translation given the English sentence under consideration (translation model) times the probability of that English sentence occurring at all (language model)
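
A minimal noisy-channel sketch for spelling correction; the candidate words, channel probabilities, and language-model probabilities are all made up:

```python
observed = "teh"
candidates = {
    # word: (P(observed | word), P(word)) -- channel model, language model
    "the": (0.80, 0.0700),
    "ten": (0.01, 0.0010),
    "tea": (0.02, 0.0005),
}

# argmax_w P(w | observed) = argmax_w P(observed | w) * P(w)
best = max(candidates, key=lambda w: candidates[w][0] * candidates[w][1])
print(best)  # 'the'
```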

backoff

going back to lower order n-gram model if higher-order model is sparse (eg. frequency <= 1)
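
A minimal backoff sketch, simplified (real Katz backoff also discounts the higher-order estimate so the distribution still sums to 1); the toy corpus and threshold are assumptions:

```python
from collections import Counter

tokens = "the cat sat on the mat the cat slept".split()
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)
total = len(tokens)

def backoff_prob(prev, word, threshold=1):
    if bigrams[(prev, word)] > threshold:   # enough bigram evidence
        return bigrams[(prev, word)] / unigrams[prev]
    return unigrams[word] / total           # back off to the unigram

print(backoff_prob("the", "cat"))    # bigram seen twice: 2/3
print(backoff_prob("the", "slept"))  # sparse bigram: unigram 1/9
```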

phrase-structure grammars

grammars defined by phrase structure rules, not just rules for individual words

polysemous

having multiple meanings eg. ball

lexical semantics

how the meaning of lexical units (words or subwords) correlate with the structure of the language

discourse conventions

how to correctly construct a conversation

perplexity

how well a language model fits the data; PP = (1 / P(w1 w2 ... wn))^(1/n), i.e. the nth root of the inverse sentence probability, where n is the number of words in the sentence
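
A minimal perplexity sketch; the sentence probability below is a made-up placeholder:

```python
def perplexity(sentence_prob, n):
    # PP = P(w1 ... wn) ** (-1/n): the nth root of 1 / P(sentence)
    return sentence_prob ** (-1.0 / n)

# Hypothetical: a 5-word sentence the model assigns probability 1e-6.
print(perplexity(1e-6, 5))  # ~15.85; lower perplexity = better fit
```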

state transition probability

in HMM, probability of current state given previous state(s)

emission probability

in HMM, probability of current word given state

baseline POS tagging method

label everything a noun

The Turing Test

language test invented by Alan Turing to tell machine and human apart; uses language as a test for human intelligence

compositional semantics

the meaning of longer utterances and sentences, built up from the meanings of their parts

negation difficulty

no noun and noun -- does no apply to both nouns?

closed-class POS

not regularly accepting new members eg. pronouns, conjunctions

the certain event

omega (the entire sample space)

the impossible event

phi (the empty set)

P(states, words) =

P(s1 ... sn, w1 ... wn) = product over i of P(si | si-1) * P(wi | si) = at each position, the probability of the state given the previous state times the probability of the word given the state

posterior probability

probability of an event happening in light of other observed events/evidence

prior probability

probability of an event happening without any other evidence taken into account

maximum likelihood estimates

probability of an n-gram = (number of times the n-gram was observed) / (number of times its (n-1)-gram prefix was observed) eg. unigram = number of appearances of the word / total number of words; bigram P(spinach | with) = number of appearances of "with spinach" / number of appearances of "with"
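
A minimal MLE bigram sketch over a made-up toy corpus:

```python
from collections import Counter

# MLE bigram estimate: count(w1 w2) / count(w1).
tokens = "pasta with spinach pasta with cheese salad with spinach".split()
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)

def mle_bigram(prev, word):
    return bigrams[(prev, word)] / unigrams[prev]

print(mle_bigram("with", "spinach"))  # count("with spinach")=2 / count("with")=3
```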

problem with unigram model?

probability of sentence is the same even if you mix up order of the words

law of total probability

probability of something happening = sum of all the ways it could happen; in other words, if B1 ... Bn is a partition of the sample space S, then Pr(A) = sum over i of Pr(A, Bi)

text preprocessing

preparing text to run through an algorithm; eg. removing ads and JavaScript, dealing with Unicode, capitalization, word segmentation, sentence boundary recognition, etc.

probability of next word in a sentence =

P(wn | w1 ... wn-1), its probability conditioned on all the preceding words; multiplying these conditional probabilities over every position gives the probability of the whole sentence

unconditional language universals

qualities that all languages have eg. all languages have verbs and nouns; all spoken languages have vowels and consonants

conditional language universals

qualities that all languages with a certain other quality have eg. if a language has inflection, it always has derivation

smoothing

reassigning some probability mass to unseen data

open-class POS

regularly accepting new members eg. nouns, verbs, adjectives

Penn Treebank

corpus of sentences annotated with parse trees and POS tags, used for training automatic parsers

burstiness

after seeing a rare word in a document, you are more likely to see it again in that document

Hidden Markov Models

a Markov chain of states, each of which emits an observable output (eg. a word); the states themselves are hidden and must be inferred from the outputs

discrete sample space

eg. the sides of a die (cube)

forward algorithm

similar to the viterbi algorithm except instead of finding the maximum probability you sum all the probabilities
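
A minimal forward-algorithm sketch over the same kind of toy HMM as the Viterbi sketch above; note the sum where Viterbi takes a max. All probabilities are made up:

```python
states = ["Noun", "Verb"]
start_p = {"Noun": 0.6, "Verb": 0.4}
trans_p = {"Noun": {"Noun": 0.3, "Verb": 0.7},
           "Verb": {"Noun": 0.8, "Verb": 0.2}}
emit_p = {"Noun": {"dogs": 0.4, "bark": 0.1},
          "Verb": {"dogs": 0.05, "bark": 0.5}}

def forward(words):
    # alpha[s] = total probability of all state paths ending in s so far
    alpha = {s: start_p[s] * emit_p[s].get(words[0], 0.0) for s in states}
    for w in words[1:]:
        alpha = {s: sum(alpha[p] * trans_p[p][s] for p in states)
                    * emit_p[s].get(w, 0.0)
                 for s in states}
    return sum(alpha.values())  # total probability of the word sequence

print(forward(["dogs", "bark"]))  # 0.0948
```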

examples of diversity of language

some languages don't have articles; case systems (such as in Latin); sound systems; social statuses; kinship systems

phonetics

study of speech sounds; helps model how words are pronounced in colloquial speech

morphology

study of the forms of words; how words are shaped and behave in context; studies prefixes, suffixes, roots, stems, etc.

event

subspace of the sample space

continuous sample space

temperature

collocation

two words frequently appearing together to form a phrase eg. stock market

conditional probability

the probability of one event occurring given that the other event occurred; P(A | B) = P(A, B) / P(B), the joint probability divided by the probability of the conditioning event

joint probability/the chain rule

the probability of two events occurring together/at the same time; P(A, B) = P(A|B) P(B) = P(B, A) = P(B|A) P(A)

pragmatics

the study of how language is used to accomplish goals; eg. polite language, indirect language/intent

semantics

the study of meaning

syntax

the study of the structural relationships between words

referential difficulty

two subjects are mentioned and a pronoun is used later -- which person does it refer to?

push down automata

type of automaton that employs a stack and an input tape; equivalent in expressive power to a CFG

types v. tokens

types are distinct words; tokens are individual occurrences of words
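
A one-line illustration in Python:

```python
# "the" occurs twice, so there are 6 tokens but only 5 types.
words = "the cat sat on the mat".split()
print(len(words))       # 6 tokens (every occurrence counts)
print(len(set(words)))  # 5 types (distinct words)
```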

problem with picking a word at random to generate a sentence?

it assigns uniform probability to every word, ignoring how frequent words actually are

interpolation

use lambda1 * trigram prob + lambda2 * bigram prob + lambda3 * unigram prob, where the lambdas sum to 1
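
A minimal interpolation sketch; the component probabilities and the lambda weights (chosen to sum to 1) are made-up placeholders:

```python
def interpolated_prob(p_tri, p_bi, p_uni, lambdas=(0.6, 0.3, 0.1)):
    l1, l2, l3 = lambdas
    return l1 * p_tri + l2 * p_bi + l3 * p_uni

# Even when the trigram estimate is 0, the lower-order models contribute.
print(interpolated_prob(p_tri=0.0, p_bi=0.04, p_uni=0.001))  # 0.0121
```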

bottom-up parsing

uses rules from grammar to generate parse tree from the bottom up; labels terminals (words) with non-terminals and then tries to group based on grammar rules; ends up exploring options that don't lead to a full parse

top-down parsing

uses rules from the grammar to generate a parse tree from the top down; requires backtracking if the generated tree does not match the terminals (words) in the sentence; ends up exploring options that don't match the full sentence

n-gram models

using the markov assumption, only look at the previous n-1 words eg. unigram (n=1), bigram (n=2), trigram (n=3)

problem with MLE values?

the products of many small probabilities underflow (become too small to represent); use log probabilities instead
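
A quick demonstration of why: multiplying many small probabilities underflows to 0.0 in floating point, while summing logs stays representable:

```python
import math

# 100 probabilities of 1e-5: the product is 1e-500, below float64 range.
probs = [1e-5] * 100

product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0 -- underflow

log_prob = sum(math.log(p) for p in probs)
print(log_prob)  # about -1151.3, still representable
```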

syntactic ambiguity

when a sentence may be interpreted in more than one way due to ambiguous sentence structure eg. Call Joe a taxi

metonymy

when a thing or concept is referred to by something closely related to it eg. Boston is calling

assimilation

when one sound changes to accommodate another eg. NPR sounds like MPR

The Markov Property

when the conditional probability distribution of future states depends only on the current state and not on the events that came before it

prepositional phrase attachment ambiguity

when a prepositional phrase could attach to different parts of the sentence, changing the meaning eg. Joe ate pizza with pepperoni/with Samantha

morphological ambiguity

when two words appear to have same morphology (eg. same root) but do not eg. impossible = not possible but important != not portant

sense ambiguity

when word in sentence has multiple senses which can change interpretation of sentence eg. Joe took the bar exam

cognate

words with a common origin that look or sound similar across languages eg. plato and plate (Spanish and English)

markov assumption

you can only look at a limited history of preceding words and still get a good estimate for the probability of a sentence

