NLP Midterm
__ indicates not syntactically correct
*
Two types of constraints on parse?
1) input sentence 2) grammar
properties of markov model?
1) limited horizon (the current state depends only on a fixed, small number of previous states, so you don't have to look far back to predict the current word) 2) time-invariant (the transition probabilities do not change over time/position in the sequence)
how many words in English language?
2 million
context free grammar
4-tuple (N: non-terminals, Sigma: terminals, R: rules, S: start symbol)
What percentage of the most commonly used words are ambiguous?
40%
__ indicates not semantically correct
?
negation probability
P(!A) = 1 - P(A)
disjunction/union probability
P(A u B) = P(A) + P(B) - P(A, B)
iff events are mutually exclusive (A intersect B) = {}, P(A u B) =
P(A) + P(B)
Bayes Theorem
P(A|B) = (P(B|A) * P(A)) / P(B)
P(sentence) =
Pr(w1, w2, ..., wn) = Pr(w1) * Pr(w2 | w1) * Pr(w3 | w1, w2) * ... * Pr(wn | w1, w2, ... wn-1)
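A minimal Python sketch of the chain rule (the conditional probabilities below are made-up numbers, just to show how they multiply together):

    # chain rule: P(w1..wn) = product of P(wi | w1..wi-1)
    # hypothetical conditional probabilities for "the dog barks"
    cond_probs = [0.2,   # P(the)
                  0.01,  # P(dog | the)
                  0.05]  # P(barks | the, dog)
    p_sentence = 1.0
    for p in cond_probs:
        p_sentence *= p
    print(p_sentence)  # 0.2 * 0.01 * 0.05 = 1e-4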
shift-reduce parsing
a bottom-up parser that shifts words onto a stack and reduces whenever the top of the stack matches the RHS of a production, until it can build a complete S
constituent
a word or group of words that function as a single unit within a hierarchical structure; constituents can be substituted for each other
coordination ambiguity
adj noun and noun -- does adj apply to both nouns?
Laplacian smoothing
aka add-one smoothing; P(wi | wi-1) = (c(wi-1, wi) + 1) / (c(wi-1) + V) where V is the vocabulary size (number of word types)
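A minimal add-one smoothing sketch on a toy corpus (the corpus and test words are hypothetical):

    from collections import Counter

    tokens = "the dog barks and the dog runs".split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    V = len(unigrams)  # vocabulary size (number of word types)

    def p_laplace(prev, word):
        # P(word | prev) = (c(prev, word) + 1) / (c(prev) + V)
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

    print(p_laplace("the", "dog"))   # seen bigram: (2 + 1) / (2 + 5)
    print(p_laplace("the", "runs"))  # unseen bigram still gets some probability mass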
part of speech ambiguity
ambiguity caused by a word having multiple parts of speech eg. round
phonetic ambiguity
ambiguity in how a word should be pronounced eg. Joe's finger got number (the noun "number" vs. the comparative of numb)
word error rate
another language model evaluation method; number of insertions, deletions, and substitutions normalized by sentence length
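A sketch of word error rate as minimum edit distance over words, normalized by reference length (the example sentences are made up):

    def word_error_rate(reference, hypothesis):
        # dynamic-programming edit distance (insertions, deletions, substitutions)
        r, h = reference.split(), hypothesis.split()
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i
        for j in range(len(h) + 1):
            d[0][j] = j
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                sub = 0 if r[i - 1] == h[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,        # deletion
                              d[i][j - 1] + 1,        # insertion
                              d[i - 1][j - 1] + sub)  # substitution or match
        return d[len(r)][len(h)] / len(r)

    print(word_error_rate("the dog barks", "the dog bark loudly"))  # 2/3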
CKY Parsing
bottom-up parser with a dynamic programming table; complexity O(n^3)
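A minimal CKY recognizer sketch, assuming a tiny hypothetical grammar already in Chomsky Normal Form:

    # toy grammar: binary rules A -> B C and lexical rules A -> word
    binary_rules = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}}
    lexical_rules = {"the": {"Det"}, "dog": {"N"}, "barks": {"V", "VP"}}

    def cky_recognize(words):
        n = len(words)
        # table[i][j] holds the non-terminals that can span words[i:j]
        table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            table[i][i + 1] = set(lexical_rules.get(w, set()))
        for span in range(2, n + 1):           # widths, smallest first
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):      # every split point
                    for B in table[i][k]:
                        for C in table[k][j]:
                            table[i][j] |= binary_rules.get((B, C), set())
        return "S" in table[0][n]

    print(cky_recognize("the dog barks".split()))  # True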
dynamic programming parsing
building a parse for a substring (i, j) based on all parses (i, k) and (k, j) that are included in it
IPA Chart
chart created by the International Phonetic Association to describe speech sounds
constituent
continuous, non-crossing, subdivisions of a sentence. a word is a constituent, as is anything you could find on the LHS of a rule in a CFG
long distance dependencies
dependencies that span many (6+) words in a sentence
nested sentences
eg. "I don't recall whether I took the dog out" = "I don't recall" + "whether" + "I took the dog out"
subjectivity
eg. Joe believes that stocks will rise
center embedding
embedding a phrase in the middle of another phrase of the same type; one of the issues with context free grammars -- they can recursively generate sentences too deeply nested for humans to understand and have no mechanism for bounding the recursion
viterbi algorithm
find most likely sequence of states (usually POS tag) given words; uses dynamic programming and backpointers
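A minimal Viterbi sketch over a hypothetical two-tag HMM (all probabilities below are made up for illustration):

    states = ["N", "V"]
    start = {"N": 0.7, "V": 0.3}                                    # P(first tag)
    trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}  # P(tag | prev tag)
    emit = {"N": {"dogs": 0.4, "bark": 0.1}, "V": {"dogs": 0.05, "bark": 0.5}}  # P(word | tag)

    def viterbi(words):
        # best[t][s]: probability of the best path ending in state s at position t
        best = [{s: start[s] * emit[s].get(words[0], 0.0) for s in states}]
        back = [{}]
        for t in range(1, len(words)):
            best.append({})
            back.append({})
            for s in states:
                prev = max(states, key=lambda p: best[t - 1][p] * trans[p][s])
                best[t][s] = best[t - 1][prev] * trans[prev][s] * emit[s].get(words[t], 0.0)
                back[t][s] = prev
        # follow backpointers from the best final state
        path = [max(states, key=lambda s: best[-1][s])]
        for t in range(len(words) - 1, 0, -1):
            path.insert(0, back[t][path[0]])
        return path

    print(viterbi(["dogs", "bark"]))  # ['N', 'V']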
The Noisy Channel Model
framework used in spellcheckers and translators to recover the original/intended word or sentence that has been garbled; eg. guess the original English sentence given its French translation. Returns argmax P(e | f) = argmax P(f | e) P(e): the probability of the French translation given the candidate English sentence (translation model) times the probability of that English sentence occurring at all (language model)
backoff
going back to lower order n-gram model if higher-order model is sparse (eg. frequency <= 1)
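A simplified backoff sketch with hypothetical counts (real schemes such as Katz backoff also discount and weight the lower-order estimate so the probabilities still sum to 1):

    bigram_count = {("with", "spinach"): 4}
    unigram_count = {"with": 20, "spinach": 5, "kale": 3}
    total_tokens = 1000

    def p_backoff(prev, word):
        if bigram_count.get((prev, word), 0) > 1:     # bigram not sparse
            return bigram_count[(prev, word)] / unigram_count[prev]
        # otherwise back off to the (here unweighted) unigram estimate
        return unigram_count.get(word, 0) / total_tokens

    print(p_backoff("with", "spinach"))  # uses the bigram estimate
    print(p_backoff("with", "kale"))     # backs off to the unigram estimate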
phrase-structure grammars
grammars defined by phrase structure rules, not just rules for individual words
polysemous
having multiple meanings eg. ball
lexical semantics
how the meanings of lexical units (words or subwords) correlate with the structure of the language
discourse conventions
how to correctly construct a conversation
perplexity
how well a language model fits the data; perplexity = P(w1 w2 ... wn)^(-1/n), i.e. the nth root of 1 / P(w1 w2 ... wn), where n is the number of words in the sentence (lower is better)
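A small sketch of the perplexity formula, computed in log space (the per-word probabilities are made up):

    import math

    def perplexity(word_probs):
        # P(w1..wn) ** (-1/n), where word_probs are the model's per-word probabilities
        n = len(word_probs)
        log_p = sum(math.log(p) for p in word_probs)
        return math.exp(-log_p / n)

    print(perplexity([0.2, 0.1, 0.05]))  # 10.0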
state transition probability
in HMM, probability of current state given previous state(s)
emission probability
in HMM, probability of current word given state
baseline POS tagging method
label everything a noun
The Turing Test
language test invented by Alan Turing to tell machine and human apart; uses language as a test for human intelligence
compositional semantics
how the meaning of longer utterances and sentences is built up from the meanings of their parts
negation difficulty
no noun and noun -- does no apply to both nouns?
closed-class POS
not regularly accepting new members eg. pronouns, conjunctions
the certain event
omega
the impossible event
phi (the empty set)
P(states, words) =
prod_i P(si | si-1) * P(wi | si) = the product, over each position i, of the probability of the state given the previous state times the probability of the word given the state
posterior probability
probability of an event happening in light of other events/evidence already observed
prior probability
probability of an event happening without any other evidence taken into account
maximum likelihood estimates
probability of n-gram = (number of times n-gram observed) / (number of times its (n-1)-gram history observed) eg. unigram = number of appearances of word / total number of words; bigram = number of appearances of "with spinach" / number of appearances of "with"
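A minimal MLE bigram sketch on a toy corpus (the corpus is made up):

    from collections import Counter

    tokens = "pasta with spinach and pizza with cheese".split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    def p_mle(prev, word):
        # count of the bigram divided by count of its first word
        return bigrams[(prev, word)] / unigrams[prev]

    print(p_mle("with", "spinach"))  # 1/2: "with spinach" appears once, "with" twice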
problem with unigram model?
probability of sentence is the same even if you mix up order of the words
law of total probability
probability of something happening = sum of all the ways it could happen. in other words, if B1 ... Bn is a partition of the sample space S, Pr(A) = sum over i of Pr(A, Bi)
text preprocessing
processing text for running through algorithm; eg. removing ads and javascript, dealing with unicode, capitalization, word segmentation, sentence boundary recognition, etc.
probability of next word in a sentence =
P(wn | w1, w2, ..., wn-1), i.e. the next word conditioned on all the preceding words; multiplying these conditional probabilities at every position gives the probability of the whole sentence
unconditional language universals
qualities that all languages have eg. all languages have verbs and nouns; all spoken languages have vowels and consonants
conditional language universals
qualities that all languages with a certain other quality have eg. if a language has inflection, it always has derivation
smoothing
reassigning some probability mass to unseen data
open-class POS
regularly accepting new members eg. nouns, verbs, adjectives
Penn Treebank
resource/corpus of POS-tagged and parsed sentences used for training automatic taggers and parsers
burstiness
seeing rare word in document --> more likely to see it again
Hidden Markov Models
a sequence of states, each depending on the previous one (a Markov chain), where the states themselves are hidden and only their emissions (eg. words) are observed
discrete sample space
sides of cube
forward algorithm
similar to the viterbi algorithm except instead of finding the maximum probability you sum all the probabilities
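A sketch of the forward algorithm on the same hypothetical HMM as in the Viterbi sketch above, summing over paths instead of maximizing:

    states = ["N", "V"]
    start = {"N": 0.7, "V": 0.3}
    trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
    emit = {"N": {"dogs": 0.4, "bark": 0.1}, "V": {"dogs": 0.05, "bark": 0.5}}

    def forward(words):
        # alpha[s]: total probability of all state paths ending in s so far
        alpha = {s: start[s] * emit[s].get(words[0], 0.0) for s in states}
        for w in words[1:]:
            alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s].get(w, 0.0)
                     for s in states}
        return sum(alpha.values())  # P(words) under the HMM

    print(forward(["dogs", "bark"]))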
examples of diversity of language
some languages don't have articles; cases (such as in Latin); sound systems; social statuses; kinship systems
phonetics
study of speech sounds; helps model how words are pronounced in colloquial speech
morphology
study of the forms of words; how words are shaped and behave in context; studies prefixes, suffixes, roots, stems, etc.
event
subspace of the sample space
continuous sample space
temperature
collocation
the tendency of two words to frequently appear together as a phrase eg. stock market
conditional probability
the probability of one event occurring given that the other event occurred; P(A | B) = P(A, B) / P(B) ... the joint probability divided by the probability of the conditioning event
joint probability/the chain rule
the probability of two events occurring together/at the same time; P(A, B) = P(A|B) P(B) = P(B, A) = P(B|A) P(A)
pragmatics
the study of how language is used to accomplish goals; eg. polite language, indirect language/intent
semantics
the study of meaning
syntax
the study of the structural relationships between words
referential difficulty
two possible referents are introduced and a pronoun used later is ambiguous as to which one it refers to
push down automata
type of automaton that employs a stack and an input tape; equivalent in expressive power to CFGs
types v. tokens
types are distinct words; tokens are individual occurrences of words in running text
problem with picking a word at random to generate a sentence?
uniform probability for words
interpolation
use lambda1 * trigram prob + lambda2 * bigram prob + lambda3 * unigram prob, where the lambdas sum to 1
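A small interpolation sketch (the lambda weights and component probabilities are hypothetical):

    lambdas = (0.6, 0.3, 0.1)  # must sum to 1

    def p_interpolated(p_trigram, p_bigram, p_unigram):
        l3, l2, l1 = lambdas
        return l3 * p_trigram + l2 * p_bigram + l1 * p_unigram

    # an unseen trigram (probability 0) still gets mass from the bigram and unigram
    print(p_interpolated(0.0, 0.04, 0.001))  # 0.0121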
bottom-up parsing
uses rules from grammar to generate parse tree from the bottom up; labels terminals (words) with non-terminals and then tries to group based on grammar rules; ends up exploring options that don't lead to a full parse
top-down parsing
uses rules from grammar to generate parse tree from the top down; requires backtracking if the generated tree does not match the terminals (words) in the sentence; ends up exploring options that don't match the full sentence
n-gram models
using the markov assumption, only look at the previous n-1 words eg. unigram (n=1, no history), bigram (n=2, one previous word), trigram (n=3, two previous words)
problem with MLE values?
products of the probabilities are usually too small (numerical underflow); use log probabilities instead
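A small illustration of why log probabilities help: multiplying many small probabilities underflows to zero, while summing their logs stays representable.

    import math

    probs = [0.001] * 300             # many tiny per-word probabilities
    product = 1.0
    for p in probs:
        product *= p                  # underflows
    log_sum = sum(math.log(p) for p in probs)

    print(product)   # 0.0
    print(log_sum)   # about -2072.3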
syntactic ambiguity
when a sentence may be interpreted in more than one way due to ambiguous sentence structure eg. Call Joe a taxi
metonymy
when a thing or concept is referred to by something closely related to it eg. Boston is calling
assimilation
when one sound changes to accommodate another eg. NPR sounds like MPR
The Markov Property
when the conditional probability distribution of future states depends only on the current state and not on the events that came before it
prepositional phrase attachment ambiguity
when a prepositional phrase could attach to more than one constituent eg. Joe ate pizza with pepperoni / with Samantha
morphological ambiguity
when two words appear to have same morphology (eg. same root) but do not eg. impossible = not possible but important != not portant
sense ambiguity
when word in sentence has multiple senses which can change interpretation of sentence eg. Joe took the bar exam
cognate
words in different languages that sound similar because they share a common origin eg. plato and plate (Spanish and English)
markov assumption
you can only look at a limited history of preceding words and still get a good estimate for the probability of a sentence
