NLP Exam 2


Distributional Hypothesis

"If A and B have almost identitical environments we say that they are synonyms"

Semantic Similarity "fast" is similar to ...

"rapid" and "speed"

Skip-Gram (Context Embedding)

A column in the output matrix

Lexeme

A pairing of a particular word form with its sense

Homonymy

A relation between concepts/senses that share a word form but are not semantically related. "Bat" is a homonym: a bat can be an implement used to hit a ball, or a nocturnal flying mammal.

Metonymy

A subtype of polysemy; one aspect of a concept is used to refer to other aspects of the concept (or to the concept itself). BUILDING <-> ORGANIZATION, ANIMAL <-> MEAT

Encoder

Also uses only the final output vector y_n; however, the final vector is treated as an encoding of the information in the sequence, and is used as additional information together with other signals. For example, an extractive document summarization system may first run an RNN over the document, resulting in a vector y_n summarizing the entire document. Then, y_n is used together with other features in order to select the sentences to be included in the summary.

Distributional Semantic Model

Any matrix M such that each row represents the distribution of a term x across context, together with a similarity measurement
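As a minimal sketch of that idea (the toy vocabulary, contexts, and counts are invented for illustration; assumes numpy):

import numpy as np

# Toy term-context co-occurrence matrix M: rows = terms, columns = context words.
terms    = ["fast", "rapid", "banana"]
contexts = ["car", "speed", "fruit", "yellow"]
M = np.array([[10, 8,  0, 0],    # fast
              [ 9, 7,  0, 1],    # rapid
              [ 0, 0, 12, 9]],   # banana
             dtype=float)

def cosine(u, v):
    # The similarity measurement paired with the matrix.
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(M[0], M[1]))   # high: "fast" and "rapid" share contexts
print(cosine(M[0], M[2]))   # near 0: "fast" and "banana" do not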

ELMo (Embeddings from Language Models)

Computes contextualized word representations that are used in place of static word embeddings, by training bidirectional RNN language models that predict a token from its left and right contexts; the forward and backward representations are then combined for each token

BERT (Bidirectional Encoder Representations from Transformers)

Consider the task of sequence tagging over a sentence x_1, ..., x_n. An RNN allows us to compute a function of the i-th word x_i based on the past: the words x_{1:i} up to and including it. However, the following words x_{i+1:n} may also be useful for prediction, as is evident from the common sliding-window approach in which the focus word is categorized based on a window of k words surrounding it. Much like the RNN relaxes the Markov assumption and allows looking arbitrarily far back into the past, the biRNN relaxes the fixed-window-size assumption, allowing it to look arbitrarily far into both the past and the future within the sequence.
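The passage above is really describing bidirectional RNNs; as a minimal PyTorch-style sketch of that idea (all layer sizes and names are illustrative, not taken from the course material):

import torch.nn as nn

class BiRNNTagger(nn.Module):
    # Each position i sees x_{1:i} through the forward direction and x_{i:n}
    # through the backward direction, so tagging word i uses both contexts.
    def __init__(self, vocab_size, num_tags, emb_dim=50, hidden_dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.birnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                             bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)   # concat of both directions

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        h, _ = self.birnn(self.emb(token_ids))    # (batch, seq_len, 2 * hidden_dim)
        return self.out(h)                        # per-token tag scores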

Word Senses

Different word senses of the same word can denote different concepts

Vauquois Triangle

Each of semantics, syntax, and phrases (morphology) is related to the other two: changes in one can cause changes in the others

Word Alignment: Step 1

Figuring out word-to-word translations

Noisy Channel Model for MT

Given an observation in the source language, figure out what was said in the target language
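In equation form (the standard noisy-channel decomposition; notation is mine, with s the observed source sentence and t a candidate target sentence):

\hat{t} = \arg\max_{t} P(t \mid s)
        = \arg\max_{t} \frac{P(s \mid t)\, P(t)}{P(s)}
        = \arg\max_{t} P(s \mid t)\, P(t)

Here P(s | t) is the translation model and P(t) is the target-side language model.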

Computational Semantics

How do we compute language meaning from word meaning?

Skip-Gram model

Input is a single word in one-hot representation; output is the probability of seeing any single word as a context word
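A minimal numpy sketch of that forward pass (the vocabulary size, dimension, and random parameters are illustrative):

import numpy as np

V, d = 10000, 100                         # vocabulary size, embedding dimension
W_in  = np.random.randn(V, d) * 0.01      # target (input) embeddings, one row per word
W_out = np.random.randn(V, d) * 0.01      # context (output) embeddings, one row per word

def context_probs(target_id):
    # The one-hot input simply selects one row of W_in.
    v = W_in[target_id]                   # (d,)
    scores = W_out @ v                    # (V,) one score per candidate context word
    e = np.exp(scores - scores.max())     # softmax over the whole vocabulary
    return e / e.sum()                    # P(context word | target word)

p = context_probs(42)                     # distribution over all possible context words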

Stacked/Deep RNNs

Input is the output of a previous RNN
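In PyTorch-style terms (sizes illustrative), stacking just means a second RNN consumes the first RNN's output sequence; nn.LSTM's num_layers argument builds the same thing internally:

import torch.nn as nn

# Explicit stack: rnn2's input is rnn1's output sequence.
rnn1 = nn.LSTM(input_size=100, hidden_size=64, batch_first=True)
rnn2 = nn.LSTM(input_size=64, hidden_size=64, batch_first=True)
# forward: h1, _ = rnn1(x); h2, _ = rnn2(h1)

# Equivalent built-in form:
stacked = nn.LSTM(input_size=100, hidden_size=64, num_layers=2, batch_first=True)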

Multi-Head Attention

Multi-head attention is a module for attention mechanisms that runs an attention mechanism several times in parallel. The independent attention outputs are then concatenated and linearly transformed into the expected dimension. All layers have output dimensionality d_model, but each linear projection can produce different dimensionalities d_k and d_v.
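A minimal numpy sketch of the mechanism described above, for the self-attention case (the per-head projection matrices are assumed to be given; all sizes are illustrative):

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, W_q, W_k, W_v, W_o):
    # X: (n, d_model); W_q/W_k/W_v: one (d_model, d_k or d_v) projection per head.
    heads = []
    for Wq, Wk, Wv in zip(W_q, W_k, W_v):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv               # independent projections per head
        A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))    # scaled dot-product attention
        heads.append(A @ V)                            # (n, d_v) output of this head
    return np.concatenate(heads, axis=-1) @ W_o        # concatenate, project back to d_model

n, d_model, h, d_k = 5, 16, 4, 4
X   = np.random.randn(n, d_model)
W_q = [np.random.randn(d_model, d_k) for _ in range(h)]
W_k = [np.random.randn(d_model, d_k) for _ in range(h)]
W_v = [np.random.randn(d_model, d_k) for _ in range(h)]
W_o = np.random.randn(h * d_k, d_model)
out = multi_head_self_attention(X, W_q, W_k, W_v, W_o)   # (n, d_model)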

Polysemy

Multiple semantically related concepts correspond to the same word form. The verb "get" is a good example of polysemy: it can mean "procure," "become," or "understand."

Word embeddings

Representations of words in a low-dimensional dense vector space; should capture the relationship between a word and its context in running text

MT challenges in Syntax and Morphology

Sentence word order, word order in phrases, prepositions, and case marking

Lexical Sample task

Small pre-selected set of target words. • Inventory of senses for each word.

Zipfian Distribution

A few terms appear very frequently, while most terms are rare
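In formula form (the standard statement of Zipf's law, not taken from the card itself): the frequency of the term with frequency rank r is roughly

f(r) \propto \frac{1}{r^{s}}, \qquad s \approx 1

so a handful of top-ranked terms account for most of the tokens, while the long tail of terms occurs only a few times each.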

GPT fine tuning

Train GPT on this language-modeling objective. In the process, it learns representations of tokens conditioned on their left context; the representation of the last token can therefore serve as a representation of the entire sentence. The pretrained model is then fine-tuned on a specific downstream task, after which it can, for example, predict the sentiment of a sentence.
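A schematic PyTorch-style sketch of that recipe (pretrained_lm is a hypothetical placeholder for the pretrained left-to-right model, not a real API):

import torch.nn as nn

class SentimentHead(nn.Module):
    # Reuse a pretrained left-to-right LM, take the hidden state of the LAST
    # token as a summary of the sentence, and classify it.
    def __init__(self, pretrained_lm, hidden_dim, num_classes=2):
        super().__init__()
        self.lm = pretrained_lm                        # assumed to return (batch, seq_len, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        h = self.lm(token_ids)                         # representations built from left context
        return self.classifier(h[:, -1, :])            # sentiment logits from the last token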

Supervised Learning

Training data consists of training examples (x_1, y_1), ..., (x_n, y_n), where x_i is an input example (a d-dimensional vector of attribute values) and y_i is the label

Lexical substitution

Two lexemes are synonyms if they can be substituted for each other in a sentence, such that the sentence retains its meaning (truth conditions).

Synonyms

Two lexemes refer to the same concept. • couch/sofa • vomit/throw up • car/automobile • hazelnut/filbert • water/H2O

Acceptor

We decide the output only from the final output vector y_n. For example, consider training an RNN to read the characters of a word one by one and then use the final state to predict the part of speech of that word. Typically, the RNN's output vector y_n is fed into a fully connected layer or an MLP, which produces a prediction; the error gradients are then backpropagated through the rest of the sequence.
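A minimal PyTorch-style sketch of this acceptor pattern (character-level part-of-speech prediction; sizes are illustrative):

import torch.nn as nn

class CharAcceptor(nn.Module):
    # Run an RNN over the characters of a word and predict from the FINAL
    # state only; error gradients flow back through the whole sequence.
    def __init__(self, num_chars, num_tags, emb_dim=30, hidden_dim=50):
        super().__init__()
        self.emb = nn.Embedding(num_chars, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
                                 nn.Linear(hidden_dim, num_tags))

    def forward(self, char_ids):                 # char_ids: (batch, word_length)
        _, (h_n, _) = self.rnn(self.emb(char_ids))
        return self.mlp(h_n[-1])                 # POS scores from the final state only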

Lexical Semantics

What is the meaning of individual words?

WordNet

WordNet is a lexical database containing English word senses and their relations. • Represents word senses as synsets, sets of lemmas that are synonymous in at least one context.
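A small usage sketch with NLTK's WordNet interface (assumes nltk and its wordnet data are installed; the printed values are indicative):

from nltk.corpus import wordnet as wn    # requires nltk.download('wordnet') once

# Each synset is one sense: a set of lemmas that are synonymous in that sense.
for syn in wn.synsets('bat'):
    print(syn.name(), '-', syn.definition())   # the mammal sense, the club sense, ...

dog = wn.synset('dog.n.01')
print(dog.hypernyms())                     # more general senses (IS-A)
print(dog.hyponyms()[:3])                  # more specific senses
print(dog.part_meronyms())                 # parts of a dog
print([l.name() for l in dog.lemmas()])    # synonymous lemmas in this synset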

Semantic Similarity

Words that can be substituted for one another

Semantic Relatedness

Words that occur near each other, but are not necessarily similar

Parallel Corpora

A type of multilingual corpus that consists of two or more texts in different languages that are aligned at the sentence or phrase level

Gloss

A dictionary definition of a word sense

Good translation needs to be...

faithful and fluent

Attention as Lookup

Find the key that best matches the query; if multiple keys match, take a combination of them. The queries are the hidden states of the decoder; the keys are the hidden states of the encoder; the values are also the hidden states of the encoder.
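A tiny numpy sketch of this soft-lookup view, for a single decoder query against the encoder states (shapes are illustrative):

import numpy as np

def soft_lookup(query, keys, values):
    # A hard lookup would return the value of the single best-matching key;
    # attention instead returns a softmax-weighted combination of ALL values.
    scores = keys @ query                     # one matching score per encoder position
    w = np.exp(scores - scores.max())
    w /= w.sum()                              # attention distribution over positions
    return w @ values                         # weighted combination of the values

query  = np.random.randn(8)        # decoder hidden state
keys   = np.random.randn(5, 8)     # encoder hidden states used as keys
values = keys                      # here the values are the encoder states as well
context = soft_lookup(query, keys, values)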

Recurrent neural networks ...

take the entire history into account

Skip-Gram (Target Embedding)

Takes as input a word from a text corpus and tries to predict the surrounding words within a certain context window. The model is trained on a large corpus of text and learns to represent each word as a dense vector in a shared vector space. A row in the input matrix.

Zeugma

When a single word is used with two other parts of a sentence but must be understood differently (in a different word sense) in relation to each one. "Does United serve breakfast and JFK?" "He lost his gloves and his temper."

The problem with RNNs for MT:

• For long phrases, fixed-length encoded representation becomes information bottleneck. • Not everything in the input sequence is equally important to predict each word in the decoder.

Extensions to Lesk Algorithm

• Often the available definitions and examples do not provide enough information; the overlap is 0. • Different approaches to extending the definitions: • "Corpus-Lesk": use a sense-tagged example corpus and add context from example sentences. • Extended Gloss Overlap (Banerjee & Pedersen 2003): add glosses from related words (hypernyms, meronyms, ...). • Use embedded representations for each word in the definition; choose the sense with the highest average vector similarity to the context.

Hyponymy

• One sense is a hyponym (or subordinate) of another sense if the first sense is more specific, denoting a subclass of the other. (IS-A relationship). • dog is a hyponym of mammal. • mammal is a hyponym of animal. • desk is a hyponym of furniture. • sprint is a hyponym of run. • The inverse relation is called hypernymy, so furniture is a hypernym (or superordinate) of desk.

Meronymy

• Part-whole relationship. • A meronym is a part of another concept. • leg is a meronym of chair. • wheel is a meronym of car. • cellulose is a meronym of paper. (substance meronymy) • The inverse relation is holonymy. Car is a holonym of wheel.

Antonyms

• Senses are opposites with respect to one specific feature of their meaning. • Otherwise, they are very similar! • dark / light (level of luminosity) • short / long (length) • hot / cold (temperature) • rise / fall (direction) • front / back (relative position) Antonyms typically describe opposite ends of a scale, or opposite direction/position with respect to some landmark (reversives).

Simplified Lesk Algorithm

• Use dictionary glosses for each sense. • Choose the sense that has the highest word overlap between gloss and context (ignore function words). Example context: "The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities."
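A compact Python sketch of the algorithm (the sense inventory is passed in as sense-to-gloss pairs; the small stopword set stands in for "ignore function words"):

FUNCTION_WORDS = {'the', 'a', 'an', 'in', 'it', 'of', 'to', 'will', 'can', 'because'}

def simplified_lesk(context_words, senses):
    # senses: dict mapping each sense name to its dictionary gloss (plus examples).
    context = {w.lower().strip('.,') for w in context_words} - FUNCTION_WORDS
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        signature = {w.lower() for w in gloss.split()} - FUNCTION_WORDS
        overlap = len(context & signature)          # shared content words
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

senses = {
    'bank#financial': 'a financial institution that accepts deposits and lends money',
    'bank#river': 'sloping land beside a body of water',
}
sentence = 'The bank can guarantee deposits will eventually cover future tuition costs'
print(simplified_lesk(sentence.split(), senses))    # -> 'bank#financial' (shares "deposits")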

