LIN 120 Final *Flipped Side*


an inventory of what there is, like a dictionary, but gives more information: structure, attributes, semantic roles. an example of an ontology is PropBank. It defines the word "buy" as "purchase" and defines roles like buyer, thing bought, seller, price paid, etc. it also gives "accept as truth" as a definition, with the roles believer and thing believed.

Ontology
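The PropBank-style entry described above can be sketched as a small Python structure (a hypothetical layout for illustration only; real PropBank frames are stored differently):

```python
# Hypothetical sketch of an ontology entry for "buy" with two senses,
# each carrying its own semantic roles (PropBank-style, not PropBank's format).
propbank_style = {
    "buy": [
        {"sense": "purchase",
         "roles": ["buyer", "thing bought", "seller", "price paid"]},
        {"sense": "accept as truth",
         "roles": ["believer", "thing believed"]},
    ]
}

# An ontology gives more than a definition: we can look up the roles per sense.
for frame in propbank_style["buy"]:
    print(frame["sense"], "->", frame["roles"])
```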

Coreference

The first sentence provides information needed to interpret the second: Paul was poor. John bought him a car.

use mixed n-grams: if the sentence contains the mixed 2-gram "Verb you" (e.g. let you, did you), it's not an ODP; if the sentence contains the mixed 2-gram "Verb me" (e.g. send me, tell me), it's an ODP

an example of how to distinguish if a sentence is an ODP
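The rule above can be sketched as a tiny Python function (the verb list here is a made-up stand-in for real part-of-speech tagging):

```python
# Minimal sketch of the mixed-2-gram ODP heuristic.
# VERBS is an illustrative stand-in; a real system would use a POS tagger.
VERBS = {"let", "did", "send", "tell", "call", "give", "keep"}

def odp_heuristic(sentence):
    """True if the sentence contains a 'Verb me' 2-gram (ODP),
    False if it contains a 'Verb you' 2-gram (not ODP), None otherwise."""
    tokens = sentence.lower().split()
    for first, second in zip(tokens, tokens[1:]):
        if first in VERBS and second == "me":
            return True
        if first in VERBS and second == "you":
            return False
    return None

print(odp_heuristic("Please send me the report"))   # True
print(odp_heuristic("Did you finish the report?"))  # False
```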

0

given the word Fenster in German, whose plural form is Fenster, what is the gold class?

machine translation, automatic speech recognition, handwriting recognition

uses of N-gram models

sample = sample.capitalize()

capitalize the first character of the string sample

words written the same with different meaning

homograph

reduce useless variation in results

what is the goal of an activation function?

Garden Path Sentence

what is this sentence an example of? The horse raced past the barn fell.

[]

["John", "Mary", "Sue"][1:1]

'Mary'

["John", "Mary", "Sue"][1]

words pronounced the same with different meaning

homophone

a process in which a computer can generalize from seen data

machine learning

demonstrates that NLP tools can be used to study interesting questions about how people use language

the significance of studying ODP of female and male superiors

all letters of the alphabet, letters of other alphabets, digits, and the underscore; not white spaces, not other special characters

what does \w match

a linear function of its input, which can be represented graphically using a line that separates the two classes of data points

what does a perceptron compute
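The computation can be sketched in a few lines of Python (the weights and bias below are made-up illustrative values, not learned ones):

```python
# A perceptron computes a linear function of its input:
# raw = sum(input_i * weight_i) + bias, then thresholds at 0.
def perceptron(inputs, weights, bias):
    raw = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if raw > 0 else 0

weights = [0.5, -0.2, 0.1]  # illustrative values
bias = -0.1
print(perceptron([1, 0, 1], weights, bias))  # 0.5 + 0.1 - 0.1 = 0.5 > 0 -> 1
print(perceptron([0, 1, 0], weights, bias))  # -0.2 - 0.1 = -0.3 <= 0 -> 0
```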

Bow of a present, bow and arrow, beau as in male beloved (all pronounced the same)

what is an example of 3-way ambiguity in spoken language?

Bow of a ship, Bow to the king, Bow of a present, Bow and arrow (all spelled the same)

what is an example of 4-way ambiguity in written language?

import re
def digits(string):
    return re.findall(r"[0-9]+", string)

what is another way to write this code:
import re
def digits(string):
    return re.findall(r"\d+", string)

def print_first_last(n):
    print(hamlet[:n])
    print(hamlet[-n:])

Write a small custom function print_first_last that prints the first n and last n words of hamlet

['b', 'c', 'd', 'e']

list = ["a", "b", "c", "d", "e", "f"]
print(list[1:5])

['My', 'phone', 'number', 'is', '555-123-4567']

what is the output of this code?
import re
def tokenize(string):
    token_list = re.findall(r"\S+", string)
    return token_list
tokenize("My phone number is 555-123-4567")

['Stalag', '17', 'might', 'be', 'Billy', "Wilder's", 'best', 'movie!']

what is the output of this code?
import re
def tokenize(string):
    token_list = re.findall(r"\S+", string)
    return token_list
tokenize("Stalag 17 might be Billy Wilder's best movie!")

['True', 'music', 'aficionados', 'listen', 'to', 'Taylor,', 'Harry,', 'and', 'Drake...']

what is the output of this code?
import re
def tokenize(string):
    token_list = re.findall(r"\S+", string)
    return token_list
tokenize("True music aficionados listen to Taylor, Harry, and Drake...")

['My', 'phone', 'number', 'is', '555', '123', '4567']

what is the output of this code?
import re
def tokenize(string):
    token_list = re.findall(r"\w+", string)
    return token_list
tokenize("My phone number is 555-123-4567")

['Stalag', '17', 'might', 'be', 'Billy', 'Wilder', 's', 'best', 'movie']

what is the output of this code?
import re
def tokenize(string):
    token_list = re.findall(r"\w+", string)
    return token_list
tokenize("Stalag 17 might be Billy Wilder's best movie!")

['True', 'music', 'aficionados', 'listen', 'to', 'Taylor', 'Harry', 'and', 'Drake']

what is the output of this code?
import re
def tokenize(string):
    token_list = re.findall(r"\w+", string)
    return token_list
tokenize("True music aficionados listen to Taylor, Harry, and Drake...")

a very simple but effective machine learning algorithm; easy to implement in Python; the basis for neural networks and deep learning; we can interpret the models (the sequences of weights) learned by perceptrons

what is the perceptron

['antler', 'beast', 'cat', 'deer', '👍']

word_list = ["cat", "antler", "👍", "deer", "beast"]
print(sorted(word_list))

Yes, No, No, Yes, No, Yes, Yes, Yes

Are these examples of ODP?
"Please give me your views ASAP."
a student emails a teacher "I need my grade today."
"can you believe this bloody election?"
"can you please keep me in the loop"
"Enjoy the rest of your week!"
"I need the answer ASAP"
"Would you work on that"
"Call me on my cell later"

requests that create constraints on the response, so you can't say no. the person requesting must be higher up. for example: "I need the report today." vs "Do you think you can send the report today?"

ODP (Overt Display of Power)

words that are the same, have the same spelling, and same pronunciation but different meaning

What do these sentences demonstrate? John bought the car. John bought the story.

homophone

What do these sentences demonstrate? John went to the bank to deposit money. John drank from the river bank.

Cognitive State- how confident the speaker is

What do these sentences demonstrate? John will leave tomorrow. Mary says John will leave tomorrow. I hope John will leave tomorrow

polysemy

What do these sentences demonstrate? The book fell on the floor. The book tells the story of World War 2.

homograph

What do these sentences demonstrate? The bow of the ship was torn. The ribbon was tied in a bow.

homograph and homophone

What do these sentences demonstrate? The ribbon was tied in a bow. Katniss used a bow and arrow.

implicatures: we know you bought 2, not 3, because if you had bought 3, you would have said 3.

What does this sentence demonstrate? I bought two pencils.

implicatures: we know Sandy is not a lover or spouse, because if she were, it would have been said. (demonstrates Grice's maxims, with the scale acquaintance < friend < lover < spouse)

What does this sentence demonstrate? Sandy is a friend.

matching anything that is not matched by \d

\D

matching anything that is not matched by \s

\S

matching anything that is not matched by \w

\W

matches digits

\d

matches whitespace (spaces, tabs, newlines)

\s

matches word characters

\w

vase_entry = dict([])
vase_entry["POS"] = "noun"
vase_entry["definition"] = "A container for flowers."
vase_entry["plural"] = "vases"
english_dictionary["vase"] = vase_entry

add the word "vase" to the dictionary along with its POS, definition, and plural
english_dictionary = dict([])

there are not so many possible part-of-speech n-grams because there are far fewer parts of speech than words. So we can have longer n-grams (4-grams, 5-grams) and still do computation on them.

advantage of part-of-speech n-grams

in the sentence "the boy saw a girl with a telescope", the phrase "with a telescope" can be attached either to the verb "saw" or to the noun phrase "a girl"

an example of ambiguity using trees

meaning of all sentences in text + common sense/background knowledge + inference

aspects required for deep understanding

import re
def clean_up(reply):
    reply = re.sub(r"[\.\?!,;]", r"", reply)
    return reply.lower()

create a function that cleans the reply of the user

import re
def tokenize(the_string):
    new_string = re.sub(r"(?=[\.\?!,;])", r" ", the_string)
    token_list = re.findall(r"\S+", new_string)
    return token_list

create a function that tokenizes "Sue, stop!" as ["Sue", ",", "stop", "!"], and "Sue and Bill..." as ["Sue", "and", "Bill", ".", ".", "."]

BOS the = BOS Det
the dog = Det Noun-sg
dog barked = Noun-sg Verb-past
barked at = Verb-past Prep
at the = Prep Det
the black = Det Adj
black cat = Adj Noun-sg
cat . = Noun-sg Punc
. EOS = Punc EOS

create a part of speech n-gram for the dog barked at the black cat.
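The bigrams above can also be generated mechanically from the tag sequence (a sketch that assumes the sentence has already been hand-tagged with the course's labels):

```python
# Build part-of-speech 2-grams for "the dog barked at the black cat ."
# from its tag sequence, pairing each tag with its successor.
tags = ["BOS", "Det", "Noun-sg", "Verb-past", "Prep",
        "Det", "Adj", "Noun-sg", "Punc", "EOS"]

pos_bigrams = list(zip(tags, tags[1:]))
print(pos_bigrams[0])   # ('BOS', 'Det')
print(pos_bigrams[-1])  # ('Punc', 'EOS')
```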

import re
def digits(string):
    return re.findall(r"\d", string)

define a function that returns a list of all the individual digits in a string, e.g.:
string = "James Madison had 0 sons and fought the war of 1812." => ['0', '1', '8', '1', '2']

import re
def digits(string):
    return re.findall(r"\d+", string)

define a function that returns a list of all the numbers in a string, e.g.:
string = "James Madison had 0 sons and fought the war of 1812." => ['0', '1812']

import re
def tokenize(string):
    return re.findall(r"\w+", string)

define a function that tokenizes a string by word characters

import re
def tokenize(string):
    token_list = re.findall(r"\S+", string)
    return token_list

define a function that tokenizes a string on spaces

import re
from collections import Counter
tokens = re.findall(r"\w+", str.lower(string))
counts_tokens = Counter(tokens)
del counts_tokens["the"]

delete the stop word "the" from the token counts of a string

A.

dictionary = dict([])
dictionary["a"] = "A."
dictionary["b"] = "B."
dictionary["c"] = "C."
dictionary["d"] = "D."
print(dictionary["a"])

{1: 'The number one.', 2: 'The number two.', 3: 'The number three.', 4: 'The number four.'}

dictionary = dict([])
dictionary[1] = "The number one."
dictionary[2] = "The number two."
dictionary[3] = "The number three."
dictionary[4] = "The number four."
print(dictionary)

cannot do morphology: even though dogs is just the plural of dog, they would have to be represented as different vectors. no representation of meaning: travel and voyage are similar, but cannot be represented as such through vectors. huge vectors in real life: we can only consider so many words, or the vectors become too large.

disadvantages of one-hot vectors

['box', 'pan', 'vase']

english_dictionary = dict([])
english_dictionary["vase"] = "A container into which we can put flowers, and water to keep them fresh."
english_dictionary["pan"] = "A flat container used to cook food in."
english_dictionary["box"] = "An enclosed container, usually of wood or cardboard."
print(sorted(english_dictionary))

[('box', 'An enclosed container, usually of wood or cardboard.'), ('pan', 'A flat container used to cook food in.'), ('vase', 'A container into which we can put flowers, and water to keep them fresh.')]

english_dictionary = dict([])
english_dictionary["vase"] = "A container into which we can put flowers, and water to keep them fresh."
english_dictionary["pan"] = "A flat container used to cook food in."
english_dictionary["box"] = "An enclosed container, usually of wood or cardboard."
print(sorted(english_dictionary.items()))

b
['a']
['b']
['c', 'd', 'e', 'f']
['f']
[]
['a', 'b', 'c', 'd', 'e', 'f']

example_list = ["a", "b", "c", "d", "e", "f"]
print(example_list[1])
print(example_list[:1])
print(example_list[1:2])
print(example_list[2:])
print(example_list[5:6])
print(example_list[10:])
print(example_list[0:100])

final syllable (unaccented schwa, unaccented closed, ...), gender (masculine, feminine, neuter), alveolar (true or false)

features used in decision tree training

Counter({'is': 2, 'a': 2, 'this': 1, 'sentence': 1, 'and': 1, 'that': 1, 'tree': 1})
[('a', 2), ('and', 1), ('is', 2), ('sentence', 1), ('that', 1), ('this', 1), ('tree', 1)]

from collections import Counter
word_count = Counter(["this", "is", "a", "sentence", "and", "that", "is", "a", "tree"])
print(word_count)
print(sorted(word_count.items()))

a sentence that seems to have ended, but then has another verb at the end, forcing the reader to revise their initial parse. difficult for both humans and computers to understand.

garden path sentence

dictionary["pan"]["plural"]

given a dictionary with words, their POS, definition, and plural get the plural form of the word pan

(0, 1, 0, 0, 0, 1, 0, 0)

given the word Frage in German: its final syllable is represented as (0, 1, 0, 0), its gender as (0, 1, 0), and its alveolar as (0). what is the full input representation?

1

given the word Frage in German, whose plural form is Fragen, what is the gold class?

dog = (1,0,0,0)
cat = (0,1,0,0)
bark = (0,0,1,0)
run = (0,0,0,1)
dog + bark = (1,0,1,0)
cat + run = (0,1,0,1)
dog + run = (1,0,0,1)
sum = (2,1,1,2)

given the words dog cat bark run. use one-hot vectors to find the vector for dog bark, cat run, dog run
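The same calculation can be sketched in Python with plain lists:

```python
# One-hot vectors over the vocabulary [dog, cat, bark, run],
# plus elementwise addition to combine them.
vocab = ["dog", "cat", "bark", "run"]

def one_hot(word):
    return [1 if w == word else 0 for w in vocab]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

dog_bark = add(one_hot("dog"), one_hot("bark"))
cat_run = add(one_hot("cat"), one_hot("run"))
dog_run = add(one_hot("dog"), one_hot("run"))
total = add(add(dog_bark, cat_run), dog_run)
print(dog_bark, cat_run, dog_run, total)
# [1, 0, 1, 0] [0, 1, 0, 1] [1, 0, 0, 1] [2, 1, 1, 2]
```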

for word in stopwords:
    del counts_hamlet[word]

given this list of stop words, remove all of them from counts_hamlet
stopwords = ["i", "me", "my", "myself", "we", "our", "ours", "ourselves", "you", "your", "yours", "yourself", "yourselves", "he", "him", "his", "himself", "she", "her", "hers", "herself", "it", "its", "itself", "they"]

combine multiple perceptrons into a complex architecture, with the output from one perceptron becoming the input to the next. this is called neural machine learning.

how can we extend perceptrons to do more complex tasks?

annotate the data: identify the desired predictions you want the system to learn
find features of the data that may help the system generalize
training: run on the given data
test performance on new data
make changes
*rinse and repeat*

how does machine learning work

prob(BOS the) = count of "BOS the" / count of "BOS *anything*"
prob(the dog) = count of "the dog" / count of "the *anything*"
prob(dog ran) = count of "dog ran" / count of "dog *anything*"
prob(ran faster) = count of "ran faster" / count of "ran *anything*"
prob(faster .) = count of "faster ." / count of "faster *anything*"
prob(. EOS) = count of ". EOS" / count of ". *anything*"
(all multiplied together)

how to calculate prob(BOS the dog ran faster . EOS)
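The calculation can be sketched with Counter over a toy corpus (the token list below is made up for illustration):

```python
# Bigram probability: count of "first second" / count of "first *anything*".
from collections import Counter

tokens = ["BOS", "the", "dog", "ran", "faster", ".", "EOS",
          "BOS", "the", "cat", "ran", ".", "EOS"]
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)

def bigram_prob(first, second):
    return bigrams[(first, second)] / unigrams[first]

print(bigram_prob("the", "dog"))  # "the dog" occurs once of 2 "the ..." -> 0.5
print(bigram_prob("BOS", "the"))  # both toy sentences start with "the" -> 1.0
```

The sentence probability is then the product of these bigram probabilities.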

Female superiors use fewer ODP than male superiors in interactions with their subordinates

hypothesis of ODP in terms of gender

sample = sample.title()

make the first letter of each word in the string sample uppercase

sample = sample.lower()

make the string sample lowercase

instead of naming the part of speech for closed-class parts of speech, just list the word. e.g. for "the dog", the mixed 2-gram would be: the Noun-sg

mixed 2-grams

how the input is represented in machine learning. for instance, gender: masculine = (1, 0, 0), feminine = (0, 1, 0), neuter = (0, 0, 1)

one-hot encoding

given a set of vocabulary words, give each a vector.

one-hot vectors

open-class parts of speech (nouns, verbs, adjectives, adverbs): new ones are constantly created; ever-changing. closed-class parts of speech (prepositions, determiners, pronouns): permanent; new ones are not created.

open-class parts of speech vs closed-class parts of speech

words that are identical, but there are multiple "versions" of the word. think book: the physical object vs the story.

polysemy

for character in string:
    print(character)

print each character in a string one line at a time

import re
from collections import Counter
tokens = re.findall(r"\w+", str.lower(string))
counts_tokens = Counter(tokens)
print(counts_tokens.most_common(10))

print the 10 most common tokens of a string, along with the counter of the frequencies of each token

print(sum(test_counter.values()) / len(test_counter))

print the average number of word tokens per type given: test_counter = Counter(["a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "c"])

print(sorted(numbers_list))

print the sorted list numbers_list = [23, 8, 98, -2, 1330 ]

import re
from collections import Counter
tokens = re.findall(r"\w+", str.lower(string))
print(Counter(tokens))

print the tokens of a string, along with the counter of the frequencies of each token

the input data (a word) is represented by a sequence of numbers created from the word's features. the data has a sequence of corresponding weights, which all start at 0 and are then adjusted.
raw prediction value = sum of (input * corresponding weight) + constant bias.
if raw > 0, the normalized prediction is 1; if raw <= 0, the normalized prediction is 0.
there is a gold label: the goal value (in our example, the gold label was whether or not the plural of a word ends in -n).
if prediction = gold, do nothing. if prediction = 0 and gold = 1: increase the "1" weights by 0.01. if prediction = 1 and gold = 0: decrease the "1" weights by 0.01.

procedure for the perceptron
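The procedure above can be sketched as a single training step in Python (using the German-plural example from these cards, where gold class 1 means the plural ends in -n):

```python
# One perceptron training step, following the update rule above:
# comparing prediction vs. gold decides whether the "1" weights move by 0.01.
def train_step(inputs, gold, weights, bias):
    raw = sum(x * w for x, w in zip(inputs, weights)) + bias
    prediction = 1 if raw > 0 else 0
    if prediction == 0 and gold == 1:
        weights = [w + 0.01 if x == 1 else w for x, w in zip(inputs, weights)]
    elif prediction == 1 and gold == 0:
        weights = [w - 0.01 if x == 1 else w for x, w in zip(inputs, weights)]
    return weights

# Frage = (0,1,0,0, 0,1,0, 0), gold = 1; all weights start at 0.
weights = train_step([0, 1, 0, 0, 0, 1, 0, 0], 1, [0.0] * 8, 0.0)
print(weights)  # [0.0, 0.01, 0.0, 0.0, 0.0, 0.01, 0.0, 0.0]
```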

import re
sample = re.sub(r"\.", r"", sample)

remove every period in the string sample

import re
sample = re.sub(r"[abcd]", r"", sample)

remove the letters abcd from the string sample

import re
sample = re.sub(r"\w", r"*", sample)

replace every word character with a * in the string sample

import re
sample = re.sub(r"!", r"?", sample)

replace every ! with ? in the string sample

import re
sample = re.sub(r"[abD!\?]", r"X", sample)

replace every a, b, D, !, or ? character with an X in the string sample

import re
sample = re.sub(r".", r"?", sample)

replace every character with ? in the string sample

import re
sample = re.sub(r"[?!]+", r".", sample)

replace every sequence of punctuation marks (e.g. !?!?!!!!!???) with a single period in the string sample

print(list[:2])

show the first two elements of a list

using machine learning, be able to automatically label whether a sentence contains an ODP or not

significance of studying ODP

Counter({'d': 4, 'c': 3, 'b': 2, 'a': 1})

import re
from collections import Counter
string = "a b b c c c d d d d"
tokens = re.findall(r"\w+", str.lower(string))
print(Counter(tokens))

['she', 'liked', 'seafood', 'and', 'her', 'husband', 'liked', 'beef']

import re
test_string = "She liked seafood, and her husband liked beef."
print(re.findall(r"\w+", str.lower(test_string)))

if (word in english_dictionary.keys()):

the if statement to determine if a word is in the dictionary

