Quiz 2 Language Meaning + Computational Approaches (week 7)

Linguistic Inquiry and Word Count (LIWC)

analysis tool that automatically categorizes text into psychological and linguistic categories; its idea of meaning is that a word's meaning comes out of its associations, ex: the word "dog" is associated with being an animal and being a pet; mixes qualitative and quantitative elements to understand how our emotions, selves, and opinions "leak out" in our language

topic spaces

because LSA derives meaning from a word's relationship to other words within the corpus, we can find nuances of association by comparing how LSA represents the same word in different corpora

How does LIWC define meaning?

by sorting each word into a set of categories (dictionaries) that describe it; ex: "I" is in the pronoun category as well as the self category; we: pronoun/self; she: pronoun/self/female; a word's meaning can be picked out from the combination of categories it belongs to; LIWC has nearly 5,000 words in it, which makes it less susceptible to individual variability at the word level

What else can LSA do with meaning?

can capture nuances of context; because word meaning depends on the corpus of texts, we can feed LSA different corpora and see how relationships or meanings change in different topic spaces; ex: with a matrix built only from history texts, the word "club" may mean something like a mace or sword

Achievements of LSA

can perform at a high school level on a multiple-choice vocabulary test; automatically scores expository essays with over 85% accuracy compared to an expert human scorer; infers student comprehension through coherence (whether one paragraph flows from the previous one) better than human graders; mimics human performance seen in cognitive psychology

How does LSA actually work, step by step?

1. create a big word by document matrix (prep the pomegranate)
2. reduce dimensionality (run the juicer)
3. compare words (drink up and enjoy!)

acronym of LSA

Latent: exists but isn't yet developed or manifested; Semantic: related to meaning; Analysis: ...analysis; the goal is to get at meaning by building a model of it

Challenges in NLP

complex problem in computer science right now because language can be complicated and messy; a computer needs to learn these things to be able to process language and respond to individuals; computers do not learn language the same way people do

NLP and linguistics

despite its critics, NLP provides unprecedented levels of insight into our language use and its meaning; with the growing amount of data from our digital lives, we're able to move outside the lab to understand behavior in the real world

corpus analysis

doing an analysis of a corpus, i.e. some large amount of text; ex: a corpus of exam essays; the plural of corpus is corpora; NLP relies on corpus analysis tools and massive input to help make sense of these messy, complex, rich data; calculates different features of language; obviously a computer's understanding of text is not the same as a human's, but it can be useful

How do I use LSA?

don't have to crunch all the numbers yourself; if you know how to code in Python you can do it; there is also an online tool on a website made in 2003; on the website you can pick a topic space (ex: general readings) and look at how different age groups understand the word dog differently; can also compare words to other words (similarity matrix); can also scale this up to compare larger sentences or texts

The Simple Motivation of LSA and document matrix

ex: "dog" may never occur in the same document as either "parrot" or "pencil"; however, parrot and dog may occur with similar words: "breathe", "eat", "drink", "owner"; LSA is able to extract these deeper relationships by looking over hundreds or thousands of documents; it would be able to identify that a dog and a parrot are more similar than a dog and a pencil because they are animals that breathe; Assumption: words that represent similar meanings or related concepts will occur with each other (on average) more than words that represent unrelated meanings or concepts; dog and cat will appear in similar contexts, as will doctor and nurse

Qualitative/Quantitative approach of LIWC

gets a bit more "human" in its approach to NLP than LSA does; LIWC was originally built by human raters who created over 70 different categories of words; linguistic categories: pronouns, articles, tenses; psychological categories: family, friends, negative emotion, positive emotion; also swear words

LIWC as Forensic Linguistics

has been applied all over the place, not just by linguists but also by psychologists, computer scientists and tech companies; the idea is that subtle linguistic usage, especially stylistic words (pronouns, prepositions, etc.), might serve as a kind of "linguistic forensic tool" that could detect some psychological states

LIWC and President Obama example

his inaugural speech; see differences between personal text and formal text; LIWC reports the percentage of a speech that belongs in each category; lots of social words compared to more personal words; fewer negative emotion words; how formal is President Obama? the speech shares some similarities with both formal and informal text

"Linguistically leaky"

how much do people refer to themselves? pronouns can be incredibly insightful; turns out liars tend to use language in a way that minimizes the first person singular; researchers are looking into ways to develop better deception-detection algorithms; used to help diagnose different mental health issues based on the way you speak; people who are more neurotic use more negative language; can measure how people are changing/aging; if you plug in tweets from Kanye and Tom Brady you can see how much each tweet shows a depressed state, happiness, worry, or a personable, analytic, or spacy/valley girl style

How does LSA work: Map analogy

a huge map of text information is hard to make sense of, so LSA tries to distill it down into something more functional; ex: taking a pomegranate, which is really messy, and distilling it down into pomegranate juice; takes the corpus (text files in a folder) and makes that map; if words always go together, they must mean something similar to each other; meaning comes out of relationships

latent semantic analysis

method proposed in order to understand NLP and language meaning by using large amounts of text to build a model of meaning; the idea was to see if you can get a computer to reach an actual understanding of what we mean by word meanings; NLP analysis method that uses vast amounts of text to build a semantic model that allows us to compare words to each other in terms of their meaning; almost completely on the quantitative side of the spectrum

limitations of LSA

it can't handle syntax: the same words arranged in different ways will be read as identical, so "the dog ate my homework" and "the homework ate my dog" are read the same; relies on orthography (spelling), so it can't handle homonyms (same spelling, different meaning); difficulty with antonyms: will think love and hate are more similar than love and admiration because love and hate go together more often
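The word-order limitation can be seen in a small sketch. This is not LSA itself, only the word-count (bag-of-words) view of a sentence that a word by document matrix starts from; the helper name `bag_of_words` is made up for illustration.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count how often each word occurs, ignoring word order entirely."""
    return Counter(sentence.lower().split())

a = bag_of_words("the dog ate my homework")
b = bag_of_words("the homework ate my dog")

# Both sentences contain the same words the same number of times,
# so a count-based representation cannot tell them apart.
same_representation = (a == b)
print(same_representation)
```

Because only the counts are stored, anything that depends on who did what to whom is lost before the matrix is even built.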

How does LSA determine what meaning actually is

it goes from a huge amount of text data down to a distilled representation of word meaning in the form of a vector space or map; in this space, words don't have meaning on their own but their meaning comes out of relationships to other words; things that are meaningful together will go together ex: cat and dog; car and break; break and work

Famous and Controversial studies using LIWC

massive-scale emotional contagion through social networks; researchers manipulated Facebook feeds to change the amount of positive/negative emotional content to see if this ends up changing what people are posting

Step 1 of how LSA works: word by document matrix

might have a little corpus: words on the y-axis and files or documents on the x-axis; the cells represent how often a word occurs in each file (darker = more occurrences); ex: which text file contains the word dog in it?
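A minimal sketch of such a matrix, assuming a made-up three-document corpus (the document names and sentences are hypothetical, chosen only to echo the dog/parrot/pencil example):

```python
from collections import Counter

# Hypothetical toy corpus: each entry stands in for one text file.
documents = {
    "doc1": "the dog will eat and drink",
    "doc2": "the parrot will eat and drink",
    "doc3": "the pencil is on the desk",
}

def build_word_by_document_matrix(docs):
    """Return (vocabulary, document names, matrix of counts).

    Rows are words, columns are documents, and each cell holds how
    often the word occurs in that document.
    """
    names = list(docs)
    counts = {name: Counter(text.lower().split()) for name, text in docs.items()}
    vocab = sorted({word for c in counts.values() for word in c})
    matrix = [[counts[name][word] for name in names] for word in vocab]
    return vocab, names, matrix

vocab, names, matrix = build_word_by_document_matrix(documents)

# Which text file contains the word "dog"?
dog_row = matrix[vocab.index("dog")]
docs_with_dog = [name for name, n in zip(names, dog_row) if n > 0]

# Share of empty cells, a rough view of how sparse the matrix is.
cells = [n for row in matrix for n in row]
sparsity = cells.count(0) / len(cells)
```

Even in this tiny corpus a large share of the cells are zero, and the share grows quickly as more documents and words are added.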

NLP

natural language processing; technique for analyzing language using computers; used widely by tech companies; analyzing data to be able to tell things about you from your language; building robots or teachable agents to tutor people

Step 3 of how LSA works: compare words

once you have dimensions you can compare words and get distance measures between two different words; the dimensions are now the space in which words "live" and can be related; similarity is quantified as the cosine of the two word vectors in the lower-dimensional space, which gives a number for how similar words are; the cat and dog cosine is closer to 1 because they are more similar; the dog and airplane cosine will be closer to 0
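The cosine comparison can be sketched as follows. The two-dimensional word vectors are invented for illustration; real LSA vectors would come out of the dimensionality-reduction step and have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

# Hypothetical reduced word vectors (NOT real LSA output).
vectors = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "airplane": [0.1, 0.9],
}

cat_dog = cosine_similarity(vectors["cat"], vectors["dog"])
dog_airplane = cosine_similarity(vectors["dog"], vectors["airplane"])
print(cat_dog > dog_airplane)
```

With these made-up vectors, cat and dog point in nearly the same direction, so their cosine is much closer to 1 than the dog and airplane cosine.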

Limitations of LIWC

similar to LSA: can't deal with syntax; can't deal with homonyms; some linguists have expressed concerns about the meaning of the actual categories; humans came up with the categories, so they could be flawed; ex: is the cognitive mechanism category (words like think and know) actually tapping into what we think are cognitive mechanisms?

Starting point of LSA

suppose we have a big corpus containing all the language someone would encounter in their life, and a computer with enough power to match that, running a clever learning algorithm; could it learn the meanings of all the words in any language it was given?; philosophers, linguists, and psychologists talk about meaning but have never been able to quantify it; there may be a technique for trying to see what meaning actually is

The problem with LSA

the cells in a word by document matrix are mostly empty, which makes it difficult to relate word meanings and difficult to interpret; called the data sparsity problem; addressed with a statistical technique (singular value decomposition) that acts like a juicer or map-maker by extracting the major trends/relationships among words in the matrix; gets rid of the stuff that is not meaningful

inter-rater reliability in LIWC

the process of comparing judge categorizations to determine how closely judges agreed; where the judges did not have a majority vote, they either removed the word or negotiated to come to an agreement; ultimately LIWC's scores are based on over 90% agreement by judges
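A rough sketch of the majority-vote idea, under stated assumptions: the votes, the word list, and the function names are hypothetical, and the actual LIWC judging procedure is more involved than this.

```python
def majority_decision(votes):
    """Return True/False if the judges have a majority, or None on a tie.

    A None result stands for the case where the word would be removed
    or sent back to the judges for negotiation.
    """
    yes = sum(votes)
    no = len(votes) - yes
    if yes == no:
        return None
    return yes > no

def percent_agreement(votes):
    """Share of judges who sided with the larger group."""
    yes = sum(votes)
    return max(yes, len(votes) - yes) / len(votes)

# Hypothetical votes: does the word belong in "negative emotion"?
judgments = {
    "sad": [True, True, True],
    "hate": [True, True, False],
    "cry": [True, False],
}

decisions = {word: majority_decision(v) for word, v in judgments.items()}
agreement = {word: percent_agreement(v) for word, v in judgments.items()}
```

Here "sad" is accepted with full agreement, "hate" is accepted by majority, and "cry" has no majority and would need to be dropped or negotiated.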

What is the assumption of LIWC

the type and amount of different kinds of language that we use provides a window into our cognition and behavior--and even pathologies

How do you use LIWC; what questions might you ask?

use LIWC's hand-coded categories to quantify texts; ex: how much do people refer to themselves when talking, texting, or writing email? how often do people use social references? how often are people sad?
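A minimal sketch of category counting in this style. The mini-dictionary below is hypothetical and is NOT the real LIWC word list; the category names only echo the ones mentioned in these notes.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary: word -> set of categories it belongs to.
CATEGORIES = {
    "i": {"pronoun", "self"},
    "we": {"pronoun", "self"},
    "friend": {"social"},
    "family": {"social"},
    "sad": {"negative_emotion"},
    "happy": {"positive_emotion"},
}

def category_percentages(text):
    """Return the percentage of words in the text that fall in each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {}
    totals = Counter()
    for token in tokens:
        for category in CATEGORIES.get(token, ()):
            totals[category] += 1
    return {category: 100 * n / len(tokens) for category, n in totals.items()}

result = category_percentages("I was sad but we saw a friend")
print(result)
```

The output is a percentage per category, which is the kind of number that can then be compared across speakers or across personal and formal texts.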

Qualitative methods of LIWC

used qualitative coding from human judges to pick the words, build the dictionaries, and sort the words into categories; good agreement among the raters; judges achieved inter-rater reliability

How can we quantify meaning-what can LSA be applied to?

using LSA; LSA has been used in a lot of ways; ex: to help with automated grading of papers; used to pass the MCAT exam

Step 2 of how LSA works: reduce dimensionality

once you have the matrix, the data sparsity problem is dealt with using the technique of singular value decomposition (SVD): taking the big matrix and reducing it down into fewer, more meaningful dimensions
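A small sketch of this reduction step, assuming NumPy is installed. The counts are made up, and keeping k = 2 dimensions is an arbitrary choice for illustration; real LSA spaces keep far more dimensions.

```python
import numpy as np

# Hypothetical word-by-document counts (rows = words, columns = documents).
words = ["dog", "parrot", "pencil", "eat", "drink", "desk"]
counts = np.array([
    [1, 0, 0, 1],   # dog
    [0, 1, 0, 0],   # parrot
    [0, 0, 1, 0],   # pencil
    [1, 1, 0, 1],   # eat
    [1, 1, 0, 0],   # drink
    [0, 0, 1, 0],   # desk
], dtype=float)

# Singular value decomposition: counts = U @ diag(s) @ Vt,
# with the singular values in s sorted from largest to smallest.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)

# Keep only the k strongest dimensions (k is an assumption here).
k = 2
word_vectors = U[:, :k] * s[:k]   # one k-dimensional vector per word

for word, vector in zip(words, word_vectors):
    print(word, np.round(vector, 2))
```

Each word now has a short dense vector instead of a long, mostly empty row, and those vectors are what get compared with the cosine in the compare-words step.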

