Naive Bayes and Sentiment Classification


byte n-grams

where instead of using the multibyte Unicode character representations called codepoints, we just pretend everything is a string of raw bytes

binary multinomial naive bayes

whether a word occurs or not seems to matter more than its frequency. Thus it often improves performance to clip the word counts in each document at 1
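
A minimal sketch of the clipping step for binary naive Bayes (the helper name binarize_counts is just illustrative):

from collections import Counter

def binarize_counts(doc_tokens):
    """Clip each word's count within a document to at most 1."""
    counts = Counter(doc_tokens)
    return {word: min(count, 1) for word, count in counts.items()}

# Repeated words now contribute only once per document:
binarize_counts(["great", "great", "plot", "great", "acting"])
# -> {'great': 1, 'plot': 1, 'acting': 1}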

feature selection

selecting only a subset of the input features for the classifier, for example winnowing down to the most informative 7000 final features

effect size

δ(x), the size of the performance difference between two classifiers on a test set x; a bigger δ means that classifier A appears to be much better than classifier B, while a small δ means classifier A appears to be only a little better

cross-validation

we repeatedly choose a random training and test set division of our data, train the classifier on the training portion, compute the error rate on the test portion, and then average the error rates over the repeated splits
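
A rough sketch of k-fold cross-validation in this spirit; train_and_score is a stand-in for whatever routine trains a classifier and returns its error rate on held-out data:

import random

def cross_validate(examples, train_and_score, k=10, seed=0):
    """Split the data into k folds and average the test error over the folds."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        errors.append(train_and_score(train, test))  # error rate on the held-out fold
    return sum(errors) / k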

Classification

Deciding what letter, word, or image has been presented to our senses, recognizing faces or voices, sorting mail, assigning grades to homeworks; these are all examples of assigning a category to an input

development test set

a test set used during development, for example to tune some parameters and decide which model is best, so that the final test set stays unseen until the end

stop words

very frequent words like the and a

Bayesian inference

The idea that our estimate of the probability of an outcome is determined by the prior probability (our initial belief) and the likelihood (the extent to which the available evidence is consistent with the outcome).

macroaveraging

we compute the performance for each class, and then average over classes

supervised machine learning

we have a data set of input observations, each associated with some correct output (a 'supervision signal'). The goal of the algorithm is to learn how to map from a new observation to a correct output.

confusion matrix

a table for visualizing how an algorithm performs with respect to the human gold labels, using two dimensions (system output and gold labels), with each cell counting one combination of system output and gold label (such as true positives or false negatives)

bag-of-words

an unordered set of words with their position ignored, keeping only their frequency in the document
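
A tiny sketch of building a bag-of-words representation with Counter:

from collections import Counter

doc = "it was a great movie with a great plot"
bag = Counter(doc.split())   # word positions are discarded; only frequencies remain
# 'a' and 'great' each occur twice, every other word once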

Spam detection

another important commercial application, the binary classification task of assigning an email to one of the two classes spam or not-spam

bootstrap test

can apply to any metric, from precision, recall, or F1 to the BLEU metric used in machine translation

microaveraging

collect the decisions for all classes into a single confusion matrix, and then compute precision and recall from that table
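
A sketch contrasting the two averages for precision; the per-class (true positive, false positive, false negative) counts below are invented purely to illustrate the computation:

counts = {"urgent": (8, 10, 3), "normal": (60, 55, 40), "spam": (200, 33, 64)}

def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

# Macroaverage: compute precision per class, then average over the classes.
macro_p = sum(precision(tp, fp) for tp, fp, _ in counts.values()) / len(counts)

# Microaverage: pool all decisions into one table, then compute precision once.
total_tp = sum(tp for tp, _, _ in counts.values())
total_fp = sum(fp for _, fp, _ in counts.values())
micro_p = precision(total_tp, total_fp)

print(round(macro_p, 2), round(micro_p, 2))   # -> 0.61 0.73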

authorship attribution

determining a text's author; such tasks are also relevant to the digital humanities, social sciences, and forensic linguistics

model card

documents a machine learning model with information like: a) training algorithms and parameters b) training data sources, motivation, and preprocessing c) evaluation data sources, motivation, and preprocessing d) intended use and users e) model performance across different demographic or other groups and environmental situations

representational harms

harms caused by a system that demeans a social group, for example by perpetuating negative stereotypes about them

naive Bayes assumption

This is the conditional independence assumption that the probabilities P(fi|c) are independent given the class c and hence can be 'naively' multiplied
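
In symbols, the assumption lets the class-conditional probability of the whole feature set factor into a product of per-feature probabilities, so the predicted class is the c that maximizes P(c) · P(f1|c) · P(f2|c) · ... · P(fn|c).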

Naive Bayes unknown words

ignore them: remove them from the test document and do not include any probability for them at all

Discriminative classifiers

like logistic regression instead learn what features from the input are most useful to discriminate between the different possible classes. While discriminative systems are often more accurate and hence more commonly used, generative classifiers still have a role.

Generative classifiers

like naive Bayes build a model of how a class could generate some input data. Given an observation, they return the class most likely to have generated the observation.

sentiment lexicons

lists of words that are pre-annotated with positive or negative sentiment

Recall

measures the percentage of items actually present in the input that were correctly identified by the system

Precision

measures the percentage of the items that the system detected (i.e., the system labeled as positive) that are in fact positive (i.e., are positive according to the human gold labels)

prior probability

our initial belief about the probability of an outcome

bootstrapping

refers to repeatedly drawing large numbers of smaller samples with replacement (called bootstrap samples) from an original larger sample
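
A minimal sketch of the paired bootstrap test built on this idea; gold, preds_a, and preds_b are parallel lists of labels, and accuracy is used as the metric purely for illustration:

import random

def paired_bootstrap_pvalue(gold, preds_a, preds_b, b=10000, seed=0):
    """Estimate the p-value for the null hypothesis that A is not better than B."""
    rng = random.Random(seed)
    n = len(gold)

    def acc(preds, idx):
        return sum(preds[i] == gold[i] for i in idx) / len(idx)

    observed_delta = acc(preds_a, range(n)) - acc(preds_b, range(n))
    exceed = 0
    for _ in range(b):
        sample = [rng.randrange(n) for _ in range(n)]   # bootstrap sample, drawn with replacement
        delta = acc(preds_a, sample) - acc(preds_b, sample)
        if delta >= 2 * observed_delta:                 # recentre around the observed delta
            exceed += 1
    return exceed / b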

F-measure

the simplest metric that incorporates aspects of both precision and recall; it is a weighted harmonic mean of the two
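
In formulas: with precision P and recall R, F_β = (β² + 1)·P·R / (β²·P + R); the balanced F1 measure (β = 1) is the harmonic mean 2PR / (P + R).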

multinomial naive Bayes classifier

so called because it is a Bayesian classifier that makes the simplifying (naive) assumption about how features interact
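
A compact sketch of training and prediction for a multinomial naive Bayes text classifier with add-one (Laplace) smoothing; the function names are illustrative rather than from any particular library:

import math
from collections import Counter, defaultdict

def train_nb(documents):
    """documents: list of (tokens, class_label) pairs."""
    class_counts = Counter(label for _, label in documents)
    word_counts = defaultdict(Counter)                  # per-class word counts
    vocab = set()
    for tokens, label in documents:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(documents)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        log_likelihood[c] = {w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
                             for w in vocab}
    return log_prior, log_likelihood, vocab

def predict_nb(tokens, log_prior, log_likelihood, vocab):
    """Return the most probable class; unknown words are simply ignored."""
    scores = {c: log_prior[c] + sum(log_likelihood[c][w] for w in tokens if w in vocab)
              for c in log_prior}
    return max(scores, key=scores.get)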

null hypothesis

supposes that δ(x) is actually negative or zero, meaning that A is not better than B

probabilistic classifier

a classifier that additionally tells us the probability of the observation being in the class

sentiment analysis

the extraction of sentiment, the positive or negative orientation that a writer expresses toward some object

likelihood

in Bayesian inference, the extent to which the available evidence is consistent with a given outcome; the probability of the evidence given that outcome

language id

the task of determining which language a text is written in; often the first step in a language processing pipeline

gold labels

the human-defined labels for each document that we are trying to match

character n-grams

for some tasks, such as language id, the most effective naive Bayes features are not words at all but character n-grams: short overlapping sequences of characters
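
A small sketch of extracting character n-grams (and byte n-grams, by sliding over the encoded bytes instead of the codepoints); the example word is arbitrary:

def char_ngrams(text, n=3):
    """All overlapping n-grams of a string (or of a bytes object)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

char_ngrams("dziękuję")                    # ['dzi', 'zię', 'ięk', 'ęku', 'kuj', 'uję']
char_ngrams("dziękuję".encode("utf-8"))    # byte trigrams over the raw UTF-8 bytes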

p-value

the probability, assuming the null hypothesis H0 is true, of seeing the δ(x) that we saw or one even greater

text categorization

the task of assigning a label or category to an entire text or document

toxicity detection

the task of detecting hate speech, abuse, harassment, or other kinds of toxic language

statistically significant

the δ we saw has a probability (p-value) below the chosen threshold, and we therefore reject the null hypothesis

