NLP - Quiz #3

How to train a Perceptron

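1. Have a decision function. 2. Start all feature weights at zero. 3. Loop over every piece of data and make a prediction. 4. When a prediction is wrong, update the weights; repeat until the predictions stop changing or a set number of passes is reached.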
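
A minimal sketch of this training loop for a two-class perceptron, including the weight update made after a wrong prediction. The names `predict`, `train_perceptron`, and `toy_data`, the +1/-1 label encoding, and the fruit features are all assumptions made for illustration.

```python
def predict(weights, features):
    """Decision function: +1 if the weighted feature sum is positive, else -1."""
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1 if score > 0 else -1


def train_perceptron(data, epochs=10):
    """data: list of (feature dict, label) pairs, with label +1 or -1."""
    weights = {}  # a missing key counts as zero, so every weight starts at zero
    for _ in range(epochs):
        mistakes = 0
        for features, label in data:
            if predict(weights, features) != label:
                mistakes += 1
                # Wrong prediction: move each active weight toward the true label.
                for name, value in features.items():
                    weights[name] = weights.get(name, 0.0) + label * value
        if mistakes == 0:  # a full pass with no errors, so stop early
            break
    return weights


# Hypothetical toy data: +1 = apple, -1 = banana.
toy_data = [
    ({"red": 1.0, "round": 1.0}, 1),
    ({"yellow": 1.0, "long": 1.0}, -1),
    ({"red": 1.0, "long": 1.0}, -1),  # a red banana
]
weights = train_perceptron(toy_data)
print(weights)
```

On this toy data the red banana causes a mistake, and the update lowers the weight on `red`, so shape ends up carrying the decision.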

Time complexity for greedy decoding?

O(NC), where N is the sequence length and C is the number of possible tags (classes)

Span

a subsection of text

Stratification? why good for training? why bad?

Artificially sampling data to achieve the desired balance of classes. For example, if you divide the data into groups, make sure each class is well represented and balanced in every group. Good: balanced representation, which avoids bias toward common classes. Bad: the sample no longer reflects the true distribution of classes in the entire data set.
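
One way to picture this is to sample the same number of items from every class. This is only a sketch under assumptions: `balanced_sample`, the `per_class` parameter, and the spam/ham data are hypothetical, and real projects may balance classes differently.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(items, labels, per_class, seed=0):
    """Draw up to `per_class` items from each class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)
    sample = []
    for label in sorted(by_class):
        group = by_class[label]
        k = min(per_class, len(group))  # a rare class may have fewer items
        sample.extend((item, label) for item in rng.sample(group, k))
    return sample


# Made-up data: 10% spam, 90% ham.
items = [f"doc{i}" for i in range(100)]
labels = ["spam"] * 10 + ["ham"] * 90
sample = balanced_sample(items, labels, per_class=10)
counts = Counter(label for _, label in sample)
print(counts)
```

The sample is balanced 10/10 even though the full data set is 10/90, which shows both the benefit and the drawback described in the card.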

Classification

assigning one or more classes to each item (AKA instance) by predicting the best class(es) for the item

Ontology

classes, properties and instances

text classification process

1. Choose an initial set of features, a representation of each piece of data (instance). 2. Train a model. 3. Tune the model by measuring performance on the dev set: adjust hyperparameters, change features, and change the model if needed. 4. When you have a final model or set of models, evaluate performance on the test set.

How much data goes into train, dev, and test?

80/10/10 split. In general, train is the largest; dev and test are approximately equally sized.
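
A minimal sketch of such a split, assuming the data fits in a list and that shuffling before splitting is acceptable. The function name, the fraction parameters, and the fixed seed are illustrative choices, not part of the original card.

```python
import random


def train_dev_test_split(items, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a copy of the data, then slice off dev and test; the rest is train."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_dev = int(len(items) * dev_frac)
    n_test = int(len(items) * test_frac)
    dev = items[:n_dev]
    test = items[n_dev:n_dev + n_test]
    train = items[n_dev + n_test:]
    return train, dev, test


train, dev, test = train_dev_test_split(range(100))
print(len(train), len(dev), len(test))
```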

What's dynamic programming?

An efficient approach that breaks a problem into subproblems, stores the partial solutions, and combines them into a full solution instead of recomputing them.

Viterbi Decoding is ______________ programming

Dynamic

chrF

A metric for checking how well a machine translation system performs. It is an F-score computed over character n-grams instead of whole words, so it does not depend much on individual words or the spaces between them.
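
A simplified sketch of a character n-gram F-score in the spirit of chrF. The details are assumptions made for illustration: spaces are dropped, n-gram orders 1 to 6 are averaged, and recall is weighted with beta = 2. Published implementations differ in details, so treat this as a teaching aid, not a reference implementation.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces (a simplifying assumption)."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one of the strings is too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped count of shared n-grams
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


print(chrf("the cat", "the cat"))   # identical strings
print(chrf("the cats", "the cat"))  # near miss still gets partial credit
```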

What's an entity type?

Entity types allow us to create classes of entities/mentions

GPE vs LOC

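GPE (geopolitical entities such as countries, cities, and states) are "animate": they can act like agents, as in "France signed the treaty". LOC (other locations such as mountains and rivers) are inanimate.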

why more common to use a log-linear model like logistic regression for classification?

It can produce probabilities, and overall there's much greater control over the mechanics of optimization.

Time complexity for viterbi decoding?

O(NC^2)

Micro/macroaveraging

Macroaverage: the average across the per-class F1 scores (an "average of averages"). It gives all classes the same weight in the overall score. Microaverage: compute F1 across all data points. Every data point contributes equally to F1, so the score is biased by the distribution of the labels (common labels dominate).
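
The difference shows up clearly on imbalanced data. The sketch below assumes single-label classification and uses the equivalent form F1 = 2TP / (2TP + FP + FN); the function names and the toy labels are made up for illustration.

```python
def f1(tp, fp, fn):
    """F1 from counts; returns 0.0 when all counts are zero (an assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    classes = sorted(set(gold) | set(pred))
    counts = {c: [0, 0, 0] for c in classes}  # [tp, fp, fn] per class
    for g, p in zip(gold, pred):
        if g == p:
            counts[g][0] += 1
        else:
            counts[p][1] += 1  # false positive for the predicted class
            counts[g][2] += 1  # false negative for the gold class
    # Macro: average the per-class F1 scores.
    macro = sum(f1(*counts[c]) for c in classes) / len(classes)
    # Micro: pool the counts over all classes, then compute one F1.
    tp, fp, fn = (sum(counts[c][i] for c in classes) for i in range(3))
    return f1(tp, fp, fn), macro


# A lazy classifier that always predicts the common class "A".
gold = ["A"] * 8 + ["B"] * 2
pred = ["A"] * 10
micro, macro = micro_macro_f1(gold, pred)
print(micro, macro)
```

Here the micro score stays high because "A" dominates, while the macro score is pulled down by the zero F1 on "B".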

Viterbi decoding data structures?

A matrix (2D list) with one cell per (position, tag) pair

Perceptron: principle of operation

Perceptrons work like a decision-making system. Imagine teaching a robot to recognize apples and bananas. For each fruit, the robot looks at features like color and shape, assigns importance (weights) to each feature, and adds them up. If the total is more than a certain value, the robot decides it's an apple; otherwise, it's a banana.

AI busts & booms?

The occurrence of AI booms and busts can be attributed to a combination of factors, including technological advancements, expectations, challenges, and broader societal and economic influences.

Perceptron: update rule

The perceptron learns from mistakes using the update rule. If it makes an error in classifying a fruit, it adjusts the weights assigned to features. For example, if it mistakes a red banana for an apple, it might reduce the importance it gives to color. This update rule helps the perceptron get better at making correct decisions.

caricature of NLP process?

Typically we divide the data into training, development (AKA validation), and test sets. Then we train models on train, tune and improve them using dev, and do our final evaluation on test.

How to get an optimal tag assignment?

Viterbi Decoding

Feature set

a combination of features that represents the input

Class

a label that can be assigned to an item

Mention

a span or section of text that refers to a specific entity

Viterbi decoding

examine all possible tag assignments left to right, at each point identifying the best previous state for each state and what the resulting score would be
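
A minimal sketch of this procedure, assuming scores are added along the path and supplied by two callables. The function signature, the tag set, and the `EMIT` and `TRANS` score tables are invented for illustration.

```python
def viterbi(n_positions, tags, emission, transition):
    """emission(i, tag) and transition(prev_tag, tag) return scores to add."""
    if n_positions == 0:
        return []
    # best[i][t]: score of the best tag sequence that ends in tag t at position i
    best = [{t: emission(0, t) for t in tags}]
    back = [{}]  # back[i][t]: the best previous tag for tag t at position i
    for i in range(1, n_positions):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + transition(p, t))
            best[i][t] = best[i - 1][prev] + transition(prev, t) + emission(i, t)
            back[i][t] = prev
    # Pick the best final tag, then follow the backpointers right to left.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(n_positions - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Made-up toy scores for a two-word sequence.
TAGS = ["N", "V"]
EMIT = [{"N": 1.0, "V": 1.2}, {"N": 2.0, "V": 0.0}]
TRANS = {("N", "N"): 0.0, ("N", "V"): 0.0, ("V", "N"): -3.0, ("V", "V"): 0.0}

best_path = viterbi(
    2,
    TAGS,
    emission=lambda i, t: EMIT[i][t],
    transition=lambda p, t: TRANS[(p, t)],
)
print(best_path)
```

For every position the loop compares each tag with each possible previous tag, which is where the C^2 factor in the running time comes from.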

how to choose a feature set?

experiment on the dev set / development set.

NER?

Named entity recognition: identifying entity names in text. It is technology that helps computers identify and categorize specific, named things in text, such as people, places, dates, and organizations.

Most common hyperparameters for discriminative models?

learning rate and regularization

Learning rate vs regularization?

Learning rate is how much we change our parameters each time we update them, i.e., how big a step to take to be most effective. Regularization is how we control the size of our parameters: many models like to overfit by setting large parameters, and common regularization schemes shrink the parameters slightly every step.
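
Both hyperparameters can be seen in a single gradient update. This sketch assumes plain gradient descent with L2 regularization; `sgd_step`, `learning_rate`, and `l2_strength` are illustrative names and the numbers are made up.

```python
def sgd_step(weights, gradient, learning_rate=0.1, l2_strength=0.01):
    """One update: step against the gradient, plus a small pull toward zero."""
    return [
        w - learning_rate * (g + l2_strength * w)
        for w, g in zip(weights, gradient)
    ]


# With a zero gradient, only the regularization term acts on the weights.
shrunk = sgd_step([1.0, -2.0], [0.0, 0.0])
print(shrunk)
```

With no gradient signal, each weight moves slightly toward zero, which is the shrinking effect of regularization; the learning rate scales the size of the whole step.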

Greedy decoding

make your tagging decisions left to right, but decide the best tag immediately at each point (instead of waiting until the end of the sequence)
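
A minimal sketch of greedy decoding, assuming additive scores supplied by two callables. The function signature and the `EMIT` and `TRANS` toy scores are invented for illustration.

```python
def greedy_decode(n_positions, tags, emission, transition):
    """Pick the best tag at each position, given only the tag already chosen before it."""
    path = []
    for i in range(n_positions):
        def score(tag):
            total = emission(i, tag)
            if path:  # add the transition score from the previous decision
                total += transition(path[-1], tag)
            return total
        path.append(max(tags, key=score))
    return path


# Made-up toy scores for a two-word sequence.
TAGS = ["N", "V"]
EMIT = [{"N": 1.0, "V": 1.2}, {"N": 2.0, "V": 0.0}]
TRANS = {("N", "N"): 0.0, ("N", "V"): 0.0, ("V", "N"): -3.0, ("V", "V"): 0.0}

greedy_path = greedy_decode(
    2,
    TAGS,
    emission=lambda i, t: EMIT[i][t],
    transition=lambda p, t: TRANS[(p, t)],
)
print(greedy_path)
```

On these toy scores the greedy path is V V with a total of 1.2, while the sequence N N would total 3.0. Committing early to V at the first position locks the decoder out of the better overall assignment.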

Features

often we convert the item being classified into features before classifying it
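
As an illustration, a text item could be converted into word-count features like this. The feature names (`word=...`, `length`) and the whitespace tokenization are assumptions; real feature sets are chosen by experimenting on the dev set.

```python
from collections import Counter


def extract_features(text):
    """Turn raw text into a feature dict: lowercase word counts plus a length feature."""
    tokens = text.lower().split()
    features = Counter(f"word={token}" for token in tokens)
    features["length"] = len(tokens)
    return dict(features)


print(extract_features("Good good movie"))
```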

Parameters vs. Hyperparameters

Parameters: like the INGREDIENTS of ML. They are the values inside the model that are learned from the training data, such as feature weights. Hyperparameters: the external settings you choose yourself to get the BEST outcome, such as the learning rate or regularization strength. They are not learned in training, but they help in creating the best product/outcome.

Regression vs classification

regression is predicting a continuous outcome based on features whereas classification is assigning a label

Most common hyperparameter for generative models?

smoothing

whats an entity?

something we care about that we want to be able to refer to

cross validation? why not use it?

Splitting your data into parts, training your model on some parts and testing it on the others, then rotating, to check that your model performs well on all of the data. We don't use it much because it requires training many models, e.g., 10 for 10 folds.
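
A minimal sketch of how the folds can be formed, assuming items are addressed by index and folds are built by striding through the indices. The function name and the choice of k = 10 are illustrative.

```python
def k_fold_splits(n_items, k=10):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_items))
    folds = [indices[i::k] for i in range(k)]  # every k-th item goes to the same fold
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


splits = list(k_fold_splits(20, k=10))
print(len(splits))  # one model would have to be trained per split
```

Each item lands in the test portion exactly once, and the number of splits equals the number of models to train, which is the cost the card refers to.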

information extraction

Teaching computers to find and pull out important details, like names of people or dates, from the sentences of unstructured text.

entity linking

the process of connecting or linking a named entity mentioned in text to a specific entry or identity in a knowledge base or database.

when would macro f1 score be undefined?

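When a class has no true positives: its precision and recall are both zero, so F1 = 2PR / (P + R) divides by zero for that class, and the average over classes is undefined.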

why use NER?

To figure out which entities are in the data set. It is also a first step toward entity linking and extracting information from the input.

Machine translation (MT)

Translates text from a source language to a target language, for example English to Spanish

BLEU

A way to measure how well a machine translates one language into another, based on how many word n-grams the output shares with reference translations. The higher the BLEU score, the better the translation.

