NLP Intro

Loss

In a machine learning context, loss is a measure of how far a supervised model's predictions are from the correct outputs.
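
As one illustration, mean squared error is a common loss for numeric predictions. The sketch below is an assumption chosen for the example, not the only way to measure loss; the function name and the toy numbers are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


targets = [1.0, 2.0, 3.0]

# Perfect predictions give zero loss.
good = mean_squared_error(targets, [1.0, 2.0, 3.0])

# Every prediction is off by exactly 1, so the average squared error is 1.
bad = mean_squared_error(targets, [2.0, 3.0, 4.0])

print(good, bad)
```

A lower value means the model is less wrong on these examples.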

Language model

In an NLP context, a language model is a model of the probability distribution of word sequences.
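
A minimal sketch of this idea is a bigram model that estimates each word's probability from the word before it, using raw counts. The toy corpus, the `<s>` and `</s>` boundary markers, and the function names are assumptions made for the example; practical models add smoothing so that unseen word pairs do not get zero probability.

```python
from collections import Counter

# Toy training corpus (hypothetical data).
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]

context_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence + ["</s>"]
    context_counts.update(tokens[:-1])             # every token that has a successor
    bigram_counts.update(zip(tokens, tokens[1:]))  # adjacent word pairs


def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


def sequence_prob(words):
    """Probability of a whole word sequence under the bigram model."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, word)
    return prob


print(sequence_prob(["the", "cat", "sat"]))
print(sequence_prob(["the", "dog", "ran"]))  # "dog ran" never occurs in the corpus
```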

Encoding

In an NLP context, the encoding or character encoding refers to the mapping from characters, e.g. "a", "?", to bytes.
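
The mapping can be seen directly with Python's built-in `str.encode` and `bytes.decode`. The characters chosen here are only examples; the point is that the same character can map to different bytes under different encodings.

```python
# ASCII characters map to a single byte in UTF-8.
a_bytes = "a".encode("utf-8")
q_bytes = "?".encode("utf-8")

# A non-ASCII character maps to different bytes depending on the encoding.
e_utf8 = "é".encode("utf-8")      # two bytes
e_latin1 = "é".encode("latin-1")  # one byte

# Decoding reverses the mapping, as long as the same encoding is used.
round_trip = e_utf8.decode("utf-8")

print(a_bytes, q_bytes, e_utf8, e_latin1, round_trip)
```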

Token

In an NLP context, a token is a unit of text, generally, but not necessarily, a word.
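
As a rough sketch, the example below compares two simple ways of splitting text into tokens. The regular expression is an assumption chosen for illustration (runs of word characters, or single punctuation marks); real tokenizers handle many more cases, such as contractions and abbreviations.

```python
import re

text = "Hello, world! NLP is fun."

# Splitting on whitespace leaves punctuation attached to words.
whitespace_tokens = text.split()

# A simple pattern that treats each punctuation mark as its own token.
regex_tokens = re.findall(r"\w+|[^\w\s]", text)

print(whitespace_tokens)
print(regex_tokens)
```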

Sentiment

In an NLP context, sentiment is the emotion or opinion a human encodes in a language act.

Embedding

In an NLP context, an embedding is a technique for representing words (or other language elements) as vectors, especially when such a representation is produced by a neural network.
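
Once words are vectors, similarity between words can be measured numerically, for instance with cosine similarity. The three-dimensional vectors below are made-up values for illustration only; real embeddings are learned from data and usually have hundreds of dimensions.

```python
import math

# Hypothetical hand-written vectors, not learned embeddings.
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


king_queen = cosine_similarity(embeddings["king"], embeddings["queen"])
king_apple = cosine_similarity(embeddings["king"], embeddings["apple"])

# With these toy values, "king" is closer to "queen" than to "apple".
print(king_queen > king_apple)
```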

Regular expression

A regular expression is a string that defines a pattern to be matched in text.
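
For example, using Python's standard `re` module, a pattern for dates written as YYYY-MM-DD could look like the sketch below. The pattern and the sample sentence are assumptions for illustration; the pattern checks only the shape of the date, not whether the month or day is valid.

```python
import re

# Four digits, a hyphen, two digits, a hyphen, two digits.
date_pattern = r"\b\d{4}-\d{2}-\d{2}\b"

text = "Released 2021-03-15, updated 2022-11-02."
dates = re.findall(date_pattern, text)

print(dates)
```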

Types of NLP functions

Parsing, named entity recognition (NER), Part of speech (POS) tagging, Sentiment analysis

Applications of NLP

Sentiment classification, Information retrieval/extraction & Q&A, NL interfaces/dialogue systems, Machine Translation

Grammar

Set of rules that describes what's allowable in a language

Knowledge base

A knowledge base is a collection of knowledge or facts in a computationally usable format.

N-gram

An N-gram is a subsequence of N consecutive words. Sometimes, "N-gram" can refer to a subsequence of characters.
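
A small sketch of extracting N-grams from any sequence, so the same helper covers both the word and the character case. The function name `ngrams` is hypothetical.

```python
def ngrams(items, n):
    """Return all runs of n consecutive items as tuples."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Word bigrams (N = 2).
word_bigrams = ngrams("the cat sat down".split(), 2)

# Character bigrams: a string is a sequence of characters.
char_bigrams = ngrams("cat", 2)

print(word_bigrams)
print(char_bigrams)
```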

Neural Network

An artificial neural network is a collection of artificial neurons joined by weighted connections.
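
A minimal sketch of how weights connect neurons: each neuron computes a weighted sum of its inputs plus a bias and passes it through an activation function. The sigmoid activation, the layer sizes, and all numeric values are assumptions chosen for the example; nothing here is trained.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Two inputs -> two hidden neurons -> one output neuron (hypothetical weights).
hidden = layer_forward([1.0, 0.0], [[0.0, 0.0], [2.0, -1.0]], [0.0, -2.0])
output = layer_forward(hidden, [[1.0, -1.0]], [0.0])

print(hidden, output)
```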

Alphabets

An alphabet is a phonetic writing system whose symbols represent consonants and vowels.

Collaborative filtering

It is the key technology typically used to implement recommender systems ("if you liked this movie, you might also like...").

Corpus

In the field of NLP, a corpus is generally understood to be a collection of texts.

Natural language processing (NLP)

NLP is a field of computer science and linguistics focused on techniques and algorithms for processing data containing natural language.

Natural language

Natural language is a language spoken or signed by people, in contrast to a programming language, which is used for giving instructions to computers. Natural language also contrasts with artificial or constructed languages, which are designed by a person or group of people.

Parts of speech (POS)

Parts of speech are word categories. The best known are nouns and verbs. In an NLP context, the Penn Treebank tags are the most frequently used set of parts of speech.

Lemma

The citation form or dictionary form of a word

True about TextBlob

· It is a library for processing textual data. · It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. · It is an open-source Python library.

True about NLP

· Natural language lacks mathematical precision. · Nuances of meaning make natural language understanding difficult. · A text's meaning can be influenced by its context and the reader's "world view."

