NLP Intro
Loss
In a machine learning context, loss is a measure of how wrong a supervised model's predictions are compared with the true labels or target values; training aims to minimize it.
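As an illustration, here is a minimal plain-Python sketch of two common losses: mean squared error for regression, and cross-entropy (the negative log-probability assigned to the correct class) for classification. The function names are hypothetical, chosen for this example.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(true_class, predicted_probs):
    """Negative log-probability the model gave to the correct class (classification)."""
    return -math.log(predicted_probs[true_class])

# A confident, correct prediction gives a small loss...
print(cross_entropy(0, [0.9, 0.1]))  # ≈ 0.105
# ...while a confident, wrong prediction gives a large one.
print(cross_entropy(1, [0.9, 0.1]))  # ≈ 2.303
print(mean_squared_error([1.0, 2.0], [1.5, 2.0]))  # 0.125
```

The lower the loss, the closer the model's outputs are to the truth.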
Language model
In an NLP context, a language model is a model of the probability distribution of word sequences.
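One of the simplest language models is a bigram model, which estimates the probability of each word given the previous word by counting word pairs in a corpus, then multiplies these conditional probabilities (the chain rule) to score a whole sequence. The sketch below assumes unsmoothed maximum-likelihood counts and uses hypothetical `<s>`/`</s>` sentence-boundary markers; practical models add smoothing or use neural networks.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Estimate P(word | previous word) from counts of adjacent word pairs."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]  # add sentence boundaries
        context_counts.update(words[:-1])              # words used as contexts
        bigram_counts.update(zip(words[:-1], words[1:]))

    def prob(prev, word):
        if context_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob

def sequence_probability(prob, sentence):
    """Chain rule: multiply the conditional probability of each word."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(words[:-1], words[1:]):
        p *= prob(prev, word)
    return p

corpus = ["the cat sat", "the dog sat", "the cat ran"]
prob = train_bigram_lm(corpus)
print(sequence_probability(prob, "the cat sat"))  # ≈ 0.333 (1 * 2/3 * 1/2 * 1)
print(sequence_probability(prob, "the dog ran"))  # 0.0: "dog ran" never seen
```

The zero for an unseen word pair shows why real language models need smoothing.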
Encoding
In an NLP context, the encoding or character encoding refers to the mapping from characters, e.g. "a", "?", to bytes.
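To make the character-to-byte mapping concrete, the sketch below encodes the same short string with two real encodings, UTF-8 and Latin-1, using Python's built-in `str.encode` and `bytes.decode`. The sample string is arbitrary.

```python
text = "a?é"

utf8_bytes = text.encode("utf-8")      # é needs two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # é is a single byte in Latin-1

print(list(utf8_bytes))    # [97, 63, 195, 169]
print(list(latin1_bytes))  # [97, 63, 233]

# Decoding with the wrong encoding garbles the text ("mojibake")
print(utf8_bytes.decode("latin-1"))  # a?Ã©
```

ASCII characters such as "a" and "?" map to the same single byte in both encodings; only "é" differs.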
Token
In an NLP context, a token is a unit of text, generally, but not necessarily, a word.
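A small sketch of why tokens are not always words: the hypothetical `simple_tokenize` below uses one regular expression to split text into word-like runs and separate punctuation tokens. Real tokenizers use more elaborate rules or learned subword units.

```python
import re

def simple_tokenize(text):
    # A token is either a run of letters/digits/apostrophes,
    # or a single character that is neither whitespace nor part of such a run
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)

print(simple_tokenize("Don't panic, it's only NLP!"))
# ["Don't", 'panic', ',', "it's", 'only', 'NLP', '!']
```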
Sentiment
In an NLP context, sentiment is the emotion or opinion a human encodes in a language act.
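One very simple way to estimate sentiment is a lexicon-based score: sum a polarity value for each known word and use the sign of the total. The tiny `LEXICON` below is hypothetical, made up for this sketch, and the approach ignores negation ("not good") and context.

```python
# Hypothetical toy lexicon mapping words to polarity scores (not a real resource)
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}

def lexicon_sentiment(text):
    """Sum word polarities; the sign of the total gives the predicted sentiment."""
    score = sum(LEXICON.get(word.strip(".,!?"), 0) for word in text.lower().split())
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score

print(lexicon_sentiment("I love this great movie!"))  # ('positive', 4)
print(lexicon_sentiment("The plot was awful."))       # ('negative', -2)
```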
Embedding
In an NLP context, an embedding is a technique for representing words (or other language elements) as vectors of numbers, especially when such a representation is produced by a neural network.
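Once words are vectors, similarity of meaning can be approximated by vector similarity. The sketch below uses cosine similarity on hypothetical, hand-picked 3-dimensional vectors; real embeddings are learned from data and typically have hundreds of dimensions.

```python
import math

# Hypothetical toy embeddings (hand-picked for illustration, not learned)
embeddings = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # ≈ 0.99
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # ≈ 0.31
```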
Regular expression
A regular expression is a string that defines a pattern to be matched in text.
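For example, using Python's standard `re` module, the pattern below finds whole words ending in "ing". The sentence is an arbitrary sample.

```python
import re

# \b marks a word boundary and \w+ matches one or more word characters,
# so the pattern matches whole words that end in "ing"
pattern = re.compile(r"\b\w+ing\b")
sentence = "She was running and singing while the bell kept ringing."
matches = pattern.findall(sentence)
print(matches)  # ['running', 'singing', 'ringing']
```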
Types of NLP functions
Parsing, named entity recognition (NER), part-of-speech (POS) tagging, sentiment analysis
Applications of NLP
Sentiment classification, information retrieval/extraction and question answering (Q&A), natural language (NL) interfaces/dialogue systems, machine translation
Grammar
A set of rules that describes what is allowable in a language
Knowledge base
A knowledge base is a collection of knowledge or facts in a computationally usable format.
N-gram
An N-gram is a contiguous subsequence of N words. Sometimes, "N-gram" can refer to a subsequence of characters instead.
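Extracting N-grams amounts to sliding a window of length N over a sequence. The hypothetical helper below works for both word N-grams (a list of words) and character N-grams (a string).

```python
def ngrams(items, n):
    """Return every contiguous window of length n as a tuple."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

words = "the cat sat on the mat".split()
print(ngrams(words, 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]

print(ngrams("text", 3))  # character trigrams
# [('t', 'e', 'x'), ('e', 'x', 't')]
```

A sequence of length L has L - N + 1 N-grams; if N exceeds L, the result is empty.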
Neural Network
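An artificial neural network is a collection of artificial neurons connected by weights; each neuron computes a weighted sum of its inputs and passes it through an activation function.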
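A minimal sketch of a forward pass through a tiny network (2 inputs, 2 hidden neurons, 1 output), assuming a sigmoid activation. The weights and biases are hypothetical, hand-picked values; in practice they are learned by minimizing a loss.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    # Each neuron: weighted sum of the inputs plus a bias, then the activation
    return [sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
            for neuron_weights, b in zip(weights, biases)]

# Hypothetical weights: one inner list per neuron, one weight per input
hidden_weights = [[0.5, -0.4], [0.3, 0.8]]
hidden_biases = [0.0, -0.1]
output_weights = [[1.2, -0.7]]
output_biases = [0.2]

x = [1.0, 2.0]
hidden = layer_forward(x, hidden_weights, hidden_biases)
output = layer_forward(hidden, output_weights, output_biases)
print(output)  # a single value between 0 and 1
```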
Alphabets
An alphabet is a phonetic writing system that represents both consonants and vowels.
Collaborative filtering
Collaborative filtering predicts a user's preferences from the preferences of other, similar users. It is the key technique typically used to implement recommender systems ("if you liked this movie, you might also like...").
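The sketch below shows one simple variant, user-based collaborative filtering: measure how similar users are (here, cosine similarity over the movies both have rated), then predict a missing rating as a similarity-weighted average of other users' ratings. The users, movies, and ratings are hypothetical; production systems often use mean-centered similarities or matrix factorization instead.

```python
import math

# Hypothetical ratings (1-5); a missing key means the user has not rated that movie
ratings = {
    "ann":  {"Alien": 5, "Heat": 3, "Up": 1},
    "ben":  {"Alien": 4, "Heat": 3, "Up": 1, "Jaws": 5},
    "cara": {"Alien": 1, "Heat": 2, "Up": 5, "Jaws": 1},
}

def similarity(a, b):
    """Cosine similarity over the movies both users have rated."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    dot = sum(ratings[a][m] * ratings[b][m] for m in common)
    norm_a = math.sqrt(sum(ratings[a][m] ** 2 for m in common))
    norm_b = math.sqrt(sum(ratings[b][m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

def predict(user, movie):
    """Similarity-weighted average of other users' ratings for the movie."""
    numerator = denominator = 0.0
    for other in ratings:
        if other == user or movie not in ratings[other]:
            continue
        s = similarity(user, other)
        numerator += s * ratings[other][movie]
        denominator += s
    return numerator / denominator if denominator else None

# Ann's tastes resemble Ben's more than Cara's, so Ben's 5 counts for more
print(predict("ann", "Jaws"))  # ≈ 3.67
```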
Corpus
In the field of NLP, a corpus is generally understood to be a collection of texts.
Natural language processing (NLP)
NLP is a field of computer science and linguistics focused on techniques and algorithms for processing data containing natural language.
Natural language
Natural language is a language spoken or signed by people, in contrast to a programming language, which is used for giving instructions to computers. Natural language also contrasts with artificial or constructed languages, which are designed by a person or group of people.
Parts of speech (POS)
Parts of speech are grammatical categories of words; the best known are nouns and verbs. In an NLP context, the Penn Treebank tag set is one of the most frequently used sets of POS tags.
Lemma
The citation form or dictionary form of a word (e.g. "run" is the lemma of "ran" and "running")
True about TextBlob
· TextBlob is an open-source Python library for processing textual data.
· It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
True about NLP
· Natural language lacks mathematical precision.
· Nuances of meaning make natural language understanding difficult.
· A text's meaning can be influenced by its context and the reader's "world view."