Natural Language Processing PART 1: EMBEDDING

When preprocessing our bags of words, we can use ______________________ and _______________________ to lengthen or shorten each bag so that all bags have the same length.

Padding, truncating
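
As a quick illustration, here is a minimal sketch using Keras's pad_sequences (the exact tooling is an assumption; any framework with a padding utility works the same way):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

sequences = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]

# maxlen=4: shorter sequences are padded with 0s, longer ones are truncated.
padded = pad_sequences(sequences, maxlen=4, padding='post', truncating='post')
print(padded)
# [[ 1  2  3  0]
#  [ 4  5  0  0]
#  [ 6  7  8  9]]
```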

Inverse document frequency (IDF): how is it calculated?

A measure of the significance of a term within the corpus: IDF(t) = log(D / d_t), where D = total number of documents and d_t = number of documents containing the term t.
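
A minimal sketch of this formula in plain Python (the toy corpus is made up; note that this unsmoothed form divides by zero for a term that appears in no document):

```python
import math

documents = [
    "the cat sat",
    "the dog barked",
    "the cat and the dog",
]
D = len(documents)  # total number of documents

def idf(term):
    # d_t: number of documents that contain the term
    d_t = sum(1 for doc in documents if term in doc.split())
    return math.log(D / d_t)

print(idf("the"))  # appears in all 3 docs -> log(3/3) = 0.0
print(idf("cat"))  # appears in 2 docs     -> log(3/2) ≈ 0.405
```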

What is the disadvantage of embedding layers versus recurrent neural networks?

Embedding layers do not take the sequence (order) of the words into account when producing their output, whereas recurrent neural networks do.
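
To illustrate, here is a toy sketch (token ids and embedding values are made up): a common pattern is to average the embedded vectors (e.g., global average pooling), and that average is identical for any word order, so order information is lost:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy embedding table: vocabulary of 10 tokens, 4 dimensions each.
embedding = rng.normal(size=(10, 4))

sentence = [3, 1, 7, 2]    # e.g. "I walked my dog"
scrambled = [2, 7, 1, 3]   # e.g. "dog walked my I"

# Averaging the looked-up vectors is permutation-invariant,
# so both orderings look identical to the downstream layers.
pooled_a = embedding[sentence].mean(axis=0)
pooled_b = embedding[scrambled].mean(axis=0)
print(np.allclose(pooled_a, pooled_b))  # True
```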

Word Embeddings

Embeddings map each word to a vector in a multi-dimensional space; words with similar meanings end up close together, forming clusters of vectors.

During tokenizing, why is it not good enough to assign every letter to numbers, or words to numbers?

In each case, you may face issues with ordering. For example, relying on letters alone causes problems with anagrams (e.g., "taste" and "state" contain the same letters in different orders). The same issue occurs with sentences (e.g., "I walked my dog" versus "dog walked my I").

Lemma (In the context of NLP)

The base (dictionary) form of a word; for example, "run" is the lemma of "running", "ran", and "runs".

term frequency-inverse document frequency (TF-IDF)

A measure of how important a keyword is to a document, computed by weighing its frequency in that document against how often it appears across the other documents in the corpus.
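
A minimal sketch using scikit-learn's TfidfVectorizer (an assumption about tooling; note that scikit-learn uses a smoothed IDF variant, so its weights differ slightly from the plain log(D / d_t) formula above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the cat sat",
    "the dog barked",
    "the cat and the dog",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)

# Ubiquitous words like "the" get low weights; rarer words score higher.
print(vectorizer.get_feature_names_out())
print(tfidf.toarray().round(2))
```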

Corpus

Group of all the documents

(Udacity quiz question) Why might being able to visualize word embeddings be useful?

It lets me verify that my sentiment model is really learning the meaning of words by viewing where they appear relative to other words of similar sentiment. For example, words like "famous" and "famously" might be tokenized similarly, with perhaps an extra subword at the end for the "-ly", and so should appear close together.
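
One simple way to do this is to project the embeddings down to 2D, e.g. with PCA; the sketch below uses made-up placeholder vectors rather than learned values:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Hypothetical learned vectors for a few words (values are made up).
words = ["good", "great", "bad", "awful"]
vectors = np.array([
    [0.90, 0.80, 0.10],
    [0.85, 0.90, 0.15],
    [0.10, 0.20, 0.90],
    [0.05, 0.10, 0.95],
])

# Project the embeddings to 2D so clusters become visible.
coords = PCA(n_components=2).fit_transform(vectors)
plt.scatter(coords[:, 0], coords[:, 1])
for word, (x, y) in zip(words, coords):
    plt.annotate(word, (x, y))
plt.show()
```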

Term Frequency

The number of times a term appears in a document
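
A one-line sketch in Python using a counter over whitespace-split tokens:

```python
from collections import Counter

document = "the cat sat on the mat with the other cat"
tf = Counter(document.split())
print(tf["the"])  # 3
print(tf["cat"])  # 2
```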

Tokenization

The process of assigning numerical values to strings, whether individual letters, words, or phrases receive the numerical tokens.
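
A minimal sketch using the Keras Tokenizer (an assumption about tooling):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

sentences = ["I walked my dog", "my dog walked"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)  # build the word -> integer mapping
print(tokenizer.word_index)        # word -> index, most frequent words first
print(tokenizer.texts_to_sequences(sentences))  # sentences as integer tokens
```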

A document represented as a vector of word counts is called a ________________________________

bag of words
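
A minimal sketch using scikit-learn's CountVectorizer (an assumption about tooling):

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["the cat sat", "the cat sat on the mat"]

vectorizer = CountVectorizer()
bags = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())  # ['cat' 'mat' 'on' 'sat' 'the']
print(bags.toarray())
# [[1 0 0 1 1]
#  [1 1 1 1 2]]
```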

We can use ________________________ similarity to determine similarities between two bags of words

cosine
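
A minimal sketch of cosine similarity between the two word-count vectors from the bag-of-words example above:

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Word-count vectors for two documents over the same vocabulary.
bag_a = np.array([1, 0, 0, 1, 1])
bag_b = np.array([1, 1, 1, 1, 2])

print(cosine_similarity(bag_a, bag_b))  # ≈ 0.82
```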

Naive Bayes classifiers assume that the value of a particular feature is ___________________________ of the value of any other feature, given the class variable (https://en.wikipedia.org/wiki/Naive_Bayes_classifier)

independent
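
A minimal sketch combining bag-of-words counts with scikit-learn's MultinomialNB (an assumption about tooling; the tiny training set is made up):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["great movie", "loved it", "terrible film", "hated it"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# Bag-of-words counts feed the Naive Bayes classifier, which treats
# each word count as independent of the others given the class.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB().fit(X, labels)
print(model.predict(vectorizer.transform(["great movie"])))  # [1]
```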

