Natural Language Processing PART 1: EMBEDDING
When preprocessing our bags of words, we can use ______________________ and _______________________ to either make each bag longer or shorter to have the same length.
Padding, truncating
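A minimal sketch of both operations in plain Python (no framework assumed): long sequences are truncated, short ones are padded with a filler value so every bag ends up the same length.

```python
# Force every token sequence to a fixed length:
# truncate long ones, pad short ones with 0.
def pad_or_truncate(seq, maxlen, pad_value=0):
    if len(seq) > maxlen:
        return seq[:maxlen]                          # truncate
    return seq + [pad_value] * (maxlen - len(seq))   # pad

bags = [[4, 1, 7], [9, 2], [3, 5, 6, 8, 1]]
fixed = [pad_or_truncate(b, maxlen=4) for b in bags]
print(fixed)  # [[4, 1, 7, 0], [9, 2, 0, 0], [3, 5, 6, 8]]
```

Libraries such as Keras offer the same behavior built in (e.g. `pad_sequences`), including a choice of padding/truncating from the front or the back.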
inverse document frequency - How to calculate?
The significance of a term within the corpus. IDF(t) = log(D / df(t)) - D = total number of documents - df(t) = number of documents containing term t
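A worked example of the formula above in plain Python (note that the log base varies between implementations, and some add smoothing to avoid division by zero; natural log is assumed here):

```python
import math

docs = [
    "the cat sat",
    "the dog ran",
    "the cat ran home",
]

def idf(term, docs):
    D = len(docs)                                     # total documents
    df = sum(1 for d in docs if term in d.split())    # documents containing term
    return math.log(D / df)

print(idf("the", docs))  # in every document -> log(3/3) = 0.0
print(idf("cat", docs))  # in 2 of 3 documents -> log(3/2)
```

A term that appears everywhere gets an IDF of zero, so it carries no discriminating weight.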
What is the disadvantage of embedding layers versus recurrent neural networks?
Embedding layers do not take the sequence (order) of words into account when producing their output
Word Embeddings
Embeddings are clusters of vectors in multi-dimensional space, where each vector represents a given word in those dimensions
During tokenizing, why is it not good enough to assign every letter to numbers, or words to numbers?
In each case, you may face issues with ordering. For example, relying only on letters can cause problems with anagrams (e.g. "taste" and "state" have the same letters in a different order). The same issue occurs at the sentence level (e.g. "I walked my dog" vs. "dog walked my I").
Lemma (In the context of NLP)
The base (dictionary) form of a word; e.g., "run" is the lemma of "running" and "ran"
term frequency-inverse document frequency (TF-IDF)
A measure of how important a keyword is to a document, computed by comparing its frequency in that document with its usage across the other documents in the corpus.
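A minimal sketch combining the two pieces, term frequency times inverse document frequency, in plain Python (a raw-count TF and natural-log IDF are assumed; real libraries such as scikit-learn's `TfidfVectorizer` apply normalization and smoothing on top of this):

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat".split(),
    "the dog sat".split(),
    "cats and dogs".split(),
]

def tf_idf(term, doc, docs):
    tf = Counter(doc)[term]                  # raw count in this document
    df = sum(1 for d in docs if term in d)   # documents containing the term
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf

# "the" appears twice in doc 0 but also in another document, so its IDF is low;
# "cat" appears once but only here, so it ends up with the higher score.
print(tf_idf("the", docs[0], docs))
print(tf_idf("cat", docs[0], docs))
```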
Corpus
Group of all the documents
(Udacity quiz question) Why might being able to visualize word embeddings be useful?
I can verify that my sentiment model is really learning the meaning of words by viewing where they appear relative to other words with similar sentiment. For example: Words like "famous" and "famously" might be tokenized similarly, with perhaps an extra subword at the end for the "-ly".
Term Frequency
The number of times a term appears in a document
Tokenization
The process of assigning numerical values to strings, where each letter, word, or phrase receives its own numerical token
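A hypothetical word-level tokenizer, sketched in plain Python: build a vocabulary from a corpus, then map each word in a sentence to its integer token (index 0 is reserved here for padding/unknown words, a common convention but an assumption of this sketch):

```python
# Build a word -> integer vocabulary from a list of sentences.
def build_vocab(sentences):
    vocab = {}
    for s in sentences:
        for word in s.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1   # reserve 0 for padding/unknown
    return vocab

# Map a sentence to its token sequence; unseen words get 0.
def tokenize(sentence, vocab):
    return [vocab.get(w, 0) for w in sentence.lower().split()]

vocab = build_vocab(["I walked my dog", "My dog walked"])
print(tokenize("I walked my dog", vocab))  # [1, 2, 3, 4]
```

Note that "I walked my dog" and "dog walked my I" produce the same tokens in a different order, which is exactly why downstream models must handle sequence order.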
A document represented as a vector of word counts is called a ________________________________
bag of words
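A minimal sketch of a bag-of-words vector over a fixed vocabulary (the vocabulary here is an illustrative assumption):

```python
from collections import Counter

# Fixed vocabulary; each position in the output vector corresponds to one word.
vocab = ["dog", "walked", "my", "cat"]

def bag_of_words(text, vocab):
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

print(bag_of_words("My dog walked my dog", vocab))  # [2, 1, 2, 0]
```

Word order is discarded entirely; only the counts survive.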
We can use ________________________ similarity to determine similarities between two bags of words
cosine
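A minimal sketch of cosine similarity between two count vectors, implemented directly from the definition (dot product divided by the product of the vector norms):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical count vectors -> 1.0; vectors with no shared words -> 0.0
print(cosine_similarity([2, 1, 0], [2, 1, 0]))  # 1.0
print(cosine_similarity([1, 0, 0], [0, 1, 1]))  # 0.0
```

Because it measures the angle rather than the magnitude, cosine similarity is insensitive to document length: a document and its doubled-count copy still score 1.0.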
Naive Bayes classifiers assume that the value of a particular feature is ___________________________ of the value of any other feature, given the class variable (https://en.wikipedia.org/wiki/Naive_Bayes_classifier)
independent
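Under that independence assumption, the class score is just the class prior times a product of per-word likelihoods: P(class | words) ∝ P(class) · Π P(word_i | class). A toy sketch (the probability numbers below are made up purely for illustration):

```python
import math

# Assumed toy priors and per-word likelihoods for a sentiment classifier.
prior = {"pos": 0.5, "neg": 0.5}
likelihood = {
    "pos": {"good": 0.4, "movie": 0.3, "bad": 0.05},
    "neg": {"good": 0.05, "movie": 0.3, "bad": 0.4},
}

def score(words, cls):
    # Work in log space so long documents don't underflow to zero.
    return math.log(prior[cls]) + sum(math.log(likelihood[cls][w]) for w in words)

words = ["good", "movie"]
best = max(prior, key=lambda c: score(words, c))
print(best)  # "pos" wins: 0.5 * 0.4 * 0.3 beats 0.5 * 0.05 * 0.3
```

The "naive" part is that P("good" | pos) is used unchanged regardless of which other words appear, even though real words are correlated.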