Analytics II Natural Language Processing
How are prescriptive analytics methods different from the other two types?
"what to do?" queries, not "what-is?" queries.
Imagine you are solving a classification problem with highly imbalanced classes. The majority class is observed 99% of the time in the training data. Your model has 99% accuracy on the test data. Which of the following is true in such a case? (1) Accuracy metric is not a good idea for imbalanced class problems. (2) Accuracy metric is a good idea for imbalanced class problems. (3) Precision and recall metrics are good for imbalanced class problems. (4) Precision and recall metrics aren't good for imbalanced class problems. a. 1 and 3 b. 1 and 4 c. 2 and 3 d. 2 and 4
(1) Accuracy metric is not a good idea for imbalanced class problems. (3) Precision and recall metrics are good for imbalanced class problems.
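A minimal sketch of why accuracy misleads here, assuming a hypothetical test set of 1000 labels with a 99/1 split and a trivial model that always predicts the majority class. The data and helper names are illustrative, not from the course material.

```python
# Hypothetical test set: 990 majority-class (0) and 10 minority-class (1) labels.
y_true = [0] * 990 + [1] * 10
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 1000

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the chosen positive class (0.0 when undefined)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

acc = accuracy(y_true, y_pred)
prec, rec = precision_recall(y_true, y_pred)
print(acc, prec, rec)
```

The model scores 99% accuracy while never finding a single minority-class example, which recall on the minority class exposes immediately.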
Suppose we train a model to predict whether an email is Spam or Not Spam. After training the model, we apply it to a test set of 500 new emails and the model produces the following table. What is the precision of this model?
70%
Overfitting occurs when: A. Performance on the training data is good but performance on the testing data is bad B. The model is too simple C. Performance on both the training and testing data is bad D. Performance on both the training and testing data is good
A. Performance on the training data is good but performance on the testing data is bad
Which of the following makes a neural network non-linear?
Activation function
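A small sketch of the idea, assuming scalar "layers" with made-up weights: two stacked linear layers collapse into a single linear (affine) function, while putting a ReLU between them does not.

```python
def linear(w, b, x):
    """A one-dimensional linear layer: w * x + b."""
    return w * x + b

def relu(z):
    """ReLU activation: passes positive values, clips negatives to zero."""
    return max(0.0, z)

def two_linear_layers(x):
    # 3 * (2x + 1) - 1 simplifies to 6x + 2, still a straight line.
    return linear(3.0, -1.0, linear(2.0, 1.0, x))

def two_layers_with_relu(x):
    # The ReLU in the middle bends the line, so the result is no longer affine.
    return linear(3.0, -1.0, relu(linear(2.0, 1.0, x)))

for x in (-2.0, 0.0, 2.0):
    print(x, two_linear_layers(x), two_layers_with_relu(x))
```

For a straight line, the value at 0 is the midpoint of the values at -2 and 2. That holds for the first function and fails for the second.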
Choose from the following areas where NLP can be useful.
All of the above
Convolutional Neural Network is used in____ a. Image classification b. Text classification c. Computer vision d. All of the above
All of the above
How to improve a model's performance? A Add more data to improve data quality B Choose more advanced models C Search for the most appropriate parameters D All of the above
All of the above
What are the common types of data analytics? A Predictive. B Descriptive. C Prescriptive. D All of the above.
All of the above
What is learning in deep learning? a. Learning, in the context of machine learning, describes an automatic search process for better representation b. A process that is "learned" from exposure to known examples of inputs and outputs c. The procedure of finding the weights that minimize the loss function d. All of the above e. None of the above
All of the above
Which of the following algorithms have adopted a pre-training paradigm? a. GPT-3 b. BERT c. Autoencoder d. All of the above e. None of the above
All of the above
What is the purpose of a loss function? a. Calculate the error value of the forward network b. Optimize the error values according to the error rate c. Both A and B d. None of the above
Both A and B
What is the use of forward propagation? A. Calculate errors B. Train the model C. Improve the accuracy of the model D. Recalculate the weight of the parameter
Calculate errors
Which data analytics application partitions a collection of objects into non-predefined groupings with similar features? a) Classification
Clustering
Facebook's facial recognition is an example of _________.
Computer vision
What is NOT true about data mining? a. Data analytics is defined as the procedure of extracting information from huge sets of data to support business decision-making b. Data analytics also involves other processes such as Data Cleaning, Data Integration, Data Transformation c. Data analytics is the procedure of mining knowledge from data d. Data analytics will always improve business decision-making e. None of the above
Data analytics will always improve business decision-making
Data obtained during customer review can be used in which stage of data analytics? A Descriptive Analytics B Predictive Analytics C Prescriptive Analytics D None of Above
Descriptive Analytics
Which of the following is a common use of unsupervised clustering?
Detect outliers
Over N experiments, what will be the average value of a random variable? A Mean Squared Error B Expected Value C Mean Absolute Error D Area Under Curve
Expected Value
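As an illustration, assuming a fair six-sided die as the random variable (an example chosen here, not taken from the card), the average over many simulated rolls settles near the expected value of 3.5.

```python
import random

def average_of_rolls(n, seed=0):
    """Average outcome of n simulated fair die rolls (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n)) / n

# Expected value of a fair die: each face 1..6 has probability 1/6.
expected_value = sum(range(1, 7)) / 6

for n in (10, 1000, 100000):
    print(n, average_of_rolls(n), expected_value)
```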
A model with 91% accuracy suggests that the model must work very well.
False
Adding nonlinearity allows the model to express simple nonlinear functions A. True B. False
False
Clustering is the most common type of supervised learning? A. True B. False
False
Dropout can be used to prevent underfitting problems during the deep learning training process.
False
Increase in size of a convolutional kernel would necessarily increase the performance of a convolutional network.
False
One Hot Representations use continuous vectors to retain syntactic and semantic word relationships? A. True B. False
False
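A quick sketch of why the statement is false, using a hypothetical three-word vocabulary: one-hot vectors are sparse and discrete, and the dot product between any two different words is zero, so no similarity between words is encoded.

```python
# Hypothetical toy vocabulary; a real one would hold thousands of words.
vocab = ["cat", "dog", "car"]

def one_hot(word, vocab):
    """Vector with a single 1 at the word's position in the vocabulary."""
    return [1 if w == word else 0 for w in vocab]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

cat, dog, car = (one_hot(w, vocab) for w in vocab)
print(dog)            # [0, 1, 0]
print(dot(cat, dog))  # 0: "cat" looks no closer to "dog" than to "car"
print(dot(cat, car))  # 0
```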
Text summarization is the process of assigning a topic label to a piece of text.
False
Topic Modeling is a type of supervised learning. A. True B. False
False
True/false question: One of the pros of training language models is that all prior context information is helpful
False
True/false question: Skip-gram (SG): predict center word from context/surrounding words
False
Which of the following is FALSE about Deep Learning and Machine Learning?
Feature Extraction needs to be done manually in both ML and DL algorithms
What is the purpose of gradient descent? A. Minimize training time B. Compress data hidden layers C. Find the parameters that can give lowest errors D. Train the model to learn new problems faster
Find the parameters that can give lowest errors
Consider the scenario. The problem you are trying to solve has a small amount of data. Fortunately, you have a pre-trained neural network that was trained on a similar problem. Which of the following methodologies would you choose to make use of this pre-trained network?
Freeze the first several layers and fine-tune the remaining layers on the new dataset
How a model applies to new data that has not been used to build the model. This is the definition of? a/ Underfitting b/ Bad Generalization c/ Generalization
Generalization
Word2vec is used to_______
Generate vectors out of words
In which area is an RNN used to achieve the best results? A Handwriting and speech recognition B Handwriting and images recognition C Speech and images recognition D Financial predictions
Handwriting and speech recognition
Assume we have built a spam-email detection system. Simon doesn't even know where the "Junk" directory is. He would much prefer to see spam emails in his inbox than to miss genuine emails without knowing. Which of the following evaluation metrics is most important for Simon?
High precision
The main reason why text data is particularly hard to analyze is
It is unstructured and difficult to represent effectively
Which of the following is true about max-pooling?
It allows the network to reduce the number of parameters and in the meantime, to retain the relevant information as much as possible
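A minimal sketch of 2x2 max-pooling with stride 2 on a made-up 4x4 feature map, assuming even dimensions. It shows the map shrinking to a quarter of its size while the strongest activation in each window survives.

```python
def max_pool_2x2(matrix):
    """2x2 max-pooling with stride 2; assumes even numbers of rows and columns."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            max(matrix[r][c], matrix[r][c + 1],
                matrix[r + 1][c], matrix[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]

# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

The pooling step itself has no weights. The saving comes from the smaller feature map that later layers have to process.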
Which of the following sentence is FALSE regarding regression? a. It relates inputs to outputs b. It is used for prediction c. It may be used for interpretation d. It discovers causal relationships
It discovers causal relationships
What is deep in deep learning?
It stands for the idea of successive layers of representations in deep learning
knowledge
Knowledge: use information for a given task
What if we use a learning rate that is too large?
Model training may never converge and may even diverge
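A toy illustration, assuming the cost function f(w) = w^2 (chosen here only because its gradient, 2w, is easy to follow). With a moderate learning rate the parameter shrinks toward the minimum at 0. With a learning rate that is too large each step overshoots and the parameter grows without bound.

```python
def run_gradient_descent(learning_rate, steps=20, w=1.0):
    """Minimize f(w) = w**2 by gradient descent and return the final w."""
    for _ in range(steps):
        grad = 2 * w                  # derivative of w**2
        w = w - learning_rate * grad  # standard gradient descent update
    return w

print(run_gradient_descent(0.1))    # moves toward the minimum at 0
print(run_gradient_descent(1.1))    # overshoots further on every step
print(run_gradient_descent(0.001))  # barely moves in 20 steps: too small
```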
Suppose we want to compute 10-Fold Cross-Validation error on 100 training examples. We need to compute error N1 times, and the Cross-Validation error is the average of the errors. To compute each error, we need to build a model with data of size N2, and test the model on the data of size N3. What are the appropriate numbers for N1; N2; N3?
N1 = 10;N2 = 90; N3 = 10
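The counts can be checked with a short sketch that splits 100 example indices into 10 contiguous folds. The helper name is hypothetical, and it assumes the number of examples divides evenly by the number of folds.

```python
def k_fold_splits(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    fold_size = n_samples // k  # assumes n_samples is divisible by k
    splits = []
    for fold in range(k):
        test_idx = list(range(fold * fold_size, (fold + 1) * fold_size))
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        splits.append((train_idx, test_idx))
    return splits

splits = k_fold_splits(n_samples=100, k=10)
n1 = len(splits)        # number of error computations
n2 = len(splits[0][0])  # training size per fold
n3 = len(splits[0][1])  # test size per fold
print(n1, n2, n3)       # 10 90 10
```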
___________, is the sub-field of AI that is focused on enabling computers to understand and process human languages.
Natural Language Processing
What is not an advantage of using RNN language models? A. They consider word context B. They can process any length input C. They use the same weight at all sequence points D. None of the above
None of the above
Which of the following model does not apply transfer learning techniques?
None of the above
Which of the following techniques does NOT prevent a model from overfitting? a. Data augmentations b. Dropout c. Early stopping d. None of the above
None of the above
What is gradient descent?
Optimization algorithm
tanh activation function is often used in ? A RNN B CNN C Both D Not applicable
RNN
You are given data about seismic activity in Japan, and you want to predict whether an earthquake will happen in the next year. This is an example of
Supervised learning
Spam email detection comes under which domain?
Text classification
Prescriptive Analytics
The set of analytical techniques that yield a best course of action.
What does a gradient descent algorithm do?
Tries to find the parameters of a model that minimizes the cost function based on mathematical calculations
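A minimal sketch of this, assuming a one-parameter model y = w * x, a mean squared error cost, and a tiny made-up dataset generated by y = 2x. The learning rate and step count are illustrative choices.

```python
# Hypothetical data generated by y = 2x, so the best parameter is w = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def cost(w):
    """Mean squared error of the model y = w * x on the data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cost_gradient(w):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def fit(learning_rate=0.01, steps=500, w=0.0):
    """Repeatedly step against the gradient to lower the cost."""
    for _ in range(steps):
        w -= learning_rate * cost_gradient(w)
    return w

w_hat = fit()
print(w_hat, cost(w_hat))
```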
Building effective traditional machine learning models needs expert knowledge about the relevant business process and knowledge about data attributes.
True
Deep Artificial Neural Networks and Deep Learning are generally the same thing and mostly used interchangeably. (True/False)
True
The advantage of distributed vector representations over the bag-of-words technique lies in their ability to represent text with a limited number of entries, regardless of the length of the text.
True
During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality.
True
If we increase the size of the training data, this will likely improve the performance of the model on new data.
True
K-fold cross-validation mitigates the bias that comes from picking one particular split into training and testing datasets.
True
Learning means finding a set of values for the weights of all layers in a network, such that the network will correctly map example inputs to their associated targets.
True
One major advantage of using Long Short-Term Memory (LSTM) model, rather than vanilla RNN, is that LSTM can deal with the exploding gradients problem.
True
One typical sign of a learning rate being too small is that the cost function may take a very long time to converge.
True
Recurrent Neural Networks handle sequence input A. True B. False
True
Sentiment analysis using Deep Learning is a many-to one prediction task.
True
The fundamental trick in deep learning is to use this score as a feedback signal to adjust the value of the weights a little, in a direction that will lower the loss score.
True
The neural networks get the optimal weights and bias values through an Error Gradient. (True/False)
True
The purpose of performing model evaluation is to judge how the trained model performs outside the sample on test data.
True
True/false question: Sentiment analysis (for example, detecting negative words in an email) is one of the common types of NLP techniques
True
Webpages can be created using HTML (HyperText Markup Language).
True
While recurrent neural networks (RNNs) can handle a sequence of arbitrary length, training RNNs is hard because of vanishing and exploding gradient problems.
True
Natural language processing is divided into the two sub fields of
Understanding and Generation
For a balanced binary dataset, suppose your model has 60% training performance and 55% testing performance, which of the following is a valid way to try and resolve this problem?
Use a more powerful model
What is the basic concept of Recurrent Neural Network? A Use previous inputs to find the next output according to the training set. B Use loops between the most important features to predict next output C Use a loop between inputs and outputs in order to achieve the better prediction. D Use recurrent features from dataset to find the best answers.
Use previous inputs to find the next output according to the training set.
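A rough sketch of the recurrence, assuming a single scalar hidden unit with tanh activation and arbitrary illustrative weights (not trained values). The hidden state carries information from earlier inputs forward, so the same final input produces a different output depending on what came before it.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit RNN over a sequence and return every hidden state.

    h_t = tanh(w_x * x_t + w_h * h_{t-1} + b); the weights are hypothetical.
    """
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # same weights at every time step
        states.append(h)
    return states

# Both sequences end with the same input, but their histories differ.
print(rnn_forward([1.0, 1.0])[-1])
print(rnn_forward([-1.0, 1.0])[-1])
```

Taking only the last hidden state as the prediction gives a many-to-one setup, and the loop runs for any sequence length.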
Based on what we have covered about deep learning, we have learned that: A neural network is a (crude) mathematical representation of a brain, which consists of smaller components called neurons. Each neuron has an input, a processing function, and an output. These neurons are stacked together to form a network, which can be used to approximate any function. To get the best possible neural network, we can use techniques like gradient descent to update our neural network model. Given above is a description of a neural network. When does a neural network model become a deep learning model?
When you add more layers and increase depth of neural network
Which of the following statements is false? a. When creating a model, a key goal is to ensure that it is capable of making accurate predictions for data it has not yet seen. Two common problems that prevent accurate predictions are overfitting and underfitting b. Underfitting occurs when a model is too simple to make accurate predictions, based on its training data. An example of underfitting is using a linear model, such as simple linear regression, when in fact, the problem really requires a more sophisticated non-linear model c. Overfitting occurs when your model is too complex. In the most extreme case of overfitting, a model memorizes its training data d. When you make predictions with an overfit model, the model won't know what to do with new data that matches the training data, but the model will make excellent predictions with data it has never seen
When you make predictions with an overfit model, the model won't know what to do with new data that matches the training data, but the model will make excellent predictions with data it has never seen
Autoencoder
an unsupervised approach for learning a lower dimensional feature representation from unlabeled training data
Supervised learning differs from unsupervised clustering in that supervised learning requires
at least one output attribute
Classification problems are distinguished from regression problems in that
classification problems require the output attribute to be categorical
What is not a key component of data analytics? A. Data mining B. Data C. Modeling D. Business Strategies
data mining
Data intelligence/analytics is the conversion of large raw ___ into a smaller amount of more useful _____.
data; information
The Bag-of-Words approach_________
disregards word order, keeps word multiplicity
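A short sketch using Python's `collections.Counter`, assuming lowercased whitespace tokenization (a simplification). Two sentences with the same words in a different order get the same bag, while repeated words keep their counts.

```python
from collections import Counter

def bag_of_words(text):
    """Count word occurrences, ignoring the order in which they appear."""
    return Counter(text.lower().split())

bag_a = bag_of_words("the dog bit the man")
bag_b = bag_of_words("the man bit the dog")

print(bag_a == bag_b)  # True: word order is lost
print(bag_a["the"])    # 2: multiplicity is kept
```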
Predictive Analytics
extracts information from data and uses it to predict future trends and identify behavioral patterns
Data analytics is best described as the process of?
identifying patterns in data
Select a non-linear model from the options below: A linear regression B neural network C logistic regression D support vector machine (SVM)
neural network
CNN
often used for visual data
RNN
often used for sequence
A trader who wants to predict short-term movements in stock prices is likely to use ________ analytics.
predictive
information
processed data with meaning
data
raw facts, no specific meaning
Which of these applications will derive the LEAST benefit from text mining?
sales transaction files
types of data
structured: categorical/numerical (classification/regression); unstructured: textual/image or video (clustering/association)
Descriptive Analytics
the use of data to understand past and current business performance and make informed decisions
Supervised learning
train the model using labeled data
unsupervised learning
train the model using unlabeled data
Data used to build a data mining model
training data
Overfitting, underfitting
training good, testing good: good; training good, testing bad: overfitting; training bad, testing bad: underfitting; training bad, testing good: unlikely
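The four cases can be written as a small lookup, sketched here with a hypothetical helper that takes two booleans. What counts as "good" performance is left to the caller.

```python
def diagnose_fit(train_good, test_good):
    """Map training/testing performance to the diagnosis from the card."""
    if train_good and test_good:
        return "good"
    if train_good and not test_good:
        return "overfitting"
    if not train_good and not test_good:
        return "underfitting"
    return "unlikely"  # bad on training data but good on testing data

print(diagnose_fit(train_good=True, test_good=False))   # overfitting
print(diagnose_fit(train_good=False, test_good=False))  # underfitting
```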
___________ refers to a model that can neither model the training data nor generalize to new data.
underfitting