Big Data Test 2

A hidden layer is _________. A) A layer in a neural network between the inputs and outputs. B) A layer in social media networks that is hidden to the user. C) A layer in streaming stacks that processes data while remaining hidden to users. D) None of the above.

A) A layer in a neural network between the inputs and outputs

Convolutional neural networks are most notably used for __________. A) image classification B) social network analytics C) forecasting the stock market D) none of these

A) Image Classification

What are the major data mining tasks? A) Prediction B) Association C) Clustering D) All of them

All of them

Unsupervised Learning

An algorithm explores input data without being given an explicit output variable.

Supervised Learning

An algorithm uses training data and feedback from humans to learn the relationship between given inputs and a given output.

Activation Function (Transfer Function)

In an artificial neural network, the activation function of a neuron defines the output of the neuron given a set of inputs; it defines how to pass the values from the inputs through the neuron to produce the output.

Step 3: Data Preparation

Data Consolidation, Cleaning, and Transformation

Stages of CRISP-DM

1. Business Understanding 2. Data Understanding 3. Data Preparation 4. Model Building 5. Testing and Evaluation 6. Deployment

Most common DNNs

1. Convolutional Neural Network 2. Recurrent Neural Network

Which one defines how to pass the value from inputs through the neuron and make the output? A) Activation function B) Value function C) Control function D) Transform function

Activation Function

Clustering

An unsupervised learning method to segment data into groups that are NOT previously defined

Convolutional neural networks (CNNs)

used for image classification

Classification

used for predicting categorical variables

Regression

used for predicting continuous variables

Assume you want to use supervised learning to predict the number of newborns from the size of the stork population. This is an example of _______ A) Classification B) Regression C) Clustering D) Structural equation modeling

B) Regression

Prediction

Two major types of prediction are classification and regression

Association Analysis is a/an _________. A) Supervised learning B) Unsupervised learning C) Reinforcement learning

Unsupervised Learning

Support

Refers to the percentage of baskets where the rule was true; how frequently the rule occurs; the probability of simultaneously observing both item sets in a database

Apriori (Association rules)

Rules of the form Condition (antecedent) -> Result (consequent)

The goal of which task is to predict categories? _______ A) Regression B) Clustering C) Classification D) Association analysis

C) Classification

Select the incorrect statement about prediction, one of the data mining tasks. A) Two major types of predictions are classification and regression. B) Supervised learning methods can be used for classification and regression. C) Classification is used for predicting continuous outcome variables. D) Regression is used for predicting continuous outcome variables.

C) Classification is used for predicting continuous outcome variables

Which of the following is not true regarding data mining? A) It focuses on discovering useful information. B) It can use machine learning techniques. C) It is one task within a process. D) It is a process from start to end.

C) It is one task within a process

(_______) is an industry-standard data mining process that is iterative in nature and has 6 steps. A) CRISP-DM B) SEMMA C) KDD

CRISP-DM

Generally looking for

- Support as high as possible - Confidence close to 1.0 - Lift higher than 1.0

Step 1: Business Understanding

- Specific goals tied to potential action are critical - These allow development of a project plan - The project plan specifies the people responsible for collecting the data, analyzing data, and reporting findings.

Logistic Regression

- The appropriate regression analysis to conduct when the dependent variable is dichotomous (binary) - While a linear regression model outputs a real number, a logistic regression model outputs a probability value.
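
The contrast between the two outputs can be sketched in a few lines. This is a minimal illustration, not a fitted model: the weight `w`, the intercept `b`, and the 0.5 decision threshold are hypothetical values chosen for the example.

```python
import math

def linear_model(x, w, b):
    """Linear regression output: any real number."""
    return w * x + b

def logistic_model(x, w, b):
    """Logistic regression output: the linear score squashed into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# Hypothetical coefficients, assumed only for illustration.
w, b = 1.5, -3.0

score = linear_model(2.0, w, b)          # 1.5 * 2.0 - 3.0 = 0.0
probability = logistic_model(2.0, w, b)  # sigmoid(0.0) = 0.5
predicted_class = 1 if logistic_model(4.0, w, b) >= 0.5 else 0

print(score, probability, predicted_class)
```

Thresholding the probability is what turns the model into a binary (dichotomous) classifier.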

Support vector classifier

- The generalization of maximal margin classifier to the non-separable case - In this case, we might be willing to consider a classifier based on a line that does not perfectly separate the two classes.

Classification vs clustering

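- Classification is a supervised predictive model that segments data by assigning them to groups that are already defined. - It examines already classified data and develops a predictive pattern. - Clustering is an unsupervised method that segments data into groups that are not previously defined.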

Confidence (Probability)

A measure of a rule's predictive power

Artificial Narrow Intelligence

A system for a particular task or job (ex. smart speaker)

Artificial General Intelligence

A system for doing everything humans can do

Task of inferring a model from labeled training data is called ________ A) Supervised learning B) Unsupervised learning

A) Supervised Learning

Data mining is defined as a process of identifying valid, novel, potentially useful, and understandable patterns in data. What is the meaning of 'valid'? A) The pattern should hold true on new data. B) It can use machine learning techniques. C) The pattern is easy to understand. D) The discovered patterns should lead to benefits.

A) The pattern should hold true on new data

What is the confidence level of the association rule [Cereal -> Milk] in this data? _______ A) 0.65 B) 0.75 C) 1.0 D) 0.57
Customer  Items
1         Cereal, Milk, Bread
2         Eggs, Cereal, Beer, Water
3         Milk, Water, Cereal
4         Eggs, Beer, Bread
5         Cereal, Milk, Bread

B) 0.75
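
The answer can be checked by counting baskets. The sketch below uses the five baskets from the question; the function name `confidence` and the set-based representation are choices made for this example.

```python
# The five customer baskets from the question.
baskets = [
    {"Cereal", "Milk", "Bread"},
    {"Eggs", "Cereal", "Beer", "Water"},
    {"Milk", "Water", "Cereal"},
    {"Eggs", "Beer", "Bread"},
    {"Cereal", "Milk", "Bread"},
]

def confidence(baskets, condition, result):
    """Share of baskets containing `condition` that also contain `result`."""
    with_condition = [b for b in baskets if condition <= b]
    with_both = [b for b in with_condition if result <= b]
    return len(with_both) / len(with_condition)

cereal_to_milk = confidence(baskets, {"Cereal"}, {"Milk"})
print(cereal_to_milk)  # 3 of the 4 cereal baskets also contain milk: 0.75
```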

Which of the following is an example of unsupervised learning? _______ A) Classification B) Clustering C) Regression D) None of those

B) Clustering

(_______) is one of the three numeric measures that must be considered for an association rule; it measures its predictive power. A) Support B) Confidence C) Lift D) Precision

B) Confidence

Which of the following is used for clustering? A) Logistic regression B) K-means C) Apriori D) Neural networks

B) K-Means

Regression works by: A) Maximizing the distance between each data point in the dataset and the regression model. B) Minimizing the distance between each data point in the dataset and the regression model.

B) Minimizing the distance between each data point in the dataset and the regression model.
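
One common way to make "minimizing the distance" concrete is ordinary least squares, which minimizes the sum of squared vertical distances between the points and a fitted line. The sketch below assumes that interpretation; the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def squared_error(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.0]

slope, intercept = fit_line(xs, ys)
best = squared_error(xs, ys, slope, intercept)
worse = squared_error(xs, ys, slope + 0.1, intercept)
print(best < worse)  # any other line has a larger total squared distance
```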

Let's suppose that a retailer found an association rule (Peanut butter -> Bread) in their database. This rule's confidence is 0.7, support is 1.0, and lift is 0.85. Select the incorrect statement: _______ A) The right-hand side of this association rule is called the result. B) The retailer can think that those two items are truly associated. C) A purchase involving peanut butter is accompanied by a purchase of bread 70% of the time. D) Every transaction in the database includes the two items.

B) The retailer can think that those two items are truly associated

What is the problem of finding hidden structure in data without being given an explicit output variable (unlabeled data)? A) Supervised learning B) Unsupervised learning

B) Unsupervised Learning

What is the first phase of the CRISP-DM process? A) Data understanding B) Data collection C) Business understanding D) Model building

C) Business Understanding

(_______) is a task to segment data into groups that are not previously defined. A) Clustering B) Classification

Clustering

Summation Function

Computing the weighted sums of all input elements entering each processing element.

CRISP-DM

Cross Industry Standard Process for Data Mining

(_______) is one of CRISP-DM phases. This phase includes several tasks, such as data cleansing and transforming. A) Data consolidation B) Data preparation C) Model evaluation D) Data collection

Data Preparation

( ) is a subset of machine learning that uses multi-layered artificial neural networks.

Deep learning

deep neural network (DNN)

Deep learning is a subset of machine learning that uses multi-layered artificial neural networks (DNN) to deliver state-of-the-art accuracy in tasks such as object detection, speech recognition, and language translation. - Refers to a neural network with more than one hidden layer

artificial neural network

Each neuron 1) calculates a weighted sum of incoming values, 2) transforms this sum using the activation function, and 3) passes the resulting value on to the subsequent neurons
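
The three steps of a neuron can be sketched as a single function. This is an illustrative sketch only: the sigmoid activation, the `bias` term, and the example weights are assumptions that the card does not specify.

```python
import math

def sigmoid(z):
    """One possible activation function; the choice is an assumption here."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias=0.0, activation=sigmoid):
    # Step 1: summation function, the weighted sum of incoming values.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Step 2: transform the sum with the activation function.
    # Step 3: the returned value is what gets passed to the next neurons.
    return activation(weighted_sum)

# Hypothetical inputs and weights: 1.0 * 0.5 + 2.0 * (-0.25) = 0.0
out = neuron_output([1.0, 2.0], [0.5, -0.25])
print(out)  # sigmoid(0.0) = 0.5
```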

The goal of k-means algorithms is to maximize the within-cluster-variation. (True / False)

False

The support levels of the two association rules, [A -> B] and [B -> A], are different. (True / False)

False

If lift (Milk->Bread) > 1

It implies that the two items are found together more often than one would expect by chance. - a large lift value is therefore a strong indication that a rule is important, and reflects a true connection between the items.

( ) is the scientific study of statistical models that computer systems use to perform a specific task without using explicit instructions.

Machine Learning

Machine Learning

Machine learning is the scientific study of statistical models that the computer systems use to perform a specific task WITHOUT USING EXPLICIT INSTRUCTIONS

Valid

Means that the discovered patterns should hold true on new data with a sufficient degree of certainty

Potentially useful

Means that the discovered patterns should lead to some benefits to the user or task

Understandable

Means that the discovered patterns should make business sense, leading the user to say "Hmm, that makes sense!"

Novel

Means that the patterns are not previously known to the user within the context of the system being analyzed

Lift

Measures how much more often the result product appears when the condition product is present than it appears overall (the rule's confidence divided by the support of the result) - Lift values > 1.0 indicate that the transactions containing the condition tend to contain the result more often than transactions that do not contain the condition
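
As a worked example, the lift of [Cereal -> Milk] can be computed from the five baskets used in the confidence question in this set. The helper names are hypothetical, and counts are used instead of fractions so the arithmetic stays exact.

```python
baskets = [
    {"Cereal", "Milk", "Bread"},
    {"Eggs", "Cereal", "Beer", "Water"},
    {"Milk", "Water", "Cereal"},
    {"Eggs", "Beer", "Bread"},
    {"Cereal", "Milk", "Bread"},
]

def count(baskets, items):
    """Number of baskets that contain every item in `items`."""
    return sum(1 for b in baskets if items <= b)

def lift(baskets, condition, result):
    """Confidence of condition -> result divided by the support of result."""
    n = len(baskets)
    both = count(baskets, condition | result)
    # (both / n_condition) / (n_result / n), rearranged to a single division.
    return (both * n) / (count(baskets, condition) * count(baskets, result))

cereal_milk_lift = lift(baskets, {"Cereal"}, {"Milk"})
print(cereal_milk_lift)  # (3 * 5) / (4 * 3) = 1.25, so lift > 1.0
```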

Support Vector Machine

People often refer to 1) the maximal margin classifier, 2) the support vector classifier, and 3) the support vector machine collectively as "support vector machines". The support vector machine is a generalization of a simple and intuitive classifier called the maximal margin classifier.

Measuring Rules

For the effective use of a rule, three numeric measures about the rule must be considered: support, confidence, and lift
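
The three measures can be written compactly for a rule X -> Y. The notation is an assumption introduced here: N is the total number of baskets, and n(S) is the number of baskets containing every item in S.

```latex
\begin{align*}
\text{support}(X \rightarrow Y) &= \frac{n(X \cup Y)}{N} \\
\text{confidence}(X \rightarrow Y) &= \frac{n(X \cup Y)}{n(X)} \\
\text{lift}(X \rightarrow Y) &= \frac{\text{confidence}(X \rightarrow Y)}{n(Y)/N}
\end{align*}
```

Because n(X ∪ Y) does not depend on the direction of the rule, X -> Y and Y -> X always have the same support, while their confidence values can differ.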

Data Mining

The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases. - Identify useful information and knowledge from large data sets

Hidden layer

The second layer of a three-layer network: it receives signals from the input layer, performs intermediate processing, and sends the results to the output layer

Discriminating between spam and non-spam emails is a classification task (True / False)

True

The maximal margin classifier seeks the largest possible margin so that every observation is on the correct side of the separating line. (True / False)

True

The support vector classifier allows some observations to be on the incorrect side of the separating line. (True / False)

True

Recurrent Neural Networks (RNN)

Used for natural language processing and for sequential data.

K-Means

Creates k groups from a set of objects so that the members of a group are more similar to each other than to members of other groups. It's a popular cluster analysis technique for exploring a dataset. - A good clustering is one for which the within-cluster variation is as small as possible
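
A minimal one-dimensional sketch of the idea follows. It assumes the usual assign-then-update loop, fixed starting centers instead of random ones, and within-cluster variation measured as the sum of squared distances to each cluster's mean. The data points are made up.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points and updating the cluster means."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

def within_cluster_variation(centers, clusters):
    """Sum of squared distances from each point to its cluster center."""
    return sum((p - c) ** 2 for c, cl in zip(centers, clusters) for p in cl)

# Hypothetical data with two obvious groups, and fixed starting centers.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
variation = within_cluster_variation(centers, clusters)
print(centers, clusters, variation)
```

With these inputs the loop settles on the groups {1, 2, 3} and {10, 11, 12}, which is the split with the smallest within-cluster variation for k = 2.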

Process

implies that data mining comprises many iterative steps

Artificial Intelligence

the science and engineering of making intelligent machines, especially intelligent computer programs

