big data test 3
Support vector machine
1) the maximal margin classifier, 2) the support vector classifier, and 3) the support vector machine
CRISP-DM
1. business understanding 2. data understanding 3. data preparation 4. model building 5. testing and evaluation 6. deployment
unsupervised learning
An algorithm explores input data without being given an explicit output variable. The algorithm identifies groups of data that exhibit similar behavior.
supervised learning
An algorithm uses training data and feedback from humans to learn the relationship of given inputs to a given output. The model is trained on the data to find the connection between the input variables and the output.
K-Means
An approach for partitioning a data set into K distinct, non-overlapping clusters. It is a simple iterative method that partitions a given dataset into a user-specified number of clusters, k: it creates k groups from a set of objects so that the members of a group are more similar to each other than to members of other groups. It is a popular cluster analysis technique for exploring a dataset. A good clustering is one for which the within-cluster variation is as small as possible.
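The iterative partitioning described on this card can be sketched roughly as follows. This is a minimal illustration, assuming 2-D points, squared Euclidean distance, and random initial centroids; the function name `kmeans` and its parameters are hypothetical, not taken from the course material.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Partition 2-D points into k clusters (a basic iterative sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((sum(p[0] for p in cluster) / len(cluster),
                                      sum(p[1] for p in cluster) / len(cluster)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, clusters = kmeans(points, k=2)
```

On this toy data the two groups of three nearby points end up in separate clusters, which is the "small within-cluster variation" the card refers to.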
most common standard processes
CRISP-DM (Cross-Industry Standard Process for Data Mining); SEMMA (Sample, Explore, Modify, Model, and Assess); KDD (Knowledge Discovery in Databases)
CRISP-DM stands for
Cross-Industry Standard Process for Data Mining
refers to knowing what is happening in the organization and understanding some underlying trends and causes of such occurrences.
Descriptive Analytics
Artificial Neural Networks (ANN)
Each neuron: 1) calculates a weighted sum of the incoming values, 2) transforms this input using the activation function, and 3) passes the value on to the subsequent neuron(s)
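The three steps on this card can be sketched as a single function. This is a minimal illustration only; the name `neuron_output` and the choice of a sigmoid as the default activation are assumptions made for the example.

```python
import math

def sigmoid(z):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, activation=sigmoid):
    # Step 1: weighted sum of the incoming values.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Step 2: transform the sum with the activation function.
    # Step 3: the returned value is what gets passed to the next neuron(s).
    return activation(weighted_sum)

out = neuron_output([1.0, 2.0], [0.5, -0.25])  # weighted sum is 0.0
```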
Clustering is supervised learning?
False
Given a set of news articles found on the web, group them into sets of articles about the same story. This is a classification task.
False
Clustering segments data into groups that are previously defined.
False, NOT previously defined
K-means is used for classification?
False, clustering
K-Means
Hierarchical clustering, Principal component analysis (PCA), Singular value decomposition (SVD), Time series clustering
Lift
If lift (milk -> bread) is greater than one, it implies that the two items are found together more often than one would expect by chance. A large lift value is therefore a strong indicator that a rule is important, and reflects a true connection between the items.
Network information processing
Input and output; Connection weights: the relative strength or importance of each input to a processing element; Summation function: computing the weighted sums of all the input elements entering each processing element
Maximum margin classifier
The margin is the smallest perpendicular distance from any training point to the separating line. Find the line that maximizes the margin. The classification of a point depends on which side of the line it falls on.
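To make the margin concrete, here is a small sketch that measures the margin of a candidate line and classifies a point by the side it falls on. It assumes 2-D points and a line written as w[0]*x + w[1]*y + b = 0; the names `margin` and `classify` and the sample points are hypothetical. It only compares two given lines and does not search for the best one.

```python
import math

def margin(points, w, b):
    """Smallest perpendicular distance from any point to the line w.x + b = 0."""
    norm = math.hypot(w[0], w[1])
    return min(abs(w[0] * x + w[1] * y + b) / norm for x, y in points)

def classify(point, w, b):
    """Return +1 or -1 depending on which side of the line the point falls."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1

# Two linearly separable classes (illustrative data).
pts = [(1.0, 1.0), (2.0, 0.0), (4.0, 4.0), (5.0, 3.0)]

margin_diagonal = margin(pts, (1.0, 1.0), -5.0)  # line x + y = 5
margin_vertical = margin(pts, (1.0, 0.0), -3.0)  # line x = 3
```

Both lines separate the classes, but the diagonal line leaves a larger margin, so the maximal margin classifier would prefer it over the vertical one.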
Apriori
Rules of the form: condition -> result. E.g. [peanut butter, jelly] -> [bread]. This association rule states that if peanut butter and jelly are purchased together, then bread is also likely to be purchased.
( ) refers to the occurring frequency of the rule.
Support
Association rule mining uses three metrics (________), (_________), and (________).
Support; Confidence; Lift
maximal margin classifier
Suppose that the two classes are "linearly separable", i.e. one can draw a straight line such that all points on one side belong to the first class and all points on the other side belong to the second class. A natural approach is to find the straight line that gives the biggest separation between the classes, i.e. the points are as far from the line as possible.
true negative rate (specificity)
TN / (TN + FP)
Accuracy
(TP + TN) / (TP + TN + FP + FN)
True positive rate (sensitivity)
TP / (TP + FN)
Precision
TP / (TP + FP)
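The four confusion-matrix formulas on the cards above (specificity, accuracy, sensitivity, precision) can be checked with a short sketch. The function name `classification_metrics` and the counts used are made up for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the four rates from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate, also called recall
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),
    }

# Illustrative counts: 100 cases in total.
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Note the parentheses around each denominator: without them, TP/TP+FN would be read as (TP/TP) + FN.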
Three examples of the activation function
Threshold function, Sigmoid function, Rectifier function
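A rough sketch of the three activation functions named on this card. The cut-off of 0 for the threshold function is an assumption; some texts use a different threshold value.

```python
import math

def threshold(z):
    """Step function: outputs 1 once the input reaches 0 (assumed cut-off)."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Smooth S-shaped curve with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def rectifier(z):
    """Rectifier (ReLU): passes positive inputs through, zeroes out the rest."""
    return max(0.0, z)
```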
Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not. This is an example of a problem you would address using a supervised learning algorithm.
True
Prediction
Understanding the possibility of future values based on past patterns.
A telecommunications company wants to segment its customers into distinct groups in order to send appropriate subscription offers. This is an example of ________.
Unsupervised Learning
The problem of finding hidden structure in unlabeled data is called:
Unsupervised Learning
hidden layer
a layer of neurons that takes input from the previous layer and converts those inputs into outputs for further processing.
Predictive analytics
aims to determine what is likely to happen in the future. It is based on statistical techniques as well as other more recently developed techniques that fall under the categories of data mining or machine learning. What will happen? Why will it happen?
support vector machine (SVM)
an extension of the support vector classifier that results from enlarging the feature space in a specific way, using kernels. Kernel functions: linear, polynomial, sigmoid, etc.
logistic regression
the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Models the probability of an event occurring depending on the values of the independent variables, which can be categorical or numerical. Rather than modeling the response Y directly, logistic regression models the probability that Y belongs to a particular category.
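A minimal sketch of how a fitted logistic regression turns inputs into a probability, assuming the standard logistic function. The intercept and coefficient values below are invented for illustration, not fitted to any data, and the name `predict_probability` is hypothetical.

```python
import math

def predict_probability(x, intercept, coefficients):
    """Probability that Y = 1 given inputs x, under a logistic model."""
    z = intercept + sum(c * v for c, v in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical model with one input: z = -3.0 + 0.5 * x
p_at_6 = predict_probability([6.0], intercept=-3.0, coefficients=[0.5])
p_at_8 = predict_probability([8.0], intercept=-3.0, coefficients=[0.5])
```

The output always lies between 0 and 1, so it can be read as a probability; a cut-off such as 0.5 then turns it into a binary class.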
processing element (PE)
artificial neuron, receiving inputs, processing them, and delivering a single output
The phases of CRISP-DM are ( ), ( ), ( ), Model building, Testing and evaluation, and Deployment.
business understanding , data understanding, data preparation
ANN
can have one or more layers of neurons. These neurons can be fully connected, or only certain layers can be connected
two major types of prediction
classification, regression
two most common DNNs
convolutional neural network (CNN), recurrent neural network (RNN)
deep neural network (dnn)
differs from traditional machine learning techniques: deep learning techniques can automatically learn representations from data such as images, video or text, without introducing hand-coded rules or human domain knowledge.
linear regression
establishes a relationship between a dependent variable (Y) and one or more independent variables (X) using a best-fit straight line (also known as the regression line)
negative predicted class positive true class
false negative count
negative true class positive predicted class
false positive count
Activation function (Transfer function)
in an artificial neural network, the activation function of a neuron defines the output of that neuron given a set of inputs; defines how to pass the value from inputs through the neuron and make the output
lift values > 1.0
indicate that transactions containing the condition tend to contain the result more often than transactions that do not contain the condition
Network architecture
input, hidden and output layers
how to obtain best fit line?
least squares method
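A short sketch of the least squares method for one independent variable, using the usual closed-form slope and intercept. The function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by minimizing squared vertical deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```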
kernel functions
linear, polynomial, sigmoid, etc.
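The three kernels listed can be sketched in their common textbook forms. The parameter names `degree`, `coef0`, and `gamma` and their default values are assumptions for this example; the course material may define them differently.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    """Plain inner product of the two vectors."""
    return dot(x, y)

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) raised to the given degree."""
    return (dot(x, y) + coef0) ** degree

def sigmoid_kernel(x, y, gamma=0.01, coef0=0.0):
    """tanh(gamma * x . y + coef0)."""
    return math.tanh(gamma * dot(x, y) + coef0)

a, b = (1.0, 2.0), (3.0, 4.0)  # inner product is 11.0
```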
Confidence
a measure of the rule's predictive power; tells us the proportion of transactions containing item (or item-set) X that also contain item (or item-set) Y.
A (_______) is the relative importance of each input to a processing element in a neural network.
connection weight
classification
predicts categorical variables
regression
predicts continuous variables
The goal of (_______) analytics is to provide a decision or a recommendation for a specific action.
prescriptive
Descriptive analytics
refers to knowing what is happening in the organization and understanding some underlying trends and causes of such occurrences. What happened? What is happening?
Assume you want to perform supervised learning to predict the number of newborns from the size of the stork population. This is an example of _______
regression
The goal of which task is to predict continuous variables?
regression
Prescriptive analytics
seeks to make decisions to achieve the best performance possible: provides recommendations on what to do to achieve goals. What should I do? Why should I do it?
Deep learning
subset of machine learning that uses multi-layered artificial neural networks (also known as deep neural networks) to deliver state-of-the-art accuracy in tasks such as object detection, speech recognition, and language translation. Refers to a neural network with more than one hidden layer. Gets its name from the deep layers associated with the networks; typically there are a lot of hidden layers.
classification
supervised predictive model that segments data by assigning them to groups that are already defined. Examines already classified data and develops a predictive pattern (rule).
data mining
the intersection of machine learning, statistics, database systems, and many other disciplines. A process that uses statistical, mathematical, and artificial intelligence techniques to extract and identify useful information and knowledge (or patterns) from large data sets.
Support
the percentage of baskets where the rule was true; the occurring frequency of the rule
Recall
TP / (TP + FN)
Data mining focuses on discovering useful information.
true
Data mining is a cross-disciplinary field
true
Given email labeled as spam/non-spam, building a spam filter model is a classification task.
true
Regression works by minimizing the vertical deviation between each data point in the dataset and the regression model.
true
negative predicted class negative true class
true negative count
positive true class positive predicted class
true positive count
clustering
unsupervised learning method to segment data into groups that are NOT previously defined
clustering
unsupervised learning technique that attempts to create partitions in the data according to some distance metric. Divides data into different groups. Finds groups that are different from each other AND whose members are similar.
convolutional neural network (CNN)
used for image classification
Recurrent neural networks (RNN)
used for natural language processing and for sequential data
Lift
whether the condition product is present without the result product. Generally looking for: support as high as possible; confidence close to 1.0; lift higher than 1.0.
Confidence
Confidence(X -> Y) = Support(X, Y) / Support(X)
Lift
Lift(X -> Y) = Support(X, Y) / (Support(X) × Support(Y))
support
Support(X) = Count(X) / N, where N is the number of transactions and Count(X) is the number of transactions containing item-set X.
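The support, confidence, and lift formulas can be worked through on a toy basket list. This is a sketch only; the five transactions and the function names are invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, x, y):
    """Support(X, Y) / Support(X)."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)

def lift(transactions, x, y):
    """Support(X, Y) / (Support(X) * Support(Y))."""
    joint = support(transactions, set(x) | set(y))
    return joint / (support(transactions, x) * support(transactions, y))

baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread"},
    {"butter"},
]

s = support(baskets, {"milk", "bread"})       # 2 of 5 baskets
c = confidence(baskets, {"milk"}, {"bread"})  # 2 of the 3 milk baskets
l = lift(baskets, {"milk"}, {"bread"})        # 0.4 / (0.6 * 0.6)
```

Here lift(milk -> bread) comes out slightly above 1, meaning milk and bread appear together a little more often than chance alone would suggest.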