BUAL Exam #2
learning process in ANN
usual process of learning involves 3 tasks: 1. compute temporary outputs 2. compare outputs with desired targets 3. adjust the weights and repeat the process
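The three tasks above can be sketched with a single threshold neuron trained by the perceptron rule. This is only one simple instance of the idea; the training data (logical AND), the learning rate, and all function names are assumptions made for illustration.

```python
# Hypothetical training data: the logical AND function.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(weights, bias, inputs):
    # Task 1: compute a temporary output (threshold on the weighted sum).
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train(samples, learning_rate=1, epochs=50):
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in samples:
            output = predict(weights, bias, inputs)
            # Task 2: compare the output with the desired target.
            error = target - output
            # Task 3: adjust the weights, then repeat the process.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

weights, bias = train(samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in samples]
print(predictions)
```

After training, the neuron should reproduce the AND targets for all four inputs.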
polynomial regression
when the power of the independent variable is more than one in regression equation
three layers of ANN
1. input layer 2. hidden layer 3. output layer
support vector machines (SVM)
1. maximal margin classifier 2. support vector classifier 3. support vector machine **generalization of a simple and intuitive classifier called the maximal margin classifier
what are the major data mining tasks? a. prediction b. association c. cluster d. all of them
d. all of them
how does data mining work?
data mining uses data to build models that identify patterns or other relationships types of patterns: 1. prediction 2. cluster (segmentation) 3. association
market basket analysis
aim: find association and correlations btwn the different items that customers place in their shopping baskets
unsupervised learning
an algorithm explores input data without being given an explicit output variable
market analysis
transactions are specified in terms of item-sets, such as the following transaction that might be found in a typical grocery store: ex. [peanut butter, jelly, bread] the result of a market analysis is a collection of association rules that specify patterns found in the relationships among items
machine learning is the scientific study of statistical models that computer systems use to perform a specific task without using explicit instructions
true
the maximal margin classifier seeks the largest possible margin so that every observation is on the correct side of the separating line
true
unsupervised learning: clustering
1. k-means 2. hierarchical clustering 3. principal component analysis (PCA) 4. singular value decomposition (SVD) 5. time series clustering
a hidden layer is ___________ a. a layer in a neural network between the input and outputs b. a layer in social media networks that is hidden to the user c. a layer in streaming stacks that processes data while remaining hidden to users d. none of the above
a. a layer in a neural network between the input and outputs
data consolidation
collect and integrate data
the goal of k-means algorithms is to maximize the within-cluster-variation
false
the support levels of the two association rules, A --> B and B --> A, are different
false
support vector classifier
generalization of the maximal margin classifier to the non-separable case - in this case, we might be willing to consider a classifier based on a line that does not perfectly separate the two classes, in the interest of greater robustness to individual observations and better classification of most of the training observations - the support vector machine (SVM) is a further extension of the support vector classifier that results from enlarging the feature space in a specific way using kernels - kernel functions: linear, polynomial, sigmoid, etc.
text mining
process of extracting patterns from large amounts of unstructured data sources such as word or pdf (ex. product reviews on Amazon)
machine learning
scientific study of statistical models that computer systems use to perform a specific task without using explicit instructions - method of data analysis that automates analytical model building **resurging interest in ML is due to factors such as growing volumes and varieties of available data, computational processing that is cheaper and more powerful, and affordable data storage
4. model building
select an appropriate technique based on need and data types - apply the technique to an already prepared data set
supervised learning: regression
1. linear regression 2. polynomial regression 3. neural networks
association analysis is a/an _______________ a. supervised learning b. unsupervised learning c. reinforcement learning
b. unsupervised learning
______________ is a subset of machine learning that uses multi-layered artificial neural networks
deep learning
the support vector classifier allows some observations to be on the incorrect side of the separating line
true
Convolutional neural networks (CNNs)
use feature detectors that look at small portions of the image separately first feature detector: small matrix used for convolution in image processing
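A minimal sketch of how a feature detector is slid over an image, assuming no padding and a stride of 1. The 4x4 image and the 2x2 vertical-edge detector are made-up values, and the operation shown is the unflipped sliding product that CNN layers typically compute.

```python
def convolve2d(image, detector):
    """Slide a square detector over the image (no padding, stride 1)."""
    k = len(detector)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    feature_map = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Multiply the detector with one small portion of the image.
            total = sum(image[r + i][c + j] * detector[i][j]
                        for i in range(k) for j in range(k))
            row.append(total)
        feature_map.append(row)
    return feature_map

# Hypothetical image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical feature detector that responds to a dark-to-bright vertical edge.
detector = [[-1, 1],
            [-1, 1]]

feature_map = convolve2d(image, detector)
print(feature_map)
```

The feature map is largest in the middle column, where the detector sits on the edge.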
examples of AI in real life
- face detection - smart speaker - alpha go - manufacturing
3. data preparation (data pre-processing)
- gather relevant data and prepare it for analysis - data pre-processing consumes the most time and effort; this step accounts for roughly 80% of the total time spent on a data mining project - data preparation includes several tasks: data consolidation, cleaning, and transformation
2. data understanding
- identify the relevant data from available databases - intimate understanding of the data source - know what data is available or acquirable - understand the data (the data types, variables) - in order to better understand the data, the analyst often uses a variety of statistical and graphical techniques such as simple statistical summaries of each variable, correlation analysis, scatter plots, histograms, and box plots
data cleaning
- impute (fill with a probable value) or ignore missing values - eliminate inconsistencies (unusual values within a variable)
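A small sketch of imputation, assuming the "probable value" is the mean of the observed values (other choices such as the median or mode are equally valid). The `ages` list is invented for illustration.

```python
from statistics import mean

# Hypothetical variable with two missing values (None).
ages = [25, None, 31, 40, None]

# Use the mean of the observed values as the fill value.
observed = [value for value in ages if value is not None]
fill_value = mean(observed)

imputed = [fill_value if value is None else value for value in ages]
print(imputed)
```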
1. business understanding
- know what the analysis is for - specific goals tied to potential action are critical - allows development of a project plan - project plan specifies the people responsible for collecting the data, analyzing data, and reporting the findings - at this early stage, a budget to support the project should also be established, at least at a high level with rough numbers
limitations of AI
- performance limitations - biased AI through biased data - AI learning unhealthy stereotypes combating bias: - "zero out" the bias in words - use less biased and/or more inclusive data - transparency and/or auditing processes - diverse workforce
5. testing and evaluation
- the developed models are assessed and evaluated for their accuracy, generality, and usefulness - interaction btwn data analysts and business managers is important - refine and retest the model as necessary
supervised learning: classification
1. logistic regression 2. neural networks 3. support vector machines (SVM) 4. decision tree 5. naive bayes
the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in databases
1. process: implies that data mining comprises many steps 2. valid: the discovered patterns should hold true on new data with a sufficient degree of certainty 3. novel: the patterns are not previously known to the user within the context of the system being analyzed 4. potentially useful: discovered patterns should lead to some benefit to the user or task 5. understandable: the pattern should make business sense, leading the user to say "ohhh, it makes sense"
element of neural networks
1. processing element (PE): artificial neuron, receiving inputs, processing them, and delivering a single output 2. network architecture: input, hidden, and output layers 3. network information processing: input and output - connection weights: the relative strength or importance of each input to a processing element - summation function: computing the weighted sums of all the input elements entering each processing element
CRISP-DM
6 steps: 1. business understanding 2. data understanding 3. data preparation (data pre-processing) 4. model building 5. testing and evaluation 6. deployment
rules of form
: condition (antecedent) --> result (consequent) ex. [peanut butter, jelly] --> [bread] **this association rule states that if peanut butter and jelly are purchased together, then bread is also likely to be purchased
6. deployment
ACTION!!!!! - may also include maintenance activities: may be further testing, refinement, or new business policies/processes, etc.
Realistic view of AI
AI cannot do everything, but will transform industries too optimistic: super-intelligent AI killer robots coming soon too pessimistic: AI cannot do everything, so an AI winter is coming
data mining
a way to develop intelligence from data that an organization collects, files, and stores - the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in databases ***statistical, mathematical, and machine learning techniques can be used to extract and identify useful information and knowledge from large data sets
____________ is an industry standard data mining process that is interactive in nature and has 6 steps a. CRISP-DM b. SEMMA c. KDD
a. CRISP-DM
which one defines how to pass the value from inputs through the neuron and make the output? a. activation function b. value function c. control function d. transform function
a. activation function
__________ is a task to segment data into groups that are not previously defined a. clustering b. classification
a. clustering
convolutional neural networks are most notably used for... a. image classification b. social network analytics c. forecasting the stock market d. none of these
a. image classification
task of inferring a model from labeled training data is called... a. supervised learning b. unsupervised learning
a. supervised learning
data mining is defined as a process of identifying valid, novel, potentially useful, and understandable patterns in data, what is the meaning of valid? a. the pattern should hold true on new data b. it can use machine learning techniques c. the pattern is easy to understand d. the discovered patterns should lead to benefits
a. the pattern should hold true on new data
association rules (cont.)
actions based on association rules... - coupons and discounts (discount one product to encourage sales of an associated product) - product placement (close together to encourage sales, far apart to force customers through store) - timing and cross-marketing (promotional mailing based on time since related purchase)
network information processing (cont.)
activation function (transfer function): in an ANN, the activation function of a neuron defines the output of that neuron given a set of inputs; defines how to pass the value from inputs through the neuron and make the output three examples of activation function: 1. threshold function 2. sigmoid function 3. rectifier function
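The three example activation functions can be written in a few lines. This is a sketch using their common textbook definitions; the threshold is assumed to sit at 0.

```python
import math

def threshold(x):
    """Step function: fires (1) once the input reaches 0, otherwise 0."""
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    """Squashes any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def rectifier(x):
    """Rectified linear unit: passes positive inputs, zeroes out negatives."""
    return max(0.0, x)

for value in (-2.0, 0.0, 3.0):
    print(value, threshold(value), sigmoid(value), rectifier(value))
```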
supervised learning
an algorithm uses training data and feedback from humans to learn the relationship of given inputs to a given output
cluster
an unsupervised learning technique that attempts to create partitions in the data according to some distance metric - divides data into different groups finds groups that are dif from each other and whose members are similar - should be interpreted by someone knowledgeable in the organization
unsupervised learning: association
Apriori (association rules): - have become a popular tool for analyzing very large transactional databases in settings where market basket analysis is relevant - most often applied to binary-valued data, where it is referred to as "market basket" analysis (ex. for observation i, xij = 1 when item j is purchased as part of the transaction, xij = 0 if not purchased) **goal is to find joint values of the variables that appear most frequently in the database
logistic regression
appropriate regression analysis to conduct when the dependent variable is binary - outputs a probability value - if the probability value is larger than a certain threshold probability, we assign the label 1 to that case, and 0 otherwise ex. logistic regression model: log(p(x) / (1 - p(x))) = B0 + B1X1 + B2X2 **if B1 is + then increasing X1 will be associated with increasing p(x), if B1 is - then increasing X1 will be associated with decreasing p(x)
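A sketch of how the log-odds equation turns into a probability and then a label. The coefficient values (B0 = -3, B1 = 1, B2 = 0.5) and the 0.5 threshold are assumptions picked for illustration, not fitted values.

```python
import math

# Hypothetical coefficients for log(p / (1 - p)) = B0 + B1*X1 + B2*X2.
B0, B1, B2 = -3.0, 1.0, 0.5

def predict_probability(x1, x2):
    log_odds = B0 + B1 * x1 + B2 * x2
    # Invert the log-odds to recover p(x).
    return 1.0 / (1.0 + math.exp(-log_odds))

def classify(probability, threshold=0.5):
    # Label 1 if the probability exceeds the threshold, 0 otherwise.
    return 1 if probability > threshold else 0

p_mid = predict_probability(2.0, 2.0)   # log-odds = 0
p_high = predict_probability(3.0, 2.0)  # X1 increased; B1 is positive
print(p_mid, p_high, classify(p_mid), classify(p_high))
```

Because B1 is positive here, raising X1 from 2 to 3 raises p(x).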
what is the confidence level of the association rule cereal --> milk in this data? a. 0.65 b. 0.75 c. 1.0 d. 0.57
b. 0.75
which of the following is an example of unsupervised learning? a. classification b. clustering c. regression d. none of those
b. clustering
___________ is one of the three numeric measures that must be considered for an association rule; it measures its predictive power a. support b. confidence c. lift d. precision
b. confidence
what is the first phase of the CRISP-DM process? a. data understanding b. data collection c. business understanding d. model building
c. business understanding
the goal of which task is to predict categories? a. regression b. clustering c. classification d. association analysis
c. classification
select the INCORRECT statement about prediction that is one of the data mining tasks? a. two major types of predictions are classification and regression b. supervised learning methods can be used for classification and regression c. classification is used for predicting continuous outcome variables d. regression is used for predicting continuous outcome variables
c. classification is used for predicting continuous outcome variables
which of the following is not true regarding data mining? a. it focuses on discovering useful information b. it can use machine learning techniques c. it is one task within a process d. it is a process from start to end
c. it is one task within a process
ANN layers
can have one or more layers of neurons - neurons can be fully connected or only certain layers can be connected hidden layer: layer of neurons that takes input from the previous layer and converts those inputs into outputs for further processing
classification vs. clustering
classification: supervised predictive model that segments data by assigning them to groups that are already defined - examines already classified data and develops a predictive pattern clustering: an unsupervised learning method to segment data into groups that are NOT previously defined
K-means
clustering refers to a very broad set of techniques for finding subgroups or clusters K-means approach partitions a data set into K distinct, non-overlapping clusters **k-means algorithm is a simple iterative method to partition a given dataset into a specified number of clusters, k
data in data mining
collection of facts obtained as the result of experiences, observations, or experiments - consists of numbers, letters, words, images, voice, etc. 1. structured data: data mining algorithms 2. unstructured/semi-structured data: text mining, web mining data --> information - vast majority of business data is stored in text documents
k-means (Cont.)
creates k groups from a set of objects so that the members of a group are more similar to each other - popular cluster analysis technique ****good clustering is one for which the within-cluster variation is as small as possible
recurrent neural network (RNN)
cyclical type of neural network - recurrent b/c perform the same task for every element of a sequence with dif inputs - have a memory which captures information about what has been calculated so far
deep neural network (DNN)
deep learning differs from traditional machine learning techniques: - can automatically learn representations from data such as images, video, or text, without introducing hand-coded rules or human domain knowledge 2 most common DNNs: 1. convolutional neural network (CNN): used for image classification 2. recurrent neural network (RNN): used for natural language processing and for sequential data
ANN neurons
each neuron: 1. calculates a weighted sum of the incoming values 2. transforms this input using the activation function 3. passes on the value to the subsequent neurons
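The three things a neuron does can be sketched as one function. The sigmoid activation, the bias term, and the input and weight values are assumptions chosen for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, bias=0.0):
    # 1. Calculate a weighted sum of the incoming values.
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    # 2. Transform this input using the activation function.
    activated = sigmoid(weighted_sum)
    # 3. Return the value so it can be passed on to subsequent neurons.
    return activated

# Hypothetical inputs and connection weights: 0.5*1.0 + (-0.25)*2.0 = 0.
output = neuron_output([1.0, 2.0], [0.5, -0.25])
print(output)
```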
linear regression
ex. predicting salary based on years of experience - establishes a relationship between a dependent variable (Y) and one or more independent variables (X) using a best-fit straight line (regression line) **obtain the line of best fit using the least squares method: minimizing the sum of the squares of the vertical deviations from each data point to the line
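A sketch of the least squares method for one independent variable, using the closed-form slope and intercept. The experience and salary figures are invented and happen to lie exactly on a line.

```python
# Hypothetical data: years of experience and salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]

def least_squares_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared vertical deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = least_squares_line(years, salary)
predicted = intercept + slope * 6  # predicted salary at 6 years of experience
print(intercept, slope, predicted)
```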
data mining process
manifestation of best practices - systematic way to conduct DM projects - dif groups have dif versions most common standard processes: 1. CRISP-DM (cross-industry standard process for data mining) 2. SEMMA (sample, explore, modify, model, and assess) 3. KDD (knowledge discovery in databases)
confidence (predictability)
measurement of a rule's predictive power - confidence (X --> Y) = support (X, Y) / support (X) - tells us the proportion of transactions containing item X that also contain item Y
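A sketch of the confidence calculation on a tiny, made-up transaction list. The `support` helper is included so the block runs on its own.

```python
# Hypothetical transactions; each one is a set of purchased items.
transactions = [
    {"peanut butter", "jelly", "bread"},
    {"peanut butter", "bread"},
    {"milk", "bread"},
    {"peanut butter", "jelly"},
    {"milk", "cereal"},
]

def support(item_set):
    """Fraction of transactions that contain every item in item_set."""
    return sum(item_set <= t for t in transactions) / len(transactions)

def confidence(condition, result):
    """confidence(X --> Y) = support(X, Y) / support(X)."""
    return support(condition | result) / support(condition)

pb_to_bread = confidence({"peanut butter"}, {"bread"})
jelly_to_pb = confidence({"jelly"}, {"peanut butter"})
pb_to_jelly = confidence({"peanut butter"}, {"jelly"})
print(pb_to_bread, jelly_to_pb, pb_to_jelly)
```

Unlike support, confidence depends on the direction of the rule: here jelly --> peanut butter and peanut butter --> jelly give different values.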
lift
measures how much more often the condition and the result occur together than would be expected if they were independent - lift (X --> Y) = support (X, Y) / (support (X) * support (Y)) ***lift values >1.0 indicate that transactions containing the condition tend to contain the result more often than transactions that do not contain the condition
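A sketch of the lift calculation on a tiny, made-up transaction list, with the denominator written as the product of the two individual supports.

```python
# Hypothetical transactions; each one is a set of purchased items.
transactions = [
    {"peanut butter", "jelly", "bread"},
    {"peanut butter", "bread"},
    {"milk", "bread"},
    {"peanut butter", "jelly"},
    {"milk", "cereal"},
]

def support(item_set):
    """Fraction of transactions that contain every item in item_set."""
    return sum(item_set <= t for t in transactions) / len(transactions)

def lift(condition, result):
    """lift(X --> Y) = support(X, Y) / (support(X) * support(Y))."""
    return support(condition | result) / (support(condition) * support(result))

pb_bread_lift = lift({"peanut butter"}, {"bread"})   # 0.4 / (0.6 * 0.6)
milk_bread_lift = lift({"milk"}, {"bread"})          # 0.2 / (0.4 * 0.6)
print(pb_bread_lift, milk_bread_lift)
```

In this toy data, peanut butter --> bread has lift above 1.0, while milk --> bread falls below 1.0.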
goal of k-means
minimize the within-cluster variation (ie. maximize the between-cluster variation) - defining the within-cluster variation (what it means for observations to be similar or dif): the sum of the squared deviations between each observation and its cluster centroid **cluster centroid: the mean of each variable for the observations in that cluster
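A sketch of the within-cluster variation for a single cluster, computed as the sum of squared deviations from the centroid. The two small clusters are invented to contrast a tight group with a spread-out one.

```python
def centroid(cluster):
    """Mean of each variable across the observations in the cluster."""
    return tuple(sum(values) / len(cluster) for values in zip(*cluster))

def within_cluster_variation(cluster):
    """Sum of squared deviations between each observation and the centroid."""
    center = centroid(cluster)
    return sum(
        sum((value - mean) ** 2 for value, mean in zip(point, center))
        for point in cluster
    )

# Hypothetical clusters of 2-D observations.
tight = [(1, 1), (3, 1)]   # centroid (2, 1)
loose = [(1, 1), (9, 1)]   # centroid (5, 1)

print(within_cluster_variation(tight), within_cluster_variation(loose))
```

K-means prefers partitions like `tight`, where the value is small.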
artificial neural networks (ANN)
neural network is an information processing paradigm that is inspired by biological nervous systems such as the human brain's information processing mechanism **biological neurons in an animal brain are connected, and each connection (a synapse) btwn neurons can transmit a signal from one to another
data transformation
normalize data: eliminate the units of measurement for data, enabling you to more easily compare data from different places - formula: z = (xi - x) / s, where xi = data point, x = mean, s = standard deviation - dummy variables - converting numeric variables to categorical values (ex. low, medium, high)
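A sketch of normalization using the usual z-score form, (data point - mean) / standard deviation. The sample standard deviation from Python's `statistics` module is assumed here, and the height values are invented.

```python
from statistics import mean, stdev

# Hypothetical variable measured in centimeters.
heights_cm = [150, 160, 170]

center = mean(heights_cm)    # 160
spread = stdev(heights_cm)   # sample standard deviation, 10.0

# Each z-score is unit-free: how many standard deviations from the mean.
z_scores = [(value - center) / spread for value in heights_cm]
print(z_scores)
```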
association rules
search all transactions from a system for patterns of occurrence (ex. 30% of people who buy steaks also buy charcoal) - unsupervised technique - market basket analysis: find association and correlations btwn the dif items that customers place in their shopping baskets
deep learning/DNN
subset of machine learning that uses layered ANN to deliver state of the art accuracy in tasks such as object detection, speech recognition, and language translation ***a neural network with more than one layer
artificial intelligence
the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions - can be applied to any machine that exhibits traits associated with a human mind such as learning and problem solving ***the science and engineering of making intelligent machines, especially intelligent computer programs
k-means algorithm's iterative process
three steps: 1. randomly assign k observations as the centers of the clusters, where k is the number of clusters required 2. find the distance between each observation and the randomly chosen centers 3. assign each observation to the cluster whose center it is closest to **this completes the first iteration, now we have k clusters with randomly assigned centers - once we have k clusters, we need to recalculate the centers and reapply the algorithm **clustering obtained will continually improve until the result no longer changes
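The iterative process can be sketched in plain Python. This is a minimal illustration, not a production implementation: the seed, the squared Euclidean distance, the iteration cap, and the six sample points are all assumptions.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return tuple(sum(values) / len(points) for values in zip(*points))

def k_means(points, k, seed=0, max_iterations=100):
    rng = random.Random(seed)
    # Step 1: randomly pick k observations as the initial centers.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iterations):
        # Steps 2 and 3: measure distances and assign each observation
        # to the closest center.
        new_assignment = [
            min(range(k), key=lambda i: squared_distance(p, centers[i]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # the result no longer changes
        assignment = new_assignment
        # Recalculate each center as the centroid of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                centers[i] = centroid(members)
    clusters = [[p for p, a in zip(points, assignment) if a == i]
                for i in range(k)]
    return centers, clusters

# Hypothetical data: two well-separated groups of observations.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = k_means(points, k=2)
print(clusters)
```

With groups this far apart, the loop settles on the same two clusters whichever observations are drawn as the starting centers.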
measuring rules
to make effective use of a rule, three numeric measures about that rule must be considered: 1. support 2. confidence 3. lift generally looking for... - support as high as possible - confidence close to 1.0 - lift higher than 1.0
CRISP-DM stands for Cross-Industry Standard Process for Data Mining
true
prediction
understanding the possibility of future values based on past patterns 2 types of prediction: 1. classification: used for predicting categorical variables 2. regression: used for predicting continuous variables ***both are classes of supervised learning algorithms dif. techniques: linear regression, logistic regression, decision trees, neural networks, support vector machines
regression works by... a. maximizing the distance between each data point in the dataset and the regression model b. minimizing the distance between each data point in the dataset and the regression model
b. minimizing the distance between each data point in the dataset and the regression model
assume you want to perform supervised learning and to predict number of newborns according to size of storks' population, it is an example of... a. classification b. regression c. clustering d. structural equation modeling
b. regression
let's suppose that a retailer found an association rule (peanut butter --> bread) in their database, the rule's confidence is 0.7, support is 1.0, and lift is 0.85, select the incorrect statement... a. the right-hand side of this association rule is called the result b. the retailer can think that those two items are truly associated c. a purchase involving peanut butter is accompanied by a purchase of bread 70% of the time d. every transaction in the database includes the two items
b. the retailer can think that those two items are truly associated
______________ is one of the CRISP-DM phases, this phase includes several tasks, such as data cleaning and transforming a. data consolidation b. data preparation c. model evaluation d. data collection
b. data preparation
maximal margin classifier
suppose two classes are linearly separable (ie. one can draw a straight line such that all points on one side belong to the first class and the points on the other side belong to the second class) then a natural approach is to find the straight line that gives the biggest separation btwn the classes (ie. the points are as far from the line as possible) margin: minimum perpendicular distance btwn each point and the separating line - find the line which maximizes the margin - the classification of a point depends on which side of the line it falls on
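A sketch of how the margin of a candidate line can be measured. It only compares two hand-picked lines rather than searching for the best one; the points, labels, and line coefficients are assumptions, and a line is written as w[0]*x + w[1]*y + b = 0.

```python
import math

def margin(points, labels, w, b):
    """Minimum signed perpendicular distance from the points to the line.

    A negative result means at least one point is on the wrong side.
    """
    norm = math.hypot(w[0], w[1])
    return min(
        label * (w[0] * x + w[1] * y + b) / norm
        for (x, y), label in zip(points, labels)
    )

# Hypothetical linearly separable data: class -1 near the origin, class +1 further out.
points = [(0, 0), (1, 1), (3, 3), (4, 4)]
labels = [-1, -1, 1, 1]

margin_a = margin(points, labels, w=(1, 1), b=-4)  # line x + y = 4
margin_b = margin(points, labels, w=(1, 1), b=-3)  # line x + y = 3
print(margin_a, margin_b)
```

Both lines separate the classes, but the first one has the larger margin, so a maximal margin classifier would prefer it over the second.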
support
the % of baskets where the rule was true, the occurring frequency of the rule, probability of simultaneously observing both items in a database support (x) = count (x)/N N= # of transactions count (x) = # of transactions containing item-set X
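A sketch of the support calculation, count(X) / N, on a tiny, made-up transaction list.

```python
# Hypothetical transactions; each one is a set of purchased items.
transactions = [
    {"peanut butter", "jelly", "bread"},
    {"peanut butter", "bread"},
    {"milk", "bread"},
    {"peanut butter", "jelly"},
    {"milk", "cereal"},
]

def support(item_set):
    """support(X) = count(X) / N."""
    count = sum(item_set <= t for t in transactions)  # transactions containing X
    return count / len(transactions)

single = support({"peanut butter"})
# The support of a rule uses the combined item-set, so A --> B and B --> A match.
a_to_b = support({"peanut butter"} | {"bread"})
b_to_a = support({"bread"} | {"peanut butter"})
print(single, a_to_b, b_to_a)
```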
demystifying AI
there are 2 branches of AI... 1. ANI: artificial narrow intelligence (ex. smart speaker, self-driving car, web search, AI in farming and factories) 2. AGI: artificial general intelligence (do anything a human can do)
what is the problem of finding hidden structure in data without being given an explicit output variable (unlabeled data)? a. supervised learning b. unsupervised learning
b. unsupervised learning