BUAL Exam #2

limitations of AI

- performance limitations - biased AI through biased data - AI learning unhealthy stereotypes combating bias: - "zero out" the bias in words - use less biased and/or more inclusive data - transparency and/or auditing processes - diverse workforce

5. testing and evaluation

- the developed models are assessed and evaluated for their accuracy, generality, and usefulness - interaction btwn data analysts and business managers is important - refine and retest the model as necessary

learning process in ANN

usual process of learning involves 3 tasks: 1. compute temporary outputs 2. compare outputs with desired targets 3. adjust the weights and repeat the process
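
The three tasks above can be sketched for a single neuron with one weight. This is only an illustration, not a full training algorithm: the update rule (a simple error-times-input step), the learning rate, the starting weight, and the sample data are all assumptions made for the example.

```python
# Hypothetical single-neuron example: learn the relationship y = 2 * x.
def train(samples, learning_rate=0.1, epochs=50):
    weight = 0.0  # arbitrary starting weight (assumption)
    for _ in range(epochs):
        for x, target in samples:
            output = weight * x                   # 1. compute a temporary output
            error = target - output               # 2. compare it with the desired target
            weight += learning_rate * error * x   # 3. adjust the weight, then repeat
    return weight

samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # made-up (input, target) pairs
learned_weight = train(samples)
print(round(learned_weight, 4))
```

After enough repetitions the weight settles near 2, the value that makes the outputs match the targets.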

polynomial regression

when the power of the independent variable is more than one in regression equation

ANN neurons

each neuron: 1. calculates a weighted sum of the incoming values 2. transforms this input using the activation function 3. passes on the value to the subsequent neurons
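
A minimal sketch of what one neuron does, assuming a sigmoid activation function. The function names and the optional `bias` term are illustrative additions, not part of the card.

```python
import math

def sigmoid(z):
    """One common activation function; squashes any value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias=0.0):
    # 1. weighted sum of the incoming values (bias is an assumed extra term)
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    # 2. transform the sum with the activation function
    # 3. the returned value is what gets passed on to subsequent neurons
    return sigmoid(weighted_sum)

value = neuron_output([1.0, 2.0], [0.5, -0.25])
print(value)
```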

linear regression

ex. predicting salary based on years of experience - establishes a relationship between a dependent variable (Y) and one or more independent variables (X) using a best-fit straight line (regression line) **obtain the line of best fit using the least squares method: minimize the sum of the squares of the vertical deviations from each data point to the line
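
For one independent variable, the least squares line has a closed-form solution. The sketch below assumes made-up salary data (in thousands) and a hypothetical helper name `fit_line`.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line for one X variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

years = [1, 2, 3, 4, 5]         # years of experience (made-up)
salary = [35, 40, 45, 50, 55]   # salary in thousands (made-up)
slope, intercept = fit_line(years, salary)
print(slope, intercept)
```

Here each extra year of experience adds 5 (thousand) to the predicted salary, starting from an intercept of 30.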

artificial intelligence

the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions - can be applied to any machine that exhibits traits associated with a human mind, such as learning and problem solving ***the science and engineering of making intelligent machines, especially intelligent computer programs

three layers of ANN

1. input layer 2. hidden layer 3. output layer

support vector machines (SVM)

1. maximal margin classifier 2. support vector classifier 3. support vector machine **generalization of a simple and intuitive classifier called the maximal margin classifier

6. deployment

ACTION!!!!! - may also include maintenance activities: may be further testing, refinement, or new business policies/processes, etc.

Realistic view of AI

AI cannot do everything, but it will transform industries - too optimistic: super-intelligent AI killer robots are coming soon - too pessimistic: AI cannot do everything, so an AI winter is coming

what are the major data mining tasks? a. prediction b. association c. cluster d. all of them

d. all of them

how does data mining work?

data mining uses data to build models that identify patterns or other relationships types of patterns: 1. prediction 2. cluster (segmentation) 3. association

market basket analysis

aim: find association and correlations btwn the different items that customers place in their shopping baskets

unsupervised learning

an algorithm explores input data without being given an explicit output variable

market analysis

transactions are specified in terms of item-sets (such as the following transaction that might be found in a typical grocery store) ex. [peanut, jelly, bread] - the result of a market analysis is a collection of association rules that specify patterns found in the relationships among items

machine learning is the scientific study of statistical models that computer systems use to perform a specific task without using explicit instructions

true

the maximal margin classifier seeks the largest possible margin so that every observation is on the correct side of the separating line

true

unsupervised learning: clustering

1. k-means 2. hierarchical clustering 3. principal component analysis (PCA) 4. singular value decomposition (SVD) 5. time series clustering

a hidden layer is ___________ a. a layer in a neural network between the input and outputs b. a layer in social media networks that is hidden to the user c. a layer in streaming stacks that processes data while remaining hidden to users d. none of the above

a. a layer in a neural network between the input and outputs

which one defines how to pass the value from inputs through the neuron and make the output? a. activation function b. value function c. control function d. transform function

a. activation function

__________ is a task to segment data into groups that are not previously defined a. clustering b. classification

a. clustering

convolutional neural networks are most notably used for... a. image classification b. social network analytics c. forecasting the stock market d. none of these

a. image classification

examples of AI in real life

- face detection - smart speaker - alpha go - manufacturing

k-means algorithm's iterative process

three steps: 1. randomly choose k observations as the initial cluster centers, where k is the number of clusters required 2. find the distance between every observation and each of the randomly chosen centers 3. assign each observation to the cluster whose center it is closest to **this completes the first iteration; now we have k clusters with randomly assigned centers - once we have k clusters, we need to recalculate the centers and reapply the algorithm **the clustering obtained will continually improve until the result no longer changes
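
The iteration described above can be sketched in plain Python. The fixed random seed, the iteration cap, the use of Euclidean distance, and the sample points are assumptions made for the example.

```python
import math
import random

def k_means(points, k, seed=0, max_iterations=100):
    rng = random.Random(seed)         # fixed seed is an assumption, for repeatability
    centers = rng.sample(points, k)   # step 1: pick k observations as initial centers
    assignment = None
    for _ in range(max_iterations):
        # steps 2-3: measure the distance to every center, assign to the closest
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:  # result no longer changes: stop
            break
        assignment = new_assignment
        # recalculate each center as the mean of its current members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, assignment

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, assignment = k_means(points, k=2)
print(assignment)
```

With these two well-separated groups, the first three points end up in one cluster and the last three in the other.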

measuring rules

to make effective use of a rule, three numeric measures about that rule must be considered: 1. support 2. confidence 3. lift generally looking for... - support as high as possible - confidence close to 1.0 - lift higher than 1.0

CRISP-DM stands for Cross-Industry Standard Process for Data Mining

true

3. data preparation (data pre-processing)

- gather relevant data and prepare it for analysis - data pre-processing consumes the most time and effort; this step accounts for roughly 80% of the total time spent on a data mining project - data preparation includes several tasks: data consolidation, cleaning, and transformation

2. data understanding

- identify the relevant data from available databases - intimate understanding of the data source - know what data is available or acquirable - understand the data (the data types, variables) - in order to better understand the data, the analyst often uses a variety of statistical and graphical techniques such as simple statistical summaries of each variable, correlation analysis, scatter plots, histograms, and box plots

data cleaning

- impute (fill with a probable value) or ignore missing values - eliminate inconsistencies (unusual values within a variable)
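
One simple way to impute is to fill each missing value with the mean of the observed values. This is only one possible choice of "probable value"; the helper name and the sample data are hypothetical.

```python
def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

ages = [20, None, 40, 30, None]   # made-up variable with two missing values
cleaned = impute_with_mean(ages)
print(cleaned)
```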

1. business understanding

- know what the analysis is for - specific goals tied to potential action are critical - allows development of a project plan - project plan specifies the people responsible for collecting the data, analyzing data, and reporting the findings - at this early stage, a budget to support the project should also be established, at least at a high level with rough numbers

the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in databases

1. process: implies that data mining comprises many iterative steps 2. valid: the discovered patterns should hold true on new data with a sufficient degree of certainty 3. novel: the patterns are not previously known to the user within the context of the system being analyzed 4. potentially useful: discovered patterns should lead to some benefit to the user or task 5. understandable: the pattern should make business sense, leading the user to say "ohhh, it makes sense"

data mining

a way to develop intelligence from data that an organization collects, files, and stores - the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in databases ***statistical, mathematical, and machine learning techniques can be used to extract and identify useful information and knowledge from large data sets

data consolidation

collect and integrate data

the goal of k-means algorithms is to maximize the within-cluster-variation

false

the support levels of the two association rules, A --> B and B --> A, are different

false

support vector classifier

generalization of the maximal margin classifier to the non-separable case - in this case, we might be willing to consider a classifier based on a line that does not perfectly separate the two classes, in the interest of greater robustness to individual observations and better classification of most of the training observations - support vector machine: extension of the support vector classifier that results from enlarging the feature space in a specific way using kernels - kernel functions: linear, polynomial, sigmoid, etc.

text mining

process of extracting patterns from large amounts of unstructured data sources such as Word or PDF documents (ex. product reviews on Amazon)

machine learning

scientific study of statistical models that computer systems use to perform a specific task without using explicit instructions - method of data analysis that automates analytical model building **resurging interest in ML is due to several factors: growing volumes and varieties of available data, computational processing that is cheaper and more powerful, and affordable data storage

association rules

search all transactions from a system for patterns of occurrence (ex. 30% of people who buy steaks also buy charcoal) - unsupervised technique - market basket analysis: find associations and correlations btwn the dif items that customers place in their shopping baskets

4. model building

select an appropriate technique based on need and data types - apply the technique to an already prepared data set

deep learning/DNN

subset of machine learning that uses layered ANN to deliver state of the art accuracy in tasks such as object detection, speech recognition, and language translation ***a neural network with more than one layer

maximal margin classifier

suppose two classes are linearly separable (ie. one can draw a straight line such that all points on one side belong to the first class and the points on the other side belong to the second class); then a natural approach is to find the straight line that gives the biggest separation btwn the classes (ie. the points are as far from the line as possible) - margin: minimum perpendicular distance btwn each point and the separating line - find the line which maximizes the margin - the classification of a point depends on which side of the line it falls on
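
To make the margin concrete, the sketch below compares two candidate separating lines for four made-up points and reports each line's margin. It does not search for the optimal line; writing the line as a*x + b*y + c = 0 and the function names are assumptions for illustration.

```python
import math

def margin(points, a, b, c):
    """Smallest perpendicular distance from any point to the line a*x + b*y + c = 0."""
    return min(abs(a * x + b * y + c) / math.hypot(a, b) for x, y in points)

def classify(point, a, b, c):
    """The predicted class depends only on which side of the line the point falls."""
    x, y = point
    return 1 if a * x + b * y + c > 0 else -1

# Two classes on a horizontal axis: {(0,0), (1,0)} and {(4,0), (5,0)} (made-up)
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
margin_a = margin(points, 1.0, 0.0, -2.0)   # vertical line x = 2
margin_b = margin(points, 1.0, 0.0, -2.5)   # vertical line x = 2.5
print(margin_a, margin_b)
```

Both lines separate the classes, but x = 2.5 leaves the larger margin, so the maximal margin classifier would prefer it over x = 2.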

support

the % of baskets where the rule was true, the occurring frequency of the rule, probability of simultaneously observing both items in a database - support(X) = count(X) / N, where N = # of transactions and count(X) = # of transactions containing item-set X

supervised learning: regression

1. linear regression 2. polynomial regression 3. neural networks

supervised learning: classification

1. logistic regression 2. neural networks 3. support vector machines (SVM) 4. decision tree 5. naive bayes

element of neural networks

1. processing element (PE): artificial neuron, receiving inputs, processing them, and delivering a single output 2. network architecture: input, hidden, and output layers 3. network information processing: input and output - connection weights: the relative strength or importance of each input to a processing element - summation function: computing the weighted sums of all the input elements entering each processing element

CRISP-DM

6 steps: 1. business understanding 2. data understanding 3. data preparation (data pre-processing) 4. model building 5. testing and evaluation 6. deployment

rules of form

condition (antecedent) --> result (consequent) ex. [peanut butter, jelly] --> [bread] **this association rule states that if peanut butter and jelly are purchased together, then bread is also likely to be purchased

____________ is an industry standard data mining process that is interactive in nature and has 6 steps a. CRISP-DM b. SEMMA c. KDD

a. CRISP-DM

task of inferring a model from labeled training data is called... a. supervised learning b. unsupervised learning

a. supervised learning

data mining is defined as a process of identifying valid, novel, potentially useful, and understandable patterns in data, what is the meaning of valid? a. the pattern should hold true on new data b. it can use machine learning techniques c. the pattern is easy to understand d. the discovered patterns should lead to benefits

a. the pattern should hold true on new data

network information processing (cont.)

activation function (transfer function): in an ANN, the activation function of a neuron defines the output of that neuron given a set of inputs; defines how to pass the value from inputs through the neuron and make the output three examples of activation function: 1. threshold function 2. sigmoid function 3. rectifier function
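
The three example activation functions can be written in a few lines. Placing the threshold at 0 and the output values 0/1 are assumptions; other conventions exist.

```python
import math

def threshold(z):
    """Step function: outputs 1 once the input reaches the threshold (assumed 0)."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Smooth S-shaped curve with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def rectifier(z):
    """Passes positive inputs through unchanged and clips negatives to 0."""
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, threshold(z), sigmoid(z), rectifier(z))
```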

supervised learning

an algorithm uses training data and feedback from humans to learn the relationship of given inputs to a given output

cluster

an unsupervised learning technique that attempts to create partitions in the data according to some distance metric - divides data into different groups - finds groups that are dif from each other and whose members are similar - should be interpreted by someone knowledgeable in the organization

what is the confidence level of the association rule cereal --> milk in this data? a. 0.65 b. 0.75 c. 1.0 d. 0.57

b. 0.75

which of the following is an example of unsupervised learning? a. classification b. clustering c. regression d. none of those

b. clustering

association analysis is a/an _______________ a. supervised learning b. unsupervised learning c. reinforcement learning

b. unsupervised learning

what is the first phase of the CRISP-DM process? a. data understanding b. data collection c. business understanding d. model building

c. business understanding

______________ is a subset of machine learning that uses multi-layered artificial neural networks

deep learning

deep neural network (DNN)

deep learning differs from traditional machine learning techniques: - can automatically learn representations from data such as images, video, or text, without introducing hand-coded rules or human domain knowledge 2 most common DNNs: 1. convolutional neural network (CNN): used for image classification 2. recurrent neural network (RNN): used for natural language processing and for sequential data

data mining process

manifestation of best practices - systematic way to conduct DM projects - dif groups have dif versions most common standard processes: 1. CRISP-DM (cross-industry standard process for data mining) 2. SEMMA (sample, explore, modify, model, and assess) 3. KDD (knowledge discovery in databases)

confidence (predictability)

measurement of a rule's predictive power - confidence(X --> Y) = support(X, Y) / support(X) - tells us the proportion of transactions containing item X that also contain item Y

lift

measures how much more often the condition and the result occur together than would be expected if they were independent - lift(X --> Y) = support(X, Y) / (support(X) * support(Y)) ***lift values > 1.0 indicate that transactions containing the condition tend to contain the result more often than transactions that do not contain the condition
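
Support, confidence, and lift can all be computed from a list of transactions. The five transactions below are made up for illustration, and lift is computed as support(X, Y) divided by the product support(X) * support(Y).

```python
# Made-up transaction database (each transaction is a set of items)
transactions = [
    {"peanut butter", "jelly", "bread"},
    {"peanut butter", "bread"},
    {"bread", "milk"},
    {"peanut butter", "jelly"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the item-set."""
    count = sum(1 for t in transactions if itemset <= t)
    return count / len(transactions)

def confidence(x, y):
    """Of the transactions containing X, the fraction that also contain Y."""
    return support(x | y) / support(x)

def lift(x, y):
    """Joint support relative to what independence of X and Y would predict."""
    return support(x | y) / (support(x) * support(y))

x, y = {"peanut butter"}, {"bread"}
print(support(x | y), confidence(x, y), lift(x, y))
```

In this toy data the rule peanut butter --> bread has support 0.4, confidence of about 0.67, and lift of about 1.11, so the two items appear together slightly more often than chance alone would suggest.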

goal of k-means

minimize the within-cluster variation (ie. maximize the between-cluster variation) - defining the within-cluster variation (what it means for observations to be similar or dif): the sum of the squared deviations between each observation and the cluster centroid **cluster centroid: the mean of a variable for the observations in that cluster
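
Written as a formula, the goal looks as follows. The notation is an assumption for illustration: C_k is the set of observations in cluster k, p is the number of variables, x_ij is the value of variable j for observation i, and x̄_kj is the centroid (mean) of variable j in cluster k.

```latex
W(C_k) = \sum_{i \in C_k} \sum_{j=1}^{p} \left( x_{ij} - \bar{x}_{kj} \right)^2,
\qquad
\min_{C_1, \dots, C_K} \; \sum_{k=1}^{K} W(C_k)
```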

artificial neural networks (ANN)

a neural network is an information processing paradigm inspired by biological nervous systems, such as the human brain's information processing mechanism **biological neurons in an animal brain are connected, and each connection (a synapse) btwn neurons can transmit a signal from one to another

data transformation

normalize data: eliminate the units of measurement for data, enabling you to more easily compare data from different places - formula: z = (xi - x̄) / s, where xi = data point, x̄ = mean, s = standard deviation - dummy variables - converting numeric variables to categorical values (ex. low, medium, high)
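
A short sketch of normalization in its usual z-score form, (xi - mean) / s. Using the sample standard deviation (`statistics.stdev`) rather than the population version is an assumption, and the heights are made-up data.

```python
import statistics

def z_scores(values):
    """Subtract the mean and divide by the standard deviation to remove units."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)   # sample standard deviation (assumption)
    return [(v - mean) / std for v in values]

heights_cm = [150, 160, 170, 180, 190]   # made-up data
normalized = z_scores(heights_cm)
print(normalized)
```

After the transformation the values have mean 0 and standard deviation 1, whatever the original unit was.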

the support vector classifier allows some observations to be on the incorrect side of the separating line

true

prediction

understanding the possibility of future values based on past patterns 2 types of prediction: 1. classification: used for predicting categorical variables 2. regression: used for predicting continuous variables ***both are class of supervised learning algorithms dif. techniques: linear regression, logistic regression, decision trees, neural networks, support vector machines

Convolutional neural networks (CNNs)

use feature detectors that look at small portions of the image separately first - feature detector: small matrix used for convolution in image processing
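
The sketch below slides a small feature detector over a tiny image. The image, the 2x2 detector, and the choice of no padding with a step of one pixel are all assumptions for illustration. Like most deep learning libraries, it does not flip the detector, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, detector):
    """Slide the feature detector over the image (no padding, stride 1)."""
    dh, dw = len(detector), len(detector[0])
    out_h = len(image) - dh + 1
    out_w = len(image[0]) - dw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the small image patch by the detector and sum the result
            total = sum(
                image[i + a][j + b] * detector[a][b]
                for a in range(dh) for b in range(dw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]   # hypothetical 2x2 detector for vertical edges
feature_map = convolve2d(image, vertical_edge)
print(feature_map)
```

The feature map is largest in the middle column, which is where the image changes from dark (0) to bright (1).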

demystifying AI

there are 2 branches of AI... 1. ANI: artificial narrow intelligence (ex. smart speaker, self-driving car, web search, AI in farming and factories) 2. AGI: artificial general intelligence (do anything a human can do)

______________ is one of the CRISP-DM phases, this phase includes several tasks, such as data cleaning and transforming a. data consolidation b. data preparation c. model evaluation d. data collection

b. data preparation

association rules (cont.)

actions based on association rules... - coupons and discounts (discount one product to encourage sales of an associated product) - product placement (close together to encourage sales, far apart to force customers through store) - timing and cross-marketing (promotional mailing based on time since related purchase)

unsupervised learning: association

Apriori (association rules): - have become a popular tool for analyzing very large transactional databases in settings where the market basket is relevant - most often applied to binary-valued data, where it is referred to as "market basket" analysis (ex. for observation i, xij = 1 when item j is purchased as part of the transaction, xij = 0 if not purchased) **goal is to find joint values of the variables that appear most frequently in the database

logistic regression

appropriate regression analysis to conduct when the dependent variable is binary - outputs a probability value - if the probability value is larger than a certain threshold probability, we assign the label 1 to that case, otherwise 0 - ex. logistic regression model: log(p(x) / (1 - p(x))) = B0 + B1X1 + B2X2 **if B1 is + then increasing X1 will be associated with increasing p(x); if B1 is - then increasing X1 will be associated with decreasing p(x)
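
Solving the log-odds equation for p(x) gives p(x) = 1 / (1 + e^-(B0 + B1X1 + B2X2)). The sketch below applies that formula and a threshold; the coefficient values and the 0.5 threshold are hypothetical.

```python
import math

def predict_probability(x1, x2, b0, b1, b2):
    """Convert the linear log-odds B0 + B1*X1 + B2*X2 into a probability."""
    log_odds = b0 + b1 * x1 + b2 * x2
    return 1.0 / (1.0 + math.exp(-log_odds))

def predict_label(probability, threshold=0.5):
    """Assign label 1 when the probability exceeds the threshold, otherwise 0."""
    return 1 if probability > threshold else 0

b0, b1, b2 = -1.0, 0.5, 0.25   # hypothetical coefficients; B1 is positive
p_low = predict_probability(2.0, 0.0, b0, b1, b2)
p_high = predict_probability(4.0, 0.0, b0, b1, b2)
print(p_low, p_high, predict_label(p_low), predict_label(p_high))
```

Because B1 is positive here, raising X1 from 2 to 4 raises the predicted probability.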

which of the following is an example of unsupervised learning? a. classification b. clustering c. regression d. none of these

b. clustering

___________ is one of the three numeric measures that must be considered for an association rule; it measures its predictive power a. support b. confidence c. lift d. precision

b. confidence

regression works by... a. maximizing the distance between each data point in the dataset and the regression model b. minimizing the distance between each data point in the dataset and the regression model

b. minimizing the distance between each data point in the dataset and the regression model

assume you want to perform supervised learning and to predict number of newborns according to size of storks' population, it is an example of... a. classification b. regression c. clustering d. structural equation modeling

b. regression

let's suppose that a retailer found an association rule (peanut butter --> bread) in their database; the rule's confidence is 0.7, support is 1.0, and lift is 0.85; select the incorrect statement... a. the right-hand side of this association rule is called the result b. the retailer can think that those two items are truly associated c. a purchase involving peanut butter is accompanied by a purchase of bread 70% of the time d. every transaction in the database includes the two items

b. the retailer can think that those two items are truly associated

what is the problem of finding hidden structure in data without being given an explicit output variable (unlabeled data)? a. supervised learning b. unsupervised learning

b. unsupervised learning

the goal of which task is to predict categories? a. regression b. clustering c. classification d. association analysis

c. classification

select the INCORRECT statement about prediction that is one of the data mining tasks? a. two major types of predictions are classification and regression b. supervised learning methods can be used for classification and regression c. classification is used for predicting continuous outcome variables d. regression is used for predicting continuous outcome variables

c. classification is used for predicting continuous outcome variables

which of the following is not true regarding data mining? a. it focuses on discovering useful information b. it can use machine learning techniques c. it is one task within a process d. it is a process from start to end

c. it is one task within a process

ANN layers

can have one or more layers of neurons - neurons can be fully connected or only certain layers can be connected hidden layer: layer of neurons that takes input from the previous layer and converts those inputs into outputs for further processing

classification vs. clustering

classification: supervised predictive model that segments data by assigning them to groups that are already defined - examines already classified data and develops a predictive pattern clustering: an unsupervised learning method to segment data into groups that are NOT previously defined

K-means

clustering refers to a very broad set of techniques for finding subgroups or clusters K-means approach partitions a data set into K distinct, non-overlapping clusters **k-means algorithm is a simple iterative method to partition a given dataset into a specified number of clusters, k

data in data mining

collection of facts obtained as the result of experiences, observations, or experiments - consists of numbers, letters, words, images, voice, etc. 1. structured data: data mining algorithms 2. unstructured/semi-structured data: text mining, web mining - data --> information - vast majority of business data is stored in text documents

k-means (Cont.)

creates k groups from a set of objects so that the members of a group are more similar to one another - popular cluster analysis technique ****good clustering is one for which the within-cluster variation is as small as possible

recurrent neural network (RNN)

cyclical type of neural network - recurrent b/c perform the same task for every element of a sequence with dif inputs - have a memory which captures information about what has been calculated so far
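
A toy sketch of the recurrent idea: the same calculation is applied to every element of the sequence, and a hidden value carries information forward. The scalar weights, the tanh activation, and the starting state of 0 are assumptions; real RNNs use weight matrices learned from data.

```python
import math

def rnn_forward(sequence, w_input=0.5, w_hidden=0.8):
    hidden = 0.0   # the "memory", empty at the start (assumption)
    states = []
    for x in sequence:
        # Same task for every element: combine the new input with the memory
        hidden = math.tanh(w_input * x + w_hidden * hidden)
        states.append(hidden)
    return states

states = rnn_forward([1.0, 0.0, 0.0])
print(states)
```

Even though the second and third inputs are 0, their hidden states are not 0, because the memory still holds a trace of the first input.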

