Chapter 8 Learning Question #7: What are typical data-mining applications?
What are the four reasons that big data is controversial?
1) Lack of precision in its definition (when does data become big data?) 2) It adds to excessive data collection 3) Big data is very expensive 4) It occasionally results in predictions that do not stand the test of time, are overly vague or general (e.g., left turns result in lower fuel economy), or confuse correlation with causality
What is one common unsupervised data-mining technique?
A *cluster analysis*
What is *unsupervised data mining*?
A form of data mining where analysts do not create a model or hypothesis before running the analysis. Instead, they apply the data-mining technique to the data and *observe* the results.
What is *supervised data mining*?
A form of data mining in which data miners develop a model or hypothesis *PRIOR* to the analysis and then apply statistical techniques to the data to estimate the values of the model's parameters.
What are *neural networks*?
A popular supervised data-mining technique used to predict values and make classifications, such as good prospect or poor prospect
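As a rough illustration of the "good prospect or poor prospect" classification, here is a minimal sketch of a single perceptron, the simplest building block of a neural network (a real network combines many such units). The prospect features, labels, and function names are all hypothetical and chosen only for the example.

```python
# A single perceptron: a weighted sum of the inputs followed by a threshold.
# This is a teaching sketch, not a full neural network.

def predict(weights, bias, features):
    """Return 1 (good prospect) if the weighted score is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

def train(samples, labels, epochs=20, learning_rate=1.0):
    """Perceptron learning rule: nudge the weights after each wrong prediction."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)  # -1, 0, or 1
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical training data: (feature_1, feature_2) per prospect,
# labeled 1 = good prospect, 0 = poor prospect.
prospects = [(1, 1), (2, 1), (4, 5), (5, 4)]
labels = [0, 0, 1, 1]

weights, bias = train(prospects, labels)
for features in prospects:
    outcome = "good prospect" if predict(weights, bias, features) else "poor prospect"
    print(features, outcome)
```

This is supervised because the labels are supplied before the analysis; the training step only estimates the weights.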
What is a regression analysis used to determine?
A regression analysis is used to determine the relative influence of variables on an outcome and also to predict future values of that outcome
What is *regression analysis*?
A type of supervised data mining that estimates the values of parameters in a linear equation
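To make "estimates the values of parameters in a linear equation" concrete, here is a minimal sketch of ordinary least squares for a single predictor, fitting y = intercept + slope * x. The data (advertising spend versus sales) and the function name `fit_line` are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Estimate intercept and slope of y = intercept + slope * x by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: advertising spend and resulting sales.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_line(ad_spend, sales)

# The slope indicates the influence of spend on sales;
# the fitted equation can also predict sales for a new spend value.
predicted_sales = intercept + slope * 6.0
print(intercept, slope, predicted_sales)
```

The model form (a straight line) is chosen before the analysis, which is what makes this a supervised technique.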
What is *big data*?
An imprecise term that generally refers to large volumes of a variety of data, collected over a long period of time, that are used to draw general and specific inferences and analyses; for example, about the spread of disease, customer preferences, or individual behaviours
What is a *cluster analysis*?
An unsupervised data-mining technique in which statistical techniques are used to identify groups of entities that have similar characteristics
With the unsupervised data mining method, when do analysts create hypotheses to explain the patterns found?
*AFTER* the analysis has been run
What two categories do data-mining techniques fall into?
1) *unsupervised data mining* and 2) *supervised data mining*
What is a common use for cluster analysis?
To find groups of similar customers in data about customer orders and customer demographics
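One way to picture finding groups of similar customers is k-means, a common clustering method (the notes do not say which clustering algorithm is meant, so this is an assumed choice). The sketch below uses hypothetical customer records of (age, orders per year) and a deliberately naive starting point; in practice features are usually rescaled first so that one of them does not dominate the distance.

```python
def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning each point to its
    nearest centroid and then moving each centroid to the mean of its cluster."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Squared Euclidean distance from p to each centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Recompute centroids; keep the old one if a cluster ends up empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (age, orders per year).
customers = [(22, 2), (45, 20), (25, 3), (47, 22), (23, 1), (50, 19)]

centroids, clusters = kmeans(customers, k=2)
for i, cluster in enumerate(clusters):
    print("Group", i, cluster)
```

No hypothesis about the groups is stated beforehand; the analyst inspects the resulting groups afterwards and then proposes an explanation for them.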