Chapter 1. What Is Business Analytics?

Measurement Scales - Ratio data

- Data that have a natural zero - Properties -- Strongest form of measurement; both ratios and differences are meaningful.

Data science

- A mix of skills in the areas of statistics, machine learning, math, programming, business, and IT.

Confidence

- A performance measure in association rules of the type "IF A and B are purchased, THEN C is also purchased." Confidence is the conditional probability that C will be purchased IF A and B are purchased. -- Also has a broader meaning in statistics (Confidence interval), concerning the degree of error in an estimate that results from selecting one sample as opposed to another.
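
As a minimal sketch of the conditional probability described above, confidence can be estimated as the share of transactions containing A and B that also contain C. The function name `confidence` and the basket data are hypothetical, invented for illustration only.

```python
def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from a list of item sets."""
    antecedent = set(antecedent)
    consequent = set(consequent)
    # Keep only the transactions that contain every antecedent item.
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        return None  # the rule never applies, so confidence is undefined
    # Of those, count the transactions that also contain the consequent.
    with_both = [t for t in with_antecedent if consequent <= set(t)]
    return len(with_both) / len(with_antecedent)


# Hypothetical baskets (not from the source).
baskets = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "C", "D"},
    {"A", "C"},
    {"B", "D"},
]

# Rule: IF A and B are purchased, THEN C is also purchased.
print(confidence(baskets, {"A", "B"}, {"C"}))
```

Here three baskets contain both A and B, and two of those also contain C, so the estimated confidence is 2/3.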

Holdout Data

- A sample of data not used in fitting a model, but instead used to assess the performance of that model. This book uses the terms "validation set" and "test set."
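
One simple way to set aside holdout data is a random split, sketched below with the standard library only. The function name `holdout_split`, the 30% fraction, and the seed are assumptions chosen for illustration, not values from the source.

```python
import random


def holdout_split(records, holdout_fraction=0.3, seed=0):
    """Randomly split records into (training, holdout) lists."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(records)    # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    # The first n_holdout shuffled records are set aside; the rest fit the model.
    return shuffled[n_holdout:], shuffled[:n_holdout]


records = list(range(10))       # stand-in for ten rows of data
training, holdout = holdout_split(records)
print(len(training), len(holdout))
```

The model would be fit on `training` only, and `holdout` would be used afterward to assess its performance.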

Algorithm

- A specific procedure used to implement a particular data mining technique: classification tree, discriminant analysis, and the like.

Predictor

- A variable, usually denoted by X, used as an input into a predictive model. -- AKA: feature, input variable*, independent variable, or from a database perspective, a field.

Response

- A variable, usually denoted by Y, which is the variable being predicted in supervised learning. -- AKA: dependent variable, output variable, target variable, or outcome variable.

Unsupervised: Association Rules

- Affinity analysis - Goal: Produce rules that define "what goes with what" at the population level - Ex: "If X was purchased, Y was also purchased." - Rows are transactions - Finds patterns between items in a large database to generate rules for an entire population.

Data mining

- Business analytics methods that go beyond counts, descriptive techniques, reporting, and methods based on business rules.

Predictive Analytics

- Classification - Prediction - Association rules (affinity analysis and collaborative filtering) - Clustering

Business Intelligence

- Data visualization and reporting for understanding "what happened and what is happening." -- Done by the use of charts, tables, and dashboards to display, examine, and explore data.

Supervised: Classification

- Goal: Predict a categorical target (outcome) variable - Ex: Purchase/no purchase, fraud/no fraud. - Each row is a case (customer, tax return) - Each column is a variable - Target variable is often binary (yes/no)

Supervised: Prediction

- Goal: Predict a numerical target (outcome) variable. - Ex: sales, revenue, performance - Each row is a case (customer, tax return) - Each column is a variable

Supervised Learning

- Goal: predict a single "target" or "outcome" variable - Methods: Prediction and classification

Unsupervised Learning

- Goal: segment data into meaningful segments; detect patterns - There is no target (outcome) variable to predict or classify - Methods: Association rules, data reduction & exploration, visualization.

Column

- In spreadsheets, each column represents a variable.

Row

- In spreadsheets, each row typically represents a record.

Challenges with big data

- Often characterized by the four V's -- Volume -- Velocity -- Variety -- Veracity

Challenges with big data - Volume

- Refers to the amount of data

Challenges with big data - Variety

- Refers to the different types of data being generated (currency, dates, numbers, text, etc.).

Challenges with big data - Veracity

- Refers to the fact that data are being generated by organic distributed processes (e.g., millions of people signing up for services or free downloads) and are not subject to the controls or quality checks that apply to data collected for a study.

Challenges with big data - Velocity

- Refers to the flow rate: the speed at which data are being generated and changed.

Observation

- The unit of analysis on which the measurements are taken. -- AKA: instance, sample, example, case, record*, pattern, or row.

Measurement Scales - Ordinal data

- Data ordered or ranked according to some relationship to one another. - Properties -- Categories can be compared with one another

Big data

- Data today are big by reference to the past, and to the methods and devices available to deal with them.

Machine learning

- Refers to algorithms that learn directly from data, especially local patterns, often in layered or iterative fashion.

Statistical models

- Refers to methods that apply global structure to the data.

Business Analytics

- The practice and art of bringing quantitative data to bear on decision making.

Overfitting

- Occurs when a model is fit so closely to the available sample of data that it describes not merely structural characteristics of that data but random peculiarities as well.
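
The sketch below illustrates the idea under stated assumptions: the data are synthetic (a made-up linear trend plus random noise), and the "model" simply memorizes the training sample by returning the outcome of the nearest training point. Every name here (`memorizer`, `make_data`, `mse`) is hypothetical. Because the model reproduces the noise in the sample exactly, its training error is zero while its error on unseen data is not.

```python
import random


def mse(model, data):
    """Mean squared error of model predictions over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def memorizer(training):
    """Return a model that predicts the y of the closest training x."""
    def predict(x):
        nearest = min(training, key=lambda pair: abs(pair[0] - x))
        return nearest[1]
    return predict


rng = random.Random(0)  # fixed seed for reproducibility


def make_data(xs):
    # Assumed structure: y = 2x, plus random noise (the "peculiarities").
    return [(x, 2 * x + rng.gauss(0, 1)) for x in xs]


training = make_data(range(0, 20, 2))  # even x values used for fitting
holdout = make_data(range(1, 20, 2))   # odd x values never seen in fitting

model = memorizer(training)
train_error = mse(model, training)
holdout_error = mse(model, holdout)
print(train_error, holdout_error)
```

A perfect score on the training sample alongside a clearly worse score on data not used in fitting is the typical symptom of overfitting.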

Measurement Scales - Categorical (nominal) data

- Data sorted into mutually exclusive categories (an observation cannot belong to more than one category) -- Ex: geographical region, type of employee - Properties -- No quantitative relationships among categories -- The only meaningful mathematical operations are counting and simple statistics.

Measurement Scales - Interval data

- Data that are ordered and characterized by a specified measure of distance between observations, but with no natural zero - Properties -- Ratios are meaningless
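
A short worked example may help show why ratios are meaningless without a natural zero. Temperature is used here as an illustrative assumption (it does not appear in the source): Celsius is an interval scale, while Kelvin has a natural zero and is a ratio scale. The helper name `celsius_to_kelvin` is hypothetical.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius reading to Kelvin (shifts the zero point)."""
    return c + 273.15


a_c, b_c = 10.0, 20.0  # two readings in degrees Celsius
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences survive the change of zero point (up to floating-point rounding).
diff_c = b_c - a_c
diff_k = b_k - a_k

# Ratios do not: 20 °C is not "twice as hot" as 10 °C.
ratio_c = b_c / a_c
ratio_k = b_k / a_k

print(diff_c, diff_k)
print(ratio_c, ratio_k)
```

The difference is 10 degrees on both scales, but the ratio is 2 in Celsius and only about 1.035 in Kelvin, so the Celsius ratio reflects the arbitrary zero point and not the quantity itself.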

