Analytics CH 3

diagnostic analytics

Procedures that explore the current data to determine why something has happened the way it has, typically by comparing the data to a benchmark. For example, these allow users to drill down into the data and see how it compares to a budget, a competitor, or a trend.

pruning

removes branches from a decision tree to avoid overfitting the model

decision support systems

rule-based systems that gather data and recommend actions based on the input.

summary statistics, data reduction or filtering

types of descriptive analytics

profiling, clustering, similarity matching, co-occurrence grouping

types of diagnostic analytics

regression, classification, link prediction

types of predictive analytics

decision trees

used to divide data into smaller groups.

data reduction or filtering

used to reduce the number of observations to focus on relevant items (e.g., highest cost, highest risk, largest impact). It does this by taking a large set of data (perhaps the population) and reducing it to a smaller set that retains the vast majority of the critical information of the larger set.

linear classifiers

useful for ranking items rather than simply predicting class probability, and for identifying the most important values, such as valuable customers or the transactions most likely to be fraudulent.

similarity matching

a grouping technique used to identify similar individuals based on data known about them.

goal of classification

to predict whether an individual will belong to one class or another

causal modeling

A data approach similar to regression, but used when it is hypothesized that the independent variables cause or are associated with the dependent variable.

support vector machine

A discriminating classifier defined by a separating hyperplane; it works by first finding the widest margin (or biggest pipe) between the classes.

XBRL (eXtensible Business Reporting Language)

A global standard for exchanging financial reporting information that uses XML.

decision support system

An information system that supports decision-making activity within a business by combining data and expertise to solve problems and perform calculations

benford's law

An observation about the frequency of leading digits in many real-life sets of numerical data. The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small.
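
The expected frequencies behind this card follow the standard Benford formula, P(d) = log10(1 + 1/d) for leading digits d = 1 to 9. The sketch below compares those expected shares with the observed shares in a small list of amounts. The amounts and the helper names (`benford_expected`, `leading_digit`, `leading_digit_shares`) are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def benford_expected(digit: int) -> float:
    """Expected share of numbers whose leading digit is `digit` (1-9)."""
    return math.log10(1 + 1 / digit)

def leading_digit(value: float) -> int:
    """First non-zero digit of a non-zero number, ignoring the sign."""
    # Scientific notation always starts with the leading significant digit.
    return int(f"{abs(value):.15e}"[0])

def leading_digit_shares(values):
    """Observed share of each leading digit 1-9 (zero values are skipped)."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Hypothetical invoice amounts, for illustration only.
amounts = [120.50, 1893.00, 17.25, 245.10, 1320.00, 98.40, 310.00, 1045.75]
observed = leading_digit_shares(amounts)

for d in range(1, 10):
    print(d, round(benford_expected(d), 3), round(observed[d], 3))
```

A sample this small will not track the expected curve closely; in practice the comparison is only meaningful on a large set of numbers.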

unsupervised approach

Approach used for data exploration looking for potential patterns of interest

supervised approach/method

Approach used to learn more about the basic relationships between independent and dependent variables that are hypothesized to exist.

prescriptive analytics

Procedures that model data to enable recommendations for what should be done in the future. These typically include developing more advanced machine learning and artificial intelligence models to recommend a course of action based on a current problem.

descriptive analytics

Procedures that summarize existing data to determine what has happened in the past. Some examples include summary statistics (e.g. Count, Min, Max, Average, Median), distributions, and proportions.

predictive analytics

Procedures used to generate a model that can be used to determine what is likely to happen in the future. Examples include regression analysis, forecasting, classification, and other predictive modeling.

clustering algorithms

calculate the distances between observations and group the elements that are closest to one another
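
As a rough illustration, here is a minimal sketch of k-means, one common clustering algorithm (chosen here as an example, not necessarily the one this card has in mind). It works on one-dimensional values; the data, the starting centroids, and the function name `k_means_1d` are assumptions made for the example.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group values around the nearest centroid, then re-center, repeatedly."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assign each value to the centroid at the minimum distance.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            sum(group) / len(group) if group else centroids[i]
            for i, group in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical observations and starting centroids.
values = [1, 2, 3, 10, 11, 12]
centroids, clusters = k_means_1d(values, centroids=[0, 5])
print(centroids, clusters)
```

The result depends on the number of centroids and where they start, so real analyses usually try several starting points.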

structured data

data that are stored in a database or spreadsheet and are readily searchable.

f(independent variable)

dependent variable =

summary statistics

describe a set of data in terms of their location (mean, median), range (standard deviation, minimum, maximum), shape (quartile), and size (count).
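
A minimal sketch of these measures using Python's standard `statistics` module. The data set `invoice_totals` and the function name `summarize` are hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def summarize(values):
    """Location, range, shape, and size of a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "count": len(values),                   # size
        "mean": statistics.mean(values),        # location
        "median": statistics.median(values),    # location
        "std_dev": statistics.stdev(values),    # range (sample standard deviation)
        "min": min(values),                     # range
        "max": max(values),                     # range
        "quartiles": (q1, q2, q3),              # shape
    }

# Hypothetical invoice totals, for illustration only.
invoice_totals = [10, 20, 30, 40, 50]
summary = summarize(invoice_totals)
for name, value in summary.items():
    print(name, value)
```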

co-occurrence grouping

discovers associations between individuals based on common events, such as transactions they are involved in.

regression

estimates or predicts the numerical value of a dependent variable based on the slope and intercept of a line and the value of an independent variable.
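
A small sketch of simple linear regression by ordinary least squares, assuming a single independent variable. The function names `fit_line` and `predict` and the advertising/sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Predicted dependent variable for a given independent variable."""
    return intercept + slope * x

# Hypothetical data: advertising spend (independent) and sales (dependent).
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(ad_spend, sales)
forecast = predict(slope, intercept, 6)
print(slope, intercept, forecast)
```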

training data

existing data that have been manually evaluated and assigned a class.

test data

existing data used to evaluate the model

clustering

helps identify groups of individuals (such as customers) that share common underlying characteristics—in other words, identifying groups of similar data elements and the underlying drivers of those groups.

profiling

identifies the "typical" behavior of an individual, group, or population by compiling summary statistics about the data (including mean, standard deviations, etc.) and comparing individuals to the population

machine learning and artificial intelligence

learning models or intelligent agents that adapt to new external data to recommend a course of action.

fuzzy matching

locates approximate matches. useful for identifying relationships in imperfect data
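
One possible sketch of fuzzy matching, using `difflib.SequenceMatcher` from the Python standard library to score how similar two strings are. The names, the 0.8 threshold, and the helper functions `similarity` and `best_match` are assumptions for illustration.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity score between 0 and 1, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def best_match(name, candidates, threshold=0.8):
    """Closest candidate to `name`, or None if nothing scores above the threshold."""
    if not candidates:
        return None
    score, candidate = max((similarity(name, c), c) for c in candidates)
    return candidate if score >= threshold else None

# Hypothetical vendor master list with imperfect data entry.
vendors = ["John Smith", "Jane Doe", "Acme Ltd"]
print(best_match("Jon Smith", vendors))
print(best_match("Zebra", vendors))
```

The threshold is a judgment call: a lower value finds more approximate matches but also more false ones.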

decision boundaries

mark the split between one class and another.

overfitting

models that fit the existing data too closely. They look highly accurate on the data used to build them but are actually poor at predicting future observations.

classification

predicts a class or category for a new observation based on the manual identification of classes from previous observations.

link prediction

predicts a relationship between two data items, such as members of a social media platform.

1. identify the classes you wish to predict
2. manually classify an existing set of records
3. select a set of classification models
4. divide your data into training and testing sets
5. generate your model
6. interpret the results and select the "best" model

steps for classification
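
The steps for classification can be sketched end to end on a toy problem. Everything here is an assumption for illustration: the labeled amounts, the two candidate models (a majority-class baseline and a single-cutoff rule), and the fixed every-third-record split, which stands in for the random split normally used.

```python
from collections import Counter

# Steps 1-2: classes ("ok" / "fraud") manually assigned to hypothetical records.
labeled = [
    (50, "ok"), (75, "ok"), (120, "ok"), (200, "ok"), (310, "ok"), (450, "ok"),
    (900, "fraud"), (1200, "fraud"), (1500, "fraud"), (2000, "fraud"),
    (640, "ok"), (1100, "fraud"),
]

# Step 4: divide into training and testing sets (every third record is held out).
test = labeled[::3]
train = [row for i, row in enumerate(labeled) if i % 3 != 0]

def accuracy(model, rows):
    """Share of rows whose predicted class equals the assigned class."""
    return sum(model(x) == y for x, y in rows) / len(rows)

# Step 3: two candidate models.
def fit_baseline(rows):
    """Always predict the most common class in the training data."""
    majority = Counter(y for _, y in rows).most_common(1)[0][0]
    return lambda x: majority

def fit_threshold(rows):
    """Pick the amount cutoff that best separates the classes in training."""
    amounts = sorted(x for x, _ in rows)
    cutoffs = [(a + b) / 2 for a, b in zip(amounts, amounts[1:])]
    best = max(cutoffs, key=lambda c: accuracy(lambda x: "fraud" if x > c else "ok", rows))
    return lambda x: "fraud" if x > best else "ok"

# Step 5: generate each model from the training data only.
models = {"baseline": fit_baseline(train), "threshold": fit_threshold(train)}

# Step 6: compare the models on the test data and select the "best" one.
results = {name: accuracy(model, test) for name, model in models.items()}
best_name = max(results, key=results.get)
print(results, best_name)
```

Scoring on held-out test data, rather than on the training data, is what guards against choosing an overfitted model.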

1. identify the attribute to reduce/focus on
2. filter the results
3. interpret the results
4. follow up on the results

steps for data reduction
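
A minimal sketch of the first two data reduction steps: choose an attribute, then filter down to the largest items. The transaction records, the `amount` attribute, and the function name `reduce_records` are hypothetical.

```python
def reduce_records(records, attribute, top_n):
    """Keep only the `top_n` records with the largest value of `attribute`."""
    return sorted(records, key=lambda r: r[attribute], reverse=True)[:top_n]

# Hypothetical transactions, for illustration only.
transactions = [
    {"id": "T1", "amount": 120.0},
    {"id": "T2", "amount": 9800.0},
    {"id": "T3", "amount": 45.5},
    {"id": "T4", "amount": 15250.0},
    {"id": "T5", "amount": 610.0},
]

# Step 1: focus on "amount". Step 2: filter to the two largest transactions.
largest = reduce_records(transactions, "amount", top_n=2)

# Step 3 (interpretation aid): how much of the total value the reduced set covers.
coverage = sum(r["amount"] for r in largest) / sum(r["amount"] for r in transactions)
print([r["id"] for r in largest], round(coverage, 3))
```

Here two of five records carry most of the total value, which is the sense in which a smaller set can hold most of the critical information.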

1. identify objects or activity to profile
2. determine types of profiling to perform
3. set boundaries/thresholds for the activity
4. interpret the results and monitor activity and/or generate a list of exceptions
5. follow up on exceptions

steps for profiling
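
One way to sketch the profiling steps is to compare each individual to the population with a z-score and list those beyond a threshold as exceptions. The expense figures, the threshold of 2 standard deviations, and the function name `flag_exceptions` are assumptions for illustration.

```python
import statistics

def flag_exceptions(observations, threshold=2.0):
    """Return {name: z-score} for individuals far from the population mean."""
    values = list(observations.values())
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # sample standard deviation
    flagged = {}
    for name, value in observations.items():
        z_score = (value - mean) / std_dev
        if abs(z_score) > threshold:  # outside the boundary set for the activity
            flagged[name] = round(z_score, 2)
    return flagged

# Hypothetical monthly expense claims per employee.
claims = {
    "E01": 410, "E02": 395, "E03": 430, "E04": 405,
    "E05": 420, "E06": 400, "E07": 415, "E08": 1250,
}

exceptions = flag_exceptions(claims)
print(exceptions)
```

The flagged list is only a starting point; the final profiling step, following up on exceptions, still requires human review.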

1. identify the variables that might predict the outcome
2. determine the functional form of the relationship
3. identify the parameters of the model

steps for regression

