ACC307 Chapter 7: Data Analytics and Presentation

Nominal/ Categorical

Categorize (equal, not equal): Yes. Order (greater than, less than): No. Calculate: No. Permissible statistics: mode. More examples: ZIP code, telephone number, eye color. Discrete.

Ordinal

Categorize: Yes. Order: Yes. Calculate: No. Permissible statistics: mode, median, percentile. More examples: education level, letter grade. Discrete.

Interval

Categorize: Yes. Order: Yes. Calculate: Some. Permissible statistics: mode, median, percentile, mean, standard deviation, correlation, regression. More examples: calendar date, calendar year. Continuous.

Ratio

Categorize: Yes. Order: Yes. Calculate: All. Permissible statistics: mode, median, percentile, mean, standard deviation, correlation, regression, and others. More examples: age, height. Continuous.

Interval Data

Continuous data without a meaningful zero point. Can be categorized, ordered, and calculated in most ways. Examples: temperature, Likert scale, SAT score, GMAT score, credit score. The distance between two consecutive data points is the same, so values can be added and subtracted (+/-), but without a meaningful zero point they cannot be multiplied or divided.

Ratio Data

Continuous data with a meaningful zero point. Can be categorized, ordered, and calculated in all manners. Examples: age, price, size, length.

Outliers

(Descriptive Analytics, an aspect of data structure) Observations that are radically different from the rest. There is no standard definition. They can result from measurement error, data entry error, or other causes. The main purpose of flagging them is to "call attention to data values that require additional review", not to represent the data.
Why identify outliers?
- They might be errors or fraud.
- They might be the more interesting or useful observations.
- Statistical assumptions: some statistical methods are sensitive to outliers.

Nominal (Categorical) Variables

Discrete, categorical data. Can be categorized or counted; cannot be ordered or calculated. Examples: brands, jobs, job titles, gender, cities, ID numbers. Numeric "coding" of categories (HSV = 1, Dec = 2) is only a label; the numbers themselves are meaningless. Special case of nominal: binary/dichotomous variables (only two levels of data value). With 1 = Male and 0 = Female, the mean is a proportion or probability: a mean of 0.5 means 50% male and 50% female.

Ordinal Variables

Discrete, ranked data. Can be categorized and ordered; cannot be calculated. Examples: car preferences (Tesla > Lexus > Toyota), age group ("How old are you?"), education, income group. The distances between consecutive data points are not equal.

continuous

Interval & Ratio

discrete

Nominal and Ordinal

Diagnostic Analytics

a large manufacturer of farm equipment continuously analyzes data sent from engine sensors to understand how load, temperature, and other factors influence engine failure

Predictive Analytics

a shipyard company runs a computer simulation of how a tsunami would damage its shipyards, computing damages in terms of destruction and lost production time.

descriptive analytics

a small tax services business provides its financial statements to a bank to get a loan so it can buy a new building to grow its business.

bullet graph

Adds a "bullet" or small line by each bar that indicates an important benchmark. Link data only when a logical relationship exists.

visualization (AKA "Vis" Visual Analytics/ Infographics)

Any static or dynamic representation of data.
- Visualized data is processed faster than written or tabular information.
- Visualizations are easier to use.
- Visualization supports the dominant learning style of the population, because most learners are visual learners.
Goals for visual communication: decorative, indicative, informative.

measures

The numerical values of metrics. Examples: 200 lbs; a BMI of 20.0.

Pie of Pie Chart

comparing 6-10 components

Bar of Pie Chart

comparing 6-15 components

ratio

continuous data with a meaningful zero point

interval

continuous data without a meaningful zero point

discrete metrics

Countable data. Examples: gender (F/M), on time or not on time (Y/N), number of on-time deliveries.

Nominal (categorical)

discrete and categorical data

ordinal

discrete and ranked data

histogram and boxplot

distribution (spread)

impression management

Related to earnings management: the process through which financial information preparers try to control or manipulate the impressions other people form of them through disclosures and financial graphs.

orientation

Visualizations are easier to understand if oriented appropriately. Change chart orientation when labels are inevitably long. Present or sort data meaningfully.

bar charts

For one series with a few periods. Use grouped bar charts only when "within group" comparisons are important; otherwise use line charts.

line charts

For time series with many periods, and for multiple series.

Quantity

The Goldilocks principle of containing not too much and not too little, but just the right amount of data.
- Informative titles
- Succinct but sufficient legends and axes
- Avoid information overload: divide complicated graphs into separate ones, and adjust the granularity/quantity of information
- Gridlines or data values whenever appropriate
- Multiple formats ("combo chart") only when necessary
- Use 3D only when necessary, that is, when "the third dimension carries information"

funnel charts

highlight the orders

distance

How far apart related information is presented. Removing distance aids understanding and removes other unnecessary information.

distribution

How often values in the data occur.
- Helps choose appropriate statistical methods
- Common distributions
- Benford's Law
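
Benford's Law states that in many naturally occurring datasets the leading digit d appears with probability log10(1 + 1/d). The sketch below compares that expected distribution with the observed leading digits of a list of amounts. The amounts and the helper name `first_digit_shares` are hypothetical, chosen only for illustration.

```python
import math

# Benford's Law: expected share of values whose leading digit is d.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_shares(values):
    """Observed share of each leading digit (1-9) among nonzero values."""
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v != 0]
    return {d: digits.count(d) / len(digits) for d in range(1, 10)}

# Hypothetical invoice amounts.
amounts = [120, 1450, 19, 230, 31, 178, 2600, 47, 112, 9]
observed = first_digit_shares(amounts)

for d in range(1, 10):
    print(d, round(benford[d], 3), observed[d])
```

A large gap between the observed and expected shares does not prove anything by itself; it only marks values that may deserve additional review.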

map chart

ideal to present geographic information

Metric

Used to quantify performance. Examples: lb, kg, EPS, BMI.

continuous metrics

Measured on a continuum. Examples: delivery time, package weight, purchase price.

bar chart and bullet chart

numeric and categorical

scatterplot and heatmap

numeric and categorical

doughnut chart

occasionally used to compare two pie charts. Not as good as 100% stacked column/ bar charts

pie chart and treemap

part to whole proportion

radar chart

profile 2-6 variables

indicative visual

Purpose: to provoke action. Examples: colors, animations, sounds. Principles: separate, divide, distance, contrast.

decorative visual

Purpose: to evoke feelings. Examples: shapes, symbols, colors, fonts. Principle: do not interfere with the clarity of other elements.

informative visual

Purpose: to promote understanding. Examples: headings, titles, subtitles, highlights, summary, executive summary. Principles: be clear; filter out all but relevant details; represent generalizations better than specifics.

waterfall charts

reconcile changes over time/ between numbers

scatter plots

Show the relationship between two continuous variables.

stacked area chart

show the contribution of each set to the total (i.e. values)

100% stacked area chart

show the proportion of each set to the total

measurement

the act of obtaining data associated with a metric

weighting

the amount of attention an element attracts

causation

The criteria to establish causality/causation, or the fundamental criteria to judge whether a theory fits observations:
- Covariation: association does not imply causation.
- Reciprocal causality: A causes B, and B causes A.
- Absence of plausible rival hypotheses.

ordering

the intentional arranging of visualization items to produce emphasis. present or sort data meaningfully

standard deviation

The square root of the variance.
- Easier to interpret than the variance
- A popular measure of risk or uncertainty
- Influenced by outliers? Yes
Z-score example: with a mean of 60 and a standard deviation of 15, a score of 90 has a z-score of (90 - 60) / 15 = 2.
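
The z-score arithmetic on this card can be written as a one-line function. A minimal sketch; the function name `z_score` is an illustrative choice, and the numbers are the card's own.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

# The card's example: mean 60, standard deviation 15, score 90.
z = z_score(90, 60, 15)
print(z)  # 2.0
```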

correlation

to determine the size, direction and strength of relationships between variables

box plot or box & whisker plot

to show more information of numeric variables

histogram

to show the distribution of a numeric variable

Line Chart and Area Chart

Trend evaluation; changes over time.

Column Chart/ Rotated Bar Chart/ Horizontal Bar Chart

- Categorical variable on the y-axis and numeric data on the x-axis
- Ideal when labels are long
- Excel calls these "Bar Chart"

Predictive Analytics

- Goes beyond examining the past to answer the question, "What is likely to happen in the future?"
- Builds on descriptive and diagnostic analytics
- Basic assumption: history repeats itself!

Prescriptive Analytics

An online retail company tracks past customer purchases. Based on the amount customers previously spent, the program automatically computes purchase discounts for current customer purchases to build loyalty.

bubble chart

show by size, similar to scatterplot

100% stacked bar/ Area Charts

using less space to tell better stories

variance

The mean of the squared deviations of a variable from its expected value or mean.
- Has several statistical advantages over other measures
- Influenced by outliers
- Frequently used; it is the square of the standard deviation
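
The relationship between variance and standard deviation, and the sensitivity of both to outliers, can be seen with Python's standard `statistics` module. The data values are hypothetical, and the population versions (`pvariance`, `pstdev`) are used here as an assumption, to match the "mean of the squared deviations" wording.

```python
import math
import statistics

# Hypothetical observations with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(data)  # mean of squared deviations from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

print(variance, std_dev)
print(math.isclose(std_dev ** 2, variance))  # the variance is the square of the standard deviation

# Adding one extreme value inflates the variance sharply.
variance_with_outlier = statistics.pvariance(data + [50])
print(variance_with_outlier)
```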

data deception

A graphical depiction of information, designed with or without an intent to deceive, that may create a belief about the message or its components that varies from the actual message. Principles to avoid it: keep graphics proportional, show trends from left to right, and present complete data given the context.

data mining

a process of discovering patterns involving methods at the intersection of machine learning, statistics, and database systems.

predictive analytics

an airline downloads weather data for the past 10 years to help build a model that will estimate future fuel usage for flights.

prescriptive analytics

an all you can eat restaurant uses automated conveyor belts to bring cold food to the chefs for preparation. The conveyor belts bring the food to the chefs based on algorithms that monitor the number of people entering and leaving the restaurant.

Machine Learning

an application of artificial intelligence that allows computer systems to improve and update prediction models without explicit programming

2D Pie Chart:

comparing 2-5 components

highlighting

Using colors, contrasts, callouts, labeling, fonts, arrows, and other devices that bring attention to an item.
- Highlight meaningfully.
- Use colors carefully: within a culture, colors can have natural meanings.
- Use monochrome patterns for color-blind readers.
- Gradients indicate progressions from low to high, whereas distinct colors represent categories.

Diagnostic Analytics

"Backward-looking analytics." Builds upon descriptive analytics to determine causal relationships. Question: why did this happen? Uses more contextual information and hypothesis testing.

Descriptive Analytics

"Backward looking": focuses on the past and examines data to understand it. Questions: what happened? What is happening? Examples: financial statements, scatterplots, correlations. Basic and frequently used.

Prescriptive Analytics

"Forward-looking analytics." Provides a recommendation of what should happen. Question: what should be done? Finds the optimum solution, e.g., highest profit, lowest cost, lowest risk, highest return.

Predictive Analytics

"Forward-looking analytics." Applies assumptions and focuses on predicting the future. Question: what might happen in the future? Techniques: regression, time-series analysis, data mining alternatives.

Pie Charts

(AKA circle chart) A circular statistical graphic divided into slices to illustrate numerical proportion. Explode a pie chart only when necessary.

mode

(Descriptive Analytics)
- The most frequently occurring value in the sample
- The only descriptive statistic for variables on a nominal scale
- Not influenced by outliers

Percentile

(Descriptive Analytics)
- The value of a variable below which a certain percent of observations fall
- Median = 50th percentile, Min = 0th percentile, Max = 100th percentile

overfitting

(Predictive Analytics) A model that captures too much random error or noise instead of describing the underlying relationships.

Median

(Descriptive Analytics)
- The numerical value separating the higher half of a dataset from the lower half (50% above, 50% below; the 50th percentile)
- For ratio, interval, and ordinal variables
- Not influenced by outliers

range

(descriptive analytics) -the difference between the maximum value and the minimum value in the dataset. -influenced by outliers

Quartile

(Descriptive Analytics) The three points that divide the dataset into four equal groups.
- 1st quartile: 25th percentile
- 2nd quartile: median / 50th percentile
- 3rd quartile: 75th percentile
- Not influenced by outliers
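
Quartiles can be computed with `statistics.quantiles` from the Python standard library, taking the first quartile as the 25th percentile and the third as the 75th. This is a sketch on hypothetical data. The `method="inclusive"` setting is an assumption; other interpolation conventions give slightly different cut points on small samples.

```python
import statistics

# Hypothetical sorted observations.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 returns the three cut points that split the data into four groups.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(q1, q2, q3)                     # 3.0 5.0 7.0
print(q2 == statistics.median(data))  # the 2nd quartile is the median
```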

correlation

(Descriptive Analytics) Any statistical relationship between two random variables or bivariate data. Common measure: the Pearson correlation coefficient, a measure of the linear association between two variables.
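
The Pearson correlation coefficient can be computed directly from its definition: the co-deviation of the two variables divided by the square root of the product of their squared deviations. The sketch below uses only the standard library. The function name `pearson_r` and the study-hours data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sum((x - mean_x) ** 2 for x in xs)
    dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(dev_x * dev_y)

# Hypothetical data: scores rise in a perfectly straight line with hours.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]

r_positive = pearson_r(hours, scores)        # perfect positive linear association
r_negative = pearson_r(hours, scores[::-1])  # perfect negative linear association
print(r_positive, r_negative)
```

The sign gives the direction of the relationship and the absolute value gives its strength; the coefficient measures linear association only.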

mean

(Descriptive Analytics) The average amount.
- The sum of the observations divided by the number of observations
- For interval and ratio variables
- Influenced by the values of outliers
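
The different sensitivity of the mean, median, and mode to outliers can be illustrated with the standard `statistics` module. The values below are hypothetical.

```python
import statistics

# Hypothetical observations, then the same data with one extreme value added.
typical = [40, 45, 45, 50, 55]
with_outlier = typical + [500]

for label, values in (("typical", typical), ("with outlier", with_outlier)):
    print(label,
          statistics.mean(values),    # pulled strongly toward the outlier
          statistics.median(values),  # barely moves
          statistics.mode(values))    # unchanged
```

Here the mean moves from 47 to 122.5, the median from 45 to 47.5, and the mode stays at 45.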

confirmatory modeling

(Predictive Analytics)
- Goal: to fit historical data closely
- The entire dataset is used for estimating the best-fit model, to maximize the amount of information we have about the hypothesized relationship in the population
- Might overfit: it captures all relationships in the historical dataset, including non-recurring events

predictive modeling

(Predictive Analytics) Goal: to best predict the future. Partitioned datasets are used: the training dataset is used to estimate the model, and the validation dataset is used to assess the model's performance on new, unobserved data.
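
The partitioning idea can be sketched in plain Python: estimate a model on the training partition only, then measure its error on the validation partition. Everything specific here is an assumption made for illustration: the simulated data, the 70/30 split, the straight-line model, and the helper names `fit_line` and `mean_squared_error`.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical data: y is roughly 2x + 1 plus random noise.
records = [(x, 2 * x + 1 + random.gauss(0, 1)) for x in range(50)]

# Partition the records: 70% training, 30% validation.
random.shuffle(records)
cut = len(records) * 7 // 10
training, validation = records[:cut], records[cut:]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return slope, mean_y - slope * mean_x

def mean_squared_error(points, slope, intercept):
    """Average squared prediction error of the line on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)

# Estimate on the training data only, then assess on unseen validation data.
slope, intercept = fit_line(training)
training_mse = mean_squared_error(training, slope, intercept)
validation_mse = mean_squared_error(validation, slope, intercept)
print(slope, intercept, training_mse, validation_mse)
```

A validation error far above the training error would suggest overfitting; confirmatory modeling, by contrast, would fit the line to all 50 records and have no held-out data to check against.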

Diagnostic Analytics

- Goes beyond examining "what happened" to answer the question "Why did this happen?"
- Builds on descriptive analytics, using logic and basic tests to reveal relationships and explain historical events or associations
- Can be formal or informal; the formal approach is hypothesis testing

Prescriptive Analytics

- Offers recommendations to take, or programmed actions, just as doctors recommend a substance or action
- Utilizes artificial intelligence, machine learning, and other statistical methods to make predictions
- Common technique: linear programming, an optimization technique for a system of linear constraints and a linear objective function
- Example: self-driving cars

Predictive Analytics

- Recurring events
- Non-recurring events (noise)
- Two different, conflicting goals when using historical events to predict the future: confirmatory modeling and predictive modeling

basic visualization design principles

- Simplification: making a visualization easy to interpret and understand
- Emphasis: assuring the most important message is easily identifiable
- Ethical data presentation: avoiding the intentional or unintentional use of deceptive practices that can alter the user's understanding of the data being presented

How to Identify Outliers

- Sort the records
- Examine the max and min
- Compare the mean and median
- Create scatterplots
- Perform conditional formatting
- Perform cluster analysis for complicated data
How to treat outliers: it depends on the true cause. If the outlier is a typo, correct it; if it is not, remember that keeping the outlier influences the outcomes.
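
The first three checks (sort, examine the extremes, compare the mean and median) can be sketched in a few lines of Python. The amounts are hypothetical, and a large gap between mean and median is treated here only as a prompt for further review, not as a formal test.

```python
import statistics

# Hypothetical transaction amounts; one value looks like a data entry error.
amounts = [120, 135, 128, 131, 9800, 126, 133]

# Step 1: sort the records so extreme values move to the ends.
ordered = sorted(amounts)

# Step 2: examine the minimum and maximum.
lowest, highest = ordered[0], ordered[-1]

# Step 3: compare the mean and median; a large gap hints at outliers.
mean = statistics.mean(amounts)
median = statistics.median(amounts)

print(ordered)
print(lowest, highest)
print(round(mean, 2), median)
```

In this sample the median is 131 while the mean is above 1,500, which points at the 9800 entry as the value to review.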

Bar Chart/ Bar graph/ Bar Plot/ Vertical Bar Chart

- Categorical variable on the x-axis and numeric data on the y-axis
- Ideal to show a trend over fewer than 12 periods
- Excel calls these "Column Chart"

treemaps

-nested rectangles to show the amount that each group or category contributes -used to highlight hierarchy among data elements

heatmap

-show by colors, looks like a data table but uses colors to show the magnitude of the different entries. -easily created by using conditional formatting in excel

SEC's Plain English Disclosure

1. Short Sentences 2. definite, concrete, everyday language 3. active voice 4. bullet lists 5. no legal jargon 6. no double negatives

area chart

A line chart with the areas below the lines filled with colors. Used to highlight one portion of the line.

Descriptive Analytics

A self driving car company uses artificial intelligence to help clean its historic social media data so they can analyze trends

Descriptive Analytics

Addresses the question "What happened?" Historical. Applies exploratory data analysis to:
- find mistakes in the data
- understand the structure of the data
- check the assumptions required
- determine the size, direction, and strength of relationships between variables (correlation)

Diagnostic Analytics

An accounting firm is trying to understand if its external audit fees are appropriate. They compute a regression using public data from all companies in their industry to understand the factors associated with higher audit fees.
