BIS 633 Exam 1
the big sigma sign means...
"summation of"
Formula for median with an ODD numbered list...
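Position of the median = (n + 1) / 2, where n is the number of data points; sort the list first, then take the value at that position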
**Revisit A02 #3 (hard!)
the sum of all possible probabilities is:
1.0
To operate at a 6 Sigma quality level, what is the maximum number of defects per million opportunities in long-term data?
3.4
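Empirical rule
For a normal (bell-shaped) distribution, about 68% of values fall within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD (68-95-99.7); unlike Chebyshev's Theorem, it applies ONLY to normal distributions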
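The 68-95-99.7 percentages can be checked directly from the normal distribution. This minimal sketch uses only Python's standard library and the identity P(|Z| <= k) = erf(k / sqrt(2)) for a standard normal Z; the function name `share_within` is just an illustrative choice.

```python
from math import erf, sqrt

def share_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    # For a standard normal Z, P(-k <= Z <= k) = erf(k / sqrt(2))
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

Rounded, this prints roughly 68.3%, 95.4%, and 99.7%, which is where the rule's numbers come from.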
Formula for % increase (See A02 #1)
=(n-x)/x, where n is the new value and x is the original value (format the cell as %)
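The same arithmetic outside Excel, as a small sketch; the names `pct_increase`, `new`, and `old` are hypothetical stand-ins for n and x.

```python
def pct_increase(new: float, old: float) -> float:
    """Percent increase from old (x) to new (n) as a decimal, like =(n-x)/x."""
    if old == 0:
        raise ValueError("percent increase is undefined when the original value is 0")
    return (new - old) / old

print(f"{pct_increase(120, 100):.0%}")  # 20%
```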
Formula for generating a random variable...
=*.INV(RAND(), <other parameters>), where * is the distribution's name; e.g., =NORM.INV(RAND(), 0, 1)
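Feeding RAND() into an inverse (".INV") function is inverse-transform sampling: a uniform probability is turned into a value of the target distribution. A minimal Python analogue of =NORM.INV(RAND(), mean, sd), assuming Python 3.8+ for `statistics.NormalDist`; `random_normal` is a hypothetical helper name.

```python
import random
from statistics import NormalDist

def random_normal(mean: float = 0.0, sd: float = 1.0) -> float:
    """Python analogue of =NORM.INV(RAND(), mean, sd) via inverse-transform sampling."""
    u = random.random()          # like RAND(): uniform on [0, 1)
    while u == 0.0:              # inv_cdf needs 0 < p < 1, so redraw the rare 0.0
        u = random.random()
    return NormalDist(mean, sd).inv_cdf(u)

random.seed(1)
draws = [random_normal(0, 1) for _ in range(10_000)]
print(sum(draws) / len(draws))   # should land close to the mean, 0
```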
Formula for Exponential Distribution
=EXPON.DIST(x, lambda, cumulative), where lambda is the rate = 1 / mean (NOT the mean itself)
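A sketch of what EXPON.DIST computes, assuming the standard exponential formulas: the CDF 1 - e^(-lambda*x) when cumulative is TRUE and the density lambda*e^(-lambda*x) when FALSE. The arrival example and the name `expon_dist` are hypothetical.

```python
from math import exp

def expon_dist(x: float, lam: float, cumulative: bool) -> float:
    """Sketch of Excel's =EXPON.DIST(x, lambda, cumulative); lam is the rate, 1 / mean."""
    if x < 0 or lam <= 0:
        raise ValueError("need x >= 0 and lambda > 0 (Excel returns #NUM! otherwise)")
    if cumulative:
        return 1 - exp(-lam * x)   # P(X <= x)
    return lam * exp(-lam * x)     # density f(x)

mean_minutes = 4   # hypothetical: on average one arrival every 4 minutes
print(expon_dist(2, 1 / mean_minutes, True))  # P(next arrival within 2 minutes)
```

Note the conversion 1 / mean_minutes: passing the mean (4) as lambda would give a very different answer.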
Formula in Excel to calculate RANGE
=MAX(data range) - MIN(data range)
Formula for Poisson Function...
=POISSON.DIST(x, mean, cumulative)
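A sketch of the calculation behind POISSON.DIST, assuming the standard Poisson formula P(X = k) = e^(-mean) * mean^k / k!; with cumulative TRUE it sums those probabilities from 0 through x. The help-desk numbers are hypothetical.

```python
from math import exp, factorial

def poisson_dist(x: int, mean: float, cumulative: bool) -> float:
    """Sketch of Excel's =POISSON.DIST(x, mean, cumulative) for whole-number x >= 0."""
    def pmf(k: int) -> float:
        return exp(-mean) * mean**k / factorial(k)   # P(X = k)

    if cumulative:
        return sum(pmf(k) for k in range(x + 1))     # P(X <= x), the CDF
    return pmf(x)

# hypothetical: a help desk averages 2 calls per hour
print(poisson_dist(0, 2, False))  # P(no calls in an hour)
print(poisson_dist(2, 2, True))   # P(at most 2 calls in an hour)
```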
Descriptive Model
A model of the average profit earned per product sold last month would be what category of...
Predictive Model
A model of the expected profit per product that we expect to sell next month would be what category of...
Prescriptive Model
A model that indicates what the item price should be to optimize total revenue (price * units sold) would be what category of...
Business Analytics
A process of transforming data into actions through analysis and insights in the context of organizational decision making and problem solving.
Formula for frequency (See A04 #25 solution)
=COUNTIF(range (absolute ref.), criteria (absolute ref.))
The hair color of a person (brown, black, blonde, red) would be what type of data?
Categorical (based on characteristics of the instance or observation)
Data Mining
Focused on better understanding characteristics and patterns among variables in large databases using a variety of statistical and analytical tools.
Comparing two datasets, the dataset with the larger standard deviation:
Has greater dispersion; is more spread out away from the mean.
In an inter-relationship diagram, what determines the strongest independent (input) variable?
Highest number; arrows going out
In an inter-relationship diagram, what determines the strongest dependent (output) variable?
Lowest number; arrows coming in
The order (1st, 2nd, 3rd, ...) that runners finish a distance race is what type of data?
Ordinal (indicates order or rank against a "scale")
Formula for finding Percentiles...(See A04 #30)
=PERCENTILE.INC(range, k), with k as a decimal (e.g., 0.9 for the 90th percentile)
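PERCENTILE.INC interpolates linearly between sorted values. This sketch assumes the commonly documented rule: sort the data, compute the zero-based position k * (n - 1), and interpolate between the two neighboring values. `percentile_inc` is a hypothetical name.

```python
def percentile_inc(data: list[float], k: float) -> float:
    """Sketch of Excel's =PERCENTILE.INC(range, k), with k as a decimal (0.9 = 90th)."""
    if not data or not 0 <= k <= 1:
        raise ValueError("need non-empty data and 0 <= k <= 1")
    xs = sorted(data)
    pos = k * (len(xs) - 1)          # zero-based position in the sorted list
    lo = int(pos)
    frac = pos - lo
    if lo + 1 < len(xs):
        return xs[lo] + frac * (xs[lo + 1] - xs[lo])  # interpolate between neighbors
    return xs[lo]                    # k = 1 lands exactly on the largest value

print(percentile_inc([1, 2, 3, 4, 5], 0.9))  # 90th percentile
```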
According to Six Sigma, if quality is not acceptable, what is broken?
Processes
The weight of my dog, Winston, would be what type of data?
Ratio (weight has a natural zero; interval data has only an arbitrary zero)
Formula to calculate total sales revenue with certain criteria. (See A02 #2)
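=SUMIF(range, criteria, sum_range): range is the column checked against the criteria, criteria is the cell holding the specific condition, and sum_range is the total revenue column to add up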
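What SUMIF does, sketched in Python under one simplifying assumption: only an exact-match criterion is supported (Excel also accepts conditions such as ">100" and wildcards). The region and revenue data are hypothetical.

```python
def sumif(criteria_range: list, criteria, sum_range: list[float]) -> float:
    """Sketch of =SUMIF(range, criteria, sum_range) for exact-match criteria only."""
    # Walk the two columns side by side and add sum_range values whose row matches
    return sum(s for c, s in zip(criteria_range, sum_range) if c == criteria)

regions = ["East", "West", "East", "North"]   # hypothetical criteria column
revenue = [100.0, 250.0, 75.0, 40.0]          # hypothetical total revenue column
print(sumif(regions, "East", revenue))        # 175.0
```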
Formula for median with an EVEN numbered list...
The average of the two middle values. eg. (x + y) / 2
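Both median rules (odd and even counts) fit in one small function. A sketch in Python; `median` here is a hand-rolled helper for illustration (the standard library's `statistics.median` does the same job).

```python
def median(values: list[float]) -> float:
    """Median: middle value of the SORTED list, or the average of the two middle values."""
    xs = sorted(values)              # the list must be in order first
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]               # odd n: value at position (n + 1) / 2 (1-based)
    return (xs[mid - 1] + xs[mid]) / 2   # even n: (x + y) / 2 of the two middle values

print(median([7, 1, 3]))      # 3
print(median([4, 1, 3, 2]))   # 2.5
```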
If the mean and median of a distribution are not equal, this indicates:
The distribution is not symmetric around the mean; it is skewed.
The standard deviation is an important metric of a distribution because...
The standard deviation is expressed in the same units as the variable, and can be used to estimate the percentage of population values within a range around the mean.
If I collect the last digit of the license plate numbers of several random cars in the parking lot, what is the likely distribution of the dataset of those digits?
Uniform distribution
one way to generate random samples is to use...
VLOOKUP
Formula for VLOOKUP is...
=VLOOKUP(lookup_value, table_array, col_index_num, range_lookup), where range_lookup is TRUE (approximate match) or FALSE (exact match). The column index number is the column within the table range whose value we wish to retrieve.
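A sketch of VLOOKUP's two match modes in Python. It assumes, as Excel does for TRUE, that the first column is sorted ascending; approximate match then returns the row with the largest key that is less than or equal to the lookup value. The grade table and the name `vlookup` are hypothetical.

```python
from bisect import bisect_right

def vlookup(value, table, col_index: int, approximate: bool = True):
    """Sketch of =VLOOKUP(lookup_value, table_array, col_index_num, range_lookup).

    table is a list of rows; col_index is 1-based, as in Excel.
    """
    if approximate:
        keys = [row[0] for row in table]       # first column, assumed sorted ascending
        i = bisect_right(keys, value) - 1      # last key <= value
        if i < 0:
            raise LookupError("#N/A: value is smaller than the first key")
        return table[i][col_index - 1]
    for row in table:                          # exact match: first row whose key equals value
        if row[0] == value:
            return row[col_index - 1]
    raise LookupError("#N/A: no exact match")

grades = [(0, "F"), (60, "D"), (70, "C"), (80, "B"), (90, "A")]  # hypothetical table
print(vlookup(85, grades, 2))          # B (approximate match)
print(vlookup(70, grades, 2, False))   # C (exact match)
```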
mean
average
Formula to find a standardized value, or "z-score"
by hand: z_i = (x_i - x̄) / s; in Excel: =STANDARDIZE(x, mean, standard_dev)
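The by-hand formula applied to a whole dataset. This sketch assumes s is the sample standard deviation (`statistics.stdev`, the same as Excel's STDEV.S); the data values are made up for illustration.

```python
from statistics import mean, stdev

def z_scores(data: list[float]) -> list[float]:
    """z_i = (x_i - x_bar) / s, with s the sample standard deviation."""
    x_bar = mean(data)
    s = stdev(data)
    return [(x - x_bar) / s for x in data]

print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
```

Values with |z| > 3 would be the candidate outliers worth investigating.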
examples of interval data
calendar dates, time, temperature
sample size "rule of thumb"
collect a sample 10x the number of independent variables
Use bar charts to...
compare categorical or ordinal data
a discrete probability distribution is...
lists the possible (countable) values of a discrete random variable together with the probability of each value
continuous probability distribution
describes a random variable that can take any value in an interval; probabilities are areas under the probability density function, so any single exact value has probability 0
outliers
data points that are far outside the typical values in a data set, usually more than 3 standard deviations away from the mean. INVESTIGATE them!
square nodes are...
decision variables
uniform distribution means...
every value has the same probability
Formula for Relative Frequency (See A04 #25)
frequency / sum of frequencies
NORM.INV()
function that can simulate a random variable that follows the normal distribution
bimodal distribution
distribution with two peaks (modes); its graph looks like a camel's back
ratio data
has a natural zero, so ratios are meaningful and values can be multiplied or divided by one another (e.g., money balances, quantities)
Inter-Relationship Diagram
helps to determine which variables are likely independent, and which are likely dependent on other variables
ordinal data
indicates order or rank against a scale (e.g., a Likert scale)
Sample Correlation Coefficient
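measures the strength and direction of the linear relationship between two variables, from -1 (perfect negative) to +1 (perfect positive); values near 0 indicate little linear relationship (e.g., the green, yellow, and orange arrow diagram labeled "Perfect" and "Fair")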
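A sketch of the sample (Pearson) correlation coefficient, which is what Excel's =CORREL(array1, array2) returns: the co-variation of x and y divided by the square root of the product of their individual variations. `correlation` is a hypothetical helper name.

```python
from math import sqrt

def correlation(xs: list[float], ys: list[float]) -> float:
    """Sample correlation coefficient r, between -1 and +1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # co-variation
    sxx = sum((x - x_bar) ** 2 for x in xs)                        # variation in x
    syy = sum((y - y_bar) ** 2 for y in ys)                        # variation in y
    return sxy / sqrt(sxx * syy)   # undefined (ZeroDivisionError) if either list is constant

print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly positive
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))   # perfectly negative
```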
skewness statistic
indicates whether the data is skewed to one side of the mean or the other
circular nodes are...
intermediate variables
median
middle value in the list of numbers IN ORDER, so you may need to rearrange the list!!
to determine the expected value of a discrete distribution...
multiply each possible value of the random variable by its probability, and sum the results
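The expected-value recipe written out in Python. This sketch assumes the distribution is given as a dictionary mapping each value to its probability; the fair-die example is illustrative.

```python
def expected_value(dist: dict[float, float]) -> float:
    """E[X]: multiply each value by its probability and sum the results."""
    if abs(sum(dist.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1.0")
    return sum(x * p for x, p in dist.items())

die = {x: 1 / 6 for x in range(1, 7)}   # fair six-sided die
print(expected_value(die))              # about 3.5
```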
categorical data
values are labels or categories, not measured numbers! (e.g., male, female, green, blue)
oval nodes are
outcome of interest variables
Formula for determining input (independent, x) and output (dependent, y) variables...
outgoing - incoming arrows = score. Most outgoing = input. Most incoming = output (Y)
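The scoring rule can be applied mechanically to a list of arrows. A sketch assuming each arrow is recorded as a (cause, effect) pair; the variable names in the example diagram are hypothetical.

```python
from collections import Counter

def ird_scores(arrows: list[tuple[str, str]]) -> dict[str, int]:
    """Score = outgoing arrows - incoming arrows for each variable in the diagram."""
    out_deg = Counter(cause for cause, _ in arrows)
    in_deg = Counter(effect for _, effect in arrows)
    nodes = set(out_deg) | set(in_deg)
    return {n: out_deg[n] - in_deg[n] for n in nodes}   # Counter gives 0 for missing keys

# hypothetical diagram
arrows = [("Price", "Demand"), ("Price", "Revenue"),
          ("Demand", "Revenue"), ("Advertising", "Demand")]
scores = ird_scores(arrows)
print(max(scores, key=scores.get))   # most likely input (X)
print(min(scores, key=scores.get))   # most likely output (Y)
```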
use scatter chart to
plot paired values of one variable against the other to look for relationships
histogram
presents the number of occurrences in an event by category or a range of values
What do you select for a SUMIF function?
range: the entire column to check, as an absolute reference ($); criteria: a cell holding the condition; sum_range: the entire column to add up, as an absolute reference ($) (see Quiz 2, q. 5 for an example)
relative frequency diagram
relates the % of occurrences in an event by category or range of values
Formula for Cumulative Relative Frequency (See A04 #25)
running total: the previous cumulative relative frequency + the current category's relative frequency (the last category reaches 1.0)
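Frequency, relative frequency, and cumulative relative frequency build on one another, so one loop can produce all three columns. A sketch with hypothetical survey data; `Counter` plays the role of COUNTIF here, and the sum of frequencies is taken over the listed categories.

```python
from collections import Counter

def frequency_table(data: list, categories: list) -> list[tuple]:
    """Rows of (category, frequency, relative frequency, cumulative relative frequency)."""
    counts = Counter(data)
    total = sum(counts[c] for c in categories)   # sum of frequencies
    rows, cumulative = [], 0.0
    for cat in categories:
        freq = counts[cat]                       # like =COUNTIF(range, cat)
        rel = freq / total                       # frequency / sum of frequencies
        cumulative += rel                        # running total of relative frequencies
        rows.append((cat, freq, rel, cumulative))
    return rows

ratings = [3, 5, 4, 4, 2, 5, 4, 3]   # hypothetical survey data
for row in frequency_table(ratings, [1, 2, 3, 4, 5]):
    print(row)
```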
Pareto Principle
says that typically a large percentage of a characteristic of interest is generated by a relatively small number of items, individuals, or customers (for example, 80% of wealth is owned by 20% of people)
use line charts to...
show continuous data or change over time
use pie chart to...
show percentages of a whole
use area chart to...
show percentages of a whole for continuous data
use surface chart to...
show the surface created by 3-dimensional data (cowboy hat)
use bubble chart to...
show values of a 3rd variable plotted on a scatter chart
A Venn Diagram is good for...
showing the sample space of an event pictorially
the probability density function or probability mass function...
the probability mass function gives the probability of each value of a discrete random variable; the probability density function gives the relative likelihood (density) of each value of a continuous random variable
cumulative distribution function
specifies the probability that a random variable has a value less than or equal to the value of x
central limit theorem
states that if many samples, each with a large number of observations, are taken, the arithmetic means of those samples will be approximately normally distributed, even when the population itself is not normal
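A quick simulation makes the theorem concrete. This sketch assumes a fair die as the (uniform, clearly non-normal) population, 30 observations per sample, and 2,000 repeated samples; a seeded generator keeps the run repeatable. A histogram of `sample_means` would look bell-shaped.

```python
import random
from statistics import mean, pstdev

random.seed(42)
n = 30        # observations per sample
reps = 2_000  # number of repeated samples
# Each sample: n rolls of a fair die; keep only that sample's mean
sample_means = [mean(random.randint(1, 6) for _ in range(n)) for _ in range(reps)]

print(mean(sample_means))    # near the population mean, 3.5
print(pstdev(sample_means))  # near sigma / sqrt(n) = 1.708 / sqrt(30), about 0.312
```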
common forms of data misrepresentation include:
stretching or compressing the X or Y axis, not starting the axes at 0, using counts vs. percentages, presenting only the mean or median
standard deviation of a probability distribution is found by...
squaring the difference between each value of the random variable and the expected value, multiplying each square by the probability of that value, summing the results, and then taking the square root of the sum
µ
symbol for the mean of a population
"x bar"
symbol for the mean of a sample
To calculate the variance...
take the difference between each value of the random variable and the expected value, square it, multiply by the probability of that value, and sum the results.
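The variance and standard deviation of a discrete probability distribution, following the probability-weighted recipe. A sketch assuming the distribution is a dictionary of value to probability; the two-coin example is illustrative.

```python
from math import sqrt

def dist_variance(dist: dict[float, float]) -> float:
    """Var[X] = sum of (x - E[X])^2 * f(x) over every value x."""
    ev = sum(x * p for x, p in dist.items())           # expected value first
    return sum((x - ev) ** 2 * p for x, p in dist.items())

def dist_sd(dist: dict[float, float]) -> float:
    """Standard deviation = square root of the variance."""
    return sqrt(dist_variance(dist))

heads = {0: 0.25, 1: 0.5, 2: 0.25}   # number of heads in 2 fair coin flips
print(dist_variance(heads), dist_sd(heads))   # 0.5 and about 0.707
```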
expected value
the "average" value of the random variable, found by summing each value times its probability; can be considered the "balance point" of the probability density function
sample space
the collection of all possible outcomes
range
the difference between the smallest and largest values
arrows in an influence diagram show...
the direction and type of influence of one variable on another
probability
the likelihood that an outcome will occur
standard deviation is...
a measure of variability (how spread out the values are)
Prescriptive
the model in which you vary inputs to reach a desired output (eg. optimization models)
Data Visualization
the process of displaying data in a meaningful way to provide insights that will support better decisions
experiment or trial
the process that results in an outcome
a standardized value, or "z-score" is...
the relative distance of an observation from the mean, expressed in standard deviations
the calculations for the mean of a sample or a population are...
the same; only the notation differs (µ or x bar)
standard deviation is...
the square root of the variance
mode
the value that occurs most often; if no number is repeated, then there is no mode
probability can be determined by:
theoretical deduction, empirical data, and/or subjective experience
purpose of Six Sigma
to reduce process variation; hit the target and make it more repeatable; reduce the standard deviation
Chebyshev's Theorem
used to determine the minimum share of observations you'd expect to find within k SD's of the mean (at least 1 - 1/k², for k > 1) for ANY distribution, not just normal distributions
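The Chebyshev bound is a one-line formula. A sketch in Python; `chebyshev_min_share` is a hypothetical name.

```python
def chebyshev_min_share(k: float) -> float:
    """Minimum fraction of ANY distribution within k SDs of the mean: 1 - 1/k^2."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

for k in (2, 3):
    print(f"at least {chebyshev_min_share(k):.1%} within {k} SD")
```

This prints at least 75.0% for k = 2 and at least 88.9% for k = 3. These are lower than the empirical rule's 95% and 99.7% because the bound must hold for every distribution shape.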
Poisson Function
used to model the number of events within a defined period of time
Exponential Distribution
used to model the time between events, if the number of events follows the Poisson distribution
Beta Distribution
useful for modeling the completion time of tasks in a project plan
