STATS S301 IU
mutually exclusive
two events that cannot both occur at the same time, so P(A and B) = 0
sample space
all possible outcomes of an experiment
population
all subjects of interest
inferential statistics
uses a sample to make claims or draw conclusions about a population
continuous data
can take any real-number value within an interval (usually measured rather than counted), so there are infinitely many possible values
percentiles
approximate percentage of values in the data set that fall below the value of interest
mean
arithmetic average: sum of the values divided by the number of values
pareto chart
bar chart that shows the frequency of the categories that cause quality control problems, with the categories shown in decreasing order of frequency
discrete data
values that can be counted, typically whole numbers; a finite or countable set of values
class
category
median
middle value of the sorted data; best measure of center when outliers are present
CV = (s / x̄) × 100, where s is the sample standard deviation and x̄ is the sample mean
coefficient of variation formula
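A minimal Python sketch of the CV formula on this card, assuming sample data (n - 1 in the standard deviation). The two data sets are made up for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample CV as a percentage: (s / mean) * 100."""
    s = statistics.stdev(values)    # sample standard deviation (n - 1 denominator)
    mean = statistics.mean(values)
    return s / mean * 100

# Hypothetical data sets with the same mean but different spread.
steady = [10, 11, 9, 10, 10]
erratic = [2, 18, 5, 15, 10]

print(round(coefficient_of_variation(steady), 1))   # smaller CV, more consistent
print(round(coefficient_of_variation(erratic), 1))  # larger CV, less consistent
```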
primary data
collected for your own use
nCx = n! / ((n - x)! x!)
combination formula
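A short Python check of the combination formula, written out with factorials and compared against the standard library's math.comb (Python 3.8+). The example numbers are illustrative only.

```python
import math

def combinations(n, x):
    """Number of ways to choose x items from n: n! / ((n - x)! * x!)."""
    return math.factorial(n) // (math.factorial(n - x) * math.factorial(x))

# Choosing 2 items out of 5.
print(combinations(5, 2))
# The built-in gives the same count.
print(math.comb(5, 2))
```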
empirical probability
probability estimated by conducting an experiment and observing the relative frequency with which the event occurs
secondary data
data collected by someone else that you're "borrowing", no control over how the data was collected
Information
data that are transformed into useful facts that can be used for a purpose like making a decision
relative frequency distributions
display the proportion of observations in each class relative to the total number of observations; the relative frequencies sum to 1
bar charts
good tool for displaying qualitative data that is organized in categories; can be arranged horizontally or vertically
stacked bar charts
group several values in a single column within the same category, shows totals
clustered bar charts
group several values side by side within the same category, shows comparison
range
highest value minus lowest value; one measure of variability
contingency table
table of counts of observations classified by two or more categorical variables; used to identify relationships between them
(A ∩ B)
intersection: A and B both occur
ratio data
interval data with a true zero point, so ratios between values are meaningful
median>mean
left skew
discrete probability distribution
list of all possible outcomes of a discrete random variable along with their probabilities (relative frequencies)
expected monetary value
mean of discrete probability distribution in terms of money
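A small Python sketch of expected monetary value as the probability-weighted mean of payoffs. The payoffs and probabilities are hypothetical, and the function name is just for illustration.

```python
def expected_monetary_value(outcomes):
    """Mean of a discrete distribution of payoffs: sum of payoff * probability."""
    total_probability = sum(p for _, p in outcomes)
    if abs(total_probability - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(payoff * p for payoff, p in outcomes)

# Hypothetical investment: (payoff in dollars, probability).
outcomes = [(-100, 0.5), (50, 0.3), (500, 0.2)]
print(expected_monetary_value(outcomes))
```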
sample correlation coefficient
measures both strength and direction of linear relationships between two variables
coefficient of variation
measures the standard deviation as a percentage of the mean
sample covariance
measures the direction of the linear relationship between two variables
outliers
values much higher or lower than the rest of the set; can distort the mean
<mean
negative z-score
ordinal data
categories like nominal data, but they can be ranked; differences between ranks are not meaningful
z-score
number of standard deviations a particular value is from the mean of its distribution
dependent
occurrence of one event affects the probability of the other
independent
occurrence of one event has no effect on the probability of the other; P(A|B) = P(A)
i = (P / 100) × n, where i is the position (index point) of the value of interest in the sorted data, P is the percentile, and n is the total number of data values
percentile formula
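A hedged Python sketch of using the percentile formula to locate a value. The card only gives the position formula; the rounding rule below (round up when the position is not whole, average two neighbors when it is) is an assumption borrowed from common intro-stats textbooks, and other conventions exist. The data are made up.

```python
import math

def percentile_value(values, p):
    """Value at the p-th percentile using the index point i = (p / 100) * n.

    Assumed convention (not stated on the card):
    - i not whole: round up and take the value at that position
    - i whole: average the values at positions i and i + 1
    Positions are 1-based in the sorted data.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    data = sorted(values)
    n = len(data)
    i = p * n / 100
    if i != int(i):
        position = math.ceil(i)
        return data[position - 1]
    position = int(i)
    return (data[position - 1] + data[position]) / 2

# Hypothetical data set with n = 10.
scores = [3, 7, 8, 5, 12, 14, 21, 13, 18, 10]
print(percentile_value(scores, 50))  # index point 5 is whole: average the 5th and 6th values
print(percentile_value(scores, 25))  # index point 2.5 rounds up to the 3rd value
```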
scatter plot
graph that pictures the relationship between two quantitative variables
μ (mu)
population mean symbol
N
population size symbol
σ^2 (sigma squared)
population variance symbol
index point
position of a value in the sorted data set, such as the median or a given percentile
>mean
positive z-score
conditional probability
probability of an event, given another event has occurred
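A Python sketch tying conditional probability to the independent/dependent cards, assuming the standard definition P(A|B) = P(A and B) / P(B) and a standard 52-card deck as the sample space. The helper names are hypothetical.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = [(rank, suit) for suit in SUITS for rank in RANKS]  # sample space, 52 outcomes

def prob(event):
    """Classical probability: outcomes in the event / total outcomes."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def conditional(a, b):
    """P(A|B) = P(A and B) / P(B)."""
    return prob(lambda card: a(card) and b(card)) / prob(b)

def is_ace(card):
    return card[0] == "A"

def is_heart(card):
    return card[1] == "hearts"

def is_red(card):
    return card[1] in ("hearts", "diamonds")

# Independent: knowing the card is a heart does not change the chance of an ace.
print(conditional(is_ace, is_heart), prob(is_ace))
# Dependent: knowing the card is red changes the chance of a heart.
print(conditional(is_heart, is_red), prob(is_heart))
```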
nominal data
qualitative, descriptive labels or categories with no ranking
interval data
quantitative; differences between values are meaningful, but there is no true zero point
subjective probability
probability estimated from experience and intuition rather than from data or known outcomes
ogive
line graph of cumulative relative frequencies; appears as the (red) line above the bars of a Pareto chart
symmetrical distribution
right side of histogram mirrors left side
mean>median
right skew
empirical rule
for a bell-shaped distribution, approximately 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean, respectively
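A tiny Python sketch of the empirical rule, turning a mean and standard deviation into the three intervals. The mean of 100 and standard deviation of 15 are made-up example values, and the rule only applies if the data are roughly bell-shaped.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals holding roughly 68%, 95%, and 99.7% of a bell-shaped distribution."""
    return {
        label: (mean - k * sd, mean + k * sd)
        for k, label in [(1, "68%"), (2, "95%"), (3, "99.7%")]
    }

# Hypothetical bell-shaped data: mean 100, standard deviation 15.
for label, (low, high) in empirical_rule_intervals(100, 15).items():
    print(label, low, high)
```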
rxy = sxy / (sx × sy), where sxy is the sample covariance and sx and sy are the sample standard deviations of x and y
sample correlation coefficient formula
rxy
sample correlation coefficient symbol
sxy = Σ (x − x̄)(y − ȳ) / (n − 1): the sum of the products of (x value minus x mean) and (y value minus y mean), all divided by n − 1
sample covariance formula
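A Python sketch of the sample covariance and sample correlation coefficient formulas from the cards above, written from scratch so each step is visible. The hours/scores data are made up for illustration.

```python
import math

def sample_covariance(xs, ys):
    """sxy: sum of (x - x mean) * (y - y mean), divided by n - 1."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """rxy: sample covariance divided by (x std dev * y std dev)."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

print(sample_covariance(hours, scores))              # positive sign: the variables move together
print(round(sample_correlation(hours, scores), 3))   # close to 1: strong positive linear relationship
```

Covariance only gives direction because its size depends on the units; dividing by both standard deviations rescales it to between -1 and 1, which is why the correlation coefficient also measures strength.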
sxy
sample covariance symbol
x̄ (x with a line over it, read as x-bar)
sample mean symbol
s^2
sample variance symbol
frequency distribution
shows the number of data observations in specific intervals
central tendency
single value to describe the center point
more, less
smaller CV means ____ consistency within a set of values, larger CV means ____ consistency
variance
measure of the spread of data points around the mean, based on squared deviations from the mean
square root of variance = ____
standard deviation
s
standard deviation symbol
sample
subset of population
descriptive statistics
summarize/display data
n
total number of data values in sample
cumulative relative frequency distribution
totals the proportion of observations that are less than or equal to the class you're looking at
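A short Python sketch of building relative and cumulative relative frequencies from class counts. The counts are hypothetical.

```python
from itertools import accumulate

def relative_frequencies(counts):
    """Proportion of observations in each class; the proportions sum to 1."""
    total = sum(counts)
    return [count / total for count in counts]

def cumulative_relative_frequencies(counts):
    """Running total of the relative frequencies, class by class."""
    return list(accumulate(relative_frequencies(counts)))

# Hypothetical frequency distribution: observations per class, in class order.
counts = [4, 6, 8, 2]
print(relative_frequencies(counts))
print(cumulative_relative_frequencies(counts))  # last entry should be 1 (up to rounding)
```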
(A ∪ B)
union: A or B (or both) occurs
pie charts
used when comparing relative sizes of classes to each other
x
value in sample
mode
value that appears most often; best describes central tendency for categorical data
cross-sectional data
values collected from multiple subjects during a SINGLE time period
time series data
values that correspond to specific measurements taken over a RANGE of periods
classical probability
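used when we know the number of possible outcomes and they are equally likely; P(A) = number of outcomes in A divided by total number of outcomes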
law of large numbers
as an experiment is repeated more and more times, the empirical probability gets closer to the classical probability
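A quick Python simulation of the law of large numbers, assuming a fair coin (classical probability of heads = 0.5). The seed is fixed only so the run is repeatable; the exact printed values depend on it.

```python
import random

def empirical_probability_of_heads(flips, seed=301):
    """Simulate fair coin flips and return the observed proportion of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips

# The observed proportion should drift toward 0.5 as the number of flips grows.
for flips in (10, 1_000, 100_000):
    print(flips, empirical_probability_of_heads(flips))
```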
when the mean and median are roughly equal
when is data symmetrical
z = (x value − mean) / standard deviation
z-score formula
=mean
z-score of 0
z < -3 or z > 3
z-score that shows outlier
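A minimal Python sketch of the z-score cards: the formula, the sign relative to the mean, and the outlier check. The cutoff of 3 standard deviations follows the outlier card; the mean of 70 and standard deviation of 10 are made-up values.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x is from the mean."""
    return (x - mean) / sd

def is_outlier(x, mean, sd, cutoff=3):
    """Flag values more than `cutoff` standard deviations from the mean."""
    return abs(z_score(x, mean, sd)) > cutoff

# Hypothetical distribution: mean 70, standard deviation 10.
print(z_score(85, 70, 10))      # above the mean: positive z-score
print(z_score(55, 70, 10))      # below the mean: negative z-score
print(z_score(70, 70, 10))      # equal to the mean: z-score of 0
print(is_outlier(105, 70, 10))  # 3.5 standard deviations above the mean
```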