MGSC 291 Hendrix
Z=
(x - μ) / σ
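A minimal R sketch of the z-score formula above. The values of x, mu, and sigma are made up for illustration only.

```r
# Hypothetical values, chosen only for illustration
x <- 85      # observed value
mu <- 70     # population mean
sigma <- 10  # population standard deviation

z <- (x - mu) / sigma  # number of sds that x lies above the mean
z
```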
Pie Charts
-Qualitative -displays parts of a whole -not good when there are too many categories -NEVER make 3D or tilted -not good for comparisons
Boxplot
-displays quantitative data - works for small to large datasets -plots the five number summary -great for side-by-side comparisons
Histograms
-medium to large quantitative sets -bins touch -choice of number of bins can distort features of the shape of the distribution
Bar Graph
-qualitative data -can be horizontal or vertical -display parts of a whole or separate value
Line Graph
-quantitative data changing over time -use different lines to denote separate categories -beware of plotting different scales
Skewed left
-the tail to the left of the peak is longer than the tail to the right of the peak -mean<median
Skewed right
-the tail to the right of the peak is longer than the tail to the left of the peak -mean>median
Scatterplot
-used to depict two potentially related variables -linear, curvilinear, or no relationship -positive or negative relationship
1. Chebychev's 2. Empirical Rule
2 ways to estimate percent of observations with certain sd
0.5
=p
dim()
Check how big a set is in R
name<-read.csv(file.choose(),header=TRUE)
Code to call a data set into R (csv)
name<-read.table(file.choose(),header=TRUE)
Code to call a data set into R (text table, e.g. exported from Excel)
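A sketch of reading a CSV and checking its size without the interactive file.choose() dialog. The temporary file, the object name mydata, and the column values are assumptions made up so the example can run on its own.

```r
# Write a tiny made-up data set to a temporary CSV file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(height = c(60, 65, 70), weight = c(120, 150, 180)),
          tmp, row.names = FALSE)

# Read it back; header = TRUE treats the first row as column names
mydata <- read.csv(tmp, header = TRUE)

dim(mydata)  # number of rows, then number of columns
```

In practice the file path would come from file.choose() or be typed in directly.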
1-P(A)
Complement rule
P(AintersectB)= P(A|B)P(B)
Multiplication rule (conditional probability rearranged)
X-bar has a normal distribution
Confidence intervals work when
P(X<a)
Continuous probability P(X<=a) is the same as
Poisson Process
Discrete Probability -only one event can occur at a particular point -events occurring in one range are independent of events occurring in other ranges -the expected number of events during any such interval is constant m
Binomial Experiment
Discrete Probability n identical trials where: - each trial has only 2 outcomes - probability of "success" is a constant p for every trial -Trials are independent
np
E[X]=
1/lambda
Expected value for the exponential distribution (continuous probability)
Mean
Expected value=
Q3-Q1
IQR
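A short R sketch of the IQR on a made-up vector. Note that R's quantile() uses its own default interpolation method, so the quartiles may differ slightly from a by-hand method taught in class.

```r
# Hypothetical data, for illustration only
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

q <- quantile(x, c(0.25, 0.75))  # Q1 and Q3
iqr_by_hand <- unname(q[2] - q[1])  # Q3 - Q1

iqr_by_hand
IQR(x)  # built-in function, same default quantile method
```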
the distribution is normal
If the sample mean is normal
n
Sample size: no matter what, round up. It needs to be a whole number
Event A will not occur
P(A)=0
Event A will surely occur
P(A)=1
Event A will happen 50% of the time
P(A)=1/2
P(AintersectB)/P(B)
P(A|B)=
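A small numeric sketch of the conditional probability formula above. The probabilities are invented for illustration.

```r
# Hypothetical probabilities
p_A_and_B <- 0.12  # P(A intersect B)
p_B <- 0.40        # P(B)

p_A_given_B <- p_A_and_B / p_B  # P(A|B)
p_A_given_B

# Multiplying back recovers the intersection: P(A|B) * P(B)
p_A_given_B * p_B
```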
When events are disjoint
P(A|B)=0 P(B|A)=0
When events are independent
P(A|B)=P(A) P(B|A)=P(B)
pbinom(j,n,p)
R code for binomial experiment where P(X<=j)
dbinom(j,n,p)
R code for binomial experiment where P(X=j)
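A sketch of dbinom() and pbinom() side by side, assuming a made-up experiment of n = 10 trials with p = 0.5.

```r
# Hypothetical binomial experiment
n <- 10
p <- 0.5

p_exactly_3 <- dbinom(3, n, p)  # P(X = 3)
p_at_most_3 <- pbinom(3, n, p)  # P(X <= 3)
p_more_than_3 <- 1 - p_at_most_3  # P(X > 3), by the complement rule

# pbinom is the running total of dbinom from 0 up to j
sum(dbinom(0:3, n, p))
```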
qnorm(prob.,mu,sd)
R code for finding the percentile of a probability under a normal distribution
ppois(j,lambda)
R code for poisson process where P(X<=j)
dpois(j,lambda)
R code for poisson process where P(X=j)
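A sketch of dpois() and ppois(), assuming a made-up Poisson process with a mean of lambda = 4 events per interval.

```r
# Hypothetical mean number of events per interval
lambda <- 4

p_exactly_2 <- dpois(2, lambda)  # P(X = 2)
p_at_most_2 <- ppois(2, lambda)  # P(X <= 2)
p_more_than_2 <- 1 - p_at_most_2  # P(X > 2), by the complement rule

p_exactly_2
p_at_most_2
p_more_than_2
```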
pnorm(x,mu,sd)
R code for probabilities under a normal distribution
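A sketch showing that pnorm() and qnorm() undo each other. The mean of 100 and sd of 15 are assumptions for illustration.

```r
# Hypothetical normal distribution
mu <- 100
sigma <- 15

p_below <- pnorm(115, mu, sigma)  # P(X <= 115), about 0.841
p_between <- pnorm(115, mu, sigma) - pnorm(85, mu, sigma)  # P(85 <= X <= 115), about 0.683

# qnorm goes the other way: from a probability back to the x value
x_back <- qnorm(p_below, mu, sigma)
x_back
```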
pexp(t,lambda)
R code to compute probability under the exponential distribution
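A sketch of pexp(), assuming a made-up rate of lambda = 0.5 events per unit of time. pexp(t, lambda) gives P(X <= t), which matches the closed form 1 - exp(-lambda * t).

```r
# Hypothetical rate
lambda <- 0.5
t <- 3

p_within_t <- pexp(t, lambda)        # P(X <= 3)
closed_form <- 1 - exp(-lambda * t)  # same value from the formula

p_within_t
1 / lambda  # expected waiting time
```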
prop.test()
R code to compute the CI for the population proportion, p.
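A sketch of prop.test() with made-up counts (45 successes out of 100). By default prop.test() applies a continuity correction, so the interval may not match a by-hand p-hat plus or minus z times the standard error calculation exactly.

```r
# Hypothetical counts
successes <- 45
n <- 100

result <- prop.test(successes, n, conf.level = 0.95)
ci <- result$conf.int  # lower and upper limits for p

result$estimate  # sample proportion
ci
```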
t.test(name,conf.level=.9)
R code to give the confidence interval of 90%
t.test()
R code to give the confidence intervals of 95%
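A sketch comparing the 95% and 90% intervals from t.test() on made-up scores, with the 95% interval also rebuilt by hand as x-bar plus or minus t times s/sqrt(n).

```r
# Hypothetical sample, for illustration only
scores <- c(72, 85, 90, 66, 78, 88, 95, 70, 81, 76)

ci95 <- t.test(scores)$conf.int                     # default is 95%
ci90 <- t.test(scores, conf.level = 0.90)$conf.int  # narrower interval

# Same 95% interval by hand
n <- length(scores)
by_hand <- mean(scores) + c(-1, 1) * qt(0.975, df = n - 1) * sd(scores) / sqrt(n)

ci95
ci90
by_hand
```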
s
Sample standard deviation
Coefficient of Variation
Sd expressed as a percent of the mean s/x̅ *100 -compare variation in datasets with different units or means
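A quick R sketch of the coefficient of variation on a made-up vector.

```r
# Hypothetical data
x <- c(10, 12, 14, 16, 18)

cv <- sd(x) / mean(x) * 100  # sd as a percent of the mean
cv
```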
Big Data
The huge capacity of data warehouses
CLT
The sampling distribution of a sum or percentage will become approximately normal as the sample size gets larger
Empirical Rule
Unimodal distributions that are fairly symmetric -about 68% within 1 sd of the mean -about 95% within 2 sd of the mean -about 99.7% within 3 sd of the mean
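A sketch checking the Empirical Rule percentages against the standard normal distribution with pnorm().

```r
k <- 1:3  # number of sds from the mean

# Area under the standard normal curve within k sds of the mean
within_k <- pnorm(k) - pnorm(-k)

round(within_k * 100, 1)  # 68.3 95.4 99.7
```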
U
Union
P(A U B)= P(A)+P(B)-P(AintersectB)
Union (Addition) rule
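A small numeric sketch of the addition rule. The probabilities are invented for illustration.

```r
# Hypothetical probabilities
p_A <- 0.5
p_B <- 0.3
p_A_and_B <- 0.1

# Subtract the intersection so the overlap is not counted twice
p_A_or_B <- p_A + p_B - p_A_and_B
p_A_or_B
```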
np(1-p)
Var(X)=
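A sketch that checks the binomial shortcuts np and np(1-p) against the long-form sums over every outcome. The values n = 10 and p = 0.3 are assumptions for illustration.

```r
# Hypothetical binomial experiment
n <- 10
p <- 0.3

x <- 0:n                  # every possible number of successes
probs <- dbinom(x, n, p)  # P(X = x) for each one

mean_x <- sum(x * probs)             # long-form expected value
var_x <- sum((x - mean_x)^2 * probs) # long-form variance

c(mean_x, n * p)           # both 3
c(var_x, n * p * (1 - p))  # both 2.1
```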
mean is pulled in the direction of the outlier, median stays about the same
What happens to mean/ median when an outlier is added?
report the mean and sd
When data is symmetric
report the 5 number summary
When data skewed left/right
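A sketch of how an outlier moves the mean far more than the median, and of the five number summary in R. The data are made up.

```r
# Hypothetical symmetric data, then the same data with one large outlier
x <- c(1, 2, 3, 4, 5)
x_out <- c(x, 100)

c(mean(x), median(x))          # both 3
c(mean(x_out), median(x_out))  # mean jumps, median barely moves

# Five number summary: min, lower hinge, median, upper hinge, max
fivenum(x_out)
```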
nameofdoc$Columnname
Work with 1 column at a time in R
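A sketch of the $ operator on a made-up data frame. The names mydata, height, and weight are hypothetical.

```r
# Hypothetical data frame
mydata <- data.frame(height = c(60, 65, 70), weight = c(120, 150, 180))

mydata$height        # pull out one column as a vector
mean(mydata$height)  # then summarize it
```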
Variables
are measured in the columns
Characteristics
are measured in the rows
Union rule
at least one event happened
Descriptive statistics
collecting, organizing, and presenting the data
Data warehouse
data are recorded and stored electronically, in vast digital repositories
Larger sample size
decrease width of CI
Sampling variability
different samples from the same population may yield different values of the sample statistic
Inferential Statistics
drawing conclusions about a population based on sample data from that population.
<-
assignment operator in R (works like =)
Chebychev's Rule
for any population with a mean and sd, the percent of observations that lie within k sd of the mean (k > 1) is at least (1-(1/k^2))*100
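A sketch of the Chebychev bound as an R function. The function name chebyshev_pct is made up for this example and is not a built-in.

```r
# Hypothetical helper: minimum percent of observations within k sds of the mean
chebyshev_pct <- function(k) {
  (1 - 1 / k^2) * 100
}

chebyshev_pct(2)  # 75
chebyshev_pct(3)  # about 88.9
```

Compared with the Empirical Rule, these bounds are lower because they hold for any shape of distribution.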
Standard normal distribution
has a mean of 0 and variance of 1.
Quantitative
have a numerical value (must have units)
Disjoint events/ mutually exclusive
have no intersection
Independent
if the occurrence of one event does not affect the probability of the occurrence of the other event
Larger confidence level
increase width of CI
Qualitative
is categorical
lambda
mean (poisson)
Sampling error
minimize the difference in statistics from sample to sample
Statistic (#)
number calculated from a sample and is used to estimate the parameter
Parameter
number used to describe a population
Random sampling
reduce bias
Increase sample size
reduce variability, reduce sampling error
x̅
sample mean
Z-score
the number of standard deviations a particular score is above or below the mean (normal distribution)
Z-score
the number of standard deviations above or below average
Statistics
the study of the collection, organization, analysis, interpretation, and presentation of data
Y
the variance of a discrete random variable=
If the sample is large enough
then: -the sampling distribution of x-bar is approx. normal -the mean of the distribution is mu -the sd is (sigma)/sqrt(n)
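A simulation sketch of the sampling distribution of x-bar. The population (exponential with rate 1, so mu = 1 and sigma = 1), the sample size, the number of repetitions, and the seed are all assumptions chosen for illustration.

```r
set.seed(291)  # arbitrary seed so the run is repeatable

n <- 50       # size of each sample
reps <- 5000  # number of samples drawn
rate <- 1     # Exp(1) population: mu = 1, sigma = 1 (skewed right)

# Draw many samples and keep the mean of each one
xbars <- replicate(reps, mean(rexp(n, rate)))

mean(xbars)  # should be close to mu = 1
sd(xbars)    # should be close to sigma / sqrt(n)
1 / sqrt(n)
```

Running hist(xbars) afterwards should show a roughly bell-shaped histogram even though the population itself is skewed.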
Nominal
used only to name categories
Ordinal
variables have an order to them
Time Series
variables that are measured at regular intervals over time
Cross-sectional data
when several variables are all measured at the same time point
Biased sample
when summary characteristics of the sample differ systematically from those of the population
Events are dependent
when the given intersection % does not equal the individual %'s multiplied together