STAT 302 - Exam 1
~
"has the distribution" so X ~ N(u) for example
Categorical Variable
- Based on groupings - Which one?
Quantitative Variable
- Based on numbers in which a mean makes sense - How many? How much?
Statistic Types Used for Quantitative Variables
- Center (mean/median) - Spread (range, standard deviation, IQR)
Charts for Quantitative Variables
- Dot plots - Histograms - Box plots
Statistic Types Used for Categorical Variables
- Frequency - Proportion - Percentage
Describing the distribution of quantitative variables
- Shape (modality/skewness) - Center (mean/median) - Spread (range) - Abnormalities (such as outliers)
Correlation, r
- Values between -1 and 1 - r = 0 means no linear correlation - Not resistant to outliers
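A minimal sketch of how r can be computed, and of why it is not resistant to outliers. The helper name `pearson_r` and the data are made up for illustration; they are not from the course notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation from sums of deviation products (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data: y rises perfectly with x, so r is 1
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_clean = pearson_r(xs, ys)

# One extreme point is enough to flip the sign of r
r_outlier = pearson_r(xs + [6], ys + [-20])

print(r_clean, r_outlier)
```

A single added point moves r from a perfect positive value to a negative one, which is what "not resistant" means in practice.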
Moderate r values
0.3 - 0.7 (in absolute value)
Strong r values
0.7+ (in absolute value)
1.5 X IQR Criteria
An observation is a potential outlier if it is more than 1.5 x IQR below Q1 or more than 1.5 x IQR above Q3
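The criterion can be sketched as follows. This assumes quartiles are found as medians of the lower and upper halves (leaving out the overall median when the count is odd), which is only one of several conventions. The function names and the data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Skip the middle value when n is odd so neither half includes the median
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

def iqr_outliers(values):
    """Values more than 1.5 x IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [2, 4, 5, 6, 7, 8, 9, 30]   # made-up sample
print(quartiles(data))      # (4.5, 6.5, 8.5)
print(iqr_outliers(data))   # [30]
```

Here IQR = 8.5 - 4.5 = 4, so the fences are -1.5 and 14.5, and only 30 falls outside them.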
Summary Statistic
A calculation for a group of data, such as a total, an average, or a count
Random Variable
A numerical description of the outcome of an experiment - X, Y refer to the random variable itself (weight) - x, y refer to values taken by the random variable (152 lbs)
Random Experiment
A situation involving chance that leads to an outcome
Resistant
A statistic that is not strongly influenced by outliers
Influential
A statistic that is strongly influenced by outliers
Confounding Variable
Associated with both the response and explanatory variable, so its effect on the response cannot be separated from the effect of the explanatory variable
Simpson's Paradox
Associations between variables are reversed when different categories are combined
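A small numeric sketch of the reversal. The counts below are illustrative values assumed for this example (successes out of total for two treatments, split by a grouping variable); they do not come from the course notes.

```python
# (successes, total) per treatment and group; illustrative counts only
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, total):
    return successes / total

def combined_rate(groups):
    """Success rate after the groups are combined into one."""
    successes = sum(s for s, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return successes / total

# A has the higher success rate inside every group...
a_wins_each_group = all(
    rate(*counts["A"][g]) > rate(*counts["B"][g]) for g in ("small", "large")
)
# ...but B has the higher rate once the groups are combined
b_wins_combined = combined_rate(counts["B"]) > combined_rate(counts["A"])

print(a_wins_each_group, b_wins_combined)
```

The reversal happens because the two treatments are applied to the groups in very different proportions, so the grouping variable acts as a confounder.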
Mean
Average of a sample obtained by dividing the sum of all values by the number of values obtained - Not resistant to outliers
Statistic
Calculated, numerical value of the sample
Bar Chart
Categorized, order doesn't matter
Response Variable
Dependent Variable - Y axis
Deviation
Difference between an observation and the mean: x - x̄
Population
Entire group of people that the researcher is interested in
Association
Exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable
Proportion
Frequency divided by total number of individuals
Sample
Group of people we collect data on
Correlation
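Strength and direction of the linear relationship between two quantitative variables - How close the dots on a scatterplot are to a straight line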
Explanatory Variable
Independent Variable, explains why the response variable is the way it is - X axis
Random
Individual outcomes are uncertain, but there is nonetheless a regular distribution of outcomes in a large number of repetitions
Inferential Statistics
Inferences about a population made from the sample data collected
IQR
Inter-Quartile Range - Chop off the bottom 25% and the top 25% of values - Range of the middle 50% of all data values - IQR = Q3 - Q1 - Resistant to outliers
Weak r values
Less than 0.3 (in absolute value)
Range
Maximum - minimum - Worst measure of spread - Not resistant to outliers
If the shape of the distribution is skewed left...
Mean < Median
If the shape of the distribution is symmetric...
Mean = Median (or very close)
If the shape of the distribution is skewed right...
Mean > Median
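A quick check of the skew rules above, using made-up data sets. The long tail pulls the mean toward it while the median barely moves.

```python
from statistics import mean, median

# Made-up data with a long right tail (one large value)
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirror image of the same data, giving a long left tail
left_skewed = [21 - x for x in right_skewed]

print(mean(right_skewed), median(right_skewed))  # mean above median
print(mean(left_skewed), median(left_skewed))    # mean below median
```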
Not resistant to outliers
Mean, range, standard deviation
Q2
Median of the entire distribution - Middle value of all the data
Q1
Median of first half of values - Left of the overall median
Q3
Median of second half of values - Right of the overall median
Resistant to outliers
Median, IQR
Median
Middle value of a sample - Resistant to outliers
Independent
No relationship between explanatory and response variable
Normal Distribution
Normal, bell-shaped curve - Continuous random variable - Total area under the curve = 1 - Centered at the mean u (u = 0 for the standard normal distribution)
Frequency
Number of individuals in a category
Z-score
Number of standard deviations an observed measurement x is from the mean - Above the mean: positive - Below the mean: negative - "Observation minus mean over standard deviation": z = (x - mean) / standard deviation
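"Observation minus mean over standard deviation" as a one-line function. The mean of 140 lbs and standard deviation of 8 lbs are assumed values chosen for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

# Hypothetical weights with mean 140 lbs and standard deviation 8 lbs
print(z_score(152, 140, 8))  # 1.5, above the mean
print(z_score(128, 140, 8))  # -1.5, below the mean
```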
Parameter
Numerical summary of the POPULATION
Descriptive Statistics
Numerical summary of the SAMPLE
Positive Deviation
Observation is above average
Negative Deviation
Observation is below average
Rule of Multiplication
P(AnB) = P(A) x P(B|A) - Probability that A and B both occur
Rule of Addition
P(AuB) = P(A) + P(B) - P(AnB) - Probability that either event occurs
Conditional Probability
P(A|B) - The probability that A occurs given that B has occurred
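The multiplication and addition rules can be checked by brute force on a small sample space. This sketch assumes one roll of a fair six-sided die; the events A and B are made up for the example.

```python
from fractions import Fraction

outcomes = set(range(1, 7))   # one roll of a fair die
A = {2, 4, 6}                 # roll is even
B = {4, 5, 6}                 # roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

def cond_prob(event, given):
    """P(event | given): restrict the sample space to `given`."""
    return Fraction(len(event & given), len(given))

# Rule of Addition: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(A) + prob(B) - prob(A & B)

# Rule of Multiplication: P(A and B) = P(A) x P(B|A)
p_both = prob(A) * cond_prob(B, A)

print(p_union, prob(A | B))   # both 2/3
print(p_both, prob(A & B))    # both 1/3
```

In the code, `A & B` and `A | B` are Python set intersection and union, matching the "both occur" and "either occurs" events in the rules.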
Bayes's Formula
P(A|B) = [P(B|A) x P(A)] / P(B) - Probability of an event, based on conditions that might be related to the event - Helps us find P(A|B) given P(B|A)
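A worked sketch of the formula with assumed numbers: P(A) = 1/100, P(B|A) = 9/10, and P(B|A') = 5/100. Computing P(B) by splitting it into the "A" and "not A" cases is an extra step the card does not state.

```python
from fractions import Fraction

# Assumed inputs, chosen only for illustration
p_a = Fraction(1, 100)             # P(A)
p_b_given_a = Fraction(9, 10)      # P(B|A)
p_b_given_not_a = Fraction(5, 100) # P(B|A')

# P(B) = P(B|A) x P(A) + P(B|A') x P(A')
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes's Formula: P(A|B) = P(B|A) x P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(p_b, p_a_given_b, float(p_a_given_b))
```

Even though P(B|A) is 0.9 here, P(A|B) comes out to 2/13 (about 0.15), because A is rare to begin with.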
u
Population mean (a parameter) - The Greek letter mu
M
Population median (a parameter)
What types of charts can you use for categorical variables?
Pie charts and bar charts
Continuous Random Variable
Possible values form an interval rather than a set of separate values
Histogram
Quantitative (use numbers), order matters
Conditional Distribution
Distribution of one variable within a single category of the other variable - Referring to one specific row or column of a table (in the picture, it would be one of the "Neither Disagree or Agree" columns/rows)
Marginal Distribution
Referring to the margins of a table, usually the calculated totals of rows/columns (in the picture, the totals)
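To make the difference between marginal and conditional distributions concrete, here is a sketch on a made-up two-way table. The row labels, column labels, and counts are all hypothetical and are not the table from the picture.

```python
# Hypothetical two-way table of counts: class year by survey response
table = {
    "Freshman": {"Agree": 30, "Neither": 10, "Disagree": 10},
    "Senior":   {"Agree": 20, "Neither": 5,  "Disagree": 25},
}

row_totals = {year: sum(row.values()) for year, row in table.items()}
grand_total = sum(row_totals.values())

# Marginal distribution of class year: row totals over the grand total
marginal_year = {year: total / grand_total for year, total in row_totals.items()}

# Conditional distribution of response given "Senior": one row over its own total
conditional_given_senior = {
    response: count / row_totals["Senior"]
    for response, count in table["Senior"].items()
}

print(marginal_year)
print(conditional_given_senior)
```

The marginal distribution uses only the totals in the margins, while the conditional distribution looks inside a single row and divides by that row's total.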
X̄ (X-bar)
Sample mean
M̂ (M-hat)
Sample median
Discrete Random Variable
Set of separate values (0, 1, 2, etc.) - Find the probability for any event by adding the probabilities of the individual outcomes for that event
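A sketch of "adding the probabilities of the individual outcomes", assuming X is the number of heads in two flips of a fair coin. The helper name `event_probability` is made up for this example.

```python
from fractions import Fraction

# Probability distribution of X = number of heads in two fair coin flips
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def event_probability(pmf, event):
    """Add up the probabilities of the outcomes that belong to the event."""
    return sum(p for x, p in pmf.items() if event(x))

# Event "at least one head" is made of the outcomes 1 and 2
p_at_least_one = event_probability(pmf, lambda x: x >= 1)
print(p_at_least_one)  # 3/4
```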
Distribution
Shows all possible values of a variable and how often each value occurs
SRS
Simple Random Sample - Set of individuals chosen from a larger population so that every individual has an equal chance of being selected
Probability Distribution
Specifies values and their probabilities for a random variable
Variance
Square of the standard deviation
p-value
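Probability, assuming the null hypothesis is true, of getting a result at least as extreme as the one observed - A small p-value is evidence against the null hypothesis - 1 minus the p-value is NOT a measure of how accurate a claim is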
Complement
The event that A does not occur - A' - P(A') = 1 - P(A)
Law of Large Numbers
The larger the number of individuals that are randomly drawn from a population, the closer the sample mean or proportion tends to be to the population value, so the more representative the resulting group will be of the entire population
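A small simulation sketch of this idea, assuming a fair coin so the true proportion of heads is 0.5. The seed and the sample sizes are arbitrary choices, and the function name is hypothetical.

```python
import random

def proportion_of_heads(n_flips, rng):
    """Simulate n_flips fair coin flips and return the share that are heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(302)  # fixed seed so repeated runs give the same output
for n in (10, 100, 10_000, 100_000):
    # The proportion tends to settle near 0.5 as n grows
    print(n, proportion_of_heads(n, rng))
```

Small samples can land far from 0.5 by chance; the large samples typically land very close to it.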
Probability
The likelihood that a particular event will occur - Before the event has occurred
Mutually Exclusive / Disjoint Events
Two events that cannot occur at the same time
Discrete Variable
Type of Quantitative Variable - Separate, countable values - How many?
Continuous Variable
Type of Quantitative Variable - Numerical values over an interval - How much?
Standard Deviation
Typical distance of an observation from the mean - Not resistant to outliers
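A sketch tying together deviation, variance, and standard deviation. It assumes the sample version of the formula (dividing by n - 1), which the cards do not specify, and uses made-up data.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation: square root of the sample variance."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]            # observation minus mean
    variance = sum(d ** 2 for d in deviations) / (n - 1)
    return sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample with mean 5
sd = sample_sd(data)
variance = sd ** 2                 # variance is the square of the sd

print(sd, variance)
print(sample_sd([3, 3, 3]))        # 0.0 when all numbers are equal
```

The squared deviations here sum to 32, so the sample variance is 32/7 and the standard deviation is its square root.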
Mode
Value that appears the most in a set of data
Lurking Variable
Variable not considered in a study but has an effect on the results
Empirical Rule
When the distribution of data is approximately normal: - About 68% of observations fall within 1 standard deviation of the mean - About 95% of observations fall within 2 standard deviations of the mean - About 99.7% of observations fall within 3 standard deviations of the mean
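The three percentages can be checked against the normal curve itself. This sketch uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the function name `area_within` is made up.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

The areas come out to roughly 0.68, 0.95, and 0.997, which is where the 68/95/99.7 figures come from.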
If the standard deviation = 0, then ___
all numbers are equal.
Mean is ___ than median if the graph is skewed right
larger
Side-by-side bar charts are best at measuring ___ values.
numerical
Stacked bar charts are best at measuring ___
proportions
Mean is __ than median if the graph is skewed left
smaller