7007
ANOVA assumptions
- Continuous outcomes and independent groups - Independent observations - Normally distributed outcomes - Equal variance of the outcome
One-sample t-test assumptions
- Continuous variable - Independent observations - Normal distribution
Dependent-samples t-test assumptions
- Continuous variable and two dependent groups - Independent observations - Normal distribution of differences
Independent-samples t-test assumptions
- Continuous variable and two independent groups - Independent observations - Normal distribution in each group - Equal variances across groups (homogeneity of variances)
Correlation Assumptions
- Observations are independent - Both variables are continuous - Both variables are normally distributed - Relationship between the two variables is linear (linearity) - Variance is constant with the points distributed equally around the line (homoscedasticity)
Multiple linear regression assumptions
- Observations are independent - Outcome is continuous - Relationship between the outcome and each continuous predictor is linear - Variance is constant with the points distributed equally around the line - Residuals are independent - Residuals are normally distributed - No perfect multicollinearity
Linear regression assumptions
- Observations are independent - Outcome is continuous - Relationship between the two variables is linear (linearity) - Variance is constant with the points distributed equally around the line (homoscedasticity) - Residuals are independent - Residuals are normally distributed
Chi-squared test assumptions
- Variables must be nominal or ordinal (usually nominal) - Expected values should be 5 or higher in at least 80% of cells - Observations must be independent
Measures to help identify outliers and influential observations
- Standardized residuals - DFBETAs - Cook's distance - Leverage
Which of the following would be considered a very strong negative correlation? - .89 - -.09 - -.89 - .09
-.89
What percentage of the variance is shared if two variables are correlated at .4? - 40% - 4% - 8% - 16%
16%
How many pairwise comparisons would there be for an ANOVA with four groups? - 16 - 4 - 12 - 6
6
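With k groups there are k(k - 1) / 2 distinct pairs of means, so four groups give 4 × 3 / 2 = 6. A minimal Python sketch (the function name and group labels are hypothetical, for illustration only):

```python
from itertools import combinations
from math import comb

def pairwise_comparisons(k):
    """Number of distinct pairs among k group means: k(k - 1) / 2."""
    return comb(k, 2)

groups = ["A", "B", "C", "D"]  # hypothetical labels for a four-group ANOVA
print(pairwise_comparisons(len(groups)))  # 6
print(list(combinations(groups, 2)))      # every pair a post hoc test would compare
```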
left skewed
A density curve where the left side of the distribution extends in a long tail. (Mean < median.)
Chi-square test
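A statistical method for testing whether two categorical variables are associated. Specifically, it compares the observed frequencies in each cell with the frequencies expected if there were no association (equivalently, it tests whether proportions are equal across groups).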
standard error
An estimate of the standard deviation of the sampling distribution of a statistic.
Graphs for one continuous and one categorical variable
- Bar chart - Point chart - Boxplot - Violin plot
Which of the following is appropriate to graph a single categorical variable? - Histogram - Bar chart - Boxplot - Scatterplot
Bar chart
ANOVA post hoc tests
- Bonferroni: conducts a t-test for each pair of means and adjusts the p-values for the number of comparisons - Tukey's honestly significant difference (HSD): a less conservative test than Bonferroni
Alternative for failing homogeneity assumption (ANOVA)
- Brown-Forsythe test - Welch's ANOVA
Correlation effect size
Coefficient of determination (r² or R²)
T-test Effect Size Statistic
Cohen's d: .2 to <.5 = small; .5 to <.8 = medium; .8+ = large
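For two independent groups, Cohen's d is commonly computed as the difference in means divided by the pooled standard deviation. A minimal Python sketch, assuming that pooled-SD version (other variants exist); the function names and sample values are hypothetical:

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Cohen's d for two independent groups using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance weights each group's sample variance by its degrees of freedom
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)

def label_d(d):
    """Classify |d| with the small/medium/large thresholds above."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "below small"

d = cohens_d([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])
print(round(d, 3), label_d(d))  # 1.265 large
```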
What is the primary purpose of ANOVA? - Comparing means across three or more groups - Comparing medians across three or more groups - Examining the relationship between two categorical variables - Identifying normally distributed data
Comparing means across three or more groups
What is the primary purpose of the three t-tests? - Comparing means among groups - Comparing medians among groups - Examining the relationship between two categorical variables - Identifying normally distributed data
Comparing means among groups
The results of running R code show in which pane? - Source - Environment - History - Console
Console
Chi-squared effect sizes
- Cramér's V - Phi coefficient - Odds ratio
Which t-test would you use to compare mean BMI in sets of two brothers? - One-sample t-test - Independent-samples t-test - Chi-squared t-test - Dependent-samples t-test
Dependent-samples t-test
Custom functions are useful when doing which of the following? - Loading a library - Visualizing the distribution of one variable - Working with continuous variables - Doing the same thing multiple times
Doing the same thing multiple times
What is the primary purpose of Pearson's and Spearman's correlation coefficients? - Examining the relationship between two noncategorical variables - Identifying deviations from normality for continuous variables - Examining the relationship between two categorical variables - Comparing means across groups
Examining the relationship between two noncategorical variables
ANOVA statistic
F(4, 1404) = 43.3; p < .05
Which R data type is most appropriate for a categorical variable? - Numeric - Factor - Integer - Character
Factor
Alternative test for two-way ANOVA
Friedman test
Which of the following is appropriate to graph a single continuous variable? - Waffle chart - Histogram - Bar chart - Pie chart
Histogram
Graphs for single continuous variables
- Histogram - Density plot - Boxplot
Which of the following assumptions does not apply to all three t-tests? - Independent observations - Normal distribution of continuous variable - Homogeneity of variances - Inclusion of one continuous variable
Homogeneity of variances
Which of the following measures would be most appropriate for describing the spread of a variable that is extremely right-skewed? - Standard deviation - Range - IQR - Mode
IQR
Which of the following assumptions does not apply to ANOVA? - Independent observations - Normal distribution of continuous variables - Homogeneity of variances - Inclusion of one bivariate variable
Inclusion of one bivariate variable
Which of the following is true about the adjusted R2? - It is usually larger than the R2 - It is only used when there is just one predictor - It is usually smaller than the R2 - It is used to determine whether residuals are normally distributed
It is usually smaller than the R2
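The adjusted R² penalizes R² for each predictor, using adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1) for n observations and k predictors, which is why it is usually smaller. A minimal Python sketch; the function name and input values are hypothetical:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with n observations and k predictors."""
    # Shrinks R-squared more when there are many predictors relative to n
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r2(0.40, n=50, k=3), 4))  # 0.3609, smaller than 0.40
```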
Alternative for failing the normality assumption (ANOVA)
Kruskal-Wallis test
Graphs for two continuous variables
- Line graph - Scatterplot
Data transformations
- Linear transformations: keep existing linear relationships between variables, often by multiplying or dividing one or both of the variables by a constant - Nonlinear transformations: can strengthen (or weaken) the linear relationship between two variables by applying an exponent (power transformation) or another function to one or both of the variables
Alternative to the independent-samples t-test
- Mann-Whitney U test - Kolmogorov-Smirnov test
When an independent-samples t-test does not meet the assumption of normality, what is an appropriate alternative test? - Sign test - Levene's test - Mann-Whitney U test - Dependent-samples t-test
Mann-Whitney U test
Violating independent observations assumption (Chi-squared)
- McNemar's test (two related measurements) - Cochran's Q test (three or more related measurements)
Which of the following measures would be most appropriate for describing the central tendency of a variable that is continuous and normally distributed? - Mean - Variance - Median - Mode
Mean
The normal distribution depends on which of the following? - Mean and standard deviation - Sample size and probability of success - Standard deviation and number of successes - Mean and probability of success
Mean and standard deviation
Which of the following is not an assumption for the Pearson's correlation analysis? - Normally distributed variables - Monotonic relationship - Linear relationship - Constant variance
Monotonic relationship
Graphs for two categorical variables
- Mosaic plot - Bar chart
Which of the following is not an assumption for simple linear regression? - Normally distributed variables - Multicollinearity - Linear relationship - Constant variance - Normally distributed residuals
Multicollinearity
Apply a Bonferroni adjustment to a p-value of .01 if the analyses included six pairwise comparisons. If the threshold for statistical significance were .05, would the adjusted p-value be significant? - Yes - No
No
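The Bonferroni adjustment multiplies each p-value by the number of comparisons (capping the result at 1): .01 × 6 = .06, which exceeds .05. A minimal Python sketch; the function name is hypothetical:

```python
def bonferroni(p, m):
    """Bonferroni-adjusted p-value: p times the number of comparisons m, capped at 1."""
    return min(p * m, 1.0)

adjusted = bonferroni(0.01, 6)
print(round(adjusted, 2), adjusted < 0.05)  # 0.06 False
```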
Which of the following is not an assumption for binary logistic regression? - Normally distributed variables - No multicollinearity - Linearity - Independence of observations
Normally distributed variables
Which of the following tests would be used to test the mean of a continuous variable to a population mean? - One-sample t-test - Independent-samples t-test - Chi-squared t-test - Dependent-samples t-test
One-sample t-test
Which test is used to determine whether a correlation coefficient is statistically significant? - Paired samples t-test - Chi-squared test - One-sample t-test - P-value
One-sample t-test
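The t statistic for testing whether a correlation differs from zero is t = r√(n - 2) / √(1 - r²), compared against a t distribution with n - 2 degrees of freedom. A minimal Python sketch; the function name, r, and n are hypothetical, and the p-value lookup is left out because the standard library has no t distribution:

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

t = correlation_t(0.4, 27)
print(round(t, 3), "df =", 27 - 2)  # 2.182 df = 25
```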
Graphs for single categorical variables
- Pie chart - Waffle chart - Bar chart - Point chart
Which of the following is not a recommended type of graph? - Pie chart - Bar chart - Waffle chart - Density plot
Pie chart
For a categorical predictor in a logistic regression model, what is the group that other groups are compared to called? - Null group - Independent group - Standard group - Reference group
Reference group
The chi-squared distribution often has what type of skew? - Left - Right - It depends - It is not skewed
Right
The binomial distribution depends on which of the following? - Mean and standard deviation - Sample size and probability of success - Standard deviation and number of successes - Mean and probability of success
Sample size and probability of success
Alternative to one sample t-test
Sign test - examines the median instead of the mean
Alternative test for Correlations
Spearman's rho
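Spearman's rho is Pearson's r computed on the ranks of the data, which is why it only needs a monotonic relationship rather than a linear one. A minimal Python sketch (tied values get the average of their ranks); the function names and data are hypothetical:

```python
from statistics import mean

def ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson's r from sums of cross-products and squared deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # monotonic but not linear
print(spearman_rho(x, y))  # 1.0, while pearson(x, y) is below 1
```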
A significant odds ratio of 2.5 for BMI as a continuous predictor of heart disease in a binary logistic model would indicate which of the following? - The odds of heart disease increase 2.5% for every 1-point increase in BMI. - Those with heart disease have 2.5 times higher odds of having an increasing BMI compared to those without heart disease. - The odds of heart disease are 2.5 times higher for every 1-point increase in BMI. - There are 2.5 times as many people with heart disease as without among those with higher BMI.
The odds of heart disease are 2.5 times higher for every 1-point increase in BMI.
True or False? In R, categorical variables are best represented by the factor data type and continuous variables are best represented by the numeric data type. - True - False
True
Violating expected values assumption (Chi-squared)
Use Fisher's exact test
t-test used when the variances in two groups are unequal
Welch's t-test
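Welch's t-test does not pool the variances: each group contributes its own squared standard error, and the degrees of freedom come from the Welch-Satterthwaite approximation. A minimal Python sketch of the statistic and df (no p-value, since the standard library lacks a t distribution); the function name and data are hypothetical:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of group x
    vy = variance(y) / len(y)  # squared standard error of group y
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

t, df = welch_t([10, 12, 14, 16, 18], [11, 11.5, 12, 12.5, 13])
print(round(t, 3), round(df, 2))  # 1.372 4.5
```

The non-integer df is expected: it shrinks toward the smaller group's df when that group's variance dominates.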
In which situation would you use planned comparisons? - After a significant ANOVA to compare each pair of means - Instead of an ANOVA when the data did not meet the normality assumption - When you have to choose between two categorical variables - When you conduct an ANOVA and have hypotheses about which sets of means are different from one another
When you conduct an ANOVA and have hypotheses about which sets of means are different from one another
Alternative to dependent-sample t-test
Wilcoxon signed-rank test
right skewed
a distribution with a tail that extends to the right (Mean > Median)
Two-way ANOVA
a hypothesis test that includes two nominal independent variables, regardless of their numbers of levels, and a scale dependent variable
monotonic
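a relationship that goes in only one direction: as one variable increases, the other consistently increases or consistently decreases, though not necessarily at a constant rate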
Significance for the coefficients (b) is determined by - an F-test. - an R2 test. - a correlation coefficient. - a t-test.
a t-test.
Ordinary least squares
a type of linear least squares method for estimating the unknown parameters in a linear regression model
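For one predictor, ordinary least squares gives slope b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept b0 = ȳ - b1·x̄; predicted values and residuals follow directly. A minimal Python sketch; the function name and data are hypothetical:

```python
from statistics import mean

def ols_fit(x, y):
    """Return intercept b0 and slope b1 that minimize the sum of squared residuals."""
    mx, my = mean(x), mean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_fit(x, y)
predicted = [b0 + b1 * a for a in x]                         # predicted values of y
residuals = [obs - pred for obs, pred in zip(y, predicted)]  # observed minus predicted
print(round(b0, 3), round(b1, 3))  # 0.05 1.99
```

With an intercept in the model, OLS residuals always sum to (numerically) zero.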
Durbin-Watson test
can be used to determine whether the model violates the assumption of independent residuals
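The Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. It ranges from 0 to 4; values near 2 suggest independent residuals, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. A minimal Python sketch; the function name and residuals are hypothetical:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

print(durbin_watson([1, -1, 1, -1]))  # 3.0: signs alternate (negative autocorrelation)
print(durbin_watson([1, 1, -1, -1]))  # 1.0: neighbors share sign (positive autocorrelation)
```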
ANOVA effect size tests
Eta-squared or omega-squared: .01 to <.06 = small; .06 to <.14 = medium; .14+ = large
Density plots, histograms, and boxplots can all be used to... - examine frequencies in categories of a factor. - examine the relationship between two categorical variables. - determine whether two continuous variables are related. - examine the distribution of a continuous variable.
examine the distribution of a continuous variable
Pearson's partial correlation
the correlation between two variables after controlling for one or more other variables; used for examining how multiple variables share variance with each other
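For a single control variable z, the partial correlation of x and y can be computed from the three pairwise correlations: r_xy.z = (r_xy - r_xz·r_yz) / √((1 - r_xz²)(1 - r_yz²)). A minimal Python sketch of that first-order formula; the function name and input correlations are hypothetical:

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the variance each shares with z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

print(round(partial_r(0.5, 0.5, 0.5), 4))  # 0.3333: controlling for z weakens r_xy
print(partial_r(0.6, 0.0, 0.0))            # 0.6: z unrelated to x and y changes nothing
```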
Deterministic
a relationship with one precise value of y for each value of x
A confidence interval indicates a significant odds ratio when - it includes 1. - it includes 0. - it does not include 1. - it does not include 0.
it does not include 1.
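In logistic regression the odds ratio is exp(b), and an approximate 95% confidence interval is exp(b ± 1.96·SE). The odds ratio is significant when that interval excludes 1. A minimal Python sketch; the function name and the coefficient and standard error are hypothetical values, not real model output:

```python
from math import exp

def odds_ratio_ci(b, se, z=1.96):
    """Odds ratio and approximate 95% CI from a logistic coefficient b and its SE."""
    return exp(b), exp(b - z * se), exp(b + z * se)

or_, lower, upper = odds_ratio_ci(b=0.92, se=0.30)  # hypothetical estimates
significant = not (lower <= 1 <= upper)             # CI excludes 1
print(round(or_, 2), round(lower, 2), round(upper, 2), significant)  # 2.51 1.39 4.52 True
```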
Which of the following opens the ggplot2 library? - install.packages("ggplot2") - library(package = "ggplot2") - summary(object = ggplot2) - open(x = ggplot2)
library(package = "ggplot2")
y = mx+b
- m is the slope of the line - b is the y-intercept - x and y are the coordinates of each point along the line
Computing the percent correctly predicted by the model is one way to determine... - model fit. - model significance. - predictor significance. - if assumptions are met.
model fit.
Platykurtic
distributions that are flatter and more dispersed (broader) than the normal curve
Leptokurtic
distributions that are taller and thinner than the normal curve, with a narrow range of scores in the middle of the distribution having a very high frequency
In a data frame containing information on the age and height of 100 people, the people are the _____________ and age and height are the _____________. - observations, variables - variables, observations - data, factors - factors, data
observations, variables
Chi-squared is computed by first squaring the differences between... - observed frequencies and expected frequencies. - observed frequencies and the total sample size. - observed frequencies and observed percentages. - expected values and observed percentages.
observed frequencies and expected frequencies.
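The chi-squared statistic sums (observed - expected)² / expected over every cell, where expected = row total × column total / n. Cramér's V rescales it as √(χ² / (n(k - 1))), with k the smaller of the number of rows and columns. A minimal Python sketch; the function name and the 2×2 counts are hypothetical:

```python
from math import sqrt

def chi_squared(table):
    """Pearson chi-squared statistic, degrees of freedom, and Cramér's V."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count if no association
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    v = sqrt(stat / (n * (min(len(table), len(table[0])) - 1)))
    return stat, df, v

stat, df, v = chi_squared([[30, 10], [20, 40]])  # hypothetical counts
print(round(stat, 2), df, round(v, 3))           # 16.67 1 0.408
```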
Which of the following is not an effect size for chi-squared? - Cramér's V - Odds ratio - Phi - p-value
p-value
The block of text at the top of a code file that introduces the project is called - library. - summary. - prolog. - pane.
prolog.
Covariance
quantifies whether two variables vary together
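Covariance averages the cross-products of deviations from each mean; dividing it by both standard deviations gives Pearson's r, and r² is the proportion of shared variance. A minimal Python sketch; the function names and data are hypothetical:

```python
from statistics import mean, stdev

def covariance(x, y):
    """Sample covariance: sum of cross-products of deviations over n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    """Pearson's r: covariance standardized by both standard deviations."""
    return covariance(x, y) / (stdev(x) * stdev(y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(covariance(x, y), round(r, 3), round(r ** 2, 3))  # 1.5 0.775 0.6
```

Unlike r, the covariance depends on the units of x and y, which is why r is the usual effect size.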
Correlation coefficients
For the absolute value of r: - r = 0: no relationship - r = .2: weak relationship - r = .5: moderate relationship - r = .8: strong relationship - r = 1: perfect relationship
F-statistic
ratio of explained information (in the numerator) to unexplained information (in the denominator)
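In a one-way ANOVA the explained part is the between-groups mean square and the unexplained part is the within-groups mean square; eta-squared is the between-groups sum of squares divided by the total sum of squares. A minimal Python sketch (no p-value, since the standard library lacks an F distribution); the function name and data are hypothetical:

```python
from statistics import mean

def one_way_anova(*groups):
    """Return F, df_between, df_within, and eta-squared for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Explained: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Unexplained: spread of observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return f, df_between, df_within, eta_sq

f, df1, df2, eta_sq = one_way_anova([4, 5, 6], [6, 7, 8], [9, 10, 11])
print(f"F({df1}, {df2}) = {f:.2f}; eta-squared = {eta_sq:.3f}")  # F(2, 6) = 19.00; eta-squared = 0.864
```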
Continuous predictors influence the ______ of the regression line, while categorical predictors influence the _____________. - slope, intercept - intercept, slope - R2, p-value - p-value, R2
slope, intercept
omnibus test
tests for an overall effect but does not indicate which means are unequal
Residuals
the difference between an observed value of the response variable and the value predicted by the regression line
A sampling distribution shows... - the distribution of means from multiple samples. - the distribution of sample sizes over time. - the distribution of scores in the population. - the distribution of observations from a single sample.
the distribution of means from multiple samples.
The z-score is... - the number of standard errors between the mean and some observation. - the difference between the sample mean and population mean. - the width of the 95% confidence interval. - the number of standard deviations an observation is from the mean.
the number of standard deviations an observation is from the mean.
A mosaic plot is used when graphing... - the relationship between two continuous variables. - the relationship between one continuous and one categorical variable. - the relationship between two categorical variables. - data that are not normally distributed by group.
the relationship between two categorical variables.
standard deviation versus standard error
the standard deviation measures the variability of observations in the sample, while the standard error (s/√n for the mean) estimates how closely a sample statistic such as the mean represents the population value
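The standard error of the mean is the sample standard deviation divided by √n, so it shrinks as the sample grows even when the spread of the data does not. A minimal Python sketch; the function name and sample are hypothetical:

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(stdev(sample), 3), round(standard_error(sample), 3))  # 2.138 0.756
```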
To learn which cells are contributing the most to the size of a chi-squared statistic, compute... - the standardized residuals. - the p-value. - the odds ratio. - Cramér's V.
the standardized residuals.
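One common form is the adjusted standardized residual, (observed - expected) / √(expected × (1 - row total / n) × (1 - column total / n)); to our understanding this is what R's `chisq.test()` reports as `stdres`. Cells beyond roughly ±1.96 contribute most to the statistic. A minimal Python sketch under that assumption; the function name and counts are hypothetical:

```python
from math import sqrt

def standardized_residuals(table):
    """Adjusted standardized residual for each cell of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Residual variance of the cell under independence
            cell_var = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append((observed - expected) / sqrt(cell_var))
        result.append(out_row)
    return result

for row in standardized_residuals([[30, 10], [20, 40]]):
    print([round(r, 2) for r in row])
```

For this table every cell is about ±4.08, well beyond ±1.96.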
Wald test
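tests the statistical significance of a coefficient (slope) in a regression model; in logistic regression, the Wald statistic is b / SE compared to a standard normal distribution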
predicted values
the values of y predicted by the model for a given value of x
Chi-squared can be used to understand the relationship between... - any two variables. - two categorical variables. - two continuous variables. - one categorical and one continuous variable.
two categorical variables.
Stochastic
a relationship that includes random variation, so y cannot be predicted or explained exactly from x
In a normal distribution, 95% of observations are... - within one standard deviation of the mean. - included in computing the mean. - within two standard deviations of the mean. - divided by the sample size to get the standard deviation.
within two standard deviations of the mean.
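The "95% within two standard deviations" rule is an approximation: exactly 95% lies within ±1.96 SD, and about 95.45% lies within ±2 SD. A z-score expresses an observation in those same SD units. A minimal Python sketch using the standard library's `NormalDist`; the `z_score` helper and the example values are hypothetical:

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations an observation x lies from the mean."""
    return (x - mean) / sd

standard_normal = NormalDist(mu=0, sigma=1)
# Proportion of a normal distribution within +/- 2 and +/- 1.96 standard deviations
within_2sd = standard_normal.cdf(2) - standard_normal.cdf(-2)
within_196 = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)
print(round(within_2sd, 4), round(within_196, 4))  # 0.9545 0.95
print(z_score(130, mean=100, sd=15))                # 2.0
```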
The R2 is the squared correlation of which two values? - y and the predicted values of y - y and each continuous x - b and t - b and se
y and the predicted values of y
Chi-squared statistic
χ²(3) = 28.95; p < .05