Midterm Exam
The _______ divides the scores so that 50% of the scores in the distribution have values that are equal to or less than it
Median
The _________ is the most frequently occurring category or score in the distribution
Mode
What are the three most commonly used techniques for measuring central tendency?
(1) Mean (2) Median (3) Mode
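A minimal sketch of the three measures, using Python's standard `statistics` module. The `scores` list is hypothetical data made up for illustration.

```python
import statistics

# Hypothetical set of scores (not from the course material)
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)      # arithmetic average: sum / count
median = statistics.median(scores)  # middle value; averages the two middle scores when n is even
mode = statistics.mode(scores)      # most frequently occurring score

print(mean, median, mode)
```

With an even number of scores, the median falls halfway between the two middle values (here 3 and 5).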
The sum of squares, variance, and standard deviation all represent the same thing, specifically:
- The "fit" of the mean of the data - The variability in the data - How well the mean represents the observed data - Error
What are the regression assumptions?
-Homoscedasticity -Linearity -Independence of observations (i.e., random sample... and assignment if relevant) -Normality of the residuals (errors), or large N
Why is central tendency important?
-It allows researchers to summarize or condense a large set of data into a single value... -It serves as a descriptive statistic because it allows researchers to describe or present a set of data in a very simplified and concise form
In regression, we have a difference between the ______ versus _________
-Observed (actual) versus predicted values; i.e., the discrepancy between the value a case actually gets and the value the model predicts it should get
What are the benefits of regression over correlation?
-Regression yields more information than correlation (an intercept and a slope, not only the strength of the relation) -Regression allows us to think of the relation between 2 variables intuitively in terms of prediction
Homoscedasticity
-Having "the same scatter": the variance of the DV is the same across all values of the IV; it is an assumption of parametric tests -Also called homogeneity of variances: the assumption of equal or similar variances in the different groups being compared
What is a correlation?
A measure of degree to which 2 variables are related to each other
What is standard deviation?
A measure of variation for interval-ratio variables; it is equal to the square root of the variance
What is variance?
A measure of variation for interval-ratio variables; it is the average of the squared deviations from the mean
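A short sketch of how the sum of squares, variance, and standard deviation build on each other. The `scores` are hypothetical, and the variance here divides by N to match the "average of the squared deviations" definition above; a sample estimate would usually divide by N - 1 instead.

```python
import math

# Hypothetical scores chosen so the arithmetic is easy to follow
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(scores) / len(scores)

# Sum of squares: total squared deviation from the mean
sum_of_squares = sum((x - mean) ** 2 for x in scores)

# Variance: average squared deviation (dividing by N, per the definition above)
variance = sum_of_squares / len(scores)

# Standard deviation: square root of the variance, back in the original units
standard_deviation = math.sqrt(variance)

print(sum_of_squares, variance, standard_deviation)
```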
Which type of graph can we use to compare frequency distributions of several groups simultaneously?
A population pyramid
What is the Pearson correlation?
A standardized simple linear regression coefficient
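One way to check this claim is to convert both variables to z-scores and fit a line; the slope should then equal r. The sketch below uses hypothetical data and hand-written helpers (`pearson_r`, `z_scores`) that are illustrative names, not from any course software.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from the deviation sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)

# Least-squares slope on the standardized variables.
# z-scores have mean 0, so the slope is sum(zx * zy) / sum(zx ** 2).
zx, zy = z_scores(xs), z_scores(ys)
standardized_slope = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)

print(r, standardized_slope)
```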
Define central tendency
A statistical measure that determines a single value that accurately describes the center of the distribution and represents the entire distribution of scores; aka "average"
The mean, median, and mode are always equal in what?
A symmetrical distribution with a single mode (e.g., a normal distribution)
Which of the following are assumptions underlying the use of parametric tests (based on the normal distribution)? A. All of the options are true. B. Some feature of the data should be normally distributed. C. The samples being tested should have approximately equal variances. D. The data should be at least interval level.
A. All of the options are true
What is the assumption of independence?
All values of the outcome should come from a different source
Which of the following best describes the variable 'Gender'? A. A between-group variable. B. A coding variable. C. All of the possible answers are correct. D. A grouping variable.
C. All of the possible answers are correct
Ordinal level data are characterized by:
Data that can be meaningfully arranged by order of magnitude
For what is the 'variable view' in IBM SPSS's data editor used?
Defining characteristics of variables
How do you report a multiple regression analysis?
Example: F (3, 196) = 129.50, p < .001 -The F value is given in the ANOVA table -R squared = variance accounted for (make it into a percentage) *All predictors were found to be statistically significant in predicting _______...
Frequency distributions are also known as __________
Histograms
What is homoscedasticity?
Homoscedasticity means a situation in which the variance of the dependent variable is the same for all data... it facilitates analysis b/c most methods are based on the assumption of equal variance; it also means "having the same scatter"
What is an advantage of the median?
It is relatively unaffected by extreme scores
What is a boxplot and what does it do?
It's a graphic device that visually presents the range, the inter-quartile range, the median, the quartiles, the minimum (lowest value), and the maximum (highest value)
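The numbers a boxplot draws can be sketched with the standard `statistics` module. The data are hypothetical, and `statistics.quantiles` uses its default "exclusive" method here; other software may compute quartiles slightly differently.

```python
import statistics

# Hypothetical scores
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]

minimum = min(data)
maximum = max(data)

# quantiles(n=4) returns the three cut points: Q1, Q2 (the median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4)

value_range = maximum - minimum  # full spread of the data
iqr = q3 - q1                    # inter-quartile range: spread of the middle 50%

print(minimum, q1, q2, q3, maximum, value_range, iqr)
```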
Why is homoscedasticity important?
It's an important assumption of parametric statistical tests because they are sensitive to any dissimilarities. Uneven variances in samples result in biased and skewed test results.
When there are "heavy" tails, what kind of kurtosis is this?
Leptokurtic
The __________ is the balance point of the distribution because the sum of the distances below it is exactly equal to the sum of the distances above it
Mean
What are measures of variability?
Numbers that describe diversity or variability in the distribution
When there are "light" tails, what kind of kurtosis is this?
Platykurtic
In a regression equation, y = what?
Predicted value of the DV
___________ involves fitting a line through the scatter of points
Regression
___________ measures the standard distance between a score and the mean
Standard deviation
What does "normality of data" mean?
That the data follows a normal distribution (aka- a bell curve); this assumption applies only to quantitative data
When should you use a t-test?
The Independent Samples t Test is commonly used to test the following: Statistical differences between the means of two groups. Statistical differences between the means of two interventions. Statistical differences between the means of two change scores.
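A rough sketch of the independent samples t statistic computed by hand, assuming equal variances in the two groups (the pooled-variance form). The group scores are hypothetical, and the p-value is not computed here because it requires the t distribution from a statistics package.

```python
import math
import statistics

# Hypothetical scores for two independent groups
group_1 = [5, 6, 7, 8, 9]
group_2 = [1, 2, 3, 4, 5]

n1, n2 = len(group_1), len(group_2)
mean_1, mean_2 = statistics.mean(group_1), statistics.mean(group_2)

# Sample variances (divide by n - 1)
var_1, var_2 = statistics.variance(group_1), statistics.variance(group_2)

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / (n1 + n2 - 2)

# Standard error of the difference between the two means
standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_statistic = (mean_1 - mean_2) / standard_error
degrees_of_freedom = n1 + n2 - 2

print(t_statistic, degrees_of_freedom)
```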
Which measure of central tendency is the least affected by outliers?
The Median
A measure of variation in interval-ratio variables; the difference between the highest (maximum) and the lowest (minimum) scores in the distribution
The Range
In a regression equation, x = what?
The actual value of the IV
In a regression equation, e = what?
The error or disturbance; the amount of Y not accounted for by a and bX
What does the term "kurtosis" refer to?
The heaviness of the tails
In a regression equation, a = what?
The intercept or y-intercept
What is the most common measure of central tendency?
The mean
What does the assumption of "independence of observations" mean?
The observations/variables you include in your test are not related
What's the coefficient of determination?
The proportion of variance in Y that is accounted for by X (R squared)
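As a sketch, R squared can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The data below are hypothetical, and the line is fitted with the ordinary least-squares formulas for one predictor.

```python
# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Least-squares slope (b) and intercept (a)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x

# Total variability in Y around its mean
ss_total = sum((y - mean_y) ** 2 for y in ys)

# Variability left over after predicting Y from X
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - ss_residual / ss_total

print(f"{r_squared:.0%} of the variance in Y is accounted for by X")
```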
What is the assumption of linearity?
The relationship we model is, in reality, linear
If we were to pull all possible samples from a population, calculate the mean for every sample, and construct a graph of the shape of the distribution based on all of the means, what would we have?
The sampling distribution of the mean
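For a tiny population, "all possible samples" can be listed outright. The sketch below assumes a hypothetical population of six scores and samples of size 2 drawn with replacement; both choices are assumptions made for illustration.

```python
import itertools
import statistics
from collections import Counter

# Hypothetical population of scores
population = [1, 2, 3, 4, 5, 6]
sample_size = 2

# Every possible sample of size 2, drawn with replacement
samples = list(itertools.product(population, repeat=sample_size))

# One mean per sample
sample_means = [statistics.mean(sample) for sample in samples]

# The sampling distribution of the mean: how often each mean occurs
distribution = Counter(sample_means)

# The mean of all sample means equals the population mean
mean_of_means = statistics.mean(sample_means)

for value in sorted(distribution):
    print(value, "#" * distribution[value])
```

The printed bars pile up around the population mean, even though the population itself is flat.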
In a regression equation, b = what?
The slope, or rather, the amount that Y changes for each one unit change in X
What does the term "skew" refer to?
The symmetry of the distribution
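Skew and kurtosis can both be sketched from the central moments of the data. The helper names and data sets below are hypothetical, and these are the simple moment-based formulas; statistics packages such as SPSS apply small-sample corrections, so their values will differ somewhat.

```python
def central_moment(values, k):
    """Average of the k-th power of deviations from the mean."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** k for v in values) / n

def skewness(values):
    """0 for a symmetric distribution; positive when the long tail is on the right."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Positive = heavy tails (leptokurtic); negative = light tails (platykurtic)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

# Hypothetical data sets
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```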
If you see a straight line when running a regression test, what does that mean?
There's no relationship between the variables... you want to see paint splatter
What does "homogeneity of variance" mean?
This is when the variance within each group being compared is similar among all groups; if one group has more variation than others, it will limit the test's effectiveness
What makes regression different from correlation?
We are looking to predict the value of the DV (Y) from the actual value of the IV (X)
What is multicollinearity?
When 2 or more IV's are highly correlated with one another in a regression model... that is, an IV can be predicted from another IV in that regression model; multicollinearity is a problem b/c it undermines the statistical significance of an IV
Multicollinearity
When 2 or more IV's are highly correlated with one another in a regression model; makes it difficult to interpret the model
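A quick way to spot the problem with two predictors is to correlate them with each other. The sketch below uses hypothetical IVs and also computes a variance inflation factor (VIF); with exactly two predictors, the VIF reduces to 1 / (1 - r squared). Any cutoff for "too high" is a rule of thumb, not something stated in these notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from the deviation sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Two hypothetical IVs that carry almost the same information
iv_1 = [1, 2, 3, 4, 5]
iv_2 = [2, 4, 6, 8, 11]

r = pearson_r(iv_1, iv_2)

# Variance inflation factor for the two-predictor case
vif = 1 / (1 - r ** 2)

print(f"r between IVs = {r:.3f}, VIF = {vif:.1f}")
```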
When isn't the mean the best option?
When a distribution contains a few extreme scores (or is very skewed), the mean will be pulled towards the extremes (displaced toward the tail)... in this case, the mean will not provide a "central" value
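A small illustration of the pull, using hypothetical scores: adding one extreme value moves the mean a long way while the median barely changes.

```python
import statistics

# Hypothetical scores without and with one extreme value
scores = [3, 4, 5, 6, 7]
scores_with_outlier = scores + [100]

mean_before = statistics.mean(scores)
median_before = statistics.median(scores)

mean_after = statistics.mean(scores_with_outlier)
median_after = statistics.median(scores_with_outlier)

print("mean:", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)
```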
When do we look at the central limit theorem?
When assessing/testing the assumption of normality
When can you use a correlation or regression?
When examining 2 continuous variables
What is the equation for regression?
Y = a + bX + e
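A minimal sketch of estimating a and b by ordinary least squares and recovering e as the residuals. The function name `fit_line` and the data are hypothetical; in the course this would normally be done in SPSS.

```python
def fit_line(xs, ys):
    """Return the least-squares intercept (a) and slope (b) for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # change in Y for each one-unit change in X
    a = mean_y - b * mean_x  # predicted Y when X = 0
    return a, b

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)

# Predicted values (a + bX) and residuals (e = observed - predicted)
predicted = [a + b * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

print(f"Y = {a:.1f} + {b:.1f}X + e")
print([round(e, 2) for e in residuals])
```

The residuals from a least-squares line with an intercept sum to zero, which is one quick check that the fit was computed correctly.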