STAT-150 Mid Term
sigma
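Σ (capital sigma) means "add up": ΣX is the sum of all the scores, X1 + X2 + ... + Xn. Lowercase σ stands for the population standard deviation (σ² is the population variance).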
unsystematic variability
the variation (error) not due to the variables tested, but due to something else that may have affected the outcome
measurement error (error variance)
the variation of a number around its true mean due to uncontrolled, essentially random influences
STAT Value
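test statistic = variance explained by the model (systematic variation) / variance not explained by the model (unsystematic variation, or error)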
Which of the numbers below might IBM SPSS report as 10.574 E−05? 1. 0.00010574 2. 10.569 3. 1057400.0 4. 0000.10574
1. 0.00010574
The degree to which a statistical model represents the data collected is known as the: 1. Fit 2. Homogeneity 3. Reliability 4. Validity
1. Fit
Out of the following options, which type of graph could we use to compare frequency distributions of several groups simultaneously? 1. Population pyramid 2. Simple histogram 3. Frequency polygon 4. Simple 3-D bar chart
1. Population pyramid
If we use the mean as a model, what does the variance represent? 1. The average error between the model and the observed data. 2. The total error between the model and the observed data. 3. The squared total error between the model and the observed data. 4. The square-rooted average error between the model and the observed data.
1. The average error between the model and the observed data.
Differences between group means can be characterized as a regression (linear) model if: 1. The experimental groups are represented by a binary variable (i.e. coded 0 and 1). 2. The outcome variable is categorical. 3. The groups have equal sample sizes. 4. Differences between group means cannot be characterized as a linear model, they must be analyzed with an independent t-test.
1. The experimental groups are represented by a binary variable (i.e. coded 0 and 1).
What is b0 in regression analysis? 1. The value of the outcome when all of the predictors are 0. 2. The relationship between a predictor and the outcome variable. 3. The value of the predictor variable when the outcome is zero. 4. The gradient of the regression line.
1. The value of the outcome when all of the predictors are 0.
A correlation of .7 was found between time spent studying and percentage on an exam. What is the proportion of variance in exam scores that can be explained by time spent studying? 1. .70 2. .49 3. .30 4. .7
2. .49
Given a test is normally distributed with a mean of 30 and a standard deviation of 6: • What is the probability that a single score drawn at random will be greater than 34? • What is the probability that a sample of 9 scores will have a mean greater than 34? • What is the probability that the mean of a sample of 16 scores will be either less than 28 or greater than 32? 1. 0.0228 2. 0.2524 3. 0.1826
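2. 0.2524 (single score greater than 34); 1. 0.0228 (mean of 9 scores greater than 34); 3. 0.1826 (mean of 16 scores less than 28 or greater than 32)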
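The three probabilities in the question above can be checked with a minimal Python sketch. It assumes a helper named `normal_cdf` (a hypothetical name, built on the standard library's `math.erfc`) for the standard normal cumulative probability. The key step is switching from the population SD (6) for a single score to the standard error 6/sqrt(n) for a sample mean.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

MU, SIGMA = 30, 6  # population mean and SD from the question

# Part 1: a single score, so z uses the population SD.
p_single = 1 - normal_cdf((34 - MU) / SIGMA)

# Part 2: the mean of n = 9 scores, so z uses the standard error SIGMA / sqrt(n).
se_9 = SIGMA / math.sqrt(9)          # = 2.0
p_mean_9 = 1 - normal_cdf((34 - MU) / se_9)

# Part 3: the mean of n = 16 scores falling below 28 OR above 32 (both tails).
se_16 = SIGMA / math.sqrt(16)        # = 1.5
p_lower = normal_cdf((28 - MU) / se_16)
p_upper = 1 - normal_cdf((32 - MU) / se_16)
p_mean_16 = p_lower + p_upper

print(f"{p_single:.4f} {p_mean_9:.4f} {p_mean_16:.4f}")
```

The computed values agree with the quiz answers to about three decimal places; small differences come from rounding z to two decimals when reading a printed table.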
Which of the numbers below might IBM SPSS report as 8.96 E+03? 1. 89.60 2. 8960.0 3. 0.008960 4. 8.960
2. 8960.0
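Both E-notation questions rest on the same rule: "E-05" means "multiplied by 10 to the power -5" and "E+03" means "multiplied by 10 to the power 3". A minimal Python sketch (Python happens to accept the same notation when parsing numbers) makes the conversion explicit:

```python
# "E-05" shifts the decimal point 5 places left; "E+03" shifts it 3 places right.
small = float("10.574E-05")
large = float("8.96E+03")

print(f"{small:.8f}")   # fixed-point form of 10.574 E-05
print(f"{large:.1f}")   # fixed-point form of 8.96 E+03
```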
Out of the following options, which type of bar chart would we produce to look at the mean ratings of 'taste' and 'value for money' for two new varieties of Sauvignon Blanc wine? 1. Clustered bar chart 2. All of the options are possible. 3. Stacked bar chart 4. 3-D bar chart
2. All of the options are possible.
A researcher measured people's physiological reactions to horror films. He split the data into two groups: males and females. The resulting data were normally distributed and men and women had equal variances. What test should be used to analyze the data? 1. Dependent t-test 2. Independent t-test 3. Mann-Whitney test 4. Wilcoxon signed-rank test
2. Independent t-test
Which of the following statements about outliers is not true? 1. Outliers are values very different from the rest of the data. 2. Influential cases will always show up as outliers. 3. Outliers have an effect on the mean. 4. Outliers have an effect on regression parameters.
2. NOT TRUE: Influential cases will always show up as outliers.
A researcher was interested in stress levels of lecturers during lectures. She took the same group of 8 lecturers and measured their anxiety (out of 15) during a normal lecture and again in a lecture in which she had paid students to be disruptive and misbehave. What test is best used to compare the mean level of anxiety in the two lectures? 1. Independent samples t-test 2. Paired-samples t-test 3. One-way independent ANOVA 4. Mann-Whitney test
2. Paired-samples t-test
Variation due to some genuine effect is known as: 1. Unsystematic variation 2. Systematic variation 3. Homogeneous variance 4. Residual variance
2. Systematic variation
If we were to pull all possible samples from a population, calculate the mean for every sample, and construct a graph of the shape of the distribution based on all of the means, what would we have? 1. The population distribution of the mean 2. The sampling distribution of the mean 3. The bootstrap distribution of the mean 4. The standard error of the mean
2. The sampling distribution of the mean
The owner of the large chain of coffee shops called 'MoonBucks' decided to calculate how much revenue was gained from lattes each month in a nationwide sample of 2445 cafés. To measure the variance of revenue gained from lattes, he computes SS = 351,936 for this sample. • What are the degrees of freedom for variance? • Compute the variance. • Compute the standard deviation. 1. 144 2. 12 3. 2444
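3. 2444 (degrees of freedom = n - 1 = 2444); variance = SS/(n - 1) = 351,936/2444 = 144; standard deviation = square root of 144 = 12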
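The MoonBucks calculation is three steps: degrees of freedom, then variance as SS divided by those degrees of freedom, then the square root. A minimal Python sketch using only the numbers given in the question:

```python
import math

ss = 351_936       # sum of squared errors reported for the sample
n = 2445           # number of cafés in the sample

df = n - 1                 # degrees of freedom for a sample variance
variance = ss / df         # s^2 = SS / (n - 1)
sd = math.sqrt(variance)   # s = square root of s^2

print(df, variance, sd)
```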
How much variance has been explained by a correlation of .9? 1. 18% 2. 9% 3. 81% 4. None of these
3. 81%
Which type of graph can we use to compare frequency distributions of several groups simultaneously? 1. A histogram 2. A bar chart 3. A population pyramid 4. A boxplot.
3. A population pyramid
Which of the following best describes the variable 'Gender'? 1. A between-group variable. 2. A coding variable. 3. All of the possible answers are correct. 4. A grouping variable.
3. All of the possible answers are correct.
When items on a questionnaire appear to correspond to the construct that the questionnaire claims to measure it is said to have: 1. Factorial validity 2. Ecological validity 3. Content validity 4. Criterion validity
3. Content validity
Ordinal level data are characterized by: 1. Equal intervals between each adjacent score. 2. A fixed zero. 3. Data that can be meaningfully arranged by order of magnitude. 4. None of the above.
3. Data that can be meaningfully arranged by order of magnitude.
If we calculated an effect size and found it was r = .42 which expression would best describe the size of effect? 1. Small 2. Small to medium 3. Medium to large 4. Large
3. Medium to large
Which of the following statistical tests allows causal inferences to be made? 1. Analysis of variance 2. Regression 3. None of these, it's the design of the research that determines whether causal inferences can be made. 4. t-test
3. None of these, it's the design of the research that determines whether causal inferences can be made.
Which of the following is the least affected by outliers? 1. The range 2. The mean 3. The median 4. The standard deviation
3. The median
What symbol represents the test statistic for the Mann-Whitney test? 1. Ws 2. T 3. U 4. H
3. U
If Pearson's correlation coefficient between stress level and workload is .8, how much variance in stress level is not accounted for by workload? 1. 20% 2. 2% 3. 8% 4. 36%
4. 36%
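The three "variance explained" questions (r = .7, .9 and .8) all use the same rule: square the correlation to get the proportion of variance accounted for, and subtract from 1 for the proportion left unexplained. A minimal Python sketch, with `variance_explained` as a hypothetical helper name:

```python
def variance_explained(r: float) -> float:
    """Proportion of variance in one variable accounted for by the other: r squared."""
    return r ** 2

for r in (0.7, 0.9, 0.8):
    explained = variance_explained(r)
    print(f"r = {r}: explained {explained:.0%}, not explained {1 - explained:.0%}")
```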
For what is the 'variable view' in IBM SPSS's data editor used? 1. Entering data. 2. Writing syntax. 3. Viewing output from data analysis. 4. Defining characteristics of variables.
4. Defining characteristics of variables.
For which regression assumption does the Durbin-Watson statistic test? 1. Linearity 2. Homoscedasticity 3. Multicollinearity 4. Independence of errors
4. Independence of errors
What does the error bar on an error bar chart represent? 1. The confidence interval around the mean. 2. The standard error of the mean. 3. The standard deviation of the mean. 4. It can represent any of these.
4. It can represent any of these.
An experimenter measured 30 children's IQ. He then rank-ordered the children and assigned them a score from 30 (most intelligent) to 1 (least intelligent) to create a new variable. Does this new variable consist of: 1. Nominal data 2. Interval data 3. Ratio data 4. Ordinal data
4. Ordinal data
Out of the following options, which type of bar chart would we produce to look at the mean ratings of two new varieties of Sauvignon Blanc (wine)? 1. Clustered bar chart 2. Stacked bar chart 3. Simple 3-D bar chart 4. Simple bar chart
4. Simple bar chart
Which of the following is not a transformation that can be used to correct skewed data? 1. Log transformation 2. Square root transformation 3. Reciprocal transformation 4. Tangent transformation
4. Tangent transformation
A researcher was interested in stress levels of lecturers during lectures. She took the same group of 8 lecturers and measured their anxiety (out of 15) during a normal lecture and again in a lecture in which she had paid students to be disruptive and misbehave. The data were not normally distributed. Which test should she use to compare her experimental conditions? 1. Paired samples t-test 2. Mann-Whitney test 3. Wilcoxon rank-sum test 4. Wilcoxon signed-rank test
4. Wilcoxon signed-rank test
platykurtic distribution
A flatter (less peaked) distribution than the normal distribution, with lighter tails, so fewer scores fall far from the mean (negative excess kurtosis). (Plat is Flat).
central tendency
A single value (mean, median, or mode) that represents the typical response or the behavior of a group as a whole. The mean is influenced quite heavily by extreme values; the median is not.
leptokurtic distribution
A more peaked distribution than the normal distribution: more scores cluster around the mean, and the tails are heavier, so extreme scores are also more common (positive excess kurtosis).
standard normal distribution
A normal distribution with a mean of 0 and a standard deviation of 1.
Null Hypothesis (H0)
A statement of "no difference."
dependent variable
A variable thought to be affected by changes in an independent variable. You can think of this variable as an outcome.
independent variable
A variable thought to be the cause of some effect. This term is usually used in experimental research to describe a variable that the experimenter has manipulated.
outcome variable
A variable thought to change as a function of changes in a predictor variable. For the sake of an easy life this term could be synonymous with 'dependent variable'.
predictor variable
A variable thought to predict an outcome variable. This term is basically another way of saying 'independent variable'.
interval variable
Equal intervals on the variable represent equal differences in the property being measured (e.g., the difference between 6 and 8 is equivalent to the difference between 13 and 15).
Variability
The extent to which the scores in a data set tend to vary from each other and from the mean.
ratio variable
The same as an interval variable, but the ratios of scores on the scale must also make sense (e.g., a score of 16 on an anxiety scale means that the person is, in reality, twice as anxious as someone scoring 8). For this to be true, the scale must have a meaningful zero point.
binary variable
There are only two categories (e.g., dead or alive).
T/F: If the variables are correlated, when given a measurement of one variable, we can predict the value of the other.
True. But this does not mean we can change the outcome by removing or changing the variable. This is simply an observation.
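T/F: The smaller the p-value, the better.
True, in the sense that a smaller p-value is stronger evidence against the null hypothesis. If p is below .05 (the conventional alpha), you reject the null hypothesis and can report the result as statistically significant.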
standard deviation
a computed measure of how much scores vary around the mean score
continuous variable
a quantitative variable that has an infinite number of possible values that are not countable
positive correlation
a relationship between two variables in which both variables either increase or decrease together
representative sample
a sample that accurately reflects the characteristics of the population as a whole
method of least squares
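a statistical method for finding the best-fitting line through a set of data points: the line that minimizes the sum of squared differences (residuals) between the observed and predicted values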
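A minimal Python sketch of the method of least squares for one predictor. It uses the standard closed-form estimates b1 = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and b0 = Ȳ - b1X̄. The helper name `least_squares_fit` and the `hours`/`scores` data are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
def least_squares_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (b0, b1) for the line Y-hat = b0 + b1 * X that minimizes the sum of squared residuals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Slope: sum of cross-products of deviations divided by the sum of squared X deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the least-squares line always passes through (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical illustration data (not from the course).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
b0, b1 = least_squares_fit(hours, scores)
print(f"Y-hat = {b0:.2f} + {b1:.2f} X")
```

For these points the cross-product sum is 6 and the sum of squared X deviations is 10, so the slope is 0.6 and the intercept is 4 - 0.6 × 3 = 2.2.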
extraneous variable
any variable other than the independent variable that could affect the results of the experiment
alpha
the significance level (conventionally .05) that the researcher sets before testing; if p < alpha, the null hypothesis is rejected, otherwise we fail to reject it
confounding variable
an extraneous factor that varies along with the independent variable, so its effect on the dependent variable cannot be separated from the effect of the independent variable (an "extra variable")
kurtosis
how flat or peaked a distribution is, and how heavy its tails are, compared with a normal distribution
estimations of likelihood
judging how likely it is that something will occur
slope and intercept calculation
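The slope (b1) is rise / run: the change in Y for a one-unit change in X. In least-squares regression, b1 = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)², and the intercept b0 = Ȳ - b1X̄ is the value of Y when X = 0.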
systematic variability
variation in the outcome that is due to the variables (or experimental manipulation) being tested
random assignment
placing research participants into the conditions of an experiment in such a way that each participant has an equal chance of being assigned to any level of the independent variable
experimental research
research designed to discover causal relationships between various factors
RMSE
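root mean squared error: the square root of the average squared residual (observed minus predicted value), expressed in the same units as the outcome variable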
variance
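σ² (sigma squared) denotes the variance of an entire population; the sample variance is written s². Either way, variance is the average squared deviation of scores from the mean.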
confidence interval
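a range of values, computed from sample data, constructed so that a stated percentage of such intervals (e.g., 95%) would contain the true population value; its width reflects random sampling error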
p-value < 0.05
statistically significant
Sum of Squares (SS)
sum of squared deviations from the mean
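A minimal Python sketch of the sum of squares: find each score's deviance from the mean, square it, and add the squares up. Dividing SS by n - 1 gives the sample variance, which is cross-checked against the standard library's `statistics.variance`. The helper name `sum_of_squares` and the scores are hypothetical, chosen so the mean is a whole number.

```python
import statistics

def sum_of_squares(scores: list[float]) -> float:
    """SS: add up the squared deviance (score minus mean) of every score."""
    mean = sum(scores) / len(scores)
    deviances = [x - mean for x in scores]   # deviance of each score from the mean
    return sum(d ** 2 for d in deviances)

# Hypothetical scores (mean = 5).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
ss = sum_of_squares(scores)
variance = ss / (len(scores) - 1)   # sample variance s^2 = SS / (n - 1)

# Cross-check against the standard library's sample variance.
print(ss, variance, statistics.variance(scores))
```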
sum of squared errors
sum of the squared differences between each predicted score and actual score on the criterion variable
What does correlational research measure?
the degree of relationship between two or more variables.
slope and line intercept with predictor
The difference between the actual (observed) value Yi and the fitted (predicted) value Ŷi is called the residual (or error): ei = Yi - Ŷi.
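A minimal Python sketch tying residuals to the sum of squared errors and RMSE. It assumes a hypothetical data set and its least-squares line Ŷ = 2.2 + 0.6X; RMSE here divides by n, while some texts divide by the residual degrees of freedom instead.

```python
import math

# Hypothetical data and its least-squares line Y-hat = 2.2 + 0.6 X.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

predicted = [b0 + b1 * xi for xi in x]                       # Y-hat_i
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]   # e_i = Y_i - Y-hat_i
sse = sum(e ** 2 for e in residuals)                         # sum of squared errors
rmse = math.sqrt(sse / len(residuals))                       # root mean squared error

print([round(e, 2) for e in residuals], round(sse, 2), round(rmse, 3))
```

For a least-squares line with an intercept, the residuals always sum to zero, so squaring them before adding is what keeps positive and negative errors from cancelling out.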
deviance (error)
the distance of each score from the mean
reliability
the extent to which a test yields consistent results, as assessed by the consistency of scores on two halves of the test, on alternate forms of the test, or on retesting
control group
the group that does not receive the experimental treatment.
Research Hypothesis (H1)
the hypothesis that the experiment was designed to investigate
Dispersion
the spread of scores around the center of a distribution, measured by the range, interquartile range, variance, or standard deviation
negative correlation
the relationship between two variables in which one variable increases as the other variable decreases
Central Limit Theorem (CLT)
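as sample size grows (a common rule of thumb is n > 30), the sampling distribution of the mean becomes approximately normal, whatever the shape of the population, with a mean equal to the population mean and a standard error of σ/√n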
standard error
the standard deviation of a sampling distribution
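The standard error can be made concrete with a small simulation: draw many samples, keep each sample's mean, and take the SD of those means. A minimal Python sketch, assuming a normal population with mean 30 and SD 6 and samples of n = 9 (values borrowed from the normal-distribution question); the simulated SD of the means should land close to σ/√n = 2.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA, N = 30, 6, 9      # assumed population mean/SD and sample size
NUM_SAMPLES = 20_000

# Each entry is the mean of one random sample of size N:
# together they approximate the sampling distribution of the mean.
sample_means = [
    statistics.fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(NUM_SAMPLES)
]

simulated_se = statistics.stdev(sample_means)   # SD of the sampling distribution
theoretical_se = SIGMA / math.sqrt(N)           # sigma / sqrt(n) = 2.0

print(round(simulated_se, 3), theoretical_se)
```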
correlation coefficient
a standardized value between -1 and +1 that tells the strength and direction of the linear relationship between two variables
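A minimal Python sketch of how the correlation coefficient standardizes the covariance: divide the covariance by the product of the two standard deviations. Rescaling X (for example, changing units) changes the covariance but leaves r unchanged. The helper names `covariance` and `pearson_r` and the data are hypothetical.

```python
import math

def covariance(x: list[float], y: list[float]) -> float:
    """Sample covariance: sum of cross-product deviations divided by n - 1 (unstandardized)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (len(x) - 1)

def pearson_r(x: list[float], y: list[float]) -> float:
    """Covariance divided by the product of the two SDs, giving a value between -1 and +1."""
    sd_x = math.sqrt(covariance(x, x))   # covariance of a variable with itself is its variance
    sd_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sd_x * sd_y)

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_rescaled = [10 * xi for xi in x]   # the same X expressed in different units

print(covariance(x, y), covariance(x_rescaled, y))   # covariance depends on units
print(pearson_r(x, y), pearson_r(x_rescaled, y))     # r does not
```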
T/F: The smaller the stat value, the better.
False. A bigger test statistic means more systematic (explained) variation relative to unsystematic (error) variation, so the result would be less likely if the null hypothesis were true (a smaller p-value).
nominal variable
There are more than two categories (e.g., whether someone is an omnivore, vegetarian, vegan, or fruitarian).
normal distribution
a bell-shaped curve, describing the spread of a characteristic throughout a population
ordinal variable
a qualitative variable that incorporates an ordered position, or ranking
Parameter
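a numerical value that describes a characteristic of a population (e.g., the population mean μ or standard deviation σ), usually estimated from sample statistics; in a statistical model, the parameters are the constants estimated from the data (e.g., b0 and b1 in regression)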
Twenty-one cats were given 300g of tuna each. The time in seconds was measured until they had eaten all of the tuna: 16, 18, 18, 22, 22, 23, 23, 24, 26, 29, 32, 34, 34, 36, 36, 42, 43, 46, 46, 49, 57 • Compute the median. • Compute the lower quartile. • Compute the upper quartile. • Compute the interquartile range. 1. 32 seconds 2. 22.5 seconds 3. 42.5 seconds 4. 20 seconds
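1. 32 seconds (median); 2. 22.5 seconds (lower quartile); 3. 42.5 seconds (upper quartile); 4. 20 seconds (interquartile range)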
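A minimal Python sketch of the tuna calculation, assuming the convention where the quartiles are the medians of the lower and upper halves and, with an odd n, the overall median is left out of both halves. This convention reproduces the quiz answers; statistical software offers several quartile definitions that can give slightly different values on other data. The helper name `quartiles` is hypothetical.

```python
import statistics

times = [16, 18, 18, 22, 22, 23, 23, 24, 26, 29, 32,
         34, 34, 36, 36, 42, 43, 46, 46, 49, 57]

def quartiles(data: list[float]) -> tuple[float, float, float]:
    """Return (Q1, median, Q3) using medians of the lower and upper halves.

    With an odd number of scores, the overall median is excluded from both halves.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)

q1, median, q3 = quartiles(times)
print(median, q1, q3, q3 - q1)
```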
Rank the score of 5 in the following set of scores: 9, 3, 5, 10, 8, 5, 9, 7, 3, 4 1. 4.5 2. 4 3. 3 4. 6
1. 4.5
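Tied scores share the average of the ranks they would occupy: sorted, the set is 3, 3, 4, 5, 5, 7, 8, 9, 9, 10, so the two 5s sit in positions 4 and 5 and each gets rank 4.5. A minimal Python sketch of that rule, with `average_ranks` as a hypothetical helper name:

```python
def average_ranks(scores: list[float]) -> list[float]:
    """Rank scores from lowest (rank 1) upward; tied scores share the mean of the ranks they occupy."""
    ordered = sorted(scores)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j across the run of tied values that starts at position i.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        # 0-based positions i..j correspond to ranks i+1..j+1; average them.
        ranks[ordered[i]] = (i + 1 + j + 1) / 2
        i = j + 1
    return [ranks[s] for s in scores]

scores = [9, 3, 5, 10, 8, 5, 9, 7, 3, 4]
ranked = average_ranks(scores)
print(ranked)
```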
Approximately what percentage of people would have scores lower than an individual with a z-score of 1.65 in a normally distributed sample? 1. 95% 2. 98% 3. It is not possible to calculate this unless the mean and standard deviation are given. 4. 1%
1. 95%
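The z-score question needs no mean or SD, because a z-score is already on the standard normal scale: the answer is the area under the standard normal curve below 1.65. A minimal Python sketch, with `normal_cdf` as a hypothetical helper built on `math.erfc`:

```python
import math

def normal_cdf(z: float) -> float:
    """Proportion of a standard normal distribution that falls below z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

below = normal_cdf(1.65)
print(f"{below:.4f}")
```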
Which of the following are assumptions underlying the use of parametric tests (based on the normal distribution)? 1. All of the options are true. 2. Some feature of the data should be normally distributed. 3. The samples being tested should have approximately equal variances. 4. The data should be at least interval level.
1. All of the options are true.
The covariance is: 1. All of these. 2. A measure of the strength of relationship between two variables. 3. Dependent on the units of measurement of the variables. 4. An unstandardized version of the correlation coefficient.
1. All of these.
Assuming the assumptions of parametric tests are met, non-parametric tests, compared to their parametric counterparts: 1. Are all of these. 2. Are more conservative. 3. Are less likely to accept the alternative hypothesis. 4. Have less statistical power.
1. Are all of these.
R2 is known as the: 1. Coefficient of determination. 2. Multiple correlation coefficient. 3. Partial correlation coefficient. 4. Semi-partial correlation coefficient.
1. Coefficient of determination.
Which of the following does a box-and-whisker plot not display? 1. The mean 2. The median 3. Outliers
1. The mean
A researcher measured the same group of people's physiological reactions while watching horror films and compared them to when watching erotic films. The resulting data were skewed. What test should be used to analyze the data? 1. Independent t-test 2. Wilcoxon signed-rank test 3. Dependent (related) t-test 4. Mann-Whitney test
2. Wilcoxon signed-rank test
What is the relationship between the sum of squared errors (SS), the sample size (n) and the variance (s2)? 1. SS = s2/(n - 1) 2. s2 = SS(n - 1) 3. n = (s2/SS) - 1 4. s2 = SS/(n - 1)
4. s2 = SS/(n - 1)
A researcher measured the same group of people's physiological reactions while watching horror films and compared them to when watching erotic films, and a documentary about wildlife. The resulting data were skewed. What test should be used to analyze the data? 1. Independent analysis of variance 2. Repeated-measures analysis of variance 3. Friedman's ANOVA 4. Kruskal-Wallis test
3. Friedman's ANOVA
The t-test tests for: 1. Differences between means 2. Whether a correlation is significant 3. Whether a regression coefficient is equal to zero 4. All of these
4. All of these