psyc 2016 exam 2
t distribution
It is like a normal distribution, but with greater variability in the tails (tails in a t distribution are thicker)
APA Format Example (One-Sample t Test):
Social functioning scores among relatives who care for patients with OCD (M = 62.00) were significantly lower than scores in the general healthy population, t(17) = −3.126, p < .05.
Standard Error Equation: Standard error of a sampling distribution of sample means
Standard error of a sampling distribution of sample means = σM = σ / √n
t Test Steps:
Step 1: State the hypotheses Step 2: Set the criteria for making a decision Step 3: Compute the test statistic Step 4: Make a decision
Example of a Null Hypothesis:
The average consumer spends roughly 120 min (or 2 hr) a day on social media
covariance
The extent to which the values of two factors vary together
Experimental Sampling means...
The order of selecting individuals does not matter, and each individual selected is not replaced before selecting again
Experimental Sampling equation to determine the number of samples of any size that can be selected from a population:
Total number of samples possible = N! / (n! (N − n)!)
Theoretical Sampling equation to determine the number of samples of any size that can be selected from a population :
Total number of samples possible = N^n
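The two counting formulas above can be sketched in Python. This is a minimal illustration, not part of the course material; the function names experimental_samples and theoretical_samples are made up for this example, and the inputs reuse population and sample sizes that appear elsewhere in this set.

```python
import math

def experimental_samples(N: int, n: int) -> int:
    """Unordered sampling without replacement: N! / (n! (N - n)!)."""
    return math.factorial(N) // (math.factorial(n) * math.factorial(N - n))

def theoretical_samples(N: int, n: int) -> int:
    """Ordered sampling with replacement: N^n."""
    return N ** n

print(experimental_samples(10, 7))  # 120
print(experimental_samples(6, 4))   # 15
print(theoretical_samples(3, 2))    # 9
print(theoretical_samples(20, 4))   # 160000
```

Theoretical sampling always gives at least as many samples as experimental sampling for the same N and n, because it counts every ordering and allows repeats.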
If we had a population with a negative skew, how would our distribution of sample means be shaped?
Normally distributed
All correlation coefficients are derived from the __________ correlation.
Pearson
Central Limit Theorem
- The mean calculated across all samples would be normally distributed. - Regardless of the distribution of scores in a population, the sampling distribution of sample means selected from that population will be approximately normally distributed.
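The central limit theorem can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population is a made-up, negatively skewed set of scores (10 minus an exponential variate), the sample size of 50 and the 2,000 repetitions are arbitrary choices, and sampling is done with replacement.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical negatively skewed population (long tail toward low scores)
population = [10 - random.expovariate(1.0) for _ in range(20000)]

def sample_mean(n: int) -> float:
    """Mean of one random sample of size n, drawn with replacement."""
    return statistics.mean(random.choices(population, k=n))

n = 50
means = [sample_mean(n) for _ in range(2000)]

mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population standard deviation

# The mean of the sample means should sit close to the population mean,
# and their spread should sit close to sigma / sqrt(n).
print(mu, statistics.mean(means))
print(sigma / n ** 0.5, statistics.stdev(means))
```

Plotting a histogram of means would show a roughly symmetric, bell-shaped pile even though the population itself is skewed.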
To compute a z-obt score, the ____________________ must be known so we can calculate the standard error of the mean.
Population variance (σ^2)
Example of an Alternative Hypothesis:
Pre-millennial consumers use more or less than 120 min (or 2 hr) a day on social media
Steps to use z-score to find the proportions/area/probability
Raw score → z-score → Unit normal table
strength sizes
.1 = small .3 = moderate .5 = large - Magnitude of the relation does not depend on the direction (positive or negative)
How many experimental samples of 7 could we have from a population of 10?
10! / 7!(10-7)! = 10x9x8x7x6x5x4x3x2x1 / (7x6x5x4x3x2x1)(3x2x1) = 120 possible experimental samples
In the general healthy population, IQ scores are normally distributed with
100 ± 15 (μ ± σ)
If the mean of a population is 4, what would we expect the average mean of our samples to be?
4 is the average mean of our samples Mean of a population = average mean of samples
What does 4! equal?
4! = 4 x 3 x 2 x 1 = 24
The Empirical Rule
At least 95% of possible Ms one could select fall within 2 SDs of μ (68%, 95%, 99.7%)
Two-tailed vs. One-tailed
Two-tailed: each of the 2 tails = α/2. One-tailed: the single tail = α
Steps to find raw score starting with a proportion/percentage/area of interest
Unit normal table → z-score → raw score
Do we know the population variance in a z-test?
Yes
Limitations in Interpretation: Outliers
a score that falls substantially above or below most other scores in a data set. ◦ Outliers can obscure the relationship between two factors by altering the direction and strength of an observed correlation.
Step 4 of Hypothesis Testing: Making a decision is based on ...
based on the probability (p) of obtaining a sample mean, given that the value stated in the null is true
correlation coefficient (phi)
both factors are dichotomous (nominal data)
Omega-squared gives a more
conservative estimate
When plotted in a graph, scores are more ________________ the closer they fall to a line.
consistent
The extent to which values of X and Y vary independently, or separately, is placed in the __________
denominator
In a correlation, we treat each factor like _________________________________ and measure the _______________________ between the pair.
dependent variable; relationship
The df for a t distribution are ___________ to the df of the sample variance: ________
equal; n - 1
As n increases, the probability of outcomes in the tails becomes ______ likely and the tails approach the x-axis faster
less
Alpha level
level of significance or criterion of a hypothesis test - researchers control for type 1 error by stating a level of significance (alpha level) - the significance level is the largest probability of committing a type 1 error that we will allow and still decide to reject the null hypothesis - Criterion is usually set at .05. - α level is compared to the p value in making a decision
Zero correlation means there is no ________________________________ between two factors (i.e., _____________________)
linear pattern; independence
the unit normal table can be used to
locate scores that fall within a given proportion or percentile
The degrees of freedom (df) for a correlation is
n -2
can we infer causality based on correlation alone?
no
Eta-squared tends to
overestimate proportion of variance explained by treatment
The mean of a distribution of sample means is the...
population mean
samples are selected to learn more about _____
populations
Types of Correlation
positive, negative, none
Interpreting correlation coefficients: r = .6 r = -.9 r = .1 r = -.35
r = .6 ◦ Moderate to strong positive correlation r = -.9 ◦ Strong negative correlation r = .1 ◦ Weak positive correlation r = -.35 ◦ Moderate negative correlation
Due to natural sampling variability, the sample mean (center of the CI) will vary from ______________________________________________.
sample to sample
As the n increases, _______________ more closely resembles population variance
sample variance
Sample Design
specific plan or protocol for how individuals will be selected or sampled from a population of interest.
What is the denominator of the test statistic in a z-test?
standard error
correlation
statistical procedure used to describe the strength and direction of the linear relationship between two factors
What is the obtained value of a t Test?
t-statistic, p-value
Assumption of Linear Correlations: Homoscedasticity
the assumption that there is an equal ("homo") variance or scatter ("scedasticity") of data points dispersed along the regression line
Sampling Error
the extent to which sample means selected from the same population differ from one another
Hypothesis Testing
the method for testing a hypothesis about a population parameter, using sample statistics
retain the null
the sample mean is associated with a high probability of occurrence when null is true (failed to reach significance) - p value >.05 (or the set level of significance); failed to reach significance
reject the null
the sample mean is associated with a low probability of occurrence if the null is true (reached significance) - p value <.05 (or the set level of significance); reached significance
The Standard Error of the Mean is...
the standard deviation of the distribution of the sample means
to avoid bias, what do researchers do?
they use a random procedure to select a sample from the population
A distribution of sample means has minimum
variance
when your alpha is set at .05, your critical z scores (in a one-tailed directional test) are always going to be...
z = 1.645 (upper-tail test) or z = -1.645 (lower-tail test)
Sampling Distributions
- A distribution of all possible sample means or variances that could be obtained in samples of a given size from the same population.
confidence interval (CI)
- A range of values that's likely to include a population value with a certain degree of confidence. - It is often expressed as a % whereby a population mean lies between an upper and lower interval
What is a distribution of Sample Means? Is it accessible? What is the shape?
- All possible sample means that can be selected, given a certain sample size. - Yes - Normal distribution
Factors That Decrease Standard Error
- As the population standard deviation decreases, standard error decreases - As sample size increases, standard error decreases
Example of Step 1 of Hypothesis testing: Research question: Is the average dose of Advil smaller than advertised? H0: ? HA: ?
- H0 : The average dose of Advil is 200 mg (μ = 200 mg) - HA : The average dose of Advil is lower than 200 mg (μ < 200 mg)
Example of Step 1 of Hypothesis testing: Research question: Is the average dose of Advil different than advertised? H0 : ? HA: ?
- H0 : The average dose of Advil is 200 mg (μ = 200 mg) - HA : The average dose of Advil is not 200 mg (μ ≠ 200 mg)
Null and Alternative Hypotheses
- H0 = "H naught" = null hypothesis - considered the status quo assumption - what the world thinks is true - HA/H1 = alternative hypothesis - a challenge to the status quo - what may actually be true
When will sampling order not matter?
- When order does not matter, samples containing the same letters (e.g., A, C, D and C, A, D) are counted as the same sample despite being selected in a different order.
What does the standard error of the mean measure...
- It measures the average distance between a sample mean and the population mean - Provides a measure of how accurately, on average, a sample mean represents its corresponding population means.
What is a sample distribution? Is it accessible? What is the shape?
- Scores of a select portion of persons from the population - Yes - Could be any shape
What is a population distribution? Is it accessible? What is the shape?
- Scores of all persons in a population - Typically, no - Could be any shape
What are the 3 steps to estimate the value of a population mean using a point estimate and interval estimate?
- Step 1: Compute the sample mean and standard error - Step 2: Choose the level of confidence and find the critical values at that level of confidence - Step 3: Compute the estimation formula to find the confidence limits
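The three estimation steps above can be sketched in Python. This is a simplified illustration that assumes the population standard deviation is known, so critical values come from the standard normal distribution; for the one-sample t case you would substitute the estimated standard error and a critical t value from a t table. The numbers (M = 54, σ = 4, n = 30) are borrowed from the height example in this set, and the function name confidence_interval is made up for this sketch.

```python
import math
from statistics import NormalDist

def confidence_interval(M: float, sigma: float, n: int, level: float = 0.95):
    """Return (lower, upper) confidence limits for a population mean.

    Assumes sigma is the known population standard deviation.
    """
    # Step 1: the sample mean M is given; compute the standard error
    se = sigma / math.sqrt(n)
    # Step 2: find the critical value at the chosen level of confidence
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Step 3: estimation formula, M +/- z(standard error)
    margin = z_crit * se
    return M - margin, M + margin

lower, upper = confidence_interval(M=54, sigma=4, n=30)
print(round(lower, 2), round(upper, 2))  # 52.57 55.43
```

Because 56 falls outside this interval, a two-tailed test of the null value 56 at α = .05 would reject the null, which is how estimation can stand in for a two-tailed hypothesis test.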
Standard Normal Transformations: The Steps to locate the proportion and therefore probability of obtaining a sample mean:
- Step 1: Transform a sample mean (M) into a z score - Step 2: Locate the corresponding proportion for the z score in the unit normal table
What two strategies does answering both questions lead to in a Sample Design?
- Theoretical Sampling - Experimental sampling
When will sampling order matter?
- When order matters, sample A, C, D and sample C, A, D are considered two different samples despite having the same letters
Sampling Without Replacement
- When sampling, each participant selected is not replaced before the next selection - (most common method used in behavioral research)
Cohen's d
- a measure of the number of standard deviations an effect shifted above or below the population mean stated in the null hypothesis. - We only report Cohen's d when the null hypothesis is rejected
directional one-tailed tests (HA: >) or (HA:<)
- alternative hypothesis is stated as a greater(>) or less than (<) null - upper tail critical tests (HA > H0) - lower tail critical tests (HA < H0)
non-directional two-tailed tests (HA: ≠)
- alternative hypothesis is stated as not equal to (≠) the null - interested in any alternative from the null hypothesis
Treatment
- any unique characteristic of a sample or any unique way that a researcher treats a sample ◦ Can change value of a dependent variable ◦ Associated with variability in a study
estimated Cohen's d
- measure of effect size in terms of the number of SDs that mean scores shift above or below the population mean stated by the null hypothesis - The larger the value of d, the larger the effect
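A minimal sketch of estimated Cohen's d in Python. It assumes the usual one-sample form, d = (M − μ) / SD, where SD is the sample standard deviation; that formula is not written out in these cards, so treat it as an assumption. The scores are hypothetical daily social media minutes, and 120 is the null value from the example in this set.

```python
import statistics

def estimated_cohens_d(scores, mu0: float) -> float:
    """(sample mean - null mean) / sample standard deviation."""
    M = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD, computed with n - 1
    return (M - mu0) / sd

# Hypothetical data, invented for illustration only
scores = [100, 110, 120, 130, 140, 150, 160, 170]
d = estimated_cohens_d(scores, mu0=120)
print(round(d, 2))  # 0.61
```

A d of about 0.61 says the sample mean sits roughly six-tenths of a standard deviation above the value stated in the null.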
Proportion of variance
- the measure of effect size in terms of the proportion or percent of variability in a dependent variable that can be explained or accounted for by a treatment -variability explained / total variability
the odds of a sample with replacement vs sample without replacement
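- with replacement: the odds of selection stay constant for each selection, and repeats are possible (e.g., A, B, B) - without replacement: the odds of selecting each remaining individual increase with each round of selection, and repeats are not possible (e.g., A, D, C)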
level of significance is based on...
- the probability of obtaining a statistic measured in a sample if the value stated in the null hypothesis were true - when the probability of obtaining a sample mean is less than 5%, we reject the null - if it is greater than 5%, we retain the null
Rejection Region (non-directional hypothesis)
- the rejection region is the area of α/2 in each tail - when α = .05, we are looking for means in the extreme 2.5% of each tail (5% total) to reject the null hypothesis - If the probability is less than 5% (p < .05), we reject the null hypothesis.
Step 3 of Hypothesis Testing: Computing the statistic
- the value of the test statistic can be used to make a decision regarding the null hypothesis. - helps determine how likely the sample outcome is if the population mean stated in the null is true - The larger the value of the test statistic, the further a sample mean deviates from the population mean stated in the null
p-values are usually set at what percentages?
- traditionally, the level of significance (α) is set at .05 (5%), but in more recent years there has been a push to lower it to .01 (1%) or even .001 (0.1%) - in other words, we are less likely to mistakenly reject the null hypothesis if we set the level of significance lower: α = .05 (most common), α = .01, α = .001
Type 3 errors
- type of error for one-tailed tests - occurs when we retain the null because rejection region was located in wrong tail
Correlation coefficient (r)
- used to measure the strength and direction of the linear relationship, or correlation, between two factors ◦ The value of r ranges from −1.0 to +1.0 • Values closer to ±1.0 indicate stronger correlations • A value of 0 indicates no relationship - The sign of the correlation coefficient (− or +) indicates only the direction or slope of the correlation
one-sample z test
- used to test the hypothesis concerning the mean in a single population with known population variance. - Can be directional or non-directional
Step 2 of Hypothesis Testing: Set the criteria for a decision
- what is the threshold for rejecting/accepting the null hypothesis - done by stating the level of significance - criterion of judgement upon which a decision is made regarding the value stated in the null hypothesis
The word "smaller" (or larger) indicates ____________ of the difference. In this case, we are saying we expect the mean to be ________ _________ 200 (not equal to or greater). The _________________ of the hypothesis ________ specified.
direction; less than; directionality; IS
to be selected at random, all individuals in a population must have an _________
equal chance of being selected/probability of selecting each participant must be the same
what is the denominator of the test statistic in a t-test?
estimated standard error
"_________" is the term we will use to say we are testing the hypothesis using confidence intervals
estimation
You can use _______________ as an alternative to the t test
estimation
The word "different" indicates that the mean can be either ___________ or ______________ to confirm the alternative hypothesis. No ______________________ specified.
higher; lower; directionality
Law of large numbers
increasing the number of observations or sample size will decrease standard error. - The smaller the SE, the closer a distribution of sample means will be to the population mean
In _________ statistics researchers select a sample of data from a much larger population.
inferential
The tails of a t distribution are...
...thicker, which reflects the greater variability in values resulting from not knowing the population variance.
Step 4 of Hypothesis Testing: What are 4 possible outcomes when making a decision?
1) retaining the null is correct 2) retaining the null is incorrect (type 2 or β error) 3) rejecting the null is correct 4) rejecting the null is incorrect (type 1 error)
What are the 4 steps to hypothesis testing?
1) state the null and alternative hypotheses 2) set the criteria for a decision (find the critical test value) 3) compute the test statistic 4) make a decision to retain or reject the null hypothesis and interpret
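The four steps can be walked through in code for a two-tailed one-sample z test. This is a sketch under stated assumptions, not a definitive procedure: the population standard deviation is treated as known, the function name one_sample_z_test is made up, and the numbers (null mean 56, σ = 4, n = 30, sample mean 54) are borrowed from the height example in this set.

```python
import math
from statistics import NormalDist

def one_sample_z_test(M: float, mu0: float, sigma: float, n: int,
                      alpha: float = 0.05):
    """Two-tailed one-sample z test; returns (z_obt, p, decision)."""
    # Step 1: H0: mu = mu0, HA: mu != mu0 (stated by the caller via mu0)
    # Step 2: set the criterion (critical z for alpha/2 in each tail)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: compute the test statistic
    se = sigma / math.sqrt(n)
    z_obt = (M - mu0) / se
    # Step 4: make a decision
    p = 2 * (1 - NormalDist().cdf(abs(z_obt)))
    decision = "reject the null" if abs(z_obt) > z_crit else "retain the null"
    return z_obt, p, decision

z_obt, p, decision = one_sample_z_test(M=54, mu0=56, sigma=4, n=30)
print(round(z_obt, 2), round(p, 4), decision)
```

Comparing |z_obt| with the critical value and comparing p with α always lead to the same decision; the code reports both so either route can be checked.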
Hypothesis Testing and Sampling Distributions: To locate the probability of obtaining a sample mean in a sampling distribution, we must know . . .
1. The population mean 2. The standard error of the mean
Experimental Sampling: If we select as many samples of two participants as possible (n = 2) from a population of three people (N = 3):
3! / 2!(3-2)! = 3x2x1 / 2x1x1 = 3 samples
How many experimental samples of 4 could we have from a population of 6?
6! / 4!(6-4)! = 6x5x4x3x2x1 / (4x3x2x1)(2x1) = 15 possible experimental samples
What does 9! equal?
9! = 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1 = 362880
One-sample t test
A statistical procedure used to test hypotheses concerning a single group mean in a population with an unknown variance.
Coefficient of determination (r^2 or R^2)
mathematically equivalent to eta squared, and is used to measure the proportion of variance of one factor (Y) that can be explained by known values of a second factor (X)
if the population of interest is small, accessible, and willing, researchers _____ be able to conduct population-level research
may
z transformation is used to determine the likelihood of...
measuring a particular sample mean from a population with a given mean and variance
The confidence is in the _____________, not in a particular CI. If we repeated the sampling method many times, approximately ____ of the intervals constructed would capture the true population mean. Therefore, as the sample size _________, the range of interval values will __________, meaning that you know the mean with much more accuracy compared with a smaller sample.
method; 95%; increases; narrow
What distribution is used to locate the probability of obtaining a sample mean in a z-test?
normal distribution
The value in the ________ reflects the extent to which values on the x-axis (X) and y-axis (Y) vary together
numerator
perfect correlation
occurs when each data point falls exactly on a straight line
Types of errors (1 and 2)
occurs when the conclusion to retain or reject the null is incorrect
correlation coefficient (point-biserial)
one factor is dichotomous (nominal data) and other factor is continuous (interval or ratio data)
What is the goal of our hypothesis tests?
Determine whether our observed difference - between our sample mean and what we expect from the population - is large enough that we can rule out random chance as the cause.
Effect sizes
Effect sizes are a measure of how important the difference between group means is
Types of Errors (Chart)
Look at Image
Estimation Formulas: One-sample t
Look at image
How many theoretical samples of 4 could we have from a population of 20?
N^n = 20^4 = 160,000 samples
Theoretical Sampling: If we had samples of two participants (n = 2) from a population of 3 people (N = 3). What is the total number of samples possible?
N^n = 3^2 = 9 samples
Are degrees of freedom required for a z-test?
No, because the population variance is known.
Do we know the population variance in a t-test?
No. The sample variance is used to estimate the population variance
Guess the Direction: As you drink more coffee, the number of hours you stay awake increases. The longer someone runs, the less energy they have. The more gasoline you put in your car, the farther it can go. As snowfall totals increase, the number of people driving decreases. As the temperature decreases, the speed at which molecules move decreases. As student absences increase, student exam grades decrease.
Positive; negative; positive; negative; positive; negative
What do the z-test and t-test measure?
The probability of obtaining a measured sample outcome
how is the estimated standard error acceptable?
This is acceptable because sample variance is an unbiased estimator of the population variance.
t statistic is used to...
Used to determine the number of standard deviations in a t distribution that a sample deviates from the mean value or difference stated in the null
Standard Error Equation: Variance of a sampling distribution of sample means
Variance of a sampling distribution of sample means = σ^2 / n
What does the factorial sign mean?
We take the product of a number and every whole number between that number and zero (not including 0)
How is the estimated standard error formed?
We substitute the population variance with the sample variance in the formula for standard error. This substitution, called the estimated standard error, is the denominator of the test statistic for a t test
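A short sketch of the substitution described above: the sample standard deviation replaces the unknown population value in the standard error, and the result becomes the denominator of t. The scores are hypothetical daily social media minutes invented for this example, 120 is the null value used elsewhere in this set, and the function name one_sample_t is an assumption.

```python
import math
import statistics

def one_sample_t(scores, mu0: float):
    """Return (t_obt, df, estimated_standard_error) for a one-sample t test."""
    n = len(scores)
    M = statistics.mean(scores)
    s = statistics.stdev(scores)     # sample SD, uses n - 1 in the denominator
    s_M = s / math.sqrt(n)           # estimated standard error
    t_obt = (M - mu0) / s_M
    return t_obt, n - 1, s_M

# Hypothetical data, invented for illustration only
scores = [100, 110, 120, 130, 140, 150, 160, 170]
t_obt, df, s_M = one_sample_t(scores, mu0=120)
print(round(t_obt, 3), df, round(s_M, 2))  # 1.732 7 8.66
```

To finish the test, the obtained t would be compared with the critical value for df = 7 from a t table at the chosen level of significance.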
What can be inferred from the z-test and t-test?
Whether or not the null hypothesis should be rejected.
Assumption of Linear Correlations: Normality rules
X and Y scores for two factors form a bivariate normal distribution and: ◦ 1. The population of X scores is normally distributed ◦ 2. The population of Y scores is normally distributed ◦ 3. For each X score, the distribution of Y scores is normally distributed ◦ 4. For each Y score, the distribution of X scores is normally distributed
Are degrees of freedom required for a t-test?
Yes. The degrees of freedom for a t test are equal to the degrees of freedom for sample variance for a given sample: n - 1
what is estimated standard error?
an estimate of the standard deviation of a sampling distribution of sample means selected from a population with an unknown variance. ◦ It is an estimate of the standard error or standard distance that sample means deviate from the value of the population mean stated in the null hypothesis.
normality
assume data in the population being sampled is normally distributed
independence
assume that probabilities of each observation in a study are independent
random sampling
assume that the data were obtained using a random sampling procedure
Assumption of Linear Correlations: Linearity
assumption that the best way to describe a pattern of data is using a straight line
correlation coefficient (pearson)
both factors are interval or ratio data
correlation coefficient (spearman)
both factors are ranked or ordinal data
The distribution of sample means follows the
central limit theorem
Example of correlation
class attendance (number of classes attended) and class performance (exam score)
We can actually use __________________________ as a short cut for two-tailed hypothesis testing.
confidence intervals
Measure mean and variance in __________ to gauge value for mean and variance in ____________.
sample; population
Experimental Sampling is a ______.
sampling strategy most commonly used in experimental research (unordered, sampling without replacement)
Theoretical sampling is a ______.
sampling strategy used in the development of statistical theory (ordered, sampling with replacement)
A correlation is typically illustrated through a ________________________________.
scatter plot - illustrate the relationship between two variables (x,y) plotted on the x and y axes of the graph. ◦ Pairs of values for x and y are called data points. ◦ Data points plotted to see if a pattern emerges.
Standard Normal Transformations: A sampling distribution can be converted to a _________________________ ____________________________ by applying the ____ transformation.
standard normal distribution; z
Step 1 of Hypothesis Testing: Null Hypothesis (H0)
statement about the population parameter (such as the mean) that is assumed to be true - starting point to determine if null is likely to be true or not
Step 1 of Hypothesis Testing: Alternative Hypothesis (HA)
statement that contradicts the null hypothesis - we think null is wrong, H1 allows us to state what we think is wrong
The ___________ of a correlation reflects how _______________ scores for each factor change.
strength; consistently
What distribution is used to locate the probability of obtaining a sample mean in a t-test?
t distribution
Theoretical sampling is...
the order of selecting individuals matters and each individual selected is replaced before sampling again
Power (1 − β)
the probability of rejecting a false null hypothesis - it is the likelihood that we will detect an effect, assuming an effect exists
level of significance (α) (typically what percentage is it set at?)
typically, the level is set at 5% in behavioral research studies.
The sample mean is an __________ estimator.
unbiased
Pearson correlation coefficient (r)
used to measure the direction and strength of the linear relationship of two factors in which the data for both factors are measured on an interval or ratio scale of measurement.
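A minimal sketch of the Pearson correlation in Python, written so the numerator (how X and Y vary together) and the denominator (how X and Y vary separately) are visible. The sum-of-squares form used here is the standard computing formula, but it is not spelled out in these cards, so treat it as an assumption. The data are hypothetical values invented for illustration.

```python
import math

def pearson_r(x, y) -> float:
    """r = SS_XY / sqrt(SS_X * SS_Y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: extent to which X and Y vary together
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: extent to which X and Y vary separately
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return ss_xy / math.sqrt(ss_x * ss_y)

# Hypothetical paired scores
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
r_squared = r ** 2   # coefficient of determination
df = len(x) - 2      # degrees of freedom for a correlation

print(round(r, 2), round(r_squared, 2), df)  # 0.8 0.64 3
```

Here r = .8 is a strong positive correlation, and r^2 = .64 means about 64% of the variability in Y can be explained by X.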
Samples and sample statistics (e.g., X̄) will naturally, randomly ____. (___________________________)
vary (sampling error)
The goal of hypothesis testing is to test questions, claims, and ideas of a population specifically what?
we are predicting the probability that a population parameter such as the mean is likely to be true
who proposed an alternative to go from z to t?
William Sealy Gosset
when your alpha is set at .05, your critical z scores (in a two-tailed non-directional test) are always going to be...
z = 1.96 and z = -1.96
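The critical values for α = .05 do not have to be memorized; they can be recovered from the standard normal distribution. The sketch below uses Python's statistics.NormalDist in place of the unit normal table; the helper name z_critical is made up for this example.

```python
from statistics import NormalDist

def z_critical(alpha: float, tails: int) -> float:
    """Positive critical z for a one-tailed (tails=1) or two-tailed (tails=2) test."""
    tail_area = alpha / tails  # alpha in one tail, or alpha/2 in each of two tails
    return NormalDist().inv_cdf(1 - tail_area)

print(round(z_critical(0.05, tails=1), 3))  # 1.645
print(round(z_critical(0.05, tails=2), 2))  # 1.96
```

For a lower-tail or two-tailed test, the negative critical value is the same number with a minus sign, because the standard normal distribution is symmetric.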
Calculating Probability: - Parameters for 11-year-old males - Average height: 56 inches - Standard deviation: 4 - If we randomly selected 30 11-year-old males (n = 30), what is the probability that their average height would be greater than 54 inches? - p(X̄ > 54)? - σM = .73
z = (54 − 56) / 0.73 = −2 / 0.73 = −2.74 Area: .5000 + .4969 = .9969 If we randomly select a sample of 30 11-year-old males and measure their height, the probability that the sample mean will be greater than 54 inches is .9969, or nearly 100%.
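The same calculation can be checked in Python, with statistics.NormalDist standing in for the unit normal table. This is a sketch for verification only; the variable names are made up, and the standard error is left unrounded, so z differs slightly from the hand calculation before rounding.

```python
import math
from statistics import NormalDist

mu, sigma, n = 56, 4, 30          # population mean, population SD, sample size

se = sigma / math.sqrt(n)         # standard error of the mean
z = (54 - mu) / se                # z score for a sample mean of 54 inches
p_greater = 1 - NormalDist().cdf(z)  # area to the right of z

print(round(se, 2), round(z, 2), round(p_greater, 4))  # 0.73 -2.74 0.9969
```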
Z transformation equation..
z = (M − μM) / σM or z = (M − μ) / σM
What is the obtained value of a z Test?
z-statistic; p-value
Calculating Standard Error: - Population Parameters for 11-year-old males ◦ Average height: 56 inches ◦ Standard deviation: 4 - Let's calculate the standard error of the mean if we have a sample of 30 11-year-old males.
σM = 4 / √30 = 0.73 - On average, sample means vary about 0.73 inches above and below the population mean. - It indicates how accurately a sample mean represents the population mean.
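A small Python sketch of this calculation, which also shows the two factors that shrink the standard error. The function name standard_error is made up, and the comparison values (n = 120, standard deviation of 2) are arbitrary choices for illustration.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

base = standard_error(4, 30)
print(round(base, 2))  # 0.73

# Larger sample size -> smaller standard error
print(standard_error(4, 120) < base)  # True

# Smaller population standard deviation -> smaller standard error
print(standard_error(2, 30) < base)   # True
```

Quadrupling the sample size from 30 to 120 cuts the standard error in half, because n sits under a square root.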
A correlation can be used to:
◦ (1) describe the pattern of data points for the values of two factors ◦ (2) determine whether the pattern observed in a sample is also present in the population from which the sample was selected
Limitations in Interpretation: Causality
◦ A significant correlation does not show that one factor causes changes in a second factor. ◦ Reverse causality is a problem that arises when the direction of causality between two factors can be in either direction. ◦ Factors could be systematic; they work together to cause change. ◦ A confound variable, or third variable, is an unanticipated variable that could be causing changes in one or more measured variables.
For a One-sample t-test, there are three measures of effect size:
◦ Estimated Cohen's d ◦ Eta-Squared (proportion of variance) ◦ Omega-Squared (proportion of variance)
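The two proportion-of-variance measures can be computed directly from an obtained t and its degrees of freedom. The sketch below assumes the usual one-sample formulas, eta-squared = t^2 / (t^2 + df) and omega-squared = (t^2 − 1) / (t^2 + df), which are not written out in these cards. The inputs reuse the t(17) = −3.126 result from the APA example in this set; the function names are made up.

```python
def eta_squared(t: float, df: int) -> float:
    """Proportion of variance: t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)

def omega_squared(t: float, df: int) -> float:
    """More conservative estimate: (t^2 - 1) / (t^2 + df)."""
    return (t ** 2 - 1) / (t ** 2 + df)

t_obt, df = -3.126, 17
print(round(eta_squared(t_obt, df), 3))    # 0.365
print(round(omega_squared(t_obt, df), 3))  # 0.328
```

Omega-squared comes out smaller than eta-squared for the same data, which matches the cards noting that eta-squared tends to overestimate and omega-squared is the more conservative estimate.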
Limitations in Interpretation: Restriction of range
◦ One or both correlated factors in a sample is limited or restricted, compared to the range of data in the population from which the sample was selected. ◦ When interpreting a correlation, it is important to avoid making conclusions about relationships that fall beyond the range of data measured.
What is the factorial sign?
!
regression line
- the best fitting straight line to a set of data points. - A best fitting line is the line that minimizes the distance of all data points that fall from it
What questions must be addressed in a Sample Design?
1) does the order of selecting participants matter? 2) do we replace each selection before the next draw?
what 3 assumptions are made in one-sample t tests?
1) normality 2) random sampling 3) independence
Linearity
In behavioral research, we mostly describe the linear (or straight line) relationship between two factors.