Stats Final Exam

Are negative z-scores shown on the table?

Negative z-scores are not shown in the table. Because the normal curve is symmetric, the area values for a negative z-score are the same as those for the corresponding positive z-score.

What are degrees of freedom?

Degrees of freedom describe the number of scores in a sample that are free to vary. Because the sample mean places a restriction on the value of one score in the sample, there are n - 1 degrees of freedom for the sample.

When is a dependent-means t-test used?

A dependent-means t-test is appropriate when the observations used to estimate each population mean are not independent of each other (the most common research design in which such data arise is the pre- and post-test design). It is also known as the correlated-means t-test or the paired-difference t-test.

How do you decide whether to reject or fail to reject the null hypothesis in an independent t-test?

If the obtained value of t is larger (in absolute value) than the critical value, you reject the null hypothesis.

What is the individuals nested within contexts analysis standard for evaluating the assumption of independence?

"When participants receive the treatments in groups (such as students nested in classrooms), participants within each group are more similar to each other than participants in different groups." This phenomenon (referred to as intraclass correlation) produces an estimated variance that is too small and the degrees of freedom for the t-test that are too large. The assumption of independence of observations is violated and the t-test does not maintain control of Type I error.

If we set alpha at 0.05, what is the critical value for this test?

+/- 1.96 (taken from the z-table) for a two-tailed test

Using the same hypothetical example as above, of all the samples we might draw from the population, how many will yield confidence intervals that contain the population mean?

68.26% of the samples, because their means fall within one standard error of the population mean. For this reason, we can say that we are approximately 68% confident that our confidence interval contains the population parameter.

How do you use the z-test once you calculate the z statistic?

You see where it falls on the sampling distribution curve. Conceptually, this z statistic represents how far the sample mean is from the null-hypothesized population mean, measured in standard errors.

What is the Standard Normal Distribution?

a normal distribution expressed as Z-scores

Define null hypothesis

null hypothesis (H0): the hypothesis that is assumed to be true and formally tested; it is the hypothesis that determines the sampling distribution to be employed, and the hypothesis about which the final decision to "reject" or "retain" is made.

What would be the obtained z-score for our example (sample mean=105, population mean=100 and standard deviation=15)?

obtained z-score = 1.67
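
A minimal sketch of this calculation in Python, assuming a sample size of n = 25 (not stated above, but it is the value for which the standard error is 15/sqrt(25) = 3 and the obtained z works out to 1.67):

```python
import math

sample_mean = 105   # observed sample mean
pop_mean = 100      # null-hypothesized population mean
sigma = 15          # population standard deviation
n = 25              # assumed sample size (not given on the card)

standard_error = sigma / math.sqrt(n)          # 15 / 5 = 3
z = (sample_mean - pop_mean) / standard_error  # (105 - 100) / 3

print(round(z, 2))  # 1.67
```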

What do we do if the null hypothesis is not contained in the 95% CI?

reject the null hypothesis

What is the formula for calculating the z-score?

the score minus the mean, all divided by the standard deviation: z = (X - mu) / sigma

What table do you use when looking at t-distributions?

the t-distribution table

What is another name for the standard normal distribution?

the z-distribution.

What is the general way the z and t statistics can be expressed?

z or t = (sample mean - population mean) / (estimated) standard error

What tests evaluate the population normality?

• formal tests of normality (Shapiro-Wilk, Kolmogorov-Smirnov)
• examination of sample skewness and kurtosis

What special statistical tools are used to make such inferences?

inferential statistics.

When we apply inferential statistics, what three distinct distributions do we need to keep in mind? (know the symbols associated with each as well)

- the population in which we are interested
- the sample we observe
- the sampling distribution

What should researchers concern themselves with in selecting a sample for a study?

- the representativeness of the sample
- the size of the sample
This is because these sample characteristics affect the inferences that can be made.

In any normal distribution, what known percentages of observations lie within any specified numbers of standard deviation(s) from the mean?

68.26% of the observations lie between one standard deviation below the mean and one standard deviation above the mean. 95.44% of the observations fall within two standard deviations of the mean.

What is alpha?

A hypothetical probability that we determine before the test. Alpha = significance level or Type I error rate.
- The hypothetical remainder of the area under the curve other than the confidence interval (CI).
- This is the bit of chance that we are playing on.
- We decide on this level before we conduct the test.

What does a z-score always reflect?

A z-score always reflects the number of standard deviations a particular score lies above or below the mean.

What does applying the z-formula always produce?

Applying the z formula will always produce a transformed distribution with a mean of zero and a standard deviation of one.

What is the advantage of using z-scores?

Because z-scores are standardized, we can compare them across variables that may be measured on different scales. Example: comparing a student's performance on a classroom exam with his or her performance on a standardized exam.

What is the z-score below which 15% of the distribution lies?

Remember, we have a symmetric distribution so just switch the values you are looking for in each column (e.g., look for .15 in Column C) and make the corresponding z-score negative. The z-score = -1.03.
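
A quick cross-check of this lookup with scipy (an assumption; the cards themselves use the printed z-table, which rounds the exact value of about -1.04 to the nearest tabled row):

```python
from scipy.stats import norm

# z-score below which 15% of the standard normal distribution lies
z_15 = norm.ppf(0.15)
print(round(z_15, 2))  # -1.04 (table lookup gives approximately -1.03)
```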

How are statistics represented to help differentiate them from population parameters?

Statistics are represented using either Roman letters or Greek letters with hats on them (to differentiate the sample statistics from the population parameters).

What are the characteristics of populations called?

The characteristics of populations (the population mean, variance, or correlation) are called parameters. Parameters are usually represented using Greek letters.

What are the characteristics of samples called?

The characteristics of samples (the sample mean, variance, or correlation) are called statistics

Where does the null hypothesis come from?

The statistical null hypothesis is translated from the research hypothesis.

What do the variance of the mean and the standard error of the mean indicate?

The variance of the mean and the standard error of the mean indicate the magnitude of sampling error (uncertainty about the population mean).

What does alpha represent?

Type I error rate

Why do we use the term "fail to reject" rather than "accept" when discussing the null hypothesis?

We always say that we "fail to reject" the null hypothesis, instead of saying that we "accept" the null hypothesis, because we never know the population parameters.

For z=1, what is the area on the curve between 0 and 1? The area of the curve beyond 1?

When z=1 the area between the z-score of 0 and 1 is .3413. The area beyond the z-score of 1 is .1587.

What are the four steps for tests of hypothesis that are the differences in two means?

1. State the null (H0) and alternative (H1) hypotheses to be tested.
2. Assume that H0 is true and sketch the theoretical sampling distribution of t.
3. Specify alpha, the degree of risk of a Type I error (the risk of incorrectly concluding that H0 is false when it is really true), to determine the critical t value.
4. Make a decision regarding H0 (reject or fail to reject) based on the obtained t value compared to the critical t value.

What is the analysis plan when looking at three or more means?

1. As usual, we should start by describing the data. For each group, we would like:
• a visual picture of the distribution (e.g., box plot)
• descriptive statistics (e.g., mean, SD, minimum, maximum, skewness, kurtosis)
2. We would then like to move into an inferential analysis to see if there is a statistically significant difference in group mean achievement scores.

What are the steps for figuring out whether the variance of our sample means is greater than the expected amount for an ANOVA?

1. Estimate sigma squared of x-bar by looking at the variation in our sample means (between-group variation).
2. Estimate sigma squared by looking at how far individual observations deviate from their mean (within-group variation).
3. Compare these estimates formally by conducting an ANOVA.

What is a commonly used decision rule value? How do we use it?

A commonly used decision rule is a probability of .05. This means that we will reject the null hypothesis if the probability of our sample mean is less than .05 (p < .05). On a sampling distribution, this is represented by critical regions in the tails. The z values of plus and minus 1.96 are called critical values; they come from the z table and demarcate the most extreme 2.5% of each tail. If the obtained z is greater than +1.96 or less than -1.96, then our decision is to reject the null hypothesis. We refer to the significance level as "alpha" or the "Type I error rate," and to the area beyond the obtained z (in both tails) as our p-value.

What do we use to decide between the two possible hypotheses?

A decision rule. Our decision rule aids us in determining which population the sample likely comes from: the population implied by the null hypothesis or another population. Our decision rule involves selecting a probability level and comparing this probability with the probability associated with the sample mean on the null distribution.

Define effect size

A general term for a statistic that communicates the magnitude of a research finding rather than its statistical significance. Effect size can pertain to a bivariate relationship (r²) or differences between two or more means (e.g., d, ω̂²).

What effect does a larger standard deviation have on a z-score?

A larger standard deviation produces a lower z-score.

Define variance

A measure of variation that involves every score in the distribution. Stated more technically, it is the mean of the squared deviation scores.

What does a negative z-score indicate?

A negative z-score occurs if an observation is below the mean.

Are populations always real?

A population can be real, such as all 6th graders in Florida, or hypothetical, such as the population of 6th graders if each member completed the new Earth Science for the Rest of Us curriculum.

What does a positive z-score indicate?

A positive z-score occurs if the observation is above the mean.

Define confidence interval

A range of values within which it can be stated with reasonable confidence (e.g., 95%) that the population parameter lies. Also called an interval estimate.

What effect does a smaller standard deviation have on a z-score?

A smaller standard deviation produces a higher z-score.

Define a z-score

A standard score having a mean of 0 and a standard deviation of 1.

How is the shape of the t-test sampling distribution described? How is this different than the z-distribution?

All the central t-distributions are described by symmetric (skewness = 0), unimodal curves with a mean of 0. Whereas the variance of the z-distribution is 1, the variances of the t-distributions with various degrees of freedom are greater than 1; t-distributions are leptokurtic. Leptokurtic distributions have thicker tails than the normal distribution.

What is the alternative hypothesis? How is it written?

The alternative hypothesis, denoted H1, is the logical complement to the null hypothesis. Often it ends up being the same as the research hypothesis. For this example, the alternative hypothesis is that "the treatment changes IQ scores."

From our example above, how could we use the p-value to determine whether we should reject or fail to reject the null hypothesis?

An alternate way to conduct a z-test to determine whether we reject the null hypothesis is to compare the p-value to alpha. To calculate the p-value, go to the z table and find the area that corresponds to z > 1.67 and z < -1.67. From the z-table, the area beyond a z of 1.67 is .0475. Therefore, p = (.0475)(2) = .095. Since p > alpha, we fail to reject the null hypothesis.

What does the ANOVA test allow you to do?

Analysis of Variance (ANOVA) lets us conduct one test of the null hypothesis that the population means of all groups are equal.

What does the Central Limit Theorem tell us about the shape of the sampling distribution?

As sample size increases the shape of the sampling distribution approaches the shape of a normal distribution, regardless of the shape of the population distribution. That is, even if the population distribution is skewed or leptokurtic, we can still use the normal distribution for inferences because the sampling distribution is approximately normal.

What happens to the standard error of the mean as sample size increases? What does this influence?

As the sample size increases, the standard error of the mean decreases. This is the increase in precision that we get with larger samples.

How do you analyze the F-statistic to decide on whether to reject or fail to reject the null hypothesis?

The F-statistic uses MSbetween as the numerator and MSwithin as the denominator: F = MSbetween / MSwithin. Therefore, we use dfbetween for the numerator and dfwithin for the denominator. Refer to the table of critical F-values in your textbook, using degrees of freedom between groups for the numerator and degrees of freedom within groups for the denominator. If the obtained F is greater than the critical value, you reject the null hypothesis.
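
A hedged sketch of this decision rule in Python, using scipy for the critical F and made-up mean squares and group sizes purely for illustration:

```python
from scipy.stats import f

# Hypothetical values, purely for illustration
ms_between, ms_within = 42.0, 10.0
k, N = 3, 30                      # number of groups, total sample size

df_between = k - 1                # numerator degrees of freedom
df_within = N - k                 # denominator degrees of freedom

F_obtained = ms_between / ms_within
F_critical = f.ppf(0.95, df_between, df_within)   # alpha = .05

if F_obtained > F_critical:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```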

What would happen if the sample above was 1000 students rather than 100 students?

Based on the sampling distribution we can see that: approximately 68% of the sample means will be between 99.21 and 100.79, which is within one standard error of the population mean. Approximately 95% of the sample means will be between 98.42 and 101.58, which is within two standard errors of the population mean.

What do we need to consider when using the t statistic?

Because we are using the sample standard deviation (s) to estimate the population standard deviation (sigma), we need to take into account the fact that it is an estimate. As with variability in descriptive statistics, we must take the degrees of freedom into account.

How does increasing the alpha level increase power? How does this affect the type I error rate?

By increasing the Type I error rate (for instance, changing .05 to .10), you make the critical region bigger (the rejection region under the H0 curve), and you also gain power (the corresponding area under the H1 curve). In other words, you create more power but also increase the risk of committing a Type I error.

What type of variable can you use z-scores for?

Continuous variables

What test statistics do we use if we don't know sigma?

Estimated standard error of the mean and t-score

How can we evaluate the independence of observations?

Evaluate the design of the study

What is the proportion of scores lying between a z-score of 1.00 and -1.00?

Find z-score of 1.00 in the Z-table. Since the normal curve is symmetric, the area from 0 to 1 and 0 to -1 is the same. The proportion of the normal curve falling between these two values of Z = (.3413+.3413) = .6826

Consider the following hypothetical example to describe what the Central Limit Theorem tells us about the sampling distribution of the mean. I'm interested in studying the performance of 4th grade students on a new vocabulary test. In the population (all 4th grade students in the state of Florida), the average score on this test is 100 (mu = 100). Also, in the population the standard deviation of vocabulary test scores is 25 (sigma = 25). If I am planning to draw a sample of size 100 from this population, what does the Central Limit Theorem tell me about the sample mean I can expect to see?

First, the Central Limit Theorem tells us that the typical sample mean (of representative samples) will be equal to 100 (the same value as the population mean). Second, the Central Limit Theorem tells us that the standard deviation of the sample means (the standard error of the mean) will be 2.5 (the population standard deviation divided by the square root of the sample size). Finally, the shape of the sampling distribution will be normal, so we can use the table of the normal curve to determine that approximately 68% of the samples we might draw will have means within one standard error of 100; that is, 68.26% of the samples we might draw will have means between 97.5 (100 - (1 x 2.5)) and 102.5 (100 + (1 x 2.5)).
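
The same numbers can be illustrated by simulation; a small sketch (assuming numpy is available) that draws many samples of size 100 from a population with mu = 100 and sigma = 25:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 100, 25, 100

# Draw 10,000 samples of size 100 and record each sample mean
sample_means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

print(sample_means.mean())   # close to 100 (the population mean)
print(sample_means.std())    # close to 2.5 (= 25 / sqrt(100))

within_one_se = np.mean(np.abs(sample_means - mu) <= 2.5)
print(within_one_se)         # roughly 0.68
```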

What are the 5 steps of hypothesis testing for the one mean z-test?

Five steps of hypothesis testing for the one-mean z-test:
1. State the null and alternative hypotheses to be tested.
2. Sketch the theoretical sampling distribution of the z score.
3. Specify alpha and determine the critical z-value.
4. Locate the obtained z score.
5. Make a decision regarding the null hypothesis, H0 (we either "reject" or "fail to reject" the null hypothesis).
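
A compact sketch of the five steps in Python, reusing the running example (mu = 100, sigma = 15, sample mean = 105, and an assumed n = 25):

```python
import math
from scipy.stats import norm

# Step 1: H0: mu = 100, H1: mu does not equal 100 (two-tailed)
mu0, sigma, xbar = 100, 15, 105
n = 25                                   # assumed sample size

# Steps 2-3: sampling distribution under H0; alpha and critical value
alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)         # 1.96

# Step 4: locate the obtained z
z_obt = (xbar - mu0) / (sigma / math.sqrt(n))   # 1.67

# Step 5: decision
decision = "reject H0" if abs(z_obt) > z_crit else "fail to reject H0"
print(round(z_obt, 2), round(z_crit, 2), decision)
```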

Describe the robustness of the t-test to violations of homogeneity of variance

For violations of the assumption of homogeneity of variance, the t-test is usually robust if sample sizes are equal. When sample sizes differ between the two groups, the test does not maintain control of Type I error probability.

Describe the robustness of the t-test to violations of the assumption of independence of observations

For violations of the assumption of independence of observations, the t-test is clearly non-robust and Type I error rates become grotesquely inflated.

Describe the robustness of the t-test to violations of the assumption of population normality

For violations of the assumption of population normality, the t-test is usually very robust. The robustness of the test improves as sample sizes increase and when equal numbers of observations are present in each group.

How does sampling error affect the height of the curve?

Greater sampling error flattens the curve; less sampling error makes the curve steeper.

What were the three assumptions William Gosset made regarding sampling distribution and t-tests?

Homogeneity of variance: the populations from which the samples were obtained have identical variances.
Population normality: the populations from which the samples were obtained are normally distributed.
Independence of observations: the observed values in each sample are independent of each other.

What is hypothesis testing?

Hypothesis testing is an inferential procedure that uses sample data to evaluate the credibility of a hypothesis about a population. In other words, we want to be able to make claims about populations based on samples.

What is the individual unit of analysis standard for evaluating the assumption of independence?

If each participant receives the treatment independently of the other participants, the degrees of freedom and the estimate of the dependent variable's variance are unbiased. The assumption of independence is met.

How are p and alpha related?

If p > alpha, it indicates that our sample is more likely than our decision rule. We therefore conclude that we do not have evidence that the null hypothesis is false. If p < alpha, it indicates that our sample is less likely than our decision rule. We therefore conclude that we have evidence that the null hypothesis is false.

How do you calculate a 95% CI for the difference in population means?

Take the difference in sample means plus or minus the critical value of t times the estimated standard error of the difference between means: (X-bar1 - X-bar2) +/- (critical t x SE of the difference). If the null-hypothesized difference (usually zero) is not contained in the confidence interval, we reject the null hypothesis.

What happens if the z statistic is less than the critical value (doesn't get into the critical region)?

If the obtained z is not in the critical region, this means the probability associated with our observed data is larger than the rule we set up for "rare" (.05). Then we conclude that the difference between our sample mean and the population mean is sampling error (CHANCE!). We "fail to reject the null hypothesis." This means NO statistically significant DIFFERENCES.

What might happen if there is only a small difference between the means? A large difference?

If there is a small difference between means, then we might miss it unless our study is very well designed. If there is a large difference between the means, even a poorly designed study will probably find it.

What is a type II error and what symbol denotes it?

If we fail to reject the null hypothesis when the null hypothesis is false, we make an error, called a "Type II error" (remember this forever). The rate of committing a Type II error is denoted by beta (β).

What is power and how is it calculated?

If we reject the null hypothesis when the null hypothesis is false, we make a correct decision. This is "POWER" (remember this forever). Power is denoted by 1 - beta. Power is our ability to detect a real effect; it is the probability of correctly rejecting a false null hypothesis. It has been proposed that power of at least .80 is desirable.

What is a type I error? How can we control it?

If we reject the null hypothesis when the null hypothesis is true, we make an error, called a "Type I error" (remember this forever). The rate of a Type I error is denoted by alpha (α). We can control the Type I error rate by changing our alpha level.

How is the sampling distribution of the mean obtained (use the FCAT example)?

If we repeat this process of drawing samples of 25 6th grade children from the same population and computing the mean FCAT score in each sample, we could construct a frequency polygon to describe all of the sample means we observed. This distribution of sample means represents the sampling distribution of the mean.

Why is understanding normal distributions beneficial?

If we think about things in the right way, the normal distribution can be used in inferential statistics to think about probabilities.

Since we don't know the population mean, how is the Central Limit Theorem used?

In actual research, of course, we don't know what the population mean is (if we knew that, we wouldn't need to draw a sample to estimate the population mean!). We can still use our knowledge of sampling distributions and the Central Limit Theorem in order to make inferences about the population mean. The first approach to making inferences is through confidence intervals. The second approach is hypothesis testing

What happens to power if you decrease alpha? What happens to the type I error rate?

In contrast, by decreasing the Type I error rate (for instance, changing .05 to .01), you make the critical region smaller (the rejection region under the H0 curve). At the same time, you also decrease power (the corresponding area under the H1 curve). In other words, you decrease the risk of committing a Type I error but also decrease power.

What is the formula for calculating a confidence interval?

In general, intervals giving us any desired level of confidence are constructed by adding and subtracting the appropriate number of standard errors from the sample mean: X-bar +/- (critical z x standard error). The number of standard errors is found in the table of the standard normal distribution.
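
A small sketch of this construction, reusing the earlier example values (sample mean 102, sigma 25, n = 100, so a standard error of 2.5) and letting scipy supply the multiplier:

```python
import math
from scipy.stats import norm

xbar, sigma, n = 102, 25, 100
confidence = 0.95

se = sigma / math.sqrt(n)                      # 2.5
z_mult = norm.ppf(1 - (1 - confidence) / 2)    # 1.96 for 95% confidence

lower, upper = xbar - z_mult * se, xbar + z_mult * se
print(round(lower, 2), round(upper, 2))        # 97.1, 106.9
```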

What are examples of when to use an independent means test?

In many research applications, the observations are not related to each other:
- treatment and control groups when participants are not matched prior to group assignment
- differences between boys and girls when the participants are not sibling pairs or otherwise related to each other
For these types of data, inferences about population mean differences are made using the independent-means t-test.

What do you need to consider when deciding to accept or reject the null hypothesis?

In testing whether the treatment works, the theoretical sampling distribution represents the distribution of the null hypothesis. That is, the theoretical sampling distribution is based on the premise that the treatment has no effect. If the effect observed in our sample statistic falls in the area of the theoretical sample distribution that comprises a tiny percentage of the normal curve, there is a low probability that the effect is attributed to sampling error (chance). When this happens, we can call the effect statistically significant and reject the null hypothesis.

Define normal curve (normal distribution)

In the idealized form, a perfectly symmetrical, bell-shaped curve. The normal curve characterizes the distributions of many physical, psychoeducational, and psychomotor variables. Many statistical tests assume a normal distribution.

What is the formula for variance for an independent t-test?

In this formula, SS1 and SS2 are the sums-of-squares in groups 1 and 2, respectively. To obtain the pooled variance estimate, the sums-of-squares are added together and the sum is divided by the total number of observations (n1 + n2) minus 2: pooled variance = (SS1 + SS2) / (n1 + n2 - 2).

Why does increasing the sample size increase power?

Increasing sample size reduces sampling error (a smaller standard error of the mean): increasing the denominator of the standard error formula decreases the standard error. In turn, decreasing the denominator of the test statistic (the standard error) increases the value of the statistic (a higher z or t value). The higher the value of the statistic, the more likely it is that we have found a true difference in the population.

When is an independent means t-test used?

The independent-means t-test, in contrast, is used when the observations used to estimate each population mean are independent of each other. The most common research design for this is comparing two separate groups of observations.

How is the information obtained from sample statistics used?

Information obtained from sample statistics can be used to make reasonable guesses (called inferences or estimates) about population parameters.

What does it mean when we say that the null hypothesis is false?

It means that our sample comes from a distribution other than the null distribution (a population with a different mean). If the sample mean falls outside the range expected under the null distribution, then we "reject the null hypothesis."

What does it mean when we say that the null hypothesis is true?

It means that our sample comes from the null distribution. If the sample mean falls within the range expected under the null distribution, then we "fail to reject the null hypothesis."

Using the same hypothetical example used above, demonstrate how a confidence interval is calculated

Let's say that our sample mean is 102. Remember that the standard error of the mean was 2.5. We can subtract 2.5 from our sample mean; that is, 102 - 2.5 = 99.5, and we can add 2.5 to our sample mean, that is, 102 + 2.5 = 104.5. These two values are the endpoints of a confidence interval. Notice that the value 100, which is the population mean, is contained in this interval. If we think about this sampling distribution, we'll see that any sample mean between 97.5 and 102.5, that is, 1 standard error below and above the population mean, will yield a confidence interval that contains the population mean of 100.

Using the example above, based on where the sample mean falls on the sampling distribution, does it seem likely that this sample was drawn from this distribution?

Looking at the sampling distribution, we have two possible decisions we can make about the null hypothesis:
1) The sample was drawn from the population with a mean of 100, and it only looks different (a mean of 105) because of sampling error.
2) The sample was not drawn from this population; rather, it comes from a population with a mean other than 100.
H0 and H1 are hypotheses about true population means, and each hypothesized mean has a sampling distribution that goes with it.

What do z-scores allow us to do?

Make meaningful comparisons

How do you convert a normal distribution to a standard normal distribution?

Normal distributions can be transformed to standard normal distributions by applying the z-score formula: z = (X - mu) / sigma.

Define the null hypothesis and the research/alternative hypothesis

Null hypothesis: the hypothesis that the differences we observe in the data are due to chance only; they result from sampling error rather than population differences. H0 = no significant effect.
Research / alternative hypothesis: the hypothesis that there is a difference in the populations; the sample mean reflects more than sampling error. H1 = significant effect.

What do we call the hypothesis that is being tested? What symbol represents it?

The null hypothesis, denoted H0.

What is one important use of the standard normal distribution?

One important use of the standard normal distribution is for converting between scores from a normal distribution and percentile ranks.

Define a type I error and what tells us the probability of a type I error

Rejecting a null hypothesis when it is true. Alpha, α, gives the probability of a Type I error.

Give an example of a research hypothesis versus a null hypothesis and also write the symbols for each

Research hypothesis (H1): "The treatment changes IQ scores." Null hypothesis (H0): "The treatment has no effect on IQ scores."

What are researchers usually interested in?

Researchers are usually interested in the characteristics of populations (for example, all 6th graders in the state of Florida).

Define a type II error and what is it equal to

Retaining a null hypothesis when it is false. The probability of a Type II error is beta (β); power is equal to 1 - β.

What does robustness of the assumptions mean?

Robustness of the assumptions means that if the formal assumptions have been violated, the statistical test may still perform adequately in terms of controlling the Type I error probability. When a test maintains the actual Type I error probability close to the researcher's alpha level, despite violation of an assumption, the test is said to be robust to violations of the assumption. The robustness properties of the independent means t-test can best be summarized as: It depends!

What does sample representativeness affect?

Sample representativeness affects the accuracy of inference.

What does sample size affect?

Sample size affects the precision of inference

Are sampling distributions the same as sample distributions and population distributions?

Sampling distributions are fundamentally different from sample distributions and population distributions.

Why do we need to do sampling?

Since it is impossible to collect the information from all the individuals in the population, we can only study a subset of that population (called a sample).

From our example above, would we reject the null or fail to reject the null?

Since the obtained z score (1.67) is smaller than the critical z score (1.96) , we fail to reject the null hypothesis.

What are some examples of situations where a two dependent means test would be used?

Some research questions are investigated by comparing the means of samples that are related to each other:
- pre-test / post-test designs
- comparison of matched samples
- studies of twins or marital partners
When the observations in the two samples are related to each other, this relationship needs to be accounted for in the hypothesis test. The appropriate test to use in this situation is the dependent-means t-test.

What is the mean and standard deviation values for a standard normal distribution?

Standard normal distribution is a normal distribution with mean = 0 and standard deviation = 1.

Define inferential statistics

Statistics that permit conclusions about a population, based on the characteristics of a sample drawn from the population.

Define the t-statistic

Test statistic used for testing a null hypothesis involving a mean or mean difference when the population standard deviation is unknown; also used for testing certain other null hypotheses.

What does the Central Limit Theorem tell us and what does it allow us to do?

The Central Limit Theorem tells us about the characteristics of sampling distributions. It tells us three very important characteristics of the sampling distribution of the mean:
- What is the typical value of the sample mean?
- What is the amount of dispersion of the sample means?
- What is the shape of the sampling distribution of the mean?
This theorem allows us to use the sampling distributions of statistics without actually having to draw 100,000 samples from the population.

Describe the parts of a z-table

The z-table is a three-column table of z-scores and areas under the normal curve. Column A represents the z-score. Column B represents the area between the mean and the z-score. Column C represents the area beyond the z-score.

What do we use the central limit theorem for in ANOVA?

The central limit theorem tells us the variance in sample means depends on the variance in the population and the size of the samples.

Compare the formula for the t statistic and that of the z statistic

The formula for the t statistic is similar in structure to that for the z score, except that the t statistic uses the estimated standard error.

What is the major difference between a z-test and a t-test?

The major difference between the z and t tests is that in the one-mean z-test case we were dealing with situations in which we knew the population mean (mu) and standard deviation (sigma). However, in most situations we don't know the population standard deviation. We can still do hypothesis testing, but we need to use an estimate of the population standard deviation: instead of the population standard deviation, we use the sample standard deviation.

What does the Central Limit Theorem tell us about the typical value of the sample mean? When is this not true?

The mean of the sampling distribution is the same as the mean of the population. Statisticians would say "the expected value of the sample mean is the same as the population mean." From the standpoint of inferential statistics, this means that the typical sample mean has the same value as the population mean. However, this is only true if our sample is representative of the population. If the sample is biased, then this is not true - the typical sample mean is different from the population mean. This is why the representativeness of the sample is very important.

Describe the normal distribution curve

The normal distribution is a symmetric, bell-shaped curve, characterized by its mean and standard deviation. The normal curve approximates the distributions of many variables.

Using the example from above, what would the null and alternative hypothesis be (sample mean=105, population mean=100 and standard deviation=15)

The null hypothesis (H0): the sample drawn comes from a population with no treatment effect, H0: mu = 100.
The alternative hypothesis (H1): the sample drawn comes from a population with a different mean, H1: mu does not equal 100.

How is the obtained value of t calculated for an independent t-test?

The numerator of this formula is just the difference between the two sample means minus the null-hypothesized difference in population means (which is usually zero). The denominator is called the standard error of the difference between means (SE of X-bar1 - X-bar2). Note that the pooled variance estimate is used in this formula. Degrees of freedom for groups 1 and 2 are (n1 - 1) and (n2 - 1), respectively; therefore, the degrees of freedom for the independent-means t-test are (n1 + n2 - 2). To calculate the obtained value of t for the independent-samples case, we treat each of the two samples independently and calculate the mean and the sum-of-squares in each sample.
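
A hedged sketch of these calculations on two made-up samples, with scipy's ttest_ind used only as a cross-check of the hand computation:

```python
import numpy as np
from scipy.stats import t, ttest_ind

# Hypothetical data, purely for illustration
g1 = np.array([12.0, 15.0, 14.0, 10.0, 13.0])
g2 = np.array([9.0, 11.0, 10.0, 12.0, 8.0])

n1, n2 = len(g1), len(g2)
ss1 = ((g1 - g1.mean()) ** 2).sum()        # sum-of-squares, group 1
ss2 = ((g2 - g2.mean()) ** 2).sum()        # sum-of-squares, group 2

pooled_var = (ss1 + ss2) / (n1 + n2 - 2)   # pooled variance estimate
se_diff = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_obt = (g1.mean() - g2.mean() - 0) / se_diff   # null difference = 0
df = n1 + n2 - 2
t_crit = t.ppf(0.975, df)                       # alpha = .05, two-tailed

print(round(t_obt, 3), round(t_crit, 3))
print(ttest_ind(g1, g2))                        # cross-check (equal variances assumed)
```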

What is the pooled variance estimate used to calculate?

The pooled variance estimate is used to calculate the standard error of the difference between the means.

What is p?

The probability of the observed mean on the null distribution.
- The exact probability that the statistic we calculated on our observed sample could actually occur in our null distribution by chance alone.
- We can only calculate this if we have a computer.
- Statistical software (SAS) will provide the p-value for you.

What is the formula to calculate the 95% CI for an independent t-test?

Using the critical value of t, the difference in sample means, and the sample estimate of the standard error of the difference between means, we can calculate a 95% confidence interval as: (X-bar1 - X-bar2) +/- (critical t x SE of the difference between means).

Describe the resulting distribution once all the scores have been converted into z-scores

The resulting distribution will have a mean of zero and a standard deviation of one. It is important to realize that this transformation to Z-scores does not change the shape of the distribution, just the typical value and the dispersion.
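
A brief sketch showing the transformation on an arbitrary set of scores; the mean becomes 0 and the standard deviation 1, while the shape (ordering and relative spacing of the scores) is unchanged:

```python
import numpy as np

x = np.array([55.0, 60.0, 62.0, 70.0, 95.0])   # arbitrary (right-skewed) scores

z = (x - x.mean()) / x.std()                   # z-score transformation

print(z.mean())   # effectively 0
print(z.std())    # 1.0
print(z)          # same ordering and relative spacing as x
```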

What do the sample and population distributions represent?

The sample and population distributions represent distributions of the values of the variable of interest. For example:
- What percentage of students received a score of 420 on the FCAT?
- What percentage received a score of 320?
- What is the mean FCAT score for the students?
- What is the standard deviation of the FCAT scores?

What does the sampling distribution represent?

The sampling distribution represents the distribution of a sample statistic - the values of the statistic that would be observed if repeated samples were drawn from the same population. For example:
- What percentage of samples of size 50 would have a mean of 364?
- What percentage would have a mean of 322?
- What is the mean of all the sample means we would expect to see?
- What is the standard deviation of these sample means?

What happens to the shape of the distribution when the z-score transformation occurs?

The shape of the distribution will not be affected by the transformation. If X is not normal then the transformed distribution will not be normal either.

What determines the shape variation of the t-distribution sampling distribution?

The shape of the t-distribution varies as a function of the size of n (really it varies with the degrees of freedom). The bigger the n (the bigger the df), the closer the t-distribution is to a normal distribution. A graph of the t-distributions with degrees of freedom 1, 5, and 25 alongside the normal distribution illustrates this.

How is the null hypothesis stated for a two means test?

The statistical null hypothesis is H0: μ1 - μ2 = 0; that is, the null hypothesis states there is no difference between the means of populations 1 and 2, or that the parameters mu1 and mu2 are equal. Example: when two sample means are compared (e.g., the mean of boys versus the mean of girls on the FCAT exam), the research interest is whether there is a difference between the mean FCAT scores obtained by boys and those obtained by girls (e.g., girls performed better than boys on average). If there is no difference between the population means of FCAT scores of boys and girls, then the difference between the sample means is due to sampling error. As in our FCAT example, H0: μ1 - μ2 = 0 means the mean FCAT score for boys is the same as the mean FCAT score for girls.

When is the t-statistic used?

The t statistic is used to test hypotheses about the population mean when the value for the population variance is not known.

Define sampling distribution

The theoretical frequency distribution of a statistic obtained from an unlimited number of independent samples, each consisting of a sample size n randomly selected from the population.

What does the Central Limit Theorem tell us about the variance of the mean? About the standard error of the mean?

The variance of the sampling distribution, called the variance of the mean:
- is proportional to the variance of the population
- and inversely proportional to sample size.
The standard deviation of the sampling distribution, the standard error of the mean (SE):
- is proportional to the standard deviation of the population
- and inversely proportional to the square root of the sample size.

Why is the t-distribution different than the z-distribution table?

The z table describes one distribution (the normal distribution). The t-distribution table actually describes several different t-distributions, because there is a different t-distribution for every value of degrees of freedom (df), although when df gets large the differences become very small. Each row corresponds to a different t-distribution. As a result, there also isn't enough space to list all of the probabilities corresponding to each possible t score; instead, what is listed are the t scores at commonly used critical regions (that is, at popular alpha levels). The t-distribution table also splits one-tailed and two-tailed critical t-values for you.
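
A sketch of how the same critical values can be read off without the printed table, using scipy (an assumption; the course materials use the table and SAS):

```python
from scipy.stats import norm, t

alpha = 0.05
for df in (1, 5, 25, 100):
    # two-tailed critical value: alpha/2 in each tail
    print(df, round(t.ppf(1 - alpha / 2, df), 3))

# the t critical values approach the normal (z) critical value as df grows
print("z:", round(norm.ppf(1 - alpha / 2), 3))   # 1.96
```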

What is the purpose of the z-test? Write the formula for the z-test

The z-test is used to show how far away our sample mean is from the population mean. The statistical z-test creates a ratio: z = (sample mean - null-hypothesized population mean) / standard error of the mean. This ratio represents the difference between the sample mean and the null-hypothesized population mean divided by the difference we would expect from chance alone.

What if the statistic is larger than the critical value and falls in the critical region?

Then we conclude that there is too big a difference between our sample mean and the population mean to say that it is only sampling error. This is not by chance. So we "reject the null hypothesis." This means statistically significant DIFFERENCES.

Define Central Limit Theorem

Theorem stating that the sampling distribution of means tends toward a normal shape as the sample size increases, regardless of the shape of the population distribution from which the samples have been randomly selected.

What are the five steps in the general procedure of hypothesis testing?

There are five steps in the general procedure of hypothesis testing:
Step 1. Stating the null and alternative hypotheses
Step 2. Constructing the sampling distribution for the null hypothesis
Step 3. Computing a sample statistic and locating it on the null distribution
Step 4. Finding out how likely that sample statistic is by chance (i.e., on the null distribution)
Step 5. Making a decision about the null hypothesis: reject or fail to reject

What are three ways to increase power?

There are three ways to increase power:
- increase sample size
- increase the alpha (Type I error) level
- increase effect size

How does the size of n relate to the standard deviation of the sample and the standard deviation of the population?

This means that the higher the value of n, the more representative the sample will be of the population, which in turn means that s (the standard deviation of the sample) will be a better estimate of sigma (the standard deviation of the population). It also has implications for the test statistic.

How is the within-group variation calculated using ANOVA?

To compute the within-group variation, we look at how far each individual score deviates from the mean of its group. More specifically, we compute the sum-of-squares within groups (SSwithin or SSw).

How do you conduct a dependent means t-test?

To conduct a dependent-means t-test, we calculate the difference score for each research participant, then conduct a one-mean t-test on these difference scores. Since the difference scores are a new data set, we calculate each deviation score by subtracting the mean difference score (X-bar-d) from the difference score (Xd). Because the sum of the deviation scores is zero, we have to square the deviation scores.
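
A hedged sketch on made-up pre- and post-test scores, showing the difference-score approach next to scipy's built-in paired test as a cross-check:

```python
import numpy as np
from scipy.stats import t, ttest_rel

# Hypothetical pre- and post-test scores, purely for illustration
pre = np.array([100.0, 98.0, 105.0, 110.0, 95.0])
post = np.array([104.0, 101.0, 104.0, 115.0, 99.0])

d = post - pre                       # difference scores
n = len(d)
sd_d = d.std(ddof=1)                 # sample SD of the difference scores
se_d = sd_d / np.sqrt(n)             # estimated standard error

t_obt = d.mean() / se_d              # one-mean t-test on the differences
t_crit = t.ppf(0.975, n - 1)         # alpha = .05, two-tailed

print(round(t_obt, 3), round(t_crit, 3))
print(ttest_rel(post, pre))          # cross-check
```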

What is the risk of running multiple t-tests?

Type I error for the set of comparisons would be much higher than .05. If we conduct more than one t-test, we would increase our Type I error rate each time we do a t-test.
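
A short sketch of how quickly the familywise Type I error rate grows with the number of t-tests, assuming each test uses alpha = .05 and the tests are independent:

```python
alpha = 0.05
for k in (1, 3, 6, 10):                  # number of t-tests conducted
    familywise = 1 - (1 - alpha) ** k    # probability of at least one Type I error
    print(k, round(familywise, 3))
# with 6 tests the familywise rate is already about 0.26
```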

How do you decide whether to accept or reject the null hypothesis with a two mean dependent t-test?

Using a table of critical values for t, we look up the critical value of t for our given df. If our obtained value of t falls outside the critical t values, we reject the null hypothesis.

Using characteristics of the sample, what can we estimate?

Using characteristics of the sample (statistics) we can estimate characteristics of the population (parameters).

What is a more practical way to determine the sampling distribution of the mean?

Using the Central Limit Theorem

Are most distributions normal distributions?

Very few variables we actually encounter have distributions that look like the normal distribution.

How do you calculate a given confidence level (use 80% as an example)

We can construct a confidence interval by adding and subtracting more than one standard error from our sample mean. For example, an interval that will contain the population mean (mu) 80% of the time is given by: X-bar +/- (1.282 x standard error). The value 1.282 comes from the z-score table: the z value of 1.282 cuts off the top 10% and the z value of -1.282 cuts off the bottom 10%, leaving 80% in the middle. Use Column C (area beyond z-score) in the z-table to find the value of 0.10.
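
A quick check of the 1.282 multiplier with scipy (an assumption; the cards use Column C of the printed z-table):

```python
from scipy.stats import norm

# z value that leaves 10% in the upper tail (and, by symmetry, 10% in the lower tail)
print(round(norm.ppf(0.90), 3))   # 1.282
```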

What is the closest z-score that separates the lower 90th percentile from the upper 10 percent of the distribution?

We can find the z-score based on area. Use Column B (area between mean and z-score) to find the appropriate proportion of interest to locate the corresponding z-score. Find the closest value to .40 (90th percentile because there is .50 below the mean). Alternatively, you can use Column C (area beyond z-score) to find the closest value to .10. The closest z-score is 1.28.

Can we ever know for certain that our inferences from our sample are true for our population?

We can never know for certain the true parent population. What we can determine is the probability that we have made a correct decision, or if we have made an error.

How is the between group variation calculated using ANOVA?

We look at how far each group mean deviates from the grand mean. More specifically, we compute the sum-of-squares between groups (SSbetween or SSb).

Because the sample statistic represents only an estimate of the population parameter, what do we need to consider? What represents this?

We need to consider the uncertainty of this estimate. This uncertainty is represented by the sampling distribution.

How is the F-statistic calculated?

We then compute the F-statistic by taking the ratio of MSbetween to MSwithin.

How do you convert the sum-of-squares to a mean square (MSwithin or MSw)?

We then convert the sum-of-squares to a mean square (MS) by dividing the sum-of-squares by its degrees of freedom. The degrees of freedom within groups is the total sample size minus the number of groups.

How do you convert the sum-of-squares to a mean square (MSbetween or MSb)?

We then convert the sum-of-squares to a mean square (MSbetween or MSb) by dividing the sum-of-squares by its degrees of freedom. The degrees of freedom between groups is the number of groups minus 1.
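
A hedged sketch that pulls the last few cards together on made-up data for k = 3 groups: sums-of-squares, mean squares, and the F ratio:

```python
import numpy as np

# Hypothetical scores for three groups, purely for illustration
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0]),
          np.array([5.0, 6.0, 10.0])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k, N = len(groups), len(all_scores)

ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

ms_within = ss_within / (N - k)      # df within  = N - k
ms_between = ss_between / (k - 1)    # df between = k - 1

F = ms_between / ms_within
print(round(ss_between, 2), round(ss_within, 2), round(F, 2))
```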

What is the notation system used for computing things like between-group variation in ANOVA?

We will use X to indicate a score, and subscript it so that the first subscript indicates position within the group and the second subscript indicates which group. For example X32 would indicate the score for the 3rd person in the second group. More generally, Xij indicates the score for the ith person in the jth group. We will use n to represent the last person of a group, and k to represent the last group.

Explain how alpha and type I errors are related

When the null hypothesis is true, the probabilities of a correct decision and of a Type I error sum to 1. Therefore, if the Type I error rate is alpha, the probability of a correct decision is 1 - alpha. Generally speaking, alpha is set to .05, which means that we allow only a small probability (5 out of 100) of committing a Type I error when we reject the null hypothesis.

How is variance estimated with an independent t-test?

When we have two independent groups of observations, we could use the sample variance in either group to estimate the population variance. However, the precision of our estimates increases with the number of observations used, so our best variance estimate is obtained by pooling the variability of data in both groups. This pooling is accomplished using the sum-of-squares in each group: pooled variance = (SS1 + SS2) / (n1 + n2 - 2).

How can you use the t-distribution table (assume alpha = 0.05)?

When you set alpha to .05 for a two-tailed test (both sides), there are two ways you can check the t-distribution table. One is to check alpha = .05 directly for the two-tailed test. You will see that alpha = .025 (one-tailed test) and alpha = .05 (two-tailed test) are in the same column, which means that you selected the correct column.

Describe in words the relationship between population and sample

You collect data from a subset of the population of interest and calculate statistics from the sample. You use those statistics to make inferences back to the population. The characteristics of the sample, such as the mean, variance, or correlations, are called statistics. The characteristics of the population, such as the population mean, variance, or correlation, are called parameters. Information obtained from the sample statistics can be used to make inferences about the population parameters.

Define degrees of freedom

degrees of freedom (df): the number of independent pieces of information a sample of observations can provide for purposes of statistical inference.

What test evaluates the homogeneity of variance?

folded F test

How do you calculate the degrees of freedom for an independent t-test?

n1 + n2 -2

What is a special distribution shape that is very useful in statistics?

The normal distribution. This curve is the most important distribution in statistics.

Why do we find differences in means between samples, and what is this called?

Sampling error. For example: imagine drawing a sample of 25 6th grade children from the population of 6th grade children in Florida. We can compute the mean FCAT score for this sample. Now, imagine drawing a second sample of 25 6th grade children from the same population and computing the mean FCAT score for this sample. The two samples are not likely to have the same mean FCAT score because of "sampling error."

What test statistics do we use if we know sigma?

standard error of the mean and Z score

What are z-scores?

Standardized deviation scores. z-scores tell us how far away from the mean a score is, in standard deviation units.

What is the formula for the t score?

t = (sample mean - population mean)/estimated standard error of the mean

What is the alternative hypothesis in a two means test?

the alternative hypothesis is that the difference is not zero (or that mu1 does not equal mu2)

