STATS final

What is an expected​ value?

Expected value is the estimated average gain or loss per event when an event is repeated many times.
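
As a rough illustration of this definition, the sketch below computes an expected value for a hypothetical lottery ticket. The ticket price, prize, and winning probability are made-up numbers chosen for the example, not values from the course.

```python
# Hypothetical game (assumed numbers): a $1 ticket wins $100 with probability 1/200.
outcomes = [
    (99.0, 1 / 200),    # net gain when the ticket wins (prize minus ticket price)
    (-1.0, 199 / 200),  # net loss when the ticket loses
]

def expected_value(pairs):
    """Sum of value * probability over all possible outcomes."""
    return sum(value * prob for value, prob in pairs)

ev = expected_value(outcomes)
print(f"Expected value per ticket: {ev:.2f} dollars")
```

A negative result here means that a player who buys many tickets should expect to lose about that amount per ticket on average, even though no single ticket produces exactly that loss.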

The area to the right of the standard score z = 1.0 is 0.1587, so the P-value in a two-tailed test is 0.1587.

The statement does not make sense. The​ P-value is equal to twice the area in the tail past the standard​ score, z.
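
The doubling can be checked numerically. This is a minimal sketch that uses the standard library's complementary error function to get the right-tail area of the standard normal curve; the helper name `right_tail_area` is an illustrative choice.

```python
import math

def right_tail_area(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = 1.0
one_tail = right_tail_area(z)
# A two-tailed test counts both tails, so the tail area is doubled.
two_tailed_p = 2 * right_tail_area(abs(z))

print(round(one_tail, 4))      # 0.1587
print(round(two_tailed_p, 4))  # 0.3173
```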

What is a sample​ proportion?

A fraction​ (or percentage) with which some variable occurs in a sample

What is a probability​ distribution?

A probability distribution represents the probabilities of all possible events of interest.

What should be included when making a graph of a​ distribution?

A title​ and/or caption, scales and titles for the​ axes, and a legend if more than one data set is shown on the graph

What is a type I​ error? What is a type II​ error?

A type I error is the mistake of rejecting the null hypothesis when the null hypothesis is actually true. A type II error is the mistake of failing to reject the null hypothesis when the null hypothesis is actually false.

Which outlier would make it appear that there is correlation when there is​ none?

An outlier far separated from the rest of the data points.

What is​ confounding?

Confounding is the mixing of effects from different factors so that the effects from the specific factors being studied cannot be determined.

Briefly describe the four conditions under which we can expect a data set to have a nearly normal distribution. Select all that apply.

Individual data values result from a combination of many different​ factors, such as genetic and environmental factors. Data values are spread evenly around the​ mean, making the distribution symmetric. Larger deviations from the mean become increasingly​ rare, producing the tapering tails of the distribution. Most data values are clustered near the​ mean, giving the distribution a​ well-defined single peak.

Can the law of large numbers be applied to a single observation or experiment? Explain.

It does not apply to a single trial​ (observation or​ experiment), or even to small numbers of​ trials, but only to a large number of trials.

What is a best fit​ line?

It is a line that lies closer to the data points than any other possible line.

What does the square of the correlation coefficient, r², tell us about a best-fit line?

It tells us the proportion of the variation that is accounted for by the best-fit line. For example, if r² = 0.9, or 90%, then 90% of the variability is accounted for by the best-fit line, but 10% is not.
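
For a concrete feel, the sketch below computes r and r² by hand for a small made-up data set. The study hours and test scores are hypothetical, and `pearson_r` is an illustrative helper written for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 79]

r = pearson_r(hours, scores)
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

With these numbers r² comes out a little above 0.95, so roughly 95% of the variation in the scores is accounted for by the best-fit line.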

Does the idea of statistical significance apply to samples or​ populations? Briefly explain why.

Statistical significance applies to samples because the values of population parameters have no uncertainty.

Is the correlation most likely due to​ coincidence, a common underlying​ cause, or a direct​ cause?

The correlation is most likely due to a common underlying cause. Many crimes are committed with handguns that are not registered.

Is the correlation most likely due to​ coincidence, a common underlying​ cause, or a direct​ cause?

The correlation is most likely due to a common underlying​ cause, such as the general increase in the number of cars and traffic.

What is a sample​ mean?

The mean of a particular sample drawn from a population

Suppose that many random samples of size n for a variable are taken and the distribution of means of each sample is recorded. Select all statements that are part of the Central Limit Theorem below.

The mean of the distribution of means approaches the population mean, μ. The standard deviation of the distribution of means approaches σ/√n, where σ is the standard deviation of the population. The distribution of means will be approximately a normal distribution.
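
A small simulation can make these statements tangible. The sketch below assumes a uniform population on [0, 1) (mean 0.5, standard deviation 1/√12) and a sample size of 25; these choices, the seed, and the number of samples are arbitrary and only serve the illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 25               # size of each sample (assumed)
num_samples = 20000  # number of samples drawn (assumed)
population_mean = 0.5
population_sd = 1 / math.sqrt(12)  # standard deviation of a uniform [0, 1) population

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean([random.random() for _ in range(n)])
    for _ in range(num_samples)
]

observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.pstdev(sample_means)
predicted_sd = population_sd / math.sqrt(n)

print(f"mean of sample means: {observed_mean:.4f} (population mean {population_mean})")
print(f"sd of sample means:   {observed_sd:.4f} (sigma / sqrt(n) = {predicted_sd:.4f})")
```

The population here is flat rather than bell-shaped, yet the sample means cluster around 0.5 with a spread close to σ/√n.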

In interpreting a P-value of 0.42, a researcher states that the results are statistically significant because the P-value is less than 0.5, indicating that the results are not likely to occur by chance.

The statement does not make sense. A P-value of 0.42 corresponds to results that are likely to occur by chance.

I found a strong negative correlation for data relating the percentage of people in various countries who are literate and the percentage who are undernourished. I concluded that an increase in literacy causes a decrease in undernourishment.

The statement does not make sense. Correlation is not necessarily causation.

Because the significance level is the probability of making a type I​ error, it is wise to select a significance level of zero so that there is no probability of making that error.

The statement does not make sense. It is impossible to have no probability of a type I error.

Based on our sample, the 95% confidence interval for the mean amount of television watched by adults in a nation is 3.2 to 3.4 hours per day. Therefore, there is a 95% chance that the actual mean for the population is 3.3 hours.

The statement does not make sense. The center of a confidence interval is not necessarily the population mean.

The scatterplot showed all the data points following a nearly straight diagonal​ line, but only a weak correlation between the two variables being plotted.

The statement does not make sense. The data points following a nearly straight diagonal line would indicate a very strong correlation between the two variables.

Based on our sample, the 95% confidence interval for the mean amount of television watched by adults in a nation is 2.1 to 2.9 hours per day. Therefore, there is a 95% chance that the mean for all adults in the nation will fall somewhere in this range and a 5% chance that it will not.

The statement does not make sense. The population mean is a fixed constant that either falls within the confidence interval or it does not. There is no probability associated with this.

In hypothesis​ tests, if the significance level is​ 0.01, then the​ P-value is also 0.01.

The statement does not make sense. The significance level and the​ P-value represent different components of the hypothesis​ test, and are generally not the same.

The two variables I studied showed such a strong correlation that they had a correlation coefficient of r = 1.50.

The statement does not make sense. The value of the correlation coefficient ranges from −1 to​ 1, so having a value of r=1.50 is not possible

Reasoning that barrels of oil are​ three-dimensional, a newspaper publishes a​ three-dimensional graph showing oil production​ (barrels) in five different years.

The statement does not make sense. There are only two​ variables, oil production and​ time, so only a​ two-dimensional graph is needed.

Our survey found that 57% of voters approve of a particular policy of the President, with a margin of error (for 95% confidence) of 7 percentage points. Therefore, there is only a 5% chance that the proportion of approval among all voters differs from 57%.

The statement does not make sense. With 95% confidence, the true population proportion is between 50% and 64%.
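
The interval in this answer is just the sample proportion minus and plus the margin of error. A minimal sketch of that arithmetic, with `confidence_interval` as an illustrative helper name:

```python
def confidence_interval(estimate, margin_of_error):
    """Return the (low, high) limits: estimate minus and plus the margin of error."""
    return estimate - margin_of_error, estimate + margin_of_error

# Values from the survey statement, in percentage points.
low, high = confidence_interval(57, 7)
print(f"95% confidence interval: {low}% to {high}%")
```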

​There's been only a very slight rise in our stock price over the past few​ months, but I wanted to make it look dramatic so I started the vertical scale from the lowest price rather than from zero.

The statement makes sense because reducing the range of the vertical axis to just fit the data will increase the relative size of the variation in the data.

The standard deviation for the heights of a group of​ 5-year-old children is smaller than the standard deviation for the heights of a group of children who range in age from 3 to 15.

The statement makes sense because the range of data for the heights of a group of​ 5-year-old children is smaller than the range of data for the heights of a group of children who range in age from 3 to 15.

I created a scatterplot of CEO salaries and corporate revenue for 10 companies and found a negative​ correlation, but when I left out a data point for a company whose CEO took no​ salary, there was no correlation for the remaining data.

The statement makes sense. A CEO taking no salary is an​ outlier, and an outlier can make a correlation appear where there otherwise is none.

Researchers conducted animal experiments to study smoking and lung cancer because it would have been unethical to conduct these experiments on humans.

The statement makes sense. Researchers cannot randomly assign people to treatment and control groups and ask subjects in the treatment group to smoke.

A process consists of repeating this​ operation: Randomly select two values from a normally distributed population and then find the mean of the two values. The sample means will be normally​ distributed, even though each sample has only two values.

The statement makes sense. Since the population is normally​ distributed, the distribution of sample means will also be normally​ distributed, regardless of sample size.

In a test of the claim that a majority of Americans believe that human activity is the major cause of global warming, the null hypothesis is that p = 0.5 and the alternative hypothesis is p > 0.5.

The statement makes sense. The null hypothesis is the starting assumption and the alternative hypothesis is the claim that needs to be supported by evidence.

Although a company randomly surveys only a few thousand households out of the millions that own TVs, they have a good chance of getting an accurate estimate of the proportion of the population watching a particular channel.

The statement makes sense. The sample size is large enough for the distribution of sample proportions to be nearly​ normal, so individual sample proportions should be clustered around the actual population proportion.

What are the differences among​ theoretical, relative​ frequency, and subjective techniques for finding​ probabilities?

The theoretical technique is based on the assumption that all outcomes are equally​ likely, while the relative frequency technique is based on observations or​ experiments, and the subjective technique is an estimate based on experience or intuition.

Explain how a graph that shows percentage change can show descending bars​ (or a descending​ line) even when the variable of interest is increasing.

The vertical axis on the graph represents a percentage change, so the drop-off means only that the actual value of the variable is rising by smaller amounts.

A distribution is which of the​ following?

The way the values of a variable are spread over all possible values that can be summarized with a table or a graph

When referring to a​ "normal" distribution, does the word normal have the same meaning as it does in ordinary​ usage? Explain.

The word normal has a special meaning in statistics. It refers to a specific category of distributions that are symmetric and​ bell-shaped with a single peak. The peak corresponds to the​ mean, median, and mode of such a distribution.

The results of my hypothesis test were statistically significant at the 0.01​ level, so no one can doubt my claim any longer.

This statement does not make sense. Statistical significance at the 0.01 level still implies a​ 1% chance that the result is in​ error, leaving room for reasonable doubt.

To learn about smartphone​ ownership, I chose a null hypothesis claiming that the proportion of adults who own a smartphone is equal to​ 0.8, and the result of my hypothesis test proved this claim to be true.

This statement does not make sense. The null hypothesis is​ good, but the hypothesis test cannot result in accepting​ (or proving) the null hypothesis.

Describe conditions under which you could use a t distribution instead of a normal distribution when making inferences about a population mean.

When the population standard deviation is not​ known, and either the population is normally distributed or the sample size is greater than 30.

Give three examples of pairs of variables that are correlated. Select the correct answer below.

amount of smoking and lung​ cancer, height and weight of​ people, price of a good and demand of the good

What factors determine its​ shape? Select all that apply.

mean, standard deviation, and sample size

Under what circumstances is it reasonable to ignore​ outliers?

When there is good reason to suspect that they represent errors in the data.

What is a hypothesis in​ statistics? Choose the correct answer below.

A hypothesis is a claim about a population parameter (such as a population proportion, p, or a population mean, μ) or some other characteristic of a population.

What is meant by a hypothesis test in​ statistics? Choose the correct answer below.

A hypothesis test is a standard procedure for testing a claim about the value of a population parameter.

Briefly describe how a multiple bar graph can be used to show multiple data sets. Choose the correct answer below.

A multiple bar graph uses a set of bars for each data set.

Briefly describe how a multiple line chart can be used to show multiple data sets. Choose the correct answer below.

A multiple line chart uses a different line on the same chart for each data set.

What is a​ correlation? Give three examples of pairs of variables that are correlated.

A correlation exists between two variables when higher values of one variable consistently go with higher or lower values of another variable.

What do we mean when we say that a distribution is​ skewed? Briefly describe the basic difference between a distribution that is skewed to the right and a distribution that is skewed to the left.

A distribution is skewed if it has values that tend to be more spread out on one side than on the other. A distribution that is skewed to the right will have its values more spread out on the​ right, while a distribution that is skewed to the left will have its values more spread out on the left.

Distinguish between a distribution of sample means and a distribution of sample proportions.

A distribution of sample means results when the means of all possible samples of a given size are​ found, and a distribution of sample proportions results when the corresponding proportions are found

What is a frequency​ table? How does it show categories and​ frequencies? Choose the correct answer below.

A frequency table has two columns. The first column lists all of the categories of data. The second column lists the frequency of each​ category, which is the number of data values in the category.

What do we mean when we say that a result is statistically​ significant?

A result is statistically significant if it is unlikely to have occurred by chance.

When is the stack plot most​ useful?

A stack plot is best used with cumulative or​ relative-frequency data, including​ time-series data. It is most useful when the total of the data sets is important.

Briefly describe how a stack plot can be used to show multiple data sets. Choose the correct answer below.

A stack plot is similar to a multiple bar graph or multiple line​ chart, except that each subsequent bar or line is added to the prior​ one(s), rather than shown independently.

What is a standard​ score? How do you find the standard score for a particular data​ value?

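A standard score is the number of standard deviations a data value lies above or below the mean. To find it, subtract the mean from the data value and divide by the standard deviation: z = (data value − mean) / standard deviation.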
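
A minimal sketch of the standard score calculation follows. The test score, mean, and standard deviation are hypothetical values chosen for illustration.

```python
def standard_score(value, mean, standard_deviation):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / standard_deviation

# Hypothetical example: a score of 130 on a test with mean 100 and standard deviation 15.
z = standard_score(130, 100, 15)
print(z)  # 2.0, i.e. two standard deviations above the mean
```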

Distinguish between a uniform distribution and a distribution with one or more modes. Choose the correct answer below.

All data values in a uniform distribution have the same​ frequency, whereas a distribution with one or more modes has one or more values that occur most frequently.

What is an exponential​ scale, and when is such a scale​ useful?

An exponential scale rises by​ powers, usually but not necessarily powers of 10. An exponential scale is useful when there is a huge range of data values.

What are​ outliers? Describe the effects of outliers on the​ mean, median, and mode.

An outlier in a data set is a value that is much higher or much lower than almost all other values. An outlier can change the mean of a data​ set, but does not affect the median or mode.
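
The effect can be seen with a small made-up data set. This sketch uses Python's `statistics` module; the values are hypothetical, and in other data sets the median can shift slightly when an outlier is added, although far less than the mean does.

```python
import statistics

# Hypothetical data set, then the same data with one extreme value added.
data = [3, 4, 4, 4, 5, 6]
with_outlier = data + [60]

for label, values in (("original", data), ("with outlier", with_outlier)):
    print(
        label,
        "mean =", round(statistics.mean(values), 2),
        "median =", statistics.median(values),
        "mode =", statistics.mode(values),
    )
```

The mean jumps from about 4.33 to about 12.29, while the median and mode both stay at 4.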

Which outlier would make it appear that there is no correlation when there is​ one?

An outlier located in a place opposite where the correlation would predict.

Which one of the following is an example of a relative frequency​ probability?

Based on statistical​ data, the chance of having the championship team coming from the Eastern Conference of a certain basketball league is about 1 in 10.

How is a best fit line​ useful?

It is useful to make predictions within the bounds of the data points.

Which one of the following is an example of a subjective​ probability?

My teacher assures me that he is certain that my SAT scores will be the highest for the entire country.

Define negative correlation. Choose the correct answer below.

Negative correlation means that two variables tend to change in opposite​ directions, with one increasing while the other decreases. An example might be age and vision.

Define no correlation. Choose the correct answer below.

No correlation means that there is no apparent relationship between the two variables. An example might be hair color and weight.

Distinguish between an outcome and an event in probability. Choose the correct answer below.

Outcomes are the most basic possible results of observations or experiments. An event consists of one or more outcomes that share a property of interest.

Define positive correlation. Choose the correct answer below.

Positive correlation means that both variables tend to increase​ (or decrease) together. An example might be shoe size and height.

A pollster randomly selects an adult for a survey. Let M denote the event of getting a​ male, and let R denote the event of getting a Republican. Are events M and R​ overlapping?

Since the pollster could select an adult who is male and​ Republican, the events are overlapping.

Let A denote the event of getting a female when you randomly select a fellow student in your statistics class. Let B denote the event of getting a female when you randomly select a fellow student in your psychology class. Are events A and B independent or​ dependent?

Since the student that is chosen from the statistics class does not affect the probability of choosing a female from the psychology​ class, the two events are independent.

Give an example in which the same event can occur through two or more outcomes. Choose the correct answer below.

Suppose you roll a​ fair, six-sided die. The possible outcomes are rolling the number​ 1, 2,​ 3, 4,​ 5, or 6. The event of rolling an even number will occur with the three outcomes​ 2, 4, and 6
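
The die example can be written out directly. This sketch lists the outcomes, collects those that make up the event "even number", and computes the theoretical probability under the assumption that all six outcomes are equally likely.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]  # the most basic possible results of one roll
even_event = [o for o in outcomes if o % 2 == 0]  # outcomes sharing the property "even"

# With equally likely outcomes, P(event) = (outcomes in the event) / (all outcomes).
p_even = Fraction(len(even_event), len(outcomes))
print(even_event, p_even)  # [2, 4, 6] 1/2
```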

When does the Central Limit Theorem​ apply? Select all appropriate conditions below.

The Central Limit Theorem applies for suitably large sample sizes. A common threshold is n > 30. The Central Limit Theorem applies to variables with any distribution (not necessarily a normal distribution).

In testing a claim about a population mean, if the standard score for a sample mean is z = 0, then there is not sufficient sample evidence to support the alternative hypothesis.

The statement makes sense. A standard score of 0 represents the peak of the sampling​ distribution, so it is a likely outcome if the null hypothesis is true.

What does the area under the normal distribution curve​ represent? What is the total area under the normal distribution​ curve?

The area that lies under the normal distribution curve corresponding to a range of values on the horizontal axis is the total relative frequency of those values. Because the total relative frequency for all values must be 1​ (100%), the total area under the normal distribution curve must equal 1​ (100%).

Test grades are affected by the amount of time and effort spent studying and preparing for the test.

The causal connection is valid. When students spend more time and effort studying for a​ test, their test grades tend to be higher.

Which one of the following is an example of a theoretical​ probability?

The probability of rolling a 3 on a single die is 1/6.

Is the correlation most likely due to​ coincidence, a common underlying​ cause, or a direct​ cause?

The correlation is most likely due to a direct cause. As students study​ more, they gain a better understanding of the subject and their test scores are likely to be higher.

How does this compare to the critical values for statistical significance for a population​ mean?

The critical values are the same as those used with population means.

What do we mean by critical values for significance in a hypothesis test for the population​ proportion?

The critical values are the standard scores required for statistical significance at a given level.

What is meant by cumulative​ frequency?

The cumulative frequency of a category is the number of data values in that category and all preceding categories.
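
Cumulative frequencies are running totals of the frequency column. A minimal sketch, using hypothetical grade counts:

```python
from itertools import accumulate

# Hypothetical frequency table: number of students earning each grade.
grades = ["A", "B", "C", "D", "F"]
frequencies = [4, 7, 9, 3, 2]

# Each cumulative frequency adds the current category to all preceding ones.
cumulative = list(accumulate(frequencies))

for grade, freq, cum in zip(grades, frequencies, cumulative):
    print(grade, freq, cum)
```

The last cumulative frequency (25 here) always equals the total number of data values.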

What is the purpose of​ binning? Give an example in which binning is useful.

The purpose of binning is to analyze the frequency of quantitative data grouped into categories that cover a range of possible values. A useful example is grouping quiz scores with a maximum score of 40 points with​ 10-point bins. The first bin contains scores​ 0-9, the second bin contains scores​ 10-19, and so on.
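
The quiz-score example might be sketched as follows. The list of scores is made up, and `bin_label` is an illustrative helper; note that a perfect score of 40 lands in its own bin with this scheme.

```python
from collections import Counter

# Hypothetical quiz scores out of 40.
scores = [7, 12, 18, 23, 25, 29, 31, 34, 38, 40]

def bin_label(score, width=10):
    """Return the label of the bin containing score, e.g. '10-19'."""
    low = (score // width) * width
    return f"{low}-{low + width - 1}"

counts = Counter(bin_label(s) for s in scores)

for label in sorted(counts, key=lambda name: int(name.split("-")[0])):
    print(label, counts[label])
```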

Which of the following is true for the possible range of values for​ P(A)?

The range of possible values for​ P(A) is from 0 to 1​ (inclusive), with 0 meaning there is no chance that event A will occur and 1 meaning it is certain that event A will occur.

What is the law of large​ numbers? Choose the correct answer below.

The law of large numbers states that if a process is repeated through many​ trials, the proportion of the trials in which event A occurs will be close to the probability​ P(A).
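
A quick simulation shows the idea. The sketch below rolls a simulated fair die and tracks how often a 3 comes up; the seed and trial counts are arbitrary choices, and the exact proportions printed will depend on them.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable
p_three = 1 / 6  # theoretical probability of rolling a 3

def proportion_of_threes(num_trials):
    """Roll a fair die num_trials times and return the fraction of rolls that are 3."""
    hits = sum(1 for _ in range(num_trials) if random.randint(1, 6) == 3)
    return hits / num_trials

for trials in (10, 1000, 100000):
    print(trials, round(proportion_of_threes(trials), 4), "vs", round(p_three, 4))
```

With only 10 rolls the proportion can be far from 1/6, while with 100,000 rolls it is typically within a fraction of a percentage point.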

How do we use this for making decisions about the hypothesis​ test?

The level of statistical significance is determined by comparing the standard score to the critical values.

Choose the correct description of the mean below.

The mean is the sum of all the values divided by the number of values. It can be strongly affected by outliers.

c. Is the mean of the sampling distribution from part​ (b) equal to the mean of the population of the three listed​ values? Are those means always​ equal?

The mean of the sampling distribution from part​ (b) is equal to the mean of the population of the three listed values. Those means are always equal.

Choose the correct description of the median below.

The median is the middle value in a data set. It is not affected by outliers.

Choose the correct description of the mode below.

The mode is the most common value in a data set. It is not affected by outliers.

Suppose you have measured the mean in a sample drawn from a much larger population. What value should you use as your estimate of the population​ mean?

The sample mean

Suppose you conducted an opinion poll and measured the proportion of your sample that held a particular view. What value should you use as your estimate of the population​ proportion?

The sample proportion

I selected three different samples of size n = 10 drawn from 1600 students at my school, and with these I constructed the sampling distribution.

The statement does not make sense. A sampling distribution is a distribution of all possible samples of a particular​ size, which is far more than three.

I used a best-fit line for data showing the ages and hand size of thousands of boys of various ages to predict the mean hand size of 8-year-old boys.

The statement makes sense. Assuming the data were collected in a reasonable way and all ages were sampled, a scatterplot for thousands of boys should produce a best-fit line that makes reasonable predictions of mean hand sizes at different ages.

In a​ two-way table, all of the observed frequencies are very close to the expected​ frequencies, so the

The statement makes sense. If the observed frequencies are all very close to the expected frequencies, each value of O − E is small, and the χ² statistic is very small.
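
The statistic referred to here is the sum of (O − E)²/E over all cells of the table. The sketch below applies that formula to two hypothetical sets of observed counts, with the four cells of a two-way table flattened into lists and every expected frequency assumed to be 50.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [50, 50, 50, 50]            # assumed expected frequencies
observed_close = [49, 51, 50, 50]      # observed counts near the expected ones
observed_far = [30, 70, 65, 35]        # observed counts far from the expected ones

chi2_close = chi_square(observed_close, expected)
chi2_far = chi_square(observed_far, expected)
print(round(chi2_close, 2), round(chi2_far, 2))  # 0.04 25.0
```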

What is the t​ distribution?

The t distribution is a distribution that is quite similar to the normal distribution. It is a symmetric​ bell-shaped distribution with a single peak.

Once you have constructed the​ 95% confidence interval around your sample​ mean, how do you interpret its possible relationship to the population​ mean?

There is​ 95% confidence that the confidence interval limits actually contain the true value of the population mean.

The distribution of grades was​ left-skewed, but the​ mean, median, and mode were all the same.

This does not make sense because the mean and median should lie somewhere to the left of the mode if the distribution is​ left-skewed.

Explain how to make a table of a probability distribution. Choose the correct answer below.

To make a table of a probability​ distribution, list all possible​ outcomes, identify the outcomes that represent the same​ event, and then find the probability of each event.

Once you have constructed the​ 95% confidence interval around your sample​ proportion, what does this tell you about the estimated value of the population​ proportion?

We have​ 95% confidence that the confidence interval limits actually contain the true value of the population proportion.

Should we always expect to get the expected​ value? Why or why​ not?

We should not always expect to get the expected value because expected value is calculated with the assumption that the law of large numbers will come into play.

c. Why is the standard deviation in part a different from the standard deviation in part​ b? Choose the correct answer below.

With larger sample sizes​ (as in part​ b), the means tend to be closer​ together, so they have less​ variation, which results in a smaller standard deviation.

​P(A) means which of the​ following?

​P(A) means the probability that event A will occur.

