Statistics Test 1

frequency

The number of observations (counts) in each category.

college major

categorical

pareto

has categories listed in order of frequency from highest to lowest

If a statistic is resistant

extreme values relative to the data do not affect its value substantially. Are any of the measures of dispersion (the range, the variance, and the standard deviation) resistant? No, all of these measures of dispersion are affected by extreme values. The range is more affected by an outlier, and the standard deviation uses all the data.

A discrete variable

has possible values that form a set of separate numbers, such as 0, 1, 2, 3, ... (counting). Example: number of socks. Students were asked to estimate the number of times a week they read a daily newspaper. This variable is discrete because the value for each person would be a whole number.

A continuous variable

has possible values that form an interval (measuring). Example: blood sugar level.

A variable is called categorical

if each observation belongs to one of a set of categories.

A variable is called quantitative

if observations on it take numerical values that represent different magnitudes of the variable. Mathematical operations such as addition or division on these numerical values make sense.

Parameter

is a numerical summary of a population. Example: Only 12 men have walked on the moon. The average age of these men was 39 years, 11 months, 15 days. This is a parameter because it summarizes the entire population.

saying "data vary" means

that the values of the variable change from individual to individual. Certain variables can change over time for certain individuals. Because data vary, two different statistical analyses of the same variable can lead to different results.

variance

the average of the squared deviations is called the variance.
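
A small Python sketch of this definition. The data values are made up for illustration. Note that many textbooks divide by n - 1 rather than n when the data are a sample, so both versions are shown.

```python
import statistics

def population_variance(values):
    """Average of the squared deviations from the mean (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def sample_variance(values):
    """Same sum of squared deviations, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

print(population_variance(data))  # divides by n
print(sample_variance(data))      # divides by n - 1
```

The standard library offers the same two calculations as `statistics.pvariance` (divide by n) and `statistics.variance` (divide by n - 1).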

number of coins in a jar

the variable is discrete because it is countable

The values of a discrete quantitative variable are whole numbers

true

Inferential Statistics

uses methods that generalize results obtained from a sample to the population and measure the reliability of the results. Making decisions and predictions based on the data for answering the statistical questions. Usually the decision or prediction refers to a larger group of individuals, not merely those in the study.

An athlete who is three standard deviations above the mean would weigh 181 pounds.

​Yes, this would be an unusual observation because typically all or nearly all observations fall within three standard deviations from the mean.

frequency table

A listing of possible values of a variable, together with the number of observations (frequency) for each value

What is a random variable?

A random variable is a numerical measure of the outcome of a probability experiment.

Distribution

Is the population distribution of the duration of your phone calls likely to be bell​ shaped, right-, or​ left-skewed? Since there is a minimum but no maximum value, the distribution is skewed to the right. You are on a shared wireless plan with your​ parents, who are statisticians. They look at some of your recent monthly statements that list each call and its duration and randomly sample 45 calls from the thousands listed there. They construct a histogram of the duration to look at the data distribution. Is this distribution likely to be bell​ shaped, right-, or​ left-skewed? Since there is a minimum but no maximum value, the distribution is skewed to the right. From the sample of n=45 ​calls, your parents compute the mean duration. Is the sampling distribution of the sample mean likely to be bell​ shaped, right-, or​ left-skewed, or is it impossible to​ tell? Since the sample size is large, the distribution is approximately normal. The distribution is approximately normal. The number of people in a household. The variable X is​ quantitative, because each observation is a numerical value that represents a magnitude of the variable. It is probably skewed right because the company probably has many​ lower-level employees with lower incomes and a few​ upper-level employees with very high incomes. The data distribution is probably skewed right because the population distribution is probably skewed right. It is approximately normal because all sampling distributions with sufficiently large sample sizes are approximately normal. It would not be unusual to observe an individual earning more than​ $100,000 because this is well within three population standard deviations of the mean. It would be highly unusual to observe a sample mean income above​ $100,000 for a random sample size of 100 people because this is well beyond three sampling distribution standard deviations from the mean. 
Use technology to create a sampling distribution for the sample mean using sample size n=2. Take N=9000 repeated samples of size 2, and observe the histogram of the sample means. What shape does this sampling distribution have? The sampling distribution is triangular. Now take N=9000 repeated samples of size 8. Explain how the variability and the shape of the sampling distribution change as n increases from 2 to 8? The sampling distribution is more normal, and the variability is smaller. Now take N=9000 repeated samples of size 25. Explain how the variability and the shape of the sampling distribution change as n increases from 2 to 25? The sampling distribution is more normal, and the variability is much smaller. Compare the results from parts a through c to the displayed example curves? The distributions from parts a through c roughly match the displayed curves. Explain how the central limit theorem describes what has been observed in this problem? The sampling distribution of the mean became more and more normal as the sample size increased from 2 to 8 to 25, which the central limit theorem says should happen. Suppose a simple random sample of size n is drawn from a large population with mean μ and standard deviation σ. The sampling distribution of x̄ has mean μx̄=______ and standard deviation σx̄=______? μx̄=μ and σx̄=σ/√n. The standard deviation of the sampling distribution of x̄, denoted σx̄, is called the standard error of the mean. The distribution of the sample mean, x̄, will be normally distributed if the sample is obtained from a population that is normally distributed, regardless of the sample size. True. To cut the standard error of the mean in half, the sample size must be doubled? False. The sample size must be increased by a factor of four to cut the standard error in half. Does the population need to be normally distributed for the sampling distribution of x̄ to be approximately normally distributed? Why? 
No because the Central Limit Theorem states that regardless of the shape of the underlying​ population, the sampling distribution of x becomes approximately normal as the sample​ size, n, increases. What must be true regarding the distribution of the​ population? The population must be normally distributed. What effect does increasing the sample size have on the​ probability? If the population mean is less than 71 ​minutes, then the probability that the sample mean of the time between eruptions is greater than 71 minutes decreases because the variability in the sample mean decreases as the sample size increases. The population mean may be greater than 60. The population mean is 60​, and this is just a rare sampling. To compute probabilities regarding the sample mean using the normal​ model, what size sample would be​ required? The sample size needs to be greater than 30.
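
The repeated-sampling exercise on this card can be imitated with a short Python simulation. This is only a sketch: the population (the integers 1 to 100) and the seed are assumptions chosen for illustration, and samples are drawn with replacement to mimic a very large population.

```python
import random
import statistics

def sample_means(population, n, repeats, rng):
    """Draw `repeats` samples of size n (with replacement) and return their means."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(repeats)]

rng = random.Random(0)            # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical uniform population: 1..100
sigma = statistics.pstdev(population)

spreads = {}
for n in (2, 8, 25):
    means = sample_means(population, n, 9000, rng)
    spreads[n] = statistics.pstdev(means)
    # Compare the simulated spread of the sample means with sigma / sqrt(n)
    print(n, round(spreads[n], 2), round(sigma / n ** 0.5, 2))
```

The simulated standard deviation of the sample means should shrink as n grows and stay close to σ/√n, which is the standard error of the mean.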

Independent events: P(A and B) = P(A) x P(B)

At the local cell phone store, the probability that a customer who walks in will purchase a new cell phone is 0.07. The probability that the customer will purchase a new cell phone protective case is 0.35. Is this information sufficient to determine the probability that a customer will purchase a new cell phone and a new cell phone protective case? If so, find the probability. If not, explain why not. The information is not sufficient. The events might not be independent. In a lottery, 6 numbers are randomly sampled without replacement from the integers 1 to 48. Their order of selection is not important. Find the probability of holding a ticket that has zero winning numbers out of the 6 numbers selected for the winning ticket out of the 48 possible numbers. For the first number, there are 48 possible outcomes, 42 of which are not winning numbers, so the probability is 42/48. For the second number, there are 47 possible outcomes, 41 of which are not winning numbers, so the probability is 41/47. Continuing in this way, P(have zero of the 6 winning numbers) = 42/48 x 41/47 x 40/46 x 39/45 x 38/44 x 37/43, which is about 0.427. If E and F are disjoint events, then P(E or F)=P(E)+P(F). If E and F are not disjoint events, then P(E or F)=P(E)+P(F)-P(E and F).
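
The lottery product on this card can be checked with exact fractions. A minimal sketch; the function name and its default arguments are illustrative choices, not part of the original problem.

```python
from fractions import Fraction
from math import comb

def prob_no_winning_numbers(pool=48, drawn=6):
    """P(a ticket of `drawn` numbers matches none of the `drawn` winning numbers)."""
    p = Fraction(1)
    for i in range(drawn):
        # For the next ticket number: (pool - drawn - i) non-winning numbers
        # remain out of (pool - i) numbers still available.
        p *= Fraction(pool - drawn - i, pool - i)
    return p

p_zero = prob_no_winning_numbers()
print(p_zero, float(p_zero))
```

The same value comes from counting: the number of ways to choose 6 of the 42 non-winning numbers, divided by the number of ways to choose 6 of all 48 numbers.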

This question

Find the probability that an observation is at least 1 standard deviation above the mean? 0.159 Find the probability that an observation is at least 1 standard deviation below the mean? 0.159 Find the probability that an observation is within 1 standard deviation of the mean? 0.683
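
These three answers assume a normal (bell-shaped) distribution. A short sketch using the standard normal curve from Python's standard library reproduces them:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_above = 1 - z.cdf(1)            # at least 1 standard deviation above the mean
p_below = z.cdf(-1)               # at least 1 standard deviation below the mean
p_within = z.cdf(1) - z.cdf(-1)   # within 1 standard deviation of the mean

print(round(p_above, 3), round(p_below, 3), round(p_within, 3))
```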

interpreting quartiles

Report the median. =30.5 days By finding the median of the four values below the​ median, report the first quartile. =25.5 days Find the third quartile. =36.5 days 25​% of the countries have residents who take fewer than 25.5 vacation​ days, half of the countries have residents who take fewer than 30.5 vacation​ days, and​ 75% of the countries have residents who take fewer than 36.5 vacation days per year. The middle 50​% of the countries have residents who take an average of between 25.5 and 36.5 vacation days annually.

The following data are the reported CO2 emissions from fossil fuel combustion for 7 countries. The values are reported as million metric tons of carbon equivalent.

The data report total emissions for each country. Describe a situation in which it may be more useful to analyze per capita data. Choose the correct answer below? A researcher is studying consumer emissions.

median and mean indicates

The median household income is $66,100 and the mean is $62,100. The fact that the mean is less than the median indicates that there are extremely low incomes that are affecting the mean, but not the median, suggesting that the shape is skewed to the left. According to a study, the median household income was $53,519 for one group and $33,079 for another. The means for the two groups were $69,497 and $42,905, respectively. The fact that the mean is larger than the median for each group indicates that there are extremely large incomes that are affecting the mean, suggesting that the shape is skewed to the right. The workers and the management of a company are having a labor dispute. Explain why the workers might use the median income of all the employees to justify a raise but management might use the mean income to argue that a raise is not needed? Management would want to use the mean because the mean would be higher due to the outliers. The workers would prefer the median because it is not affected by the outliers and would be smaller.

population

The total set of subjects in which we are interested.

distance to work

quantitative

the most plausible value for standard deviation

The median selling price of new homes was $124,200. The most plausible value for the standard deviation is $40,000 because the negative value is impossible, $8,000 is too small, and $1,000,000 is implausibly large. For an exam, the students' scores had a range of 64 and a mean of 70. The most realistic value for the standard deviation is 12 because the negative value is impossible, 0 would indicate no variability, 1 is too small, and 53 is too large for a typical deviation.

CI

A study dealing with health care issues plans to take a sample survey of 1500 Americans to estimate the proportion who have health insurance and the mean dollar amount that Americans spent on health care this past year. Identify two population parameters that this study will estimate? The population proportion who have health insurance. The population mean dollar amount spent on health care this past year. Identify two statistics that can be used to estimate these parameters? The sample mean. The sample proportion. Mention two population parameters that this survey is trying to estimate? The proportion of adults in the country who primarily watch content​ time-shifted. The mean weekly time adults in the country spend watching content over the Internet. Mention two corresponding statistics that will estimate these population parameters with the help of the survey? The proportion of adults in the sample who primarily watch content​ time-shifted. The mean weekly time that adults in the sample spend watching content over the Internet. The sample mean number of hours watched online is an unbiased estimator for what​ parameter? It is an unbiased estimator for the mean weekly time adults in the country spend watching content over the Internet. An unbiased estimator is centered at the parameter it tries to estimate. Use this example to explain why a point estimate alone is usually insufficient for statistical inference? An interval estimate gives us a sense of the accuracy of the point estimate whereas a point estimate alone does not. Interpret the confidence interval in context? This is the interval containing the most believable values for the mean number of days that people have felt lonely in the last 7 days. Compared to the interval for​ females, is there much evidence of a difference between the​ means? 
There is not much evidence of a difference between the means because the sample means and standard deviations are very​ similar, and the two intervals overlap significantly. What assumptions are needed to construct a​ 95% confidence interval for μ​? The data are obtained by​ randomization, and the population distribution is approximately normal. Point out any assumptions that seem questionable? The data do not appear to be chosen from an approximately normal population distribution because the boxplot is skewed to the right. Name two things you could do to get a narrower interval than the one in part a? Decrease the confidence level or increase the sample size. Why is the​ 99% confidence interval wider than the​ 95% interval? The​ t-distribution critical value is larger with a higher confidence level. On what assumptions is the interval in part a​ based? The data are obtained by randomization and the population distribution is approximately normal. What must we assume to use these data to find a 95​% confidence interval for the population mean cell phone​ price? The data are obtained by randomization and the population distribution is approximately normal. The table shows the way software reports results. How was the standard error of the mean​ (SE Mean)​ obtained? Divide the standard deviation by the square root of the sample size. Use the MINITAB report to explain how to interpret the 95​% confidence interval in context? There is​ 95% confidence that the population mean cell phone price is between $614.293 and $647.263. is there evidence that the mean price is higher when purchased​ new? Yes, because all plausible values for the price of new phones are higher than the plausible values for the mean price of used phones. Does this change your answer to part​ d? No, because all plausible values for the price of new phones are still higher than the plausible values for the mean price of used phones. Is this a concern for the validity of the confidence​ interval? 
No, because the sample size is very large so the central limit theorem applies. Interpret the confidence interval from the previous step? There is 90​% confidence that the population mean for the number of hours per week people spend sending and answering​ e-mail is between these two values. Explain why the population distribution may be skewed right? Since there will be many women that are at least 80 years of age who do not use​ e-mail at all but some who use​ e-mail frequently, the distribution is likely to be skewed right. If the population distribution is skewed​ right, is the interval you obtained in b​ useless, or is it still​ valid? Valid What effect does sample size have on the margin of​ error? As sample size​ increases, the margin of error becomes smaller. Does it seem plausible that the population distribution of this variable is​ normal? No, because there will be many students who do not read a newspaper but some students who read at least one newspaper every day. The distribution is likely to be skewed right. Explain the implications of the term​ "robust" regarding the normality assumption made to conduct this analysis? The term​ "robust" means that even if the normality assumption is not completely​ met, this analysis is still likely to produce valid results. A point estimate is the value of a statistic that estimates the value of a parameter. The level of confidence represents the expected proportion of intervals that will contain the parameter if a large number of different samples of size n is obtained. It is denoted (1-alpha)x100%. How does increasing the sample size affect the margin of​ error, E? As the sample size increases​, the margin of error decreases. How does increasing the level of confidence affect the size of the margin of​ error, E? As the level of confidence increases​, the size of the interval increases. Could we have computed the confidence intervals in parts​ (a)-(c) if the population had not been normally​ distributed? 
​No, the population needs to be normally distributed. If the sample size is 15​, what conditions must be satisfied to compute the confidence​ interval? The sample data must come from a population that is normally distributed with no outliers. Provide two recommendations for decreasing the margin of error of the interval? Increase the sample size. Decrease the confidence level. With 95​% confidence, the limits of the confidence interval contain the proportion of healthy people aged​ 18-49 who are vaccinated with the vaccine but still develop the illness. What do the numbers in this interval​ represent? The numbers represent the most believable values for the population proportion. The data must be obtained​ randomly, and the expected numbers of successes and failures must both be at least 15. The data must be obtained randomly, the number of successes must be at least 15, and the number of failures must be at least 15. With 95​% confidence, the interval .598 to .643 contains the population proportion of adults in the country who were in favor of the death penalty. Explain what the "95​% confidence" refers​ to, by describing the​ long-run interpretation? If the same method is used to estimate the same population proportion many​ times, then about95​% of the intervals would contain the population proportion. Is it safe to conclude that more than half of all adults in the country were in​ favor? Yes, since the confidence interval lies completely above 0.5. The​ "Sample p" is the proportion of all respondents in the sample who believe stem cell research has​ merit,1523/2109≈0.72. The​ "95% CI" is the​ 95% confidence interval and it means that we can be 95% confident that the interval .7030 to .7413 contains the population proportion. The​ 99% confidence interval would be wider than a​ 95% confidence interval. Treating the sample as a random sample from the population of all​ voters, would you predict the​ winner? 
The winner can not be predicted because 0.50 does not fall outside of the​ 95% confidence interval. Base your decision on a​ 99% confidence interval? The winner can not be predicted because 0.50 does not fall outside of the​ 99% confidence interval. Explain why you need stronger evidence to make the prediction when you want greater confidence? The more confident you want to be about the​ results, the wider the confidence interval will be. A smaller sample results in a greater standard​ error, which results in a greater margin of error for the same proportions and confidence​ level, meaning less information is provided. Is the interpretation​ reasonable? The interpretation is flawed. The interpretation provides no interval about the population proportion. The interpretation is flawed. The interpretation indicates that the level of confidence is varying. The interpretation is reasonable. The interpretation is flawed. The interpretation suggests that this interval sets the standard for all the other​ intervals, which is not true. There is 85​% confidence that the proportion of the adult citizens of the nation that dreaded​ Valentine's Day is between 0.103 and 0.317. Determine the population of interest? The population is all adults 19 years of age or older. The variable of interest is bringing one's cell phone every trip to the bathroom. This variable is qualitative with two outcomes because individuals are classified based on a characteristic. Why is the point estimate found in part​ (c) a​ statistic? Its value is based on a sample. Why is the point estimate found in part​ (c) a random​ variable? Its value may change depending on the individuals in the survey. What is the source of variability in the random​ variable? The individuals selected to be in the study. We are 95% confident the proportion of adults 19 years of age or older who bring their cell phone every trip to the bathroom is between .206 and .258. 
What ensures that the results of this study are representative of all adults 19 years of age or​ older? Random sampling. The results are close because 0.54(1−0.54)=0.2484 is very close to 0.25. What does it mean to say the race was too close to​ call? The margin of error suggests candidate A may receive between 44​% and 50​% of the popular vote and candidate B may receive between 43​% and 49​% of the popular vote. Because the poll estimates overlap when accounting for margin of​ error, the poll cannot predict the winner. What does "98​% confidence" mean in a 98​% confidence​ interval? If 100 different confidence intervals are​ constructed, each based on a different sample of size n from the same​ population, then we expect 98 of the intervals to include the parameter and 2 to not include the parameter.
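
The stem cell example on this card (1523 of 2109 respondents, 95% CI .7030 to .7413) can be reproduced with the usual large-sample formula: sample proportion plus or minus z times the standard error. A minimal sketch, assuming that formula is the one the software used; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    # z-score leaving (1 - confidence) / 2 in each tail, about 1.96 for 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low95, high95 = proportion_ci(1523, 2109, 0.95)
low99, high99 = proportion_ci(1523, 2109, 0.99)

print(round(low95, 4), round(high95, 4))
print(round(low99, 4), round(high99, 4))
```

Running the same data at 99% gives a wider interval than at 95%, and a larger n gives a smaller margin of error, matching the statements on this card.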

predicting

Constructing a mathematical model, estimating (constructing a confidence interval), testing.

What are the two requirements for a discrete probability​ distribution?

ΣP(x) = 1 and 0 ≤ P(x) ≤ 1. Each probability must be between 0 and 1, inclusive, and the sum of the probabilities must equal 1.
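
The two requirements translate directly into a check. A minimal sketch; the function name and the small tolerance for floating-point rounding are illustrative choices.

```python
import math

def is_valid_distribution(probabilities, tol=1e-9):
    """Check both requirements for a discrete probability distribution."""
    in_range = all(0 <= p <= 1 for p in probabilities)
    sums_to_one = math.isclose(sum(probabilities), 1.0, abs_tol=tol)
    return in_range and sums_to_one

print(is_valid_distribution([0.2, 0.5, 0.3]))   # both requirements hold
print(is_valid_distribution([0.5, 0.6, -0.1]))  # a negative probability
print(is_valid_distribution([0.4, 0.4]))        # sum is not 1
```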

random sampling

Each subject in the population has the same chance of being included in the sample. Random sampling allows us to make powerful inferences about the population.

If the values of a quantitative variable are whole numbers, then that variable is always a discrete variable.

False

IQR

IQR = Q3 - Q1. The IQR is resistant. The check for outliers is called the 1.5 x IQR criterion: find the IQR, then compute Q1 - 1.5(IQR) and Q3 + 1.5(IQR). If a data value is less than Q1 - 1.5(IQR) or greater than Q3 + 1.5(IQR), it is considered a potential outlier. The IQR is not affected by an outlier, while the standard deviation is affected by an outlier. The IQR summarizes the range for the middle half of the data. The middle 50% of values stretch over a range of this value.
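
The 1.5 x IQR check can be sketched in Python as follows. The data are hypothetical, and `statistics.quantiles` uses one particular quartile convention, so the quartiles may differ slightly from values found by a textbook's median-of-halves method.

```python
import statistics

def iqr_outliers(values):
    """Return values outside the 1.5 x IQR fences, plus the fences themselves."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # Q1, median, Q3
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    flagged = [v for v in values if v < lower or v > upper]
    return flagged, (lower, upper)

data = [12, 14, 15, 15, 16, 17, 18, 19, 45]  # hypothetical values
outliers, fences = iqr_outliers(data)
print(outliers, fences)
```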

Effect on the shape

If a distribution is highly skewed, the median is usually preferred over the mean because it better represents what is typical. If the distribution is close to symmetric or only mildly skewed, or if it is discrete with few distinct values, the mean is usually preferred because it uses the numerical values of all the observations. A numerical summary of the observations is called resistant if extreme values have little, if any, influence on its value. The mean is not resistant. The median is resistant.

probability

In the short run, the proportion of a given outcome can fluctuate a lot. A long run of observations is needed to accurately calculate the probability of flipping heads. Flip the coin many times to obtain a long run of observations. No. In the short run, the proportion of a given outcome can fluctuate a lot. As more people are sampled, the proportion should approach the real probability. Consider a random number generator designed for equally likely outcomes. If numbers between 0 and 49 are chosen, determine which of the following is not correct? b is incorrect because in the short run, the proportion of times each integer is generated can fluctuate a lot. A pollster wants to estimate the proportion of citizens of the European Union who support same-sex unions. She claims that if the sample size is large enough, she does not need to worry about the method of selecting the sample. Is the pollster's statement correct? The statement is not correct. Samples should be chosen randomly to ensure that each individual in the population has about the same probability of being chosen. Before the first attempt to land on the moon, the astronaut was asked to assess the probability that he would be successful. Did he need to rely on the relative frequency definition or the subjective definition of probability? The subjective definition because the astronaut would have to use his own judgement rather than objective information such as data. Is there intelligent life on other planets in the universe? If you are asked to state the probability that there is, would you need to rely on the relative frequency or the subjective definition of probability? You would need to rely on the subjective definition because you would be relying on your own judgment. Two friends decide to go to the track and place some bets. One friend remarks that in an upcoming race, the number 3 horse is paying 60 to 1. 
This means that anyone who bets on the 3 horse receives ​$60 for each​ $1 bet, if in fact the 3 horse wins the race. He goes on to mention that it is a great​ bet, because there are only eight horses running in the​ race, and therefore the probability of horse 3 winning must be 1/8. Is the last statement true or​ false? The statement is false because there is no reason to think that all horses have an equally likely chance of winning. What is the probability of an event that is​ impossible? 0 Suppose that a probability is approximated to be zero based on empirical results. Does this mean that the event is​ impossible? No What does it mean for an event to be​ unusual? Why should the cutoff for identifying unusual events not always be​ 0.05? An event is unusual if it has a low probability of occurring. The choice of a cutoff should consider the context of the problem. In a probability​ model, the sum of the probabilities of all outcomes must equal 1? True Probability is a measure of the likelihood of a random phenomenon or chance behavior? True In​ probability, a(n) experiment is any process that can be repeated in which the results are uncertain.​ A(n) event is any collection of outcomes from a probability experiment. If a person rolls a six-sided die and then draws a playing card and checks its color​, describe the sample space of possible outcomes using 1, 2, 3, 4, 5, 6 for the die outcomes and B, R for the card outcomes? The sample space is S=​{1B,2B,3B,4B,5B,6B,1R,2R,3R,4R,5R,6R​}

Design

Planning how to obtain data to answer the questions of interest.

symmetric

Scores on a standardized test. The mean should be used because the distribution is symmetric due to there being about the same number of large and small scores.

Statistics

The art and science of designing studies and analyzing the data that those studies produce. Its ultimate goal is translating data into knowledge and understanding of the world around us. In short, statistics is the art and science of learning from data. Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw a conclusion and answer questions. Statistics is about providing a measure of confidence in any conclusions.

A distribution is skewed to the right

A distribution is skewed to the right if the right tail is longer than the left tail. The number of speeding tickets in the past year for workers at a certain company. The histogram would be skewed to the right because the majority of the number of tickets would be lower values, with some higher values. The number of music CDs owned for each student in your school. The histogram would be skewed to the right because the majority of the number of CDs would be lower values, with some higher values. The mean is 11.4, standard deviation is 13.1, median is 8, and mode is 8. The distribution is probably skewed to the right because the mean is larger than the median, and the lowest possible value is less than 1 standard deviation below the mean. The empirical rule does not apply because the distribution is not bell shaped. If the smallest observation is less than 1 standard deviation below the mean, then the distribution tends to be skewed to the right. The fact that the mean is larger than the median for each group indicates that there are extremely large incomes that are affecting the mean, suggesting that the shape is skewed to the right. Salary of employees of a university. The median should be used because the distribution is right skewed due to there being relatively few large salaries. Length of time needed to complete an easy exam. The median should be used because the distribution is right skewed due to there being relatively few large time lengths. Annual incomes for the general population. The median should be used because the distribution is right skewed due to there being relatively few large annual incomes. Assessed value of houses in a large city. The histogram would be skewed to the right because the majority of the assessed values would be lower values, with some higher. A survey asked, "On the average day, about how many hours do you personally watch television?" Of 1987 responses, the mode was 2, the median was 2, the mean was 2.94, and the standard deviation was 2.44. 
This distribution is probably skewed to the right because the mean is larger than the​ median, and the standard deviation is almost as large as the mean. If the smallest observation is less than one standard deviation below the​ mean, then the distribution tends to be skewed to the right.

conditional probability

The probability of being a baseball fan (BF) was 0.73 for males (M). Express this as a conditional probability. P(BF | M)=0.73 The probability of being a baseball fan (BF) was 0.49 for females (M^C). Express this as a conditional probability. P(BF | M^C)=0.49 The conditional probability of A given B is P(A | B)=P(A and B)/P(B). P(carpooled to work or drove alone to work)=P(A)+P(B). P(carpooled | carpooled or drove alone to work)=P(carpooled)/P(carpooled or drove alone to work). Estimate the probability that the player made the second free throw, given that he made the first one. P(2nd made | 1st made)=P(both made)/P(1st made). Does it seem as if his success on the second shot depends strongly on whether he made the first? No, because P(2nd made | 1st made) is close to P(2nd made). Form a contingency table that cross classifies whether a vehicle entering the city contains radioactive material and whether the device detects radiation. Identify the cell that corresponds to the false alarms the police department fears? With the cells labeled a, b, c, d, the cell that contains b corresponds to false alarms. Sketch a Venn diagram for which each event has similar (not the same) probability but the probability of a false alarm equals 0? B is in A. Since A contains B, P(A|B)=1. Since B is a subset of A, P(B|A)<1. The poll reported that 69% of the actively disengaged group claimed to be thriving, compared to 43% of the unemployed group. Are these percentages (probabilities) ordinary or conditional? Conditional, because the statement gives the probability of one event occurring, given that another event has occurred. Express the statement "69% of the actively disengaged group claimed to be thriving" as a probability? P(respondent claimed to be thriving | respondent is actively disengaged)=0.69
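
The free-throw question on this card applies the rule P(A | B) = P(A and B)/P(B). A small sketch with made-up counts (the numbers below are hypothetical, not from the original problem):

```python
def conditional_probability(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) is 0."""
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b

# Hypothetical counts for pairs of free throws
total_pairs = 340
first_made = 300    # pairs where the first shot was made
second_made = 285   # pairs where the second shot was made
both_made = 250     # pairs where both shots were made

p_second_given_first = conditional_probability(
    both_made / total_pairs, first_made / total_pairs
)
p_second = second_made / total_pairs

print(round(p_second_given_first, 3), round(p_second, 3))
```

With these counts the two probabilities are close to each other, which is the pattern that suggests the second shot does not depend strongly on the first.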

Proportion and Percentage (Relative Frequency)

The proportion of the observations that fall in a certain category is the frequency ( count) of observations in that category divided by the total number of observations. The percentage is the proportion multiplied by 100. Proportions and percentages are also called relative frequencies and serve as a way to summarize the measurements in categories of a categorical variable.
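
As a rough sketch of the definition above, the following computes the frequency, proportion, and percentage for each category of a categorical variable. The list of college majors is invented for illustration, and the function name is an assumption.

```python
from collections import Counter


def relative_frequencies(observations):
    """Map each category to (frequency, proportion, percentage)."""
    counts = Counter(observations)
    total = len(observations)
    return {
        category: (count, count / total, 100 * count / total)
        for category, count in counts.items()
    }


# Hypothetical sample of college majors (not from the source).
majors = ["biology", "history", "biology", "math",
          "biology", "history", "math", "biology"]

table = relative_frequencies(majors)
for category, (count, proportion, percentage) in table.items():
    print(category, count, proportion, percentage)
```

The proportions always sum to 1 and the percentages to 100, which is a quick check that no observation was dropped.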

18% of governors of all 50 areas were female

parameter

In a study of 4806 professors, 40% own a tv

statistic

sample

The subset of the population for whom we have ( or plan to have) data, often randomly selected.

contacts 1247 women who are 30 to 70 and live in the US and asks whether they had a mammogram

population-women 30 to 70 and live in the US sample-1247 women 30 to 70 and live in the US

Variables

are the characteristics of the individuals of the population being studied. any characteristic of individuals observed in a study.

Descriptive Statistics

consists of organizing and summarizing the information collected; that is, summarizing and analyzing the data that are obtained.
Examples: 80% of patients in group 1 and 60% of patients in group 2 had resolution; people with the highest levels of the protein were 2.25 times more likely to die than those with the lowest levels.

A distribution is skewed to the left

if the left tail is longer than the right. That is, there are a few observations significantly smaller than the others. If the largest observation is less than 1 standard deviation above the mean, then the distribution tends to be skewed left.
Time needed to complete an exam (maximum 1 hour): the histogram would be skewed to the left because the majority of the times would be high values, with some lower values.
Time spent on a difficult exam: the median should be used because the distribution is left skewed, with relatively few small times.
Life span of the general population: the median should be used because the distribution is left skewed, with relatively few small life spans.

The empirical rule

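applies when the data form an approximately bell-shaped distribution: about 68% of the observations fall within 1 standard deviation of the mean, about 95% fall within 2 standard deviations, and nearly all (about 99.7%) fall within 3 standard deviations.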
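
For a bell-shaped distribution, the empirical rule's percentages (about 68%, 95%, and 99.7%) can be checked numerically. The sketch below assumes an exactly normal distribution and uses Python's standard-library `statistics.NormalDist`; the helper name is an assumption.

```python
from statistics import NormalDist


def share_within(k, dist=NormalDist()):
    """Proportion of a normal distribution within k standard deviations of its mean."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

Real data are only approximately bell shaped, so observed percentages will usually differ somewhat from these values.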

binomial experiment

A binomial experiment satisfies four criteria:
1. It is performed a fixed number of times (n trials).
2. The trials are independent.
3. For each trial, there are 2 mutually exclusive (disjoint) outcomes: success or failure.
4. The probability of success is the same for each trial.
Online calculator: https://stattrek.com/online-calculator/binomial.aspx
A random sample of 80 middle school students is obtained, and the individuals selected are asked to state their weights. No, this probability experiment does not represent a binomial experiment because the variable is continuous, and there are not two mutually exclusive outcomes.
An experimental drug is administered to 170 randomly selected individuals, with the number of individuals responding favorably recorded. Yes, because the experiment satisfies all the criteria for a binomial experiment.
I, IV, and VI: the n trials are independent, each trial has the same probability of success, and each trial has two possible outcomes.
The data are binary, there is the same probability of success for each trial (bid), there is a fixed number of bids, and the trials are independent.
You are bidding on four items available on an online shopping site. You think that you will win the first bid with a probability of 25% and the second through fourth bids with probability 30%. Let x denote the number of winning bids of the four items you bid on. No, because each trial does not have the same probability of success.
You are bidding on four items available on an online shopping site. Each bid is for $70, and you think there is a 25% chance of winning a bid, with bids being independent events. Let x be the total amount of money you pay for your winning bids. No, because x does not count the number of successes.
Each trial has 2 potential outcomes, but a team's previous performance affects its probability of winning, and there is a lower probability of winning against the best team than against the worst team.
The trials are independent and binary, but each trial may not have the same probability of success, as different size monitors could have a different rate of defect.
The trials do not have the same probability and are not independent, because after each trial, the probability of selecting a female changes for the next trial.
It is unlikely that each trial has the same probability of success. Voters have their own preferences, so the probability of voting for the Democratic candidate varies among the voters.
It is unlikely that the trials are independent of each other, because if one family member goes to church, the rest are likely to go as well.
The conditions are satisfied because 1) the data are binary (Hispanic or not), 2) the probability of success is always 0.44, and 3) the trials are independent (the first selection does not affect the next; n < 10% of the population size).
Yes, because the chance that this would occur if the selection were done randomly is very low.
An investor randomly purchases 14 stocks listed on a stock exchange. Historically, the probability that a stock listed on this exchange will increase in value over the course of a year is 45%. The number of stocks that increase in value is recorded. Yes, because the experiment satisfies all the criteria for a binomial experiment.
Yes, because the probability of 13 or more adults believing the overall state of moral values is poor is very low.
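
When the four binomial criteria hold, probabilities follow the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below applies it to the stock example from this card (n = 14, p = 0.45); the function names are assumptions chosen for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial experiment with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_tail(k, n, p):
    """P(X >= k): probability of at least k successes."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


# Stock example from the card: 14 stocks, each with a 45% chance of increasing.
n, p = 14, 0.45
print(binomial_pmf(6, n, p))    # probability exactly 6 stocks increase
print(binomial_tail(13, n, p))  # probability 13 or more stocks increase
```

A very small tail probability, such as the one for 13 or more successes here, is the kind of result the card describes as "very low".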

Not independent events

A survey asks subjects whether they believe that global warming is happening (yes or no) and how much fuel they plan to use annually for automobile driving in the future, compared to their past use (less, same, more). The two events are not independent. This means the probability of responding "yes" on global warming and "same" on future fuel use is less than it would be if the two choices were not related.
According to a center for disease control, the probability that a randomly selected person has hearing problems is 0.148. The probability that a randomly selected person has vision problems is 0.083. Can we compute the probability of randomly selecting a person who has hearing problems or vision problems by adding these probabilities? No, because hearing and vision problems are not mutually exclusive. Some people have both hearing and vision problems, and these people would be counted twice in the probability.
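
The double counting described above is corrected by the general addition rule, P(A or B) = P(A) + P(B) - P(A and B). In the sketch below, the hearing and vision probabilities come from the card, but the probability of having both problems is a hypothetical value chosen only for illustration.

```python
def probability_of_union(p_a, p_b, p_a_and_b):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


p_hearing = 0.148  # from the card
p_vision = 0.083   # from the card
p_both = 0.030     # hypothetical: the card does not give this value

p_either = probability_of_union(p_hearing, p_vision, p_both)
naive_sum = p_hearing + p_vision

print(round(p_either, 3))
print(round(naive_sum, 3))
```

Simply adding the two probabilities overstates P(hearing or vision) by exactly P(both), which is why the simple sum is only valid for mutually exclusive events.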

probability

The probability of an impossible event is 0. If a probability is approximated to be zero based on empirical results, that does not mean the event is impossible.
An event is unusual if it has a low probability of occurring. A common cutoff is P < 0.05, but the choice of a cutoff should consider the context of the problem.
Probability is a measure of the likelihood of a random phenomenon or chance behavior.
In probability, an experiment is any process that can be repeated in which the results are uncertain.
In a probability model, each probability is greater than or equal to 0 and less than or equal to 1, and the probabilities of all outcomes sum to 1.
Outcome: a result of a random experiment.
Event: any collection of outcomes from a probability experiment; an event consists of one or more outcomes.
Sample space: the set of all possible outcomes of a random experiment.
Notation: uppercase letters, such as A, B, C, denote events.
Complement of an event: the complement of an event A, read "A-complement", consists of all outcomes in the sample space that are not in A.
Mutually exclusive (disjoint) events: two events, A and B, that have no outcomes in common.
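
The two requirements for a probability model (every probability between 0 and 1, and all probabilities summing to 1) are easy to check mechanically. This is a minimal sketch; the function names and the example models are hypothetical.

```python
from math import isclose


def is_probability_model(probabilities):
    """Check that each probability is in [0, 1] and that they sum to 1."""
    values = list(probabilities.values())
    in_range = all(0 <= value <= 1 for value in values)
    return in_range and isclose(sum(values), 1.0)


def complement(p_event):
    """P(A-complement) = 1 - P(A)."""
    return 1 - p_event


# Hypothetical models for illustration.
fair_coin = {"heads": 0.5, "tails": 0.5}
too_much = {"a": 0.7, "b": 0.5}       # sums to 1.2
negative = {"a": 1.2, "b": -0.2}      # sums to 1 but has invalid entries

print(is_probability_model(fair_coin))
print(is_probability_model(too_much))
print(is_probability_model(negative))
```

`isclose` is used for the sum because decimal probabilities are stored as floating-point numbers and may not add up to exactly 1.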

bias

Because of the way the study was designed, certain outcomes will occur more often in the sample than they do in the population.

The last fatal accident involving the airline occurred more than 2 years ago. Mary comments that she is currently afraid of flying because the airlines are "due for an accident." Comment on Mary's reasoning.
A pollster wants to estimate the proportion of citizens of the European Union who support same-sex unions. She claims that if the sample size is large enough, she does not need to worry about the method of selecting the sample. Is the pollster's statement correct? Explain.
Before the first attempt to land on the moon, the astronaut was asked to assess the probability that he would be successful. Did he need to rely on the relative frequency definition or the subjective definition of probability?
Is there intelligent life on other planets in the universe? If you are asked to state the probability that there is, would you need to rely on the relative frequency or the subjective definition of probability? Explain.
There are only eight horses running in the race, and therefore the probability of horse 3 winning must be 1/8. Is the last statement true or false? Explain.

The recent absence of fatal accidents does not make a fatal accident any more likely to occur. Mary is mistaken in her understanding of probability.
The statement is not correct. Samples should be chosen randomly to ensure that each individual in the population has about the same probability of being chosen.
The subjective definition, because the astronaut would have to use his own judgment rather than objective information such as data.
You would need to rely on the subjective definition because you would be relying on your own judgment.
The statement is false because there is no reason to think that all horses are equally likely to win.

Statistic

is a numerical summary of a sample. Numerical summary of the sample taken from the population. A study of 6076 adults in public restrooms found that 23% did not wash their hands before exiting.

The normal curve is symmetric about its​ mean, μ.

The statement is true. The normal curve is a symmetric distribution with one peak, which means the mean, median, and mode are all equal. Therefore, the normal curve is symmetric about the mean, μ.
The area under the normal curve to the right of μ equals 1/2.
If the histogram is not bell-shaped, a normal distribution could not be used as a model for the variable.
What happens to the graph of the normal curve as the mean increases? The graph of the normal curve slides right.
What happens to the graph of the normal curve as the standard deviation decreases? The graph of the normal curve compresses and becomes steeper.
The notation zα is the z-score such that the area under the standard normal curve to the right of zα is α.
Describe the shape, mean, and standard deviation of the sampling distribution of the player's batting average after a season of 600 at-bats. The distribution is bell-shaped, centered on a mean of 0.303, and nearly all of the distribution lies within three standard deviations of the mean.
Explain why a batting average of 0.284 or 0.322 would not be especially unusual for this player's year-end batting average. Year-end batting averages of 0.322 and 0.284 lie one standard deviation from the mean. Therefore, it is not unlikely that a player with a career batting average of 0.303 would have a year-end batting average of 0.322 or 0.284.
Explain how the results in (b) indicate that the sample proportion is closer to the population proportion when the sample size is larger. When n is larger, the standard deviation is smaller, so the interval is smaller.
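
The batting-average figures above can be reproduced with the standard deviation of a sample proportion, sqrt(p(1 - p)/n). The sketch below uses p = 0.303 and n = 600 from the card; the function name is an assumption.

```python
from math import sqrt


def sample_proportion_sd(p, n):
    """Standard deviation of the sampling distribution of a sample proportion."""
    return sqrt(p * (1 - p) / n)


p, n = 0.303, 600  # career batting average and at-bats, from the card
sd = sample_proportion_sd(p, n)
one_sd_below, one_sd_above = p - sd, p + sd

print(round(sd, 3))            # 0.019
print(round(one_sd_below, 3))  # 0.284
print(round(one_sd_above, 3))  # 0.322

# Quadrupling the sample size halves the standard deviation.
print(sample_proportion_sd(p, 4 * n) / sd)
```

Because n sits in the denominator under the square root, larger samples give a smaller standard deviation, which is why the sample proportion tends to fall closer to the population proportion as n grows.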

z-score

The z-score represents the distance that a data value is from the mean in terms of the number of standard deviations. If a data value is above the mean, its z-score is positive. If a data value is below the mean, its z-score is negative.
The z-score allows us to compare two values that belong to different normal distributions, or that are stated in different units of measurement.
The z-score is 3.38. This indicates that the observation of 14.2 is a distance of 3.38 standard deviations above the mean.
The z-score is -5.31. This indicates that the observation of 2.9 is a distance of 5.31 standard deviations below the mean. Is this observation a potential outlier according to the three standard deviation distance criterion? Yes, because it is more than three standard deviations from the mean.
The baby born in week 41 weighs relatively less since its z-score, -0.83, is smaller than the z-score of -0.5 for the baby born in week 33.
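
A z-score is computed as (observation - mean) / standard deviation. The sketch below shows that calculation and the three standard deviation outlier criterion. The z-scores 3.38, -5.31, -0.83, and -0.5 come from the card, while the exam score, mean, and standard deviation in the first example are hypothetical.

```python
def z_score(observation, mean, standard_deviation):
    """Number of standard deviations an observation lies from the mean."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return (observation - mean) / standard_deviation


def is_potential_outlier(z, threshold=3):
    """Three standard deviation criterion: flag |z| greater than the threshold."""
    return abs(z) > threshold


# Hypothetical exam score: 85 on a test with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))  # 1.5

# z-scores quoted on the card.
for z in (3.38, -5.31, -0.83, -0.5):
    print(z, is_potential_outlier(z))
```

Because z-scores are unit-free, the last two values can be compared directly even though the babies were born in different weeks with different reference distributions.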

a bar chart

is easier because sketching the exact percentages is more challenging in a pie chart. A graph allows the viewer to more easily judge the relative sizes of the percentages in each category. It would be easier to identify the mode using a bar graph. The mode is found by identifying the highest bar.

Sampling distribution

What does this sampling distribution represent? It represents the probability distribution of the sample proportion of full-time students in a random sample of 325 students.
Choose the correct description of the mean of the sampling distribution: the expected value of the mean of a sample of size 36.
Choose the correct description of the standard deviation: the variability of the mean for samples of size 36.
The sample proportion, denoted p̂, is given by the formula p̂ = x/n, where x is the number of individuals with a specified characteristic in a sample of n individuals.
The population proportion and sample proportion always have the same value? False.
The mean of the sampling distribution of p̂ is p? True.
Suppose a random sample of 100 people is asked, "Are you satisfied with the way things are going in your life?" Is the response to this question qualitative or quantitative? The response is qualitative because the responses can be classified based on the characteristic of being satisfied or not.
Explain why the sample proportion, p̂, is a random variable. What is the source of the variability? The sample proportion p̂ is a random variable because the value of p̂ varies from sample to sample. The variability is due to the fact that different people feel differently regarding their satisfaction.
Choose the phrase that best describes the shape of the sampling distribution of p̂: approximately normal because n ≤ 0.05N and np(1 - p) ≥ 10.
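
The two conditions in the last answer can be checked with a short helper. This sketch assumes the common textbook thresholds (the sample is at most 5% of the population, and np(1 - p) is at least 10). The sample size of 325 comes from the card, but the population proportion and population size below are hypothetical, as is the function name.

```python
def sample_proportion_is_approximately_normal(n, p, population_size):
    """Check n <= 0.05N and n * p * (1 - p) >= 10 for a sample proportion."""
    small_sample_fraction = n <= 0.05 * population_size
    enough_successes_and_failures = n * p * (1 - p) >= 10
    return small_sample_fraction and enough_successes_and_failures


# n = 325 is from the card; p = 0.6 and N = 20000 are hypothetical.
print(sample_proportion_is_approximately_normal(325, 0.6, 20000))

# A small sample fails the second condition: 20 * 0.5 * 0.5 = 5.
print(sample_proportion_is_approximately_normal(20, 0.5, 20000))

# A sample that is too large a share of the population fails the first.
print(sample_proportion_is_approximately_normal(325, 0.6, 1000))
```

Both conditions must hold at once: the first supports treating the trials as independent, and the second ensures enough expected successes and failures for the bell shape to emerge.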

The standard deviation represents

a typical distance of an observation from the mean. It describes how far the data fall from the mean, that is, how they are clustered or spread out about the mean. On a calculator, the symbol is Sx.
The more spread out the data, the larger the standard deviation. The standard deviation is zero only when all data values are equal, showing no variability from the mean.
s is not resistant: extreme values or outliers affect it.
The sum of the deviations about the mean always equals zero.
The standard deviation is used in conjunction with the mean to numerically describe distributions that are bell shaped. The mean measures the center of the distribution, while the standard deviation measures the spread of the distribution.
For a bell-shaped distribution, an observation that is more than 2 standard deviations from the mean is considered an unusual observation.
When two distributions with different means are combined, the standard deviation of the overall distribution will usually be larger than the standard deviation of either distribution alone, because combining them introduces more spread to the data. For example, the standard deviation of the entire class is more than the standard deviation of the males and females considered separately because the distribution of the entire class has more dispersion.
An athlete who is three standard deviations above the mean would weigh 182 pounds. Yes, this would be an unusual observation because typically all or nearly all observations fall within three standard deviations of the mean.
A manufacturer of bolts has a quality-control policy that requires it to destroy any bolts that are more than 2 standard deviations from the mean. A bolt will be destroyed if the length is less than 13.9 cm or greater than 14.1 cm.
The standard deviation takes into account the values of all observations, while the IQR only uses some of the data.
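
A few of the facts above can be verified directly: deviations from the mean sum to zero, the standard deviation is zero when all values are equal, and a "2 standard deviations from the mean" policy translates into two cutoffs. The data set below is hypothetical, and the bolt mean (14.0 cm) and standard deviation (0.05 cm) are assumptions chosen to be consistent with the 13.9 cm and 14.1 cm cutoffs on the card.

```python
from statistics import mean, stdev

# Hypothetical data set (not from the source).
data = [2, 4, 4, 4, 5, 5, 7, 9]

center = mean(data)
deviations = [value - center for value in data]
sample_sd = stdev(data)  # sample standard deviation (divides by n - 1), like Sx

print(center)           # 5
print(sum(deviations))  # 0
print(sample_sd)

# No variability: every value equals the mean, so the standard deviation is 0.
print(stdev([3, 3, 3]))


def cutoffs(center, sd, k=2):
    """Values lying k standard deviations below and above the mean."""
    return center - k * sd, center + k * sd


# Assumed bolt parameters, consistent with the card's cutoffs.
low, high = cutoffs(14.0, 0.05)
print(round(low, 2), round(high, 2))  # 13.9 14.1
```

`statistics.stdev` is the sample version; `statistics.pstdev` would divide by n instead and give a slightly smaller value for the same data.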

