Intro to stat

In a random sample of 765 adults in the United States, 322 say they could not cover a $400 unexpected expense without borrowing money or going into debt. (a) What is the point estimate for the parameter? (b) What is the name of the statistic we can use to measure the uncertainty of the point estimate? (c) Compute the value from part (b) for this context.

(a) 0.421 (b) We quantify this uncertainty using the standard error, which may be abbreviated as SEp (c) 0.0179
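
The arithmetic behind parts (a) and (c) can be sketched as follows. This is a minimal sketch that assumes the usual standard error formula for a sample proportion, sqrt(p^(1 - p^)/n); the variable names are illustrative only.

```python
import math

n = 765          # sample size from the question
successes = 322  # adults who could not cover the expense

p_hat = successes / n                    # point estimate of the proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat

print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}")
```

The unrounded result sits almost exactly at 0.01785, so the last printed digit can differ from the card's 0.0179, which comes from plugging in the rounded estimate 0.421.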

(a) What percent of the data fall between Q1 and the median? (b) What percent is between the median and Q3? (c) What percent is between Q1 and Q3?

(a) 25% (b) 25% (c) 50%

Below are the final scores of 20 introductory statistics students. 79, 83, 57, 82, 94, 83, 72, 74, 73, 71, 66, 89, 78, 81, 78, 81, 88, 69, 77, 79 Give the (a) Mean, (b) Standard Deviation, (c) Median (d) IQR

(a) 77.70 (b) 8.44 (c) 78.50 (d) 10
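
These four summaries can be checked with Python's standard library. The sketch below assumes the sample standard deviation (n - 1 denominator) and the "median of each half" convention for quartiles; other quartile conventions can give a slightly different IQR.

```python
import statistics

scores = [79, 83, 57, 82, 94, 83, 72, 74, 73, 71,
          66, 89, 78, 81, 78, 81, 88, 69, 77, 79]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)       # sample standard deviation (n - 1)
median = statistics.median(scores)

# Quartiles as medians of the lower and upper halves (an assumed convention).
ordered = sorted(scores)
half = len(ordered) // 2
q1 = statistics.median(ordered[:half])
q3 = statistics.median(ordered[half:])
iqr = q3 - q1

print(mean, round(sd, 2), median, iqr)
```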

Two books are assigned for a statistics class: a textbook and its corresponding study guide. The university bookstore determined 20% of enrolled students do not buy either book, 55% buy the textbook only, and 25% buy both books, and these percentages are relatively constant from one term to another. If there are 100 students enrolled, then around 20 students will not buy either book (0 books total), about 55 will buy one book (55 books total), and approximately 25 will buy two books (totaling 50 books for these 25 students). The bookstore should expect to sell about 105 books for this class. Would you be surprised if the bookstore sold slightly more or less than 105 books and why?

If they sell a little more or a little less, this should not be a surprise. Recall that there is natural variability in observed data. For example, if we would flip a coin 100 times, it will not usually come up heads exactly half the time, but it will probably be close.

(a) In your own words, explain what is meant by H0: Independence Model and HA: Alternative Model. (b) Also, explain what it means for the data to support H0 and for the data to support HA.

(a) H0: Independence Model refers to a statistical hypothesis in which two variables are considered to be independent of each other, meaning that the value of one variable does not affect the value of the other variable. HA: Alternative Model refers to a statistical hypothesis that contradicts the null hypothesis (H0) and suggests that there is a relationship or dependence between the two variables being studied. (b) For the data to support H0, the results of the statistical test performed should not reject the null hypothesis and should show no evidence of a relationship between the two variables being studied. For the data to support HA, the results of the statistical test performed should reject the null hypothesis and provide evidence of a relationship or dependence between the two variables being studied.

About 9% of people are left-handed. Suppose 2 people are selected at random from the U.S. population. Because the sample size of 2 is very small relative to the population, it is reasonable to assume these two people are independent. (a) What is the probability that both are left-handed? (b) What is the probability that both are right-handed? Assume that the number of ambidextrous people is negligible.

(a) P(Both Left Handed) = 0.0081 (b) P(Both Right Handed) = 0.8281

https://redwoods.instructure.com/courses/16247/quizzes/84152?module_item_id=691205#:~:text=The%20probability%20of,(A).

(a) The complement of A: when the total is equal to 12. (b) 1/36 (c) 35/36

Use the plots in the figure below to compare the incomes for counties across the two groups. (a) What do you notice about the approximate center of each group? (b) What do you notice about the variability between groups? (c) Is the shape relatively consistent between groups? (d) How many prominent modes are there for each group?

(a) The gain data has a higher median income than the no-gain data. (b) The gain data has more variability than the no-gain data. (c) Yes, they are both skewed right. (d) There is one mode for each group.

(a) Which is more affected by extreme observations, the mean or median? (b) Is the standard deviation or IQR more affected by extreme observations?

(a) The mean (b) The Standard Deviation

Two US adults are randomly selected from the given distribution. The shaded region represents the probability that a single adult is between 180 and 185 cm which is 0.1157. What is the probability that the two adults selected are between 180 and 185 cm tall?

0.0134

Three US adults are randomly selected. The probability a single adult is between 180 and 185 cm is 0.1157. What is the probability that all three are between 180 and 185 cm tall?

0.1157 × 0.1157 × 0.1157 = 0.0015

According to the National Center for Health Statistics, the heights of males in the USA closely follow a normal distribution with mean 69.2 inches and standard deviation 2.8 inches. What is the probability that a randomly selected male from the USA will be shorter than 66.4 inches? That is P(X < 66.4) = ?

0.16
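
A quick check of this probability, sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The cutoff 66.4 inches is exactly one standard deviation below the mean, so the result should be close to 0.16.

```python
from statistics import NormalDist

# Normal model for male heights stated in the question (inches).
heights = NormalDist(mu=69.2, sigma=2.8)

p_shorter = heights.cdf(66.4)  # P(X < 66.4)
print(round(p_shorter, 4))
```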

Below is a table that gives the mean and standard deviation for an administration of the SAT and ACT exams. Both tests follow a nearly normal distribution and as such we can model them using the normal distribution. Figure 4.4: SAT: mean 1500, SD 300. ACT: mean 21, SD 5. Tomas scores a 24 on the ACT. What is his Z-score?

0.6

According to the National Center for Health Statistics, the heights of males in the USA closely follow a normal distribution with mean 69.2 inches and standard deviation 2.8 inches. What is the probability that a randomly chosen male from the USA is between 63.6 and 74.8 inches?

0.954

Below is a table that gives the mean and standard deviation for an administration of the SAT and ACT exams. Both tests follow a nearly normal distribution and as such we can model them using the normal distribution. Figure 4.4: SAT: mean 1500, SD 300. ACT: mean 21, SD 5. Zora scores 1850 on the SAT. What is her Z-score?

1.17
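
Both Z-score cards use the same formula, Z = (x - mean) / SD. A minimal sketch, reading Figure 4.4 as SAT mean 1500 with SD 300 and ACT mean 21 with SD 5; the helper name `z_score` is hypothetical:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above the mean."""
    return (x - mean) / sd

z_tomas = z_score(24, mean=21, sd=5)       # ACT score of 24
z_zora = z_score(1850, mean=1500, sd=300)  # SAT score of 1850

print(round(z_tomas, 2), round(z_zora, 2))
```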

Data collected by the Substance Abuse and Mental Health Services Administration (SAMSHA) suggests that 69.7% of 18-20 year olds consumed alcoholic beverages in 2008. Suppose a random sample of thirty (30) of the 18-20 year olds is taken. What is the probability that exactly 15 of the 30 randomly selected 18-20 year olds consumed an alcoholic drink?

About 0.012 (1.2%)

Data collected by the Substance Abuse and Mental Health Services Administration (SAMSHA) suggests that 69.7% of 18-20 year olds consumed alcoholic beverages in 2008. What is the probability that at most 15 out of 30 randomly sampled 18-20 year olds have consumed alcoholic beverages?

About 0.019 (1.9%)
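
Both SAMSHA probabilities come from the binomial model with n = 30 and p = 0.697. The sketch below evaluates the binomial formula directly with `math.comb`; the helper name `binom_pmf` is hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success chance p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 30, 0.697

p_exactly_15 = binom_pmf(15, n, p)
p_at_most_15 = sum(binom_pmf(k, n, p) for k in range(16))  # k = 0, ..., 15

print(f"{p_exactly_15:.4f} {p_at_most_15:.4f}")
```

Both values are proportions; multiplying by 100 gives the percentages of roughly 1.2% and 1.9%.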

The bookstore also offers a chemistry textbook for $159 and a book supplement for $41. From past experience, they know about 25% of chemistry students just buy the textbook while 60% buy both the textbook and supplement. What proportion of students don't buy either book? Assume no students buy the supplement without the textbook.

100% - 25% - 60% = 15% of students do not buy any books for the class.

The game of roulette involves spinning a wheel with 38 slots: 18 red, 18 black, and 2 green. A ball is spun onto the wheel and will eventually land in a slot, where each slot has an equal chance of capturing the ball. You watch a roulette wheel spin 3 consecutive times and the ball lands on a red slot each time. What is the probability that the ball will land on a red slot on the next spin?

18/38

Two books are assigned for a chemistry class: a textbook and its corresponding study guide. The bookstore determined 30% of enrolled students do not buy either book, 45% buy the textbook only, and 25% buy both books, and these percentages are relatively constant from one term to another. If there are 200 students enrolled, how many books (both textbook and study guide) should the bookstore expect to sell to this class?

190

Suppose weights of the checked baggage of airline passengers follow a nearly normal distribution with mean 20 kilograms and standard deviation 2.5 kilograms. Most airlines charge a fee for baggage that weighs in excess of 25 kilograms. Determine what percent of airline passengers incur this fee.

About 2.3%

Two books are assigned for a chemistry class: a textbook and its corresponding study guide. The bookstore determined 30% of enrolled students do not buy either book, 45% buy the textbook only, and 25% buy both books, and these percentages are relatively constant from one term to another. The textbook for this chemistry class costs $130 and the study guide $43. How much revenue should the bookstore expect from this class of 200 students?

$20,350
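
The expected number of books and the expected revenue for the 200-student chemistry class follow from weighting each purchase pattern by its share of students. A minimal sketch with illustrative variable names:

```python
students = 200
p_textbook_only = 0.45   # buy the textbook only
p_both = 0.25            # buy the textbook and the study guide
price_textbook = 130
price_guide = 43

# Students who buy nothing contribute 0 books and $0, so they drop out.
expected_books = students * (1 * p_textbook_only + 2 * p_both)
expected_revenue = students * (
    p_textbook_only * price_textbook
    + p_both * (price_textbook + price_guide)
)

print(round(expected_books), round(expected_revenue))
```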

Below is a table that gives the mean and standard deviation for an administration of the SAT and ACT exams. Both tests follow a nearly normal distribution and as such we can model them using the normal distribution. Figure 4.4: SAT: mean 1500, SD 300. ACT: mean 21, SD 5. If Sam scores a 1450 on the SAT, what proportion of people scored higher than Sam?

About 0.5662 (56.62%)

According to the National Center for Health Statistics, the heights of males in the USA closely follow a normal distribution with mean 69.2 inches and standard deviation 2.8 inches. If the tallest (top) 10% of males in the USA are considered "very tall", what is the height cutoff for "very tall"?

About 72.8 inches
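
The cutoff is the 90th percentile of the height distribution. One way to sketch it is with the inverse CDF of `statistics.NormalDist` (Python 3.8 or later):

```python
from statistics import NormalDist

# Normal model for male heights stated in the question (inches).
heights = NormalDist(mu=69.2, sigma=2.8)

# The top 10% starts where 90% of the distribution lies below.
cutoff = heights.inv_cdf(0.90)
print(round(cutoff, 1))
```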

Data collected by the Substance Abuse and Mental Health Services Administration (SAMSHA) suggests that 69.7% of 18-20 year olds consumed alcoholic beverages in 2008. Consider a random sample of thirty 18-20 year olds. How many people would you expect to have consumed alcoholic beverages? And with what standard deviation?

Around 21 with an SD of about 2.5
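
These two numbers are the binomial mean np and standard deviation sqrt(np(1 - p)). A short sketch of the calculation:

```python
import math

n, p = 30, 0.697

expected = n * p                 # mean of the binomial count
sd = math.sqrt(n * p * (1 - p))  # standard deviation of the binomial count

print(round(expected, 2), round(sd, 2))
```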

Smoking habits of UK residents. A survey was conducted to study the smoking habits of UK residents. Below is a data matrix displaying a portion of the data collected in this survey. Note that "£" stands for British Pounds Sterling, "cig" stands for cigarettes, and "N/A" refers to a missing component of the data. Indicate whether the variable in the study given by "marital" is numerical or categorical. If numerical, identify as continuous or discrete.

Categorical

Describe the distribution in the histograms below and match them to the box plots.

(a) (2) (b) (3) (c) (1)

What can you see in a dot plot that you cannot see in a histogram?

Counts (frequency) for an individual value.

We consider a publicly available data set that summarizes information about the 3,143 counties in the United States, and we call this the county data set. This data set includes information about each county: its name, the state where it resides, its population in 2000 and 2010, per capita federal spending, poverty rate, and five additional characteristics. How might these data be organized in a data matrix?

Each county may be viewed as a case, and there are eleven pieces of information recorded for each case. A table with 3,143 rows and 11 columns could hold these data, where each row represents a county and each column represents a particular piece of information.

Smoking habits of UK residents. A survey was conducted to study the smoking habits of UK residents. Below is a data matrix displaying a portion of the data collected in this survey. Note that "£" stands for British Pounds Sterling, "cig" stands for cigarettes, and "N/A" refers to a missing component of the data. What does each row of the data matrix represent?

Each row of the data matrix represents a participant in the survey.

The histogram and box plots below show the distribution of finishing times for male and female winners of the New York Marathon between 1970 and 1999. What features are apparent in the box plot but not in the histogram?

In the box plot the more extreme observations, many of which could be considered outliers, are easier to identify.

Earlier we estimated the mean and standard error of p^ using simulated data when p = 0.88 and n = 1000. Confirm that the Central Limit Theorem applies and the sampling distribution is approximately normal.

Independence: There are n = 1000 observations for each sample proportion p^, and each of those observations is an independent draw. Success-failure condition: We can confirm the sample size is sufficiently large by checking the success-failure condition and confirming that the two calculated values are at least 10: np = 1000 × 0.88 = 880 ≥ 10 and n(1 − p) = 1000 × (1 − 0.88) = 120 ≥ 10.

On page 30, the concept of shape of a distribution was introduced. A good description of the shape of a distribution should include modality and whether the distribution is symmetric or skewed to one side. Using Figure 1.25 as an example, explain why such a description is important.

Just giving summary statistics will fail to give the entire story. The three distributions are quite different in shape and yet all have the same mean and standard deviation.

Estimate the median for the 400 observations shown in the histogram, and note whether you expect the mean to be higher or lower than the median.

Median = (80 + 85)/2 = 82.5. The distribution is skewed to the left; therefore, the mean is expected to be lower than the median.

Suppose an observational study tracked sunscreen use and skin cancer, and it was found that the more sunscreen someone used, the more likely the person was to have skin cancer. Does this mean sunscreen causes skin cancer?

No, if someone is out in the sun all day, she is more likely to use sunscreen and more likely to get skin cancer. Exposure to the sun is unaccounted for in the simple investigation.

Data were collected about students in a statistics course. Three variables were recorded for each student: number of siblings, student height, and whether the student had previously taken a statistics course. Classify each of the variables as continuous numerical, discrete numerical, or categorical.

Number of siblings: discrete numerical. Student height: continuous numerical. Previously taken a statistics course: categorical.

Smoking habits of UK residents. A survey was conducted to study the smoking habits of UK residents. Below is a data matrix displaying a portion of the data collected in this survey. Note that "£" stands for British Pounds Sterling, "cig" stands for cigarettes, and "N/A" refers to a missing component of the data. Indicate whether the variable in the study given by "age" is numerical or categorical. If numerical, identify as continuous or discrete.

Numerical Continuous

The probability that a random smoker will develop a severe lung condition in his or her lifetime is about 0.3. Suppose you have 7 friends that do not know each other and who are smokers. We assume that they can be treated as a random sample of smokers. Is the binomial model appropriate? What is the probability that at most 1 of your 7 friends will develop a severe lung condition?

Part (A): To check if the binomial model is appropriate, we must verify the conditions. 1. Since we are supposing we can treat the friends as a random sample, they are independent. 2. We have a fixed number of trials (n = 7). 3. Each outcome is a success or failure. 4. The probability of a success is the same for each trial since the individuals are like a random sample (p = 0.3 if we say a "success" is someone getting a lung condition, a morbid choice). Part (B): P(0 or 1, develop severe lung condition) = 0.3294

A recent article in a college newspaper stated that college students get an average of 5.5 hrs of sleep each night. A student who was skeptical about this value decided to conduct a survey by randomly sampling 25 students. On average, the sampled students slept 6.25 hours per night. Identify which value represents the sample mean and which value represents the claimed population mean.

Population mean = 5.5. Sample mean = 6.25.

Given the following scatterplots, which plot shows a nonlinear positive association?

Scatterplot 3

What do scatterplots reveal about the data, and how might they be useful?

Scatterplots are helpful in quickly spotting associations relating variables, whether those associations come in the form of simple trends or whether those relationships are more complex.

In a class of 25 students, 24 of them took an exam in class and 1 student took a make-up exam the following day. The professor graded the first batch of 24 exams and found an average score of 74 points with a standard deviation of 8.9 points. The student who took the make-up the following day scored 64 points on the exam. Does the new student's score increase or decrease the average score?

Since the new score is smaller than the mean of the 24 previous scores, the new mean should be smaller than the old mean.

Here, we examine the relationship between homeownership, which for the loans data can take a value of rent, mortgage (owns but has a mortgage), or own, and app_type, which indicates whether the loan application was made with a partner (joint) or whether it was an individual application. Consider the following contingency table: What does the 0.906 represent in Figure 2.21?

The 0.906 represents the fraction of applicants that rent who applied as individuals.

Consider the binomial model when the probability of a success is p = 0.10. Figure 4.9 shows four hollow histograms for simulated samples from the binomial distribution using four different sample sizes: n = 10,30,100,300. What distribution does the last hollow histogram resemble?

The normal distribution, with its mean at about 30 (np = 300 × 0.10).

In a random sample of 765 adults in the United States, 322 say they could not cover a $400 unexpected expense without borrowing money or going into debt. What parameter is being estimated?

The fraction of US adults who could not cover a $400 unexpected expense without borrowing money or going into debt.

An experiment is evaluating the effectiveness of a new drug in treating migraines. A group variable is used to indicate the experiment group for each patient: treatment or control. The num_migraines variable represents the number of migraines the patient experienced during a 3-month period. Classify each variable as either numerical or categorical.

The group variable is categorical, and the number of migraines is discrete numerical.

In a class of 25 students, 24 of them took an exam in class and 1 student took a make-up exam the following day. The professor graded the first batch of 24 exams and found an average score of 74 points with a standard deviation of 8.9 points. The student who took the make-up the following day scored 64 points on the exam. Does the new student's score increase or decrease the standard deviation of the scores?

The new score is more than 1 standard deviation away from the previous mean, and this will tend to increase the standard deviation of the data.
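
Both make-up exam cards can be checked numerically from the summary statistics alone. The sketch below assumes 8.9 is the sample standard deviation (n - 1 denominator) and updates the sum of squared deviations with the standard one-observation update; the variable names are illustrative.

```python
import math

n_old, mean_old, sd_old = 24, 74.0, 8.9  # first batch of exams
new_score = 64.0                         # make-up exam

n_new = n_old + 1
mean_new = (n_old * mean_old + new_score) / n_new

# Sum of squared deviations before and after adding the new score.
ss_old = (n_old - 1) * sd_old**2
ss_new = ss_old + (new_score - mean_old) * (new_score - mean_new)
sd_new = math.sqrt(ss_new / (n_new - 1))

print(round(mean_new, 2), round(sd_new, 2))
```

Under these assumptions the mean drops from 74 to 73.6 and the standard deviation rises slightly, from 8.9 to about 8.94.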

The figure below suggests three distributions for household income in the United States. Only one is correct. Which one must it be? What is wrong with the other two?

The probabilities of (a) do not sum to 1. The second probability in (b) is negative. This leaves (c), which sure enough satisfies the requirements of a distribution. One of the three was said to be the actual distribution of US household incomes, so it must be (c).

Look back to the study in Section 1.1 where researchers were testing whether stents were effective at reducing strokes in at-risk patients. Is this an experiment? Was the study blinded? Was it double-blinded?

The researchers assigned the patients into their treatment groups, so this study was an experiment. However, the patients could distinguish what treatment they received, so this study was not blind. The study could not be double-blind since it was not blind.

Consider the following contingency table: Is this an observational study or an experiment? What implications does the study type have on what can be inferred from the results?

The study is an experiment, as patients were randomly assigned to an experiment group. Since this is an experiment, the results can be used to evaluate a causal relationship between the malaria vaccine and whether patients showed signs of an infection.

Internet use and life expectancy. The scatterplot below shows the relationship between estimated life expectancy at birth as of 2012 and percentage of internet users in 2010 in 208 countries. Describe the relationship between life expectancy and percentage of internet users.

The two variables are positively associated. Countries in which a higher percentage of the population have access to the Internet also tend to have higher average life expectancies.

Take a look at the dot plot below. Can you see the skew in the data, and if so, what type of skew is it?

There is skew in the distribution and it appears to be skewed right.

What proportion of scores in a normal distribution fall between: (a) plus or minus 1 standard deviation of the mean (b) plus or minus 2 standard deviations of the mean (c) plus or minus 3 standard deviations of the mean

(a) 68% (b) 95% (c) 99.7%

Give the short-hand for a normal distribution with (a) mean 5 and standard deviation 3,(b) mean -100 and standard deviation 10, and(c) mean 2 and standard deviation 9.

(a) N(μ = 5, σ = 3) (b) N(μ = −100, σ = 10) (c) N(μ = 2, σ = 9)

The relative frequency table below displays the distribution of annual total personal income (in 2009 inflation-adjusted dollars) for a representative sample of 96,420,486 Americans. These data come from the American Community Survey for 2005-2009. This sample is comprised of 59% males and 41% females. (a) Describe the distribution of total personal income (b) What is the probability that a randomly chosen US resident makes less than $100,000 per year.

(a) The distribution is right-skewed, with a median between $35,000 and $49,000. The distribution is very roughly about $30,000. The positive skew suggests that there are relatively more individuals with higher incomes than would be expected if the distribution were perfectly symmetrical. (b) The probability that a randomly chosen US resident makes less than $100,000 per year is 81.2%.

Of all freshmen at a very large university, 16% made the dean's list in the current year. As part of a statistics class project, students randomly sample 100 students and check if those students made the list. They repeat this 1,000 times and build a distribution of sample proportions. (a) What is this distribution called? (b) Would you expect the shape of this distribution to be symmetric, right skewed, or left skewed? Explain your reasoning. (c) Calculate the variability of this distribution. (d) What is the formal name of the value you computed in part (c)?

(a) The sampling distribution of the proportion. (b) Since it passes the success-failure condition (np = 16, n(1 - p) = 84), we would expect the distribution to be symmetric and approximately normal. (c) The variability of the distribution would be about 0.037. (d) The standard error of the proportion, or in symbols: SEp^
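
The class project described in this card can be imitated with a small simulation. This is only a sketch: the seed and the use of Python's `random` module are assumptions, and a different seed gives a slightly different simulated spread.

```python
import math
import random
import statistics

p, n, reps = 0.16, 100, 1000
rng = random.Random(0)  # fixed seed so the run is repeatable

# Each repetition draws 100 students and records the share on the dean's list.
sample_props = [
    sum(rng.random() < p for _ in range(n)) / n
    for _ in range(reps)
]

se_formula = math.sqrt(p * (1 - p) / n)         # theoretical standard error
se_simulated = statistics.stdev(sample_props)   # spread of simulated proportions

print(f"formula: {se_formula:.4f}, simulated: {se_simulated:.4f}")
```

The simulated spread should land near the formula value of about 0.037, which is the standard error named in part (d).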

Suppose we are interested in estimating the malaria rate in a densely tropical portion of rural Indonesia. We learn that there are 30 villages in that part of the Indonesian jungle, each more or less similar to the next. Our goal is to test 150 individuals for malaria. What sampling method should be employed?

Cluster sampling or multistage sampling seems like the best method.

When observations are independent and the sample size is sufficiently large, the sample proportion p^ will tend to follow a skewed right distribution with the following mean and standard error:

False. Under these conditions p^ tends to follow a normal distribution, not a right-skewed one.

Below is a table that gives the mean and standard deviation for an administration of the SAT and ACT exams. Both tests follow a nearly normal distribution and as such we can model them using the normal distribution. Figure 4.4: SAT: mean 1500, SD 300. ACT: mean 21, SD 5. If Sam scores a 1450 on the SAT, will his Z-score be positive or negative?

Negative

Pick the best answer to complete this sentence: In order for the Central Limit Theorem to hold, the sample size is typically considered sufficiently large when ________________.

np ≥ 10 and n(1 − p) ≥ 10

The binomial distribution with probability of success p is nearly normal when the sample size n is sufficiently large that np and n(1 - p) are both at least 10. The approximate normal distribution has parameters corresponding to the mean and standard deviation of the binomial distribution, given by:

μ = np, σ = √(np(1 − p))

In real-world applications, we never actually observe the sampling distribution, yet it is useful to always think of a point estimate as coming from such a hypothetical distribution. Understanding the sampling distribution will help us characterize and make sense of the point estimates that we do observe.

true

NORMAL DISTRIBUTION FACTS: Many variables are nearly normal, but none are exactly normal. Thus the normal distribution, while not perfect for any single problem, is very useful for a variety of problems. We will use it in data exploration and to solve important problems in statistics.

true

RANDOM VARIABLE: A random process or variable with a numerical outcome.

true

The binomial distribution is used to describe the number of successes in a fixed number of trials.

true

The four conditions to check for a binomial distribution are: (1) The trials are independent. (2) The number of trials, n, is fixed. (3) Each trial outcome can be classified as a success or failure. (4) The probability of a success, p, is the same for each trial.

true

