ST 311 NCSU EXAM 2

quantitative= summarized with mean

categorical= summarized with proportion

Response variable

measures the outcome of a study (taste rating of syrup)

Double blind:

neither subjects nor researchers know which subjects are in the control or treatment group. (possibly a 3rd party keeps track of the results)

population variance: σ^2 = "sigma squared"

sample variance= s^2

Bo is the INTERCEPT: the value of y when x is zero.

B1 is the SLOPE: how much y changes, on average, when x increases by one unit.

We would like to test the hypotheses. We find t = 1.88 with 5 degrees of freedom. What is the appropriate p-value? Select one: a. 0.05>p-value>0.02 b. 0.025>p-value>0.01 c. 0.20>p-value>0.10 d. 0.10>p-value>0.05

CA: c. 0.20>p-value>0.10

Multi-stage samples: combination of sampling techniques.

Categorical data: Place subjects into one of several groups or categories. (pie charts, bar charts, a percentage in each category)

Biased sampling: more likely to produce some outcomes than others (sample statistics may be consistently too high or too low)

Convenience samples: samples that are easy to take. (often biased: easy to get, but may be different from the population in general.)

According to a recent report, it was found that 50.3% of residents in Cuyahoga County, Ohio are registered to vote. Which of the following is more likely? Select one: We take a random sample of 100 people from this county and find that the proportion is below 45% We take a random sample of 400 people from this county and find that the proportion is below 45% We take a random sample of 1000 people from this county and find that the proportion is below 45% We take a random sample of 10000 people from this county and find that the proportion is below 45% We have no basis for predicting which is more likely to have a proportion below 45%.

E1. Explain why the value of sample statistics vary from sample to sample. E5. Given a study, describe the sampling distribution of p-hat as specifically as possible. This involves stating whether this distribution is at least approximately normal and its corresponding mean and standard deviation. The correct answer is: We take a random sample of 100 people from this county and find that the proportion is below 45%
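The reasoning behind this answer can be checked numerically. The sketch below assumes the usual normal approximation for p-hat (mean p, standard deviation sqrt(p(1-p)/n)); the function name `prob_below` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

p = 0.503       # population proportion given in the question
cutoff = 0.45   # sample proportion we are asking about

def prob_below(n: int) -> float:
    """Normal approximation to P(p-hat < cutoff) for a random sample of size n."""
    sd = sqrt(p * (1 - p) / n)  # standard deviation of p-hat shrinks as n grows
    return NormalDist(mu=p, sigma=sd).cdf(cutoff)

# Probability of seeing a sample proportion below 45% for each sample size
probs = {n: prob_below(n) for n in (100, 400, 1000, 10000)}
for n, pr in probs.items():
    print(f"n = {n:>5}: P(p-hat < 0.45) is about {pr:.4f}")
```

The smaller the sample, the more p-hat varies, so a value as far from 50.3% as 45% is most likely with n = 100.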

Sum of the squared errors (SSE): -LINE FITS WELL IF THE SSE IS SMALL

Least squares line/ regression line: Line that relates x and y and minimizes the sum of squared errors.

Test for slope: -is there a linear relationship between the variables? Ho: B1 = 0 -no straight line relationship between x and y Ha: B1 > 0 Ha: B1 < 0 Ha: B1 ≠ 0

Test statistic: t = (sample statistic - null value) / SE, so t = (b1 - 0) / SE(b1). Note: we can replace 0 with a different null value if we have a different idea.

Which of the following statements is most appropriate about the relationship between x and y in the plot below? Select one: a. The correlation between x and y is negative. b. The relationship between x and y is not linear. c. The standard deviation for y is not constant for all values of x. d. If we were to fit a regression line to this data, the distribution of the y's wouldn't be normally distributed for each x around that fitted line.

The correct answer is: The standard deviation for y is not constant for all values of x.

A ping pong enthusiast gained access to a database from the International Table Tennis Federation on the number and types of shots professional players made during the Olympic Games. The top players' individual ratings are included in the data. The ping pong enthusiast would like to know if there is a correlation between a player's rating and the number of down-the-line shots they made in the 2000 Olympic Singles Games. Suppose that all relevant assumptions are met. Based on the StatCrunch output below, do we have reason to believe that there is non-zero slope between the number of down-the-line shots a player made and that player's international rating? Select one: a. No, because the p-value is less than .05 b. No, because the p-value is more than .05 c. Yes, because the p-value is less than .05 d. Yes, because the p-value is more than .05 e. We cannot decide from the given information.

The correct answer is: Yes, because the p-value is less than .05

Block design: subjects are divided into similar groups called blocks and each treatment is applied in each block (ex: age)

WEEK 8 MATERIAL STARTS HERE:

Skewed right= mean greater than median

skewed left: mean less than median

Quantitative/ Numeric Data: numbers are actually numbers-> make sense as numbers and can be used that way. (histograms, dot plots) (shape, center, variability)

skewed right= long tail to the right, individuals stacked near the lower limits and unlimited on the upper end.

Base MOE on: -normal distribution -standard deviation of the mean

t-distribution: -similar to standard normal -symmetric -bell-shaped -centered at 0 -more "squashed" than normal (less in the middle, more on the ends)

WEEK 6: base MOE on: -normal distribution -sd of the mean

t-distribution: -similar to standard normal -symmetric -bell-shaped -centered at 0 -more squashed than normal

WEEK 9 MATERIAL STARTS HERE: degrees of freedom= n-1

t= (statistic - hypothesized value) / standard error of statistic

Measures of variability: range = maximum - minimum; interquartile range = IQR = Q3 - Q1

variance= summarizes distance between all individuals and their mean.

Purpose of regression analysis: -Build models (see how the world works) -Determine if relationship is significant (which variables are related?) -Predict values for individuals (know their value of x, predict y)

y-hat-i = our prediction of a particular value of y (the point on the line at that value of x); yi = particular value of y; xi = particular value of x; bo = estimated intercept; b1 = estimated slope

A survey asked 1170 randomly selected subjects how much extra in taxes they would be willing to pay to protect the environment. The sample average was $430 with a sample standard deviation of $119. Is it appropriate to use a normal distribution to approximate a confidence interval for the population mean? If it's inappropriate, indicate why. Select one: a. Yes. b. No, because it was not a random sample. c. No, because n(p-hat) < 10 or n(q-hat) < 10. d. No, because the sample size wasn't at least 30 and the population wasn't normally distributed. e. No, because we already know the population mean.

CA: -a. Yes.

The coefficient of determination, r^2, for the above scatter plot is 0.651. What is the correlation coefficient, r? Give your answer to 3 decimal places.

CA: -0.807

Which of the following numbers represents the correlation for the following scatter plot? Select one: -0.66 0.93 -0.11 0.24 -0.93

CA:A. -0.66

P-value: how likely a statistic at least as extreme as ours is under Ho. Basic idea: if the p-value is SMALL then the statistic is unlikely given the null hypothesis. A small p-value tells us something else is going on, thus we reject the null hypothesis.

How small is small? The significance level is the cut off point (alpha). IF THE P-VALUE IS LESS THAN ALPHA THEN REJECT THE NULL HYPOTHESIS.

placebo effect: tendency to react to a drug regardless of its actual physical function

Placebo: a false drug or treatment that the subjects believe is real. (sugar pill, saline solution, fake treatment) Evens out the placebo effect between treatment and control groups; lacks the "active" ingredient.

When do we use: Range: simplest, used only for very quick looks.

SD: common, sensitive to outliers IQR: best when comparing skewed data sets when outliers may be present
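A quick illustration of why the SD is called sensitive to outliers while the IQR is not. The data values here are invented for the example, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from the textbook or StatCrunch rule.

```python
from statistics import quantiles, stdev

def iqr(values):
    """Interquartile range Q3 - Q1 (default 'exclusive' quartile method)."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

# Hypothetical data set, then the same data with one unusual value added
base = [10, 12, 13, 14, 15, 16, 18]
with_outlier = base + [90]

range_base = max(base) - min(base)          # range = maximum - minimum
sd_base, sd_out = stdev(base), stdev(with_outlier)
iqr_base, iqr_out = iqr(base), iqr(with_outlier)

print(f"range (no outlier): {range_base}")
print(f"SD:  {sd_base:.2f} -> {sd_out:.2f}")
print(f"IQR: {iqr_base:.2f} -> {iqr_out:.2f}")
```

One bizarre point multiplies the SD many times over, while the IQR barely moves.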

Advantages of SRS: 1. unbiased- preferences of the person taking the sample do not come into play (avoids researcher or convenience bias) 2. statistics that result have a predictable long run pattern. (randomness is predictable)

Stratified Random Samples: population is divided into groups (strata) and random samples are taken within these groups. Ensures information about all strata. -find estimates for each stratum

Alice knows that 70% of the creatures in Wonderland are anthropomorphic animals. She randomly invites several of Wonderland's inhabitants to a tea party. Depending on which inhabitants of Wonderland are chosen, the proportion of animals invited may vary. If Alice samples 60 inhabitants, what is the probability that more than 65% of the sampled inhabitants are animals? Give your answer to four decimal places.

The correct answer is: 0.8023
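A minimal sketch of how 0.8023 can be reproduced, assuming the normal approximation for p-hat and a z-score rounded to two decimals as when reading a Z table. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

p, n, cutoff = 0.70, 60, 0.65

sd_phat = sqrt(p * (1 - p) / n)   # standard deviation of p-hat
z = (cutoff - p) / sd_phat        # standardized value of 0.65
z_table = round(z, 2)             # table lookup uses two decimals

prob_table = 1 - NormalDist().cdf(z_table)  # P(Z > z) from the rounded z
prob_exact = 1 - NormalDist().cdf(z)        # same probability without rounding z

print(f"z = {z:.4f}, rounded to {z_table}")
print(f"P(p-hat > 0.65) with table z: {prob_table:.4f}")
print(f"P(p-hat > 0.65) without rounding: {prob_exact:.4f}")
```

The unrounded z gives a slightly smaller probability, so small differences from the recorded answer usually come from table rounding.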

A researcher was interested in utilities provided by city governments. The researcher randomly selected 20 counties from a list of all counties in the U.S. From each of these counties the researcher then contacted each city government (a total of 192) and found that 12 (6.25%) of them provided electricity to their residents. The type of sample used in this example is a Select one: simple random sample. cluster random sample. stratified random sample. none of the above.

The correct answer is: cluster random sample.

The Mental Development Index (MDI) of the Bayley Scales of Infant Development is a standardized measure used in longitudinal follow-up of high-risk infants. The scores on the MDI have approximately a normal distribution with a mean of 100 and standard deviation of 15. What proportion of children have MDI of at least 88? Select one: .2119 .7881 .1056 .8944

This question covers the following learning objective: D4. Given a mean μ, standard deviation σ, and observed value x, calculate the standardized value of x (the z-score) and interpret its value. D6. Given an observed value(s), use the standard normal table to find the corresponding probability above, below, or between them. The correct answer is: .7881
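The z-score step for this question, sketched with the standard library in place of a printed table:

```python
from statistics import NormalDist

mu, sigma, x = 100, 15, 88

z = (x - mu) / sigma                             # standardized value of 88
prop_at_least = 1 - NormalDist(mu, sigma).cdf(x)  # proportion with MDI >= 88

print(f"z = {z:.2f}")
print(f"P(MDI >= 88) = {prop_at_least:.4f}")
```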

For each of the sets of sample size and confidence level listed below, select the appropriate t-value. a.n=23, 95% confidence= b.n=30, 99% confidence= c.n=10, 98% confidence=

a. 2.074 b. 2.756 c. 2.821

population mean: μ ("mu"); sample mean: ȳ ("y bar")

median: middle value in data set when values are put in INCREASING order.

modes= peaks of histograms; bimodal= 2 peaks; multimodal= several peaks

outlier= unusual values that do not fit with the rest of the pattern. (may be data entry errors, but not always)

symmetric: mean and median are about the same

mean: sensitive to unusual values and skewed data; pulled away from the median

Random assignment:

subjects are assigned to the different levels of the explanatory variable. (i.e. different treatments) by random mechanisms.

Volunteer response sample: self-selected sample of people who responded to a general appeal. (table tents, online votes, etc.)

(SRS) Simple random sample: a sample taken in such a way that every set of n items has an equal chance of being chosen. Uses chance mechanisms, randomness to avoid bias.

Null hypothesis: beginning claim (H-naught) Allows establishment of the sampling distribution.

Alternative Hypothesis: research hypothesis (H1 or Ha)

The term statistical significance means Select one: a. the test statistic is close to what we would expect if the null hypothesis is true. b. the null hypothesis is true. c. the result we see is unlikely to happen just by random chance if the null hypothesis is true. d. the results are important and will make a practical difference in the lives of the subjects.

CA'S: -the result we see is unlikely to happen just by random chance if the null hypothesis is true.

QUIZ 4: In engineering and product design, it is important to consider the weights of people so that airplanes or elevators aren't overloaded. Based on data from the National Health Survey, we can assume the weight of adult males in the US has a mean weight of 197 pounds and standard deviation of 32 pounds. We randomly select 64 adult males. What is the probability that the average weight of these 64 adult males is over 205 pounds? Give your answer to 4 decimal places.

E7. Given a population standard deviation σ and sample size n, calculate the standard deviation of the sample mean y-bar using the appropriate formula.E8. Given a population mean μ, standard deviation σ, sample size n, and sample mean y-bar, calculate the standardized value (z-score) for a sample mean.E10. Given a mean μ, standard deviation σ, sample size n, and sample mean y-bar, use the sampling distribution of y-bar and the standard normal table to find probabilities above or below the sample mean. The correct answer is: 0.0228
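Objectives E7, E8 and E10 map onto three short steps. This is a sketch under the usual assumption that y-bar is approximately normal with standard deviation σ/sqrt(n).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, ybar = 197, 32, 64, 205

sd_mean = sigma / sqrt(n)           # E7: SD of the sample mean
z = (ybar - mu) / sd_mean           # E8: z-score for the sample mean
prob_over = 1 - NormalDist().cdf(z)  # E10: P(y-bar > 205)

print(f"SD of y-bar = {sd_mean}, z = {z}")
print(f"P(y-bar > 205) = {prob_over:.4f}")
```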

Ha: μ ≠ 1.5 TWO TAILED

Ha: μ > 1.5 Ha: μ < 1.5 ONE TAILED

Single blinded

Only participants are unaware of which treatment they receive. Avoids bias.

Which of the following data sets could most appropriately be analyzed with a simple linear regression? In other words, which of the following graphs doesn't violate one of the assumptions for applying a simple linear regression? Select one: a b c d

The correct answer is: d

Please choose all the correct answers about probability. Hint: try picking numbers out of the Z table to answer when considering some of the choices (for example, for "z1" and "z2"). Select one or more: a. Probabilities are always at least 0 and at most 1. b. Pr(Z <= z) + Pr(Z > z) = 1 c. If z1 > z2, then Pr(Z <= z1) + Pr(Z > z2) always equals Pr(Z <= z1 or Z > z2) d. If z1 <= z2, then Pr(Z <= z1) + Pr(Z > z2) always equals Pr(Z <= z1 or Z > z2). e. For a normal distribution, the mean may not always equal the median. f. For a normal distribution (call it X) with mean µ and standard deviation σ, then Pr((X-µ)/σ > 0) = 0.5.

The correct answers are: Probabilities are always at least 0 and at most 1., Pr(Z <= z) + Pr(Z > z) = 1, If z1 <= z2, then Pr(Z <= z1) + Pr(Z > z2) always equals Pr(Z <= z1 or Z > z2)., For a normal distribution (call it X) with mean µ and standard deviation σ, then Pr((X-µ)/σ > 0) = 0.5.

-straight line relationships only, curved relationships we need more complex analysis. -beware of outliers, one bizarre point can change the correlation.

Units: The correlation does not depend on the unit of measure. -correlation is the same if measured in cm or in inches -regression line will change -regression needs correct units to give appropriate slope.

Control

absence of treatment; used for comparison

Experiments

impose a difference in the explanatory variables and try to determine if there is a difference in the outcome variable.

Subjects/participants

individuals studied

Undercoverage: sampling frame does not include all of the population; if parts are missing, the sample is biased.

Non-response: some part of the population may not respond

Form: y-hat=bo + b1x -we need to find bo and b1 -take the partial derivatives of the SSE with respect to bo and b1 and set them to zero to find the coefficients that minimize it.
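Setting those partial derivatives to zero gives the closed-form solution b1 = Sxy / Sxx and bo = y-bar - b1 * x-bar. The sketch below applies it to a small invented data set; the function names and the numbers are hypothetical, not from the course.

```python
def least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared errors (observed y minus predicted y-hat, squared)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
best_sse = sse(xs, ys, b0, b1)
print(f"y-hat = {b0:.2f} + {b1:.2f} x, SSE = {best_sse:.4f}")
```

Any other line, for example one with a slightly different slope, gives a larger SSE on the same data.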

residual = observed y - predicted y-hat: 250-229.23= 20.77

Explanatory variable:

variable we think influences another variable (syrup type)

The contractor took 28 water samples and found an average pH of 6.5 with a sample standard deviation of 1.3. What is the test statistic for this sample? Give your answer to 2 decimal places. For help on how to input a numeric answer, please see "Instructions for inputting a numeric response."

CA: -2.04
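The arithmetic behind this test statistic, assuming the null value μ = 7 (the pH of pure water stated in the stream question):

```python
from math import sqrt

n, ybar, s, mu0 = 28, 6.5, 1.3, 7.0

se = s / sqrt(n)        # standard error of the sample mean
t = (ybar - mu0) / se   # (statistic - hypothesized value) / standard error
df = n - 1              # degrees of freedom

print(f"SE = {se:.4f}, t = {t:.2f}, df = {df}")
```

The statistic is negative because the sample mean is below the hypothesized value.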

We would like to test the hypotheses H0: μ = 125 vs. Ha: μ > 125. We find t = 2.56 with 10 degrees of freedom. What is the appropriate p-value? Select one: a. 0.025 > p-value > 0.01 b. 0.025 > p-value > 0.02 c. 0.05 > p-value > 0.025 d. 0.01 > p-value

CA: a. 0.025 > p-value > 0.01
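The course reads these brackets from a t table. As a cross-check, the tail area can be approximated by integrating the t density numerically. This is only a sketch using Simpson's rule; the helper names `t_pdf` and `t_upper_tail` are made up, and a statistics package would normally do this for you.

```python
from math import gamma, pi, sqrt

def t_pdf(x: float, df: int) -> float:
    """Density of the t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0: one half minus the area from 0 to t (Simpson's rule)."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# One-sided p-value for Ha: mu > 125 with t = 2.56 and 10 df
p_one_sided = t_upper_tail(2.56, 10)
print(f"one-sided p-value is about {p_one_sided:.4f}")

# For the earlier card (t = 1.88, 5 df), doubling the tail assumes a two-sided alternative
p_two_sided = 2 * t_upper_tail(1.88, 5)
print(f"two-sided p-value is about {p_two_sided:.4f}")
```

The first value falls between 0.01 and 0.025, matching answer a. The second falls between 0.10 and 0.20 only when the tail is doubled, which suggests (but does not prove) that the earlier question used a two-sided alternative.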

Which of the following numbers represents the correlation for the following scatter plot? Select one: 0.96 -0.95 -0.18 0.5 0.48

CA:A 0.96

Statistical Significance: not likely to have occurred just by random chance. Indicates that something other than chance is happening.

Hypothesis test: series of steps that allow us to establish statistical significance (determine how much random variability would be in the statistic)

WEEK 10 material starts here: Regression Analysis Positive association: as one variable increases the other variable increases.

Negative association: as one variable increases the other decreases.

Conclusion: if the p-value is less than alpha, reject Ho. Note: the test for the slope B1=0 is equivalent to testing if the correlation is zero.

Summary: -Because we are taking a sample we might see a correlation or non-zero slope just because of random chance. -We can use a test of hypothesis.

A police officer wants to know the proportion of crimes that are burglaries within the last five years. She randomly selects 40 records from a database of all crimes and checks their records. She finds that 30% of the crimes were labeled as burglary within the last five years. Select one: a. Yes. b. No, because it wasn't a random sample. c. No, because n(p-hat) < 10 or n(q-hat) < 10. d. No, because the sample size wasn't at least 30 and the population wasn't normally distributed. e. No, because we already know the population proportion.

The correct answer is: Yes.

Observational Study:

The researcher did not assign groupings.

Correlation: numeric measure of association (r)

-ranges from -1 to 1 -direction indicated by sign: neg or pos association. -magnitude indicates strength -values near -1 or 1 indicate close to a straight line (strong association) -values near 0 indicate little or no association -want them to be tightly packed *be careful of outliers

Subject bias can occur in an experiment when the participants are aware of the purpose of the study and respond in a way that they think the researchers want. For example, if participants believe they are in a study to test if coffee improves your mood, those who drank coffee might claim to be happier than they really are. Based on the information given, which of the following experiments is most likely to have the issue of a control or placebo that could introduce subject bias? Select one: A researcher wanted to test how medicine M would lower blood sugar levels of rabbits. She had 20 similar rabbits and randomly divided them into two groups of 10 rabbits each. Rabbits in the treatment group were injected with medicine M, while rabbits in the control group just stayed in the cage. For each rabbit, blood sugar levels were measured before and after the treatment, and the change was computed. To study the effect of medicine A on the treatment of children with tic disorders, 50 patients were randomly divided into two groups of 25 patients each. Patients in the treatment group took medicine A, while patients in the control group took a placebo. The medicine A and placebo were similar in appearance so that patients were not able to figure out which they had taken. After six months, the level of improvement for each patient was evaluated by an independent organization. A company wanted to know how background music could affect the impression of a television commercial on audiences. They conducted an experiment in which 200 people took part. During the experiment, the subjects were asked to sit before screens to take a computer-based test simultaneously. In the test, a television advertisement was played once and questions about the product were asked after the advertisement disappeared. Advertisements and questions were the same for all the subjects except for the background music in the advertisement. 
Two different versions of background music were randomly selected by computers with equal probabilities. The performance of each subject on answering questions was recorded. A middle school conducted an experiment to show how an advanced teaching method that was different from what was currently employed could improve the academic performances of 8th grade students. Students were informed of the experiment in an assembly. The researchers then randomly selected 15% of all the 8th grade students to form special classes. Students in the special classes were taught using the new approach, while other students remained in their original classes and were taught using the same way as before. General conditions such as class sizes were similar among all classes. After one semester, the changes in academic performances were recorded.

CA'S: -A middle school conducted an experiment to show how an advanced teaching method that was different from what was currently employed could improve the academic performances of 8th grade students. Students were informed of the experiment in an assembly. The researchers then randomly selected 15% of all the 8th grade students to form special classes. Students in the special classes were taught using the new approach, while other students remained in their original classes and were taught using the same way as before. General conditions such as class sizes were similar among all classes. After one semester, the changes in academic performances were recorded.

Local government is worried that runoff from a corporate farm has caused water in a nearby stream to become acidic. The pH is used to measure the acidity/alkalinity of a substance. Pure water, for instance, has a pH of 7, and smaller pH values indicate acidity, while larger values indicate alkalinity. A contractor is hired to test the hypothesis that the water is significantly acidic. Use this information to answer questions 3 and 4. Question 3 Select the correct null and alternative hypotheses. Select one or more: H0: μ = 0 H0: μ = 7 H0: μ≠ 7 H0: μ = 14 HA: μ ≠ 7 HA: μ < 7 HA: μ > 0 HA: μ<14

CA'S: -H0: μ = 7 -HA: μ < 7

The SD of the sample mean: -decreases as sample size increases -is the population SD divided by the square root of the sample size. As sample size increases, variability in the sample mean decreases and its SD becomes smaller.

CLT CENTRAL LIMIT THEOREM WEEK 4: highlighted in notebook

Week 5: MOE margin of error: numeric indication of the distance a statistic may be from the true parameter.

Confidence level: proportion of times that we do capture the parameter. With 90% confidence, we do capture the true parameter in 90% of the possible samples.

An insurance company wants to know the proportion of clients who have had claims within the last year. They randomly select 100 clients from a database of all clients and check their basic information. They find that 5% of the clients had claims within the last year. Can we create a confidence interval for the relevant parameter? Select one: a. Yes. b. No, because it wasn't a random sample. c. No, because n(p-hat) < 10 or n(q-hat) < 10. d. No, because the sample size wasn't at least 30 and the population wasn't normally distributed. e. No, because we already know the population proportion

The correct answer is: No, because n(p-hat) < 10 or n(q-hat) < 10.
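The sample-size condition in option c can be checked mechanically. A small sketch (the function name `large_enough` is hypothetical), applied to this question and to the burglary question, where 30% of 40 records were burglaries:

```python
def large_enough(n: int, p_hat: float, threshold: float = 10) -> bool:
    """True when both n*p-hat and n*q-hat reach the threshold (q-hat = 1 - p-hat)."""
    successes = n * p_hat
    failures = n * (1 - p_hat)
    return successes >= threshold and failures >= threshold

insurance_ok = large_enough(100, 0.05)  # 5 claims, 95 without claims
police_ok = large_enough(40, 0.30)      # 12 burglaries, 28 other crimes

print(f"insurance sample OK for normal approximation: {insurance_ok}")
print(f"police sample OK for normal approximation: {police_ok}")
```

Only 5 clients had claims, so the insurance sample fails the condition even though n = 100 looks large.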

A researcher is interested in the breakfast and exercise habits of college students. For some randomly selected students, he recorded the number of times they had breakfast in one month and the number of times they went to gym. The correlation for these two variables ended up being 0.74. If a college student who had never previously gone to the gym began going to the gym regularly, what does this imply about the future habits of that student? Select one: a. He will have breakfast more regularly. b. He will stop having a breakfast. c. We cannot conclude that working out in the gym will affect the future breakfast habits of this person. d. He will have breakfast in the gym.

The correct answer is: We cannot conclude that working out in the gym will affect the future breakfast habits of this person.

A researcher collected data on 100 random homes in Wake County. She then created the following histogram of when each home was built. Which of the following are true? Select all that apply. Select one or more: The mean year built is greater than the median year built. The median year built is greater than the mean year built. The median year built is between 1940 and 1960. The median year built is between 1980 and 2000. The third quartile of year built is between 2000 and 2020. The first quartile of year built is between 1940 and 1960. The mean year built and the median year built are equal.

The correct answers are: The median year built is greater than the mean year built., The median year built is between 1980 and 2000.

IF WE TAKE ANOTHER SAMPLE, ITS STATISTIC WILL NOT NECESSARILY FALL IN THIS INTERVAL

WE THINK THE TRUE PARAMETER IS IN THE INTERVAL, NOT OTHER STATISTICS

We are conducting a test of the hypotheses H0: p = 0.6 Ha: p > 0.6 We find a test statistic of z = 2.31. What is the corresponding p-value? Give your answer as a proportion between 0 and 1 to 4 decimal places.

answer: 0.0104

Degrees of freedom: big sample= can be treated as normal; small sample= further from normal. df= indication as to how far the t-distribution is from normal

When the df are large, the t-distribution is nearly equivalent to the normal distribution.

3. The spread of the points around the line has the same standard deviation σ for all x.

4. The points around the line are normally distributed around the line -more values near the line and fewer as we move away from the line.

A LARGER CONFIDENCE PRODUCES A LARGER MOE (99 VS 95) (CHANGES Z-SCORE)

A LARGER SAMPLE SIZE WILL PRODUCE A SMALLER MOE AND DOES NOT INFLUENCE CONFIDENCE
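Both statements can be seen in the formula MOE = z* × SD / sqrt(n). The sketch below reuses the tax survey numbers from earlier in these notes (SD = $119, n = 1170) purely as an illustration; the function name is made up.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sd: float, n: int, confidence: float) -> float:
    """z-based margin of error for a mean: z* times SD / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical z-score
    return z_star * sd / sqrt(n)

moe_95 = margin_of_error(119, 1170, 0.95)
moe_99 = margin_of_error(119, 1170, 0.99)             # higher confidence, larger z*
moe_95_big_n = margin_of_error(119, 4 * 1170, 0.95)   # four times the sample size

print(f"95% MOE: {moe_95:.2f}")
print(f"99% MOE: {moe_99:.2f}")
print(f"95% MOE with 4x the sample: {moe_95_big_n:.2f}")
```

Raising the confidence level widens the interval, while quadrupling n cuts the margin of error in half.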

Quartiles: Q1= 25% of data below; Q3= 25% of data above

Boxplots can immediately tell us if a distribution is multimodal, FALSE WE CANNOT DO THIS

Other problems: -non-response or dropout -non-adherence (do not follow instructions)

Generalization/ realism: can we say that the results hold true for the entire population?

Summary:

Test about the slope depends on assumptions-if those assumptions are not reasonable, then the test will not necessarily give the correct p-value.

We are conducting a test of the hypotheses H0: p = 0.7 Ha: p ≠ 0.7 We find a test statistic of z = -2.08. What is the corresponding p-value? Give your answer as a proportion between 0 and 1 to 4 decimal places.

answer: 0.0378
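A sketch of how z-based p-values follow from the direction of Ha, using the standard normal distribution. The function name `p_value` and its `alternative` labels are illustrative choices, not course notation.

```python
from statistics import NormalDist

def p_value(z: float, alternative: str) -> float:
    """p-value for a z test statistic; alternative is 'greater', 'less' or 'two-sided'."""
    cdf = NormalDist().cdf
    if alternative == "greater":    # Ha: p > p0, area above z
        return 1 - cdf(z)
    if alternative == "less":       # Ha: p < p0, area below z
        return cdf(z)
    if alternative == "two-sided":  # Ha: p != p0, both tails
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

p_upper = p_value(2.31, "greater")     # H0: p = 0.6 vs Ha: p > 0.6
p_two = p_value(-2.08, "two-sided")    # H0: p = 0.7 vs Ha: p != 0.7

print(f"one-sided p-value for z = 2.31: {p_upper:.4f}")
print(f"two-sided p-value for z = -2.08: {p_two:.4f}")
```

The one-sided case reproduces 0.0104. The two-sided calculation comes out near 0.0375, close to but not exactly the recorded 0.0378; the small gap may come from rounding in the original answer.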

240 individuals are recruited in this trial, and the new treatment is effective on 60 of them. What is the p-value associated with the hypothesis test from question 4? Give your answer to 4 decimal places. For help on how to input a numeric answer, please see "Instructions for inputting a numeric response."

answer: 0.9545

normal distribution is good when we know σ

t-distribution is good when we know s.

Lurking variable

variables that may influence the response but that often are not studied explicitly.

SUMMARY OF BASIC STEPS: 1. establish a null hypothesis 2. find the test statistic 3. specify a null distribution 4. interpret p-value and make the conclusion

"p-naught"= a specific proportion of interest Ha: p > po Ha: p < po Ha: p ≠ po

A researcher believes that the ankle circumference for adult males in Europe can be considered to have a normal distribution with a mean of 24 cm. If his belief is correct which of the following ranges of ankle sizes will have the largest proportion of members of this population? Select one: 15 to 21 cm 21 to 27 cm 27 to 33 cm It is impossible to tell without the standard deviation.

D1. Explain that the normal distribution is a model for data that has a bell-shaped distribution. D2. List the key characteristics of the normal distribution. The correct answer is: 21 to 27 cm

Which of the following are NOT likely to be well modeled by a normal distribution because the distribution is NOT likely to be symmetric? (Hint: Sketch what you think the histogram would look like based on the information given.) Select one or more: a. The scores from a university's mathematics placement exam in which the minimum score is 0 and the maximum score is 100. Although there were scores throughout the entire range, more than half of the students scored over 85. This is likely to be skewed to the left. b. The number of hours students at a large university work per week at outside jobs. More than 75% of the students worked less than 10 hours but about 5% had jobs in which they worked over 30 hours outside the university. This is likely to be skewed to the right. c. The height (in centimeters) of 10 year old boys in the U.S. Rarely are values lower than 120 cm or over 160 cm and the majority are between 135 cm and 145 cm. d. The amounts of time people wait at a particular bus stop for the bus. About 60% of the time they wait less than 7 minutes, however, occasionally because of traffic issues they wait as long as 40 minutes. This is likely to be skewed to the right.

D1. Explain that the normal distribution is a model for data that has a bell-shaped distribution. D2. List the key characteristics of the normal distribution. The correct answers are: The scores from a university's mathematics placement exam in which the minimum score is 0 and the maximum score is 100. Although there were scores throughout the entire range, more than half of the students scored over 85., The number of hours students at a large university work per week at outside jobs. More than 75% of the students worked less than 10 hours but about 5% had jobs in which they worked over 30 hours outside the university., The amounts of time people wait at a particular bus stop for the bus. About 60% of the time they wait less than 7 minutes, however, occasionally because of traffic issues they wait as long as 40 minutes.

For a normal distribution, what standard score (Z-score) has 90% of the distribution above it? Find the closest value listed on the table. Give your answer to 2 decimal places.

D7. Find a specified percentile of the standard normal distribution (e.g. given a probability find the corresponding z-score). The correct answer is: -1.28

A state administered standardized reading exam is given to eighth grade students. The scores on this exam for all students statewide have a normal distribution with a mean of 517 and a standard deviation of 63. A local Junior High principal has decided to give an award to any student who scores in the top 10% of statewide scores. How high should a student score be to win this award? Give your answer to the nearest integer. For help on how to input a numeric answer, please see the instructions for entering numeric response. Answer:

D8. Given a mean μ and standard deviation σ, find a specified percentile of the normal distribution (e.g. given a probability find the corresponding value of x). The correct answer is: 597.96
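Objectives D7 and D8 go from a probability back to a value. A sketch using the inverse normal function instead of a printed table:

```python
from statistics import NormalDist

# D7: z-score with 90% of the distribution above it (10% below)
z_10 = NormalDist().inv_cdf(0.10)

# D8: reading score that marks the top 10% (90th percentile) statewide
award_cutoff = NormalDist(mu=517, sigma=63).inv_cdf(0.90)

print(f"z with 90% above: {z_10:.2f}")
print(f"score needed for the award: about {award_cutoff:.2f}")
```

The exact cutoff differs slightly from the recorded 597.96, which is consistent with using z = 1.285 (halfway between the table entries 1.28 and 1.29): 517 + 1.285 × 63 = 597.955. Either way the answer rounds to 598.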

volunteer response sample: a survey in which subjects respond to a general call. This type of survey typically gets more response from people who have strong opinions.

haphazard sample: a sample that is taken without any logical method. Not the same as random.

Null distribution: -we know that for a large sample the sample proportion follows approximately a normal distribution, centered at the true proportion under the null hypothesis.

p-value: proportion of the null distribution that is more extreme than the observed sample value, need to worry about being extreme in either direction.

Which of the following are true? Select one or more: a. The t-distribution is dependent on the sample size. b. The t-distribution does not depend on the sample size. c. The t-distribution has more values at the extremes than a standard normal distribution. d. The t-distribution has fewer values at the extremes than a standard normal distribution. e. The t-distribution is bell-shaped and centered at its degrees of freedom. f. The t-distribution is bell-shaped and centered at 0. g. With larger samples, the t-distribution is closer to a normal distribution. h. With smaller samples, the t-distribution is closer to a normal distribution.

-The t-distribution is dependent on the sample size. -The t-distribution has more values at the extremes than a standard normal distribution. -The t-distribution is bell-shaped and centered at 0. -With larger samples, the t-distribution is closer to a normal distribution.

A researcher conducted a survey among all the male employees in Research Square Park (just like Research Triangle Park). Among other questions, she asked a randomly selected set of male employees in the past three months (1) how much they have spent on skin-care products and (2) how much tennis they have watched on TV. After collecting the data, she performed a simple linear regression analysis. She was interested in the relationship between the time spent on watching tennis and the money spent on skin care. She treated the number of hours in the past three months watching tennis as the independent variable, and she treated the amount spent on skin-care products in the past three months as the dependent variable. She used StatCrunch to get the following output: Which of the following are correct interpretations and conclusions of the estimated slope? For simplicity, assume all answers are in the context of male employees in RSP over the past three months. Select one or more: a. There is a positive relationship between the time spent on watching tennis and the money spent on skin care. b. There is a negative relationship between the time spent on watching tennis and the money spent on skin care. c. Each additional hour spent watching tennis is associated with about $3.33 additional money spent on skin care on average. d. Each additional hour spent watching tennis is associated with about $56.85 additional money spent on skin care on average. e. For male RSP employees, watching an additional hour of tennis causes them to spend approximately an additional $3.33 on skin care on average.

-There is a positive relationship between the time spent on watching tennis and the money spent on skin care. -Each additional hour spent watching tennis is associated with about $3.33 additional money spent on skin care on average.

-For regression, it matters which variable is x and which is y; for correlation it does not. -Correlation only measures the strength and direction of a linear pattern. -Regression is used to predict y from x. -Don't predict beyond the range of the data.

-Association between variables does not mean that one variable causes the changes in another. -There could be lurking variables or things BHTS. -A well-designed experiment is needed to establish causation.

Assumptions: 1. We used randomization in collecting the data -random sample from a population -random assignment in an experiment

2. A straight line is the correct model for the data -not appropriate if we have a curved relationship or outliers.

Which of the following are true statements about hypothesis tests? Select all that apply. Select one or more: A hypothesis test for the mean can be run for a simple random sample of 20 observations from a non-normal population. A hypothesis test for the mean can be run for a simple random sample of 45 observations from a non-normal population. A hypothesis test for the mean can be run for a convenience sample of 20 observations from a normal population. A hypothesis test for the mean can be run for a convenience sample of 45 observations from a non-normal population. A hypothesis test for the mean can be run for a simple random sample of 20 observations from a normal population.

CA'S: -A hypothesis test for the mean can be run for a simple random sample of 45 observations from a non-normal population. -A hypothesis test for the mean can be run for a simple random sample of 20 observations from a normal population.

Existing research states that the proportion of households in a city owning a computer is 30%. However, a local politician believes that that number is wrong. He randomly selects 200 families and finds that 68 of them have computers. Please conduct a formal hypothesis test to verify if the politician's claim is credible. Select one: a. H0: p = 0.3, HA: p ≠ 0.3, z = 1.23, p-value = 0.2187, so we conclude there is insufficient evidence for the politician's claim and fail to reject the null hypothesis. b. H0: p = 0.3, HA: p > 0.3, z = 1.23, p-value = 0.1093, so we conclude there is insufficient evidence for the politician's claim and reject the null hypothesis. c. H0: p = 0.3, HA: p < 0.3, z = 1.23, p-value = 0.8907, so we conclude that there is evidence for the politician's claim and reject the null hypothesis. d. H0: p = 0.3, HA: p ≠ 0.3, z = 1.23, p-value = 0.2187, so we conclude that there is evidence for the politician's claim and reject the null hypothesis.

CA'S: -H0: p = 0.3, HA: p ≠ 0.3, z = 1.23, p-value = 0.2187, so we conclude there is insufficient evidence for the politician's claim and fail to reject the null hypothesis.
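The numbers in this answer can be checked with a short calculation. The sketch below is a minimal one-proportion z-test using only the Python standard library; the function name `one_proportion_z_test` is made up for illustration and is not part of the quiz or of StatCrunch.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Return (z, two-sided p-value) for H0: p = p0 vs HA: p != p0."""
    p_hat = successes / n
    # The standard error uses the null value p0, not the sample proportion.
    se_null = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se_null
    # Two-sided: double the area beyond |z| under the standard normal curve.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_proportion_z_test(successes=68, n=200, p0=0.30)
print(round(z, 2), round(p_value, 2))
```

The z statistic rounds to 1.23 and the p-value is close to 0.22, far above 0.05, so we fail to reject H0. The 0.2187 in the answer comes from rounding z to 1.23 before looking up the table, so the unrounded calculation differs slightly in the third decimal place.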

This problem uses StatCrunch. A local greenhouse sells coffee-tree saplings. They price their saplings based on the height of the plant. They have two workers, Susan and Karen, who measure the saplings for pricing. The greenhouse manager wants to determine if there is a significant difference in the measurements made by these two individuals. She has them measure the same set of 15 saplings. Assume that the differences are calculated as Susan - Karen. Use this information to answer questions 5, 6, and 7. Select all of the correct null and alternative hypotheses. Select one or more: H0: μd=0 H0: μd=15 H0: μd≠15 H0: μd≠0 HA: μd≠0 HA: μd≠15 HA: μd<15 HA: μd>15

CA'S: -H0: μd=0 -HA: μd≠0

Use the following information to answer questions 4 through 6. A clinical trial is conducted to test a new eye treatment, interferon-a, on its efficacy to slow down age-related vision loss. The currently used medicine is found to be effective on 30 percent of all the patients. Scientists believe that interferon-a could have better performance than that. In other words, scientists believe interferon-a will be effective on a larger percentage of patients than the current treatment, and they are able to differentiate between a medicine being "effective" and "not effective." Select both a null and alternative hypothesis that represent the research question described above. Be careful of notation. Select one or more: HA: p=0.3 HA: p≥0.3 HA: p>0.3 H0: p=0.3 HA: p≠0.3 H0: p≤0.3 HA: p≤0.3 H0: p>0.3

CA'S: -HA: p>0.3 -H0: p=0.3

A scientist wants to test how therapies A and B improve the eyesight of rats. She has 20 rats to use, but their vision differs, which might affect how much their vision improves. The scientist decides to group the four rats with the worst vision in one group, the next four rats with the second-worst vision in another group, etc., until the final group has the four rats with the best vision. Then, from each group she randomly selects two rats to receive therapy A, and the remaining two rats in each of the groups get therapy B. Select one or more: The experimental units are therapies A and B. The treatment is the eyesight improved. This is an example of a matched pairs experiment. The experimental units are the 20 rats. This is an example of a block design. This is an example of a completely randomized experiment. The treatments are therapies A and B.

CA'S: -The experimental units are the 20 rats. -This is an example of a block design. -The treatments are therapies A and B.

WEEK 7 QUIZ MATERIAL: A crop scientist is conducting research with a drought-resistant corn hybrid. She is interested in determining if using fertilizer X will increase yield. She prepares 28 single acre plots and randomly assigns 14 to have normal soil while the other 14 are planted with fertilizer X. The resulting average yield for each group of 14 plots was recorded. Select one or more: The explanatory variable is the average yield for each group of 14 plots. The explanatory variable is whether the corn plants had fertilizer X or not. The response variable is whether the corn plants had fertilizer X or not. This is best described as an observational study. This study is best described as an experiment. The response variable is the average yield for each group of 14 plots.

CA'S: -The explanatory variable is whether the corn plants had fertilizer X or not. -This study is best described as an experiment. -The response variable is the average yield for each group of 14 plots.

Which of the following are true statements about the p-value? Select all that apply. Select one or more: The p-value is the probability the null hypothesis is correct. The p-value is the probability the alternative hypothesis is correct. The p-value is one minus the probability the alternative hypothesis is correct. If the p-value is large it indicates we did not calculate the test statistic correctly. The p-value is calculated assuming the null hypothesis is true. The p-value is calculated assuming the alternative hypothesis is true. If the p-value is small it indicates the data is unlikely under the null hypothesis.

CA'S: -The p-value is calculated assuming the null hypothesis is true. -If the p-value is small it indicates the data is unlikely under the null hypothesis.

A large-scale study is carried out to investigate whether working out lowers the rate of getting diabetes. In other words, the investigators wish to determine if exercising causes a decreased risk in getting diabetes. A sample of 536 female patients with diabetes and a sample of 445 females without diabetes were surveyed about their past workout history and if they have had diabetes. What is wrong with the study? Select one: Based on the information provided, there is nothing wrong with the study. The experiment is too expensive. The result of this study cannot be used to answer the question of interest. We do not have equal number of patients in each sample.

CA'S: -The result of this study cannot be used to answer the question of interest.

An employer compared the average salaries of their employees over the past two years. They found that the average salary had increased by $3,000 from $40,000 to $43,000, which corresponded to a p-value of 0.21. What should we conclude about their findings? Select one: The results were statistically significant but not practically significant. The results were practically significant but not statistically significant. The results were both statistically significant and practically significant. The results were neither statistically significant nor practically significant.

CA'S: -The results were practically significant but not statistically significant.

Our friend the waffle-man is back and wants to do more hypothesis tests for proportions, but this time for four waffle recipes. He randomly selected 250 waffle consumers and found that 100 (40%) of the 250 preferred Waffle No. 2. He conducted a hypothesis test with H0:p=0.25, Ha:p>0.25. Notice the proportion under the null distribution is p_0 = 0.25. The test statistic for this problem is 5.47. You can verify for yourself that the probability of observing this test statistic is nearly zero assuming the null hypothesis is true. Now suppose we wish to conduct the same hypothesis test again if the true proportion is 0.35. In other words, we happen to know the true parameter value is 0.35, something that is typically not known. How does the test statistic change with this new information? What is the resulting p-value? Hint: Try writing what the null hypothesis and test statistic would be given this new information. What would change, if anything? Select one: a. The test statistic becomes 1.61 with a p-value of 0.537. b. The test statistic becomes 1.66 with a p-value of 0.095. c. The test statistic becomes -3.23 with a p-value of 0.9995. d. The test statistic does not change from its original value of 5.47, and the associated p-value does not change. e. None of the other answers are correct.

CA'S: -The test statistic does not change from its original value of 5.47, and the associated p-value does not change.
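One way to see why nothing changes is to write out the test statistic and look at its inputs. This is a minimal sketch; the helper name `z_statistic` is made up for illustration.

```python
from math import sqrt

def z_statistic(p_hat, p0, n):
    """z for a one-proportion test: only the sample and the null value enter."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Inputs from the question: 100 of 250 preferred Waffle No. 2, H0: p = 0.25.
z = z_statistic(p_hat=100 / 250, p0=0.25, n=250)
print(round(z, 2))
```

The true proportion 0.35 never appears as an argument, so knowing it cannot change the statistic or the p-value. The result is about 5.48, which the question reports as 5.47.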

Based on the p-value that you calculated in Question 5, what conclusion can be made about your hypothesis? Select one: There is enough evidence to suggest that the new treatment is effective on more than 30% of patients. There is enough evidence to suggest that the new treatment is effective on less than 30% of the patients. There is enough evidence to suggest that the new treatment is effective on exactly 30% of the patients. There is not enough evidence to suggest that the new treatment is effective on more than 30% of the patients. There is not enough evidence to suggest that the new treatment is effective on less than 30% of the patients. There is not enough evidence to suggest that the new treatment is effective on exactly 30% of the patients. We cannot make any conclusion based on the p-value in Question 5.

CA'S: -There is not enough evidence to suggest that the new treatment is effective on more than 30% of the patients.

The November 17, 1994 issue of The New England Medical Journal reported on a study of the effects of hormone therapy on middle-aged women. About 750 women took part in the study; half were selected randomly to receive the hormone therapy and the other half were given a placebo (they did not know which). After about a year, blood tests were conducted on each subject by a lab technician who was unaware of which group (treatment or placebo) the blood samples originated from. In presenting the results of the experiment, the authors reported that the women in the treatment group had experienced a statistically significant increase in HDL (the so-called "good" cholesterol) and a statistically significant reduction in LDL (the so-called "bad" cholesterol) when compared with the control group. Select one or more: This is an example of a block design. This is an example of a completely randomized design. This is an example of a matched pairs design. The treatments were the hormone therapy and the placebo. The treatments were the HDL and the LDL. The subjects are the 750 women in the study. The subjects were the blood tests. This study would be classified as un-blinded. This study would be classified as single-blinded. This study would be classified as double-blind.

CA'S: -This is an example of a completely randomized design. -The subjects are the 750 women in the study. -This study would be classified as double-blind.

Does daily caffeine intake make subjects better at memory tasks? A study was conducted in which the daily habits of 40 college students were documented, focusing on how much caffeine they consumed in a week. After a week, the subjects were given a computer-based memory test on which they received a score on a 0 to 100 point scale. Their scores were compared with how much caffeine they consumed that week. Select one or more: This is best described as an observational study. The explanatory variable is the amount of caffeine consumed. This study is best described as an experiment. The explanatory variable is the score on the computer-based memory test. The response variable is the score on the computer-based memory test. The response variable is the amount of caffeine consumed.

CA'S: -This is best described as an observational study. -The explanatory variable is the amount of caffeine consumed. -The response variable is the score on the computer-based memory test.

From this hypothesis test, choose the correct conclusions. Select one or more: We reject the null hypothesis. We fail to reject the null hypothesis. We conclude that there is a significant statistical difference between the two employees' measurements. We conclude that there is not a significant statistical difference between the two employees' measurements. On average in the sample, Susan has larger values than Karen. On average in the sample, Karen has larger values than Susan. We cannot tell whether Susan or Karen has larger values on average.

CA'S: -We fail to reject the null hypothesis. -We conclude that there is not a significant statistical difference between the two employees' measurements. -On average in the sample, Karen has larger values than Susan.

An official with campus dining services claimed that a majority of the students at her university had a campus meal plan. A reporter for the campus newspaper was skeptical of this and decided to investigate the situation. She took a random sample of 75 students and asked them if they had a meal plan. She then used StatCrunch software to construct the following output. What conclusion should we make from this output?

Hypothesis test results:
Outcomes in : meal_plan
Success : Yes
p : Proportion of successes
H0 : p = 0.5
HA : p < 0.5

Variable    Count  Total  Sample Prop.  Std. Err.  Z-Stat  P-value
meal_plan   33     75     0.44          0.0577     -1.039  0.1493

Select one: a. We should reject the null hypothesis and conclude there is evidence that less than half the students have a meal plan. b. We should NOT reject the null hypothesis and conclude there is evidence that less than half the students have a meal plan. c. We should NOT reject the null hypothesis and there is insufficient evidence to conclude that less than half of the students have a meal plan. d. We should reject the null hypothesis and there is insufficient evidence to conclude that less than half of the students have a meal plan.

CA'S: -We should NOT reject the null hypothesis and there is insufficient evidence to conclude that less than half of the students have a meal plan.
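The StatCrunch columns for the meal-plan test can be reproduced by hand. The sketch below assumes the usual normal approximation for a left-tailed one-proportion test and uses only the standard library.

```python
from math import sqrt
from statistics import NormalDist

count, total, p0 = 33, 75, 0.5

p_hat = count / total                 # Sample Prop.
std_err = sqrt(p0 * (1 - p0) / total) # Std. Err. under H0
z = (p_hat - p0) / std_err            # Z-Stat
p_value = NormalDist().cdf(z)         # left tail, because HA: p < 0.5

print(round(p_hat, 2), round(std_err, 4), round(z, 3), round(p_value, 4))
```

The p-value of about 0.149 is larger than 0.05, which is why the conclusion is to not reject H0.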

A successful waffle-man has recently developed a new recipe for waffles. To test the popularity of this new waffle compared to two other tried-and-true types of waffles, our friend the waffle-man randomly selected 180 lucky customers to vote on which of the three waffle types they liked best. Exactly 35% of these customers (or 63 in total) voted in favor of the new waffle. If all waffles were equally tasty, then the waffle-man knows to expect that each waffle would receive around 1/3 of the votes (so around 60 votes per waffle). Are 63 votes for the new waffle enough to conclude that significantly more customers like it compared to the others? Luckily, our friend the waffle-man triple-majored in waffles, statistics, and clinical neurophysiology and knows how to objectively answer this question. He conducts a hypothesis test for proportions, H0:p=1/3, Ha:p>1/3 with a sample proportion of 63/180. In carrying out this test, what null distribution for p̂ should he use? In other words, what is the distribution of the sample statistic assuming the null hypothesis is true? (Be sure to use at least four decimal places in your calculations.) Select one: a. A normal distribution centered at 1/3 with a standard deviation of about 0.0351. b. A normal distribution centered at 1/3 with standard deviation of about 0.0356 c. A normal distribution centered at 0.35 with standard deviation of about 0.0356. d. A normal distribution centered at 0.35 with standard deviation of about 0.0351. e. We cannot use a null distribution in this problem because the population is not normally distributed.

CA'S: -a. A normal distribution centered at 1/3 with a standard deviation of about 0.0351.
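The two candidate standard deviations in the answer choices differ only in which proportion is plugged in. A short check, using only the standard library:

```python
from math import sqrt

n = 180
p0 = 1 / 3        # value of p under the null hypothesis
p_hat = 63 / n    # observed sample proportion, 0.35

# Null distribution: centered at p0, spread computed from p0.
sd_null = sqrt(p0 * (1 - p0) / n)
# Distractor: the same formula with the sample proportion instead.
sd_from_sample = sqrt(p_hat * (1 - p_hat) / n)

print(round(sd_null, 4), round(sd_from_sample, 4))
```

The null distribution uses 1/3 for both the center and the standard deviation, giving about 0.0351. The 0.0356 in the other choices comes from using 0.35.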

A sample of 108 college students at NC State University were randomly selected for a survey. The survey participants reported sleeping 6.8 hours a night on average with a sample standard deviation of 1.1 hours. Which of the following are true? Select one or more: a. The margin of error at 99% confidence is larger than that at 95% confidence. b. The margin of error at 99% confidence is smaller than that at 95% confidence. c. The margin of error is larger with a larger sample size. d. The margin of error is larger with a smaller sample size. e. The margin of error would be larger if the sample mean were 7.1 hours instead of 6.8 hours. f. The margin of error would be smaller if the sample mean were 7.1 hours instead of 6.8 hours. g. The margin of error would be larger if the sample standard deviation were larger. h. The margin of error would be larger if the sample standard deviation were smaller.

CA'S: -a. The margin of error at 99% confidence is larger than that at 95% confidence. -d. The margin of error is larger with a smaller sample size. -g. The margin of error would be larger if the sample standard deviation were larger.
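These three facts follow from the margin-of-error formula, critical value times s / sqrt(n). The sketch below is an approximation: it uses a normal critical value in place of the t critical value (an assumption that is reasonable here because n = 108 is large), and the function name `margin_of_error` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(s, n, confidence):
    """Approximate margin of error for a mean, using z* instead of t*."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * s / sqrt(n)

me_95 = margin_of_error(s=1.1, n=108, confidence=0.95)
me_99 = margin_of_error(s=1.1, n=108, confidence=0.99)       # higher confidence
me_small_n = margin_of_error(s=1.1, n=50, confidence=0.95)   # smaller sample
me_big_s = margin_of_error(s=1.5, n=108, confidence=0.95)    # larger std. dev.

print(me_95 < me_99, me_95 < me_small_n, me_95 < me_big_s)
```

The sample mean (6.8 or 7.1 hours) is not an input to the function, which is why choices e and f are both false.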

An electronic device factory is studying the length of life of the electronic components they produce. The manager takes a random sample of 50 electronic components from the assembly line and records the length of life in the life test. From the sample he found the average length of life was 100,000 hours and that the standard deviation was 3,000 hours. He wants to find the confidence interval for the average length of life of the electronic components they produced. Based on the information, what advice would you give to him? Select one or more: a. The distribution of the length of life of the electronic components is usually right skewed. Thus, he should not compute the confidence interval. b. He did not take a simple random sample of the electronic components; thus he should not compute the confidence interval c. The mean and standard deviation are large enough to compute the confidence interval. d. The sample size is large enough to use a normal approximation. Thus he can compute the confidence interval. e. He can calculate the confidence interval but should use a t-distribution because the population standard deviation is unknown.

CA'S: -d. The sample size is large enough to use a normal approximation. Thus he can compute the confidence interval. -e. He can calculate the confidence interval but should use a t-distribution because the population standard deviation is unknown.

A researcher conducted a survey among all the male employees in Research Square Park (just like Research Triangle Park). Among other questions, she asked a randomly selected set of male employees in the past three months (1) how much they have spent on skin-care products and (2) how much tennis they have watched on TV. After collecting the data, she performed a simple linear regression analysis. She was interested in the relationship between the time spent on watching tennis and the money spent on skincare. She treated the number of hours in the past three months watching tennis as the independent variable, and she treated the amount spent on skin-care products in the past three months as the dependent variable. She used StatCrunch to get the following output: Austin, a male employee in RSP, spent 10 hours watching tennis and $200 on skin-care products during the past three months. What is the corresponding residual value? Bob, a male employee in RSP, spent 15 hours watching tennis and $30 on skin-care products during the past three months. What is the corresponding residual value? Whose response(s) lie(s) above the regression line?

CA: -Austin's residual: 109.89 -Bob's residual: -76.74 -Austin's response lies above the regression line (positive residual).
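A residual is the observed value minus the value predicted by the regression line. The StatCrunch output is not reproduced in these notes, so the intercept (56.85) and slope (3.326) in the sketch below are assumptions, back-solved from the two residuals listed in the answer; the rounded slope matches the $3.33 quoted in the earlier slope question.

```python
# Assumed coefficients (not shown in the notes; back-solved from the answers).
INTERCEPT = 56.85   # dollars spent when hours = 0
SLOPE = 3.326       # dollars per additional hour of tennis watched

def residual(hours, dollars_spent):
    """Observed minus predicted spending for one employee."""
    predicted = INTERCEPT + SLOPE * hours
    return dollars_spent - predicted

austin = residual(hours=10, dollars_spent=200)
bob = residual(hours=15, dollars_spent=30)
print(round(austin, 2), round(bob, 2))
```

A positive residual means the point lies above the line, and a negative residual means it lies below, so only Austin is above the regression line.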

Which of the following can he conclude based on these histograms? Select one or more: a. Histogram X has a larger standard deviation than Histogram Y. b. Histogram X has the largest standard deviation among all 3 histograms. c. Histogram Z has a larger standard deviation than Histogram X. d. Histogram Y has the smallest standard deviation of all the histograms. e. Histogram X most closely resembles a normal distribution with a mean of 500 and a standard deviation of 100. f. Histogram Y most closely resembles a normal distribution with a mean of 500 and a standard deviation of 100. g. Histogram Z most closely resembles a normal distribution with a mean of 14 and a standard deviation of 5. h. Histogram X most closely resembles a normal distribution with a mean of 500 and a standard deviation of 200.

CA: A, B, & E

Which of the following are true about the correlation coefficient r? Select all that apply. Select one or more: The correlation coefficient is always greater than 0. The correlation coefficient is always between -1 and +1. The correlation coefficient will change if we change the units of measure. If the correlation coefficient is positive, the slope of the regression line will also be positive. If the correlation coefficient is +1, then the slope of the regression line is also +1. If the correlation coefficient is close to 0, that means there is a strong linear relationship between the two variables.

CA: -The correlation coefficient is always between -1 and +1. -If the correlation coefficient is positive, the slope of the regression line will also be positive.

A researcher has run an experiment and has properly calculated a confidence interval for a population mean parameter µ. Her 95% confidence interval is (0.351, 0.412). What is the probability that the true, unknown parameter µ is in her 95% confidence interval? Select one: a. 5% b. 95% c. Either 0% or 100%, but we don't know which. d. This isn't appropriate because confidence intervals are for sample statistics, not parameters. e. This isn't appropriate because confidence intervals are for population proportions, not population means.

CA: -c. Either 0% or 100%, but we don't know which.

Which of the following statements about a confidence interval created for the mean weight of all monsters is true? Select one: a. Choosing a 20% confidence level will mean the range for the confidence interval will be too large to be useful for the airline. b. Choosing a 100% confidence level will mean our confidence interval will be narrower, resulting in too low an accuracy for the mean monster weight to be useful for the airline. c. A 20% confidence level will result in a confidence interval that is a balance between a precise estimate for the mean weight of all monsters and a reliable interval of values. d. None of these choices is true.

CA: -d. None of these choices is true.

A public health researcher wishes to study the dietary behavior of residents in Durham County. The researcher randomly contacts 35 county residents and collects data on their daily sugar intake and obtained a sample average of 37.4 grams sugar per day and a sample standard deviation of 4.2 grams per day. The researcher would like to construct a 99% confidence interval for the mean daily sugar intake of residents in the county using the data. Which distribution should the researcher use when analyzing these data? Select one: a. The researcher should use a normal distribution because the data come from a simple random sample and the sample is large. b. The researcher should use a normal distribution because the true population average daily sugar intake is likely not near zero. c. The researcher should use a normal distribution because they do not know the true standard deviation of daily sugar intake in Durham county. d. The researcher should use a t-distribution because the sample size is too small to accurately calculate a 99% confidence interval using a normal distribution. e. The researcher should use a t-distribution because they do not know the true standard deviation of daily sugar intake in Durham county. f. The researcher cannot determine whether a t-distribution or a normal distribution would be more appropriate without looking at the shape of the population distribution.

CA: -e. The researcher should use a t-distribution because they do not know the true standard deviation of daily sugar intake in Durham county.

To study the relationship between the elevation and average annual precipitation in a nature preserve, rain gauges were constructed on different locations with elevation ranging from 0 to 5593 feet. After 30 years, the average annual precipitation for each location was calculated. Statistical software was used to conduct a simple linear regression about the relationship between the average annual precipitation (in inches) and elevation (in 100 feet). The following equation for the regression line was given: Precipitation = 25.74 + 0.8864 * Elevation What is the interpretation of the estimated slope? What is the interpretation of the estimated intercept?

CA: -Each additional 100 feet of elevation is associated with about 0.8864 inches of additional average annual precipitation, on average. -If the elevation is 0 feet, the average annual precipitation is expected to be 25.74 inches.
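The fitted line can be used directly for prediction, as long as the elevation stays inside the range that was observed. A minimal sketch, where the function name `predict_precipitation` and the 1,000-foot example are made up for illustration:

```python
MIN_FEET, MAX_FEET = 0, 5593   # range of elevations in the study

def predict_precipitation(elevation_feet):
    """Predicted average annual precipitation (inches) from the fitted line."""
    if not MIN_FEET <= elevation_feet <= MAX_FEET:
        # Predicting outside the observed range would be extrapolation.
        raise ValueError("elevation is outside the range of the data")
    elevation_hundreds = elevation_feet / 100   # the model's x is in 100 feet
    return 25.74 + 0.8864 * elevation_hundreds

at_sea_level = predict_precipitation(0)
at_1000_feet = predict_precipitation(1000)
print(at_sea_level, round(at_1000_feet, 3))
```

At 0 feet the prediction is the intercept, 25.74 inches. Going up 1,000 feet is ten units of x, so the prediction rises by 10 × 0.8864 inches.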

A psychologist has designed an index to measure the social perceptiveness of elementary school children. The index is based on ratings of a child's responses to questions about a set of photographs showing different social situations. A random sample of 16 elementary school children was chosen, and their index measurements were recorded. Assume that the index measure in the population is normally distributed. The 90% confidence interval created from this data is (57.07, 64.31). This interval indicates: Select one or more: a. 3.62 is 90% of the true average of the index for all elementary school children. b. The average index of elementary school children must be 60.69. c. 90% of all elementary school children in this district have indices between 57.07 and 64.31. d. If we take many samples from this population, 90% of them will have a sample mean between 57.07 and 64.31. e. The standard deviation of the sample is about 10% smaller than the population standard deviation. f. We are 90% confident that the average index for all elementary school children is between 57.07 and 64.31.

CA: -f. We are 90% confident that the average index for all elementary school children is between 57.07 and 64.31.

The daily revenues of a cafe near the university are approximately normally distributed. The owner recently collected a random sample of 40 daily revenues and found a 90% confidence interval for the average daily revenues in his shop is (973.993, 1026.007). He is unsatisfied by the precision of this confidence interval, however, and wishes to reduce the margin of error by a factor of 2, while retaining the same level of confidence. What sample size do you suggest he use to obtain the desired margin of error? Assume the sample standard deviation remains the same as the sample size changes. Select one: a. 10 b. 20 c. 80 d. 160

CA: d. 160
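The answer follows from the margin of error being proportional to 1 / sqrt(n): to shrink it by a factor of k, the sample size must grow by a factor of k squared. A minimal sketch, assuming (as the question does) that the standard deviation and critical value stay the same; the function name is made up for illustration.

```python
from math import sqrt

def required_sample_size(n_current, shrink_factor):
    """Sample size needed to divide the margin of error by shrink_factor."""
    return n_current * shrink_factor ** 2

lower, upper = 973.993, 1026.007
me_current = (upper - lower) / 2           # half the interval width
n_new = required_sample_size(n_current=40, shrink_factor=2)
me_new = me_current * sqrt(40 / n_new)     # margin of error scales as 1/sqrt(n)

print(n_new, round(me_current, 3), round(me_new, 4))
```

Doubling the sample size to 80 would only shrink the margin of error by a factor of sqrt(2), which is why choice c is not enough.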

The resulting measurements (in cm) have been saved in StatCrunch. Use the data to compute the test statistic for the difference between Susan and Karen. Give your answer to four decimal places.

CA: -1.9391
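The sapling data set itself is not included in these notes, so the value above cannot be recomputed here. The sketch below only shows how a paired-difference t statistic is calculated; the four pairs of measurements are hypothetical placeholders, not the quiz data, and the function name is made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for H0: mu_d = 0, with d = first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # t = mean difference divided by its standard error s_d / sqrt(n).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical measurements in cm (NOT the StatCrunch data set).
susan = [30.1, 28.4, 31.0, 29.5]
karen = [30.4, 28.9, 30.8, 30.1]
t_stat, df = paired_t_statistic(susan, karen)
print(t_stat < 0, df)
```

With differences taken as Susan - Karen, a negative t statistic means Karen's measurements are larger on average in the sample.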

A dietitian wants to know the average time spent on breakfast in a primary school. The dietitian randomly samples 25 students and finds that the average is 12.6 minutes with a standard deviation of 1.53 minutes. Assume that the distribution of the time spent on the breakfast is normally distributed. The dietitian finds a 95% confidence interval for this sample is (11.968, 13.232). Select one or more: a. The margin of error is 0.632. b. The margin of error is 1.264. c. The margin of error is 0.600. d. We believe that the true mean time spent on breakfast is between 11.968 and 13.232 minutes. e. If we take many other samples from this population, 95% of them will have a sample mean that is between 11.968 and 13.232. f. There is a 95% chance that the true mean is between 11.968 and 13.232 minutes.

CA: A & D
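The margin of error is half the width of the interval, and it can be cross-checked against t* × s / sqrt(n). In the sketch below, the critical value 2.064 is the standard t-table entry for 95% confidence with 24 degrees of freedom; it is hard-coded as an assumption because the Python standard library has no t-distribution.

```python
from math import sqrt

lower, upper = 11.968, 13.232
n, s = 25, 1.53
T_STAR_95_DF24 = 2.064   # t-table value for 95% confidence, df = n - 1 = 24

me_from_interval = (upper - lower) / 2
me_from_formula = T_STAR_95_DF24 * s / sqrt(n)

print(round(me_from_interval, 3), round(me_from_formula, 3))
```

Both routes give about 0.632. The 1.264 in choice b is the full width of the interval, not the margin of error.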

In a medical study, a group of patients are identified who are free of a particular disease. The researchers collect data on their alcohol consumption. These individuals are then followed over some specified time period to determine whether they get this disease or not. The relationship between the probability of getting the disease during this time period and alcohol consumption is then analyzed. Select one or more: a. This is best described as an observational study. b. The explanatory variable is whether or not the patients get disease. c. The response variable is whether or not the patients get disease. d. This study is best described as an experiment. e. The explanatory variable is the patients' alcohol consumption. f. The response variable is the patients' alcohol consumption.

CA: A, C, & E

In finance, the Capital Asset Pricing Model (CAPM) is a model used to determine a theoretically expected rate of return of an asset, particularly stocks, by describing the relationship between systematic risk and the expected return for those assets. To study the model, we perform a simple linear regression analysis on Google's stock price from 2005 to 2018 [aka Alphabet Inc]. The dependent variable is the monthly excess return of the Google stock, while the independent variable is the monthly excess return on the stock market. [As an aside, the risk-free rate (One Month Treasury Bill Rate) is subtracted from the stock return to get the excess return, so the excess return also measures the risk premium.] The output is as follows. Select all that apply. Here we assume the significance level is 0.05. Select one or more: a. The correlation between the monthly excess return of Google and the monthly excess return on the market is statistically different from zero. b. The correlation between the monthly excess return of Google and the monthly excess return on the market is not statistically different from zero. c. The correlation between the monthly excess return of Google and the monthly excess return on the market is positive. d. The correlation between the monthly excess return of Google and the monthly excess return on the market is negative. e. 26.2% of the variation in the monthly excess return of Google is explained by the linear relationship with monthly excess return on the market. f. It is impossible to find the correlation from the given output.

CA: A, C, & E
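Choice f is false because, in simple linear regression, the correlation can be recovered from R-squared and the sign of the slope. The regression output is not shown in these notes, so the sketch below takes R-squared = 0.262 from choice e and assumes a positive slope, consistent with choice c being marked correct.

```python
from math import sqrt, copysign

r_squared = 0.262      # from choice e: 26.2% of variation explained
slope_sign = +1        # assumption: the fitted slope is positive (choice c)

# In simple linear regression, r has the same sign as the slope
# and its magnitude is the square root of R-squared.
r = copysign(sqrt(r_squared), slope_sign)
print(round(r, 4))
```

This gives a moderate positive correlation of about 0.51.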

Which of the following best describes the scatter plot and its simple linear regression fit? Select one: a. The assumptions for simple linear regression inference are met, and there's a significant correlation between x and y. b. The assumptions for simple linear regression inference are NOT met because the standard deviation for y is not constant. c. The assumptions for simple linear regression inference are NOT met because the overall pattern of the scatter plot isn't a straight line. d. The assumptions for simple linear regression inference are NOT met because y doesn't follow a normal distribution around the regression line for each x. e. The assumptions for simple linear regression inference are NOT met because there are outliers in the data.

CA: C

A fashion magazine provides a yearly subscription service. According to historical data, about 72% of the customers renew their subscriptions each year. The manager of the magazine office believes the percentage would be higher if they change their service to be a monthly subscription. After 12 months, the manager randomly selected 500 customers and found 387 of them renewed the subscription. She conducts a hypothesis test where H0:p=0.72, and Ha:p>0.72 at the α=.05 level. She finds the p-value is 0.0036. Select one or more: a. 0.0036 is the probability of having 387 or fewer customers renew their subscription in a sample of 500 customers given that the true proportion of renewing subscription is 0.72. b. 0.0036 is the probability of having exactly 387 customers renew their subscription in a sample of 500 customers given that the true proportion of those who renew their subscription is 0.72. c. 0.0036 is the probability of having 387 or more customers renew their subscription in a sample of 500 customers given that the true proportion of those who renew their subscription is 0.72. d. Because the p-value 0.0036 is not higher than 0.05, it is not statistically significant at that level, so we fail to reject the null hypothesis. e. Because the p-value 0.0036 is less than 0.05, it is statistically significant at that level, and so we should reject the null hypothesis in favor of the alternative hypothesis. f. 0.0036 is the probability that the true proportion of customers who renew their subscription is 0.72.

CA: C & E
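
The p-value in this card can be reproduced with a one-sided z-test for a proportion. The sketch below assumes the usual normal approximation and uses only the numbers given in the question; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the question: H0: p = 0.72, Ha: p > 0.72
p0 = 0.72
n = 500
renewed = 387

p_hat = renewed / n                  # sample proportion, 0.774
se_null = sqrt(p0 * (1 - p0) / n)    # SD of p-hat assuming H0 is true
z = (p_hat - p0) / se_null           # test statistic
p_value = 1 - NormalDist().cdf(z)    # upper tail: P(387 or more renew | p = 0.72)

print(round(z, 2), round(p_value, 4))
```

Because the alternative is p > 0.72, only the upper tail is counted, which is why option c (387 or more) is the correct reading of the p-value.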

Suppose we take a random sample of 50 monsters to create a 95% confidence interval for the population mean weight of all monsters. How likely is it that the sample mean weight of the random sample differs from the population mean weight of all monsters? Select one: a. All but guaranteed; essentially a 100% chance. b. There is a 95% chance. c. There is a 5% chance. d. Almost certainly no chance; essentially a 0% chance.

CA: a. All but guaranteed; essentially a 100% chance.

Quiz 12: An ambitious statistical-education researcher wants to analyze the typical statistical skills of STEM (science, technology, engineering, and math) majors across the country. They have an exam they plan on giving to sampled college students who are majoring in a STEM field, but they need to choose how to sample college students. Which of the following are considered good sampling methods? Select one or more: a. The researcher can go to their local university and get all the STEM majors there to take the exam. b. The researcher can randomly pick 10 universities and get all the STEM majors at those randomly chosen universities to take the exam. c. Assuming the researcher can get such a list, they can sort STEM majors into those attending private or public universities. They can then pick 30 randomly chosen students from those attending private universities and 30 from public universities. d. Assuming the researcher can get such a list, they can sort STEM majors into those whose last names start with an A-K and those whose last names start with an L-Z. They can then randomly pick 30 students from each list. e. Assuming the researcher can get such a list, they can randomly pick 60 STEM majors from across the country.

CA: b, c, and one more (not sure which one)

A biologist is interested in how the circumference of a tree is associated with its height. Statistics software was used to conduct a simple linear regression for the relationship between the circumferences and the heights of trees in Raleigh, NC. Both heights and circumferences were measured in feet. The following equation for the regression line was given: HEIGHT = 2.18 + 13.634 * CIRCUMFERENCE If the circumference of a tree is 0.5278 feet, what is the predicted height of this tree? Give your answer to three decimal places.

CA: 9.376 (2.18 + 13.634 * 0.5278)

QUIZ 3: From the records of a health-insurance company in Pennsylvania, it is known that 58% of the accounts include dental coverage. A researcher would like to take a random sample of 500 accounts to review. Find the standard deviation of the sample proportion in this situation. Give your answer to 4 decimal places.

E3. Given a population proportion p and sample size n, calculate the standard deviation of the sample proportion p-hat using the appropriate formula. The correct answer is: 0.0221
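
The formula behind objective E3 is SD(p-hat) = sqrt(p(1 - p)/n). A minimal sketch, with the helper name `sd_p_hat` chosen here for illustration, checks it against the dental-coverage numbers and the NC State class-size numbers from this set:

```python
from math import sqrt

def sd_p_hat(p: float, n: int) -> float:
    """Standard deviation of the sample proportion: sqrt(p(1 - p)/n)."""
    return sqrt(p * (1 - p) / n)

print(round(sd_p_hat(0.58, 500), 4))    # dental coverage question
print(round(sd_p_hat(0.363, 100), 4))   # NC State class-size question
```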

At NC State University, 36.3% of the undergraduate classes have less than 20 students. If a random sample of 100 undergraduate classes was taken, which of the following would accurately describe the sampling distribution? Select all that apply. Select one or more: The sampling distribution will be approximately normal. The sampling distribution will be skewed right. The sampling distribution will be skewed left. The mean of the sampling distribution will be 36.3%. The mean of the sampling distribution will be equal to 20%. We can not determine the mean of the sampling distribution from the given information. The standard deviation of the sampling distribution will be 0.0481. The standard deviation of the sampling distribution will be 0.0023.

E3. Given a population proportion p and sample size n, calculate the standard deviation of the sample proportion p-hat using the appropriate formula. E5. Given a study, describe the sampling distribution of p-hat as specifically as possible. This involves stating whether this distribution is at least approximately normal and its corresponding mean and standard deviation. The correct answers are: The sampling distribution will be approximately normal., The mean of the sampling distribution will be 36.3%., The standard deviation of the sampling distribution will be 0.0481.

Which of the following scenarios would it be appropriate to use a normal approximation for the sampling distribution of the sample proportion? Select one: A researcher wishes to find the probability that more than 60% of a sample of undergraduate students from UNC will be female. She samples the first 42 students that walk into the gym on Monday morning. The population proportion of undergraduate females at UNC is known to be 60.1%. A researcher wishes to find the probability that less than 5% of a sample of undergraduate students from Appalachian State University will be between the ages of 25 and 34. He randomly samples 50 undergraduate students from the student database. The proportion of undergraduates between the ages of 25 and 34 is 5.3%. A grad student at NC state wants to know how likely it is that a group of students would be made up of more than 27% graduate students. She will randomly select 38 students and ask them if they are a graduate student or an undergraduate student. The population proportion of grad students at NC state is 26.6%. A full-time student at Fayetteville State University wants to know how likely it is that a group of students would be made up of less than 70% full-time students. She will ask 30 people that she sees parking in the parking deck if they are full-time or part-time. The population of full-time students at Fayetteville State is known to be 72%.

E5. Given a study, describe the sampling distribution of p-hat as specifically as possible. This involves stating whether this distribution is at least approximately normal and its corresponding mean and standard deviation. The correct answer is: A grad student at NC state wants to know how likely it is that a group of students would be made up of more than 27% graduate students. She will randomly select 38 students and ask them if they are a graduate student or an undergraduate student. The population proportion of grad students at NC state is 26.6%.
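
The four scenarios can be screened with the rule of thumb used elsewhere in this set: the sample must be random, and both np and n(1 - p) must be at least 10. The sketch below encodes that rule; the function name, the short scenario labels, and the random/non-random flags are a reading of the question text, not part of the original answer key.

```python
def normal_approx_ok(p: float, n: int, random_sample: bool) -> bool:
    """Rule of thumb: random sample, np >= 10 and n(1 - p) >= 10."""
    return random_sample and n * p >= 10 and n * (1 - p) >= 10

# (population proportion, sample size, was the sample random?)
scenarios = {
    "UNC gym (first 42 students)": (0.601, 42, False),
    "App State ages 25-34": (0.053, 50, True),
    "NC State grad students": (0.266, 38, True),
    "Fayetteville parking deck": (0.72, 30, False),
}
for name, (p, n, is_random) in scenarios.items():
    print(name, round(n * p, 2), round(n * (1 - p), 2), normal_approx_ok(p, n, is_random))
```

Only the NC State scenario passes both checks: the sample is random, np is about 10.1, and n(1 - p) is about 27.9.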

At a large university it is known that 35% of the students live on campus. The director of student life is going to take a random sample of 500 students. Which of the following is most likely to occur? Select one: The sample proportion falls between 0.35 and 0.55. The sample proportion falls between 0.15 and 0.35. The sample proportion falls between 0.25 and 0.45. The sample proportion falls between 0.3 and 0.4.

E6. Given a proportion p, sample size n, and sample proportion p-hat, use the sampling distribution of p-hat and the standard normal table to find probabilities above or below the sample proportion. The correct answer is: The sample proportion falls between 0.25 and 0.45.
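
To compare the four intervals, the sampling distribution of p-hat can be treated as normal with mean 0.35 and SD sqrt(0.35 * 0.65 / 500). This is a sketch under that normal approximation; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.35, 500
sampling_dist = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

intervals = [(0.35, 0.55), (0.15, 0.35), (0.25, 0.45), (0.30, 0.40)]
probs = {iv: sampling_dist.cdf(iv[1]) - sampling_dist.cdf(iv[0]) for iv in intervals}
for iv, prob in probs.items():
    print(iv, round(prob, 4))

most_likely = max(probs, key=probs.get)
print("most likely:", most_likely)
```

The interval from 0.25 to 0.45 contains the interval from 0.3 to 0.4, so it can never be less likely; the two intervals that start or end at 0.35 each cover only about half of the distribution.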

In psychology, there is a particular Mental Development Index (MDI) used in the study of infants. The scores on the MDI have approximately a normal distribution with a mean of 100 and standard deviation of 16. We are going to randomly select 64 children and average their MDI scores. What is the probability that the average is under 97? Give your answer to 4 decimal places.

E7. Given a population standard deviation σ and sample size n, calculate the standard deviation of the sample mean y-bar using the appropriate formula.E8. Given a population mean μ, standard deviation σ, sample size n, and sample mean y-bar, calculate the standardized value (z-score) for a sample mean.E10. Given a mean μ, standard deviation σ, sample size n, and sample mean y-bar, use the sampling distribution of y-bar and the standard normal table to find probabilities above or below the sample mean. The correct answer is: 0.0668
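
The three objectives chain together: find the SD of the sample mean, standardize, then look up the probability. A short sketch with the MDI numbers, using the standard library's normal distribution in place of a printed table:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 16, 64
se = sigma / sqrt(n)          # SD of the sample mean: 16 / 8 = 2
z = (97 - mu) / se            # (97 - 100) / 2 = -1.5
prob = NormalDist().cdf(z)    # P(y-bar < 97)

print(se, z, round(prob, 4))
```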

For each of the following, tell whether the population parameter of interest is µ or p. In a poll of 122 Americans, 32 reported that they preferred milk with their Oreo cookies. What percentage of Americans prefer milk with their Oreo cookies? In a survey of 4 car dealerships in North Carolina, the average employee salary was $43,101. What is the average salary of employees working at car dealerships in North Carolina? A recent study of 204 high school students found that the average daily time they spent on cell phones was 90 minutes. What is the average daily time spent on cell phones among all students at the high school.

E9. Given a study, describe the sampling distribution of y-bar as specifically as possible. This involves stating whether this distribution is at least approximately normal and its corresponding mean and standard deviation. The correct answer is: In a poll of 122 Americans, 32 reported that they preferred milk with their Oreo cookies. What percentage of Americans prefer milk with their Oreo cookies? → p, In a survey of 4 car dealerships in North Carolina, the average employee salary was $43,101. What is the average salary of employees working at car dealerships in North Carolina? → µ, A recent study of 204 high school students found that the average daily time they spent on cell phones was 90 minutes. What is the average daily time spent on cell phones among all students at the high school. → µ

The city government has collected data on the square footage of houses within the city. They found that the average square footage of homes within the city limit is 1,660 square feet while the median square footage of homes within the city limits is 1,240 square feet. The city government also found that the standard deviation of home square footage within the city limits is 198 square feet. A statistician hired by a local home-carpeting company is going to randomly select a sample of 20 houses and record the square footage of the homes using public records. Which of the following is true? Select all that apply. Select one or more: a. The shape of the sampling distribution of the mean square footage of homes will be right skewed. b. The shape of the sampling distribution of the mean square footage of homes will be left skewed. Recall that if the mean is larger than the median, the distribution is right skewed. If the mean is less than the median, the distribution is left skewed. And if the mean is equal to the median, or if the sample size is greater than 30, the distribution will be approximately symmetric. c. If the statistician sampled 15 more homes within the city limits and added their data to the original sample of 20 homes then the shape of the sampling distribution of the mean square footage of all 35 homes will be approximately symmetric. Recall that the Central Limit Theorem says that if the sample size is greater than 30, then the sampling distribution of the mean will be symmetric. d. The sampling distribution of the mean square footage will have a smaller standard deviation when compared to the standard deviation of square footage among all homes within the city limits. e. The sampling distribution of the mean square footage will have a standard deviation equal to or larger than the standard deviation of square footage among all homes within the city limits.

E9. Given a study, describe the sampling distribution of y-bar as specifically as possible. This involves stating whether this distribution is at least approximately normal and its corresponding mean and standard deviation. The correct answers are: The shape of the sampling distribution of the mean square footage of homes will be right skewed., If the statistician sampled 15 more homes within the city limits and added their data to the original sample of 20 homes then the shape of the sampling distribution of the mean square footage of all 35 homes will be approximately symmetric., The sampling distribution of the mean square footage will have a smaller standard deviation when compared to the standard deviation of square footage among all homes within the city limits.

A sample of 1000 college students at NC State University were randomly selected for a survey. Among the survey participants, 102 students suggested that classes begin at 8 AM instead of 8:30 AM. The sample proportion is 0.102. What is the margin of error for a 99% confidence interval for this sample? Give your answer to three decimal places.

F1. Describe what is meant by a margin of error—that is, the margin of error is an indication of how far a statistic is from the true population parameter. F7. Given a confidence level C, determine the critical value (z*) from the standard normal table needed to construct the confidence interval. F8. Define the standard error and be able to calculate it when given a sample proportion p-hat and sample size n. F9. Construct a confidence interval for a population proportion using the appropriate formula. The correct answer is: 0.025

A sample of 1000 college students at NC State University were randomly selected for a survey. Among the survey participants, 102 students suggested that classes begin at 8 AM instead of 8:30 AM. The sample proportion is 0.102. What is the upper endpoint for the 99% confidence interval? Give your answer to three decimal places. (Note that due to the randomization of the questions, the numbers in this question might be different from the previous question.)

F1. Describe what is meant by a margin of error—that is, the margin of error is an indication of how far a statistic is from the true population parameter. F7. Given a confidence level C, determine the critical value (z*) from the standard normal table needed to construct the confidence interval. F8. Define the standard error and be able to calculate it when given a sample proportion p-hat and sample size n. F9. Construct a confidence interval for a population proportion using the appropriate formula. The correct answer is: 0.127
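
Both answers for the 8 AM survey come from the same calculation: margin of error = z* times the standard error, and the interval is p-hat plus or minus that margin. The sketch below assumes the usual large-sample interval for a proportion and computes z* from the normal distribution rather than reading it from a table.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n, conf = 0.102, 1000, 0.99
z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 2.576 for 99%
se = sqrt(p_hat * (1 - p_hat) / n)                  # standard error of p-hat
margin = z_star * se
lower, upper = p_hat - margin, p_hat + margin

print(round(margin, 3), round(lower, 3), round(upper, 3))
```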

WEEK 11 MATERIAL: SD of points around the line (s) -if s is SMALL, R^2 will be BIG -SSE/(n-2) is also known as the mean square error (MSE) or mean square residual; s is its square root -The computer will say "estimate of error sd" = s

Inference for slope -slope characterizes the linear relationship between x and y -if the slope is zero -> values of y do not change when x changes -different samples will produce different slopes (b1)

R^2 or r^2 = square of the correlation -ranges from 0 to 100% -the closer to 100%, the smaller the variability around the line.

Interpretation: -R^2 represents the percentage of variability in y that is accounted for by the straight-line relationship with x. -good for comparing predictor variables.
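
The quantities in these regression notes (SSE, MSE, s, and R^2) can be computed by hand from a small data set. The data below are hypothetical and exist only to make the sketch runnable; the formulas are the standard least-squares ones.

```python
from math import sqrt

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx               # least-squares slope
b0 = y_bar - b1 * x_bar      # least-squares intercept

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # sum of squared residuals
sst = sum((yi - y_bar) ** 2 for yi in y)                        # total variation in y
mse = sse / (n - 2)          # mean square error
s = sqrt(mse)                # SD of points around the line
r_squared = 1 - sse / sst    # share of variation in y explained by the line

print(round(s, 3), round(r_squared, 4))
```

Because R^2 = 1 - SSE/SST, a small s (small SSE) pushes R^2 toward 100%, which is the relationship the notes describe.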

Assumptions: -random sample -the population is a normal distribution -(this test is robust to the normal population assumption when the sample size is large)

Matched pair experiments: -compare two measurements on the same object -record two measurements on the same subject -take difference of those measurements -change scores -treat differences as single samples

completely randomized design: each subject receives one treatment, without taking other variables into consideration. simplest type

Matched pairs design: each unit receives two treatments. 1. a single subject (serves as their own control) 2. two subjects that have been matched together based on some criteria (one receives the treatment, the other receives the control)

Hypotheses for differences: -if there is no difference the average should be 0 -if dominant hand is stronger the differences should be greater than 0 -H0: µ = µ0 -H1: µ < µ0 -H1: µ > µ0 -H1: µ ≠ µ0
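
A matched pairs analysis reduces to a one-sample t-test on the differences. The sketch below uses made-up grip-strength readings for the dominant-hand example; the data and variable names are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical grip-strength readings (same subjects, both hands), for illustration only
dominant     = [42.0, 38.5, 51.0, 45.5, 39.0, 47.5]
non_dominant = [40.0, 37.0, 49.5, 46.0, 36.5, 45.0]

diffs = [d - nd for d, nd in zip(dominant, non_dominant)]  # one difference per subject
n = len(diffs)
d_bar = mean(diffs)                       # average difference
s_d = stdev(diffs)                        # sample SD of the differences
t_stat = (d_bar - 0) / (s_d / sqrt(n))    # H0: mean difference = 0
df = n - 1

print(round(d_bar, 3), round(t_stat, 3), df)
```

The p-value would then come from a t distribution with n - 1 degrees of freedom, using the tail that matches the alternative hypothesis.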

Notes on hypothesis tests: -statistical tests are based on randomness and assumptions (non-random experiments result in p-values that have questionable interpretation) -statistical significance does not mean practical significance.

Test statistic: numeric measure of distance from sample value to what is expected under the null hypothesis. (difference in means, a standard score, etc.)

Null distribution: distribution of the test statistic assuming the null hypothesis is true. What is likely if the null hypothesis is true.

WEEK 1 MATERIAL: population: the entire group of subjects in which we are interested. census: a process that collects data from an entire population.

Sample: is a subset of the population from which we collect data. convenience sample: a sample in which the researcher finds subjects that are easy to access.

Distribution of the slope: Slope b1 -centered around the true slope B1 -has a normal distribution (centered at 0 when the null hypothesis B1 = 0 is true) -spread? need standard error (se)

Standard error of the slope: formula in notebook. Summary of module: slope varies from sample to sample but has a predictable distribution.
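
The notes leave the formula to a notebook. The sketch below assumes the standard textbook formula for simple linear regression, SE(b1) = s / sqrt(sum of (x - x-bar)^2), and uses hypothetical data; it is not taken from the course materials.

```python
from math import sqrt

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = sqrt(sse / (n - 2))      # SD of points around the line
se_b1 = s / sqrt(sxx)        # standard error of the slope (assumed textbook formula)
t_stat = (b1 - 0) / se_b1    # test statistic for H0: B1 = 0, with n - 2 df

print(round(b1, 3), round(se_b1, 4), round(t_stat, 2))
```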

Regression Analysis: procedure to determine the best linear relationship. Makes predictions. -find the best guess at the values of Bo and B1.

Straight line, Linear relationship: -as x increases y increases at a constant rate. y=Bo+B1x use to predict the average y for a particular value of x

Cluster samples: population is divided into naturally occurring groups/clusters. Randomly select clusters of subjects and talk with all subjects in the selected clusters.

Systematic samples: select every k-th individual from the sampling frame (e.g., every 10th, starting in a random place)
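
A systematic sample can be sketched in a few lines: pick a random starting position among the first k entries of the sampling frame, then take every k-th entry after it. The frame of 100 ID numbers and the function name are hypothetical.

```python
import random

def systematic_sample(frame, k, rng=random):
    """Pick a random start in the first k positions, then take every k-th item."""
    start = rng.randrange(k)
    return frame[start::k]

frame = list(range(1, 101))   # hypothetical sampling frame of 100 IDs
sample = systematic_sample(frame, k=10, rng=random.Random(311))
print(sample)
```

Passing a seeded `random.Random` instance only makes the sketch repeatable; in practice the start would be left random.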

For each of the sets of sample size and confidence level listed below, select the appropriate t-value. n=23, 95% confidence blank n=30, 99% confidence blank n=10, 98% confidence blank

The correct answer is: For each of the sets of sample size and confidence level listed below, select the appropriate t-value. n=23, 95% confidence [2.074] n=30, 99% confidence [2.756] n=10, 98% confidence [2.821]

How much does it cost to commute to work? How does distance of your commute relate to your fuel usage? Data were collected on distance between a person's home and primary work location (distance) and the person's monthly expenditures on gasoline for their vehicle (gas) for 17 individuals. The regression analysis is given below. One of the 17 individuals in the sample lived 15 miles from work and spent $150.25 on gas. What is this individual's residual value? Give your answer to 2 decimal places.

The correct answer is: -3.25
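
A residual is the observed value minus the value predicted by the regression line. The regression output for this card is not reproduced here, so the sketch below does not guess the coefficients; it only shows what the recorded answer implies about the fitted value, plus a generic helper whose name and arguments are illustrative.

```python
observed = 150.25    # gas spending for the person who lives 15 miles from work
residual = -3.25     # answer recorded for this question

# residual = observed - predicted, so the fitted value follows directly
predicted = observed - residual
print(predicted)

def residual_for(observed_y, b0, b1, x):
    """Residual for one point, given an intercept b0 and slope b1."""
    return observed_y - (b0 + b1 * x)
```

A negative residual means the person spent less on gas than the line predicts for a 15-mile commute.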

The Town of Hertfordshire clerk knows that 23% of dogs in the town have completed emotional support training. Hertfordshire plans on showcasing a simple random sample of its dogs in a show. Depending on which dogs are chosen, the proportion of emotional support trained dogs may vary. In a sample of 50 dogs, what is the probability that less than 6% of the dogs are emotional support trained? Give your answer to four decimal places.

The correct answer is: 0.0021
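
This answer follows from the normal approximation to the sampling distribution of p-hat (np = 11.5 and n(1 - p) = 38.5, so both are at least 10). A short sketch of the calculation:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.23, 50
sd = sqrt(p * (1 - p) / n)    # SD of the sample proportion
z = (0.06 - p) / sd           # standardized value for p-hat = 0.06
prob = NormalDist().cdf(z)    # P(p-hat < 0.06)

print(round(z, 2), round(prob, 4))
```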

A statistics graduate student conducted an experiment about graduate students who lived on campus. After taking a simple random sample of 65 students, she found that twelve students lived on campus. What is the standard error she calculated? Give your answer to 3 decimal places.

The correct answer is: 0.048
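
The standard error uses the sample proportion in place of the unknown population proportion: SE = sqrt(p-hat(1 - p-hat)/n). A minimal sketch with the numbers from the question:

```python
from math import sqrt

on_campus, n = 12, 65
p_hat = on_campus / n                  # sample proportion
se = sqrt(p_hat * (1 - p_hat) / n)     # standard error of p-hat

print(round(p_hat, 4), round(se, 3))
```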

QUIZ 5: In January of 2019, the American Midwest experienced record-breaking freezing temperatures. A certain metropolitan area in Wisconsin, with a residential population of over 1.15 million, experienced daily average temperatures as low as minus 28 degrees Fahrenheit. Hospitals saw a steady stream of patients reporting symptoms of frostbite. In the aftermath, a survey was conducted in order to estimate the true proportion of individuals afflicted by frostbite during the extreme weather. 289 residents were selected via simple random sampling and 273 reported not having any symptoms. What was the sample proportion obtained by the survey? Give your answer to 3 decimal places. (Note it's the sample proportion of people who did have frostbite.)

The correct answer is: 0.055

At a large university it is known that 45% of the students live on campus. The director of student life is going to take a random sample of 100 students. What is the probability that more than half of the sampled students live on campus?

The correct answer is: 0.1587
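
The recorded answer corresponds to z = 1.00, which is what results when the standard deviation of p-hat is rounded to 0.05 before standardizing. The sketch below shows both the rounded, table-style calculation and the unrounded one; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.45, 100
sd = sqrt(p * (1 - p) / n)    # about 0.0497

# Table-style calculation: SD rounded to two decimals gives z = 1.00
z_table = (0.50 - p) / round(sd, 2)
table = 1 - NormalDist().cdf(z_table)

# Unrounded calculation, as software would report it
z_exact = (0.50 - p) / sd
exact = 1 - NormalDist().cdf(z_exact)

print(round(table, 4), round(exact, 4))
```

The two results differ only in the third decimal place, so either approach leads to the same conclusion.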

In-N-Out Burger is planning on adding a new burger to the menu. A franchise owner wants to know how well the new burger would sell in Austin, Texas. As such, they want to estimate the proportion of residents in Austin, Texas that would like the new recipe. They randomly select 220 residents and have them taste the burger. Out of these 220 people, they determined that 90 of them enjoyed the burger. The sample proportion is 0.409, and the 95% confidence interval for the proportion of residents in Austin who like the new burger is: (0.344, 0.474). Assume where applicable that p-hat remains the same. If you had a random sample of 105 residents instead of 220 the margin of error would? If you created a 90% confidence interval instead of the 95% confidence interval the margin of error would? If you created a 99% confidence interval instead of the 95% confidence interval the margin of error would? If you had a random sample of 370 residents instead of 220 the margin of error would?

The correct answer is: If you had a random sample of 105 residents instead of 220 the margin of error would? → Increase compared to the interval above, If you created a 90% confidence interval instead of the 95% confidence interval the margin of error would? → Decrease compared to the interval above, If you created a 99% confidence interval instead of the 95% confidence interval the margin of error would? → Increase compared to the interval above, If you had a random sample of 370 residents instead of 220 the margin of error would? → Decrease compared to the interval above
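
The direction of each change can be checked numerically, holding p-hat at 0.409 as the question instructs. The helper name `margin_of_error` is chosen here for illustration, and z* is computed from the normal distribution rather than a table.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat: float, n: int, conf: float) -> float:
    """z* times the standard error of p-hat."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.409
baseline = margin_of_error(p_hat, 220, 0.95)   # about 0.065, matching (0.344, 0.474)
print("baseline", round(baseline, 3))

for n, conf in [(105, 0.95), (220, 0.90), (220, 0.99), (370, 0.95)]:
    me = margin_of_error(p_hat, n, conf)
    print(n, conf, round(me, 3), "increase" if me > baseline else "decrease")
```

A smaller sample or a higher confidence level widens the interval; a larger sample or a lower confidence level narrows it.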

What is a sampling distribution? Select one: It is a probability distribution that quantifies which sample statistics are more and less likely to be observed. It is a probability distribution that quantifies which populations are more and less likely to have been sampled from. It is the true, unknown probability distribution of all parameters in the population. It is the true, average sample proportion across all possible sample proportions.

The correct answer is: It is a probability distribution that quantifies which sample statistics are more and less likely to be observed.

Quiz 2: An instructor in a college class recently gave an exam that was worth a total of 100 points. The instructor inadvertently made the exam harder than he had intended. The scores were very symmetric, but the average score for his students was 54 and the standard deviation of the scores was 4 points. The instructor is considering 2 different strategies for rescaling the exam results: Method 1:Add 20 points to everyone's score. Method 2:Multiply everyone's score by 1.5. Which of the following are true? Select all that apply. Select one or more: Method 1 will increase the standard deviation of the students' scores Method 2 will increase the standard deviation of the students' scores. Method 1 will decrease the standard deviation of the students' scores. Method 2 will decrease the standard deviation of the students' scores.

The correct answer is: Method 2 will increase the standard deviation of the students' scores.
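
Adding a constant shifts every score by the same amount and leaves the spread alone, while multiplying by a constant stretches the spread by that factor. The scores below are hypothetical and only illustrate the two methods.

```python
from statistics import mean, pstdev

scores = [50, 52, 54, 56, 58]          # hypothetical scores centered at 54
shifted = [s + 20 for s in scores]     # Method 1: add 20 points
scaled = [s * 1.5 for s in scores]     # Method 2: multiply by 1.5

print(mean(scores), pstdev(scores))
print(mean(shifted), pstdev(shifted))  # mean moves up by 20, SD unchanged
print(mean(scaled), pstdev(scaled))    # mean and SD are both multiplied by 1.5
```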

A university officer wants to know the proportion of registered students that spend more than 20 minutes to get to school. He selects two parking decks at the university and talked with students at those decks. Of 25 students found in the decks, 12 have a commute under 20 minutes and 13 have a commute more than 20 minutes. Select one: a. Yes. b. No, because it wasn't a random sample. c. No, because n(p-hat) < 10 or n(q-hat) < 10. d. No, because the sample size wasn't at least 30 and the population wasn't normally distributed. e. No, because we already know the population proportion.

The correct answer is: No, because it wasn't a random sample.

A clothing store owner wants to know the proportion of customers who used coupons within the last year. He selects in a random order all 2,000 receipts from a database of all purchases within the last year and finds that 340 of them are discounted by coupons. Select one: a. Yes. b. No, because it wasn't a random sample. c. No, because n(p-hat) < 10 or n(q-hat) < 10. d. No, because the sample size wasn't at least 30 and the population wasn't normally distributed. e. No, because we already know the population proportion.

The correct answer is: No, because we already know the population proportion.

Below is a scatter plot of data comparing the number of hours of sleep a student got the night before an exam and their resulting grade on the exam. Inspired by this graph, Timothy decides he's going to sleep for 14 hours the night before he takes this exam. Is it appropriate to use this data to predict Timothy's score on the exam? Select one: Yes, assuming we can find the regression line. No, because we would be trying to find a value outside of our data range. Yes, because the slope of the regression line will be positive. Yes, because the y-intercept of the regression line will be positive. No, because the data is not linear.

The correct answer is: No, because we would be trying to find a value outside of our data range.

A teacher decided to bring a jar of 530 pieces of small candy to his 100-student classroom so students could practice estimation. The students were told that whoever had the closest guess would win the candy. Suppose we took a random sample of one third of the students and calculated the sample mean of their guesses. The distribution of individual guesses had a mean of 400 pieces of candy and a standard deviation of 3,000 pieces of candy (the students had a lot of trouble guessing the count). Is it appropriate to use a normal distribution to approximate the sampling distribution of the sample mean? Hint: check your assumptions and consider, if applicable, what the distribution would look like. Select one: a. Yes, because the population distribution is normally distributed, and we have a random sample. b. Yes, because the sample size is at least 30, and we have a random sample. c. Yes, because np and nq are both at least 10, and we have a random sample. d. No, because we don't have a random sample. e. No, because the sample size isn't at least 30 and the population isn't normally distributed. f. No, because np and nq aren't both greater than 10. g. No, because we would need a larger sample size before the sampling distribution would be reasonably normally distributed.

The correct answer is: No, because we would need a larger sample size before the sampling distribution would be reasonably normally distributed.

A researcher felt there might be a correlation between weather conditions and people's mental health. In order to verify his conjecture, he obtained data on the number of mostly sunny days (x) in 2015 from 245 randomly selected cities in the nation and their corresponding citizens' happiness indices (y). In other words, he wants to test if there is a statistically significant correlation between x and y. How should he carry out the analysis? Select one: a. Run a differences-in-means test (also sometimes called here a "paired t-test") on x and y with the null hypothesis that their means are equal and see if the resulting p-value is small for appropriately phrased hypotheses b. Run a simple linear regression and test if the slope estimate is statistically different from 1 with a t-test c. Run a simple linear regression and test if the slope estimate is statistically different from 0 with a t-test d. Run a simple linear regression and check if r2 is bigger than 0.5 e. Run a t-test on y with the null hypothesis that the true mean of y equals the sample mean of x

The correct answer is: Run a simple linear regression and test if the slope estimate is statistically different from 0 with a t-test

John is a new college graduate working at his first job. After years of living in an apartment he has decided to purchase a home. He has found a great neighborhood from which he can walk to work. Before buying a home in the area he has decided to collect some data on the homes in this neighborhood. A data set has been compiled that represents a sample of 100 homes in the neighborhood he is considering. The variables included in this data set are: Value: the current value of the home as determined by the county tax assessor. Size: the size of the home in square feet. Year: the year the home was built. Basement: whether the home has a basement (y=yes, n=no). Fireplace: whether the home has a fireplace (y=yes, n=no). Type: whether the structure is a single family house or a townhouse (house or townhouse). Create histograms for each of the numeric variables and create bar charts for each of the categorical variables. When creating the histogram for the "value" variable use a bin width of 20000. Use these variables to explore the data and determine which of the following best fits this situation. Select one: The histogram for value is basically symmetric with one large outlier. This is probably because the neighborhood was built more recently around an older estate home that is very large and very expensive. The histogram for value is basically symmetric with one large outlier. This is probably because the majority of the homes in the neighborhood are townhouses with only one large single family home. The histogram for value is basically symmetric with one large outlier. This is probably because it is the only home in the neighborhood that has a fireplace and that substantially increases its value. The histogram for value is basically symmetric with one large outlier. This is probably because one home is very new while all of the others are about 50 years old. The histogram for value is clearly bimodal. The reason it is bimodal appears to be because the homes in the neighborhood have higher priced single family houses and lower priced town homes. The histogram for value is clearly bimodal. The reason it is bimodal appears to be because the neighborhood was built in two phases, the newer phase consists of larger more expensive homes and the older phase consists of smaller less expensive homes. The histogram for value is clearly bimodal. The reason it is bimodal appears to be because the neighborhood was built in two phases, the older phase consists of larger more expensive homes and the newer phase consists of smaller less expensive homes. The histogram for value is clearly bimodal. The reason it is bimodal appears to be because the neighborhood has some homes that have basements that tend to be larger in size with another group of homes that do not have basements and tend to be smaller.

The correct answer is: The histogram for value is clearly bimodal. The reason it is bimodal appears to be because the neighborhood was built in two phases, the newer phase consists of larger more expensive homes and the older phase consists of smaller less expensive homes.

The manager of the motor pool wants to know if it costs more to maintain cars that are driven more often. Data are gathered on each car in the motor pool regarding number of miles driven (X) in a given year and maintenance costs for that year (Y) in thousands of dollars. The regression equation is computed as: Y=50+0.03X, and the p-value for the slope estimate is 0.345. What conclusion can we draw from this study? Select one: a. Cars that are driven more tend to cost more to maintain. b. There's no statistically significant linear relationship between the number of miles driven and the maintenance cost. c. The correlation between the response variable and independent variable is significant. d. The slope estimate is significantly different from zero.

The correct answer is: There's no statistically significant linear relationship between the number of miles driven and the maintenance cost.

A researcher wants to learn how GAWKS (gestational age in weeks) affects AC (abdominal circumference). To test this conjecture, you will perform a simple linear regression in StatCrunch on the variables GAWKS and AC. What is the equation of the regression line? What is the correlation coefficient?

The correct answer is: What is the equation of the regression line? → -55.1884 + 10.3385 * GAWKS, What is the correlation coefficient? → 0.9863

What is the slope of the regression line? What is the intercept of the regression line?

The correct answer is: What is the slope of the regression line? → 10.187626, What is the intercept of the regression line? → -99.126832

Which of the following are necessary conditions to perform inference in regression? (Select all that apply) Select one or more: a. The values of x should not have a correlation with the values of y. b. The data come from a well-designed random sample or randomized experiment. c. The actual relationship between x and y is linear. d. The x values must have a normal distribution centered at 0 e. The values of y must have a normal distribution centered at 0 f. The distribution of the values around the regression line should have a normal distribution.

The correct answers are: -The data come from a well-designed random sample or randomized experiment. -The actual relationship between x and y is linear. -The distribution of the values around the regression line should have a normal distribution.

To study the relationship between the Ozone level and the upper air temperature in Denver, CO from 2000 to 2015 with each year from June 1 to Sept. 30, Tom collected data and performed simple linear regression analysis. He downloaded the maximum daily 8-hour average (MDA8) Ozone (O3) level in parts per million (ppm) from the US Environmental Protection Agency (EPA), and the daily air temperature in Kelvin (K) from National Oceanic and Atmospheric Administration (NOAA). Tom performed a simple linear regression using the above data and got the following output. "Temp.700mb" is the temperature in Kelvin. Which of the following are correct conclusions or interpretations? Hint: Recall that 0 degrees Kelvin is absolute zero (-459.67 degrees Fahrenheit). Also recall that ozone is measured in parts per million and therefore must be a positive number. Select one or more: a. The intercept is negative, so therefore the regression analysis is wrong. b. We would predict that at 0 degrees Kelvin the ozone will be about -389.38 ppm on average. c. Zero degrees Kelvin is outside the range of temperatures found on Earth. Therefore, we should not try to interpret the intercept estimate seriously. d. Although the intercept is negative, we can still use the regression analysis for predictions for ozone levels for typical temperatures in Denver, CO.

The correct answers are: -Zero degrees Kelvin is outside the range of temperatures found on Earth. Therefore, we should not try to interpret the intercept estimate seriously. -Although the intercept is negative, we can still use the regression analysis for predictions for ozone levels for typical temperatures in Denver, CO.

Polling suggests that 38% of the US adult population does not own a car. A random sample of 500 Americans will be taken. The sampling distribution of a sample proportion suggests that: (select all of the following which are correct) Select one or more: 38% of the 500 Americans sampled will not own a car. Across a large number of random samples, we should expect that, on average, 38% of the 500 Americans sampled will not own a car. It is less likely for less than 36% of the 500 Americans sampled to not own a car than it is for more than 40% of the Americans sampled to not own a car. It is more likely for more than 37% of the 500 Americans sampled to not own a car than it is for less than 38% of the Americans sampled to not own a car. Since the 500 Americans are randomly sampled, we cannot make any kind of predictions about the resulting sample proportion.

The correct answers are: Across a large number of random samples, we should expect that, on average, 38% of the 500 Americans sampled will not own a car., It is more likely for more than 37% of the 500 Americans sampled to not own a car than it is for less than 38% of the Americans sampled to not own a car.
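The answer above can be checked with a normal approximation to the sampling distribution of p-hat. The following is a minimal sketch, assuming the approximation is reasonable here (np = 190 and n(1 - p) = 310 are both well above 10); the helper names `normal_cdf` and `prob_below` are illustrative only.

```python
import math

def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p, n = 0.38, 500                     # values given in the question
sd = math.sqrt(p * (1 - p) / n)      # standard deviation of p-hat

def prob_below(x):
    """Approximate P(p-hat < x) under the normal approximation."""
    return normal_cdf((x - p) / sd)

p_above_37 = 1 - prob_below(0.37)    # P(p-hat > 37%)
p_below_38 = prob_below(0.38)        # P(p-hat < 38%), half the distribution
p_below_36 = prob_below(0.36)        # P(p-hat < 36%)
p_above_40 = 1 - prob_below(0.40)    # P(p-hat > 40%)

print(f"SD of p-hat:    {sd:.4f}")
print(f"P(p-hat > 0.37) {p_above_37:.4f}")
print(f"P(p-hat < 0.38) {p_below_38:.4f}")
print(f"P(p-hat < 0.36) {p_below_36:.4f}")
print(f"P(p-hat > 0.40) {p_above_40:.4f}")
```

Because 36% and 40% are equally far from the mean of 38%, the two tail probabilities match, which is why the "less likely" option is not correct.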

For the SAT college entrance exam (prior to March 2005), the combined scores ranged from 400 to 1600. A study recorded the combined scores from 100 students from each of three schools in a western state. The resulting scores were used to produce these boxplots. Which of the following are true? Select all that apply. Select one or more: School 3 is more skewed to the right than school 2. School 3 is more symmetric than school 2. The median of school 1 is greater than the median of school 2. The third quartile of school 3 is greater than the median of school 1. The third quartile of school 1 is greater than the maximum value for school 3. The first quartile of school 1 is greater than 900. The third quartile of school 1 is greater than 1400. The outliers for school 3 have a greater influence on the median than the mean. The outliers for school 3 have a greater influence on the mean than the median. School 1 has an outlier.

The correct answers are: School 3 is more skewed to the right than school 2., The first quartile of school 1 is greater than 900., The outliers for school 3 have a greater influence on the mean than the median.

A researcher was interested in utilities provided by city governments. The researcher randomly selected 20 counties from a list of all counties in the U.S. The researcher then contacted each city in those counties (a total of 192) and found that 12 (6.25%) of them provided electricity to their residents. Which of the following are true? Select all that apply. Select one or more: The population of interest is all city governments in the U.S. The population of interest is the 192 cities. The population of interest is all city governments that provide utilities to their residents. The population of interest is 6.25%. The parameter of interest is the proportion of the city governments in the U.S. that provide electricity to their residents. The parameter of interest is 6.25%. The population of interest is the 20 counties. The parameter of interest is the list of all counties in the U.S.

The correct answers are: The population of interest is all city governments in the U.S., The parameter of interest is the proportion of the city governments in the U.S. that provide electricity to their residents.

In the general population in the US, identical twins occur at a rate of 30 per 1,000 live births. A survey records 10,000 births during Jan 2018 to Jan 2019 and found 400 twins in total. Which of the following are true? Select one or more: The proportion of twin births during Jan 2018 to Jan 2019 is .03. The proportion of twin births during Jan 2018 to Jan 2019 is .04. The probability of twin births among the general population is .03. The probability of twin births among the general population is .04. Pr(observing a sample proportion of twin births from a random sample of 10,000 live births <= 0.04) = 0.03. Pr(observing a sample proportion of twin births from a random sample of 10,000 live births <= 0.04) = 0.5. Pr(observing a sample proportion of twin births from a random sample of 10,000 live births <= 0.03) = 0.04. Pr(observing a sample proportion of twin births from a random sample of 10,000 live births <= 0.03) = 0.5.

The correct answers are: The proportion of twin births during Jan 2018 to Jan 2019 is .04., The probability of twin births among the general population is .03., Pr(observing a sample proportion of twin births from a random sample of 10,000 live births <= 0.03) = 0.5.

According to the most recent census, the average total yearly expenses of households in Wake County is $38,500. It is also known that the distribution of household expenses in Wake County is strongly skewed to the right with a standard deviation of $10,500. A researcher is going to randomly select a sample of 5 households from Wake County. Which of the following is true? Select all that apply. Select one or more: a. The sampling distribution of the sample mean will have a smaller standard deviation than the population. b. The sampling distribution of the sample mean will have a different standard deviation than the population. c. The sampling distribution of the sample mean will be skewed right. d. The sampling distribution of the sample mean will have a larger standard deviation than the population. e. We cannot tell what the shape of the sampling distribution of the sample mean will look like. f. The shape of the sampling distribution of the sample mean will be approximately symmetric.

The correct answers are: The sampling distribution of the sample mean will have a smaller standard deviation than the population., The sampling distribution of the sample mean will have a different standard deviation than the population., The sampling distribution of the sample mean will be skewed right.
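The first two answers follow from the formula for the standard deviation of the sample mean, sigma divided by the square root of n. A short sketch with the numbers from the question:

```python
import math

sigma = 10_500   # population standard deviation of yearly expenses, in dollars
n = 5            # sample size from the question

# Standard deviation of the sampling distribution of the sample mean
sd_of_mean = sigma / math.sqrt(n)

print(f"Population SD:     ${sigma:,.2f}")
print(f"SD of sample mean: ${sd_of_mean:,.2f}")
```

The result is roughly $4,696, which is smaller than (and therefore different from) the population value of $10,500. The formula says nothing about shape: with only 5 households drawn from a strongly right-skewed population, the sample size is too small for the central limit theorem to make the sampling distribution approximately symmetric.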

A professor wanted to determine the proportion of students in his class who have cheated on an exam. The professor selects a random sample of 30 students from his class and emails them the question "Have you ever cheated on an exam?". He receives responses from 10 of the 30 students. Which of the following statements are true? Select all that apply. Select one or more: This study would suffer from undercoverage because the professor should have selected a sample from the entire student body of the university. This study suffers from non-response bias because only 33% of the people surveyed provided a response. This study suffers from response bias since students will not want to tell a professor whether or not they have cheated. The study should not suffer from any bias since it was based on a random sample.

The correct answers are: This study suffers from non-response bias because only 33% of the people surveyed provided a response., This study suffers from response bias since students will not want to tell a professor whether or not they have cheated.

A political action committee wanted to estimate the proportion of county residents who support the installation of red-light cameras throughout the county. They took a random sample of 800 county residents and found that the proportion who wanted to install these cameras was 25% with a margin of error of +/- 4% (with 95% confidence). This implies: Select one or more: a. There is a 95% chance that the true proportion of county residents who want the law changed is 25%. b. We believe that the true proportion of county residents who want the law changed is between 21% and 29%. c. We are 95% confident that the true proportion of county residents who want the law changed is between 21% and 29%. d. If we took another sample of 800 residents the sample proportion would definitely be between 21% and 29%. e. If we take many other samples of 800 residents from this population 95% of them will have a sample proportion that is between 21 and 29%. f. If we take 1000 other samples of 800 residents from this population, about 950 of them will produce confidence intervals that capture the true proportion. g. We cannot conclude anything about the population parameter since this is only a sample.

The correct answers are: We believe that the true proportion of county residents who want the law changed is between 21% and 29%., We are 95% confident that the true proportion of county residents who want the law changed is between 21% and 29%., If we take 1000 other samples of 800 residents from this population, about 950 of them will produce confidence intervals that capture the true proportion. This question covers the following learning objectives: F1. Describe what is meant by a margin of error; that is, the margin of error is an indication of how far a statistic is from the true population parameter. F4. Define confidence level and explain its relationship to the sampling distribution of the statistic. F5. Explain why confidence intervals may miss the true population parameter and how this relates to the definition of the confidence level.
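The "about 950 out of 1000 intervals" reading of the confidence level can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true proportion is set to a hypothetical 0.25 (in practice it is unknown), each interval is built with the usual p-hat plus or minus 1.96 standard errors formula rather than the poll's reported margin, and the function name `coverage` is illustrative.

```python
import math
import random

def coverage(true_p, n, reps, z=1.96, seed=0):
    """Fraction of simulated 95% intervals that capture true_p."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    hits = 0
    for _ in range(reps):
        # Draw one random sample of size n and count the "yes" responses
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / reps

# true_p = 0.25 is an assumed value chosen for illustration only
rate = coverage(true_p=0.25, n=800, reps=1000)
print(f"Share of intervals capturing the true proportion: {rate:.3f}")
```

The share should land near 0.95, and the intervals that miss are the reason a single interval is described with "confidence" rather than certainty.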

The following simple linear regression analyzes the relationship between the number of classes students are taking (the independent variable, labeled in the following output as X[,2]) and the number of books they have in their backpack (the response) at randomly chosen times. Assume all relevant assumptions are met. Which of the following are correct interpretations of the slope? Select one: a. Each additional class a student takes is associated with about a 58.7% increase in the number of books in their backpack on average. b. Each additional class a student takes is associated with about an additional 0.587 books in their backpack on average. c. Taking an additional class causes students to carry 0.587 extra books with them on average. d. The population average number of books in a student's backpack is 0.587.

The correct answer is: Each additional class a student takes is associated with about an additional 0.587 books in their backpack on average.

After taking an aptitude test, the computer told Bob that he had a z-score of 1.08. If scores on the aptitude test are normally distributed, which of the following statements can Bob conclude from his score? Select all that apply. Select one or more: Bob scored within 2 standard deviations of the mean score. Bob did better than the mean score. Bob scored within 1 standard deviation of the mean score. Bob did worse than the mean score. About 14% of students taking the aptitude test did better than Bob. About 14% of students taking the aptitude test did worse than Bob.

This question covers the following learning objective: D2. List the key characteristics of the normal distribution. D5. Given a z-score(s), use the standard normal table to find the corresponding probability above, below, or between them. The correct answers are: Bob scored within 2 standard deviations of the mean score., Bob did better than the mean score., About 14% of students taking the aptitude test did better than Bob.
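The 14% figure comes from the upper tail of the standard normal distribution. As a quick check in place of a z-table, the sketch below computes the area above z = 1.08 with the error function; `normal_cdf` is an illustrative helper name.

```python
import math

def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = 1.08                       # Bob's z-score
share_below = normal_cdf(z)    # proportion scoring lower than Bob
share_above = 1 - share_below  # proportion scoring higher than Bob

print(f"Below Bob: {share_below:.4f}")
print(f"Above Bob: {share_above:.4f}")   # about 0.14
```

Since 1.08 is positive, Bob is above the mean; since it is between 1 and 2, he is within 2 standard deviations of the mean but not within 1.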

A college professor stops at McDonald's every morning for 10 days to get a number 1 value meal costing $5.39. On the 11th day he orders a number 8 value meal costing $4.38. Which of the following are true? Select all that apply. Select one or more: During the first 10 days the professor's standard deviation was more than 0. During the first 10 days the professor's standard deviation was less than 0. During the first 10 days, the professor's standard deviation was 0. It is impossible to tell anything about the professor's standard deviation for the first 10 days. Considering all 11 days, the professor's standard deviation was lower than the standard deviation of the first 10 days. Considering all 11 days, the professor's standard deviation was higher than the standard deviation of the first 10 days. Considering all 11 days, the professor's standard deviation was the same as the standard deviation of the first 10 days. Considering all 11 days, it is impossible to tell anything about the professor's standard deviation compared to the first 10 days.

This question covers the following learning objective: a. The standard deviation must be greater than or equal to zero. b. When standard deviation is equal to zero, there is no spread in the data (every value in the data set is the same). C7. Explain the impact of outliers on the value of the summary statistics. The correct answers are: During the first 10 days, the professor's standard deviation was 0., Considering all 11 days, the professor's standard deviation was higher than the standard deviation of the first 10 days.
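Both answers can be confirmed directly from the meal prices in the question. A minimal sketch using the sample standard deviation from Python's standard library:

```python
import statistics

first_ten = [5.39] * 10            # ten identical purchases
all_eleven = first_ten + [4.38]    # the 11th day adds a different price

sd_ten = statistics.stdev(first_ten)       # no spread, so this is 0
sd_eleven = statistics.stdev(all_eleven)   # one differing value adds spread

print(f"SD, first 10 days: {sd_ten:.4f}")
print(f"SD, all 11 days:   {sd_eleven:.4f}")
```

The first value is 0 because every observation equals the mean; the second is about 0.30 because the single cheaper meal pulls the mean down and leaves every value at a nonzero distance from it.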

Residuals: vertical distance between our predicted value and actual value. Also called prediction errors.

e_i = y_i - ŷ_i (observed value minus predicted value)
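As a worked example of a residual, the sketch below uses the fitted line quoted earlier in this set, -55.1884 + 10.3385 * GAWKS. The observation itself (30 weeks, abdominal circumference 260) is hypothetical and chosen only to show the arithmetic.

```python
# Coefficients of the fitted line quoted earlier in this set
intercept = -55.1884
slope = 10.3385

# Hypothetical observation, not taken from the StatCrunch data set
gawks = 30          # gestational age in weeks
observed_ac = 260   # observed abdominal circumference

predicted_ac = intercept + slope * gawks   # y-hat
residual = observed_ac - predicted_ac      # e = y - y-hat

print(f"Predicted AC: {predicted_ac:.4f}")
print(f"Residual:     {residual:.4f}")
```

A positive residual means the observed point sits above the regression line; a negative one means it sits below.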

If the independent variable (IV) and dependent variable (DV) are swapped in a simple linear regression analysis, the correlation coefficient is unchanged.
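This symmetry is easy to see numerically. The sketch below computes the correlation and the least-squares slope in both directions on a small made-up data set; the data values and the helper names `pearson_r` and `ls_slope` are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ls_slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy, r_yx = pearson_r(x, y), pearson_r(y, x)
b_xy, b_yx = ls_slope(x, y), ls_slope(y, x)

print(f"r with x as IV: {r_xy:.4f}   r with y as IV: {r_yx:.4f}")
print(f"slope y on x:   {b_xy:.4f}   slope x on y:   {b_yx:.4f}")
```

The correlation is identical either way because its formula treats the two variables symmetrically, while the slope changes because it divides by the spread of whichever variable plays the role of x.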

How does sample size affect p-hat: The standard deviation of p-hat gets smaller as the sample size n increases, because n appears in the denominator of the formula for the standard deviation, sqrt(p(1 - p)/n). That is, p-hat is less variable in larger samples.
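To make the effect concrete, the sketch below evaluates the standard deviation formula at the four sample sizes from the Cuyahoga County voter-registration question earlier in this set, where the population proportion is 50.3%.

```python
import math

p = 0.503                              # population proportion from that question
sample_sizes = [100, 400, 1000, 10000]

# Standard deviation of p-hat for each sample size
sds = [math.sqrt(p * (1 - p) / n) for n in sample_sizes]

for n, sd in zip(sample_sizes, sds):
    print(f"n = {n:>5}: SD of p-hat = {sd:.4f}")
```

Quadrupling the sample size from 100 to 400 halves the standard deviation, so a sample proportion as far from 50.3% as 45% becomes much less likely as n grows.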

How to take an SRS (simple random sample): 1. Compile a numbered list of the units in the population. 2. Use a computer, calculator, or table to pick items from the list at random.
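The two steps above can be sketched in a few lines. The population of 100 numbered units, the sample size of 10, and the seed are all hypothetical choices made for illustration.

```python
import random

# Step 1: a numbered list of the units in the population (hypothetical frame)
population = [f"unit_{i}" for i in range(1, 101)]

# Step 2: let the computer pick units at random, without replacement
rng = random.Random(311)               # seed fixed only so the run repeats
srs = rng.sample(population, k=10)

print(srs)
```

`random.sample` draws without replacement, so no unit can appear twice and every set of 10 units has the same chance of being chosen.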

sampling frame: LIST of individuals from which we choose to sample.

Random sample: a sample that uses a random mechanism to select the individuals in the sample. (removes biases)

sampling frame: a list of subjects in the population of interest.

