module 4-6

If you suspect that you have an influential observation, the first thing you should do is:

check to make sure no error has been made in collecting or recording data.

When studying the relationship between two quantitative variables, an interval estimate of the mean value of y for a given value of x is called a(n):

confidence interval

When we use the estimated regression equation to develop an interval that can be used to predict the mean for ALL units that meet a particular set of given criteria, that interval is called a(n):

confidence interval

A bookstore is interested in genre and author preferences. They form a panel by taking the first 10 people who volunteer while walking by their storefront. They then conduct a brief interview and compile their results. This is an example of:

convenience sampling

Which one of the following statistics measures both the strength and direction of a linear relationship?

correlation coefficient

A numerical measure of linear association between two variables is the:

covariance

The term used to describe the case when the independent variables in a multiple regression model are correlated is:

multicollinearity

The mathematical equation relating the expected value of the dependent variable to the values of the independent variables, which has the form E(y) = β0 + β1x1 + β2x2 + ... + βpxp, is called the:

multiple regression equation

The p-value:

must be a number between 0 and 1

For a fixed sample size, n, in order to have a higher degree of confidence, the margin of error and the width of the interval:

must be larger

A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size:

n has the same probability of being selected

As a rule of thumb, the sampling distribution of the sample proportion can be approximated by a normal probability distribution when:

n(1 - p) ≥ 5 and np ≥ 5.

For the case where σ is unknown, the test statistic has a t distribution. How many degrees of freedom does it have?

n-1

When constructing a confidence or a prediction interval to quantify the relationship between two quantitative variables, the appropriate degrees of freedom are

n-2

In computing the standard error of the mean, the finite population correction factor is used when

n/N > 0.05

When developing an interval estimate for the difference between two sample means, with sample sizes of n1 and n2 :

n1 and n2 can be different sizes

Suppose we are constructing an interval estimate for the difference between the means of two populations when the standard deviations of the two populations are unknown. Suppose it can be assumed that the two populations have equal variances. If n1 is the size of sample 1 and n2 is the size of sample 2, we must use a t distribution with:

n1+n2-2 degrees of freedom

The sampling distribution of (pbar1 - pbar2) is approximated by a normal distribution when:

n1p1, n1(1-p1), n2p2,n2(1-p2) are all greater than or equal to 5.

The numerical value of the standard deviation can never be:

negative

For a fixed confidence level and population standard deviation, if we would like to cut our margin of error to 1/3 of the original size, we should take a sample size that is:

nine times as large as the original sample size.

Convenience sampling is a:

non probability sampling technique

The tests of significance in regression analysis are based on assumptions about the error term ɛ. One such assumption is that the error term ɛ follows a(n) _____ distribution for all values of x.

normal

All things held constant, which interval will be wider: a confidence interval or a prediction interval?

prediction interval

When studying the relationship between two quantitative variables, whenever we want to predict an individual value of y for a new observation corresponding to a given value of x, we should use a(n):

prediction interval

The standard deviation of a point estimator is called the:

standard error

As the sample size increases, the:

standard error of the mean decreases

Suppose a high correlation existed between variables x1 and x2. If variable x1 was used as an independent variable, then variable x2:

would not add much more explanatory power to the current model

A random sample of 12 four-year-old red pine trees was selected and the diameter (in inches) of each tree's main stem was measured. The resulting observations are as follows: 11.3, 10.7, 12.4, 15.2, 10.1, 12.1, 16.2, 10.5, 11.4, 11.0, 10.7, and 12.0. Find the point estimate that can be used to estimate the true population mean.

xbar = 11.97

What is the symbol for the sample mean?

x̄

In a simple linear regression model, the error term ε accounts for the variability in ______ that cannot be explained by the linear relationship between x and y.

y

A local entrepreneur would like to know if those who live in an urban or rural community are more likely to buy a real Christmas tree. He takes a random sample of 100 people who reside in the city and a separate random sample of 100 people who live in the country and asks them if they buy a real tree at Christmas time. Of the urban participants, 22 buy a real tree. Of the rural participants, 28 buy a real tree. Let p1 = the proportion of all people who live in rural communities and buy a real Christmas tree, and let p2 = the proportion of all people who live in urban communities and buy a real Christmas tree. Based upon this data, can we assume a normal distribution?

Yes. n1p1 = 28, n1(1 - p1) = 72, n2p2 = 22, and n2(1 - p2) = 78, which are all at least 5.

The following data represent a company's yearly sales volume and its advertising expenditure over a period of 8 years. Use the least squares method to develop the estimated regression equation.

yhat = -10.42 + .79x

As the test statistic becomes larger, the p-value:

becomes smaller

When the level of confidence decreases, the margin of error:

becomes smaller

A doctor would like to know if men and women got the same amount of sleep per night or if women tended to get less sleep than men. He took a random sample of 100 of his male and 100 of his female patients and asked them how many hours of sleep they got, per night, on average. The women slept an average of 6.75 hours and the men slept an average of 7.5 hours. Suppose we also knew the population standard deviations to be 1.25 hours and 1.5 hours for men and women, respectively. Find a 90% confidence interval for the difference in the population means.

(.43, 1.07)
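As a rough check of the interval above, the sketch below recomputes it with Python's standard library. The variable names are illustrative, and it assumes the card's reading that 1.25 hours is the men's population standard deviation and 1.5 hours the women's.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the sleep question (variable names are illustrative).
mean_men, mean_women = 7.5, 6.75
sigma_men, sigma_women = 1.25, 1.5   # population SDs, assumed known
n_men = n_women = 100

diff = mean_men - mean_women
# Standard error of the difference between two independent sample means.
std_err = sqrt(sigma_men**2 / n_men + sigma_women**2 / n_women)
z_crit = NormalDist().inv_cdf(0.95)  # 90% two-sided leaves 5% in each tail
margin = z_crit * std_err

ci = (diff - margin, diff + margin)
print(round(ci[0], 2), round(ci[1], 2))
```

The same three steps (point estimate, standard error, critical value) apply to any two-sample interval with known population standard deviations.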

A company runs two production lines, A and B, for packaging canned vegetables. If a can has a dent or any visual imperfection, it is considered to be defective. From production line A, a random sample of 200 cans are selected, and it is determined that 12 are defective. From production line B, a random sample of 300 cans are selected, and it is determined that 21 are defective. Develop a 90% confidence interval for the difference between the two population proportions.

A doctor would like to know if men and women got the same amount of sleep per night or if women tended to get less sleep than men. He took a random sample of 100 of his male and 100 of his female patients and asked them how many hours of sleep they got, per night, on average. The women slept an average of 6.75 hours and the men slept an average of 7.5 hours. Suppose we also knew the population standard deviations to be 1.25 hours and 1.5 hours for men and women, respectively. Which of the following correctly gives the test statistic?

A doctor would like to know if men and women got the same amount of sleep per night or if women tended to get less sleep than men. He took a random sample of 100 of his male and 100 of his female patients and asked them how many hours of sleep they got, per night, on average. The women slept an average of 6.75 hours, and the men slept an average of 7.5 hours. Suppose we also knew the population standard deviations to be 1.25 hours and 1.5 hours for men and women, respectively. What is the standard error of the difference in the means?

The daily production rates for a sample of factory workers before and after a training program are shown below. Let d = After - Before. We want to determine if the training program is effective. Which of the following options gives the appropriate test statistic?

The following data show the monthly sales in units of six salespersons before and after a bonus plan was introduced. Let the difference "d" be d = After - Before. We would like to calculate a 99% confidence interval to estimate the mean difference in monthly sales for all sales individuals at this company. What is the margin of error?

The following information regarding the number of semester hours taken from random samples of day and evening students is provided. What is the margin of error for a 95% confidence interval estimate for the difference between the mean semester hours taken by the two groups of students?

The standard error of (xbar1-xbar2) is given by:

The correlation coefficient ranges between:

-1 and 1

In regression analysis, the equation in the form y = 𝛽0 + 𝛽1x + ε is called the:

simple linear regression model

In multiple regression analysis, any observation with a standardized residual of less than _____ or greater than _____ is known as an outlier.

-2, +2 In multiple regression analysis, any observation with a standardized residual of less than -2 or greater than +2 is known as an outlier.

The following data show the monthly sales in units of six salespersons before and after a bonus plan was introduced. Let the difference "d" be d = After - Before. Calculate a 99% confidence interval to estimate the mean difference in monthly sales for all salespersons before and after the bonus plan was introduced.

-2.9 to 8.9

In a sample of size five, the mean is 23 and four of the observations have the following deviations from the mean: -6, 2, 5, and 3. What is the value of the fifth observation?

19 The deviations from the mean must sum to 0, so the fifth deviation is -4 and the fifth observation is 23 + (-4) = 19

A doctor would like to know if men and women got the same amount of sleep per night or if women tended to get less sleep than men. He took a random sample of 100 of his male and 100 of his female patients and asked them how many hours of sleep they got, per night, on average. The women slept an average of 6.75 hours and the men slept an average of 7.5 hours. Suppose we also knew the population standard deviations to be 1.25 hours and 1.5 hours for men and women, respectively. What is the p-value of the test? Should the null hypothesis be rejected?

.000061; the null hypothesis should be rejected
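A minimal sketch of how this p-value can be reproduced, assuming an upper-tail test (men sleep more than women) and known population standard deviations. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Standard error of (xbar_men - xbar_women) with known sigmas.
std_err = sqrt(1.25**2 / 100 + 1.5**2 / 100)

# Test statistic for H0: mu_men - mu_women = 0.
z_stat = (7.5 - 6.75) / std_err

# Upper-tail p-value from the standard normal distribution.
p_value = 1 - NormalDist().cdf(z_stat)
print(round(z_stat, 2), p_value)
```

Because the p-value is far below any common significance level, the null hypothesis is rejected.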

The manager of a grocery store has taken a random sample of 100 customers. The average length of time it took these 100 customers to check out was 3.0 minutes. It is known that the standard deviation of the population of checkout times is one minute. The standard error of the mean is equal to:

.1 1/sqrt(100)

Fast food restaurants pride themselves in being able to fill orders quickly. A study was done at a local fast food restaurant to determine how long it took customers to receive their order at the drive thru. It was discovered that the time it takes for orders to be filled is exponentially distributed with a mean of 1.5 minutes. What is the probability that it takes from 2 to 3 minutes to fill an order?

.1283
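The exponential probabilities in this set can be checked with the cumulative distribution function P(X ≤ x) = 1 - e^(-x/mean). The helper name `exp_cdf` below is hypothetical, chosen only for this sketch.

```python
from math import exp

def exp_cdf(x, mean):
    """P(X <= x) for an exponential distribution with the given mean."""
    return 1 - exp(-x / mean)

# Drive-thru question: mean 1.5 minutes, P(2 <= X <= 3).
p_between = exp_cdf(3, 1.5) - exp_cdf(2, 1.5)

# Grocery question: mean 3.5 minutes, P(X > 5).
p_more_than_5 = 1 - exp_cdf(5, 3.5)

print(round(p_between, 4), round(p_more_than_5, 4))
```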

A local entrepreneur would like to know if those who live in an urban or rural community are more likely to buy a real Christmas tree. He takes a random sample of 100 people who reside in the city and a separate random sample of 100 people who live in the village and asks them if they buy a real tree at Christmas time. Of the urban participants, 22 buy a real tree. Of the rural participants, 28 buy a real tree. Let p1 = the proportion of all people who live in rural communities and buy a real Christmas tree, and let p2 = the proportion of all people who live in urban communities and buy a real Christmas tree. What is the p-value?

.16 The p-value is .16 based upon a standard normal curve with a test statistic of .98
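The test statistic and p-value quoted above can be reproduced with a pooled two-proportion z test. This sketch assumes an upper-tail alternative (p1 > p2), which is the reading that yields .16. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_rural, n_rural = 28, 100   # sample 1: rural
x_urban, n_urban = 22, 100   # sample 2: urban

p1 = x_rural / n_rural
p2 = x_urban / n_urban

# Pooled estimate of the common proportion under H0: p1 = p2.
p_pooled = (x_rural + x_urban) / (n_rural + n_urban)
std_err = sqrt(p_pooled * (1 - p_pooled) * (1 / n_rural + 1 / n_urban))

z_stat = (p1 - p2) / std_err
p_value = 1 - NormalDist().cdf(z_stat)   # upper-tail area
print(round(z_stat, 2), round(p_value, 2))
```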

The manager of a grocery store has taken a random sample of 100 customers. The average length of time it took these 100 customers to checkout was 3 minutes. It is known that the standard deviation of the population of checkout times is 1 minute. With a .95 probability, the sample mean will provide a margin of error of:

.196 z(α/2)(σ/sqrt(n)) = 1.96(1/sqrt(100))

Random samples of size 100 are taken from an infinite population whose population proportion is .2. The mean and standard deviation of the sample proportion are:

.2 and .04 SD of the sample proportion = sqrt((.2*.8)/100) = .04

The time it takes to ring up a customer at the grocery store follows an exponential distribution with a mean of 3.5 minutes. What is the probability that it takes more than 5 minutes to ring up a customer?

.2397

.25

The height of the probability density function for a uniform distribution ranging between 2 and 6 is:

.25 The total area must equal one. The width is 4; therefore, the height must be 1/4 or .25

A poll of 600 voters showed 210 that were in favor of stricter gun control measures. Develop a 90% confidence interval estimate for the proportion of all the voters who are in favor of stricter gun control measures.

.32 to .38
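A short sketch of the interval for a single proportion, using the usual large-sample formula p̄ ± z·sqrt(p̄(1 - p̄)/n). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

favor, n = 210, 600
p_hat = favor / n                         # point estimate, 0.35

std_err = sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
z_crit = NormalDist().inv_cdf(0.95)       # 90% confidence, two-sided
margin = z_crit * std_err

lower, upper = p_hat - margin, p_hat + margin
print(round(lower, 2), round(upper, 2))
```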

The CEO of a company wants to estimate the percent of employees who use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged on to Facebook that day. What is the point estimate of the proportion of the population who logged on to Facebook that day?

.35 53/150

For the standard normal probability distribution, the area to the left of the mean is:

.50 In a standard normal probability distribution, the mean is located at the center of the curve, which means that half of the area is below the mean and half is above the mean.

When computing the sample size needed to estimate a proportion within a given margin of error for a specific confidence level, what planning value of p* should be used when no estimate of p* is available?

.50 The planning value of p* = .50 should be used to find the sample size when no estimate of p* is available.

In a regression analysis, if SSE = 200 and SSR = 300, then the coefficient of determination is:

.60 r^2 = SSR/SST = 300/(300 + 200) = .60

A doctor records a pair of information (temperature when admitted, temperature 24 hours later) for 400 hospital patients selected at random. He finds that the average patients' temperature upon admission is 101 degrees and 99 degrees 24 hours later. The standard deviation is 1.5 degrees for temperatures when admitted and .5 degrees 24 hours later. The correlation between the two temperatures is .8. What is the covariance between the variables?

.60

Multicollinearity can cause problems if the absolute value of the sample correlation coefficient exceeds:

.7 for any two of the independent variables

A doctor would like to know if men and women got the same amount of sleep per night or if women tended to get less sleep than men. He took a random sample of 100 of his male and 100 of his female patients and asked them how many hours of sleep they got, per night, on average. The women slept an average of 6.75 hours, and the men slept an average of 7.5 hours. Suppose we also knew the population standard deviation to be 1.25 hours and 1.5 hours for men and women, respectively. What is the point estimate for the difference between the mean number of hours that women and men sleep?

.75 The point estimate for the difference between the mean number of hours that women and men sleep is the difference in the sample means: 7.5 - 6.75 = .75.

For a multiple regression model, SSR = 600 and SSE = 200. The multiple coefficient of determination is:

.75 R^2 = SSR/SST = 600/(600 + 200) = .75
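The coefficient of determination is SSR/SST, where SST = SSR + SSE. A small sketch, with the hypothetical helper name `r_squared`, makes the arithmetic explicit.

```python
def r_squared(ssr, sse):
    """Coefficient of determination: explained share of total variation."""
    sst = ssr + sse          # total sum of squares
    return ssr / sst

print(r_squared(600, 200))   # multiple regression card
print(r_squared(300, 200))   # simple regression card
```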

The correlation coefficient between two scores X and Y equals .80. If both the X scores and the Y scores are converted to z-scores, then the correlation between the z-scores for X and the z-scores for Y would be:

.80

If z is a standard normal random variable, then compute P(-1.5 < z < 1.5).

.87

In a multiple regression model, the error term ε is assumed to have a mean of:

0

The random variable x is known to be uniformly distributed between 2 and 12. Compute P(x = 6).

0 A single point is an interval of zero width. This implies that the probability of a continuous random variable assuming any particular value exactly is zero

The range of the Durbin-Watson statistic is:

0 to 4

The tests of significance in regression analysis are based on assumptions about the error term ɛ. One such assumption is that the error term ɛ is a random variable with a mean or expected value of:

0

The newest model of smart car is supposed to get excellent gas mileage. A thorough study showed that gas mileage (measured in miles per gallon) is normally distributed with a mean of 75 miles per gallon and a standard deviation of 10 miles per gallon. What is the probability that, if driven normally, the car will get at least 100 miles per gallon?

0.006 The z-score that corresponds with the car getting 100 mpg is z = 2.5. P(z ≥ 2.5) = .006.

A company runs two production lines, A and B, for packaging canned vegetables. If a can has a dent or any visual imperfection, it is considered to be defective. From production line A, a random sample of 200 cans are selected, and it is determined that 12 are defective. From production line B, a random sample of 300 cans are selected, and it is determined that 21 are defective. What is the point estimate of the difference between the two population proportions?

0.01

A sample of 100 footballs showed an average air pressure of 13 psi. The standard deviation of the population is known to be .25 psi. What is the standard error of the mean?

0.025 SD/sqrt of n

The CEO of a company wants to estimate the percent of employees who use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged onto Facebook that day. What is the estimate of the standard error of the proportion?

0.039

A random sample of 150 people was taken from a very large population. Ninety of the people in the sample were female. The standard error of the proportion is:

0.0400 sqrt((.6*.4)/150)

A sample of 100 footballs showed an average air pressure of 13 psi. The standard deviation of the population is known to be .25 psi. With a .90 probability, the margin of error is approximately equal to:

0.041 z(α/2)(σ/sqrt(n)) = 1.645(.25/sqrt(100))

A sample of 400 observations will be taken from an infinite population. The population proportion equals .8. The probability that the sample proportion will be greater than .83 is:

0.0668 The standard error is sqrt((.8*.2)/400) = .02, so z = (.83 - .80)/.02 = 1.5 and P(z > 1.5) = .0668
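For sampling-distribution questions like this one, the sample proportion is treated as approximately normal with mean p and standard deviation sqrt(p(1 - p)/n). A sketch under that assumption, with illustrative names:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.8, 400
std_err = sqrt(p * (1 - p) / n)          # 0.02

# Normal approximation to the sampling distribution of the sample proportion.
sampling_dist = NormalDist(mu=p, sigma=std_err)
prob_above = 1 - sampling_dist.cdf(0.83)
print(round(prob_above, 4))
```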

0.076

0.4866

The random variable x is known to be uniformly distributed between 2 and 12. Compute P(x > 6)

0.60 P(x > 6) = (width)(height) = (6)(1/10) = .60. The probability of a continuous random variable assuming a value in any interval is the same whether or not the end points are included

The random variable x is known to be uniformly distributed between 2 and 12. Compute P(x ≥ 6).

0.60 P(x ≥ 6) = (width)(height) = (6)(1/10) = .60

Below is a portion of the computer output for a regression analysis relating y = number of people who use the public pool to x = the outside temperature. What is the value of sb1 ?

0.9038

A sample of 66 observations will be taken from an infinite population. The population proportion equals .12. The probability that the sample proportion will be less than .1768 is:

0.92

If z is a standard normal random variable, then compute P(z > -1.75).

0.96

A health conscious student faithfully wears a device that tracks his steps. Suppose that the number of steps he takes is normally distributed with a mean of 10,000 and a standard deviation of 1500 steps. What is the probability that he takes less than 15,000 steps in a given day?

0.9996 The z-score for 15,000 steps is z = 3.33. P(z ≤ 3.33) = .9996

Male college basketball players have to weigh-in during season, and this information is published. We can, therefore, know the standard deviation of the entire population. Suppose we do not know the population mean and wanted to estimate it. Suppose we took a random sample of 25 male college basketball players and recorded their weights. The sample mean was found to be 220 lbs. The population standard deviation was 5 lbs. What is the standard error of the mean?

1 SD/sqrt of n 5/sqrt25

Assuming a bell-shaped distribution with mean of 66 and standard deviation of 2, calculate the z-score for X = 68 and X = 69, respectively.

1 and 1.5 The z-scores for the two X values are 1 and 1.5, respectively

A simple random sample of 100 observations was taken from a large population. The sample mean and the standard deviation were determined to be 80 and 12, respectively. The standard error of the mean is:

1.2 s/sqrt(n) = 12/sqrt(100)

1.414

1.5

When dealing with the problem of nonconstant variance, we can apply the reciprocal transformation, which means that we use:

1/y as the dependent variable instead of y

Suppose a 95% confidence interval, based upon a sample of size 25, for the mean number of hours of sleep that college students get per night was 5 to 8 hours. How many students should be surveyed in order to cut the width of the interval down from 3 hours to 1.5 hours?

100 4x25

A professor would like to estimate the average number of hours his students spend doing work and studying throughout the semester for the course he teaches. He wants to estimate μ with 99% confidence and with a margin of error of at most 2 hours. From past experience, he believes that the standard deviation of the number of hours students spent is 8 hours. How many students need to be surveyed to meet these requirements?

107

An elementary school teacher asked a random sample of 12 of her students what their favorite number was. Assume the population of responses would follow a normal distribution. The students stated that their favorite numbers are: 2,10,7,4,0,5,6,4,4,6,1,100 What is the appropriate degrees of freedom to use when calculating a 95% confidence interval for μ ?

11 DF=n-1=11

The sample size needed to provide a margin of error of 2 or less with 95% confidence when the population standard deviation is equal to 11 is:

117 n = (z(α/2))^2 σ^2 / E^2 = (1.96)^2 (11)^2 / 2^2 = 116.2, rounded up to 117
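The sample-size cards for a mean all use n = (z·σ/E)^2, rounded up to the next whole number. The function name `sample_size_for_mean` below is hypothetical, and the sketch assumes σ is a usable planning value.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n whose margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

print(sample_size_for_mean(11, 2, 0.95))    # sigma 11, E = 2
print(sample_size_for_mean(8, 2, 0.99))     # professor's study-hours card
print(sample_size_for_mean(2, 0.25, 0.95))  # credit card age, E = 3 months
```

Always round up: rounding down would leave the margin of error slightly larger than requested.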

The following estimated regression model was developed to predict yearly income (in $1,000s) of 30 individuals with their age (x1) and their gender (x2) (0 if male and 1 if female) for a sample of 50 engineers. y=-10+4x1+7x2 What is the estimated income of a 30-year-old female?

$117,000 yhat = -10 + 4(30) + 7(1) = 117, measured in $1,000s

Suppose we have the following data: 12, 17, 13, 25, 16, 21, 30, 14, 16, and 18. To find the 10% trimmed mean, what numbers should be deleted from the calculation?

12 and 30

A health conscious student faithfully wears a device that tracks his steps. Suppose that the number of steps he takes is normally distributed with a mean of 10,000 and a standard deviation of 1500 steps. How many steps would he have to take to make the cut for the top 5% for his distribution?

12,467 To be at the top 5% of his distribution, he would need a z-score of 1.645. This equates to taking 12,467 steps
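The normal-distribution cards about the step tracker can be checked with `statistics.NormalDist`, which offers both `cdf` and `inv_cdf`. The variable names are illustrative.

```python
from statistics import NormalDist

steps = NormalDist(mu=10_000, sigma=1_500)

# Cutoff for the top 5%: the 95th percentile of the distribution.
top_5_cutoff = steps.inv_cdf(0.95)

# Probability of fewer than 15,000 steps in a day.
p_under_15000 = steps.cdf(15_000)

print(round(top_5_cutoff), round(p_under_15000, 4))
```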

The time it takes to ring up a customer at the grocery store follows an exponential distribution with a mean of 3.5 minutes. What is the variance of this distribution?

12.25 SD=3.5 variance=3.5^2=12.25

12.42 +- 2.201 (27.7/sqrt12)

Given the following information: Standard deviation = 8 Coefficient of variation = 64% The mean is

12.5 SD/CV

A sample of 100 footballs showed an average air pressure of 13 psi. The standard deviation of the population is known to be .25 psi. The 99% confidence interval for the true mean air pressure for all footballs is:

13 +- 2.576 (.25/sqrt100)

Below is a portion of the computer output for a regression analysis relating y = number of people who use the public pool to x = the outside temperature. Predict approximately how many people will use the public pool in a day when the temperature is 90 degrees.

131

The following information regarding the number of semester hours taken from random samples of day and evening students is provided. We would like to know if the difference in the mean semester hours taken by the two groups of students is statistically significant at the ⍺ = .05 level. What is the test statistic?

14.08

The following results are for independent random samples taken from two populations. We are not willing to assume that the population standard deviations are equal. What is the appropriate degrees of freedom?

143 Using the degrees of freedom formula, the appropriate degrees of freedom for this set of statistics is 143.97, which we round down to 143. See Section 10.2, Inferences About the Difference Between Two Population Means and SD Unknown.

There are 6 children in a family. The number of children defines a population. The number of simple random samples of size 2 (without replacement) that are possible is equal to:

15 C(6, 2) = 6!/(2!4!) = 15

A simple random sample of 64 observations was taken from a large population. The sample mean and standard deviation were determined to be 320 and 120, respectively. The standard error of the mean is:

15 s/sqrt(n) = 120/sqrt(64)

The standard deviation of a sample was reported to be 20. The report indicated that Σ(xi - xbar)^2 = 7200. What is the sample size?

19 s^2 = Σ(xi - xbar)^2/(n - 1), so 400 = 7200/(n - 1), n - 1 = 18, and n = 19

A random sample of 144 observations has a mean of 20, a median of 21, and a mode of 22. The population standard deviation is known and is equal to 4.8. The 95.44% confidence interval for the population mean is:

19.200 to 20.800
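A sketch of this interval with σ known: a 95.44% confidence level corresponds to z ≈ 2, so the margin of error is about 2(4.8/sqrt(144)) = .8. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 20, 4.8, 144
confidence = 0.9544

z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # close to 2
margin = z_crit * sigma / sqrt(n)

lower, upper = x_bar - margin, x_bar + margin
print(round(lower, 1), round(upper, 1))
```

The median and mode given in the question play no part in the calculation.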

2.25 mins variance=SD^2=1.5^2

2.28% The z-score for 13,000 steps is z = 2. P(z ≥ 2) = 2.28%

In an interval estimation for a proportion of a population, the z value for 99% confidence is:

2.576

The z value for a 99% confidence interval estimation is:

2.58 The associated 𝛼 for 99% confidence is .01 because 1 - .99 = .01. To find the z value, we must look up 𝛼/2, or .005, in a standard normal table. The z value for a 99% confidence interval estimation is 2.58

A statistics teacher started class one day by drawing the names of 10 students out of a hat and asked them to do as many pushups as they could. The 10 randomly selected students averaged 15 pushups per person with a standard deviation of 9 pushups. Suppose the distribution of the population of number of pushups that can be done is approximately normal. What is the standard error of the mean?

2.846 s/sqrt(n) 9/sqrt10

The random variable x is known to be uniformly distributed between 2 and 12. Compute the standard deviation of x.

2.887 The variance of a random variable that is uniformly distributed is Var(x) = (b - a)^2/12 = (12 - 2)^2/12 = 8.333. The standard deviation is the square root of the variance, so the standard deviation is 2.887
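The uniform-distribution cards (x between 2 and 12) all follow from three formulas: E(x) = (a + b)/2, Var(x) = (b - a)^2/12, and probability = width × height. A sketch with illustrative names:

```python
from math import sqrt

a, b = 2, 12                      # endpoints of the uniform distribution

height = 1 / (b - a)              # constant density, 0.1
mean = (a + b) / 2
variance = (b - a) ** 2 / 12
std_dev = sqrt(variance)

# P(x > 6) is the area of the rectangle from 6 to 12.
p_greater_6 = (b - 6) * height

print(mean, round(variance, 3), round(std_dev, 3), round(p_greater_6, 2))
```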

In order to determine an interval for the mean of a population with unknown standard deviation, a sample of 24 items is selected. The mean of the sample is determined to be 23. The number of degrees of freedom for reading the t value is:

23 n-1

A researcher is interested to determine the average age at which people obtain their first credit card. If past information shows a mean of 22 years and a standard deviation of 2 years, what size sample should be taken so that at 95% confidence the margin of error will be 3 months or less?

246

Suppose that a basketball player scored, on average, 15 points per game. Also suppose that the distribution of points scored by this player was normal. If he scores 20 points or more 4.78% of the time, what is his standard deviation?

3 An upper-tail area of .0478 corresponds to z ≈ 1.67, so σ = (20 - 15)/1.67 ≈ 3

The t value for a 99% confidence interval estimation based upon a sample of size 10 is:

3.250 The t value for a 99% confidence interval estimation based upon a sample of size 10 is 3.250. Use a t distribution table with 9 degrees of freedom and an upper tail area of .005 to look up this value.

The time it takes to ring up a customer at the grocery store follows an exponential distribution with a mean of 3.5 minutes. What is the standard deviation of this distribution?

3.5 A property of the exponential distribution is that the mean of the distribution and the standard deviation of the distribution are equal, so the standard deviation is 3.5 minutes

The summary statistics for the hourly wages of a sample of 130 system analysts are given below: Mean = 60 Range = 20 Mode = 73 Variance = 324 Median = 74 The coefficient of variation is equal to:

30% The coefficient of variation is the standard deviation (the square root of 324, or 18) divided by the mean (60), multiplied by 100

A multiple regression model has the form yhat=5+6x1-7x2. Predict when and x1 and x2.

A data set has the following five-number summary: {31, 50, 58, 62, and 87}. Which of the following pairs of values in this data set would be considered outliers?

31 and 81 An outlier is any value that is more than 1.5(IQR) above Q3 or below Q1. Here IQR = 62 - 50 = 12, so the limits are 50 - 18 = 32 and 62 + 18 = 80. The values 31 and 81 fall outside these limits
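The 1.5(IQR) rule can be sketched directly from the five-number summary. The helper name `is_outlier` is hypothetical.

```python
# Five-number summary from the card: min, Q1, median, Q3, max.
minimum, q1, median, q3, maximum = 31, 50, 58, 62, 87

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

def is_outlier(value):
    """True when the value lies outside the 1.5(IQR) limits."""
    return value < lower_limit or value > upper_limit

print(lower_limit, upper_limit, is_outlier(31), is_outlier(81), is_outlier(58))
```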

A local health center noted that in a sample of 400 patients 80 were referred to them by the local hospital. What size sample would be required to estimate the proportion of hospital referrals with a margin of error of .04 or less at 95% confidence?

385
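For a proportion, the sample size formula is n = z^2 · p*(1 - p*) / E^2, rounded up. The sketch below uses the earlier sample (80 of 400, so p* = .20) as the planning value. The function name `sample_size_for_proportion` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p_star, margin, confidence):
    """Smallest n giving a margin of error of at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p_star * (1 - p_star) / margin**2)

p_star = 80 / 400    # planning value from the earlier sample
print(sample_size_for_proportion(p_star, 0.04, 0.95))
```

When no planning value is available, p* = .50 gives the largest (most conservative) sample size.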

A researcher is interested to determine the average number of years new teachers remain in the classroom. If past information shows a standard deviation of 5 years, what size sample should be taken so that at 95% confidence the margin of error will be 6 months or less?

385

An analyst for a cell phone company would like to know the average age at which people obtain their first cell phone. Past records show a mean of 14 years and a standard deviation of 8 years. What sample size should be taken so that at 99% confidence the margin of error will be 1 year or less?

425

Growth factors for the population of Dallas in the past five years have been 1, 2, 3, 4, and 5, respectively. The geometric mean is:

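fifth root of 120 (approximately 2.61) The geometric mean of n growth factors is the nth root of their product: 1(2)(3)(4)(5) = 120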
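The geometric mean of n growth factors is the nth root of their product. A quick check using `statistics.geometric_mean` (available in Python 3.8 and later):

```python
from math import prod
from statistics import geometric_mean

growth_factors = [1, 2, 3, 4, 5]

product = prod(growth_factors)                  # 120
by_hand = product ** (1 / len(growth_factors))  # fifth root of 120
from_library = geometric_mean(growth_factors)

print(product, round(by_hand, 2), round(from_library, 2))
```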

How many simple random samples of size 5 can be selected from a population of size 8?

56 C(8, 5) = 8!/(5!3!) = 56

The following information regarding the number of semester hours taken from random samples of day and evening students is provided. Find a 95% confidence interval estimate for the difference between the mean semester hours taken by the two groups of students.

6.354 to 8.446

The CEO of a company wants to estimate the percent of employees who use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged onto Facebook that day. The CEO would like to cut the margin of error in half without changing the confidence level. How large of a sample size is needed?

600 To cut the margin of error in half, the sample size must quadruple. n = 600.

The above is a histogram showing the actual frequency of an average college professor's salary from 50 randomly selected colleges. Based on the frequency histogram for salaries, the bin that contains the 80th percentile is:

63-68 The value 65 is at the 80th percentile. The location of the pth percentile can be calculated using the formula Lp= (p/100)(n+1) . See Section 3.1, Measures of Location

The heights of adult women are approximately normally distributed about a mean of 65 inches, with a standard deviation of 2 inches. If Rachel is at the 95th percentile in height for adult women, then her height is closest to:

69 inches The 95th percentile of a normal distribution lies about 1.645 standard deviations above the mean, or 65 + 1.645(2) ≈ 68.3 inches, so 69 inches is the closest value.

In a sample of 500 students whose mean height is 67.8 inches, 150 were women. If the mean height of the women was 63.0 inches, what is the mean height of the men in the sample?

69.9 inches Solve (150(63) + 350x)/500 = 67.8 for x: x = (500(67.8) − 150(63))/350 = 24,450/350 ≈ 69.9.

The random variable x is known to be uniformly distributed between 2 and 12. Compute E(x).

7 E(x) = (a + b)/2 = (2 + 12)/2 = 7

The newest model of smart car is supposed to get excellent gas mileage. A thorough study showed that gas mileage (measured in miles per gallon) is normally distributed with a mean of 75 miles per gallon and a standard deviation of 10 miles per gallon. What value represents the 50th percentile of this distribution?

75 miles per gallon A normal distribution is symmetric, so the 50th percentile (the median) equals the mean.

A survey conducted at a local university asked students how many hours they studied per week. The survey showed that on average, students at this university study 7.5 hours per week with a standard deviation of 1.25 hours. What percentage of students study between 5 hours and 10 hours per week?

75% The data values of 5 and 10 hours per week are 2 standard deviations away from the mean. According to Chebyshev's theorem, at least 75% of the data values must be within 2 standard deviations of the mean.
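
Chebyshev's theorem gives a lower bound of 1 − 1/k² for the share of values within k standard deviations of the mean, for any distribution. A small sketch of that arithmetic, with the illustrative helper `chebyshev_lower_bound`:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k**2

mean, sd = 7.5, 1.25
k = (10 - mean) / sd          # 10 hours is k standard deviations above the mean
bound = chebyshev_lower_bound(k)
print(k, bound)
```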

The following results are for independent random samples taken from two populations. We are not willing to assume that the population standard deviations are equal. What is the appropriate degrees of freedom?

The following data show the results of an aptitude test and the grade point average of 10 students. The t test for a significant relationship between GPA and Aptitude Test Score is based on a t distribution with _____ degrees of freedom.

8 A t test for slope is based on a t distribution with n - 2 degrees of freedom. The sample size for this problem is n = 10, so there are 8 degrees of freedom.

8.56 to 21.44 15 ± 2.262(9/√10)
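
The interval above can be reproduced from the numbers in the card's own formula: a sample mean of 15, a sample standard deviation of 9, n = 10, and the tabled t value 2.262 (9 degrees of freedom, 95% confidence). The t value is taken from the card rather than computed, since the Python standard library has no t distribution.

```python
import math

x_bar, s, n = 15, 9, 10
t_crit = 2.262  # t table value for 9 degrees of freedom, as given on the card

margin = t_crit * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(round(lower, 2), round(upper, 2))
```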

An outlier should remain in the data set under what circumstances?

An outlier should remain in the data set when an unusual data value has been recorded correctly.

The following information regarding the number of semester hours taken from random samples of day and evening students is provided. What is the appropriate degrees of freedom for calculating a 95% confidence interval for the difference between the mean semester hours taken by the two groups of students?

The following regression model has been proposed to predict sales at a gas station: ŷ = 10 − 4x1 + 7x2 + 18x3, where x1 = competitor's previous day's sales (in $1,000s), x2 = population within five miles (in 1,000s), x3 = 1 if any form of advertising was used, 0 if otherwise, and y = sales (in $1,000s). Predict sales (in dollars) for a store with competitor's previous day's sales of $3,000, a population of 10,000 within five miles, and six radio advertisements.

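$86,000 ŷ = 10 − 4(3) + 7(10) + 18(1) = 86, and sales are measured in $1,000s. Six radio advertisements still give x3 = 1, because x3 only records whether any advertising was used.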
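
The substitution can be sketched as follows. The function name and the `radio_ads` variable are illustrative; the key step is that the dummy variable x3 is 1 whenever any advertising is used, regardless of how many advertisements ran.

```python
def predict_sales_thousands(x1, x2, x3):
    """Estimated regression equation from the card; result is in $1,000s."""
    return 10 - 4 * x1 + 7 * x2 + 18 * x3

radio_ads = 6
x3 = 1 if radio_ads > 0 else 0   # dummy variable: any advertising at all

predicted_dollars = predict_sales_thousands(x1=3, x2=10, x3=x3) * 1000
print(predicted_dollars)
```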

A student had scores of 85, 56, and 91 on her first three statistics tests. What score does she need to get on her next test to have a test average of 80?

88 (85+56+91+x)/4 = 80

Suppose, after calculating an estimated multiple regression equation, we find that the value of R^2 is .9201. Interpret this value.

92.01% of the variability in y can be explained by the estimated regression equation.

With negative autocorrelation, we expect a positive residual in one period to be followed by:

A negative residual in the next period, then a positive residual, and so on

Which of the following scenarios follows a matched sample design?

A teacher uses a pretest and then a posttest with her students to see how much they have improved.

Which of the following describes a Type II Error?

Accept H0 when H0 is false.

The following data show the results of an aptitude test and the grade point average of 10 students. If GPA and Aptitude Test Scores are linearly related, which of the following must be true?

β1 ≠ 0 If x and y are linearly related, then β1 ≠ 0.

The Durbin-Watson test is generally inconclusive for:

smaller sample sizes

The fact that the sampling distribution of sample means can be approximated by a normal probability distribution whenever the sample size becomes large is based on the:

Central limit theorem

A data set has 10 observations. Observation 5 seems much larger than the other observations. When looking back at the data, you notice a mistake has been made in recording observation 5. Including this wrongly recorded observation in the data set has a substantial effect on the goodness of fit. What should you do with the wrongly recorded observation?

Change it to the correctly recorded observation.

What test can be used to determine whether first-order autocorrelation is present?

Durbin-Watson test

Which of the following would not be a correct interpretation of correlation coefficient of r = -.89?

Eighty-nine percent of the variation between the two variables in the linear regression is explained.

A regression model involving 8 independent variables for a sample of 69 periods resulted in the following sum of squares: SSE = 306, SST = 1800. At α = .05, test to determine whether or not the model is significant. State the F value and your conclusion.

F = 36.62; p-value < .05. The model is significant.
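
The F value follows from SSR = SST − SSE, MSR = SSR/p, and MSE = SSE/(n − p − 1). A minimal sketch of that arithmetic is below; it computes only the test statistic, and the p-value comparison is left to an F table or statistical software.

```python
def overall_f_statistic(sse, sst, n, p):
    """F = MSR / MSE for the overall significance test in multiple regression."""
    ssr = sst - sse                # sum of squares due to regression
    msr = ssr / p                  # p numerator degrees of freedom
    mse = sse / (n - p - 1)        # n - p - 1 denominator degrees of freedom
    return msr / mse

f_stat = overall_f_statistic(sse=306, sst=1800, n=69, p=8)
print(round(f_stat, 2))
```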

The following data show the results of an aptitude test and the grade point average of 10 students. At 95% confidence, test to determine if the model is significant (Perform an F test). What is the test statistic and p-value ?

F = 39.07 and p-value =0.0002 The test statistic is F = 39.07, and the p-value = .0002. F = MSR/MSE. The p-value is based on an F distribution with 1 degree of freedom in the numerator and n − 2 degrees of freedom in the denominator.

Which of the following is(are) true I. The mean of a population depends on the particular sample chosen. II. The standard deviations of two different samples from the same population may be the same. III. Statistical inferences can be used to draw conclusions about the populations based on sample data.

II and III

Which of the following statements about the backward elimination procedure is false?

It begins with zero independent variables

Which of these best describes a sampling distribution of a statistic?

It is the distribution of all of the statistics calculated from all possible samples of the same sample size.

When drawing a sample from a population, the goal is for the sample to:

Match the targeted population

The proportion of the variability in the dependent variable that can be explained by the estimated multiple regression equation is called the:

Multiple coefficient of determination

The study of how a dependent variable y is related to two or more independent variables is called:

Multiple regression analysis

The daily production rates for a sample of factory workers before and after a training program are shown below. Let d = After - Before. We want to determine if the training program was effective. Conduct a hypothesis test using ⍺ = .05. What is your conclusion?

The p-value is less than .05, so we have enough evidence to conclude that the training program was effective.

A researcher is trying to decide whether or not to add another variable to his model. He currently has a first-order model with two predictor variables based upon a sample of 28 observations. For this model, SSE = 1425. Then, he estimated the data with a first-order model with an additional predictor variable x3. The SSE for the new model is 1300. We would like to know if the addition of the third predictor results in a significant reduction in the error sum of squares. What is the p-value? State your conclusion. Use α = .05.

The p-value is greater than .10. We do not have enough evidence to conclude that the inclusion of x3 results in a significant reduction in the error sum of squares.
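
The test statistic behind this conclusion can be sketched as below, assuming the standard F test for added variables: the drop in SSE per added predictor, divided by the MSE of the full model. The helper name is illustrative. The statistic comes out to roughly 2.3 with 1 and 24 degrees of freedom, and the p-value itself would come from an F table or statistical software.

```python
def partial_f_statistic(sse_reduced, sse_full, n, p_full, added):
    """F statistic for testing whether added predictors reduce SSE."""
    numerator = (sse_reduced - sse_full) / added      # df = number of predictors added
    denominator = sse_full / (n - p_full - 1)         # MSE of the full model
    return numerator / denominator

f_one = partial_f_statistic(sse_reduced=1425, sse_full=1300, n=28, p_full=3, added=1)
print(round(f_one, 2))
```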

A local entrepreneur would like to know if those who live in an urban or rural community are more likely to buy a real Christmas tree. He takes a random sample of 100 people who reside in the city and a separate random sample of 100 people who live in the country and asks them if they buy a real tree at Christmas time. Of the urban participants, 22 buy a real tree. Of the rural participants, 28 buy a real tree. Let p1 = the proportion of all people who live in rural communities and buy a real Christmas tree and let p2 = the proportion of all people who live in urban communities and buy a real Christmas tree. Do we have enough evidence at the ⍺ = .05 level to conclude that the proportion of people who buy a real Christmas tree is greater for those living in rural than urban communities?

No, the p-value is greater than .05, so we do not have enough evidence to conclude that the proportion of people who buy a real Christmas tree is greater for those living in rural than urban communities.
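
A sketch of the calculation, assuming the usual pooled two-proportion z test for an upper-tail alternative (p1 − p2 > 0, with population 1 = rural). The function name is illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_upper_tail(x1, n1, x2, n2):
    """Pooled z statistic and upper-tail p-value for Ha: p1 - p2 > 0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Rural: 28 of 100 buy a real tree; urban: 22 of 100.
z, p_value = two_proportion_upper_tail(28, 100, 22, 100)
print(round(z, 2), round(p_value, 4))
```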

A company runs two production lines, A and B, when packaging canned vegetables. If a can has a dent or any visual imperfection it is considered to be defective. From production line A, a random sample of 200 cans are selected, and it is determined that 12 are defective. From production line B, a random sample of 300 cans are selected, and it is determined that 21 are defective. Develop a 90% confidence interval for the difference between the two population proportions. Based upon the calculated confidence interval, do we have evidence to conclude that the proportion of defective cans differs for the two production lines?

No, the value of zero is contained in the 90% confidence interval, so we do not have the evidence to conclude that the proportion of defective cans differs for the two production lines.
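
The interval can be sketched with the unpooled standard error, which is the usual choice for a confidence interval on p1 − p2. Variable names are illustrative; line A is treated as population 1.

```python
import math
from statistics import NormalDist

p_a, n_a = 12 / 200, 200   # line A: 12 defective out of 200
p_b, n_b = 21 / 300, 300   # line B: 21 defective out of 300

z = NormalDist().inv_cdf(0.95)   # 90% confidence leaves 5% in each tail
se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

diff = p_a - p_b
lower, upper = diff - z * se, diff + z * se
print(round(lower, 3), round(upper, 3))
```

Because the interval runs from a negative value to a positive value, zero is a plausible value for the difference.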

A sample of 92 observations is taken from an infinite population. The sampling distribution of x̄ is approximately normal because:

of the central limit theorem

Which of the following statements is false?

Regression analysis can be interpreted as a procedure for establishing a cause-and-effect relationship between variables.

Which of the following describes a Type I Error?

Reject H0 when H0 is true.

The following data represent a company's yearly sales volume and its advertising expenditure over a period of 8 years. Identify the independent and dependent variables.

The independent variable is the advertising expenses, and the dependent variable is sales.

A student was performing an experiment that compared a new high protein food to the old food for pigs. He found the mean weight gain for subjects consuming the new food to be 12.8 pounds with a standard deviation of 3.5 pounds. Later, he realized that the scale was out of calibration by 1.5 pounds (meaning that the scale weighed items 1.5 pounds more). What should the mean and standard deviation be for the subjects consuming the new food?

The mean should be 11.3, and the standard deviation should remain unchanged

When testing water for chemical impurities, results are often reported as BDL, that is, below detection limit. The following data set gives the measurements of the amount of lead in a series of water samples taken from inner-city households (ppm): 5, 7, 12, BDL, 10, 8, BDL, 20, 6 Which of the following is correct?

The median lead level in the water is 7 ppm. First, order the values from least to greatest: BDL, BDL, 5, 6, 7, 8, 10, 12, and 20. The middle value, 7, is the median
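
The median can still be found because it depends only on the ordering of the values. The sketch below assumes every BDL reading is lower than all measured values, which holds here if the detection limit is below 5 ppm; negative infinity is used purely as an ordering placeholder.

```python
from statistics import median

readings = [5, 7, 12, "BDL", 10, 8, "BDL", 20, 6]

# Replace BDL with a placeholder that sorts below every measured value.
ranked = [float("-inf") if r == "BDL" else r for r in readings]

med = median(ranked)   # middle (5th) of the 9 ordered values
print(med)
```

The mean, by contrast, cannot be computed without knowing the actual BDL values.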

When carrying out an F test to determine if the addition of extra predictor variables results in a significant reduction in the error sum of squares, what are the degrees of freedom of the numerator and denominator of the F statistic?

The numerator degrees of freedom equals the number of predictors added to the model. The denominator degrees of freedom is n − p − 1.

The numerator has 1 degree of freedom, and the denominator has 24 degrees of freedom.

Below is a portion of the computer output for a regression analysis relating y = number of people who use the public pool to x = the outside temperature. Test for a significant relationship between the number of people who use the public pool and the outside temperature. Use ⍺ =.05. State your conclusion.

The p-value < .05. From the computer printout, the p-value = .000. The data provide evidence of a significant relationship between the number of people who use the public pool and the outside temperature.

A researcher would like to know whether or not the addition of three variables to a model will result in a significant reduction in the error sum of squares. She currently has a first-order model with two predictor variables based upon a sample of 25 observations. For this model, SSE = 725. Then, she estimated the relationship with a first-order model with three additional predictor variables x3, x4, and x5. The SSE for the new model is 320. Determine the p-value. State your conclusion. Use α = .05.

The p-value is less than .05. We have enough evidence to conclude that the inclusion of the three additional predictor variables results in a significant reduction in the error sum of squares.

The average gasoline price of one of the major oil companies has been hovering around $2.20 per gallon. Because of cost reduction measures, it is announced that there will be a significant reduction in the average price over the next month. In order to test this belief, we wait one month, then randomly select a sample of 36 of the company's gas stations. We find that the average price for the stations in the sample was $2.15. The standard deviation of the prices for the selected gas stations is $.10. Given that the p-value is .002, state the conclusion. Use 𝛼 = 0.01.

The p-value is less than 𝛼 = .01, so we can reject H0 .
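
For context, the test statistic behind the stated p-value can be sketched as below, assuming a one-sample lower-tail t test with 35 degrees of freedom. The p-value of .002 is taken from the card rather than computed, since the Python standard library has no t distribution.

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, s, n):
    """t statistic for a test about a population mean with sigma unknown."""
    return (sample_mean - hypothesized_mean) / (s / math.sqrt(n))

t_stat = one_sample_t(sample_mean=2.15, hypothesized_mean=2.20, s=0.10, n=36)

p_value_from_card = 0.002   # given in the question, not computed here
alpha = 0.01
reject_h0 = p_value_from_card < alpha
print(round(t_stat, 2), reject_h0)
```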

The following data represent a company's yearly sales volume and its advertising expenditure over a period of 8 years. Create a scatter diagram in order to answer the following question: What does the scatter diagram indicate about the relationship between the two variables?

The scatter diagram indicates a positive relationship between advertising expenses and sales.

Which of the following statements regarding the sampling distribution of sample means is incorrect?

The standard deviation of the sampling distribution is the standard deviation of the population.

The test statistic follows a t distribution because the mean and standard deviation are both calculated from the sample

The following information regarding the number of semester hours taken from random samples of day and evening students is provided. Is there a significant difference in the mean semester hours taken by the two groups of students at the ⍺ = .05 level?

Yes, the p-value is less than the significance level of .05.

A doctor would like to know if men and women got the same amount of sleep per night or if women tended to get less sleep than men. He took a random sample of 100 of his male and 100 of his female patients and asked them how many hours of sleep they got, per night, on average. The women slept an average of 6.75 hours and the men slept an average of 7.5 hours. Suppose we also knew the population standard deviations to be 1.25 hours and 1.5 hours for men and women, respectively. Based on the sample data, can you conclude that the women get less sleep than men do, on average?

Yes, the p-value is less than ⍺ = .05, so the sample results do provide sufficient evidence to conclude that women sleep less than men do, on average.
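
A sketch of the calculation, assuming a two-sample z test with known population standard deviations and the upper-tail alternative μm − μw > 0. Variable names are illustrative.

```python
import math
from statistics import NormalDist

mean_m, mean_w = 7.5, 6.75      # sample means (hours of sleep)
sigma_m, sigma_w = 1.25, 1.5    # known population standard deviations
n_m = n_w = 100

se = math.sqrt(sigma_m**2 / n_m + sigma_w**2 / n_w)
z = (mean_m - mean_w) / se
p_value = 1 - NormalDist().cdf(z)   # upper-tail p-value
print(round(z, 2), p_value)
```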

What is the probability of making a Type I error?

The mathematical equation that explains how the dependent variable y is related to several independent variables and has the form y = β0 + β1x1 + β2x2 + ... + βpxp + ε is called:

a multiple regression model

The correlation coefficient is:

a number between -1 and 1 inclusive that measures the strength and direction of the linear relationship between two numerical variables.

The medical director of a company looks at the medical records of all 50 employees and finds that the mean systolic blood pressure for these employees is 126.07. The value of 126.07 is:

a parameter

A random sample of 121 bottles of cologne showed an average content of 4 ounces. It is known that the standard deviation of the contents (i.e., of the population) is .22 ounces. In this problem, the value .22 ounces is:

a parameter.

Positive values of covariance indicate:

a positive relation between the x values and y values.

Cluster sampling is:

a probability sampling method

A normal distribution with a mean of zero and a standard deviation of one is called:

a standard normal probability distribution

a t distribution should be used because σ is unknown

If we are interested in testing whether the proportion of items in population 1 is larger than the proportion of items in population 2, then the:

alternative hypothesis should state p1 -p2 > 0

The sum of deviations of the individual data elements from their mean is:

always equal to zero

The multiple regression equation based on the sample data, which has the form ŷ = b0 + b1x1 + b2x2 + ... + bpxp, is called:

an estimated multiple regression equation

When working with regression analysis, an outlier is:

any observation that does not fit the trend shown by the remaining data

The estimated regression equation, Yhat = -10.42 + .79x , can be used to predict a company's sales volume (y), in millions, based upon its advertising expenditure (x), in $10,000s. What is the company's predicted sales volume if they spend $500,000 on advertising?

approx $29 million $500,000 is (50)(10,000), so x = 50. Substituting x = 50 into the regression equation gives ŷ = 29.08, which is in millions of dollars. See Section 14.2, Least Squares Method.

Suppose a residual plot of x versus the residuals, y - ŷ, shows a nonconstant variance. In particular, as the values of x increase, suppose that the values of the residuals also increase. This means that:

as the values of x get larger, the ability to predict y becomes less accurate.

For a lower tail test, the p-value is the probability of obtaining a value for the test statistic:

at least as small as that provided by the sample

For a two-tailed test, the p-value is the probability of obtaining a value for the test statistic:

at least as unlikely as that provided by the sample.

Correlation in the errors that arises when the error terms at successive points in time are related is called:

autocorrelation

In interval estimation, as the sample size becomes larger, the interval estimate:

becomes narrower

As the number of degrees of freedom for a t distribution increases, the difference between the t distribution and the standard normal distribution:

becomes smaller

Using an α = .04, a confidence interval for a population proportion is determined to be .65 to .75. If the level of significance is decreased, the interval for the population proportion:

becomes wider

The variable selection procedure that identifies the best regression equation, given a specified number of independent variables, is:

best-subsets regression

Which of the following is not an iterative variable selection procedure?

best-subsets regression

Which of the following options guarantees that the best model for a given number of variables will be found?

best-subsets regression

In multiple regression analysis, the general linear model:

can be used to accommodate curvilinear relationships between the independent variables and the dependent variable.

The coefficient of determination:

cannot be negative

If two large independent random samples are taken from two populations, the sampling distribution of the difference between the two sample means:

can be approximated by a normal distribution

If the coefficient of determination is a positive value, then the coefficient of correlation:

can be either negative or positive

The value of the coefficient of correlation (r):

can be equal to the value of the coefficient of determination (r^2).

As the sample size increases, the margin of error:

decreases

In regression analysis, the variable that is being predicted is the:

dependent variable

If the value of y in time period t is related to its value in time period t - 1, we say that:

first-order autocorrelation is present

Whenever the probability of making a Type II error has not been determined and controlled, only two conclusions are possible. We either reject H0 or:

do not reject H0.

A variable used to model the effect of categorical independent variables is called a(n):

dummy variable

A simple random sample of size n from an infinite population is a sample selected such that:

each element is selected independently and is selected from the same population.

The central limit theorem is important in statistics because it:

enables reasonably accurate probabilities to be determined for events involving the sample average when the sample size is large regardless of the distribution of the variable.

The term in the multiple regression model that accounts for the variability in y that cannot be explained by the linear effect of the p independent variables is the:

error term, e

The model developed from sample data that has the form ŷ = b0 + b1x is known as the:

estimated simple linear regression equation

If a significant relationship exists between x and y and the coefficient of determination shows that the fit is good, the estimated regression equation should be useful for:

estimation and prediction

If the population follows a normal distribution, the confidence interval is _____ and can be used for any sample size. If the population does not follow a normal distribution, the confidence interval will be _____. Which of the following choices correctly complete this statement?

exact, approximate

A continuous probability distribution that is useful in describing the time, or space, between occurrences of an event is a(n):

exponential probability distribution

If arrivals follow a Poisson probability distribution, the time between successive arrivals must follow a(n):

exponential probability distribution

There is a lower limit but no upper limit for a random variable that follows the:

exponential probability distribution

For a fixed confidence level and population standard deviation, if we would like to cut our margin of error in half, we should take a sample size that is:

four times as large as the original sample size

Which of the following variables is categorical? height, gender, weight, age

gender Gender is a categorical variable. It puts individuals into groups/categories.

Looking at the sample correlation coefficients between the response variable and each of the independent variables can give us a quick indication of which independent variables are, by themselves,

good predictors

When data are positively skewed, the mean will usually be

greater than the median

A recent study found that hamburger fat calories "x" had a positive linear association with the amount of sodium "y" found in that hamburger. This can be interpreted to indicate that:

hamburgers with a low amount of fat calories tend to have a low amount of sodium.

Observations with extreme values for the independent variables are called:

high leverage points

A student wants to determine if pennies are really fair, meaning equally likely to land heads up or tails up. He flips a random sample of 50 pennies and finds that 28 of them land heads up. What are the appropriate null and alternative hypotheses?

H0: p = .5, Ha: p ≠ .5

It has been stated that at least 75 out of every 100 people who go to the movies on Saturday night buy popcorn. Identify the null and alternative hypotheses.

H0: p ≥ .75, Ha: p < .75

A local entrepreneur would like to know if those who live in a rural community are more likely to buy a real Christmas tree than those who live in an urban community. He takes a random sample of 100 people who reside in the city and a separate random sample of 100 people who live in the country and asks them if they buy a real tree at Christmas time. Of the urban participants, 22 buy a real tree. Of the rural participants, 28 buy a real tree. Let P1 = the proportion of all people who live in rural communities and buy a real Christmas tree, and let P2 = the proportion of all people who live in urban communities and buy a real Christmas tree. State the null and alternative hypotheses.

H0: p1 − p2 ≤ 0, Ha: p1 − p2 > 0

The average hourly wage of computer programmers with 2 years of experience has been $21.80. Because of high demand for computer programmers, it is believed there has been a significant increase in the average wage of computer programmers. To test whether there has been an increase, the correct hypotheses to be tested are:

H0: μ ≤ 21.80, Ha: μ > 21.80

Which of the following null hypotheses cannot be correct?

H0: μ ≠ 10

For a uniform probability density function, the height of the function:

is the same for each value of x

The average gasoline price of one of the major oil companies has been hovering around $2.20 per gallon. Because of cost reduction measures, it is announced that there will be a significant reduction in the average price over the next month. In order to test this belief, we wait one month, then randomly select a sample of 36 of the company's gas stations. We find that the average price for the stations in the sample was $2.15. The standard deviation of the prices for the selected gas stations is $.10. State the appropriate null and alternative hypotheses for testing the company's claim.

H0: μ ≥ 2.20, Ha: μ < 2.20

The average number of hours for a random sample of mail order pharmacists from company A was 50.1 hours last year. It is believed that changes to medical insurance have led to a reduction in the average work week. To test the validity of this belief, the hypotheses are:

H0: μ ≥ 50.1, Ha: μ < 50.1

A news reporter states that the average temperature in January has never dropped below 10 degrees Fahrenheit. You go online to research this claim. The appropriate hypotheses are:

H0: μ ≥ 10, Ha: μ < 10

A fast food restaurant has automatic drink dispensers to help fill orders more quickly. When the 12 ounce button is pressed, they would like for exactly 12 ounces of beverage to be dispensed. There is, however, undoubtedly some variation in this amount. The company does not want the machine to systematically over fill or under fill the cups. Which of the following gives the correct set of hypotheses?

H0: μ = 12, Ha: μ ≠ 12

A doctor would like to know if men and women got the same amount of sleep per night or if women tended to get less sleep than men. He took a random sample of 100 of his male and 100 of his female patients and asked them how many hours of sleep they got, per night, on average. The women slept an average of 6.75 hours and the men slept an average of 7.5 hours. Suppose we also knew the population standard deviations to be 1.25 hours and 1.5 hours for men and women, respectively. Formulate the null and alternative hypotheses to test whether women sleep less than men do, on average.

H0: μm − μw = 0, Ha: μm − μw > 0 Note that Ha: μm − μw > 0 is equivalent to Ha: μm > μw, which implies that the population of women sleeps less on average than the population of men. Also, the null and alternative hypotheses always need to be stated in terms of parameters, not statistics. See Section 10.1, Inferences About the Difference Between Two Population Means: σ1 and σ2 Known.

The central limit theorem states that:

if the sample size n is large, then the sampling distribution of the sample mean can be approximated by a normal distribution.

A multiple regression model has the form ŷ = 5 + 6x1 − 7x2. As x1 increases by 1 unit (holding x2 constant), the dependent variable is expected to:

increase by 6 units As x1 increases by 1 unit (holding x2 constant), the dependent variable is expected to increase by 6 units.

In general, higher confidence levels provide larger confidence intervals. One way to have high confidence and a small margin of error is to:

increase the sample size

A regression model between sales (y in $1,000) and unit price (x1 in dollars) and television advertisement (x2 in dollars) resulted in the following function: ŷ = 5 − 3x1 + 4x2. The coefficient of the unit price indicates that if the unit price is:

increased by $1 (holding advertisement constant), the sales are expected to decrease by $3,000.

In general, R^2 always _____ as independent variables are added to the regression model.

increases

We can reduce the margin of error in an interval estimate of p by doing any of the following except:

increasing the planning value p* to .5

The tests of significance in regression analysis are based on assumptions about the error term ɛ. One such assumption is that the values of ɛ are:

independent

In a multiple regression model, the values of the error term, ε, are assumed to be:

independent of each other

Regarding inferences about the difference between two population means, the alternative to the matched sample design is:

independent samples

When we conduct significance tests for a multiple regression relationship, the t test can be conducted for each of the independent variables in the model. Each of those tests are called tests for:

individual significance

For a(n) _____ , it is impossible to construct a sampling frame.

infinite population

An observation that has a strong influence or effect on the regression results is called a(n):

influential observation

The effect of two independent variables acting together is called:

interaction

An approximate value of a population parameter that provides limits believed to contain the value of the parameter is known as the:

interval estimate

An elementary school teacher asked a random sample of 12 of her students what their favorite number was. Assume the population of responses would follow a normal distribution. The students stated that their favorite numbers are: 2,10,7,4,0,5,6,4,4,6,1,100 Suppose we were to create a 95% confidence interval for μ. What effect does the value 100 have on the width of the confidence interval?

it makes the interval wider

Which of the following is a nonprobability sampling technique?

judgement sampling

If a categorical variable has k levels, then:

k - 1 dummy variables are needed

Larger values of r^2 imply that the observations are more closely grouped about the

least squares line

The method used to develop the estimated regression equation that minimizes the sum of squared residuals is called the:

least squares method

The pth percentile is a value such that at least p percent of the observations are:

less than or equal to this value

The probability of making a Type I error when the null hypothesis is true as an equality is called the:

level of significance

The probability that the interval estimation procedure will generate an interval that does not contain µ is known as the:

level of significance

The tests of significance in regression analysis are based on several assumptions about the error term ɛ. Additionally, we make an assumption about the form of the relationship between x and y. We assume that the relationship between x and y is:

linear

The value added and subtracted from a point estimate in order to develop an interval estimate of the population parameter is known as the:

margin of error

A researcher recruits 25 people to participate in a study on alcohol consumption and its interactions with Tylenol. The 25 participants had to come to a check-in center every day at 7:00 a.m. for one week. They were given various amounts of alcohol. Each day, each participant would flip a coin to determine if they also took Tylenol with their alcohol. They found that their BAC was 25% higher on days when they were given Tylenol with their alcohol than when they drank alcohol alone. This is an example of a(n):

matched sample design

A company wants to identify which of the two production methods has the smaller completion time. One sample of workers is selected and each worker first uses one method and then uses the other method. The sampling procedure being used to collect completion time data is based on:

matched samples

The measure of location that is most likely to be influenced by extreme values in a data set is the:

mean

Which of the following is not resistant to the outliers in a data set?

mean

Which of the following provides a measure of central location for the data?

mean

When a set of data has suspected outliers, which of the following is the preferred measure of central tendency?

median

When the sample size is large, the sampling distribution of (pbar1 -pbar2) is approximated by a:

normal distribution

A ________ is a graph of the standardized residuals plotted against values of the normal scores. This helps to determine whether the assumption that the error term has a normal probability distribution appears to be valid.

normal probability plot

In a multiple regression model, the values of the error term, ε, are assumed to be:

normally distributed

The normal probability distribution can be used to approximate the sampling distribution of pbar as long as:

np ≥ 5 and n(1 - p) ≥ 5
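
The rule of thumb can be written as a small check. This is only a sketch; the function name `normal_approx_ok` and the example inputs are hypothetical.

```python
def normal_approx_ok(n, p):
    """Rule of thumb: both expected counts, n*p and n*(1 - p), are at least 5."""
    return n * p >= 5 and n * (1 - p) >= 5

print(normal_approx_ok(50, 0.5))  # expected counts 25 and 25
print(normal_approx_ok(20, 0.1))  # expected counts 2 and 18
```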

The sampling distribution of pbar can be approximated by a normal distribution as long as:

np ≥ 5 and n(1 - p) ≥ 5

The p-value is a probability that measures the evidence against the:

null hypothesis

Parameters are

numerical characteristics of a population.

The drying time of one particular paint is approximately normally distributed with a mean of 45 minutes. It is also known that 15% of walls painted with this paint need more than 55 minutes to dry completely. Find the standard deviation of this distribution.

σ = 9.62. Using the z table, we see that the z-score corresponding to .85 is 1.04. Plug in the given information and solve the formula z = (x - μ)/σ for σ: 1.04 = (55 - 45)/σ, so the standard deviation of this distribution is σ = 9.62.
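
The same calculation as a short sketch. It shows both the table value z = 1.04 used in the answer and the unrounded z from Python's `statistics.NormalDist`, which gives a slightly different σ; the variable names are illustrative.

```python
from statistics import NormalDist

mean, x, upper_tail = 45, 55, 0.15

# z-score with 85% of the area below it (15% of walls take longer than x)
z_exact = NormalDist().inv_cdf(1 - upper_tail)
z_table = 1.04  # value rounded as in a printed z table

# Solve z = (x - mean) / sigma for sigma
sigma_table = (x - mean) / z_table
sigma_exact = (x - mean) / z_exact

print(f"sigma using table z: {sigma_table:.2f}")
print(f"sigma using exact z: {sigma_exact:.2f}")
```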

The parameters of nonlinear models have exponents:

other than one

When we conduct significance tests for a multiple regression relationship, the F test will be used as the test for:

overall significance

The average gasoline price of one of the major oil companies has been hovering around $2.20 per gallon. Because of cost reduction measures, it is announced that there will be a significant reduction in the average price over the next month. In order to test this belief, we wait one month, then randomly select a sample of 36 of the company's gas stations. We find that the average price for the stations in the sample was $2.15. The standard deviation of the prices for the selected gas stations is $.10. Given that the test statistic for this sample is t = -3, determine the p-value.

p-value = .002. From a t-distribution table, we can determine that the p-value is between .001 and .0025.
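
For reference, a sketch of how the stated test statistic t = -3 follows from the sample figures. It computes only the statistic and its degrees of freedom; the p-value itself still comes from a t table or statistical software.

```python
import math

x_bar = 2.15  # sample mean price
mu_0 = 2.20   # hypothesized mean price
s = 0.10      # sample standard deviation
n = 36        # number of stations sampled

# One-sample t statistic: (x_bar - mu_0) / (s / sqrt(n))
t = (x_bar - mu_0) / (s / math.sqrt(n))
df = n - 1

print(f"t = {t:.2f} with {df} degrees of freedom")
```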

A student wants to determine if pennies are really fair, meaning equally likely to land heads up or tails up. He flips a random sample of 50 pennies and finds that 28 of them land heads up. Calculate the p-value and state the conclusion. Use 𝛼 = .05.

p-value = .397. Do not reject H0. The p-value is based upon a standard normal distribution. When looking up the test statistic (z = .85), remember that the value found must be doubled because this is a two-sided test. The p-value is greater than α = .05, so we do not reject H0. See Section 9.5, Population Proportion.
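
A sketch of the calculation using Python's standard library. Because it uses the unrounded z and an exact normal CDF rather than table lookups, the p-value it prints can differ from the answer above in the third decimal place; the conclusion is the same.

```python
import math
from statistics import NormalDist

n, heads, p0, alpha = 50, 28, 0.5, 0.05

p_bar = heads / n                                # sample proportion
z = (p_bar - p0) / math.sqrt(p0 * (1 - p0) / n)  # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # doubled for a two-sided test

reject = p_value <= alpha
print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject}")
```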

When completing a two-tailed hypothesis test about the difference between two population means, the

p-value must be doubled

Sample statistics, such as x̅ , s, or p̅, that provide the point estimate of the population parameter are known as:

point estimators

When we use the estimated regression equation to develop an interval that can be used to predict the mean for a specific unit that meets a particular set of given criteria, that interval is called a(n):

prediction interval

Suppose a multiple coefficient of determination coming from a regression analysis with 50 observations and 3 independent variables is .8455. Calculate the adjusted multiple coefficient of determination.

r-sq(adj)=83.54%
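
The answer follows from the adjusted R-squared formula 1 - (1 - R²)(n - 1)/(n - p - 1). A minimal sketch, with a hypothetical function name:

```python
def adjusted_r_squared(r_sq, n, p):
    """Adjusted R^2 for n observations and p independent variables."""
    return 1 - (1 - r_sq) * (n - 1) / (n - p - 1)

adj = adjusted_r_squared(r_sq=0.8455, n=50, p=3)
print(f"{adj:.2%}")
```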

In best-subsets regression, Minitab can be used to provide output that identifies the two best one-variable estimated regression equations, the two best two-variable estimated regression equations, the two best three-variable estimated regression equations, and so on. What criterion is used in determining which estimated regression equations are best?

r^2

In stratified random sampling:

randomly selected elements within each of the strata form the sample.

The difference between the largest and the smallest data values is the:

range

The interquartile range is used as a measure of variability to overcome the fact that the:

range is comprised of extreme values

In a recent Gallup Poll, the decision was made to increase the size of its random sample of voters from 1500 people to about 4000 people. The purpose of this increase is to:

reduce the standard error of the estimate

Doubling the size of the sample will:

reduce the standard error of the mean
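
A sketch of the effect, using the standard error formula σ/√n. The value σ = 10 and the sample sizes are hypothetical; the point is that doubling n shrinks the standard error by a factor of 1/√2, not by half.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10  # hypothetical population standard deviation
se_n = standard_error(sigma, 100)
se_2n = standard_error(sigma, 200)
ratio = se_2n / se_n

print(f"n = 100: {se_n:.3f}")
print(f"n = 200: {se_2n:.3f}")
print(f"ratio:   {ratio:.3f}")
```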

The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation is called a(n):

residual

Graphical representation of the residuals that can be used to determine whether the assumptions made about the regression model appear to be valid is called a:

residual plot

Since the multiple regression equation generates a plane or surface, its graph is called a:

response surface

Suppose we have a t distribution based upon two sample means with unknown population standard deviations, which we are unwilling to assume are equal. When we calculate the appropriate degrees of freedom, we should:

round the calculated degrees of freedom down to the nearest integer.
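
The degrees of freedom in this unequal-variance case are usually computed with the Welch-Satterthwaite formula and then rounded down. A sketch with hypothetical sample values; the function name `welch_df` is illustrative.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom for two independent samples."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Hypothetical sample standard deviations and sizes
df_raw = welch_df(s1=4.8, n1=28, s2=3.4, n2=22)
df = math.floor(df_raw)  # round down for a more conservative critical value

print(f"calculated df = {df_raw:.2f}, use df = {df}")
```

Rounding down gives a slightly larger critical t value, so the resulting interval or test is conservative.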

For the case where σ is unknown, which statistic is used to estimate σ?

s

When xbar is unknown, which of the following is used to estimate xbar ?

Which of the following is a point estimator?

Which of the following is not a symbol for a parameter?

The margin of error in an interval estimate of the population mean is a function of all of the following except the:

sample mean

A numerical value used as a summary measure for a sample, such as sample mean, is known as a:

sample statistic

The value of the _____ is used to estimate the value of the population parameter.

sample statistic

The distribution of values taken by a statistic in all possible samples of the same size from the same population is called a:

sampling distribution

The probability distribution of all possible values of the sample proportion is the:

sampling distribution of pbar.

The regression model y = β0 + β1x1 + β2x1² + ε is a:

second-order model with one predictor variable

An F test, based on the F probability distribution, can be used to test for:

significance in regression.

Applications of hypothesis testing that only control for the Type I error are called:

significance tests

The mathematical equation relating the independent variable to the expected value of the dependent variable, E(y) = β0 + β1x, is known as the:

simple linear regression equation

When determining the best estimated regression equation to model a set of data, the procedure that allows an independent variable to enter the model at one step, be removed at a subsequent step, and then enter the model at a later step is:

stepwise regression

When determining the best estimated regression equation to model a set of data, the procedure that begins each step by determining whether any of the variables already in the model should be removed is called:

stepwise regression

A probability sampling method in which we randomly select one of the first k elements and then select every kth element thereafter is:

systematic sampling

In a survey of public opinion concerning state aid to a particular city, every 40th person registered as a voter was interviewed, beginning with a person selected at random from among the first 40 listed. This is an example of:

systematic sampling

The following data show the results of an aptitude test and the grade point average of 10 students. Does the t test indicate a significant relationship between GPA and Aptitude Test Score? State the test statistic, and then state your conclusion using ⍺ = .05.

t = 6.25. The p-value is less than .05, so the evidence is sufficient to conclude that a significant relationship exists between GPA and Aptitude Test Scores.

Below is a portion of the computer output for a regression analysis relating y = number of people who use the public pool to x = the outside temperature. State the test statistic and p-value used to determine whether the number of people who use the public pool is related to the outside temperature.

t = 8.98 and p-value = .000

When "s" is used to estimate "σ," the margin of error is computed by using the:

t distribution

When constructing a confidence or a prediction interval to quantify the relationship between two quantitative variables, what distribution do confidence and prediction intervals follow?

t distribution

From a population that is normally distributed, a sample of 30 elements is selected and the standard deviation of the sample is computed. For the interval estimation of μ, the proper distribution to use is the:

t distribution with 29 degrees of freedom

Independent simple random samples are taken to test the difference between the means of two populations whose variances are not known, but are assumed to be equal. The sample sizes are n1 = 32 and n2 = 40. The correct distribution to use is the:

t distribution with 70 degrees of freedom (n1 + n2 - 2 = 32 + 40 - 2 = 70)

t test for a difference in two means

The population we want to make inferences about is called the:

target population

If the variance of a data set is correctly computed with the formula using n - 1 in the denominator, which of the following is true?

the data set is a sample

The interquartile range is:

the difference between Q3 and Q1, the third and first quartiles

In most applications of the interval estimation and hypothesis testing procedures, random samples with n1 ≥ 30 and n2 ≥ 30 are adequate. In cases where either or both sample sizes are less than 30:

the distribution of the population becomes an important consideration

When autocorrelation is present, which of the following assumptions is violated?

the error terms are independent

Which of the following does not need to be known in order to compute the p-value?

the level of significance

In a five-number summary, which of the following is not used for data summarization?

the mean

There are three executives in an office with ages of 56, 57, and 58. If a 57-year-old executive enters the room, then:

the mean age will stay the same, but the variance will decrease.
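
A quick check with Python's `statistics` module. Both the sample variance (n - 1 denominator) and the population variance (n denominator) are shown, since the card does not say which is intended; both decrease.

```python
import statistics

before = [56, 57, 58]
after = before + [57]  # a 57-year-old enters the room

mean_before, mean_after = statistics.mean(before), statistics.mean(after)
svar_before, svar_after = statistics.variance(before), statistics.variance(after)
pvar_before, pvar_after = statistics.pvariance(before), statistics.pvariance(after)

print(f"mean:                {mean_before} -> {mean_after}")
print(f"sample variance:     {svar_before:.3f} -> {svar_after:.3f}")
print(f"population variance: {pvar_before:.3f} -> {pvar_after:.3f}")
```

The new value equals the mean, so it adds nothing to the sum of squared deviations while increasing the denominator.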

The center of a normal curve is:

the mean of the distribution

A sample of 99 distances has a mean of 24 feet and a median of 21.5 feet. Unfortunately, it has just been discovered that an observation that was erroneously recorded as "30" actually had a value of "35." If we make this correction to the data, then:

the median remains the same, but the mean is increased.
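
The reasoning can be sketched with arithmetic alone: the correction adds 5 to the sum of the 99 values, and both the wrong and the corrected value lie above the median, so the middle ordered value does not move. Variable names are illustrative.

```python
n, old_mean, median = 99, 24.0, 21.5
old_value, new_value = 30, 35

# The sum changes by (new_value - old_value), so the mean rises by 5/99
new_mean = (n * old_mean - old_value + new_value) / n

# The median can only change if the corrected value crosses the median
crosses_median = (old_value > median) != (new_value > median)

print(f"new mean = {new_mean:.4f}, median changes: {crosses_median}")
```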

For the interval estimation of μ when σ is known and the sample is large, the proper distribution to use is:

the normal distribution

In hypothesis testing, the tentative assumption about the population parameter is called

the null hypothesis

A negative value of z indicates that:

the number of standard deviations of an observation is below the mean.

The sampling distribution of pbar is the:

the probability distribution of all possible values of the sample proportion

If a residual plot of x versus the residuals, y - ŷ, shows a non-linear pattern, then we should conclude that:

the regression model is not an adequate representation of the relationship between the variables.

The tests of significance in regression analysis are based on assumptions about the error term ɛ. One such assumption is that the variance of ɛ, denoted by σ², is:

the same for all values of x

In a multiple regression model, the variance of the error term, ε, is assumed to be:

the same for all values of x1, x2, ..., xp

The distribution of values taken by a statistic in all possible samples of the same size from the same population is the sampling distribution of:

the sample

The coefficient of variation is:

the standard deviation divided by the mean and multiplied by 100.
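
A minimal sketch of the definition, using the sample standard deviation. The data and the function name `coefficient_of_variation` are hypothetical.

```python
import statistics

def coefficient_of_variation(data):
    """Standard deviation as a percentage of the mean."""
    return statistics.stdev(data) / statistics.mean(data) * 100

data = [10, 12, 14, 16, 18]  # hypothetical sample
cv = coefficient_of_variation(data)
print(f"CV = {cv:.2f}%")
```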

Which of the following is not a characteristic of the normal probability distribution?

the standard deviation must be one

In multiple regression analysis:

there can be several independent variables, but only one dependent variable.

Suppose the correlation coefficient rxy between the amount of sleep (in hours) "x" and number of yawns made in 8:00 a.m. classes "y" of 100 business statistics students is computed to be -.82. Then:

there is a strong negative linear relationship between the two variables.

In order to use the output from a multiple regression model to perform the ANOVA test on the difference among the means of four populations, how many dummy variables do we need to use to indicate treatments?

three

What value of Durbin-Watson statistic indicates that no autocorrelation is present?

two
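
To see why two is the reference point, the sketch below computes the Durbin-Watson statistic, the sum of squared successive differences of the residuals divided by the sum of squared residuals, for two hypothetical residual series. The function name and the data are illustrative.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of (e_t - e_{t-1})^2 divided by sum of e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals
runs = [1, 1, 1, -1, -1, -1]         # neighbors tend to agree (positive autocorrelation)
alternating = [1, -1, 1, -1, 1, -1]  # neighbors tend to disagree (negative autocorrelation)

dw_runs = durbin_watson(runs)
dw_alternating = durbin_watson(alternating)
print(f"runs: {dw_runs:.2f}, alternating: {dw_alternating:.2f}")
```

The statistic ranges from 0 to 4; values below two point toward positive autocorrelation and values above two toward negative autocorrelation.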

The sample mean is the point estimator of:

μ

To compute the necessary sample size for an interval estimate of a population proportion, all of the following procedures are recommended when p* is unknown except:

using .95 as an estimate

To compute the necessary sample size for an interval estimate of a population mean, all of the following procedures are recommended when σ is unknown except:

using σ = 1.

Dummy variables must always have:

values of either 0 or 1

The matched sample design often leads to a smaller sampling error than the independent sample design. The primary reason is that in a matched sample design:

variation between subjects is eliminated because the same subjects are used for both treatments.

We cannot select a simple random sample from an infinite population because:

we cannot construct a frame consisting of all the elements.

Regarding hypothesis tests about p1 -p2, the pooled estimate of P is a:

weighted average of pbar1 and pbar2
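
A sketch of the pooled estimate, where each sample proportion is weighted by its sample size. The sample sizes and proportions are hypothetical.

```python
def pooled_proportion(n1, p_bar1, n2, p_bar2):
    """Pooled estimate of p: sample proportions weighted by sample size."""
    return (n1 * p_bar1 + n2 * p_bar2) / (n1 + n2)

# Hypothetical samples
pooled = pooled_proportion(n1=100, p_bar1=0.30, n2=300, p_bar2=0.50)
print(f"pooled estimate = {pooled:.2f}")
```

The result always falls between the two sample proportions and sits closer to the one from the larger sample.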

Below is a portion of the computer output for a regression analysis relating y = number of people who use the public pool to x = the outside temperature. What is the estimated regression equation?

yhat = 57.912 +.81138x

A student believes that no more than 20% of the students who finish a statistics course get an A. A random sample of 100 students was taken. Twenty-two percent of the students in the sample received an A. Calculate the test statistic.

z = .5
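
The test statistic uses the hypothesized proportion p0 = .20 in the standard error. A short sketch with illustrative variable names:

```python
import math

p0 = 0.20     # hypothesized proportion
p_bar = 0.22  # sample proportion
n = 100

# z = (p_bar - p0) / sqrt(p0 * (1 - p0) / n)
z = (p_bar - p0) / math.sqrt(p0 * (1 - p0) / n)
print(f"z = {z:.2f}")
```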

In a standard normal distribution, what z-score corresponds to the 75th percentile?

z=.67
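
The percentile can be checked with the inverse CDF of the standard normal distribution from Python's standard library:

```python
from statistics import NormalDist

# z-score with 75% of the standard normal area below it
z_75 = NormalDist().inv_cdf(0.75)
print(f"z = {z_75:.2f}")
```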

z=.85

How many independent variables does the forward selection process start with?

zero

What is the symbol for the population mean?

μ

The sample statistic s is the point estimator of:

σ
