10. Sampling and Estimation


From a sample of 41 orders for an on-line bookseller, the average order size is $75, and the sample standard deviation is $18. Assume the distribution of orders is normal. For which interval can one be exactly 90% confident that the population mean is contained in that interval? A) $71.29 to $78.71. B) $70.27 to $79.73. C) $74.24 to $75.76.

B If the distribution of the population is normal, but we don't know the population variance, we can use Student's t-distribution to construct a confidence interval. Because there are 41 observations, the degrees of freedom are 40. From the Student's t table, the reliability factor t_{α/2} = t_{0.05} is 1.684. The 90% confidence interval is then $75.00 ± 1.684($18.00 / √41), or $75.00 ± 1.684 × $2.81, or $75.00 ± $4.73, which gives $70.27 to $79.73.
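The arithmetic in this explanation can be sketched in Python. The function name `t_confidence_interval` is an illustrative assumption, and the reliability factor 1.684 is simply the table value quoted in the explanation; it is passed in rather than computed.

```python
import math

def t_confidence_interval(mean, sample_sd, n, reliability_factor):
    """Return (lower, upper) for mean ± reliability_factor × s / √n."""
    standard_error = sample_sd / math.sqrt(n)
    margin = reliability_factor * standard_error
    return mean - margin, mean + margin

# Values from the question; 1.684 is the t-table value for 40 degrees of
# freedom with 5% in each tail, as quoted in the explanation.
lower, upper = t_confidence_interval(75.0, 18.0, 41, 1.684)
print(f"${lower:.2f} to ${upper:.2f}")  # $70.27 to $79.73
```

The same helper would cover the other t-based intervals in this set by swapping in the sample statistics and the table value for the relevant degrees of freedom.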

Studies of performance of a sample of mutual fund managers most likely suffer from: A) sample-selection bias. B) survivorship bias. C) look-ahead bias.

B Studies of the performance of mutual fund managers often suffer from survivorship bias as poorly performing funds are closed down and are not included in the sample.

From a population of 5,000 observations, a sample of n = 100 is selected. Calculate the standard error of the sample mean if the population standard deviation is 50. A) 50.00. B) 4.48. C) 5.00.

C The standard error of the sample mean equals the standard deviation of the population divided by the square root of the sample size: 50 / √100 = 5.
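A minimal Python sketch of this formula follows; the function name `standard_error` is an assumption made for illustration. Note that the population size of 5,000 does not enter the calculation as the explanation states it.

```python
import math

def standard_error(sd, n):
    """Standard error of the sample mean: standard deviation / √n."""
    return sd / math.sqrt(n)

# Population standard deviation of 50 and a sample of 100 observations.
print(standard_error(50, 100))  # 5.0
```

When the population standard deviation is unknown, the sample standard deviation is passed in instead, as several later cards in this set do.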

Which of the following statements about sampling and estimation is most accurate? A) A confidence interval estimate consists of a range of values that bracket the parameter with a specified level of probability, 1 − β. B) Time-series data are observations over individual units at a point in time. C) A point estimate is a single estimate of an unknown population parameter calculated as a sample mean.

C Time-series data are observations taken at specific and equally spaced points in time. A confidence interval estimate consists of a range of values that bracket the parameter with a specified level of probability, 1 − α.

The average salary for a sample of 61 CFA charterholders with 10 years experience is $200,000, and the sample standard deviation is $80,000. Assume the population is normally distributed. Which of the following is a 99% confidence interval for the population mean salary of CFA charterholders with 10 years of experience? A) $172,514 to $227,486. B) $160,000 to $240,000. C) $172,754 to $227,246.

C If the distribution of the population is normal, but we don't know the population variance, we can use Student's t-distribution to construct a confidence interval. Because there are 61 observations, the degrees of freedom are 60. From the Student's t table, the reliability factor t_{α/2} = t_{0.005} is 2.660. The 99% confidence interval is then $200,000 ± 2.660($80,000 / √61), or $200,000 ± 2.660 × $10,243, or $200,000 ± $27,246.

The practice of repeatedly using the same database to search for patterns until one is found is called: A) data snooping. B) sample selection bias. C) data mining.

C Data mining involves repeatedly analyzing the same data until a pattern is detected; such a pattern may not replicate in other data sets. The practice is also described as torturing the data until it confesses.

Which of the following is NOT a prediction of the central limit theorem? A) The variance of the sampling distribution of sample means will approach the population variance divided by the sample size. B) The mean of the sampling distribution of the sample means will be equal to the population mean. C) The standard error of the sample mean will increase as the sample size increases.

C The standard error of the sample mean is equal to the sample standard deviation divided by the square root of the sample size. As the sample size increases, this ratio decreases. The other two choices are predictions of the central limit theorem.

Which of the following statements regarding the central limit theorem (CLT) is least accurate? The CLT: A) gives the variance of the distribution of sample means as σ² / n, where σ² is the population variance and n is the sample size. B) holds for any population distribution, assuming a large sample size. C) states that for a population with mean μ and variance σ², the sampling distribution of the sample means for any sample of size n will be approximately normally distributed.

C This question is asking you to select the inaccurate statement. The CLT states that for a population with mean μ and a finite variance σ², the sampling distribution of the sample means becomes approximately normally distributed as the sample size becomes large. The other statements are accurate.

A sample of size n = 25 is selected from a normal population. This sample has a mean of 15 and a sample variance of 4. What is the standard error of the sample mean? A) 2.0. B) 0.4. C) 0.8.

B

The range of possible values in which an actual population parameter may be observed at a given level of probability is known as a: A) significance level. B) confidence interval. C) degree of confidence.

B

Which of the following statements about sampling and estimation is most accurate? A) The probability that a parameter lies within a range of estimated values is given by α. B) The standard error of the sample means when the standard deviation of the population is unknown equals s / √n, where s = sample standard deviation. C) The standard error of the sample means when the standard deviation of the population is known equals σ / √n, where σ = sample standard deviation adjusted by n − 1.

B

A sample of 100 individual investors has a mean portfolio value of $28,000 with a standard deviation of $4,250. The 95% confidence interval for the population mean is closest to: A) $19,500 to $28,333. B) $27,159 to $28,842. C) $27,575 to $28,425.

B Confidence interval = mean ± t_c (s / √n) = 28,000 ± (1.98)(4,250 / √100), or 27,159 to 28,842. If you use a z-statistic because of the large sample size, you get 28,000 ± (1.96)(4,250 / √100) = 27,167 to 28,833, which is still closest to answer B.
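To see how little the choice between the t and z reliability factors matters at n = 100, the two intervals can be computed side by side. This is only a sketch; both factors (1.98 and 1.96) are taken from the explanation rather than computed.

```python
import math

mean, sd, n = 28_000, 4_250, 100
se = sd / math.sqrt(n)  # standard error of the sample mean

# Reliability factors as quoted in the explanation (not computed here).
intervals = {}
for label, factor in (("t", 1.98), ("z", 1.96)):
    margin = factor * se
    intervals[label] = (mean - margin, mean + margin)
    print(f"{label}: {mean - margin:,.1f} to {mean + margin:,.1f}")
```

The two intervals differ by less than $10 at each end, so both point to the same answer choice.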

From a population with a known standard deviation of 15, a sample of 25 observations is taken. Calculate the standard error of the sample mean. A) 0.60. B) 3.00. C) 1.67.

B

Frank Grinder is trying to introduce sampling into the quality control program of an old-line manufacturer. Grinder samples 38 items and finds that the standard deviation in size is 0.019 centimeters. What is the standard error of the sample mean? A) 0.00308. B) 0.00204. C) 0.00615.

A

If the number of offspring for females of a certain mammalian species has a mean of 16.4 and a standard deviation of 3.2, what will be the standard error of the sample mean for a survey of 25 females of the species? A) 0.64. B) 1.28. C) 3.20.

A

Melissa Cyprus, CFA, is conducting an analysis of inventory management practices in the retail industry. She assumes the population cross-sectional standard deviation of inventory turnover ratios is 20. How large a random sample should she gather in order to ensure a standard error of the sample mean of 4? A) 25. B) 20. C) 80.

A
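The answer follows from solving σ / √n = 4 for n, which gives n = (σ / 4)². A small Python sketch, with the hypothetical helper name `required_sample_size`:

```python
import math

def required_sample_size(population_sd, target_se):
    """Smallest n for which population_sd / √n is at most target_se."""
    return math.ceil((population_sd / target_se) ** 2)

# Assumed population standard deviation of 20, target standard error of 4.
print(required_sample_size(20, 4))  # 25
```

Rounding up with `math.ceil` is a design choice for cases where the ratio is not a whole number; here the result is exact.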

Which of the following characterizes the typical construction of a confidence interval most accurately? A) Point estimate +/- (Reliability factor x Standard error). B) Standard error +/- (Point estimate / Reliability factor). C) Point estimate +/- (Standard error / Reliability factor).

A

The sample mean is an unbiased estimator of the population mean because the: A) expected value of the sample mean is equal to the population mean. B) sampling distribution of the sample mean has the smallest variance of any other unbiased estimators of the population mean. C) sample mean provides a more accurate estimate of the population mean as the sample size increases.

A An unbiased estimator is one for which the expected value of the estimator is equal to the parameter you are trying to estimate.

Which one of the following distributions is described entirely by the degrees of freedom? A) Student's t-distribution. B) Normal distribution. C) Lognormal distribution.

A Student's t-distribution is defined by a single parameter known as the degrees of freedom.

Which one of the following statements about the t-distribution is most accurate? A) The t-distribution approaches the standard normal distribution as the number of degrees of freedom becomes large. B) Compared to the normal distribution, the t-distribution is more peaked with more area under the tails. C) The t-distribution is the appropriate distribution to use when constructing confidence intervals based on large samples.

A As the number of degrees of freedom grows, the t-distribution approaches the shape of the standard normal distribution. Compared to the normal distribution, the t-distribution is less peaked with more area under the tails. When choosing a distribution, three factors must be considered: sample size, whether population variance is known, and if the distribution is normal.

A research paper that reports finding a profitable trading strategy without providing any discussion of an economic theory that makes predictions consistent with the empirical results is most likely evidence of: A) data mining. B) a sample that is not large enough. C) a non-normal population distribution.

A Data mining occurs when the analyst continually uses the same database to search for patterns or trading rules until he finds one that works. If you are reading research that suggests a profitable trading strategy, heed the following warning signs of data mining: (1) evidence that the author used many variables (most unreported) until he found ones that were significant, and (2) the lack of any economic theory that is consistent with the empirical results.

From a sample of 41 monthly observations of the S&P Mid-Cap index, the mean monthly return is 1% and the sample variance is 36. For which of the following intervals can one be closest to 95% confident that the population mean is contained in that interval? A) 1.0% ± 1.9%. B) 1.0% ± 6.0%. C) 1.0% ± 1.6%.

A Even if the distribution of the population is nonnormal and we don't know the population variance, we can use Student's t-distribution to construct a confidence interval because the sample is large (n ≥ 30). The sample standard deviation is the square root of the variance, or 6%. Because there are 41 observations, the degrees of freedom are 40. From the Student's t table, the reliability factor t_{0.025} is 2.021. The 95% confidence interval is then 1.0% ± 2.021(6 / √41), or 1.0% ± 1.9%.

A traffic engineer is trying to measure the effects of carpool-only lanes on the expressway. Based on a sample of 20 cars at rush hour, he finds that the mean number of occupants per car is 2.5, with a standard deviation of 0.4. If the population is normally distributed, what is the confidence interval at the 5% significance level for the number of occupants per car? A) 2.313 to 2.687. B) 2.387 to 2.613. C) 2.410 to 2.589.

A The reliability factor corresponding with a 5% significance level (95% confidence level) for the Student's t-distribution with (20 − 1) degrees of freedom is 2.093. The confidence interval is equal to: 2.5 ± 2.093(0.4 / √20) = 2.313 to 2.687. (We must use the Student's t-distribution and reliability factors because of the small sample size.)

A traffic engineer is trying to measure the effects of carpool-only lanes on the expressway. Based on a sample of 100 cars at rush hour, he finds that the mean number of occupants per car is 2.5, and the sample standard deviation is 0.4. What is the standard error of the sample mean? A) 0.04. B) 1.00. C) 5.68.

A The standard error of the sample mean when the standard deviation of the population is not known is estimated by the standard deviation of the sample divided by the square root of the sample size. In this case, 0.4 / √100 = 0.04.

A local high school basketball team had 18 home games this season and averaged 58 points per game. If we assume that the number of points made in home games is normally distributed, which of the following is most likely the range of points for a confidence interval of 90%? A) 34 to 82. B) 24 to 78. C) 26 to 80.

A This question has a bit of a trick. To answer this question, remember that the mean is at the midpoint of the confidence interval. The correct confidence interval will have a midpoint of 58. (34 + 82) / 2 = 58.
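The midpoint check described here is easy to mechanize. The sketch below simply tests each answer choice against the sample mean of 58; the variable names are illustrative.

```python
sample_mean = 58
choices = {"A": (34, 82), "B": (24, 78), "C": (26, 80)}

# A confidence interval for the mean is symmetric around the sample mean,
# so only a choice whose midpoint equals 58 can be correct.
centered = [label for label, (low, high) in choices.items()
            if (low + high) / 2 == sample_mean]
print(centered)  # ['A']
```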

If the true mean of a population is 16.62, according to the central limit theorem, the mean of the distribution of sample means, for all possible sample sizes n will be: A) 16.62. B) 16.62 / √n. C) indeterminate for sample with n < 30.

A According to the central limit theorem, the mean of the distribution of sample means will be equal to the population mean. A sample size of n > 30 is required only for the distribution of sample means to be approximately normal.

The central limit theorem states that, for any distribution, as n gets larger, the sampling distribution: A) approaches a normal distribution. B) becomes larger. C) approaches the mean.

A As n gets larger, the variance of the distribution of sample means is reduced, and the distribution of sample means approximates a normal distribution.

The sample of per square foot sales for 100 U.S. retailers in December 2004 is an example of: A) cross-sectional data. B) unbiased data. C) time-series data.

A Cross-sectional data are a sample of observations taken at a single point in time. A time-series is a sample of observations taken at specific and equally spaced points in time.

According to the Central Limit Theorem, the distribution of the sample means is approximately normal if: A) the sample size n > 30. B) the underlying population is normal. C) the standard deviation of the population is known.

A The Central Limit Theorem states that if the sample size is sufficiently large (i.e. greater than 30) the sampling distribution of the sample means will be approximately normal.

The central limit theorem concerns the sampling distribution of the: A) sample mean. B) sample standard deviation. C) population mean.

A The central limit theorem tells us that for a population with a mean μ and a finite variance σ², the sampling distribution of the sample means of all possible samples of size n will approach a normal distribution with a mean equal to μ and a variance equal to σ² / n as n gets large.

The average mutual fund return calculated from a sample of funds with significant survivorship bias would most likely be: A) an unbiased estimate of the mean return of the population of all mutual funds if the sample size was large enough. B) larger than the mean return of the population of all mutual funds. C) smaller than the mean return of the population of all mutual funds.

B If we try to draw any conclusions from an analysis of a mutual fund database with survivorship bias, we overestimate the average mutual fund return, because we don't include the poorer-performing funds that dropped out. A larger sample size from a database with survivorship bias will still result in a biased estimate.
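A toy illustration of the upward bias, using made-up numbers: the ten returns and the 5% loss cutoff for fund closure below are hypothetical assumptions, not data from any real database.

```python
from statistics import mean

# Hypothetical annual returns (%) for ten funds; the numbers are made up
# purely for illustration.
all_funds = [12.0, 9.5, 7.0, 4.0, 1.5, -2.0, -6.5, 10.0, -9.0, 3.5]

# Assume funds that lost more than 5% were closed and dropped from the database.
survivors = [r for r in all_funds if r > -5.0]

print(f"All funds:      {mean(all_funds):.2f}%")
print(f"Survivors only: {mean(survivors):.2f}%")
```

Dropping the two worst performers raises the average, and adding more surviving funds to the sample would not remove that bias.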

A random sample of 25 Indiana farms had a mean number of cattle per farm of 27 with a sample standard deviation of five. Assuming the population is normally distributed, what would be the 95% confidence interval for the number of cattle per farm? A) 23 to 31. B) 25 to 29. C) 22 to 32.

B The standard error of the sample mean = 5 / √25 = 1. Degrees of freedom = 25 − 1 = 24. From the Student's t table, t_{α/2} = t_{0.025} = 2.064. The confidence interval is 27 ± 2.064(1) = 24.94 to 29.06, or 25 to 29.

The sampling distribution of a statistic is: A) always a standard normal distribution. B) the probability distribution consisting of all possible sample statistics computed from samples of the same size drawn from the same population. C) the same as the probability distribution of the underlying population.

B A sample statistic itself is a random variable, so it also has a probability distribution. For example, suppose we start with a sample of the prices of 200 stocks, and we calculate the sample mean of a random sample of 40 of those stocks. If we repeat this many times, we will have many different estimates of the sample mean. The distribution of these estimates of the mean is the sampling distribution of the mean. A statistic's sampling distribution is not necessarily normal or the same as that of the population.

Monthly Gross Domestic Product (GDP) figures from 1990-2000 are an example of: A) cross-sectional data. B) time-series data. C) systematic data.

B A time-series is a group of observations taken at specific and equally spaced points in time. Cross-sectional data are observations taken at a single point in time.

Which of the following is least likely a step in stratified random sampling? A) The population is divided into strata based on some classification scheme. B) The size of each sub-sample is selected to be the same across strata. C) The sub-samples are pooled to create the complete sample.

B In stratified random sampling we first divide the population into subgroups, called strata, based on some classification scheme. Then we randomly select a sample from each stratum and pool the results. The size of the sample from each stratum is based on the size of that stratum relative to the population and is not necessarily the same across strata.
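The three steps (divide into strata, sample each stratum in proportion to its size, pool) can be sketched in Python. The function name `stratified_sample`, the three strata, and their sizes are all hypothetical choices made for illustration.

```python
import random

def stratified_sample(strata, total_sample_size, seed=0):
    """Draw a simple random sample from each stratum, sized in proportion
    to the stratum's share of the population, then pool the results."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    pooled = []
    for members in strata.values():
        # Proportional allocation; rounding may shift the total by an item
        # or two when the shares do not divide evenly.
        k = round(total_sample_size * len(members) / population_size)
        pooled.extend(rng.sample(members, k))
    return pooled

# Hypothetical population of 1,000 stocks split into three size strata.
strata = {
    "small": [f"S{i}" for i in range(600)],
    "mid": [f"M{i}" for i in range(300)],
    "large": [f"L{i}" for i in range(100)],
}
sample = stratified_sample(strata, 50)
print(len(sample))  # 50
```

With these sizes the sub-samples contain 30, 15, and 5 stocks, so they differ across strata even though each is drawn at random.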

An equity analyst needs to select a representative sample of manufacturing stocks. Starting with the population of all publicly traded manufacturing stocks, she classifies each stock into one of the 20 industry groups that form the Index of Industrial Production for the manufacturing industry. She then selects a number of stocks from each industry based on its weight in the index. The sampling method the analyst is using is best characterized as: A) data mining. B) stratified random sampling. C) nonrandom sampling.

B In stratified random sampling, a researcher classifies a population into smaller groups based on one or more characteristics, takes a simple random sample from each subgroup based on the size of the subgroup, and pools the results.

An analyst wants to generate a simple random sample of 500 stocks from all 10,000 stocks traded on the New York Stock Exchange, the American Stock Exchange, and NASDAQ. Which of the following methods is least likely to generate a random sample? A) Assigning each stock a unique number and generating a number using a random number generator. Then selecting the stock with that number for the sample and repeating until there are 500 stocks in the sample. B) Using the 500 stocks in the S&P 500. C) Listing all the stocks traded on all three exchanges in alphabetical order and selecting every 20th stock.

B The S&P 500 is not a random sample of all stocks traded in the U.S. because it represents the 500 largest stocks. The other two choices are legitimate methods of selecting a simple random sample.
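The two selection methods that the explanation accepts can be sketched as follows. The stock identifiers are hypothetical placeholders, the fixed seed is only there to make the run repeatable, and the random starting point for the every-20th rule is an added assumption.

```python
import random

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Hypothetical identifiers standing in for the 10,000 listed stocks.
stocks = [f"STOCK{i:05d}" for i in range(10_000)]

# Random-number method: draw 500 distinct stocks, each equally likely.
simple_random = rng.sample(stocks, 500)

# Every-20th method: step through the list from a random starting point.
start = rng.randrange(20)
systematic = stocks[start::20]

print(len(simple_random), len(systematic))  # 500 500
```

Neither method looks at any characteristic of the stock, which is what separates them from picking index constituents.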

A population has a mean of 20,000 and a standard deviation of 1,000. Samples of size n = 2,500 are taken from this population. What is the standard error of the sample mean? A) 0.04. B) 400.00. C) 20.00.

C

Joseph Lu calculated the average return on equity for a sample of 64 companies. The sample average is 0.14 and the sample standard deviation is 0.16. The standard error of the mean is closest to: A) 0.1600. B) 0.0025. C) 0.0200.

C

When sampling from a population, the most appropriate sample size: A) minimizes the sampling error and the standard deviation of the sample statistic around its population value. B) is at least 30. C) involves a trade-off between the cost of increasing the sample size and the value of increasing the precision of the estimates.

C A larger sample reduces the sampling error and the standard deviation of the sample statistic around its population value. However, this does not imply that the sample should be as large as possible, or that the sampling error must be as small as can be achieved. Larger samples might contain observations that come from a different population, in which case they would not necessarily improve the estimates of the population parameters. Cost also increases with the sample size. When the cost of increasing the sample size is greater than the value of the extra precision gained, increasing the sample size is not appropriate.

The average return on small stocks over the period 1926-1997 was 17.7%, and the standard error of the sample was 33.9%. The 95% confidence interval for the return on small stocks in any given year is: A) 16.8% to 18.6%. B) -16.2% to 51.6%. C) -48.7% to 84.1%.

C A 95% confidence interval extends 1.96 standard deviations on either side of the mean, so 0.177 ± 1.96(0.339) = (-48.7%, 84.1%).

The sample mean is a consistent estimator of the population mean because the: A) sampling distribution of the sample mean has the smallest variance of any other unbiased estimators of the population mean. B) expected value of the sample mean is equal to the population mean. C) sample mean provides a more accurate estimate of the population mean as the sample size increases.

C A consistent estimator provides a more accurate estimate of the parameter as the sample size increases.

A scientist working for a pharmaceutical company tries many models using the same data before reporting the one that shows that the given drug has no serious side effects. The scientist is guilty of: A) look-ahead bias. B) sample selection bias. C) data mining.

C Data mining is the process where the same data is used with different methods until the desired results are obtained.

The mean return of Bartlett Co. is 3% and the standard deviation is 6% based on 20 monthly returns. What is the respective standard error of the sample and the confidence interval of a two tailed z-test with a 5% level of significance? A) 2.00; 0.37 to 5.629. B) 1.34; −0.66 to 4.589. C) 1.34; 0.37 to 5.629.

C The standard error of the sample is the standard deviation divided by the square root of n, the sample size: 6 / √20 = 1.34%. The confidence interval = point estimate +/- (reliability factor × standard error), so the confidence interval = 3 +/- (1.96 × 1.34) = 0.37 to 5.629.

An analyst has compiled stock returns for the first 10 days of the year for a sample of firms and estimated the correlation between these returns and changes in book value for these firms over the just ended year. What objection could be raised to such a correlation being used as a trading strategy? A) Use of year-end values causes a time-period bias. B) Use of year-end values causes a sample selection bias. C) The study suffers from look-ahead bias.

C The study suffers from look-ahead bias because traders at the beginning of the year would not be able to know the book value changes. Financial statements usually take 60 to 90 days to be completed and released.

Which statement best describes the properties of Student's t-distribution? The t-distribution is: A) skewed, and defined by a single parameter. B) symmetrical, and defined by two parameters. C) symmetrical, and defined by a single parameter.

C The t-distribution is symmetrical like the normal distribution but unlike the normal distribution is defined by a single parameter known as the degrees of freedom.

When is the t-distribution the appropriate distribution to use? The t-distribution is the appropriate distribution to use when constructing confidence intervals based on: A) large samples from populations with known variance that are nonnormal. B) small samples from populations with known variance that are at least approximately normal. C) small samples from populations with unknown variance that are at least approximately normal.

C The t-distribution is the appropriate distribution to use when constructing confidence intervals based on small samples from populations with unknown variance that are either normal or approximately normal.

In which one of the following cases is the t-statistic the appropriate one to use in the construction of a confidence interval for the population mean? A) The distribution is normal, the population variance is known, and the sample size is less than 30. B) The distribution is nonnormal, the population variance is known, and the sample size is at least 30. C) The distribution is nonnormal, the population variance is unknown, and the sample size is at least 30.

C The t-distribution is the theoretically correct distribution to use when constructing a confidence interval for the mean when the distribution is nonnormal and the population variance is unknown but the sample size is at least 30.

Which of the following statements about sample statistics is least accurate? A) The z-statistic is used to test normally distributed data with a known variance, whether testing a large or a small sample. B) The z-statistic is used for nonnormal distributions with known variance, but only for large samples. C) There is no sample statistic for non-normal distributions with unknown variance for either small or large samples.

C There is no sample statistic for non-normal distributions with unknown variance for small samples, but the t-statistic is used when the sample size is large.

A study reports that from 2002 to 2004 the average return on growth stocks was twice as large as that of value stocks. These results most likely reflect: A) look-ahead bias. B) survivorship bias. C) time-period bias.

C Time-period bias can result if the time period over which the data is gathered is too short, because the results may reflect phenomena specific to that time period, or if a change occurred during the time frame that would result in two different return distributions. In this case the time period sampled is probably not large enough to draw any conclusions about the long-term relative performance of value and growth stocks, even if the sample size within that time period is large. Look-ahead bias occurs when the analyst uses historical data that was not publicly available at the time being studied. Survivorship bias is a form of sample selection bias in which the observations in the sample are biased because the elements of the sample that survived until the sample was taken are different than the elements that dropped out of the population.

With 60 observations, what is the appropriate number of degrees of freedom to use when carrying out a statistical test on the mean of a population? A) 61. B) 60. C) 59.

C When performing a statistical test on the mean of a population based on a sample of size n, the number of degrees of freedom is n - 1 since once the mean is estimated from a sample there are only n - 1 observations that are free to vary. In this case the appropriate number of degrees of freedom to use is 60 - 1 = 59.

An analyst divides the population of U.S. stocks into 10 equally sized sub-samples based on market value of equity. Then he takes a random sample of 50 from each of the 10 sub-samples and pools the data to create a sample of 500. This is an example of: A) simple random sampling. B) systematic cross-sectional sampling. C) stratified random sampling.

C In stratified random sampling we first divide the population into subgroups, called strata, based on some classification scheme. Then we randomly select a sample from each stratum and pool the results. The size of the sample from each stratum is based on the size of that stratum relative to the population. Simple random sampling is a method of selecting a sample in such a way that each item or person in the population being studied has the same (non-zero) likelihood of being included in the sample.

Thomas Merton, a car industry analyst, wants to investigate a relationship between the types of ads used in advertising campaigns and sales to customers in certain age groups. In order to make sure he includes manufacturers of all sizes, Merton divides the industry into four size groups and draws random samples from each group. What sampling method is Merton using? A) Cross-sectional sampling. B) Simple random sampling. C) Stratified random sampling.

C In stratified random sampling, we first divide the population into subgroups based on some relevant characteristic(s) and then make random draws from each group.

Which of the following statements about sampling errors is least accurate? A) Sampling error is the difference between a sample statistic and its corresponding population parameter. B) Sampling error is the error made in estimating the population mean based on a sample mean. C) Sampling errors are errors due to the wrong sample being selected from the population.

C Sampling error is the difference between a sample statistic (the mean, variance, or standard deviation of the sample) and its corresponding population parameter (the mean, variance, or standard deviation of the population).

Sampling error can be defined as: A) the standard deviation of a sampling distribution of the sample means. B) rejecting the null hypothesis when it is true. C) the difference between a sample statistic and its corresponding population parameter.

C Sampling error is the difference between any sample statistic (the mean, variance, or standard deviation of the sample) and its corresponding population parameter (the mean, variance or standard deviation of the population). For example, the sampling error for the mean is equal to the sample mean minus the population mean.
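For the mean, the definition reduces to a single subtraction. The population and sample values below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from statistics import mean

# Hypothetical population of eight values and one sample drawn from it.
population = [4, 8, 15, 16, 23, 42, 10, 2]
sample = [8, 16, 42, 2]

population_mean = mean(population)  # the parameter (15)
sample_mean = mean(sample)          # the statistic (17)
sampling_error = sample_mean - population_mean

print(sampling_error)
```

A different sample from the same population would generally give a different sampling error, positive or negative.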

A simple random sample is a sample constructed so that: A) the sample size is random. B) each element of the population is also an element of the sample. C) each element of the population has the same probability of being selected as part of the sample.

C Simple random sampling is a method of selecting a sample in such a way that each item or person in the population being studied has the same (non-zero) likelihood of being included in the sample.

Suppose the mean debt/equity ratio of the population of all banks in the United States is 20 and the population variance is 25. A banking industry analyst uses a computer program to select a random sample of 50 banks from this population and compute the sample mean. The program repeats this exercise 1000 times and computes the sample mean each time. According to the central limit theorem, the sampling distribution of the 1000 sample means will be approximately normal if the population of bank debt/equity ratios has: A) a normal distribution, because the sample is random. B) a Student's t-distribution, because the sample size is greater than 30. C) any probability distribution.

C The central limit theorem tells us that for a population with a mean μ and a finite variance σ2, the sampling distribution of the sample means of all possible samples of size n will be approximately normally distributed with a mean equal to μ and a variance equal to σ2/n, no matter the distribution of the population, assuming a large sample size.
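The repeated-sampling exercise in the question can be simulated. The sketch below assumes, purely for illustration, a strongly skewed population: 15 plus an exponential variable with mean 5, which has mean 20 and variance 25 as in the question. The seed and variable names are arbitrary.

```python
import random
from statistics import mean, variance

rng = random.Random(1)  # fixed seed so the run is repeatable

def draw_ratio():
    # Assumed non-normal population: 15 + Exponential(mean 5),
    # so the mean is 15 + 5 = 20 and the variance is 5**2 = 25.
    return 15 + rng.expovariate(1 / 5)

sample_size, repetitions = 50, 1000
sample_means = [
    sum(draw_ratio() for _ in range(sample_size)) / sample_size
    for _ in range(repetitions)
]

# Theory: mean of sample means = 20, variance = 25 / 50 = 0.5.
print(f"mean of sample means:     {mean(sample_means):.2f}")
print(f"variance of sample means: {variance(sample_means):.2f}")
```

The simulated values should land close to 20 and 0.5, and a histogram of `sample_means` would look roughly bell-shaped even though the underlying population is not.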

From the entire population of McDonald's franchises, an analyst constructs a sample of the monthly sales volume for 20 randomly selected franchises. She calculates the mean sales volume for those 20 franchises to be $400,000. The sampling distribution of the mean is the probability distribution of the: A) mean monthly sales volume estimates from all possible samples. B) monthly sales volume for all McDonald's franchises. C) mean monthly sales volume estimates from all possible samples of 20 observations.

C The sampling distribution of a sample statistic is a probability distribution made up of all possible sample statistics computed from samples of the same size randomly drawn from the same population, along with their associated probabilities.

