Quantitative Analysis and Decision Making Test 2

If the random variable X is normally distributed with mean μ and standard deviation σ, then the random variable Z defined by Z = (X − μ)/σ is also normally distributed with mean 0 and standard deviation 1. a. True b. False

T
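
The standardization in the card above is usually written Z = (X − μ)/σ. A minimal sketch, assuming illustrative values μ = 100 and σ = 15 that do not come from the card; the function name `standardize` is hypothetical:

```python
def standardize(x, mu, sigma):
    """Return the z-score of x for a distribution with mean mu and std dev sigma."""
    return (x - mu) / sigma

# Illustrative values only (assumed, not from the source).
values = [85.0, 100.0, 115.0]
z_scores = [standardize(x, mu=100.0, sigma=15.0) for x in values]
print(z_scores)  # one sigma below the mean, the mean, one sigma above
```

Values one standard deviation below and above the mean map to z-scores of -1 and +1, and the mean itself maps to 0.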

If the regression equation includes anything other than a constant plus the sum of products of constants and variables, the model will not be linear. a. True b. False

T

If two random samples of size 40 each are selected independently from two populations whose variances are 35 and 45, then the standard error of the sampling distribution of the sample mean difference, x1-x2, equals 1.4142. a. True b. False

T
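
The figure in the card above can be checked with the usual formula for independent samples, SE = sqrt(σ1²/n1 + σ2²/n2). A short sketch; the function name is an illustrative choice:

```python
import math

def se_mean_difference(var1, n1, var2, n2):
    """Standard error of x1bar - x2bar for two independent samples."""
    return math.sqrt(var1 / n1 + var2 / n2)

# Values from the card: variances 35 and 45, both samples of size 40.
se = se_mean_difference(35, 40, 45, 40)
print(round(se, 4))  # 1.4142
```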

In a constant elasticity, or multiplicative, relationship the dependent variable is expressed as a product of explanatory variables raised to powers. a. True b. False

T

In a multiple regression analysis involving 4 explanatory variables and 40 data points, the degrees of freedom associated with the sum of squared errors, SSE, is 35. a. True b. False

T

In a multiple regression analysis with three explanatory variables, suppose that there are 60 observations and the sum of the residuals squared is 28. The standard error of estimate must be 0.7071. a. True b. False

T

In a multiple regression problem with two explanatory variables, if the fitted regression equation is Y = 56.5 - 4.5X1 + 0.60X2, then the estimated value of Y when X1 = 2 and X2 = 3 is 49.3. a. True b. False

T

In a multiplicative seasonal model, we multiply a "base" forecast by an appropriate seasonal index. These indexes, one for each season, typically average to 1. a. True b. False

T

In a nonlinear transformation of data, the Y variable or the X variables may be transformed, but not both. a. True b. False

F

In a simple linear regression model, testing whether the slope of the population regression line could be zero is the same as testing whether or not the linear relationship between the response variable Y and the explanatory variable X is significant. a. True b. False

T

In a simple linear regression problem, if the standard error of estimate = 15 and n = 8, then the sum of squares for error, SSE, is 1,350. a. True b. False

T
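
The degrees-of-freedom and standard-error-of-estimate cards above all follow from one relation: with n observations and k explanatory variables, SSE has n − k − 1 degrees of freedom and s_e = sqrt(SSE / (n − k − 1)). A sketch using the numbers from those cards; the function names are illustrative:

```python
import math

def sse_degrees_of_freedom(n, k):
    """Degrees of freedom for SSE with n observations and k explanatory variables."""
    return n - k - 1

def standard_error_of_estimate(sse, n, k):
    """s_e = sqrt(SSE / (n - k - 1))."""
    return math.sqrt(sse / sse_degrees_of_freedom(n, k))

def sse_from_standard_error(s_e, n, k):
    """Invert the relation: SSE = s_e^2 * (n - k - 1)."""
    return s_e ** 2 * sse_degrees_of_freedom(n, k)

df = sse_degrees_of_freedom(n=40, k=4)                 # 4 variables, 40 points
s_e = standard_error_of_estimate(sse=28, n=60, k=3)    # 3 variables, 60 observations
sse = sse_from_standard_error(s_e=15, n=8, k=1)        # simple regression, n = 8
print(df, round(s_e, 4), sse)  # 35 0.7071 1350
```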

In a simple linear regression problem, suppose that sum(e_i^2) = 12.48 and sum(Y - Ybar)^2 = 124.8. Then R^2 = 0.90. a. True b. False

T
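
The R^2 value in the card above comes from R^2 = 1 − SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares around the mean of Y. A quick check with the card's numbers:

```python
def r_squared(sse, sst):
    """Coefficient of determination from the error and total sums of squares."""
    return 1 - sse / sst

r2 = r_squared(sse=12.48, sst=124.8)
print(round(r2, 2))  # 0.9
```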

In an additive seasonal model, we add an appropriate seasonal index to a "base" forecast. These indexes, one for each season, typically average to 0. a. True b. False

T

In cluster sampling, the population is divided into subsets called clusters (such as cities or city blocks), and then a random sample of the clusters is selected. Once the clusters are selected, we typically acquire information from all of the members in each selected cluster. a. True b. False

T

In conducting hypothesis testing for difference between two means when samples are dependent (paired samples), the variable under consideration is D; the sample mean difference between the pairs. a. True b. False

T

In determining the sample size n for estimating the population proportion p, a conservative value of n can be obtained by using 0.50 as an estimate of p. a. True b. False

T

In choosing the "best-fitting" line through a set of points in linear regression, we choose the one with the a. smallest sum of squared residuals. b. largest sum of squared residuals. c. smallest number of outliers. d. largest number of points on the line.

a. smallest sum of squared residuals.

When we replace σ with the sample standard deviation (s), we introduce a new source of variability and the sampling distribution becomes the a. t-distribution. b. F-distribution. c. chi-square distribution. d. normal distribution.

a. t-distribution.

In linear regression, the fitted value is a. the predicted value of the dependent variable. b. the predicted value of the independent value. c. the predicted value of the slope. d. the predicted value of the intercept.

a. the predicted value of the dependent variable.

In developing a confidence interval for the difference between two population means using two independent samples, we use the pooled estimate in estimating the standard error of the sampling distribution of the sample mean difference x1-x2 if the populations are normal with equal variances. a. True b. False

T

A forward procedure is a type of equation building procedure that begins with only one explanatory variable in the regression equation and successively adds one variable at a time until no remaining variables make a significant contribution. a. True b. False

F

A logarithmic transformation of the response variable Y is often useful when the distribution of Y is symmetric. a. True b. False

F

A low p-value provides evidence for accepting the null hypothesis and rejecting the alternative. a. True b. False

F

A one-tailed alternative is one that is supported by evidence in either direction. a. True b. False

F

A professor of statistics doubts the claim that the proportion of Republican voters in Michigan is at most 45%. To test the claim, the hypotheses H0: p = 0.45, Ha: p ≠ 0.45 should be used. a. True b. False

F

A regression analysis between X = sales (in $1000s) and Y = advertising (in $) resulted in the following least squares line: Y= 32 + 8X. This implies that an increase of $1 in advertising is expected to result in an increase of $40 in sales. a. True b. False

F

A shortcoming of the RMSE (root mean square error) is that it is not in the same units as the forecast variable. a. True b. False

F

A simple random sample is one where each member of the population has a known chance (this may differ from one member to another) or probability of being chosen. a. True b. False

F

All nominal data may be treated as ordinal data. a. True b. False

F

An example of a paired sample is the number of defective computer chips of a particular type from two different manufacturers. a. True b. False

F

An interaction variable is the product of an explanatory variable and the dependent variable. a. True b. False

F

As a general rule, the normal distribution is used to approximate the sampling distribution of the sample proportion only if the sample size n is greater than 30. a. True b. False

F

As is the case with residuals from regression, the forecast errors for nonregression methods will always average to zero. a. True b. False

F

Assume that the trend line was calculated from quarterly data for 2011 - 2015, where t = 1 for the first quarter of 2011. The trend value for the second quarter of the year 2016 is 0.75. a. True b. False

F

Cluster sampling is often less convenient and more costly than other random sampling methods. a. True b. False

F

Conditional probability is the probability that an event will occur, with no other events taken into consideration. a. True b. False

F

Correlation is measured on a scale from 0 to 1, where 0 indicates no linear relationship between two variables, and 1 indicates a perfect linear relationship. a. True b. False

F

Holt's method is an exponential smoothing method, which is appropriate for a series with seasonality and possibly a trend. a. True b. False

F

If A and B are two independent events with P(A) = 0.20 and P(B) = 0.60, then P(A and B) = 0.80. a. True b. False

F
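
For independent events the joint probability is the product P(A)P(B), not the sum, which is why the card above is false. A small sketch with the card's values:

```python
def joint_probability_independent(p_a, p_b):
    """P(A and B) when A and B are independent."""
    return p_a * p_b

p_joint = joint_probability_independent(0.20, 0.60)
print(round(p_joint, 2))  # 0.12
```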

If P(A and B) = 0, then A and B must be collectively exhaustive. a. True b. False

F

If a null hypothesis is rejected at the 0.025 level of significance, then it must also be rejected at the 0.01 level. a. True b. False

F

If a random sample of size 250 is taken from a population, where it is known that the population proportion p = 0.4, then the mean of the sampling distribution of the sample proportion p is 0.60. a. True b. False

F

If the sample size is greater than 30, the Central Limit Theorem (CLT) will always guarantee that the sampling distribution of the sample mean is approximately normal. a. True b. False

F

If the standard deviation of X is 15, the covariance of X and Y is 94.5, and the correlation is 0.90, then the variance of Y is 7.0. a. True b. False

F
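
The card above can be checked by rearranging corr(X, Y) = cov(X, Y) / (sd(X) · sd(Y)) to solve for sd(Y) and then squaring. With the card's numbers the standard deviation of Y is 7, so the variance is 49:

```python
def variance_of_y(cov_xy, sd_x, corr_xy):
    """Solve corr = cov / (sd_x * sd_y) for sd_y, then square it."""
    sd_y = cov_xy / (corr_xy * sd_x)
    return sd_y ** 2

var_y = variance_of_y(cov_xy=94.5, sd_x=15, corr_xy=0.90)
print(round(var_y, 2))  # 49.0
```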

If the standard error of the sampling distribution of the sample proportion is 0.0324 for samples of size 200, then the population proportion must be 0.30. a. True b. False

F

If two random samples of sizes 30 and 35 are selected independently from two populations whose means are 85 and 90, then the mean of the sampling distribution of the sample mean difference, x1-x2, equals 5. a. True b. False

F

If we cannot make the strong assumption that the variances of two samples are equal, then we must use the pooled standard deviation in calculating the standard error of a difference between the means. a. True b. False

F

If we use a value close to 1 for the level smoothing constant and a value close to 0 for the trend smoothing constant in Holt's exponential smoothing model, then we expect the model to respond very quickly to changes in the level, but very slowly to changes in the trend. a. True b. False

F

If we use a value close to 1 for the smoothing constant in a simple exponential smoothing model, then we expect the model to respond very slowly to changes in the level. a. True b. False

F
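
The effect of the smoothing constant can be seen in a small simulation. This is a minimal sketch of simple exponential smoothing, L_t = αY_t + (1 − α)L_{t−1}, assuming the level is initialized at the first observation (one common convention, not stated in the card) and using a made-up series with a jump:

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    Assumption: the initial level is set to the first observation.
    """
    level = series[0]
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

# Hypothetical series whose level jumps from 10 to 20 at index 3.
series = [10, 10, 10, 20, 20, 20]
fast = simple_exponential_smoothing(series, alpha=0.9)
slow = simple_exponential_smoothing(series, alpha=0.1)
print(round(fast[3], 2), round(slow[3], 2))  # 19.0 11.0
```

With α = 0.9 the level moves almost all the way to the new value in one period, while with α = 0.1 it moves only a tenth of the way, so a constant close to 1 responds quickly rather than slowly.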

In a random walk model, there are significantly more runs than expected, and the autocorrelations are not significant. a. True b. False

F

In a regression model with seasonal dummy variables, the coefficients on the dummy variables represent the additive factor relative to the reference quarter value, not the multiplicative factor. a. True b. False

F

In a simple linear regression problem, if R^2= 0.95, this means that 95% of the variation in the explanatory variable X can be explained by the regression. a. True b. False

F

In simple linear regression, the divisor of the standard error of estimate is n - 1, because there is only one explanatory variable. a. True b. False

F

In stratified sampling with proportional sample sizes, the proportion of each stratum selected differs from stratum to stratum. a. True b. False

F

In testing the overall fit of a multiple regression model in which there are three explanatory variables, the null hypothesis is H0: β1 = β2 = β3. a. True b. False

F

In the multiple regression model Y = 6.75 + 2.25X1 + 3.5X2, we interpret the coefficient of X1 as follows: holding X2 constant, if X1 increases by 1 unit, then the expected value of Y will increase by 9 units. a. True b. False

F

Mean absolute deviation (MAD) is the average of the squared deviations. a. True b. False

F
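
The forecast error measures mentioned in these cards differ in how they aggregate errors: MAD averages absolute errors, MSE averages squared errors, and RMSE is the square root of MSE, which puts it back in the units of the forecast variable. A sketch with made-up forecast errors (the values are assumptions for illustration):

```python
import math

def mad(errors):
    """Mean absolute deviation: average of |e|."""
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    """Mean squared error: average of e^2."""
    return sum(e ** 2 for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error: same units as the forecast variable."""
    return math.sqrt(mse(errors))

errors = [2, -1, 3, -4]  # hypothetical forecast errors
print(mad(errors), mse(errors), round(rmse(errors), 4))  # 2.5 7.5 2.7386
```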

Multiple regression represents an improvement over simple regression because it allows any number of response variables to be included in the analysis. a. True b. False

F

Phone numbers, Social Security numbers, and zip codes are typically treated as numerical variables. a. True b. False

F

Regression models with seasonal dummy variables produce coefficients for each quarter, which represent the additive or multiplicative factors relative to the annual average. a. True b. False

F

Sample evidence is statistically significant at the α level only if the p-value is larger than α. a. True b. False

F

Scatterplots are used for identifying outliers and indicating what you should do about the outliers you may find. a. True b. False

F

Stratified samples are typically not used in real applications because they provide less accurate estimates of population parameters for a given sampling cost. a. True b. False

F

Suppose that a sample of 10 observations has a standard deviation of 3. Then the sum of the squared deviations from the sample mean is 30. a. True b. False

F
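
Because the sample variance is s² = Σ(x − x̄)² / (n − 1), the sum of squared deviations is (n − 1)s². For the card above that gives 9 × 9 = 81, not 30:

```python
def sum_squared_deviations(n, s):
    """Recover the sum of squared deviations from n and the sample std dev s."""
    return (n - 1) * s ** 2

ssd = sum_squared_deviations(n=10, s=3)
print(ssd)  # 81
```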

Suppose that one equation has 3 explanatory variables and an F-ratio of 49. Another equation has 5 explanatory variables and an F-ratio of 38. The first equation will always be considered a better model. a. True b. False

F

Suppose that we want to estimate the difference in the proportion of all men and women that favor a homeless shelter opening up in their community. We should use a confidence interval for a difference between two population means to create our estimate. a. True b. False

F

The Poisson probability distribution is one of the most commonly used continuous probability distributions. a. True b. False

F

The binomial distribution is a continuous distribution. a. True b. False

F

The binomial random variable represents the number of successes that occur in a specific period of time. a. True b. False

F

The burden of proof is traditionally on the null hypothesis. a. True b. False

F

The coefficients for logarithmically transformed explanatory variables should be interpreted as the percent change in the dependent variable for a 1% percent change in the explanatory variable. a. True b. False

F

The confidence interval for the population standard deviation σ is centered at the point estimate, the sample standard deviation s. a. True b. False

F

The interval estimate 18.5 ± 2.5 is developed for a population mean in which the sample standard deviation s is 7.5. Had s equaled 15 instead, the interval estimate would be 37 ± 5.0. a. True b. False

F

The mean of the sampling distribution of the sample proportion, when the sample size n = 100 and the population proportion p = 0.15, is 15.0. a. True b. False

F

The multiplication rule for two events A and B is: P(A and B) = P(A|B)P(A). a. True b. False

F

The regression line Y= 3 + 2X has been fitted to the data points (4, 14), (2, 7), and (1, 4). The sum of the residuals squared will be 8.0. a. True b. False

F
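
The sum of squared residuals in the card above can be computed directly: each residual is the observed Y minus the fitted value 3 + 2X.

```python
def sum_squared_residuals(points, intercept, slope):
    """Sum of (y - (intercept + slope * x))^2 over the data points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)

points = [(4, 14), (2, 7), (1, 4)]
ssr = sum_squared_residuals(points, intercept=3, slope=2)
print(ssr)  # 10
```

The residuals are 3, 0, and -1, so the sum of squares is 10 rather than 8.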

The rejection region is the set of sample data that leads to the rejection of the alternative hypothesis. a. True b. False

F

The relative frequency of an event is the number of times the event occurs out of the total number of times the random experiment is run. a. True b. False

F

The seasonal component of a time series is more difficult to predict than the cyclic component because cyclic variation is much more regular. a. True b. False

F

The seasonal component of a time series is more likely to exhibit the relatively steady growth of a variable, such as the population of Egypt from 35 million in 1960 to 93 million in 2016. a. True b. False

F

The standard error of the sample mean is large when the observations in the population are spread out (large σ), but the standard error can be reduced by taking a smaller sample. a. True b. False

F

The standard error of the sampling distribution of the sample proportion, when the sample size n = 50 and the population proportion p = 0.25, is 0.00375. a. True b. False

F
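
The standard error of a sample proportion is sqrt(p(1 − p)/n). The value 0.00375 in the card above is the variance, not the standard error. A quick check, which also covers the earlier card with n = 200:

```python
import math

def se_proportion(p, n):
    """Standard error of the sample proportion."""
    return math.sqrt(p * (1 - p) / n)

se_50 = se_proportion(p=0.25, n=50)
print(round(se_50, 4))  # 0.0612

# For n = 200, both p = 0.30 and p = 0.70 give about 0.0324,
# so the standard error alone does not pin down p.
print(round(se_proportion(0.30, 200), 4), round(se_proportion(0.70, 200), 4))
```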

In developing a confidence interval for the population standard deviation σ, we make use of the fact that the sampling distribution of the sample standard deviation s is not the normal distribution or the t-distribution, but rather a right-skewed distribution called the chi-square distribution, which (for this procedure) has n - 1 degrees of freedom. a. True b. False

T

In every regression study there is a single variable that we are trying to explain or predict. This is called the response variable or dependent variable. a. True b. False

T

In exponential smoothing models, the forecast is based on the level at time t, Lt, which is not observable and can only be estimated. a. True b. False

T

In general, the paired-sample procedure is appropriate when the samples are naturally paired in some way and there is a reasonably large positive correlation between the pairs. In this case, the paired-sample procedure makes more efficient use of the data and generally results in narrower confidence intervals. a. True b. False

T

In multiple regression with k explanatory variables, the t-tests of the individual coefficients allow us to determine whether βi ≠ 0 (for i = 1, 2, ..., k), which tells us whether a linear relationship exists between Xi and Y. a. True b. False

T

In multiple regression, if the F-ratio is large, the explained variation is large relative to the unexplained variation. a. True b. False

T

In multiple regression, if the F-ratio is small, the explained variation is small relative to the unexplained variation. a. True b. False

T

In multiple regression, if there is multicollinearity between independent variables, the t-tests of the individual coefficients may indicate that some variables are not linearly related to the dependent variable, when in fact, they are. a. True b. False

T

In order to estimate with 90% confidence a particular value of Y for a given value of X in a simple linear regression problem, a random sample of 20 observations is taken. The appropriate t-value that would be used is 1.734. a. True b. False

T

In regression analysis, the total variation in the dependent variable Y, measured by sum(Y - Ybar)^2 and referred to as SST, can be decomposed into two parts: the explained variation, measured by SSR, and the unexplained variation, measured by SSE. a. True b. False

T

In regression analysis, we can often use the standard error of estimate to judge which of several potential regression equations is the most useful. a. True b. False

T

In simple linear regression, the test statistic for testing H0: β1 = 0 is t-distributed with n - 2 degrees of freedom. a. True b. False

T

In stratified sampling, the population is divided into relatively homogeneous subsets called strata, and then random samples are taken from each stratum. a. True b. False

T

In systematic sampling, one of the first k members is selected randomly, and then every kth member after this one is selected. The value k is called the sampling interval and equals the ratio N / n, where N is the population size and n is the desired sample size. a. True b. False

T
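
The systematic sampling procedure in the card above can be sketched as follows. This assumes N is an exact multiple of n so that k = N / n is an integer; the function name and the optional `rng` argument are illustrative choices:

```python
import random

def systematic_sample(population, n, rng=None):
    """Pick a random start among the first k members, then every kth member.

    Assumption: len(population) is an exact multiple of n.
    """
    rng = rng or random.Random()
    k = len(population) // n      # sampling interval k = N / n
    start = rng.randrange(k)      # random position among the first k members
    return [population[start + i * k] for i in range(n)]

population = list(range(1, 101))  # hypothetical population of N = 100 members
sample = systematic_sample(population, n=10, rng=random.Random(0))
print(sample)
```

Every selected member is exactly k = 10 positions after the previous one, and only the starting point is random.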

It is customary to approximate the standard error of the sample mean by substituting the sample standard deviation s for σ in the formula SE(x) = σ/sqrt(n). a. True b. False

T

The Lilliefors test for normality compares two cumulative distribution functions (cdf's): the cdf from a normal distribution and the cdf corresponding to the given data (called the empirical cdf). a. True b. False

T

Much of the study of probabilistic inventory models, queuing models, and reliability models relies heavily on the Poisson and exponential distributions. a. True b. False

T

Multicollinearity is a situation in which two or more of the explanatory variables are highly correlated with each other. a. True b. False

T

One method of dealing with heteroscedasticity is to try a logarithmic transformation of the data. a. True b. False

T

One method of diagnosing heteroscedasticity is to plot the residuals against the predicted values of Y, then look for a change in the spread of the plotted values. a. True b. False

T

One obvious advantage of stratified sampling is that we obtain separate estimates within each stratum - which we would not obtain if we took a simple random sample from the entire population. A more important advantage is that we can increase the accuracy of the resulting population estimates by using appropriately defined strata. a. True b. False

T

One of the potential characteristics of an outlier is that the value of the dependent variable is much larger or smaller than predicted by the regression line. a. True b. False

T

Probability is a number between 0 and 1, inclusive, which measures the likelihood that some event will occur. a. True b. False

T

Regression analysis can be applied equally well to cross-sectional and time series data. a. True b. False

T

Simple random sampling can result in under-representation or over-representation of certain segments of the population. This is one of several reasons that simple random samples are almost never used in real applications. a. True b. False

T

Tests in which samples are not independent are referred to as matched pairs or paired samples. a. True b. False

T

The 95% confidence interval for the population mean, given that the sample size n = 49 and the population standard deviation = 7, is. a. True b. False

T

The F distribution is a skewed distribution useful for testing equality of variances. a. True b. False

T

The Lilliefors test is used to test for normality. a. True b. False

T

The Poisson distribution is applied to events for which the probability of occurrence over a given span of time, space, or distance is very small. a. True b. False

T

The Poisson distribution is characterized by a single parameter, which must be positive. a. True b. False

T

The Poisson random variable is a discrete random variable with infinitely many possible values. a. True b. False

T

The adjusted R2 is adjusted for the number of explanatory variables in a regression equation, and it has the same interpretation as the standard R2. a. True b. False

T

The adjusted R2 is used primarily to monitor whether extra explanatory variables belong in a multiple regression model. a. True b. False

T

The advantage that correlation has over covariance is that correlation has a set lower and upper limit. a. True b. False

T

The alternative hypothesis always contains a statement of inequality, such as "less than", "greater than", or "not equal to". a. True b. False

T

The alternative hypothesis, or research hypothesis, is the hypothesis that the analyst is attempting to prove. a. True b. False

T

The approximate standard error of the point estimate of the population total is Ns/sqrt(n). a. True b. False

T

The assumptions of regression are: 1) there is a population regression line, 2) the dependent variable is normally distributed, 3) the standard deviation of the response variable remains constant as the explanatory variables increase, and 4) the errors are probabilistically independent. a. True b. False

T

The correlation between two variables is unitless and always between -1 and +1. a. True b. False

T

The cyclic component of a time series is likely to exhibit business cycles that record periods of economic recession and inflation. a. True b. False

T

The degrees of freedom for the t and chi-square distributions is a numerical parameter of the distribution that defines the precise shape of the distribution. a. True b. False

T

The difference between the point estimate and the true value of the population parameter being estimated is called the estimation error. a. True b. False

T

The effect of a logarithmic transformation on a variable that is skewed to the right by a few large values is to "squeeze" the values together and make the distribution more symmetric. a. True b. False

T

The finite population correction factor is a correction for the standard error when the sample size is fairly large relative to the population size. a. True b. False

T

The mean and standard deviation of a normally distributed random variable that has been "standardized" are zero and one, respectively. a. True b. False

T

The mean is a measure of central tendency. a. True b. False

T

The most common form of autocorrelation is positive autocorrelation, where large observations tend to follow large observations and small observations tend to follow small observations. a. True b. False

T

The moving average method is perhaps the simplest and one of the most frequently-used extrapolation methods. a. True b. False

T

The multiple R for a regression is the correlation between the observed Y values and the fitted Y values. a. True b. False

T

The null hypothesis always contains a statement of equality, like "equal to", "less than or equal to", or "greater than or equal to". a. True b. False

T

The null hypothesis in a runs test is the data series is random. a. True b. False

T

The number of car insurance policy holders is an example of a discrete random variable. a. True b. False

T

The number of loan defaults per month at a bank is Poisson distributed. a. True b. False

T

The number of people entering a shopping mall on a given day is an example of a discrete random variable. a. True b. False

T

The only meaningful way to summarize categorical data is with counts of observations in the categories. a. True b. False

T

The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed given that the null hypothesis is true. a. True b. False

T

The p-value of a test is the smallest level of significance at which the null hypothesis can be rejected. a. True b. False

T

The power of a test is the probability of rejecting the null hypothesis when the alternative hypothesis is true. a. True b. False

T

The primary advantage of cluster sampling is sampling convenience (and possibly less cost). The downside, however, is that the inferences drawn from a cluster sample can be less accurate, for a given sample size, than for other sampling plans. a. True b. False

T

The probability of making a Type I error and the level of significance are the same. a. True b. False

T

The probability that event A will not occur is denoted as P(Ā). a. True b. False

T

The purpose of using the moving average is to take away the short-term seasonal and random variation, leaving behind a combined trend and cyclical movement. a. True b. False

T

The residual is defined as the difference between the actual and predicted, or fitted values of the response variable. a. True b. False

T

The residuals are observations of the error variable ε. Consequently, the minimized sum of squared deviations is called the sum of squared errors, labeled SSE. a. True b. False

T

The runs test is a formal test of the null hypothesis of randomness. If there are too many or too few runs in the series, then we conclude that the series is not random. a. True b. False

T

The sampling distribution of any point estimate (such as the sample mean or proportion) is the distribution of the point estimates we would obtain from all possible samples of a given size drawn from the population. a. True b. False

T

The sampling distribution of the mean will have the same mean as the original population from which the samples were drawn. a. True b. False

T

The significance level also determines the rejection region. a. True b. False

T

The size of a sample can be selected by first determining the desired standard error and then using the formula SE(x) = σ/sqrt(n) to calculate n. a. True b. False

T

The smoothing constant used in simple exponential smoothing is analogous to the span in moving averages. a. True b. False

T

The smoothing constants in exponential smoothing models are effectively a way to assign different weights to past levels, trends and cycles in the data. a. True b. False

T

The standard error of an estimate is the standard deviation of the sampling distribution of the estimate. It measures how much estimates from different samples vary. a. True b. False

T

The standard error of the estimate measures how much estimates vary from sample to sample. a. True b. False

T

The t-distribution and the standard normal distribution are practically indistinguishable as the degrees of freedom increase. a. True b. False

T

The temperature of the room in which you are writing this test is a continuous random variable. a. True b. False

T

The test statistic employed to test H0: σ1^2/σ2^2 = 1 is F = s1^2/s2^2, which is F distributed with n1-1 and n2-1 degrees of freedom. a. True b. False

T

The test statistic for a hypothesis test of a population proportion is the z-value. a. True b. False

T

The time series component that reflects a wavelike pattern describing a long-term trend that is generally apparent over a number of years is called cyclical. a. True b. False

T

The time students spend in a computer lab during one day is an example of a continuous random variable. a. True b. False

T

The total area under the normal distribution curve is equal to one. a. True b. False

T

The two primary objectives of regression analysis are to study relationships between variables and to use those relationships to make predictions. a. True b. False

T

The upper limit of the 90% confidence interval for the population proportion p, given that n = 100 and the sample proportion is 0.20, is 0.2658. a. True b. False

T
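
The upper limit in the card above follows from the large-sample interval p̂ ± z·sqrt(p̂(1 − p̂)/n). A sketch using the standard library's `statistics.NormalDist` for the z-multiplier; the function name is illustrative:

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

lower, upper = proportion_ci(p_hat=0.20, n=100, confidence=0.90)
print(round(lower, 4), round(upper, 4))  # 0.1342 0.2658
```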

The value of the mean times the number of observations equals the sum of all of the data values. a. True b. False

T

The value of the sum of squares due to regression, SSR, can never be larger than the value of the sum of squares total, SST. a. True b. False

T

The variance of a binomial distribution is given by the formula np(1 - p), where n is the number of trials, and p is the probability of success in any trial. a. True b. False

T

To form a scatterplot of X versus Y, X and Y must be paired variables. a. True b. False

T

Two common ways of displaying categorical data are column charts and pie charts. a. True b. False

T

Two events A and B are said to be mutually exclusive if P(A and B) = 0. a. True b. False

T

Two events are said to be independent when knowledge of one event is of no value when assessing the probability of the other. a. True b. False

T

Two or more events are said to be exhaustive if one of them must occur. a. True b. False

T

Two or more events are said to be mutually exclusive if at most one of them can occur. a. True b. False

T

Type I errors are usually considered more "costly" which can lead to conservative decision making. a. True b. False

T

Using dummy variables is an efficient way of determining counts of categorical variables. a. True b. False

T

Using the standard normal curve, the Z- score representing the 75th percentile is 0.674. a. True b. False

T

Using the standard normal distribution, the Z- score representing the 99th percentile is 2.326. a. True b. False

T
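
The percentile z-scores in the two cards above can be reproduced with the inverse cdf of the standard normal distribution, available in the standard library as `statistics.NormalDist().inv_cdf`:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

z_75 = standard_normal.inv_cdf(0.75)  # 75th percentile
z_99 = standard_normal.inv_cdf(0.99)  # 99th percentile
print(round(z_75, 3), round(z_99, 3))  # 0.674 2.326
```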

We can form a confidence interval for the population total T by finding a confidence interval for the population mean in the usual way, and then multiplying the lower and upper limits of the confidence interval by the population size N. a. True b. False

T

We should include an interaction variable in a regression model if we believe that the effect of one explanatory variable X1 on the response variable Y depends on the value of another explanatory variable. a. True b. False

T

When samples of size n are drawn from a population, then the sampling distribution of the sample mean is approximately normal, provided that n is reasonably large. a. True b. False

T

When the sample size is greater than 5% of the population, the formula for the standard error of the mean should be modified with a finite population correction. a. True b. False

T

When the scatterplot appears as a shapeless swarm of points, this can indicate that there is no relationship between the response variable Y and the explanatory variable X, or at least none worth pursuing. a. True b. False

T

When we wish to determine the probability that at least one of several events will occur, we would use the addition rule. a. True b. False

T

You think you have a 90% chance of passing your statistics class. This is an example of subjective probability. a. True b. False

T

You will always get more accurate forecasts by using more complex forecasting methods. a. True b. False

F

In regression analysis, the ANOVA table analyzes a. the variation of the response variable Y. b. the variation of the explanatory variable X. c. the total variation of all variables. d. some of the variation in the explanatory variable and some of the variation in the response variable.

a. the variation of the response variable Y.

A time series can consist of four different components: trend, seasonal, cyclical, and random (or noise). a. True b. False

T

A time series is any variable that is measured over time in sequential order. a. True b. False

T

A trend component of a time series is a long-term, relatively smooth pattern or direction exhibited by a series, and its duration is more than one year. a. True b. False

T

The opportunity for nonsampling error is larger when the a. sample size is large. b. sample size is small. c. population size is small. d. population size is large.

a. sample size is large.

Non-truthful response is a particular problem when a. sensitive questions are asked. b. surveys are anonymous. c. interviewers are not trained. d. the sample is from an unusual population.

a. sensitive questions are asked.

A useful graph in almost any regression analysis is a scatterplot of residuals (on the vertical axis) versus fitted values (on the horizontal axis), where a "good" fit not only has small residuals, but it has residuals scattered randomly around zero with no apparent pattern. a. True b. False

T

Abby has been keeping track of what she spends to stream movies. The last seven weeks' expenditures, in dollars, were 6, 4, 8, 9, 6, 12, and 4. The mean amount Abby spent streaming movies over these 7 weeks is $7. a. True b. False

T

An autocorrelation is a type of correlation used to measure whether the values of a time series are related to their own past values. a. True b. False

T
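
The definition above can be made concrete with a small sketch. The helper name `autocorrelation` and the `sales` series are hypothetical (they are not from the question); the formula is the usual sample autocorrelation, which correlates a series with a lagged copy of itself.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total squared deviation from the mean.
    total = sum((y - mean) ** 2 for y in series)
    # Numerator: products of deviations that are `lag` periods apart.
    cross = sum((series[t] - mean) * (series[t - lag] - mean)
                for t in range(lag, n))
    return cross / total


# Hypothetical series with a steady upward drift (illustration only).
sales = [10, 12, 13, 15, 16, 18, 21, 22]
lag1 = autocorrelation(sales, 1)
print(round(lag1, 3))
```

A positive lag-1 value here reflects that large values tend to follow large values in this made-up series.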

An exponential distribution with parameter λ = 0.2 has mean and standard deviation both equal to 5. a. True b. False

T

An exponential trend is appropriate when the time series changes by a constant percentage each period. a. True b. False

T

An outlier is an observation that falls outside of the general pattern of the rest of the observations on a scatterplot. a. True b. False

T

An unbiased estimate is a point estimate such that the mean of its sampling distribution is equal to the true value of the population parameter being estimated. a. True b. False

T

Correlation is not useful for describing the strength and direction of nonlinear relationships. a. True b. False

T

Correlation is used to determine the strength of the linear relationship between an explanatory variable X and response variable Y. a. True b. False

T

A correlogram is a bar chart of autocorrelations at different lags. a. True b. False

T

Econometric forecasting models, also called causal models, use regression to forecast a time series variable by using other explanatory time series variables. a. True b. False

T

Estimation is the process of inferring the value of an unknown population parameter using data from a random sample drawn from the population. a. True b. False

T

Every form of exponential smoothing model has at least one smoothing constant, which is always between 0 and 1. a. True b. False

T

Extrapolation forecasting methods are quantitative methods that use past data of a time series variable - and nothing else, except possibly time itself - to forecast values of the variable. a. True b. False

T

Football teams toss a coin to see who will get their choice of kicking or receiving to begin a game. The probability that a given team will win the toss three games in a row is 0.125. a. True b. False

T
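
As a quick check of the arithmetic, assuming a fair coin and independent tosses (the variable names are illustrative):

```python
p_win_one_toss = 0.5   # fair coin, assumed
games = 3

# Independent tosses: multiply the per-toss probabilities.
p_win_all = p_win_one_toss ** games
print(p_win_all)  # 0.125
```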

Heteroscedasticity means that the variability of Y values is larger for some X values than for others. a. True b. False

T

Homoscedasticity means that the variability of Y values is the same for all X values. a. True b. False

T

If P(A and B) = 1, then A and B must be collectively exhaustive. a. True b. False

T

If a categorical variable is to be included in a multiple regression, a dummy variable for each category of the variable should be used, but the original categorical variables should not be used. a. True b. False

T

If a histogram of a data set is symmetric and bell shaped, with a mean of 75 and a standard deviation of 10, then approximately 95% of the data values will be between 55 and 95. a. True b. False

T
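
The empirical rule behind this answer can be sketched as follows. The check against an exact normal curve uses Python's `statistics.NormalDist`; treating the histogram as exactly normal is an assumption made only for illustration.

```python
from statistics import NormalDist

mean, std_dev = 75, 10

# Empirical rule: about 95% of values lie within 2 standard deviations.
lower, upper = mean - 2 * std_dev, mean + 2 * std_dev

# Compare with the exact share under a normal curve (an idealization).
dist = NormalDist(mu=mean, sigma=std_dev)
share_within = dist.cdf(upper) - dist.cdf(lower)
print(lower, upper, round(share_within, 4))
```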

If a null hypothesis is rejected at the 0.01 level of significance, then it will also be rejected at the 0.025 level. a. True b. False

T

If a sample has 20 observations and a 95% confidence estimate for μ is needed, the appropriate value of the t-multiple is 2.093. a. True b. False

T

If a time series exhibits an exponential trend, then a plot of its logarithm should be approximately linear. a. True b. False

T
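
A small sketch of why the log plot straightens out. The 5% growth rate and starting value of 100 are hypothetical; the point is only that equal percentage changes become equal additive steps after taking logarithms.

```python
import math

# Hypothetical series that grows by 5% each period.
growth = 1.05
series = [100 * growth ** t for t in range(6)]

# After taking logs, each period adds the same amount: log(1.05).
log_series = [math.log(v) for v in series]
steps = [b - a for a, b in zip(log_series, log_series[1:])]
print([round(s, 4) for s in steps])
```

Constant steps in the log series are exactly what a straight line looks like.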

If exact multicollinearity exists, redundancy exists in the data. a. True b. False

T

The local police department is interested in estimating the number of cars that fail to stop at a stop sign during a specified lunch hour. Which probability distribution should they use? a. Binomial distribution b. Poisson distribution c. Normal distribution d. Uniform distribution

b. poisson distribution

The approximate standard error of the sample mean is calculated as a. σ/sqrt(n) b. s/sqrt(n) c. 2σ/sqrt(n) d. 2s/sqrt(n)

b. s/sqrt(n)

Which of the following statements is correct? a. A confidence interval describes a range of values that is likely not to include the actual population parameter b. A confidence interval is an estimate of the range for a sample statistic. c. A confidence interval is an estimate of the range of possible values for a population parameter. d. A confidence interval describes a range of values that will always include the actual population parameter.

c. A confidence interval is an estimate of the range of possible values for a population parameter.

Time series data often exhibits which of the following characteristics? a. Homoscedasticity b. Heteroscedasticity c. Autocorrelation d. Multicollinearity

c. Autocorrelation

Which approach can be used to test for autocorrelation? a. Regression coefficient b. Correlation coefficient c. Durbin-Watson statistic d. F-test or t-test

c. Durbin-Watson statistic

The four areas of a pivot table are a. Crosstabs, Fields, Rows, and Columns. b. Data, Count, Contingency, and Percentage. c. Filters, Rows, Columns, and Values. d. Sort, Rows, Columns, and Count.

c. Filters, Rows, Columns, and Values.

Which statement is true regarding regression error, ε? a. It is the same as a residual. b. It can be calculated from the predicted observations. c. It cannot be calculated from the observed data. d. It is unbiased.

c. It cannot be calculated from the observed data.

Which of the following is not a method for dealing with seasonality in data? a. Winters' exponential smoothing model b. Deseasonalizing the data, using any forecasting model, then reseasonalizing the data c. Multiple regression with lags for the seasons d. Multiple regression with dummy variables for the seasons

c. Multiple regression with lags for the seasons

Which of the following statements is true? a. Probabilities must be negative. b. Probabilities must be greater than 1. c. The sum of all probabilities for a random variable must be equal to 1. d. The sum of all probabilities for a random variable must be equal to 0.

c. The sum of all probabilities for a random variable must be equal to 1.

If the value of the standard normal random variable Z is positive, then the original score is where in relationship to the mean? a. Equal to the mean b. To the left of the mean c. To the right of the mean d. None of these choices

c. To the right of the mean

The objective typically used in the three types of equation-building procedures is to find the equation with a. a small se. b. a large R². c. a small se and a large R². d. the smallest F-ratio.

c. a small se and a large R2

An important condition when interpreting the coefficient for a particular independent variable X in a multiple regression equation is that a. the dependent variable will remain constant. b. the dependent variable will be allowed to vary. c. all of the other independent variables remain constant. d. all of the other independent variables be allowed to vary.

c. all of the other independent variables remain constant.

Forward regression a. begins with all potential explanatory variables in the equation and deletes them one at a time until further deletion would do more harm than good. b. adds and deletes variables until an optimal equation is achieved. c. begins with no explanatory variables in the equation and successively adds one at a time until no remaining variables make a significant contribution. d. randomly selects the optimal number of explanatory variables to be used.

c. begins with no explanatory variables in the equation and successively adds one at a time until no remaining variables make a significant contribution.

Confidence intervals are a function of the a. population, the sample, and the standard deviation. b. sample, the variable of interest, and the degrees of freedom. c. data in the sample, the confidence level, and the sample size. d. sampling distribution, the confidence level, and the degrees of freedom.

c. data in the sample, the confidence level, and the sample size.

Sampling done without replacement means that a. only certain members of the population can be sampled. b. each member of the population can be sampled repeatedly. c. each member of the population can be sampled only once. d. each member of the population can be sampled twice.

c. each member of the population can be sampled only once.

Which of the following is not a consideration when determining appropriate sample size? a. the cost of sampling b. the timely collection of the data c. interviewer fatigue d. the likelihood of nonsampling error

c. interviewer fatigue

The shape of a chi-square distribution a. is symmetric. b. is skewed to the left. c. is skewed to the right. d. depends on the sample data.

c. is skewed to the right.

The covariance is not used as much as the correlation because a. it is not always a valid predictor of linear relationships. b. it is difficult to calculate. c. it is difficult to interpret because it depends on the units of measurement. d. of all of these options.

c. it is difficult to interpret because it depends on the units of measurement.

The most common form of autocorrelation is positive autocorrelation, in which a. large observations tend to follow both large and small observations. b. small observations tend to follow both large and small observations. c. large observations tend to follow large observations and small observations tend to follow small observations. d. large observations tend to follow small observations and small observations tend to follow large observations.

c. large observations tend to follow large observations and small observations tend to follow small observations.

Outliers are observations that a. lie outside the sample. b. render the study useless. c. lie outside the typical pattern of points on a scatterplot. d. disrupt the entire linear trend.

c. lie outside the typical pattern of points on a scatterplot.

Correlation is useful only for a. assessing the weakness of a linear relationship. b. conveying the same information in a simpler format than a scatterplot. c. measuring the strength of a linear relationship. d. measuring the strength of a nonlinear relationship.

c. measuring the strength of a linear relationship.

The finite population correction factor, sqrt((N-n)/(N-1)), should generally be used when a. N is any finite size. b. n is less than 5% of the population size N. c. n is greater than 5% of the population size N. d. n is any finite size.

c. n is greater than 5% of the population size N.
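
A minimal sketch of how the correction might be applied. The function names and the example sizes (N = 1000, n = 100) are hypothetical; the 5% threshold is the rule of thumb stated in the question.

```python
import math


def finite_population_correction(population_size, sample_size):
    """The factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))


def standard_error_of_mean(std_dev, population_size, sample_size):
    """Standard error of the mean, corrected only when n exceeds 5% of N."""
    se = std_dev / math.sqrt(sample_size)
    if sample_size > 0.05 * population_size:
        se *= finite_population_correction(population_size, sample_size)
    return se


# Hypothetical sizes: the sample is 10% of the population, so the
# correction applies and shrinks the standard error slightly.
fpc = finite_population_correction(1000, 100)
print(round(fpc, 4))
```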

The chi-square goodness-of-fit test can be used to test for a. significance of sample statistics. b. difference between population means. c. normality. d. difference between population variances.

c. normality.

The idea behind the runs test is that a random number series should have a number of runs that is a. large. b. small. c. not large or small. d. constant.

c. not large or small.

A parameter such as σ is sometimes referred to as a(n) _____ parameter, because many times, we need its value even though it is not the parameter of primary interest. a. dependent b. random c. nuisance d. independent

c. nuisance

The tool that provides useful information about a data set by breaking it down into categories is a a. histogram. b. scatterplot. c. pivot table. d. spreadsheet.

c. pivot table.

There are a variety of deseasonalizing methods, but they are typically variations of a. ratio-to-seasonality methods. b. ratio-to-exponential-smoothing methods. c. ratio-to-moving-average methods. d. linear trend.

c. ratio-to-moving-average methods.

The percentage of variation (R²) can be interpreted as the fraction (or percent) of variation of the a. explanatory variable explained by the independent variable. b. explanatory variable explained by the regression line. c. response variable explained by the regression line. d. error explained by the regression line.

c. response variable explained by the regression line.

A two-tailed test is one where _____ lead to rejection of the null hypothesis. a. results in only one direction can b. negative sample means c. results in either of two directions can d. no results

c. results in either of two directions can

The two basic sources for error when using random sampling are _____ error. a. sampling and selection b. identification and selection c. sampling and nonsampling d. bias and randomness

c. sampling and nonsampling

A judgmental sample is a sample in which the a. sampling units are chosen using a random number table. b. quality of sampling units judged before they are added to the sample. c. sampling units are chosen according to the sampler's judgment. d. sampling units condemn the sampling method used.

c. sampling units are chosen according to the sampler's judgment.

The value set for α is known as the a. rejection level. b. acceptance level. c. significance level. d. type II error in the hypothesis test.

c. significance level.

A sample chosen in such a way that every possible subset of the same size has an equal chance of being selected is called a _____ sample. a. cluster b. systematic random c. simple random d. stratified random

c. simple random

In the standardized value (b - B)/s_b, the symbol s_b represents the a. mean of b. b. variance of b. c. standard error of b. d. degrees of freedom of b.

c. standard error of b

Selecting a random sample from each identifiable subgroup within a population is called _____ sampling. a. simple random b. systematic c. stratified d. cluster

c. stratified

Which of the following would be considered a definition of an outlier? a. An extreme value for one or more variables b. A value whose residual is abnormally large in magnitude c. Values for individual explanatory variables that fall outside the general pattern of the other observations d. All of these choices

d. All of these choices

A continuous probability distribution is characterized by a. a list of possible values. b. counts. c. an array of individual values. d. a continuum of possible values.

d. a continuum of possible values

In a moving averages method, which of the following represent(s) the number of terms in the moving average? a. A smoothing constant b. The explanatory variables c. An alpha value d. A span

d. a span

According to the empirical rule, how many observations lie within +/- 3 standard deviations of the mean? a. 50% b. 68% c. 95% d. Almost all

d. almost all

The t-value for testing H0: B1 = 0 is calculated using which of the following equations? a. n - k - 1 b. sum (X/Y) c. B1 / s1 d. b/s

d. b/s

The tables of counts that result from pivot tables are often called a. samples. b. sub-tables. c. populations. d. crosstabs.

d. crosstabs

A regression approach can also be used to deal with seasonality by using ____ variables for the seasons. a. smoothing b. response c. residual d. dummy

d. dummy

For a given confidence level, the procedure for controlling interval length usually begins with the specification of the a. point estimate. b. population standard deviation, σ. c. sample standard deviation, s. d. interval half-length, B.

d. interval half-length, B.

Let A and B be the events of the FDA approving and rejecting a new drug to treat hypertension, respectively. The events A and B are a. independent. b. conditional. c. unilateral. d. mutually exclusive

d. mutually exclusive

A scatterplot that appears as a shapeless mass of data points indicates _____ relationship among the variables. a. a curved b. a linear c. a nonlinear d. no

d. no

Changing the location of fields in a pivot table is known as a. slicing. b. dicing. c. sorting. d. pivoting.

d. pivoting

Which measure of variability is defined as the maximum value of a data set minus the minimum value of a data set? a. Variance b. Standard deviation c. Interquartile range d. Range

d. range

If two events are independent, what is the probability that they both occur? a. 0 b. 0.50 c. 1.00 d. This cannot be determined from the information given.

d. this cannot be determined from the information given

If two events are mutually exclusive, what is the probability that one or the other occurs? a. 0.25 b. 0.50 c. 1.00 d. This cannot be determined from the information given.

d. this cannot be determined from the information given

The approximate 95% confidence interval for a population mean is a. xbar +- σ/sqrt(n) b. xbar +- s/sqrt(n) c. xbar +- 2σ/sqrt(n) d. xbar +- 2s/sqrt(n)

d. xbar +-2s/sqrt(n)
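
The formula in the answer can be sketched in a few lines. The helper name `approx_95_ci` and the sample values are hypothetical; the multiple of 2 is the rough approximation used in the question rather than an exact t- or z-multiple.

```python
import math
from statistics import mean, stdev


def approx_95_ci(sample):
    """Approximate 95% interval: xbar +- 2 * s / sqrt(n)."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)                 # sample standard deviation
    half_length = 2 * s / math.sqrt(n)
    return xbar - half_length, xbar + half_length


# Hypothetical sample values, for illustration only.
sample = [52, 48, 55, 50, 47, 53, 49, 51, 54, 46]
low, high = approx_95_ci(sample)
print(round(low, 2), round(high, 2))
```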

Which of the following statements is true regarding the chi-square goodness-of-fit test for normality? a. The test does depend on which and how many categories we use for the histogram. b. The test is not very effective unless the sample size is large, say, at least 80 or 100. c. The test tends to be too sensitive if the sample size is really large. d. None of these choices is true. e. Choices a, b, and c are all true.

e. Choices a, b, and c are all true.

The sampling distribution of the mean will have the same standard deviation as the original population from which the samples were drawn. a. True b. False

F

The lower limit of the 95% confidence interval for the population proportion p, given that n = 300 and the sample proportion is 0.10, is 0.1339. a. True b. False

F
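
To see why the statement is false, the interval can be computed directly. This sketch assumes the usual large-sample interval for a proportion, with the z-multiple taken from `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 300
p_hat = 0.10

z = NormalDist().inv_cdf(0.975)               # about 1.96
std_error = math.sqrt(p_hat * (1 - p_hat) / n)

lower_limit = p_hat - z * std_error
upper_limit = p_hat + z * std_error
print(round(lower_limit, 4), round(upper_limit, 4))
```

Under these assumptions 0.1339 comes out as the upper limit, not the lower one.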

To deseasonalize an observation (assuming a multiplicative model of seasonality), multiply it by the appropriate seasonal index. a. True b. False

F

To help explain or predict the response variable in every regression study, we use one or more explanatory variables. These variables are also called response variables or independent variables. a. True b. False

F

Two events A and B are said to be independent if P(A and B) = P(A) + P(B). a. True b. False

F

Two or more events are said to be exhaustive if at most one of them can occur. a. True b. False

F

Unlike histograms, box plots depict only one aspect of a variable. a. True b. False

F

We can measure the accuracy of judgmental samples by applying some simple rules of probability. This way, judgmental samples are not likely to contain our built-in biases. a. True b. False

F

We compare the percent of variation explained, R², for a regression model with seasonal dummy variables to the MAPE for the smoothing model with seasonality to see which model is more accurate. a. True b. False

F

When calculating a confidence interval for the difference between two population proportions, we multiply the standard error of the difference between sample proportions by t-multiple. a. True b. False

F

When testing the equality of two population variances, the test statistic is the ratio of the population variances; namely σ1^2/σ2^2. a. True b. False

F

When two events are independent, they are also mutually exclusive. a. True b. False

F

From a sample of 500 items, 30 were found to be defective. The point estimate of the population proportion defective will be a. 0.06. b. 30.0. c. 470. d. 0.94.

a. 0.06.

Approximately what percentage of the observed Y values are within one standard error of the estimate (se) of the corresponding fitted Y values? a. 67% b. 95% c. 99% d. 99.7%

a. 67%

Which of the following best describes the concept of probability? a. It is a measure of the likelihood that a particular event will occur. b. It is a measure of the likelihood that a particular event will occur, given that another event has already occurred. c. It is a measure of the likelihood of the simultaneous occurrence of two or more events. d. None of these choices describe the concept of probability.

a. It is a measure of the likelihood that a particular event will occur.

The general form of a confidence interval is a. Point Estimate ± Multiple × Standard Error. b. Point Estimate ± Multiple + Standard Error. c. Point Estimate ± Multiple - Standard Error. d. Point Estimate ± Multiple ± Standard Error.

a. Point Estimate ± Multiple × Standard Error.

The binomial probability distribution is used with a. a discrete random variable. b. a continuous random variable. c. either a discrete or a continuous random variable, depending on the variance. d. either a discrete or a continuous random variable, depending on the sample size.

a. a discrete random variable.

In regression analysis, extrapolation is performed when you a. attempt to predict beyond the limits of the sample. b. have to estimate some of the explanatory variable values. c. have to use a lag variable as an explanatory variable in the model. d. do not have observations for every period in the sample.

a. attempt to predict beyond the limits of the sample.

In a random series, successive observations are probabilistically independent of one another. If this property is violated, the observations are said to be a. autocorrelated. b. intercorrelated. c. causal. d. seasonal.

a. autocorrelated.

If you are constructing a confidence interval for a single mean, the length of the confidence interval will _____ with an increase in the sample size. a. decrease b. increase c. stay the same d. increase or decrease, depending on the sample data.

a. decrease

When the samples we want to compare pair in some natural way, such as a pretest/posttest for each person or husband/wife pairs, a more appropriate form of analysis is to not construct a confidence interval treating them as two separate variables, but instead to construct a confidence interval to estimate the mean a. difference. b. sum. c. ratio. d. total.

a. difference.

Another term for constant error variance is a. homoscedasticity. b. heteroscedasticity. c. autocorrelation. d. multicollinearity.

a. homoscedasticity.

Larger p-values indicate more evidence in support of the a. null hypothesis. b. alternative hypothesis. c. type II error. d. type I error.

a. null hypothesis.

The standard deviation of a probability distribution is a measure of a. variability of the distribution. b. central location. c. the shape of the distribution. d. skewness of the distribution.

a. variability of the distribution.

Given that Z is a standard normal random variable, P(-1.0 ≤ Z ≤ 1.5) is a. 0.7745. b. 0.8413. c. 0.0919. d. 0.9332.

a. 0.7745
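
The probability can be checked with the standard normal CDF from Python's standard library, reading the interval as running from -1.0 to 1.5. The variable name is illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# P(-1.0 <= Z <= 1.5) is the difference of two cumulative probabilities.
prob_between = z.cdf(1.5) - z.cdf(-1.0)
print(round(prob_between, 4))  # 0.7745
```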

The probability of an event and the probability of its complement always sum to a. 1. b. 0. c. any value between 0 and 1. d. any positive value.

a. 1

The mean of a data set is 75 and one observation has the value of 65. What is the squared deviation of the observation, 65, from the mean? a. 100 b. 20 c. 400 d. 10

a. 100

An administrator at Lakeside Middle School is interested in examining the probability of a child being late to school. The child is categorized as either late or not late. What type of distribution should the school use to examine this issue? a. Binomial distribution b. Normal distribution c. Exponential distribution d. Poisson distribution

a. binomial distribution

There are two types of random variables, they are a. discrete and continuous. b. exhaustive and mutually exclusive. c. complementary and cumulative. d. real and unreal.

a. discrete and continuous

Where will you find "time" on a time series graph? a. horizontal axis b. first column c. vertical axis d. last column

a. horizontal axis

The difference between the first and third quartile is called the a. interquartile range. b. range. c. standard deviation. d. variance.

a. interquartile range

If a value represents the 95th percentile, this means that 95% of all values in the data set are _____ this value. a. less than or equal to b. greater than c. less than d. greater than or equal to e. different than

a. less than or equal to

In a box plot, the asterisk inside the box indicates the location of the a. mean. b. median. c. minimum value. d. maximum value.

a. mean

What are the three most common measures of central tendency? a. Mean, median, and mode b. Mean, variance, and standard deviation c. Mean, median, and variance d. Mean, median, and standard deviation

a. mean, median, and mode

The median can also be described as the a. middle observation when the data values are arranged in ascending order. b. best estimate of the variability in a skewed distribution. c. second percentile. d. the average of all values.

a. middle observation when the data values are arranged in ascending order

Excel® stores dates as a. numbers. b. variables. c. records. d. text.

a. numbers

Probabilities that can be estimated from long-run relative frequencies of events are called _____ probabilities. a. objective b. subjective c. complementary d. joint

a. objective

The law of large numbers is relevant to the estimation of _____ probabilities. a. objective b. subjective c. both objective and subjective d. neither objective nor subjective

a. objective

Winters' model differs from Holt's model and simple exponential smoothing in that it includes an index for a. seasonality. b. trend. c. residuals. d. cyclical fluctuations.

a. seasonality

A histogram that is positively skewed may also be described as a. skewed to the right. b. skewed to the left. c. balanced. d. symmetric.

a. skewed to the right

Which statement is true for the following data values: 7, 5, 6, 4, 7, 8, and 12? a. The mean, median, and mode are all equal. b. Only the mean and median are equal. c. Only the mean and mode are equal. d. Only the median and mode are equal.

a. the mean, median, and mode are all equal
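
The three measures for this data set can be verified with the `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median, mode

values = [7, 5, 6, 4, 7, 8, 12]

data_mean = mean(values)      # 49 / 7
data_median = median(values)  # middle of the sorted values
data_mode = mode(values)      # most frequent value
print(data_mean, data_median, data_mode)
```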

There is an approximately _____% chance that any particular X̄ will be within two standard deviations of the population mean (μ). a. 90 b. 95 c. 99 d. 99.7

b. 95

Which probability distribution applies to the number of events occurring within a specified period of time or space? a. Binomial distribution b. Poisson distribution c. Any discrete probability distribution d. Any continuous probability distribution

b. Poisson distribution

What is the idea behind the chi-square test for independence? a. To compare the quantile-quantile (Q-Q) plot with what would be expected under independence b. To compare the actual counts in a contingency table with what would be expected under independence c. To compare the cumulative distribution with what would be expected under independence d. To compare the normal distribution with a chi-squared distribution

b. To compare the actual counts in a contingency table with what would be expected under independence

What does a scatterplot illustrate? a. The median of the variables b. What type of relationship there is between two variables c. The percentage of values that fall in a particular category d. The variability of the middle 50% of the data

b. What type of relationship there is between two variables

A 95% confidence interval can be used to reject the null hypothesis of a two-sided test at the 5% significance level if and only if a. a 95% confidence interval includes the hypothesized value of the parameter. b. a 95% confidence interval does not include the hypothesized value of the parameter. c. the null hypothesis is less than 0.05. d. the null hypotheses falls in the rejection region.

b. a 95% confidence interval does not include the hypothesized value of the parameter

The runs test uses a series of 0's and 1's. The 0's and 1's typically represent whether each observation is a. above or below the predicted value of Y. b. above or below the mean value of Y. c. is above or below the mean value of the previous two observations. d. is positive or negative.

b. above or below the mean value of Y.

Two independent samples of sizes 50 and 50 are randomly selected from two populations to test the difference between the population means, μ1 - μ2. The sampling distribution of the sample mean difference x1 - x2 is a. normally distributed. b. approximately normal. c. t-distributed with 98 degrees of freedom. d. chi-squared distributed with 99 degrees of freedom.

b. approximately normal.

The averaging effect means that as you average more and more observations from a given distribution, the variance of the average a. increases. b. decreases. c. is unaffected. d. could either increase, decrease, or stay the same.

b. decreases.

Many statistical packages have three types of equation-building procedures. They are a. forward, linear, and non-linear. b. forward, backward, and stepwise. c. simple, complex, and stepwise. d. inclusion, exclusion, and linear.

b. forward, backward, and stepwise.

If you increase the confidence level, the length of the confidence interval a. decreases. b. increases. c. stays the same. d. may increase or decrease, depending on the sample data.

b. increases.

A point that "tilts" the regression line toward it, is referred to as a(n) _____ point. a. magnetic b. influential c. extreme d. explanatory

b. influential

The sampling method in which a population is divided into blocks, which are then selected by a random mechanism, is called _____ sampling. a. systematic random b. simple random c. stratified d. cluster

d. cluster

The moving average method can also be referred to as a(n) ____ method. a. causal b. smoothing c. exponential d. econometric

b. smoothing

The standard error of the estimate (se) is essentially the a. mean of the residuals. b. standard deviation of the residuals. c. mean of the explanatory variable. d. standard deviation of the explanatory variable.

b. standard deviation of the residuals

The accuracy of the point estimate is measured by its a. standard deviation. b. standard error. c. sampling error. d. nonsampling error.

b. standard error.

If systematic sampling is chosen as the sampling technique, it is probably because a. systematic sampling has better statistical properties than simple random sampling. b. systematic sampling is more convenient. c. systematic sampling always results in more representative sampling than simple random sampling. d. systematic sampling gives every possible sample of the same size from the population an equal chance of being selected.

b. systematic sampling is more convenient.

A restaurant manager claims that food servers at this restaurant make at least $15 per hour in tips. A random sample of 50 hours shows the mean amount of tips to be $12 per hour with standard deviation $0.75 per hour. What is the value of the test statistic that can be used to determine if there is convincing evidence that the mean amount of tips made per hour is less than the manager claimed? a. t = -7.165 b. t = -0.566 c. t = 9.172 d. t = -0.08

b. t = -0.566

Smaller p-values indicate more evidence in support of a. the null hypothesis. b. the alternative hypothesis. c. the quality of the researcher. d. none of these choices.

b. the alternative hypothesis.

In sampling, a population is a. the set of all humans. b. the set of all members about which a study intends to make inferences. c. the set of all members from whom data was collected. d. a random group of individuals, households, cities, or countries.

b. the set of all members about which a study intends to make inferences.

The null and alternative hypotheses divide all possibilities into a. two sets that overlap. b. two non-overlapping sets. c. two sets that may or may not overlap. d. as many sets as necessary to cover all possibilities.

b. two non-overlapping sets.

The chi-square and F-distributions are used primarily to make inferences about population ____. a. means b. variances c. medians d. proportions

b. variances

An unbiased estimator is a sample statistic a. used to approximate a population parameter. b. which has an expected value equal to the value of the population parameter. c. whose value is usually less than the population parameter. d. that arises from samples that are of size 30 or fewer.

b. which has an expected value equal to the value of the population parameter.

The test statistic in a hypothesis test for a population proportion is the a. t-value calculated from the sample. b. z-value calculated from the sample. c. F-value calculated from the sample. d. the sample proportion.

b. z-value calculated from the sample.

The correlation value ranges from a. 0 to +1. b. -1 to +1. c. -2 to +2. d. 0 to 100.

b. -1 to +1.

If the random variable X is exponentially distributed with parameter λ = 1.5, then P(2 ≤ X ≤ 4), up to 4 decimal places, is a. 0.6667. b. 0.0473. c. 0.5000. d. 0.2500.

b. 0.0473
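The answer above can be checked with the exponential CDF, F(x) = 1 - e^(-λx). This is a minimal sketch, assuming the symbols lost from the question are the rate λ = 1.5 and the interval 2 ≤ X ≤ 4; the function name is illustrative only.

```python
import math

def exponential_interval_prob(rate, lower, upper):
    """P(lower <= X <= upper) for X ~ Exponential(rate), using F(x) = 1 - exp(-rate * x)."""
    return math.exp(-rate * lower) - math.exp(-rate * upper)

prob = exponential_interval_prob(1.5, 2, 4)
print(round(prob, 4))  # 0.0473
```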

If A and B are any two events with P(A) = 0.8 and P(B | not A) = 0.7, then P(not A and B) is a. 0.56. b. 0.14. c. 0.24. d. none of these choices.

b. 0.14

The probabilities shown in a table with two rows, A1 and A2 and two columns, B1 and B2, are as follows: P( A1 and B1 ) = 0.10, P( A1 and B2 ) = 0.30, P(A2 and B1) = 0.05, and P(A2 and B2) = 0.55. Then P(A1|B2) is a. 0.33. b. 0.35. c. 0.65. d. 0.67.

b. 0.35
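The conditional probability above is the joint probability divided by the column total for B2. A short sketch of that arithmetic, with the table stored in a dictionary (the variable names are illustrative):

```python
# Joint probabilities from the two-by-two table in the question.
joint = {
    ("A1", "B1"): 0.10,
    ("A1", "B2"): 0.30,
    ("A2", "B1"): 0.05,
    ("A2", "B2"): 0.55,
}

# Marginal probability of B2: sum the B2 column.
p_b2 = sum(p for (_, b), p in joint.items() if b == "B2")

# Conditional probability: P(A1 | B2) = P(A1 and B2) / P(B2).
p_a1_given_b2 = joint[("A1", "B2")] / p_b2
print(round(p_a1_given_b2, 2))  # 0.35
```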

Suppose that a simple exponential smoothing model is used (with α = 0.40) to forecast monthly sandwich sales at a local sandwich shop. The forecasted demand for September was 1560 and the actual demand was 1480 sandwiches. Given this information, what would be the forecast number of sandwiches for October? a. 1480 b. 1528 c. 1560 d. 1592

b. 1528
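Simple exponential smoothing updates the previous forecast by a fraction α of the latest forecast error. A minimal sketch of the update behind the answer above, assuming the smoothing constant in the question is α = 0.40 (the function name is illustrative):

```python
def ses_next_forecast(alpha, previous_forecast, actual):
    """Next forecast = previous forecast + alpha * (actual - previous forecast)."""
    return previous_forecast + alpha * (actual - previous_forecast)

october = ses_next_forecast(0.40, previous_forecast=1560, actual=1480)
print(round(october))  # 1528
```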

The following are the values of a time series for the first four time periods: t = 1, 2, 3, 4 with values Y = 24, 25, 26, 27. Using a four-period moving average, the forecasted value for time period 5 is a. 24.5. b. 25.5. c. 26.5. d. 27.5.

b. 25.5
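A moving-average forecast is the mean of the most recent observations, where the number of observations is the span. A small sketch using the four values from the question (the function name is illustrative):

```python
def moving_average_forecast(series, span):
    """Forecast the next period as the mean of the last `span` observations."""
    window = series[-span:]
    return sum(window) / len(window)

forecast_t5 = moving_average_forecast([24, 25, 26, 27], span=4)
print(forecast_t5)  # 25.5
```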

After calculating the sample size needed to estimate a population proportion to within 0.05, you have been told that the maximum allowable error (B) must be reduced to just 0.025. If the original calculation led to a sample size of 1000, the sample size will now have to be a. 2000. b. 4000. c. 1000. d. 8000.

b. 4000
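The required sample size is proportional to 1/B², so halving the allowable error quadruples the sample size. A sketch of that scaling, under the assumption that the confidence level and the proportion estimate stay the same (the function name is illustrative):

```python
def rescale_sample_size(old_n, old_error, new_error):
    """Scale a sample size when only the maximum allowable error B changes (n is proportional to 1 / B**2)."""
    return round(old_n * (old_error / new_error) ** 2)

new_n = rescale_sample_size(1000, old_error=0.05, new_error=0.025)
print(new_n)  # 4000
```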

One-tailed alternative hypotheses use the _____ symbol. a. not equal b. < or > c. = d. <= or >=

b. < or >

The appropriate hypothesis test for a regression coefficient is a. H0: B ≠ 0, Ha: B = 0 b. H0: B = 0, Ha: B ≠ 0 c. H0: B = 1, Ha: B ≠ 1 d. none of these choices

b. H0: B = 0, Ha: B ≠ 0

The appropriate hypothesis test for an ANOVA test is a. H0: all B ≠ 0, Ha: at least one B = 0 b. H0: all B = 0, Ha: at least one B ≠ 0 c. H0: at least one B ≠ 0, Ha: all B = 0 d. H0: at least one B = 0, Ha: all B ≠ 0

b. H0: all B = 0, Ha: at least one B ≠ 0

If P(A) = P(A|B), then events A and B are said to be a. mutually exclusive. b. independent. c. dependent. d. complementary.

b. independent

A function that associates a numerical value with each possible outcome of an uncertain event is called a _____ variable. a. conditional b. random c. population d. sample

b. random

Probabilities that cannot be estimated from long-run relative frequencies of events are called a. objective probabilities. b. subjective probabilities. c. complementary probabilities. d. joint probabilities.

b. subjective probabilities

Holt's model differs from simple exponential smoothing in that it includes a term for a. seasonality. b. trend. c. residuals. d. cyclical fluctuations.

b. trend

Suppose there are 500 accounts in a population. You sample 50 of them and find a sample mean of $500. What would be your estimate for the population total? a. $5,000 b. $50,000 c. $250,000 d. $500,000

c. $250,000

Which pair of tests is used to test for normality? a. A t-test and an ANOVA test b. An empirical cumulative distribution function test and an F-test c. A chi-square test and a Lilliefors test d. A quantile-quantile plot and a p-value test

c. A chi-square test and a Lilliefors test

Which of the following are considered numerical summary measures? a. Mean and variance b. Variance and correlation c. Correlation and covariance d. Covariance and variance

c. Correlation and covariance

Data collected from approximately the same period of time from a cross-section of a population are called _____ data. a. time series b. linear c. cross-sectional d. historical

c. cross-sectional

If two events are collectively exhaustive, what is the probability that one or the other occurs? a. 0.25 b. 0.50 c. 1.00 d. This cannot be determined from the information given.

c. 1.00

With symmetric, "bell-shaped" distributions, approximately what percent of the observations are within two standard deviations of the mean? a. 50% b. 68% c. 95% d. 99.7%

c. 95%

Scatterplots are also referred to as a. crosstabs. b. contingency charts. c. X-Y charts. d. all of these choices

c. X-Y charts

What is a component of a time series? a. Base series b. Trend c. Seasonal component d. All of these choices

d. All of these choices

If two events are collectively exhaustive, what is the probability that both occur at the same time? a. 0.00 b. 0.50 c. 1.00 d. This cannot be determined from the information given.

d. This cannot be determined from the information given.

In linear regression, a dummy variable is used to a. represent residual variables. b. represent missing data in each sample. c. include hypothetical data in the regression equation. d. include categorical variables in the regression equation.

d. include categorical variables in the regression equation.

Identifiable subpopulations within a population are called a. clusters. b. samples. c. blocks. d. strata.

d. strata.

The forecast error is the difference between a. this period's value and the next period's value. b. the average value and the expected value of the response variable. c. the explanatory variable value and the response variable value. d. the actual value and the forecast value.

d. the actual value and the forecast value.

The length of the box in the box plot portrays the a. mean. b. median. c. range. d. interquartile range.

d. interquartile range

A meandering pattern is an example of a random time series. a. True b. False

F

If two samples contain the same number of observations, then the data must be paired. a. True b. False

F

In a simple regression analysis, if the standard error of estimate S= 15 and the number of observations n = 10, then the sum of the residuals squared must be 120. a. True b. False

F
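The statement above is false because the standard error of estimate is s_e = sqrt(SSE / (n - k - 1)), so SSE follows from squaring s_e and multiplying by the degrees of freedom. A sketch of that inversion, with k = 1 for simple regression (the function name is illustrative):

```python
def sse_from_standard_error(s_e, n, k=1):
    """Invert s_e = sqrt(SSE / (n - k - 1)) to recover the sum of squared residuals."""
    return s_e ** 2 * (n - k - 1)

sse = sse_from_standard_error(15, n=10)
print(sse)  # 1800, not 120
```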

In a simple regression with a single explanatory variable, the multiple R is the same as the standard correlation between the Y variable and the explanatory X variable. a. True b. False

F

In general, increasing the confidence level will narrow the confidence interval, and decreasing the confidence level widens the interval. a. True b. False

F

In multiple regression, the problem of multicollinearity affects the t-tests of the individual coefficients as well as the F-test in the analysis of variance for regression, because the F-test combines these t-tests into a single test. a. True b. False

F

In order to construct a confidence interval estimate of the population mean, the value of σ must be given. a. True b. False

F

In regression analysis, the unexplained part of the total variation in the response variable Y is referred to as the sum of squares due to regression, SSR. a. True b. False

F

The analyst gets to choose the significance level α. It is typically chosen to be 0.50, but it is occasionally chosen to be 0.01. a. True b. False

F

The necessary sample size can only be determined for estimation problems regarding proportions because we do not know the value of the population standard deviation needed to calculate the necessary sample size for problems dealing with a population mean(s). a. True b. False

F

The number of cars produced by GM during a given quarter is a continuous random variable. a. True b. False

F

The time series component that reflects a long-term, relatively smooth pattern or direction exhibited by a time series over a long time period, is called seasonal. a. True b. False

F

There are four quartiles that divide the values in a data set into four equal parts. a. True b. False

F

Time series graphs chart the values of one or more time series, using time on the vertical axis. a. True b. False

F

To calculate the five-period moving average for a time series, we average the values in the two preceding periods, and the values in the three following time periods. a. True b. False

F

Using the standard normal curve, the Z- score representing the 10th percentile is 1.28. a. True b. False

F

Using the standard normal distribution, the Z-score representing the 5th percentile is 1.645. a. True b. False

F
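Both percentile statements above are false because of the sign: percentiles below the 50th fall to the left of zero on the standard normal curve. A quick check using the standard library's `statistics.NormalDist` (Python 3.8 or later):

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

z10 = standard_normal.inv_cdf(0.10)  # 10th percentile
z05 = standard_normal.inv_cdf(0.05)  # 5th percentile
print(round(z10, 3), round(z05, 3))  # -1.282 -1.645
```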

Voluntary response bias occurs when the responses to questions do not reflect what the investigator had in mind. a. True b. False

F

We compute the five-period moving averages for all time periods except the first two. a. True b. False

F

When there is a group of explanatory variables that are in some sense logically related, all of them must be included in the regression equation. a. True b. False

F

When we sample less than 5% of the population, the finite population correction factor, fpc = sqrt((N-n)/(N-1)), is used to modify the formula for the standard error of the sample mean. a. True b. False

F

Winters' method is an exponential smoothing method, which is appropriate for a series with trend but no seasonality. a. True b. False

F

A Q-Q plot can be used to test for normality. a. True b. False

T

A null hypothesis is a statement about the value of a population parameter. It is usually the current thinking, or "status quo". a. True b. False

T

A point estimate is a single numeric value, a "best guess" of a population parameter, calculated from the sample data. a. True b. False

T

A regression analysis between Y = sales (in $1000s) and X = advertising (in $100s) resulted in the following least squares line: Yhat = 84 + 7X. This implies that if advertising is $800, then the predicted amount of sales (in dollars) is $140,000. a. True b. False

T

A regression analysis between Y = sales (in $1000s) and X = advertising ($) resulted in the following least squares line: Yhat = 84 + 7X. This implies that if there is no advertising, then the predicted amount of sales (in dollars) is $84,000. a. True b. False

T

A regression analysis between weight (Y in pounds) and height (X in inches) resulted in the following least squares line: Y= 140 + 5X. This implies that if the height is increased by 1 inch, the weight is expected to increase on average by 5 pounds. a. True b. False

T

A sample of size 20 is selected at random from a population of size N. If the finite population correction factor is 0.9418, then N must be 169. a. True b. False

T
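The finite population correction factor is fpc = sqrt((N - n) / (N - 1)). A short sketch confirming that N = 169 and n = 20 give the value quoted above (the function name is illustrative):

```python
import math

def finite_population_correction(population_size, sample_size):
    """fpc = sqrt((N - n) / (N - 1))."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

factor = finite_population_correction(169, 20)
print(round(factor, 4))  # 0.9418
```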

A trend line on a scatterplot is a line or a curve that "fits" the scatter as well as possible. a. True b. False

T

An equation for the random walk model is given by the equation: DY = μ + e, where DY is the change in the time series from time t - 1 to time t, μ is a constant, and e is a random variable (noise) with mean 0 and some standard deviation σ. a. True b. False

T

An estimator is said to be biased if the mean of its sampling distribution is not equal to the value of the population parameter being estimated. a. True b. False

T

An example of a joint category of two variables is the count of all non-drinkers who are also nonsmokers. a. True b. False

T

Cross-sectional data are usually data gathered from approximately the same period of time from a population. a. True b. False

T

For a given probability of success p that is not too close to 0 or 1, the binomial distribution takes on more of a symmetric bell shape as the number of trials n increases. a. True b. False

T

For the multiple regression model Y = 40 +15X1 - 10X2 +5X3, if X2 were to increase by 5 units, holding X1 and X3 constant, the value of Y would be expected to decrease by 50 units. a. True b. False

T

Forecasting software packages typically report several summary measures of the forecasting error. The most important of these are MAE (mean absolute error), RMSE (root mean square error), and MAPE (mean absolute percentage error). a. True b. False

T

Given that events A and B are independent and that P(A) = 0.8 and P(B|A) = 0.4, then P(A and B) = 0.32. a. True b. False

T

In reference to the equation, Y = -0.70 + 0.10 X , the value 0.10 is the expected change in Y per unit change in X. a. True b. False

T

R^2 can only increase when extra explanatory variables are added to a multiple regression model. a. True b. False

T

R^2 is the square of the correlation between the observed Y values and the fitted Y values. a. True b. False

T

Samples of exam scores for employees before and after a training class are examples of paired data. a. True b. False

T

Seasonal variations will not be present in a deseasonalized time series. a. True b. False

T

Side-by-side box plots allow you to quickly see how two or more categories of a numerical variable compare. a. True b. False

T

Side-by-side box plots are typically a good way to begin the analysis when comparing two populations. a. True b. False

T

Simple exponential smoothing is appropriate for a series without a pronounced trend or seasonality. a. True b. False

T

Simple random samples are samples in which every possible sample of size n from the population has the same probability of being chosen. a. True b. False

T

Statisticians often refer to the pivot tables that display counts as contingency tables or crosstabs. a. True b. False

T

Strongly related variables may have a correlation close to zero if the relationship is nonlinear. a. True b. False

T

Suppose A and B are mutually exclusive events where P(A) = 0.2 and P(B) = 0.5, then P(A or B) = 0.70. a. True b. False

T

Suppose that after graduation, you will either buy a new car (event A) or take a trip to Europe (event B). In this case, events A and B are mutually exclusive. a. True b. False

T

Systematic sampling is generally similar to simple random sampling in its statistical properties. a. True b. False

T

The binomial distribution deals with consecutive trials, each of which has two possible outcomes. a. True b. False

T

The binomial distribution is a discrete distribution that deals with a sequence of identical trials, each of which have only two possible outcomes. a. True b. False

T

The central limit theorem (CLT) says that as long as the sample size is reasonably large, there is about a 95% chance that the magnitude of the sampling error for the mean will be no more than two standard errors. a. True b. False

T

The central limit theorem (CLT) states that the sampling distribution of the mean is approximately normal, no matter what the distribution of the population, as long as the sample size is large enough. a. True b. False

T

The chi-square test for normality makes a comparison between the observed histogram and a histogram based on normality with the same mean and standard deviation of that of the sample data. a. True b. False

T

A confidence interval is an interval calculated from the population data, where we strongly believe the true value of the population parameter lies. a. True b. False

F

If a random series has too few runs, then it is zigzagging too often. a. True b. False

F

If a scatterplot of residuals shows a parabola shape, then a logarithmic transformation may be useful in obtaining a better fit. a. True b. False

F

If events A and B have nonzero probabilities, then they can be both independent and mutually exclusive. a. True b. False

F

If the observations of a time series increase or decrease regularly through time, we say that the time series has a random (or noise) component. a. True b. False

F

If the span of a moving average is large - say, 12 months - then few observations go into each average, and extreme values have relatively large effect on the forecasts. a. True b. False

F

Subjective probability is the probability that a given event will occur, given that another event has already occurred. a. True b. False

F

Suppose A and B are mutually exclusive events where P(A) = 0.3 and P(B) = 0.4. Then, P(A and B) = 0.12. a. True b. False

F

Suppose A and B are two events where P(A) = 0.5, P(B) = 0.4, and P(A and B) = 0.2, then P(B|A) = 0.5. a. True b. False

F

In time series data, errors are often not probabilistically independent. a. True b. False

T

Which correlation coefficient suggests the strongest relationship? a. +1 b. -1 c. 0 d. +0.5

a. +1

In statistical analysis, the burden of proof lies traditionally with the a. alternative hypothesis. b. null hypothesis. c. analyst. d. p-value.

a. alternative hypothesis.

A linear trend means that the time series variable changes by a _____ each time period. a. constant amount b. constant percentage c. positive amount d. negative amount

a. constant amount

As the sample size increases, the t-distribution becomes more similar to the _____ distribution. a. normal b. exponential c. binomial d. chi-square

a. normal

The standard deviation of x bar is usually called the a. standard error of the mean. b. standard error of the sample. c. standard error of the population. d. randomized standard error.

a. standard error of the mean.

If events A and B are mutually exclusive, then the probability of both events occurring simultaneously is equal to a. 0.0. b. 0.5. c. 1.0. d. any value between 0.5 and 1.0.

a. 0.0

If two events are mutually exclusive and collectively exhaustive, what is the probability that both occur? a. 0.0 b. 0.5 c. 1.0 d. This can be any probability between 0 and 1.

a. 0.0

If two events are mutually exclusive, what is the probability that both occur at the same time? a. 0.0 b. 0.5 c. 1.0 d. This can be any probability between 0 and 1.

a. 0.0

In a simple linear regression analysis, the following sums of squares are produced: sum(Y - Ybar)^2 = 400, sum(Y - Yhat)^2 = 80, sum(Yhat - Ybar)^2 = 320. The proportion of the variation in Y that is explained by the variation in X is a. 20%. b. 80%. c. 25%. d. 50%.

b. 80%.
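The proportion of explained variation is R^2 = SSR / SST, which equals 1 - SSE / SST. A sketch of the arithmetic, assuming the three sums in the question are SST = 400 (total), SSE = 80 (residual), and SSR = 320 (regression):

```python
sst = 400  # total sum of squares, sum of (Y - Ybar)^2
sse = 80   # residual sum of squares, sum of (Y - Yhat)^2
ssr = 320  # regression sum of squares, sum of (Yhat - Ybar)^2

r_squared = ssr / sst
# The decomposition SST = SSR + SSE means both formulas agree.
r_squared_alt = 1 - sse / sst
print(f"{r_squared:.0%}")  # 80%
```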

Which of the following is not one of the assumptions of regression? a. There is a population regression line. b. The explanatory variable is normally distributed. c. The response variable is normally distributed. d. The errors are probabilistically independent.

b. The explanatory variable is normally distributed.

A 90% confidence interval can be used to reject the null hypothesis of a two-sided test at the 10% significance level if and only if a. a 90% confidence interval includes the hypothesized value of the parameter. b. a 90% confidence interval does not include the hypothesized value of the parameter. c. the null hypothesis is less than 0.05. d. the null hypotheses includes sampling error is greater than 0.05.

b. a 90% confidence interval does not include the hypothesized value of the parameter.

In regression analysis, the variable we are trying to explain or predict is called the _____ variable. a. independent b. dependent c. regression d. statistical

b. dependent

When a portion of the sample does not respond to the survey, ____ has occurred. a. a measurement error b. nonresponse bias c. a sampling error d. systematic failure

b. nonresponse bias

In linear regression, we fit the least squares line to a set of values (or points on a scatterplot). The distance from the line to a point is called the a. fitted value. b. residual. c. correlation. d. covariance.

b. residual.

What type of test is used to determine if the population mean is equal to a hypothesized value? a. t test b. z test c. F test d. Chi-squared test

b. z test

According to the empirical rule, how many observations lie within +/- 1 standard deviation from the mean? a. 50% b. 68% c. 95% d. Almost all

b. 68%

A line or curve superimposed on a scatterplot to quantify an apparent relationship is known as a(n) a. average. b. trend line. c. slope. d. function.

b. trend line

If you can determine that the outlier is not really a member of the relevant population, then it is appropriate and probably best to _____ it. a. average b. reduce c. delete d. leave

c. delete

Regression analysis asks a. if there are differences between distinct populations. b. if the sample is representative of the population. c. how a single variable depends on other relevant variables. d. how several variables depend on each other.

c. how a single variable depends on other relevant variables.

Which of the following is the relevant sampling distribution for regression coefficients? a. Normal distribution b. t-distribution with n-1 degrees of freedom c. t-distribution with n-1-k degrees of freedom d. F-distribution with n-1-k degrees of freedom

c. t-distribution with n-1-k degrees of freedom

Measurement error occurs when a. a portion of the sample does not respond to the survey. b. the sample responses are not clear. c. the responses to question do not reflect what the investigator had in mind. d. the investigator does not correctly tally all responses.

c. the responses to question do not reflect what the investigator had in mind.

Correlation is a summary measure that indicates a. a curved relationship among the variables. b. the rate of change in Y for a one unit change in X. c. the strength of the linear relationship between pairs of variables. d. the magnitude of difference between two variables.

c. the strength of the linear relationship between pairs of variables.

A Poisson distribution is a. relevant when we sample from a population with only two types of members. b. relevant when we perform a series of independent, identical experiments with only two possible outcomes. c. usually relevant when we are interested in the number of events that occur over a given interval of time. d. the cornerstone of statistical theory.

c. usually relevant when we are interested in the number of events that occur over a given interval of time.

Given that Z is a standard normal variable, the value z for which P(Z ≤ z) = 0.2580 is a. 0.70. b. 0.758. c. -0.65. d. 0.242.

c. -0.65
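The value above comes from inverting the standard normal CDF. A minimal check with `statistics.NormalDist`, assuming the inequality lost from the question is P(Z ≤ z) = 0.2580:

```python
from statistics import NormalDist

# Find z such that the area to the left of z is 0.2580.
z = NormalDist().inv_cdf(0.2580)
print(round(z, 2))  # -0.65
```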

If A and B are any two events with P(A) = 0.8 and P(B|A) = 0.4, then the joint probability of A and B is: a. 0.80 b. 0.40 c. 0.32 d. 1.20

c. 0.32

If A and B are mutually exclusive events with P(A) = 0.30 and P(B) = 0.40, then the probability that either A or B occur is a. 0.10. b. 0.12. c. 0.70. d. none of these choices.

c. 0.70

If we plot a continuous probability distribution f(x), the total probability under the curve is a. -1. b. 0. c. 1. d. 100.

c. 1

A sampling error is the result of a. measurement error. b. nonresponse bias. c. nontruthful responses. d. "unlucky" sampling.

d. "unlucky" sampling.

Which of the following values is not typically used for α? a. 0.01 b. 0.05 c. 0.10 d. 0.50

d. 0.50

The variance of a binomial distribution for which n = 100 and p = 0.20 is a. 100. b. 80. c. 20. d. 16.

d. 16
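The variance of a binomial distribution is np(1 - p). A one-function sketch of the calculation for the values in the question (the function name is illustrative):

```python
def binomial_variance(n, p):
    """Variance of a Binomial(n, p) random variable: n * p * (1 - p)."""
    return n * p * (1 - p)

variance = binomial_variance(100, 0.20)
print(round(variance, 2))  # 16.0
```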

Suppose that a simple exponential smoothing model is used (with α = 0.30) to forecast monthly sandwich sales at a local sandwich shop. After June's demand is observed at 1520 sandwiches, the forecasted demand for July is 1600 sandwiches. At the beginning of July, what would be the forecasted demand for August? a. 1520 b. 1544 c. 1550 d. 1600

d. 1600

Nontruthful responses are an example of a nonsampling error. a. True b. False

T

If A and B are independent events with P(A) = 0.40 and P(B) = 0.50, then P(A|B) is 0.50. a. True b. False

F

Correlation and covariance can be used to examine relationships between numeric variables as well as for categorical variables that have been coded numerically. a. True b. False

F

The law of large numbers states that subjective probabilities can be estimated based on the long run relative frequencies of events. a. True b. False

F

The least squares line is the line that minimizes the sum of the residuals. a. True b. False

F

A population includes all elements or objects of interest in a study, whereas a sample is a subset of the population used to gain insights into the characteristics of the population. a. True b. False

T

A probability sample is a sample in which the sampling units are chosen from the population by means of a random mechanism such as a random number table. a. True b. False

T

In order to test the significance of a multiple regression model involving 4 explanatory variables and 40 observations, the numerator and denominator degrees of freedom for the critical value of F are 4 and 35, respectively. a. True b. False

T

In regression analysis, homoscedasticity refers to constant error variance. a. True b. False

T

The Durbin-Watson statistic can be used to test for autocorrelation. a. True b. False

T

The density function specifies the probability distribution of a continuous random variable. a. True b. False

T

The value of the standard error of the difference between sample proportions depends on the sample proportion from each of the two populations as well as the sample size selected from each population. a. True b. False

T

The variance of a binomial distribution for which n = 50 and p = 0.20 is 8.0. a. True b. False

T

In multiple regression, the coefficients reflect the expected change in _____ by one unit. a. Y when the associated X value increases b. X when the associated Y value increases c. Y when the associated X value decreases d. X when the associated Y value decreases

a. Y when the associated X value increases

Which of the following statements are correct? a. A point estimate is an estimate of the range of a population parameter. b. A point estimate is a single value estimate of the value of a population parameter. c. A point estimate is an unbiased estimator if its standard deviation is the same as the actual value of the population standard deviation. d. All of these choices are correct.

b. A point estimate is a single value estimate of the value of a population parameter.

Which Excel® function allows you to count using more than one criterion? a. COUNTIF b. COUNTIFS c. HLOOKUP d. VLOOKUP

b. COUNTIFS

One characteristic of "paired variables" is that a. one variable is a negative value and the other is a positive value. b. both variables are positive values. c. each variable has the same number of observations. d. each variable has a different number of observations.

c. each variable has the same number of observations.

A list of all members of the population is called a a. sampling unit. b. probability sample. c. frame. d. relevant population

c. frame.

The t-distribution for developing a confidence interval for a mean has degrees of freedom equal to a. n + 2. b. n +1. c. n. d. n - 1.

d. n - 1.

The linear trend Y = 120 + 2t was estimated using a time series with 20 time periods. The forecasted value for time period 21 is a. 120. b. 122. c. 160. d. 162.

d. 162

Expressed in percentiles, the interquartile range is the difference between the _____ percentiles. a. 10th and 60th b. 15th and 65th c. 20th and 70th d. 25th and 75th

d. 25th and 75th

The regression line has been fitted to the data points (28, 60), (20, 50), (10, 18), and (25, 55). The sum of the squared residuals will be a. 20.25. b. 16.00. c. 49.00. d. 94.25.

d. 94.25

Residuals separated by one period that are autocorrelated indicate _____ autocorrelation. a. simple b. redundant c. time 1 d. lag 1

d. lag 1

The daily closing values of the Dow Jones Industrial Average over a period of 30 days are best described as _____ data. a. cross-sectional b. discrete c. time-series d. nominal

c. time-series

If the number of observations in a single-variable data set is even, the median is the a. average of the two middle observations. b. difference between the two middle observations. c. most frequent observation. d. difference between the highest and smallest observation.

a. average of the two middle observations

A distribution with a high kurtosis has almost all of its observations within three standard deviations of the mean. a. True b. False

F

A random variable X is standardized when each value of X has the mean of X subtracted from it, and the difference is divided by the standard deviation of X. a. True b. False

T

A random variable is a function that associates a numerical value with each possible outcome of a random phenomenon. a. True b. False

T

A data set is typically a rectangular array of data, with observations in columns and variables in rows. a. True b. False

F

The filters field of a pivot table contains the data that you want summarized. a. True b. False

F

A Poisson distribution is appropriate to determine the probability of a given number of defective items in a shipment. a. True b. False

F

A Type II error is committed when we incorrectly accept an alternative hypothesis that is false. a. True b. False

F

A test for independence is applied to a contingency table with 4 rows and 4 columns. The degrees of freedom for this chi-square test is 9. a. True b. False

T

A test with a 0.05 significance level has a larger rejection region than a test with a 0.01 significance level. a. True b. False

T

A probability tree is a graphical representation of how events occur through time, which is useful for calculating probabilities. a. True b. False

T

A random variable X is normally distributed with a mean of 175 and a standard deviation of 50. Given that X = 150, its corresponding Z- score is -0.50. a. True b. False

T

As a graphical tool, the histogram is ideal for showing whether the distribution of a numerical variable is symmetric or skewed. a. True b. False

T

Both ordinal and nominal variables are categorical. a. True b. False

T

A 90% confidence interval estimate for a population mean is determined to be 72.8 to 79.6. If the confidence level is reduced to 80%, the confidence interval for μ becomes narrower. a. True b. False

T

A Type I error probability is represented by alpha; it is the probability of incorrectly rejecting a null hypothesis that is true. a. True b. False

T

A binomial distribution with n trials and probability of success p can be approximated well by a normal distribution with mean np and variance np(1-p) if np > 5 and n(1-p) > 5. a. True b. False

T

A chi-square goodness-of-fit test can be used to test for normality. a. True b. False

T

A confidence interval constructed around a point prediction from a regression model is called a prediction interval, because the actual point being estimated is not a population parameter. a. True b. False

T

A confidence interval is an interval that, with a stated level of confidence, captures a population parameter. a. True b. False

T

A list of all members of the population from which we can choose a sample is called a frame, and the potential sample members are called sampling units. a. True b. False

T

A moving average is the average of the observations in the past few periods, where the number of terms in the average is the span. a. True b. False

T

A multiple regression model involves 40 observations and 4 explanatory variables produces SST = 1000 and SSR = 804. The value of MSE is 5.6. a. True b. False

T
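The mean square error divides the unexplained variation by its degrees of freedom: MSE = SSE / (n - k - 1), with SSE = SST - SSR. A sketch of the arithmetic for the numbers in the statement above (the function name is illustrative):

```python
def mean_square_error(sst, ssr, n, k):
    """MSE = (SST - SSR) / (n - k - 1) for a regression with k explanatory variables."""
    sse = sst - ssr
    return sse / (n - k - 1)

mse = mean_square_error(sst=1000, ssr=804, n=40, k=4)
print(round(mse, 1))  # 5.6
```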

A negative relationship between an explanatory variable X and a response variable Y means that as X increases, Y decreases, and vice versa. a. True b. False

T

A test with a 0.10 significance level has a larger rejection region than a test with a 0.05 significance level. a. True b. False

T

A teacher who is trying to determine if the evidence supports the fact that a new method of teaching economics is more effective than a traditional one will conduct a _____ test. a. one-tailed b. two-tailed c. multi-tailed d. paired

a. one-tailed

We assume that the outcomes of successive trials in a binomial experiment are a. probabilistically independent. b. probabilistically dependent. c. identical from trial to trial. d. random number between 0 and 1.

a. probabilistically independent.

A sample in which the sampling units are chosen from the population by means of a random mechanism is a _____ sample. a. probability b. judgmental c. convenience d. voluntary response

a. probability

When using exponential smoothing, a smoothing constant α must be used. The value for α a. ranges between 0 and 1. b. ranges between -1 and +1. c. equals the largest observed value in the series. d. represents the strength of the association between the forecasted and observed values.

a. ranges between 0 and 1.

The percentage of variation (R²) ranges from a. 0 to +1. b. -1 to +1. c. -2 to +2. d. -1 to 0.

a. 0 to +1.

Which of the following statements are true of the null and alternative hypotheses? a. Exactly one hypothesis must be true. b. Both hypotheses must be true. c. It is possible for both hypotheses to be true. d. It is possible for neither hypothesis to be true.

a. Exactly one hypothesis must be true.

Which definition best describes parsimony? a. Explaining the most with the least b. Explaining the least with the most c. Being able to explain all of the change in the response variable d. Being able to predict the value of the response variable far into the future

a. Explaining the most with the least

What is an example of a problem in which the sample data is likely to be paired? a. The difference between the means of appraised and sales house prices b. The difference between the proportion of defective items from two suppliers c. The difference in the mean life of two major brands of batteries d. The difference in the mean salaries for graduates in two different academic fields at a university

a. The difference between the means of appraised and sales house prices

The theorem that states that the sampling distribution of the sample mean is approximately normal when the sample size n is reasonably large is known as the _____ theorem. a. central limit b. central tendency c. simple random sample d. point estimate

a. central limit

The Poisson random variable is a a. discrete random variable with infinitely many possible values. b. discrete random variable with finite number of possible values. c. continuous random variable with infinitely many possible values. d. continuous random variable with finite number of possible values.

a. discrete random variable with infinitely many possible values.

The defining property of a simple random sample is that a. every possible sample of a particular size has the same chance of being chosen. b. it is the easiest method to access samples that are chosen. c. it requires the fewest samples necessary for statistical significance. d. every kth subject is chosen as a sample.

a. every possible sample of a particular size has the same chance of being chosen.

A scatterplot that exhibits a "fan" shape (the variation of Y increases as X increases) is an example of a. homoscedasticity. b. heteroscedasticity. c. autocorrelation. d. multicollinearity.

b. heteroscedasticity.

In regression analysis, the variables used to help explain or predict the response variable are called the _____ variables. a. independent b. dependent c. regression d. statistical

a. independent

A discrete probability distribution a. is a set of possible values and a corresponding set of probabilities that sum to 1. b. is a modeling tool that can be used to incorporate uncertainty into models. c. can be estimated from long-run proportions. d. is the distribution of a single random variable.

a. is a set of possible values and a corresponding set of probabilities that sum to 1.

In multiple regression, the constant a. is the expected value of the dependent variable Y when all of the independent variables have the value zero. b. is necessary to fit the multiple regression line to a set of points. c. must be adjusted for the number of independent variables. d. is all of these options.

a. is the expected value of the dependent variable Y when all of the independent variables have the value zero.

The opportunity for sampling error is decreased by a. larger sample sizes. b. smaller sample sizes. c. affluent samples. d. interviewer selected samples.

a. larger sample sizes.

A sample result is considered "convincing" if the p-value is a. less than 0.01. b. between 0.01 and 0.05. c. between 0.05 and 0.10. d. greater than 0.10.

a. less than 0.01.

Perhaps the simplest and one of the most frequently used extrapolation methods is the a. moving average. b. linear trend. c. exponential trend. d. causal model.

a. moving average.

A type II error occurs when the a. null hypothesis is incorrectly accepted when it is false. b. null hypothesis is incorrectly rejected when it is true. c. sample mean differs from the population mean. d. test procedure itself is fundamentally biased.

a. null hypothesis is incorrectly accepted when it is false.

When the error variance is nonconstant, it is common to see the variation increase as the explanatory variable increases (you will see a "fan shape" in the scatterplot). There are two ways you can deal with this phenomenon. These are a. the weighted least squares and a logarithmic transformation. b. the partial F and a logarithmic transformation. c. the weighted least squares and the partial F. d. stepwise regression and the partial F.

a. the weighted least squares and a logarithmic transformation.

A "fan" shape in a scatterplot indicates a. unequal variance. b. a nonlinear relationship. c. the absence of outliers. d. sampling error.

a. unequal variance.

When using exponential smoothing, if you want the forecast to react quickly to movements in the series, you should choose a. values of the smoothing constant near 1. b. values of the smoothing constant near 0. c. values of the smoothing constant midway between 0 and 1. d. the values based on the particular data set.

a. values of the smoothing constant near 1.
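
To see why a smoothing constant near 1 reacts quickly, here is a minimal sketch of simple exponential smoothing. It assumes the common recursion L_t = alpha * Y_t + (1 - alpha) * L_(t-1), initialized with the first observation; the series is made-up illustrative data with a sudden level shift, and the function name is hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed levels for a series.

    Assumes L_1 = Y_1 and L_t = alpha * Y_t + (1 - alpha) * L_(t-1).
    """
    levels = [series[0]]
    for y in series[1:]:
        levels.append(alpha * y + (1 - alpha) * levels[-1])
    return levels


# Made-up data: the level jumps from 100 to 150 in period 4
series = [100, 100, 100, 150, 150, 150]

fast = exponential_smoothing(series, alpha=0.9)  # reacts quickly
slow = exponential_smoothing(series, alpha=0.1)  # reacts slowly

# Smoothed level right after the jump (period 4)
print(round(fast[3], 2), round(slow[3], 2))  # 145.0 105.0
```

With alpha = 0.9 the smoothed level has almost caught up with the new level after one period, while with alpha = 0.1 it has barely moved.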

The results of tossing a coin can be portrayed in a(n) _____ distribution. a. binomial b. normal c. exponential d. Poisson

a. binomial

The mean of the sampling distribution of xbar always equals: a. the population mean. b. mu/sqrt(n). c. the population standard deviation. d. sigma/sqrt(n).

a. the population mean

Which of the following is not one of the commonly used summary measures for forecast errors? a. MAE (mean absolute error) b. MFE (mean forecast error) c. RMSE (root mean square error) d. MAPE (mean absolute percentage error)

b. MFE (mean forecast error)

Which statement is true of proportional sample sizes? a. The proportion of a stratum in the sample is independent of the proportion of that stratum in the population. b. The proportion of a stratum in the sample is the same as the proportion of that stratum in the population. c. The proportion of a stratum in the sample is greater than the proportion of that stratum in the population. d. The proportion of a stratum in the sample is less than the proportion of that stratum in the population.

b. The proportion of a stratum in the sample is the same as the proportion of that stratum in the population.

Given the least squares regression line, , which statement is true? a. The relationship between X and Y is positive. b. The relationship between X and Y is negative. c. As X increases, so does Y. d. As X decreases, so does Y.

b. The relationship between X and Y is negative.

Generally speaking, the two types of statistical inference are a. sample estimation and population estimation. b. confidence interval estimation and hypothesis testing. c. the interval estimation for a mean and the point estimation for a proportion. d. independent sample estimation and dependent sample estimation.

b. confidence interval estimation and hypothesis testing.

The primary reason for standardizing random variables is to measure variables with a. different means and standard deviations on a non-standard scale. b. different means and standard deviations on a single scale. c. dissimilar means and standard deviations in like terms. d. similar means and standard deviations on two scales.

b. different means and standard deviations on a single scale.

In regression analysis, multicollinearity refers to the a. response variables being highly correlated. b. explanatory variables being highly correlated. c. response variable(s) and the explanatory variable(s) being highly correlated with one another. d. response variables being highly correlated over time.

b. explanatory variables being highly correlated.

The probability of being chosen in a simple random sample of size n from a population of size N is a. 1/N. b. N - 1/n. c. N/n. d. n/N.

d. n/N.

When determining whether to include or exclude a variable in regression analysis, if the p-value associated with the variable's t-value is above some accepted significance value, such as 0.05, then the variable a. is a candidate for inclusion. b. is a candidate for exclusion. c. is redundant. d. does not fit the guidelines of parsimony.

b. is a candidate for exclusion.

In a random walk model, the series itself a. is random. b. is not random but its differences are random. c. and its differences are random. d. and its differences are not random.

b. is not random but its differences are random.

Forecasting models can be divided into three groups. They are _____ methods. a. time series, optimization, and simulation b. judgmental, extrapolation, and econometric c. judgmental, random, and linear d. linear, non-linear, and extrapolation

b. judgmental, extrapolation, and econometric

In regression analysis, if there are several explanatory variables, it is called _____ regression. a. simple b. multiple c. compound d. nonlinear

b. multiple

A test for a population mean has _____ degrees of freedom. a. n + 1 b. n - 1 c. n d. n - 2

b. n - 1

A small p-value in the runs test provides evidence of a. randomness. b. nonrandomness. c. nonnormality. d. heteroscedasticity.

b. nonrandomness.

A type I error occurs when the a. null hypothesis is incorrectly accepted when it is false. b. null hypothesis is incorrectly rejected when it is true. c. sample mean differs from the population mean. d. test procedure is fundamentally biased.

b. null hypothesis is incorrectly rejected when it is true.

The sample mean is the ____ estimate for the population mean. a. random b. point c. simple d. interval

b. point

An error term represents the vertical distance from any point to the a. estimated regression line. b. population regression line. c. value of the Y's. d. mean value of the X's.

b. population regression line

Suppose you forecast the values of all of the independent variables and insert them into a multiple regression equation and obtain a point prediction for the dependent variable. You could then use the standard error of the estimate to obtain an approximate a. confidence interval. b. prediction interval. c. hypothesis test. d. independence test.

b. prediction interval.

Models such as moving averages, exponential smoothing, and linear trend use only a. future values of Y to forecast previous values of Y. b. previous values of Y to forecast future values of Y. c. multiple explanatory variables (not just values of Y) to forecast future values of Y. d. ratio-to-moving-average methods.

b. previous values of Y to forecast future values of Y.

When using Holt's model, choosing values of the smoothing constant that are near 1 will result in forecast models that react very a. quickly to changes in the level. b. quickly to changes in the trend. c. quickly to changes in the level and the trend. d. slowly to changes in the level and the trend.

b. quickly to changes in the trend.

The power of a test is the probability that we _____ the null hypothesis when the alternative hypothesis is _____. a. reject, false b. reject, true c. accept, false d. accept, true

b. reject, true

Extrapolation methods attempt to a. use non-quantitative methods to predict future values. b. search for patterns in the data and then use those to predict future values. c. find variables that are correlated with the data being predicted. d. predict the next period's value by using the latest period's value.

b. search for patterns in the data and then use those to predict future values.

The key to using stratified sampling is a. identifying the strata. b. selecting the appropriate strata. c. defining the strata. d. randomizing the strata.

b. selecting the appropriate strata.

Which equation shows the process of standardizing? a. E(x) - np b. Z = (x - mean)/std dev c. f(x) = 1 - (mean/std dev) d. B(Y) = mean

b. Z = (x - mean)/std dev

The hypothesis that an analyst is trying to prove is called the _____ hypothesis. a. elective b. alternative c. optional d. null

b. alternative

The mean of a probability distribution is a measure of a. variability of the distribution. b. central location. c. the shape of the distribution. d. skewness of the distribution.

b. central location

The formal way to revise probabilities based on new information is to use _____ probabilities. a. complementary b. conditional c. unilateral d. common sense

b. conditional

In contrast to linear trend, an exponential trend is appropriate when the time series changes by a _____ each time period. a. constant amount b. constant percentage c. positive amount d. negative amount

b. constant percentage

The library is interested in estimating the number of individuals who use the computers during the lunch hour. Which probability distribution should they use? a. Binomial distribution b. Poisson distribution c. Normal distribution d. Uniform distribution

b. Poisson distribution

The approximate standard error of the point estimate of the population total is a. sigma/sqrt(n) b. s/sqrt(n) c. N sigma/sqrt(n) d. Ns/sqrt(n)

d. Ns/sqrt(n)

The data below represent sales for a particular product. If you were to use the moving average method with a span of 4 periods, what would be your forecast for period 5? Sales (in units) by period: period 1 = 90, period 2 = 120, period 3 = 110, period 4 = 100. a. 90 b. 100 c. 105 d. 110

c. 105
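
The moving average forecast in the card above is just the mean of the last `span` observations. A minimal sketch, with a hypothetical helper name, using the four sales figures from the card:

```python
def moving_average_forecast(history, span):
    """Forecast the next period as the average of the last `span` observations."""
    if span < 1 or span > len(history):
        raise ValueError("span must be between 1 and len(history)")
    return sum(history[-span:]) / span


# Sales for periods 1 through 4, as given in the card
sales = [90, 120, 110, 100]

forecast_span4 = moving_average_forecast(sales, span=4)  # (90+120+110+100)/4
forecast_span3 = moving_average_forecast(sales, span=3)  # (120+110+100)/3

print(forecast_span4, forecast_span3)  # 105.0 110.0
```

The same data give different forecasts for different spans: a shorter span drops the oldest observation (90) and so tracks recent values more closely.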

If X is a normal random variable with a standard deviation of 10, then 3X has a standard deviation equal to a. 10. b. 13. c. 30. d. 90.

c. 30

According to the empirical rule, what percentage of observations lies within +/- 2 standard deviations of the mean? a. 50% b. 68% c. 95% d. Almost all

c. 95%

Which of the following is not one of the assumptions of regression? a. There is a population regression line that joins the means of the dependent variable for all values of the explanatory variables. b. The response variable is normally distributed. c. The standard deviation of the response variable increases as the explanatory variables increase. d. The errors are probabilistically independent.

c. The standard deviation of the response variable increases as the explanatory variables increase.

If A and B are mutually exclusive events with P(A) = 0.70, then P(B) a. can be any value between 0 and 1. b. can be any value between 0 and 0.70. c. cannot be larger than 0.30. d. can be any value between 0.30 and 0.70.

c. cannot be larger than 0.30

Econometric models can also be called _____ models. a. judgmental b. time series c. causal d. environmetric

c. causal

Gender and states of residence are examples of ____ data. a. discrete b. continuous c. categorical d. ordinal

c. categorical

Displaying all correlations between 0.6 and 0.999 on a scatterplot as green and all correlations between -1.0 and -0.6 as red is known as _____ formatting. a. rank-order b. categorical c. conditional d. numerical

c. conditional

Which distribution is best-suited to measure the length of time between arrivals at a grocery checkout counter? a. Uniform b. Normal c. Exponential d. Poisson

c. exponential

There is evidence that the regression equation provides little explanatory power when the F-ratio a. is large. b. equals the regression coefficient. c. is small. d. is the constant.

c. is small

A correlation value of zero indicates _____ relationship. a. a strong linear b. a weak linear c. no linear d. a perfect linear

c. no linear

In linear regression, we can have an interaction variable. Algebraically, the interaction variable is the _____ of two variables. a. sum b. ratio c. product d. mean

c. product

The alternative hypothesis is also known as the _____ hypothesis. a. elective b. optional c. research d. null

c. research

P(not A) = 1 - P(A) is the a. addition rule. b. commutative rule. c. rule of complements. d. rule of opposites.

c. rule of complements

Which term refers to a consecutive series of observations that remain on one side of the base level? a. Outlier b. Random walk c. Run d. Variance

c. run

Which of the following values is a common significance level? a. 0.50 b. 0.40 c. 0.30 d. 0.05

d. 0.05

Which of the following is an example of a nonlinear regression model? a. A quadratic regression equation b. A logarithmic regression equation c. Constant elasticity equation d. All of these choices

d. All of these choices

Which of the following are reasons why simple random sampling is used infrequently in real applications? a. Samples can be spread over a large geographic region. b. Simple random sampling requires that all sampling units be identified prior to sampling. c. Simple random sampling can result in underrepresentation or overrepresentation of certain segments of the population. d. All of these choices are valid reasons.

d. All of these choices are valid reasons.

In regression analysis, which of the following causal relationships are possible? a. X causes Y to vary. b. Y causes X to vary. c. Other variables cause both X and Y to vary. d. All of these options are possible.

d. All of these options are possible.

Which sign is possible in an alternative hypothesis? a. > b. < c. ≠ d. All of these signs are possible

d. All of these signs are possible

Which of the following statements correctly describe estimation? a. It is the process of inferring the values of known population parameters from those of unknown sample statistics. b. It is the process of inferring the values of unknown sample statistics from those of known population parameters. c. It is the process of inferring the values of known sample statistics from those of unknown population parameters. d. It is the process of inferring the values of unknown population parameters from those of known sample statistics.

d. It is the process of inferring the values of unknown population parameters from those of known sample statistics.

Suppose you run a regression of a person's height on his/her right and left foot sizes, and you suspect that there may be multicollinearity between the foot sizes. Which of the following problems would you not expect to see if your suspicions are true? a. "Wrong" values for the coefficients for the left and right foot size b. Large p-values for the coefficients for the left and right foot size c. Small t-values for the coefficients for the left and right foot size d. Large t-values for the coefficients for the left and right foot size

d. Large t-values for the coefficients for the left and right foot size

Which summary measure for forecast errors does not depend on the units of the forecast variable? a. MAE (mean absolute error) b. MFE (mean forecast error) c. RMSE (root mean square error) d. MAPE (mean absolute percentage error)

d. MAPE (mean absolute percentage error)

_____ is/are especially helpful in identifying outliers. a. Linear regression b. Regression analysis c. Normal curves d. Scatterplots

d. Scatterplots

Of type I and type II error, which is more serious? a. Type I is considered to be more serious. b. Type II is considered to be more serious. c. Type I and Type II are equally serious. d. The situation determines which error is more serious. Sometimes type I errors are more serious and sometimes type II errors are more serious.

d. The situation determines which error is more serious. Sometimes type I errors are more serious and sometimes type II errors are more serious.

If P(A) = 0.25 and P(B) = 0.65, then P(A and B) is a. 0.25. b. 0.40. c. 0.90. d. This cannot be determined from the information given.

d. This cannot be determined from the information given.

What is not one of the guidelines for including/excluding variables in a regression equation? a. Look at the t-value and associated p-value. b. Check whether the t-value is less than or greater than 1.0. c. The variables are logically related to one another. d. Use economic or physical theory to make the decision.

d. Use economic or physical theory to make the decision.

A researcher can check whether the errors are normally distributed by using a. a t-test or an F-test. b. the Durbin-Watson statistic. c. a frequency distribution or the value of the regression coefficient. d. a histogram or a Q-Q plot.

d. a histogram or a Q-Q plot.

Examples of non-random patterns that may be evident on a time series graph include a. trends. b. increasing variance over time. c. a meandering pattern. d. all of these choices.

d. all of these choices.

Potential sample members, called sampling units, may be a. people. b. companies. c. households. d. all of these choices.

d. all of these choices.

The central limit theorem (CLT) is generally valid for a. n > 5. b. n > 10. c. n > 30. d. any size n.

c. n > 30.

The normal distribution is a a. discrete distribution with two parameters. b. binomial distribution with only one parameter. c. density function of a discrete random variable. d. continuous distribution with two parameters.

d. continuous distribution with two parameters.

The weakness of scatterplots is that they a. do not help identify linear relationships. b. can be misleading about the types of relationships they indicate. c. only help identify outliers. d. do not actually quantify the relationships between variables.

d. do not actually quantify the relationships between variables.

The ANOVA table splits the total variation into two parts. They are the _____ variation. a. acceptable and unacceptable b. adequate and inadequate c. resolved and unresolved d. explained and unexplained

d. explained and unexplained

A single variable X can explain a large percentage of the variation in some other variable Y when the two variables are a. mutually exclusive. b. inversely related. c. directly related. d. highly correlated.

d. highly correlated.

The p-value of a sample is the probability of seeing a sample with at _____ hypothesis as the sample actually observed. a. most as much evidence in favor of the null b. most as much evidence in favor of the alternative c. least as much evidence in favor of the null d. least as much evidence in favor of the alternative

d. least as much evidence in favor of the alternative

The chi-square distribution for developing a confidence interval for a standard deviation has degrees of freedom equal to a. n + 2. b. n +1. c. n. d. n - 1.

d. n - 1.

The value k in the number of degrees of freedom, n-k-1, for the sampling distribution of the regression coefficients represents the a. sample size. b. population size. c. number of coefficients in the regression equation, including the constant. d. number of independent variables included in the equation.

d. number of independent variables included in the equation.

The central limit theorem (CLT) is considered to be an important result in statistics because a. the CLT allows us to assume that the population distribution is approximately normal, provided n is reasonably large. b. the CLT allows us to estimate the population mean without knowing the exact form of the population distribution, provided n is reasonably large. c. the CLT allows us to construct confidence intervals for the population mean without knowing the exact form of the population distribution, provided n is reasonably large. d. of all of these choices.

d. of all of these choices.

The form of the alternative hypothesis can be a. one-tailed only. b. two-tailed only. c. neither one nor two-tailed. d. one-tailed or two-tailed.

d. one-tailed or two-tailed.

An informal test for normality that utilizes a scatterplot and looks for clustering around a 45° line is known as a(n) a. Lilliefors test. b. empirical cumulative distribution function. c. p-test. d. quantile-quantile plot.

d. quantile-quantile plot.

If the correlation of variables is close to 0, then we expect to see a(n) _____ on the scatterplot. a. upward sloping cluster of points b. downward sloping cluster of points c. cluster of points around a trendline d. random scatter of points with no apparent relationship

d. random scatter of points with no apparent relationship

The random walk model is written as: Y_t = Y_{t-1} + m + e_t. In this model, e_t represents the a. average of the Y's. b. average of the X's. c. forecasted value. d. random series with mean 0 and some constant standard deviation.

d. random series with mean 0 and some constant standard deviation.

The null hypothesis usually represents the a. theory the researcher would like to prove. b. preconceived ideas of the researcher. c. perceptions of the sample population. d. status quo of the situation being studied.

d. status quo of the situation being studied.

Determining which variables to include in regression analysis by estimating a series of regression equations by successively adding or deleting variables according to prescribed rules is referred to as _____ regression. a. elimination b. forward c. backward d. stepwise

d. stepwise

Two independent samples of sizes 20 and 25 are randomly selected from two normal populations with equal variances. In order to test the difference between the population means, the test statistic is a. a standard normal random variable. b. an approximately standard normal random variable. c. t-distributed with 45 degrees of freedom. d. t-distributed with 43 degrees of freedom.

d. t-distributed with 43 degrees of freedom.

Which of the following is not one of the techniques that can be used to identify whether a time series is truly random? a. a graph (plot the data) b. the runs test c. a control chart d. the autocorrelations (or a correlogram)

d. the autocorrelations (or a correlogram)

The adjusted R² adjusts R² for a. non-linearity. b. outliers. c. low correlation. d. the number of explanatory variables in a multiple regression model.

d. the number of explanatory variables in a multiple regression model.

Sampling error is evident when a. a question is poorly worded and results in bias. b. the sample is too small. c. the sample is not random. d. the sample mean differs from the population mean.

d. the sample mean differs from the population mean.

The term autocorrelation refers to a. the analyzed data refers to itself. b. the sample is related too closely to the population. c. the data are in a loop (values repeat themselves). d. time series variables are usually related to their own past values.

d. time series variables are usually related to their own past values.

The term autocorrelation refers to the observation that a. analyzed data refers to itself. b. sample is related too closely to the population. c. data are in a loop (values repeat themselves). d. time series variables are usually related to their own past values.

d. time series variables are usually related to their own past values.

The standard normal distribution has a mean and a standard deviation respectively equal to a. 0 and 0. b. 1 and 1. c. 1 and 0. d. 0 and 1.

d. 0 and 1

If the random variable X is exponentially distributed with parameter λ = 3, then P(X ≥ 2), up to 4 decimal places, is a. 0.3333. b. 0.5000. c. 0.6667. d. 0.0025.

d. 0.0025
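
Reading the card above as asking for the upper-tail probability P(X ≥ 2) of an exponential distribution with rate λ = 3, the answer follows from P(X ≥ x) = exp(-λx). A minimal sketch (the function name is illustrative):

```python
import math


def exponential_upper_tail(rate, x):
    """Return P(X >= x) for an exponential random variable with the given rate."""
    if rate <= 0 or x < 0:
        raise ValueError("rate must be positive and x non-negative")
    return math.exp(-rate * x)


p_tail = exponential_upper_tail(rate=3, x=2)  # exp(-6)
print(round(p_tail, 4))  # 0.0025
```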

Given that the random variable X is normally distributed with a mean of 80 and a standard deviation of 10, P(85 ≤ X ≤ 90) is a. 0.5328. b. 0.3413. c. 0.1915. d. 0.1498.

d. 0.1498
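
The interval probability in the card above can be checked by standardizing, Z = (X - mean)/std dev, and taking a difference of cumulative probabilities. This sketch uses `statistics.NormalDist` from the Python standard library. The card's 0.1498 comes from four-decimal table values (0.8413 - 0.6915); the unrounded computation gives a value that rounds to 0.1499.

```python
from statistics import NormalDist

# X ~ Normal(mean = 80, standard deviation = 10), as in the card
x_dist = NormalDist(mu=80, sigma=10)

# P(85 <= X <= 90) = F(90) - F(85)
p_interval = x_dist.cdf(90) - x_dist.cdf(85)

# Same probability via the standardized variable Z = (X - 80) / 10
z = NormalDist()  # standard normal: mean 0, standard deviation 1
p_standardized = z.cdf((90 - 80) / 10) - z.cdf((85 - 80) / 10)

print(round(p_interval, 4))  # about 0.1499 (the table-based answer is 0.1498)
```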

If the mean of an exponential distribution is 2, then the value of the parameter λ is a. 4. b. 2. c. 1. d. 0.5.

d. 0.5

When you calculate the sample size for a proportion, you use an estimate of the population proportion, namely p-hat. A conservative value for n can be obtained by using p-hat = a. 0.01. b. 0.05. c. 0.10. d. 0.50.

d. 0.50
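
Why 0.50 is the conservative choice: the required sample size is proportional to p(1 - p), which is largest at p = 0.5. The sketch below assumes the standard formula n = z² p(1 - p) / B², where B is the desired margin of error; the 95% confidence level and B = 0.03 are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p_guess, margin, confidence=0.95):
    """Return the smallest n with margin of error <= `margin` for a proportion.

    Uses n = z^2 * p(1 - p) / margin^2, rounded up.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)


# Hypothetical margin of error of 3 percentage points
sizes = {p: sample_size_for_proportion(p, margin=0.03)
         for p in (0.1, 0.3, 0.5, 0.7, 0.9)}

for p, n in sizes.items():
    print(p, n)  # the largest n occurs at p = 0.5
```

Any other guess for the proportion yields a smaller n, so using 0.50 guarantees the sample is large enough whatever the true proportion turns out to be.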

The probabilities shown in a table with two rows, A1 and A2, and two columns, B1 and B2, are as follows: P(A1 and B1) = 0.10, P(A1 and B2) = 0.30, P(A2 and B1) = 0.05, and P(A2 and B2) = 0.55. Then P(A1 | B1), calculated up to two decimals, is a. 0.33. b. 0.35. c. 0.65. d. 0.67.

d. 0.67
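
A conditional probability from a joint table is the joint probability divided by the column (or row) total. The sketch below assumes the joint table is P(A1 and B1) = 0.10, P(A1 and B2) = 0.30, P(A2 and B1) = 0.05, P(A2 and B2) = 0.55, which is the only assignment of the four listed values to four distinct cells that sums to 1 and matches the 0.67 answer. The helper name is illustrative.

```python
# Assumed joint probability table (rows A1, A2; columns B1, B2)
joint = {
    ("A1", "B1"): 0.10,
    ("A1", "B2"): 0.30,
    ("A2", "B1"): 0.05,
    ("A2", "B2"): 0.55,
}


def conditional(joint, a, b):
    """Return P(a | b) = P(a and b) / P(b) from a joint probability table."""
    p_b = sum(p for (_, col), p in joint.items() if col == b)
    return joint[(a, b)] / p_b


p_a1_given_b1 = conditional(joint, "A1", "B1")  # 0.10 / 0.15
p_a2_given_b1 = conditional(joint, "A2", "B1")  # 0.05 / 0.15

print(round(p_a1_given_b1, 2), round(p_a2_given_b1, 2))  # 0.67 0.33
```

The value 0.67 corresponds to P(A1 | B1); its complement P(A2 | B1) is 0.33.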

A multiple regression analysis including 50 data points and 5 independent variables results in SSE = 40. The multiple standard error of estimate will be a. 0.901. b. 0.888. c. 0.800. d. 0.953.

d. 0.953
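
Reading the 40 in the card above as the sum of squared errors (an assumption, but the only reading that reproduces 0.953), the standard error of estimate is sqrt(SSE / (n - k - 1)). A minimal sketch with an illustrative function name; the second call reuses the numbers from the earlier true/false card with 60 observations, three explanatory variables, and squared residuals summing to 28.

```python
import math


def standard_error_of_estimate(sse, n, k):
    """Return s_e = sqrt(SSE / (n - k - 1)).

    n is the number of observations, k the number of explanatory variables.
    """
    return math.sqrt(sse / (n - k - 1))


se_card = standard_error_of_estimate(sse=40, n=50, k=5)     # sqrt(40 / 44)
se_earlier = standard_error_of_estimate(sse=28, n=60, k=3)  # sqrt(28 / 56)

print(round(se_card, 3), round(se_earlier, 4))  # 0.953 0.7071
```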

The data below represent sales for a particular product. If you were to use the moving average method with a span of 3 periods, what would be your forecast for period 5? Sales (in units) by period: period 1 = 90, period 2 = 120, period 3 = 110, period 4 = 100. a. 90 b. 100 c. 105 d. 110

d. 110

The number of degrees of freedom needed to construct 90% confidence interval for the difference between means when the data are gathered from paired samples, with 15 observations in each sample, is a. 30. b. 15. c. 28. d. 14.

d. 14

We study relationships among numerical variables using a. pie charts. b. counts. c. scatterplot charts. d. percentages.

c. scatterplot charts

A variable is classified as ordinal if a. there is a natural ordering of categories. b. the data is randomly selected. c. the data arise from continuous measurements. d. we track the variable through a period of time.

a. there is a natural ordering of categories

Which of the following characteristics can be used to describe the skewness of a distribution? a. The mean b. Kurtosis c. The median d. The standard deviation

b. Kurtosis

Coding males as 1 and females as 0 in a data set illustrates the use of _____ variables. a. nominal b. dummy c. numerical d. ordinal

b. dummy

The average score for a class of 30 students was 75. The 20 male students in the class averaged 70. The average score of the 10 female students in the class is _____ the males. a. the same as b. greater than c. significantly less than d. little less than

b. greater than
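
The answer above follows from the fact that an overall mean is a weighted average of the group means. A minimal sketch (the helper name is hypothetical) that backs out the female average from the numbers in the card:

```python
def remaining_group_mean(overall_mean, n_total, group_mean, n_group):
    """Return the mean of the remaining observations, given the overall
    mean and the mean of one subgroup."""
    total_sum = overall_mean * n_total
    group_sum = group_mean * n_group
    return (total_sum - group_sum) / (n_total - n_group)


# 30 students average 75; the 20 males average 70
female_mean = remaining_group_mean(overall_mean=75, n_total=30,
                                   group_mean=70, n_group=20)
print(female_mean)  # (2250 - 1400) / 10 = 85.0
```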

What is the most common type of chart for showing the distribution of a numerical variable? a. Column chart b. Histogram c. Two-way table d. Pie chart

b. histogram

In a box plot, the vertical line inside the box indicates the location of the a. mean. b. median. c. mode. d. standard deviation.

b. median

The interquartile range (IQR) encompasses what percent of the observations? a. Lower 25% b. Middle 50% c. Upper 75% d. Upper 90%

b. Middle 50%

Researchers may try to gain insight into the characteristics of a population by examining a(n) _____ from the population. a. model b. sample c. exemplar d. replica

b. sample

A useful way of comparing the distribution of a numerical variable across categories of some categorical variable is with a. a side-by-side box plot. b. a side-by-side pivot table. c. a side-by-side plot or side-by-side pivot table. d. neither a side-by-side box plot nor side-by-side pivot table.

c. a side-by-side plot or side-by-side pivot table.

Tables used to display counts of a categorical variable are called a. crosstabs. b. contingency tables. c. either crosstabs or contingency tables. d. neither crosstabs nor contingency tables.

c. either crosstabs or contingency tables.

The limitation of covariance as a descriptive measure of association is that it a. only captures positive relationships. b. does not capture the units of the variables. c. is very sensitive to the units of the variables. d. is invalid if one of the variables is categorical.

c. is very sensitive to the units of the variables.

Correlation and covariance measure the a. strength of a linear relationship between two numerical variables. b. direction of a linear relationship between two numerical variables. c. strength and direction of a linear relationship between two numerical variables. d. strength and direction of a linear relationship between two categorical variables.

c. strength and direction of a linear relationship between two numerical variables.

Categorizing a numeric age variable as "young," "middle-aged," and "elderly" is an example of a. counting. b. ordering. c. quantifying. d. binning.

d. binning

Data that arise from counts are best described as _____ data. a. continuous b. nominal c. counted d. discrete

d. discrete

The mode is best described as the a. middle observation. b. same as the average. c. 50th percentile. d. most frequently occurring value.

d. most frequently occurring value

A sample, selected from a population, taken at one particular point in time is categorized as a. categorical. b. discrete. c. cross-sectional. d. time-series.

c. cross-sectional

Relationships between two variables are less evident when counts are expressed as percentages of row totals or column totals. a. True b. False

F

Data can be categorized as cross-sectional or time series. a. True b. False

T

The number of car insurance policy holders is an example of a discrete numerical variable. a. True b. False

T

The scatterplot is a graphical technique used to display the relationship between two numerical variables. a. True b. False

T

We can use side-by-side boxplots to compare at most 2 distributions of numeric data. a. True b. False

F

Because they represent such extreme values, outliers should always be eliminated from statistical analyses. a. True b. False

F

Categorical variables can be classified as either discrete or continuous. a. True b. False

F

Correlation is affected by the measurement scales applied to the X and Y variables. a. True b. False

F
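
A minimal sketch of why the answer is False: rescaling X or Y (for example, centimeters to inches) leaves the correlation unchanged, while the covariance changes with the units. The data values and the helper names `pearson_r` and `sample_cov` are made up for illustration.

```python
from math import sqrt

def sample_cov(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson_r(xs, ys):
    """Correlation = covariance divided by the product of standard deviations."""
    return sample_cov(xs, ys) / sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

# Hypothetical illustration data (not from the flashcards)
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 75, 88]

# The same measurements expressed in different units
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.20462 for w in weights_kg]

r_metric = pearson_r(heights_cm, weights_kg)
r_imperial = pearson_r(heights_in, weights_lb)
cov_metric = sample_cov(heights_cm, weights_kg)
cov_imperial = sample_cov(heights_in, weights_lb)

print(r_metric, r_imperial)      # equal up to floating-point rounding
print(cov_metric, cov_imperial)  # differ, because covariance carries the units
```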

In an extremely right-skewed distribution, the mean is less than the median. a. True b. False

F

The cutoff for defining a large correlation is 0.5. a. True b. False

F

The median is one of the most frequently used measures of variability. a. True b. False

F

We cannot attempt to interpret correlations numerically, with the one possible exception of indicating whether they are positive or negative. a. True b. False

F

A sample of 8 observations with a sample standard deviation of 2.50 has a sample variance of 17.50. a. True b. False

F

Comparing a numerical variable across two or more subpopulations is known as a comparison problem. a. True b. False

T

If the coefficient of correlation r = 0.80 and the standard deviations of X and Y are 20 and 25, respectively, then Cov(X, Y) must be 400. a. True b. False

T
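
The arithmetic behind this answer can be checked with a short sketch that uses the identity Cov(X, Y) = r × sd(X) × sd(Y). The function name `covariance_from_correlation` is hypothetical.

```python
def covariance_from_correlation(r, sd_x, sd_y):
    """Recover the covariance from the correlation and the two standard deviations."""
    return r * sd_x * sd_y

# Values taken from the question: r = 0.80, sd(X) = 20, sd(Y) = 25
cov_xy = covariance_from_correlation(0.80, 20, 25)
print(cov_xy)  # approximately 400
```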

The difference between the largest and smallest values in a data set is called the range. a. True b. False

T

A distribution of a numerical variable with no skewness is said to be symmetric. a. True b. False

T

A histogram is based on binning the variable, which means putting the values of the numeric variable into discrete categories. a. True b. False

T

A histogram is used to display categorical data. a. True b. False

F

A variable (or field or attribute) is a characteristic of members of a population, whereas an observation (or case or record) is a list of all variable values for a single member of a population. a. True b. False

T

Age, height, and weight are examples of numerical data. a. True b. False

T

Correlation is a single-number summary of a scatterplot. a. True b. False

T

Counts for a categorical variable are often expressed as percentages of the total. a. True b. False

T

Cross-sectional data are data on a population at a distinct point in time, whereas time series data are data collected over time. a. True b. False

T

If the standard deviations of X and Y are 15.5 and 10.8, respectively, and the covariance of X and Y is 128.8, then the correlation coefficient is approximately 0.77. a. True b. False

T
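
As a quick check of this answer, the correlation is the covariance divided by the product of the standard deviations. The sketch below plugs in the numbers from the question. The function name `correlation_from_covariance` is hypothetical.

```python
def correlation_from_covariance(cov_xy, sd_x, sd_y):
    """Correlation = Cov(X, Y) / (sd(X) * sd(Y))."""
    return cov_xy / (sd_x * sd_y)

# Values taken from the question: Cov = 128.8, sd(X) = 15.5, sd(Y) = 10.8
r = correlation_from_covariance(128.8, 15.5, 10.8)
print(round(r, 2))  # 0.77
```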

The core purpose of time series graphs is to detect historical patterns in the data. a. True b. False

T

The median of a data set with 30 values would be the average of the 15th and the 16th values when the data values are arranged in ascending order. a. True b. False

T
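
A small sketch of the even-count median rule, using the made-up data set 1 through 30: the average of the 15th and 16th ordered values should match what Python's standard `statistics.median` returns.

```python
import statistics

# Hypothetical data set with 30 values
values = list(range(1, 31))
ordered = sorted(values)

# 15th and 16th values sit at zero-based indexes 14 and 15
middle_pair_average = (ordered[14] + ordered[15]) / 2
library_median = statistics.median(values)

print(middle_pair_average, library_median)
```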

We can infer that there is a strong relationship between two numeric variables when the points on a scatterplot a. cluster tightly around a straight line. b. are randomly scattered in no clear pattern. c. display a positive relationship. d. display a negative relationship.

a. cluster tightly around a straight line.

To examine relationships between two categorical variables, we can use a. counts and corresponding charts of the counts. b. scatter plots. c. histograms. d. boxplots.

a. counts and corresponding charts of the counts.

In order for the characteristics of a sample to be generalized to the entire population, the sample should be _____ the population. a. symbolic of b. opposite of c. representative of d. different from

c. representative of

The most common data format is a. long. b. short. c. stacked. d. unstacked.

c. stacked

Without performing any calculations, which of the following data sets has the greatest sample standard deviation? a. 1, 2, 3, 4, 5, 6 b. 1, 1, 3, 5, 5, 6 c. 3, 3, 3, 3, 3, 3 d. 1, 1, 1, 5, 5, 5 e. 1, 1, 3, 3, 6, 6

e. 1, 1, 3, 3, 6, 6

Examples of comparison problems include a. salary broken down by male and female subpopulations. b. cost of living broken down by region of a country. c. recovery rate for a disease broken down by patients who have taken a drug and patients who have taken a placebo. d. all of these choices

d. all of these choices.

