QMB 2 final exam Tarek
*matched samples
* are samples in which each data value of one sample is matched with a corresponding data value of the other sample. The d notation is a reminder that the matched sample provides difference data. Once the difference data are computed, the t distribution procedure for matched samples is the same as for one-population estimation and hypothesis testing.
P-value
*A ____ is a probability that provides a measure of the evidence against the null hypothesis provided by the sample. Smaller p-values indicate more evidence against H0.
larger, smaller
*A large value for s provides ____ a margin of error, while a small value for s provides ____ a margin of error.
* n(1 - p) ≥ 5 and np ≥ 5.
*As a rule of thumb, the sampling distribution of the sample proportion can be approximated by a normal probability distribution when:
*stepwise regression
*Because of the nature of the ____________ procedure, an independent variable can enter the model at one step, be removed at a subsequent step, and then enter the model at a later step.
*nonprobability sampling technique
*Convenience sampling is a:
Infinite population
*For a(n) _____ , it is impossible to construct a sampling frame
*actual value - forecast.
*Forecast error=
56
*How many simple random samples of size 5 can be selected from a population of size 8?
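The count is the binomial coefficient C(N, n) = N!/(n!(N - n)!). A minimal Python sketch using the standard library's math.comb; the helper name count_srs is hypothetical, chosen only for illustration:

```python
import math

def count_srs(population_size: int, sample_size: int) -> int:
    """Number of distinct simple random samples (drawn without replacement)."""
    # C(N, n) = N! / (n! * (N - n)!)
    return math.comb(population_size, sample_size)

print(count_srs(8, 5))  # population of 8, samples of size 5 -> 56
print(count_srs(6, 2))  # the 6-children card later in this set -> 15
```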
*reduce the standard error of the estimate
*In a recent Gallup Poll, the decision was made to increase the size of its random sample of voters from 1500 people to about 4000 people. The purpose of this increase is to:
*sampling distribution of a statistic
*It is the distribution of all of the statistics calculated from all possible samples of the same sample size:
*simple random sample
*Statisticians recommend selecting a probability sample when sampling from a finite population because a probability sample allows them to make valid statistical inferences about the population. A ____ of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.
*sampling distribution
*The ______________ of the sample mean is the probability distribution of all possible values of the sample mean. The expected value or mean of the sampling distribution of the sample mean is equal to the mean of the population.
*backward elimination
*The ________________ procedure begins with a model that includes all the independent variables. It then deletes one independent variable at a time using the same procedure as stepwise regression.
*moving averages
*The _____________ method uses the average of the most recent k data values in the time series as the forecast for the next period. To use moving averages to forecast a time series, we must first select the order, or number of time series values, to be included in the moving average. If only the most recent values of the time series are considered relevant, a small value of k is preferred.
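A moving average forecast is just the mean of the last k observations. A minimal sketch, assuming the series is a plain Python list; the sales figures and the function name are hypothetical, not from the course materials:

```python
def moving_average_forecast(series: list[float], k: int) -> float:
    """Forecast for the next period = mean of the most recent k values."""
    if not 1 <= k <= len(series):
        raise ValueError("k must be between 1 and len(series)")
    recent = series[-k:]  # the last k observations
    return sum(recent) / k

# Hypothetical weekly sales
sales = [17, 21, 19, 23, 18, 16, 20, 18, 22, 20, 15, 22]
print(moving_average_forecast(sales, k=3))  # (20 + 15 + 22) / 3 = 19.0
```

A small k reacts quickly to recent changes; a large k smooths out more of the random fluctuation.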
*estimated regression equation
*The _____________ ŷ = b0 + b1x can be used to estimate the mean value of y for a given value of x as well as to predict an individual value of y for a given value of x.
*error term
*The ____________ ε is a random variable with mean or expected value of zero; that is, E(ε) = 0. The variance of ε is denoted by σ^2 and is the same for all values of the independent variables. The values of ε are independent. The error term ε is a normally distributed random variable reflecting the deviation between the y value and the expected value of y given by β0 + β1x1 + β2x2 + ...
*regression equation.
*The equation that describes how the expected value of y, denoted E(y), is related to x is called the
*multiple coefficient of determination
*The implication is that the estimated multiple regression equation provides a better fit for the observed data. The term _______________ indicates that we are measuring the goodness of fit for the estimated multiple regression equation.
*level of significance
*The ____ is the probability of making a Type I error when the null hypothesis is true as an equality. If the cost of making a Type I error is high, small values of α are preferred. If the cost of making a Type I error is not too high, larger values of α are typically used.
minimize the sum of the squares of the deviations between the observed values of the dependent variable yi and the predicted values of the dependent variable y^i.
*The least squares method uses the sample data to provide the values of b0 and b1 that
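The least squares formulas are b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)^2 and b0 = ȳ - b1x̄. A minimal sketch in plain Python; the five data points are hypothetical:

```python
def least_squares(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (b0, b1) minimizing sum((y_i - (b0 + b1 * x_i)) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")  # y-hat = 2.20 + 0.60x
```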
*a parameter.
*The medical director of a company looks at the medical records of all 50 employees and finds that the mean systolic blood pressure for these employees is 126.07. The value of 126.07 is:
When carrying out an F test to determine if the addition of extra predictor variables results in a significant reduction in the error sum of squares, what are the degrees of freedom of the numerator and denominator of the F statistic?
*The numerator degrees of freedom equals the number of predictors added to the model. The denominator degrees of freedom is n - p - 1, where p is the number of independent variables in the larger (full) model.
*judgement sampling
*The other nonprobability sampling technique?
1
*The parameters have exponents of
difference between the two sample means, x̄1 - x̄2
*The point estimator of the difference between the two population means is the
normal distribution
*The point estimator of the difference between two population proportions is the difference between the sample proportions of two independent random samples. If the sample sizes are large enough that n1p1, n1(1 - p1), n2p2, and n2(1 - p2) are all ≥ 5, the sampling distribution of p̄1 - p̄2 can be approximated by a
*standard error
*The standard deviation of a point estimator is called the
normal distribution
*The standard error of x̄1 - x̄2 is the standard deviation of the sampling distribution of x̄1 - x̄2. If both populations have a normal distribution, or if the sample sizes are large enough that the central limit theorem enables us to conclude that the sampling distributions of x̄1 and x̄2 can be approximated by a
*sample statistic
*The value of the ____________ is used to estimate the value of the population parameter.
*F statistic (F test)
*This test is based on a determination of the amount of reduction in the error sum of squares resulting from adding one or more independent variables to the model.
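*11, 36, 23, 08, and 42 (reading the line in pairs of digits: 56, 92, 96, and 79 fall outside 01 to 50, and the second 23 is a repeat, so those pairs are skipped)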
*We wish to draw a sample of size 5 without replacement from a population of 50 households. Suppose the households were numbered 01 to 50. Using the following line from a random number table, the households selected would be: 1 1 3 6 2 3 5 6 9 2 9 6 2 3 7 9 0 8 4 2 4 6 8 4 3 6 2 7 1 9 6 4 0 4
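The table-reading rule (take two digits at a time, skip labels outside 01 to N, skip repeats because sampling is without replacement) can be checked mechanically. A minimal sketch; the function name is hypothetical:

```python
def select_from_random_digits(digits: str, population_size: int, sample_size: int) -> list[int]:
    """Read two-digit groups left to right; keep labels 1..N, skipping repeats."""
    clean = "".join(ch for ch in digits if ch.isdigit())  # drop the spaces
    chosen: list[int] = []
    for i in range(0, len(clean) - 1, 2):
        label = int(clean[i:i + 2])
        # Out-of-range labels and repeats are ignored
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen

line = "1 1 3 6 2 3 5 6 9 2 9 6 2 3 7 9 0 8 4 2 4 6 8 4 3 6 2 7 1 9 6 4 0 4"
print(select_from_random_digits(line, population_size=50, sample_size=5))
```

Household 08 prints as 8 because the label is stored as an integer.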
*the error terms are not independent.
*When autocorrelation is present, one of the assumptions of the regression model is violated:
t distribution
*When s is used to estimate σ, the margin of error and the interval estimate for the population mean are based on a probability distribution known as the
*central limit theorem
*When the population from which we are selecting a random sample does not have a normal distribution, the ____ is helpful in identifying the shape of the sampling distribution of the sample mean. In selecting random samples of size n from a population, the sampling distribution of the sample mean can be approximated by a normal distribution as the sample size becomes large.
*For a lower tail test, the critical value serves as a
*benchmark for determining whether the value of the test statistic is small enough to reject the null hypothesis.
In general, higher confidence levels provide larger confidence intervals. One way to have high confidence and a small margin of error is to:
*increase the sample size
The effect of two independent variables acting together is called:
*interaction.
*Reject H0 when
*p-value ≤ α.
The regression model is a:
*second-order model with one predictor variable.
The standard deviation of a point estimator is called the:
*standard error
A probability sampling method in which we randomly select one of the first k elements and then select every kth element thereafter is:
*systematic sampling.
*Cluster sampling
- a probability sampling method in which the elements in the population are first divided into separate groups called clusters.
*Systematic sampling
- a probability sampling method in which we randomly select one of the first k elements and then select every kth element thereafter.
The z value for a 99% confidence interval estimation is:
2.58
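The z value for a confidence interval is z_{α/2}, the point with α/2 of the standard normal area above it. A minimal sketch using Python's statistics.NormalDist; the helper name is hypothetical:

```python
from statistics import NormalDist

def z_for_confidence(confidence: float) -> float:
    """z_{alpha/2}: upper-tail critical value for a two-sided interval."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
```

For 99% confidence the exact value is about 2.576, which the tables round to 2.58.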
Residual plot:
Graphical representation of the residuals that can be used to determine whether the assumptions made about the regression model appear to be valid.
Standard error:
The standard deviation of a point estimator.
*Type I error
If H0 is true and we reject it, we make a ____; that is, we reject H0 when it is true. If Ha is true, rejecting H0 is correct. However, if Ha is true and we accept H0, we make a Type II error; that is, we accept H0 when it is false.
Simple random sample
A ________________ of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.
*matched samples.
A company wants to identify which of the two production methods has the smaller completion time. One sample of workers is selected and each worker first uses one method and then uses the other method. The sampling procedure being used to collect completion time data is based on:
Cyclical pattern
A cyclical pattern exists if the time series plot shows an alternating sequence of points below and above the trend line lasting more than one year.
*observational study.
A doctor would like to determine if there is a difference between the blood pressure of people who walk every day for 60 minutes and those who walk one day per week for 60 minutes. Fifty of her patients who report that they have routinely walked 60 minutes every day for the past two years and 50 who report that they have walked 60 minutes only one day per week will be identified. The doctor will examine their medical records and collect their blood pressure readings over this two-year period. This is an example of a(n):
H0: μ = 12; Ha: μ ≠ 12
A fast food restaurant has automatic drink dispensers to help fill orders more quickly. When the 12 ounce button is pressed, they would like for exactly 12 ounces of beverage to be dispensed. There is, however, undoubtedly some variation in this amount. The company does not want the machine to systematically over fill or under fill the cups. Which of the following gives the correct set of hypotheses?
Moving average
A forecasting method that uses the average of the most recent k data values in the time series as the forecast for the next period.
Scatter diagram:
A graph of bivariate data in which the independent variable is on the horizontal axis and the dependent variable is on the vertical axis.
normal probability plot
A graph of the standardized residuals plotted against values of the normal scores that helps to determine whether the assumption that the error term has a normal probability distribution appears to be valid is called a:
Time series plot
A graphical presentation of the relationship between time and the time series variable. Time is shown on the horizontal axis and the time series values are shown on the vertical axis.
Horizontal pattern:
A horizontal pattern exists when the data fluctuate around a constant mean.
Two-tailed test
A hypothesis test in which rejection of the null hypothesis occurs for values of the test statistic in either tail of its sampling distribution.
One-tailed test
A hypothesis test in which rejection of the null hypothesis occurs for values of the test statistic in one tail of its sampling distribution.
Coefficient of determination:
A measure of the goodness of fit of the estimated regression equation. It can be interpreted as the proportion of the variability in the dependent variable y that is explained by the estimated regression equation.
Parameter
A numerical characteristic of a population, such as a population mean µ, a population standard deviation σ, a population proportion p, and so on.
Smoothing constant
A parameter of the exponential smoothing model that provides the weight given to the most recent time series value in the calculation of the forecast value.
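Exponential smoothing updates the forecast as F_{t+1} = αY_t + (1 - α)F_t, where α is the smoothing constant. A minimal sketch, assuming the common convention of starting with F_2 = Y_1; the series values and function name are hypothetical:

```python
def exponential_smoothing_forecast(series: list[float], alpha: float) -> float:
    """Forecast for the period after the last observation.

    Uses F_{t+1} = alpha * Y_t + (1 - alpha) * F_t, started with F_2 = Y_1
    (one common convention; assumed here).
    """
    if not 0 <= alpha <= 1:
        raise ValueError("smoothing constant must be in [0, 1]")
    forecast = series[0]  # F_2 = Y_1
    for y in series[1:]:
        forecast = alpha * y + (1 - alpha) * forecast
    return forecast

print(exponential_smoothing_forecast([17, 21, 19], alpha=0.2))
```

With α = 1 the forecast is simply the most recent value; smaller α gives more weight to older values.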
Degrees of freedom
A parameter of the t distribution. When the t distribution is used in the computation of an interval estimate of a population mean, the appropriate t distribution has n − 1 degrees of freedom, where n is the size of the sample.
sampling distribution
A probability distribution consisting of all possible values of a sample statistic.
P-Value
A probability that provides a measure of the evidence against the null hypothesis provided by the sample. Smaller p-values indicate more evidence against H0. For a lower tail test, the p-value is the probability of obtaining a value for the test statistic as small as or smaller than that provided by the sample. For an upper tail test, the p-value is the probability of obtaining a value for the test statistic as large as or larger than that provided by the sample. For a two-tailed test, the p-value is the probability of obtaining a value for the test statistic at least as unlikely as that provided by the sample.
Unbiased
A property of a point estimator that is present when the expected value of the point estimator is equal to the population parameter it estimates.
*sample mean, sample proportion
A ____ provides an estimate of a population mean, and a ____ provides an estimate of a population proportion. With estimates such as these, some estimation error can be expected. It is important to realize that sample results provide only estimates of the values of the corresponding population characteristics.
F=2.31
A researcher is trying to decide whether or not to add another variable to his model. He currently has a first-order model with two predictor variables based upon a sample of 28 observations. For this model, SSE = 1425. Then, he estimated the data with a first-order model with an additional predictor variable x3. The SSE for the new model is 1300. We would like to know if the addition of the third predictor results in a significant reduction in the error sum of squares. Calculate the test statistic.
The numerator has 1 degree of freedom, and the denominator has 24 degrees of freedom.
A researcher is trying to decide whether or not to add another variable to his model. He currently has a first-order model with two predictor variables based upon a sample of 28 observations. For this model, SSE = 1425. Then, he estimated the data with a first-order model with an additional predictor variable x3. The SSE for the new model is 1300. We would like to know if the addition of the third predictor results in a significant reduction in the error sum of squares. What are the degrees of freedom of the numerator and denominator?
*matched sample design.
A researcher recruits 25 people to participate in a study on alcohol consumption and its interactions with Tylenol. The 25 participants had to come to a check-in center every day at 7:00 a.m. for one week. They were given various amounts of alcohol. Each day, each participant would flip a coin to determine if they also took Tylenol with their alcohol. They found that their BAC was 25% higher on days when they were given Tylenol with their alcohol than when they drank alcohol alone. This is an example of a(n):
*F = 8.02
A researcher would like to know whether or not the addition of three variables to a model will result in a significant reduction in the error sum of squares. She currently has a first-order model with two predictor variables based upon a sample of 25 observations. For this model, SSE = 725. Then, she estimated the relationship with a first-order model with three additional predictor variables x3 , x4 , and x5. The SSE for the new model is 320. Calculate the test statistic.
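Both researcher cards use the same statistic: F = [(SSE_reduced - SSE_full)/(p - q)] / [SSE_full/(n - p - 1)], where q is the number of predictors in the smaller model and p the number in the larger one. A minimal sketch reproducing the two answers; the function name is hypothetical:

```python
def partial_f_test(sse_reduced: float, sse_full: float, n: int, q: int, p: int):
    """F statistic for adding (p - q) predictors to a model with q predictors.

    F = [(SSE_reduced - SSE_full) / (p - q)] / [SSE_full / (n - p - 1)]
    Returns (F, numerator df, denominator df).
    """
    df_num = p - q      # number of predictors added
    df_den = n - p - 1  # error df of the full model
    f = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    return f, df_num, df_den

print(partial_f_test(1425, 1300, n=28, q=2, p=3))  # one predictor added
print(partial_f_test(725, 320, n=25, q=2, p=5))    # three predictors added
```

The F value is then compared with an F distribution with (df_num, df_den) degrees of freedom.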
Sample statistic
A sample characteristic, such as a sample mean bar(x), a sample standard deviation s, a sample proportion bar(p), and so on. The value of the __________ is used to estimate the value of the corresponding population parameter.
*normal because of the central limit theorem
A sample of 92 observations is taken from an infinite population. The sampling distribution of x̄ is approximately:
Seasonal pattern
A seasonal pattern exists if the time series plot exhibits a repeating pattern over successive periods. The successive periods are often one-year intervals, which is where the name seasonal pattern comes from.
Time series
A sequence of observations on a variable measured at successive points in time or over successive periods of time.
*the same probability of being selected.
A simple random sample of size n from a finite population of size N is to be selected. Each possible sample should have:
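*The sample standard deviation is 9 pushups; the population standard deviation σ is unknown, so inference about the population mean uses a t distribution with n - 1 = 9 degrees of freedom.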
A statistics teacher started class one day by drawing the names of 10 students out of a hat and asked them to do as many pushups as they could. The 10 randomly selected students averaged 15 pushups per person with a standard deviation of 9 pushups. Suppose the distribution of the population of number of pushups that can be done is approximately normal. Which of the following statements is true?
H0: p = .5; Ha: p ≠ .5
A student wants to determine if pennies are really fair, meaning equally likely to land heads up or tails up. He flips a random sample of 50 pennies and finds that 28 of them land heads up. What are the appropriate null and alternative hypotheses?
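Carrying the penny example through, a sketch of the two-tailed z test for one proportion (np0 = 25 and n(1 - p0) = 25, so the normal approximation applies). The function name is hypothetical; the p-value comes out at roughly 0.4, so at α = .05 there is not enough evidence to reject H0:

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes: int, n: int, p0: float):
    """Two-tailed z test of H0: p = p0 using the normal approximation."""
    p_bar = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error of p-bar under H0
    z = (p_bar - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
    return z, p_value

z, p = one_proportion_z_test(28, 50, 0.5)
print(f"z = {z:.3f}, p-value = {p:.3f}")
```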
Central limit theorem:
A theorem that enables one to use the normal probability distribution to approximate the sampling distribution of bar(x) whenever the sample size is large.
Critical value:
A value that is compared with the test statistic to determine whether H0 should be rejected
Type II error
Accepting H0 when it is false
prediction interval
All things held constant, which interval will be wider: a confidence interval or a prediction interval?
Confidence interval
Another name for an interval estimate:
*significance tests.
Applications of hypothesis testing that only control for the Type I error are called:
standard normal distribution
As the degrees of freedom increase, the t distribution approaches the
*becomes smaller.
As the number of degrees of freedom for a t distribution increases, the difference between the t distribution and the standard normal distribution:
*decreases
As the sample size increases, the margin of error:
*standard error of the mean decreases.
As the sample size increases, the:
parameters
β1, β2 are
Treatment A.
Consider a completely randomized design involving three treatments: A, B, and C. Assume the dummy variables are assigned as follows: The multiple regression equation can be used to model the data. can be interpreted as the expected value of the response variable for those who were subjected to:
*autocorrelation.
Correlation in the errors that arises when the error terms at successive points in time are related is called:
*reduce the standard error of the mean.
Doubling the size of the sample will:
*nine times as large as the original sample size.
For a fixed confidence level and population standard deviation, if we would like to cut our margin of error to 1/3 of the original size, we should take a sample size that is:
must be larger
For a fixed sample size, n, in order to have a higher degree of confidence, the margin of error and the width of the interval:
*at least as small as that provided by the sample.
For a lower tail test, the p-value is the probability of obtaining a value for the test statistic:
*infinite population
For a(n) _____ , it is impossible to construct a sampling frame.
*n-1
For the case where σ is unknown, the test statistic has a t distribution. How many degrees of freedom does it have?
we want the differences between the observed values and the predicted values to be small. SST = SSR + SSE, r^2 = SSR/SST, and r_xy = (sign of b1)√(r^2).
For the estimated regression line to provide a good fit to the data,
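The relations SST = SSR + SSE, r^2 = SSR/SST, and r_xy = (sign of b1)√(r^2) can be checked numerically. A minimal sketch on hypothetical data; the function name is an assumption for illustration:

```python
import math

def fit_summary(x: list[float], y: list[float]) -> dict[str, float]:
    """Least squares fit plus the SST = SSR + SSE decomposition."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total
    r2 = ssr / sst
    r = math.copysign(math.sqrt(r2), b1)  # sign of r matches sign of b1
    return {"SST": sst, "SSR": ssr, "SSE": sse, "r2": r2, "r": r}

print(fit_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # hypothetical data
```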
*the regression model is not an adequate representation of the relationship between the variables.
If a residual plot of x versus the residuals, y - ŷ, shows a non-linear pattern, then we should conclude that:
*first-order autocorrelation is present.
If the value of y in time period t is related to its value in time period t - 1, we say that:
*can be approximated by a normal distribution.
If two large independent random samples are taken from two populations, the sampling distribution of the difference between the two sample means:
*alternative hypothesis should state p1-p2>0
If we are interested in testing whether the proportion of items in population 1 is larger than the proportion of items in population 2, then the:
0
In a multiple regression model, the error term ε is assumed to have a mean of:
*normally distributed.
In a multiple regression model, the values of the error term, ε, are assumed to be:
*the same for all values of the independent variables.
In a multiple regression model, the variance of the error term, ε, is assumed to be:
the value of the correlation
In a regression analysis, an outlier will always increase:
n/N > .05
In computing the standard error of the mean, the finite population correction factor is used when:
*increases
In general, R2 always _____ as independent variables are added to the regression model.
*the null hypothesis.
In hypothesis testing, the tentative assumption about the population parameter is called:
*becomes narrower.
In interval estimation, as the sample size becomes larger, the interval estimate:
*the distribution of the populations becomes an important consideration.
In most applications of the interval estimation and hypothesis testing procedures, random samples with n1 ≥ 30 and n2 ≥ 30 are adequate. In cases where either or both sample sizes are less than 30:
*-2; 2
In multiple regression analysis, any observation with a standardized residual of less than _____ or greater than _____ is known as an outlier.
*can be used to accommodate curvilinear relationships between the independent variables and the dependent variable.
In multiple regression analysis, the general linear model:
*there can be several independent variables, but only one dependent variable
In multiple regression analysis:
*regression model.
In regression analysis, the equation in the form y = 𝛽0 + 𝛽1x + ε is called the:
*dependent variable
In regression analysis, the variable that is being predicted is the:
*t distribution with 70 degrees of freedom.
Independent simple random samples are taken to test the difference between the means of two populations whose variances are not known, but are assumed to be equal. The sample sizes are n1 = 32 and n2 = 40. The correct distribution to use is the ____, with how many degrees of freedom?
increase the value of the slope. increase the value of the correlation. increase the value of the y-intercept. *None of the above are correct.*
Influential observations always:
good predictors
Looking at the sample correlation coefficients between the response variable and each of the independent variables can give us a quick indication of which independent variables are, by themselves,
*.7 for any two of the independent variables.
Multicollinearity can cause problems if the absolute value of the sample correlation coefficient exceeds:
Sample size for an interval estimate of a population mean
n = (z_{α/2})^2 σ^2 / E^2
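A minimal sketch of this sample size formula, rounding up because n must be a whole number large enough to meet the margin of error. The planning values (95% confidence, σ = 20, E = 5) are hypothetical. Because E is squared in the denominator, cutting E to one third multiplies the unrounded n by 9:

```python
import math

def sample_size_for_mean(z: float, sigma: float, margin: float) -> int:
    """n = (z_{alpha/2})^2 * sigma^2 / E^2, rounded up to the next integer."""
    raw = (z ** 2) * (sigma ** 2) / (margin ** 2)
    return math.ceil(raw)

print(sample_size_for_mean(1.96, 20, 5))      # original margin of error
print(sample_size_for_mean(1.96, 20, 5 / 3))  # margin cut to one third
```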
*weighted average of Pbar1 and Pbar2
Regarding hypothesis tests about p1-p2 , the pooled estimate of P is a:
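Under H0: p1 - p2 = 0, the pooled estimate is p̄ = (n1p̄1 + n2p̄2)/(n1 + n2), and it replaces p1 and p2 in the standard error. A minimal sketch with hypothetical counts; the function name is an assumption:

```python
import math

def pooled_two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """z statistic for H0: p1 - p2 = 0 using the pooled estimate of p."""
    p1_bar, p2_bar = x1 / n1, x2 / n2
    # Pooled p-bar: weighted average of p1_bar and p2_bar, weights n1 and n2
    p_pooled = (n1 * p1_bar + n2 * p2_bar) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_bar - p2_bar) / se
    return p_pooled, z

# Hypothetical counts: 40 of 100 vs. 30 of 100
print(pooled_two_proportion_z(40, 100, 30, 100))
```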
*independent samples
Regarding inferences about the difference between two population means, the alternative to the matched sample design, as covered in the textbook, is:
Simple linear regression
Regression analysis involving one independent variable and one dependent variable in which the relationship between the variables is approximated by a straight line.
Type I error
Rejecting H0 when it is true
the residual plot against ŷ and the residual plot against x provide the same information
In simple linear regression:
*would not add much more explanatory power to the current model.
Suppose a high correlation existed between variables x1 and x2. If variable x1 was used as an independent variable, then variable x2:
*as the values of x get larger, the ability to predict y becomes less accurate.
Suppose a residual plot of x versus the residuals, y - ŷ, shows a nonconstant variance. In particular, as the values of x increase, suppose that the values of the residuals also increase. This means that:
n1+n2 -2 degrees of freedom.
Suppose we are constructing an interval estimate for the difference between the means of two populations when the standard deviations of the two populations are unknown. Suppose it can be assumed that the two populations have equal variances. If n1 is the size of sample 1 and n2 is the size of sample 2, we must use a t distribution with:
*round the calculated degrees of freedom down to the nearest integer.
Suppose we have a t distribution based upon two sample means with unknown population standard deviations, which we are unwilling to assume are equal. When we calculate the appropriate degrees of freedom, we should:
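When the variances are not assumed equal, the degrees of freedom come from df = (s1^2/n1 + s2^2/n2)^2 / [(s1^2/n1)^2/(n1 - 1) + (s2^2/n2)^2/(n2 - 1)], which is usually not a whole number. A minimal sketch with hypothetical sample summaries; rounding down gives a slightly larger t value and so a slightly more conservative interval:

```python
import math

def welch_df(s1: float, n1: int, s2: float, n2: int) -> int:
    """Degrees of freedom when the population variances are not assumed equal.

    df = (s1^2/n1 + s2^2/n2)^2 / [(s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1)],
    rounded DOWN to the nearest integer.
    """
    a = s1 ** 2 / n1
    b = s2 ** 2 / n2
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return math.floor(df)

print(welch_df(2.5, 40, 4.8, 50))  # hypothetical s1, n1, s2, n2
```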
Mean absolute error (MAE)
The average of the absolute values of the forecast errors.
Mean squared error (MSE):
The average of the squared forecast errors.
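Both accuracy measures start from the forecast errors (actual value - forecast). A minimal sketch with three hypothetical periods; the function name is an assumption:

```python
def forecast_accuracy(actual: list[float], forecast: list[float]) -> tuple[float, float]:
    """Return (MAE, MSE), where each forecast error = actual - forecast."""
    errors = [a - f for a, f in zip(actual, forecast)]
    mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
    mse = sum(e ** 2 for e in errors) / len(errors)  # mean squared error
    return mae, mse

# Hypothetical values: errors are 1, 1, and -3
print(forecast_accuracy([20, 22, 18], [19, 21, 21]))
```

MSE squares the errors, so a single large miss (the -3 here) weighs more heavily than it does in MAE.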
*cannot be negative
The coefficient of determination:
Confidence level
The confidence associated with an interval estimate. For example, if an interval estimation procedure provides intervals such that 95% of the intervals formed using the procedure will include the population parameter, the interval estimate is said to be constructed at the 95% confidence level.
narrows as the sample size increases.
The confidence interval for the mean value of y, and the prediction interval for an individual value of y each
Confidence coefficient
The confidence level expressed as a decimal value. For example, .95 is the confidence coefficient for a 95% confidence level.
*the statistic
The distribution of values taken by a statistic in all possible samples of the same size from the same population is the sampling distribution of:
Regression equation
The equation that describes how the mean or expected value of the dependent variable is related to the independent variable; in simple linear regression, E(y) = β0 + β1x.
Regression model
The equation that describes how y is related to x and an error term; in simple linear regression, the regression model is y = β0 + β1x + ϵ.
Estimated regression equation
The estimate of the regression equation developed from sample data by using the least squares method. For simple linear regression, the estimated regression equation is ŷ = b0 + b1x.
8
The following data show the results of an aptitude test and the grade point average of 10 students. The t test for a significant relationship between GPA and Aptitude Test Score is based on a t distribution with _____ degrees of freedom. n=10
$113,000
The following regression model has been proposed to predict sales at a gas station: ŷ = 10 - 4x1 + 7x2 + 18x3, where x1 = competitor's previous day's sales (in $1,000s), x2 = population within 5 miles (in 1,000s), x3 = 1 if any form of advertising was used and 0 otherwise, and ŷ = sales (in $1,000s). Predict sales (in dollars) for a store with competitor's previous day's sales of $5,000, a population of 15,000 within 5 miles, and five radio advertisements.
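The key steps are entering x1 and x2 in the model's units ($1,000s and 1,000s) and noting that x3 is a dummy variable, so five ads still give x3 = 1. A minimal sketch; the function name is hypothetical:

```python
def predict_sales(x1: float, x2: float, x3: int) -> float:
    """y-hat in $1,000s; x1 in $1,000s, x2 in 1,000s of people, x3 = 1 if any advertising."""
    return 10 - 4 * x1 + 7 * x2 + 18 * x3

y_hat = predict_sales(x1=5, x2=15, x3=1)  # 10 - 20 + 105 + 18 = 113
print(f"${y_hat * 1000:,.0f}")            # convert from $1,000s to dollars
```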
Alternative hypothesis
The hypothesis concluded to be true if the null hypothesis is rejected.
Null hypothesis:
The hypothesis tentatively assumed true in the hypothesis testing procedure.
Prediction interval
The interval estimate of an individual value of y for a given value of x
Confidence interval:
The interval estimate of the mean value of y for a given value of x.
*variation between subjects is eliminated because the same subjects are used for both treatments.
The matched sample design often leads to a smaller sampling error than the independent sample design. The primary reason is that in a matched sample design:
*a multiple regression model.
The mathematical equation relating the expected value of the dependent variable to the value of the independent variables, which has the form of, is called:
*estimated regression equation.
The mathematical equation relating the independent variable to the expected value of the dependent variable, , is known as the:
*least squares method.
The method used to develop the estimated regression equation that minimizes the sum of squared residuals is called the:
*estimated regression equation.
The model developed from sample data that has the form is known as the:
np ≥ 5 and n(1 - p) ≥ 5
The normal probability distribution can be used to approximate the sampling distribution of p̄ as long as:
null hypothesis
The p-value is a probability that measures the support (or lack of support) for the
*must be a number between 0 and 1.
The p-value:
Target population
The population for which statistical inferences such as point estimates are made. It is important for the __________ to correspond as closely as possible to the sampled population.
Sampled population
The population from which the sample is taken
*level of significance
The probability of making a Type I error when the null hypothesis is true as an equality is called the:
Level of significance
The probability of making a Type I error when the null hypothesis is true as an equality.
*multiple coefficient of determination
The proportion of the variability in the dependent variable that can be explained by the estimated multiple regression equation is called the:
*0 to 4.
The range of the Durbin-Watson statistic is:
Point estimator
The sample statistic, such as bar(x), s, or bar(p), that provides the __________ of the population parameter.
*normal distribution.
The sampling distribution of Pbar1 - Pbar2 is approximated by a:
Standard error of the estimate:
The square root of the mean square error, denoted by s. It is the estimate of σ, the standard deviation of the error term ϵ.
*multiple regression analysis.
The study of how a dependent variable y is related to two or more independent variables is called:
*error term, ε
The term in the multiple regression model that accounts for the variability in y that cannot be explained by the linear effect of the p independent variables is the:
Finite population correction factor:
The term √((N - n)/(N - 1)) that is used in the formulas for the standard deviations of x̄ and p̄ whenever a finite population, rather than an infinite population, is being sampled. The generally accepted rule of thumb is to ignore it when n/N ≤ .05.
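The standard error of the mean is σ/√n, multiplied by √((N - n)/(N - 1)) when sampling from a finite population with n/N > .05. A minimal sketch of that rule; σ, n, and N below are hypothetical planning values:

```python
import math
from typing import Optional

def standard_error_of_mean(sigma: float, n: int, N: Optional[int] = None) -> float:
    """sigma / sqrt(n), times the finite population correction when needed.

    The correction sqrt((N - n) / (N - 1)) is applied only when a finite
    population size N is given and n/N > .05.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(standard_error_of_mean(10, 100))           # infinite population
print(standard_error_of_mean(10, 100, N=10000))  # n/N = .01, correction ignored
print(standard_error_of_mean(10, 100, N=1000))   # n/N = .10, correction applied
```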
*the same for all values of x.
The tests of significance in regression analysis are based on assumptions about the error term ɛ . One such assumption is that the variance of ɛ, denoted by 𝝈2, is:
*margin of error
The value added and subtracted from a point estimate in order to develop an interval estimate of the population parameter is known as the:
Point estimate
The value of the estimator used in a particular instance as an estimate of a population parameter.
Dependent variable
The variable that is being predicted or explained. It is denoted by y.
Independent variable:
The variable that is doing the predicting or explaining. It is denoted by x.
Margin of error
The ± value added to and subtracted from a point estimate in order to develop an interval estimate of a population parameter.
15
There are 6 children in a family. The number of children defines a population. The number of simple random samples of size 2 (without replacement) that are possible is equal to:
*using .95 as an estimate.
To compute the necessary sample size for an interval estimate of a population proportion, all of the following procedures are recommended when p is unknown except:
=NORM.S.DIST
To compute the p-value for a lower tail test, we enter the formula = .
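For readers working outside Excel, a minimal Python sketch of the same calculation using `statistics.NormalDist`. The lower-tail p-value is the standard normal area to the left of z (what Excel's NORM.S.DIST returns with its cumulative argument set to TRUE); the upper-tail and two-tailed helpers are added for comparison, and all function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def lower_tail_p_value(z):
    # Area to the left of z under the standard normal curve
    return STANDARD_NORMAL.cdf(z)

def upper_tail_p_value(z):
    # Area to the right of z
    return 1 - STANDARD_NORMAL.cdf(z)

def two_tailed_p_value(z):
    # One-tail area beyond |z|, doubled
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))

print(round(lower_tail_p_value(-1.645), 4))  # 0.05
```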
*becomes wider
Using α = .04, a confidence interval for a population proportion is determined to be .65 to .75. If the level of significance is decreased, the interval for the population proportion:
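A quick Python check of why the interval widens: a smaller α means a larger z value with α/2 in the upper tail, so the margin of error grows. The sample size n = 350 is a hypothetical value picked so that α = .04 gives roughly the .65 to .75 interval on the card; the function name is also hypothetical.

```python
import math
from statistics import NormalDist

def proportion_interval(p_bar, n, alpha):
    """Interval estimate p_bar ± z_(alpha/2) * sqrt(p_bar (1 - p_bar) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z value with area alpha/2 in the upper tail
    margin = z * math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - margin, p_bar + margin

# Hypothetical n = 350; compare alpha = .04 with a smaller alpha = .01
for alpha in (0.04, 0.01):
    low, high = proportion_interval(0.70, 350, alpha)
    print(alpha, round(low, 3), round(high, 3))
```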
.09038
What is the value of s_b1? (based on data not shown)
*Durbin-Watson test
What test can be used to determine whether first-order autocorrelation is present?
t distribution
When "s" is used to estimate "σ," the margin of error is computed by using the:
*p-value must be doubled
When completing a two-tailed hypothesis test about the difference between two population means, the
n - 2
When constructing a confidence or a prediction interval to quantify the relationship between two quantitative variables, the appropriate degrees of freedom are:
*stepwise regression.
When determining the best estimated regression equation to model a set of data, the procedure that begins each step by determining whether any of the variables already in the model should be removed is called:
*n1 and n2 can be of different sizes.
When developing an interval estimate for the difference between two population means, with sample sizes of n1 and n2:
*match the targeted population
When drawing a sample from a population, the goal is for the sample to:
prediction interval
When studying the relationship between two quantitative variables, whenever we want to predict an individual value of y for a new observation corresponding to a given value of x, we should use a(n):
*overall significance
When we conduct significance tests for a multiple regression relationship, the F test will be used as the test for:
*individual significance.
When we conduct significance tests for a multiple regression relationship, the t test can be conducted for each of the independent variables in the model. Each of those tests is called a test for:
*confidence interval.
When we use the estimated regression equation to develop an interval estimate of the mean value of y for ALL units that meet a particular set of given criteria, that interval is called a(n):
*prediction interval
When we use the estimated regression equation to develop an interval that can be used to predict the value of y for a specific unit that meets a particular set of given criteria, that interval is called a(n):
*xbar
When μ is unknown, which of the following is used to estimate μ?
*do not reject H0
Whenever the probability of making a Type II error has not been determined and controlled, only two conclusions are possible. We either reject H0 or:
*The level of significance
Which of the following does not need to be known in order to compute the p-value?
II and III
Which of the following is(are) true? I. The mean of a population depends on the particular sample chosen. II. The standard deviations of two different samples from the same population may be the same. III. Statistical inferences can be used to draw conclusions about the populations based on sample data.
H0: μ ≠ 10
Which of the following null hypotheses cannot be correct?
*Best-subsets regression
Which of the following options guarantees that the best model for a given number of variables will be found?
*A teacher uses a pretest and then a posttest with her students to see how much they have improved.
Which of the following scenarios follows a matched sample design?
*The standard deviation of the sampling distribution is the standard deviation of the population.
Which of the following statements regarding the sampling distribution of sample means is incorrect?
Gender
Which of the following variables is categorical?
*weighted moving averages
_________________ involves selecting a different weight for each data value and then computing a weighted average of the most recent k values as the forecast. If we believe that the recent past is a better predictor of the future than the distant past, larger weights should be given to the more recent observations.
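A minimal Python sketch of a weighted moving average forecast. It assumes the weights are listed from oldest to newest, so the last weight applies to the most recent value, and it normalizes them so they need not sum to 1. The function name and the series are hypothetical.

```python
def weighted_moving_average_forecast(values, weights):
    """Forecast the next period from the most recent k = len(weights) values.

    weights are listed oldest to newest, so weights[-1] applies to the most
    recent observation. They are normalized here, so they need not sum to 1.
    """
    k = len(weights)
    recent = values[-k:]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, recent)) / total

# Hypothetical series; weights 1, 2, 3 give the most recent value half the total weight
print(weighted_moving_average_forecast([17, 21, 19, 23], [1, 2, 3]))
```

Here the forecast is (1·21 + 2·19 + 3·23) / 6 = 128/6.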
*Exponential Smoothing
In _______________________, we select only one weight: the weight for the most recent observation. The exponential smoothing forecast for any period is actually a weighted average of all the previous actual values of the time series.
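A minimal Python sketch of exponential smoothing with the single smoothing constant α: each new forecast is F(t+1) = αY(t) + (1 − α)F(t). It assumes the common starting convention that the forecast for period 2 equals the actual value of period 1; the function name and data are hypothetical.

```python
def exponential_smoothing_forecasts(values, alpha):
    """Return forecasts F_2, ..., F_(n+1) using F_(t+1) = alpha*Y_t + (1 - alpha)*F_t.

    Assumes the common starting convention F_2 = Y_1.
    """
    forecasts = [values[0]]  # F_2 = Y_1
    for y in values[1:]:
        # Weight alpha on the latest actual value, 1 - alpha on the previous forecast
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts

# Hypothetical series with alpha = .2
print(exponential_smoothing_forecasts([17, 21, 19], alpha=0.2))
```

Because each forecast folds in the previous one, unrolling the recursion shows every past value contributing, with geometrically shrinking weights.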
*forward selection
The ________________ procedure starts with no independent variables. It adds variables one at a time, using the same procedure as stepwise regression to determine whether an independent variable should be entered into the model.
*multicollinearity
_______________refers to the correlation among the independent variables. A sample correlation coefficient greater than +.7 or less than −.7 for two independent variables is a rule of thumb warning of potential problems with multicollinearity. When the independent variables are highly correlated, it is not possible to determine the separate effect of any particular independent variable on the dependent variable.
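A Python sketch of the ±.7 rule of thumb: compute the sample correlation coefficient for every pair of independent variables and flag pairs with |r| > .7. The helper names and the three variables are hypothetical.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_multicollinearity(columns, threshold=0.7):
    """Return (name1, name2, r) for each pair of independent variables with |r| > threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson_r(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged

# Hypothetical independent variables: x1 and x2 move almost in lockstep
data = {"x1": [1, 2, 3, 4, 5], "x2": [2, 4, 6, 8, 10.5], "x3": [2, 5, 1, 4, 3]}
for a, b, r in flag_multicollinearity(data):
    print(a, b, round(r, 3))
```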
Probability
______________ is a numerical measure of the likelihood that an event will occur.
*Multiple regression analysis
_____________ enables us to consider more factors and thus obtain better predictions than are possible with simple linear regression.
*interaction
___ is the effect of two independent variables acting together. When interaction between two variables is present, we cannot study the effect of one variable on the response y independently of the other variable.
sample statistics
b0 and b1 are
trend pattern
exists if the time series plot shows gradual shifts or movements to relatively higher or lower values over a longer period of time.
simple linear regression
Regression analysis involving one independent variable and one dependent variable in which the relationship between the variables is approximated by a straight line.
*Judgment sampling
is a nonprobability method of sampling whereby elements are selected for the sample based on the judgment of the person doing the study.
*Convenience sampling
is a nonprobability method of sampling whereby elements are selected for the sample on the basis of convenience.
*Null hypothesis
is a tentative assumption about a population parameter such as a population mean or a population proportion. The alternative hypothesis, Ha, is a statement that is the opposite of what is stated in the null hypothesis.
interval estimate
is an estimate of a population parameter that provides an interval believed to contain the value of the parameter. It has the form: point estimate ± margin of error.
tα/2
is the t value providing an area of α/2 in the upper tail of the t distribution with n − 1 degrees of freedom.
The error term ε is a random variable with:
a mean or expected value of zero; that is, E(ε) = 0. The variance of ε is denoted by σ² and is the same for all values of the independent variables. The values of ε are independent. The error term ε is a normally distributed random variable reflecting the deviation between the y value and the expected value of y given by β0 + β1x1 + β2x2 + ...
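Sample size for an interval estimate of a population proportion
n = (zα/2)² p*(1 − p*) / E², where p* is a planning value for the population proportion (for example, p̄ from a pilot sample, or .50 when no better estimate is available).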
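A minimal Python sketch of the sample-size formula n = (zα/2)² p*(1 − p*) / E², rounding up to the next whole unit. It assumes p* = .50 as the default planning value (the most conservative choice, since p*(1 − p*) is largest there). The function name and the margin E = .03 are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_proportion(E, confidence=0.95, p_star=0.5):
    """Smallest n with margin of error at most E for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z value with area alpha/2 in the upper tail
    n = z ** 2 * p_star * (1 - p_star) / E ** 2
    return math.ceil(n)  # always round up

print(sample_size_proportion(0.03))  # 1068
```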
The Durbin-Watson test
ranges in value from zero to four, with a value of two indicating no autocorrelation is present. If successive values of the residuals are close together (positive autocorrelation), the value of the Durbin-Watson test statistic will be small. If successive values of the residuals are far apart (negative autocorrelation), the value of the Durbin-Watson statistic will be large.
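A minimal Python sketch of the Durbin-Watson statistic, d = Σ(e_t − e_(t−1))² / Σe_t², applied to two hypothetical residual patterns to show the small-versus-large behavior described above. The function name is hypothetical.

```python
def durbin_watson(residuals):
    """d = sum_{t=2}^{n} (e_t - e_{t-1})^2 / sum_{t=1}^{n} e_t^2, which lies between 0 and 4."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residual patterns
print(round(durbin_watson([1, 1, 1, -1, -1, -1]), 3))  # 0.667: successive residuals close together
print(round(durbin_watson([1, -1, 1, -1, 1, -1]), 3))  # 3.333: successive residuals far apart
```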
The standard error of x̄1 − x̄2 is given by:
√(σ1²/n1 + σ2²/n2)
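A one-function Python sketch of this formula for the case where σ1 and σ2 are known; note that n1 and n2 need not be equal. The function name and the input values are hypothetical.

```python
import math

def std_error_diff_means(sigma1, n1, sigma2, n2):
    """Standard error of x-bar1 - x-bar2 when sigma1 and sigma2 are known; n1 and n2 may differ."""
    return math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

# Hypothetical values: 9/9 + 16/16 = 2, so the standard error is sqrt(2)
print(round(std_error_diff_means(3, 9, 4, 16), 4))  # 1.4142
```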
The following information regarding the number of semester hours taken from random samples of day and evening students is provided. We would like to know if the difference in the mean semester hours taken by the two groups of students is statistically significant at the α = .05 level. What statistical test is appropriate for answering this question?
t test for a difference in two means
*Stratified random sampling
the elements in the population are first divided into groups called strata, such that each element in the population belongs to one and only one stratum.
Estimated regression equation
the estimate of the regression equation developed from sample data by using the least squares method.
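A minimal Python sketch of the least squares method for simple linear regression, b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1x̄. It also computes s_b1 = s / √Σ(xi − x̄)² with s = √(SSE / (n − 2)), the quantity asked about in the s_b1 card. Function names and data are hypothetical.

```python
import math

def least_squares(x, y):
    """Return (b0, b1) for the estimated regression equation y-hat = b0 + b1 x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def std_error_of_b1(x, y):
    """s_b1 = s / sqrt(sum (x_i - x_bar)^2), with s = sqrt(SSE / (n - 2))."""
    n = len(x)
    b0, b1 = least_squares(x, y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return s / math.sqrt(sxx)

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(round(b0, 2), round(b1, 2))       # 2.2 0.6
print(round(std_error_of_b1(x, y), 4))  # 0.2828
```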
Stationary time series
A time series whose statistical properties are independent of time. For a stationary time series, the process generating the data has a constant mean, and the variability of the time series is constant over time.
*Cluster sampling
works best when each cluster provides a small-scale representation of the population.
stratified random sampling
works best when the variance among elements in each stratum is relatively small.
predicted value of the dependent variable
ŷ
the observed value of the dependent variable for the ith observation.
yᵢ