BA test 2
With reference to exponential forecasting models, a parameter that provides the weight given to the most recent time series value in the calculation of the forecast value is known as the
smoothing constant
The process of making estimates and drawing conclusions about one or more characteristics of a population through analysis of sample data drawn from the population is known as
statistical inference
The graph of the simple linear regression equation is a(n)
straight line
Data mining methods for classifying or estimating an outcome based on a set of input variables are referred to as
supervised learning
The basis for using a normal probability distribution to approximate the sampling distribution of the sample means and population mean is
the central limit theorem
A procedure for using sample data to find the estimated regression equation is
the least squares method
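A minimal sketch of the least squares method for one independent variable, using the usual closed-form slope and intercept. The function name and the data are made up for illustration.

```python
def least_squares(x, y):
    """Return (b0, b1) for the estimated regression equation y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: forces the line through the point (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Illustrative data that lie exactly on y = 1 + 2x.
b0, b1 = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)
```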
A set of observations on a variable measured at successive points in time or over successive periods of time constitutes a
time series
A parameter is a numerical measure from a population, such as
μ
When the expected value of the point estimator equals the population parameter, we say the point estimator is
unbiased
A positive forecast error indicates that the forecasting method _________ the dependent variable.
underestimated
The moving averages method refers to a forecasting method that
uses the average of the most recent data values in the time series as the forecast for the next period
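The moving averages forecast can be sketched in a few lines. The function name, the window size k, and the data are illustrative assumptions.

```python
def moving_average_forecast(series, k):
    """Forecast the next period as the mean of the k most recent values."""
    if k <= 0 or k > len(series):
        raise ValueError("k must be between 1 and the length of the series")
    return sum(series[-k:]) / k


# Illustrative data: the last three values are 21, 19, 23.
next_forecast = moving_average_forecast([17, 21, 19, 23], k=3)
print(next_forecast)
```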
A characteristic or quantity of interest that can take on different values is a
variable
A pizza shop advertises that they deliver in 30 minutes or less or it is free. People who live in homes that are located on the opposite side of town believe it will take the pizza shop longer than 30 minutes to make and deliver the pizza. A random sample of 50 deliveries to homes across town was taken and the mean time was computed to be 32 minutes. What is the appropriate symbol to represent the value, 32?
x̄ = 32
In the graph of the simple linear regression equation, the parameter β0 represents the ______ of the true regression line.
y-intercept
When the mean value of the dependent variable is independent of variation in the independent variable, the slope of the regression line is
zero
In interval estimation, as the sample size becomes larger, the interval estimate
becomes narrower
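A rough illustration of why the interval narrows: the margin of error shrinks with the square root of n. This sketch assumes a known population standard deviation and uses z = 1.96 for 95% confidence. The value sigma = 10 is made up.

```python
import math


def margin_of_error(sigma, n, z=1.96):
    """Half-width of a z-based interval estimate for a population mean."""
    return z * sigma / math.sqrt(n)


# Quadrupling the sample size halves the margin of error.
for n in (25, 100, 400):
    print(n, round(margin_of_error(10, n), 2))
```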
As the number of degrees of freedom for a t distribution increases, the difference between the t distribution and the standard normal distribution
becomes smaller
Using α = 0.04, a confidence interval for a population proportion is determined to be 0.65 to 0.75. If the level of significance is decreased, the interval for the population proportion
becomes wider
Which is not true regarding trend patterns?
can result when business conditions shift to a new level at some point in time
The ___________ is a measure of the goodness of fit of the estimated regression equation. It can be interpreted as the proportion of the variability in the dependent variable y that is explained by the estimated regression equation.
coefficient of determination
A ___________ matrix displays a model's correct and incorrect classifications.
confusion
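A small sketch of a confusion matrix for a two-class problem, stored as counts of (actual, predicted) pairs, together with the overall error rate derived from it. The labels and function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count each (actual class, predicted class) pair."""
    return Counter(zip(actual, predicted))


def overall_error_rate(actual, predicted):
    """Share of records whose predicted class differs from the actual class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


actual = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
cm = confusion_matrix(actual, predicted)
print(cm[(1, 1)], cm[(1, 0)], cm[(0, 1)], cm[(0, 0)])
print(overall_error_rate(actual, predicted))
```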
Assessing the regression model on data other than the sample data that was used to generate the model is known as
cross-validation
A ________ refers to a model input that can be controlled in a spreadsheet model.
decision variable
The mean absolute error, mean squared error, and mean absolute percentage error are all methods to measure the accuracy of a forecast. These methods measure forecast accuracy by
determining how well a particular forecasting method is able to reproduce the time series data that are already available.
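The three accuracy measures can be sketched as follows, taking the forecast error as actual minus forecast. The function name and the data are illustrative, not taken from the test.

```python
def accuracy_measures(actual, forecast):
    """Return (MAE, MSE, MAPE) for paired actual and forecast values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    # MAPE is expressed in percent and requires nonzero actual values.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return mae, mse, mape


mae, mse, mape = accuracy_measures([25, 20, 30], [22.5, 22, 27])
print(mae, round(mse, 2), round(mape, 1))
```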
A time series with a seasonal pattern can be modeled by treating the season as a
dummy variable
A variable used to model the effect of categorical independent variables in a regression model is known as a
dummy variable
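One way to picture dummy variables: a categorical variable with k levels becomes k - 1 columns of 0/1 values, with one level left out as the baseline. The helper name, the column naming, and the quarterly labels below are assumptions for illustration.

```python
def dummy_encode(values, baseline):
    """Encode a categorical variable as 0/1 dummies, omitting the baseline level."""
    levels = sorted(set(values) - {baseline})
    return [{f"is_{lvl}": int(v == lvl) for lvl in levels} for v in values]


# Three seasons, so two dummy variables; Q1 is the baseline (all zeros).
rows = dummy_encode(["Q1", "Q2", "Q3", "Q1"], baseline="Q1")
for row in rows:
    print(row)
```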
Classifying a record as belonging to one class when it belongs to another class is referred to as a(n)
error
In the simple linear regression model, the ______ accounts for the variability in the dependent variable that cannot be explained by the linear relationship between the variables
error term
A test set is the data set used to
estimate performance of the final model on unseen data
Determine a freshman's likely first-year grade point average from the student's Scholastic Aptitude Test (SAT) score, high school grade point average, and number of extra-curricular activities. This is an example of
estimation of a continuous outcome
The ______ is the range of values of the independent variables in the data used to estimate the regression model
experimental region
Prediction of the mean value of the dependent variable y for values of the independent variables x1, x2, ..., xq that are outside the experimental range is called
extrapolation
Prediction of the value of the dependent variable outside the experimental region is called
extrapolation
Regression analysis involving one dependent variable and more than one independent variable is known as
multiple regression
A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size
n has the same probability of being selected
The set of recorded values of variables associated with a single entity is a
observation
The percent of misclassified records out of the total records in the validation data is known as the
overall error rate
Fitting a model too closely to sample data, resulting in a model that does not accurately reflect the population is termed as
overfitting
Two approaches to drawing a conclusion in a hypothesis test are
p-value and critical value
With reference to a spreadsheet model, an uncontrollable model input is known as a
parameter
A simple random sample of 31 observations was taken from a large population. The sample mean equals 5. Five is a
point estimate
The population parameter value and the point estimate differ because a sample is not a census of the entire population, but it is being used to develop the
point estimate
A forecast is defined as a
prediction of future values of a time series.
Which statement is not true?
rejecting the null hypothesis when it is true is a type II error.
Causal models
relate a time series to other variables that are believed to explain or cause its behavior
The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation is known as the
residual
The value of the ______ is used to estimate the value of population parameter.
sample statistic
A ______ is used to visualize sample data graphically and to draw preliminary conclusions about the possible relationship between the variables.
scatter chart
A time series that shows a recurring pattern over one year or less is said to follow a
seasonal pattern
A regression analysis involving one independent variable and one dependent variable is referred to as a
simple linear regression
In the graph of the simple linear regression equation, the parameter β1 is the ________ of the true regression line.
slope
In a simple linear regression analysis the quantity that gives the amount by which the dependent variable changes for a unit change in the independent variable is called the
slope of the regression line
In a simple linear regression model, y = β0 + β1x + ε, the parameter β1 represents the
slope of the true regression line
Which of the following regression models is used to model a nonlinear relationship between the independent and dependent variables by including the independent variable and the square of the independent variable in the model?
Quadratic regression model
Spreadsheet models are referred to as what-if models because they
allow easy instantaneous recalculation for a change in model inputs
A normally distributed error term with a mean of zero would
allow more accurate modeling
The ________ button provides an automatic means of checking for mathematical errors within formulas of a worksheet.
Error checking
___________ uses a weighted average of past time series values as the forecast.
Exponential smoothing
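A minimal sketch of exponential smoothing, where alpha is the smoothing constant (the weight on the most recent actual value). It assumes the common convention that the forecast for period 2 equals the actual value in period 1. The data are made up.

```python
def exponential_smoothing(series, alpha):
    """Return forecasts for periods 2..n+1 given actual values for periods 1..n."""
    forecasts = [series[0]]  # assumed convention: forecast for period 2 = actual in period 1
    for actual in series[1:]:
        # New forecast = alpha * latest actual + (1 - alpha) * previous forecast.
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts


forecasts = exponential_smoothing([10, 12, 14], alpha=0.5)
print(forecasts)
```

Because each forecast folds in the previous forecast, every past value keeps some weight, and that weight shrinks the further back the value lies.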
_____________ is the amount by which the predicted value differs from the observed value of the time series variable.
Forecast error
The purpose of statistical inference is to make estimates or draw conclusions about a
Population based upon information obtained from the sample
A random sample selected from an infinite population is a sample selected such that each element selected comes from the same _____ and each element is selected _____
Population; independently
For a population with an unknown distribution, the form of the sampling distribution of the sample mean is
approximately normal for large sample sizes
A one-way data table summarizes
A single input's impact on the output of interest
The CEO of a company wants to estimate the percent of employees that use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged onto Facebook that day. What is the point estimate of the proportion of the population that logged onto Facebook that day?
0.35
The CEO of a company wants to estimate the percent of employees that use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged onto Facebook that day. What is the estimate of the standard error of the proportion?
0.039
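Both Facebook answers can be checked with a short calculation. The function name is hypothetical. The formulas are the usual point estimate x/n and the standard error sqrt(p(1 - p)/n).

```python
import math


def proportion_estimate(successes, n):
    """Return the sample proportion and its estimated standard error."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se


p_hat, se = proportion_estimate(53, 150)
print(round(p_hat, 2), round(se, 3))
```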
What would be the value of the sum of squares due to regression SSR if the total sum of squares SST is 25.32 and the sum of squares due to error SSE is 6.89?
18.43
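The arithmetic follows from SST = SSR + SSE. The same quantities also give the coefficient of determination, r² = SSR / SST, which is an extra step not asked for in the question.

```python
sst = 25.32  # total sum of squares
sse = 6.89   # sum of squares due to error

ssr = sst - sse          # sum of squares due to regression
r_squared = ssr / sst    # proportion of variability explained

print(round(ssr, 2), round(r_squared, 3))
```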
Demand for a product and the forecasting department's forecast for a product are shown below. Compute the mean absolute error.
2
If the forecasted value of the time series variable for period 2 is 22.5 and the actual value observed for period 2 is 25 what is the forecast error in period 2?
2.5
In order to determine an interval for the mean of a population with unknown standard deviation, a sample of 24 items is selected. The mean of the sample is determined to be 23. The number of degrees of freedom for reading the t value is
23
If the expected value of a sample statistic is equal to the population parameter being estimated, the sample statistic is said to
be an unbiased estimator of the population parameter
_______ is used to test the hypothesis that the values of the regression parameters β1, β2, ..., βq are all zero
An F test
A ________ classifies a categorical outcome variable by splitting observations into groups via a sequence of hierarchical rules.
Classification tree
The modeling process begins with the framing of a _______ model that shows the relationships between the various parts of the problem being modeled.
Conceptual
___________ involves descriptive statistics, data visualization, and clustering.
Data exploration
___________ is dividing the sample data into three sets for training, validation, and testing of the data mining algorithm performance.
Data partitioning
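The three-way split described above can be sketched as a shuffle followed by slicing. The 50/30/20 proportions, the seed, and the function name are assumptions chosen for illustration.

```python
import random


def partition(records, train_frac=0.5, valid_frac=0.3, seed=0):
    """Shuffle records and split them into training, validation, and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains
    return train, valid, test


train, valid, test = partition(range(10))
print(len(train), len(valid), len(test))
```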
___________ is the step in data mining that includes addressing missing and erroneous data, reducing the number of variables, defining new variables, and data exploration.
Data preparation
____________ is the manipulation of the data with the goal of putting it in a form suitable for formal modeling.
Data preparation
_________ is a method of extracting data relevant to the business problem under consideration. It is the first step in the data mining process.
Data sampling
In a linear regression model, the variable that is being predicted or explained is known as __________. It is denoted by y and is often referred to as the response variable.
Dependent variable
A student wants to determine if pennies are really fair when flipped, meaning equally likely to land heads up or tails up. He flips a random sample of 50 pennies and finds that 28 of them land heads up. If p denotes the true probability of a penny landing heads up when flipped, what are the appropriate null and alternative hypotheses?
H0: p = 0.5, Ha: p ≠ 0.5
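The question only asks for the hypotheses, but carrying the penny example through shows how the p-value approach would use them. This sketch assumes the normal approximation to the sampling distribution of the sample proportion. The function name is hypothetical.

```python
import math


def one_proportion_z_test(successes, n, p0):
    """Two-tailed z test of H0: p = p0 against Ha: p != p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se
    # Standard normal CDF evaluated at |z| via the error function.
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value


z, p_value = one_proportion_z_test(28, 50, 0.5)
print(round(z, 3), round(p_value, 2))
```

With a p-value this large, the sample gives no grounds to reject H0 at common significance levels such as 0.05.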
A pizza shop advertises that they deliver in 30 minutes or less or it is free. People who live in homes that are located on the opposite side of town believe it will take the pizza shop longer than 30 minutes to make and deliver the pizza. Write the null and alternative hypotheses that can be used to conduct a significance test.
H0: μ ≤ 30, Ha: μ > 30
The owners of a fast food restaurant have automatic drink dispensers to help fill orders more quickly. When the 12 ounce button is pressed, they would like for exactly 12 ounces of beverage to be dispensed. There is however, some variation in this amount. The company does not want the machine to systematically over fill or under fill the cups. Which of the following gives the correct set of hypotheses?
H0: μ = 12, Ha: μ ≠ 12
The average number of hours for a random sample of mail order pharmacists from Company A was 50.1 hours last year. It is believed that changes to medical insurance have led to a reduction in the average work week. To test the validity of this belief, the hypotheses are
H0: μ ≥ 50.1, Ha: μ < 50.1
The __________ function is used for the conditional computation of expressions in Excel.
IF
In a linear regression model, the variable used for predicting or explaining values of the response variable is known as the ________. It is denoted by x.
Independent variable
_________ is a generalization of linear regression for predicting a categorical outcome variable.
Logistic regression
__________ attempts to classify a categorical outcome as a linear function of explanatory variables.
Logistic regression
Which of the following measures of forecast accuracy is susceptible to the problem of positive and negative forecast errors offsetting one another?
Mean forecast error
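A made-up series shows the offsetting problem: forecasts that miss by 5 in alternating directions have a mean forecast error of zero, while the mean absolute error still reports the misses.

```python
def mean_forecast_error(actual, forecast):
    """Average of signed errors (actual minus forecast)."""
    return sum(a - f for a, f in zip(actual, forecast)) / len(actual)


def mean_absolute_error(actual, forecast):
    """Average of absolute errors, so signs cannot cancel."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


actual = [10, 20, 10, 20]
forecast = [15, 15, 15, 15]
mfe = mean_forecast_error(actual, forecast)
mae_value = mean_absolute_error(actual, forecast)
print(mfe, mae_value)
```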
The degree of correlation among independent variables in a regression model is called
Multicollinearity
__________ refers to the degree of correlation among independent variables in a regression model.
Multicollinearity
________ refers to the scenario in which the analyst builds a model that does a great job of explaining the sample of data on which it is based but fails to accurately predict outside the sample data.
Overfitting.
___________ is a statistical procedure used to develop an equation showing how two variables are related.
Regression analysis
What are the two decisions that you can make from performing a hypothesis test?
Reject the null hypothesis; Fail to reject the null hypothesis.
Determine whether the alternative hypothesis is left-tailed, right-tailed, or two-tailed: H0: μ = 11, Ha: μ > 11.
Right-tailed
The __________ function pairs each element of the first array with its counterpart in the second array, multiplies the elements of the pairs together, and adds the results.
SUMPRODUCT
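For readers more comfortable with code than with spreadsheets, a rough Python equivalent of the calculation described above looks like this. It is an illustration of the arithmetic on one-dimensional arrays, not Excel's implementation.

```python
def sumproduct(first, second):
    """Multiply paired elements of two equal-length arrays and add the results."""
    if len(first) != len(second):
        raise ValueError("arrays must have the same dimension")
    return sum(x * y for x, y in zip(first, second))


# 1*4 + 2*5 + 3*6
total = sumproduct([1, 2, 3], [4, 5, 6])
print(total)
```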
The _______ button in the Formula Auditing group allows the user to inspect each formula in detail in its cell location.
Show Formulas
The ______ is a measure of the error that results from using the estimated regression equation to predict the values of the dependent variable in the sample.
Sum of squares due to error SSE
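SSE is the sum of squared residuals, where each residual is the observed value minus the value predicted by the estimated regression equation. The numbers below are made up.

```python
def sum_of_squares_error(observed, predicted):
    """Sum of squared residuals (observed minus predicted)."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))


# Residuals are 0.5, -0.5, and 1.0.
sse_value = sum_of_squares_error([3, 5, 8], [2.5, 5.5, 7])
print(sse_value)
```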
_______ is a category of data mining techniques in which an algorithm learns how to classify or estimate an outcome variable of interest.
Supervised learning
With reference to the SUMPRODUCT function, which of the following statements is true?
The arrays that appear as arguments must be of the same dimension.
Which of the following approaches is a good way to proceed with the influence diagram building for a problem?
The influence diagram for a portion of the problem is built first and then expanded until the total problem is conceptually modeled.
__________ is used to test the hypothesis that the values of the regression parameters β1, β2, ..., βq are all zero
An F test
Trend refers to
The long-run shift or movement in the time series observable over several periods of time.
Which of the following statements is the objective of the moving averages and exponential smoothing methods?
To smooth out random fluctuations in the time series.
Which of the following states the objective of time series analysis?
To uncover a pattern in a time series and then extrapolate the pattern into the future.
Larger values of α have the disadvantage of increasing the probability of making a
Type I error
The ___________ function allows the user to pull a subset of data from a larger table of data based on some criterion.
VLOOKUP
A sample of 37 AA batteries had a mean lifetime of 584 hours. A 95% confidence interval for the population mean was 579.2 < μ < 588.8. Which statement is the correct interpretation of the results?
We are 95% confident that the mean lifetime of all the batteries in the population is between 579.2 hours and 588.8 hours.
The proportion of dental procedures that are extractions is 0.16. Which of the following exemplifies a Type I error in this situation?
We reject the claim that the proportion of dental procedures that are extractions is 0.16 when the proportion is actually 0.16
A Type I error is committed when
a true null hypothesis is rejected
The SUM function in Excel
adds up all the numbers in a range of cells
The conceptual model
helps in organizing the data requirements
A one-tailed test is a hypothesis test in which the rejection region is
in one tail of the sampling distribution
A(n) _________ is a visual representation that shows which entities affect others in a model.
influence diagram
An estimate of a population parameter that provides an interval of values believed to contain the value of the parameter is known as the
interval estimate
The coefficient of determination
is used to evaluate the goodness of fit
A null and alternative hypothesis for a one-proportion z test are given as H0: p = 0.8, Ha: p < 0.8. This hypothesis test is
lower-tailed.
A ________ decision is one in which companies have to decide whether they should manufacture a product or outsource production to another firm.
make-versus-buy
Statistical significance at the 0.01 level is _______ than significance at the 0.05 level.
more difficult to achieve
You are _____ to commit a Type I error using the 0.05 level of significance than using the 0.01 level of significance.
more likely