Business Analytics Final
A mathematical model that gives the best decision, subject to the situation's constraints, is an a(n)
Optimization Model
An alternative for a stacked column chart when comparing more than a couple of quantitative variables in each category is a
clustered column chart
A graphical presentation that uses vertical bars to display the magnitude of quantitative data is known as a
column chart
When working with data sets in Excel, __________ can be used to automatically highlight cells that meet specified requirements.
conditional formatting
In a linear regression model, y = b0 + b1x + e the parameter b1 represents the
slope of the regression line
A __________ is a graphical summary of data previously summarized in a frequency distribution
histogram
Bar charts use
horizontal bars to display the magnitude of the quantitative variable.
The process of making estimates and drawing conclusions about one or more characteristics of a population through analysis of sample data drawn from the population is known as
statistical inference
The graph of the simple linear regression equation is a(n)
straight line
The __________ is a measure of the error that results from using the estimated regression equation to predict the values of the dependent variable in the sample.
sum of squares due to error (SSE)
In the k-nearest neighbors method, when the value of k is set to 1
the classification or prediction of a new observation is based solely on the single most similar observation from the training set.
A procedure for using sample data to find the estimated regression equation is
the least squares method
The "logit" is
the natural log of an odds ratio.
The sum of frequencies for all classes will always equal _____.
the number of elements in a data set
The adjusted R2 adjusts R2 for
the number of explanatory variables in a multiple regression model
In the k-Nearest Neighbor technique in data mining, the "k" refers to
the number of neighbors used.
In linear regression, the fitted value is
the predicted value of the dependent variable
The equation that describes how the dependent variable (y) is related to the independent variable (x) is called _____.
the regression model
If we use linear regression for predicting a categorical variable y (event happens y=1, event not happens y=0), then
the results are not interpretable as predicted y is not restricted to 0 and 1
In a cumulative frequency distribution, the last class will always have a cumulative frequency equal to _____.
the total number of elements in the data set
If the probability is 0.75, then log odds measure is
odds that event happens is 3:1 1/1-p
A __________ is used for examining data with more than two variables, and it includes a different vertical axis for each variable
parallel-coordinates plot
Multicollinearity
refers to the degree of correlation among independent variables in a regression model.
A _____________ is a graphical presentation of the relationship between two quantitative variables
scatter chart
If a data set produces SSR = 400 and SSE = 100, then the coefficient of determination is
.80
What would be the coefficient of determination if the total sum of squares (SST) is 23.29 and the sum of squares due to regression (SSR) is 10.03?
0.43 The coefficient of determination r2 = SSR/SST. Substituting the given values, we get r2 = 0.43
A regression model between sales (y in $1000), unit price (x1 in dollars), and television advertisement (x2 in dollars) resulted in the following function:ŷ = 7 - 3x1 + 5x2For this model, SSR = 3500, SSE = 1500, and the sample size is 18. The adjusted R-squared or adjusted coefficient of determination for this problem is
0.66
k-nearest neighbor uses the features of the closest distance neighbor to classify a new observation. This distance is
Euclidean distance
refers to the use of sample data to calculate a range of values that is believed to include the value of the population parameter
Interval estimation
_________ attempts to classify a categorical outcome as a linear function of explanatory variables.
Logistic regression
Which of the following is true about Residuals ?
Lower is better
Which of the following are necessary to be determined to define the classes for a frequency distribution with quantitative data?
Number of nonoverlapping bins, width of each bin, and bin limits
Which of the following gives the proportion of items in each bin?
Relative frequency
In a regression analysis, if r2 = 1, then _____.
SSE must be equal to 0
R^2
SSR/SST
__________ merges maps and statistics to present data collected over different geographies
The geographic information system
Big data is data that is too large or too complex. What are the four Vs that describe such data?
Volume, Veracity, Velocity, Variety
The difference between the observed value of the dependent variable and the value predicted by using the estimated regression equation is called _____.
a residual
We ran a regression to study the effects of education in years (x) on wages in thousand dollars (y). We fit the regression line: ŷ = b0+b1x and found b0=1.2 and b1=0.35. Which of these statements are true?
b1 is the slope of regression line and it suggests that 1 more year of education increases wages by 0.35 thousand dollars.
Frequency distributions can be made for _____.
both categorical and quantitative data
In order to visualize three variables in a two-dimensional graph, we use a
bubble chart
The U.S. Internal Revenue Service uses _____________ to identify patterns that distinguish questionable annual personal income tax filings.
data mining
Data dashboards are a type of _________ analytics.
descriptive
The __________ is the range of values of the independent variables in the data used to estimate the regression model
experimental region
ŷ = b0 +b1x1 + b2x2 is called
least square regression line equation
It is possible for the coefficient of determination to be
less than 1
If the log odds measure suggest that log odds = 0.67 + 0.45 x1 then which statement is true?
log odds of y=1 or event happening increases by 0.45 due to one unit increase in x1
A least squares regression line ______.
may be used to predict a value of y if the corresponding x value is given
In multiple regression analysis
there can be several independent variables, but only one dependent variable
A __________ is useful for visualizing hierarchical data along multiple dimensions
treemap
In the graph of the simple linear regression equation, the parameter b0 represents the ___________ of the true regression line
y-intercept