BSIS Ch. 3 & 7 Exam

When there are many independent variables to consider, special procedures are sometimes employed to select the independent variables to include in the regression model. All of the following are examples of variable selection procedures except for

overfitting.

The multiple regression equation based on the sample data, which has the form $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_q x_q$, is called

An estimated multiple regression equation.

The following regression model has been proposed to predict sales at a gas station: [formula not shown], where x1 = competitor's previous day's sales (in $1,000s), x2 = population within 5 miles (in 1,000s), x3 = 1 if any form of advertising was used, 0 otherwise, and the dependent variable is sales (in $1,000s). Predict sales (in dollars) for a store with competitor's previous day's sales of $3,000, a population of 10,000 within 5 miles, and twenty radio advertisements.

The predicted sales are $86,000. The value substituted for the advertising variable x3 is 1 regardless of how many advertisements play.
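
The dummy-variable step in this answer can be sketched in Python. This card does not reproduce the model's formula, so the coefficients `B0` to `B3` below are hypothetical placeholders, chosen only so that the sketch agrees with the $86,000 stated in the answer. The function name `predict_sales` is also an assumption made for illustration.

```python
# Hypothetical coefficients: the flashcard omits the actual formula.
# These values are illustrative only, chosen to agree with the stated answer.
B0, B1, B2, B3 = 10.0, -4.0, 7.0, 18.0

def predict_sales(competitor_sales_usd, population, num_ads):
    """Return predicted sales in dollars for one station."""
    x1 = competitor_sales_usd / 1000   # competitor's previous-day sales, in $1,000s
    x2 = population / 1000             # population within 5 miles, in 1,000s
    x3 = 1 if num_ads > 0 else 0       # dummy variable: 1 if any advertising was used
    y_hat_thousands = B0 + B1 * x1 + B2 * x2 + B3 * x3
    return y_hat_thousands * 1000      # convert from $1,000s to dollars

twenty_ads = predict_sales(3000, 10000, 20)
six_ads = predict_sales(3000, 10000, 6)
```

Because x3 only records whether advertising was used, twenty advertisements and six advertisements produce the same prediction under these assumptions.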

The phenomenon by which the value of an estimate generally gets closer to the value of the parameter being estimated as the sample size grows is called the _____.

law of large numbers

Regression analysis involving one independent variable and one dependent variable is referred to as

simple linear regression.

In the simple linear regression equation, the parameter β1 is the _______ of the true regression line.

slope

Increasing the "white space" in a table by removing unnecessary lines increases all of the following except the

table's size. Removing unnecessary lines does not affect the size of the table.

Construct a scatterplot for the following set of data using Var1 for the vertical axis and Var2 for the horizontal axis. Describe the relationship between the two variables.

As Var1 increases, Var2 decreases.

Which of the following methods would NOT improve the readability of the table shown below?

Put the numerical values in italic font.

_____ is a statistical procedure used to develop an equation showing how two variables are related.

Regression analysis

The following regression model has been proposed to predict sales at a gas station: FORMULA GIVEN, where x1 = competitor's previous day's sales (in $1,000s), x2 = population within 5 miles (in 1,000s), x3 = 1 if any form of advertising was used, 0 otherwise, and the dependent variable is sales (in $1,000s). Predict sales (in dollars) for a store with competitor's previous day's sales of $3,000, a population of 10,000 within 5 miles, and six radio advertisements.

$86,000

Natalie needs to compare values across different categories. Which of the following charts should Natalie use?

column (bar) chart

When we use the estimated regression equation to develop an interval that can be used to predict the mean for ALL units that meet a particular set of given criteria, that interval is called a

confidence interval.

A tabular summary of data for two variables is referred to as a

crosstabulation.

A variable used to model the effect of categorical independent variables is called a(n)

dummy variable.

In the simple linear regression model, the ________ accounts for the variability in the dependent variable that cannot be explained by the linear relationship between x and y.

error term

The term in the multiple regression model that accounts for the variability in y that cannot be explained by the linear effect of the q independent variables is the

error term, ε.

The _____ is the range of values of the independent variables in the data used to estimate the regression model.

experimental region

Prediction of the value of the dependent variable outside the experimental region is called

extrapolation.

A bubble chart is a graphical presentation that

has two axes that represent two variables, and the magnitude of the third variable is given by the size of the bubble.

The graph of the simple linear regression equation is a(n) _____.

line

DJ needs to display data over time. Which of the following charts should DJ use?

line chart

A Geographic Information System (GIS)

merges maps and statistics to present data collected over different geographies.

The study of how a dependent variable y is related to two or more independent variables is called

multiple linear regression.

The tests of significance in regression analysis are based on assumptions about the error term ε. One such assumption is that the error term ε follows a ________ distribution for all values of x.

normal

Which of the following can be used to show overall trend?

sparklines

The process of making estimates and drawing conclusions about one or more characteristics of a population through analysis of sample data drawn from the population is known as

statistical inference.

The tests of significance in regression analysis are based on assumptions about the error term ε. One such assumption is that the variance of ε, denoted by σ², is

the same for all values of x.

Which of the following types of graphs is useful for visualizing hierarchical data along multiple dimensions?

treemap

A bar chart is a graphical presentation that

uses horizontal bars to display the magnitude of quantitative data.

Of the options below, which graphical display can be used to compare categorical data?

bar chart

The KPIs displayed in a data dashboard should do all of the following except

be displayed across multiple screens.

What would be the coefficient of determination if the total sum of squares (SST) is 30 and the sum of squares due to regression (SSR) is 27?

.9. The coefficient of determination = SSR/SST = 27/30 = 0.9.
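
The ratio in this answer can be checked with a short Python sketch. The function names are assumptions made for illustration. The second helper relies on the identity SST = SSR + SSE for cases where SSE is given instead of SST.

```python
def r_squared_from_ssr_sst(ssr, sst):
    """Coefficient of determination: share of total variation explained."""
    if sst <= 0:
        raise ValueError("SST must be positive")
    return ssr / sst

def r_squared_from_ssr_sse(ssr, sse):
    """Same quantity when SSE is known instead of SST (SST = SSR + SSE)."""
    return r_squared_from_ssr_sst(ssr, ssr + sse)

r2_a = r_squared_from_ssr_sst(27, 30)    # values from this card
r2_b = r_squared_from_ssr_sse(300, 200)  # SSR = 300, SSE = 200
```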

The tests of significance in regression analysis are based on assumptions about the error term ε. One such assumption is that the error term ε is a random variable with a mean or expected value of

0.

A data dashboard is a visualization tool that

updates in real time and gives multiple outputs.

The Golf Course manager must report the number of visits to the course over the last 12 months. These data are shown in the table. How could these data be best displayed?

A line chart is a graphical presentation of time-series data in which the data points are connected by a line.

The graph below displays sales revenues for the last eleven years. Which of the following would improve the layout and display of this line chart?

Add a title to the chart.

Which of the following charts is appropriate for displaying the data given below?

Bar Graph

Based upon the p-values for "Month" and "Month Squared," what can we conclude about the current model?

Because both p-values are substantially less than 0.05, we can conclude that adding Months Squared to the model involving Months is significant.

Which of the following options guarantees that the best model for a given number of variables will be found?

Best subsets regression guarantees that the best model for a given number of variables will be found because it compares all possible models with a specified number of independent variables.
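
The exhaustive comparison described in this answer can be sketched in plain Python. This is a minimal illustration, not a production routine: it assumes "best" means lowest SSE, it fits each candidate model by solving the normal equations, and it assumes the predictor columns are not collinear. The function names and the toy data are made up for the example.

```python
from itertools import combinations

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def sse_for_subset(rows, y, cols):
    """Fit least squares with an intercept on the chosen columns; return SSE."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    p = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2
               for d, yi in zip(design, y))

def best_subset(rows, y, k):
    """Compare every model with exactly k predictors; return (columns, SSE)."""
    n_cols = len(rows[0])
    return min(((cols, sse_for_subset(rows, y, cols))
                for cols in combinations(range(n_cols), k)),
               key=lambda pair: pair[1])

# Toy data (made up): y = 2 + 3*x0 - 2*x2 exactly; x1 is unrelated noise.
rows = [(1, 5, 0), (2, 3, 1), (3, 8, 0), (4, 1, 1), (5, 9, 1), (6, 2, 0)]
y = [5, 6, 11, 12, 15, 20]
best_cols, best_sse = best_subset(rows, y, 2)
```

Because every combination of k columns is fitted, no candidate model of that size is skipped, which is what distinguishes this approach from the iterative procedures.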

Which of the following options is NOT an iterative variable selection procedure?

Best subsets regression is not an iterative variable selection procedure. The other three procedures are iterative, which means at each step of the procedure, a single independent variable is added or deleted and the new model is evaluated.

Danah is responsible for reporting the status of sales for his company. The following pie chart shows the percentages of closed sales in each of the top 7 cities, but a scatter chart would be preferable to show data of this type.

False. Pie charts are useful for displaying categorical data.

Time series data for regular gasoline prices in the United States for 24 consecutive months is shown below. It would be appropriate to model this relationship with a line because the graph displays a strong linear relationship between months and gas price.

False. The graph does not display a strong linear relationship between months and gas price, so it is not recommended to model this relationship with a line.

Looking at the following chart showing time series data for regular gasoline prices in the United States for 24 consecutive months, you can make this interpretation about gas prices: Overall the price of gasoline was extremely consistent during the time period shown.

False. This chart shows that overall the price of gasoline was not extremely consistent during the time period shown.

If a residual plot of x versus the residuals, y − ŷ, shows a non-linear pattern, then we should conclude that

the regression model is not an adequate representation of the relationship between the variables.

If GPA and Aptitude Test Scores are linearly related, which of the following must be true?

If x and y are linearly related, then the slope parameter β1 ≠ 0.

_____ refers to the use of sample data to calculate a range of values that is believed to include the unknown value of a population parameter.

Interval estimation

Danah is responsible for reporting the status of sales for his company. The following pie chart shows the percentages of closed sales in each of the top 7 cities. Which of the following is NOT a problem with using a pie chart to display these data?

It provides general information at a glance.

What would be the value of the sum of squares due to regression (SSR) if the total sum of squares (SST) is 22.21 and the sum of squares due to error (SSE) is 6.89?

SST = SSR + SSE, so SSR = SST − SSE = 22.21 − 6.89 = 15.32.

Which of the following gives the correct quadratic regression model?

Sales = INTERCEPT − 24.55(Months Employed) + 8.06(Months Employed)²

The coefficient of determination

The coefficient of determination is defined as SSR/SST and must be a number between 0 and 1. It does not measure the slope of the estimated regression line; it is interpreted as the percentage of the variation in the values of y that is explained by the estimated regression line.

Suppose an estimated regression equation has a coefficient of determination (r²) of 0.866. Interpret this value.

The estimated regression equation explains approximately 86.6% of the variation in the dependent variable. The coefficient of determination (r²) can be interpreted as the proportion of the variability in the dependent variable y that is explained by the estimated regression equation.

In a regression analysis, if SSE = 200 and SSR = 300, then the coefficient of determination is

The value of r² = SSR/(SSE + SSR) = 300/(200 + 300) = 0.6.

Construct a scatterplot for the following set of data using Var1 for the vertical axis and Var2 for the horizontal axis. Describe the relationship between the two variables.

There is a positive, linear correlation between the two variables, but the correlation is not perfect.

Interpret the coefficient of determination for this regression model. R Square = 0.955

This regression model explains approximately 95.5% of the variation in cars sold for our sample data. The correct value is listed above as "R Square." This value describes the amount of variation in the dependent variable (number of cars sold) that is explained by the regression model.

The mathematical equation relating the expected value of the dependent variable to the value of the independent variables, which has the form $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_q x_q + \varepsilon$, is called

a multiple regression model.

An approximation of the linear relationship between variables in a chart can be represented with a

trendline.

Edward Tufte introduced the idea of the data-ink ratio, as a way of quantifying the proportion of "data-ink" to the total amount of ink used in a table or chart. Which of the following options would increase the data-ink ratio of a table?

adding a title to the table

Suppose a residual plot of x versus the residuals, y − ŷ, shows a nonconstant variance. In particular, as the values of x increase, suppose that the spread of the residuals also increases. This means that

as the values of x get larger, the ability to predict y becomes less accurate.

To better understand the relationship between advertising dollars spent and the subsequent sales, you could create a _____ chart.

scatter chart

The process of making conjecture about the value of a population parameter, collecting sample data that can be used to assess this conjecture, measuring the strength of the evidence against the conjecture that is provided by the sample, and using these results to draw a conclusion about the conjecture is known as

hypothesis testing.

The tests of significance in regression analysis are based on assumptions about the error term ε. One such assumption is that the values of ε are

independent.

A PivotTable

is a crosstabulation created in Excel that is interactive.

A PivotChart

is a graphical presentation created in Excel that functions similarly to a PivotTable.

A key performance indicator (KPI)

is a metric that is crucial for understanding the current performance of an organization.

A heat map

is a two-dimensional graphical presentation of data in which color shadings indicate magnitudes.

A graphical presentation used to examine more than two variables in which each variable is represented by a different vertical axis is called a

parallel coordinates plot.

A(n) ________ refers to a measurable factor that defines a characteristic of a population, process, or system.

parameter

A crosstabulation in Excel is called a

pivot table.

A _____ interval is an interval estimate of an individual y value given values of the independent variables.

prediction

When we use the estimated regression equation to develop an interval that can be used to predict the mean for a specific unit that meets a particular set of given criteria, that interval is called a

prediction interval.
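
For simple linear regression, the difference between the confidence interval and the prediction interval in this set can be written out explicitly. The notation below is the standard textbook form, given as a sketch: $x^*$ is the given value of x, $\hat{y}^* = b_0 + b_1 x^*$ is the point estimate, $s = \sqrt{\mathrm{SSE}/(n-2)}$ is the standard error of the estimate, and $t_{\alpha/2}$ uses $n - 2$ degrees of freedom. These symbols are assumptions of the sketch and do not appear on the cards.

```latex
% Confidence interval for the mean value of y at x^*
\hat{y}^* \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}

% Prediction interval for an individual value of y at x^*
\hat{y}^* \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
```

The extra 1 under the square root reflects the variability of a single observation around its mean, so the prediction interval is always wider than the confidence interval at the same $x^*$.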

If developing a regression model to make future predictions, the selection of the independent variables to include in the regression model should be based on the _____ on observations that have not been used to train the model.

predictive accuracy

What type of regression model should be used when there is a nonlinear relationship between the independent and dependent variables that is fit by including the independent variable and the square of the independent variable?

quadratic regression model

The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation is known as the

residual.

A _____ is used to visualize sample data graphically and to draw preliminary conclusions about the possible relationship between the variables.

scatter chart

When determining the best estimated regression equation to model a set of data, the procedure that uses an iterative variable selection procedure that considers adding an independent variable and removing an independent variable at each step is called

stepwise selection.

Simple linear regression refers to the type of regression analysis for which the relationship between the independent variable and the dependent variable is approximated by a(n)

straight line.

The _____ is a measure of the error that results from using the estimated regression equation to predict the values of the dependent variable in a sample.

sum of squares due to error (SSE)

Does the t test indicate a significant relationship between GPA and Aptitude Test Score? State the test statistic, and then state your conclusion. Use α = 0.05.

t = 6.25. The p-value is less than 0.05, so the evidence is sufficient to conclude that a significant relationship exists between GPA and Aptitude Test Scores.
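
For reference, the test statistic in a simple linear regression t test has the standard form below. The card's data are not reproduced here, so this is only the general formula, with $s = \sqrt{\mathrm{SSE}/(n-2)}$ and $n - 2$ degrees of freedom assumed as in the usual textbook setup.

```latex
H_0\colon \beta_1 = 0 \qquad H_a\colon \beta_1 \neq 0

t = \frac{b_1}{s_{b_1}}, \qquad s_{b_1} = \frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
```

The null hypothesis is rejected when the p-value for t is at most α, which is the situation described in the answer.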

The procedure of using sample data to find the estimated regression equation is better known as

the least squares method.
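
A minimal Python sketch of the least squares method for one independent variable follows. It also computes the residuals and the three sums of squares used elsewhere in this set. The function names and the toy data are assumptions made for illustration.

```python
def fit_line(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx               # estimated slope
    b0 = y_bar - b1 * x_bar      # estimated intercept
    return b0, b1

def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least squares line fitted to x and y."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # observed minus predicted
    sse = sum(e ** 2 for e in residuals)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return sst, ssr, sse

# Toy data, made up for the example.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
sst, ssr, sse = sums_of_squares(x, y)
```

On this toy data, SST equals SSR + SSE up to rounding, which matches the decomposition used in the sum-of-squares cards.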

When the mean value of the response variable is independent of variation in the predictor variable, the slope of the regression line is

zero.

