General Business 307 Exam 1
True or False? The following statement would return all records from a hypothetical table named "impressions": SELECT * FROM impressions;
True
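A minimal sketch of the query above, run against an in-memory SQLite database from Python. The table's columns and rows are made up for illustration; only the table name and the SELECT statement come from the question.

```python
import sqlite3

# Hypothetical in-memory table; the column names and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE impressions (id INTEGER PRIMARY KEY, ad_name TEXT)")
conn.executemany(
    "INSERT INTO impressions (ad_name) VALUES (?)",
    [("banner",), ("sidebar",), ("video",)],
)

# SELECT * with no WHERE clause returns every column of every record.
all_rows = conn.execute("SELECT * FROM impressions").fetchall()
print(len(all_rows))
```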
True or False? The p-value is the probability that a researcher is making a type-1 error if they decide to reject the null hypothesis
True
Which of the following is an example of a continuous variable? Select all that apply. -Amount of beer produced by MillerCoors in 2016 -Google's Earnings Per Share -Number of store employees scheduled to work from 12:00pm-3:00pm -Total visitors to Disney Parks on 12/31/2016
-Amount of beer produced by MillerCoors in 2016 -Google's Earnings Per Share
Which of the following are examples of autocorrelation? Select all that apply. -Beer sales at the terrace each week are highly correlated with beer sales from the previous week -Beer sales at the terrace each week are highly correlated with beer sales two weeks prior -Beer sales at the terrace each week are highly correlated with brat sales that same week -Beer sales at the terrace each week are highly correlated with ice cream sales from the prior week
-Beer sales at the terrace each week are highly correlated with beer sales from the previous week -Beer sales at the terrace each week are highly correlated with beer sales two weeks prior
Which of the following statements regarding Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE) is correct? Select all that apply. -Outliers have a greater impact on RMSE than on MAE -When a model poorly fits the time-series data, the MAPE is large -RMSE is a good measure of model accuracy, but its magnitude depends on the scale of the outcome variable. -RMSE and MAPE are common types of forecast errors, while MAE is not
-Outliers have a greater impact on RMSE than on MAE -When a model poorly fits the time-series data, the MAPE is large -RMSE is a good measure of model accuracy, but its magnitude depends on the scale of the outcome variable.
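To see why outliers affect RMSE more than MAE, here is a small sketch with made-up numbers. Both forecasts have the same total absolute error, but one concentrates it in a single large miss. The function names and data are illustrative assumptions.

```python
import math

def mae(actual, predicted):
    # Mean of the absolute errors.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    # Square root of the mean squared error; squaring magnifies large misses.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    # Mean absolute error expressed as a percentage of each actual value.
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 100.0, 100.0, 100.0]
even_misses = [90.0, 110.0, 90.0, 110.0]   # four errors of 10
one_outlier = [100.0, 100.0, 100.0, 60.0]  # a single error of 40

for name, forecast in [("even", even_misses), ("outlier", one_outlier)]:
    print(name, mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```

In this example the MAE and MAPE are identical for the two forecasts, while the RMSE is twice as large for the forecast containing the outlier.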
Understanding the difference between causal and correlational statements can be critical when consuming (or creating) media. Which of the following statements implies causality? Select all that apply. -You should move to a large Californian city if you want to live longer -Performing household chores may boost brain health in the elderly -Exposure to display advertising is associated with a 10% increase in the odds that a user purchases from the advertiser -People living in large Californian cities tend to have the longest life expectancies -Exposure to display advertising increases the odds that an individual purchases from the advertiser by 10% -Higher levels of daily activity are linked to better mental health
-You should move to a large Californian city if you want to live longer -Performing household chores may boost brain health in the elderly -Exposure to display advertising increases the odds that an individual purchases from the advertiser by 10%
Which of the following is an Excel function commonly used to create categorical variables? =MAX(...) =WHEN(...) =SUMPRODUCT(...) =IF(...)
=IF(...)
When there is no autocorrelation in the residuals, the Durbin Watson Statistic equals what number?
2
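The Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below applies that standard formula to simulated residuals; the simulated data, seed, and the 0.9 persistence value are assumptions chosen only for illustration.

```python
import random

def durbin_watson(residuals):
    # Sum of squared changes between consecutive residuals,
    # divided by the sum of squared residuals.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

random.seed(0)
# Independent residuals: no autocorrelation, so the statistic should land near 2.
independent = [random.gauss(0, 1) for _ in range(5000)]
dw_independent = durbin_watson(independent)

# Positively autocorrelated residuals: each value carries over 0.9 of the previous one.
persistent = [independent[0]]
for shock in independent[1:]:
    persistent.append(0.9 * persistent[-1] + shock)
dw_persistent = durbin_watson(persistent)

print(round(dw_independent, 2), round(dw_persistent, 2))
```

Values well below 2 point toward positive autocorrelation, and values well above 2 point toward negative autocorrelation.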
We frequently create moving averages to reduce the amount of noise in a data set, and make the trend and potential cycles more apparent. If you create a 25 period moving average, how many observations would you have to eliminate? 24 12 23 13 0
24; Whenever we create a moving average to smooth our data, we lose W-1 observations, where W is the length of the window. We lose (W-1)/2 observations from both the beginning and end of our time series, because we don't know what the prior or following values would have been.
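A short sketch of the W-1 rule, assuming a centered moving average with an odd window length. The function name and the toy series are illustrative.

```python
def centered_moving_average(values, window):
    # Assumes an odd window so the average is centered on one observation.
    half = (window - 1) // 2
    return [
        sum(values[i - half : i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]

series = list(range(100))          # toy data: 100 observations
smoothed = centered_moving_average(series, 25)
lost = len(series) - len(smoothed)  # W - 1 observations are dropped
print(lost)
```

With W = 25, 12 observations are lost at the start and 12 at the end, for 24 in total.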
Which of the following pieces of information would require the least amount of space to store within a database? -a binary indicator of whether a restaurant accepts reservations -the average cost of an item on a restaurant's menu -the date on which a restaurant opened -a picture of a restaurant's entrance -a restaurant's name
A binary indicator of whether a restaurant accepts reservations
Which of the following would likely be a required field in a retailer's database? Select all that apply. A customer's mobile phone number when creating a new customer record. A coupon code when recording a transaction in the orders table. A customer's shipping address when creating a new e-commerce order. The employee ID number when creating a new entry in a table tracking hours worked.
A customer's shipping address when creating a new e-commerce order. The employee ID number when creating a new entry in a table tracking hours worked.
The following residual plot shows evidence of what? Select all that apply. (The plot shows residuals in a fan shape.)
A violation of the equal variance assumption
Which of the following statements about forecasting is NOT correct? -After you select a particular forecasting model, you do not need to continually monitor your forecasts -Two common approaches to forecasting are qualitative and quantitative forecasting -when a time series increases at a rate such that the percentage difference from value to value is constant, an exponential trend is present -extrapolation and econometric models are both types of quantitative forecasting methods
After you select a particular forecasting model, you do not need to continually monitor your forecasts
True or False? Performing a best subsets regression will help us better understand why certain parameters matter
False; best subsets regression produces a set of models that include combinations of parameters that are highly predictive of the outcome, but it doesn't tell us why certain variables matter
True/False: a primary key is always a single field that determines a unique record within a table
False; primary keys can span multiple fields, in which case they are known as composite keys
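As a sketch of a composite key, the example below uses SQLite from Python. The order_items table and its columns are hypothetical; the point is that uniqueness is enforced on the combination of fields, not on either field alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the primary key spans two fields.
conn.execute(
    """
    CREATE TABLE order_items (
        order_id   INTEGER,
        product_id INTEGER,
        quantity   INTEGER,
        PRIMARY KEY (order_id, product_id)
    )
    """
)
conn.execute("INSERT INTO order_items VALUES (1, 10, 2)")
conn.execute("INSERT INTO order_items VALUES (1, 11, 1)")  # same order, new product: allowed

try:
    # Repeats an existing (order_id, product_id) pair, so it violates the key.
    conn.execute("INSERT INTO order_items VALUES (1, 10, 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```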
Which of the following is not a component of time series? -Baseline -Seasonality -Random Effect/Noise -Trend -Cycles
Baseline; trend, seasonality, cycles, and noise are the four components of a time series
True or False? The Root Mean Square Error (RMSE) is the average percent difference between observed and predicted values.
False; the MAPE is the average percent difference between observed and predicted values
Which of the following is NOT a model selection method? -Stepwise Regression -Best Fit Competition -Best Subsets -Model Validation
Best Fit Competition
What is it called when two independent variables in a regression model are highly correlated? -Strong Inter-P -Intervariance -Entropy -Collinearity or Multi-Collinearity
Collinearity or Multi-Collinearity
Which of the following is NOT a statistic we use to compare models? -Combined p-value -Adjusted R^2 -Cp Statistic
Combined p-value
Which of the following is not a component of a relational database? -Record -Cousin Object -Table -Primary Key -Field
Cousin Object
Models recommended by the Cp statistic should satisfy which condition?
Cp <= k + 1
Adding new variables to your model (ie HouseStart, GDP, Unemployment) improves your model by predominantly controlling for: Noise Seasonality Trend Cyclicality
Cyclicality
A clothing retailer is calculating the number of shirts sold at each of their locations. Is this a discrete or continuous variable? Note, stores cannot sell partial shirts.
Discrete
In this class, we will focus on using SQL to extract information from a database using queries. However, it is much more powerful than that. Which of the following can SQL do? Select all that apply. Enter new records into a database. Change the structure of an existing database, including adding tables, deleting tables, deleting the entire database, etc. Extract information from a database. Update records already entered in a database. Define the database structure, including creating tables, fields, relationships, etc.
Enter new records into a database. Change the structure of an existing database, including adding tables, deleting tables, deleting the entire database, etc. Extract information from a database. Update records already entered in a database. Define the database structure, including creating tables, fields, relationships, etc.
When we are examining the equal variance assumption, we want to see that the spread of points is (roughly) equal for all values of X
Equal Variance Assumption
Which of the following is an example of seasonality? Select all that apply. -Terrace beer sales have increased steadily for the past 10 years -Every year between Thanksgiving and Christmas, retail sales increase significantly -Last Tuesday, shoe sales at Dick's Sporting Goods were higher than expected -Ice cream sales decrease every winter
Every year between Thanksgiving and Christmas, retail sales increase significantly Ice cream sales decrease every winter
True or False: Stepwise and Best Subsets regression will always recommend the same model. Note, we are not asking you to run a Stepwise or Best Subsets regression. This is intended to be a general question.
False
True or False? Even when a relationship is not statistically significant, there is still a good chance that it is practically significant
False
True or False? The order of rows in a table matters
False
True or false? Outliers always have a significant effect on the best fit line
False
True or false? Regardless of the context of your analysis, a threshold of p < 0.05 is appropriate for determining statistical significance
False
You have run a regression based on the following model: ln(Ŷ) = α + β ln(X). The parameter estimate for β came back as 0.10. Which of the following is the correct interpretation of this parameter estimate?
For every one percent increase in X, we expect Y to increase by 0.10%
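A quick numeric check of the log-log interpretation, assuming the model ln(Y) = α + β ln(X) holds exactly. Under that model, multiplying X by 1.01 multiplies Y by 1.01 raised to the power β, regardless of α.

```python
beta = 0.10          # parameter estimate from the question
x_growth = 1.01      # a one percent increase in X

# In a log-log model, Y scales by x_growth ** beta when X scales by x_growth.
y_growth = x_growth ** beta
pct_change_in_y = (y_growth - 1) * 100
print(round(pct_change_in_y, 3))
```

The result is very close to 0.10%, which is why β is read as a percent-for-percent effect (an elasticity) for small changes.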
The _______ clause allows you to filter results based on an aggregation function
HAVING
A _____ clause specifies a condition on which we can filter an aggregated value
HAVING; the having clause allows us to filter our results based on aggregated values such as averages or sums
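A minimal sketch of HAVING, run through SQLite from Python. The orders table and its values are hypothetical. The filter applies to each customer's summed total, which only exists after the GROUP BY aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 50), ("ann", 70), ("bob", 20), ("bob", 10), ("cy", 200)],
)

# HAVING filters on the aggregated value; WHERE could not reference SUM(amount).
big_spenders = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 100
    ORDER BY customer
    """
).fetchall()
print(big_spenders)
```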
When joining two tables, which of the following join types requires a match between columns in both tables to return a record? -LEFT OUTER JOIN -INNER JOIN -FULL OUTER JOIN -RIGHT OUTER JOIN
INNER JOIN
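The sketch below contrasts INNER JOIN with LEFT OUTER JOIN in SQLite, using hypothetical customers and orders tables. One customer has no orders, so that customer appears only in the outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables; bob has no matching rows in orders.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "ann"), (2, "bob"), (3, "cy")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 1, 50), (2, 1, 70), (3, 3, 200)])

# INNER JOIN keeps a row only when both tables have a match.
inner_rows = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "INNER JOIN orders o ON o.customer_id = c.id ORDER BY c.name, o.amount"
).fetchall()

# LEFT OUTER JOIN keeps every customer, filling missing order columns with NULL.
left_rows = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "LEFT OUTER JOIN orders o ON o.customer_id = c.id ORDER BY c.name, o.amount"
).fetchall()

print(len(inner_rows), len(left_rows))
```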
Which of the following is NOT a method commonly used to account for cycles in time series forecasting models? -Instrumental Variables -Lagged Variables -Leading Indicators
Instrumental Variables
In a regression equation, which of the following allows the effect of one independent variable to depend on the value of another? -Interaction term -Adjusted r^2 -Categorical Variable -Indicator Variable
Interaction Term
Imagine that you have run two regressions. In the first, you regressed Y on X1. Call this model 1. In the second, you regressed Y on X1 and X2. Call this model 2. The adjusted r-squared value for model 1 is 0.58, while that of model 2 is 0.65. Based only on these in sample fit statistics, which is preferred?
Model 2
Which of the following are examples of seasonality? Select all that apply. Netflix web traffic increases every evening between 7 pm and 10 pm. Stock market prices are currently much higher than we would expect based on historical trends and averages. Sales of ice skates at Dick's Sporting Goods are highest in the winter. Consumers are increasingly turning to alternative energy sources, driving the sales of solar panels.
Netflix web traffic increases every evening between 7 pm and 10 pm. Sales of ice skates at Dick's Sporting Goods are highest in the winter.
Which of the following is not an assumption about the population errors in linear regression? -No collinearity -Independence -Normally Distributed -Equal Variance -Linearity
No Collinearity
When predicting time series values such as sales, commodity prices, or resource utilization, which component of the time series will contain unpredictable, rare events? -Seasonality -Noise -Trend -Cycle
Noise
Which data type requires the largest amount of memory to store? -OLE Object -Text -Numeric -Binary
OLE Object
Suppose you add a new variable to an existing regression model. Which of the following can be true? Select all that apply. R square increases, Adjusted R square decreases R square increases, Adjusted R square increases R square decreases, Adjusted R square increases R square decreases, Adjusted R square decreases
R square increases, Adjusted R square decreases R square increases, Adjusted R square increases
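The standard adjusted R-squared formula, 1 - (1 - R^2)(n - 1)/(n - k - 1), shows why both outcomes are possible. The R-squared values and sample size below are made up for illustration; n is the number of observations and k the number of independent variables.

```python
def adjusted_r2(r2, n, k):
    # Penalizes R^2 for the number of independent variables k.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 30  # hypothetical sample size

before = adjusted_r2(0.600, n, k=2)      # existing model
weak_add = adjusted_r2(0.602, n, k=3)    # new variable barely raises R^2
strong_add = adjusted_r2(0.700, n, k=3)  # new variable raises R^2 a lot

print(round(before, 3), round(weak_add, 3), round(strong_add, 3))
```

R-squared rises in both cases, but adjusted R-squared falls when the gain is too small to offset the penalty for the extra variable.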
Which of the following statements regarding MAE, RMSE, and MAPE is correct? -RMSE and MAPE are common types of forecast errors, while MAE IS NOT -RMSE is a good measure of model accuracy, but its magnitude depends on the scale of the outcome variable -outliers have a greater impact on MAE than on RMSE -When a model poorly fits the time-series data, the MAPE is small
RMSE is a good measure of model accuracy, but its magnitude depends on the scale of the outcome variable
Imagine you're working for a retailer, and you've developed several models to forecast store traffic. You're getting ready to test these models out of sample, and are deciding which metric to use. While you want a model that is as accurate as possible, you're willing to take one that is slightly less accurate on average if it significantly reduces the risk of large forecast errors. Which would be the appropriate metric by which to compare your models? Root Mean Square Error (RMSE) Mean Absolute Percentage Error (MAPE) Mean Absolute Error (MAE)
Root Mean Square Error (RMSE)
The SELECT clause lists the fields that you want returned from your query
SELECT
Which of the following is NOT an aggregation function in SQL? -SUM -TOP -AVERAGE -MIN
TOP
You regress a dependent variable, Y, on an independent variable, X. When examining the resulting residual plots, you notice that the residuals exhibit unequal variance with respect to X, such that the variance increases as the value of X increases as in the plot below. How could we transform Y and/or X in a new regression to resolve this? -Take the natural log of X -Square X -Take the natural log of X and Y -Take the natural log of Y -Square Y
Take the natural log of Y
You are working at a car dealership, and decide to run a regression to predict the sales price for the cars in your inventory. Which of the following independent variables would need to be modeled using a dummy variable? Note, dummy variables are also sometimes called categorical variables. Please select ALL correct answers below: -The number of miles the car can travel on a gallon of gas -The car's color -The total miles the car has traveled -Whether the car is a convertible
The car's color Whether the car is a convertible
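A sketch of how these two variables might be encoded as dummy (0/1) variables before running a regression. The colors and the choice of reference category are assumptions; one color is left out as the baseline so the dummies are not perfectly collinear with the intercept.

```python
# Hypothetical inventory records.
cars = [
    {"color": "red", "convertible": True},
    {"color": "blue", "convertible": False},
    {"color": "silver", "convertible": False},
]

colors = sorted({car["color"] for car in cars})  # ['blue', 'red', 'silver']
reference = colors[0]                            # 'blue' is the omitted baseline

def encode(car):
    # One 0/1 column per non-reference color, plus a 0/1 convertible flag.
    row = {f"color_{c}": int(car["color"] == c) for c in colors if c != reference}
    row["convertible"] = int(car["convertible"])
    return row

encoded = [encode(car) for car in cars]
print(encoded)
```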
Which of the following is a legitimate concern when applying a square transformation to an independent variable? Select all that apply. The intuitive interpretation of slope estimates can be more difficult You cannot compare the adjusted r^2 between otherwise identical models with and without the square transformation. The transformation requires an additional covariate, challenging model parsimony You cannot apply a square transformation when the independent variable takes values less than zero.
The intuitive interpretation of slope estimates can be more difficult The transformation requires an additional covariate, challenging model parsimony
What does a p-value tell us? The p-value is the percent chance that our statistical relationship has practical significance. The p-value is the probability, given the data, of getting a parameter estimate at least as extreme as what we observe if the true population estimate were zero. The p-value is the percent of variation in the dependent variable explained by variation in the independent variable(s). If we declare the relationship between our independent and dependent variables to be statistically significant, the p-value is the probability that we're making a type-1 error.
The p-value is the probability, given the data, of getting a parameter estimate at least as extreme as what we observe if the true population estimate were zero. If we declare the relationship between our independent and dependent variables to be statistically significant, the p-value is the probability that we're making a type-1 error.
When two independent variables in a regression model exhibit collinearity, which of the following is NOT a likely result of removing one of the offending covariates? Select all that apply. The range of the confidence intervals for the slope coefficients increases. r^2 changes only slightly. The adjusted r^2 increases. The p-values for the slope coefficients decrease.
The range of the confidence intervals for the slope coefficients increase.
True or False? The area under the curve for a complete probability density function is always equal to one
True
Which of the following are true about best subsets and stepwise regression? They may help identify why certain parameters matter. They are guaranteed to return the same set of candidate "best" models. They will capitalize on any spurious correlation present in the data, leading to concerns about overfitting. They can help sift through a large number of variables to build a predictive model.
They will capitalize on any spurious correlation present in the data, leading to concerns about overfitting. They can help sift through a large number of variables to build a predictive model.
If not properly modeled, auto-correlation in the dependent variable can cause violations of the independent errors assumption in regression
True
True or False? When performing out of sample validation, you must estimate your model using a subset of your data, and then compare your model's predictive accuracy on an entirely separate subset of your data.
True
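A minimal sketch of out-of-sample validation with a one-variable linear model. The data are made up, and holding out the last three observations is an assumption; the key point is that the model is estimated on one subset and scored on a separate one.

```python
import math

def fit_line(xs, ys):
    # Ordinary least squares for a single predictor.
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.2, 3.9, 6.1, 8.0, 9.8, 12.1, 14.0, 16.2, 17.9, 20.1]  # hypothetical data

# Estimate on the first seven observations only.
train_x, train_y = xs[:7], ys[:7]
test_x, test_y = xs[7:], ys[7:]
intercept, slope = fit_line(train_x, train_y)

# Score on the held-out observations the model never saw.
predictions = [intercept + slope * x for x in test_x]
holdout_rmse = math.sqrt(
    sum((y - p) ** 2 for y, p in zip(test_y, predictions)) / len(test_y)
)
print(round(holdout_rmse, 3))
```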
Which of the following is true with respect to databases? Select all that apply. Validation rules help prevent erroneous data entry by defining the requirements for a valid entry. We cannot join tables unless a primary key / foreign key relationship has been specified between them. User profiles define what actions a user can take within a database, including which tables they can view and which they can edit. A primary key must be a single field that uniquely identifies each record.
Validation rules help prevent erroneous data entry by defining the requirements for a valid entry. User profiles define what actions a user can take within a database, including which tables they can view and which they can edit.
In our current model, the predicted impact of adding free breakfast (FreeBreakfast) on expected hotel profit (Profit) does not depend on whether the hotel has conference space (ConferenceSpace). Explain how we could change our model so that the effect of FreeBreakfast on expected Profit would depend on whether the hotel has conference space.
We would need to add an interaction term between FreeBreakfast and ConferenceSpace
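To illustrate, here is a sketch of a prediction equation with an interaction term. The coefficient values are invented for the example and are not estimates from any real model; FreeBreakfast and ConferenceSpace are treated as 0/1 indicators.

```python
def predicted_profit(free_breakfast, conference_space,
                     b0=100.0, b1=20.0, b2=50.0, b3=15.0):
    # b3 multiplies the product of the two indicators: the interaction term.
    return (
        b0
        + b1 * free_breakfast
        + b2 * conference_space
        + b3 * free_breakfast * conference_space
    )

# Effect of adding free breakfast at a hotel without conference space.
effect_without_conf = predicted_profit(1, 0) - predicted_profit(0, 0)
# Effect of adding free breakfast at a hotel with conference space.
effect_with_conf = predicted_profit(1, 1) - predicted_profit(0, 1)

print(effect_without_conf, effect_with_conf)
```

Without the interaction term (b3 = 0) the two effects would be identical; with it, the effect of FreeBreakfast differs by b3 depending on ConferenceSpace.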
What metric do we use to compare the in sample fit between two models, given that one of the models "nests" the other? -multiple r -adjusted r^2 -r^2 -p-value
adjusted r^2
Which variable type is the smallest? -text -numeric -binary -date
binary
As compared to worksheets, which of the following is not an inherent advantage of relational databases? -Reduces redundancy in data entry -Allows capabilities to be assigned by user or user type -integrates easily with other programs and applications -mitigates the risk of invalid data being created or stored -contains straightforward tools for visualizing and statistically analyzing patterns in the data
contains straightforward tools for visualizing and statistically analyzing patterns in the data
As noise increases, seasonality and cycles become more difficult to discover, but trend will be easier to find
false
Which one is NOT a trait of good models? -predict well out of sample -has an adjusted r^2 greater than 0.5 -has coefficients that are easy to interpret -uses the fewest possible independent variables to adequately predict the outcome
has an adjusted r^2 greater than 0.5
A primary key does which of the following? -identifies a unique record in a table -allows a user to access a given table -ensures that only certain types of information can be entered in a table -allows users to upload many records into a table simultaneously
identifies a unique record in a table
In linear regression, what are we doing to determine the parameter estimates for the best fit line? -minimizing the average difference between our observed and predicted values -minimizing the average of the residuals -minimizing the sum of the absolute values of the residuals -minimizing the sum of the squared residuals
minimizing the sum of the squared residuals
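A small sketch of the least squares idea for one predictor, using made-up data. The closed-form slope and intercept give a smaller sum of squared residuals than nearby alternative lines.

```python
def fit_line(xs, ys):
    # Closed-form least squares estimates for a single predictor.
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical observations

intercept, slope = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, intercept, slope)

# Any other line, such as one with a slightly different slope, fits worse.
steeper = sum_squared_residuals(xs, ys, intercept, slope + 0.1)
flatter = sum_squared_residuals(xs, ys, intercept, slope - 0.1)
print(round(slope, 2), round(intercept, 2))
```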