CAP questions and answers (Certified Analytics Professional)
10. You have simulated the net present value (NPV) of a decision. It ranges between -$10 million and +$10 million. To BEST present the likelihood of possible outcomes, you should: a)present a single NPV estimate to avoid confusion. b)present a histogram to show likelihood of various NPV ranges. c)trim all outliers to present the most balanced diagram. d)relax constraints associated with extreme points in the simulation.
10b
11. A company ships products from a single dock at their warehouse. The time to load shipments depends on the experience of the crew, products being shipped and weather. The company is considering building another dock in order to meet unmet demand. Which is the MOST appropriate modeling approach to determine if the revenue from the additional products sold will cover the cost of the second dock within two years of it becoming operational? a)Optimization because it is a transportation problem. b)Optimization because the company's objective is to maximize profit and capacity at the dock is a limited resource. c)Forecasting because you can determine the throughput at the dock, calculate the net revenue and compare this with the cost of the new dock. d)Discrete event simulation because there is a sequence of discrete random events through time.
11d
12. Two investors who have the same information about the stock market buy an equal number of shares of a stock. Which of the following statements must be true? a)The risks for the two investors are statistically independent. b)Both investors have the same risk profile. c)Both investors are subject to the same uncertainty. d)If the investors are optimistic, they should have borrowed, rather than bought the shares.
12c
13. A project seeks to build a predictive data-mining model of customer profitability based upon a series of independent variables including customer transaction history, demographics, and externally purchased credit-scoring information. There are currently 100,000 unique customers available for use in building the predictive model. Which of the following strategies would reflect the BEST allocation of these 100,000 customer data points? a)Use 70,000 randomly selected data points when building the model, and hold the remaining 30,000 as a test dataset. b)Use all 100,000 data points when building the model. c)Build four separate models and randomly partition the data into 4 separate datasets of 25,000 data points each. d)Use 1,000 randomly selected data points when building the model.
13a
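A minimal sketch of the 70/30 holdout split in answer (a), using only Python's standard library. The integer customer IDs, the helper name split_holdout, and the seed are illustrative assumptions, not part of the question.

```python
import random

def split_holdout(items, train_fraction=0.7, seed=0):
    """Randomly partition items into (train, test) with no overlap."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

customers = range(100_000)  # placeholder IDs standing in for customer records
train, test = split_holdout(customers)
print(len(train), len(test))
```

The model is fitted on train only; test is held back to estimate how well it predicts customers it has not seen.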
14. Conjoint analysis in market research applications can: a)give its best estimates of customer preference structure based on in-depth interviews with a small number of carefully chosen subjects. b)only trade off relative importance to customers of features with similar scales. c)allow calculation of relative importance of varying features and attributes to customers. d)only trade off among a limited number of attributes and levels.
14c
15. One of the main advantages of tree-based models and neural networks is that they: a)are easy to interpret, use, and explain. b)build models with higher R squared than other regression techniques. c)reveal interactions without having to explicitly build them into the model. d)can be modeled even when there is a significant amount of missing data.
15c
16. The monthly profit made by a clothing manufacturer is proportional to the monthly demand, up to a maximum demand of 1000 units, which corresponds to the plant producing at full capacity. (Any excess demand over 1000 units will be satisfied by some other manufacturer, and hence yield no additional profit.) The monthly demand is uncertain, but the average demand is reliably estimated at 1000 units. At this level of demand the monthly profit is $3,000,000. Which of the following statements must be true of the expected monthly profit, P? a)P can have any positive value. b)P is possibly greater than $3,000,000. c)P is equal to $3,000,000. d)P is less than $3,000,000
16d
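Why (d): profit is capped at 1000 units, so months with demand above 1000 cannot make up for months below it. A small simulation sketch follows; the normal demand distribution and its standard deviation of 200 are assumptions chosen only for illustration, since the question states just that average demand is 1000.

```python
import random

PROFIT_PER_UNIT = 3_000  # $3,000,000 / 1000 units
CAPACITY = 1_000         # units; demand above this earns nothing extra

def expected_profit(demand_samples):
    """Average monthly profit when sales are capped at plant capacity."""
    profits = [PROFIT_PER_UNIT * min(d, CAPACITY) for d in demand_samples]
    return sum(profits) / len(profits)

rng = random.Random(42)
# Assumed demand model: normal with mean 1000 and sd 200, floored at zero.
demand = [max(0.0, rng.gauss(1000, 200)) for _ in range(100_000)]
p = expected_profit(demand)
print(p < 3_000_000)
```

A two-point check shows the same effect: demand of 800 or 1200 with equal probability averages 1000 units, but the expected profit is 3,000 * (800 + 1000) / 2 = $2,700,000.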
17. After building a predictive model and testing it on new data, an underprediction by a forecasting system can be detected by its: a)negative-squared. b)bias. c)mean absolute deviation. d)mean squared error.
17b
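Why (b): bias keeps the sign of the errors, while mean absolute deviation and mean squared error discard it. A short sketch with made-up numbers, assuming the convention error = actual - predicted (so positive bias means underprediction):

```python
def bias(actual, predicted):
    """Mean signed error (actual - predicted); positive means underprediction."""
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_deviation(actual, predicted):
    """Mean unsigned error; says how far off, not in which direction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical data for illustration only.
actual = [100, 110, 120, 130]
under = [90, 100, 110, 120]   # every forecast is 10 too low
mixed = [110, 100, 130, 120]  # forecasts alternate 10 too high, 10 too low

print(bias(actual, under), mean_absolute_deviation(actual, under))
print(bias(actual, mixed), mean_absolute_deviation(actual, mixed))
```

Both forecasts have the same mean absolute deviation, but only the consistently low one has a nonzero bias.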
18. All times in the decision tree below are given in hours. What is the expected travel time (in hours) of the optimal (minimum travel time) decision? a)7.8 b)6.9 c)7.4 d)7.0
18d
19. An analytics professional is responsible for maintaining a simulation model that estimates system throughput given different staffing levels required for a specific operational business process. Assuming that the operational team always uses the number of staff determined by the model, which of the following is the MOST important maintenance activity? a)Ensure that all of the model input data items are available when needed. b)Determine if there has been a change in model accuracy over time. c)Ensure that all users are reviewing the model results in a timely fashion. d)Determine that the model's reports are understood by the users.
19b
1. Which of the following BEST describes the data and information flow within an organization? a)Information assurance b)Information strategy c)Information mapping d)Information architecture
1d
20. A segmentation of customers who shop at a retail store may be performed using which of the following methods? a)Monte Carlo Markov Chain and ANOVA b)Clustering, factor and control charts c)Decision tree and recursive function analyses d)Clustering and decision tree
20d
22. Each month you generate a list of marketing leads for direct mail campaigns. Which of the following should you do before the list is used? a)Exclude people who were on the list the previous month. b)Retain x% of the leads as control for performance measurement. c)Remove opt-outs. d)Exclude people who were never on the list.
22c
23. When analyzing responses of a survey of why people like a certain restaurant, factor analysis could reduce the dimension in which of the following ways? a)Collapse several survey questions regarding food taste, health value, ingredients and consistency into one general unobserved "food quality" variable. b)Condense similar survey respondent answers into clusters of like-minded customers for market segment analysis. c)Reduce the variability of individual subject ratings by centering each respondent's ratings around his or her average rating. d)Decrease variability by analyzing inter-rater reliability on the question items before offering the survey to a wide number of respondents
23a
24. A preferred method or best practice for organizing data in a data warehouse for reporting and analysis is: a)transactional-based modeling. b)multidimensional modeling. c)relation-based modeling. d)tuple-based modeling
24b
2. A multiple linear regression was built to try to predict customer expenditures based on 200 independent variables (behavioral and demographic). 10,000 rows of data were fed into a stepwise regression, each row representing one customer. 1,000 customers were male, and 9,000 customers were female. The final model had an adjusted R-squared of 0.27 and seven independent variables. Increasing the number of rows of data to 100,000 and rerunning the stepwise regression will most likely: a)have no impact upon the adjusted R-squared. b)increase the impact of the male customers. c)change the heteroskedasticity of the residuals in a favorable manner. d)decrease the number of independent variables in the final model.
2a
3. A clothing company wants to use analytics to decide which customers to send a promotional catalogue in order to attain a targeted response rate. Which of the following techniques would be the most appropriate to use for making this decision? a)Integer programming b)Logistic regression c)Analysis of variance d)Linear regression
3b
4. Which of the following is an effective optimization method? a)Analysis of variance (ANOVA) b)Generalized linear regression model (GLM) c)Box-Jenkins Method (ARIMA) d)Mixed integer programming (MIP)
4d
=============================
*ANOVA* (one-way analysis of variance)
- Purpose: to determine whether there are any statistically significant differences between the means of three or more independent (unrelated) groups.
- What is the ANOVA used for? The one-way ANOVA compares the means of the groups you are interested in and determines whether any of those means are *statistically significantly different* from each other.
- Example: you have three groups of runners: one starts slow and then speeds up, one starts fast and then slows down, and one keeps a steady pace. Is there any significant difference among the three groups?
=============================
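A rough sketch of the one-way ANOVA F statistic for three groups, written with the standard library only. The finishing times are invented for illustration, and the sketch stops at the F statistic; turning it into a p-value needs the F distribution, which is not computed here.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        mean_g = sum(g) / len(g)
        ss_between += len(g) * (mean_g - grand_mean) ** 2
        ss_within += sum((x - mean_g) ** 2 for x in g)

    ms_between = ss_between / (k - 1)  # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)    # n - k degrees of freedom
    return ms_between / ms_within

# Hypothetical finishing times for the three pacing strategies.
slow_then_fast = [1.0, 2.0, 3.0]
fast_then_slow = [2.0, 3.0, 4.0]
steady_pace = [3.0, 4.0, 5.0]

f_stat = one_way_anova_f([slow_then_fast, fast_then_slow, steady_pace])
print(f_stat)
```

A larger F means the group means differ more than the spread within each group would suggest.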
5. A box and whisker plot for a dataset will MOST clearly show: a)the difference between the second quartile and the median. b)the 90% confidence interval around the mean. c)where the [actual-predicted] error value is not zero. d)if the data is skewed and, if so, in which direction
5d
6. In the kickoff meeting with a client for a new project, which of the following is the MOST important information to discuss? a)Timeline and implementation plan b)Analytical model to use c)Business issue and project goal d)Available budget
6c
7. Which of the following statements is true of modeling a multi-server checkout line? a)A queuing model can be used to estimate service rates. b)A queuing model can be used to estimate average arrivals. c)Variability in arrival and service times will tend to play a critical role in congestion. d)Poisson distributions are not relevant.
7c
8. A company is considering designing a new automobile. Their options are a design based on current gasoline engine technology or a government proposed "Green" technology. You are a government official whose job is to encourage automakers to adopt the "Green" technology. You cannot provide funding for development costs, but you can provide a subsidy for every car sold. The development costs and the wholesale price, in thousands of dollars, of the cars are shown in the table below:
-----------------------------------------------------------------------
(numbers in $ thousands)  || Gasoline Technology || "Green" Technology
-----------------------------------------------------------------------
Wholesale Price/vehicle   || 25                  || 40
Variable Cost/vehicle     || 15                  || 35
Fixed Cost                || 100,000             || 200,000
-----------------------------------------------------------------------
How large a subsidy per vehicle sold will be required, assuming there will be enough demand to motivate the switch? a)Greater than $5000 b)Less than $5000 c)Cannot be determined d)Equal to $5000
8a
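Why (a): the per-vehicle margin is 25 - 15 = 10 for gasoline and 40 - 35 = 5 for "Green", so a subsidy of 5 only equalizes the margins. The extra 100,000 of fixed cost must also be recovered, which pushes the required subsidy above 5 (that is, above $5000) at any sales volume. A sketch of that arithmetic, where the function name and the sample volumes are illustrative assumptions:

```python
def min_subsidy_per_vehicle(units_sold):
    """Smallest subsidy (in $ thousands) making Green at least as profitable."""
    gas_margin = 25 - 15       # wholesale price - variable cost
    green_margin = 40 - 35
    gas_fixed, green_fixed = 100_000, 200_000
    # Require: (green_margin + s) * Q - green_fixed >= gas_margin * Q - gas_fixed
    # which rearranges to the expression below.
    return (gas_margin - green_margin) + (green_fixed - gas_fixed) / units_sold

for q in (10_000, 100_000, 1_000_000):  # assumed sales volumes
    print(q, min_subsidy_per_vehicle(q))
```

The required subsidy approaches 5 as volume grows but never reaches it.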
9. A furniture maker would like to determine the most profitable mix of items to produce. There are well-known budgetary constraints. Each piece of furniture is made of a predetermined amount of material with known costs, and demand is known. Which of the following analytical techniques is the MOST appropriate one to solve this problem? a)Optimization b)Multiple regression c)Data mining d)Forecasting
9a
What is autoregressive?
A statistical model is autoregressive if it predicts future values based on past values (e.g., predicting future stock prices based on past performance).
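A minimal sketch of the simplest case, a first-order autoregressive model x[t] = phi * x[t-1] with no intercept or noise term. The series is made up and the function names are illustrative; real AR fitting would normally use a statistics library.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1]."""
    num = sum(prev * cur for prev, cur in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den

def forecast_next(series, phi):
    """One-step-ahead forecast from the last observed value."""
    return phi * series[-1]

series = [8.0, 4.0, 2.0, 1.0]  # hypothetical values, each half the previous one
phi = fit_ar1(series)
print(phi, forecast_next(series, phi))
```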