CAP Study Questions ALL

Q 55: the mean of exponential distribution with lambda equal to six is A) 3 B) .6 C) .167 D) 720.

C) .167 The mean of an exponential distribution is one divided by lambda, so here it is 1/6 ≈ .167.
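
As a quick check of the 1/lambda rule, the sketch below computes the mean directly and compares it with a simulated average. The function name `exponential_mean`, the sample size, and the seed are illustrative assumptions, not part of the study set.

```python
import random

def exponential_mean(lam: float) -> float:
    """Theoretical mean of an exponential distribution with rate lam."""
    if lam <= 0:
        raise ValueError("lambda must be positive")
    return 1.0 / lam

# Simulated check; the seed is arbitrary and only makes the run repeatable.
rng = random.Random(0)
samples = [rng.expovariate(6) for _ in range(100_000)]
simulated_mean = sum(samples) / len(samples)

print(round(exponential_mean(6), 3))  # 0.167
```

The simulated average should land close to the theoretical value, which is what makes option C the only plausible choice.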

Q 80: from architectural point of view, which of these is not a data warehouse model? A) Enterprise warehouse. B) data Mart C) physical warehouse. D) virtual warehouse

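C) physical warehouse. From an architectural point of view, the data warehouse models are the enterprise warehouse, the data mart, and the virtual warehouse; there is no "physical warehouse" model.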

Q 108: which of the following is not a survey reliability testing method? A) Cronbach alpha B) split half method. C) test retest. D) Shapiro Wilk test

D) Shapiro Wilk test Explanation Cronbach alpha, the split-half method, and test-retest are common methods to test survey reliability. The Shapiro Wilk test is used to check normality of the data. (Source: https://notendur.hi.is/adg11/Proffraedi/Reliability%20of%20measurement%20scales.pdf)

Q 28: most appropriate symptom to conclude that multicollinearity exists in a regression model is: A) low pairwise correlations among regressors B) R squared obtained from auxiliary regression is lower than overall R squared C) condition index is less than 10. D) there is a large VIF value.

D) there is a large VIF value. Indicators of multicollinearity in a model include high pairwise correlations among regressors, an auxiliary-regression R squared that is higher than the overall R squared, a very large condition index (generally greater than 30), and a large VIF.
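
To make the VIF indicator concrete, here is a minimal sketch for the two-regressor case, where the auxiliary-regression R squared equals the squared Pearson correlation, so VIF = 1 / (1 - r^2). The helper names and the sample data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_regressors(x1, x2):
    """VIF when the model has exactly two regressors: 1 / (1 - r^2)."""
    r_squared = pearson_r(x1, x2) ** 2
    return 1.0 / (1.0 - r_squared)

# Hypothetical data: x2 is almost exactly 2 * x1, x3 is nearly unrelated to x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.1]
x3 = [5, 1, 4, 2, 6, 3]

print(vif_two_regressors(x1, x2) > 10)  # True: strong collinearity
print(vif_two_regressors(x1, x3) < 2)   # True: little collinearity
```

With more than two regressors, each VIF would come from regressing one regressor on all the others; the shortcut above only covers the simplest case.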

Q182: Net working capital is based on a) Current Assets b) Current Liabilities c) Both Current Assets and Liabilities d) Fixed Capital

Q182: Correct Answer c) Both Current Assets and Liabilities Explanation Net working capital is calculated as Current Assets - Current liabilities. (Source: Managerial Economics and Financial Accounting by Reddy and Saraswathi, PHI learning) CAP Domain: 7

Q241: Genetic algorithm, Tabu search, and ant colony optimization are examples of optimization algorithms that are inspired by natural phenomenon and are examples of the following type of analytics methodologies a) Metaheuristics b) Simulation c) Pattern recognition d) Visualization

Q241: Correct Answer a) Metaheuristics Explanation Metaheuristic algorithms are often based on nature and natural phenomena such as the movement of bees, ants, bats, and swarms. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 4

Q146: State transition probabilities in the Markov chain should a) sum to zero b) sum to one c) be less than one d) be greater than one

Q146: Correct Answer b) sum to one Explanation The sum of state transition probabilities in the Markov chain is always equal to one. In the square transition matrix, each row holds the probabilities of moving from one state to every possible state, so each row sums to one. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 3
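
The row-sum property can be checked mechanically. The sketch below validates a transition matrix; the function name `is_row_stochastic` and both example matrices are hypothetical.

```python
import math

def is_row_stochastic(matrix, tol=1e-9):
    """Return True if matrix is square, entries lie in [0, 1], and rows sum to one."""
    n = len(matrix)
    for row in matrix:
        if len(row) != n:
            return False
        if any(p < 0 or p > 1 for p in row):
            return False
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            return False
    return True

# Hypothetical two-state chains.
valid_chain = [[0.9, 0.1], [0.5, 0.5]]
invalid_chain = [[0.7, 0.2], [0.5, 0.5]]  # first row sums to 0.9

print(is_row_stochastic(valid_chain))    # True
print(is_row_stochastic(invalid_chain))  # False
```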

Q120: When total supply is equal to the total demand in a transportation problem, the problem is said to be

Q120: Correct Answer d) balanced Explanation A balanced transportation problem always has total supply equal to total demand. If it does not, the problem is called unbalanced. A degenerate solution, by contrast, is a basic feasible solution with fewer than m + n - 1 positive allocations; degeneracy does not mean infeasibility. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 2

Q161: Influence Diagram is majorly used for a) Optimization b) Simulation c) Decision Making d) Dynamic Programming

Q161: Correct Answer c) Decision Making Explanation Influence diagram is a visual chart having decision nodes and their branches to help the analyst in taking decisions. Common node shapes are rectangle, circle, diamond, etc. (Source: Quantitative Methods for Business by Anderson, Sweeney, Williams, Cengage learning) CAP Domain: 1

Q162: Which of the following is NOT a quality control technique? a) Benchmarking b) Just-in-time System c) Six Sigma d) Delphi Method

Q162: Correct Answer d) Delphi Method Explanation Important quality control techniques are benchmarking, six sigma, Just-in-time, team building and re-engineering. (Source: Business Statistics for Contemporary Decision Making by Ken Black, Wiley Publication) CAP Domain: 7

Q163: Cause and Effect analysis is BEST represented by a) Q-Q plot b) Fishbone Chart c) Flowchart d) Influence diagram

Q163: Correct Answer b) Fishbone Chart Explanation Fishbone diagram is used to depict the cause and effects for a particular problem and is represented in the form of a fish skeleton. (Source: Business Statistics for Contemporary Decision Making by Ken Black, Wiley Publication) CAP Domain: 1

Q243: How often should model maintenance be done? a) When underlying assumptions change b) When it is ported to a new system c) When the data it uses changes its format d) When it is transferred to a new owner

Q243: Correct Answer a) When underlying assumptions change Explanation While maintenance is continual over the life of a model, maintenance is required when the underlying assumptions change. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 7

Q245: If an event cannot take place, then its probability will be? a) 1 b) 0 c) -1 d) Infinity

Q245: Correct Answer b) 0 Explanation If an event can't take place, then its probability will always be zero. (Source: Operations Research By Kanti Swarup, PK Gupta and Manmohan; S.Chand & Sons Publication, India) CAP Domain: 3

Q 61: which of the following is true for a normal distribution? A) it is unimodal. B) it is bi modal. C) it is multimodal. D) it is asymmetrical.

A) it is unimodal. Normal distribution has only one mode and that's why it is unimodal.

Q103: Skewness measure of a normal distribution is A) 1 B) infinite C) 0 D) 2

Q103: Correct Answer c) Zero Explanation Normal distribution is symmetric in nature and hence the measure of skewness is zero in this case. (Source: Statistics for Business and Economics by Anderson, Sweeney, Williams, Cengage Learning) CAP Domain: 3

Q 50: for which type of decision maker is the following statement true - on the utility curve, as the monetary value increases, the utility increases at an increasing rate A) risk seeker. B) risk avoider C) risk indifferent. D) utility curve cannot define risk-taking ability.

A) risk seeker. Utility curve is a graph that reveals the relationship between utility and monetary values. When this curve has been constructed, utility values from the curve can be used in the decision-making process. Utility that increases at an increasing rate is characteristic of risk seekers, and utility that increases at a decreasing rate is characteristic of risk avoiders.

Q 45: which of these is most relevant to the application of discrete event simulation? A) solving single server queuing system. B) managing inventory model in which the stock is inspected only once a week. C) predicting the house rent based on house characteristics like number of rooms, size of house, etc. D) comparing the before, and after impact of a TV advertisement on brand recall of a product for the consumers.

A) solving single server queuing system. Inventory model is example of only discrete simulation and not discrete event simulation; house rent prediction can be done using multiple linear regression; TV ad impact can be compared using paired t-test statistic.

Q 90: which of the following nonparametric tests is an alternative to ANOVA test? A) Mann-Whitney. B) Kruskal-Wallis. C) Wilcoxon signed rank. D) sign test

B) Kruskal-Wallis. Mann-Whitney is an alternative to the two sample t test; it is also known as the U test or the Wilcoxon rank-sum test. The sign test is an alternative to a one sample t test. Kruskal-Wallis is an alternative to ANOVA.

Q 30: equal variance of error terms in a regression model is termed as: A) multicollinearity. B) homoscedasticity. C) heteroscedasticity. D) autocorrelation.

B) homoscedasticity. Unequal variance in error terms is known as heteroscedasticity, while equal variance of error terms is known as homoscedasticity. If regressors are perfectly correlated to each other, it is known as multicollinearity.

Q 62: when is the normal distribution referred to as standard normal distribution? A) mu = 1, sigma = 0 B) mu = 0, sigma = 1 C) mu = 0, sigma = 0 D) mu = 1, sigma = 1

B) mu = 0, sigma = 1 Based on the moment generating function of the normal distribution, the standard normal form can be derived as a special case when mu = 0, sigma = 1.

Q 15: nonparametric statistics is most probably based on the following assumption. A) the confidence interval is very narrow. B) the population is distribution free. C) the population is normally distributed. D) the standard deviation of the sample is close to zero.

B) the population is distribution free. Nonparametric statistics are based on fewer assumptions about the population and its parameters than parametric statistics are. They are sometimes referred to as distribution free statistics, as the population is not assumed to follow any particular distribution. This is not the case with parametric tests, where the population is generally assumed to have a normal distribution.

Q 59: which of the following is a rectangular distribution? A) binomial. B) uniform C) normal. D) Poisson

B) uniform. The area under uniform distribution is that of a rectangle, which is why it is also referred to as a rectangular distribution.

Q 16: suppose five weeks of average prices for a stock are 57, 68, 64, 71, and 62 with a standard deviation of 4.84. What is the coefficient of variation (CV) for this stock? A) 2.50% B) 5.00% C) 7.50% D) 10.00%.

C) 7.50% CV is defined as (standard deviation/mean)*100. so, (4.84/64.4)*100 = 7.5%.
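
The arithmetic above can be reproduced with the standard library. This is a minimal sketch; it uses the population standard deviation (`statistics.pstdev`), which is the version that matches the 4.84 quoted in the question.

```python
import statistics

prices = [57, 68, 64, 71, 62]

mean_price = statistics.mean(prices)   # 64.4
std_dev = statistics.pstdev(prices)    # population SD, about 4.84
cv_percent = std_dev / mean_price * 100

print(f"{cv_percent:.1f}%")  # 7.5%
```

Using the sample standard deviation (`statistics.stdev`) instead would give a larger value, so the question's figure implies the population formula.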

Q 5: Which of the following decision tree algorithms is based on the chi-squared statistical test? A) CART B) C4.5 C) CHAID D) ID3.

C) CHAID. CHAID is chi-squared automatic interaction detector based on statistical test, chi-Square. It is popularly used in the marketing field.

Q 63: exponential distribution has the following failure rate? A) increasing. B) decreasing. C) constant. D) first increasing then decreasing.

C) constant. Exponential distribution is one of the rare distributions which has a constant failure rate.
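
The constant failure rate follows from dividing the density by the survival function. The sketch below evaluates that ratio at several times; the function name and the rate value 2.0 are illustrative assumptions.

```python
import math

def exponential_hazard(t: float, lam: float) -> float:
    """Failure (hazard) rate h(t) = f(t) / S(t) for an exponential distribution."""
    pdf = lam * math.exp(-lam * t)   # density f(t)
    survival = math.exp(-lam * t)    # S(t) = P(T > t)
    return pdf / survival

# The ratio equals lam at every time point, regardless of t.
rates = [exponential_hazard(t, 2.0) for t in (0.0, 0.5, 1.0, 5.0)]
print(all(math.isclose(r, 2.0) for r in rates))  # True
```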

Q 13: SVM is a type of algorithm for which of the following categories? A) artificial intelligence B) simulation. C) machine learning. D) evolutionary algorithms.

C) machine learning. SVM and random forest are types of machine learning algorithms.

Q 37: consider the following data set and suggest the most appropriate nature of the data. Data: [1, 2, 3, 2, 5, 6, 3, 1, 8, 9, 3, 2, 1, 7, 6, 8, 9, 10] A) unimodal B) bimodal. C) multimodal. D) no modal

C) multimodal. There are three terms having highest frequency in the data set, therefore, it is multi modal in nature.
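
The claim about three modes can be verified by counting. A minimal sketch using the data from the question:

```python
import statistics
from collections import Counter

data = [1, 2, 3, 2, 5, 6, 3, 1, 8, 9, 3, 2, 1, 7, 6, 8, 9, 10]

counts = Counter(data)
modes = statistics.multimode(data)  # values tied for the highest frequency

print(modes)                       # [1, 2, 3]
print([counts[m] for m in modes])  # [3, 3, 3]
```

The values 1, 2, and 3 each occur three times, and no other value occurs more than twice.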

Q 83: how large should the sample size (n) be for central limit theorem to hold? A) n > 100 B) n > 50 C) n > 30 D) n > 5

C) n > 30 As a common rule of thumb for applying the central limit theorem, a sufficiently large sample size is taken to be 30 or more.

Q 94: parametric statistics cannot be applied on which type of data? A) interval level data. B) ratio level data. C) nominal level data. D) scalar data.

C) nominal level data. Parametric statistics can be applied to continuous data (interval or ratio level). Nominal level data is categorical in nature. Scalar data can be treated as either continuous or categorical, depending on the use case.

Q 12: A tailor wants to apply analytical models to lower down the waste cloth when he makes garments out of big cloth rolls. The data for the past five years is available in the form of a customer order sheet (having dimensions, color, size, pattern, details, etc.). The most obvious approach used for this should be: A) forecasting the demand of orders generated in the future. B) using the machine learning techniques to find out significant predictors of stock management. C) performing statistical analysis on available data sets and applying optimization models to reduce cloth waste. D) using simulation to find the probability distribution for cloth, used in garments and waste generated.

C) performing statistical analysis on the available data set and applying optimization models to reduce cloth waste. The problem does not require predicting the orders generated in the future, finding significant predictors of stock management, or simulating the setup for waste management. What needs to be analyzed is how the bigger rolls are used to make different types of garments and how the cutting should be sequenced, which can be done using statistical analysis on past order sheets and applying optimization models.

Q 89: which of the following nonparametric tests is an alternative to a one sample t-test? A) Mann-Whitney. B) Kruskal-Wallis. C) Wilcoxon signed rank. D) sign test

D) Sign test. Mann-Whitney is an alternative to the two sample t test; it is also known as the U test or the Wilcoxon rank-sum test. The sign test is an alternative to a one sample t test. Kruskal-Wallis is an alternative to ANOVA.

Q119: For project management, in time-cost trade-off function analysis a) cost decreases linearly as time increases b) cost at normal time is zero c) cost increases linearly as time increases d) cost cannot be determined

Q119: Correct Answer a) cost decreases linearly as time increases Explanation Trade-off functions always have a diminishing effect of one dimension on the other. For project management, cost decreases linearly as time increases as per the time cost-trade-off function analysis. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 7

Q121: If an opportunity cost value is used for an unused cell to test optimality in transportation problem, it should be a) equal to zero b) most negative number c) most positive number d) any value

Q121: Correct Answer b) most negative number Explanation The most negative number is used as the opportunity cost value for an unused cell to test optimality, as the value will not have an impact on existing solutions.

Q123: Every basic feasible solution of a general assignment problem, having a square pay-off matrix of order 'n', should have assignments equal to a) 2n + 1 b) 2n - 1 c) n^2 d) 2n

Q123: Correct Answer b) 2n - 1 Explanation In a transportation problem, there are (m + n) constraints, one for each source of supply and each destination, and m x n variables. Since all (m + n) constraints are equations, and since the transportation model is always balanced (total supply = total demand), one of these equations is extra (redundant). The extra constraint equation can be derived from the other constraint equations, without affecting the feasible solution. It follows that any basic feasible solution for a transportation problem must have exactly (m + n - 1) non-negative basic variables or allocations. The assignment problem is a special case of the transportation problem, so setting m = n gives 2n - 1 as the correct answer. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 3

Q126: If there are 'n' jobs to be performed, one at a time, on each of the 'm' machines, the possible sequences would be a) (n!)^m b) (m!)^n c) (n)^m d) (m)^n

Q126: Correct Answer a)(n!)^m Explanation If there are n jobs to be performed, one at a time, on each of m machines and the actual or expected time required by the jobs on each of the machines is also given, then the general sequencing problem is to find the sequence out of the (n!)^m possible sequences, which minimize the total elapsed time between the start of the job in the first machine and the completion of the last job on the last machine.
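
The (n!)^m count grows very quickly, which a short calculation makes visible. A minimal sketch; the function name `sequence_count` is hypothetical.

```python
import math

def sequence_count(n_jobs: int, m_machines: int) -> int:
    """Number of possible sequences: each machine can order the n jobs in n! ways."""
    return math.factorial(n_jobs) ** m_machines

print(sequence_count(3, 2))  # 36
print(sequence_count(5, 3))  # 1728000
```

Even five jobs on three machines give over a million sequences, which is why sequencing problems are usually solved with rules or heuristics instead of full enumeration.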

Q127: Which of the following is NOT a characteristic of Linear Programming? a) Resources must be limited b) Only one objective function c) Parameters value remains constant during the planning period d) The problem must be of minimization type

Q127: Correct Answer d) The problem must be of minimization type Explanation Linear Programming has only one objective function which can be either minimization or maximization problem. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 4

Q183: The variability of which of the following metrics is a prime indicator of the business risk for a firm? a) Net Present Value b) Net Sales c) Operating Profit Margin d) Operating Profit

Q183: Correct Answer c) Operating Profit Margin Explanation Operating profit margin is operating profit divided by Net sales. The variability of Operating Profit Margin is a prime indicator of the business risk for a firm. (Source: Investment Analysis and Portfolio Returns by Reilly and Brown, Cengage Learning) CAP Domain: 1

Q186: For any analytical project, what should be the FIRST step of the project? a) Understanding the business objectives b) Identifying key stakeholders c) Assessing the various Business Units related to the project d) Finding the internal sponsor of the project

Q186: Correct Answer a) Understanding the business objectives Explanation Understanding the business objective is the first stage of any project. Once objectives are clearly formulated, then only the other things come into consideration. (Source: CRISP-DM, https://www.the-modeling-agency.com/crisp-dm.pdf) CAP Domain: 1

Q197: Reframing of the business problem is required majorly due to a) Investment need changes continuously b) Survival of business due to changes in market and technology c) Change in the management team d) Lack of resources for the project execution

Q197: Correct Answer b) Survival of business due to changes in market and technology Explanation Reframing problems is not a luxury. On the contrary, all companies need to continually reframe their businesses in order to survive as markets and technology change. Framing and reframing of problems also opens up the door to innovative new ventures. (Source: Stanford Blog, https://stvp.stanford.edu/blog/shift-your-lens-the-power-of-re-framing-problems) CAP Domain: 7

Q200: Which is the best way to communicate with client? a) Through email b) Through Whatsapp c) Through telephonic call d) Through VoIP

Q200: Correct Answer a) Through email Explanation Written communication through formal channel with client is most preferred way of communication. VoIP should be used for presentations but formal communication is best through email. (Source: Understanding the Predictive Analytics Lifecycle by Alberto Cordoba, Wiley Publication) CAP Domain: 1

Q201: Which of the following BEST describes the data and information flow within an organization? a) Information assurance b) Information strategy c) Information mapping d) Information Architecture

Q201: Correct Answer d ) Information Architecture Explanation Information architecture refers to the analysis and design of the data stored by information systems, concentrating on entities, their attributes, and their interrelationships. It refers to the modeling of data for an individual database and to the corporate data models that an enterprise uses to coordinate the definition of data in several (perhaps scores or hundreds) distinct databases. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 3

Q203: A clothing company wants to use analytics to decide which customers to send a promotional catalogue in order to attain a targeted response rate. Which of the following techniques would be the most appropriate to use for making this decision? a) Integer programming b) Logistic regression c) Analysis of variance d) Linear regression

Q203: Correct Answer b) Logistic regression Explanation This type of classification model is often used to predict the outcome of a categorical dependent variable (response vs. no response) based on one or more predictor variables, so this is the most appropriate answer. The goal of the analytics in the stated problem is to determine who is most likely to respond, and the binary nature of this predicted outcome is provided by logistic regression. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 4

Q209: A furniture maker would like to determine the most profitable mix of items to produce. There are well-known budgetary constraints. Each piece of furniture is made of a predetermined amount of material with known costs, and demand is known. Which of the following analytical techniques is the MOST appropriate one to solve this problem? a) Optimization b) Multiple regression c) Data mining d) Forecasting

Q209: Correct Answer a) Optimization Explanation The problem statement describes an optimization problem: the furniture maker's objective function is to maximize his profit. The decision variables are the amount of each item to produce, and the constraints are that he must meet demand and be within his budget. Optimization is the most appropriate technique to solve this problem. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 4

Q214: Conjoint analysis in market research applications can a) give its best estimates of customer preference structure based on in-depth interviews with a small number of carefully chosen subjects b) only trade off relative importance to customers of features with similar scales c) allow calculation of relative importance of varying features and attributes to customers d) only trade off among a limited number of attributes and levels

Q214: Correct Answer c) allow calculation of relative importance of varying features and attributes to customers. Explanation Conjoint analysis by definition maps consumer preference structures into mathematical tradeoffs, and was designed to allow a marketer to compare the relative utility of varying features and attributes. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 2

Q216: The monthly profit made by a clothing manufacturer is proportional to the monthly demand, up to a maximum demand of 1000 units, which corresponds to the plant producing at full capacity. (Any excess demand over 1000 units will be satisfied by some other manufacturer, and hence yield no additional profit.) The monthly demand is uncertain, but the average demand is reliably estimated at 1000 units. At this level of demand the monthly profit is $3,000,000. Which of the following statements must be true of the expected monthly profit, P? a) P can have any positive value. b) P is possibly greater than $3,000,000. c) P is equal to $3,000,000 d) P is less than $3,000,000

Q216: Correct Answer d) P is less than $3,000,000 Explanation When the demand is 1000 or greater, the profit is $3,000,000. But when the demand is less than 1000, the profit is less than $3,000,000. Given this and that the average demand is 1000 units, the expected monthly profit must be less than $3,000,000. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 2
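
A small numeric illustration of why capping profit at capacity pulls the expectation below $3,000,000. The demand scenarios and their probabilities are hypothetical; they are chosen only so that the average demand is 1000 units, as in the question.

```python
CAPACITY = 1000
PROFIT_PER_UNIT = 3_000_000 / CAPACITY  # $3,000 per unit, implied by the question

def monthly_profit(demand: float) -> float:
    """Profit is proportional to demand but capped at plant capacity."""
    return PROFIT_PER_UNIT * min(demand, CAPACITY)

# Hypothetical demand distribution: demand value -> probability.
scenarios = {800: 0.25, 1000: 0.50, 1200: 0.25}

expected_demand = sum(d * p for d, p in scenarios.items())
expected_profit = sum(monthly_profit(d) * p for d, p in scenarios.items())

print(expected_demand)  # 1000.0
print(expected_profit)  # 2850000.0
```

Months with demand above 1000 earn no more than $3,000,000, while months below 1000 earn less, so the shortfalls are never offset.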

Q224: A preferred method or best practice for organizing data in a data warehouse for reporting and analysis is a) transactional-based modeling b) multi-dimensional modeling c) relation-based modeling d) tuple-based modeling

Q224: Correct Answer b ) multi-dimensional modeling Explanation Multi-dimensional modeling is the optimum way to organize data in a data warehouse for analysis. It is associated with OLAP (On-line Analytical Processing). OLAP data is organized in cubes that can be taken directly from the data warehouse for analysis. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 3

Q240: You are given a problem by a client in which you need to determine the right amount to be purchased from what location so the total cost of manufacturing, transportation, and duties is minimized. The first methodology to come in mind to model this problem is a) Step-wise regression b) Mixed-integer programming c) Linear programming d) Logistic regression

Q240: Correct Answer b) Mixed-integer programming Explanation Regression predicts a dependent variable from independent variables, whereas this problem asks for cost-minimizing purchase decisions subject to constraints, which calls for mixed-integer programming (MIP). (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 2

Q247: If small orders are placed frequently, instead of placing large orders infrequently, then total inventory cost is a) reduced b) increased c) either reduced or increased d) minimized

Q247: Correct Answer c) either reduced or increased Explanation Placing small orders frequently may increase or reduce the total inventory cost, depending on the price and the costs related to the product. (Source: Operations Research By Kanti Swarup, PK Gupta and Manmohan; S.Chand & Sons Publication, India) CAP Domain: 6

Q248: Queue can form only when a) arrivals exceed service capacity b) arrivals equals service capacity c) service facility is capable to serve all the arrivals at a time d) there are more than one service facilities

Q248: Correct Answer a) arrivals exceed service capacity Explanation Arrivals can accumulate if they are not provided service on time and thus queues can be formed. (Source: Operations Research By Kanti Swarup, PK Gupta and Manmohan; S.Chand & Sons Publication, India) CAP Domain: 5

Q128: Which of the following is NOT a limitation of Linear programming based model? a) The decision variables can solve only maximization problem b) The relationship among decision variables is linear c) No guarantee to get integer valued solutions d) No consideration of effect of time and uncertainty on the model

a) The decision variables can solve only maximization problem Explanation Linear Programming has only one objective function which can be either minimization or maximization problem. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 4

Q 112: which of these is not used for post hoc analysis after conducting ANOVA or MANOVA? A) Tukey, LSD B) Tukey HSD C) Newman Keuls D) Mann-Whitney.

d) Mann-Whitney Explanation Mann-Whitney test is non-parametric version of t-test and is not used for post-hoc analysis after ANOVA or MANOVA. (Source: Multivariate Data Analysis by Hair, Black, Babin, Anderson and Tatham, Pearson) CAP Domain: 5

Q 14: supervised and unsupervised learning techniques are part of A) machine learning techniques. B) simulation techniques. C) data visualization techniques. D) group communication techniques.

A) machine learning techniques. Supervised and unsupervised learning are two categories of machine learning algorithms.

Q 88: which of the following nonparametric tests is an alternative to two sample t-test? A) Mann-Whitney. B) Kruskal-Wallis. C) Wilcoxon signed rank. D) sign test

A) Mann-Whitney. Mann-Whitney is an alternative to the two sample t test; it is also known as the U test or the Wilcoxon rank-sum test. The sign test is an alternative to a one sample t test. Kruskal-Wallis is an alternative to ANOVA.

Q 27: existence of a perfect or exact linear relationship among some or all explanatory variables of a regression model is termed as: A) multi-collinearity B) homoscedasticity. C) heteroscedasticity. D) autocorrelation.

A) Multi-collinearity If regressors are perfectly correlated to each other, then there is multicollinearity in the regression model.

Q 58: given N number of trials, P as the probability of success of one trial, and Q as the probability of failure of one trial, what will be the mean of the binomial distribution? A) NP B) N/P C) NPQ D) N/PQ

A) NP The mean is given by NP and the variance is given by NPQ in a binomial distribution.
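
The mean and variance formulas can be confirmed by summing over the binomial probability mass function directly. A minimal sketch; the function name and the values N = 10, P = 0.3 are illustrative.

```python
import math

def binomial_moments(n: int, p: float):
    """Mean and variance computed from the binomial pmf, without using NP or NPQ."""
    q = 1 - p
    pmf = [math.comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    variance = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, variance

mean, variance = binomial_moments(10, 0.3)
print(math.isclose(mean, 10 * 0.3))            # True: mean = NP
print(math.isclose(variance, 10 * 0.3 * 0.7))  # True: variance = NPQ
```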

Q 38: which of the following is most appropriate relation for the data having symmetric distribution? A) mean = median = mode B) mean > median > mode C) mean < median < mode. D) mean > median < mode

A) mean = median = mode For a symmetric distribution, approximately mean = median = mode, and all are located at the center of the distribution.

Q 43: what is the shape of the normally distributed data? A) bell curve B) U curve C) T curve D) sinusoidal wave

A) bell curve Normally distributed data follows a bell shaped curve, with the mean, median, and mode located approximately at the center of the distribution.

Q 91: cross tabulation generally holds which of the following? A) frequency of the data. B) actual value of the data. C) mean of the data D) standard deviation of the data

A) frequency of the data. Cross tabulations are generally carried out on categorical data on two or more dimensions to store the frequency of the data.

Q 96: what is an OGIVE for data visualization? A) graph of cumulative distribution. B) graph showing relationship between two variables. C) graph showing quartiles of the data. D) graph showing a box and whisker plot.

A) graph of cumulative distribution. OGIVE is a graph of a cumulative distribution showing data value on the horizontal axis, and either the cumulative frequencies, or the cumulative relative frequencies, or the cumulative percent frequencies on the vertical axis.
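
The points plotted on an ogive are just running totals of the frequencies. The sketch below builds them for a small hypothetical data set; the function name `ogive_points` is an assumption for illustration.

```python
from collections import Counter
from itertools import accumulate

def ogive_points(data):
    """Return (value, cumulative frequency) pairs, sorted by value."""
    counts = Counter(data)
    values = sorted(counts)
    cumulative = accumulate(counts[v] for v in values)
    return list(zip(values, cumulative))

scores = [2, 3, 3, 5, 5, 5, 8]  # hypothetical observations
print(ogive_points(scores))  # [(2, 1), (3, 3), (5, 6), (8, 7)]
```

Dividing each cumulative frequency by the number of observations would give the cumulative relative frequency version of the same graph.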

Q 11: An online retail company wants to deliver products to their customers as per the following constraints - (A) there are limited numbers of courier staff (B) the courier staff can have limited weight to be carried in a single day (C) there is priority associated with every order (D) maximum number of orders must be delivered on time. Which of the following techniques would be best suited for solving this problem? A) linear programming optimization. B) Monte Carlo simulation. C) multiple linear regression. D) CHAID classification tree model

A) linear programming optimization. This is a clear case of optimization where delivery of the orders needs to be maximized with given constraints of quantity that can be delivered with their given priorities. No simulation of the situation is required. No prediction model needs to be generated to predict the orders delivered and nothing needs to be classified to see order has been delivered or not. It is a case of linear programming.

Q4: Which of the following is not a smoothing method for timeseries data? A) logarithmic. B) exponential. C) moving averages. D) weighted moving averages

A) logarithmic. Three popular smoothing methods are moving averages, weighted moving averages, and exponential smoothing. The smoothing methods are appropriate for a stable timeseries i.e. one that exhibits no significant trend, cyclical, or seasonal effects, because they adapt well to changes in the level of the time series.
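
The three smoothing methods named above can be sketched in a few lines each. The function names, the weights, the smoothing constant, and the sample series are hypothetical, and the exponential version follows one common convention that starts from the first observation.

```python
def moving_average(series, window):
    """Simple average of the most recent `window` values at each step."""
    return [sum(series[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(series))]

def weighted_moving_average(series, weights):
    """Weighted average of recent values; the last weight applies to the newest value."""
    w, total = len(weights), sum(weights)
    return [sum(x * wt for x, wt in zip(series[i - w + 1 : i + 1], weights)) / total
            for i in range(w - 1, len(series))]

def exponential_smoothing(series, alpha):
    """Each smoothed value blends the new observation with the previous smoothed value."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

demand = [10, 12, 11, 13, 12]
print(moving_average(demand, 3))           # [11.0, 12.0, 12.0]
print(exponential_smoothing(demand, 0.5))  # [10, 11.0, 11.0, 12.0, 12.0]
```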

Q 3: Which component of the time series results in periodic above trend and below trend behavior of the time series lasting more than one year? A) trend component B) cyclical component. C) seasonal component. D) irregular component.

B) cyclical component. Irregular component is random pattern; seasonal component shows a periodic pattern over one year or less; trend is the long run shift or movement in time series observable over several periods of time.

Q 2: Collaborative filtering is a type of A) image editing process. B) data mining based recommender system. C) data cleaning tool. D) team building activity.

B) data mining based recommender system. Collaborative filtering is a type of data mining based recommender system in which user similarity is measured based on their transactions, and an item is recommended based on that user similarity. It is generally used on online retail websites like Amazon.

Q 95: a new company wants to enter electronics industry. As per their CEO, the first step is to understand the market trends for sale of electronic goods in different regions, and across different product categories. What should be the most appropriate set of analytical techniques adopted by them for this step? A) prescriptive analytics B) descriptive analytics C) predictive analytics. D) operations research.

B) descriptive analytics. The first step is to describe the data collected from various sources, therefore, descriptive analytics should be applied at the first stage. No predictions or prescriptions are being done at the first stage. No operation strategy is required.

Q49: For which type of decision maker is the following statement true: on the utility curve, as the monetary value increases, the utility increases at a decreasing rate. A) risk seeker. B) risk avoider C) risk indifferent. D) utility curve cannot define risk-taking ability.

B) risk avoider. A utility curve is a graph that reveals the relationship between utility and monetary values. When this curve has been constructed, utility values from the curve can be used in the decision-making process. Utility that increases at an increasing rate is characteristic of risk seekers, and utility that increases at a decreasing rate is characteristic of risk avoiders.

Q 77: Which of these is the most suitable difference between online analytical processing (OLAP) and online transaction processing (OLTP)? A) OLAP is highly detailed while OLTP is consolidated. B) OLTP functions for long-term informational requirements, while OLAP is used for day-to-day operations. C) OLTP has thousands of user support, while OLAP has just a few hundred user support. D) OLTP is analyzed on query throughput while OLAP is assessed on transaction throughput

C) OLTP has thousands of user support while OLAP has just a few hundred user support. Since OLAP is consolidated and used for long-term informational requirements, it needs to support far fewer users. The middle and top layers of management use OLAP, while the bottom layer uses OLTP.

Q 53: A process control chart that tracks the range within a sample and indicates that a gain or loss of uniformity has occurred in a production process is a: A) C chart B) P chart C) R chart D) X chart

C) R chart. A quality control chart that is used to control the number of defects per unit of output is a C chart. A quality control chart that is used to control attributes is a P chart. A process control chart that tracks the range within a sample and indicates that a gain or loss of uniformity has occurred in the production process is an R chart. A quality control chart for variables that indicates when changes occur in the central tendency of a production process is an X chart.

Q 93: which of the following is nonmetric data? A) ordinal level. B) nominal level. C) both. D) none.

C) both. Both are categorical levels of measurement and are hence known as nonmetric data types.

Q 44: Which of these is not true of the difference between system dynamics (SD) and discrete event simulation (DES)? A) SD models the behavior of systems using differential equations, while DES models use a simulation clock that advances time in fixed increments. B) SD models attempt to capture all of the aspects of a process within a closed system, while DES models more often reflect systems where entities are processed in a linear fashion. C) DES models are used when the goal is a statistically valid estimate of system performance; SD is more often the tool of choice for a training vehicle. D) A major part of the DES modeling effort is associated with capturing the mental models, while SD models are often built from a process map, or flow chart

D) a major part of the DES modeling effort is associated with capturing the mental models, while SD models are often built from a process map, or flow chart. A major part of the SD modeling effort is associated with capturing the mental models, while, DES models are often built from a process map, or flow chart.

Q179 : Based on Net Present Value ( NPV ) , the project is definitely rejected when a ) NPV > 0 b ) NPV = 0 c ) NPV < 0 d ) NPV - 100

Q179: Correct Answer c) NPV < 0 Explanation When NPV is less than zero, the project is definitely loss-making. At NPV equal to zero, the project may or may not be accepted. A project with NPV greater than zero is mostly accepted unless it is not feasible to execute. (Source: Managerial Economics and Financial Accounting by Reddy and Saraswathi, PHI learning) CAP Domain: 1

Q 66: what happens to the width of the confidence interval as we increase the confidence level of our study? (E.g., change in confidence level from 90% to 95%) A) width increases B) width decreases C) width remains constant. D) first increases and decreases.

A) width increases. A rise in the confidence level increases the width of the confidence interval, because a wider interval is needed to capture the true parameter with greater confidence.
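
The effect can be checked numerically for the simplest case, a z-interval for a mean with known standard deviation. This is a minimal sketch under that assumption; `ci_width`, `sigma=10` and `n=100` are illustrative values, not from the source.

```python
from statistics import NormalDist


def ci_width(confidence, sigma, n):
    """Full width of a two-sided z-interval for a mean with known sigma."""
    # Critical value z such that the central area under N(0, 1) equals `confidence`.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / n ** 0.5


for level in (0.90, 0.95, 0.99):
    print(level, round(ci_width(level, sigma=10, n=100), 3))
```

With the sample size and standard deviation held fixed, the only thing that changes is the critical value, which grows with the confidence level.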

Q 79: how many dimensions of data can be mapped to the data cube in OLAP? A) 1 B) 2 C) 3 D) any

D) any. A data cube can model n-dimensional data.

Q220: A segmentation of customers who shop at a retail store may be performed using which of the following methods? a) Monte Carlo, Markov Chain and ANOVA b) Clustering, factor and control charts c) Decision tree and recursive function analysis d) Clustering and decision tree

Q 220: correct answer d ) clustering and decision tree. Explanation Customer segmentation consists of dividing a customer base into groups of individuals that are similar in specific ways relevant to marketing, e.g., age, gender, interests, spending habits and so on. The purpose of customer segmentation is to allow a company to target specific groups of customers effectively and allocate marketing resources to best effect. Two ways to do this segmentation are clustering and decision trees. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 2

Q102: the most important challenge for an ETL tool is. A) to handle big data. B) to create prediction models. C) to collect data from different sources. D) to create a metadata repository.

Q102: Correct Answer c) To collate data from different sources Explanation An ETL tool first extracts the data, and doing so from multiple sources is its biggest challenge. ETL tools are now proficient enough to handle big data and to create a metadata repository. Creating prediction models is never a primary role of an ETL tool. (Source: Kimball & Caserta http://users.itk.ppke.hu/~szoer/DW/Kimball%20&%20Caserta%20-The%20Data%20Warehouse%20ETL%20Toolkit%20%5BWiley%202004%5D.pdf)

Q104: a detergent company wants to statistically compare the satisfaction level of their product across three cities, through a survey conducted in the cities on product satisfaction, being measured on a scale of 1 to 5, respectively. Which is the most appropriate method to be used for this purpose? A) ANCOVA B) ANOVA C) MANOVA D) MANCOVA.

Q104: Correct Answer b) ANOVA Explanation ANOVA is used for measuring the equality of three or more population means using data obtained from an observational study. In this case, there is a scalar variable being measured, which can be compared across three cities using ANOVA. (Source: Statistics for Business and Economics by Anderson, Sweeney, Williams, Cengage Learning) CAP Domain: 4

Q105: As per the following figure, which is the most appropriate comment on bias-variance trade-off? There is an image of a bull's-eye with multiple arrows very close to the bull's-eye clustered close together. A) high bias high variance B) high bias low variance. C) low bias low variance. D) low bias high variance.

Q105: Correct Answer d) Low Bias High Variance Explanation The model is predicting values close to the desired values but the variability is high as the predicted values are scattered. (Source: An Introduction to Machine Learning http://web.ipac.caltech.edu/staff/fmasci/home/astro_refs/ML_inR.pdf) CAP Domain: 5

Q106: An ideal prediction model developed using machine learning algorithms should have: A) high bias high variance B) high bias low variance. C) low bias low variance. D) low bias high variance.

Q106: Correct Answer c) Low Bias Low Variance Explanation The ideal model should predict values close to the desired values and the variability should be low as the predicted values must remain as close as possible to the actual values. (Source: An Introduction to Machine

Q107: which of the following is a quantitative survey reliability testing method? A) Little's MCAR test B) Cronbach alpha C) Durbin Watson D) Shapiro Wilk test

Q107: Correct Answer b) Cronbach Alpha Explanation Cronbach alpha is a common method to test survey reliability. Little's MCAR test is used in missing value analysis, to check whether data are missing completely at random. Durbin Watson is used to check auto-correlation. Shapiro Wilk test is used to check normality of the data. (Source: https://notendur.hi.is/adg11/Proffraedi/Reliability%20of%20measurement%20scales.pdf) CAP Domain: 3

Q 109: An experiment has three groups in the independent variable and three dependent variables. Which test is most suited for this experiment? A) ANOVA B) MANOVA C) T test. D) Hotelling T-square.

Q109: Correct Answer b) MANOVA Explanation The T-test is used for two groups and one dependent variable. ANOVA is used for two or more groups and one dependent variable. Hotelling T-square is used for two groups and two or more dependent variables. MANOVA is used for multiple groups and multiple dependent variables. (Source: Multivariate Data Analysis by Hair, Black, Babin, Anderson and Tatham, Pearson) CAP Domain: 4

Q110: In an experiment involving ANCOVA analysis, there were 100 respondents and five groups. What is the maximum number of covariates possible for this experiment? A) 9 B) 8 C) 7 D) 6

Q110: Correct Answer d) 6 Explanation Maximum no of covariates = (0.10*Sample Size) - (Number of Groups - 1) = (0.10*100)-(5-1)=6
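
The rule of thumb quoted in this explanation is easy to wrap in a helper for checking other sample sizes. A minimal sketch: the function name is made up, and rounding down when 10% of the sample size is not a whole number is an assumption.

```python
from math import floor


def max_covariates(sample_size, n_groups):
    """Rule of thumb from the explanation: 0.10 * sample size - (groups - 1)."""
    # Assumption: round 10% of the sample size down to a whole number.
    return floor(sample_size / 10) - (n_groups - 1)


print(max_covariates(100, 5))
```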

Q 111: Which of the following is not an assumption of MANOVA analysis? A) observations must be independent. B) the residuals must follow normal distribution. C) Variance-covariance matrices must be equal for all treatment groups. D) the set of dependent variables must follow a multivariate normal distribution.

Q111: Correct Answer b) The residuals must follow normal distribution Explanation There is no concept of residuals in MANOVA analysis. (Source: Multivariate Data Analysis by Hair, Black, Babin, Anderson and Tatham, Pearson) CAP Domain: 4

Q 113: A marketer from a soap company wants to assess the acceptance of product features such as cleaning ingredients, form of product, and brand name. The marketer constructs the respondents' preference structure, depicting how the different levels within a factor influence the formation of an overall preference. Which of the following is the most suitable technique for the marketer to decide about the features of the product? A) conjoint analysis B) regression analysis. C) discriminant analysis. D) analysis of variances

Q113: Correct Answer a) Conjoint Analysis Explanation Conjoint analysis is a multivariate technique developed specifically to understand how respondents develop preferences for any type of object. It is based on the simple premise that consumers evaluate the value of an object by combining the separate amounts of value provided by each attribute. (Source: Multivariate Data Analysis by Hair, Black, Babin, Anderson and Tatham, Pearson) CAP Domain: 2

Q 114: confirmatory factor analysis is generally used under which of the following techniques? A) multi dimensional scaling B) conjoint analysis. C) structural equation modeling. D) discriminant analysis.

Q114: Correct Answer c) Structural Equation Modelling Explanation Confirmatory Factor Analysis (CFA) is usually used for SEM. (Source: Multivariate Data Analysis by Hair, Black, Babin, Anderson and Tatham, Pearson) CAP Domain: 4

Q 115: what is the expected value (EV) of node one? A) 62.5 B) 167.5 C) 222.5 D) 155.5.

Q115: Correct Answer d) 155.5 Explanation EV(Node 2) = 0.75*50 + 0.25*100 = 62.5; EV(Node 3) = 0.05*500 + 0.95*150 = 167.5; EV(Node 4) = 0.45*250 + 0.55*200 = 222.5; EV(Node 1) = 0.35*EV(Node 2) + 0.20*EV(Node 3) + 0.45*EV(Node 4) = 0.35*62.5 + 0.20*167.5 + 0.45*222.5 = 21.875 + 33.5 + 100.125 = 155.5 (Source: Statistics for Business and Economics by Anderson, Sweeney, Williams, Cengage Learning) CAP Domain: 1
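
The roll-back arithmetic in this explanation can be reproduced with a short script. The decision-tree figure itself is not available here, so the branch probabilities and payoffs below are taken from the explanation's arithmetic; the function and variable names are illustrative.

```python
def expected_value(branches):
    """Expected value of a chance node given (probability, payoff) pairs."""
    return sum(p * v for p, v in branches)


# Leaf-level chance nodes, as read off the explanation.
ev2 = expected_value([(0.75, 50), (0.25, 100)])
ev3 = expected_value([(0.05, 500), (0.95, 150)])
ev4 = expected_value([(0.45, 250), (0.55, 200)])

# Node 1 rolls the three sub-nodes back using its own branch probabilities.
ev1 = expected_value([(0.35, ev2), (0.20, ev3), (0.45, ev4)])

evs = {"node 1": ev1, "node 2": ev2, "node 3": ev3, "node 4": ev4}
print(evs)
print(max(evs, key=evs.get))
```

The last line also settles the follow-up question about which node has the highest expected value.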

Q 116: which of the following nodes has the highest expected value (EV)? A) node 1 B) node 2 C) node 3 D) node 4

Q116: Correct Answer d) Node 4 Explanation EV(Node 2) = 0.75*50 + 0.25*100 = 62.5; EV(Node 3) = 0.05*500 + 0.95*150 = 167.5; EV(Node 4) = 0.45*250 + 0.55*200 = 222.5; EV(Node 1) = 0.35*EV(Node 2) + 0.20*EV(Node 3) + 0.45*EV(Node 4) = 0.35*62.5 + 0.20*167.5 + 0.45*222.5 = 21.875 + 33.5 + 100.125 = 155.5 (Source: Statistics for Business and Economics by Anderson, Sweeney, Williams, Cengage Learning) CAP Domain: 1

Q117 : The main objective of Network Analysis in Project Management is to a ) minimize total project duration b ) minimize total project cost c ) minimize production delays d ) minimize production interruption and conflicts

Q117: Correct Answer a) minimize total project duration Explanation The main objective of Network Analysis is to minimize the total project duration. The other options are not what project network analysis examines. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 2

Q118 : If an activity has zero slack in critical path method , it implies that a ) it is a dummy activity b ) it lies on the critical path c ) the project is progressing well d ) the project will not complete on time

Q118: Correct Answer b) it lies on the critical path Explanation Slack is used to place an activity on the critical path and zero slack activity is mostly placed on the critical path. It is not related to dummy activity and does not relate to the project completion. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 3

Q122 : The solution to a transportation problem with m - rows ( supplies ) and n - columns ( destination ) is feasible if number of positive allocations is a ) m + n b ) mxn c ) m + n - 1 d ) m + n + 1

Q122: Correct Answer c) m+n-1 Explanation In a transportation problem, there are (m + n) constraints, one for each source of supply and each destination, and m x n variables. Since all (m + n) constraints are equations, and since the transportation model is always balanced (total supply = total demand), one of these equations is extra (redundant). The extra constraint equation can be derived from the other constraint equations, without affecting the feasible solution. It follows that any feasible solution for a transportation problem must have exactly (m + n - 1) non-negative basic variables or allocations. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 3

Q124 : If there were ' n ' workers and ' n ' jobs , there would be a ) n ! Assignment solutions b ) ( n - 1 ) ! Assignment solutions c ) ( n ! ) ^ n Assignment solutions d ) n Assignment solutions

Q124: Correct Answer a) n! Assignment solutions Explanation First worker can be assigned n jobs, second worker can be assigned from (n-1) jobs, third worker can be assigned from (n-2) jobs, ..., nth worker will have last job left. This implies total solutions are n! (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 3

Q125 : For a salesman who has to visit ' n ' cities , which of the following are the ways of his tour plan as per the travelling salesman problem a ) n ! b ) ( n - 1 ) ! c ) ( n + 1 ) ! d ) n

Q125: Correct Answer b) (n-1)! Explanation For a travelling salesman problem, the salesman is assumed to start from his home city (allocated from the given n cities), thus the total solution reduces to (n-1)!
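
Both counting results, n! assignments for n workers and n jobs and (n-1)! tours once the home city is fixed, can be confirmed by brute-force enumeration for small n. This is an illustrative sketch; the function names are made up, and tours are counted as ordered visit sequences (a route and its reverse count separately), which matches the (n-1)! figure above.

```python
from itertools import permutations
from math import factorial


def count_assignments(n):
    """Each permutation of jobs is one way to give n workers one job each."""
    return sum(1 for _ in permutations(range(n)))


def count_tours(n):
    """Fix city 0 as the home city and order the remaining n - 1 cities."""
    return sum(1 for _ in permutations(range(1, n)))


for n in range(2, 7):
    print(n, count_assignments(n), factorial(n), count_tours(n), factorial(n - 1))
```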

Q129: The mathematical model of a Linear Programming problem is important because a) It enables the use of algebraic technique b) Decision-makers prefer to work with formal models c) It helps in converting the verbal description and numerical data into mathematical expression D) it captures the relevant relationship among decision factors

Q129: Correct Answer c) It helps in converting the verbal description and numerical data into mathematical expression Explanation The most relevant answer is that LP model helps in converting the verbal description and numerical data into mathematical expression. Usage of algebraic technique is not primary goal. Decision makers need appropriate solutions which may be through formal or informal methods. Capturing relevant relationships among decision factors is also the next stage. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 2

Q130: While solving a Linear Programming Model graphically, the area bounded by the constraints is called a) Unbounded Region b) Redundant region c) Feasible Region d) Infeasible Region

Q130: Correct Answer c) Feasible Region Explanation Graphically, the Feasible region is the area bounded by the constraints. Beyond that, it is the infeasible region. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 5

Q131: A redundant constraint in Linear Programming model will a) Affect the feasible solution region b) Not affect the feasible solution region c) Cause difficulty in solving a Linear Programming problem graphically d) Convert the maximization problem into a minimization problem

Q131: Correct Answer b) Not affect the feasible solution region Explanation A redundant constraint does not affect the feasible solution region or space, and thus redundancy of any constraint does not cause any difficulty in solving an LP problem graphically. A constraint is said to be redundant when another constraint is more binding (restrictive) than it.

Q132: Shadow price indicates how much one unit change in the resource value will change the a) optimality range of an objective function b) optimal value of the objective function c) value of the basic variable in the optimal solution d) sign of the objective function

Q132: Correct Answer b) optimal value of the objective function Explanation Shadow price is defined as the rate of change in the optimal objective function value with respect to the unit change in the availability of the resources. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 1

Q133: The shadow price for a resource is the a) price that is paid for the purchase of a resource b) saving by eliminating one of the excess quantities of resource c) increase in the objective function value by providing one additional unit of resource d) cost incurred for removing a resource

Q133: Correct Answer c) increase in the objective function value by providing one additional unit of resource Explanation Shadow price is defined as the rate of change in the optimal objective function value with respect to the unit change in the availability of the resources. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 1
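
The definition of shadow price can be illustrated by solving a small LP twice, once as given and once with one extra unit of a resource, and comparing the optimal objective values. This is a sketch under stated assumptions: the LP below (maximise 3x + 5y subject to x <= 4, 2y <= 12, 3x + 2y <= 18) is a textbook-style example chosen for illustration and is not from the source, the helper names are made up, and the corner-point search only works for two-variable problems with a bounded feasible region.

```python
from fractions import Fraction
from itertools import combinations


def solve_lp_2d(c, constraints):
    """Maximise c[0]*x + c[1]*y s.t. a*x + b*y <= r for each (a, b, r), x, y >= 0.

    Checks every corner point; assumes a bounded, non-empty feasible region.
    """
    rows = [(Fraction(a), Fraction(b), Fraction(r)) for a, b, r in constraints]
    # Non-negativity written as -x <= 0 and -y <= 0.
    rows += [(Fraction(-1), Fraction(0), Fraction(0)),
             (Fraction(0), Fraction(-1), Fraction(0))]
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines never meet in a corner
        x = (r1 * b2 - r2 * b1) / det  # Cramer's rule
        y = (a1 * r2 - a2 * r1) / det
        if all(a * x + b * y <= r for a, b, r in rows):
            value = c[0] * x + c[1] * y
            if best is None or value > best:
                best = value
    return best


def shadow_price(c, constraints, index):
    """Change in the optimal objective when one resource gets one more unit."""
    a, b, r = constraints[index]
    bumped = list(constraints)
    bumped[index] = (a, b, r + 1)
    return solve_lp_2d(c, bumped) - solve_lp_2d(c, constraints)


objective = (3, 5)
resources = [(1, 0, 4), (0, 2, 12), (3, 2, 18)]
print(solve_lp_2d(objective, resources))
print([shadow_price(objective, resources, i) for i in range(3)])
```

A resource whose constraint is not binding at the optimum gets a shadow price of zero, since an extra unit of it cannot improve the objective.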

Q134: If dual has an unbounded solution, primal has a) no feasible solution b) unbounded solution c) feasible solution d) bounded solution

Q134: Correct Answer a) no feasible solution Explanation If either the primal or dual problem has an unbounded objective function value, the other problem has no feasible solution, respectively. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 6

Q135: The entering variable in the post-optimality analysis of the objective function coefficients is always a a) Decision Variable b) Non-basic Variable c) Basic Variable d) Slack Variable

Q135: Correct Answer d) Slack Variable Explanation In post-optimality analysis or sensitivity analysis, the slack variable enters in the objective function coefficients. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 4

Q136: Post-optimality analysis is used for a) Finding the feasible solution to a Linear Programming Problem b) Formulating the linear programming problem c) Knowing the effect on optimal solution of Linear Programming model due to variations in the input coefficients one at a time d) conducting the shadow price analysis on a linear programming model

Q136: Correct Answer c) Knowing the effect on optimal solution of Linear Programming model due to variations in the input coefficients one at a time Explanation Sensitivity analysis or post-optimality analysis is the study of knowing the effect on optimal solution of Linear Programming model due to variations in the input coefficients one at a time. The study of duality does not tell the magnitude in improvement of objective function value, but sensitivity analysis does that. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 4

Q137: The Binary (0-1) integer programming problem a) requires the decision variables to have values between zero and one b) requires that all the constraints have coefficients between zero and one c) requires that all the decision variables have coefficients between zero and one d) requires that objective function have only two values - zero or one

Q137: Correct Answer a) requires the decision variables to have values between zero and one Explanation The Binary (0-1) integer programming problem restricts the decision variables to the values zero and one, that is, each decision variable is either 0 or 1. The coefficients need not be zero or one. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 4

Q138: In Goal programming problem, goals are assigned priorities such that a) goals of greatest importance are given lowest priority b) goals may not have equal priority c) higher priority goals must be achieved before lower priority goals d) lower priority goals must be achieved before higher priority goals

Q138: Correct Answer c) higher priority goals must be achieved before lower priority goals Explanation An important feature of Goal Programming is that the goals are satisfied in ordinal sequence, i.e. the solution to goal problem involves achieving some higher order/ priority goals first, before the lower order/priority goals are considered. Each goal is achieved in sequential order. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 1

Q139: Which of the following criterion is not applicable to decision making under risk? a) Maximize expected return b) Maximize return c) Minimize expected regret d) knowledge of likelihood occurrence of each state of nature

Q139: Correct Answer b) Maximize return Explanation Under risk, the decision maker has less than complete knowledge of the consequences of every decision choice. So absolute return maximization alone is not an appropriate criterion. Expected utility is the best way to make a decision. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 1

Q140: Which of the following criterion is not applicable to decision making under uncertainty? a) maximin b) maximax c) minimax d) minimize expected opportunity loss

Q140: Correct Answer d) minimize expected opportunity loss Explanation Under uncertainty, the decision maker is unable to specify the probabilities with which the various states of nature will occur. Therefore, decision is arrived only on the actual conditional payoff values. So opportunity loss is not applicable under uncertainty but is applicable under risk. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 1
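
The criteria that do apply under uncertainty work directly on the payoff table, with no probabilities involved. A minimal sketch follows; the payoff table is hypothetical, the function names are made up, and reading option c) "minimax" as the minimax regret criterion is an assumption.

```python
# Hypothetical payoffs: rows are alternatives, columns are states of nature.
payoffs = {
    "expand": [50, 20, -10],
    "hold": [30, 25, 10],
    "shrink": [15, 15, 15],
}


def maximin(table):
    """Pessimistic: pick the alternative whose worst payoff is best."""
    return max(table, key=lambda alt: min(table[alt]))


def maximax(table):
    """Optimistic: pick the alternative whose best payoff is best."""
    return max(table, key=lambda alt: max(table[alt]))


def minimax_regret(table):
    """Pick the alternative whose largest regret (opportunity loss) is smallest."""
    n_states = len(next(iter(table.values())))
    best = [max(row[j] for row in table.values()) for j in range(n_states)]
    worst_regret = {
        alt: max(best[j] - row[j] for j in range(n_states))
        for alt, row in table.items()
    }
    return min(worst_regret, key=worst_regret.get)


print(maximin(payoffs), maximax(payoffs), minimax_regret(payoffs))
```

Minimizing expected opportunity loss would additionally need a probability for each state of nature, which is exactly what is unavailable under uncertainty.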

Q141: If the unit cost rises, then the optimal/economic order quantity a) increases b) decreases c) either increases or decreases d) does not change

Q141: Correct Answer b) decreases Explanation Increasing the cost of the item will decrease the order quantity for a given budget. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 7
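
One way to see this is through the classic EOQ formula, sqrt(2 * D * S / H), when the holding cost H is taken as a fixed fraction of the unit cost. That proportional holding cost is an assumption of this sketch (the source does not state the formula), and the demand, ordering cost, and rate values are made up.

```python
from math import sqrt


def eoq(annual_demand, order_cost, unit_cost, holding_rate):
    """Classic EOQ with holding cost per unit = holding_rate * unit_cost (assumed)."""
    return sqrt(2 * annual_demand * order_cost / (holding_rate * unit_cost))


for unit_cost in (6, 12, 24):
    print(unit_cost, round(eoq(1200, 100, unit_cost, 0.25), 1))
```

Under this assumption, a higher unit cost raises the cost of holding stock, so the order quantity that balances ordering and holding costs shrinks.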

Q143: In queuing theory, a calling population or input source is considered to be infinite when a) all customers arrive at once b) arrivals are independent of each other c) capacity of the system is infinite d) service rate is faster than the arrival rate

Q143: Correct Answer b) arrivals are independent of each other Explanation If customer arrivals are independent of each other, then the calling population is considered to be infinite.

Q144: Service mechanism in a queuing system is characterized by a) Server's Behaviour b) Customer's Behaviour c) Customers in the System d) Calling population

Q144: Correct Answer a) Server's Behavior Explanation Server behavior determines the service mechanism in a queuing system. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 5

Q147: There are 3 dairies A, B and C in a small town which supply all the milk consumed in the town. Assume that the initial consumer sample is composed of 1000 respondents distributed over the three dairies. It is known by all the dairies that consumers switch from one dairy to another due to advertising, price and dissatisfaction. All these dairies maintain records of the number of their customers and the dairy from which they obtained each new customer. Assume that the matrix of transition probabilities remains fairly stable and at the beginning of period one market shares are 25%, 45% and 30%, respectively. The information on the result of flow of customers over an observation period of one month is given, and the manner in which customers were gained or lost by dairies is also given. Which of the following is the MOST appropriate technique to analyse the problem? a) Monte Carlo Simulation b) Markov Chain Analysis c) Linear Programming Model d) Assignment Model

Q147: Correct Answer b) Markov Chain Analysis Explanation Markov chain analysis is best because there are finite number of possible states in the problem which are both collectively exhaustive and mutually exclusive. The transition probabilities are dependent only on the current state of the system. The long-run probability of being in a particular state will be constant over time. Most of the characteristic of Markov Chain are getting fulfilled, so this is the BEST approach to analyse the given problem. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 2
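
A Markov chain analysis of this kind boils down to repeatedly multiplying the market-share vector by the transition matrix. The sketch below uses the initial shares from the question (25%, 45%, 30%), but the transition matrix `P` is hypothetical because the question's customer-flow table is not reproduced here; the function name is also made up.

```python
def step(shares, transition):
    """One period of a Markov chain: new_share[j] = sum_i share[i] * P[i][j]."""
    n = len(shares)
    return [sum(shares[i] * transition[i][j] for i in range(n)) for j in range(n)]


initial = [0.25, 0.45, 0.30]  # dairies A, B, C at the start of period one

# Hypothetical transition probabilities: row = current dairy, column = next dairy.
P = [
    [0.80, 0.10, 0.10],
    [0.05, 0.85, 0.10],
    [0.10, 0.10, 0.80],
]

shares = initial
for month in range(1, 4):
    shares = step(shares, P)
    print(month, [round(s, 4) for s in shares])
```

Each row of `P` sums to 1 because every customer ends the month with exactly one dairy, which keeps the shares summing to 1 in every period.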

Q148: Dynamic programming approach a) optimizes the sequence of interrelated decision over a period of time b) provides optimal solution to single period decision-problem c) provides optimal solution to long-term corporate planning problems d) provides optimal solution to short-term corporate

Q148: Correct Answer a) optimizes the sequence of interrelated decision over a period of time Explanation The decision-making process often involves several decisions to be taken at different times, e.g. problems of inventory control, evaluation of investment opportunities, long term corporate planning, etc., requiring sequential decision making. The mathematical technique of optimizing a sequence of interrelated decisions over a period of time is called dynamic programming. It uses the idea of recursion to solve a complex problem, broken into a series of interrelated (sequential) decision stages where the outcome of a decision at one stage affects the decision at each of the following stages. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 2

Q149: A function is said to achieve its maximum value at a point, X=X_0, if (for a very small number 'h' in the vicinity of X=X_0) a) f(X_0) = f(X_0 + h) b) f(X_0) > f(X_0 + h) c) f(X_0) < f(X_0 + h) d) f(X_0) = f(X_0 + 2h)

Q149: Correct Answer b) f(X_0) > f(X_0 + h) Explanation Mathematically, a function y = f(x) is said to achieve its maximum value at a point, X=X_0, if f(X_0) > f(X_0 + h), where h is a sufficiently small number in the neighbourhood of the point X=X_0. In other words, the point X_0 is a local maximum if the value of f(x) at every point in the neighbourhood of X_0 does not exceed f(X_0). (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 3

Q151 : Which of the following approach can be used for outlier detection ? a ) Classification b ) Decision Trees c ) Clustering d ) Regression

Q151: Correct Answer c) Clustering Explanation Clustering is prominently used for outlier detection where values that are "far away" from the cluster centers are studied as outliers. (Source: Data Mining Concepts & techniques by Han & Kamber, Elsevier) CAP Domain: 5

Q152 : Which of the following is an unsupervised learning technique ? a ) Clustering b ) Classification c ) Decision Trees d ) Regression

Q152: Correct Answer a) Clustering Explanation Clustering is an unsupervised learning technique as the learning does not rely on predefined classes and class labeled training. It is a form of learning by observation rather than learning by examples. (Source: Data Mining Concepts & techniques by Han & Kamber, Elsevier) CAP Domain: 4

Q153 : In K - means clustering algorithm , ' K ' represents a ) Number of iterations b ) Number of objects c ) Number of partitions d ) Number of testing parameters

Q153: Correct Answer c) Number of partitions Explanation K-means algorithm takes the input parameter k, and partitions a set of n objects into k clusters so that the resulting intra-cluster similarity is high but the inter-cluster similarity is low. (Source: Data Mining Concepts & techniques by Han & Kamber, Elsevier) CAP Domain: 4

Q154 : Which of the following is a Soft - clustering technique ? a ) Fuzzy C - Means b ) K - Means c ) K - Medoids d ) DBSCAN

Q154: Correct Answer a) Fuzzy C-Means Explanation Fuzzy clustering is soft clustering as it does not necessarily partition an object into a particular cluster. Rather, it assigns a likelihood factor or probability to the object for every cluster that it can be assigned to. Contrary to this, hard clustering involves partitioning an object into exactly one cluster. (Source: Data Mining Concepts & techniques by Han & Kamber, Elsevier) CAP Domain: 4

Q155 : What is moving average of order 3 for the following data set : ( 3,7,2,0,4,5,9,7,2 ) ? a ) 3,7,2,0,4,5,9,7,2 b ) 4,3,2,3,6,7,6 c ) 3,3,3,3,3,3,3 d ) 5.5,2.5,1,3.5,5.5,8,6.50

Q155: Correct Answer b) 4,3,2,3,6,7,6 Explanation A moving average of order 3 is calculated by taking the data in consecutive overlapping groups of three, summing each group and dividing by 3: (3+7+2)/3, (7+2+0)/3, (2+0+4)/3, (0+4+5)/3, (4+5+9)/3, (5+9+7)/3, (9+7+2)/3 (Source: Data Mining Concepts & techniques by Han & Kamber, Elsevier) CAP Domain: 3
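
The same calculation can be written as a sliding window over the data. A minimal sketch; the function name is illustrative.

```python
def moving_average(data, order):
    """Average each run of `order` consecutive values, sliding one step at a time."""
    return [
        sum(data[i:i + order]) / order
        for i in range(len(data) - order + 1)
    ]


print(moving_average([3, 7, 2, 0, 4, 5, 9, 7, 2], 3))
# -> [4.0, 3.0, 2.0, 3.0, 6.0, 7.0, 6.0]
```

Nine data points with a window of three give 9 - 3 + 1 = 7 averages, which is why option b) has seven values.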

Q156 : ARIMA model is used as a a ) Classification technique b ) Clustering Technique c ) Forecasting Technique d ) Optimization Technique

Q156: Correct Answer c) Forecasting Technique Explanation Auto-Regressive Integrated Moving Average (ARIMA) is a model used for Time-Series Forecasting. (Source: Data Mining Concepts & techniques by Han & Kamber, Elsevier) CAP Domain: 4

Q157 : Which of the following is a Qualitative forecasting technique ? a ) ARIMA b ) Logistic Regression c ) CHAID d ) Delphi Method

Q157: Correct Answer d) Delphi Method Explanation The Delphi method attempts to develop forecasts through "group consensus": the members of a panel of experts, all of whom are physically separated from and unknown to each other, are asked to respond to a series of questionnaires. The other options are quantitative techniques. (Source: Quantitative Methods for Business by Anderson, Sweeney, Williams, Cengage learning) CAP Domain: 4

Q158 : What is depicted by a circle in following diagram representing a Time Series dataset ? a ) Cyclic Component b ) Trend Component c ) Seasonal Component d ) Cyclic , Trend , Seasonal Component

Q158: Correct Answer c) Seasonal Component Explanation The circled portion represents the seasonal component of a time series. The line shows the trend component along with the cyclic component, as the trend is repeating over time. (Source: Basic Econometrics by Gujarati, Porter and Gunasekar, McGraw-Hill) CAP Domain: 6

Q159 : If historical data is not available with the manager , what should be the MOST appropriate step forward to develop forecasts for a complex problem ? a ) He should use Qualitative forecasting techniques b ) He should simulate data and develop forecasting model c ) He should abandon the project and return client's money d ) He should resign from the project

Q159: Correct Answer a) He should use Qualitative forecasting techniques Explanation When the problem is complicated and historical data is not available, one should take expert opinion and develop forecasts. Simulating data for a complex problem is not easy and sometimes not even feasible, so simulation should be the second option after qualitative forecasts. If the qualitative techniques and simulation do not work, then the project should be abandoned. (Source: Quantitative Methods for Business by Anderson, Sweeney, Williams, Cengage learning) CAP Domain: 2

Q160 : When the conditions in the past are not likely to hold in future , which qualitative forecasting method should be used ? a) Scenario Writing b) Expert Judgement c) Intuitive Approach d) Group Discussions

Q160: Correct Answer b) Expert Judgement Explanation In Expert Judgement, the experts individually consider information that they believe will influence the model and then they combine their conclusion into a forecast. No formal method is used and no two experts are likely to consider the same information in same way. Expert Judgement is often used when conditions in the past are not likely to hold in future. (Source: Quantitative Methods for Business by Anderson, Sweeney, Williams, Cengage learning) CAP Domain: 4

Q164 : Quantitative tallying of the number and types of defects that occur with a product or service is known as a ) Parametric Analysis b ) Six Sigma Analysis c ) Pareto Analysis d ) Benchmarking Analysis

Q164: Correct Answer c) Pareto Analysis Explanation Pareto Analysis is the quantitative tallying of the number and types of defects that occur with a product or service. A vertical bar chart that displays the most common types of defects, ranked in order of occurrence from left to right, is called a Pareto Chart. (Source: Business Statistics for Contemporary Decision Making by Ken Black, Wiley Publication) CAP Domain: 6
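The tallying step behind a Pareto chart can be sketched in a few lines of Python. This is only an illustration: the defect labels and the helper name pareto_table are made up for the example and do not come from the cited textbook.

```python
from collections import Counter

# Hypothetical inspection log: one entry per observed defect.
defects = ["scratch", "dent", "scratch", "misprint", "scratch", "dent", "crack"]

def pareto_table(observations):
    """Return (defect, count, cumulative share) rows, most frequent first."""
    counts = Counter(observations)
    total = sum(counts.values())
    rows, running = [], 0
    for defect, count in counts.most_common():
        running += count
        rows.append((defect, count, running / total))
    return rows

# Each row is one bar of the Pareto chart, read left to right.
for defect, count, cum_share in pareto_table(defects):
    print(f"{defect:<10}{count:>3}{cum_share:>8.0%}")
```

The cumulative share column is what is usually drawn as the line over the bars.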

Q165 : Which of the following is MOST appropriate technique to test whether a batch or lot of goods will be accepted or not ? a ) Acceptance Sampling b ) Cluster Sampling c ) Snowball Sampling d ) Stratified Random Sampling

Q165: Correct Answer a) Acceptance Sampling Explanation Acceptance sampling is the inspection of a sample from a batch or lot of goods to determine whether the batch or lot will be rejected or accepted. In this sampling, the lot is the population. Different methods of acceptance sampling are - Single sample plan, Double sample plan, Multiple sample plan. (Source: Business Statistics for Contemporary Decision Making by Ken Black, Wiley Publication) CAP Domain: 6

Q166 : The chart used to plot the quantiles of two data sets against each other is known as a ) Box - plot b ) Stem and Leaf Plot c ) Q - Q Plot d ) Scatter plot

Q166: Correct Answer c) Q-Q Plot Explanation In statistics, a Q-Q (quantile-quantile) plot is a probability plot, which is a graphical method for comparing two probability distributions by plotting their quantiles against each other. (Source: Business Statistics for Contemporary Decision Making by Ken Black, Wiley Publication) CAP Domain: 5
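As a rough sketch of what a Q-Q plot puts on its axes, the snippet below pairs up matching quantiles of two samples using Python's standard library. The samples and the helper name qq_points are assumptions for illustration; a plotting library would then draw these pairs as points.

```python
import statistics

def qq_points(sample_a, sample_b, n_quantiles=10):
    """Return (quantile of a, quantile of b) pairs at matching cut points."""
    qa = statistics.quantiles(sample_a, n=n_quantiles)
    qb = statistics.quantiles(sample_b, n=n_quantiles)
    return list(zip(qa, qb))

# Hypothetical samples: b is a rescaled copy of a, so the points fall on a line.
a = list(range(1, 101))
b = [2 * x for x in a]
for x, y in qq_points(a, b):
    print(x, y)
```

Points that lie close to a straight line suggest the two samples have a similar distributional shape.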

Q167 : Outlier detection can be done BEST using a ) Box - plot b ) Stem and Leaf Plot c ) Q - Q Plot d ) Scatter plot

Q167: Correct Answer a) Box-plot Explanation In descriptive statistics, a box plot is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram. Outliers may be plotted as individual points. (Source: Business Statistics for Contemporary Decision Making by Ken Black, Wiley Publication) CAP Domain: 5
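One common convention for deciding which points a box plot draws individually is Tukey's 1.5 x IQR rule. The explanation above does not name a rule, so treat the rule, the sample data, and the helper name tukey_outliers below as assumptions of this sketch.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Flag values more than k * IQR outside the first or third quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical measurements with one suspicious reading.
data = [12, 13, 13, 14, 15, 15, 16, 17, 18, 95]
print(tukey_outliers(data))
```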

Q168: Which is the BEST chart to represent multivariate data in 2-D form? a ) Ogive Chart b ) Radar Chart c ) Pareto Chart d ) Pie Chart

Q168: Correct Answer b) Radar Chart Explanation A radar chart is a graphical method of displaying multivariate data in the form of a two-dimensional chart of three or more quantitative variables represented on axes starting from the same point. The relative position and angle of the axes is typically uninformative. (Source: Business Statistics for Contemporary Decision Making by Ken Black, Wiley Publication) CAP Domain: 6

Q169 : Which chart is BEST to represent correlations between two variables ? a ) Scatter plot b ) Pie Chart c ) Histogram d ) Q - Q plot

Q169: Correct Answer a) Scatter plot Explanation A scatter plot (also called a scatterplot, scatter graph, scatter chart, scatter-gram, or scatter diagram) is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. If the points are color-coded, one additional variable can be displayed. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis. A scatter plot can suggest various kinds of correlations between variables with a certain confidence interval. (Source: Business Statistics for Contemporary Decision Making by Ken Black, Wiley Publication) CAP Domain: 6

Q170 : Which visual tool is BEST to describe the logical relations between a finite collection of different datasets ? a ) Scatter plot b ) Venn Diagram c ) Pareto Chart d ) Q - Q plot

Q170: Correct Answer b) Venn Diagram Explanation A Venn diagram (also called primary diagram, set diagram or logic diagram) is a diagram that shows all possible logical relations between a finite collection of different sets. These diagrams depict elements as points in the plane, and sets as regions inside closed curves. A Venn diagram consists of multiple overlapping closed curves, usually circles, each representing a set. (Source: Business Statistics for Contemporary Decision Making by Ken Black, Wiley Publication) CAP Domain: 6

Q171 : Which is the BEST tool used to diagram a business process that involves more than one department ? a ) Swim lane b ) Fishbone c ) Stem and Leaf d ) Ogive

Q171: Correct Answer a) Swim lane Explanation A swim lane (or swim lane diagram) is used in process flow diagrams, or flowcharts, to visually distinguish job sharing and responsibilities for sub-processes of a business process. Swim lanes may be arranged either horizontally or vertically. When used to diagram a business process that involves more than one department, swim lanes often serve to clarify not only the steps and who is responsible for each one, but also how delays, mistakes or cheating are most likely to occur. CAP Domain: 1

Q172 : Which visualization tool uses the individual values of a matrix to generate the chart ? a ) Bubble chart b ) Scatterplot c ) Heatmap d ) Ogive

Q172: Correct Answer c) Heatmap Explanation A heat map (or heatmap) is a graphical representation of data where the individual values contained in a matrix are represented as colors. "Heat map" is a newer term but shading matrices have existed for over a century. Heat maps originated in 2D displays of the values in a data matrix. Larger values were represented by small dark gray or black squares (pixels) and smaller values by lighter squares.

Q173 : The graph that represents the set of portfolios that has the maximum rate of return for every given level of risk or the minimum risk for every level of return is known as a ) Pareto Chart b ) Efficient Frontier c ) Box - plot d ) Radar Chart

Q173: Correct Answer b) Efficient Frontier Explanation Efficient Frontier represents the set of portfolios that has the maximum rate of return for every given level of risk or the minimum risk for every level of return. It is an envelope curve containing the best of all the possible combinations of portfolio. It is generally represented by scatter plot which are covered by the envelope curve. (Source: Investment Analysis and Portfolio Returns by Reilly and Brown, Cengage Learning) CAP Domain: 7

Q174 : Given the following Efficient Frontier , which point is the worst in terms of risk and return ? a ) A b ) B c ) C d ) A & B

Q174: Correct Answer c) C Explanation A dominates C because it has an equal rate of return but substantially less risk. B dominates C because it has equal risk but a higher expected return. Therefore, C is the worst out of given three. (Source: Investment Analysis and Portfolio Returns by Reilly and Brown, Cengage Learning) CAP Domain: 7

Q175: The two end points of curve in Efficient Frontier represent a) Lowest risk, Highest return b) Lowest risk, Lowest return c) Highest risk, Highest return d) Same risk and return

Q175: Correct Answer a) Lowest risk, Highest return Explanation On the curve, bottom most point shows the asset with lowest risk and the top most point shows the asset with highest return. (Source: Investment Analysis and Portfolio Returns by Reilly and Brown, Cengage Learning) CAP Domain: 7

Q176 : A portfolio consists of two positively correlated stocks with average returns of 10 and 20 percent , respectively and standard deviation of 0.07 and 0.10 . If equal weights are considered for the stocks , what is the average return of the portfolio ? a ) Less than 10 percent b ) More than 10 percent c ) More than 15 percent d ) More than 20 percent

Q176: Correct Answer b) More than 10 percent Explanation Average return of portfolio of the stocks (irrespective of their negative or positive correlation) is given by the summation of weights multiplied by returns for all the stocks. So (0.5*10)+(0.5*20)=15%, more than 10 percent is the most relevant option. (Source: Investment Analysis and Portfolio Returns by Reilly and Brown, Cengage Learning) CAP Domain: 7
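The weighted-sum calculation in the explanation can be checked with a short Python sketch. The function name portfolio_return is illustrative only.

```python
def portfolio_return(weights, returns):
    """Weighted average return; correlation between the assets plays no role."""
    return sum(w * r for w, r in zip(weights, returns))

# Equal weights, average returns of 10 and 20 percent (figures from the question).
print(portfolio_return([0.5, 0.5], [10, 20]))
```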

Q177 : A portfolio consists of two perfectly negatively correlated stocks with average returns of 10 and 20 percent , respectively and standard deviation of 0.07 and 0.10 . If equal weights are considered for the stocks , what is the average standard deviation of the portfolio ? a ) Less than 0 b ) Between 0 and 0.5 c ) Between 0.5 and 1 d ) More than 1

Q177: Correct Answer b) Between 0 and 0.5 Explanation Standard deviation of portfolio is given by (w1^2*stdev1^2 + w2^2*stdev2^2 + 2*w1*w2*corr*stdev1*stdev2)^(1/2). Putting the values as w1=0.5, w2=0.5, stdev1=0.07, stdev2=0.10, corr= -1.0, we get answer as 0.015. So between 0 and 0.5 is the correct answer. (Source: Investment Analysis and Portfolio Returns by Reilly and Brown, Cengage Learning) CAP Domain: 7
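The two-asset formula quoted in the explanation can be verified numerically. The sketch below uses the values from the question; the function name two_asset_std is an assumption for illustration.

```python
import math

def two_asset_std(w1, w2, sd1, sd2, corr):
    """Standard deviation of a two-asset portfolio."""
    variance = (w1 * sd1) ** 2 + (w2 * sd2) ** 2 + 2 * w1 * w2 * corr * sd1 * sd2
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))

# Equal weights, standard deviations 0.07 and 0.10, correlation -1.
print(round(two_asset_std(0.5, 0.5, 0.07, 0.10, -1.0), 4))
```

With correlation +1 the same inputs give 0.085, which shows how much risk the negative correlation removes.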

Q178 : Two mutually exclusive projects are having positive Net Present Value ( NPV ) . Which of them should be selected for the execution ? a ) Project with lower NPV b ) Project with higher NPV c ) Project with NPV higher than twice the other one d ) Project with NPV lower than twice the other one

Q178: Correct Answer b) Project with higher NPV Explanation When the projects are mutually exclusive, i.e. only one of them can be undertaken, and both have a positive NPV, with no other information given, the project with the higher NPV should be selected. It does not matter how much higher that NPV is compared with the NPV of the other project. (Source: Managerial Economics and Financial Accounting by Reddy and Saraswathi, PHI learning) CAP Domain: 1

Q180 : Which of these methods seek to find out the amount that can be invested in a given project so that its anticipated earnings will exactly suffice to repay this amount with interest at the market rate ? a ) Net Present Value b ) Internal Rate of Return c ) Payback Period d ) Operating Margin

Q180: Correct Answer a) Net Present Value Explanation Net Present Value method seeks to find out the amount that can be invested in a given project so that its anticipated earnings will exactly suffice to repay this amount with interest at the market rate. (Source: Managerial Economics and Financial Accounting by Reddy and Saraswathi, PHI learning) CAP Domain: 7
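A minimal Python sketch of the NPV calculation, assuming yearly cash flows and a hypothetical project (the 10 percent market rate and the cash-flow figures are invented for the example):

```python
def npv(rate, cash_flows):
    """cash_flows[0] occurs today; cash_flows[t] arrives at the end of year t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

market_rate = 0.10
inflows = [0, 500, 500, 500]  # hypothetical earnings in years 1 to 3

# Amount that could be invested today and exactly repaid with interest.
break_even_investment = npv(market_rate, inflows)
print(round(break_even_investment, 2))

# Investing less than that amount leaves a positive NPV.
print(round(npv(market_rate, [-1000, 500, 500, 500]), 2))
```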

Q181 : Which of these methods seek to find the maximum rate of interest at which the funds invested in the project could be repaid out of the cash inflows arising out of that project ? a ) Net Present Value b ) Internal Rate of Return c) Payback Period d) Operating Margin

Q181: Correct Answer b) Internal Rate of Return Explanation Internal rate of return method seeks to find the maximum rate of interest at which the funds invested in the project could be repaid out of the cash inflows arising out of that project. (Source: Managerial Economics and Financial Accounting by Reddy and Saraswathi, PHI learning) CAP Domain: 7
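The internal rate of return is the discount rate at which NPV equals zero. The sketch below finds it by bisection, which is one simple choice and not a method named by the source. It assumes a conventional project (one outflow followed by inflows) whose IRR lies between 0 and 100 percent; the cash flows are hypothetical.

```python
def npv(rate, cash_flows):
    """cash_flows[0] occurs today; cash_flows[t] arrives at the end of year t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def irr(cash_flows, low=0.0, high=1.0, tol=1e-9):
    """Bisection search; assumes NPV >= 0 at `low` and NPV <= 0 at `high`."""
    if npv(low, cash_flows) < 0 or npv(high, cash_flows) > 0:
        raise ValueError("IRR is not bracketed by [low, high]")
    while high - low > tol:
        mid = (low + high) / 2
        if npv(mid, cash_flows) > 0:
            low = mid   # still profitable at this rate, so the IRR is higher
        else:
            high = mid
    return (low + high) / 2

flows = [-1000, 500, 500, 500]  # hypothetical project
print(round(irr(flows), 4))
```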

Q184 : Which method calculates the number of years required to return the original investment from the net cash flows ? a ) Net Present Value b ) Internal Rate of Return c ) Payback Period d ) Operating Margin

Q184: Correct Answer c) Payback Period Explanation Payback period calculates the number of years required to return the original investment from the Net cash flows. (Source: Managerial Economics and Financial Accounting by Reddy and Saraswathi, PHI learning) CAP Domain: 1
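A small Python sketch of the payback calculation. It assumes cash arrives evenly within each year so that a fractional year can be reported; the figures and the function name payback_period are illustrative.

```python
def payback_period(initial_investment, annual_net_cash_flows):
    """Years needed to recover the investment, or None if never recovered."""
    if initial_investment <= 0:
        return 0.0
    remaining = initial_investment
    for year, cash in enumerate(annual_net_cash_flows, start=1):
        if cash >= remaining:
            # Assumption: cash arrives evenly during the year.
            return year - 1 + remaining / cash
        remaining -= cash
    return None

print(payback_period(1000, [400, 400, 400]))
```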

Q185 : A group of experts on analytics may not be given all the historical results in order to ? a ) Avoid Anchoring Bias b ) Make them work hard c) Make full utilization of money spent on them d) Give a feeling of inferiority to them

Q185: Correct Answer a) Avoid Anchoring Bias Explanation Anchoring is a term used to explain the way the brain takes mental shortcuts to arrive at conclusions. In short, there is a tendency for us to over-rely on the first piece of information that enters our brain. This piece of information is the anchor. Once the anchor is set, all future decisions revolve around the anchor, contaminating rational thinking. If you are ever not certain of a correct answer, most likely you will fall victim to the anchoring bias and guess an answer based on the most recent information. (Source: The 5 Mistakes every investor makes and how to avoid them by Peter Mallouk, Wiley Publishing) CAP Domain: 2

Q187 : Once the problem area of the business has been identified , which is the MOST appropriate next step for the project execution ? a ) Clarify pre - requisites of the project b ) Check the current status of the project c ) Identify target group for the project result d ) Describe the problem in general terms to the management

Q187: Correct Answer d) Describe the problem in general terms to the management Explanation After the analyst has identified the problem area, it is always mandatory to describe the problem in general terms to the management so that they can validate whether the problem identified is correct. (Source: CRISP-DM, https://www.the-modeling-agency.com/crisp-dm.pdf) CAP Domain: 1

Q188 : Business Objectives Identification also includes a ) specifying the data sources b ) specifying the criteria for successful outcome c ) specifying the choice of tool and technique d ) assessing the cost of project execution

Q188: Correct Answer b) specifying the criteria for successful outcome Explanation Specifying the criteria for successful outcome for each business objective is an important activity during the Business Objective Identification phase. Data sources, tools, techniques and cost of the project are not considered while framing the business objectives. (Source: CRISP-DM, https://www.the-modeling-agency.com/crisp-dm.pdf) CAP Domain: 1

Q189: After framing the Business Problem and communicating it to the Analytics Manager, what should be the NEXT step for the project? a) Finding the right source of data b) Framing the business problem as Analytical Problem c) Contacting the management team for stakeholder's identification d) Choosing the model evaluation parameter for assessing the model performance

Q189: Correct Answer b) Framing the business problem as Analytical Problem Explanation After Business Problem Framing stage, the next important task is to frame the analytical problem corresponding to the business problem. Stakeholders have already been identified and data source and model evaluation parameter are not required at this stage. (Source: CRISP-DM, https://www.the-modeling-agency.com/crisp-dm.pdf) CAP Domain: 2

Q190 : If you have formulated the analytical problem and finalised the evaluation parameters for model performance , what should be NEXT step of the project execution ? a ) Identifying the data source b ) Benchmarking the evaluation criteria c ) Assessing the various tools available for model development d ) Assessing the various techniques to be used for model development

Q190: Correct Answer b) Benchmarking the evaluation criteria Explanation Once the model's evaluation parameters are finalized, it is important to understand and benchmark these evaluation criteria. In case the range of evaluation is not clear, the final evaluation report may go wrong. After this, the tools, techniques and data sources need to be worked upon. (Source: CRISP-DM, https://www.the-modeling-agency.com/crisp-dm.pdf) CAP Domain: 2

Q191 : What can be said BEST about business success criteria and analytics success criteria ? a ) Business Success Criteria and Analytics Success Criteria are absolutely same b ) Business Success Criteria and Analytics Success Criteria are different from each other c ) Business Success Criteria and Analytics Success Criteria are not related to each other d ) Analytics Success Criteria can be determined before the Business Success Criteria

Q191: Correct Answer b) Business Success Criteria and Analytics Success Criteria are different from each other Explanation Business Success Criteria and Analytics Success Criteria are related to each other but are different from each other. Of the two, Business Success Criteria are defined first, and the Analytics Success Criteria are then defined on that basis. (Source: CRISP-DM, https://www.the-modeling-agency.com/crisp-dm.pdf) CAP Domain: 1

Q192 : An important criterion for selecting the Analytics Modelling technique is to consider a ) client's approval on each technique b ) the performance of each technique c ) cost of implementing each technique d ) the assumptions of each modelling technique and map it to the data format

Q192: Correct Answer d) the assumptions of each modelling technique and map it to the data format Explanation An important step for selecting the modelling technique is to consider the assumptions of each technique and then mapping it to the data format. Once they are satisfactory as per Data Scientist, then only other things like cost and client approval should be considered. (Source: CRISP-DM, https://www.the-modeling-agency.com/crisp-dm.pdf) CAP Domain: 4

Q193 : Generating test design of an analytical technique involves a ) decision about data set division into train set and test set , as well as considering number of iterations and folds b ) client approval on the choice of the technique c ) consensus amongst data scientist on the choice of technique testing d ) the hardware for implementation of the analytical model

Q193: Correct Answer a) decision about data set division into train set and test set, as well as considering number of iterations and folds Explanation Test design generation requires a decision over test set, train set, number of iterations, number of folds, etc. Model testing strategy is important for decision making over its performance. (Source: CRISP-DM, https://www.the-modeling-agency.com/crisp-dm.pdf) CAP Domain: 5

Q194 : Which is the MOST important decision after Model Deployment phase ? a ) When will the client give approval to the solution ? b ) What will be development tool and technique ? c ) How the use of the model will be monitored and measured ? d ) What data will be taken as input into the analytical model ?

Q194: Correct Answer c) How the use of the model will be monitored and measured? Explanation Once the model has been deployed, it is important to monitor the model, and its performance must be measured continuously. Sometimes the performance of a model built on test data varies in the real environment. Data input, tools and techniques are already considered during model development. Client's approval comes after the performance of the model is satisfactory and it is meeting the business objectives. (Source: CRISP-DM, https://www.the-modeling-agency.com/crisp-dm.pdf) CAP Domain: 6

Q195 : Data preparation phase of Analytics Project does NOT include a) Data Cleaning b) Data Formatting c) Data Integration d) Data Division into test set and training set

Q195: Correct Answer d) Data Division into test set and training set Explanation Data division is not part of data preparation phase but that of test design phase. Cleaning, formatting and integration are all important phases of data preparation. (Source: CRISP-DM, https://www.the-modeling-agency.com/crisp-dm.pdf) CAP Domain: 7

Q196 : Model parameter setting is done during which of the following phase ? a ) Model building b ) Model testing c ) Model deployment d ) Model maintenance

Q196: Correct Answer a) Model building Explanation There are various model parameters which need to be tuned and checked during the model building phase. (Source: CRISP-DM, https://www.the-modeling-agency.com/crisp-dm.pdf) CAP Domain: 5

Q198 : Which of the following is MOST appropriate for technology selection for Analytics Model building ? a ) Open source is best as there is no licensing cost involved in it b ) Licensed software is best as it is more secure and robust c ) The best software is the one which has provision for various techniques to solve the given analytical problem d ) The best technology should be based on the skillset of the available resources

Q198: Correct Answer c) The best software is the one which has provision for various techniques to solve the given analytical problem Explanation The technology or software must have the provision to solve the analytical problem. It does not matter whether the software is licensed or open source, or whether resources are available or not. (Source: Understanding the Predictive Analytics Lifecycle by Alberto Cordoba, Wiley Publication) CAP Domain: 2

Q199 : If client is NOT happy with the performance of the model after deployment , what should be the FIRST step by the team? a ) Fire the data science team and hire a new team for new model development b ) Verify the business requirements and validate the approaches used for solving business problem c ) Check the model parameters , verify the model assumptions and compare the test data with the real data used as input in the model d ) Discuss with all the stakeholders for performance of the model and take a consensus on its under

Q199: Correct Answer c) Check the model parameters, verify the model assumptions and compare the test data with the real data used as input in the model Explanation Before taking any strategic decision or contacting any of the stakeholder, just check the model parameters, assumptions and the data set on which it is working. It is sometimes possible that test conditions and actual conditions may vary and thus model may not perform well. (Source: Understanding the Predictive Analytics Lifecycle by Alberto Cordoba, Wiley Publication) CAP Domain: 6

Q202 : A multiple linear regression was built to try to predict customer expenditures based on 200 independent variables ( behavioral and demographic ) . 10,000 rows of data were fed into a stepwise regression , each row representing one customer . 1,000 customers were male , and 9,000 customers were female . The final model had an adjusted R - squared of 0.27 and seven independent variables . Increasing the number of rows of data to 100,000 and rerunning the stepwise regression will MOST likely a ) have negligible impact upon the adjusted R - squared b ) increase the impact of the male customers c ) change the heteroscedasticity of the residuals in a favorable manner d ) decrease the number of independent variables in the final model

Q202: Correct Answer a) have negligible impact upon the adjusted R-squared Explanation The increase in size of the data will not impact the adjusted R-squared calculation because both samples are sufficiently large randomly selected subsets of data. (Source: INFORMS Certified Analytics Professional (CAP) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 3

Q204 : Which of the following is an effective optimization method? a ) Analysis of variance ( ANOVA ) b ) Generalized linear regression model ( GLM ) c ) Box - Jenkins Method ( ARIMA ) d ) Mixed integer programming ( MIP )

Q204: Correct Answer d) Mixed integer programming (MIP) Explanation This is a mathematical optimization technique used when one or more of the variables are restricted to be integers. It is an effective optimization model. (Source: INFORMS Certified Analytics Professional (CAP®) Professional Examination, EXAMINATION STUDY GUIDE) CAP Domain: 4

Q205 : A box and whisker plot for a dataset will MOST clearly show a ) the difference between the second quartile and the median b ) the 90 % confidence interval around the mean c ) where the [ actual - predicted ] error value is not zero d ) if the data is skewed and , if so , in which direction

Q205: Correct Answer d) if the data is skewed and, if so, in which direction Explanation A box and whisker plot, sometimes just called a "box plot," was invented by John Tukey as a way to graphically display the distribution of data. The ends of the box are at the first and third quartiles, and there is a line somewhere in the box representing the median value. The whiskers extend either to the minimum and maximum values in the data set, or possibly less if they do not include points identified as outliers. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 3

Q206 : In the initial project meeting with a client , which of the following is the MOST important information to discuss? a ) Timeline and implementation plan b ) Analytical model to use c ) Business issue and project goal d ) Available budget

Q206: Correct Answer c) Business issue and project goal Explanation Understanding the business issue and project goal provides a sound foundation on which to base the project. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 1

Q207 : Which of the following statements is true of modeling a multi - server checkout line? a) A queuing model can be used to estimate service rates b) A queuing model can be used to estimate average arrivals c) Variability in arrival and service times will tend to play a critical role in congestion d) Poisson distributions are not relevant

Q207: Correct Answer c) Variability in arrival and service times will tend to play a critical role in congestion Explanation Arrival and service time distributions are inputs to a queuing model that would be used to model a checkout line and directly influence congestion. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 2

Q208 : A company is considering designing a new automobile . Their options are a design based on current gasoline engine technology or a government proposed " Green " technology . You are a government official whose job is to encourage automakers to adopt the " Green " technology . You cannot provide funding for development or production costs , but you can provide a subsidy for every car sold. The development costs and the wholesale price, in USD ($), of the cars are shown in the table. How large a subsidy per vehicle sold will be required, assuming there will be enough demand to motivate the switch? a ) Greater than $ 5000 b ) Less than $ 5000 c ) Cannot be determined d ) Equal to $ 5000

Q208: Correct Answer a) Greater than $5000 Explanation If we consider the profit from an individual vehicle to be the wholesale price minus the variable cost, we see that the profit from a Gasoline Technology vehicle is $25K-$15K = $10K. Similarly, the profit from a "Green" Technology vehicle is $40K-$35K = $5K. In order to make up for this difference in lost profit, the subsidy provided to the automaker would have to be at least $ 5K (the difference between $10K and $5K). In addition, the subsidy would need to be greater than $5000 so that the automakers would be able to recover their increased fixed costs at a reasonable level of demand. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 1
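The per-vehicle arithmetic in the explanation can be written out as a short Python sketch. It uses only the prices and variable costs quoted in the explanation; the development (fixed) costs come from a table that is not reproduced here, so the extra amount needed above the minimum is not computed. The function name unit_margin is illustrative.

```python
def unit_margin(wholesale_price, variable_cost):
    """Profit per vehicle before fixed development costs."""
    return wholesale_price - variable_cost

gasoline_margin = unit_margin(25_000, 15_000)
green_margin = unit_margin(40_000, 35_000)

# Subsidy needed just to equalize per-vehicle profit; recovering the higher
# fixed costs of the "Green" design requires something above this floor.
minimum_subsidy = gasoline_margin - green_margin
print(minimum_subsidy)
```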

Q210 : You have simulated the net present value ( NPV ) of a decision . It ranges between -$10 million and +$10 million . To BEST present the likelihood of possible outcomes , you should : a ) present a single NPV estimate to avoid confusion b ) present multiple NPV estimates c ) trim all outliers to present the most balanced diagram d ) relax constraints associated with extreme points in the simulation

Q210: Correct Answer b) present multiple NPV estimates Explanation Net Present Value (NPV) takes as input a time series of cash flows (both incoming and outgoing) and a discount rate and outputs a price. By showing a histogram (a graphical representation of the distribution of data), it is possible to see how likely various NPVs (beyond the given minimum and maximum) are to occur. This would be useful information to have when considering a decision, especially since the range of outcomes includes $0, meaning the decision could result in a profit or a loss. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 1

Q211 : A company ships products from a single dock at their warehouse . The time to load shipments depends on the experience of the crew , products being shipped and weather . The company thinks there is significant unmet demand for their products and would like to build another dock in order to meet this demand . They ask you to build a model and determine if the revenue from the additional products sold will cover the cost of the second dock within two years of it becoming operational . Which of the following is the MOST appropriate modeling approach? a ) Optimization because it is a transportation problem b ) Optimization because the company's objective to maximize profit and capacity at the dock is a limited resource c) Forecasting because you can determine the throughput at the dock, calculate the net revenue and compare this with the cost of the new dock d) Discrete event simulation because there are a sequence of discrete random events through time

Q211: Correct Answer d) Discrete event simulation because there are a sequence of discrete random events through time Explanation The time to load shipments depends on the experience of the crew, products being shipped, and weather. Given that there is a sequence of random events through time, discrete event simulation is the most appropriate modeling approach. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 4

Q212 : Two investors who have the same information about the stock market buy an equal number of shares of a stock . Which of the following statements MUST be true? a ) The risks for the two investors are statistically independent b ) Both investors are subject to the same risks c ) Both investors are subject to the same uncertainty d) If the investors are optimistic, they should have borrowed, rather than bought the shares

Q212: Correct Answer c) Both investors are subject to the same uncertainty Explanation Both investors are subject to the same uncertainty regarding the stock market. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 1

Q213 : A project seeks to build a predictive data mining model of customer profitability based upon a series of independent variables including customer transaction history , demographics , and externally purchased credit - scoring information . There are currently 100,000 unique customers available for use in building the predictive model . Which of the following strategies would reflect the BEST allocation of these 100,000 customer data points? a ) Use 70,000 randomly selected data points when building the model , and hold the remaining 30,000 as a test dataset b) Build the model using all 100,000 data points c) Randomly partition the data into 4 datasets of equal size, build four models and take their average d) Use 1,000 randomly selected data points when building the model.

Q213: Correct Answer a) Use 70,000 randomly selected data points when building the model, and hold the remaining 30,000 as a test dataset. Explanation This split provides sufficient data to build the model and sufficient data to test the model. This is the best allocation of the customer data points. (A common 'rule of thumb' is to use about two thirds of the data to build the model and one third to test it.) (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 5
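
The 70/30 holdout described above can be sketched as follows. This is a minimal illustration, not the study guide's own procedure: the function name `split_train_test`, the fixed seed, and the use of integer IDs as stand-ins for customer records are all assumptions.

```python
import random

def split_train_test(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and split it into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)      # random selection, reproducible via the seed
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

customer_ids = range(100_000)                  # hypothetical stand-in for the 100,000 customers
train, test = split_train_test(customer_ids)
print(len(train), len(test))
```

The model is fitted only on `train`; `test` is held back so that performance is measured on customers the model has never seen.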

Q215: One of the main advantages of tree-based models and neural networks is that they a) are easy to interpret, use, and explain b) build models with higher R squared than other regression techniques c) reveal interactions without having to explicitly build them into the model d) can be modeled even when there is a significant amount of missing data

Q215: Correct Answer c) reveal interactions without having to explicitly build them into the model. Explanation Tree-based models and neural networks are employed to find patterns in the data that were not previously identified (or input into the model building process). (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 5

Q217: After building a predictive model and testing it on new data, an under prediction by a forecasting system can be detected by its a) negative-squared b) bias c) mean absolute deviation d) mean squared error

Q217: Correct Answer b ) bias Explanation The bias measures the difference, including the direction of the estimate and the right answer. Depending on whether it's positive or negative, it will show whether there is an over or under estimate. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 6
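
A small sketch of why bias reveals direction while the other measures do not. The sign convention (forecast minus actual, so a negative bias means under-prediction) and the sample numbers are assumptions made for illustration.

```python
def forecast_errors(actual, forecast):
    """Return (bias, mean absolute deviation, mean squared error)."""
    errors = [f - a for a, f in zip(actual, forecast)]   # forecast minus actual
    n = len(errors)
    bias = sum(errors) / n                    # keeps the sign of the errors
    mad = sum(abs(e) for e in errors) / n     # sign is discarded
    mse = sum(e * e for e in errors) / n      # sign is discarded
    return bias, mad, mse

actual = [100, 110, 120, 130]      # hypothetical observed values
forecast = [95, 104, 116, 125]     # hypothetical forecasts, all too low
bias, mad, mse = forecast_errors(actual, forecast)
print(bias, mad, mse)
```

Only `bias` comes out negative here; the mean absolute deviation and mean squared error would be identical if every forecast were too high by the same amounts.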

Q218: All times in the decision tree below are given in hours. What is the expected travel time (in hours) of the optimal (minimum travel time) decision? a ) 7.8 b) 6.9 c) 7.4 d) 7

Q218: Correct Answer d ) 7 Explanation To answer this question, one needs to solve the decision tree using the "roll back" technique. Continuing back the bottom branch of the tree, the expected time if you fly is (0.5)(9.0) + (0.5)(5) = 7.0 hours. Now, when faced with the "drive or fly" decision, you should choose to fly (since 7.0 hours is less than 7.35 hours). Thus, answer d) 7.0 hours is the expected travel time of the optimal (or minimal travel time) decision. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 5
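
The roll-back step in the explanation can be sketched as below. The decision tree figure itself is not reproduced here, so the 7.35 hours for driving is taken directly from the explanation rather than recomputed from its branches; the helper name `expected_value` is an assumption.

```python
def expected_value(branches):
    """Expected outcome of a chance node given (probability, value) pairs."""
    return sum(p * value for p, value in branches)

fly = expected_value([(0.5, 9.0), (0.5, 5.0)])   # chance node on the "fly" branch
drive = 7.35                                     # rolled-back value quoted in the explanation

# At the decision node, pick the alternative with the smallest expected time.
best_option, best_time = min([("fly", fly), ("drive", drive)], key=lambda kv: kv[1])
print(best_option, best_time)
```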

Q219: An analytics professional is responsible for maintaining a simulation model that is used to determine the staffing levels required for a specific operational business process. Assuming that the operational team always uses the number of staff determined by the model, which of the following is the MOST important maintenance activity? a) Ensure that all the model input data items are available when needed b) Determine if there has been a change in model accuracy over time. c) Ensure that all users are reviewing the model results in a timely fashion d) Determine that the model's reports are understood by the users

Q219: Correct Answer b) Determine if there has been a change in model accuracy over time. Explanation The most important maintenance activity for the analytics professional responsible for maintaining the simulation model is to monitor the accuracy of the model over time. If there has been a change in accuracy, the analytics professional may need to revisit the assumptions of the model. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 7

Q222: Each month you generate a list of marketing leads for direct mail campaigns. Which of the following should you do before the list is used? a) Exclude people who were on the list the previous month b) Retain X% of the leads as control for performance measurement c) Remove opt-outs d) Exclude people who were never on the list

Q222: Correct Answer c ) Remove opt-outs Explanation The list of marketing leads should not include people or organizations that have opted out. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 6

Q223: When analyzing responses of a survey of why people like a certain restaurant, factor analysis could reduce the dimension in which of the following ways? a) Collapse several survey questions regarding food taste, health value, ingredients and consistency into one general unobserved "food quality" variable b) Condense similar survey respondent answers into clusters of like-minded customers for market segment analysis c) Reduce the variability of individual subject ratings by centering each respondent's ratings around his or her average rating d) Decrease variability by analyzing inter-rater reliability on the question items before offering the survey to a wide number of respondents

Q223: Correct Answer a) Collapse several survey questions regarding food taste, health value, ingredients and consistency into one general unobserved "food quality" variable Explanation Factor analysis is a statistical method used to describe variability among observed variables in terms of a potentially lower number of unobserved variables called factors. The information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 3

Q225: Suppose that the business problem is that the organization wants to increase sales by increasing cross-selling to existing customers. Your project sponsor looks to you to tell her how the organization can get there based on the data at hand. What's your first move? a) Dive into existing customer interaction data b) Ask your sponsor if she has a particular customer segment in mind c) Talk with marketing to see what they have planned for the next sales campaign d) Ask your sponsor what the actual numeric target of increased sales is overall

Q225: Correct Answer d) Ask your sponsor what the actual numeric target of increased sales is overall Explanation Note that your sponsor didn't give you much information to go on, and you don't know what your goal really is, except that you know you're looking to get more sales per customer. There's not enough to go on here to start to formulate the problem. Choice D would be the best response to start to get some numbers to go with the business' goal. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 1

Q226: Your sponsor has come back with a numeric goal of increasing sales from an average of $10,000 to $11,000 per customer in the next 12 months. What's your next move? a) See what price/sales volume data exist to see if the organization's prices match value b) See what sales by customer data exist c) Create hypotheses of which customer segments could be cross-sold d) Explore whether there are any other related business goals

Q226: Correct Answer d) Explore whether there are any other related business goals Explanation Note that your sponsor didn't give you much information to go on, and you don't know what your goal really is, except that you know you're looking to get more sales per customer. There's not enough to go on here to start to formulate the problem. Choice D would be the best response to start to get some numbers to go with the business' goal. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 1

Q227: You now have a little more information from the project sponsor, along with several rumors from other sources. You know that you should base the cost of increased sales over current levels at the marginal cost, rather than the fully allocated cost; that the company has to maintain at least the same return on sales as it currently has as the sales increase from 10,000 per customer to 11,000 per customer; and that top-line revenue must also increase by 10% (i.e., you can't get there by dropping your lowest performing customers). Once you've listed these assumptions or rules in your project charter, what's next? a) Start creating your input/output diagrams about what drives current customers to buy more b) Talk with your marketing and data groups to see what data exist c) Figure out how the increased sales goal should be broken down into metrics d) Run a conjoint analysis to see if existing products can be tweaked to be worth more money

Q227: Correct Answer d) Run a conjoint analysis to see if existing products can be tweaked to be worth more money Explanation Even given the statement above, you don't yet have a complete view of the business problem. You don't know why the organization has chosen to focus its attention on increasing sales per customer. Without that, you don't know what margins are acceptable on those sales. You may assume that general business rules apply and that you should assume that any sales under a 20% margin are inherently unprofitable and should be rejected. But without surfacing and clarifying that assumption and many others, you don't know if it is valid or not. You have to ask and keep asking until you know what assumptions are valid. Again, choice

Q228: Speaking of reviews, which of these groups should NOT be invited? a) Data Group b) Sales and marketing c) Manufacturing d) Contracts

Q228: Correct Answer a ) Data Group Explanation Here the most appropriate answer is choice A. This is important because if you go straight to looking at data, your hypotheses about what's important will be inherently biased by the existing data and explanations. If the answer were in your existing explanations, you probably wouldn't have the problem in the first place. But now that you have the initial set of drivers, you can start talking with your data group and decomposing your metrics to allocate the increased performance to performing groups. Any group with changing goals needs to be on your stakeholder list and part of the reviews. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 1

Q229: A post office area manager received many complaints that the only branch she has in the north side of the town has a very long waiting time. She hired you as a consultant to recommend justifying opening new positions in her branch. What would be a relevant methodology to use? a) Monte Carlo simulation b) Queuing theory c) Data mining d) Linear programming

Q229: Correct Answer b) Queuing theory Explanation Queueing theory refers to the mathematical study of waiting lines or the queues formed by customers or processes. A queueing model is constructed so that queue lengths and waiting time can be predicted. This makes queuing theory the most relevant methodology for managing queues at the post office. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 2
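
As an illustration of how a queueing model predicts waiting time, here is a sketch of the standard M/M/c (Erlang C) formula. It assumes Poisson arrivals and exponential service times, which the question does not state, and the arrival and service rates below are made-up numbers; the function name `mmc_mean_wait` is hypothetical.

```python
from math import factorial

def mmc_mean_wait(arrival_rate, service_rate, servers):
    """Mean time a customer waits in queue for an M/M/c system (Erlang C)."""
    offered_load = arrival_rate / service_rate
    utilization = offered_load / servers
    if utilization >= 1:
        raise ValueError("queue is unstable: utilization must be below 1")
    tail = offered_load ** servers / (factorial(servers) * (1 - utilization))
    head = sum(offered_load ** k / factorial(k) for k in range(servers))
    prob_wait = tail / (head + tail)          # probability an arrival has to wait
    return prob_wait / (servers * service_rate - arrival_rate)

# Hypothetical branch: 10 customers arrive per hour, each clerk serves 6 per hour.
for clerks in (2, 3, 4):
    print(clerks, round(mmc_mean_wait(10, 6, clerks) * 60, 2), "minutes")
```

Comparing the predicted wait for each number of clerks is one way to argue for or against opening new positions.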

Q230: A major aircraft manufacturing company is intending to determine the main causes for fatal failures in their battery system. The best methodology to be used to pin point the root cause is a) conduct a well-prepared design of experiments b) use historical data to relate failures to potential causes c) simulate the process with all the failure modes d) Choice B or C

Q230: Correct Answer d) Choice B or C Explanation It is better to use the historical data to study the failure and simulate the process to validate the failure modes. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 4

Q231: In mapping different X's to a Y, the advantage of using linear regression to a backpropagation artificial neural network (ANN) is: a) regression is more accurate in predicting Y's given X's compared to ANN b) regression can handle more variables than ANN c) regression handles data in a visible and transparent manner compared to ANN, which is perceived to be a black-box methodology d) regression is more able to handle outliers

Q231: Correct Answer c) regression handles data in a visible and transparent manner compared to ANN, which is perceived to be a black-box methodology Explanation In machine learning, there are a set of analytical techniques known as black box methods. What is meant by black box methods is that the actual models developed are derived from complex mathematical processes that are difficult to understand and interpret. This difficulty in understanding them is what makes them mysterious. One black box method is the artificial neural network (ANN). This method tries to imitate mathematically the behavior of neurons in the nervous system of humans. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 4

Q232: You are given three months to solve an analytics problem and the needed data will require two months to collect. What would be the strategy with the best outcome? a) Wait until the data are available to choose the best methodology b) Refuse to work on this project c) Ignore the data and design a tool that fits all possible scenarios d) Start developing the model with a template containing approximate numbers

Q232: Correct Answer d) Start developing the model with a template containing approximate numbers Explanation When the majority of time is getting consumed in the process of data collection, it is advisable to do the simulation of the dataset or work on the approximations. Ignoring the data or refusal to work on it or to wait for the data are not the best of the practices in the field of Analytics. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 2

Q233: One good methodology to reduce the dimensionality of a set of data is to use a) principal component analysis (PCA) b) linear programming c) discrete-event simulation d) artificial intelligence

Q233: Correct Answer a) Principal component analysis (PCA) Explanation Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 9

Q234: You are given a set of data to be utilized for a model. Their level of accuracy is within +/-20%. What approach and/or software would you use for the problem? a) Approach and/or software that deals with data at +/-1% accuracy level b) Approach and/or software that deals with data at +/-0.01% accuracy level c) Approach and/or software that deals with data at +/-10% accuracy level d) Approach and/or software that deals with data at +/-30% accuracy level

Q234: Correct Answer c) Approach and/or software that deals with data at +/-10% accuracy level Explanation Approach and/or software that deals with data at +/- 10% accuracy level are always preferred. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 4

Q235: You are asked to establish a model to map many independent variables (X's) to one dependent variable (Y). The model should explain the level of significance of the X's to Y and their level of correlation. What is the first methodology to come to mind in this situation? a) Stepwise regression b) Fuzzy logic c) Artificial neural network d) Monte Carlo simulation

Q235: Correct Answer a) Stepwise regression Explanation Stepwise regression is a method of fitting regression models in which the choice of predictive variables is carried out by an automatic procedure. It will help in mapping multiple independent variables on one dependent variable. It helps in computing the significant variables, level of correlation and the strength of the model. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 4

Q236: A factory has skilled workers that operate complicated equipment and there is a need to transfer the knowledge to new hires. The procedure cannot be explained in a crisp manner with exact numbers. For example, the operator cannot explain what the right temperature and pressure are to maximize the strength of the material at a certain condition. They simply just know by experience. One good candidate approach to model the variables and rules is a) fuzzy logic b) neural network c) linear regression d) logistic regression

Q236: Correct Answer a) fuzzy logic Explanation Fuzzy logic is a form of many-valued logic in which the truth values of variables may be any real number between 0 and 1 inclusive. It is employed to handle the concept of partial truth, where the truth value may range between completely true and completely false. By contrast, in Boolean logic, the truth values of variables may only be the integer values 0 or 1. The above problem also gives an approximation rather than providing absolute values, therefore fuzzy logic is a better approach. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 4
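
To make "partial truth" concrete, the sketch below uses a triangular membership function, one common way to encode a vague notion such as "the temperature is about right". The function name and the temperature breakpoints are hypothetical; nothing here comes from the factory in the question.

```python
def triangular_membership(x, low, peak, high):
    """Degree (0 to 1) to which x belongs to a fuzzy set shaped like a triangle."""
    if x <= low or x >= high:
        return 0.0                          # completely false
    if x <= peak:
        return (x - low) / (peak - low)     # rising edge
    return (high - x) / (high - peak)       # falling edge

# Hypothetical fuzzy set "right temperature": fully true at 200, false outside 150..250.
for temperature in (140, 175, 200, 230):
    print(temperature, triangular_membership(temperature, 150, 200, 250))
```

A Boolean rule would return only 0 or 1; here 175 degrees is "right" to degree 0.5, which is closer to how an experienced operator describes the process.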

Q237: Visualization is more closely related to which of the following analytics methodology categories a) Prescriptive b) Descriptive c) Soft skills d) Predictive

Q237: Correct Answer b) Descriptive Explanation Descriptive analysis requires results to be visualized more than other techniques do. Generally, slicing and dicing of the data is done during the descriptive analysis, and that involves a lot of visualization. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 3

Q238: A proper methodology to handle missing data is a) Principal component analysis b) Stepwise regression c) Decision tree d) Markov chain

Q238: Correct Answer b) Stepwise regression Explanation Missing data can be handled through various methods of data imputation. Data can be imputed with the help of measures of central tendency like mean, median or mode, depending on the type of data. Regression is a predictive method for computing the most likely value for the missing value in a particular column or row. Other techniques are not well-used methods for handling missing data. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 3

Q239: A chemical plant is under study to identify the bottleneck in its operation to facilitate scheduling. One proper methodology to model the plant is: a) System dynamics b) Discrete-event simulation c) Markov chain d) Fuzzy logic

Q239: Correct Answer b) Discrete-event simulation Explanation A discrete-event simulation (DES) models the operation of a system as a discrete sequence of events in time. Each event occurs at a particular instant in time and marks a change of state in the system. Between consecutive events, no change in the system is assumed to occur; thus the simulation can directly jump in time from one event to the next. So discrete-event simulation is best suited to identifying the bottleneck in the plant's operation to facilitate scheduling. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 4
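
The "jump from one event to the next" idea can be sketched with a toy single-station model. This is a simplified illustration with a fixed service time and first-come, first-served order, both assumptions; a real plant model would have many stations and random durations. The function name `simulate_single_server` is hypothetical.

```python
import heapq

def simulate_single_server(arrival_times, service_time):
    """Return each job's waiting time at one station with FIFO service."""
    events = [(t, "arrival", job) for job, t in enumerate(arrival_times)]
    heapq.heapify(events)                     # future event list, ordered by time
    queue, waits, busy = [], {}, False
    while events:
        clock, kind, job = heapq.heappop(events)   # jump straight to the next event
        if kind == "arrival":
            queue.append((job, clock))
        else:                                      # a departure frees the station
            busy = False
        if not busy and queue:
            next_job, arrived = queue.pop(0)
            waits[next_job] = clock - arrived
            busy = True
            heapq.heappush(events, (clock + service_time, "departure", next_job))
    return [waits[job] for job in range(len(arrival_times))]

print(simulate_single_server([0, 1, 2], service_time=3))
```

Growing waiting times at a station, as in this example, are the signature of a bottleneck.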

Q242: When should you retire a model? a) When its replacement has been validated b) When a change in business conditions invalidate its assumptions c) Both a and b d) Neither a nor b

Q242: Correct Answer c ) Both a and b Explanation If a change in business conditions has occurred that invalidate the assumptions of the original model, a new or revised model should be fielded and tested and validated before being deployed as a replacement. (Source: INFORMS Certified Analytics Professional (CAP®) Examination, EXAMINATION STUDY GUIDE) CAP Domain: 7

Q244: In the context of decision-trees, which of the following is not correct? a) The decision-tree approach to decision-making is appropriate in those situations where a sequence of decisions is involved. b) In decision-trees, the probabilities of all events at chance nodes and the monetary evaluations of different must all be known in advance. c) In the decision-trees, no more than two alternatives courses of action can emanate from a decision node. d) The 'Rollback' technique of analyzing decision-trees involves working through it from right to left, evaluating the best course of action at later stages so as to decide on such action at the earlier stages.

Q244: Correct Answer c) In the decision-trees, no more than two alternatives courses of action can emanate from a decision node. Explanation More than two alternatives courses of action can emanate from a decision node in a decision tree. (Source: Operations Research By Kanti Swarup, PK Gupta and Manmohan; S.Chand & Sons Publication, India) CAP Domain: 5

Q246: The measure of variation that is least affected by extreme observations is a) range b) mean deviation c) standard deviation d) all are equally affected

Q246: Correct Answer b) mean deviation Explanation Mean deviation is least affected by extreme observations. (Source: Operations Research By Kanti Swarup, PK Gupta and Manmohan; S.Chand & Sons Publication, India) CAP Domain: 3

Q249: The objective of Network Analysis is to a) minimize total project cost b) minimize total project duration c) minimize production delays d) minimize interruptions in the process

Q249: Correct Answer b) minimize total project duration Explanation Total project duration is minimized as the prime objective of the Network Analysis. (Source: Operations Research By Kanti Swarup, PK Gupta and Manmohan; S.Chand & Sons Publication, India) CAP Domain: 2

Q250: A project scheduling problem a) cannot be formulated as a linear programming problem b) can be formulated as a linear programming problem c) gives the least duration of the time in which it is completed d) updating of the project is not possible

Q250: Correct Answer b) can be formulated as a linear programming problem Explanation Project scheduling problem can be very well formulated as a linear programming problem. (Source: Operations Research By Kanti Swarup, PK Gupta and Manmohan; S.Chand & Sons Publication, India) CAP Domain: 2

Q145: According to the replacement theory, replace an item, tool or model when a) average annual cost for n years becomes equal to current/annual running cost b) next year running cost is more than the average cost of n* year c) present year's running cost is less than the previous year's average cost d) trend has diminished

a) average annual cost for n years becomes equal to current/annual running cost Explanation As per the replacement theory, an item should be replaced when the average annual cost over the previous n years becomes equal to the current/annual running cost. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 7

Q101: primary role of ETL tool is to perform following tasks, except. A) data extraction. B) data visualization. C) data transformation. D) data loading.

b) Data Visualization ETL stands for extract, transform and load, which are also the primary tasks of ETL tool. Data Visualization is also done by ETL tools but as a secondary option not primary.

Q150: The secretary of a school is taking bids on the city's four school bus routes. Four companies have made the bids for different routes. Assuming that each bidder will get only one route from the school, which of the following model is BEST to derive a solution to this problem? a) Transportation Model b) Job Sequencing Model c) Assignment Model d) Linear Programming Model

c) Assignment Model Explanation An assignment problem is a particular case of a transportation problem where the given resources are allocated to an equal number of activities with an aim of either minimizing total cost, distance, and time, or maximizing profits. The given problem can be solved by transportation and linear programming model, BUT Assignment Model is BEST suited for this problem based on its nature, constraints and assumptions. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 4
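
A minimal sketch of the assignment model for this kind of problem, assuming a hypothetical table of bids (rows are companies, columns are routes). It enumerates every one-to-one assignment, which is only practical for small problems such as four bidders; larger instances are normally solved with the Hungarian method or a linear programming solver.

```python
from itertools import permutations

def best_assignment(cost):
    """Try every one-to-one assignment and return (routes, total cost)."""
    n = len(cost)
    def total(routes):
        return sum(cost[company][routes[company]] for company in range(n))
    best = min(permutations(range(n)), key=total)
    return best, total(best)

# Hypothetical bids: bids[i][j] is company i's price for route j.
bids = [
    [4000, 5000, 6000, 7000],
    [5000, 4200, 7000, 6000],
    [6000, 7000, 4100, 5000],
    [7000, 6000, 5000, 4300],
]
routes, cost = best_assignment(bids)
print(routes, cost)
```

`routes[i]` is the route awarded to company `i`, so each company gets exactly one route and each route goes to exactly one company.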

Q142: If Economic Order Quantity is calculated, but an order is then placed which is smaller than this, the variable cost will...? a) increase b) decrease c) either increase or decrease d) not change

d) no change Explanation Variable cost will not be a function of Economic Order Quantity and hence there will be no change. (Source: Operations Research by JK Sharma, Macmillan Publishing) CAP Domain: 5

Q 65: failing to reject a false null hypothesis is. A) type I error. B) type II error. C) type III error. D) type IV error

B) type II error With a type I error, the null hypothesis is true, but the researcher decides that it is not. In a type II error, the null hypothesis is false, but the decision is made to not reject it. As such, there is no existence of type III and type IV error.

Q 92: which of the following types of data does not have a fixed zero point? A) interval level data. B) ratio level data. C) both. D) none.

A) interval level data. Interval level data does not have a fixed zero, while ratio level data has a fixed zero.

Q 68: in a certain state, 25% of all the cars emit excessive amounts of pollutants. If the probability is .99 that a car emitting excessive amounts of pollutants will fail the state's vehicular emission test, and the probability is .17 that a car not emitting excessive amounts of pollutants will nevertheless fail the test, what is the probability that a car that fails the test actually emits excessive amounts of pollutants? A) .66 B) .33 C) .22 D) .11.

A) .66 By Bayes' theorem, Probability = (.25*.99)/((.25*.99) + (.17*.75)) = .66.
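
The same Bayes' theorem calculation, written out as a short sketch. The function and parameter names are illustrative only.

```python
def posterior(prior, p_fail_if_polluting, p_fail_if_clean):
    """P(polluting | fails test) by Bayes' theorem."""
    numerator = prior * p_fail_if_polluting
    evidence = numerator + (1 - prior) * p_fail_if_clean   # total P(fails test)
    return numerator / evidence

p = posterior(prior=0.25, p_fail_if_polluting=0.99, p_fail_if_clean=0.17)
print(round(p, 2))
```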

Q 23: given that the sum of squares total is 22.5, sum of squares due to regression is 15.625, and sum of squares error is 6.875, for a regression model. The coefficient of determination, for this model would be: A) .6944 B) .3056 C) .44 D) .56.

A) .6944 R squared or the coefficient of determination is calculated as SSR/SST or 1 - (SSE/SST).
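
Both forms of the formula give the same value for the numbers in the question, as this short check shows. The variable names are illustrative.

```python
sst = 22.5      # sum of squares total
ssr = 15.625    # sum of squares due to regression
sse = 6.875     # sum of squares error

r_squared_from_ssr = ssr / sst
r_squared_from_sse = 1 - sse / sst
print(round(r_squared_from_ssr, 4), round(r_squared_from_sse, 4))
```

The two agree because SST = SSR + SSE for a least-squares fit with an intercept.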

Q 34: perfect positive correlation among two variables is determined by which value of Pearson correlation coefficient r? A) 1 B) 0 C) infinite. D) 3.14

A) 1 R varies from -1 to 1 and has perfect positive correlation for value 1, perfect negative correlation for value -1, and no correlation for value of 0. Infinite and 3.14 are out of bound for R.
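
A small sketch of the Pearson coefficient on made-up data, showing the two extreme values. The helper name `pearson_r` and the sample series are assumptions for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))    # exact increasing line
print(pearson_r(xs, [10 - 3 * x for x in xs]))   # exact decreasing line
```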

Q 22: which of the following is an analogous metric of adjusted R squared in logistic regression? A) AIC B) ROC curve C) accuracy. D) residual deviance.

A) AIC The analogous metric of adjusted R squared in logistic regression is AIC (Akaike information criterion). AIC is a measure of fit that penalizes the model for the number of model coefficients. Therefore, among candidate models, the one with the minimum AIC value is preferred.

Q 51: A quality control chart that is used to control the number of defects per unit of output is. A) C chart B) P chart C) R chart D) X chart

A) C chart A quality control chart that is used to control the number of defects per unit of output is a C chart. A quality control chart that is used to control attributes is a P chart. A process control chart that tracks the range within a sample, and so indicates that a gain or loss of uniformity has occurred in the production process, is an R chart. A quality control chart for variables that indicates when changes occur in the central tendency of a production process is an X chart.

Q 41: the covariance of the two variables, normalized by the variance of each variable is termed as. A) correlation. B) standard deviation. C) partial variance. D) mean absolute deviation

A) correlation. Correlation between two random variables, ρ(X, Y), is the covariance of the two variables normalized by their standard deviations (the square roots of their variances). This normalization cancels the units and scales the measure so that it is always in the range [-1, 1].

Q 67: increasing the sample size of the study. A) decreases the confidence interval. B) increases the confidence interval. C) creates no impact on confidence interval. D) first increases then decreases.

A) decreases the confidence interval. Increasing the sample size decreases the width of the confidence interval because it decreases the standard error.
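
A quick sketch of the effect, assuming a known standard deviation and the usual normal-approximation interval for a mean (half-width = z × s / √n, with z = 1.96 for 95%). The standard deviation of 10 is a made-up value.

```python
from math import sqrt

def ci_half_width(std_dev, n, z=1.96):
    """Half-width of a normal-approximation confidence interval for a mean."""
    return z * std_dev / sqrt(n)      # z times the standard error

for n in (25, 100, 400):
    print(n, round(ci_half_width(10, n), 2))
```

Quadrupling the sample size halves the standard error, and therefore halves the width of the interval.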

Q 72: the least likely method of data transformation is. A) dimensionality reduction. B) normalization. C) smoothing. D) aggregation.

A) dimensionality reduction. Dimensionality reduction is used to reduce the data set size and not used for transformation unlike other given methods of normalization, aggregation and smoothing.

Q 6: SLIQ and Sprint are used for. A) induction of decision trees from very large training sets. B) bootstrapping to create several small samples of a training set for decision tree construction. C) pre-pruning the decision trees. D) post pruning the decision trees.

A) induction of decision trees from very large training sets. SLIQ and SPRINT are algorithms that address the scalability of decision tree construction from very large training sets. BOAT (Bootstrapped Optimistic Algorithm for Tree construction) uses bootstrapping to create small samples of the training set while constructing a tree. Pruning is the reduction of a tree to remove branches reflecting anomalies in a decision tree.

Q 9: Which of the following is an assumption of discrete events simulation? A) no change in the system would occur between the events. B) the changes in the system state are continuous. C) time is not an important factor for simulation. D) they run slower as compared to non-discrete event simulation

A) no change in the system would occur between the events. The primary condition for discrete event simulation is that no change in the system would occur between the events. The changes are not continuous, but discrete in nature. Time is always an important factor in all types of simulation. And since this approach doesn't need to simulate every time slice, it is generally much faster than continuous simulation.

Q 76: which of the following is a data reduction technique? A) principal component analysis. B) data imputation. C) binning D) schema integration.

A) principal component analysis. Data imputation refers to filling in the missing values; the binning method is used to smooth the data; schema integration is used for data integration. PCA is used for data reduction.

Q 25: which of the following is not an assumption to regression modeling with respect to the error terms? A) the errors are dependent. B) the errors are normally distributed. C) the errors have a mean of 0 D) the errors have a constant variance.

A) the errors are dependent. One important assumption while performing regression is that the errors are independent.

Q 20: based on the output of a regression model given in Q 19, which of the following statements is most relevant, considering a 95% confidence interval? A) the regression model is statistically significant (p value 6.84e-07) B) the regression model is statistically not significant (p value 6.84e-07) C) the intercept of the regression model is statistically significant (p value .7154) D) the number of predictors and the regression model is too low (5 total predictors)

A) the regression model is statistically significant. Based on the P value of the model (which is less than .05 at a 95% CI), it can be said that the regression model is statistically significant. The P value of the intercept is not less than .05, therefore it is not statistically significant. We cannot comment on the number of predictors of the regression model by looking only at this output information.

Q 64: rejecting a true null hypothesis is. A) type I error. B) type II error. C) type III error D) type IV error

A) type I error. With a type I error, the null hypothesis is true, but the researcher decides that it is not. With a type II error, the null hypothesis is false, but the decision is made not to reject it. Type III and type IV errors are not part of the standard hypothesis-testing framework.

Q 40: if a set of data is normally distributed or bell-shaped, as per the empirical rule, approximately what percentage of the data values are within two standard deviations of the mean? A) 68% B) 95% C) 99.70% D) 99.90%.

B) 95% As per the empirical rule, 68% of data is within one standard deviation, 95% is within two standard deviations, and 99.7% is within three standard deviations of the mean.
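
The empirical rule can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name normal_coverage is an illustrative choice, not something from the question.

```python
import math

def normal_coverage(k: float) -> float:
    """Probability that a normal variable falls within k standard deviations of its mean."""
    # For a standard normal Z, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_coverage(k):.4f}")
```

The three printed values round to the 68%, 95%, and 99.7% figures quoted in the explanation.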

Q 73: statistical test used for missing value analysis is. A) ANOVA B) Little's MCAR test C) MANCOVA D) principal component analysis.

B) Little's MCAR test. The missing completely at random (MCAR) test was devised by Little to check whether missing values are random or follow a specific pattern.

Q 52: a quality control chart that is used to control attributes is. A) C chart B) P chart C) R chart D) X chart

B) P chart. A quality control chart that is used to control the number of defects per unit of output is a C chart. A quality control chart that is used to control attributes is a P chart. A process control chart that tracks the range within a sample, and indicates that a gain or loss of uniformity has occurred in the production process, is an R chart. A quality control chart for variables that indicates when changes occur in the central tendency of a production process is an X chart.

Q 87: a contingency table is required for which of the following statistical tests? A) F test B) chi-squared test C) t-test. D) Z test

B) chi-squared test. Chi-squared test uses the observed and expected frequency of categorical data from the contingency table. Other tests are based on mean and variance of the data.
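
As a rough illustration of how observed and expected frequencies from a contingency table feed the chi-squared statistic, here is a minimal sketch. The 2x2 table of counts is hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def chi_squared_statistic(table):
    """Return (statistic, degrees of freedom) for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical observed counts (rows: two groups, columns: two categories).
observed_counts = [[20, 30], [30, 20]]
statistic, dof = chi_squared_statistic(observed_counts)
print(statistic, dof)
```

With these made-up counts every expected frequency is 25, so the statistic is 4.0 on 1 degree of freedom, which exceeds the tabulated .05 critical value of 3.841.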

Q 81: which of the following is a type of random sampling? A) snowball. B) cluster. C) convenience. D) judgment.

B) cluster. Cluster sampling, also known as area sampling, is a type of random sampling.

Q 75: systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics, like insertion, update, and deletion anomalies, is known as. A) data reduction. B) database normalization. C) database connectivity. D) database segregation.

B) database normalization. Database normalization involves systematic decomposition of tables to minimize redundancies, minimizing the dependencies, and integrating the data to avoid anomalies.

Q 98: what does the interquartile range measure? A) difference between first and fourth quartile B) difference between first and third quartile. C) difference between second and fourth quartile D) difference between second and third quartile

B) difference between first and third quartile. An interquartile range is the measure for the middle 50% of the data. So it measures the difference between first and third quartile of the data.

Q 29: a sure way of removing multicollinearity from the model is to A) work with panel data. B) drop the variables that cause multicollinearity in the first place. C) transform the variables by first differencing them. D) obtain additional sample data.

B) drop the variables that cause multicollinearity in the first place. The other options may or may not reduce the multicollinearity, but dropping the offending variables certainly reduces the multicollinearity in the model.

Q 19: Based on the output of a regression model given below which of the following is a significant predictor in the model at a 95% confidence interval? A) area (p-value .2963) B) elevation (p-value 3.8e-06) C) nearest (p-value .9932) D) scruz (p-value .0003)

B) elevation. Based on the p-values of the predictors, only elevation has received a value lower than .05 (for a 95% confidence interval); therefore, elevation is the significant predictor in the model. Adjacent is also a significant predictor. The others are not.

Q 24: considering all other parameters to be constant, which is the most appropriate explanation for a new predictor variable to remain in the regression model? A) if the adjusted R squared of the model decreases B) if the adjusted R squared of the model increases C) if the adjusted R squared of the model remains constant. D) a new variable's inclusion in the model does not depend on adjusted R squared.

B) if the adjusted R squared of the model increases Generally, the new variables should increase the adjusted R squared of the model, which means that strength of the prediction model has increased.
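
To see why adjusted R squared is a useful check when adding a predictor, here is a minimal sketch of the standard adjustment formula. The sample size and R squared values are hypothetical, not taken from any output in this set.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical case: 30 observations; a 4th predictor lifts R^2 only from .700 to .702.
before = adjusted_r_squared(0.700, n_obs=30, n_predictors=3)
after = adjusted_r_squared(0.702, n_obs=30, n_predictors=4)
print(round(before, 4), round(after, 4))
```

In this made-up case plain R squared rises slightly, yet adjusted R squared falls, which would argue against keeping the new variable.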

Q 70: what is true about the naïve Bayes classifier? A) it is not based on Bayes' theorem. B) it assumes class conditional independence. C) it necessarily complicates the computation. D) it does not need a dependent class variable.

B) it assumes class conditional independence. Bayesian classification is based on Bayes' theorem. It assumes that the effect of an attribute value on a given class is independent of the values of the other attributes, i.e. it assumes class conditional independence. This assumption is made to simplify the computations involved, and in this sense it is considered naïve. Every classifier must have a dependent class variable, which is what the algorithm classifies.

Q39: the amount of peakedness of a distribution is measured by. A) skewness. B) kurtosis. C) interquartile range. D) standard deviation.

B) kurtosis Kurtosis describes the amount of peakedness of a distribution. It may be leptokurtic, platykurtic or mesokurtic distribution.

Q 85: the specifications for a certain kind of ribbon call for a mean breaking strength of 185 pounds. If five pieces, randomly selected from different rolls, have breaking strengths of 171.6, 191.8, 178.3, 184.9, and 189.1 pounds, which is the most appropriate statistic to test the null hypothesis μ = 185 pounds against the alternative hypothesis of μ < 185 pounds at the .05 level of significance? A) paired t-test B) one sample t-test C) one sample Z test. D) F test

B) one sample t-test. Since the sample size (n) is less than 30 and the variance is unknown, the one sample t-test is the best statistic to test the hypothesis. Had n been greater than 30, we would have gone for the one sample Z test, with the variance still unknown. The F test is generally used to compare variances.
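
For readers who want to carry the test through, here is a minimal sketch of the one sample t statistic for the five breaking strengths in the question, using only the standard library. The critical value is the usual t-table entry for a left-tailed test at .05 with 4 degrees of freedom.

```python
import math
import statistics

sample = [171.6, 191.8, 178.3, 184.9, 189.1]  # breaking strengths from the question
mu_0 = 185.0                                  # hypothesized mean

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)          # sample standard deviation (n - 1 divisor)
t_stat = (sample_mean - mu_0) / (sample_sd / math.sqrt(n))

t_critical = -2.132                           # t table, alpha = .05, df = 4, left tail
reject_null = t_stat < t_critical
print(round(t_stat, 3), reject_null)
```

The statistic is about -0.51, well inside the critical value, so under these assumptions the null hypothesis μ = 185 would not be rejected.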

Q47: Annabelle wants to open a small apparel shop in Vienna. She has located a good mall that attracts Rhett customers. Her options are to open a small shop, a medium size shop, or no shop at all. The market for an apparel shop can be good, average, or bad. The probabilities for these three possibilities are .2 for a good market, .5 for an average market, and .3 for a bad market. The net profit or loss for the different size shops is given below. Which is the best decision based on the EMV criterion? Payoffs for good, average, and bad markets respectively: small shop = 75k, 25k, -40k; medium shop = 100k, 35k, -60k; no shop = 0, 0, 0. A) open a small shop. B) open a medium size shop. C) open no shop. D) data insufficient.

B) open a medium size shop. EMV (small shop) = (.2) (75,000) + (.5) (25,000) + (.3) (-40,000) = 15,500. EMV (medium shop) = (.2) (100,000) + (.5) (35,000) + (.3) (-60,000) = 19,500. EMV (no shop) = 0. The best decision based on EMV is opening a medium size shop.
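
The EMV arithmetic above can be reproduced with a short script. This is a minimal sketch; the dictionary layout and names are illustrative choices, while the probabilities and payoffs come from the question.

```python
probabilities = {"good": 0.2, "average": 0.5, "bad": 0.3}

payoffs = {
    "small shop": {"good": 75_000, "average": 25_000, "bad": -40_000},
    "medium shop": {"good": 100_000, "average": 35_000, "bad": -60_000},
    "no shop": {"good": 0, "average": 0, "bad": 0},
}

def expected_monetary_value(payoff_by_state):
    """Probability-weighted average payoff across the market states."""
    return sum(probabilities[state] * payoff_by_state[state] for state in probabilities)

emv = {alternative: expected_monetary_value(p) for alternative, p in payoffs.items()}
best_alternative = max(emv, key=emv.get)
print(emv, best_alternative)
```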

Q 82: a type of survey sampling in which the survey subjects are selected based on referral from other survey respondents is known as. A) quota sampling. B) snowball sampling. C) judgment sampling. D) stratified random sampling.

B) snowball sampling. Quota sampling involves specific groups; judgment sampling relies on the researcher's judgment to select the sample; and random sampling that takes strata into account is stratified random sampling.

Q 35: which is the most appropriate conclusion about correlation among the variables shown through the following scatterplot? Description of plot: all of the points are in a fairly straight line, starting from the top left corner all the way down to the bottom right corner. A) strong positive correlation. B) strong negative correlation. C) no correlation. D) mild positive correlation.

B) strong negative correlation. The figure shows that as the quantity on the X axis increases, the quantity on the Y axis decreases, and the variation in the scatterplot is very low. Therefore, it can be concluded that there exists a strong negative correlation between the variables.

Q 18: which of the following is not a restriction of performance lift chart of an analytics model? A) lift charts require that the predictable attribute be a discrete value. In other words, you cannot use lift charts to measure the accuracy of models that predict continuous numeric values. B) to see prediction accuracy lines for any individual value of the predictable attribute, you need not create a separate lift chart for each target value. C) you cannot display timeseries models in a lift chart. D) you can add multiple models to a lift chart, as long as the models all have the same predictable attribute

B) to see prediction accuracy lines for any individual value of the predictable attribute, you need not create a separate lift chart for each target value. The prediction accuracy for all discrete values of the predictable attribute is shown in a single line. If you want to see prediction accuracy lines for any individual value of the predictable attribute, you must create a separate lift chart for each target value.

Q 86: for comparison of two kinds of paint, a consumer testing service finds that four 1-gallon cans of one brand cover on average 546 ft.² with a standard deviation of 31 ft.², whereas four 1-gallon cans of another brand cover on average 492 ft.² with a standard deviation of 26 ft.². Assuming that the two populations sampled are normal and have equal variances, which is the most appropriate statistic to test the null hypothesis μ1 - μ2 = 0 against the alternative hypothesis μ1 - μ2 > 0 at the .05 level of significance? A) paired t-test B) two sample t-test. C) two sample Z test. D) chi-squared test

B) two sample t-test. Both samples are small (less than 30), and the variance is unknown for both populations. Therefore, a two sample t-test is the most appropriate choice. Had the sample sizes been large (greater than 30), a two sample Z test would have been the appropriate choice. Chi-squared is used on categorical types of data. A paired t-test can never be applied to two independent samples.
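
A minimal sketch of the pooled (equal-variance) two sample t statistic follows. It assumes four cans per brand; that sample size is an assumption of this sketch, since the explanation only states that both samples are smaller than 30. The critical value is the standard t-table entry for a right-tailed test at .05 with 6 degrees of freedom.

```python
import math

# Summary statistics from the question; n1 = n2 = 4 is an assumed sample size.
n1, mean1, sd1 = 4, 546.0, 31.0
n2, mean2, sd2 = 4, 492.0, 26.0

# Pooled variance, valid under the equal-variance assumption stated in the question.
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
t_stat = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_critical = 1.943  # t table, alpha = .05, df = n1 + n2 - 2 = 6, right tail
reject_null = t_stat > t_critical
print(round(t_stat, 3), reject_null)
```

Under these assumptions the statistic is about 2.67, which exceeds the critical value, so the null hypothesis of equal means would be rejected.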

Q 84: when do we generally reject the null hypothesis? A) when test statistic value is less than the tabular value of test statistic B) when P value is less than the alpha (level of significance) C) when sample size is very large. D) when test statistic value is greater than alpha (level of significance)

B) when P value is less than the alpha (level of significance). The null hypothesis is rejected only in two cases: if the P value is less than alpha, or if the calculated test statistic is greater than the tabular value.

Q 21: weather department wants to use analytical models to predict whether it will rain tomorrow or not, based on information available on the geography of the area, demographics, weather updates, and census data sets. Which is the most appropriate model, which can solve this problem? A) panel data regression. B) simulation. C) logistic regression D) MANOVA

C) logistic regression. Since the dependent variable is binary in nature, it is most appropriate to use logistic regression model.

Q46: which is the most suitable method to compute the risk value of a portfolio? A) maximax criterion B) logistic regression C) Monte Carlo simulation D) linear programming.

C) Monte Carlo simulation. Monte Carlo simulation uses a probability distribution for modeling a stochastic or random variable. Different probability distributions are used for modeling input variables, such as normal, lognormal, uniform, and triangular. From the probability distribution of each input variable, different paths of outcome are generated. Compared to deterministic analysis, the Monte Carlo method provides a superior simulation of risk. It gives an idea of not only what outcome to expect, but also the probability of occurrence of that outcome. It is also possible to model correlated input variables. For instance, Monte Carlo simulation can be used to compute the value at risk of a portfolio. This method tries to predict the worst return expected from a portfolio, given a certain confidence interval for a specified time. Normally, stock prices are believed to follow a geometric Brownian motion (GBM), which is a Markov process: the price follows a random walk, and its future values depend only on the current value.
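
The value-at-risk idea can be sketched in a few lines. This is a minimal illustration, not a production risk model: the portfolio value, drift, volatility, horizon, and number of paths are all hypothetical, and the whole portfolio is assumed to follow a single geometric Brownian motion.

```python
import math
import random

def simulate_value_at_risk(value, mu, sigma, horizon_years, confidence, n_paths, seed=42):
    """Monte Carlo VaR: the loss that is not exceeded with the given confidence."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    drift = (mu - 0.5 * sigma**2) * horizon_years
    vol = sigma * math.sqrt(horizon_years)
    losses = []
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)                        # standard normal shock
        end_value = value * math.exp(drift + vol * z)  # GBM terminal value
        losses.append(value - end_value)
    losses.sort()
    index = min(int(confidence * n_paths), n_paths - 1)
    return losses[index]

# Hypothetical inputs: 1,000,000 portfolio, 8% drift, 20% volatility, 10 trading days.
var_95 = simulate_value_at_risk(1_000_000, 0.08, 0.20, 10 / 252, 0.95, 20_000)
print(round(var_95))
```

Each simulated path depends only on the starting value and a fresh random shock, which is the Markov property mentioned in the explanation.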

Q 7: As simulation is not an analytical model, therefore, the results of simulation must be viewed as. A) unrealistic. B) exact. C) approximation. D) simplified.

C) approximation. The results obtained from simulation are approximations based on the system's characteristics. They are not exact or simplified. Neither are they unrealistic, because simulations are based on the system's characteristics.

Q 99: while analyzing a data set, the analyst detected some outliers in the data set. What should be the most appropriate first step after outlier detection? A) immediately delete the outliers. B) ignore the outliers and proceed to the next stage of analysis. C) check the accuracy and appropriateness of the outliers. D) discard the data set and report to the manager.

C) check the accuracy and appropriateness of the outliers. Checking the accuracy and appropriateness of the outliers is always the first step. The data might be correct, and the outliers may be a genuine part of the data set. The next step should then be to decide whether to delete them, ignore them, or treat them based on other characteristics.

Q 36: which is the most appropriate test to find whether the beverage preference (i.e. tea, soda, other) is impacted by age group or not? A) independent T test. B) ANOVA C) chi-squared. D) F test

C) chi-squared. Since the data is categorical in nature, chi-squared is the most suitable test to answer the query based on a given data set.

Q 33: the regression coefficients estimated in the presence of autocorrelation in the sample data are not A) unbiased estimators B) consistent estimators. C) efficient estimators. D) linear estimators.

C) efficient estimators. They are not efficient estimators, and must not be relied on for any decision-making.

Q 97: stem and leaf display is primarily used for. A) predictive analysis. B) simulation. C) exploratory data analysis. D) decision trees.

C) exploratory data analysis. Stem and leaf display is primarily used for exploratory data analysis. The data is organized in the form of stem and leaves. It is easy to construct and can provide more information within a class interval than any other method.

Q 42: which statement is true about covariance? A) covariance is a quantitative measure of the extent to which the deviation of one variable from its median matches the deviation of the other from its median. B) if the covariance of two random variables is zero, they must be independent. C) if two random variables are independent, their covariance is 0 D) the scatterplot will show no observations if the covariance of two random variables is zero.

C) if two random variables are independent, their covariance is zero. Firstly, covariance considers deviation around the mean, not the median; secondly, even if the covariance is zero, two random variables can still be dependent; thirdly, when the covariance of two random variables is zero the scatterplot will show no linear pattern, but the observations will still be plotted. It is necessarily true that if two random variables are independent, their covariance is zero.
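
A small counterexample makes the second point concrete: zero covariance does not imply independence. In this minimal sketch, X takes the values -1, 0, and 1 with equal probability and Y is defined as X squared, so Y is completely determined by X. The variable names are illustrative.

```python
xs = [-1, 0, 1]            # equally likely values of X
ys = [x**2 for x in xs]    # Y = X^2, fully dependent on X

def population_covariance(a, b):
    """Average product of deviations from the means (equal weights)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)

cov_xy = population_covariance(xs, ys)
print(cov_xy)
```

The covariance comes out as zero even though knowing X tells you Y exactly.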

Q 69: a certain lottery works by picking six numbers from 1 to 49. It costs one dollar to play the lottery, and if you win, you win $2 million after taxes. If you play the lottery every week for 10 years, what are your expected winnings or losses (approximate value)? A) gain of $86 B) loss of $860. C) loss of $447 D) gain of $44.7.

C) loss of $447. E(X) = Σ x*P(x). Here X is the amount won or lost on a bet: X equals -1 dollar on losing and 2*10^6 dollars on winning. Probability of winning equals 1/49C6 = 7.2*10^-8. Therefore, for every bet of one dollar, E(X) = (-1 * .999999928) + ((2*10^6) * (7.2*10^-8)) = -$.86. This implies a negative expected value: a person loses $.86 on every bet of one dollar. For 52 weeks in a year and for 10 years, the losing amount would be 520 × .86 = $447.2.
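
The same expected-value calculation can be done without intermediate rounding. This is a minimal sketch that follows the explanation's setup (lose one dollar on a losing ticket, gain 2 million dollars on a winning one); the variable names are illustrative.

```python
import math

combinations = math.comb(49, 6)      # number of ways to pick 6 numbers from 49
p_win = 1 / combinations

# Expected value of a single one-dollar bet, as set up in the explanation.
expected_per_bet = -1 * (1 - p_win) + 2_000_000 * p_win

n_bets = 52 * 10                     # one bet per week for 10 years
expected_total = n_bets * expected_per_bet
print(combinations, round(expected_per_bet, 4), round(expected_total, 2))
```

Carrying full precision gives a total of roughly -445.6 dollars; the explanation's figure of 447.2 comes from rounding the per-bet loss to .86 first, and either way option C is the closest answer.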

Q 48: a criterion that minimizes the maximum opportunity loss while making a decision is. A) criterion of realism. B) criterion of equally likely. C) minimax regret. D) maximax criterion.

C) minimax regret. Realism - uses a weighted average of the best and worst possible payoffs for each alternative; equally likely - places an equal weight on all states of nature; maximax - an optimistic decision-making criterion that selects the alternative with the highest possible return.
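
To show how minimax regret works in practice, here is a minimal sketch that reuses the shop payoff table from Q47 (in thousands) purely as illustrative input. The regret for each cell is the best payoff in that market state minus the payoff actually obtained.

```python
# Payoffs in thousands for the good, average, and bad market states (from Q47).
payoffs = {
    "small shop": [75, 25, -40],
    "medium shop": [100, 35, -60],
    "no shop": [0, 0, 0],
}
n_states = 3

# Best achievable payoff in each state of nature.
best_per_state = [max(p[s] for p in payoffs.values()) for s in range(n_states)]

# Opportunity loss (regret) table, then the worst regret per alternative.
regret = {alt: [best_per_state[s] - p[s] for s in range(n_states)] for alt, p in payoffs.items()}
max_regret = {alt: max(r) for alt, r in regret.items()}

minimax_choice = min(max_regret, key=max_regret.get)
print(max_regret, minimax_choice)
```

On this table the maximum regrets are 40, 60, and 100, so minimax regret picks the small shop, a different choice from the EMV criterion.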

Q 10: A restaurant is running at full capacity and does not have the resources to accommodate the requests of new customers. An analytics consultant has proposed to optimize the current setup, as there is no scope for changing it and moving to a new setup. The consultant is given a data sheet for service patterns, inventory management, staff working profiles, orders served per day, new target customers and their demands, and frequency of product selling. What should be the next step for the consultant? A) forecast the future order requirements or predict the demand. B) review the customer and staff satisfaction levels and compare them using statistical testing methods. C) study the data, find the best optimization model to be used, and map the data to the problem. D) use discrete event simulation to find an approximation of the current complex situation.

C) study the data, find the best optimization model to be used, and map the data to the problem. Finding the best optimization model is the best way forward, as optimization has already been proposed. There is no point in forecasting, as we already know that new customers are coming to an already packed system. There is no point in simulating the current process, as real-time data is available to manage the resources. Also, the satisfaction levels of both customers and staff seem to be reasonably high, as the services are in full-capacity demand.

Q 26: which of the following is most appropriate assumption to regression modeling with respect to predictor variables? A) predictor should be perfectly correlated to each other. B) predictor should have zero variance. C) the number of observations or data rows should be more than the number of predictor variables D) predictor variable should not have outliers.

C) the number of observations or data rows should be more than the number of predictor variables. There should be no perfect correlation among the predictor variables, and each predictor should have positive variance. It is desirable to have no outliers in the data, but that is not an assumption of regression modeling. The important fact is that the number of observations must be greater than the number of predictors; otherwise the modeling would be incorrect.

Q 57: which of the following is not true for binomial distribution? A) it is a discrete distribution. B) it involves identical trials with only two possible outcomes. C) the probability of getting a success or failure on one trial may vary throughout the experiment. D) each trial is independent of the previous trial.

C) the probability of getting a success or failure on one trial may vary throughout the experiment. The probability of getting a success or failure on one trial must remain constant throughout the experiment.

Q 54: which of the following is a type of discrete probability distribution? A) normal B) uniform C) chi-squared. D) Poisson

D) Poisson All others are types of continuous probability distributions.

Q 78: which of the following is not an OLAP operation on multi dimensional data? A) roll up. B) drill down C) slice and dice. D) SQL injection

D) SQL injection. OLAP operations include rollup, drill down, slice and dice, pivoting, drill through, drill across, etc. SQL injection is not an OLAP operation.

Q 31: which of the following is not a source of heteroscedasticity? A) skewness in the regressors B) outliers in the regressors C) incorrect data transformations. D) addition of significant regressors.

D) addition of significant regressors. Addition of significant regressors never creates heteroscedasticity; in fact, a correctly specified model supports homoscedasticity. Outliers, skewness, and incorrect transformations certainly impact the model and increase heteroscedasticity.

Q 71: which of the following is least likely method of missing value treatment? A) filling values with attribute mean B) using a global constant to fill the missing value. C) using the most probable value. D) conducting the transaction again to obtain the missing value.

D) conducting the transaction again to obtain the missing value. Collecting the data again is a costly process, so it is the least likely way to fill the missing value. In fact, sometimes the tuple is ignored rather than recollected. Filling with the mean, median, mode, global constants, and most probable values are some of the ways to deal with missing values.

Q 100: in a box plot graph, the upper limit and lower limit are mathematically related to? A) median. B) range. C) frequency. D) interquartile range.

D) interquartile range. In a box plot, the upper limit is 1.5 times the interquartile range above the third quartile, and the lower limit is 1.5 times the interquartile range below the first quartile. Points beyond the upper and lower limits are considered outliers. It is an important graph for exploratory analysis.
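
The relationship between the box plot limits and the interquartile range can be sketched as follows. The data values are hypothetical, and the quartiles use the default method of Python's statistics.quantiles; other software may compute quartiles slightly differently.

```python
import statistics

data = [2, 4, 5, 7, 8, 9, 10, 12, 30]   # hypothetical observations

q1, q2, q3 = statistics.quantiles(data, n=4)   # quartile cut points
iqr = q3 - q1

# Conventional box plot limits: 1.5 * IQR beyond the quartiles.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_limit or x > upper_limit]
print(q1, q3, iqr, lower_limit, upper_limit, outliers)
```

With these made-up values only the observation 30 falls outside the limits and would be drawn as an outlier.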

Q 60: which of the following is false for a normal distribution? A) area under the curve is 1 B) it is a symmetric distribution about its mean. C) it has a bell shaped curve. D) it is a discrete distribution.

D) it is a discrete distribution. Normal distribution is a form of continuous distribution.

Q 56: which of the following is false for exponential distribution? A) it is a continuous distribution. B) it is skewed to the right. C) the curve steadily decreases as X gets larger. D) it is characterized by multiple parameters.

D) it is characterized by multiple parameters. Exponential distribution is characterized by only one parameter, which is lambda.

Q 1: Which of the following is not a part of the process of knowledge discovery from data (KDD)? A) data cleaning B) data mining. C) knowledge presentation D) knowledge transfer

D) knowledge transfer. KDD process includes seven steps - data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation. Knowledge transfer is not a part of KDD but it is a part of training program.

Q 74: the objective of database normalization is to A) maximize redundancy. B) clean the data. C) smooth the data D) minimize dependencies.

D) minimize dependencies. Database normalization involves minimizing redundancies, minimizing dependencies, and integrating the data. It does not clean or smooth the data.

Q 17: which is a better prediction model based on performance lift? A) model having constant lift curve running parallel to the X axis. B) model having overlapping lift curve to the base line. C) model having small area between the lift curve and baseline. D) model having large area between the lift curve and the baseline.

D) model having a large area between the lift curve and the baseline. Performance lift is a measure of the effectiveness of a predictive model calculated as the ratio between the results obtained with and without the predictive model. Cumulative gains and lift charts are visual aids for measuring model performance. Both charts consist of a lift curve and a baseline. The greater the area between the lift curve and the baseline, the better is the model.

Q 8: While assigning random numbers in Monte Carlo simulation, it is. A) not necessary to assign the exact range of random number interval as the probability. B) necessary to assign the particular appropriate random numbers. C) not necessary to develop a cumulative probability distribution. D) necessary to develop a cumulative probability distribution.

D) necessary to develop a cumulative probability distribution. Monte Carlo simulation technique involves conducting repetitive experiments on the model of the system under study, with some known probability distribution to draw random samples using random numbers. It involves setting up a probability distribution for variables; building a cumulative probability distribution For each random variable; generating random numbers; and conducting the simulation experiment using random sampling.
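
The steps listed above can be sketched with a small example. The demand values and their probabilities are hypothetical; the point is only to show how the cumulative distribution defines the random number interval assigned to each outcome.

```python
import bisect
import itertools
import random

# Hypothetical daily demand distribution.
demand_values = [0, 1, 2, 3]
probabilities = [0.10, 0.20, 0.40, 0.30]

# Cumulative distribution: each outcome owns the interval up to its cumulative value.
cumulative = list(itertools.accumulate(probabilities))

def demand_for(random_number: float) -> int:
    """Map a uniform random number in [0, 1) to a demand value via the cumulative distribution."""
    index = bisect.bisect_right(cumulative, random_number)
    return demand_values[min(index, len(demand_values) - 1)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
simulated = [demand_for(rng.random()) for _ in range(10_000)]
average_demand = sum(simulated) / len(simulated)
print(round(average_demand, 2))
```

Because each interval's width equals the outcome's probability, the simulated average should settle near the theoretical mean of 1.9 for this made-up distribution.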

Q 32: when error terms across timeseries data are inter-correlated, it is known as. A) cross correlation. B) cross autocorrelation. C) spatial autocorrelation. D) serial autocorrelation.

D) serial autocorrelation. When error terms across cross-sections of data are correlated, it is spatial autocorrelation, while across timeseries it is known as serial autocorrelation.

