WPC 300 Final
A manager wishes to predict the annual cost (y) of an automobile based on the number of miles (x) driven. The following model was developed: y = $1500 + 0.36x. If a car is driven 15000 miles in a year, the model predicts the annual cost of the car to be: $6900 $2090 $3850 $7400
$6900
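The arithmetic behind this answer can be checked with a short sketch. The function name `predict_annual_cost` is made up for illustration; only the intercept (1500) and slope (0.36) come from the card.

```python
def predict_annual_cost(miles: float) -> float:
    """Annual cost from the card's model: y = 1500 + 0.36 * x."""
    intercept = 1500.0      # fixed annual cost in dollars (from the card)
    cost_per_mile = 0.36    # dollars per mile driven (from the card)
    return intercept + cost_per_mile * miles


print(round(predict_annual_cost(15000), 2))  # 6900.0
```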
The value of R-Squared always falls between ________ and ________, inclusive. -1 and +1; -infinity and +infinity; 0 and -1; 0 and 1
0 and 1
What is the confidence level when the level of significance is 0.07? 7% 0.930 0.970 0.093
0.930
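A one-line check of this answer, assuming the usual relationship confidence level = 1 - alpha (the variable names are illustrative):

```python
alpha = 0.07                  # level of significance from the card
confidence_level = 1 - alpha  # confidence level is the complement of alpha

print(round(confidence_level, 3))  # 0.93
```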
The WPC Sports Company has noted that the size of individual "customer order" is normally distributed with a mean of $100 and standard deviation of $12. If a soccer team of 16 players were to make the next batch of orders, what would be the standard error of the mean? 4.00 3.00 3.46 10.00
3.00, since standard error = sigma/sqrt(n) = 12/sqrt(16) = 12/4 = 3
The correlation coefficient between the age of a vehicle and the money spent to repair it is 0.9. Which of the following statements is true? 90% of the repair cost will be explained by the age of the vehicle 81% of money spent on repairs is explained by the age of the vehicle 81% of the variation in the money spent on repairs is explained by the age of the vehicle 90% of the money spent on repair is explained by the age of the vehicle
81% of the variation in the money spent on repairs is explained by the age of the vehicle
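The 81% figure is the square of the correlation coefficient. This identity holds for simple (one-predictor) linear regression, which is the setting assumed in this sketch:

```python
r = 0.9             # correlation coefficient from the card
r_squared = r ** 2  # share of variation in repair cost explained by vehicle age

print(f"{r_squared:.0%}")  # 81%
```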
Which of the following statements is true based on the following regression equation? IQ = 4.0 + Reading Label * 5.6 A one-unit change in reading label will increase IQ by 5.6 points. Reading label is not a good predictor of IQ. A one-unit change in IQ will result in a 9.6-point increase in reading label. A one-unit change in IQ will result in a 5.6-point increase in reading label.
A one-unit change in reading label will increase IQ by 5.6 points.
You are creating a database to store temperature and wind data from various airports. Which of the following fields is the most likely candidate to use as the basis for a Primary Key in the Airport Table? Airport Code City Address State
Airport Code
What kinds of bias could show up when collecting data? Framing effect Sampling bias Self-selection bias All of the answer selections are correct
All of the answer selections are correct
Which of the following question(s) can be better answered using data in order to reach an evidence-based conclusion? Who will win the NBA championship? How many students will enroll for an online class in the Spring? What is the purchase pattern(s) of our customers? All of the answer selections are correct.
All of the answer selections are correct.
A/B testing can help marketers to: Increase clicks to their website Increase likes on their social media sites Increase sales All of the answers are correct
All of the answers are correct
An ideal machine learning process needs: Large volume of data Highly granular data Extremely diverse data All other answers are true.
All other answers are true.
In order to reject the null hypothesis, the p-value must be less than the Alpha Standard deviation Variance Degrees of freedom
Alpha
Which of the following is not a component of the relational database? Relationship among rows in tables Tables Analysis Metadata
Analysis
Over-reliance on the first piece of information is called ____________ Bandwagon effect Zero risk bias Clustering illusion Anchoring bias
Anchoring bias
You are collecting data via an online survey to improve the education standard at ASU. Which of the following methods will not result in data collection bias? Polls are completed only by visitors to the site Those with an interest in your mission are the ones to participate in the survey Some individuals are less likely to participate because they have strong opinions against the ASU education standard Anonymous data collection by hiding the ASU brand in the survey questions
Anonymous data collection by hiding the ASU brand in the survey questions
You bought a top-of-the-line laptop because your friends were so enthusiastic about theirs. Which kind of bias is in action here? Bandwagon effect Zero-risk bias Overconfidence Endowment effect
Bandwagon effect
Which of the following is a step of agglomerative hierarchical clustering? By joining two clusters farthest away from each other By joining two clusters that are not at a Euclidean distance By joining two clusters that are closest to each other By separating a cluster into two finer groups
By joining two clusters that are closest to each other
A loan officer wants to know if the next customer is likely to default or not on a loan. How can she assess the risk of extending the loan to that customer? By asking the customer if he is planning to default on the loan or not By utilizing a simple linear regression model developed by an in-house analyst By asking her colleague if he knows the person By utilizing a multiple logistic regression model developed by an in-house analyst
By utilizing a multiple logistic regression model developed by an in-house analyst
Which of the following is an example of unsupervised machine learning? Artificial neural networks Clustering Decision tree Logistic regression
Clustering
In the data extraction process for an ETL tool, which of the following is not an example of a legitimate data source? Competitors' data Online transaction data Point of Sales data Customers' social media data
Competitors' data
When sample size increases Confidence interval remains the same Confidence interval increases Confidence interval decreases Standard deviation of the sample mean increases
Confidence interval decreases
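"Decreases" here means the interval gets narrower. A minimal sketch, assuming a z-interval for the mean with known population standard deviation (the helper `ci_half_width` and the sample sizes are illustrative, not from the card):

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Margin of error z * sigma / sqrt(n) for a mean with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return z * sigma / sqrt(n)


small_sample = ci_half_width(sigma=12, n=16)
large_sample = ci_half_width(sigma=12, n=64)

# Quadrupling the sample size halves the margin of error.
print(large_sample < small_sample)  # True
```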
In classification problems, the primary source for accuracy estimation of the model is ________. Confusion matrix Logit Odds ratio Probability of success
Confusion matrix
The ________ is often used to describe the performance of a classification model applied to a set of test data for which the true outcomes are known. Effect summary table Parameter estimates table ANOVA table Confusion matrix
Confusion matrix
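A small sketch of how a confusion matrix yields an accuracy estimate. The labels below are made-up test data, with 1 as the positive class:

```python
from collections import Counter

# Hypothetical true outcomes and model predictions for eight test cases.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = Counter(zip(actual, predicted))
tp = counts[(1, 1)]  # true positives
fn = counts[(1, 0)]  # false negatives
fp = counts[(0, 1)]  # false positives
tn = counts[(0, 0)]  # true negatives

accuracy = (tp + tn) / len(actual)
print(tp, fn, fp, tn)  # 3 1 1 3
print(accuracy)        # 0.75
```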
Which of the following is not an application of clustering analysis? Market segmentation analysis Web click stream analysis Crime prediction analysis Collaborative filtering analysis
Crime prediction analysis
In an ETL process, data is loaded into a final target database such as: Data warehouse Operational dashboard Public database Social media database
Data warehouse
Which of the following is not a standard practice in the "Data Transformation" process of an ETL tool? Data aggregation Splitting data fields Data extraction from ERP Change of data format
Data extraction from ERP
Which of the following statement(s) about charts is true? Data ink can sometimes help tell a richer story The more data ink, the better A useless chart is called "chart junk" We should make as many grids as possible in a chart
Data ink can sometimes help tell a richer story
Which of the following statements below is false about supervised data analysis? The data is labeled for supervised analysis Data is not labeled for supervised analysis Multiple linear regression analysis is a type of supervised data analysis Logistic regression analysis is a type of supervised data analysis
Data is not labeled for supervised analysis
In the loading phase of an ETL tool, the transformed data gets loaded into an end target, usually the _______. Original Database Data warehouse Online analytical processing Master data management
Data warehouse
Which of the following techniques is a modern update of artificial neural networks? Clustering Decision tree Deep learning Logistic regression
Deep learning
Target is examining their online sales data during the pandemic to understand what happened. Which kind of analytical technique are they using? Predictive analytics Descriptive analytics Forecast analytics Prescriptive analytics
Descriptive analytics
What are the four types of data analytical methods? Critical, analytical, predictive and explanatory Descriptive, analytical, predictive and prescriptive Descriptive, explanatory, predictive and prescriptive Descriptive, logical, predictive and prescriptive
Descriptive, explanatory, predictive and prescriptive
Which of the following is not one of the processes involved in data cleaning? Matching Consolidating Parsing Encrypting
Encrypting
When you buy a new car, you value it more than the price you paid because of: Zero-risk bias Endowment effect bias Sunk cost fallacy None of the answer selections are correct
Endowment effect bias
Which of the following statements is true? Experimentation is a way of analytical thinking Using intuition is a way of analytical thinking Analytical thinking is not based on facts Heuristic thinking is slow
Experimentation is a way of analytical thinking
Which of the following is an example of secondary data? Interview data Simulated data Survey data Firm's proprietary data
Firm's proprietary data
Which of the following is an example of association rule learning? How frequently an item set occurs in a transaction How frequently a cluster can be formed in a given transaction The association between customers and what they purchase How frequently items are purchased in a group of transactions
How frequently an item set occurs in a transaction
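How often an item set occurs is usually called its support. A minimal sketch with made-up transactions (the `support` helper is illustrative, not a library function):

```python
# Hypothetical market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def support(itemset: set, baskets: list) -> float:
    """Fraction of transactions that contain every item in the item set."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


print(support({"bread", "milk"}, transactions))  # 0.5
print(support({"diapers"}, transactions))        # 0.75
```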
When you access information from two different tables connected by an identifier key, the SQL keyword you should use is _______. INNER JOIN GROUP BY COUNT ORDER BY
INNER JOIN
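A minimal sketch of an INNER JOIN, run through Python's built-in `sqlite3` module. The `airport` and `reading` tables and their rows are hypothetical; the shared key is the airport code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airport (code TEXT PRIMARY KEY, city TEXT);
CREATE TABLE reading (airport_code TEXT, temperature_f REAL);
INSERT INTO airport VALUES ('PHX', 'Phoenix'), ('SEA', 'Seattle');
INSERT INTO reading VALUES ('PHX', 105.0), ('SEA', 68.0), ('PHX', 99.0);
""")

# Combine rows from both tables wherever the identifier key matches.
rows = conn.execute("""
    SELECT airport.city, reading.temperature_f
    FROM airport
    INNER JOIN reading ON reading.airport_code = airport.code
    ORDER BY airport.city, reading.temperature_f
""").fetchall()

print(rows)  # [('Phoenix', 99.0), ('Phoenix', 105.0), ('Seattle', 68.0)]
```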
Which of the following statements is not true about artificial neural networks? In the hidden layer of the networks, input data is hidden The input layer in the network receives the data The learning process is similar to our brain The network is modeled after the human brain in which brain cells work in a network
In the hidden layer of the networks, input data is hidden
Deleting the grid lines in a chart Decreases the data-ink ratio Increases the lie-factor Decreases the lie-factor Increases the data-ink ratio
Increases the data-ink ratio
Artificial Intelligence _______ Cannot be used for retail industry Is a broad science of mimicking human abilities Does not depend on machine learning Is a specific subset of machine learning
Is a broad science of mimicking human abilities
AI is not embraced everywhere in every industry because _______. It is not very well understood It is not very well developed It is likely to fail in the future It can be operationally expensive
It can be operationally expensive
Which of the following is true about multi-collinearity? The P-value reduces significantly, leading to rejection of the null hypothesis. It is measured using a measure called variance inflation factor (VIF). The effect of an independent variable on the dependent variable becomes easy to isolate. The regression coefficients become clearer and are easier to interpret.
It is measured using a measure called variance inflation factor (VIF).
Which of the following statements is NOT true about experimental studies to compare two treatments? We can design experiments to minimize any bias in the comparison. Experiments allow us to set up a direct comparison between the treatments of interest. It is not easy to control uncertainties in the comparison. We can design experiments so that the error in the comparison is small.
It is not easy to control uncertainties in the comparison.
Which of the following describes the standard deviation? It is the average of the greatest and least values in the data set. It is the square of variance It is the square root of the variance. It is the difference between the first and third quartiles of a data set.
It is the square root of the variance.
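A quick numeric check of this relationship, using made-up data and the population formulas from Python's `statistics` module:

```python
from math import isclose, sqrt
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set with mean 5

variance = pvariance(data)  # population variance
std_dev = pstdev(data)      # population standard deviation

print(f"variance={variance:g}, std_dev={std_dev:g}")  # variance=4, std_dev=2
print(isclose(sqrt(variance), std_dev))               # True
```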
In developing spam filter algorithms, we need Labeled data of spam emails Unlabeled data of spam emails Unlabeled data of non-spam emails Labeled data of both spam and non-spam emails
Labeled data of both spam and non-spam emails
One of the processes in ETL is Treatment Load Transition Extend
Load
The final stage of an ETL process is: Transform Load Extract Data Analysis
Load
In logistic regression, the dependent variable y is defined as: Log(p/(1-p)) Log(1/p) Log(1/(1-p)) Log(1-p)
Log(p/(1-p))
In logistic regression analysis, instead of Y as a dependent variable, we use a function of Y called ________. Log of Y Logit Odds ratio Odds
Logit
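The logit is the natural log of the odds p/(1-p). A small sketch of the odds, the logit, and its inverse (the function names are illustrative):

```python
from math import exp, log


def odds(p: float) -> float:
    """Odds of success: p / (1 - p)."""
    return p / (1 - p)


def logit(p: float) -> float:
    """Log-odds, the quantity a logistic regression models linearly."""
    return log(odds(p))


def inverse_logit(z: float) -> float:
    """Map a log-odds value back to a probability."""
    return 1 / (1 + exp(-z))


print(logit(0.5))                            # 0.0 (even odds)
print(round(odds(0.8), 6))                   # 4.0
print(round(inverse_logit(logit(0.3)), 6))   # 0.3
```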
In a cluster analysis, the distance between the clusters should be: Zero Even Minimized Maximized
Maximized
Regular consumption of organic food will keep you in a good mood. In this example, the confounder could be people's mood Money organic food work ethics
Money
If you want to find out if body weight, calorie intake, fat intake and age have an influence on the probability of having a heart attack (yes or no), which of the following kinds of analysis will help determine the answer? Simple logistic regression Multiple logistic regression Simple linear regression Multiple linear regression
Multiple logistic regression
Which of the following biases cannot be categorized as a cognitive bias? Groupthink Anchoring Bias Sunk cost fallacy None of the answer selections are correct
None of the answer selections are correct
Which of the following is not a drawback of analytical decision making? Delayed action Lack of flexibility Frustration in teams None of the answer selections are correct
None of the answer selections are correct
Which of the following statement(s) about charts is false? None of the other answers are false A chart should have graphical integrity A chart should tell a story A chart should minimize graphical complexity
None of the other answers are false
Which of the following propositions describes an existing theory or belief? Null hypothesis Proportion Standard deviation Alternative hypothesis
Null hypothesis
Which of the following tools help in periodic managerial decision-making? Servers OLTP Database OLAP
OLAP
After factoring out the effect of other variables known to affect SAT, such as socioeconomic status, researchers found that music students had a higher SAT score than non-music students. This is an example of __________. Experimental study Observational Study Comparative study None of the other answers is correct
Observational Study
A person who is convinced he is gaining admission to Harvard by merely applying is suffering from: Gambler's fallacy Overconfidence Zero-risk bias None of the answer selections are correct
Overconfidence
In an agile approach to analytics, what is the first step of the process? Perform data discovery Score and deploy Perform business discovery Model data
Perform business discovery
What best describes the nature of a rose diagram? Rarely used for azimuthal data Represents various species of flowers Uses various colors to represent cause of mortality Plots data using a circular histogram plot
Plots data using a circular histogram plot
Which of the following examples is not an application of AI? Monitoring epidemics and diseases and stopping them from spreading Predicting human behavior by reading natural language used Predicting the exam score by scanning the appropriate textbook Optimizing traffic patterns over time
Predicting the exam score by scanning the appropriate textbook
Costco wants to know how to stock their warehouses for a future pandemic and are using current sales data to help them project the needs. Which kind of analytical technique are they using? Predictive analytics Prescriptive analytics Explanatory analytics Descriptive analytics
Predictive analytics
Predictive analytics may be applied to __________, which is a set of techniques that use descriptive data and forecasts to identify the decisions most likely to result in the best performance. Explanatory analytics Prescriptive analytics Descriptive analytics Forecast analytics
Prescriptive analytics
Which of the following data analysis models use optimization techniques? Predictive analytics Prescriptive analytics Diagnostic analytics Descriptive analytics
Prescriptive analytics
Your professor is considering purchasing a self-driving car that can figure out the best route and the optimum safe way to drive there without human intervention. What kind of analytics is the car using to do this? Prescriptive analytics Predictive analytics Descriptive analytics Explanatory analytics
Prescriptive analytics
Which of the following is an important task of a database management system? Provides unauthorized access to data when authentication fails Provides support such as performing maintenance and routine backups. Helps collect data from vendors Helps create rules for data analysis
Provides support such as performing maintenance and routine backups.
_______ ensures that related data exist in the parent table before allowing an entry into a child table. Referential integrity Data redundancy SQL Data Integrity
Referential integrity
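A sketch of referential integrity in action, using Python's built-in `sqlite3` module. The `customer` (parent) and `orders` (child) tables are hypothetical. SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set and the library was built with foreign key support, which is assumed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    )
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (10, 1)")  # parent row exists: accepted

try:
    conn.execute("INSERT INTO orders VALUES (11, 99)")  # no customer 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the child row is refused because its parent is missing

print(rejected)  # True
```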
The unexplained variance in the regression analysis is also known as: Predicted variance Total variance Residual variance Regression variance
Residual variance
The SQL code to extract only first_name information for all records of the "Actor" table below is: SELECT * FROM Actor WHERE first_name = "NICK"; SELECT * FROM Actor; SELECT first_name FROM Actor; SELECT Actor FROM first_name;
SELECT first_name FROM Actor;
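The answer query can be tried against a stand-in table. The card's actual "Actor" table is not shown, so the columns and rows below are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
INSERT INTO Actor VALUES (1, 'NICK', 'WAHLBERG'), (2, 'ED', 'CHASE');
""")

# Naming a single column returns only that column, for every record.
first_names = [row[0] for row in conn.execute("SELECT first_name FROM Actor;")]

print(sorted(first_names))  # ['ED', 'NICK']
```

By contrast, `SELECT * FROM Actor;` would return every column, and adding `WHERE first_name = "NICK"` would restrict the rows rather than the columns.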
"Google Doc" is an example of _______ in a could computing environment. SaaS IaaS Virtualization PaaS
SaaS
Which of the following categories of data mining would you use for spam filtering of emails? Supervised Unsupervised Both supervised and unsupervised Heuristics
Supervised
Which of the following statements below is true about supervised/unsupervised machine learning? Unsupervised learning requires labeled data for training Supervised learning requires unlabeled data for training Unsupervised learning requires no supervision from humans Supervised learning requires labeled data for training
Supervised learning requires labeled data for training
Which of the following statements is a reason not to use a table for data visualization? Large amount of information can be included in a very small space Tables cannot easily show trends Tables display more information in less space than a chart The table has more precise numbers
Tables cannot easily show trends
In the Target story, why did Target send the teen daughter maternity ads? Target was sending ads to all women in a particular neighborhood Target was using a special promotion that targeted all teens in her geographical area Target's analytics model confused her with an older woman with a similar name Target's analytics model suggested she was pregnant based on her buying patterns
Target's analytics model suggested she was pregnant based on her buying patterns
Which of the following is an ETL vendor? MySQL Teradata Tableau JMP
Teradata
Which of the following is true of hierarchical clustering? All clusters must have more than one object in them The data partition does not occur in a single step All clusters must have the same number of data points No single cluster can have all data
The data partition does not occur in a single step
Which of the following violates the principle of data visualization? The lie-factor should be closely equal to 1 Avoid unnecessary chart junk The chart should tell a story The data-ink ratio should be higher than 1
The data-ink ratio should be higher than 1
Which of the following is a definition of distance between two clusters in a complete linkage clustering? The distance between the least distant pair of objects, one from each group The sum of square of the distance between clusters The average of distance between all pairs of objects, where each pair is made up of one object from each group The distance between the most distant pair of objects, one from each group
The distance between the most distant pair of objects, one from each group
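The linkage definitions in the options can be compared on a toy example. The two clusters below are made-up 2-D points, and Euclidean distance is assumed:

```python
from itertools import product
from math import dist

# Two hypothetical clusters of 2-D points.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]

# Distance for every pair made of one point from each cluster.
pair_distances = [dist(p, q) for p, q in product(cluster_a, cluster_b)]

complete_linkage = max(pair_distances)  # most distant pair
single_linkage = min(pair_distances)    # least distant pair
average_linkage = sum(pair_distances) / len(pair_distances)

print(complete_linkage, single_linkage, average_linkage)  # 6.0 3.0 4.5
```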
Which are useful principles for data visualization? It is important to include pointed arrows whenever possible to really draw attention to the eyes Including as many grids as possible is vital for fully specifying the data to be represented The use of a wide range of colors is critical to emphasize distinctions The graph suggests a possible true effect
The graph suggests a possible true effect
Which of the following is a Type-I error? The null hypothesis is actually false, but the test incorrectly fails to reject it. The null hypothesis is actually true, but the hypothesis test incorrectly rejects it. The null hypothesis is actually false, and the test correctly rejects it. The null hypothesis is actually true, and the hypothesis test correctly fails to reject it.
The null hypothesis is actually true, but the hypothesis test incorrectly rejects it.
Which of the following is an example of a sample? The population of Canada The number of IT employees out of all employees working in an office of Google The number of individuals who have a Ford car The number of members in the Democratic party
The number of IT employees out of all employees working in an office of Google
Which of the following is a difference between the t-distribution and the standard normal (z) distribution? The standard normal distribution's confidence levels are wider than those of the t-distribution The t-distribution has a larger variance than the standard normal distribution The standard normal distribution is dependent on parameters like degrees of freedom, while the t-distribution is not. The t-distribution cannot be calculated without a known standard deviation, while the standard normal distribution can be.
The t-distribution has a larger variance than the standard normal distribution
Which of the following is a continuous random variable? The number of new hires in a year The number of bounced checks from a bank The time to complete a specific task The outcomes of rolling two dice
The time to complete a specific task
What would be the null hypothesis for testing a linear regression model with profit as the dependent variable and sales as the independent variable? There is a negative relationship between profit and sales. There is a positive relationship between profit and sales. There is no linear relationship between profit and sales. There is a linear relationship between profit and sales that can be either positive or negative.
There is no linear relationship between profit and sales.
Which of the following assumptions is not true for multiple linear regression? The residuals are normally distributed. The relationship between dependent and independent variables is linear. There will be a multi-collinearity effect. The independent variables are not correlated.
There will be a multi-collinearity effect.
A correlation coefficient between "college entrance exam" grades and scholastic achievement was found to be -1.08. On the basis of this, you would tell the university that: Students who do best on this exam will make the worst students. They should hire a new statistician. The exam is a poor predictor of success. The entrance exam is a good predictor of success.
They should hire a new statistician.
In classification analysis, we are determining the probability of an observation ________. To be one To be part of a certain class or not To be zero To be undefined
To be part of a certain class or not
Which of the following is true about A/B testing? You should test multiple elements of your landing page at a time and compare. To increase conversion rate of your website traffic, A/B testing can be beneficial. You need to attend WPC 300 course to learn about A/B Testing. A neutral result on an A/B test means you correctly performed the test.
To increase conversion rate of your website traffic, A/B testing can be beneficial.
Which of the following is a false statement? The k-means algorithm is a method for doing partitional clustering In cluster analysis, the objects within clusters should exhibit a high degree of similarity To predict sales from transactional data, one should perform clustering analysis. Reducing SSE (sum of squared error) within a cluster increases cohesion
To predict sales from transactional data, one should perform clustering analysis.
In classification analysis, we typically split the data into two mutually exclusive sets, known as ________, to investigate the strength of the developed model. Testing and validation Binary and numeric Training and validation/testing Training and Binary
Training and validation/testing
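A minimal sketch of such a split in plain Python. The 70/30 ratio, the seed, and the helper name `train_test_split` are assumptions for illustration; libraries such as scikit-learn provide their own version.

```python
import random


def train_test_split(rows: list, test_fraction: float = 0.3, seed: int = 0):
    """Shuffle the rows, then cut them into two non-overlapping sets."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


train, test = train_test_split(list(range(10)))
print(len(train), len(test))  # 7 3
```

The model is fitted on `train` only, and `test` is held back to estimate how well it performs on unseen data.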
When you are asked to design a database for the airline ticket reservation system, based on an Entity-Relationship Data model, which of the following could be an example of "entity"? Arrival time Destination city Flight Number Traveler
Traveler
Which of the following is a cloud service provider? iCloud VMWare Gmail Dropbox
VMWare
Which of the following is true about k-means clustering? It is a type of hierarchical clustering The cluster analysis will give us an optimum value for k We choose the value for k before doing the clustering analysis A tree diagram is used to illustrate the steps in the clustering analysis
We choose the value for k before doing the clustering analysis
Logistic regression is a specialized type of regression analysis that is designed to predict ________ variables. independent numeric dependent a binary numeric a binary categorical
a binary categorical
Gambler's fallacy is ____________. a clustering illusion an endowment effect bias framing effect bias a zero-risk bias
a clustering illusion
Which of the following describes a positively skewed histogram? a histogram for which mean and mode values are the same. a histogram that tails off towards the right a histogram with large kurtosis a histogram that has no fluctuation in mass
a histogram that tails off towards the right
A market analyst is developing a regression model to predict monthly household expenditures on groceries as a function of family size, household income, and household neighborhood (urban, suburban, and rural). The "neighborhood" variable in this model is ________. a linear variable a continuous variable an independent variable a dependent variable
an independent variable
In order for a chart to minimize graphical complexity, the data-ink ratio must be: greater than 1 close to 1 close to zero less than 1
close to 1
In order for a chart to have graphical integrity, the lie factor must be: less than 1 close to zero greater than 1 close to 1
close to 1
When two variables are highly positively correlated, the correlation coefficient will be _______. close to 0 close to 1 close to 10 close to -1
close to 1
Which of the following is not a requirement for an ETL architecture? data quality data security data integration data compliance
data quality
Data transformation involves: data splitting and aggregation format changes and encryption duplication and load format changes and load
data splitting and aggregation
The central limit theorem states that even if the population is not normally distributed, the distribution of the sample mean will still be normal when the sample size is large Sampling distribution of the mean will vary from the sample to sample Standard error of the mean will not vary from the population mean Mean of the population can be calculated without using samples
distribution of the sample mean will still be normal when the sample size is large
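A small simulation can illustrate the theorem. This sketch assumes an exponential population (clearly not normal, with mean 1 and standard deviation 1) and samples of size 50; the seed and counts are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(42)
SAMPLE_SIZE = 50
NUM_SAMPLES = 2000


def sample_mean(n: int) -> float:
    """Mean of n draws from a skewed exponential population (rate 1)."""
    return mean(rng.expovariate(1.0) for _ in range(n))


sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

# The sample means cluster around the population mean (1.0), and their
# spread is close to the standard error sigma / sqrt(n).
print(round(mean(sample_means), 2))
print(round(stdev(sample_means), 3), round(1 / sqrt(SAMPLE_SIZE), 3))
```

Plotting a histogram of `sample_means` would show the familiar bell shape even though the individual draws are strongly right-skewed.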
For a normal distribution, the mean is _______ to the median. greater than less than not equal equal
equal
In the experimental design example "IQ Water", students are called _______. measurement units response variable experimental units treatments
experimental units
The difference between the first and third quartiles is referred to as the ____________. interquartile range standard deviation variance midrange
interquartile range
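A short sketch of the calculation with made-up data. Quartile conventions differ between textbooks and software; this uses the default ("exclusive") method of `statistics.quantiles`.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical data set

q1, q2, q3 = quantiles(data, n=4)  # the three quartile cut points
iqr = q3 - q1                      # interquartile range

print(f"Q1={q1:g}, Q3={q3:g}, IQR={iqr:g}")  # Q1=2, Q3=6, IQR=4
```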
Standard deviation of a normal data distribution is a _______. measure of data dispersion measure of data quality measure of data shape measure of data centrality
measure of data dispersion
The ________ is the observation that occurs most frequently. mean median mode outlier
mode
An experiment is said to be double-blinded if _________ the researchers only observe the variables of interest a placebo is given to some of the subjects the researcher is not aware of the confounding effect neither the subject nor those working with the subject is aware of who is being given which treatment
neither the subject nor those working with the subject is aware of who is being given which treatment
Odds ratio is defined as ________, where p is the probability of success. 1/(p-1) 1/(1-p) p/(p-1) p/(1-p)
p/(1-p)
The extract function in ETL reads data from a: data mart specified source database data warehouse unknown database
specified source database
A _______________ is a relationship between two variables that appear to have interdependence or association with each other but actually do not. spurious correlation negative correlation positive correlation non-correlation
spurious correlation
When you keep eating the food you don't like precisely because you already bought the food, you are committing _____________. sunk-cost fallacy availability heuristics bias endowment effect bias zero risk bias
sunk-cost fallacy
When the lie-factor of a graphical chart is more than 1, the size of the effect shown in the graph is bigger than the actual effect in the data. the graph understates the true effect. the graph simplifies the true effect. the graph suggests a possible true effect.
the size of the effect shown in the graph is bigger than the actual effect in the data.
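A sketch of the lie factor, following Tufte's definition (size of the effect shown in the graphic divided by the size of the effect in the data). The numbers and the helper names are made up for illustration.

```python
def relative_change(first: float, second: float) -> float:
    """Size of an effect as a relative change from the first value."""
    return (second - first) / first


def lie_factor(graphic_first, graphic_second, data_first, data_second) -> float:
    """Effect shown in the graphic divided by the effect in the data."""
    return relative_change(graphic_first, graphic_second) / relative_change(
        data_first, data_second
    )


# The data rise from 10 to 15 (+50%), but the bars grow from 2 cm to 5 cm (+150%).
print(lie_factor(2.0, 5.0, 10.0, 15.0))  # 3.0
```

A value near 1 means the graphic represents the data faithfully; here the chart exaggerates the effect threefold.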
According to statistical notation, what does ∑ stand for? to act as a summation operator to represent population measure to represent the number of items in a population to represent sample statistics
to act as a summation operator
The first step for any kind of A/B testing is: to determine how we want to evaluate the performance. to develop a tracking URL. to execute test according to the plan. to develop a test plan for what you want to test.
to develop a test plan for what you want to test.
A sample study is mostly done to estimate the parameters of the population. to establish causality in a controlled environment to learn how different parameters in the population behave together. to rule out any spurious correlation in the data.
to estimate the parameters of the population.
Which of the following is an example of a measure of dispersion? variance median mode mean
variance