WPC 300
A manager wishes to predict the annual cost (y) of an automobile based on the number of miles (x) driven. The following model was developed: y = $1500 + 0.36x. If a car is driven 15000 miles in a year, the model predicts the annual cost of the car to be: $3850 $2090 $6900 $7400
$6900
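The prediction is a direct substitution into the fitted line: 1500 + 0.36 * 15000 = 1500 + 5400 = 6900. A minimal Python sketch of that arithmetic (the function name is illustrative, not from the course material):

```python
def predict_annual_cost(miles: float) -> float:
    """Evaluate the manager's fitted model y = 1500 + 0.36x (hypothetical helper name)."""
    intercept = 1500.0  # predicted cost in dollars at zero miles
    slope = 0.36        # additional dollars per mile driven
    return intercept + slope * miles


cost = predict_annual_cost(15000)
print(f"${cost:,.0f}")  # $6,900
```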
The value of R-Squared always falls between ________ and ________, inclusive. -1 and +1 0 and 1 -infinity to + infinity 0 and -1
0 and 1
What is the confidence level when the level of significance is 0.07? 0.093 7% 0.970 0.930
0.930
The WPC Sports Company has noted that the size of individual customer orders is normally distributed with a mean of $100 and a standard deviation of $12. If a soccer team of 16 players were to make the next batch of orders, what would be the standard error of the mean? 10.00 3.00 3.46 4.00
3.00 (sigma/sqrt(n) = 12/sqrt(16) = 12/4 = 3)
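The same calculation as a small Python helper, assuming (as the question states) that the population standard deviation is known:

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean when the population sigma is known."""
    return sigma / math.sqrt(n)


se = standard_error(sigma=12, n=16)
print(se)  # 3.0
```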
The correlation coefficient between the age of a vehicle and the money spent to repair it is 0.9. Which of the following statements is true? 90% of the repair cost will be explained by the age of the vehicle 81% of the variation in the money spent on repairs is explained by the age of the vehicle 81% of money spent on repairs is explained by the age of the vehicle 90% of the money spent on repair is explained by the age of the vehicle
81% of the variation in the money spent on repairs is explained by the age of the vehicle
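The 81% comes from squaring the correlation coefficient: in simple linear regression, R-squared equals r squared, the share of the variation in one variable explained by the other. A quick check in Python:

```python
r = 0.9                     # correlation between vehicle age and repair spending
r_squared = r ** 2          # coefficient of determination for a simple linear fit
print(f"{r_squared:.0%}")   # 81%
```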
Which of the following statements is true based on the following regression equation? IQ = 4.0 + Reading Level * 5.6 A one-unit change in IQ will result in a 9.6-point increase in reading level. A one-unit change in IQ will result in a 5.6-point increase in reading level. A one-unit change in reading level will increase IQ by 5.6 points. Reading level is not a good predictor of IQ.
A one-unit change in reading level will increase IQ by 5.6 points.
You are creating a database to store temperature and wind data from various airports. Which of the following fields is the most likely candidate to use as the basis for a Primary Key in the Airport Table? Address State Airport Code City
Airport Code
What kinds of bias could show up when collecting data? Framing effect Sampling bias Self-selection bias All of the answer selections are correct
All of the answer selections are correct
Which of the following question(s) can be better answered using data in order to reach an evidence-based conclusion? All of the answer selections are correct. Who will win the NBA championship? How many students will enroll for an online class in the Spring? What is the purchase pattern(s) of our customers?
All of the answer selections are correct.
A/B testing can help marketers to: Increase sales Increase clicks to their website Increase likes on their social media sites All of the answers are correct
All of the answers are correct
An ideal machine learning process needs Large volume of data Extremely diverse data All other answers are true. Highly granular data
All other answers are true.
In order to reject the null hypothesis, the p-value must be less than the Degrees of freedom Variance Standard deviation Alpha
Alpha
Which of the following is not a component of a relational database? Tables Metadata Relationship among rows in tables Analysis
Analysis
Over-reliance on the first piece of information is called ____________ Bandwagon effect Zero risk bias Clustering illusion Anchoring bias
Anchoring bias
You are collecting data via an online survey to improve educational standards at ASU. Which of the following methods will not result in data collection bias? Anonymous data collection by hiding the ASU brand in the survey questions. Some individuals are less likely to participate because they have strong opinions against ASU's educational standards. Polls are completed only by visitors to the site Those with an interest in your mission are the ones to participate in the survey
Anonymous data collection by hiding the ASU brand in the survey questions.
You bought a top-of-the-line laptop because your friends were so enthusiastic about theirs. Which kind of bias is in action here? Bandwagon effect Zero-risk bias Overconfidence Endowment effect
Bandwagon effect
Which of the following is a step of agglomerative hierarchical clustering? By separating a cluster into two finer groups By joining two clusters farthest away from each other By joining two clusters that are closest to each other By joining two clusters that are not at a Euclidean distance
By joining two clusters that are closest to each other
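A minimal sketch of the agglomerative (bottom-up) procedure on one-dimensional points, written for illustration rather than efficiency: every point starts as its own cluster and the two closest clusters are merged at each step. The `agglomerative` helper and its `linkage` parameter are assumptions of this sketch; `min` gives single linkage and `max` gives complete linkage.

```python
from itertools import combinations


def agglomerative(points, linkage=min):
    """Naive bottom-up clustering of 1-D points (hypothetical helper, not efficient).

    Every point starts in its own cluster; each step merges the two clusters
    whose linkage distance is smallest. `linkage` reduces the pairwise point
    distances between two clusters: min = single linkage, max = complete linkage.
    Returns the merges in the order they happen.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters (ties go to the first pair found).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(
                abs(a - b) for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        del clusters[j]          # j > i, so index i is still valid afterwards
        clusters[i] = merged
    return merges


merges = agglomerative([1, 2, 6, 7, 15])
for left, right in merges:
    print(left, "+", right)
```

Running all the way to a single cluster produces the full merge history, which is what a dendrogram draws.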
A loan officer wants to know if the next customer is likely to default or not on a loan. How can she assess the risk of extending the loan to that customer? By utilizing a multiple logistic regression model developed by an in-house analyst By asking her colleague if he knows the person By asking the customer if he is planning to default on the loan or not By utilizing a simple linear regression model developed by an in-house analyst
By utilizing a multiple logistic regression model developed by an in-house analyst
What are the three principles of describing data? Center, spread and shape Dispersion, range and standard deviation Mean, median and mode Centrality, dispersion and size
Center, spread and shape
Which of the following is an example of unsupervised machine learning? Logistic regression Clustering Artificial neural networks Decision tree
Clustering
In the data extraction process for an ETL tool, which of the following is not an example of a legitimate data source? Point of Sale data Competitors' data Customers' social media data Online transaction data
Competitors' data
When sample size increases Confidence interval remains the same Confidence interval increases Confidence interval decreases Standard deviation of the sample mean increases
Confidence interval decreases
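The margin of error for a mean shrinks in proportion to 1/sqrt(n), so the interval narrows as the sample grows. A sketch assuming a known sigma of 12 (borrowed from the customer-order question) and a 95% z-interval:

```python
import math

Z_95 = 1.96  # two-sided critical z value for 95% confidence (rounded)


def margin_of_error(sigma: float, n: int, z: float = Z_95) -> float:
    """Half-width of a z confidence interval for the mean."""
    return z * sigma / math.sqrt(n)


# Full interval width for increasingly large samples.
widths = {n: 2 * margin_of_error(12, n) for n in (16, 64, 256)}
for n, width in widths.items():
    print(n, round(width, 2))
```

Quadrupling the sample size halves the width, because sqrt(4n) = 2 * sqrt(n).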
In classification problems, the primary source for accuracy estimation of the model is ________. Logit Probability of success Confusion matrix Odds ratio
Confusion matrix
The ________ is often used to describe the performance of a classification model applied to a set of test data for which the true outcomes are known. ANOVA table Confusion matrix Effect summary table Parameter estimates table
Confusion matrix
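A confusion matrix simply tallies predicted labels against true labels, and accuracy is read off its diagonal. A minimal sketch with made-up labels (1 = positive class); the `confusion_counts` helper is hypothetical:

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, FN, TN) for a binary classifier's predictions."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical test-set labels (1 = positive class) and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)     # correct predictions / all predictions
print([[tp, fn], [fp, tn]], accuracy)  # [[3, 1], [1, 3]] 0.75
```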
Which of the following is not an application of clustering analysis? Collaborative filtering analysis Market segmentation analysis Web click stream analysis Crime prediction analysis
Crime prediction analysis
Which of the following is not a standard practice in the "Data Transformation" process of an ETL tool? Data aggregation Splitting data fields Data extraction from ERP Change of data format
Data extraction from ERP
Which of the following statement(s) about charts is true? Data ink can sometimes help tell a richer story A useless chart is called "chart junk" We should make as many grids as possible in a chart The more data ink, the better
Data ink can sometimes help tell a richer story
Which of the following statements below is false about supervised data analysis? Data is not labeled for supervised analysis The data is labeled for supervised analysis Multiple linear regression analysis is a type of supervised data analysis Logistic regression analysis is a type of supervised data analysis
Data is not labeled for supervised analysis
In an ETL process, data is loaded into a final target database such as: Operational dashboard Social media database Public database Data warehouse
Data warehouse
In loading phase of an ETL tool, the transformed data gets loaded into an end target usually the _______. Online analytical processing Master data management Data warehouse Original Database
Data warehouse
Which of the following techniques is a modern update of artificial neural networks? Logistic regression Clustering Deep learning Decision tree
Deep learning
Target is examining their online sales data during the pandemic to understand what happened. Which kind of analytical technique are they using? Prescriptive analytics Descriptive analytics Forecast analytics Predictive analytics
Descriptive analytics
What are the four types of data analytics methods? Critical, analytical, predictive and explanatory Descriptive, analytical, predictive and prescriptive Descriptive, logical, predictive and prescriptive Descriptive, explanatory, predictive and prescriptive
Descriptive, explanatory, predictive and prescriptive
Which of the following is not one of the processes involved in data cleaning? Parsing Consolidating Encrypting Matching
Encrypting
When you buy a new car, you value it more than the price you paid because of: Zero-risk bias Endowment effect bias Sunk cost fallacy None of the answer selections are correct
Endowment effect bias
Which of the following statements is true? Experimentation is a way of analytical thinking Using intuition is a way of analytical thinking Analytical thinking is not based on facts Heuristic thinking is slow
Experimentation is a way of analytical thinking
Which of the following is an example of secondary data? Interview data Survey data Firm's proprietary data Simulated data
Firm's proprietary data
Which of the following is an example of association rule learning? The association between customers and what they purchase How frequently items are purchased in a group of transactions How frequently an item set occurs in a transaction How frequently a cluster can be formed in a given transaction
How frequently an item set occurs in a transaction
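In association rule learning, how frequently an item set occurs is called its support: the fraction of transactions that contain every item in the set. A sketch over a small hypothetical basket data set (the `support` helper is illustrative):

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
print(support({"diapers", "beer"}, transactions))  # 0.6
```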
When you access information from two different tables connected by an identifier key, the SQL keyword you should use is _______. ORDER BY INNER JOIN COUNT GROUP BY
INNER JOIN
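An INNER JOIN returns only the rows whose key matches in both tables. A self-contained sketch using Python's built-in `sqlite3` module; the `Airport` and `Reading` tables and their rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables linked by the airport code.
    CREATE TABLE Airport (code TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE Reading (code TEXT, temp_f REAL, wind_mph REAL);
    INSERT INTO Airport VALUES ('PHX', 'Phoenix'), ('DEN', 'Denver');
    INSERT INTO Reading VALUES ('PHX', 104.0, 8.0), ('DEN', 71.0, 12.0), ('XXX', 50.0, 3.0);
""")

rows = conn.execute("""
    SELECT a.city, r.temp_f, r.wind_mph
    FROM Airport AS a
    INNER JOIN Reading AS r ON a.code = r.code
    ORDER BY a.city
""").fetchall()
print(rows)  # [('Denver', 71.0, 12.0), ('Phoenix', 104.0, 8.0)]
```

The 'XXX' reading has no matching airport, so the inner join drops it.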
Which of the following statements is not true about artificial neural networks? In the hidden layer of the networks, input data is hidden The network is modeled after the human brain in which brain cells work in a network The input layer in the network receives the data The learning process is similar to our brain
In the hidden layer of the networks, input data is hidden
Deleting the grid lines in a chart Increases the data-ink ratio Decreases the lie-factor Increases the lie-factor Decreases the data-ink ratio
Increases the data-ink ratio
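The data-ink ratio is the share of a graphic's ink that presents data, so it can never exceed 1. Deleting grid lines removes non-data ink from the denominator, which raises the ratio. A toy calculation with hypothetical ink amounts in arbitrary units:

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Share of the total ink that encodes data; never more than 1."""
    return data_ink / total_ink


# Hypothetical ink budget for one chart, in arbitrary units.
data_ink, grid_ink, other_ink = 40, 25, 35

before = data_ink_ratio(data_ink, data_ink + grid_ink + other_ink)
after = data_ink_ratio(data_ink, data_ink + other_ink)  # grid lines deleted
print(round(before, 2), round(after, 2))  # 0.4 0.53
```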
Artificial Intelligence _______ Cannot be used for retail industry Is a broad science of mimicking human abilities Does not depend on machine learning Is a specific subset of machine learning
Is a broad science of mimicking human abilities
AI is not embraced everywhere in every industry because _______. It is not very well developed It is likely to fail in the future It is not very well understood It can be operationally expensive
It can be operationally expensive
Which of the following is true about multi-collinearity? The P-value reduces significantly, leading to rejection of the null hypothesis. The effect of an independent variable on the dependent variable becomes easy to isolate. The regression coefficients become clearer and are easier to interpret. It is measured using the variance inflation factor (VIF).
It is measured using the variance inflation factor (VIF).
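For predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the other predictors. With only two predictors, R_j^2 is just the squared correlation between them, which keeps this sketch short; the helper names and data are hypothetical. A commonly quoted rule of thumb treats VIF values above roughly 5 to 10 as a sign of troublesome multicollinearity.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors.

    In general VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j
    on the others; with two predictors, R_j^2 is simply r(x1, x2) squared.
    """
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical predictor values.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
print(round(vif_two_predictors(x1, x2), 2))  # 2.5
```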
Which of the following statements is NOT true about experimental studies to compare two treatments? We can design experiments to minimize any bias in the comparison. It is not easy to control uncertainties in the comparison. We can design experiments so that the error in the comparison is small. Experiments allow us to set up a direct comparison between the treatments of interest.
It is not easy to control uncertainties in the comparison.
Which of the following describes the standard deviation? It is the square of the variance It is the average of the greatest and least values in the data set. It is the square root of the variance. It is the difference between the first and third quartiles of a data set.
It is the square root of the variance.
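A quick check with Python's `statistics` module, using the population formulas on an arbitrary data set:

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7]               # arbitrary example values
variance = statistics.pvariance(data)   # mean squared deviation from the mean
std_dev = statistics.pstdev(data)       # population standard deviation
print(math.isclose(std_dev, math.sqrt(variance)))  # True
```

The sample versions, `statistics.variance` and `statistics.stdev`, divide by n - 1 instead of n but keep the same square-root relationship.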
In developing spam filter algorithms, we need Unlabeled data of spam emails Unlabeled data of non-spam emails Labeled data of both spam and non-spam emails Labeled data of spam emails
Labeled data of both spam and non-spam emails
One of the processes in ETL is Treatment Extend Transition Load
Load
The final stage of an ETL process is: Load Data Analysis Transform Extract
Load
In logistic regression, the dependent variable y is defined as: Log(p/(1-p)) Log(1/p) Log(1-p) Log(1/(1-p))
Log(p/(1-p))
In logistic regression analysis, instead of Y as a dependent variable, we use a function of Y called ________. Logit Odds ratio Log of Y Odds
Logit
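Odds, logit and the logistic function are easy to mix up, so here is a small sketch of how they relate. Logistic regression models the logit, log(p/(1-p)), as a linear function of the predictors, and the inverse logit maps a fitted value back to a probability. The helper names are illustrative:

```python
import math


def odds(p: float) -> float:
    """Odds of success, p / (1 - p), for 0 < p < 1."""
    return p / (1 - p)


def logit(p: float) -> float:
    """Log-odds: the quantity logistic regression models as linear in the predictors."""
    return math.log(odds(p))


def inverse_logit(z: float) -> float:
    """Logistic (sigmoid) function: maps a linear predictor back to a probability."""
    return 1 / (1 + math.exp(-z))


p = 0.8
print(round(odds(p), 3), round(logit(p), 3))  # 4.0 1.386
print(round(inverse_logit(logit(p)), 3))      # 0.8
```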
In a cluster analysis, the distance between the clusters should be: Zero Minimized Maximized Even
Maximized
Regular consumption of organic food will keep you in a good mood. In this example, the confounder could be: Money Work ethics People's mood Organic food
Money
If you want to find out if body weight, calorie intake, fat intake and age have an influence on the probability of having a heart attack (yes or no), which of the following kind of analysis will help determine the answer? Simple logistic regression Multiple logistic regression Simple linear regression Multiple linear regression
Multiple logistic regression
Which of the following biases cannot be categorized as a cognitive bias? Groupthink Anchoring Bias Sunk cost fallacy None of the answer selections are correct
None of the answer selections are correct
Which of the following is not a drawback of analytical decision making? Delayed action Lack of flexibility Frustration in teams None of the answer selections are correct
None of the answer selections are correct
Which of the following statement(s) about charts is false? A chart should minimize graphical complexity A chart should tell a story A chart should have graphical integrity None of the other answers are false
None of the other answers are false
Which of the following propositions describes an existing theory or belief? Proportion Alternative hypothesis Standard deviation Null hypothesis
Null hypothesis
Which of the following tools helps in periodic managerial decision-making? OLTP Servers Database OLAP
OLAP
After factoring out the effect of other variables known to affect SAT, such as socioeconomic status, researchers found that music students had a higher SAT score than non-music students. This is an example of __________. None of the other answers is correct Observational Study Experimental study Comparative study
Observational Study
A person who is convinced he is gaining admission to Harvard by merely applying is suffering from: Gambler's fallacy Overconfidence Zero-risk bias None of the answer selections are correct
Overconfidence
In an agile approach of analytics what is the first step of the process? Score and deploy Perform data discovery Model data Perform business discovery
Perform business discovery
What best describes the nature of a rose diagram? Represents various species of flowers Rarely used for azimuthal data Uses various colors to represent cause of mortality Plots data using a circular histogram plot
Plots data using a circular histogram plot
Which of the following examples is not an application of AI? Monitoring epidemics and diseases and stopping them from spreading Predicting human behavior by reading natural language used Predicting the exam score by scanning the appropriate textbook Optimizing traffic patterns over time
Predicting the exam score by scanning the appropriate textbook
Costco wants to know how to stock its warehouses for a future pandemic and is using current sales data to help project its needs. Which kind of analytical technique is it using? Prescriptive analytics Descriptive analytics Explanatory analytics Predictive analytics
Predictive analytics
Predictive analytics may be applied to __________, which is a set of techniques that use descriptive data and forecasts to identify the decisions most likely to result in the best performance. Prescriptive analytics Explanatory analytics Forecast analytics Descriptive analytics
Prescriptive analytics
Which of the following data analysis models uses optimization techniques? Diagnostic analytics Descriptive analytics Predictive analytics Prescriptive analytics
Prescriptive analytics
Your professor is considering purchasing a self-driving car that can figure out the best route and the optimum safe way to drive there without human intervention. What kind of analytics is the car using to do this? Explanatory analytics Prescriptive analytics Predictive analytics Descriptive analytics
Prescriptive analytics
Which of the following is an important task of a database management system? Helps collect data from vendors Provides support such as performing maintenance and routine backups. Helps create rules for data analysis Provides unauthorized access to data when authentication fails
Provides support such as performing maintenance and routine backups.
_______ ensures that related data exist in parent table before allowing an entry into a child table. Data redundancy Referential integrity Data Integrity SQL
Referential integrity
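A sketch of referential integrity in SQLite via Python's `sqlite3`: a foreign key makes the database reject a child row whose parent does not exist. The `Customer` and `Orders` tables are hypothetical, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Hypothetical parent (Customer) and child (Orders) tables.
conn.execute("CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE Orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES Customer(id),
    amount REAL)""")

conn.execute("INSERT INTO Customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO Orders VALUES (10, 1, 99.5)")  # parent exists: accepted

try:
    conn.execute("INSERT INTO Orders VALUES (11, 42, 15.0)")  # no customer 42
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```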
The unexplained variance in the regression analysis is also known as: Regression variance Residual variance Total variance Predicted variance
Residual variance
The SQL code to extract only the first_name information for all records of the "Actor" table is: SELECT Actor FROM first_name; SELECT first_name FROM Actor; SELECT * FROM Actor; SELECT * FROM Actor WHERE first_name = "NICK";
SELECT first_name FROM Actor;
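The same query run against an in-memory SQLite table, as a sketch; the three rows are hypothetical stand-ins for the table's contents. Standard SQL writes string literals in single quotes, for example `WHERE first_name = 'NICK'`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Actor VALUES (?, ?, ?)",
    [(1, "ANNA", "LEE"), (2, "NICK", "WAHL"), (3, "ED", "CHASE")],
)

# Project a single column for every record.
first_names = [row[0] for row in conn.execute("SELECT first_name FROM Actor")]
print(first_names)

# Filter rows instead of projecting one column; note the single-quoted literal.
nick_rows = conn.execute("SELECT * FROM Actor WHERE first_name = 'NICK'").fetchall()
print(nick_rows)
```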
"Google Docs" is an example of _______ in a cloud computing environment. PaaS SaaS IaaS Virtualization
SaaS
Which of the following categories of data mining would you use for spam filtering of emails? Both supervised and unsupervised Heuristics Unsupervised Supervised
Supervised
Which of the following statements below is true about supervised/unsupervised machine learning? Unsupervised learning requires no supervision from humans Unsupervised learning requires labeled data for training Supervised learning requires unlabeled data for training Supervised learning requires labeled data for training
Supervised learning requires labeled data for training
Which of the following statements is a reason not to use a table for data visualization? The table has more precise numbers Tables cannot easily show trends Large amount of information can be included in a very small space Tables display more information in less space than a chart
Tables cannot easily show trends
In the Target story, why did Target send the teen daughter maternity ads? Target was sending ads to all women in a particular neighborhood Target analytic model confused her with an older woman with a similar name Target analytics model suggested she was pregnant based on her buying patterns Target was using special promotion that targeted all teens in her geographical area
Target analytics model suggested she was pregnant based on her buying patterns
Which of the following is an ETL vendor? Teradata MySQL Tableau JMP
Teradata
Which of the following is true of hierarchical clustering? No single cluster can have all data All clusters must have more than one object in it All clusters must have the same number of data The data partition does not occur in a single step
The data partition does not occur in a single step
Which of the following violates the principle of data visualization? The lie-factor should be closely equal to 1 The data-ink ratio should be higher than 1 The chart should tell a story Avoid unnecessary chart junk
The data-ink ratio should be higher than 1
Which of the following is a definition of distance between two clusters in a complete linkage clustering? The sum of square of the distance between clusters The distance between the least distant pair of objects, one from each group The distance between the most distant pair of objects, one from each group The average of distance between all pairs of objects, where each pair is made up of one object from each group
The distance between the most distant pair of objects, one from each group
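The three common linkage definitions differ only in how they summarize the cross-cluster point distances. A sketch for one-dimensional clusters (the helper name and the example clusters are hypothetical):

```python
from itertools import product


def linkage_distances(cluster_a, cluster_b):
    """Single, complete and average linkage between two clusters of 1-D points."""
    pair_distances = [abs(a - b) for a, b in product(cluster_a, cluster_b)]
    return {
        "single": min(pair_distances),    # least distant cross-cluster pair
        "complete": max(pair_distances),  # most distant cross-cluster pair
        "average": sum(pair_distances) / len(pair_distances),  # mean over all pairs
    }


print(linkage_distances([1, 2], [6, 9]))
# {'single': 4, 'complete': 8, 'average': 6.0}
```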
Which are useful principles for data visualization? The graph suggests a possible true effect The use of a wide range of colors is critical to emphasize distinctions Including as many grids as possible is vital for fully specifying the data to be represented It is important to include pointed arrows whenever possible to really draw attention to the eyes
The graph suggests a possible true effect
Which of the following is a Type-I error? The null hypothesis is actually true, but the hypothesis test incorrectly rejects it. The null hypothesis is actually true, and the hypothesis test correctly fails to reject it. The null hypothesis is actually false, but the test incorrectly fails to reject it. The null hypothesis is actually false, and the test correctly rejects it.
The null hypothesis is actually true, but the hypothesis test incorrectly rejects it.
Which of the following is an example of a sample? The population of Canada The number of individuals who have a Ford car The number of IT employees out of all employees working in an office of Google The number of members in the Democratic party
The number of IT employees out of all employees working in an office of Google
Which of the following is a difference between the t-distribution and the standard normal (z) distribution? The t-distribution has a larger variance than the standard normal distribution The t-distribution cannot be calculated without a known standard deviation, while the standard normal distribution can be. The standard normal distribution's confidence levels are wider than those of the t-distribution The standard normal distribution is dependent on parameters like degrees of freedom, while the t-distribution is not.
The t-distribution has a larger variance than the standard normal distribution
Which of the following is a continuous random variable? The number of bounced checks at a bank The time to complete a specific task The number of new hires in a year The outcomes of rolling two dice
The time to complete a specific task
What would be the null hypothesis for testing a linear regression model with profit as the dependent variable and sales as the independent variable? There is a positive relationship between profit and sales. There is no linear relationship between profit and sales. There is a linear relationship between profit and sales that can be either positive or negative. There is a negative relationship between profit and sales.
There is no linear relationship between profit and sales.
Which of the following assumptions is not true for multiple linear regression? The residuals are normally distributed. There will be a multi-collinearity effect. The relationship between dependent and independent variables is linear. The independent variables are not correlated.
There will be a multi-collinearity effect.
A correlation coefficient between "college entrance exam" grades and scholastic achievement was found to be -1.08. On the basis of this, you would tell the university that: The exam is a poor predictor of success. Students who do best on this exam will make the worst students. The entrance exam is a good predictor of success. They should hire a new statistician.
They should hire a new statistician.
In classification analysis, we are determining the probability of an observation ________. To be undefined To be part of a certain class or not To be zero To be one
To be part of a certain class or not
Which of the following is true about A/B testing? You should test multiple elements of your landing page at a time and compare. To increase conversion rate of your website traffic, A/B testing can be beneficial. You need to attend WPC 300 course to learn about A/B Testing. A neutral result on an A/B testing means you correctly performed the test.
To increase conversion rate of your website traffic, A/B testing can be beneficial.
Which of the following is a false statement? In cluster analysis, the objects within clusters should exhibit a high degree of similarity The k-means algorithm is a method for doing partitional clustering Reducing SSE (sum of squared error) within a cluster increases cohesion To predict sales from transactional data, one should perform clustering analysis.
To predict sales from transactional data, one should perform clustering analysis.
In classification analysis, we typically split the data into two mutually exclusive sets, known as ________, to investigate the strength of the developed model. Training and Binary Training and validation/testing Binary and numeric Testing and validation
Training and validation/testing
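A minimal sketch of a random split into two mutually exclusive sets; the 70/30 ratio, the seed and the `train_test_split` helper written here are assumptions for illustration (libraries such as scikit-learn provide their own version with more options):

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of `rows` and split it into disjoint training and test sets.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)      # seeded so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(100))            # stand-in for 100 labeled records
train, test = train_test_split(data)
print(len(train), len(test))       # 70 30
```

The model is fitted on `train` only, and its accuracy is then estimated on `test`, which it never saw.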
When you are asked to design a database for the airline ticket reservation system, based on an Entity-Relationship Data model, which of the following could be an example of "entity"? Destination city Traveler Arrival time Flight Number
Traveler
Which of the following is a cloud service provider? VMware Gmail Dropbox iCloud
VMware
Which of the following is true about k-means clustering? We choose the value for k before doing the clustering analysis It is a type of hierarchical clustering A tree diagram is used to illustrate the steps in the clustering analysis The cluster analysis will give us an optimum value for k
We choose the value for k before doing the clustering analysis
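A minimal sketch of Lloyd's k-means algorithm on one-dimensional data, showing that k is an input rather than an output. The evenly spread initial centroids are a deterministic choice made for this sketch; random or k-means++ initialization is more common in practice.

```python
def kmeans_1d(points, k, max_iter=100):
    """Lloyd's k-means for 1-D data (hypothetical helper). k is supplied by the caller.

    Initial centroids are spread evenly over the sorted data, a simple
    deterministic choice for this sketch. Assumes 1 <= k <= len(points).
    """
    ordered = sorted(points)
    if k == 1:
        centroids = [ordered[0]]
    else:
        centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]

    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[idx] for idx, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans_1d([1, 2, 3, 10, 11, 12, 20, 21, 22], k=3)
print(centroids)  # [2.0, 11.0, 21.0]
```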
Logistic regression is a specialized type of regression analysis that is designed to predict ________ variables. a binary numeric independent numeric dependent a binary categorical
a binary categorical
The gambler's fallacy is ____________. a clustering illusion an endowment effect bias framing effect bias a zero-risk bias
a clustering illusion
Which of the following describes a positively skewed histogram? a histogram for which mean and mode values are the same. a histogram with large kurtosis a histogram that tails off towards the right a histogram that has no fluctuation in mass
a histogram that tails off towards the right
A market analyst is developing a regression model to predict monthly household expenditures on groceries as a function of family size, household income, and household neighborhood (urban, suburban, and rural). The "neighborhood" variable in this model is ________. a continuous variable a dependent variable an independent variable a linear variable
an independent variable
In order for a chart to minimize graphical complexity, the data-ink ratio must be: close to 1 less than 1 greater than 1 close to zero
close to 1
In order for a chart to have graphical integrity, the lie factor must be: close to 1 less than 1 close to zero greater than 1
close to 1
When two variables are highly positively correlated, the correlation coefficient will be _______. close to 10 close to -1 close to 1 close to 0
close to 1
Which of the following is not a requirement for an ETL architecture? data security data compliance data quality data integration
data quality
Data transformation involves format changes and encryption format changes and load data splitting and aggregation duplication and load
data splitting and aggregation
The central limit theorem states that even if the population is not normally distributed, the Sampling distribution of the mean will vary from the sample to sample Standard error of the mean will not vary from the population mean distribution of the sample mean will still be normal when the sample size is large Mean of the population can be calculated without using samples
distribution of the sample mean will still be normal when the sample size is large
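A small simulation sketch of the central limit theorem, under assumed settings (an exponential population with rate 1, samples of n = 30, 2,000 replicates, a fixed seed): even though the population is strongly right-skewed, the sample means cluster around the population mean with spread close to sigma/sqrt(n), and about 95% of them fall within 1.96 standard errors, as a normal distribution would predict.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is reproducible
n, replicates = 30, 2000

# Population: exponential with rate 1 (mean 1, sd 1), strongly right-skewed.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(replicates)
]

se = 1 / math.sqrt(n)    # theoretical standard error, sigma / sqrt(n)
within = sum(abs(m - 1) <= 1.96 * se for m in sample_means) / replicates

print(round(statistics.fmean(sample_means), 3))  # near the population mean, 1
print(round(statistics.stdev(sample_means), 3))  # near the standard error below
print(round(se, 3))                              # 0.183
print(round(within, 2))                          # roughly 0.95
```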
For a normal distribution, the mean is _______ to the median. equal not equal greater than less than
equal
In the experimental design example "IQ Water", students are called _______. treatments response variable experimental units measurement units
experimental units
The difference between the first and third quartiles is referred to as the ____________. interquartile range variance standard deviation midrange
interquartile range
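Python's `statistics.quantiles` returns the three quartile cut points, from which the IQR follows directly. Software packages use slightly different quartile conventions, so hand calculations from a textbook can differ a little. A sketch on arbitrary data:

```python
import statistics

data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15]        # arbitrary example values
q1, q2, q3 = statistics.quantiles(data, n=4)    # default method="exclusive"
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 4.0 7.5 11.25 7.25
```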
Standard deviation of a normal data distribution is a _______. measure of data quality measure of data dispersion measure of data shape measure of data centrality
measure of data dispersion
The ________ is the observation that occurs most frequently. mode median outlier mean
mode
An experiment is said to be double-blinded if _________ neither the subject nor those working with the subject is aware of who is being given which treatment the researcher is not aware of the confounding effect the researchers only observe the variables of interest a placebo is given to some of the subjects
neither the subject nor those working with the subject is aware of who is being given which treatment
Odds ratio is defined as ________, where p is the probability of success. 1/(1-p) p/(1-p) p/(p-1) 1/(p-1)
p/(1-p)
Extract function in ETL reads data from data warehouse data mart specified source database unknown database
specified source database
A _______________ is a relationship between two variables that appear to have interdependence or association with each other but actually do not. positive correlation spurious correlation negative correlation non-correlation
spurious correlation
When you keep eating the food you don't like precisely because you already bought the food, you are committing _____________. sunk-cost fallacy availability heuristics bias endowment effect bias zero risk bias
sunk-cost fallacy
When the lie-factor of a graphical chart is more than 1, the size of the effect shown in the graph is bigger than the actual effect in the data. the graph understates the true effect. the graph suggests a possible true effect. the graph simplifies the true effect.
the size of the effect shown in the graph is bigger than the actual effect in the data.
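Tufte's lie factor divides the size of the effect shown in the graphic by the size of the effect in the data, where each effect is the relative change (second - first) / first. A sketch with a hypothetical truncated-axis bar chart; the helper names are illustrative:

```python
def effect_size(first: float, second: float) -> float:
    """Relative change between two values, (second - first) / first."""
    return (second - first) / first


def lie_factor(shown_first, shown_second, data_first, data_second):
    """Effect size shown in the graphic divided by effect size in the data."""
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)


# Hypothetical truncated-axis bar chart: revenue grows from 100 to 120 (+20%),
# but the two bars are drawn 1 cm and 2 cm tall (+100%).
lf = lie_factor(1.0, 2.0, 100, 120)
print(round(lf, 2))  # 5.0
```

A lie factor of 5 means the chart exaggerates the change fivefold; an honest chart would have bars in the ratio 100:120, giving a lie factor of 1.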
According to statistical notation, what does ∑ stand for? to represent the number of items in a population to represent population measure to represent sample statistics to act as a summation operator
to act as a summation operator
The first step for any kind of A/B testing is: to determine how we want to evaluate the performance. to execute test according to the plan. to develop a test plan for what you want to test. to develop a tracking URL.
to develop a test plan for what you want to test.
A sample study is mostly done to rule out any spurious correlation in the data. to estimate the parameters of the population. to establish causality in a controlled environment to learn how different parameters in the population behave together.
to estimate the parameters of the population.