WPC 300 Final Exam (Darcy)

A manager wishes to predict the annual cost (y) of an automobile based on the number of miles (x) driven. The following model was developed: y = $1500 + 0.36x. If a car is driven 15000 miles in a year, the model predicts the annual cost of the car to be: $3850 $7400 $2090 $6900

$6900
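
The arithmetic behind this answer can be checked with a short sketch. The function name `predict_annual_cost` is a hypothetical helper, not part of the course material; the intercept and slope come from the model in the question.

```python
def predict_annual_cost(miles: float, intercept: float = 1500.0, slope: float = 0.36) -> float:
    """Evaluate the fitted line y = intercept + slope * x for a given mileage."""
    return intercept + slope * miles


# 1500 + 0.36 * 15000 = 1500 + 5400
predicted = predict_annual_cost(15000)
print(f"${predicted:,.0f}")  # prints "$6,900"
```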

The value of R-Squared always falls between ________ and ________, inclusive. -1 and +1 -infinity to + infinity 0 and 1 0 and -1

0 and 1

What is the confidence level when the level of significance is 0.07? 0.930 7% 0.970 0.093

0.930

The WPC Sports Company has noted that the size of individual "customer order" is normally distributed with a mean of $100 and standard deviation of $12. If a soccer team of 16 players were to make the next batch of orders, what would be the standard error of the mean? 4.00 3.46 10.00 3.00

3.00 [sigma/sqrt(n) = 12/sqrt(16) = 12/4 = 3]

The correlation coefficient between the age of a vehicle and the money spent to repair it is 0.9. Which of the following statements is true? 81% of the variation in the money spent on repairs is explained by the age of the vehicle 90% of the repair cost will be explained by the age of the vehicle 90% of the money spent on repair is explained by the age of the vehicle 81% of money spent on repairs is explained by the age of the vehicle

81% of the variation in the money spent on repairs is explained by the age of the vehicle
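
The 81% figure comes from squaring the correlation coefficient: in simple linear regression, R-squared equals r squared. A minimal sketch of that step (variable names are illustrative):

```python
# Correlation between vehicle age and repair spending, from the question.
r = 0.9

# In simple (one-predictor) linear regression, R-squared is the square of r.
r_squared = r ** 2

print(f"{r_squared:.0%} of the variation is explained")  # prints "81% of the variation is explained"
```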

Which of the following statements is true based on the following regression equation? IQ = 4.0 + Reading Label * 5.6 A one-unit change in reading label will increase IQ by 5.6 points. A one-unit change in IQ will result in a 5.6-point increase in reading label. A one-unit change in IQ will result in a 9.6-point increase in reading label. Reading label is not a good predictor of IQ.

A one-unit change in reading label will increase IQ by 5.6 points.

You are creating a database to store temperature and wind data from various airports. Which of the following fields is the most likely candidate to use as the basis for a Primary Key in the Airport Table? Airport Code City State Address

Airport Code

What kinds of bias could show up when collecting data? Framing effect Sampling bias Self-selection bias All of the answer selections are correct

All of the answer selections are correct

Which of the following question(s) can be better answered using data in order to reach an evidence-based conclusion? Who will win the NBA championship? All of the answer selections are correct. What is the purchase pattern(s) of our customers? How many students will enroll for an online class in the Spring?

All of the answer selections are correct.

A/B testing can help marketers to Increase more clicks to their website Increase more sales Increase more likes to their social media sites All of the answers are correct

All of the answers are correct

An ideal machine learning process needs Extremely diverse data Large volume of data All other answers are true. Highly granular data

All other answers are true.

In order to reject the null hypothesis, the p-value must be less than the Variance Alpha Standard deviation Degrees of freedom

Alpha

Which of the following is not a component of the relational database? Tables Analysis Relationship among rows in tables Metadata

Analysis

Over-reliance on the first piece of information is called ____________ Bandwagon effect Zero risk bias Clustering illusion Anchoring bias

Anchoring bias

You are collecting data via an online survey to improve education standards at ASU. Which of the following methods will not result in data collection bias? Polls are completed only by visitors to the site Those with an interest in your mission are the ones to participate in the survey Anonymous data collection that hides the ASU brand in the survey questions. Some individuals are less likely to participate because they have strong opinions against ASU education standards.

Anonymous data collection that hides the ASU brand in the survey questions.

You bought a top of the line laptop because your friends were so enthusiastic about theirs. Which kind of bias is in action here? Bandwagon effect Zero-risk bias Overconfidence Endowment effect

Bandwagon effect

Which of the following is a step of agglomerative hierarchical clustering? By joining two clusters farthest away from each other By joining two clusters that are not at a Euclidean distance By separating cluster into two finer groups By joining two clusters that are closest to each other

By joining two clusters that are closest to each other
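
One merge step of agglomerative clustering can be sketched as below. This is an illustration under stated assumptions, not the course's implementation: it measures closeness with single linkage (the nearest pair of points) and Euclidean distance, and the sample points are made up.

```python
import math
from itertools import combinations


def single_linkage(a, b):
    """Distance between two clusters: the closest pair of points, one from each."""
    return min(math.dist(p, q) for p in a for q in b)


def merge_closest(clusters):
    """Perform one agglomerative step: join the two clusters that are closest."""
    i, j = min(
        combinations(range(len(clusters)), 2),
        key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
    )
    merged = clusters[i] + clusters[j]
    rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
    return rest + [merged]


# Start with every point in its own cluster (hypothetical data).
clusters = [[(0.0, 0.0)], [(0.0, 1.0)], [(5.0, 5.0)]]
clusters = merge_closest(clusters)
print(clusters)  # the two nearby points now share one cluster
```

Repeating `merge_closest` until one cluster remains yields the full hierarchy.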

A loan officer wants to know if the next customer is likely to default or not on a loan. How can she assess the risk of extending the loan to that customer? By utilizing a simple linear regression model developed by an in-house analyst By utilizing a multiple logistic regression model developed by an in-house analyst By asking her colleague if he knows the person By asking the customer if he is planning to default on the loan or not

By utilizing a multiple logistic regression model developed by an in-house analyst

Which of the following is an example of unsupervised machine learning? Logistic regression Decision tree Artificial neural networks Clustering

Clustering

In the data extraction process for an ETL tool, which of the following is not an example of a legitimate data source? Customers' social media data Point of Sales data Competitors' data Online transaction data

Competitors' data

When sample size increases Standard deviation of the sample mean increases Confidence interval increases Confidence interval remains the same Confidence interval decreases

Confidence interval decreases
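
Why the interval narrows can be seen from the margin of error, z * sigma / sqrt(n). The sketch below assumes a z-based interval with a known standard deviation; the values sigma = 12 and the two sample sizes are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of a z-based confidence interval for the mean."""
    alpha = 1 - confidence                    # level of significance
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    return z * sigma / sqrt(n)                # z times the standard error


small_sample = margin_of_error(sigma=12, n=16)
large_sample = margin_of_error(sigma=12, n=64)

# Quadrupling n halves the standard error, so the interval is half as wide.
print(small_sample, large_sample)
```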

In classification problems, the primary source for accuracy estimation of the model is ________. Probability of success Logit Odds ratio Confusion matrix

Confusion matrix

The ________ is often used to describe the performance of a classification model applied to a set of test data for which the true outcomes are known. Parameter estimates table ANOVA table Confusion matrix Effect summary table

Confusion matrix
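
A confusion matrix simply counts how predictions line up with known outcomes. The sketch below uses made-up labels (1 = positive class, 0 = negative class) and a hypothetical helper name.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1   # predicted positive, actually positive
        elif a == 0 and p == 0:
            counts["TN"] += 1   # predicted negative, actually negative
        elif a == 0 and p == 1:
            counts["FP"] += 1   # predicted positive, actually negative
        else:
            counts["FN"] += 1   # predicted negative, actually positive
    return counts


actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
print(cm, accuracy)  # prints {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1} 0.75
```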

Which of the following is not an application of clustering analysis? Crime prediction analysis Market segmentation analysis Web click stream analysis Collaborative filtering analysis

Crime prediction analysis

Which of the following is not a standard practice in "Data Transformation" process of an ETL tool? Change of data format Data aggregation Splitting data fields Data extraction from ERP

Data extraction from ERP

Which of the following statements below is false about supervised/unsupervised data analysis? Data is not labeled for supervised analysis The data is labeled for supervised analysis For unsupervised analysis, the goal is to find cases that are similar to each other The data is not labeled for unsupervised

Data is not labeled for supervised analysis

In an ETL process, data is loaded into a final target database such as: Operational dashboard Public database Social media database Data warehouse

Data warehouse

In the loading phase of an ETL tool, the transformed data gets loaded into an end target, usually the _______. Master data management Online analytical processing Original Database Data warehouse

Data warehouse

Which of the following techniques is a modern update of artificial neural networks? Deep learning Clustering Decision tree Logistic regression

Deep learning

Target is examining their online sales data during the pandemic to understand what happened. Which kind of analytical technique are they using? Descriptive analytics Prescriptive analytics Forecast analytics Predictive analytics

Descriptive analytics

What are the four types of data analytical methods? Descriptive, explanatory, predictive and prescriptive Critical, analytical, predictive and explanatory Descriptive, logical, predictive and prescriptive Descriptive, analytical, predictive and prescriptive

Descriptive, explanatory, predictive and prescriptive

Which of the following is not one of the processes involved in data cleaning? Consolidating Matching Parsing Encrypting

Encrypting

When you buy a new car, you value it more than the price you paid because of: Zero-risk bias Endowment effect bias Sunk cost fallacy None of the answers are correct

Endowment effect bias

Which of the following statements is true? Experimentation is a way of analytical thinking Using intuition is a way of analytical thinking Analytical thinking is not based on facts Heuristic thinking is slow

Experimentation is a way of analytical thinking

Which of the following is an example of secondary data? Simulated data Survey data Interview data Firm's proprietary data

Firm's proprietary data

Which of the following is true for a median? Median can be calculated no matter how the data is arranged For an even number of observations, the median is the mean of the two middle numbers Medians are affected by outliers A median is only meaningful for interval or ordinal data and not for ratio data

For an even number of observations, the median is the mean of the two middle numbers
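
A quick check of the even-count rule using Python's standard library (the data values are illustrative):

```python
from statistics import median

# Four observations: after sorting (1, 2, 3, 4) the two middle values are 2 and 3.
even_data = [3, 1, 4, 2]
print(median(even_data))  # prints 2.5, the mean of 2 and 3

# With an odd count, the median is the single middle value.
odd_data = [3, 1, 4, 2, 100]
print(median(odd_data))   # prints 3; the outlier 100 does not move it
```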

Which of the following is an example of association rule learning? How frequently items are purchased in a group of transactions The association between customers and what they purchase How frequently a cluster can be formed in a given transaction How frequently an item set occurs in a transaction

How frequently an item set occurs in a transaction
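
In association rule learning, this frequency is usually called the support of an item set. A minimal sketch, with hypothetical transactions and a hypothetical helper name:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the item set."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


# Made-up market-basket data: each set is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

print(support({"bread", "milk"}, transactions))  # prints 0.6 (3 of 5 transactions)
```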

When you access information from two different tables connected by an identifier key, the SQL keyword you should use is _______. GROUP BY INNER JOIN COUNT ORDER BY

INNER JOIN
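
A small sketch of INNER JOIN, run through Python's built-in `sqlite3` module so it is self-contained. The `airport` and `reading` tables, their columns, and the rows are all hypothetical; `airport_code` is the identifier key connecting them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE airport (code TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE reading (id INTEGER PRIMARY KEY, airport_code TEXT, temp_f REAL);
    INSERT INTO airport VALUES ('PHX', 'Phoenix'), ('SEA', 'Seattle');
    INSERT INTO reading (airport_code, temp_f)
        VALUES ('PHX', 104.0), ('SEA', 61.5), ('PHX', 99.0);
    """
)

# INNER JOIN returns only rows whose key matches in both tables.
rows = conn.execute(
    """
    SELECT airport.city, reading.temp_f
    FROM airport
    INNER JOIN reading ON reading.airport_code = airport.code
    ORDER BY reading.id
    """
).fetchall()

print(rows)  # prints [('Phoenix', 104.0), ('Seattle', 61.5), ('Phoenix', 99.0)]
```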

Which of the following statements is not true about artificial neural networks? In the hidden layer of the networks, input data is hidden The input layer in the network receives the data The learning process is similar to our brain The network is modeled after the human brain in which brain cells work in a network

In the hidden layer of the networks, input data is hidden

Artificial Intelligence _______ Is a broad science of mimicking human abilities Cannot be used for retail industry Is a specific subset of machine learning Does not depend on machine learning

Is a broad science of mimicking human abilities

AI is not embraced everywhere in every industry because _______. It is not very well understood It is not very well developed It is likely to fail in the future It can be operationally expensive

It can be operationally expensive

Which of the following is true about multi-collinearity? The P-value reduces significantly, leading to rejection of the null hypothesis. The regression coefficients become clearer and are easier to interpret. It is measured using a measure called variance inflation factor (VIF). The effect of an independent variable on the dependent variable becomes easy to isolate.

It is measured using a measure called variance inflation factor (VIF).
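
For reference, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. The sketch below only evaluates that formula; the R² values are illustrative, and the helper name is hypothetical.

```python
def vif(r_squared: float) -> float:
    """Variance inflation factor for a predictor, given the R-squared obtained
    by regressing that predictor on all the other predictors."""
    return 1.0 / (1.0 - r_squared)


# A predictor unrelated to the others has no inflation.
print(vif(0.0))  # prints 1.0

# A predictor that is 90% explained by the others has a VIF of about 10.
print(vif(0.9))
```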

Which of the following statements is NOT true about experimental studies to compare two treatments? It is not easy to control uncertainties in the comparison. We can design experiments so that the error in the comparison is small. We can design experiments to minimize any bias in the comparison. Experiments allow us to set up a direct comparison between the treatments of interest.

It is not easy to control uncertainties in the comparison.

Which of the following describes the standard deviation? It is the average of the greatest and least values in the data set. It is the difference between the first and third quartiles of a data set. It is the square of variance It is the square root of the variance.

It is the square root of the variance.
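
The relationship can be confirmed numerically. This sketch uses the population versions from Python's `statistics` module on made-up data:

```python
import math
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values with mean 5

variance = pvariance(data)   # average squared deviation from the mean
std_dev = pstdev(data)       # population standard deviation

# The standard deviation is the square root of the variance.
print(variance, std_dev, math.isclose(math.sqrt(variance), std_dev))
```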

In developing spam filter algorithms, we need Unlabeled data of spam emails Unlabeled data of non-spam emails Labeled data of both spam and non-spam emails Labeled data of spam emails

Labeled data of both spam and non-spam emails

One of the processes in ETL is... Load Treatment Transition Extend

Load

The final stage of an ETL process is: Data Analysis Extract Load Transform

Load

In logistic regression, the dependent variable y is defined as: Log(1/(1-p)) Log(1/p) Log(p/(1-p)) Log(1-p)

Log(p/(1-p))

In logistic regression analysis, instead of Y as a dependent variable, we use a function of Y called ________. Odds Log of Y Logit Odds ratio

Logit
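
The logit is the natural log of the odds, log(p / (1 - p)). A minimal sketch of the odds, the logit, and its inverse (function names are illustrative):

```python
import math


def odds(p: float) -> float:
    """Odds of success: probability of success over probability of failure."""
    return p / (1 - p)


def logit(p: float) -> float:
    """Log-odds, the quantity logistic regression models as a linear function."""
    return math.log(odds(p))


def inverse_logit(z: float) -> float:
    """Map a log-odds value back to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))


print(logit(0.5))                 # prints 0.0: even odds have log-odds of zero
print(inverse_logit(logit(0.8)))  # recovers a value very close to 0.8
```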

In a cluster analysis, the distance between the clusters should be: Even Minimized Maximized Zero

Maximized

If you want to find out if body weight, calorie intake, fat intake and age have an influence on the probability of having a heart attack (yes or no), which of the following kind of analysis will help determine the answer? Simple logistic regression Simple linear regression Multiple logistic regression Multiple linear regression

Multiple logistic regression

Which of the following biases cannot be categorized as a cognitive bias? Groupthink Anchoring Bias Sunk cost fallacy None of the answers are correct

None of the answers are correct

Which of the following is not a drawback of analytical decision making? Delayed action Lack of flexibility Frustration in teams None of the answers are correct

None of the answers are correct

Which of the following propositions describes an existing theory or belief? Proportion Null hypothesis Standard deviation Alternative hypothesis

Null hypothesis

Which of the following tools helps in periodic managerial decision-making? OLTP Database OLAP Servers

OLAP

After factoring out the effect of other variables known to affect SAT, such as socioeconomic status, researchers found that music students had a higher SAT score than non-music students. This is an example of __________. Experimental study None of the other answers is correct Observational Study Comparative study

Observational Study

A person who is convinced he is gaining admission to Harvard by merely applying is suffering from: Gambler's fallacy Overconfidence Zero-risk bias None of the answers are correct

Overconfidence

In an agile approach of analytics, what is the first step of the process? Perform data discovery Perform business discovery Score and deploy Model data

Perform business discovery

Which of the following examples is not an application of AI? Monitoring epidemics and diseases and stopping them from spreading Optimizing traffic patterns over time Predicting human behavior by reading natural language used Predicting the exam score by scanning the appropriate textbook

Predicting the exam score by scanning the appropriate textbook

Costco wants to know how to stock their warehouses for a future pandemic and are using current sales data to help them project the needs. Which kind of analytical technique are they using? Predictive analytics Descriptive analytics Prescriptive analytics Explanatory analytics

Predictive analytics

Predictive analytics may be applied to __________, which is a set of techniques that use descriptive data and forecasts to identify the decisions most likely to result in the best performance. Prescriptive analytics Descriptive analytics Explanatory analytics Forecast analytics

Prescriptive analytics

Which of the following data analysis models use optimization techniques? Prescriptive analytics Predictive analytics Descriptive analytics Diagnostic analytics

Prescriptive analytics

Your professor is considering purchasing a self-driving car that can figure out the best route and the optimum safe way to drive there without human intervention. What kind of analytics is the car using to do this? Predictive analytics Explanatory analytics Descriptive analytics Prescriptive analytics

Prescriptive analytics

Which of the following is an important task of a database management system? Helps collect data from vendors Provides support such as performing maintenance and routine backups. Provides unauthorized access to data when authentication fails Helps create rules for data analysis

Provides support such as performing maintenance and routine backups.

_______ ensures that related data exists in the parent table before allowing an entry into a child table. SQL Referential integrity Data redundancy Data Integrity

Referential integrity
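
Referential integrity is typically enforced with a foreign key constraint. The sketch below uses Python's built-in `sqlite3` module; the `parent` and `child` tables are hypothetical, and note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER NOT NULL REFERENCES parent(id))"
)

conn.execute("INSERT INTO parent (id) VALUES (1)")
conn.execute("INSERT INTO child (parent_id) VALUES (1)")  # parent 1 exists: allowed

try:
    # Parent 99 does not exist, so the database refuses the child row.
    conn.execute("INSERT INTO child (parent_id) VALUES (99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

child_count = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
print(rejected, child_count)  # prints True 1
```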

The unexplained variance in the regression analysis is also known as: Predicted variance Regression variance Total variance Residual variance

Residual variance

The SQL code to extract only first_name information for all records of the "Actor" table below is: SELECT first_name FROM Actor SELECT * FROM Actor WHERE first_name = "NICK" SELECT * FROM Actor SELECT Actor FROM first_name

SELECT first_name FROM Actor

"Google Docs" is an example of _______ in a cloud computing environment. PaaS IaaS SaaS Virtualization

SaaS

Which of the following categories of data mining would you use for spam filtering of emails? Supervised Heuristics Both supervised and unsupervised Unsupervised

Supervised

Which of the following statements below is true about supervised/unsupervised machine learning? Supervised learning requires unlabeled data for training Supervised learning requires labeled data for training Unsupervised learning requires labeled data for training Unsupervised learning requires no supervision from humans

Supervised learning requires labeled data for training

In the Target story discussed in the lecture, why did Target send the teen daughter maternity ads? Target's analytics model confused her with an older woman with a similar name Target was using a special promotion that targeted all teens in her geographical area Target was sending ads to all women in a particular neighborhood Target's analytics model suggested she was pregnant based on her buying habits

Target's analytics model suggested she was pregnant based on her buying habits

Which of the following is an ETL vendor? MySQL Teradata Tableau JMP

Teradata

Which of the following is true of hierarchical clustering? The data partition does not occur in a single step All clusters must have the same number of data All clusters must have more than one object in it No single cluster can have all data

The data partition does not occur in a single step

Which of the following is a definition of distance between two clusters in a complete linkage clustering? The distance between the least distant pair of objects, one from each group The sum of square of the distance between clusters The distance between the most distant pair of objects, one from each group The average of distance between all pairs of objects, where each pair is made up of one object from each group

The distance between the most distant pair of objects, one from each group
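
The three linkage definitions among the options differ only in how pairwise distances are combined. A sketch with made-up 2-D points and Euclidean distance (the function names are illustrative):

```python
import math


def pairwise_distances(a, b):
    """All Euclidean distances between one point from a and one point from b."""
    return [math.dist(p, q) for p in a for q in b]


def complete_linkage(a, b):
    """Most distant pair of objects, one from each group."""
    return max(pairwise_distances(a, b))


def single_linkage(a, b):
    """Least distant pair of objects, one from each group."""
    return min(pairwise_distances(a, b))


def average_linkage(a, b):
    """Average distance over all cross-group pairs."""
    d = pairwise_distances(a, b)
    return sum(d) / len(d)


group_a = [(0.0, 0.0), (1.0, 0.0)]
group_b = [(4.0, 0.0), (6.0, 0.0)]

print(complete_linkage(group_a, group_b))  # prints 6.0
print(single_linkage(group_a, group_b))    # prints 3.0
print(average_linkage(group_a, group_b))   # prints 4.5
```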

Which of the following is a Type-I error? The null hypothesis is actually true, and the hypothesis test correctly fails to reject it. The null hypothesis is actually false, and the test correctly rejects it. The null hypothesis is actually false, but the test incorrectly fails to reject it. The null hypothesis is actually true, but the hypothesis test incorrectly rejects it.

The null hypothesis is actually true, but the hypothesis test incorrectly rejects it.

Which of the following is an example of a sample? The number of IT employees out of all employees working in an office of Google The number of members in the Democratic party The number of individuals who have a Ford car The population of Canada

The number of IT employees out of all employees working in an office of Google

Which of the following is a difference between the t-distribution and the standard normal (z) distribution? The standard normal distribution is dependent on parameters like degree of freedom, while t-distribution is not. The t-distribution cannot be calculated without a known standard deviation, while the standard normal distribution can be. The t-distribution has a larger variance than the standard normal distribution The standard normal distributions' confidence levels are wider than those of the t-distribution

The t-distribution has a larger variance than the standard normal distribution

Which of the following is a continuous random variable? The time to complete a specific task The number of bounced checks from a bank The outcomes of rolling two dice The number of new hires in a year

The time to complete a specific task

What would be the null hypothesis for testing a linear regression model with profit as the dependent variable and sales as the independent variable? There is a linear relationship between profit and sales that can be either positive or negative. There is no linear relationship between profit and sales. There is a positive relationship between profit and sales. There is a negative relationship between profit and sales.

There is no linear relationship between profit and sales.

Which of the following assumptions is not true for multiple linear regression? The relationship between dependent and independent variables is linear. There will be a multi-collinearity effect. The independent variables are not correlated. The residuals are normally distributed.

There will be a multi-collinearity effect.

A correlation coefficient between "college entrance exam" grades and scholastic achievement was found to be -1.08. On the basis of this, you would tell the university that: The entrance exam is a good predictor of success. They should hire a new statistician. Students who do best on this exam will make the worst students. The exam is a poor predictor of success.

They should hire a new statistician.

In classification analysis, we are determining the probability of an observation ________. To be undefined To be part of a certain class or not To be zero To be one

To be part of a certain class or not

Which of the following is true about A/B testing? A neutral result on an A/B testing means you correctly performed the test. You should test multiple elements of your landing page at a time and compare. To increase conversion rate of your website traffic, A/B testing can be beneficial. You need to attend WPC 300 course to learn about A/B Testing.

To increase conversion rate of your website traffic, A/B testing can be beneficial.

Which of the following is a false statement? The k-means algorithm is a method for doing partitional clustering Reducing SSE (sum of squared error) within cluster increases cohesion In the cluster analysis, the objects within clusters should exhibit a high amount of similarity To predict sales from transactional data one should perform clustering analysis.

To predict sales from transactional data one should perform clustering analysis.

In classification analysis, we typically split the data into two mutually exclusive sets, known as ________, to investigate the strength of the developed model. Binary and numeric Testing and validation Training and validation/testing Training and Binary

Training and validation/testing
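
A minimal sketch of such a split using only the standard library. The 70/30 ratio, the seed, and the helper name are assumptions chosen for illustration; the key property is that the two sets do not overlap.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows, then cut them into two mutually exclusive sets."""
    shuffled = list(rows)                    # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)    # seeded for a repeatable split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for ten labeled observations
train_rows, test_rows = train_test_split(rows)

print(len(train_rows), len(test_rows))  # prints 7 3
```

The model is fit on `train_rows` only, and its accuracy is then judged on `test_rows`, which it has never seen.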

When you are asked to design a database for the airline ticket reservation system, based on an Entity-Relationship Data model, which of the following could be an example of "entity"? Arrival time Destination city Flight Number Traveler

Traveler

Which of the following is a cloud service provider? VMWare iCloud Gmail Dropbox

VMWare

Which of the following is true about k-means clustering? A tree diagram is used to illustrate the steps in the clustering analysis The cluster analysis will give us an optimum value for k We choose the value for k before doing the clustering analysis It is a type of hierarchical clustering

We choose the value for k before doing the clustering analysis
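
The sketch below shows k being fixed before the algorithm runs. It is a simplified one-dimensional version of the usual assign-then-update loop, with a naive initialization (the first k values) and made-up data; it is not a production implementation.

```python
def k_means_1d(values, k, iterations=10):
    """Simplified 1-D k-means: k is an input, chosen before clustering starts."""
    centroids = list(values[:k])  # naive initialization, an assumption of this sketch
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, clusters = k_means_1d(values, k=2)
print(centroids)
print(clusters)  # the low values and the high values end up in separate clusters
```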

Logistic regression is a specialized type of regression analysis that is designed to predict ________ variables. numeric dependent a binary numeric a binary categorical independent

a binary categorical

Gamblers' fallacy is ____________. a clustering illusion an endowment effect bias framing effect bias a zero-risk bias

a clustering illusion

Which of the following describes a positively skewed histogram? a histogram for which mean and mode values are the same. a histogram that has no fluctuation in mass a histogram with large kurtosis a histogram that tails off towards the right

a histogram that tails off towards the right

A market analyst is developing a regression model to predict monthly household expenditures on groceries as a function of family size, household income, and household neighborhood (urban, suburban, and rural). The "neighborhood" variable in this model is ________. a linear variable a dependent variable an independent variable a continuous variable

an independent variable

When two variables are highly positively correlated, the correlation coefficient will be _______. close to -1 close to 1 close to 10 close to 0

close to 1

Which of the following is not a requirement for an ETL architecture? data compliance data integration data quality data security

data quality

Data transformation involves... duplication and load format changes and encryption format changes and load data splitting and aggregation

data splitting and aggregation

The central limit theorem states that even if the population is not normally distributed, the... Mean of the population can be calculated without using samples distribution of the sample mean will still be approximately normal when the sample size is large Standard error of the mean will not vary from the population mean Sampling distribution of the mean will vary from the sample to sample

distribution of the sample mean will still be approximately normal when the sample size is large
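
A small simulation can make this concrete. The sketch draws from an exponential distribution (strongly skewed, mean 1, standard deviation 1) and looks at the means of many samples; the sample size, replicate count, and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)   # seeded so the run is repeatable
n = 50                   # size of each sample
replicates = 2000        # number of samples drawn

# Each entry is the mean of one sample from a skewed (exponential) population.
sample_means = [
    mean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(replicates)
]

# The sample means cluster around the population mean (1.0), with a spread
# close to the standard error sigma / sqrt(n) = 1 / sqrt(50).
print(mean(sample_means), stdev(sample_means), 1 / sqrt(n))
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the underlying population is skewed.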

For a normal distribution mean is _______ to median. not equal greater than less than equal

equal

In the experimental design example "IQ Water", students are called _______. response variable experimental units treatments measurement units

experimental units

The difference between the first and third quartiles is referred to as the ____________. standard deviation interquartile range midrange variance

interquartile range
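
A short sketch of the calculation with Python's `statistics.quantiles`. Quartile conventions vary between textbooks and tools, so results on small data sets can differ slightly; this example uses the function's default method on made-up data.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7]  # illustrative values

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4)

# The interquartile range is the distance from the first to the third quartile.
iqr = q3 - q1
print(q1, q3, iqr)
```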

Standard deviation of a normal data distribution is a _______. measure of data dispersion measure of data shape measure of data centrality measure of data quality

measure of data dispersion

Regular consumption of organic food will keep you in a good mood. In this example, the confounder could be: organic food people's mood work ethics money

money

An experiment is said to be double-blinded if _________ the researcher only observes the variables of interest neither the subject nor those working with the subject is aware of who is being given which treatment the researcher is not aware of the confounding effect a placebo is given to some of the subjects

neither the subject nor those working with the subject is aware of who is being given which treatment

The odds are defined as ________, where p is the probability of success. p/(p-1) 1/(1-p) p/(1-p) 1/(p-1)

p/(1-p)

Extract function in ETL reads data from specified source database data mart data warehouse unknown database

specified source database

A _______________ is a relationship between two variables that appear to have interdependence or association with each other but actually do not. positive correlation non-correlation negative correlation spurious correlation

spurious correlation

When you keep eating the food you don't like precisely because you already bought the food, you are committing _____________. sunk-cost fallacy availability heuristics bias endowment effect bias zero risk bias

sunk-cost fallacy

According to statistical notation, what does ∑ stand for? to represent population measure to represent sample statistics to represent the number of items in a population to act as a summation operator

to act as a summation operator

The first step for any kind of A/B testing is... to develop a test plan for what you want to test. to execute test according to the plan. to develop a tracking URL. to determine how we want to evaluate the performances

to develop a test plan for what you want to test.

A sample study is mostly done to rule out any spurious correlation in the data. to estimate the parameters of the population. to learn how different parameters in the population behave together. to establish causality in a controlled environment

to estimate the parameters of the population.

Which of the following is an example of a measure of dispersion? mode mean variance median

variance

