WPC 300 all quizzes for final
When we utilize a visualization on paper/screen, that visualization is limited to exploring:
As many variables as we can coherently communicate in 2 dimensions
A/B testing can help marketers to
Increase likes on their social media sites; increase clicks to their website; increase sales
Deleting the grid lines in a chart
Increases the data-ink ratio
Which of the following violates the principle of data visualization?
The data-ink ratio should be higher than 1
Which are useful principles for data visualization?
The graph suggests a possible true effect
Which of the following is a Type-I error?
The null hypothesis is actually true, but the hypothesis test incorrectly rejects it.
Which of the following is an example of a sample?
The number of IT employees out of all employees working in an office of Google
Which of the following statements is false with regard to interpretation?
We cannot refine a model after interpreting its results.
Logistic regression is a specialized type of regression analysis that is designed to predict ________ variables.
a binary categorical
Gamblers' fallacy is ____________.
a clustering illusion
Which of the following describes a positively skewed histogram?
a histogram that tails off towards the right
What function can we use to join two or more text strings into one string?
concatenate
In classification problems, the primary source for accuracy estimation of the model is ________.
confusion matrix
The ________ is often used to describe the performance of a classification model applied to a set of test data for which the true outcomes are known.
confusion matrix
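A minimal sketch of how a confusion matrix can be tallied for a binary classifier and turned into an accuracy estimate. The labels below are made up for illustration, and the helper names `confusion_matrix` and `accuracy` are hypothetical, not taken from the course material.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs for 0/1 class labels."""
    counts = Counter(zip(actual, predicted))
    return {
        "TP": counts[(1, 1)],  # actual 1, predicted 1
        "FN": counts[(1, 0)],  # actual 1, predicted 0
        "FP": counts[(0, 1)],  # actual 0, predicted 1
        "TN": counts[(0, 0)],  # actual 0, predicted 0
    }

def accuracy(cm):
    """Share of test observations that were classified correctly."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())

# Hypothetical test set for which the true outcomes are known
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
print(cm)
print(accuracy(cm))
```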
The primary statistical model result should focus on all the answers below except
data cleaning
In an ETL process, data is loaded into a final target database such as:
data warehouse
What function can be used to create a date?
date
Visualizing data is which kind of analytical technique?
descriptive
For a normal distribution, the mean is _______ to the median.
equal
In the experimental design example "IQ Water", students are called _______.
experimental units
An election poll is an example of a machine learning application. (T/F)
false
Compared with observational data, analyses of experimental data are more challenging. (T/F)
false
In order to have a successful A/B test, we should develop a test plan for what we want to test first. (T/F)
false
New product development could not utilize A/B testing to enhance the process. (T/F)
false
The "Unique" keyword will help obtain unique values from returned columns (T/F)
false
We also need to sample the data that will be used as training data in a machine learning algorithm. (T/F)
false
We can adjust or train the input variable and the slope in a machine learning linear regression model. (T/F)
false
We can only have a limited number of variables in a machine learning model. (T/F)
false
You can design an experiment for any scenario. (T/F)
false
Which option below is an example of supervised learning application?
learn to classify spam
In logistic regression analysis, instead of Y as a dependent variable, we use a function of Y called ________.
logit
Visualizations of spatial data are most illustrative when shown using
maps
Standard deviation of a normal data distribution is a _______.
measure of data dispersion
The ________ is the observation that occurs most frequently.
mode
In William Playfair's Line Chart, which two parameters did he chart?
national debt vs time
Which of the following propositions describes an existing theory or belief?
null hypothesis
Odds ratio is defined as ________, where p is the probability of success.
p/(1-p)
William Playfair is credited for inventing which type of chart?
pie chart
When more data is available, machine learning algorithms improve their performance. (T/F)
true
Visual data enables the reader to see trends and dependencies. (T/F)
true
We can use a letter as a delimiter to split a cell into 2 or 3 cells. (T/F)
true
What Range_lookup parameter value should we use if we want to find an exact match?
0
What is the confidence level when the level of significance is 0.07?
0.930
The WPC Sports Company has noted that the size of individual "customer order" is normally distributed with a mean of $100 and standard deviation of $12. If a soccer team of 16 players were to make the next batch of orders, what would be the standard error of the mean?
3.00 (standard error = sigma/sqrt(n) = 12/sqrt(16) = 12/4 = 3)
To retrieve information, we will use the "Select" and "From" keywords in MySQL (T/F)
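The arithmetic in this answer can be checked with a short sketch. The function name `standard_error` is a hypothetical helper; the inputs (standard deviation 12, sample size 16) come from the question.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Values from the WPC Sports Company question
se = standard_error(sigma=12, n=16)
print(se)  # 3.0
```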
true
Over-reliance on the first piece of information is called ____________
Anchoring bias
You are collecting data via an online survey to improve the education standard at ASU. Which of the following methods will not result in data collection bias?
Anonymous data collection that hides the ASU brand in the survey questions.
A loan officer wants to know if the next customer is likely to default or not on a loan. How can she assess the risk of extending the loan to that customer?
By utilizing a multiple logistic regression model developed by an in-house analyst
What are the three principles of describing data?
Center, spread and shape
When sample size increases
Confidence interval decreases
Which of the following statement(s) about charts is true?
Data ink can sometimes help tell a richer story
What function can be used to calculate the time between two dates? Units can be years, months, or days.
Datedif
What are the four types of data analytical methods?
Descriptive, explanatory, predictive and prescriptive
Which of the following is an example of secondary data?
Firm's proprietary data
Which of the following describes the standard deviation?
It is the square root of the variance.
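A small sketch of this relationship using Python's standard `statistics` module. The data values are made up, and the population versions (`pvariance`, `pstdev`) are used here as an assumption; the sample versions (`variance`, `stdev`) are related in the same way.

```python
import math
import statistics

# Hypothetical data set with mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(data)  # mean squared deviation from the mean
std_dev = statistics.pstdev(data)      # population standard deviation

# The standard deviation is the square root of the variance
print(variance, std_dev, math.sqrt(variance))
```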
In logistic regression, the dependent variable y is defined as:
log(p/(1-p))
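The odds and logit definitions can be sketched as follows. The function names are hypothetical helpers, and the natural logarithm is assumed, as is conventional for logistic regression.

```python
import math

def odds(p):
    """Odds of success: p / (1 - p)."""
    return p / (1 - p)

def logit(p):
    """Log-odds: log(p / (1 - p)), the dependent variable in logistic regression."""
    return math.log(odds(p))

def inverse_logit(z):
    """Map a log-odds value back to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

print(odds(0.8))           # roughly 4, i.e. odds of 4 to 1
print(logit(0.5))          # 0.0, since even odds have log-odds of zero
print(inverse_logit(0.0))  # 0.5
```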
If you want to find out if body weight, calorie intake, fat intake and age have an influence on the probability of having a heart attack (yes or no), which of the following kind of analysis will help determine the answer?
Multiple logistic regression
In an agile approach of analytics what is the first step of the process?
Perform business discovery
What best describes the nature of a rose diagram?
Plots data using a circular histogram plot
Predictive analytics may be applied to __________, which is a set of techniques that use descriptive data and forecasts to identify the decisions most likely to result in the best performance.
Prescriptive analytics
Which of the following data analytics models uses optimization techniques?
Prescriptive analytics
What is data visualization?
Process of graphically representing information and data
"Google Doc" is an example of ____________ in a cloud computing environment.
SaaS
The central limit theorem states that if the population is normally distributed, then the
Sampling distribution of the mean will also be normal for any sample size
Which of the following statements is a reason not to use a table for data visualization?
Tables cannot easily show trends
Which of the following is a difference between the t-distribution and the standard normal (z) distribution?
The t-distribution has a larger variance than the standard normal distribution
Which of the following is a continuous random variable?
The time to complete a specific task
In classification analysis, we are determining the probability of an observation ________.
To be part of a certain class or not
In classification analysis, we typically split the data into two mutually exclusive sets, known as ________, to investigate the strength of the developed model.
Training and validation/testing
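A minimal sketch of splitting data into two mutually exclusive sets. The helper `split_train_test`, the 20% test fraction, and the fixed seed are all assumptions chosen for illustration.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows, then cut it into disjoint train and test sets."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = rows[:]          # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(10))          # stand-in for 10 observations
train, test = split_train_test(rows)
print(len(train), len(test))    # 8 2
```

Every observation lands in exactly one of the two sets, so the model is evaluated on data it did not see during training.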
You are creating a database to store temperature and wind data from various airports. Which of the following fields is the most likely candidate to use as the basis for a Primary Key in the Airport Table?
airport code
In order to reject the null hypothesis, the p-value must be less than the
alpha
Which of the following is not a component of relational database?
analysis
In order for a chart to minimize graphical complexity, the data-ink ratio must be:
close to 1
In order for a chart to have graphical integrity, the lie factor must be:
close to 1
The difference between the first and third quartiles is referred to as the
interquartile range
What function will return a random number between the numbers you specify?
randbetween
What function can be used to sum numbers in a range that meet supplied criteria?
sumif
When you keep eating the food you don't like precisely because you already bought the food, you are committing _____________.
sunk-cost fallacy
What type of learning do we apply when we try to predict who has a higher chance of surviving on the Titanic?
supervised learning
Which machine learning technique can a bank apply to determine if a loan application will be approved?
supervised learning
When the lie-factor of a graphical chart is more than 1,
the size of the effect shown in the graph is bigger than the actual effect in the data
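A sketch of the lie factor as Tufte defines it: the size of the effect shown in the graphic divided by the size of the effect in the data, where each effect is a relative change. The function names and the bar-chart numbers are hypothetical.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)

def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)

# Hypothetical chart: the data grows from 100 to 150 (a 50% increase),
# but the bars are drawn 2 cm and 5 cm tall (a 150% increase).
lf = lie_factor(2, 5, 100, 150)
print(lf)  # 3.0, so the chart exaggerates the effect threefold
```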
What function can be used to create a time?
time
According to statistical notation, what does ∑ stand for?
to act as a summation operator
What function will remove the extra white space in front of the text?
trim
"Order by" keyword will sort the output in either numerical or alphabetical order. (T/F)
true
A/B testing has been used a lot in marketing promotions. (T/F)
true
The data generation processes for observational studies and experiments are different. (T/F)
true
Experimentation is a way of analytical thinking (T/F)
true
For an even number of observations, the median is the mean of the two middle numbers (T/F)
true
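A quick check of the even-count median rule, using Python's standard `statistics` module on made-up values.

```python
import statistics

values = [3, 1, 4, 2]              # even number of observations
ordered = sorted(values)           # [1, 2, 3, 4]
mid = len(ordered) // 2

# Mean of the two middle numbers
manual_median = (ordered[mid - 1] + ordered[mid]) / 2

print(manual_median)               # 2.5
print(statistics.median(values))   # 2.5
```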
Keyword "where" is used to specify constraints or conditions in SQL. (T/F)
true
Machine learning algorithms can learn better with more data. (T/F)
true
SQL is short for "structured query language". (T/F)
true
To increase conversion rate of your website traffic, A/B testing can be beneficial. (T/F)
true
To retrieve information from a database is called a "query" (T/F)
true
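The SQL keywords covered in this set (SELECT, FROM, WHERE, ORDER BY) can be tried out with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, which is an assumption made so the example runs without a server; the `orders` table and its rows are made up. Note that standard SQL returns unique values with DISTINCT.

```python
import sqlite3

# In-memory database, used here instead of a MySQL server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Ana", 120.0), ("Ben", 80.0), ("Ana", 95.0), ("Cal", 150.0)],
)

# A query: SELECT picks columns, FROM names the table,
# WHERE filters rows, ORDER BY sorts the output
big_orders = conn.execute(
    "SELECT customer, amount FROM orders WHERE amount > 90 ORDER BY amount"
).fetchall()

# DISTINCT removes duplicate values from the returned column
customers = conn.execute(
    "SELECT DISTINCT customer FROM orders ORDER BY customer"
).fetchall()

conn.close()
print(big_orders)
print(customers)
```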
Which machine learning technique can be used to determine where to build cell phone towers?
unsupervised learning
What function can be used to find things in a table or a range by row?
vlookup