data analysis midterm
A method for modifying variables that reduces bias prior to cluster analysis is ______
None of the answers are correct
To identify patterns across transactions, we can use ______
association rules.
April needs to display data over time. Which of the following charts should he use?
bar chart
The charts that are helpful in making comparisons between categorical variables are ______.
bar charts and column charts.
The degree of correlation among independent variables in a regression model is called
multicollinearity
__________ approaches are designed to describe patterns and relationships in large data sets with many observations of many variables.
Unsupervised learning
__________ assigns values to outcomes based on the decision maker's attitude toward risk, loss, and other factors.
Utility Theory
__________ analytics is the analysis of online activity, such as visits to websites or social media.
Web
In which of the following scenarios would it be appropriate to use hierarchical clustering?
When binary or ordinal data needs to be clustered
A multiple regression model for predicted heart rate is as follows: heart rate = 10 - 0.4 run speed + 12 body weight. As the run speed increases by 1 unit (holding body weight constant), heart rate is expected to change by how much?
As the run speed increases by 1 unit (holding body weight constant), heart rate is expected to decrease by 0.4 units
Scores on Ms. Nash's test have a mean of 64 and a standard deviation of 9. Steve has a score of 52. Convert Steve's score to a z-score. (Round to two decimal places if necessary.)
-1.33
Scores on Ms. Bond's test have a mean of 70 and a standard deviation of 11. Michelle has a score of 48. Convert Michelle's score to a z-score. (Round to two decimal places if necessary.)
-2
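The two z-score cards above can be checked with a short sketch. The function name `z_score` is just an illustrative choice; the inputs come straight from the questions.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

steve = z_score(52, 64, 9)       # Ms. Nash's test
michelle = z_score(48, 70, 11)   # Ms. Bond's test

print(round(steve, 2), round(michelle, 2))
```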
Given that P(A) = 0.3, P(A | B) = 0.4, and P(B) = 0.5, compute P(A and B).
P(A and B) = P(A | B) × P(B) = 0.4 × 0.5 = 0.2
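A minimal sketch of the multiplication rule used here, with the numbers from the question. P(A) is not needed for the product itself; the last line only uses it to note that A and B are not independent, since P(A | B) differs from P(A).

```python
p_a = 0.3
p_a_given_b = 0.4
p_b = 0.5

# Multiplication rule: P(A and B) = P(A | B) * P(B)
p_a_and_b = p_a_given_b * p_b

# If A and B were independent, P(A | B) would equal P(A).
independent = abs(p_a_given_b - p_a) < 1e-12

print(p_a_and_b, independent)
```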
What would be the coefficient of determination if the total sum of squares (SST) is 23.29 and the sum of squares due to regression (SSR) is 10.03?
.43
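The coefficient of determination is the share of total variation captured by the regression, SSR / SST. A small sketch with the values from the question:

```python
sst = 23.29   # total sum of squares
ssr = 10.03   # sum of squares due to regression

r_squared = ssr / sst
sse = sst - ssr   # sum of squares due to error, the unexplained part

print(round(r_squared, 2), round(sse, 2))
```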
In the probability table below, which value is a marginal probability?
.5
Compute the relative frequencies for the data given in the table below:
0.31, 0.14, 0.37, 0.18
The time in seconds that it takes a production worker to inspect an item has an exponential distribution with a mean of 15 seconds. What proportion of inspection times is less than 10 seconds?
0.4866.
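For an exponential distribution with mean m, the cumulative probability is P(X <= x) = 1 - e^(-x/m). The sketch below plugs in the inspection-time numbers; the function name is illustrative only.

```python
import math

def exponential_cdf(x, mean):
    """P(X <= x) for an exponential distribution with the given mean."""
    return 1 - math.exp(-x / mean)

p_under_10 = exponential_cdf(10, mean=15)
print(round(p_under_10, 4))
```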
Suppose we had a data set from a call center where customers were asked to choose between the following three options: hear account information, billing questions, and customer service. Using the given order of the three options, and using 0-1 dummy variables to encode the categorical variables, which of the following combinations would yield an entry "customer service"?
001
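One way to picture the encoding: each option gets its own 0-1 variable, in the order given, and only the chosen option is set to 1. The helper `dummy_code` below is a hypothetical name used for illustration.

```python
OPTIONS = ["hear account information", "billing questions", "customer service"]

def dummy_code(choice, options=OPTIONS):
    """Return one 0/1 digit per option, with 1 marking the chosen option."""
    return "".join("1" if choice == option else "0" for option in options)

print(dummy_code("customer service"))
```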
Below is a histogram for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. What is the frequency of the 25-28 bin?
1
The strength of a cluster can be measured by comparing the average distance in a cluster to the distance between cluster centroids. One rule of thumb is that the ratio for between-cluster distance to within-cluster distance should exceed what value for useful clusters?
1
A simple random sample of 100 observations was taken from a large population. The sample mean and the standard deviation were determined to be 80 and 12, respectively. Calculate the standard error of the mean.
1.20
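The standard error of the mean is the sample standard deviation divided by the square root of the sample size. A quick check with the numbers from the question (the sample mean of 80 is not needed):

```python
import math

n = 100   # sample size
s = 12    # sample standard deviation

standard_error = s / math.sqrt(n)
print(standard_error)
```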
Suppose that the confidence of an association rule is 0.75 and the total number of transactions is 250. How many of those transactions support the consequent if the lift ratio is 1.875?
100
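The lift ratio is the rule's confidence divided by the support of the consequent, expressed as a fraction of all transactions. Rearranging gives the count asked for; a sketch with the question's numbers:

```python
confidence = 0.75
lift_ratio = 1.875
total_transactions = 250

# lift = confidence / (consequent_count / total)  =>  solve for consequent_count
consequent_share = confidence / lift_ratio
consequent_count = consequent_share * total_transactions

print(round(consequent_count))
```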
Compute the 50th percentile for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22
15.5
Consider a sample on the waiting times (in minutes) at the billing counter in a grocery store to be 15, 24, 18, 15, 21, 20, 15, 22, 19, 16, 15, 22, 20, 15, and 21. Compute the 25th, 50th, and 75th percentiles.
25th percentile: 15, 50th percentile: 19, 75th percentile: 21
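The two percentile cards above are consistent with the location rule L = (n + 1) × p / 100 on the sorted data, interpolating when L is not a whole number. That rule is an assumption about the course's convention; spreadsheet and statistics packages offer several other percentile definitions that can give slightly different values.

```python
def percentile(data, p):
    """p-th percentile using the (n + 1) * p / 100 location rule."""
    xs = sorted(data)
    n = len(xs)
    loc = (n + 1) * p / 100      # 1-based position in the sorted data
    k = int(loc)                 # whole part of the position
    frac = loc - k               # fractional part, used for interpolation
    if k < 1:
        return xs[0]
    if k >= n:
        return xs[-1]
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

values = [10, 15, 17, 21, 25, 12, 16, 11, 13, 22]
waits = [15, 24, 18, 15, 21, 20, 15, 22, 19, 16, 15, 22, 20, 15, 21]

print(percentile(values, 50))
print([percentile(waits, p) for p in (25, 50, 75)])
```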
Compute the mean of the following data. 56, 42, 37, 29, 45, 51, 30, 25, 34, 57
40.6
Given an estimated simple linear regression equation of y = 46.2 + 589.2x with a coefficient of determination (r²) of 0.7523, interpret the coefficient of determination for this equation.
About 75.2% of the variation in y is explained by the estimated regression equation.
A health conscious student faithfully wears a device that tracks his steps. Suppose that the distribution of the number of steps he takes in a day is normally distributed with a mean of 10,000 and a standard deviation of 1,500 steps. One day he took 15,000 steps. What was his percentile on that day?
About 99.96% (z = (15,000 - 10,000) / 1,500 ≈ 3.33)
A simple random sample of 11 observations from a population containing 400 female soccer players was taken, and the following values were obtained.
Sample mean = (48 + 53 + 72 + 56 + 63 + 64 + 56 + 76 + 50 + 46 + 73) / 11 = 657 / 11 ≈ 59.7273
The contingency table below represents employees of a communications company classified by age and field of expertise. Fill in the missing entries.
Business= 1. 13,625 2. 11,994 Education= 1. 7,880 2. 17,825 Total= 37,083
A retail store owner offers a discount on product A and predicts that the customers would purchase products B and C in addition to product A. Identify the technique used to make such a prediction.
Data Mining
The use of analytical techniques for better understanding patterns and relationships that exist in large data sets is ______________.
Data mining
Which statement is true about mutually exclusive events?
If events A and B cannot occur at the same time, they are called mutually exclusive.
The multiple regression model represents pricing for residential housing in a certain market. Predicted price = 19,856.56 + 6,985.25 × bedrooms + 87.53 × square feet. A house in the local market has 5 bedrooms and 3,500 square feet of living area. Use the multiple regression model to determine the predicted price and the residual if the house sells for $360,200.
Predicted price = $361,137.81; residual = $360,200 - $361,137.81 = -$937.81
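The prediction and residual can be reproduced by plugging the house into the fitted equation, then subtracting the prediction from the actual sale price. Variable names below are illustrative.

```python
intercept = 19_856.56
coef_bedrooms = 6_985.25
coef_sqft = 87.53

bedrooms, sqft, sale_price = 5, 3_500, 360_200

predicted = intercept + coef_bedrooms * bedrooms + coef_sqft * sqft
residual = sale_price - predicted   # residual = actual - predicted

print(round(predicted, 2), round(residual, 2))
```

The negative residual means the house sold for slightly less than the model predicts.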
__________ analytics use techniques that take input data and yield a best course of action.
Prescriptive
A cell phone user wants to determine what data plan she should get on her new contract. She selects a random sample of 15 months and finds her average usage to be 11.25 GB with a standard deviation of 2.5 GB. She wants to test H0: µ ≥ 12 GB versus Ha: µ < 12 GB. What is the value of the test statistic and the associated p value?
t = -1.162; p-value = 0.1324
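A sketch of the calculation behind this card, using only the standard library. The test statistic is t = (x̄ - µ0) / (s / √n) with n - 1 = 14 degrees of freedom. Because the standard library has no t distribution, the lower-tail p-value is approximated here by numerically integrating the t density with Simpson's rule; the helpers `t_pdf` and `t_cdf` are illustrative, and a statistics package or t table would normally be used instead.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                    # area between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

n, x_bar, s, mu0 = 15, 11.25, 2.5, 12

t_stat = (x_bar - mu0) / (s / math.sqrt(n))
p_value = t_cdf(t_stat, df=n - 1)           # lower tail, since Ha is mu < 12

print(round(t_stat, 3), round(p_value, 3))
```

With a p-value of roughly 0.13, H0 would not be rejected at the usual 0.05 level.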
Which of the following is a discrete random variable?
The number of times a student guesses the answers to questions on a certain test
The scatter chart below displays the residuals versus the dependent variable, x. Which of the following conclusions can be drawn from the scatter chart given below?
The residuals have an increasing variance as the dependent variable increases.
Which of the following is not a characteristic of the normal probability distribution?
The standard deviation must be 1
A Type I error is committed when
a true null hypothesis is rejected
k-means clustering is the process of ______
organizing observations into distinct groups based on a measure of similarity.
For a population with an unknown distribution, the form of the sampling distribution of the sample mean is
approximately normal for large sample sizes
In order to visualize three variables in a two-dimensional graph, we use a ______.
bubble chart
A sample of 92 observations is taken from an infinite population. The sampling distribution of the sample mean is approximately normal because of what theorem?
central limit theorem
__________ are visual methods of displaying data.
charts
An alternative for a stacked column chart when comparing more than a couple of quantitative variables in each category is a _____.
clustered column chart.
Single linkage is a measure of calculating dissimilarity between clusters by _____
considering only the two most similar observations in the two clusters.
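Under single linkage, the dissimilarity between two clusters is the smallest distance between any pair of observations, one taken from each cluster. The sketch below assumes Euclidean distance and uses made-up points purely for illustration.

```python
import math
from itertools import product

def single_linkage(cluster_a, cluster_b):
    """Smallest Euclidean distance between a point in A and a point in B."""
    return min(math.dist(p, q) for p, q in product(cluster_a, cluster_b))

# Hypothetical two-dimensional observations.
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(4, 0), (9, 0)]

print(single_linkage(cluster_a, cluster_b))
```

Only the closest pair, (1, 0) and (4, 0), determines the result; the other observations do not affect it.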
An experiment consists of determining the speed of automobiles on a highway by the use of radar equipment. The random variable in this experiment is a
continuous random variable.
__________ are collected from several entities at the same point in time.
cross sectional data
The data dashboard for a marketing manager may have KPIs related to ______.
current sales measures and sales by region.
Optimization models can be used to
decide on how to invest cash received from insurance policies.
A bucket contains 3 red balls, 4 yellow balls, and 5 purple balls. One ball is taken from the bucket and is not replaced. Another ball is taken from the bucket. Are the events of pulling a red ball first and then a purple one independent or dependent?
dependent
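One way to confirm the dependence is to compare the probability of a purple ball on the second draw given a red first draw with the unconditional probability of purple on the second draw. If the events were independent, the two would match. A sketch using exact fractions:

```python
from fractions import Fraction

counts = {"red": 3, "yellow": 4, "purple": 5}
total = sum(counts.values())

# P(purple second | red first): one red ball is gone, all 5 purple remain.
p_purple_after_red = Fraction(counts["purple"], total - 1)

# P(purple second), summing over every possible color of the first ball.
p_purple_second = sum(
    Fraction(n, total) * Fraction(counts["purple"] - (color == "purple"), total - 1)
    for color, n in counts.items()
)

print(p_purple_after_red, p_purple_second)
```

The conditional probability (5/11) differs from the unconditional one (5/12), so the events are dependent.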
Jaccard's coefficient is different from the matching coefficient in that the former ______
does not count matching zero entries while the latter does.
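The difference shows up clearly on a small example. The matching coefficient counts every position where two 0-1 vectors agree, while Jaccard's coefficient ignores positions where both are 0. The vectors below are made up for illustration.

```python
def matching_coefficient(u, v):
    """Share of positions where u and v agree (1-1 and 0-0 both count)."""
    matches = sum(a == b for a, b in zip(u, v))
    return matches / len(u)

def jaccard_coefficient(u, v):
    """Share of 1-1 matches among positions that are not 0-0.

    Undefined (division by zero) when both vectors are all zeros.
    """
    both_one = sum(a == 1 and b == 1 for a, b in zip(u, v))
    both_zero = sum(a == 0 and b == 0 for a, b in zip(u, v))
    return both_one / (len(u) - both_zero)

u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]

print(matching_coefficient(u, v), jaccard_coefficient(u, v))
```

The shared zeros raise the matching coefficient to 0.8, while Jaccard's coefficient, which discards them, is 0.5.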
In a(n) __________, one or more variables are identified and controlled or manipulated so that data can be obtained about how they influence the variable of interest identified first.
experimental study
An effective display of trend and magnitude is achieved by using a combination of a _____.
heat map and sparklines
The __________ the lift ratio, the __________ the association rule.
higher; stronger
Tactical decisions are concerned with
how the organization should achieve the goals and objectives set by its strategy
Deleting the grid lines in a table and the horizontal lines in a chart
increases the data-ink ratio.
In a linear regression model, the variable (or variables) used for predicting or explaining values of the response variable are known as the __________. It(they) is(are) denoted by x.
independent variable
Data-ink is the ink used in a table or chart that _____.
is necessary to convey the meaning of the data to the audience
A disadvantage of stacked-column charts and stacked-bar charts is that _______.
it can be difficult to perceive small differences in areas.
In a business, the values indicating the business's current operating characteristics, such as its financial position, the inventory on hand, and customer service metrics, are typically known as ______.
key performance indicators
The best way to differentiate chart elements is using ______.
labels
The finite population correction factor should be used in the computation of the standard deviation of the sample mean and of the sample proportion when n / N is
greater than .05
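A sketch of how the correction enters the standard error of the mean: when the sample is more than 5% of the population (n / N > 0.05), σ / √n is multiplied by √((N - n) / (N - 1)). The function name and the example population sizes are hypothetical.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the mean, with the finite population correction
    applied only when the population size N is known and n / N > 0.05."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(round(standard_error(12, 100, N=1_000), 3))    # n / N = 0.10, corrected
print(round(standard_error(12, 100, N=10_000), 3))   # n / N = 0.01, no correction
```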
An analysis of items frequently co-occurring in transactions is known as _______
market basket analysis
When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the ______
matching coefficient.
The endpoint of a k-means clustering algorithm occurs when ______
no further changes are observed in cluster structure and number.
Fitting a model too closely to sample data, resulting in a model that does not accurately reflect the population is termed as
overfitting.
Two approaches to drawing a conclusion in a hypothesis test are
p-value and critical value.
Making visual comparisons between categorical variables may be difficult in a _____
pie chart
What is the general form of an interval estimate?
point estimate +/- margin of error
Advanced analytics generally refers to
predictive and prescriptive analytics.
In the financial sector, __________ are used to construct financial instruments such as derivatives.
predictive models
A __________ describes the range and relative likelihood of all possible values for a random variable.
probability distribution for a random variable
In many cases, white space in a chart can improve ______.
readability
The value of the ___________ is used to estimate the value of the population parameter.
sample statistic
A __________ is used to visualize sample data graphically and to draw preliminary conclusions about the possible relationship between the variables.
scatter chart
A graphical presentation that uses vertical bars to display the magnitude of quantitative data is known as a _____.
column chart
A line chart that has no axes but is used to provide information on overall trends for time series data is called a ______.
sparkline
Jeff would like to create a graph to display the number of males and females in his class who got an A, B, C, D, and F on the last test. Which of the following graphs could he use?
stacked column chart
The graph of the simple linear regression equation is a(n)
straight line
All the events in the sample space that are not part of the specified event are called
the complement of the event
Simulation optimization helps
to find good decisions in highly complex and highly uncertain settings.
A __________ is useful for visualizing hierarchical data along multiple dimensions.
treemap
A _____________ is a line that provides an approximation of the relationship between the variables.
trendline
The goal of __________ is to use the variable values to identify relationships between observations.
unsupervised learning
The newest model of smart car is supposed to get excellent gas mileage. A thorough study showed that gas mileage (measured in miles per gallon) is normally distributed with a mean of 75 miles per gallon and a standard deviation of 10 miles per gallon. What is the probability that, if driven normally, the car will get 100 miles per gallon or better?
0.0062 (z = (100 - 75) / 10 = 2.5)
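The upper-tail probability for this card can be computed with the standard library's `statistics.NormalDist`. The value 100 mpg sits 2.5 standard deviations above the mean, so the probability is small.

```python
from statistics import NormalDist

mpg = NormalDist(mu=75, sigma=10)

z = (100 - 75) / 10                 # standardized value
p_at_least_100 = 1 - mpg.cdf(100)   # upper-tail probability P(X >= 100)

print(z, round(p_at_least_100, 4))
```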
The owners of a fast food restaurant have automatic drink dispensers to help fill orders more quickly. When the 12 ounce button is pressed, they would like for exactly 12 ounces of beverage to be dispensed. There is, however, some variation in this amount. The company does not want the machine to systematically over fill or under fill the cups. Which of the following gives the correct set of hypotheses?
H0: µ = 12, Ha: µ ≠ 12
In a simple linear regression model, y = β0 + β1x + ε, the parameter β1 represents the
slope of the true regression line
The random numbers generated using Excel's RAND function follow a __________ probability distribution between 0 and 1.
uniform
When the expected value of the point estimator is equal to the population parameter it estimates, it is said to be
unbiased
A __________ determines how far a particular value is from the mean relative to the data set's standard deviation.
z-score
When the mean value of the dependent variable is independent of variation in the independent variable, the slope of the regression line is
zero