DSBA 6276 Exam 1
Compute the relative frequencies for the data given in the table below:
Grade | Number of students
A | 16
B | 28
C | 33
D | 13
Total | 90
A: 0.18, B: 0.31, C: 0.37, D: 0.14
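Each relative frequency is a class count divided by the total count. Here is a minimal sketch that assumes the counts are typed in by hand; the dictionary name `counts` is only illustrative.

```python
# Grade counts from the table; relative frequency = count / total
counts = {"A": 16, "B": 28, "C": 33, "D": 13}
total = sum(counts.values())  # 90

rel_freq = {grade: round(n / total, 2) for grade, n in counts.items()}
print(rel_freq)  # {'A': 0.18, 'B': 0.31, 'C': 0.37, 'D': 0.14}
```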
Compute the third quartile for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22
20 (interpolating at position (n - 1)p in the sorted data, as Excel's QUARTILE.INC does)
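The answer 20 matches linear interpolation at zero-based position (n - 1)p in the sorted data, which is the convention behind Excel's QUARTILE.INC. The sketch below implements that rule. `quartile_inc` is a hypothetical helper name, not a library function.

```python
def quartile_inc(values, p):
    """Percentile by linear interpolation at zero-based position (n - 1) * p,
    the same convention as Excel's QUARTILE.INC. Expects 0 <= p <= 1."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p      # fractional position in the sorted list
    lo = int(pos)                # index of the lower neighbour
    if lo + 1 >= len(xs):        # p == 1 lands exactly on the maximum
        return float(xs[-1])
    frac = pos - lo
    return xs[lo] + frac * (xs[lo + 1] - xs[lo])

data = [10, 15, 17, 21, 25, 12, 16, 11, 13, 22]
q3 = quartile_inc(data, 0.75)
print(q3)  # 20.0
```

In the sorted list (10, 11, 12, 13, 15, 16, 17, 21, 22, 25), position 6.75 lies three quarters of the way from 17 to 21. Other quartile conventions exist. For example, locating Q3 at position 0.75(n + 1) = 8.25 (one-based) gives 21.25, so check which rule your software or course uses.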
A data visualization tool that updates in real time and gives multiple outputs is called _____.
A data dashboard
Which of the following graphs provides information on outliers and IQR of a data set?
Box plot
Data dashboards are a type of _____ analytics.
Descriptive
The data dashboard for a marketing manager may have KPIs related to _____.
current sales measures and sales by region
The _____ the lift ratio, the _____ the association rule.
higher; stronger
A _____ is a graphical summary of data previously summarized in a frequency distribution.
histogram
Consider the clustered bar chart in the dashboard developed to monitor the performance of a call center. This chart allows the IT manager to _____.
identify the frequency of a particular type of problem by location
Below is a histogram for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. What is the relative frequency of the 21-24 bin?
0.25
Calculate the simple matching coefficient and the Jaccard coefficient for the following two binary vectors. P1 = (1, 0, 1, 0, 0, 0, 1, 0, 1, 0), P2 = (0, 0, 1, 0, 1, 1, 1, 0, 0, 1). Round your answers to two decimal places.
Simple matching coefficient = 0.50; Jaccard coefficient = 0.29
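The simple matching coefficient counts both 1-1 and 0-0 agreements over all positions. The Jaccard coefficient ignores 0-0 agreements. The sketch below counts the position types directly. Here there are 2 positions where both vectors are 1, 3 where both are 0, and 5 mismatches.

```python
P1 = (1, 0, 1, 0, 0, 0, 1, 0, 1, 0)
P2 = (0, 0, 1, 0, 1, 1, 1, 0, 0, 1)

pairs = list(zip(P1, P2))
n11 = pairs.count((1, 1))                 # both vectors are 1
n00 = pairs.count((0, 0))                 # both vectors are 0
mismatches = len(pairs) - n11 - n00       # (1, 0) or (0, 1)

smc = (n11 + n00) / len(pairs)            # all matches / all positions
jaccard = n11 / (n11 + mismatches)        # 0-0 matches are left out
print(round(smc, 2), round(jaccard, 2))   # 0.5 0.29
```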
Suppose we had a data set from a call center where customers were asked to choose among the following three options: hear account information, billing questions, and customer service. Using the given order of the three options and 0-1 dummy variables to encode the categorical variable, which of the following combinations would yield an entry of "customer service"?
001
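With one 0-1 dummy per option, in the order listed, each choice has a 1 in its own position and 0 elsewhere. Here is a minimal sketch. `one_hot` is a hypothetical helper, and the encoding is built as a string only to mirror the answer format.

```python
# Options in the order given in the question
options = ["hear account information", "billing questions", "customer service"]

def one_hot(choice):
    """Return a 0-1 dummy code with a 1 in the position of `choice`."""
    return "".join("1" if opt == choice else "0" for opt in options)

for opt in options:
    print(opt, "->", one_hot(opt))
# hear account information -> 100
# billing questions -> 010
# customer service -> 001
```

Regression models often keep only k - 1 dummies for k categories. In that case, the dropped category is the one coded as all zeros.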
Compute the geometric mean for the following data on growth factors of an investment for 10 years. 1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22
1.0148
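The geometric mean is the n-th root of the product of the n growth factors. This makes it the appropriate average for compounding rates. Here is a minimal sketch using only the standard library.

```python
import math

growth = [1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22]

# Geometric mean = n-th root of the product of the n growth factors
gm = math.prod(growth) ** (1 / len(growth))
print(round(gm, 4))  # 1.0148
print(f"average growth of about {(gm - 1) * 100:.2f}% per year")
# average growth of about 1.48% per year
```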
Manhattan distance is the distance traveled along rectangular city blocks, and it can be used to calculate the dissimilarity between two observations. Let u = (25, $350, $180) correspond to a 25-year-old customer who spent $350 at Store A and $180 at Store B in the previous fiscal year. Let v = (53, $420, $80) correspond to a 53-year-old customer who spent $420 at Store A and $80 at Store B in the previous fiscal year. Calculate the dissimilarity between these two observations using Manhattan distance.
198
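Manhattan distance sums the absolute differences across coordinates: |25 - 53| + |350 - 420| + |180 - 80| = 28 + 70 + 100. Here is a minimal sketch.

```python
u = (25, 350, 180)  # age, $ spent at Store A, $ spent at Store B
v = (53, 420, 80)

# Manhattan distance: sum of absolute coordinate differences
manhattan = sum(abs(a - b) for a, b in zip(u, v))
print(manhattan)  # 198
```

The variables are on different scales (years and dollars), so the dollar differences dominate. Standardizing the variables first is a common step before using such distances in clustering.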
The College Board originally scaled SAT scores so that the scores for each section were approximately normally distributed with a mean of 500 and a standard deviation of 100. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored greater than 700.
2.5%
The College Board reported that the mean Math Level 2 SAT subject test score was 686 with a standard deviation of 96. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored less than 494.
2.5%
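Both empirical-rule answers above come from the same step. The cutoff is exactly two standard deviations from the mean (z = +2 for 700, z = -2 for 494). About 95% of a bell-shaped distribution lies within two standard deviations, so the remaining 5% splits evenly, leaving 2.5% in each tail. Here is a sketch. `pct_beyond` is a hypothetical helper that only covers cutoffs exactly 1, 2, or 3 standard deviations away.

```python
def pct_beyond(x, mean, sd):
    """Percentage of a bell-shaped distribution lying farther from the mean
    than x, on x's side, using the empirical rule (68-95-99.7%)."""
    z = (x - mean) / sd
    k = abs(z)
    if k not in (1, 2, 3):
        raise ValueError("empirical rule only covers 1, 2, or 3 SDs")
    within = {1: 68.0, 2: 95.0, 3: 99.7}[int(k)]
    return (100 - within) / 2   # symmetric: split the remainder between tails

print(pct_beyond(700, 500, 100))  # 2.5  (z = +2, upper tail)
print(pct_beyond(494, 686, 96))   # 2.5  (z = -2, lower tail)
```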
Compute the coefficient of variation for the following sample data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37
21.37% (mean = 31.6, s ≈ 6.7528)
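The coefficient of variation is (s / x̄) × 100%. Here is a minimal sketch using the standard `statistics` module. `stdev` uses the sample (n - 1) divisor. Carrying s at full precision gives 21.37%. The 21.36% figure results from rounding s to 6.75 before dividing.

```python
import statistics

sample = [32, 41, 36, 24, 29, 30, 40, 22, 25, 37]
mean = statistics.mean(sample)   # 31.6
s = statistics.stdev(sample)     # sample SD (n - 1 divisor), about 6.7528
cv = s / mean * 100
print(f"{cv:.2f}%")  # 21.37%
```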
Use technology to compute the standard deviation for the following sample data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37
6.75
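"Use technology" here means applying the sample formula s = sqrt(Σ(x - x̄)² / (n - 1)), as in Excel's STDEV.S. The sketch below spells that formula out step by step instead of calling a library routine.

```python
import math

sample = [32, 41, 36, 24, 29, 30, 40, 22, 25, 37]
n = len(sample)
xbar = sum(sample) / n                       # 31.6
ss = sum((x - xbar) ** 2 for x in sample)    # sum of squared deviations, 410.4
s = math.sqrt(ss / (n - 1))                  # sample SD divides by n - 1
print(round(s, 2))  # 6.75
```

Dividing by n instead (the population formula, STDEV.P) would give about 6.41.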
Compute the IQR for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22
7.75 (Q1 = 12.25 and Q3 = 20 using Excel's QUARTILE.INC convention)
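The IQR is Q3 - Q1. Python's standard `statistics.quantiles` with `method="inclusive"` interpolates at position (n - 1)p. This gives the same quartiles as Excel's QUARTILE.INC and reproduces the 7.75 answer.

```python
import statistics

data = [10, 15, 17, 21, 25, 12, 16, 11, 13, 22]
# method="inclusive" interpolates at (n - 1) * p in the sorted data
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q3, iqr)  # 12.25 20.0 7.75
```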
Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer who spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer who spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance.
75.39
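Euclidean distance is the square root of the sum of squared coordinate differences: sqrt(28² + 70²) = sqrt(5684). Here is a minimal sketch with `math.dist` (Python 3.8+), assuming the second customer spent $420.

```python
import math

u = (25, 350)  # age, $ spent at Store A
v = (53, 420)

# Euclidean distance: sqrt of the sum of squared differences
euclid = math.dist(u, v)
print(round(euclid, 2))  # 75.39
```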
Which of the following best exemplifies big data?
Cellphone owners around the world generate vast amounts of data by calling, texting, tweeting, and browsing the Web on a daily basis.
Natalie needs to compare the number of employees by job title for the last five years. Which of the following charts should Natalie use?
Clustered-column (bar) chart
Complete linkage can be used to measure the distance between _____ in cluster analysis.
Clusters
A collection of text documents to be analyzed is called a _____.
Corpus
Scores on Ms. Bond's test have a mean of 70 and a standard deviation of 11. David has a score of 52 on Ms. Bond's test. Scores on Ms. Nash's test have a mean of 64 and a standard deviation of 6. Steven has a score of 52 on Ms. Nash's test. Which student has the higher standardized score?
David's standardized score is -1.64 and Steven's standardized score is -2.00. Therefore, David has the higher standardized score.
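A standardized score (z-score) is (x - mean) / standard deviation. Comparing z-scores puts the two tests on a common scale. Here is a minimal sketch.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

david = z_score(52, 70, 11)   # -18 / 11
steven = z_score(52, 64, 6)   # -12 / 6
print(round(david, 2), round(steven, 2))        # -1.64 -2.0
print("David" if david > steven else "Steven")  # David
```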
A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a _____.
Dendrogram
_____________ is the most critical step of the decision-making process.
Identifying and defining the problem
Which statement is true of an association rule?
It is ultimately judged on how actionable it is and how well it explains the relationship between item sets.
DJ needs to display data over time. Which of the following charts should he use?
Line chart
Which one of the following is used in predictive analytics?
Linear regression
Susan would like to create a graph to display the number of males and females in her class who got an A, B, C, D, and F on the last test. Which of the following graphs could she use?
Stacked-column chart
The process of converting a word to its stem, or root word, is referred to as _____.
Stemming
A _____ decision involves higher-level issues and is concerned with the overall direction of the organization, defining the overarching goals and aspirations for the organization's future.
Strategic
The decisions concerning an organization's goals and future plans are called _____.
Strategic decisions
_____ refers to the number of times a collection of items occurs together in a transaction data set.
Support count
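A support count can be obtained by checking, for each transaction, whether the item set is a subset of it. The transactions below are invented purely for illustration, and `support_count` is a hypothetical helper name.

```python
# Hypothetical transactions, made up only for illustration
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

print(support_count({"diapers", "beer"}, transactions))  # 2
print(support_count({"bread", "milk"}, transactions))    # 2
```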
The process of dividing text into separate terms is referred to as _____.
Tokenization
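Tokenization can be as simple as lowercasing the text and extracting runs of letters and digits. The regular expression below is one simple choice assumed for illustration; text-mining tools apply more elaborate rules, for example for punctuation, numbers, and hyphenation.

```python
import re

text = "The service was great, and the food was GREAT!"
# Lowercase, then keep runs of letters, digits, and apostrophes as terms
tokens = re.findall(r"[a-z0-9']+", text.lower())
print(tokens)
# ['the', 'service', 'was', 'great', 'and', 'the', 'food', 'was', 'great']
```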
A better understanding of consumer behavior through analytics directly leads to _____.
better pricing strategies
The strength of the association rule is known as _____ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence.
lift
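Lift is the rule's confidence, support count(antecedent and consequent) / support count(antecedent), divided by the benchmark confidence. The benchmark is the share of all transactions that contain the consequent. A lift above 1 means the antecedent makes the consequent more likely than it is at random. In this sketch the five book transactions are invented for illustration, and `count` and `lift` are hypothetical helper names.

```python
# Hypothetical book transactions, invented only to illustrate the arithmetic
transactions = [
    {"cooking", "biography", "art"},
    {"cooking", "art"},
    {"biography"},
    {"cooking", "biography"},
    {"art"},
]

def count(itemset):
    """Support count: transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def lift(antecedent, consequent):
    confidence = count(antecedent | consequent) / count(antecedent)
    benchmark = count(consequent) / len(transactions)  # share containing consequent
    return confidence / benchmark

print(round(lift({"cooking"}, {"art"}), 2))  # 1.11  (= (2/3) / (3/5))
```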
k-means clustering is the process of _____.
organizing observations into distinct groups based on a measure of similarity
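k-means is usually carried out with Lloyd's algorithm. Each observation is assigned to its nearest centroid, each centroid is then moved to the mean of its cluster, and these two steps repeat until no centroid moves. Here is a minimal one-dimensional sketch. The data and the `kmeans_1d` helper are hypothetical. Real analyses typically cluster several (standardized) variables with a library implementation.

```python
import random

def kmeans_1d(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on 1-D data: alternate assignment and update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # k distinct starting centroids
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        converged = new_centers == centers   # stop once no centroid moves
        centers = new_centers
        if converged:
            break
    return centers, clusters

data = [1, 2, 3, 10, 11, 12]  # made-up observations
centers, clusters = kmeans_1d(data, k=2)
print(sorted(centers))  # [2.0, 11.0]
```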
We create multiple dashboards _____.
so that each dashboard can be viewed on a single screen
A popular measure for weighing terms based on frequency and uniqueness is _____.
term frequency times inverse document frequency
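TF-IDF has several variants. This sketch assumes the simple form tf(t, d) × log(N / df(t)). Here tf is the raw count of term t in document d, N is the number of documents, and df(t) is the number of documents that contain t. Your textbook's exact formula may differ; some versions add 1 to the log term. The three-document corpus and the `tf_idf` helper are invented for illustration.

```python
import math

# Hypothetical corpus, already tokenized (illustration only)
corpus = [
    ["great", "service", "great", "food"],
    ["slow", "service"],
    ["great", "price"],
]

def tf_idf(term, doc, corpus):
    """Raw term count in doc times log(N / number of docs containing term)."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / df)

for term in ["great", "service", "food"]:
    print(term, round(tf_idf(term, corpus[0], corpus), 3))
# great 0.811
# service 0.405
# food 1.099
```

"food" appears in only one document, so it gets the highest weight even though "great" is more frequent. Under this variant, a term that appears in every document gets a weight of 0.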
A visual representation of a document or set of documents in which the size of the word is proportional to the frequency with which the word appears is called a _____.
word cloud
Which of the following sources of big data is not publicly available?
Medical records
Explain how the confidence of 52.99% and lift ratio of 2.20 were computed for the rule "If a customer buys a cooking book and a biography book, then they buy an art book." Interpret these quantities. A confidence of 52.99% means that in 52.99% of the transactions in which a cooking book and a biography book are purchased, an art book _____ purchased. A lift ratio of 2.20 means that a transaction in which a cooking book and a biography book are purchased is 120% ________ likely to also include an art book than a randomly selected transaction.
is; more
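The transaction counts behind the 52.99% figure are not given. We can still work backwards from lift = confidence / benchmark confidence to recover the benchmark, which is the share of all transactions that include an art book. The "120% more likely" wording is (lift - 1) × 100%. Here is a sketch using only the two quoted numbers.

```python
confidence = 0.5299   # P(art | cooking and biography), from the rule
lift = 2.20

# lift = confidence / benchmark confidence, so the benchmark can be backed out
benchmark = confidence / lift
print(round(benchmark, 4))                       # 0.2409
print(f"{(lift - 1) * 100:.0f}% more likely")    # 120% more likely
```

So about 24.09% of all transactions would include an art book, versus 52.99% of those with both a cooking book and a biography book.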
An analysis of items frequently co-occurring in transactions is known as _____.
market basket analysis