exam 1 - quiz overview
_____ refers to the number of times a collection of items occurs together in a transaction data set. A consequent Antecedent Validation count Support count
Support count
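A minimal Python sketch of a support count, assuming a small hypothetical transaction data set (the item names are illustrative, not from the quiz). The support count of an item set is the number of transactions that contain every item in it.

```python
# Hypothetical transaction data set: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

print(support_count({"bread", "milk"}, transactions))  # 2
```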
A data visualization tool that updates in real time and gives multiple outputs is called _____. a data dashboard a metrics table a data table the GIS
a data dashboard
Compute the geometric mean for the following data on growth factors of an investment for 10 years. 1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22 1.0221 1.0148 1.1475
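1.0148 Rationale: The geometric mean is a measure of location that is calculated by finding the nth root of the product of n values. Here the product of the 10 growth factors is about 1.1577, so the geometric mean is 1.1577^(1/10) ≈ 1.0148.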
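The nth-root-of-the-product definition can be checked in Python. This sketch computes it directly and compares it with `statistics.geometric_mean` from the standard library (Python 3.8+).

```python
import math
from statistics import geometric_mean

growth = [1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22]

# Definition: nth root of the product of the n growth factors.
gm_manual = math.prod(growth) ** (1 / len(growth))

# Standard-library equivalent.
gm_lib = geometric_mean(growth)

print(round(gm_manual, 4), round(gm_lib, 4))  # 1.0148 1.0148
```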
The College Board reported that the mean Math Level 2 SAT subject test score was 686 with a standard deviation of 96. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored less than 494. 95% 97.5% 2.5% 5%
2.5% Rationale: z-score = (494 - 686) / 96 = -2. By the empirical rule, about 95% of observations fall within two standard deviations of the mean, leaving about 5% outside that range, or 2.5% in each tail. A score below 494 lies more than two standard deviations below the mean, so about 2.5% of students scored less than 494.
The College Board originally scaled SAT scores so that the scores for each section were approximately normally distributed with a mean of 500 and a standard deviation of 100. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored greater than 700. 97.5% 2.5% 95% 5%
2.5% Rationale: z-score = (700 - 500) / 100 = 2. About 95% of observations fall within two standard deviations of the mean, so about 2.5% fall above +2 standard deviations and 2.5% fall below -2 standard deviations. Therefore about 2.5% of students scored greater than 700.
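The two empirical-rule questions above follow the same steps: convert the cutoff to a z-score, then read the one-tail percentage off the rule. A minimal sketch, assuming the usual 68/95/99.7 percentages and whole-number z-scores (the rule says nothing about values in between); `tail_percent` is a hypothetical helper name.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Empirical rule: approximate % of values within k standard deviations of the mean.
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}

def tail_percent(k):
    """Approximate % of values beyond k SDs on ONE side; k must be 1, 2, or 3."""
    return (100.0 - WITHIN[k]) / 2

z_low = z_score(494, 686, 96)    # Math Level 2 question: below 494
z_high = z_score(700, 500, 100)  # original SAT scale question: above 700
print(z_low, z_high)  # -2.0 2.0
print(tail_percent(round(abs(z_low))), tail_percent(round(abs(z_high))))  # 2.5 2.5
```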
Compute the third quartile for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22 15.5 21.5 21.25 11.75
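21.25 Rationale: Quartiles divide data into four parts, with each part containing approximately one-fourth, or 25 percent, of the observations. Sorted, the data are 10, 11, 12, 13, 15, 16, 17, 21, 22, 25. The third quartile sits at position 0.75(n + 1) = 0.75(11) = 8.25, a quarter of the way from the 8th value (21) to the 9th value (22): 21 + 0.25(22 - 21) = 21.25. This can also be calculated with the Excel function =QUARTILE.EXC(range,3) = 21.25.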
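The position-and-interpolate step can be sketched in Python. `q_exc` is a hypothetical helper that assumes the (n + 1)p position rule used by Excel's QUARTILE.EXC and does not handle positions below the first or beyond the last value; the standard-library `statistics.quantiles` with `method="exclusive"` uses the same rule and gives the same three quartiles for this data.

```python
from statistics import quantiles

data = [10, 15, 17, 21, 25, 12, 16, 11, 13, 22]

def q_exc(values, p):
    """Exclusive quantile: position p*(n + 1), linear interpolation between neighbors."""
    s = sorted(values)
    pos = p * (len(s) + 1)  # 1-based position, e.g. 8.25 for Q3 here
    lo = int(pos)           # whole part: the 8th value
    frac = pos - lo         # fractional part: 0.25 of the way to the 9th value
    return s[lo - 1] + frac * (s[lo] - s[lo - 1])

q1, q2, q3 = quantiles(data, n=4, method="exclusive")
print(q1, q2, q3)           # 11.75 15.5 21.25
print(q_exc(data, 0.75))    # 21.25
```

The other two quartiles (11.75 and 15.5) are the distractors in the answer list.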
Use technology to compute the standard deviation for the following sample data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37 5.42 5.96 6.41 6.75
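6.75 Rationale: The standard deviation is defined to be the positive square root of the variance. Here the sample mean is 31.6 and the sample variance is 410.4 / 9 = 45.6, so the sample standard deviation is the square root of 45.6 ≈ 6.75. In Excel, use =STDEV.S( ); the population version, =STDEV.P( ), gives the distractor 6.41.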
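The same sample-versus-population distinction exists in Python's standard library: `statistics.stdev` divides by n - 1 (like =STDEV.S) and `statistics.pstdev` divides by n (like =STDEV.P). A short check on the quiz data:

```python
from statistics import mean, stdev, pstdev

data = [32, 41, 36, 24, 29, 30, 40, 22, 25, 37]

x_bar = mean(data)    # 31.6
s = stdev(data)       # sample SD: variance divides by n - 1
sigma = pstdev(data)  # population SD: variance divides by n

print(x_bar, round(s, 2), round(sigma, 2))  # 31.6 6.75 6.41
```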
Scores on Ms. Bond's test have a mean of 70 and a standard deviation of 11. David has a score of 52 on Ms. Bond's test. Scores on Ms. Nash's test have a mean of 64 and a standard deviation of 6. Steven has a score of 52 on Ms. Nash's test. Which student has the higher standardized score? David's standardized score is -1.64 and Steven's standardized score is -2.00. Therefore, David has the higher standardized score. David's standardized score is -1.64 and Steven's standardized score is -2.00. Therefore, Steven has the higher standardized score. Cannot be determined with the information provided. David's standardized score is 1.64 and Steven's standardized score is 2.00. Therefore, Steven has the higher standardized score.
David's standardized score is -1.64 and Steven's standardized score is -2.00. Therefore, David has the higher standardized score. Rationale: David's standardized score is (52 - 70) / 11 ≈ -1.64 and Steven's standardized score is (52 - 64) / 6 = -2.00. Because -1.64 is greater than -2.00 (David is fewer standard deviations below his class mean), David has the higher standardized score.
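The comparison can be reproduced in a few lines of Python; `z_score` is a hypothetical helper name for the standardized-score formula (x - mean) / sd.

```python
def z_score(x, mean, sd):
    """Standardized score: how many SDs x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_david = z_score(52, 70, 11)  # Ms. Bond's test
z_steven = z_score(52, 64, 6)  # Ms. Nash's test

print(round(z_david, 2), round(z_steven, 2))        # -1.64 -2.0
print("David" if z_david > z_steven else "Steven")  # David
```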
_____ is the most critical step of the decision-making process. Identifying and defining the problem Evaluating the alternatives Determining the set of alternatives Choosing an alternative
Identifying and defining the problem
A better understanding of consumer behavior through analytics directly leads to _____. reduced risk better pricing strategies reduced advertising costs more profits
better pricing strategies
Which of the following graphs provides information on outliers and IQR of a data set? Line chart Scatter chart Box plot Histogram
Box plot
Complete linkage can be used to measure the distance between _____ in cluster analysis. wards objects observations clusters
clusters
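Complete linkage defines the distance between two clusters as the largest distance between an observation in one cluster and an observation in the other. A minimal Python sketch, assuming Euclidean distance between observations and two small hypothetical clusters:

```python
import math
from itertools import product

def complete_linkage(cluster_a, cluster_b):
    """Largest pairwise Euclidean distance between the two clusters' observations."""
    return max(math.dist(p, q) for p, q in product(cluster_a, cluster_b))

# Hypothetical two-variable observations.
a = [(0, 0), (1, 0)]
b = [(4, 0), (6, 0)]

print(complete_linkage(a, b))  # 6.0, from (0, 0) to (6, 0)
```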
A collection of text documents to be analyzed is called a _____. consequent corpus library book
corpus
The data dashboard for a marketing manager may have KPIs related to _____. overall performance of the company's stock over the previous 52 weeks current sales measures and sales by region current financial standing of the company data on the company's call center
current sales measures and sales by region
An analysis of items frequently co-occurring in transactions is known as _____. cluster analysis market segmentation regression analysis market basket analysis
market basket analysis
Which of the following sources of big data is not publicly available? Twitter Weather data Sports records Medical records
Medical records
Compute the coefficient of variation for the following sample data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37 20.28% 21.36% 18.64% 21.67%
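21.36% Rationale: The coefficient of variation indicates how large the standard deviation is relative to the mean. With a sample mean of 31.6 and a sample standard deviation of 6.75, the coefficient of variation is (6.75 / 31.6) × 100 ≈ 21.36%. (Carrying the unrounded standard deviation, about 6.7528, gives about 21.37%; 21.36% is the option that matches the rounded value.)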
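A quick Python check of the coefficient of variation, showing how rounding the standard deviation first shifts the last digit:

```python
from statistics import mean, stdev

data = [32, 41, 36, 24, 29, 30, 40, 22, 25, 37]
x_bar = mean(data)  # 31.6
s = stdev(data)     # sample standard deviation, about 6.7528

cv_full = s / x_bar * 100               # full-precision standard deviation
cv_rounded = round(s, 2) / x_bar * 100  # standard deviation rounded to 6.75 first

print(f"{cv_full:.2f}% {cv_rounded:.2f}%")  # 21.37% 21.36%
```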
Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance. 88.57 75.39 72.28 66.21
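75.39 Rationale: Euclidean distance is the square root of the sum of squared differences: sqrt((25 - 53)^2 + (350 - 420)^2) = sqrt(784 + 4,900) = sqrt(5,684) ≈ 75.39.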
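The calculation in Python, using `math.dist` from the standard library (Python 3.8+) with the two customers as (age, dollars spent) pairs:

```python
import math

u = (25, 350)  # 25-year-old customer, $350 spent
v = (53, 420)  # 53-year-old customer, $420 spent

d = math.dist(u, v)  # sqrt((25 - 53)**2 + (350 - 420)**2)
print(round(d, 2))   # 75.39
```

Because the variables are unscaled, the $70 spending gap contributes 4,900 to the sum while the 28-year age gap contributes only 784.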
Which of the following best exemplifies big data? Cellphone owners around the world generate vast amounts of data by calling, texting, tweeting, and browsing the Web on a daily basis. A pharmacy keeps track of customer purchases to send its customers coupons. Five hundred Facebook users upload one thousand pictures per day. A local grocery store collects data from those that scan their loyalty card.
Cellphone owners around the world generate vast amounts of data by calling, texting, tweeting, and browsing the Web on a daily basis.
Natalie needs to compare the number of employees by job title for the last five years. Which of the following charts should Natalie use? Clustered-column (bar) chart Scatter chart Line chart Bubble chart
Clustered-column (bar) chart
Which statement is true of an association rule? It is ultimately judged on how actionable it is and how well it explains the relationship between item sets. It seeks to classify a categorical outcome into two or more categories. It uses analytic models to describe the relationship between metrics that drive business performance. It is a data reduction technique that reduces large information into smaller homogeneous groups.
It is ultimately judged on how actionable it is and how well it explains the relationship between item sets.
DJ needs to display data over time. Which of the following charts should he use? Line chart Pie chart Scatter chart Bar chart
Line chart
Susan would like to create a graph to display the number of males and females in her class who got an A, B, C, D, and F on the last test. Which of the following graphs could she use? Pie chart Heat map Stacked-column chart Scatter chart
Stacked-column chart
A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a _____. cumulative lift tree decile-wise lift chart dendrogram scatter chart
dendrogram
Data dashboards are a type of _____ analytics. prescriptive predictive descriptive decision
descriptive
The _____ the lift ratio, the _____ the association rule. higher; weaker lower; weaker lower; stronger higher; stronger
higher; stronger
A _____ is a graphical summary of data previously summarized in a frequency distribution. scatter chart box plot histogram line chart
histogram
The strength of the association rule is known as _____ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence. antecedent lift consequent support count
lift
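Lift as defined in the question (confidence of the rule divided by the benchmark confidence) can be sketched in Python. This assumes a small hypothetical transaction data set; confidence is support(antecedent and consequent) / support(antecedent), and the benchmark confidence is support(consequent) / number of transactions. A lift above 1 indicates a stronger rule than chance co-occurrence.

```python
# Hypothetical transaction data set.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support_count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def lift(antecedent, consequent):
    n = len(transactions)
    confidence = support_count(antecedent | consequent) / support_count(antecedent)
    benchmark = support_count(consequent) / n
    return confidence / benchmark

# Rule "if diapers then beer": confidence 2/3, benchmark 2/4.
print(round(lift({"diapers"}, {"beer"}), 3))  # 1.333
```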
Which one of the following is used in predictive analytics? Linear regression Data visualization Data dashboard Optimization model
Linear regression
k-means clustering is the process of _____. agglomerating observations into a series of nested groups based on a measure of similarity estimating the value of a continuous outcome variable reducing the number of variables to consider in data-mining organizing observations into distinct groups based on a measure of similarity
organizing observations into distinct groups based on a measure of similarity
We create multiple dashboards _____. to make sure the KPIs are not displayed in the data dashboard to help the user scroll vertically and horizontally to see the entire dashboard so that each dashboard can be viewed on a single screen so that all dashboards can be viewed on a single screen
so that each dashboard can be viewed on a single screen
The process of converting a word to its stem, or root word, is referred to as _____. stemming stacking tokenization data cleaning
stemming
A _____ decision involves higher-level issues and is concerned with the overall direction of the organization, defining the overarching goals and aspirations for the organization's future. strategic intuitive operational tactical
strategic
The decisions concerning an organization's goals and future plans are called _____. operational decisions financial decisions strategic decisions tactical decisions
strategic decisions
A popular measure for weighing terms based on frequency and uniqueness is _____. corpus word cloud term frequency times inverse document frequency cosine distance
term frequency times inverse document frequency
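TF-IDF has several variants. This sketch assumes raw term counts for term frequency and the natural-log form idf = ln(N / df), where N is the number of documents and df is the number of documents containing the term. The three-document corpus is hypothetical and is tokenized by simply splitting on whitespace.

```python
import math
from collections import Counter

# Hypothetical corpus, tokenized by splitting on whitespace.
corpus = [
    "the battery died",
    "the screen is great",
    "the price is great and the screen is great",
]
docs = [doc.split() for doc in corpus]
N = len(docs)

def tf_idf(term, doc_tokens):
    tf = Counter(doc_tokens)[term]          # raw count of the term in this document
    df = sum(1 for d in docs if term in d)  # number of documents containing the term
    idf = math.log(N / df) if df else 0.0
    return tf * idf

print(round(tf_idf("great", docs[2]), 3))    # 0.811: frequent here, in 2 of 3 docs
print(tf_idf("the", docs[2]))                # 0.0: appears in every document
print(round(tf_idf("battery", docs[0]), 3))  # 1.099: unique to one document
```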
The process of dividing text into separate terms is referred to as _____. stacking data cleaning tokenization stemming
tokenization
A visual representation of a document or set of documents in which the size of the word is proportional to the frequency with which the word appears is called a _____. word cloud cosine distance dendrogram corpus
word cloud
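Tokenization and word-cloud sizing can be sketched together in Python. This assumes a simple regular-expression tokenizer (runs of lowercase letters) and expresses each word's size relative to the most frequent word; the sample text is hypothetical, and real word-cloud tools apply their own layout and scaling.

```python
import re
from collections import Counter

text = "Great battery. Great price! The screen is great."

# Tokenization: lowercase the text, then split it into runs of letters.
tokens = re.findall(r"[a-z]+", text.lower())
print(tokens)  # ['great', 'battery', 'great', 'price', 'the', 'screen', 'is', 'great']

# Word-cloud sizing: size proportional to frequency, relative to the top word.
freq = Counter(tokens)
top = max(freq.values())
rel_size = {word: count / top for word, count in freq.items()}
print(rel_size["great"], round(rel_size["price"], 2))  # 1.0 0.33
```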