DSBA 6276 Exam 1

Compute the relative frequencies for the data given in the table below:

Grade   Number of students
A       16
B       28
C       33
D       13
Total   90

0.18, 0.31, 0.37, 0.14
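A short Python sketch of the arithmetic: each relative frequency is the class count divided by the total of 90. The counts come from the table; the variable names are illustrative.

```python
# Grade counts from the frequency table
counts = {"A": 16, "B": 28, "C": 33, "D": 13}
total = sum(counts.values())  # 90

# Relative frequency = class count / total count
rel_freq = {grade: n / total for grade, n in counts.items()}

for grade, f in rel_freq.items():
    print(grade, round(f, 2))  # A 0.18, B 0.31, C 0.37, D 0.14
```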

Compute the third quartile for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22

20

A data visualization tool that updates in real time and gives multiple outputs is called _____.

A data dashboard

Which of the following graphs provides information on outliers and IQR of a data set?

Box plot

Data dashboards are a type of _____ analytics.

Descriptive

The data dashboard for a marketing manager may have KPIs related to _____.

current sales measures and sales by region

The _____ the lift ratio, the _____ the association rule.

higher; stronger

A _____ is a graphical summary of data previously summarized in a frequency distribution.

histogram

Consider the clustered bar chart of the dashboard developed to monitor the performance of a call center: This chart allows the IT manager to _____.

identify the frequency of a particular type of problem by location

Below is a histogram for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. What is the relative frequency of the 21-24 bin?

0.25

Calculate the Simple Matching Coefficient and the Jaccard coefficient similarity measures for the following two binary vectors. Round your answers to two decimal places.
P1 = (1, 0, 1, 0, 0, 0, 1, 0, 1, 0)
P2 = (0, 0, 1, 0, 1, 1, 1, 0, 0, 1)
Simple Matching Coefficient = _____
Jaccard coefficient = _____

Simple Matching Coefficient = 0.50; Jaccard coefficient = 0.29
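The two coefficients differ only in how 0-0 positions are treated: the Simple Matching Coefficient counts them as agreements, while the Jaccard coefficient ignores them. A minimal Python sketch (the helper names are illustrative) that reproduces the answer: 5 of 10 positions match, and 2 of the 7 positions containing at least one 1 are 1-1 matches.

```python
P1 = (1, 0, 1, 0, 0, 0, 1, 0, 1, 0)
P2 = (0, 0, 1, 0, 1, 1, 1, 0, 0, 1)

def simple_matching(a, b):
    # Share of positions where the vectors agree (both 0 or both 1)
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)

def jaccard(a, b):
    # 1-1 matches divided by positions where at least one vector has a 1
    both = sum(x == 1 and y == 1 for x, y in zip(a, b))
    either = sum(x == 1 or y == 1 for x, y in zip(a, b))
    return both / either

print(round(simple_matching(P1, P2), 2))  # 0.5
print(round(jaccard(P1, P2), 2))          # 0.29
```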

Suppose we had a data set from a call center in which customers were asked to choose among the following three options: hear account information, billing questions, and customer service. Using the given order of the three options and 0-1 dummy variables to encode the categorical variable, which of the following combinations would yield the entry "customer service"?

001
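One way to picture the encoding, assuming one 0-1 dummy per option in the listed order, so that exactly one position is 1 (the helper names are hypothetical):

```python
OPTIONS = ["hear account information", "billing questions", "customer service"]

def encode(choice):
    # One 0-1 dummy per option, in the listed order; only the chosen option is 1
    return "".join("1" if choice == option else "0" for option in OPTIONS)

def decode(code):
    # The position of the single 1 identifies the option
    return OPTIONS[code.index("1")]

print(encode("customer service"))  # 001
print(decode("001"))               # customer service
```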

Compute the geometric mean for the following data on growth factors of an investment for 10 years. 1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22

1.0148
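The geometric mean of n growth factors is the n-th root of their product. A sketch using the standard library's `statistics.geometric_mean` (Python 3.8+), with the product-and-root form alongside as a check:

```python
import math
from statistics import geometric_mean  # Python 3.8+

growth = [1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22]

gm = geometric_mean(growth)
# Same quantity written out: (product of the factors) ** (1 / n)
gm_manual = math.prod(growth) ** (1 / len(growth))

print(round(gm, 4), round(gm_manual, 4))  # 1.0148 1.0148
```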

Manhattan distance is the distance one would travel along rectangular city blocks, and it can be used to calculate the dissimilarity between two observations. Let u = (25, $350, $180) correspond to a 25-year-old customer who spent $350 at Store A and $180 at Store B in the previous fiscal year. Let v = (53, $420, $80) correspond to a 53-year-old customer who spent $420 at Store A and $80 at Store B in the previous fiscal year. Calculate the dissimilarity between these two observations using Manhattan distance.

198
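Manhattan distance sums the absolute differences attribute by attribute: |25 - 53| + |350 - 420| + |180 - 80| = 28 + 70 + 100 = 198. A minimal sketch that, like the question, uses the raw, unscaled values:

```python
u = (25, 350, 180)  # age, $ spent at Store A, $ spent at Store B
v = (53, 420, 80)

def manhattan(a, b):
    # Sum of absolute differences across all attributes
    return sum(abs(x - y) for x, y in zip(a, b))

print(manhattan(u, v))  # 198
```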

The College Board originally scaled SAT scores so that the scores for each section were approximately normally distributed with a mean of 500 and a standard deviation of 100. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored greater than 700.

2.5%

The College Board reported that the mean Math Level 2 SAT subject test score was 686 with a standard deviation of 96. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored less than 494.

2.5%
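Both SAT questions reduce to the same two steps: find how many standard deviations the cutoff lies from the mean (exactly 2 in each case), then take half of the 5% that the empirical rule places outside ±2 standard deviations. A hypothetical helper that handles only the 1, 2, and 3 standard deviation cases the rule covers:

```python
def empirical_rule_tail(mean, sd, cutoff):
    """Approximate % of a bell-shaped distribution lying beyond `cutoff`
    (on the far side from the mean), via the 68-95-99.7 empirical rule."""
    k = abs(cutoff - mean) / sd  # distance from the mean in standard deviations
    within = {1: 68.0, 2: 95.0, 3: 99.7}
    if k not in within:
        raise ValueError("the empirical rule covers only 1, 2, or 3 SDs")
    # The share outside +/- k SDs splits evenly between the two tails
    return (100 - within[k]) / 2

print(empirical_rule_tail(500, 100, 700))  # 2.5 (SAT section score above 700)
print(empirical_rule_tail(686, 96, 494))   # 2.5 (Math Level 2 score below 494)
```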

Compute the coefficient of variation for the following sample data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37

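21.36% (this value uses the standard deviation rounded to 6.75; with the unrounded s ≈ 6.7528 and mean 31.6, the coefficient of variation is 21.37%)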

Use technology to compute the standard deviation for the following sample data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37

6.75
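A sketch of the sample standard deviation and coefficient of variation for the data used in the two questions above, using Python's `statistics` module (`stdev` divides by n - 1). It also shows where the two-decimal difference comes from: rounding s to 6.75 before dividing gives 21.36%, while the unrounded s gives 21.37%.

```python
from statistics import mean, stdev

data = [32, 41, 36, 24, 29, 30, 40, 22, 25, 37]

x_bar = mean(data)    # 31.6
s = stdev(data)       # sample standard deviation (n - 1 in the denominator)
cv = s / x_bar * 100  # coefficient of variation, in percent

print(round(s, 2))                          # 6.75
print(round(cv, 2))                         # 21.37
print(round(round(s, 2) / x_bar * 100, 2))  # 21.36 (s rounded to 6.75 first)
```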

Compute the IQR for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22

7.75
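Quartile values depend on the interpolation convention. The answers of 20 for the third quartile and 7.75 for the IQR match Python's `statistics.quantiles` with `method="inclusive"`, which places the p-th quantile at position (n - 1)p + 1 of the sorted data; treating this as the course's convention is an assumption. The default `method="exclusive"` would give Q3 = 21.25 for the same data.

```python
from statistics import quantiles

data = [10, 15, 17, 21, 25, 12, 16, 11, 13, 22]

# Quartiles with linear interpolation at position (n - 1) * p + 1
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q3, iqr)  # 12.25 20.0 7.75
```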

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer who spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer who spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance.

75.39
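Euclidean distance is the square root of the summed squared differences: sqrt(28² + 70²) = sqrt(5684) ≈ 75.39. The sketch uses `math.dist` from the standard library (Python 3.8+):

```python
import math

u = (25, 350)  # age, $ spent at Store A
v = (53, 420)

# Square root of the sum of squared attribute differences
d = math.dist(u, v)
print(round(d, 2))  # 75.39
```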

Which of the following best exemplifies big data?

Cellphone owners around the world generate vast amounts of data by calling, texting, tweeting, and browsing the Web on a daily basis.

Natalie needs to compare the number of employees by job title for the last five years. Which of the following charts should Natalie use?

Clustered-column (bar) chart

Complete linkage can be used to measure the distance between _____ in cluster analysis.

Clusters
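Under complete linkage, the distance between two clusters is the largest distance between any point in one cluster and any point in the other. A minimal sketch on two small hypothetical clusters, assuming Euclidean distance as the point-to-point measure:

```python
import math
from itertools import product

def complete_linkage(cluster_a, cluster_b):
    # Largest pairwise Euclidean distance between the two clusters
    return max(math.dist(p, q) for p, q in product(cluster_a, cluster_b))

# Hypothetical clusters of 2-D points
A = [(0, 0), (1, 0)]
B = [(3, 0), (4, 3)]
print(complete_linkage(A, B))  # 5.0, from the farthest pair (0, 0) and (4, 3)
```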

A collection of text documents to be analyzed is called a _____.

Corpus

Scores on Ms. Bond's test have a mean of 70 and a standard deviation of 11. David has a score of 52 on Ms. Bond's test. Scores on Ms. Nash's test have a mean of 64 and a standard deviation of 6. Steven has a score of 52 on Ms. Nash's test. Which student has the higher standardized score?

David's standardized score is -1.64 and Steven's standardized score is -2.00. Therefore, David has the higher standardized score.
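The standardized score is z = (x - mean) / standard deviation, so the comparison can be checked in a few lines:

```python
def z_score(x, mean, sd):
    # Number of standard deviations x lies from the mean
    return (x - mean) / sd

david = z_score(52, 70, 11)  # Ms. Bond's test
steven = z_score(52, 64, 6)  # Ms. Nash's test

print(round(david, 2), round(steven, 2))        # -1.64 -2.0
print("David" if david > steven else "Steven")  # David
```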

A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a _____.

Dendrogram

_____________ is the most critical step of the decision-making process.

Identifying and defining the problem

Which statement is true of an association rule?

It is ultimately judged on how actionable it is and how well it explains the relationship between item sets.

DJ needs to display data over time. Which of the following charts should he use?

Line chart

Which one of the following is used in predictive analytics?

Linear regression

Susan would like to create a graph to display the number of males and females in her class who got an A, B, C, D, and F on the last test. Which of the following graphs could she use?

Stacked-column chart

The process of converting a word to its stem, or root word, is referred to as _____.

Stemming
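To illustrate the idea only, here is a hypothetical toy stemmer that strips a few English suffixes so that word variants map to one stem. It is not the Porter stemmer or any library's algorithm; real stemmers apply many more rules and handle cases this sketch gets wrong.

```python
# Hypothetical toy stemmer for illustration only (not the Porter algorithm)
SUFFIXES = ("ing", "ed", "s")

def toy_stem(word, min_stem=3):
    word = word.lower()
    for suffix in SUFFIXES:
        # Strip the suffix only if a stem of at least `min_stem` letters remains
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

print([toy_stem(w) for w in ["Jumping", "jumped", "jumps", "jump"]])
# ['jump', 'jump', 'jump', 'jump']
```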

A _____ decision involves higher-level issues and is concerned with the overall direction of the organization, defining the overarching goals and aspirations for the organization's future.

Strategic

The decisions concerning an organization's goals and future plans are called _____.

Strategic decisions

_____ refers to the number of times a collection of items occurs together in a transaction data set.

Support count

The process of dividing text into separate terms is referred to as _____.

Tokenization
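A minimal tokenizer sketch, assuming a simple rule: lowercase the text, then treat each run of letters, digits, and apostrophes as one term. Text-mining tools generally use more careful rules.

```python
import re

def tokenize(text):
    # Lowercase, then keep runs of letters, digits, and apostrophes as terms
    return re.findall(r"[a-z0-9']+", text.lower())

print(tokenize("The call center's wait time was too long!"))
# ['the', 'call', "center's", 'wait', 'time', 'was', 'too', 'long']
```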

A better understanding of consumer behavior through analytics directly leads to _____.

better pricing strategies

The strength of the association rule is known as _____ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence.

lift
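A sketch of support count, confidence, benchmark confidence, and lift on a small hypothetical set of transactions. Confidence is support count(antecedent and consequent) / support count(antecedent); the benchmark confidence is support count(consequent) / total transactions; lift is their ratio. The last line applies the reading used in the cooking/biography/art question, where a lift of L means (L - 1) × 100% more likely.

```python
# Hypothetical transaction data set (each set is one transaction)
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"eggs"},
    {"eggs", "butter"},
    {"butter"},
]

def support_count(itemset):
    # Number of transactions containing every item in `itemset`
    return sum(1 for t in transactions if itemset <= t)

def lift(antecedent, consequent):
    n = len(transactions)
    confidence = support_count(antecedent | consequent) / support_count(antecedent)
    benchmark = support_count(consequent) / n  # confidence expected by chance
    return confidence / benchmark

def percent_more_likely(lift_ratio):
    # A lift of L means the consequent is (L - 1) * 100% more likely
    return (lift_ratio - 1) * 100

r = lift({"bread"}, {"milk"})            # (2/3) / (2/6)
print(r)                                 # 2.0
print(round(percent_more_likely(2.20)))  # 120
```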

k-means clustering is the process of _____.

organizing observations into distinct groups based on a measure of similarity
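A minimal sketch of the standard k-means iteration (Lloyd's algorithm), assuming Euclidean distance as the similarity measure and caller-supplied starting centroids; library implementations typically pick starting centroids randomly or with a seeding scheme and run several restarts. The data are hypothetical.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Alternate between assigning each point to its nearest centroid and
    moving each centroid to the mean of the points assigned to it."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if not members:  # keep an empty cluster's centroid where it was
                new_centroids.append(centroids[k])
                continue
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        if new_centroids == centroids:  # centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids

points = [(1, 1), (1, 3), (3, 1), (3, 3), (8, 8), (8, 10), (10, 8), (10, 10)]
labels, centroids = k_means(points, centroids=[(1, 1), (10, 10)])
print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1]
print(centroids)  # [(2.0, 2.0), (9.0, 9.0)]
```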

We create multiple dashboards _____.

so that each dashboard can be viewed on a single screen

A popular measure for weighing terms based on frequency and uniqueness is _____.

term frequency times inverse document frequency
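A sketch of one common TF-IDF variant: the raw count of a term in a document times ln(N / df), where N is the number of documents and df is the number of documents containing the term. Other variants add smoothing or log-scale the counts. The three-document corpus is hypothetical.

```python
import math
from collections import Counter

# Hypothetical three-document corpus, already tokenized
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]

def tf_idf(docs):
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

w = tf_idf(corpus)
print(w[0]["the"])                                  # 0.0 (appears in every document)
print(round(w[1]["dog"], 3), round(w[0]["cat"], 3))  # 1.099 0.405
```

Frequent-but-common terms such as "the" get weight 0, while a term unique to one document ("dog") gets the highest weight.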

A visual representation of a document or set of documents in which the size of the word is proportional to the frequency with which the word appears is called a _____.

word cloud

Which of the following sources of big data is not publicly available?

Medical records

Explain how the confidence of 52.99% and lift ratio of 2.20 were computed for the rule "If a customer buys a cooking book and a biography book, then they buy an art book." Interpret these quantities. A confidence of 52.99% means that in 52.99% of the transactions in which a cooking book and a biography are purchased, an art book _____ purchased. A lift ratio of 2.20 means that a transaction in which a cooking book and a biography are purchased is 120% ________ likely to also include an art book than a randomly selected transaction.

is; more

An analysis of items frequently co-occurring in transactions is known as _____.

market basket analysis
