exam 1 - quiz overview

_____ refers to the number of times a collection of items occurs together in a transaction data set. A consequent Antecedent Validation count Support count

Support count
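As a rough illustration of the definition above, the sketch below counts how many transactions contain every item of an item set. The transactions and the helper name `support_count` are hypothetical, invented only for this example.

```python
# Hypothetical transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "peanut butter"},
    {"bread", "milk", "peanut butter"},
    {"milk", "soda"},
    {"bread", "peanut butter"},
]

def support_count(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

print(support_count({"bread", "peanut butter"}, transactions))  # 3
```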

A data visualization tool that updates in real time and gives multiple outputs is called _____. a data dashboard a metrics table a data table the GIS

a data dashboard

Compute the geometric mean for the following data on growth factors of an investment for 10 years. 1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22 1.0221 1.0148 1.1475

1.0148 Rationale: The geometric mean is a measure of location that is calculated by finding the nth root of the product of n values.
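The nth-root calculation in the rationale can be checked with a short sketch. It assumes Python 3.8 or later for `math.prod`; the variable names are illustrative.

```python
import math

growth_factors = [1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22]

# Geometric mean: the nth root of the product of the n values.
product = math.prod(growth_factors)
geometric_mean = product ** (1 / len(growth_factors))

print(round(geometric_mean, 4))
```

The result should agree with the 1.0148 answer above to four decimal places.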

The College Board reported that the mean Math Level 2 SAT subject test score was 686 with a standard deviation of 96. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored less than 494. 95% 97.5% 2.5% 5%

2.5% Rationale: z-score = (494 - 686) / 96 = -2. Recall that 95% of observations fall within two standard deviations of the mean, which means 2.5% of observations fall in each tail. Since we want the percentage of students who scored less than 494, we want the percentage of observations that fall more than two standard deviations below the mean, which is 2.5%.

The College Board originally scaled SAT scores so that the scores for each section were approximately normally distributed with a mean of 500 and a standard deviation of 100. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored greater than 700. 97.5% 2.5% 95% 5%

2.5% Rationale: z-score = (700 - 500) / 100 = 2. Recall that 95% of the observations fall within two standard deviations of the mean, so 2.5% of the observations fall more than two standard deviations above the mean and 2.5% fall more than two standard deviations below it. Therefore 2.5% of students score greater than 700.
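The two empirical-rule questions above follow the same pattern: compute a z-score, then split what lies outside the central band evenly between the two tails. A minimal sketch, where the function names and the `WITHIN` lookup table are illustrative:

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Empirical rule: approximate share of observations within k standard deviations.
WITHIN = {1: 0.68, 2: 0.95, 3: 0.997}

def tail_share(k):
    """Approximate share of observations in ONE tail beyond k standard deviations."""
    return (1 - WITHIN[k]) / 2

z_low = z_score(494, 686, 96)     # Math Level 2 question
z_high = z_score(700, 500, 100)   # original SAT scaling question

print(z_low, z_high)
print(tail_share(2))  # approximately 0.025, i.e. 2.5%
```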

Compute the third quartile for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22 15.5 21.5 21.25 11.75

21.25 Rationale: Quartiles divide data into four parts, with each part containing approximately one-fourth, or 25 percent, of the observations. This can be calculated with the Excel function =QUARTILE.EXC(range,3) = 21.25.
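For readers without Excel, the sketch below reproduces the result with Python's standard library. It assumes that `statistics.quantiles` with `method="exclusive"` uses the same (n + 1)-based position as `QUARTILE.EXC`, which holds for this data set.

```python
import statistics

data = [10, 15, 17, 21, 25, 12, 16, 11, 13, 22]

# n=4 returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

print(q3)  # 21.25
```

With ten observations, the third quartile sits at position 0.75 × (10 + 1) = 8.25 in the sorted data, a quarter of the way from the 8th value (21) to the 9th value (22).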

Use technology to compute the standard deviation for the following sample data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37 5.42 5.96 6.41 6.75

6.75 Rationale: The standard deviation is defined to be the positive square root of the variance and can be calculated using the Excel function =STDEV.S( ).
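A possible equivalent of `=STDEV.S( )` outside Excel is `statistics.stdev`, which also divides by n - 1. The sketch below shows the variance step explicitly; the variable names are illustrative.

```python
import statistics

sample = [32, 41, 36, 24, 29, 30, 40, 22, 25, 37]

# Sample variance divides the sum of squared deviations by n - 1.
sample_variance = statistics.variance(sample)
sample_std_dev = statistics.stdev(sample)

print(round(sample_variance, 2), round(sample_std_dev, 2))
```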

Scores on Ms. Bond's test have a mean of 70 and a standard deviation of 11. David has a score of 52 on Ms. Bond's test. Scores on Ms. Nash's test have a mean of 64 and a standard deviation of 6. Steven has a score of 52 on Ms. Nash's test. Which student has the higher standardized score? David's standardized score is -1.64 and Steven's standardized score is -2.00. Therefore, David has the higher standardized score. David's standardized score is -1.64 and Steven's standardized score is -2.00. Therefore, Steven has the higher standardized score. Cannot be determined with the information provided. David's standardized score is 1.64 and Steven's standardized score is 2.00. Therefore, Steven has the higher standardized score.

David's standardized score is -1.64 and Steven's standardized score is -2.00. Therefore, David has the higher standardized score. Rationale: David's standardized score is (52 - 70) / 11 = -1.64 and Steven's standardized score is (52 - 64) / 6 = -2.00. Because -1.64 is greater than -2.00, David has the higher standardized score.
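The comparison above can be sketched in a few lines. The function name `standardized_score` is illustrative.

```python
def standardized_score(score, mean, std_dev):
    """z-score: distance from the mean in units of standard deviation."""
    return (score - mean) / std_dev

david = standardized_score(52, 70, 11)   # Ms. Bond's test
steven = standardized_score(52, 64, 6)   # Ms. Nash's test

print(round(david, 2), round(steven, 2))
print("David" if david > steven else "Steven", "has the higher standardized score")
```

Both scores are negative, so the higher one is the one closer to zero.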

_____ is the most critical step of the decision-making process. Identifying and defining the problem Evaluating the alternatives Determining the set of alternatives Choosing an alternative

Identifying and defining the problem

A better understanding of consumer behavior through analytics directly leads to _____. reduced risk better pricing strategies reduced advertising costs more profits

better pricing strategies

Which of the following graphs provides information on outliers and IQR of a data set? Line chart Scatter chart Box plot Histogram

Box plot

Complete linkage can be used to measure the distance between _____ in cluster analysis. wards objects observations clusters

clusters
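As a rough sketch of the idea, complete linkage takes the distance between two clusters to be the largest distance between any pair of observations drawn one from each cluster. The two small clusters below are hypothetical, and Euclidean distance via `math.dist` (Python 3.8+) is an assumed choice of metric.

```python
import math
from itertools import product

# Hypothetical clusters of two-dimensional observations.
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(4, 0), (4, 3)]

def complete_linkage(c1, c2):
    """Largest pairwise Euclidean distance between members of c1 and c2."""
    return max(math.dist(p, q) for p, q in product(c1, c2))

print(complete_linkage(cluster_a, cluster_b))  # 5.0
```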

A collection of text documents to be analyzed is called a _____. consequent corpus library book

corpus

The data dashboard for a marketing manager may have KPIs related to _____. overall performance of the company's stock over the previous 52 weeks current sales measures and sales by region current financial standing of the company data on the company's call center

current sales measures and sales by region

An analysis of items frequently co-occurring in transactions is known as _____. cluster analysis market segmentation regression analysis market basket analysis

market basket analysis

Which of the following sources of big data is not publicly available? Twitter Weather data Sports records Medical records

Medical records

Compute the coefficient of variation for the following sample data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37 20.28% 21.36% 18.64% 21.67%

21.36% Rationale: The coefficient of variation indicates how large the standard deviation is relative to the mean. With a sample standard deviation of 6.75 and a mean of 31.6, the coefficient of variation is (6.75 / 31.6) × 100 = 21.36%.
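The calculation can be sketched as follows. Note one assumption about rounding: the 21.36% answer uses the standard deviation already rounded to 6.75, while carrying the unrounded value through gives roughly 21.37%.

```python
import statistics

sample = [32, 41, 36, 24, 29, 30, 40, 22, 25, 37]

mean = statistics.mean(sample)
std_dev = statistics.stdev(sample)  # sample standard deviation (n - 1)

# Coefficient of variation: standard deviation as a percentage of the mean.
cv_unrounded = std_dev / mean * 100
cv_rounded_inputs = round(std_dev, 2) / mean * 100

print(round(cv_unrounded, 2), round(cv_rounded_inputs, 2))
```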

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance. 88.57 75.39 72.28 66.21

75.39
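The card gives no rationale, so here is a minimal sketch of the calculation using the vectors u = (25, 350) and v = (53, 420) as stated in the question. It assumes Python 3.8 or later for `math.dist` and applies no standardization to the two variables.

```python
import math

u = (25, 350)  # age, dollars spent
v = (53, 420)

# Euclidean distance: square root of the sum of squared differences.
by_hand = math.sqrt((v[0] - u[0]) ** 2 + (v[1] - u[1]) ** 2)
distance = math.dist(u, v)

print(round(distance, 2))  # 75.39
```

The squared differences are 28² = 784 and 70² = 4,900, so the distance is the square root of 5,684.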

Which of the following best exemplifies big data? Cellphone owners around the world generate vast amounts of data by calling, texting, tweeting, and browsing the Web on a daily basis. A pharmacy keeps track of customer purchases to send its customers coupons. Five hundred Facebook users upload one thousand pictures per day. A local grocery store collects data from those that scan their loyalty card.

Cellphone owners around the world generate vast amounts of data by calling, texting, tweeting, and browsing the Web on a daily basis.

Natalie needs to compare the number of employees by job title for the last five years. Which of the following charts should Natalie use? Clustered-column (bar) chart Scatter chart Line chart Bubble chart

Clustered-column (bar) chart

Which statement is true of an association rule? It is ultimately judged on how actionable it is and how well it explains the relationship between item sets. It seeks to classify a categorical outcome into two or more categories. It uses analytic models to describe the relationship between metrics that drive business performance. It is a data reduction technique that reduces large information into smaller homogeneous groups.

It is ultimately judged on how actionable it is and how well it explains the relationship between item sets.

DJ needs to display data over time. Which of the following charts should he use? Line chart Pie chart Scatter chart Bar chart

Line chart

Susan would like to create a graph to display the number of males and females in her class who got an A, B, C, D, and F on the last test. Which of the following graphs could she use? Pie chart Heat map Stacked-column chart Scatter chart

Stacked-column chart

A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a _____. cumulative lift tree decile-wise lift chart dendrogram scatter chart

dendrogram

Data dashboards are a type of _____ analytics. prescriptive predictive descriptive decision

descriptive

The _____ the lift ratio, the _____ the association rule. higher; weaker lower; weaker lower; stronger higher; stronger

higher; stronger

A _____ is a graphical summary of data previously summarized in a frequency distribution. scatter chart box plot histogram line chart

histogram

The strength of the association rule is known as _____ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence. antecedent lift consequent support count

lift
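A minimal sketch of the ratio described above, under two assumptions: confidence is the support count of the antecedent and consequent together divided by the support count of the antecedent, and benchmark confidence is the support count of the consequent divided by the total number of transactions. The transactions and function names are hypothetical.

```python
# Hypothetical transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "peanut butter"},
    {"bread", "milk", "peanut butter"},
    {"milk", "soda"},
    {"bread", "peanut butter"},
]

def support_count(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by its benchmark confidence."""
    antecedent, consequent = set(antecedent), set(consequent)
    both = support_count(antecedent | consequent, transactions)
    confidence = both / support_count(antecedent, transactions)
    benchmark = support_count(consequent, transactions) / len(transactions)
    return confidence / benchmark

# Rule: if {bread}, then {peanut butter}.
print(lift({"bread"}, {"peanut butter"}, transactions))
```

Here the confidence is 3 / 4 = 0.75 and the benchmark confidence is 3 / 5 = 0.60, so the lift is about 1.25. A lift above 1 suggests the antecedent makes the consequent more likely than it would be at random.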

Which one of the following is used in predictive analytics? Linear regression Data visualization Data dashboard Optimization model

Linear regression

k-means clustering is the process of _____. agglomerating observations into a series of nested groups based on a measure of similarity estimating the value of a continuous outcome variable reducing the number of variables to consider in data-mining organizing observations into distinct groups based on a measure of similarity

organizing observations into distinct groups based on a measure of similarity

We create multiple dashboards _____. to make sure the KPIs are not displayed in the data dashboard to help the user scroll vertically and horizontally to see the entire dashboard so that each dashboard can be viewed on a single screen so that all dashboards can be viewed on a single screen

so that each dashboard can be viewed on a single screen

The process of converting a word to its stem, or root word, is referred to as _____. stemming stacking tokenization data cleaning

stemming

A _____ decision involves higher-level issues and is concerned with the overall direction of the organization, defining the overarching goals and aspirations for the organization's future. strategic intuitive operational tactical

strategic

The decisions concerning an organization's goals and future plans are called _____. operational decisions financial decisions strategic decisions tactical decisions

strategic decisions

A popular measure for weighting terms based on frequency and uniqueness is _____. corpus word cloud term frequency times inverse document frequency cosine distance

term frequency times inverse document frequency
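Several variants of this measure exist. The sketch below assumes one simple form: the raw count of a term in a document multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. The corpus, the whitespace tokenization, and the function name `tf_idf` are all hypothetical choices for illustration.

```python
import math
from collections import Counter

# Hypothetical three-document corpus, invented for illustration.
corpus = ["the cat sat", "the dog sat", "the cat ran fast"]

# Naive tokenization: lowercase each document and split on whitespace.
tokenized = [doc.lower().split() for doc in corpus]

def tf_idf(term, doc_index):
    """Raw term count in one document times log(N / document frequency)."""
    tf = Counter(tokenized[doc_index])[term]
    df = sum(1 for tokens in tokenized if term in tokens)
    if df == 0:
        return 0.0
    return tf * math.log(len(tokenized) / df)

print(tf_idf("the", 0))  # 0.0, because "the" appears in every document
print(tf_idf("dog", 1))  # highest weight: "dog" appears in only one document
```

A term that is frequent in one document but rare across the corpus gets a high weight, while a term that appears everywhere gets a weight near zero.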

The process of dividing text into separate terms is referred to as _____. stacking data cleaning tokenization stemming

tokenization

A visual representation of a document or set of documents in which the size of the word is proportional to the frequency with which the word appears is called a _____. word cloud cosine distance dendrogram corpus

word cloud

