Test 1

Scores on Ms. Nash's test have a mean of 64 and a standard deviation of 9. Steve has a score of 52. Convert Steve's score to a z-score. (Round to two decimal places if necessary.)

-1.33
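
The conversion above can be sketched in a few lines of Python. The function name `z_score` is illustrative only and does not come from the card.

```python
def z_score(x: float, mean: float, std_dev: float) -> float:
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev


# Steve's score of 52 on a test with mean 64 and standard deviation 9.
steve = z_score(52, 64, 9)
print(round(steve, 2))  # -1.33
```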

The lift ratio of an association rule with a confidence value of 0.45 and in which the consequent occurs in 6 out of 10 cases is

0.75
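
As a quick check, the lift ratio is the rule's confidence divided by the benchmark confidence, which here is the share of cases containing the consequent. The helper name `lift_ratio` is an assumption made for illustration.

```python
def lift_ratio(confidence: float, consequent_support: float) -> float:
    """Confidence of the rule divided by the benchmark confidence."""
    return confidence / consequent_support


# Confidence 0.45; the consequent appears in 6 of 10 cases.
print(round(lift_ratio(0.45, 6 / 10), 2))  # 0.75
```

A value below 1 suggests the rule does worse than simply guessing the consequent at its base rate.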

Suppose we had a data set from a call center where customers were asked to choose between the following three options: hear account information, billing questions, and customer service. Using the given order of the three options, and using 0-1 dummy variables to encode the categorical variables, which of the following combinations would yield an entry "customer service"?

001
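
A minimal sketch of this encoding, assuming one 0-1 dummy per option in the listed order (which is what the three-digit answer implies). The names `OPTIONS` and `dummy_encode` are hypothetical.

```python
OPTIONS = ["hear account information", "billing questions", "customer service"]


def dummy_encode(choice: str, options=OPTIONS) -> str:
    """Return one 0/1 digit per option, with a 1 only at the chosen option."""
    if choice not in options:
        raise ValueError(f"unknown option: {choice!r}")
    return "".join("1" if option == choice else "0" for option in options)


print(dummy_encode("customer service"))  # 001
```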

Below is a histogram for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. What is the frequency of the 25-28 bin?

1

The strength of a cluster can be measured by comparing the average distance within a cluster to the distance between cluster centroids. One rule of thumb is that the ratio of between-cluster distance to within-cluster distance should exceed what value for useful clusters?

1

Compute the geometric mean for the following data on growth factors of an investment for 10 years. 1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22

1.0148
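
The geometric mean is the n-th root of the product of the n growth factors. A short check, assuming Python 3.8 or later for `statistics.geometric_mean`:

```python
from statistics import geometric_mean

# Growth factors for the 10 years listed in the question.
factors = [1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22]

print(round(geometric_mean(factors), 4))  # 1.0148
```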

Compute the mode for the following data. 12, 16, 19, 10, 12, 11, 21, 12, 21, 10

12

Consider a sample on the waiting times (in minutes) at the billing counter in a grocery store to be 15, 24, 18, 15, 21, 20, 15, 22, 19, 16, 15, 22, 20, 15, and 21. Compute the 25th percentile.

15

Compute the 50th percentile for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22

15.5

The College Board originally scaled SAT scores so that the scores for each section were approximately normally distributed with a mean of 500 and a standard deviation of 100. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored less than 400.

16%
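
The empirical rule says roughly 68% of values fall within one standard deviation of the mean, so the remaining 32% splits evenly between the two tails. The sketch below also compares that approximation with the exact normal probability, using `statistics.NormalDist` (Python 3.8 or later).

```python
from statistics import NormalDist

# Empirical rule: about 68% of scores lie between 400 and 600.
within_one_sd = 68
below_400 = (100 - within_one_sd) / 2
print(below_400)  # 16.0

# Exact normal probability for comparison.
exact = NormalDist(mu=500, sigma=100).cdf(400)
print(round(exact * 100, 2))  # 15.87
```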

The average time a customer service executive takes to resolve an issue on a mobile handset is 26.4 minutes. The average times taken to resolve the issue by a sample of 15 such executives are shown below.
Name | Time in Minutes
Jack | 25.3
Samantha | 28.2
Richard | 26.8
Steve | 29.5
Mary | 22.4
Sergio | 21.7
John | 24.3
Michelle | 22.4
Linda | 26.8
Mark | 29.4
Matt | 23.6
Polly | 26.4
Sheila | 23.5
Jeff | 26.8
Gerald | 28.1
Round to 2 decimal places if necessary. What is the standard deviation?

2.58

Compute the third quartile for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22

21.25
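
The percentile and quartile answers in this set are consistent with locating the p-th percentile at position (p/100)(n + 1) in the sorted data and interpolating between neighbours. That method is an inference from the answers, not something the cards state; the function name `percentile` and the clamping at the ends are assumptions of this sketch.

```python
def percentile(data, p):
    """p-th percentile using the (p/100)*(n+1) location rule with interpolation."""
    values = sorted(data)
    n = len(values)
    location = p / 100 * (n + 1)   # 1-based position in the sorted data
    lower = int(location)          # whole part of the position
    fraction = location - lower    # how far to move toward the next value
    if lower < 1:                  # assumption: clamp to the smallest value
        return values[0]
    if lower >= n:                 # assumption: clamp to the largest value
        return values[-1]
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])


data = [10, 15, 17, 21, 25, 12, 16, 11, 13, 22]
print(percentile(data, 75))  # 21.25
print(percentile(data, 50))  # 15.5
```

Other software may use a different location rule and return slightly different quartiles for the same data.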

The average time a customer service executive takes to resolve an issue on a mobile handset is 26.4 minutes. The average times taken to resolve the issue by a sample of 15 such executives are shown below.
Name | Time in Minutes
Jack | 25.3
Samantha | 28.2
Richard | 26.8
Steve | 29.5
Mary | 22.4
Sergio | 21.7
John | 24.3
Michelle | 22.4
Linda | 26.8
Mark | 29.4
Matt | 23.6
Polly | 26.4
Sheila | 23.5
Jeff | 26.8
Gerald | 28.1
Round to 2 decimal places if necessary. What is the mean resolution time?

25.68

The average time a customer service executive takes to resolve an issue on a mobile handset is 26.4 minutes. The average times taken to resolve the issue by a sample of 15 such executives are shown below.
Name | Time in Minutes
Jack | 25.3
Samantha | 28.2
Richard | 26.8
Steve | 29.5
Mary | 22.4
Sergio | 21.7
John | 24.3
Michelle | 22.4
Linda | 26.8
Mark | 29.4
Matt | 23.6
Polly | 26.4
Sheila | 23.5
Jeff | 26.8
Gerald | 28.1
Round to 2 decimal places if necessary. What is the median resolution time?

26.4

The average time a customer service executive takes to resolve an issue on a mobile handset is 26.4 minutes. The average times taken to resolve the issue by a sample of 15 such executives are shown below.
Name | Time in Minutes
Jack | 25.3
Samantha | 28.2
Richard | 26.8
Steve | 29.5
Mary | 22.4
Sergio | 21.7
John | 24.3
Michelle | 22.4
Linda | 26.8
Mark | 29.4
Matt | 23.6
Polly | 26.4
Sheila | 23.5
Jeff | 26.8
Gerald | 28.1
Round to 2 decimal places if necessary. What is the mode of the resolution times for these 15 executives?

26.8

The average time a customer service executive takes to resolve an issue on a mobile handset is 26.4 minutes. The average times taken to resolve the issue by a sample of 15 such executives are shown below.
Name | Time in Minutes
Jack | 25.3
Samantha | 28.2
Richard | 26.8
Steve | 29.5
Mary | 22.4
Sergio | 21.7
John | 24.3
Michelle | 22.4
Linda | 26.8
Mark | 29.4
Matt | 23.6
Polly | 26.4
Sheila | 23.5
Jeff | 26.8
Gerald | 28.1
Round to 2 decimal places if necessary. What is the third quartile?

28.1

Compute the median of the following data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37

31

Consider the data below. What percentage of students scored grade C?
Grades | # of students
A | 16
B | 28
C | 33
D | 13
Total | 90

37%
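
The percent frequency of a category is its count divided by the total, times 100. A small sketch using the counts from the question (the function name `percent_frequencies` is illustrative):

```python
grade_counts = {"A": 16, "B": 28, "C": 33, "D": 13}


def percent_frequencies(counts):
    """Map each category to its share of the total, as a percentage."""
    total = sum(counts.values())
    return {label: 100 * count / total for label, count in counts.items()}


percentages = percent_frequencies(grade_counts)
print(round(percentages["C"]))  # 37
```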

Below is the data for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. 56, 42, 37, 29, 45, 51, 30, 25, 34, 57 What is the median number of days that it took Wyche Accounting to perform audits in the last quarter of last year?

39.5

Compute the mean of the following data. 56, 42, 37, 29, 45, 51, 30, 25, 34, 57

40.6

What is the mode of the data set given below? 35, 47, 65, 47, 22

47

The average time a customer service executive takes to resolve an issue on a mobile handset is 26.4 minutes. The average times taken to resolve the issue by a sample of 15 such executives are shown below.
Name | Time in Minutes
Jack | 25.3
Samantha | 28.2
Richard | 26.8
Steve | 29.5
Mary | 22.4
Sergio | 21.7
John | 24.3
Michelle | 22.4
Linda | 26.8
Mark | 29.4
Matt | 23.6
Polly | 26.4
Sheila | 23.5
Jeff | 26.8
Gerald | 28.1
Round to 2 decimal places if necessary. What is the variance?

6.67
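
The answers to the resolution-time cards can be reproduced with Python's `statistics` module. Note the assumption: `variance` and `stdev` divide by n - 1 (the sample formulas), which is what matches the answers 6.67 and 2.58; the population formulas would give smaller values.

```python
from statistics import mean, median, mode, stdev, variance

# Resolution times (minutes) for the 15 executives in the question.
times = [25.3, 28.2, 26.8, 29.5, 22.4, 21.7, 24.3, 22.4,
         26.8, 29.4, 23.6, 26.4, 23.5, 26.8, 28.1]

print(round(mean(times), 2))      # 25.68
print(median(times))              # 26.4
print(mode(times))                # 26.8
print(round(variance(times), 2))  # 6.67 (sample variance, divides by n - 1)
print(round(stdev(times), 2))     # 2.58 (square root of the sample variance)
```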

A sample of 13 adult males' heights are listed below. 70, 72, 71, 70, 69, 73, 69, 68, 70, 71, 67, 71, 74 Find the range of the data.

7

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance.

75.39
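
Using the coordinates u = (25, 350) and v = (53, 420) as written in the question, the Euclidean distance is the square root of the sum of squared differences: sqrt(28^2 + 70^2). A quick check with `math.dist` (Python 3.8 or later):

```python
import math

u = (25, 350)  # (age, dollars spent)
v = (53, 420)

print(round(math.dist(u, v), 2))  # 75.39
```

Because spending is on a much larger scale than age, it dominates the distance unless the variables are standardized first.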

Jaccard's coefficient is different from the matching coefficient in that the former
A. does not count matching zero entries while the latter does.
B. measures overlap while the latter measures dissimilarity.
C. is affected by the scale used to measure variables while the latter is not.
D. deals with categorical variables while the latter deals with continuous variables.

A
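
The difference can be seen on a pair of 0-1 vectors. This is a minimal sketch with made-up data; the function names are illustrative.

```python
def matching_coefficient(a, b):
    """Share of positions where the two 0/1 vectors agree (1-1 or 0-0)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def jaccard_coefficient(a, b):
    """Share of 1-1 matches among positions that are not 0-0."""
    both_one = sum(x == 1 and y == 1 for x, y in zip(a, b))
    not_both_zero = sum(x == 1 or y == 1 for x, y in zip(a, b))
    return both_one / not_both_zero if not_both_zero else 0.0


# Hypothetical observations: three of the five positions are 0 in both.
a = [1, 0, 0, 1, 0]
b = [1, 0, 0, 0, 0]

print(matching_coefficient(a, b))  # 0.8
print(jaccard_coefficient(a, b))   # 0.5
```

The three shared zeros raise the matching coefficient but are ignored by Jaccard's coefficient.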

In a survey of patients in a local hospital, 62.42% of the respondents indicated that the health care providers needed to spend more time with each patient. What is the population being studied?

All Patients in a local hospital

In a survey of patients in a local hospital, 62.42% of the respondents indicated that the health care providers needed to spend more time with each patient. What is the sample for the population being studied?

All Survey Respondents

Observation refers to the
A. mean of all variable values associated with one particular entity.
B. set of recorded values of variables associated with a single entity.
C. estimated continuous outcome variable.
D. goal of predicting a categorical outcome based on a set of variables.

B

Which of the following best exemplifies big data?
A. A local grocery store collects data from those that scan their loyalty card.
B. Cellphone owners around the world generate vast amounts of data by calling, texting, tweeting, and browsing the Web on a daily basis.
C. A pharmacy keeps track of customers' purchases to send their customers coupons.
D. Five hundred Facebook users upload one thousand pictures per day.

B

Which of the following graphs provides information on outliers and IQR of a data set?

Box Plot

Which graph represents a negative linear relationship between x and y?

C (chart starts high on the left and goes down as it moves right)

______ are visual methods of displaying data.

Charts

The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called

Cluster Analysis

The ______________________ shows the number of data items with values less than or equal to the upper class limit of each class.

Cumulative Frequency Distribution

Which of the following are necessary to be determined to define the classes for a frequency distribution with quantitative data?
A. Number of overlapping bins, width of each bin, and bin upper limits
B. Width of each bin and bin lower limits
C. Width of each bin and number of bins
D. Number of nonoverlapping bins, width of each bin, and bin limits

D

Which statement is true of an association rule?
A. It uses analytic models to describe the relationship between metrics that drive business performance.
B. It is a data reduction technique that reduces large information into smaller homogeneous groups.
C. It seeks to classify a categorical outcome into two or more categories.
D. It is ultimately judged on how actionable it is and how well it explains the relationship between item sets.

D

In which of the following data-mining process steps is the data manipulated to make it suitable for formal modeling?

Data Preparation

The extraction of information on the number of shipments, how much was included in each shipment, the date each shipment was sent, and so on from the manufacturing plant's database exemplifies

Data Queries

____________________ are analytical tools that describe what has happened.

Descriptive analytics

The software package most commonly used for creating simple charts is

Excel

To generate a scatter chart matrix, we use

Excel Add-In XLMiner.

You would _________________ a table if you wanted to display only data that match specific criteria.

Filter

Fields may be chosen to represent all of the following except ____________ in the body of a PivotTable.

Filters

The scores of a sample of students in a Math test are 20, 15, 19, 21, 22, 12, 17, 14, 24, 16 and in a Stat test are 16, 12, 19, 17, 22, 14, 20, 21, 24, 15, 13. Compute the mean and median scores for both the Math and the Stat tests.

Math test: Mean = 18, Median = 18. Stat test: Mean = 17.55 (193/11, often shown rounded as 17.5), Median = 17.

Bar charts use

Horizontal bars to display the magnitude of the quantitative variable

______________________ is the most critical step of the decision-making process.

Identifying and defining the problem

_____________________ refers to the technology that allows data, collected from sensors in all types of machines, to be sent over the Internet to repositories where it can be stored and analyzed.

Internet of Things (IoT)

Which of the following is true of Euclidean distances?

It is commonly used as a method of measuring dissimilarity between quantitative observations.

DJ needs to display data over time. Which of the following charts should he use?

Line Chart

Which Excel command will return all modes when more than one mode exists?

MODE.MULT
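
For readers working outside Excel, a rough Python analogue is `statistics.multimode` (Python 3.8 or later), which returns every most-frequent value. This is offered as a comparison only, not as a description of how Excel computes MODE.MULT; the second data set is made up.

```python
from statistics import multimode

# Data from the earlier mode card: 12 appears three times.
print(multimode([12, 16, 19, 10, 12, 11, 21, 12, 21, 10]))  # [12]

# Hypothetical data with a tie: 12 and 21 each appear twice.
print(multimode([10, 12, 12, 21, 21]))  # [12, 21]
```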

______________________ refers to a programming model used within Hadoop that performs the two major steps for which it is named: the map step and the reduce step.

MapReduce

Consider a sample on the waiting times (in minutes) at the billing counter in a grocery store to be 15, 24, 18, 15, 21, 20, 15, 22, 19, 16, 15, 22, 20, 15, and 21. Compute the mean, median, and mode.

Mean = 18.53 Median = 19 Mode = 15

A dashboard is a collection of tables, charts, and maps to help management ____________ selected aspects of the company's performance.

Monitor

What is (are) the mode(s) of the number of days that it took Wyche Accounting to perform audits in the last quarter of last year?

None

A decision concerned with how the organization is run from day to day is known as a(n) _______________.

Operational Decision

A ______________ is used for examining data with more than two variables, and it includes a different vertical axis for each variable.

Parallel-Coordinates Plot

To summarize and analyze data with both a crosstabulation and charting, Excel typically pairs

PivotCharts with PivotTables

Which one of the following statements is not true concerning PivotTables in Excel?

PivotTables summarize only categorical and quantitative data.

_______________ analytics are techniques that use models, constructed from past data, to predict the future or to ascertain the impact of one variable on another.

Predictive

_______________ analytics use techniques that take input data and yield a best course of action.

Prescriptive

Which of the following analytical techniques helps us arrive at the best decision?

Prescriptive analytics

Which of the following gives the proportion of items in each bin?

Relative Frequency

The ________________ is a point estimate of the population mean for the variable of interest.

Sample Mean

A summary of data that shows the number of observations in each of several nonoverlapping bins is called a(n)

Frequency distribution

Business analytics is the __________________________ process of transforming data into insight for making better decisions.

Scientific

A data __________________ is trained in both computer science and statistics and knows how to effectively process and analyze large amounts of data.

Scientist

An increase in data ____________________ would help to protect stored data from destructive forces or unauthorized users.

Security

____________________ are used in the pharmaceutical industry to assess the risk of introducing a new drug.

Simulations

Picks and Axes Inc. is an Internet-based retail seller of hiking boots and mountaineering gear. The company decides to open retail stores across the major areas of the city to help complement its Internet-based strategy. This activity would be categorized as a(n)

Strategic Decision

A ____________________ decision is concerned with how the organization should achieve the goals and objectives set by its strategy.

Tactical

_____ merges maps and statistics to present data collected over different geographies.

The geographic information system

_______________ approaches are designed to describe patterns and relationships in large data sets with many observations of many variables.

Unsupervised learning

A quantity of interest that can take on different values is known as a(n)

Variable

One of the 4 Vs of big data that refers to uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, and model approximations is _________.

Veracity

Hierarchical clustering using ____________ results in a sequence of aggregated clusters that minimizes the loss of information between the individual observation level and the cluster level.

Ward's method

In which of the following scenarios would it be appropriate to use hierarchical clustering?

When binary or ordinal data needs to be clustered.

The Excel function STANDARDIZE can be used to calculate ____________.

z-scores

_______________ act(s) as a representative of the population.

a sample

To identify patterns across transactions, we can use

association rules

A chart that is recommended as an alternative to a pie chart is a

bar chart

The charts that are helpful in making comparisons between categorical variables are

bar charts and column charts

A better understanding of consumer behavior through analytics directly leads to

better pricing strategies.

The correlation coefficient will always take values

between -1 and +1

Data that are too large or too complex to be handled by standard data-processing techniques and typical desktop software are called _______________________ .

big data

In order to visualize three variables in a two-dimensional graph, we use a

bubble chart

An alternative for a stacked column chart when comparing more than a couple of quantitative variables in each category is a

clustered column chart

Complete linkage can be used to measure the distance between _________ in cluster analysis.

clusters

Average linkage is a measure of calculating dissimilarity between two clusters by

computing the average distance between every pair of observations between two clusters

Single linkage is a measure of calculating dissimilarity between clusters by

considering only the two most similar observations in the two clusters.
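
The single, complete, and average linkage cards differ only in how the pairwise distances between two clusters are summarized: minimum, maximum, or mean. A minimal sketch with made-up two-dimensional points (the function names and data are hypothetical):

```python
import math
from itertools import product


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every pair with one point from each cluster."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))   # two most similar points


def complete_linkage(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))   # two most different points


def average_linkage(cluster_a, cluster_b):
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)                 # mean over all pairs


# Hypothetical clusters.
cluster_a = [(0, 0), (0, 1)]
cluster_b = [(3, 0), (4, 0)]

print(single_linkage(cluster_a, cluster_b))              # 3.0
print(round(complete_linkage(cluster_a, cluster_b), 4))  # 4.1231
print(round(average_linkage(cluster_a, cluster_b), 4))   # 3.5713
```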

In preparing categorical variables for analysis, it is usually best to

convert the categories to binary, dummy variables.

The data dashboard for a marketing manager may have KPIs related to

current sales measures and sales by region

Corporate-level managers use ______ to summarize sales by region, current inventory levels, and other company-wide metrics all in a single screen.

data dashboards

Optimization models can be used to

decide on how to invest cash received from insurance policies.

A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a

dendrogram

The variance is based on the

deviation about the mean

The process of eliminating variables from formal analysis without losing any crucial information is called

dimension reduction

A cluster's _____________ can be measured by the difference between the distance value at which a cluster is originally formed and the distance value at which it is merged with another cluster in a dendrogram.

durability

A two-dimensional graph representing the data using different shades of color to indicate magnitude is called a

heat map

The __________ the lift ratio, the ____________ the association rule.

higher, stronger

A _______________ is a graphical summary of data previously summarized in a frequency distribution.

histogram

Consider the clustered bar chart of the dashboard developed to monitor the performance of a call center:

identify a particular type of problem by location.

Deleting the grid lines in a table and the horizontal lines in a chart

increases the data-ink ratio

The letter grades of business analysis students are recorded by a professor (4=A, 3=B, 2=C, 1=D). This variable's classification

is categorical data

A disadvantage of stacked-column charts and stacked-bar charts is that

it can be difficult to perceive small differences in areas

The best way to differentiate chart elements is using

labels

The strength of the association rule is known as ____________ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence.

lift

A time series plot is also known as a

line chart

An analysis of items frequently co-occurring in transactions is known as

market basket analysis.

When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the

matching coefficient

Complete linkage can be used to measure the distance between clusters that are the _________________ in cluster analysis.

most different

The endpoint of a k-means clustering algorithm occurs when

no further changes are observed in cluster structure and number.

In k-means clustering, k represents the

number of clusters

The data collected from the customers in restaurants about the quality of food is an example of a(n)

observational study

k-means clustering is the process of

organizing observations into distinct groups based on a measure of similarity.
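
The k-means cards describe an assign-then-update loop that stops when cluster membership no longer changes. The sketch below is a simplified illustration under stated assumptions: the starting centroids are supplied by the caller rather than chosen at random, Euclidean distance is used, and the data and the name `k_means` are made up.

```python
import math


def k_means(points, initial_centroids, max_iter=100):
    """Assign points to the nearest centroid, then recompute centroids, until stable."""
    centroids = list(initial_centroids)   # k is the number of starting centroids
    assignments = None
    for _ in range(max_iter):
        new_assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:   # no change in membership: stop
            break
        assignments = new_assignments
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:                      # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids


# Hypothetical data: two visibly separate groups, k = 2.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
assignments, centroids = k_means(points, [(0, 0), (10, 10)])
print(assignments)  # [0, 0, 0, 1, 1, 1]
```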

Any data value with a z-score less than -3 or greater than +3 is considered to be a(n)

outlier
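
A sketch of this rule of thumb, assuming the sample standard deviation is used for the z-scores. The data and the name `z_score_outliers` are made up for illustration.

```python
from statistics import mean, stdev


def z_score_outliers(values, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    m = mean(values)
    s = stdev(values)   # sample standard deviation (divides by n - 1)
    return [v for v in values if abs((v - m) / s) > threshold]


# Hypothetical data: fifteen values near 50 and one extreme value.
data = [50, 52, 48, 51, 49, 50, 53, 47, 50, 51, 49, 52, 48, 50, 51, 120]
print(z_score_outliers(data))  # [120]
```

In very small samples no value can reach a z-score of 3, so this check is only informative when there are enough observations.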

Making visual comparisons between categorical variables is difficult in a

pie chart

Advanced analytics generally refers to

predictive and prescriptive analytics

In the financial sector, ___________________________ are used to construct financial instruments such as derivatives.

predictive models

In the spectrum of business analytics, which is the most complex?

prescriptive

In many cases, white space in a chart can improve

readability

Data-driven decision making tends to decrease a firm's

risk

A _____________ is a graphical presentation of the relationship between two quantitative variables.

scatter chart

A useful chart for displaying multiple variables is the

scatter chart matrix.

We create multiple dashboards

so that each dashboard can be viewed on a single screen.

When working with large spreadsheets with many rows of data, it can be helpful to ____________ the data to better find, view, or manage subsets of data.

sort and filter

A line chart that has no axes but is used to provide information on overall trends for time series data is called a

sparkline

To avoid problems in interpreting the differences in color in a heat map, ____________ can be added.

sparklines

A method for modifying variables that reduces bias prior to cluster analysis is

standardization
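
Standardization here means replacing each value with its z-score so that every variable has mean 0 and standard deviation 1 before distances are computed. A minimal sketch with made-up customer data, assuming the sample standard deviation (the name `standardize` is illustrative and is not Excel's STANDARDIZE function):

```python
from statistics import mean, stdev


def standardize(values):
    """Convert each value to a z-score using the column's own mean and sample std dev."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical customers: age in years and spending in dollars.
ages = [25, 53, 40, 31]
spend = [350, 420, 4100, 900]

z_ages = standardize(ages)
z_spend = standardize(spend)

# After standardization both columns are on the same unitless scale.
print([round(z, 2) for z in z_ages])
print([round(z, 2) for z in z_spend])
```

Without this step, the variable with the largest raw scale (spending, in this example) would dominate a Euclidean distance.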

A _______________ decision involves higher-level issues and is concerned with the overall direction of the organization, defining the overarching goals and aspirations for the organization's future.

strategic

The decisions concerning an organization's goals and future plans are called

strategic decisions.

If a model's implications depend on the inclusion or exclusion of outliers, one should spend additional time to track down

the cause of the outliers

If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations of a cluster?

the hypotenuse

Tactical decisions define

the steps taken to achieve the goals and objectives.

Tables should be used instead of charts when

the values being displayed have different units or very different magnitudes

If covariance between two variables is near 0, it implies that

the variables are not linearly related

Using multiple lines on a line chart or employing multiple charts is an alternative to a

three-dimensional chart

Data collected from several entities over a period of time (minutes, hours, days, etc.) are called

time series data

Simulation optimization helps

to find good decisions in highly complex and highly uncertain settings.

Utility theory is the study of the __________________ or relative desirability of a particular outcome that reflects the decision maker's attitude toward a collection of factors, such as profit, loss, and risk.

total worth

A _____________ is a line that provides an approximation of the relationship between the variables.

trendline

Veracity has to do with how much __________________ is in the data.

uncertainty

The goal of ___________________ is to use the variable values to identify relationships between observations.

unsupervised learning

The goal regarding using an appropriate number of bins is to show the

variation in the data

____________________ analytics is the analysis of online activity, such as visits to websites or social media.

web

A _____________________ determines how far a particular value is from the mean relative to the data set's standard deviation.

z-score

