Test 1
Scores on Ms. Nash's test have a mean of 64 and a standard deviation of 9. Steve has a score of 52. Convert Steve's score to a z-score. (Round to two decimal places if necessary.)
-1.33
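A minimal Python sketch of the z-score calculation above. The function name `z_score` is an illustrative choice, not something from the course material.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


# Steve's score of 52 on a test with mean 64 and standard deviation 9
steve_z = z_score(52, 64, 9)
print(round(steve_z, 2))  # -1.33
```

The negative sign indicates that Steve scored below the mean.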
The lift ratio of an association rule with a confidence value of 0.45 and in which the consequent occurs in 6 out of 10 cases is
0.75
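The lift ratio divides the rule's confidence by the benchmark confidence, which is the share of all cases that contain the consequent. A short Python sketch, with `lift_ratio` as a hypothetical helper name:

```python
def lift_ratio(confidence, consequent_support):
    """Lift = rule confidence / benchmark confidence (support of the consequent)."""
    return confidence / consequent_support


# Confidence 0.45; the consequent occurs in 6 out of 10 cases
lift = lift_ratio(0.45, 6 / 10)
print(round(lift, 2))  # 0.75
```

A lift below 1 means the consequent is less likely when the antecedent is present than it is overall.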
Suppose we had a data set from a call center where customers were asked to choose between the following three options: hear account information, billing questions, and customer service. Using the given order of the three options, and using 0-1 dummy variables to encode the categorical variables, which of the following combinations would yield an entry "customer service"?
001
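One way to picture the encoding is a small Python sketch that writes one 0-1 digit per option, in the order given in the question. The names `OPTIONS` and `dummy_code` are assumptions made for illustration.

```python
# Options in the order stated in the question
OPTIONS = ["hear account information", "billing questions", "customer service"]


def dummy_code(choice, options=OPTIONS):
    """Return one 0/1 digit per option; the chosen option gets a 1."""
    return "".join("1" if option == choice else "0" for option in options)


print(dummy_code("customer service"))  # 001
```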
Below is a histogram for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. What is the frequency of the 25-28 bin?
1
The strength of a cluster can be measured by comparing the average distance in a cluster to the distance between cluster centroids. One rule of thumb is that the ratio for between-cluster distance to within-cluster distance should exceed what value for useful clusters?
1
Compute the geometric mean for the following data on growth factors of an investment for 10 years. 1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22
1.0148
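The geometric mean of n growth factors is the n-th root of their product. A minimal Python sketch of that formula (Python 3.8+ for `math.prod`):

```python
import math

growth_factors = [1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22]

# n-th root of the product of the n growth factors
geometric_mean = math.prod(growth_factors) ** (1 / len(growth_factors))
print(round(geometric_mean, 4))  # 1.0148
```

A value of about 1.0148 corresponds to an average growth of roughly 1.48% per year.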
Compute the mode for the following data. 12, 16, 19, 10, 12, 11, 21, 12, 21, 10
12
Consider a sample on the waiting times (in minutes) at the billing counter in a grocery store to be 15, 24, 18, 15, 21, 20, 15, 22, 19, 16, 15, 22, 20, 15, and 21. Compute the 25th percentile.
15
Compute the 50th percentile for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22
15.5
The College Board originally scaled SAT scores so that the scores for each section were approximately normally distributed with a mean of 500 and a standard deviation of 100. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored less than 400.
16%
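A score of 400 is one standard deviation below the mean, and the empirical rule puts about 68% of values within one standard deviation, leaving 32% split evenly between the two tails. The sketch below compares that approximation with the exact normal probability, assuming the scores are exactly normal (Python 3.8+ for `statistics.NormalDist`).

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

# Empirical rule: 68% within one standard deviation, remainder split between two tails
empirical_below_400 = (100 - 68) / 2
# Exact area under the normal curve to the left of 400, as a percentage
exact_below_400 = sat.cdf(400) * 100

print(empirical_below_400)        # 16.0
print(round(exact_below_400, 2))  # 15.87
```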
The average time a customer service executive takes to resolve an issue on a mobile handset is 26.4 minutes. The average times taken to resolve the issue by a sample of 15 such executives are shown below (name: time in minutes). Jack: 25.3; Samantha: 28.2; Richard: 26.8; Steve: 29.5; Mary: 22.4; Sergio: 21.7; John: 24.3; Michelle: 22.4; Linda: 26.8; Mark: 29.4; Matt: 23.6; Polly: 26.4; Sheila: 23.5; Jeff: 26.8; Gerald: 28.1. Round to 2 decimal places if necessary. What is the standard deviation?
2.58
Compute the third quartile for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22
21.25
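The percentile and quartile answers in this set are consistent with locating the p-th percentile at position (n + 1) * p / 100 in the sorted data and interpolating between neighbors. The Python sketch below assumes that convention; other software may use a different rule and give slightly different values. The function name `percentile` is illustrative.

```python
def percentile(data, p):
    """p-th percentile using the (n + 1) * p / 100 location method."""
    values = sorted(data)
    n = len(values)
    location = (n + 1) * p / 100  # 1-based position in the sorted data
    lower = int(location)         # whole part of the position
    fraction = location - lower   # how far to move toward the next value
    if lower < 1:
        return values[0]
    if lower >= n:
        return values[-1]
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])


scores = [10, 15, 17, 21, 25, 12, 16, 11, 13, 22]
waits = [15, 24, 18, 15, 21, 20, 15, 22, 19, 16, 15, 22, 20, 15, 21]

print(percentile(waits, 25))   # 15.0
print(percentile(scores, 50))  # 15.5
print(percentile(scores, 75))  # 21.25
```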
The average time a customer service executive takes to resolve an issue on a mobile handset is 26.4 minutes. The average times taken to resolve the issue by a sample of 15 such executives are shown below (name: time in minutes). Jack: 25.3; Samantha: 28.2; Richard: 26.8; Steve: 29.5; Mary: 22.4; Sergio: 21.7; John: 24.3; Michelle: 22.4; Linda: 26.8; Mark: 29.4; Matt: 23.6; Polly: 26.4; Sheila: 23.5; Jeff: 26.8; Gerald: 28.1. Round to 2 decimal places if necessary. What is the mean resolution time?
25.68
The average time a customer service executive takes to resolve an issue on a mobile handset is 26.4 minutes. The average times taken to resolve the issue by a sample of 15 such executives are shown below (name: time in minutes). Jack: 25.3; Samantha: 28.2; Richard: 26.8; Steve: 29.5; Mary: 22.4; Sergio: 21.7; John: 24.3; Michelle: 22.4; Linda: 26.8; Mark: 29.4; Matt: 23.6; Polly: 26.4; Sheila: 23.5; Jeff: 26.8; Gerald: 28.1. Round to 2 decimal places if necessary. What is the median resolution time?
26.4
The average time a customer service executive takes to resolve an issue on a mobile handset is 26.4 minutes. The average times taken to resolve the issue by a sample of 15 such executives are shown below (name: time in minutes). Jack: 25.3; Samantha: 28.2; Richard: 26.8; Steve: 29.5; Mary: 22.4; Sergio: 21.7; John: 24.3; Michelle: 22.4; Linda: 26.8; Mark: 29.4; Matt: 23.6; Polly: 26.4; Sheila: 23.5; Jeff: 26.8; Gerald: 28.1. Round to 2 decimal places if necessary. What is the mode of the resolution times of these 15 executives?
26.8
The average time a customer service executive takes to resolve an issue on a mobile handset is 26.4 minutes. The average times taken to resolve the issue by a sample of 15 such executives are shown below (name: time in minutes). Jack: 25.3; Samantha: 28.2; Richard: 26.8; Steve: 29.5; Mary: 22.4; Sergio: 21.7; John: 24.3; Michelle: 22.4; Linda: 26.8; Mark: 29.4; Matt: 23.6; Polly: 26.4; Sheila: 23.5; Jeff: 26.8; Gerald: 28.1. Round to 2 decimal places if necessary. What is the third quartile?
28.1
Compute the median of the following data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37
31
Consider the data below. What percentage of students scored grade C? Grades (number of students): A: 16; B: 28; C: 33; D: 13; Total: 90.
37%
Below is the data for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. 56, 42, 37, 29, 45, 51, 30, 25, 34, 57 What is the median number of days that it took Wyche Accounting to perform audits in the last quarter of last year?
39.5
Compute the mean of the following data. 56, 42, 37, 29, 45, 51, 30, 25, 34, 57
40.6
What is the mode of the data set given below? 35, 47, 65, 47, 22
47
The average time a customer service executive takes to resolve an issue on a mobile handset is 26.4 minutes. The average times taken to resolve the issue by a sample of 15 such executives are shown below (name: time in minutes). Jack: 25.3; Samantha: 28.2; Richard: 26.8; Steve: 29.5; Mary: 22.4; Sergio: 21.7; John: 24.3; Michelle: 22.4; Linda: 26.8; Mark: 29.4; Matt: 23.6; Polly: 26.4; Sheila: 23.5; Jeff: 26.8; Gerald: 28.1. Round to 2 decimal places if necessary. What is the variance?
6.67
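The summary statistics for the 15 resolution times can be checked with Python's standard `statistics` module. This sketch assumes the sample formulas (dividing by n - 1), which is what the answers 6.67 and 2.58 correspond to.

```python
import statistics

times = [25.3, 28.2, 26.8, 29.5, 22.4, 21.7, 24.3, 22.4,
         26.8, 29.4, 23.6, 26.4, 23.5, 26.8, 28.1]

mean = statistics.mean(times)
median = statistics.median(times)
mode = statistics.mode(times)
variance = statistics.variance(times)  # sample variance, divides by n - 1
std_dev = statistics.stdev(times)      # square root of the sample variance

print(round(mean, 2))      # 25.68
print(median)              # 26.4
print(mode)                # 26.8
print(round(variance, 2))  # 6.67
print(round(std_dev, 2))   # 2.58
```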
A sample of 13 adult males' heights are listed below. 70, 72, 71, 70, 69, 73, 69, 68, 70, 71, 67, 71, 74 Find the range of the data.
7
Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance.
75.39
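Euclidean distance is the square root of the sum of squared differences across the variables. A minimal Python check using the coordinates u = (25, 350) and v = (53, 420) (Python 3.8+ for `math.dist`):

```python
import math

u = (25, 350)  # (age, dollars spent)
v = (53, 420)

# sqrt((53 - 25)^2 + (420 - 350)^2) = sqrt(784 + 4900)
distance = math.dist(u, v)
print(round(distance, 2))  # 75.39
```

The spending difference contributes far more to the result than the age difference, which is why variables are often standardized before distances are computed.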
Jaccard's coefficient is different from the matching coefficient in that the former A. does not count matching zero entries while the latter does. B. measures overlap while the latter measures dissimilarity. C. is affected by the scale used to measure variables while the latter is not. D. deals with categorical variables while the latter deals with continuous variables.
A
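The difference between the two coefficients can be seen in a small Python sketch for 0-1 observations. The function names and the example vectors are made up for illustration.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    matches = sum(a == b for a, b in zip(u, v))
    return matches / len(u)


def jaccard_coefficient(u, v):
    """Like the matching coefficient, but 0-0 agreements are ignored entirely."""
    both_one = sum(a == 1 and b == 1 for a, b in zip(u, v))
    both_zero = sum(a == 0 and b == 0 for a, b in zip(u, v))
    # Undefined (division by zero) if every variable is a 0-0 match
    return both_one / (len(u) - both_zero)


u = (1, 0, 1, 0, 0)
v = (1, 1, 0, 0, 0)
print(matching_coefficient(u, v))           # 0.6
print(round(jaccard_coefficient(u, v), 2))  # 0.33
```

The two shared zeros raise the matching coefficient to 3/5, while Jaccard's coefficient counts only the single shared 1 out of the three variables where at least one observation has a 1.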
In a survey of patients in a local hospital, 62.42% of the respondents indicated that the health care providers needed to spend more time with each patient. What is the population being studied?
All Patients in a local hospital
In a survey of patients in a local hospital, 62.42% of the respondents indicated that the health care providers needed to spend more time with each patient. What is the sample for the population being studied?
All Survey Respondents
Observation refers to the A. mean of all variable values associated with one particular entity. B. set of recorded values of variables associated with a single entity. C. estimated continuous outcome variable. D. goal of predicting a categorical outcome based on a set of variables.
B
Which of the following best exemplifies big data? A. A local grocery store collects data from those that scan their loyalty card. B. Cellphone owners around the world generate vast amounts of data by calling, texting, tweeting, and browsing the Web on a daily basis. C. A pharmacy keeps track of customers' purchases to send their customers coupons. D. Five hundred Facebook users upload one thousand pictures per day.
B
Which of the following graphs provides information on outliers and IQR of a data set?
Box Plot
Which graph represents a negative linear relationship between x and y?
C (chart starts high on the left and goes down as it moves right)
______ are visual methods of displaying data.
Charts
The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called
Cluster Analysis
The ______________________ shows the number of data items with values less than or equal to the upper class limit of each class.
Cumulative Frequency Distribution
Which of the following are necessary to be determined to define the classes for a frequency distribution with quantitative data? A. Number of overlapping bins, width of each bin, and bin upper limits B. Width of each bin and bin lower limits C. Width of each bin and number of bins D. Number of nonoverlapping bins, width of each bin, and bin limits
D
Which statement is true of an association rule? A. It uses analytic models to describe the relationship between metrics that drive business performance. B. It is a data reduction technique that reduces large information into smaller homogeneous groups. C. It seeks to classify a categorical outcome into two or more categories. D. It is ultimately judged on how actionable it is and how well it explains the relationship between item sets.
D
In which of the following data-mining process steps is the data manipulated to make it suitable for formal modeling?
Data Preparation
The extraction of information on the number of shipments, how much was included in each shipment, the date each shipment was sent, and so on from the manufacturing plant's database exemplifies
Data Queries
____________________ are analytical tools that describe what has happened.
Descriptive analytics
The software package most commonly used for creating simple charts is
Excel
To generate a scatter chart matrix, we use
Excel Add-In XLMiner.
You would _________________ a table if you wanted to display only data that match specific criteria.
Filter
Fields may be chosen to represent all of the following except ____________ in the body of a PivotTable.
Filters
The scores of a sample of students in a Math test are 20, 15, 19, 21, 22, 12, 17, 14, 24, 16 and in a Stat test are 16, 12, 19, 17, 22, 14, 20, 21, 24, 15, 13. Compute the mean and median scores for both the Math and the Stat tests.
Math test: Mean = 18, Median = 18. Stat test: Mean = 17.55, Median = 17.
Bar charts use
Horizontal bars to display the magnitude of the quantitative variable
______________________ is the most critical step of the decision-making process.
Identifying and defining the problem
_____________________ refers to the technology that allows data, collected from sensors in all types of machines, to be sent over the Internet to repositories where it can be stored and analyzed.
Internet of Things (IoT)
Which of the following is true of Euclidean distances?
It is commonly used as a method of measuring dissimilarity between quantitative observations.
DJ needs to display data over time. Which of the following charts should he use?
Line Chart
Which Excel command will return all modes when more than one mode exists?
MODE.MULT
______________________ refers to a programming model used within Hadoop that performs the two major steps for which it is named: the map step and the reduce step.
MapReduce
Consider a sample on the waiting times (in minutes) at the billing counter in a grocery store to be 15, 24, 18, 15, 21, 20, 15, 22, 19, 16, 15, 22, 20, 15, and 21. Compute the mean, median, and mode.
Mean = 18.53 Median = 19 Mode = 15
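A quick Python check of these three measures using the standard `statistics` module. The second part uses the Wyche Accounting audit data from this set to show what happens when no value repeats (`statistics.multimode` requires Python 3.8+).

```python
import statistics

waits = [15, 24, 18, 15, 21, 20, 15, 22, 19, 16, 15, 22, 20, 15, 21]

wait_mean = statistics.mean(waits)
wait_median = statistics.median(waits)
wait_mode = statistics.mode(waits)

print(round(wait_mean, 2))  # 18.53
print(wait_median)          # 19
print(wait_mode)            # 15

# When every value occurs exactly once, multimode returns all of them,
# which corresponds to the flashcard answer "None" (no single mode).
audit_days = [56, 42, 37, 29, 45, 51, 30, 25, 34, 57]
audit_modes = statistics.multimode(audit_days)
print(len(audit_modes) == len(audit_days))  # True
```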
A dashboard is a collection of tables, charts, and maps to help management ____________ selected aspects of the company's performance.
Monitor
What is (are) the mode(s) of the number of days that it took Wyche Accounting to perform audits in the last quarter of last year?
None
A decision concerned with how the organization is run from day to day is known as a(n) _______________.
Operational Decision
A ______________ is used for examining data with more than two variables, and it includes a different vertical axis for each variable.
Parallel-Coordinates Plot
To summarize and analyze data with both a crosstabulation and charting, Excel typically pairs
PivotCharts with PivotTables
Which one of the following statements is not true concerning PivotTables in Excel?
PivotTables summarize only categorical and quantitative data.
_______________ analytics are techniques that use models, constructed from past data, to predict the future or to ascertain the impact of one variable on another.
Predictive
_______________ analytics use techniques that take input data and yield a best course of action.
Prescriptive
Which of the following analytical techniques helps us arrive at the best decision?
Prescriptive analytics
Which of the following gives the proportion of items in each bin?
Relative Frequency
The ________________ is a point estimate of the population mean for the variable of interest.
Sample Mean
A summary of data that shows the number of observations in each of several nonoverlapping bins is called a(n)
Frequency Distribution
Business analytics is the __________________________ process of transforming data into insight for making better decisions.
Scientific
A data __________________ is trained in both computer science and statistics and knows how to effectively process and analyze large amounts of data.
Scientist
An increase in data ____________________ would help to protect stored data from destructive forces or unauthorized users.
Security
____________________ are used in the pharmaceutical industry to assess the risk of introducing a new drug.
Simulations
Picks and Axes Inc. is an Internet-based retail seller of hiking boots and mountaineering gear. The company decides to open retail stores across the major areas of the city to help complement its Internet-based strategy. This activity would be categorized as a(n)
Strategic Decision
A ____________________ decision is concerned with how the organization should achieve the goals and objectives set by its strategy.
Tactical
_____ merges maps and statistics to present data collected over different geographies.
The geographic information system
_______________ approaches are designed to describe patterns and relationships in large data sets with many observations of many variables.
Unsupervised learning
A quantity of interest that can take on different values is known as a(n)
Variable
One of the 4 Vs of big data that refers to uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, and model approximations is _________.
Veracity
Hierarchical clustering using ____________ results in a sequence of aggregated clusters that minimizes the loss of information between the individual observation level and the cluster level.
Ward's method
In which of the following scenarios would it be appropriate to use hierarchical clustering?
When binary or ordinal data needs to be clustered.
The Excel function STANDARDIZE can be used to calculate ____________.
z-scores
_______________ act(s) as a representative of the population.
a sample
To identify patterns across transactions, we can use
association rules
A chart that is recommended as an alternative to a pie chart is a
bar chart
The charts that are helpful in making comparisons between categorical variables are
bar charts and column charts
A better understanding of consumer behavior through analytics directly leads to
better pricing strategies.
The correlation coefficient will always take values
between -1 and +1
Data that are too large or too complex to be handled by standard data-processing techniques and typical desktop software are called _______________________ .
big data
In order to visualize three variables in a two-dimensional graph, we use a
bubble chart
An alternative for a stacked column chart when comparing more than a couple of quantitative variables in each category is a
clustered column chart
Complete linkage can be used to measure the distance between _________ in cluster analysis.
clusters
Average linkage is a measure of calculating dissimilarity between two clusters by
computing the average distance between every pair of observations between two clusters
Single linkage is a measure of calculating dissimilarity between clusters by
considering only the two most similar observations in the two clusters.
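The single, complete, and average linkage measures all start from the distances between every pair of observations taken one from each cluster, and differ only in how they summarize those distances. A minimal Python sketch with two made-up clusters of 2-D points (Python 3.8+ for `math.dist`); the function names are illustrative.

```python
from itertools import product
from math import dist


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every pair with one point from each cluster."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    """Distance between the two most similar (closest) observations."""
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    """Distance between the two most different (farthest) observations."""
    return max(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    """Average distance over all between-cluster pairs."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


# Hypothetical clusters, chosen only for illustration
cluster_a = [(0, 0), (0, 1)]
cluster_b = [(3, 0), (4, 0)]

print(single_linkage(cluster_a, cluster_b))              # 3.0
print(round(complete_linkage(cluster_a, cluster_b), 2))  # 4.12
print(round(average_linkage(cluster_a, cluster_b), 2))   # 3.57
```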
In preparing categorical variables for analysis, it is usually best to
convert the categories to binary, dummy variables.
The data dashboard for a marketing manager may have KPIs related to
current sales measures and sales by region
Corporate-level managers use ______ to summarize sales by region, current inventory levels, and other company-wide metrics all in a single screen.
data dashboards
Optimization models can be used to
decide on how to invest cash received from insurance policies.
A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a
dendrogram
The variance is based on the
deviation about the mean
The process of eliminating variables from formal analysis without losing any crucial information is called
dimension reduction
A cluster's _____________ can be measured by the difference between the distance value at which a cluster is originally formed and the distance value at which it is merged with another cluster in a dendrogram.
durability
A two-dimensional graph representing the data using different shades of color to indicate magnitude is called a
heat map
The __________ the lift ratio, the ____________ the association rule.
higher, stronger
A _______________ is a graphical summary of data previously summarized in a frequency distribution.
histogram
Consider the clustered bar chart of the dashboard developed to monitor the performance of a call center:
identify a particular type of problem by location.
Deleting the grid lines in a table and the horizontal lines in a chart
increases the data-ink ratio
The letter grades of business analysis students are recorded by a professor (4=A, 3=B, 2=C, 1=D). This variable's classification
is categorical data
A disadvantage of stacked-column charts and stacked-bar charts is that
it can be difficult to perceive small differences in areas
The best way to differentiate chart elements is using
labels
The strength of the association rule is known as ____________ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence.
lift
A time series plot is also known as a
line chart
An analysis of items frequently co-occurring in transactions is known as
market basket analysis.
When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the
matching coefficient
Complete linkage can be used to measure the distance between clusters that are the _________________ in cluster analysis.
most different
The endpoint of a k-means clustering algorithm occurs when
no further changes are observed in cluster structure and number.
In k-means clustering, k represents the
number of clusters
The data collected from the customers in restaurants about the quality of food is an example of a(n)
observational study
k-means clustering is the process of
organizing observations into distinct groups based on a measure of similarity.
Any data value with a z-score less than -3 or greater than +3 is considered to be a(n)
outlier
Making visual comparisons between categorical variables is difficult in a
pie chart
Advanced analytics generally refers to
predictive and prescriptive analytics
In the financial sector, ___________________________ are used to construct financial instruments such as derivatives.
predictive models
In the spectrum of business analytics, which is the most complex?
prescriptive
In many cases, white space in a chart can improve
readability
Data-driven decision making tends to decrease a firm's
risk
A _____________ is a graphical presentation of the relationship between two quantitative variables.
scatter chart
A useful chart for displaying multiple variables is the
scatter chart matrix.
We create multiple dashboards
so that each dashboard can be viewed on a single screen.
When working with large spreadsheets with many rows of data, it can be helpful to ____________ the data to better find, view, or manage subsets of data.
sort and filter
A line chart that has no axes but is used to provide information on overall trends for time series data is called a
sparkline
To avoid problems in interpreting the differences in color in a heat map, ____________ can be added.
sparklines
A method for modifying variables that reduces bias prior to cluster analysis is
standardization
A _______________ decision involves higher-level issues and is concerned with the overall direction of the organization, defining the overarching goals and aspirations for the organization's future.
strategic
The decisions concerning an organization's goals and future plans are called
strategic decisions.
If a model's implications depend on the inclusion or exclusion of outliers, one should spend additional time to track down
the cause of the outliers
If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations of a cluster?
the hypotenuse
Tactical decisions define
the steps taken to achieve the goals and objectives.
Tables should be used instead of charts when
the values being displayed have different units or very different magnitudes
If covariance between two variables is near 0, it implies that
the variables are not linearly related
Using multiple lines on a line chart or employing multiple charts is an alternative to a
three-dimensional chart
Data collected from several entities over a period of time (minutes, hours, days, etc.) are called
time series data
Simulation optimization helps
to find good decisions in highly complex and highly uncertain settings.
Utility theory is the study of the __________________ or relative desirability of a particular outcome that reflects the decision maker's attitude toward a collection of factors, such as profit, loss, and risk.
total worth
A _____________ is a line that provides an approximation of the relationship between the variables.
trendline
Veracity has to do with how much __________________ is in the data.
uncertainty
The goal of ___________________ is to use the variable values to identify relationships between observations.
unsupervised learning
The goal regarding using an appropriate number of bins is to show the
variation in the data
____________________ analytics is the analysis of online activity, such as visits to websites or social media.
web
A _____________________ determines how far a particular value is from the mean relative to the data set's standard deviation.
z-score