QBA Ch. 6 - 10, Quantitative Business Analysis
__________ is a category of data mining techniques in which an algorithm learns how to classify or estimate an outcome variable of interest.
Supervised learning
The ___________ button in the Formula Auditing group allows the user to inspect each formula in detail in its cell location.
Show Formulas
__________ are used in the pharmaceutical industry to assess the risk of introducing a new drug.
Simulations
Susan would like to create a graph to display the number of males and females in her class who got an A, B, C, D, and F on the last test. Which of the following graphs could she use?
Stacked-column chart
A __________ decision involves higher-level issues and is concerned with the overall direction of the organization, defining the overarching goals and aspirations for the organization's future.
Strategic
Spreadsheet models are referred to as what-if models because they
allow easy instantaneous recalculation for a change in model inputs.
The influence in an influence diagram is visually depicted by
an arrow
For a population with an unknown distribution, the form of the sampling distribution of the sample mean is
approximately normal for large sample sizes.
Separate error rates with respect to the false negative and false positive cases are computed to take into account the
asymmetric costs in misclassification.
Advanced analytics generally refers to
predictive and prescriptive analytics.
In the financial sector, __________ are used to construct financial instruments such as derivatives.
predictive models
A __________ describes the range and relative likelihood of all possible values for a random variable.
probability distribution for a random variable
The act of collecting data that are representative of the population data is called
random sampling.
The simplest measure of variability is the
range
The y-axis of a decile chart shows
ratio of decile mean to overall mean.
Causal models
relate a time series to other variables that are believed to explain or cause its behavior.
If a time series plot exhibits a horizontal pattern, then
there is still not enough evidence to conclude that the time series is stationary.
Using multiple lines on a line chart or employing multiple charts is an alternative to a
three-dimensional chart.
All of the following are examples of discrete random variables except
time
A set of observations on a variable measured at successive points in time or over successive periods of time constitutes a
time series.
A light bulb manufacturer uses descriptive analytics
to present supply chain to managers visually.
Utility theory is the study of the __________ or relative desirability of a particular outcome that reflects the decision maker's attitude toward a collection of factors, such as profit, loss, and risk.
total worth
Data used to build a data mining model is called
training data.
A __________ is useful for visualizing hierarchical data along multiple dimensions.
tree map
The impact of two inputs on the output of interest is summarized by a
two-way data table.
A parameter is a numerical measure from a population, such as
μ
The random numbers generated using Excel's RAND function follow a __________ probability distribution between 0 and 1.
uniform
The event containing the outcomes belonging to A or B or both is the __________ of A and B.
union
The moving averages method refers to a forecasting method that
uses the average of the most recent data values in the time series as the forecast for the next period.
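A minimal Python sketch of the moving averages forecast described above. The function name `moving_average_forecast`, the window size k = 3, and the sample series are illustrative assumptions, not values from the source.

```python
def moving_average_forecast(series, k):
    """Forecast the next period as the mean of the k most recent values."""
    if k <= 0 or k > len(series):
        raise ValueError("k must be between 1 and len(series)")
    recent = series[-k:]
    return sum(recent) / k

# Hypothetical weekly sales; the forecast for week 6 averages weeks 3 to 5.
sales = [17, 21, 19, 23, 18]
print(moving_average_forecast(sales, 3))  # 20.0
```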
__________ assigns values to outcomes based on the decision maker's attitude toward risk, loss, and other factors.
utility theory
The difference in a variable measured over observations (time, customers, items, etc.) is known as
variation
The goal regarding using an appropriate number of bins is to show the
variation in the data.
Scores on Ms. Nash's test have a mean of 64 and a standard deviation of 9. Steve has a score of 52. Convert Steve's score to a z-score. (Round to two decimal places if necessary.)
-1.33
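The z-score conversion can be checked with a short Python sketch; the helper name `z_score` is an assumption introduced for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Steve's score of 52 on a test with mean 64 and standard deviation 9.
print(round(z_score(52, 64, 9), 2))  # -1.33
```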
The distribution of hourly sales for a local family owned store is normally distributed with a mean of $225 per hour and a standard deviation of $75 per hour. Which of the following intervals contains the middle 95% of hourly sales?
$75 to $375
In a random sample of 400 registered voters, 120 indicated they plan to vote for Trump for President. Determine a 95% confidence interval for the proportion of all the registered voters who will vote for Trump.
(0.255, 0.345)
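As a rough check of this interval, the sketch below uses the large-sample normal approximation, p̂ ± z·sqrt(p̂(1 − p̂)/n). The function name `proportion_ci` is an assumption; `NormalDist` comes from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(120, 400)
print(round(low, 3), round(high, 3))  # 0.255 0.345
```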
One minus the overall error rate is often referred to as the __________ of the model.
accuracy
Scores on Ms. Bond's test have a mean of 70 and a standard deviation of 11. Michelle has a score of 48. Convert Michelle's score to a z-score. (Round to two decimal places if necessary.)
-2
The CEO of a company wants to estimate the percent of employees that use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged onto Facebook that day. What is the estimate of the standard error of the proportion?
0.039
A large manufacturing plant has analyzed the amount of time required to produce an electrical part and determined that the times follow a normal distribution with mean time μ = 45 hours. The production manager has developed a new procedure for producing the part. He believes that the new procedure will decrease the population mean amount of time required to produce the part. After training a group of production line workers, a random sample of 25 parts will be selected and the average amount of time required to produce the parts will be determined. If the switch is made to the new procedure, the cost to implement the new procedure will be more than offset by the savings in manpower required to produce the parts. Use the hypotheses: H0: μ ≥ 45 hours and Ha: μ < 45 hours. Determine the p-value of the test statistic if the sample mean amount of time is x̄ = 43.118 hours with the sample standard deviation s = 5.5 hours.
0.04999
The CEO of a company wants to estimate the percent of employees that use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged onto Facebook that day. What is the point estimate of the proportion of the population that logged onto Facebook that day?
0.35
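The point estimate and its standard error for the Facebook sample can be reproduced with a few lines of Python. This is a sketch using the usual formula sqrt(p̂(1 − p̂)/n); the variable names are illustrative.

```python
from math import sqrt

n, hits = 150, 53           # sample size and employees who logged on
p_hat = hits / n            # point estimate of the population proportion
std_error = sqrt(p_hat * (1 - p_hat) / n)

print(round(p_hat, 2), round(std_error, 3))  # 0.35 0.039
```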
Fast food restaurants pride themselves in being able to fill orders quickly. A study was done at a local fast food restaurant to determine how long it took customers to receive their order at the drive thru. It was discovered that the time it takes for orders to be filled is exponentially distributed with a mean of 1.5 minutes. What is the probability density function for the time it takes to fill an order?
f(x) = (1/1.5)e^(-x/1.5) for x ≥ 0
Fast food restaurants pride themselves in being able to fill orders quickly. A study was done at a local fast food restaurant to determine how long it took customers to receive their order at the drive-thru. It was discovered that the time it takes for orders to be filled is exponentially distributed with a mean of 1.5 minutes. What is the probability that it takes less than one minute to fill an order?
0.4866
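For an exponential distribution with mean μ, the density is f(x) = (1/μ)e^(-x/μ) and P(X ≤ x) = 1 − e^(-x/μ). The sketch below evaluates the drive-thru probability; the function names are assumptions.

```python
from math import exp

def exponential_pdf(x, mean):
    """Density of an exponential distribution with the given mean."""
    return exp(-x / mean) / mean if x >= 0 else 0.0

def exponential_cdf(x, mean):
    """P(X <= x) for an exponential distribution with the given mean."""
    return 1 - exp(-x / mean) if x >= 0 else 0.0

# Probability that an order is filled in less than one minute (mean 1.5 minutes).
print(round(exponential_cdf(1, 1.5), 4))  # 0.4866
```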
The newest model of smart car is supposed to get excellent gas mileage. A thorough study showed that gas mileage (measured in miles per gallon) is normally distributed with a mean of 75 miles per gallon and a standard deviation of 10 miles per gallon. What is the probability that, if driven normally, the car will get 100 miles per gallon or better?
0.6%
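The upper-tail probability can be verified with `NormalDist` from Python's standard library; this is a quick check, not part of the original card.

```python
from statistics import NormalDist

mpg = NormalDist(mu=75, sigma=10)   # gas mileage distribution from the question
p_at_least_100 = 1 - mpg.cdf(100)   # z = (100 - 75) / 10 = 2.5

print(f"{p_at_least_100:.1%}")  # 0.6%
```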
What is the total area under the normal distribution curve?
1
What is the total relative frequency? 20XX Contest Sales (Salesman, Frequency, Relative Frequency): Frances Clonts, 15, 0.05; Sarah Leigh, 184, 0.62; Devon Pride, 37, (blank); John Townes, 62, 0.21; Total, 298.
1
A survey of 100 random high school students finds that 85 students watched the Super Bowl, 25 students watched the Stanley Cup Finals, and 20 students watched both games. How many students did not watch either game?
10
How many Class 1's are incorrectly classified as Class 0? Confusion matrix (rows: actual class; columns: predicted class): Actual 1: Predicted 1 = 221, Predicted 0 = 100; Actual 0: Predicted 1 = 30, Predicted 0 = 3,000.
100
Suppose that the confidence of an association rule is 0.75 and the total number of transactions is 250. How many of those transactions support the consequent if the lift ratio is 1.875?
100
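The lift ratio equals the rule's confidence divided by the support of the consequent (as a fraction of all transactions), so the count can be recovered by rearranging. A small sketch, with `consequent_count` as an illustrative name:

```python
def consequent_count(confidence, lift, total_transactions):
    """Transactions containing the consequent, from lift = confidence / support."""
    consequent_support = confidence / lift      # fraction of all transactions
    return consequent_support * total_transactions

print(round(consequent_count(0.75, 1.875, 250)))  # 100
```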
What would be the value of the sum of squares due to regression (SSR) if the total sum of squares (SST) is 25.32 and the sum of squares due to error (SSE) is 6.89?
18.43
Demand for a product and the forecasting department's forecast (naïve model) for a product are shown below. Compute the mean absolute error. (Period, Actual Demand, Forecasted Demand): 1, 12, --; 2, 15, 12; 3, 14, 15; 4, 18, 16.
2
If the forecasted value of the time series variable for period 2 is 22.5 and the actual value observed for period 2 is 25, what is the forecast error in period 2?
2.5
The College Board originally scaled SAT scores so that the scores for each section were approximately normally distributed with a mean of 500 and a standard deviation of 100. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored greater than 700.
2.5%
The College Board reported that the mean Math Level 2 SAT subject test score was 686 with a standard deviation of 96. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored less than 494.
2.5%
The Watch Window is observable
across different worksheets of a workbook.
A statistics teacher started class one day by drawing the names of 10 students out of a hat and asked them to do as many pushups as they could. The 10 randomly selected students averaged 15 pushups per person with a standard deviation of 9 pushups. Suppose the distribution of the population of number of pushups that can be done is approximately normal. What is the standard error of the mean?
2.846
What is the mean of x, given the exponential probability function
20
Compute the third quartile for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22
21.25
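The third quartile above follows the (n + 1)p position rule: 0.75 × 11 = 8.25, so Q3 lies a quarter of the way between the 8th and 9th sorted values. Python's `statistics.quantiles` with `method="exclusive"` uses the same rule, as this sketch shows.

```python
from statistics import quantiles

data = [10, 15, 17, 21, 25, 12, 16, 11, 13, 22]

# quantiles() sorts the data and returns the three quartile cut points.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
print(q3)  # 21.25
```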
How many Class 1's are correctly classified as Class 1 in the table below? Confusion matrix (rows: actual class; columns: predicted class): Actual 1: Predicted 1 = 221, Predicted 0 = 100; Actual 0: Predicted 1 = 30, Predicted 0 = 3,000.
221
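The same confusion matrix yields the overall error rate, accuracy, and the separate class error rates mentioned elsewhere in this set. The sketch below assumes the matrix reads as: actual Class 1 row = (221, 100), actual Class 0 row = (30, 3,000); the variable names are illustrative.

```python
# Counts as read from the confusion matrix (an assumption about its layout).
tp, fn = 221, 100    # actual Class 1: predicted 1, predicted 0
fp, tn = 30, 3000    # actual Class 0: predicted 1, predicted 0

total = tp + fn + fp + tn
overall_error = (fn + fp) / total
accuracy = 1 - overall_error        # one minus the overall error rate
class1_error = fn / (tp + fn)       # false negative rate
class0_error = fp / (fp + tn)       # false positive rate

print(round(accuracy, 4), round(class1_error, 4), round(class0_error, 4))
```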
Never use a __________ chart when a __________ chart will suffice.
3-D; 2-D
The t value for a 99% confidence interval estimation based upon a sample of size 10 is
3.250
The number of minutes that Samantha waits to catch the bus is uniformly distributed between 0 and 15 minutes. What is the probability that Samantha has to wait less than 4.5 minutes to catch the bus?
30%
Suppose for a particular week, the forecasted sales were $4,000. The actual sales were $3,000. What is the value of the mean absolute percentage error?
33.3%
Demand for a product and the forecasting department's forecast (naïve model) for a product are shown below. Compute the mean squared error. (Period, Actual Demand, Forecasted Demand): 1, 12, --; 2, 15, 12; 3, 14, 15; 4, 18, 16.
4.67
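The three forecast accuracy measures used in this set (MAE, MSE, MAPE) can be sketched in Python as follows. Forecast error is taken as actual minus forecast; the function names are assumptions, and period 1 is skipped because it has no forecast.

```python
def forecast_errors(actual, forecast):
    """Forecast error for each period: actual value minus forecast."""
    return [a - f for a, f in zip(actual, forecast)]

def mae(actual, forecast):
    errors = forecast_errors(actual, forecast)
    return sum(abs(e) for e in errors) / len(errors)

def mse(actual, forecast):
    errors = forecast_errors(actual, forecast)
    return sum(e ** 2 for e in errors) / len(errors)

def mape(actual, forecast):
    pct = [abs(a - f) / a * 100 for a, f in zip(actual, forecast)]
    return sum(pct) / len(pct)

# Periods 2 to 4 from the demand table above.
actual = [15, 14, 18]
forecast = [12, 15, 16]
print(mae(actual, forecast))             # 2.0
print(round(mse(actual, forecast), 2))   # 4.67
# Single-week example: forecast $4,000, actual $3,000.
print(round(mape([3000], [4000]), 1))    # 33.3
```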
Compute the mean of the following data. 56, 42, 37, 29, 45, 51, 30, 25, 34, 57
40.6
Use technology to compute the standard deviation for the following sample data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37
6.75
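"Use technology" could mean Excel or any statistics package; as one option, Python's standard `statistics` module reproduces the mean and the sample standard deviation (n − 1 in the denominator).

```python
from statistics import mean, stdev

scores = [56, 42, 37, 29, 45, 51, 30, 25, 34, 57]
sample = [32, 41, 36, 24, 29, 30, 40, 22, 25, 37]

print(mean(scores))              # 40.6
print(round(stdev(sample), 2))   # 6.75
```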
For data having a bell-shaped distribution, approximately __________ percent of the data values will be within one standard deviation of the mean.
68
The random variable X is known to be uniformly distributed between 2 and 12. Compute E(X), the expected value of the distribution.
7
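For a uniform distribution on [a, b], P(X ≤ x) = (x − a)/(b − a) and E(X) = (a + b)/2. A minimal sketch with illustrative function names:

```python
def uniform_cdf(x, a, b):
    """P(X <= x) for X uniformly distributed on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def uniform_mean(a, b):
    """Expected value of a uniform distribution on [a, b]."""
    return (a + b) / 2

print(uniform_cdf(4.5, 0, 15))   # 0.3  (Samantha's bus wait)
print(uniform_mean(2, 12))       # 7.0
```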
Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance.
75.39
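The distance is sqrt((53 − 25)² + (420 − 350)²) = sqrt(5684). In Python (3.8 or later) this can be checked with `math.dist`:

```python
from math import dist

u = (25, 350)   # age, dollars spent
v = (53, 420)

print(round(dist(u, v), 2))  # 75.39
```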
A statistics teacher started class one day by drawing the names of 10 students out of a hat and asked them to do as many pushups as they could. The 10 randomly selected students averaged 15 pushups per person with a standard deviation of 9 pushups. Suppose the distribution of the population of number of pushups that can be done is approximately normal. The 95% confidence interval for the true mean number of pushups that can be done is
8.56 to 21.44.
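The pushup interval is x̄ ± t·s/√n with 9 degrees of freedom. Python's standard library has no t distribution, so the sketch below hard-codes the table value t = 2.262 for 95% confidence; that constant and the variable names are assumptions of this example.

```python
from math import sqrt

n, x_bar, s = 10, 15, 9
t_crit = 2.262                 # t table value: 95% confidence, 9 degrees of freedom

std_error = s / sqrt(n)        # standard error of the mean
margin = t_crit * std_error

print(round(std_error, 3))                                  # 2.846
print(round(x_bar - margin, 2), round(x_bar + margin, 2))   # 8.56 21.44
```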
A health conscious student faithfully wears a device that tracks his steps. Suppose that the distribution of the number of steps he takes in a day is normally distributed with a mean of 10,000 and a standard deviation of 1,500 steps. One day he took 15,000 steps. What was his percentile on that day?
99.96%
In a survey of patients in a local hospital, 62.42% of the respondents indicated that the health care providers needed to spend more time with each patient. Who makes up the population?
All patients in a local hospital
The population parameters that describe the y-intercept and slope of the line relating y and x, respectively, are
β0 and β1.
A better understanding of consumer behavior through analytics directly leads to
Better pricing strategies
Within a given range of cells, the number of times a particular condition is satisfied is computed by using the __________ function.
COUNTIF
Which is not true regarding trend patterns?
Can result when business conditions shift to a new level at some point in time
Which of the following best exemplifies big data?
Cellphone owners around the world generate vast amounts of data by calling, texting, tweeting, and browsing the Web on a daily basis.
__________ are visual methods of displaying data.
Charts
Natalie needs to compare the number of employees by job title for the last five years. Which of the following charts should Natalie use?
Clustered-column (bar) chart
__________ compares the number of actual Class 1 observations identified if considered in decreasing order of their estimated probability of being in Class 1 with the number identified if randomly classified.
Cumulative lift
A retail store owner offers a discount on product A and predicts that the customers would purchase products B and C in addition to product A. Identify the technique used to make such a prediction.
Data Mining
An Excel __________ quantifies the impact of changing the value of a specific input on an output of interest.
Data Table
Corporate-level managers use ______ to summarize sales by region, current inventory levels, and other company-wide metrics all in a single screen.
Data dashboards
__________ involves descriptive statistics, data visualization, and clustering.
Data exploration
__________ is dividing the sample data into three sets for training, validation, and testing of the data mining algorithm performance.
Data partitioning
__________ is the manipulation of the data with the goal of putting it in a form suitable for formal modeling.
Data preparation
The extraction of information on the number of shipments, how much was included in each shipment, the date each shipment was sent, and so on from the manufacturing plant's database exemplifies
Data queries
A large manufacturing plant has analyzed the amount of time required to produce an electrical part and determined that the times follow a normal distribution with mean time μ = 45 hours. The production manager has developed a new procedure for producing the part. He believes that the new procedure will decrease the population mean amount of time required to produce the part. After training a group of production line workers, a random sample of 25 parts will be selected and the average amount of time required to produce them will be determined. If the switch is made to the new procedure, the cost to implement the new procedure will be more than offset by the savings in manpower required to produce the parts. Use the hypotheses: H0: μ ≥ 45 hours and Ha: μ < 45 hours. If the sample mean amount of time is x̄ = 43.118 hours with the sample standard deviation s = 5.5 hours, give the appropriate conclusion, for α = 0.025.
Do not reject H0, do not switch to the new procedure
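The conclusion can be reproduced by comparing the test statistic t = (x̄ − μ0)/(s/√n) with the lower-tail critical value. The sketch hard-codes the t table value −2.064 for α = 0.025 and 24 degrees of freedom, since the standard library has no t distribution; that constant is an assumption of this example.

```python
from math import sqrt

mu0, x_bar, s, n = 45, 43.118, 5.5, 25
t_stat = (x_bar - mu0) / (s / sqrt(n))

t_crit = -2.064                # lower-tail table value: alpha = 0.025, 24 df
reject_h0 = t_stat <= t_crit   # lower-tail test: reject only if t is small enough

print(round(t_stat, 3), reject_h0)  # -1.711 False
```

Because the statistic does not fall in the rejection region, H0 is not rejected, which agrees with the p-value of about 0.05 exceeding α = 0.025.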
The calculations of a cell can be investigated in great detail by using the __________ button.
Evaluate Formula
To generate a scatter chart matrix, we use
Excel Add-In XLMiner.
The software package most commonly used for creating simple charts is
Excel.
__________ uses a weighted average of past time series values as the forecast.
Exponential smoothing
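A minimal sketch of exponential smoothing, assuming the common textbook recursion F(t+1) = α·Y(t) + (1 − α)·F(t) with the period-2 forecast set to the first actual value. The function name, the series, and α = 0.2 are illustrative assumptions.

```python
def exponential_smoothing(series, alpha):
    """Return forecasts for periods 2 through n+1.

    The first forecast equals the first actual value; each later forecast
    is a weighted average of the latest actual value and the prior forecast.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    forecasts = [float(series[0])]
    for y in series[1:]:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts

# Hypothetical series of three observations.
print([round(f, 2) for f in exponential_smoothing([17, 21, 19], 0.2)])
# [17.0, 17.8, 18.04]
```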
Excel searches for an exact match of the first argument in the first column of the data when the range in the VLOOKUP function is
FALSE
Which of the following is not an approach to making decisions?
Guess and check
A student wants to determine if pennies are really fair when flipped, meaning equally likely to land heads up or tails up. He flips a random sample of 50 pennies and finds that 28 of them land heads up. If p denotes the true probability of a penny landing heads up when flipped, what are the appropriate null and alternative hypotheses?
H0: p = 0.5, Ha: p ≠ 0.5
The owners of a fast food restaurant have automatic drink dispensers to help fill orders more quickly. When the 12 ounce button is pressed, they would like for exactly 12 ounces of beverage to be dispensed. There is, however, some variation in this amount. The company does not want the machine to systematically over fill or under fill the cups. Which of the following gives the correct set of hypotheses?
H0: μ = 12, Ha: μ ≠ 12
The __________ function is used for the conditional computation of expressions in Excel.
IF
__________ is a measure of the heterogeneity of observations in a classification tree.
Impurity
__________ refers to the scenario in which the relationship between the dependent variable and one independent variable is different at different values of a second independent variable.
Interaction
__________ refers to the technology that allows data, collected from sensors in all types of machines, to be sent over the Internet to repositories where it can be stored and analyzed.
Internet of Things (IoT)
Which of the following is true of the exponential smoothing coefficient?
It is chosen as the value that minimizes a selected measure of forecast accuracy such as the mean squared error.
Which of the following is true of Euclidean distances?
It is commonly used as a method of measuring dissimilarity between quantitative observations.
The ratio of the amount of ink used in a table or chart that is necessary to convey information to the total amount of ink used in the table and chart is known as data-ink ratio. Using additional ink that is not necessary to convey information has what effect on the data-ink ratio?
It reduces the data-ink ratio.
Which one of the following is used in predictive analytics?
Linear regression
A time series plot of a period of time (in years) versus sales (in thousands of dollars) is shown below. Which of the following data patterns best describes the scenario shown?
Linear trend pattern
_________ attempts to classify a categorical outcome as a linear function of explanatory variables.
Logistic regression
__________ is a generalization of linear regression for predicting a categorical outcome variable.
Logistic regression
Which Excel command will return all modes when more than one mode exists?
MODE.MULT
Which of the following measures of forecast accuracy is susceptible to the problem of positive and negative forecast errors offsetting one another?
Mean forecast error
Which of the following sources of big data is not publicly available?
Medical records
__________ refers to the degree of correlation among independent variables in a regression model.
Multicollinearity
In a normal distribution, which is greater, the mean or the median?
Neither; the mean and the median are equal.
A time series plot of a period of time (in years) versus revenue (in millions of dollars) is shown below. Which of the following data patterns best describes the scenario shown?
Nonlinear trend pattern
Which of the following are necessary to be determined to define the classes for a frequency distribution with quantitative data?
Number of nonoverlapping bins, width of each bin, and bin limits
Which of the following is not present in a time series?
Operational variations
__________ refers to the scenario in which the analyst builds a model that does a great job of explaining the sample of data on which it is based but fails to accurately predict outside the sample data.
Overfitting
Two Events are Independent If...
P(B) = P(B|A) or P(B) = P(B|Aᶜ). Meaning: knowing that event A has occurred (or not occurred) doesn't change the probability that event B occurs.
What do nodes in an influence diagram represent?
Parts of the model
__________ analytics use techniques that take input data and yield a best course of action.
Prescriptive
The __________ probability distribution can be used to estimate the number of vehicles that go through an intersection during the lunch hour.
Poisson
_________ analytics are techniques that use models, constructed from past data, to predict the future or to ascertain the impact of one variable on another.
Predictive
A forecast that helps direct police officers to areas where crimes are likely to occur based on past data is an example of
Predictive analytics
Which of the following analytical techniques helps us arrive at the best decision?
Prescriptive analytics
Which of the following regression models is used to model a nonlinear relationship between the independent and dependent variables by including the independent variable and the square of the independent variable in the model?
Quadratic regression model
What are the two decisions that you can make from performing a hypothesis test?
Reject the null hypothesis; Fail to reject the null hypothesis
The __________ function pairs each element of the first array with its counterpart in the second array, multiplies the elements of the pairs together, and adds the results
SUMPRODUCT
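For readers more familiar with code than spreadsheets, the behavior described above can be mimicked in Python. This is an analogy rather than Excel's implementation; `sumproduct` is a hypothetical helper.

```python
def sumproduct(xs, ys):
    """Pair elements by position, multiply each pair, and add the results."""
    if len(xs) != len(ys):
        raise ValueError("arrays must have the same dimension")
    return sum(x * y for x, y in zip(xs, ys))

# 1*4 + 2*5 + 3*6
print(sumproduct([1, 2, 3], [4, 5, 6]))  # 32
```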
Which of the following graphs cannot be used to display categorical data?
Scatter chart
A time series plot of a period of time (in months) versus sales (in number of units) is shown below. Which of the following data patterns best describes the scenario shown?
Seasonal pattern
The VLOOKUP with range set to __________ takes the first argument and searches the first column of the table for the last row that is less than or equal to the first argument.
TRUE
With reference to the SUMPRODUCT function, which of the following statements is true?
The arrays that appear as arguments must be of the same dimension.
__________ merges maps and statistics to present data collected over different geographies.
The geographic information system
If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations of a cluster?
The hypotenuse
Which of the following is a discrete random variable?
The number of times a student guesses the answers to questions on a certain test
The scatter chart below displays the residuals versus the independent variable, x. Which of the following conclusions can be drawn based upon this scatter chart?
The residual distribution is not normally distributed.
The scatter chart below displays the residuals versus the independent variable, x. Which of the following conclusions can be drawn from the scatter chart given below?
The residuals have an increasing variance as the independent variable increases.
Which of the following is not a characteristic of the normal probability distribution?
The standard deviation must be 1.
Which of the following is true of spreadsheet packages used in business analytics?
They come preloaded on computers.
Using the diagram below, which of the following would be a likely mathematical expression for Total Cost?
Total Cost = Fixed Cost + Total Variable Cost
Which of the following would be a likely mathematical expression for Total Variable Cost?
Total Variable Cost = (Material Cost per Unit + Labor Cost per Unit) × Production Volume
The ___________ button, located in the Formula Auditing group, creates arrows pointing to the selected cell from cells that are part of the formula in that cell.
Trace Precedents
__________ is the data set used to build the candidate models.
Training set
Larger values of α have the disadvantage of increasing the probability of making a
Type I error.
The __________ function allows the user to pull a subset of data from a larger table of data based on some criterion.
VLOOKUP
__________ refers to the data set used to compare model forecasts and ultimately pick a model for predicting values of the dependent variable.
Validation set
__________ can be used to partition observations in a manner to obtain clusters with the least amount of information loss due to the aggregation.
Ward's method
The user can monitor how listed cells change with a change in the model without searching through the worksheet or changing from one worksheet to another by using the __________ functionality.
Watch Window
A sample of 37 AA batteries had a mean lifetime of 584 hours. A 95% confidence interval for the population mean was 579.2 < μ < 588.8. Which statement is the correct interpretation of the results?
We are 95% confident that the mean lifetime of all the batteries in the population is between 579.2 hours and 588.8 hours.
The proportion of dental procedures that are extractions is 0.16. Which of the following exemplifies a Type I error in this situation?
We reject the claim that the proportion of dental procedures that are extractions is 0.16 when the proportion is actually 0.16.
In which of the following scenarios would it be appropriate to use hierarchical clustering?
When binary or ordinal data needs to be clustered
A data visualization tool that updates in real time and gives multiple outputs is called
a data dashboard.
The moving averages and exponential smoothing methods are appropriate for a time series exhibiting
a horizontal pattern.
If the expected value of the sample statistic is equal to the population parameter being estimated, the sample statistic is said to
be an unbiased estimator of the population parameter.
In interval estimation, as the sample size becomes larger, the interval estimate
becomes narrower.
As the number of degrees of freedom for a t distribution increases, the difference between the t distribution and the standard normal distribution
becomes smaller.
Using an α = 0.04, a confidence interval for a population proportion is determined to be 0.65 to 0.75. If the level of significance is decreased, the interval for the population proportion
becomes wider.
In order to visualize three variables in a two-dimensional graph, we use a
bubble chart.
An alternative for a stacked column chart when comparing more than a couple of quantitative variables in each category is a
clustered column chart.
A PivotChart, in few instances, is the same as a
clustered-column chart.
A graphical presentation that uses vertical bars to display the magnitude of quantitative data is known as a
column chart.
__________ is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters.
complete linkage
The __________ is an indication of how frequently interval estimates based on samples of the same size taken from the same population using identical sampling techniques will contain the true value of the parameter we are estimating.
confidence level
Single linkage is a measure of calculating dissimilarity between clusters by
considering only the two most similar observations in the two clusters.
In preparing categorical variables for analysis, it is usually best to
convert the categories to binary, dummy variables.
A collection of text documents to be analyzed is called a ___________.
corpus
The __________ shows the number of data items with values less than or equal to the upper class limit of each class.
cumulative frequency distribution
The data dashboard for a marketing manager may have KPIs related to
current sales measures and sales by region.
Applying descriptive statistics and data visualization to the training set to understand the data and assist in the selection of an appropriate technique is a part of
data exploration.
When a decision maker is faced with several alternatives and an uncertain set of future events, s/he uses __________ to develop an optimal strategy.
decision analysis
A(n) ____________ refers to a model input that can be controlled in a spreadsheet model.
decision variable
In a linear regression model, the variable that is being predicted or explained is known as _____________. It is denoted by y and is often referred to as the response variable.
dependent variable
In order to manage an organization's human resource activities, such as hiring employees, tracking, and influencing employee retention, HR personnel use
descriptive and predictive analytics.
The mean absolute error, mean squared error, and mean absolute percentage error are all methods to measure the accuracy of a forecast. These methods measure forecast accuracy by
determining how well a particular forecasting method is able to reproduce the time series data that are already available.
The variance is based on the
deviation about the mean
A variable that can only take on specific numeric values is called a
discrete random variable.
Jaccard's coefficient is different from the matching coefficient in that the former
does not count matching zero entries while the latter does.
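The difference between the two similarity measures is easiest to see on a pair of binary observations. The sketch below uses hypothetical vectors and helper names; it assumes 0/1 inputs of equal length with at least one position that is not a shared zero.

```python
def matching_coefficient(u, v):
    """Share of positions where u and v agree (1-1 and 0-0 both count)."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

def jaccard_coefficient(u, v):
    """Share of 1-1 matches among positions that are not 0-0 matches."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    return both_one / (len(u) - both_zero)

# Hypothetical purchase indicators for two customers.
u = [1, 0, 0, 1, 0]
v = [1, 0, 1, 1, 0]
print(matching_coefficient(u, v))            # 0.8
print(round(jaccard_coefficient(u, v), 3))   # 0.667
```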
A cluster's __________ can be measured by the difference between the distance value at which a cluster is originally formed and the distance value at which it is merged with another cluster in a dendrogram.
durability
In the simple linear regression model, the ____________ accounts for the variability in the dependent variable that cannot be explained by the linear relationship between the variables.
error term
Classifying a record as belonging to one class when it belongs to another class is referred to as a(n)
error.
A test set is the data set used to
estimate performance of the final model on unseen data.
Determine a freshman's likely first-year grade point average from the student's Scholastic Aptitude Test (SAT) score, high school grade point average, and number of extra-curricular activities. This is an example of
estimation of a continuous outcome.
The __________ is the range of values of the independent variables in the data used to estimate the regression model.
experimental region
Prediction of the mean value of the dependent variable y for values of the independent variables x1, x2, . . . , xq that are outside the experimental range is called
extrapolation.
An observation classified as part of a group with a characteristic when it actually does not have the characteristic is termed as a(n)
false positive.
Fields may be chosen to represent all of the following except ____________ in the body of a PivotTable.
filters
A summary of data that shows the number of observations in each of several nonoverlapping bins is called a(n)
frequency distribution
The finite population correction factor should be used in the computation of the standard deviation of the sample mean and of the sample proportion when n / N is
greater than 0.05.
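A minimal sketch of the 0.05 rule for the standard error of the sample mean follows. The function name and arguments are hypothetical; the correction factor used is the standard sqrt((N - n) / (N - 1)).

```python
import math


def std_error_mean(sigma, n, N=None):
    """Standard deviation of the sample mean for sample size n.

    sigma: population standard deviation
    N: population size, or None for an infinite population
    """
    se = sigma / math.sqrt(n)
    # Apply the finite population correction only when n / N exceeds 0.05
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se


se_corrected = std_error_mean(10, 100, N=1000)     # n / N = 0.10, corrected
se_uncorrected = std_error_mean(10, 100, N=10000)  # n / N = 0.01, no correction
```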
A two-dimensional graph representing the data using different shades of color to indicate magnitude is called a
heat map.
The conceptual model
helps in organizing the data requirements
A __________ is a graphical summary of data previously summarized in a frequency distribution.
histogram
Bar charts use
horizontal bars to display the magnitude of the quantitative variable.
A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size
n has the same probability of being selected
The process of making a conjecture about the value of a population parameter, collecting sample data that can be used to assess this conjecture, measuring the strength of the evidence against the conjecture that is provided by the sample, and using these results to draw a conclusion about the conjecture is known as
hypothesis testing.
A one-tailed test is a hypothesis test in which the rejection region is
in one tail of the sampling distribution.
Deleting the grid lines in a table and the horizontal lines in a chart
increases the data-ink ratio.
In a linear regression model, the variable (or variables) used for predicting or explaining values of the response variable are known as the __________. It(they) is(are) denoted by x.
independent variable
Forecast error
is associated with measuring forecast accuracy.
takes a positive value when the forecast is too low (forecast error = actual value minus forecast).
The letter grades (A, B, C, D, F) of business analysis students are recorded by a professor. This variable's classification
is categorical data.
Data-ink is the ink used in a table or chart that
is necessary to convey the meaning of the data to the audience
The coefficient of determination
is used to evaluate the goodness of fit.
A disadvantage of stacked-column charts and stacked-bar charts is that
it can be difficult to perceive small differences in areas.
In a business, the values indicating the business's current operating characteristics, such as its financial position, the inventory on hand, and customer service metrics, are typically known as
key performance indicators.
The prespecified value of the independent variable at which its relationship with the dependent variable changes in a piecewise linear regression model is referred to as the
knot.
The best way to differentiate chart elements is using
labels
The following image is a
line chart
A time series plot is also known as a
line chart.
When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the
matching coefficient.
If a z-score is zero, then the corresponding x-value must be equal to the
mean
You are __________ to commit a Type I error using the 0.05 level of significance than using the 0.01 level of significance.
more likely
Complete linkage can be used to measure the distance between clusters that are the __________ in cluster analysis.
most different
The process of __________ might be used to determine the value of the smoothing constant that minimizes the mean squared error.
nonlinear optimization
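To make the idea concrete, the sketch below picks a smoothing constant for simple exponential smoothing by minimizing the mean squared error. It swaps in a plain grid search over candidate values in place of a true nonlinear optimizer, and it assumes the first forecast equals the first observation; the data and names are hypothetical.

```python
def exp_smoothing_mse(series, alpha):
    """MSE of one-step-ahead exponential smoothing forecasts."""
    forecast = series[0]          # assumption: forecast for period 2 is y_1
    squared_errors = []
    for y in series[1:]:
        squared_errors.append((y - forecast) ** 2)
        # Next forecast: weighted average of latest value and latest forecast
        forecast = alpha * y + (1 - alpha) * forecast
    return sum(squared_errors) / len(squared_errors)


series = [17, 21, 19, 23, 18, 16, 20, 18, 22, 20, 15, 22]  # hypothetical data
candidates = [i / 100 for i in range(1, 100)]              # 0.01 ... 0.99
best_alpha = min(candidates, key=lambda a: exp_smoothing_mse(series, a))
```

A solver-based approach would search the continuous interval between 0 and 1 instead of a fixed grid, but the objective being minimized is the same.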
In k-means clustering, k represents the
number of clusters.
In the moving averages method, the order k determines the
number of time series values under consideration.
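As a small illustration of the role of the order k, the sketch below forecasts the next period as the average of the k most recent values. The function name and the data are hypothetical.

```python
def moving_average_forecast(series, k):
    """Forecast the next period as the mean of the k most recent values."""
    if k < 1 or k > len(series):
        raise ValueError("k must be between 1 and the length of the series")
    return sum(series[-k:]) / k


sales = [17, 21, 19, 23, 18]                 # hypothetical time series
forecast_k3 = moving_average_forecast(sales, 3)   # uses 19, 23, 18
forecast_k5 = moving_average_forecast(sales, 5)   # uses all five values
```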
The set of recorded values of variables associated with a single entity is a(n)
observation.
The data collected from the customers in restaurants about the quality of food is an example of a(n)
observational study.
Euclidean distance can be used to measure the distance between __________ in cluster analysis.
observations
Autoregressive models
occur whenever all the independent variables are previous values of the time series.
The percent of misclassified records out of the total records in the validation data is known as the
overall error rate.
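The overall error rate, together with the separate false positive and false negative counts it combines, can be sketched as follows. The function name and the labels are hypothetical; class 1 is assumed to be the class with the characteristic of interest, and the rate is returned as a fraction (multiply by 100 for a percent).

```python
def error_summary(actual, predicted):
    """Return (false positives, false negatives, overall error rate)."""
    pairs = list(zip(actual, predicted))
    false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
    false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)
    overall = (false_pos + false_neg) / len(pairs)
    return false_pos, false_neg, overall


actual = [1, 1, 0, 0, 0]       # hypothetical true classes in validation data
predicted = [1, 0, 1, 0, 0]    # hypothetical classifier output
false_pos, false_neg, overall = error_summary(actual, predicted)
```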
Two approaches to drawing a conclusion in a hypothesis test are
p-value and critical value.
A __________ is used for examining data with more than two variables, and it includes a different vertical axis for each variable.
parallel-coordinates plot
With reference to a spreadsheet model, an uncontrollable model input is known as a(n)
parameter.
The population parameter value and the point estimate differ because a sample is not a census of the entire population, but it is being used to develop the
point estimate.
The purpose of statistical inference is to make estimates or draw conclusions about a
population based upon information obtained from the sample.
A random sample selected from an infinite population is a sample selected such that each element selected comes from the same __________ and each element is selected __________.
population; independently
The scatter chart below displays the residuals versus the dependent variable, t. Which of the following conclusions can be drawn based upon this scatter chart?
residuals are not independent.
Data-driven decision making tends to decrease a firm's
risk
The __________ is a point estimate of the population mean for the variable of interest.
sample mean
The value of the ___________ is used to estimate the value of the population parameter.
sample statistic
A __________ is used to visualize sample data graphically and to draw preliminary conclusions about the possible relationship between the variables.
scatter chart
A _____________ is a graphical presentation of the relationship between two quantitative variables.
scatter chart
A useful chart for displaying multiple variables is the
scatter chart matrix.
A time series that shows a recurring pattern over one year or less is said to follow a
seasonal pattern.
With reference to time series data patterns, a cyclical pattern is the component of the time series that
shows a periodic pattern lasting more than one year.
A line chart that has no axes but is used to provide information on overall trends for time series data is called a
sparkline
To avoid problems in interpreting the differences in color in a heat map, ____________ can be added.
sparklines
A method for modifying variables that reduces bias prior to cluster analysis is
standardization.
Picks and Axes Inc. is an Internet-based retail seller of hiking boots and mountaineering gear. The company decides to open retail stores across the major areas of the city to help complement its Internet-based strategy. This activity would be categorized as a(n)
strategic decision
The __________ is a measure of the error that results from using the estimated regression equation to predict the values of the dependent variable in the sample.
sum of squares due to error (SSE)
Data mining methods for classifying or estimating an outcome based on a set of input variables are referred to as
supervised learning.
Which of the following relationships would have a negative correlation coefficient?
supply & demand
A __________ refers to the number of times a collection of items occurs together in a transaction data set.
support count
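A minimal sketch of counting support in a transaction data set is shown below. The function name and the transactions are made up for illustration.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


transactions = [                      # hypothetical shopping baskets
    {"bread", "milk"},
    {"bread", "jelly", "milk"},
    {"jelly"},
]
count = support_count(transactions, {"bread", "milk"})
```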
A __________ decision is concerned with how the organization should achieve the goals and objectives set by its strategy.
tactical
The basis for using a normal probability distribution to approximate the sampling distribution of the sample mean and the sample proportion is
the central limit theorem.
In the k-nearest neighbors method, when the value of k is set to 1
the classification or prediction of a new observation is based solely on the single most similar observation from the training set.
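The k = 1 case can be sketched in a few lines. Euclidean distance is assumed as the similarity measure, and the function name and training records are hypothetical.

```python
import math


def nearest_neighbor_class(training, new_point):
    """Classify new_point by the label of its single closest training record.

    training: list of (features, label) pairs
    """
    # math.dist gives the Euclidean distance between two points
    _, label = min(training, key=lambda rec: math.dist(rec[0], new_point))
    return label


training = [((1, 1), "A"), ((5, 5), "B")]   # hypothetical training set
label_near_a = nearest_neighbor_class(training, (2, 1))
label_near_b = nearest_neighbor_class(training, (4, 6))
```

With larger k the label would instead come from a vote (or average) over the k closest records.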
Sample space is
the collection of all possible outcomes.
All the events in the sample space that are not part of the specified event are called
the complement of the event.
The condition that VLOOKUP assumes is that
the first column of the table is sorted in ascending order.
A procedure for using sample data to find the estimated regression equation is
the least squares method.
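For simple linear regression, the least squares estimates have a closed form, sketched below along with the resulting sum of squares due to error (SSE). The function name and the data are hypothetical.

```python
def least_squares(x, y):
    """Return (b0, b1, sse) for the estimated equation y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar            # intercept
    # SSE: squared differences between observed and predicted y values
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse


x = [1, 2, 3, 4]     # hypothetical independent variable values
y = [3, 5, 7, 9]     # hypothetical dependent variable values (exactly linear)
b0, b1, sse = least_squares(x, y)
```

These are the values of b0 and b1 that make SSE as small as possible for the sample.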
An exponential trend pattern occurs when
the percentage change between periods in the value of the variable is relatively constant.
The arguments supplied to the IF function, in order, are the condition for execution,
the result if condition is true, and the result if condition is false.
Tables should be used instead of charts when
the values being displayed have different units or very different magnitudes.
If covariance between two variables is near 0, it implies that
the variables are not linearly related.
In the graph of the simple linear regression equation, the parameter β0 represents the ___________ of the true regression line.
y-intercept
A __________ determines how far a particular value is from the mean relative to the data set's standard deviation.
z-score
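A short sketch of the z-score calculation follows. It assumes the sample standard deviation is used; the function name and the data are hypothetical.

```python
import statistics


def z_score(x, data):
    """Number of standard deviations that x lies from the mean of data."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)      # sample standard deviation (assumption)
    return (x - mean) / s


data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical data set with mean 5
z_at_mean = z_score(5, data)        # a value equal to the mean
z_high = z_score(9, data)           # a value above the mean
```

A value equal to the mean has a z-score of zero, and values above the mean have positive z-scores.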