QBA Ch. 6 - 10, Quantitative Business Analysis


__________ is a category of data mining techniques in which an algorithm learns how to classify or estimate an outcome variable of interest.

Supervised learning

The ___________ button in the Formula Auditing group allows the user to inspect each formula in detail in its cell location.

Show Formulas

__________ are used in the pharmaceutical industry to assess the risk of introducing a new drug.

Simulations

Susan would like to create a graph to display the number of males and females in her class who got an A, B, C, D, and F on the last test. Which of the following graphs could she use?

Stacked-column chart

A __________ decision involves higher-level issues and is concerned with the overall direction of the organization, defining the overarching goals and aspirations for the organization's future.

Strategic

Spreadsheet models are referred to as what-if models because they

allow easy instantaneous recalculation for a change in model inputs.

The influence in an influence diagram is visually depicted by

an arrow

For a population with an unknown distribution, the form of the sampling distribution of the sample mean is

approximately normal for large sample sizes.

Separate error rates with respect to the false negative and false positive cases are computed to take into account the

asymmetric costs in misclassification.

Advanced analytics generally refers to

predictive and prescriptive analytics.

In the financial sector, __________ are used to construct financial instruments such as derivatives.

predictive models

A __________ describes the range and relative likelihood of all possible values for a random variable.

probability distribution for a random variable

The act of collecting data that are representative of the population data is called

random sampling.

The simplest measure of variability is the

range

The y-axis of a decile chart shows

ratio of decile mean to overall mean.

Causal models

relate a time series to other variables that are believed to explain or cause its behavior.

If a time series plot exhibits a horizontal pattern, then

there is still not enough evidence to conclude that the time series is stationary.

Using multiple lines on a line chart or employing multiple charts is an alternative to a

three-dimensional chart.

All of the following are examples of discrete random variables except

time

A set of observations on a variable measured at successive points in time or over successive periods of time constitutes a

time series.

A light bulb manufacturer uses descriptive analytics

to present supply chain data to managers visually.

Utility theory is the study of the __________ or relative desirability of a particular outcome that reflects the decision maker's attitude toward a collection of factors, such as profit, loss, and risk.

total worth

Data used to build a data mining model is called

training data.

A __________ is useful for visualizing hierarchical data along multiple dimensions.

tree map

The impact of two inputs on the output of interest is summarized by a

two-way data table.

A parameter is a numerical measure from a population, such as

μ (the population mean)

The random numbers generated using Excel's RAND function follow a __________ probability distribution between 0 and 1.

uniform

The event containing the outcomes belonging to A or B or both is the __________ of A and B.

union

The moving averages method refers to a forecasting method that

uses the average of the most recent data values in the time series as the forecast for the next period.
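A minimal Python sketch of a k-period moving average forecast. The function name `moving_average_forecast` and the demand values are hypothetical, chosen only to illustrate the definition above.

```python
def moving_average_forecast(series: list[float], k: int) -> float:
    """Forecast for the next period: the mean of the k most recent values."""
    if not 1 <= k <= len(series):
        raise ValueError("k must be between 1 and len(series)")
    return sum(series[-k:]) / k


# Hypothetical demand series; a 3-period moving average forecasts period 5
demand = [12, 15, 14, 18]
forecast_next = moving_average_forecast(demand, 3)
print(forecast_next)  # mean of 15, 14, 18
```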

__________ assigns values to outcomes based on the decision maker's attitude toward risk, loss, and other factors.

utility theory

The difference in a variable measured over observations (time, customers, items, etc.) is known as

variation

The goal regarding using an appropriate number of bins is to show the

variation in the data.

Scores on Ms. Nash's test have a mean of 64 and a standard deviation of 9. Steve has a score of 52. Convert Steve's score to a z-score. (Round to two decimal places if necessary.)

-1.33
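The z-score arithmetic for this question (and for Michelle's score later in the set) can be checked with a short Python sketch. The function name `z_score` is illustrative, not from the source.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Steve: mean 64, standard deviation 9, score 52
steve = round(z_score(52, 64, 9), 2)
# Michelle: mean 70, standard deviation 11, score 48
michelle = round(z_score(48, 70, 11), 2)
print(steve, michelle)
```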

The distribution of hourly sales for a local family owned store is normally distributed with a mean of $225 per hour and a standard deviation of $75 per hour. Which of the following intervals contains the middle 95% of hourly sales?

$75 to $375

In a random sample of 400 registered voters, 120 indicated they plan to vote for Trump for President. Determine a 95% confidence interval for the proportion of all the registered voters who will vote for Trump.

(0.255, 0.345)
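A sketch of the large-sample interval p̂ ± z·sqrt(p̂(1 − p̂)/n). It assumes z = 1.96 for 95% confidence. The helper name `proportion_ci` is hypothetical.

```python
import math


def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(120, 400)  # 120 of 400 voters
print(f"({low:.3f}, {high:.3f})")
```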

One minus the overall error rate is often referred to as the __________ of the model.

accuracy

Scores on Ms. Bond's test have a mean of 70 and a standard deviation of 11. Michelle has a score of 48. Convert Michelle's score to a z-score. (Round to two decimal places if necessary.)

-2

The CEO of a company wants to estimate the percent of employees that use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged onto Facebook that day. What is the estimate of the standard error of the proportion?

0.039
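The point estimate p̂ (asked about a few questions later for the same Facebook sample) and its estimated standard error sqrt(p̂(1 − p̂)/n) can both be computed in a few lines. This is a minimal sketch.

```python
import math

n = 150          # employees sampled
successes = 53   # employees who logged onto Facebook

p_hat = successes / n                    # point estimate of the population proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p_hat
print(round(p_hat, 2), round(se, 3))
```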

A large manufacturing plant has analyzed the amount of time required to produce an electrical part and determined that the times follow a normal distribution with mean time μ = 45 hours. The production manager has developed a new procedure for producing the part. He believes that the new procedure will decrease the population mean amount of time required to produce the part. After training a group of production line workers, a random sample of 25 parts will be selected and the average amount of time required to produce the parts will be determined. If the switch is made to the new procedure, the cost to implement the new procedure will be more than offset by the savings in manpower required to produce the parts. Use the hypotheses: H0: μ ≥ 45 hours and Ha: μ < 45 hours. Determine the p-value of the test statistic if the sample mean amount of time is x̄ = 43.118 hours and the sample standard deviation is s = 5.5 hours.

0.04999
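The lower-tail p-value comes from the test statistic t = (x̄ − μ0)/(s/√n), referred to a t distribution with n − 1 = 24 degrees of freedom. To stay within the standard library, this sketch integrates the t density numerically with Simpson's rule instead of calling a statistics package. The helper names `t_pdf` and `t_cdf` are hypothetical.

```python
import math


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t), using symmetry about 0 and Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area


# H0: mu >= 45 vs Ha: mu < 45, so the p-value is the lower-tail area
x_bar, mu0, s, n = 43.118, 45.0, 5.5, 25
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
p_value = t_cdf(t_stat, n - 1)
print(round(t_stat, 3), round(p_value, 4))
```

The same scenario appears again later with α = 0.025. There p ≈ 0.05 exceeds α, which is why that question concludes "do not reject H0."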

The CEO of a company wants to estimate the percent of employees that use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged onto Facebook that day. What is the point estimate of the proportion of the population that logged onto Facebook that day?

0.35

Fast food restaurants pride themselves in being able to fill orders quickly. A study was done at a local fast food restaurant to determine how long it took customers to receive their order at the drive thru. It was discovered that the time it takes for orders to be filled is exponentially distributed with a mean of 1.5 minutes. What is the probability density function for the time it takes to fill an order?

f(x) = (1/1.5)e^(-x/1.5) for x ≥ 0

Fast food restaurants pride themselves in being able to fill orders quickly. A study was done at a local fast food restaurant to determine how long it took customers to receive their order at the drive-thru. It was discovered that the time it takes for orders to be filled is exponentially distributed with a mean of 1.5 minutes. What is the probability that it takes less than one minute to fill an order?

0.4866
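For an exponential distribution with mean μ, the density is f(x) = (1/μ)e^(−x/μ) and P(X ≤ x) = 1 − e^(−x/μ). A minimal sketch with μ = 1.5 minutes follows. The function names are illustrative.

```python
import math

mean = 1.5       # mean fill time in minutes
lam = 1 / mean   # rate parameter


def exp_pdf(x: float) -> float:
    """Exponential density: (1/mean) * exp(-x/mean) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exp_cdf(x: float) -> float:
    """P(X <= x) = 1 - exp(-x/mean) for x >= 0, else 0."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0


print(round(exp_cdf(1.0), 4))  # P(fill time < 1 minute)
```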

The newest model of smart car is supposed to get excellent gas mileage. A thorough study showed that gas mileage (measured in miles per gallon) is normally distributed with a mean of 75 miles per gallon and a standard deviation of 10 miles per gallon. What is the probability that, if driven normally, the car will get 100 miles per gallon or better?

0.6%
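The upper-tail normal probability can be checked with the standard library's error function, using Φ(z) = (1 + erf(z/√2))/2. This is a sketch. The helper name `normal_cdf` is hypothetical.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


p_100_or_more = 1 - normal_cdf(100, 75, 10)  # z = 2.5
print(f"{p_100_or_more:.1%}")
```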

What is the total area under the normal distribution curve?

1

What is the total relative frequency?
20XX Contest Sales
Salesman | Frequency | Relative Frequency
Frances Clonts | 15 | 0.05
Sarah Leigh | 184 | 0.62
Devon Pride | 37 |
John Townes | 62 | 0.21
Total | 298 |

1

A survey of 100 random high school students finds that 85 students watched the Super Bowl, 25 students watched the Stanley Cup Finals, and 20 students watched both games. How many students did not watch either game?

10

How many Class 1's are incorrectly classified as Class 0?
Confusion Matrix
Actual Class | Predicted Class 1 | Predicted Class 0
1 | 221 | 100
0 | 30 | 3,000

100

Suppose that the confidence of an association rule is 0.75 and the total number of transactions is 250. How many of those transactions support the consequent if the lift ratio is 1.875?

100
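The lift ratio is confidence divided by the support (as a fraction of all transactions) of the consequent. Rearranging gives the consequent's support, which is then scaled to a transaction count. This is a minimal sketch.

```python
confidence = 0.75     # P(consequent | antecedent)
lift = 1.875          # confidence / (fraction of transactions containing the consequent)
n_transactions = 250

consequent_fraction = confidence / lift               # = 0.4
consequent_count = consequent_fraction * n_transactions
print(round(consequent_count))
```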

What would be the value of the sum of squares due to regression (SSR) if the total sum of squares (SST) is 25.32 and the sum of squares due to error (SSE) is 6.89?

18.43

Demand for a product and the forecasting department's forecast (naïve model) for a product are shown below. Compute the mean absolute error.
Period | Actual Demand | Forecasted Demand
1 | 12 | -
2 | 15 | 12
3 | 14 | 15
4 | 18 | 16

2

If the forecasted value of the time series variable for period 2 is 22.5 and the actual value observed for period 2 is 25, what is the forecast error in period 2?

2.5

The College Board originally scaled SAT scores so that the scores for each section were approximately normally distributed with a mean of 500 and a standard deviation of 100. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored greater than 700.

2.5%

The College Board reported that the mean Math Level 2 SAT subject test score was 686 with a standard deviation of 96. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored less than 494.

2.5%

The Watch Window is observable

across different worksheets of a workbook.

A statistics teacher started class one day by drawing the names of 10 students out of a hat and asked them to do as many pushups as they could. The 10 randomly selected students averaged 15 pushups per person with a standard deviation of 9 pushups. Suppose the distribution of the population of number of pushups that can be done is approximately normal. What is the standard error of the mean?

2.846

What is the mean of x, given the exponential probability function

20

Compute the third quartile for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22

21.25
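The answer 21.25 locates the third quartile at position 0.75(n + 1) = 8.25 in the sorted data, interpolating between 21 and 22. Python's `statistics.quantiles` with `method="exclusive"` uses this (n + 1) convention. As far as I know, Excel's QUARTILE.EXC uses it too. A sketch:

```python
import statistics

data = [10, 15, 17, 21, 25, 12, 16, 11, 13, 22]
# "exclusive" places the p-th quantile at position p * (n + 1) in the sorted data
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
print(q3)
```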

How many Class 1's are correctly classified as Class 1 in the table below?
Confusion Matrix
Actual Class | Predicted Class 1 | Predicted Class 0
1 | 221 | 100
0 | 30 | 3,000

221
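The counts in this confusion matrix (221, 100, 30, 3,000) also give the accuracy, which is one minus the overall error rate. They also give the separate Class 1 (false negative) and Class 0 (false positive) error rates mentioned earlier in the set. This is a minimal sketch. The variable names are illustrative.

```python
# Rows are actual classes, columns are predicted classes
n11, n10 = 221, 100    # actual 1 predicted 1, actual 1 predicted 0
n01, n00 = 30, 3000    # actual 0 predicted 1, actual 0 predicted 0

total = n11 + n10 + n01 + n00
overall_error = (n10 + n01) / total
accuracy = 1 - overall_error
class1_error = n10 / (n11 + n10)   # share of actual Class 1 predicted as Class 0
class0_error = n01 / (n01 + n00)   # share of actual Class 0 predicted as Class 1
print(round(accuracy, 4), round(class1_error, 4), round(class0_error, 4))
```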

Never use a __________ chart when a __________ chart will suffice.

3-D; 2-D

The t value for a 99% confidence interval estimation based upon a sample of size 10 is

3.250 (df = n - 1 = 9)

The number of minutes that Samantha waits to catch the bus is uniformly distributed between 0 and 15 minutes. What is the probability that Samantha has to wait less than 4.5 minutes to catch the bus?

30%

Suppose for a particular week, the forecasted sales were $4,000. The actual sales were $3,000. What is the value of the mean absolute percentage error?

33.3%

Demand for a product and the forecasting department's forecast (naïve model) for a product are shown below. Compute the mean squared error.
Period | Actual Demand | Forecasted Demand
1 | 12 | -
2 | 15 | 12
3 | 14 | 15
4 | 18 | 16

4.67
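A sketch of the three accuracy measures used in these questions: mean absolute error, mean squared error and mean absolute percentage error. It uses the forecast column exactly as tabulated. A strictly naïve forecast for period 4 would be 14, the period-3 actual. The answers 2 and 4.67 correspond to the tabulated value of 16.

```python
actual = [12, 15, 14, 18]       # periods 1-4
forecast = [None, 12, 15, 16]   # no forecast exists for period 1

errors = [a - f for a, f in zip(actual, forecast) if f is not None]
mae = sum(abs(e) for e in errors) / len(errors)
mse = sum(e * e for e in errors) / len(errors)
print(mae, round(mse, 2))

# Single-week MAPE question: |actual - forecast| / actual
actual_sales, forecast_sales = 3000, 4000
mape = abs(actual_sales - forecast_sales) / actual_sales
print(f"{mape:.1%}")
```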

Compute the mean of the following data. 56, 42, 37, 29, 45, 51, 30, 25, 34, 57

40.6

Use technology to compute the standard deviation for the following sample data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37

6.75
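Both descriptive statistics above can be reproduced with Python's `statistics` module. `statistics.stdev` uses the n − 1 divisor (the sample standard deviation), which matches what Excel's STDEV.S computes.

```python
import statistics

mean_data = [56, 42, 37, 29, 45, 51, 30, 25, 34, 57]
sd_data = [32, 41, 36, 24, 29, 30, 40, 22, 25, 37]

mean_value = statistics.mean(mean_data)   # arithmetic mean
sd_value = statistics.stdev(sd_data)      # sample standard deviation (n - 1 divisor)
print(mean_value, round(sd_value, 2))
```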

For data having a bell-shaped distribution, approximately __________ percent of the data values will be within one standard deviation of the mean.

68

The random variable X is known to be uniformly distributed between 2 and 12. Compute E(X), the expected value of the distribution.

7

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance.

75.39
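The distance is sqrt((53 − 25)² + (420 − 350)²) = sqrt(5684). Python's `math.dist` computes this Euclidean distance directly. This is a sketch.

```python
import math

u = (25, 350)   # (age in years, spending in dollars)
v = (53, 420)
distance = math.dist(u, v)
print(round(distance, 2))
```

Because age and dollars are on very different scales, the spending difference dominates here. In practice the variables are often standardized before distances are computed.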

A statistics teacher started class one day by drawing the names of 10 students out of a hat and asked them to do as many pushups as they could. The 10 randomly selected students averaged 15 pushups per person with a standard deviation of 9 pushups. Suppose the distribution of the population of number of pushups that can be done is approximately normal. The 95% confidence interval for the true mean number of pushups that can be done is

8.56 to 21.44.
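The standard error s/√n and the t-based interval x̄ ± t·s/√n for the pushup questions can be computed as follows. The critical value t(0.025, df = 9) ≈ 2.262 is taken from a t table and hard-coded here as an assumption, not computed.

```python
import math

n, x_bar, s = 10, 15, 9
se = s / math.sqrt(n)          # standard error of the mean
t_crit = 2.262                 # t(0.025, df = 9), from a t table
low, high = x_bar - t_crit * se, x_bar + t_crit * se
print(round(se, 3), round(low, 2), round(high, 2))
```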

A health conscious student faithfully wears a device that tracks his steps. Suppose that the distribution of the number of steps he takes in a day is normally distributed with a mean of 10,000 and a standard deviation of 1,500 steps. One day he took 15,000 steps. What was his percentile on that day?

About 99.96% (z = (15,000 - 10,000)/1,500 ≈ 3.33)
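The percentile is the normal CDF evaluated at 15,000 steps, Φ((15,000 − 10,000)/1,500). The empirical rule only brackets this value. This is a sketch using the error function. The helper name `normal_cdf` is hypothetical.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


percentile = normal_cdf(15000, 10000, 1500)
print(f"{percentile:.2%}")
```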

In a survey of patients in a local hospital, 62.42% of the respondents indicated that the health care providers needed to spend more time with each patient. Who makes up the population?

All patients in a local hospital

The population parameters that describe the y-intercept and slope of the line relating y and x, respectively, are

β0 and β1.

A better understanding of consumer behavior through analytics directly leads to

Better pricing strategies

Within a given range of cells, the number of times a particular condition is satisfied is computed by using the __________ function.

COUNTIF

Which is not true regarding trend patterns?

Can result when business conditions shift to a new level at some point in time

Which of the following best exemplifies big data?

Cellphone owners around the world generate vast amounts of data by calling, texting, tweeting, and browsing the Web on a daily basis.

__________ are visual methods of displaying data.

Charts

Natalie needs to compare the number of employees by job title for the last five years. Which of the following charts should Natalie use?

Clustered-column (bar) chart

__________ compares the number of actual Class 1 observations identified if considered in decreasing order of their estimated probability of being in Class 1 with the number of actual Class 1 observations identified if randomly classified.

Cumulative lift

A retail store owner offers a discount on product A and predicts that the customers would purchase products B and C in addition to product A. Identify the technique used to make such a prediction.

Data Mining

An Excel __________ quantifies the impact of changing the value of a specific input on an output of interest.

Data Table

Corporate-level managers use ______ to summarize sales by region, current inventory levels, and other company-wide metrics all in a single screen.

Data dashboards

__________ involves descriptive statistics, data visualization, and clustering.

Data exploration

__________ is dividing the sample data into three sets for training, validation, and testing of the data mining algorithm performance.

Data partitioning

__________ is the manipulation of the data with the goal of putting it in a form suitable for formal modeling.

Data preparation

The extraction of information on the number of shipments, how much was included in each shipment, the date each shipment was sent, and so on from the manufacturing plant's database exemplifies

Data queries

A large manufacturing plant has analyzed the amount of time required to produce an electrical part and determined that the times follow a normal distribution with mean time μ = 45 hours. The production manager has developed a new procedure for producing the part. He believes that the new procedure will decrease the population mean amount of time required to produce the part. After training a group of production line workers, a random sample of 25 parts will be selected and the average amount of time required to produce them will be determined. If the switch is made to the new procedure, the cost to implement the new procedure will be more than offset by the savings in manpower required to produce the parts. Use the hypotheses: H0: μ ≥ 45 hours and Ha: μ < 45 hours. If the sample mean amount of time is x̄ = 43.118 hours and the sample standard deviation is s = 5.5 hours, give the appropriate conclusion for α = 0.025.

Do not reject H0, do not switch to the new procedure

The calculations of a cell can be investigated in great detail by using the __________ button.

Evaluate Formula

To generate a scatter chart matrix, we use

Excel Add-In XLMiner.

The software package most commonly used for creating simple charts is

Excel.

__________ uses a weighted average of past time series values as the forecast.

Exponential smoothing
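A minimal sketch of simple exponential smoothing, F(t+1) = αY(t) + (1 − α)F(t), started with F2 = Y1. It also includes the selection rule noted elsewhere in this set: choose α to minimize the mean squared error. The function names, the demand values and the candidate α grid are hypothetical.

```python
def exponential_smoothing(series: list[float], alpha: float) -> list[float]:
    """One-step-ahead forecasts F2, ..., F(n+1) with F2 = Y1."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    forecasts = [series[0]]
    for y in series[1:]:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts


def smoothing_mse(series: list[float], alpha: float) -> float:
    """MSE of the forecasts F2..Fn against the actual values Y2..Yn."""
    f = exponential_smoothing(series, alpha)
    errors = [y - fy for y, fy in zip(series[1:], f[:-1])]
    return sum(e * e for e in errors) / len(errors)


demand = [12, 15, 14, 18]   # hypothetical series
forecasts = exponential_smoothing(demand, 0.2)
candidates = [round(0.1 * k, 1) for k in range(1, 10)]
best_alpha = min(candidates, key=lambda a: smoothing_mse(demand, a))
print(forecasts, best_alpha)
```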

Excel searches for an exact match of the first argument in the first column of the data when the range in the VLOOKUP function is

FALSE

Which of the following is not an approach to making decisions?

Guess and check

A student wants to determine if pennies are really fair when flipped, meaning equally likely to land heads up or tails up. He flips a random sample of 50 pennies and finds that 28 of them land heads up. If p denotes the true probability of a penny landing heads up when flipped, what are the appropriate null and alternative hypotheses?

H0: p = 0.5, Ha: p ≠ 0.5

The owners of a fast food restaurant have automatic drink dispensers to help fill orders more quickly. When the 12 ounce button is pressed, they would like for exactly 12 ounces of beverage to be dispensed. There is, however, some variation in this amount. The company does not want the machine to systematically overfill or underfill the cups. Which of the following gives the correct set of hypotheses?

H0: μ = 12, Ha: μ ≠ 12

The __________ function is used for the conditional computation of expressions in Excel.

IF

__________ is a measure of the heterogeneity of observations in a classification tree.

Impurity

__________ refers to the scenario in which the relationship between the dependent variable and one independent variable is different at different values of a second independent variable.

Interaction

__________ refers to the technology that allows data, collected from sensors in all types of machines, to be sent over the Internet to repositories where it can be stored and analyzed.

Internet of Things (IoT)

Which of the following is true of the exponential smoothing coefficient?

It is chosen as the value that minimizes a selected measure of forecast accuracy such as the mean squared error.

Which of the following is true of Euclidean distances?

It is commonly used as a method of measuring dissimilarity between quantitative observations.

The ratio of the amount of ink used in a table or chart that is necessary to convey information to the total amount of ink used in the table and chart is known as data-ink ratio. Using additional ink that is not necessary to convey information has what effect on the data-ink ratio?

It reduces the data-ink ratio.

Which one of the following is used in predictive analytics?

Linear regression

A time series plot of a period of time (in years) versus sales (in thousands of dollars) is shown below. Which of the following data patterns best describes the scenario shown?

Linear trend pattern

_________ attempts to classify a categorical outcome as a linear function of explanatory variables.

Logistic regression

__________ is a generalization of linear regression for predicting a categorical outcome variable.

Logistic regression

Which Excel command will return all modes when more than one mode exists?

MODE.MULT

Which of the following measures of forecast accuracy is susceptible to the problem of positive and negative forecast errors offsetting one another?

Mean forecast error

Which of the following sources of big data is not publicly available?

Medical records

__________ refers to the degree of correlation among independent variables in a regression model.

Multicollinearity

In a normal distribution, which is greater, the mean or the median?

Neither the mean nor the median (they are equal)

A time series plot of a period of time (in years) versus revenue (in millions of dollars) is shown below. Which of the following data patterns best describes the scenario shown?

Nonlinear trend pattern

Which of the following are necessary to be determined to define the classes for a frequency distribution with quantitative data?

Number of nonoverlapping bins, width of each bin, and bin limits

Which of the following is not present in a time series?

Operational variations

__________ refers to the scenario in which the analyst builds a model that does a great job of explaining the sample of data on which it is based but fails to accurately predict outside the sample data.

Overfitting

Two events are independent if...

P(B) = P(B | A) or P(B) = P(B | Aᶜ). Meaning: knowing that event A has occurred (or not occurred) doesn't change the probability that event B occurs.

What do nodes in an influence diagram represent?

Parts of the model

__________ analytics use techniques that take input data and yield a best course of action.

Prescriptive

The __________ probability distribution can be used to estimate the number of vehicles that go through an intersection during the lunch hour.

Poisson

_________ analytics are techniques that use models, constructed from past data, to predict the future or to ascertain the impact of one variable on another.

Predictive

A forecast that helps direct police officers to areas where crimes are likely to occur based on past data is an example of

Predictive analytics

Which of the following analytical techniques helps us arrive at the best decision?

Prescriptive analytics

Which of the following regression models is used to model a nonlinear relationship between the independent and dependent variables by including the independent variable and the square of the independent variable in the model?

Quadratic regression model

What are the two decisions that you can make from performing a hypothesis test?

Reject the null hypothesis; Fail to reject the null hypothesis

The __________ function pairs each element of the first array with its counterpart in the second array, multiplies the elements of the pairs together, and adds the results

SUMPRODUCT
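A Python analogue of what SUMPRODUCT does with two arrays. It pairs the elements, multiplies each pair and adds the products. Excel returns a #VALUE! error when the arrays differ in size, so this sketch raises an exception in that case. The function name and the cost and quantity values are hypothetical.

```python
def sumproduct(a: list[float], b: list[float]) -> float:
    """Sum of the pairwise products of two equal-length arrays."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


unit_costs = [2.5, 4.0, 1.25]   # hypothetical cost per unit
quantities = [10, 5, 8]         # hypothetical units purchased
total_cost = sumproduct(unit_costs, quantities)
print(total_cost)
```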

Which of the following graphs cannot be used to display categorical data?

Scatter chart

A time series plot of a period of time (in months) versus sales (in number of units) is shown below. Which of the following data patterns best describes the scenario shown?

Seasonal pattern

The VLOOKUP with range set to __________ takes the first argument and searches the first column of the table for the last row whose value is less than or equal to the first argument.

TRUE
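A sketch of the two VLOOKUP modes in Python, assuming the first column is sorted in ascending order as approximate matching requires. `bisect_right` finds the last row whose key is ≤ the lookup value (range TRUE). A dictionary models the exact match (range FALSE). The grade table and function names are hypothetical.

```python
import bisect


def vlookup_approx(key: float, first_column: list[float], return_column: list[str]) -> str:
    """Like VLOOKUP(..., TRUE): last row whose first-column value is <= key."""
    i = bisect.bisect_right(first_column, key) - 1
    if i < 0:
        raise KeyError(key)   # Excel would return #N/A here
    return return_column[i]


def vlookup_exact(key: float, first_column: list[float], return_column: list[str]) -> str:
    """Like VLOOKUP(..., FALSE): the key must appear exactly in the first column."""
    return dict(zip(first_column, return_column))[key]


thresholds = [0, 60, 70, 80, 90]   # hypothetical score cutoffs (sorted)
grades = ["F", "D", "C", "B", "A"]
print(vlookup_approx(85, thresholds, grades), vlookup_exact(70, thresholds, grades))
```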

With reference to the SUMPRODUCT function, which of the following statements is true?

The arrays that appear as arguments must be of the same dimension.

__________ merges maps and statistics to present data collected over different geographies.

The geographic information system

If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations of a cluster?

The hypotenuse

Which of the following is a discrete random variable?

The number of times a student guesses the answers to questions on a certain test

The scatter chart below displays the residuals versus the independent variable, x. Which of the following conclusions can be drawn based upon this scatter chart?

The residuals are not normally distributed.

The scatter chart below displays the residuals versus the independent variable, x. Which of the following conclusions can be drawn from the scatter chart?

The residuals have an increasing variance as the independent variable increases.

Which of the following is not a characteristic of the normal probability distribution?

The standard deviation must be 1.

Which of the following is true of spreadsheet packages used in business analytics?

They come preloaded on computers.

Using the diagram below, which of the following would be a likely mathematical expression for Total Cost?

Total Cost = Fixed Cost + Total Variable Cost

Which of the following would be a likely mathematical expression for Total Variable Cost?

Total Variable Cost = (Material Cost per Unit + Labor Cost per Unit) × Production Volume

The ___________ button, located in the Formula Auditing group, creates arrows pointing to the selected cell from cells that are part of the formula in that cell.

Trace Precedents

__________ is the data set used to build the candidate models.

Training set

Larger values of α have the disadvantage of increasing the probability of making a

Type I error.

The __________ function allows the user to pull a subset of data from a larger table of data based on some criterion.

VLOOKUP

__________ refers to the data set used to compare model forecasts and ultimately pick a model for predicting values of the dependent variable.

Validation set

__________ can be used to partition observations in a manner to obtain clusters with the least amount of information loss due to the aggregation.

Ward's method

The user can monitor how listed cells change with a change in the model without searching through the worksheet or changing from one worksheet to another by using the __________ functionality.

Watch Window

A sample of 37 AA batteries had a mean lifetime of 584 hours. A 95% confidence interval for the population mean was 579.2 < μ < 588.8. Which statement is the correct interpretation of the results?

We are 95% confident that the mean lifetime of all the batteries in the population is between 579.2 hours and 588.8 hours.

The proportion of dental procedures that are extractions is 0.16. Which of the following exemplifies a Type I error in this situation?

We reject the claim that the proportion of dental procedures that are extractions is 0.16 when the proportion is actually 0.16.

In which of the following scenarios would it be appropriate to use hierarchical clustering?

When binary or ordinal data needs to be clustered

A data visualization tool that updates in real time and gives multiple outputs is called

a data dashboard.

The moving averages and exponential smoothing methods are appropriate for a time series exhibiting

a horizontal pattern.

If the expected value of the sample statistic is equal to the population parameter being estimated, the sample statistic is said to

be an unbiased estimator of the population parameter.

In interval estimation, as the sample size becomes larger, the interval estimate

becomes narrower.

As the number of degrees of freedom for a t distribution increases, the difference between the t distribution and the standard normal distribution

becomes smaller.

Using α = 0.04, a confidence interval for a population proportion is determined to be 0.65 to 0.75. If the level of significance is decreased, the interval for the population proportion

becomes wider.

In order to visualize three variables in a two-dimensional graph, we use a

bubble chart.

An alternative for a stacked column chart when comparing more than a couple of quantitative variables in each category is a

clustered column chart.

A PivotChart, in few instances, is the same as a

clustered-column chart.

A graphical presentation that uses vertical bars to display the magnitude of quantitative data is known as a

column chart.

__________ is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters.

complete linkage

The __________ is an indication of how frequently interval estimates based on samples of the same size taken from the same population using identical sampling techniques will contain the true value of the parameter we are estimating.

confidence level

Single linkage is a measure of calculating dissimilarity between clusters by

considering only the two most similar observations in the two clusters.

In preparing categorical variables for analysis, it is usually best to

convert the categories to binary, dummy variables.

A collection of text documents to be analyzed is called a ___________.

corpus

The __________ shows the number of data items with values less than or equal to the upper class limit of each class.

cumulative frequency distribution

The data dashboard for a marketing manager may have KPIs related to

current sales measures and sales by region.

Applying descriptive statistics and data visualization to the training set to understand the data and assist in the selection of an appropriate technique is a part of

data exploration.

When a decision maker is faced with several alternatives and an uncertain set of future events, s/he uses __________ to develop an optimal strategy.

decision analysis

A(n) ____________ refers to a model input that can be controlled in a spreadsheet model.

decision variable

In a linear regression model, the variable that is being predicted or explained is known as _____________. It is denoted by y and is often referred to as the response variable.

dependent variable

In order to manage an organization's human resource activities, such as hiring employees, tracking, and influencing employee retention, HR personnel use

descriptive and predictive analytics.

The mean absolute error, mean squared error, and mean absolute percentage error are all methods to measure the accuracy of a forecast. These methods measure forecast accuracy by

determining how well a particular forecasting method is able to reproduce the time series data that are already available.

The variance is based on the

deviation about the mean

A variable that can only take on specific numeric values is called a

discrete random variable.

Jaccard's coefficient is different from the matching coefficient in that the former

does not count matching zero entries while the latter does.
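A sketch contrasting the two similarity measures for binary observations. The matching coefficient counts every agreement, including 0–0 matches. Jaccard's coefficient counts only 1–1 matches, relative to the positions where at least one observation is 1. The vectors and function names are hypothetical.

```python
def matching_coefficient(u: list[int], v: list[int]) -> float:
    """Share of positions where the two binary vectors agree (0-0 counts)."""
    return sum(1 for a, b in zip(u, v) if a == b) / len(u)


def jaccard_coefficient(u: list[int], v: list[int]) -> float:
    """1-1 matches divided by positions where at least one vector is 1."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    either_one = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    if either_one == 0:
        raise ValueError("undefined when neither observation has any 1s")
    return both_one / either_one


u = [1, 0, 0, 1, 0, 0]
v = [1, 0, 0, 0, 0, 1]
print(matching_coefficient(u, v), jaccard_coefficient(u, v))
```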

A cluster's __________ can be measured by the difference between the distance value at which a cluster is originally formed and the distance value at which it is merged with another cluster in a dendrogram.

durability

In the simple linear regression model, the ____________ accounts for the variability in the dependent variable that cannot be explained by the linear relationship between the variables.

error term

Classifying a record as belonging to one class when it belongs to another class is referred to as a(n)

error.

A test set is the data set used to

estimate performance of the final model on unseen data.

Determine a freshman's likely first-year grade point average from the student's Scholastic Aptitude Test (SAT) score, high school grade point average, and number of extra-curricular activities. This is an example of

estimation of a continuous outcome.

The __________ is the range of values of the independent variables in the data used to estimate the regression model.

experimental region

Prediction of the mean value of the dependent variable y for values of the independent variables x1, x2, . . . , xq that are outside the experimental region is called

extrapolation.

An observation classified as part of a group with a characteristic when it actually does not have the characteristic is termed as a(n)

false positive.

Fields may be chosen to represent all of the following except ____________ in the body of a PivotTable.

filters

A summary of data that shows the number of observations in each of several nonoverlapping bins is called a(n)

frequency distribution
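A frequency distribution is built by counting how many observations fall into each bin. This sketch assumes equal-width, half-open bins and skips any value that falls outside all bins. The data and bin choices are hypothetical.

```python
def frequency_distribution(data, lower, width, num_bins):
    """Count observations in equal-width, nonoverlapping bins.

    Bin i covers [lower + i*width, lower + (i + 1)*width); values outside all bins are skipped.
    """
    counts = [0] * num_bins
    for x in data:
        i = int((x - lower) // width)
        if 0 <= i < num_bins:
            counts[i] += 1
    return counts


# Hypothetical data with bins 10-19, 20-29, 30-39
print(frequency_distribution([12, 15, 21, 25, 29, 33, 38], lower=10, width=10, num_bins=3))  # [2, 3, 2]
```

Plotting these counts as adjacent bars gives a histogram.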

The finite population correction factor should be used when computing the standard deviations of the sample mean and the sample proportion if n / N is

greater than 0.05.
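This minimal Python sketch computes the standard deviation of the sample mean. It applies the finite population correction factor sqrt((N - n) / (N - 1)) only when n / N exceeds 0.05. It assumes the population standard deviation sigma is known, and the numbers are hypothetical.

```python
import math


def standard_error_of_mean(sigma, n, N=None):
    """Standard deviation of the sample mean for sample size n.

    If a finite population size N is given and n / N exceeds 0.05,
    the finite population correction factor sqrt((N - n) / (N - 1)) is applied.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se


print(standard_error_of_mean(4000, 30, N=2500))  # n / N = 0.012: no correction
print(standard_error_of_mean(4000, 30, N=300))   # n / N = 0.1: correction applied
```

The correction factor is always less than 1, so applying it shrinks the standard error. Sampling a large share of a finite population leaves less room for the sample mean to vary.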

A two-dimensional graph representing the data using different shades of color to indicate magnitude is called a

heat map.

The conceptual model

helps in organizing the data requirements

A __________ is a graphical summary of data previously summarized in a frequency distribution.

histogram

Bar charts use

horizontal bars to display the magnitude of the quantitative variable.

A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size

n has the same probability of being selected

The process of making a conjecture about the value of a population parameter, collecting sample data that can be used to assess this conjecture, measuring the strength of the evidence against the conjecture that is provided by the sample, and using these results to draw a conclusion about the conjecture is known as

hypothesis testing.

A one-tailed test is a hypothesis test in which the rejection region is

in one tail of the sampling distribution.

Deleting the grid lines in a table and the horizontal lines in a chart

increases the data-ink ratio.

In a linear regression model, the variable (or variables) used to predict or explain values of the response variable are known as the __________. They are denoted by x.

independent variable(s)

Forecast error

is associated with measuring forecast accuracy; when forecast error is defined as the actual value minus the forecast, it takes a positive value when the forecast is too low.

The letter grades (A, B, C, D, F) of business analysis students are recorded by a professor. This variable's classification

is categorical data.

Data-ink is the ink used in a table or chart that

is necessary to convey the meaning of the data to the audience

The coefficient of determination

is used to evaluate the goodness of fit.

A disadvantage of stacked-column charts and stacked-bar charts is that

it can be difficult to perceive small differences in areas.

In a business, the values indicating the business's current operating characteristics, such as its financial position, the inventory on hand, and customer service metrics, are typically known as

key performance indicators.

The prespecified value of the independent variable at which its relationship with the dependent variable changes in a piecewise linear regression model is referred to as the

knot.

The best way to differentiate chart elements is using

labels

The following image is a

line chart

A time series plot is also known as a

line chart.

When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the

matching coefficient.

If a z-score is zero, then the corresponding x-value must be equal to the

mean

You are __________ to commit a Type I error using the 0.05 level of significance than using the 0.01 level of significance.

more likely

Complete linkage can be used to measure the distance between clusters that are the __________ in cluster analysis.

most different
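Under complete linkage, the distance between two clusters is the largest distance between any pair of observations, one from each cluster. This sketch assumes Euclidean distance and includes single linkage for contrast. The points are hypothetical.

```python
import math


def complete_linkage(cluster_a, cluster_b):
    """Largest Euclidean distance between a point in cluster_a and a point in cluster_b."""
    return max(math.dist(p, q) for p in cluster_a for q in cluster_b)


def single_linkage(cluster_a, cluster_b):
    """Smallest such distance, for comparison."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


a = [(0, 0), (1, 0)]
b = [(4, 0), (5, 0)]
print(complete_linkage(a, b), single_linkage(a, b))  # 5.0 3.0
```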

The process of __________ might be used to determine the value of the smoothing constant that minimizes the mean squared error.

nonlinear optimization
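The text refers to nonlinear optimization, such as a spreadsheet solver. As a simpler stand-in, this Python sketch runs a plain grid search over the smoothing constant alpha for simple exponential smoothing, so it only approximates the true minimizer. It assumes the first forecast equals the first observation, so errors are measured from the second period on. The series values are hypothetical.

```python
def ses_mse(series, alpha):
    """Mean squared error of one-step-ahead simple exponential smoothing forecasts.

    The forecast for period 2 is set equal to the period 1 value, so errors run from period 2 on.
    """
    forecast = series[0]
    squared_errors = []
    for y in series[1:]:
        squared_errors.append((y - forecast) ** 2)
        # Next forecast: weighted average of the latest value and the latest forecast
        forecast = alpha * y + (1 - alpha) * forecast
    return sum(squared_errors) / len(squared_errors)


def best_alpha(series, steps=100):
    """Grid search over alpha = 0.01, 0.02, ..., 1.00 for the smallest MSE.

    A coarse stand-in for the nonlinear optimization described in the text.
    """
    candidates = [i / steps for i in range(1, steps + 1)]
    return min(candidates, key=lambda a: ses_mse(series, a))


# Illustrative values (hypothetical)
series = [17, 21, 19, 23, 18, 16, 20, 18, 22, 20, 15, 22]
alpha = best_alpha(series)
print(alpha, ses_mse(series, alpha))
```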

In k -means clustering, k represents the

number of clusters.
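This compact sketch implements the standard k-means procedure (Lloyd's algorithm) with Euclidean distance. It assigns each observation to its nearest centroid, recomputes the centroids, and repeats until they stop moving. The random initialization and the points are assumptions for illustration. Library implementations typically add refinements such as multiple restarts.

```python
import math
import random


def k_means(points, k, max_iterations=100, seed=0):
    """Basic k-means (Lloyd's algorithm) on a list of coordinate tuples.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Start from k of the observations, chosen at random, as initial centroids
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each observation joins the cluster with the nearest centroid
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of the observations assigned to it
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2)
print(labels)
```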

In the moving averages method, the order k determines the

number of time series values under consideration.
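For example, with k = 3 the forecast for the next period is the average of the three most recent observations. Here is a minimal sketch with hypothetical values.

```python
def moving_average_forecast(series, k):
    """Forecast for the next period: the mean of the most recent k values in the series."""
    if len(series) < k:
        raise ValueError("the series must contain at least k values")
    return sum(series[-k:]) / k


print(moving_average_forecast([17, 21, 19, 23, 18], k=3))  # 20.0
```

A larger k smooths out more of the random fluctuation but reacts more slowly to changes in the level of the series.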

The set of recorded values of variables associated with a single entity is a(n)

observation.

The data collected from the customers in restaurants about the quality of food is an example of a(n)

observational study.

Euclidean distance can be used to measure the distance between __________ in cluster analysis.

observations

Autoregressive models

occur whenever all the independent variables are previous values of the time series.

The percent of misclassified records out of the total records in the validation data is known as the

overall error rate.
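This sketch computes the overall error rate from actual and predicted labels, along with the error rates for each actual class. Here 1 marks the class of interest, so the class 1 error rate counts false negatives and the class 0 error rate counts false positives. The labels are hypothetical.

```python
def error_rates(actual, predicted):
    """Return (overall, class 1, class 0) error rates for 0/1 labels.

    Class 1 error rate: share of actual 1s predicted as 0 (false negatives).
    Class 0 error rate: share of actual 0s predicted as 1 (false positives).
    """
    pairs = list(zip(actual, predicted))
    false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)
    false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
    n1 = sum(1 for a, _ in pairs if a == 1)
    n0 = len(pairs) - n1
    overall = (false_pos + false_neg) / len(pairs)
    return overall, false_neg / n1, false_pos / n0


actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 0, 1, 0, 1, 0, 0, 0]
print(error_rates(actual, predicted))  # (0.25, 0.3333333333333333, 0.2)
```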

Two approaches to drawing a conclusion in a hypothesis test are

the p-value approach and the critical value approach.

A __________ is used for examining data with more than two variables, and it includes a different vertical axis for each variable.

parallel-coordinates plot

With reference to a spreadsheet model, an uncontrollable model input is known as a(n)

parameter.

The population parameter value and the point estimate differ because a sample is not a census of the entire population, but it is being used to develop the

point estimate.

The purpose of statistical inference is to make estimates or draw conclusions about a

population based upon information obtained from the sample.

A random sample selected from an infinite population is a sample selected such that each element selected comes from the same __________ and each element is selected __________.

population; independently

The scatter chart below displays the residuals versus the dependent variable, t. Which of the following conclusions can be drawn based upon this scatter chart?

residuals are not independent.

Data-driven decision making tends to decrease a firm's

risk

The __________ is a point estimate of the population mean for the variable of interest.

sample mean

The value of the ___________ is used to estimate the value of the population parameter.

sample statistic

A __________ is used to visualize sample data graphically and to draw preliminary conclusions about the possible relationship between the variables.

scatter chart

A _____________ is a graphical presentation of the relationship between two quantitative variables.

scatter chart

A useful chart for displaying multiple variables is the

scatter chart matrix.

A time series that shows a recurring pattern over one year or less is said to follow a

seasonal pattern.

With reference to time series data patterns, a cyclical pattern is the component of the time series that

shows a periodic pattern lasting more than one year.

A line chart that has no axes but is used to provide information on overall trends for time series data is called a

sparkline

To avoid problems in interpreting the differences in color in a heat map, ____________ can be added.

sparklines

A method for modifying variables that reduces bias prior to cluster analysis is

standardization.
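Standardizing converts each variable to z-scores: the value minus the variable's mean, divided by its standard deviation. This keeps a variable measured in large units, such as income in dollars, from dominating the Euclidean distance. This sketch assumes the sample standard deviation is used, and the observations are hypothetical.

```python
import math
import statistics


def standardize_columns(rows):
    """Replace each variable (column) by its z-scores: (value - mean) / sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [tuple((x - m) / s for x, m, s in zip(row, means, sds)) for row in rows]


# Hypothetical observations: (age in years, annual income in dollars)
rows = [(25, 40000), (35, 42000), (45, 60000)]
z = standardize_columns(rows)
# Raw distance is dominated by income; standardized distance weighs both variables comparably
print(math.dist(rows[0], rows[1]), math.dist(z[0], z[1]))
```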

Picks and Axes Inc. is an Internet-based retail seller of hiking boots and mountaineering gear. The company decides to open retail stores across the major areas of the city to help complement its Internet-based strategy. This activity would be categorized as a(n)

strategic decision

The __________ is a measure of the error that results from using the estimated regression equation to predict the values of the dependent variable in the sample.

sum of squares due to error (SSE)

Data mining methods for classifying or estimating an outcome based on a set of input variables is referred to as

supervised learning.

Which of the following relationships would have a negative correlation coefficient?

supply & demand

A __________ refers to the number of times a collection of items occurs together in a transaction data set.

support count
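A support count is the number of transactions that contain every item in the collection. Here is a minimal sketch with hypothetical market-basket transactions.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
print(support_count(transactions, {"bread", "milk"}))  # 2
```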

A __________ decision is concerned with how the organization should achieve the goals and objectives set by its strategy.

tactical

The basis for using a normal probability distribution to approximate the sampling distribution of the sample mean is

the central limit theorem.

In the k-nearest neighbors method, when the value of k is set to 1

the classification or prediction of a new observation is based solely on the single most similar observation from the training set.
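This sketch of k-nearest neighbors classification assumes Euclidean distance and a simple majority vote. Ties go to the label seen first among the nearest neighbors. With k = 1 the vote reduces to copying the label of the single nearest training observation. The training data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, new_point, k=1):
    """Majority vote among the k training observations nearest to new_point (Euclidean distance)."""
    ranked = sorted(range(len(train_points)), key=lambda i: math.dist(train_points[i], new_point))
    votes = Counter(train_labels[i] for i in ranked[:k])
    # With k = 1 there is only one vote: the label of the single nearest observation
    return votes.most_common(1)[0][0]


train = [(0, 0), (0, 1), (1, 0), (3, 3)]
labels = ["A", "A", "A", "B"]
new = (2.6, 2.6)
print(knn_classify(train, labels, new, k=1), knn_classify(train, labels, new, k=3))  # B A
```

The new point sits next to the lone B observation, so k = 1 follows that single neighbor. With k = 3, the two more distant A observations outvote it.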

Sample space is

the collection of all possible outcomes.

All the events in the sample space that are not part of the specified event are called

the complement of the event.

When its range_lookup argument is TRUE or omitted (approximate match), VLOOKUP assumes that

the first column of the table is sorted in ascending order.
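This sketch imitates approximate-match lookup in Python to show why the sort order matters. It uses binary search (bisect) to find the last row whose first-column value is less than or equal to the lookup value. The binary search illustrates how such a lookup can work; it is an assumption, not a description of Excel's internals. The grade table is hypothetical.

```python
import bisect


def approximate_lookup(table, lookup_value, col_index):
    """Return the col_index-th value (1-based) from the last row whose first
    column is <= lookup_value, or None if no such row exists.

    Binary search is only valid because the first column is sorted ascending.
    """
    keys = [row[0] for row in table]
    pos = bisect.bisect_right(keys, lookup_value) - 1
    if pos < 0:
        return None  # lookup_value is smaller than every key (Excel shows #N/A)
    return table[pos][col_index - 1]


# Hypothetical grade table: minimum score, letter grade
grades = [(0, "F"), (60, "D"), (70, "C"), (80, "B"), (90, "A")]
print(approximate_lookup(grades, 85, 2), approximate_lookup(grades, 90, 2))  # B A
```

If the first column were unsorted, the binary search could land on the wrong row. The same thing can happen when approximate-match VLOOKUP runs on unsorted data.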

A procedure for using sample data to find the estimated regression equation is

the least squares method.
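For simple linear regression, the least squares method gives closed-form estimates: b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. This sketch computes them, then uses the fitted values to compute the sum of squares due to error (SSE), the total sum of squares (SST), and the coefficient of determination r² = 1 − SSE/SST. The data are hypothetical.

```python
def least_squares(x, y):
    """Estimate b0 and b1 in y-hat = b0 + b1*x by minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def fit_summary(x, y):
    """Return (b0, b1, SSE, SST, r^2) for a simple linear regression."""
    b0, b1 = least_squares(x, y)
    y_bar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total variation about the mean
    return b0, b1, sse, sst, 1 - sse / sst


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(fit_summary(x, y))  # approximately b0 = 2.2, b1 = 0.6, SSE = 2.4, SST = 6.0, r^2 = 0.6
```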

An exponential trend pattern occurs when

the percentage change between periods in the value of the variable is relatively constant.

The arguments supplied to the IF function, in order, are the condition for execution,

the result if condition is true, and the result if condition is false.

Tables should be used instead of charts when

the values being displayed have different units or very different magnitudes.

If covariance between two variables is near 0, it implies that

the variables are not linearly related.
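This sketch computes the sample covariance and the correlation coefficient. The first example pairs x with y = x². The two are perfectly related but not linearly, and their covariance is exactly 0, so a near-zero covariance rules out a linear relationship rather than every relationship. The data are hypothetical.

```python
def sample_covariance(x, y):
    """Sum of products of deviations about each mean, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Correlation coefficient: covariance divided by the product of the standard deviations."""
    return sample_covariance(x, y) / (sample_covariance(x, x) * sample_covariance(y, y)) ** 0.5


x = [-2, -1, 0, 1, 2]
print(sample_covariance(x, [v * v for v in x]))  # 0.0
print(correlation([1, 2, 3], [6, 4, 2]))  # -1.0
```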

In the graph of the simple linear regression equation, the parameter β0 represents the ___________ of the true regression line.

y-intercept

A __________ determines how far a particular value is from the mean relative to the data set's standard deviation.

z-score

