BUSI 3304: Midterm Review


Identify the shape of the distribution in the figure below. (Figure: histogram; bar heights in order: medium, high, medium, lower, lower, lower, low.)

Skewed right

Fields may be chosen to represent all of the following except _____ in the body of a PivotTable. a.columns b.values c.filters d.rows

c. filters

The amount of time taken by each of 10 students in a class to complete an exam is an example of what type of data?

quantitative

Data collected from several entities over a period of time (minutes, hours, days, etc.) are called _____.

time-series data

A quantity of interest that can take on different values is known as a(n) _____.

variable

The difference in a variable measured over observations (time, customers, items, etc.) is known as _____.

variation

Which of the following is not an approach to making decisions?

Guess and check

Data dashboards are a type of _____ analytics.

Descriptive

_____ is the most critical step of the decision-making process.

Identifying and defining the problem

A Type I error is committed when _____. a.a true null hypothesis is rejected b.the validity of a claim was rejected c.a true alternative hypothesis is not accepted d.the critical value is greater than the value of the test statistic

a true null hypothesis is rejected

_____ refers to the number of times a collection of items occurs together in a transaction data set. a.Support count b.Validation count c.A consequent d.Antecedent

a.Support count

A useful chart for displaying multiple variables is the _____. a.scatter chart matrix b.scatter chart c.two-dimensional graph d.stacked column and bar chart

a.scatter chart matrix

The process of extracting useful information from text data is known as _____. a.text mining b.corpus c.stemming d.tokenization

a.text mining

The basis for using a normal probability distribution to approximate the sampling distribution of the sample means and population mean is _____. a.the central limit theorem b.the empirical rule c.Bayes' theorem d.Chebyshev's theorem

a.the central limit theorem

A _____ is used to visualize sample data graphically and to draw preliminary conclusions about the possible relationship between the variables. a.Gantt chart b.scatter chart c.contingency table d.pie chart

b. scatter chart

_____ is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters. a.Average group linkage b.Average linkage c.Complete linkage d.Single linkage

c.Complete linkage

A regression analysis involving one independent variable and one dependent variable is referred to as a _____. a.factor analysis b.data mining c.simple linear regression d.time series analysis

c.simple linear regression

A procedure for using sample data to find the estimated regression equation is _____. a.point estimation b.extrapolation c.the least squares method d.interval estimation

c.the least squares method
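As a rough illustration of the least squares method, the Python sketch below fits ŷ = b0 + b1x by minimizing the sum of squared residuals, using the closed-form formulas b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1x̄. It also computes the coefficient of determination, R² = 1 - SSE/SST, the goodness-of-fit measure that appears later in this review. The five (x, y) pairs are made-up example data, and the function names are hypothetical.

```python
def least_squares(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def r_squared(x, y, b0, b1):
    """Coefficient of determination: 1 - SSE/SST."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


x = [1, 2, 3, 4, 5]  # made-up independent variable values
y = [2, 4, 5, 4, 5]  # made-up dependent variable values
b0, b1 = least_squares(x, y)
r2 = r_squared(x, y, b0, b1)
print(f"y-hat = {b0:.2f} + {b1:.2f}x, R^2 = {r2:.2f}")
```

With this data the fit is ŷ = 2.2 + 0.6x and R² = 0.6, so the line explains about 60% of the variation in y.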

The _____ shows the number of data items with values less than or equal to the upper-class limit of each class.

cumulative frequency distribution

A null and alternative hypothesis for a one proportion z test are given as H0: p = 0.8, Ha: p < 0.8. This hypothesis test is _____. a.upper-tailed b.incorrectly stated c.two-tailed d.lower-tailed

d.lower-tailed The test is lower-tailed (left-tailed) because the alternative hypothesis states p < 0.8, so the rejection region lies in the lower tail. If the alternative were p > 0.8, the test would be upper-tailed (right-tailed).

In k -means clustering, k represents the _____. a.number of variables b.mean of the cluster c.number of observations in a cluster d.number of clusters

d.number of clusters

Making visual comparisons between categorical variables may be difficult in a _____. a.line chart b.column chart c.scatter chart d.pie chart

d.pie chart

Tactical decisions are concerned with _____.

how the organization should achieve the goals and objectives set by its strategy

Data sets commonly include observations with missing values for one or more variables. In some cases missing data naturally occur; these are called _____.

legitimately missing data

If the covariance between two variables is near 0, it implies that ______.

the variables are not linearly related
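The point that a covariance near 0 rules out only a linear relationship can be checked with a small Python sketch (illustrative data; the function names are hypothetical). Here y = x² depends on x perfectly, yet its sample covariance with x is exactly 0, while a straight-line relationship gives a correlation coefficient of 1, the upper end of the -1 to +1 range the correlation coefficient always falls in.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    sx = sample_covariance(x, x) ** 0.5
    sy = sample_covariance(y, y) ** 0.5
    return sample_covariance(x, y) / (sx * sy)


x = [-2, -1, 0, 1, 2]
y_curve = [xi ** 2 for xi in x]    # perfect curved (nonlinear) relationship
y_line = [3 * xi + 1 for xi in x]  # perfect linear relationship

print(sample_covariance(x, y_curve))  # 0.0
print(correlation(x, y_line))         # approximately 1.0
```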

Simulation optimization helps _____.

to find good decisions in highly complex and highly uncertain settings.

Scores on Ms. Nash's test have a mean of 64 and a standard deviation of 9. Steve has a score of 52. Convert Steve's score to a z-score. (Round to two decimal places if necessary.)

-1.33 z = (x - mean)/standard deviation = (52 - 64)/9 ≈ -1.33. In Excel: =STANDARDIZE(x, mean, standard_dev), i.e., =STANDARDIZE(52, 64, 9)
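The same z-score calculation in Python, as a minimal sketch; standardize is a hypothetical helper that performs the same arithmetic as Excel's STANDARDIZE. The last line applies the rule, given later in this review, that a z-score below -3 or above +3 marks an outlier.

```python
def standardize(x, mean, sd):
    """z-score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


z = standardize(52, 64, 9)
print(round(z, 2))  # -1.33

is_outlier = abs(z) > 3  # outlier rule: |z| > 3
print(is_outlier)        # False
```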

Below is a histogram for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. What is the relative frequency of the 21-24 bin?

.25

The CEO of a company wants to estimate the percent of employees that use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged onto Facebook that day. What is the point estimate of the proportion of the population that logged onto Facebook that day? a.0.25 b.0.35 c.0.53 d.0.65

b. 0.35 (53/150 ≈ 0.35)

Compute the geometric mean for the following data on growth factors of an investment over 10 years. 1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22 (Enter the data in Excel and use the formula to get the answer.)

1.0148 =GEOMEAN()
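A short Python cross-check of the Excel GEOMEAN result, computed two ways: directly from the definition (the 10th root of the product of the 10 growth factors) and with the standard library's statistics.geometric_mean (Python 3.8 and later).

```python
import math
from statistics import geometric_mean

growth = [1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22]

# Definition: the n-th root of the product of the n values
gm_manual = math.prod(growth) ** (1 / len(growth))
gm_stdlib = geometric_mean(growth)

print(round(gm_manual, 4))  # 1.0148
```

A geometric mean of about 1.0148 corresponds to an average growth rate of roughly 1.48% per year over the 10 years.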

Compute the 50th percentile for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22

15.5 =PERCENTILE.EXC(array, .5)
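Excel's PERCENTILE.EXC locates the p-th percentile at position p(n + 1) in the sorted data and interpolates linearly between neighboring values. The Python function below is a sketch of that rule (the name percentile_exc is hypothetical, and the out-of-range check is our assumption about when Excel returns #NUM!). For these 10 values the 50th percentile sits at position 0.5 × 11 = 5.5, halfway between 15 and 16. The same data set reappears in the third-quartile question, where position 0.75 × 11 = 8.25 gives 21 + 0.25 × (22 - 21) = 21.25.

```python
import math


def percentile_exc(data, p):
    """Sketch of Excel's PERCENTILE.EXC: rank p*(n+1), linear interpolation."""
    x = sorted(data)
    n = len(x)
    rank = p * (n + 1)  # 1-based position in the sorted data
    if rank < 1 or rank > n:
        raise ValueError("p is outside the range PERCENTILE.EXC allows")
    k = math.floor(rank)
    frac = rank - k
    if frac == 0:
        return x[k - 1]
    # Move frac of the way from the k-th value to the (k+1)-th value
    return x[k - 1] + frac * (x[k] - x[k - 1])


data = [10, 15, 17, 21, 25, 12, 16, 11, 13, 22]
print(percentile_exc(data, 0.50))  # 15.5
print(percentile_exc(data, 0.75))  # 21.25
```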

Compute the coefficient of variation for the following sample data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37 (Use Excel's Descriptive Statistics tool to obtain the required measures.)

≈ 21.37% (21.3695%; x̄ = 31.6, s ≈ 6.753, CV = s/x̄ × 100%) =STDEV.S(B23:B32)/AVERAGE(B23:B32)
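A Python sketch of the coefficient of variation, CV = (s / x̄) × 100%, using the sample standard deviation (statistics.stdev divides by n - 1, like Excel's STDEV.S). Dividing by n instead (a population standard deviation) would give a smaller CV, so the choice matters when matching an answer key.

```python
from statistics import mean, stdev

data = [32, 41, 36, 24, 29, 30, 40, 22, 25, 37]
x_bar = mean(data)  # 31.6
s = stdev(data)     # sample standard deviation (n - 1 in the denominator)
cv = s / x_bar

print(f"mean = {x_bar}, s = {s:.4f}, CV = {cv:.2%}")
```

The result rounds to 21.37%.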

In order to determine an interval for the mean of a population with unknown standard deviation, a sample of 24 items is selected. The mean of the sample is determined to be 23. The number of degrees of freedom for reading the t value is _____. a.24 b.21 c.22 d.23

d.23 Degrees of freedom are the number of independent values that are free to vary when a statistic is computed. For a t interval for a mean, df = n - 1 = 24 - 1 = 23.

Compute the median of the following data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37

31 =MEDIAN()

In a survey of patients in a local hospital, 62.42% of the respondents indicated that the health care providers needed to spend more time with each patient. Who makes up the sample?

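The patients who responded to the survey (the answer "All patients in a local hospital" was marked INCORRECT; that describes the population, not the sample)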

Which of the following best exemplifies big data?

Cellphone owners around the world generate vast amounts of data by calling, texting, tweeting, and browsing the Web on a daily basis.

_____ is the process of removing variables from the analysis without losing crucial information.

Dimension reduction

Which of the following sources of big data is not publicly available?

Medical records

_____ analytics are techniques that use models, constructed from past data, to predict the future or to ascertain the impact of one variable on another.

Predictive

_____ analytics use techniques that take input data and yield a best course of action.

Prescriptive

A _____ decision is concerned with how the organization should achieve the goals and objectives set by its strategy.

Tactical

You have been asked to reorganize the Excel table below into order of sales using the Sales column. Which option will allow you to do this quickly?

Use the Sort function to organize the data into order of sales.

Using an α = 0.04, a confidence interval for a population proportion is determined to be 0.65 to 0.75. If the level of significance is decreased, the interval for the population proportion _____. a.does not change b.remains the same c.becomes wider d.becomes narrower

c.becomes wider
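Why a smaller α widens the interval can be sketched numerically. A proportion interval has margin of error z_{α/2} × sqrt(p̂(1 - p̂)/n), so with p̂ and n unchanged the margin scales with z_{α/2}. The Python sketch below takes the interval 0.65 to 0.75 (p̂ = 0.70, margin 0.05 at α = 0.04) and rescales the margin for α = 0.01 using statistics.NormalDist (Python 3.8+). It assumes the original interval was built with a z critical value; the helper names are hypothetical.

```python
from statistics import NormalDist


def z_critical(alpha):
    """Two-sided critical value z_{alpha/2} of the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)


p_hat = 0.70      # midpoint of the interval 0.65 to 0.75
margin_04 = 0.05  # half-width of that interval at alpha = 0.04


def margin_for(alpha):
    """Margin of error at a new alpha, holding p_hat and n fixed."""
    return margin_04 * z_critical(alpha) / z_critical(0.04)


for alpha in (0.04, 0.01):
    m = margin_for(alpha)
    print(f"alpha = {alpha}: {p_hat - m:.3f} to {p_hat + m:.3f}")
```

Lowering α from 0.04 to 0.01 raises the critical value from about 2.054 to about 2.576, widening the interval to roughly 0.637 to 0.763.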

Prediction of the value of the dependent variable outside the experimental region is called _____. a.interpolation b.averaging c.extrapolation d.forecasting

c.extrapolation

The process of dividing text into separate terms is referred to as _____. a.stemming b.stacking c.tokenization d.data cleaning

c.tokenization
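Tokenization can be illustrated with a deliberately simple Python sketch: lowercase the text and split it into runs of letters, digits, and apostrophes, discarding punctuation. Real text-mining tools use more careful rules; the regular expression here is an assumption chosen for brevity. The tokens "battery"/"batteries" and "die"/"died" stay distinct; reducing them to common roots is the separate step called stemming.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


doc = "The battery died fast. Batteries die; this one DIED!"
print(tokenize(doc))
# ['the', 'battery', 'died', 'fast', 'batteries', 'die', 'this', 'one', 'died']
```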

In the text mining process, the text is first preprocessed by deriving a smaller set of _____ from the larger set of words contained in a collection of documents. a.stems b.stack c.tokens d.terms

c.tokens

The letter grades (A, B, C, D, F) of business analysis students are recorded by a professor. This variable is classified as _____.

categorical data

When working with data sets in Excel, _____ can be used to automatically highlight cells that meet specified requirements.

conditional formatting

_____ approaches are designed to describe patterns and relationships in large data sets with many observations of many variables. a.Data sampling b.Dimension reduction c.Data mining d.Unsupervised learning

d.Unsupervised learning

______ refers to the data set used to compare model forecasts and ultimately pick a model for predicting values of the dependent variable. a.Training set b.Codomain c.Range d.Validation set

d.Validation set

To identify patterns across transactions, we can use _____. a.k-means b.centroid linkage c.complete linkage d.association rules

d.association rules

The variance is based on the _____.

deviations about the mean (specifically, the squared deviations of the data values from the mean)

The coefficient of determination _____. a.takes values between -1 to +1 b.is equal to negative one for the poorest fit c.is equal to zero for a perfect fit d.is used to evaluate the goodness of fit

is used to evaluate the goodness of fit

The following image is a _____. a.sparkline b.gridline c.trendline d.line chart

line chart

Which one of the following is used in predictive analytics?

linear regression

You are _____ to commit a Type I error using the 0.05 level of significance than using the 0.01 level of significance. a.twice as likely b.equally likely c.more likely d.less likely

more likely

Any data value with a z-score less than -3 or greater than +3 is considered to be a(n) _____.

outlier

Two approaches to drawing a conclusion in a hypothesis test are _____. a.null and alternative b.one-tailed and two-tailed c.p-value and critical value d.Type I and Type II

p-value and critical value. p-value approach: if the p-value is less than the level of significance α, reject the null hypothesis; otherwise, do not reject it. Critical value approach: reject the null hypothesis if the test statistic falls in the rejection region set by the critical value (for an upper-tailed test, if the test statistic is greater than the critical value; for a lower-tailed test, if it is less than the critical value); otherwise, do not reject it.

A forecast that helps direct police officers to areas where crimes are likely to occur based on past data is an example of _____.

predictive analytics

In the financial sector, _____ are used to construct financial instruments such as derivatives.

predictive models

Which of the following analytical techniques helps us arrive at the best decision?

prescriptive

A _____ decision involves higher-level issues and is concerned with the overall direction of the organization, defining the overarching goals and aspirations for the organization's future.

strategic

Picks and Axes Inc. is an Internet-based retail seller of hiking boots and mountaineering gear. The company decides to open retail stores across the major areas of the city to help complement its Internet-based strategy. This activity would be categorized as a(n) _____.

strategic decision

A sample of 37 AA batteries had a mean lifetime of 584 hours. A 95% confidence interval for the population mean was 579.2 < μ < 588.8. Which statement is the correct interpretation of the results? a.The probability that the population mean is between 579.2 hours and 588.8 hours is 0.95. b.None of these statements correctly interpret the results. c.We are 95% confident that the mean lifetime of all the batteries in the population is between 579.2 hours and 588.8 hours. d.95% of the batteries in the sample had lifetimes between 579.2 hours and 588.8 hours.

We are 95% confident that the mean lifetime of all the batteries in the population is between 579.2 hours and 588.8 hours.

Which of the following graphs cannot be used to display categorical data? a.Stacked-column chart b.Scatter chart c.Pie chart d.Clustered-column chart

b.Scatter chart

The scatter chart below (panel a) displays the residuals versus the independent variable, x. Which of the following conclusions can be drawn from the scatter chart? a.The residual distribution is consistently scattered about zero. b.The residuals have an increasing variance as the independent variable increases. c.The model captures the relationship between the variables accurately. d.The regression model follows the standard normal probability distribution.

b.The residuals have an increasing variance as the independent variable increases.

A data visualization tool that updates in real time and gives multiple outputs is called _____. a.a data table b.a data dashboard c.the GIS d.a metrics table

b.a data dashboard

A variable used to model the effect of categorical independent variables in a regression model which generally takes only the value zero or one is called _____. a.a residual b.a dummy variable c.interaction d.the coefficient of determination

b.a dummy variable

In interval estimation, as the sample size becomes larger, the interval estimate _____. a.becomes wider b.becomes narrower c.gets closer to 1.96 d.remains the same, since the mean is not changing

b.becomes narrower

As the number of degrees of freedom for a t distribution increases, the difference between the t distribution and the standard normal distribution _____. a.fluctuates b.becomes smaller c.becomes larger d.stays the same

b.becomes smaller

_____ refers to the use of sample data to calculate a range of values that is believed to include the value of the population parameter. a.Hypothesis testing b.Point estimation c.Interval estimation d.Statistical inference

c.Interval estimation

_____ is used to test the hypothesis that the values of the regression parameters β1, β2, ..., βq are all zero. a.A t test b.Extrapolation c.The least squares method d.An F test

d.An F test

Which one of the following statements is not true concerning PivotTables in Excel? a.PivotTables are interactive. b.PivotTables are also known as crosstabulation tables. c.PivotTables summarize data for two variables. d.PivotTables can be built using data arrayed in rows.

d.PivotTables can be built using data arrayed in rows.

Which of the following regression models is used to model a nonlinear relationship between the independent and dependent variables by including the independent variable and the square of the independent variable in the model? a.Simple regression model b.Least squares regression model c.Multiple regression model d.Quadratic regression model

d.Quadratic regression model

In order to manage an organization's human resource activities, such as hiring employees, tracking, and influencing employee retention, HR personnel use _____.

descriptive and predictive analytics

A simple random sample of 31 observations was taken from a large population. The sample mean equals 5. Five is a _____. a.point estimate b.population parameter c.population mean d.standard error

point estimate

Advanced analytics generally refers to _____.

predictive and prescriptive analytics

The act of collecting data that are representative of the population data is called _____.

random sampling

The College Board originally scaled SAT scores so that the scores for each section were approximately normally distributed with a mean of 500 and a standard deviation of 100. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored greater than 700.

2.5% The empirical rule for a bell-shaped distribution states that about 68% of values lie within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3. A score of 700 is (700 - 500)/100 = 2 standard deviations above the mean. Since about 95% of scores fall between 300 and 700, the remaining 5% are split evenly between the two tails, so about 5%/2 = 2.5% of students scored greater than 700. (Equivalently, about 2.35% of scores lie between 2 and 3 standard deviations above the mean and about 0.15% lie beyond 3, and 2.35% + 0.15% = 2.5%.)
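The empirical rule is an approximation. A Python sketch using statistics.NormalDist (Python 3.8+) compares it with the exact upper-tail area of a normal distribution with mean 500 and standard deviation 100.

```python
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=100)
z = (700 - 500) / 100  # 2 standard deviations above the mean

empirical = (1 - 0.95) / 2      # empirical-rule approximation
exact = 1 - scores.cdf(700)     # exact normal upper-tail area

print(f"z = {z}, empirical rule: {empirical:.1%}, exact normal: {exact:.2%}")
```

The exact normal answer is about 2.28%, close to the empirical rule's 2.5%; the question asks for the empirical-rule value.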

Compute the third quartile for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22

21.25 =PERCENTILE.EXC(array,.75)

A retail store owner offers a discount on product A and predicts that the customers would purchase products B and C in addition to product A. Identify the technique used to make such a prediction.

Data Mining

_____ is a method of calculating dissimilarity between clusters by calculating the distance between the centroids of the two clusters. a.Centroid linkage b.Average linkage c.Single linkage d.Complete linkage

a.Centroid linkage

Natalie needs to compare the number of employees by job title for the last five years. Which of the following charts should Natalie use? a.Clustered-column (bar) chart b.Scatter chart c.Line chart d.Bubble chart

a.Clustered-column (bar) chart

A large manufacturing plant has analyzed the amount of time required to produce an electrical part and determined that the times follow a normal distribution with mean time μ = 45 hours. The production manager has developed a new procedure for producing the part. He believes that the new procedure will decrease the population mean amount of time required to produce the part. After training a group of production line workers, a random sample of 25 parts will be selected and the average amount of time required to produce them will be determined. If the switch is made to the new procedure, the cost to implement the new procedure will be more than offset by the savings in manpower required to produce the parts. Use the hypotheses: H0: μ ≥ 45 hours and Ha: μ < 45 hours. If the sample mean amount of time is x̄ = 43.118 hours with a sample standard deviation of s = 5.5 hours, give the appropriate conclusion for α = 0.025. a.Do not reject H0, do not switch to the new procedure. b.Reject H0, do not switch to the new procedure. c.Reject H0, switch to the new procedure. d.Do not reject H0, switch to the new procedure.

a.Do not reject H0, do not switch to the new procedure. Test statistic: t = (x̄ - μ0)/(s/√n) = (43.118 - 45)/(5.5/√25) = -1.882/1.1 ≈ -1.711. Critical value approach: with n - 1 = 24 degrees of freedom, the lower-tail critical value for α = 0.025 is -t0.025 = -2.064. Because -1.711 > -2.064, the test statistic is not in the rejection region, so do not reject H0. p-value approach: P(t < -1.711) with 24 df ≈ 0.05, which is greater than α = 0.025, so again do not reject H0.
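The test statistic and critical-value comparison for this question can be reproduced in a few lines of Python. The critical value -2.064 is hard-coded from a t table (24 degrees of freedom, lower-tail area 0.025) rather than computed, since the standard library has no t distribution; the variable names are illustrative.

```python
import math

mu0 = 45.0     # hypothesized mean (H0: mu >= 45)
x_bar = 43.118
s = 5.5
n = 25
alpha = 0.025

t_stat = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1
# Lower-tail critical value -t_{0.025} with 24 df, read from a t table
# (hard-coded here, not computed).
t_crit = -2.064

reject_h0 = t_stat < t_crit  # lower-tailed test: reject only in the left tail
print(f"t = {t_stat:.3f}, df = {df}, critical value = {t_crit}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```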

The owners of a fast food restaurant have automatic drink dispensers to help fill orders more quickly. When the 12 ounce button is pressed, they would like for exactly 12 ounces of beverage to be dispensed. There is, however, some variation in this amount. The company does not want the machine to systematically over fill or under fill the cups. Which of the following gives the correct set of hypotheses? a.H0: μ = 12, Ha: μ ≠ 12 b.H0: μ ≥ 12, Ha: μ < 12 c.H0: μ > 12, Ha: μ < 12 d.H0: μ ≤ 12, Ha: μ > 12

a.H0: μ = 12, Ha: μ ≠ 12

In preparing categorical variables for analysis, it is usually best to _____. a.convert the categories to binary, dummy variables b.combine as many categories as possible c.let them remain categorical d.convert the categories to numeric representations

a.convert the categories to binary, dummy variables

Assessing the regression model on data other than the sample data that was used to generate the model is known as _____. a.cross-validation b.graphical validation c.postulation d.approximation

a.cross-validation

In a linear regression model, the variable that is being predicted or explained is known as _____. It is denoted by y and is often referred to as the response variable. a.dependent variable b.residual variable c.independent variable d.regression variable

a.dependent variable

Consider the clustered bar chart of the dashboard developed to monitor the performance of a call center: This chart allows the IT manager to _____. a.identify the frequency of a particular type of problem by location b.identify the percent of customers who do not have one of the listed problems c.identify which city contains the most customers d.identify how often a problem is related to hardware

a.identify the frequency of a particular type of problem by location

Deleting the grid lines in a table and the horizontal lines in a chart ______. a.increases the data-ink ratio b.does not affect the data-ink ratio c.increases the non-data-ink ratio d.decreases the data-ink ratio

a.increases the data-ink ratio

Data-ink is the ink used in a table or chart that _____. a.is necessary to convey the meaning of the data to the audience b.does not help in conveying the data to the audience c.helps in presenting data when the audience need not know exact values d.increases the non-data-ink ratio

a.is necessary to convey the meaning of the data to the audience

In a business, the values indicating the business's current operating characteristics, such as its financial position, the inventory on hand, and customer service metrics, are typically known as _____. a.key performance indicators b.business performance indicators c.company performance indicators d.performance indicators

a.key performance indicators

Regression analysis involving one dependent variable and more than one independent variable is known as ____. a.multiple regression b.linear regression c.None of these are correct. d.simple regression

a.multiple regression (also known as multiple linear regression)

A random sample selected from an infinite population is a sample selected such that each element selected comes from the same _____ and each element is selected _____. a.population; independently b.sample; simultaneously c.sample; independently d.population; simultaneously

a.population; independently

One reason a sample may fail to represent the population of interest is _____. a.sampling error b.statistical inference c.measurement error d.population proportion

a.sampling error

When the expected value of the point estimator is equal to the population parameter it estimates, it is said to be _____. a.unbiased b.predicted c.symmetric d.precise

a.unbiased

Which statement is NOT true? a.The probability of making a Type I error is symbolized by α. b.Failing to reject the null hypothesis when it is false is a Type I error. c.Rejecting the null hypothesis when it is true is a Type I error. d.Type II error can occur for both one and two-tailed tests.

b.Failing to reject the null hypothesis when it is false is a Type I error.

A student wants to determine if pennies are really fair when flipped, meaning equally likely to land heads up or tails up. He flips a random sample of 50 pennies and finds that 28 of them land heads up. If p denotes the true probability of a penny landing heads up when flipped, what are the appropriate null and alternative hypotheses? a.H0: p ≥ 0.5, Ha: p < 0.5. b.H0: p = 0.5, Ha: p ≠ 0.5. c.H0: p ≥ 28, Ha: p < 28. d.H0: p ≤ 0.5, Ha: p ≠ 0.5.

b.H0: p = 0.5, Ha: p ≠ 0.5. This is a two-tailed test: a fair penny has p = 0.5, and a departure in either direction (more heads or more tails) would show that the penny is not fair.

A pizza shop advertises that they deliver in 30 minutes or less or it is free. People who live in homes that are located on the opposite side of town believe it will take the pizza shop longer than 30 minutes to make and deliver the pizza. Write the null and alternative hypotheses that can be used to conduct a significance test. a.H0: μ > 30, Ha: μ < 30 b.H0: μ ≤ 30, Ha: μ > 30 c.H0: μ ≥ 30, Ha: μ < 30 d.H0: μ < 30, Ha: μ > 30

b.H0: μ ≤ 30, Ha: μ > 30 The pizza shop claims it delivers in 30 minutes or less (μ ≤ 30). The customers want to test whether delivery takes longer than 30 minutes (μ > 30). So the null hypothesis is μ ≤ 30 and the alternative hypothesis is μ > 30.

The ratio of the amount of ink used in a table or chart that is necessary to convey information to the total amount of ink used in the table and chart is known as data-ink ratio. Using additional ink that is not necessary to convey information has what effect on the data-ink ratio? a.It doesn't change the data-ink ratio. b.It reduces the data-ink ratio. c.The data-ink ratio becomes zero. d.It increases the data-ink ratio.

b.It reduces the data-ink ratio.

To summarize and analyze data with both a crosstabulation and charting, Excel typically pairs _____. a.heat maps with trendlines b.PivotCharts with PivotTables c.bubble charts with trendlines d.stacked column charts with PivotTables

b.PivotCharts with PivotTables

Average linkage is a measure of calculating dissimilarity between two clusters by _____. a.finding the distance between the two most dissimilar observations in the two clusters b.computing the average distance between every pair of observations between two clusters c.computing the distance between the cluster centroids d.finding the distance between the two closest observations in the two clusters

b.computing the average distance between every pair of observations between two clusters
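The four linkage measures defined in this review (single, complete, average, and centroid) differ only in how they summarize the distances between two clusters. The Python sketch below computes each one with Euclidean distance for two small made-up clusters in two dimensions; the function names are hypothetical, and math.dist requires Python 3.8+.

```python
import math
from itertools import product


def single_linkage(a, b):
    """Distance between the two closest observations (one from each cluster)."""
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    """Distance between the two most dissimilar observations."""
    return max(math.dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    """Average distance over every pair of observations across the clusters."""
    pairs = list(product(a, b))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def centroid(cluster):
    """Coordinate-wise mean of the observations in a cluster."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def centroid_linkage(a, b):
    """Distance between the two cluster centroids."""
    return math.dist(centroid(a), centroid(b))


cluster_a = [(0, 0), (0, 2)]  # made-up observations
cluster_b = [(3, 0), (5, 0)]

for name, fn in [("single", single_linkage), ("complete", complete_linkage),
                 ("average", average_linkage), ("centroid", centroid_linkage)]:
    print(f"{name:>8}: {fn(cluster_a, cluster_b):.3f}")
```

For these clusters, single linkage is 3.0 (closest pair), complete linkage is √29 ≈ 5.385 (most dissimilar pair), average linkage is about 4.248, and centroid linkage is √17 ≈ 4.123.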

A collection of text documents to be analyzed is called a _____. a.book b.corpus c.consequent d.library

b.corpus

Jaccard's coefficient is different from the matching coefficient in that the former _____. a.deals with categorical variable while the latter deals with continuous variables b.does not count matching zero entries while the latter does c.is affected by the scale used to measure variables while the latter is not d.measures overlap while the latter measures dissimilarity

b.does not count matching zero entries while the latter does
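The difference between the two similarity measures shows up when two observations share many 0 entries. The Python sketch below (hypothetical helper names; each observation is a list of 0-1 dummy variables) counts 0-0 agreements in the matching coefficient but ignores them in Jaccard's coefficient. The matching coefficient is also the simplest similarity measure for observations described only by dummy variables, as another question in this review notes.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    return sum(ui == vi for ui, vi in zip(u, v)) / len(u)


def jaccard_coefficient(u, v):
    """1-1 matches divided by the variables where at least one entry is 1.

    0-0 matches are ignored; undefined if both observations are all zeros.
    """
    both = sum(ui == 1 and vi == 1 for ui, vi in zip(u, v))
    either = sum(ui == 1 or vi == 1 for ui, vi in zip(u, v))
    if either == 0:
        raise ValueError("Jaccard's coefficient is undefined for two all-zero observations")
    return both / either


u = [1, 0, 0, 1, 0, 0]
v = [1, 0, 0, 0, 0, 1]
print(f"matching = {matching_coefficient(u, v):.3f}")  # 0.667
print(f"Jaccard  = {jaccard_coefficient(u, v):.3f}")   # 0.333
```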

The _____ the lift ratio, the _____ the association rule. a.higher; weaker b.higher; stronger c.lower; weaker d.lower; stronger

b.higher; stronger
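A small Python sketch of support count, confidence, and lift ratio for association rules; the five transactions and the helper names are invented for illustration. Lift compares a rule's confidence with the confidence expected if the antecedent and consequent were independent (the consequent's overall support), so a lift above 1 suggests a stronger-than-chance association and a lift below 1 a weaker one.

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def support_count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions)


def lift(antecedent, consequent):
    """Confidence of the rule divided by the consequent's support."""
    n = len(transactions)
    confidence = support_count(antecedent | consequent) / support_count(antecedent)
    benchmark = support_count(consequent) / n  # confidence if independent
    return confidence / benchmark


print(f"lift(bread -> butter) = {lift({'bread'}, {'butter'}):.3f}")
print(f"lift(milk -> butter)  = {lift({'milk'}, {'butter'}):.3f}")
```

Here {bread} → {butter} has lift 1.25 while {milk} → {butter} has lift 0.625, so the bread rule is the stronger association.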

An estimate of a population parameter that provides an interval of values believed to contain the value of the parameter is known as the _____. a.confidence level b.interval estimate c.population estimate d.parameter level

b.interval estimate The other options do not fit: a point estimate (such as a sample mean) is a single value rather than an interval; the confidence level is the percentage of interval estimates, over repeated samples, that would contain the parameter; a population estimate usually refers to estimating a population's size; and a parameter is a characteristic of the population, as opposed to a statistic computed from a sample.

The best way to differentiate chart elements is by using _____. a.colors b.labels c.chart titles d.bubbles

b.labels

A time series plot is also known as a _____. a.boxplot b.line chart c.frequency graph d.dot plot

b.line chart

The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation is known as the _____. a.error term b.residual c.model parameter d.constant term

b.residual

A _____ is a line that provides an approximation of the relationship between the variables. a.line chart b.trendline c.sparkline d.gridline

b.trendline

The processes that generate big data can be described by the following four attributes or dimensions: a.tall data, wide data, narrow data, and big data b.volume, variety, veracity, and velocity c.volume, variability, veracity, and velocity d.variety, vectors, veracity, and velocity

b.volume, variety, veracity, and velocity

A better understanding of consumer behavior through analytics directly leads to _____.

better pricing strategies

The correlation coefficient will always take values _____.

between -1 and +1

Fitting a model too closely to sample data, resulting in a model that does not accurately reflect the population is termed as _____. a.hypothesizing b.approximation c.overfitting d.postulating

c. overfitting

Suppose we have a data set from a call center where customers were asked to choose among the following three options: hear account information, billing questions, and customer service. Using the given order of the three options and 0-1 dummy variables to encode the categorical variable, which of the following combinations would yield an entry of "customer service"? a.010 b.100 c.001 d.000

c.001
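One way to produce these codes is sketched below in Python: each option gets its own 0-1 digit in the given order, and the digit for the chosen option is 1. The option list and function name are illustrative. When dummy variables go into a regression model, one category is usually left out (k - 1 dummies for k categories) so the columns are not perfectly collinear; this sketch keeps all three to match the question.

```python
OPTIONS = ["hear account information", "billing questions", "customer service"]


def encode(choice):
    """Return a 0-1 dummy code string, one digit per option, in OPTIONS order."""
    return "".join("1" if choice == option else "0" for option in OPTIONS)


print(encode("customer service"))   # 001
print(encode("billing questions"))  # 010
```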

The t value for a 99% confidence interval estimation based upon a sample of size 10 is _____. a.1.645 b.1.812 c.3.249 d.2.576

c.3.249 For a 99% confidence level, α = 1 - 0.99 = 0.01 and α/2 = 0.005. With n = 10, the degrees of freedom are n - 1 = 9, and a t table gives t0.005 with 9 df ≈ 3.250 (3.2498), which matches option c.

This bar chart displays the demographics of a Business Analysis class. How many male students are in the class? a.130 b.30 c.50 d.80

c.50

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance. a.72.28 b.88.57 c.75.39 d.66.21

c.75.39 =SQRT((25-53)^2 + (350-420)^2)
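The same Euclidean distance calculation in Python, as a minimal sketch using math.dist (Python 3.8+). Because the dollar values vary far more than the ages, spending dominates this distance; in practice the variables are often standardized (for example, converted to z-scores) before distances are computed.

```python
import math

u = (25, 350)  # (age, dollars spent)
v = (53, 420)

d = math.dist(u, v)  # sqrt((25 - 53)**2 + (350 - 420)**2) = sqrt(5684)
print(round(d, 2))   # 75.39
```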

_____ refers to the scenario in which the relationship between the dependent variable and one independent variable is different at different values of a second independent variable. a.Autocorrelation b.Covariance c.Interaction d.Multicollinearity

c.Interaction

Which statement is true of an association rule? a.It seeks to classify a categorical outcome into two or more categories. b.It is a data reduction technique that reduces large information into smaller homogeneous groups. c.It is ultimately judged on how actionable it is and how well it explains the relationship between item sets. d.It uses analytic models to describe the relationship between metrics that drive business performance.

c.It is ultimately judged on how actionable it is and how well it explains the relationship between item sets.

DJ needs to display data over time. Which of the following charts should he use? a.Scatter chart b.Bar chart c.Line chart d.Pie chart

c.Line chart

_____ refers to the degree of correlation among independent variables in a regression model. a.Tolerance b.Rank c.Multicollinearity d.Confidence level

c.Multicollinearity

_____ is a statistical procedure used to develop an equation showing how two variables are related. a.Data mining b.Factor analysis c.Regression analysis d.Time series analysis

c.Regression analysis

The scatter chart below (panel b) displays the residuals versus the independent variable, x. Which of the following conclusions can be drawn based upon this scatter chart? a.The model overpredicts the value of the dependent variable for small values and large values of the independent variable. b.The residuals are normally distributed. c.The model fails to capture the relationship between the variables accurately. d.The residuals have a constant variance.

c.The model fails to capture the relationship between the variables accurately.

Larger values of α have the disadvantage of increasing the probability of making a _____. a.random sampling error b.normal probability error c.Type I error d.Type II error

c.Type I error H0 is rejected when the p-value is less than the level of significance α. A larger α makes it easier for the p-value to fall below it, so a true null hypothesis is rejected more often. Rejecting a true null hypothesis is a Type I error, and α is the probability of making one.

In which of the following scenarios would it be appropriate to use hierarchical clustering? a.When the number of clusters is known beforehand b.When the number of observations in the dataset is relatively high c.When binary or ordinal data needs to be clustered d.When it is not necessary to know the nesting of clusters

c.When binary or ordinal data needs to be clustered

Bar charts use _____. a.vertical bars to display the magnitude of the categorical variable b.horizontal and vertical bars to display the magnitude of the quantitative variable c.horizontal bars to display the magnitude of the quantitative variable d.vertical bars to display the magnitude of the quantitative variable

c.horizontal bars to display the magnitude of the quantitative variable

An analysis of items frequently co-occurring in transactions is known as _____. a.regression analysis b.cluster analysis c.market basket analysis d.market segmentation

c.market basket analysis

Single linkage can be used to measure the distance between clusters that are the _____ in cluster analysis. a.closest b.most different c.most similar d.farthest apart

c.most similar
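
Single linkage defines the distance between two clusters as the distance between their closest (most similar) pair of observations, one taken from each cluster. The sketch below uses hypothetical two-dimensional observations and Euclidean distance, via `math.dist` (Python 3.8+). For contrast, it also shows complete linkage, which uses the farthest pair instead.

```python
import math

def single_linkage(cluster_a, cluster_b):
    """Distance between the closest pair of observations across the two clusters."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

def complete_linkage(cluster_a, cluster_b):
    """Distance between the farthest pair of observations across the two clusters."""
    return max(math.dist(p, q) for p in cluster_a for q in cluster_b)

# Hypothetical clusters of (x, y) observations
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(4, 0), (6, 0)]

print(single_linkage(cluster_a, cluster_b))    # 3.0
print(complete_linkage(cluster_a, cluster_b))  # 6.0
```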

The degree of correlation among independent variables in a regression model is called _____. a.interaction b.the coefficient of determination c.multicollinearity d.the sum of squared errors (SSE)

c.multicollinearity

Euclidean distance can be used to measure the distance between _____ in cluster analysis. a.ward b.objects c.observations d.clusters

c.observations
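
The Euclidean distance between two observations is the square root of the sum of squared differences across their variables. The sketch below compares two hypothetical observations measured on the same two variables. In practice the variables are often standardized first, so that a variable measured in larger units does not dominate the distance; that step is omitted here.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between observations u and v (equal-length sequences)."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

# Hypothetical observations: (age in years, income in $1,000s)
observation_1 = (30, 50)
observation_2 = (33, 54)

print(euclidean_distance(observation_1, observation_2))  # 5.0
```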

We create multiple dashboards _____. a.to help the user scroll vertically and horizontally to see the entire dashboard b.to make sure the KPIs are not displayed in the data dashboard c.so that each dashboard can be viewed on a single screen d.so that all dashboards can be viewed on a single screen

c.so that each dashboard can be viewed on a single screen

When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called _____. a.Euclidean distance b.Jaccard's coefficient c.the matching coefficient d.the antecedent

c.the matching coefficient
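
When observations are described only by 0/1 dummy variables, the matching coefficient is the share of variables on which the two observations agree. The sketch below uses hypothetical data. For contrast, it also computes Jaccard's coefficient, which ignores variables where both observations are 0.

```python
def matching_coefficient(u, v):
    """Share of variables on which two 0/1 observations have the same value."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

def jaccard_coefficient(u, v):
    """Matches on 1 divided by the number of variables where at least one observation is 1."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    at_least_one = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    return both_one / at_least_one  # undefined (ZeroDivisionError) if both are all zeros

# Hypothetical customers described by five yes/no (1/0) dummy variables
customer_1 = [1, 0, 0, 1, 0]
customer_2 = [1, 0, 1, 0, 0]

print(matching_coefficient(customer_1, customer_2))  # 0.6
print(jaccard_coefficient(customer_1, customer_2))
```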

The scatter chart below displays the residuals versus the independent variable, x. Which of the following conclusions can be drawn based upon this scatter chart? [Figure: residual plot, panel (c)] a.The model captures the relationship between the variables accurately. b.The model underpredicts the value of the dependent variable for intermediate values of the independent variable. c.The residuals have a constant variance. d.The residual distribution is not normally distributed.

d.The residual distribution is not normally distributed.

In order to visualize three variables in a two-dimensional graph, we use a _____. a.2-D chart b.column chart c.3-D chart d.bubble chart

d.bubble chart

The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called _____. a.market analysis b.data visualization c.supervised learning d.cluster analysis

d.cluster analysis

An alternative to a stacked column chart when comparing more than a couple of quantitative variables in each category is a _____. a.clustered bar chart b.stacked bar chart c.pie chart d.clustered column chart

d.clustered column chart

A graphical presentation that uses vertical bars to display the magnitude of quantitative data is known as a _____. a.scatter chart b.bubble chart c.clustered column chart d.column chart

d.column chart

A variable used to model the effect of categorical independent variables in a regression model is known as a _____. a.dependent variable b.predictor variable c.response d.dummy variable

d.dummy variable
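
A categorical independent variable with k categories is typically represented by k − 1 dummy variables. The omitted category serves as the baseline; including all k dummies would create perfect multicollinearity with the intercept. The sketch below assumes a hypothetical region variable, and the helper name is chosen only for illustration.

```python
def make_dummy_variables(values, categories):
    """Return one 0/1 column per category, skipping the first category (the baseline)."""
    _baseline, *others = categories
    return {category: [1 if v == category else 0 for v in values] for category in others}

# Hypothetical categorical variable with three categories; "East" is the baseline
region = ["East", "West", "South", "West"]
dummies = make_dummy_variables(region, ["East", "South", "West"])
print(dummies)  # {'South': [0, 0, 1, 0], 'West': [0, 1, 0, 1]}
```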

In the simple linear regression model, the _____ accounts for the variability in the dependent variable that cannot be explained by the linear relationship between the variables. a.residual b.constant term c.model parameter d.error term

d.error term

The strength of the association rule is known as _____ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence. a.support count b.antecedent c.consequent d.lift

d.lift
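
Support count, confidence, and lift can all be computed directly from a list of transactions. The sketch below uses hypothetical transactions, with itemsets represented as Python sets. The benchmark confidence is the support count of the consequent divided by the total number of transactions. A lift above 1 means the antecedent and consequent occur together more often than would be expected if they were unrelated.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(transactions, antecedent, consequent):
    """Support count of antecedent and consequent together over the support count of antecedent."""
    return support_count(transactions, antecedent | consequent) / support_count(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence of the rule divided by the benchmark confidence of the consequent."""
    benchmark = support_count(transactions, consequent) / len(transactions)
    return confidence(transactions, antecedent, consequent) / benchmark

# Hypothetical transaction data set
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

rule_if, rule_then = {"diapers"}, {"beer"}
print(support_count(transactions, rule_if | rule_then))   # 3
print(confidence(transactions, rule_if, rule_then))       # 0.75
print(round(lift(transactions, rule_if, rule_then), 2))   # 1.25
```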

The value of the _____ is used to estimate the value of the population parameter. a.population estimate b.population statistic c.sample parameter d.sample statistic

d.sample statistic

A line chart that has no axes but is used to provide information on overall trends for time series data is called a _____. a.time series plot b.trendline c.bubble chart d.sparkline

d.sparkline

The process of converting a word to its stem, or root word, is referred to as _____. a.data cleaning b.tokenization c.stacking d.stemming

d.stemming
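
Stemming usually follows tokenization, which splits text into individual words. The toy sketch below strips a handful of common suffixes. The suffix list and the minimum-stem-length rule are assumptions made for illustration. Production stemmers apply many more rules; one example is the Porter stemmer, available in the NLTK library as `nltk.stem.PorterStemmer`.

```python
import re

# Hypothetical, deliberately tiny suffix list; longer suffixes are checked first
SUFFIXES = ("ingly", "edly", "ing", "ed", "es", "s")

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(word, min_stem_length=3):
    """Strip the first matching suffix, provided a stem of reasonable length remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_length:
            return word[: -len(suffix)]
    return word

tokens = tokenize("Jumping, jumped, jumps: jump!")
print(tokens)                           # ['jumping', 'jumped', 'jumps', 'jump']
print([crude_stem(t) for t in tokens])  # ['jump', 'jump', 'jump', 'jump']
```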

The graph of the simple linear regression equation is a(n) _____. a.hyperbola b.ellipse c.parabola d.straight line

d.straight line

Corporate-level managers use ______ to summarize sales by region, current inventory levels, and other company-wide metrics, all on a single screen.

data dashboards

The U.S. Internal Revenue Service uses _____ to identify patterns that distinguish questionable annual personal income tax filings.

data mining

The extraction of information on the number of shipments, how much was included in each shipment, the date each shipment was sent, and so on from the manufacturing plant's database exemplifies _____.

data queries

Optimization models can be used to _____.

decide on how to invest cash received from insurance policies

When a decision maker is faced with several alternatives and an uncertain set of future events, s/he uses _____ to develop an optimal strategy.

decision analysis

