BI310 Final
___________ refers to describing the important aspects of a set of measurements.
Descriptive statistics
In a simple linear regression model, the coefficient of determination not only indicates the strength of the relationship between the independent and dependent variables, but also shows whether the relationship is positive or negative.
False
In simple linear regression analysis, if the error terms exhibit a positive or negative autocorrelation over time, then the assumption of constant variance is violated.
False
Selecting many different samples and running many different tests can eventually produce a result that makes a desired conclusion be true.
False
The error term is the difference between an individual value of the dependent variable and the corresponding mean value of the dependent variable at that value of x.
True
The experimental region is the range of the previously observed values of the dependent variable.
False
The least squares simple linear regression line minimizes the sum of the vertical deviations between the line and the data points.
False
The science of describing the important aspects of a set of measures is called statistical inference.
False
When there is positive autocorrelation, over time, negative error terms are followed by positive error terms and positive error terms are followed by negative error terms.
False
When using simple regression analysis, if there is a strong correlation between the independent and dependent variables, then we can conclude that an increase in the value of the independent variable causes an increase in the value of the dependent variable.
False
The point estimate of the variance in a regression model is
MSE
Sampling error occurs because the mean of a random sample cannot be expected to exactly equal the population mean that we are attempting to estimate.
True
The estimated simple linear regression equation minimizes the sum of the squared deviations between each value of Y and the line
True
The number of sick days per month taken by employees for the last 10 years at Apex Co. is an example of time series data.
True
The residual is the difference between the observed value of the dependent variable and the predicted value of the dependent variable.
True
An example of manipulating a graphical display to distort reality is ___________.
stretching the axis
If the Durbin-Watson statistic is less than dL, then we conclude that
there is significant positive autocorrelation
Which of the following is the best analytic dashboard graphical method for visualizing hierarchical information?
treemap
If r = −1, then we can conclude that there is a perfect relationship between X and Y.
true
Stem-and-leaf display is best used to ___________.
display the shape of the distribution
An ______________ is one unit of a population.
element
A data set provides information about some group of individual _____________.
elements
All of the following are assumptions of the error terms in the simple linear regression model except
error terms are dependent on each other.
The _____________ is the range of the previously observed values of x.
experimental region
In simple regression analysis, the quantity Σ(ŷ − ȳ)² is called the __________ sum of squares.
explained
A population that consists of all the customers who will use the drive-thru of the local fast food restaurant is called a(n) _____________.
finite population
When the assumption of __________ residuals (error terms) is violated, the Durbin-Watson statistic is used to test whether there is significant _____________ among the residuals.
independent, autocorrelation
__________________ assigns a value to a variable
measurement
A person's telephone area code is an example of a(n) _____________ variable.
nominative
A _____________________ is bell-shaped and symmetric about the high point of the curve.
normal curve
If successive values of the residuals are close together, then there is a ___________ autocorrelation and the value of the Durbin-Watson statistic is _________.
positive, small
As a measure of variation, the sample ___________ is easy to understand and compute. It is based on the two extreme values and is therefore a highly unstable measure.
range
A statistical model is a set of assumptions based solely on the sample data that have been selected.
False
Business analytics uses methods that are not part of traditional statistics to look at big data
False
The Durbin-Watson test statistic ranges from
0 to 4
What value of the Durbin-Watson statistic indicates that there is no autocorrelation present in time-ordered data?
2
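The Durbin-Watson cards above can be checked numerically. The following is a minimal sketch, assuming the usual definition d = Σ(e_t − e_{t−1})² / Σe_t²; the function name and both residual lists are made up for illustration and are not from the course material.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical time-ordered residuals (illustration only).
smooth = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]       # neighbors usually share a sign
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips every period

d_smooth = durbin_watson(smooth)            # well below 2: positive autocorrelation
d_alternating = durbin_watson(alternating)  # well above 2: negative autocorrelation
print(d_smooth, d_alternating)
```

Residuals that stay close to their neighbors push d toward 0, alternating residuals push it toward 4, and values near 2 suggest no autocorrelation. Whether a given d is significant still depends on the tabled dL and dU bounds, which this sketch does not include.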
Bullet graphs are a method of ____________
Descriptive analytics
are graphical summaries of data intended to aid the understanding of up-to-the-minute information about the operational status of a business
Descriptive analytics
is a set of tools for finding unusual observations in a data set. These observations may merit investigation.
Anomaly (outlier) detection
A ____________ variable can have values that are numbers on the real number line
quantitative
The dollar amount on an accounts receivable invoice.
quantitative
The national debt of the United States in 2015.
quantitative
The net profit for a company in 2015.
quantitative
examine all of the population units
Census
is a set of techniques for assigning observations to the most appropriate of several pre-specified categories.
Classification
is a set of techniques for finding inherent groupings or clusters within a data set without having to pre-specify a set of categories.
Cluster Detection
The ______________ is a quantity that measures the variation of a population or sample relative to its mean.
Coefficient of variation
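A short sketch of the coefficient of variation, assuming the common textbook form (standard deviation / mean) × 100. The data values and the helper name are invented for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the sample mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical measurements in dollars, then the same measurements in cents.
dollars = [10.0, 12.0, 14.0]
cents = [v * 100 for v in dollars]

cv_dollars = coefficient_of_variation(dollars)
cv_cents = coefficient_of_variation(cents)
print(cv_dollars, cv_cents)
```

Because the standard deviation is divided by the mean, changing the units leaves the result unchanged, which is why the measure is useful for comparing variation across data sets with different scales.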
should not be used to make valid statistical inferences about a population.
Convenience, voluntary, and judgment sampling
is the use of predictive analytics, algorithms, and information system techniques to extract useful knowledge from huge amounts of data
Data Mining
is a set of techniques for reducing a large number of correlated variables to a smaller group of underlying factors describing the essential aspects of a situation.
Factor Detection
A graphical portrayal of a quantitative data set that divides the data into classes and gives the frequency of each class is a(n) ___________.
Histogram
The least squares regression line minimizes the sum of the
squared differences between actual and predicted Y values.
Statistical ____________ refers to using a sample of measurements and making generalizations about the important aspects of a population
Inference
Door choice on Let's Make A Deal: Door #1, Door #2
Nominative
Personal computer ownership: Yes, No
Nominative
Income tax filing status: Married filing jointly, Married filing separately
Nominative
Restaurant rating: *****, ****, ***, **, *
Ordinal
Statistics course letter grade: A, B, C, D, F
Ordinal
Television show classifications: TV-G, TV-PG, TV-14, TV-MA
Ordinal
Ordinal
is the use of anomalies, patterns, and associations to predict future outcomes or their probabilities.
Prediction
are methods for finding anomalies, patterns, and associations in data which can be used to predict future outcomes.
Predictive Analytics
Factor detection, outlier detection, and association learning are all methods of ___________________.
Predictive analytics
is the generation of courses of action based upon results from predictive analytics, supplemented by values of relevant variables
Prescriptive analytics
___________ sampling is where we know the chance that each element will be included in the sample, which allows us to make statistical inferences about the sampled population.
Probability
The simple linear regression (least squares method) minimizes
SSE
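The least squares cards can be illustrated with a small worked example. This is a sketch under stated assumptions: the data are invented, and the helper names are hypothetical. It uses the standard formulas b1 = SSxy / SSxx and b0 = ȳ − b1·x̄.

```python
def least_squares(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared residuals (observed y minus predicted y) for a given line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares(x, y)
sse_fit = sse(x, y, b0, b1)

# Any other line, such as one with a slightly different slope, has a larger SSE.
sse_other = sse(x, y, b0, b1 + 0.1)
print(b0, b1, sse_fit, sse_other)
```

The residuals of the fitted line sum to zero, so the plain sum of vertical deviations cannot be the quantity being minimized; it is the sum of their squares (SSE) that is smallest for the least squares line.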
When error terms exhibit a positive or negative autocorrelation over time, the assumption of independence is violated
True
A relative frequency curve having a long tail to the right is said to be ___________.
Skewed to the right
If we sample without replacement, we do not place the unit chosen on a particular selection back into the population.
True
A simple linear regression model is an equation that describes the straight-line relationship between a dependent variable and an independent variable
True
By taking a systematic sample in which we select every 100th shopper arriving at a specific store, we are approximating a random sample of shoppers.
True
In a simple linear regression model, the slope term is the change in the mean value of y associated with _____________ in x.
a one-unit increase
Any characteristic of a population unit is a(n)
Variable
A flaw possessed by a population or sample unit is ___________.
a defect
Which of the following is a violation of the independence assumption?
all of these: a pattern of cyclical error terms over time; a pattern of alternating error terms over time; negative autocorrelation; positive autocorrelation
is a method of finding characteristics that tend to occur together and finding descriptions of how these characteristics are associated.
association learning
The general term for a graphical display of categorical data made up of vertical or horizontal bars is called a(n) ___________.
bar chart
______________ and _____________ are used to describe qualitative (categorical) data.
bar charts, pie charts
As a general rule, when creating a stem-and-leaf display, there should be ______ stem values.
between 5 and 20
Which of the following is not a method of predictive analytics?
bullet graphs
Pie charts, Pareto charts, and bar charts are used with ___________________/___________________ data
categorical/ qualitative
The ____________ assumption requires that all variation around the regression line should be equal at all possible values (levels) of the ___________variable.
constant variance, independent
The _____________ measures the strength of the linear relationship between the dependent variable and the independent variable.
correlation coefficient
Which of the following is a measure of the strength of the linear relationship between x and y that is dependent on the units in which x and y are measured?
covariance
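The contrast between covariance and the correlation coefficient can be sketched as follows. The data and function names are assumptions for illustration; the formulas are the usual sample covariance (dividing by n − 1) and r = SSxy / √(SSxx · SSyy).

```python
import math


def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Sample correlation coefficient r, which is unit-free."""
    return sample_covariance(x, y) / math.sqrt(
        sample_covariance(x, x) * sample_covariance(y, y)
    )


# Hypothetical data: x in dollars, then the same x expressed in cents.
x_dollars = [1, 2, 3, 4, 5]
x_cents = [v * 100 for v in x_dollars]
y = [2, 4, 5, 4, 5]

cov_dollars = sample_covariance(x_dollars, y)
cov_cents = sample_covariance(x_cents, y)
r_dollars = correlation(x_dollars, y)
r_cents = correlation(x_cents, y)
print(cov_dollars, cov_cents, r_dollars, r_cents)
```

Rescaling x multiplies the covariance by the same factor but leaves r unchanged, which is why the correlation coefficient is the preferred measure of the strength of a linear relationship.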
__________________________ looks at data collected at the same point in time.
cross-sectional analysis
________________ is the science of describing the important aspects of a set of measurements.
descriptive statistics
Which of the following is not a supervised learning technique in predictive analytics?
factor analysis
If we examine some of the population measurements, we are conducting a census of the population.
false
When the constant variance assumption holds, a plot of the residual versus x
forms a horizontal band pattern
The number of measurements falling within a class interval is called the ___________.
frequency
The ___________ the r² and the __________ the s (standard error), the stronger the relationship between the dependent variable and the independent variable.
higher, lower
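A rough illustration of how r² and s move in opposite directions, assuming the usual definitions r² = 1 − SSE/SSTO and s = √(SSE / (n − 2)). Both data sets and the function name are invented for this sketch.

```python
import math


def r_squared_and_s(x, y):
    """Fit the least squares line and return (r squared, standard error s)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ssto = sum((yi - y_bar) ** 2 for yi in y)  # total variation in y
    return 1 - sse / ssto, math.sqrt(sse / (n - 2))


# Hypothetical data: the same x values with tightly and loosely clustered y values.
x = [1, 2, 3, 4, 5]
y_tight = [2.1, 3.9, 6.1, 7.9, 10.0]
y_noisy = [4, 2, 8, 5, 11]

r2_tight, s_tight = r_squared_and_s(x, y_tight)
r2_noisy, s_noisy = r_squared_and_s(x, y_noisy)
print(r2_tight, s_tight)
print(r2_noisy, s_noisy)
```

The tightly clustered data give an r² close to 1 and a small s, while the noisier data give a lower r² and a larger s.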
Which one of the following graphical tools is used with quantitative data?
histogram
If there is significant autocorrelation present in a data set, the ________________ assumption is violated.
independence of error terms
Temperature (in degrees Fahrenheit) is an example of a(n) __________ variable
interval
Any value of the error term in a regression model _____________ any other value of the error term
is independent of
The __________ the r², the better the prediction model
larger
the __________________ direction defines the skewness of the graph, in this case skewed to the right.
long tail
When using simple linear regression, we would like to use confidence intervals for the ___________ and prediction intervals for the ___________ at a given value of x.
mean y-value, individual y-value
Another name for 50th percentile
median
In simple regression analysis, the standard error is ___________ greater than the standard deviation of y values.
never
__________ is a necessary component of a runs plot.
observation over time
Measurements from a population are called
observations
An identification of police officers by rank would represent a(n) ____________ level of measurement.
ordinal
______ & ___________ are used for a single qualitative variable
pie charts and bar charts
A set of all elements we wish to study is called a ____________.
population
One method of determining whether a sample being studied can be used to make statistical inferences about the population is to
produce a runs plot
The change in the daily price of a stock is what type of variable?
quantitative
A(n) ____________ variable can have values that indicate into which of several categories of a population it belongs.
qualitative
The advertising medium (radio, television, or print) used to promote a product.
qualitative
The stock exchange on which a company's stock is traded.
qualitative
If one of the assumptions of the regression model is violated, performing data transformations on the ____________ can remedy the situation.
response variable
A ________ and ___________ both look at data over time
runs plot, time series analysis
subset of the units in a population
sample
The point estimate of the _______________ is the positive square root of the sample variance.
population standard deviation
When we are choosing a random sample and we do not place chosen units back into the population, we are:
sampling without replacement
A _________________ is a graphical display of the relationship between two variables
scatter plot
______________ shows the relationship between two variables.
scatter plot
Data collected for a particular study are referred to as a data ____________.
set
After plotting the data points on a scatter diagram, we have observed an inverse relationship between the independent variable (X) and the dependent variable (Y). Therefore, we can expect both the sample ___________ and the sample _____________ to be negative values.
slope, correlation coefficient
_______________ is the science of using a sample of measurements to make generalizations about the population of measurements.
statistical inference
_____________ is a set of assumptions about how the sample data are selected and about the population from which the sample data are selected.
statistical model
____________ is the science of describing aspects of a set of measurements
statistics
________________ & _________________ are used for displaying a single quantitative variable.
stem-and-leaf displays / histograms
The _____ distribution is used for testing the significance of the slope term
t
If we collect data on the number of wins the Dallas Cowboys earned each of the past 10 years, we have _____________ data.
time series
Which of the following is not an example of unethical statistical practices?
using graphs to make statistical inferences
____________ is a characteristic of an element
variable
_____________ are characteristics of elements in a population.
variables
Which of the following is a categorical variable?
whether a person has a traffic violation
The ___________ of the simple linear regression model is the mean value of y when x is zero.
y-intercept