BUAL 5380 Midterm
symmetric
A histogram that has a single peak and looks approximately the same to the left and right of the peak is called:
0.953
A multiple regression analysis including 50 data points and 5 independent variables results in a sum of squared residuals Σe² = 40. The multiple standard error of estimate will be:
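A minimal sketch of the arithmetic behind this card, taking the sum of squared residuals to be 40 (the reading that reproduces the stated answer) and using the usual formula s_e = sqrt(SSE / (n - k - 1)). The function name is illustrative only.

```python
import math


def standard_error_of_estimate(sse: float, n: int, k: int) -> float:
    """Return sqrt(SSE / (n - k - 1)) for n observations and k explanatory variables."""
    return math.sqrt(sse / (n - k - 1))


# Values from the card: SSE = 40, n = 50 data points, k = 5 independent variables.
s_e = standard_error_of_estimate(sse=40, n=50, k=5)
print(round(s_e, 3))  # 0.953
```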
-1
A perfect straight line sloping downward would produce a correlation coefficient equal to
304
A sample of 20 observations has a standard deviation of 4. The sum of the squared deviations from the sample mean is:
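The answer follows from rearranging the sample variance formula s² = Σ(x - x̄)² / (n - 1). A short sketch, with an illustrative function name:

```python
def sum_squared_deviations(sample_sd: float, n: int) -> float:
    """Recover sum((x - mean)^2) from the sample standard deviation: (n - 1) * s^2."""
    return (n - 1) * sample_sd ** 2


# Values from the card: n = 20 observations, s = 4.
print(sum_squared_deviations(sample_sd=4, n=20))  # 304
```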
cross-sectional
A sample of a population taken at one particular point in time is categorized as:
highly correlated
A single variable X can explain a large percentage of the variation in some other variable Y when the two variables are:
a side-by-side plot or side-by-side pivot table
A useful way of comparing the distribution of a numerical variable across categories of some categorical variable is
there is a natural ordering of categories
A variable is classified as ordinal if:
all of the other independent variables remain constant
An important condition when interpreting the coefficient for a particular independent variable X in a multiple regression equation is that:
67%
Approximately what percentage of the observed Y values are within one standard error of the estimate (s_e) of the corresponding fitted Y values?
the strength of the linear relationship between pairs of variables
Correlation measures
discrete
Data that arise from counts are called:
All of these choices
Examples of comparison problems include
75th and 25th percentiles (Q3 and Q1)
Expressed in percentiles, the interquartile range is the difference between the
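As an illustration, the quartiles and the IQR can be computed with Python's standard library. The data set below is hypothetical, and statistics.quantiles uses its default "exclusive" method, which may differ slightly from the quartile convention used in a spreadsheet or textbook.

```python
import statistics

# Hypothetical data: the integers 1 through 11.
data = list(range(1, 12))

# With n=4 the function returns the three quartile cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q3, iqr)  # 3.0 9.0 6.0
```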
50%
For a boxplot, the box itself represents what percent of the observations?
mean
For a boxplot, the point inside the box indicates the location of the
median
For a boxplot, the vertical line inside the box indicates the location of the
close to 0
Generally speaking, if two variables are unrelated (as one increases, the other shows no pattern), the covariance will be
-0.8
If Cov(X, Y) = -16.0, variance of X = 25, and variance of Y = 16, then the sample coefficient of correlation r is
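The calculation behind this card divides the covariance by the product of the two standard deviations. A small sketch, with an illustrative function name:

```python
import math


def correlation_from_covariance(cov_xy: float, var_x: float, var_y: float) -> float:
    """r = Cov(X, Y) / (sd_X * sd_Y), where each sd is the square root of the variance."""
    return cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))


# Values from the card: Cov(X, Y) = -16, Var(X) = 25, Var(Y) = 16.
r = correlation_from_covariance(cov_xy=-16.0, var_x=25, var_y=16)
print(r)  # -0.8
```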
95% of all values are below this value
If a value represents the 95th percentile, this means that
a cluster of points with no apparent relationship on the scatterplot
If the correlation is close to 0, then we expect to see
exactly 0.50
In a histogram, the percentage of the total area which must be to the left of the median is:
80%
In a simple linear regression analysis, the following sums of squares are produced: SST = 400, SSE = 80, SSR = 320. The proportion of the variation in Y that is explained by the variation in X is:
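The proportion explained is R² = SSR / SST, which is the same as 1 - SSE / SST because SST = SSR + SSE. A brief check with the card's numbers (variable names are illustrative):

```python
# Sums of squares from the card.
sst, sse, ssr = 400, 80, 320

# The decomposition SST = SSR + SSE should hold for a least squares fit.
assert sst == ssr + sse

r_squared = ssr / sst
print(f"{r_squared:.0%}")  # 80%
```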
to include categorical variables in the regression equation
In linear regression, a dummy variable is used:
predicted value of the dependent variable
In linear regression, the fitted value is the:
product
In linear regression, we can have an interaction variable. Algebraically, the interaction variable is the __________ of two other variables in the regression equation.
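A minimal sketch of building an interaction variable as a product of two columns. The column names and values are hypothetical; here one column is numerical and the other is a 0/1 dummy.

```python
# Hypothetical observations: years of experience and a 0/1 dummy variable.
rows = [
    {"experience": 2, "has_degree": 0},
    {"experience": 5, "has_degree": 1},
    {"experience": 8, "has_degree": 0},
    {"experience": 10, "has_degree": 1},
]

# The interaction variable is the product of the two explanatory variables.
for row in rows:
    row["experience_x_degree"] = row["experience"] * row["has_degree"]

print([row["experience_x_degree"] for row in rows])  # [0, 5, 0, 10]
```

The new column would then enter the regression equation alongside the two original variables.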
residual
In linear regression, we fit the least squares line to a set of values (or points on a scatterplot). The distance from the line to a point is called the:
representative of the population
In order for the characteristics of a sample to be generalized to the entire population, it should be:
explanatory variables being highly correlated
In regression analysis, multicollinearity refers to:
discrete and continuous
Numerical variables can be subdivided into which two types?
sample of the population
Researchers may gain insight into the characteristics of a population by examining a
95%
Suppose that a histogram of a data set is approximately symmetric and "bell-shaped." Approximately what percent of the observations are within two standard deviations of the mean?
All of these
Suppose you run a regression of a person's height on his/her right and left foot sizes, and you suspect that there may be multicollinearity between the foot sizes. What types of problems might you see if your suspicions are true?
the number of explanatory variables in a multiple regression model
The adjusted R^2 adjusts R^2 for:
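For reference, the standard adjustment is 1 - (1 - R²)(n - 1) / (n - k - 1), where k is the number of explanatory variables. The sketch below uses hypothetical inputs (R² = 0.80, n = 50, k = 5) that do not come from any card.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Penalize R^2 for the number of explanatory variables k, given n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical example: the adjusted value is lower than the raw R^2 of 0.80.
adj = adjusted_r_squared(r_squared=0.80, n=50, k=5)
print(round(adj, 3))  # 0.777
```

Adding a variable always leaves R² the same or higher, while the adjusted value can fall if the new variable explains too little.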
H0: β = 0, Ha: β ≠ 0
The appropriate hypothesis test for a regression coefficient is:
85 (higher than the males' average of 70)
The average score for a class of 30 students was 75. The 20 male students in the class averaged 70. The 10 female students in the class averaged:
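The female average can be pinned down exactly, because the class total is the sum of the two group totals. A short sketch with illustrative variable names:

```python
# Values from the card.
class_size, class_mean = 30, 75
n_male, male_mean = 20, 70
n_female = class_size - n_male

# Class total minus the male total leaves the female total.
female_total = class_size * class_mean - n_male * male_mean
female_mean = female_total / n_female

print(female_mean)  # 85.0
```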
the strength of the linear relationship between pairs of variables
The correlation is best interpreted as
all of the above
The decision making process includes
interquartile range
The difference between the first and third quartiles is called the
IQR
The length of the box in the boxplot portrays the
is very sensitive to the units of the variables
The limitation of covariance as a descriptive measure of association is that it
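A small demonstration of the unit sensitivity, using hypothetical height and weight data. Converting heights from meters to centimeters multiplies the covariance by 100 but leaves the correlation unchanged. The helper functions are written out by hand for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Covariance divided by the product of the two sample standard deviations."""
    sd_x = sample_covariance(xs, xs) ** 0.5
    sd_y = sample_covariance(ys, ys) ** 0.5
    return sample_covariance(xs, ys) / (sd_x * sd_y)


# Hypothetical data: the same heights in two different units.
heights_m = [1.60, 1.70, 1.80, 1.90]
heights_cm = [h * 100 for h in heights_m]
weights_kg = [55.0, 65.0, 72.0, 85.0]

cov_m = sample_covariance(heights_m, weights_kg)
cov_cm = sample_covariance(heights_cm, weights_kg)
corr_m = sample_correlation(heights_m, weights_kg)
corr_cm = sample_correlation(heights_cm, weights_kg)

print(cov_m, cov_cm)    # the second value is about 100 times the first
print(corr_m, corr_cm)  # the two values agree up to rounding
```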
middle observation when the data values are arranged in ascending order
The median can also be described as:
most frequently occurring value
The mode is best described as the
response variable explained by the regression line
The percentage of variation (R^2) can be interpreted as the fraction (or percent) of variation of the
pivot table
The tool that provides useful information about a data set by breaking it down into subpopulations is the:
number of independent variables included in the equation
The value k in the number of degrees of freedom, n-k-1, for the sampling distribution of the regression coefficients represents:
autocorrelation
Time series data often exhibits which of the following characteristics?
counts and corresponding charts of the counts, such as a crosstab (a pivot table of counts)
To examine relationships between two categorical variables, we can use
both of these choices
We can infer that there is a strong relationship between two numerical variables when
correlation, covariance, and scatterplots
We study relationships among numerical variables using
is a candidate for exclusion
When determining whether to include or exclude a variable in regression analysis, if the p-value associated with the variable's t-value is above some accepted significance value, such as 0.05, then the variable:
Is there an observable trend? and Is there a seasonal pattern?
When we look at a time series plot, we usually look for which two things?
Covariance and correlation
Which of the following are considered measures of association?
all of these options are possible
Which of the following are possible categorizations of data type?
mean, median and mode
Which of the following are the three most common measures of central location?
variance and standard deviation
Which of the following are the two most commonly used measures of variability?
all are true
Which of the following are true statements of pivot tables?
explaining the most with the least
Which of the following definitions best describes parsimony?
frequency table
Which of the following indicates how many observations fall into various categories?
Select the scale for the model.
Which of the following is not one of the steps in the modeling process?