BUAL 5380
The correlation value ranges from
-1 to +1
The percentage of variation (R²) ranges from
0 to +1
Approximately what percentage of the observed Y values are within one standard error of the estimate of the corresponding fitted Y values?
67%
In a generic box plot, the x inside the box indicates the location of the
A Mean
Which of the following are the three most common measures of central tendency?
A Mean, median, and mode
How is the median defined if the number of observations is even?
A The average of the two middle observations
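A minimal sketch of the even-count rule, using made-up scores. The helper name `median` and the data are illustrative assumptions, not part of the course material.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle observation.
        return ordered[mid]
    # Even count: the average of the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


even_scores = [70, 85, 90, 60]      # hypothetical scores, four observations
median_even = median(even_scores)   # middle pair after sorting is 70 and 85
```

Sorting first matters: the "two middle observations" are the middle pair of the ordered data, not of the data as entered.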
Measure of central location
A central value that best represents a distribution of data. Measures of central location include the mean, median, and mode. Also called the measure of central tendency.
ordinal data
A statistical data type that exists on an arbitrary numerical scale where the exact numerical value has no significance other than to rank a set of data points. Deals with the order or position of items such as words, letters, symbols or numbers arranged in a hierarchical order. Quantitative assessment cannot be made.
Frequency table
A table for organizing a set of data that shows the number of times each item or number appears.
If a value represents the 95th percentile, this means that
A. 95% of all values are below this value
Another term for constant error variance is
A. Homoscedasticity
In regression analysis, the variables used to help explain or predict the response variable are called the
A. Independent variables
Which of the following definitions best describes parsimony?
A. explaining the most with the least
The average score for a class of 30 students was 75. The 20 male students in the class averaged 70. The 10 female students in the class averaged:
B 85
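The arithmetic behind this answer can be checked with a short sketch: the class total is the sum of the two group totals, so the missing group mean follows by subtraction. Variable names are illustrative.

```python
n_total, mean_total = 30, 75   # whole class
n_male, mean_male = 20, 70     # male students

n_female = n_total - n_male
# Total points earned by the class minus total points earned by the males.
sum_female = n_total * mean_total - n_male * mean_male
mean_female = sum_female / n_female
```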
The correlation value ranges from
B. -1 to +1
Coding males as 1 and females as 0 in a data set illustrates the use of
B. Dummy variables
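A small sketch of dummy coding, assuming a hypothetical list of gender labels. The 1/0 assignment follows the card above; the data are made up.

```python
genders = ["male", "female", "female", "male", "female"]  # hypothetical sample

# Dummy variable: 1 if the observation is male, 0 otherwise.
male_dummy = [1 if g == "male" else 0 for g in genders]

# The mean of a 0/1 dummy is the proportion of observations coded 1.
share_male = sum(male_dummy) / len(male_dummy)
```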
Many statistical packages have three types of equation-building procedures. They are:
B. Forward, backward and stepwise
The appropriate hypothesis test for an ANOVA test is
B. H0: all β = 0; Ha: at least one β is not equal to 0
What is the most common type of chart for showing the distribution of a numerical variable?
B. Histogram
The appropriate hypothesis test for a regression coefficient is:
B. H0: β = 0; Ha: β ≠ 0
In regression analysis, if there are several explanatory variables, it is called
B. Multiple regression
Researchers may gain insight into the characteristics of a population by examining a
B. Sample of the population
Which of the following is not one of the assumptions of regression
B. The response variable is not normally distributed
The interquartile range (IQR) represents what percent of the observations?
B. The middle 50%
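A sketch of why the IQR covers the middle half of the data, using Python's `statistics.quantiles` on a made-up data set of the integers 1 to 100. The first and third quartiles are the 25th and 75th percentiles; the `"inclusive"` interpolation method is one of several conventions and is an assumption here.

```python
from statistics import quantiles

data = list(range(1, 101))  # hypothetical observations: 1, 2, ..., 100

# n=4 returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fraction of observations lying between Q1 and Q3.
share_inside = sum(q1 <= v <= q3 for v in data) / len(data)
```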
A sample of a population taken at one particular point in time is categorized as
C. Cross-sectional
Gender and state are examples of which type of data
C. Categorical data
If you can determine that the outlier is not really a member of the relevant population, then it is appropriate and probably best to
C. Delete it
In linear regression, we can have an interaction variable. Algebraically, the interaction variable is the ___ of other variables in the regression equation
C. Product
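A minimal sketch of building an interaction variable as a product of two explanatory variables. The column names and values are hypothetical.

```python
years_experience = [1.0, 2.0, 3.0, 4.0]  # hypothetical numeric variable
is_manager = [0, 1, 1, 0]                # hypothetical 0/1 dummy variable

# Interaction variable: the elementwise product of the two columns.
interaction = [a * b for a, b in zip(years_experience, is_manager)]
```

With a dummy as one factor, the interaction column is zero wherever the dummy is zero, which lets the slope on the numeric variable differ between the two groups.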
In order for the characteristics of a sample to be generalized to the entire population, it should be:
C. Representative of the population
The percentage of variation (R²) can be interpreted as the fraction (or percent) of variation of the
C. Response variable explained by the regression line.
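One common way to compute this fraction is 1 minus the ratio of unexplained to total variation. The sketch below assumes hypothetical observed responses and fitted values that already came from a least squares line.

```python
# Hypothetical observed responses and their fitted values from a regression line.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]

y_mean = sum(y) / len(y)
sse = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))  # unexplained variation
sst = sum((obs - y_mean) ** 2 for obs in y)                 # total variation

# Fraction of the variation in the response explained by the line.
r_squared = 1 - sse / sst
```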
The test statistic in an ANOVA analysis is
C. The F-Statistic
Which one of the following is not one of the assumptions of regression
C. The standard deviation of the response variable increases as the explanatory variables increase
In the standardized value (b1 - β1)/sb1, the symbol sb1 represents the
C. standard error of b1
Data collected from approximately the same period of time from a cross-section of a population are called
Cross-sectional data
A multiple regression analysis including 50 data points and 5 independent variables results in (formula). The multiple standard error estimate will be.
D. 0.953
Expressed in percentiles, the interquartile range is the difference between the
D. 25th and 75th percentiles
Categorizing age variables as "young" "middle-aged" and "elderly" is an example of
D. Binning
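A sketch of binning a numerical age into the three categories named above. The cutoffs 35 and 65 are arbitrary assumptions chosen only for illustration.

```python
def age_bin(age):
    """Map a numerical age to a category; the cutoffs are hypothetical."""
    if age < 35:
        return "young"
    if age < 65:
        return "middle-aged"
    return "elderly"


ages = [22, 41, 67, 35, 64]             # made-up ages
binned = [age_bin(a) for a in ages]
```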
Data that arise from counts are called
D. Discrete Data
___ is/are especially helpful in identifying outliers
D. Scatterplots
continuous data
Data that can take on any value. There is no space between data values for a given domain. Graphs are represented by solid lines.
Nominal Data
Data which consists of names, labels, or categories.
discrete data
Data with space between possible data values. Graphs are represented by dots.
In regression analysis, the variable we are trying to explain or predict is called the
Dependent variable
What is the decision making process?
A purposeful, goal-directed effort that uses a systematic process to choose among options: 1. Identify and define the problem. 2. Gather data. 3. Analyze the data. 4. Identify the options/solutions. 5. Weigh the pros and cons of each option. 6. Select an option (make the decision).
Outliers are observations that
Lie outside the typical pattern of points on a scatterplot
___ is especially helpful in identifying outliers
Scatterplots
Empirical Rule
The rule gives the approximate percentage of observations within 1 standard deviation (68%), 2 standard deviations (95%), and 3 standard deviations (99.7%) of the mean when the histogram is well approximated by a normal curve
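The three percentages can be recovered from the standard normal curve. This sketch uses `statistics.NormalDist` from the Python standard library; the dictionary name `within` is illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability of falling within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} standard deviation(s): {p:.1%}")
```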
Quartiles
Values that divide a data set into four equal parts
In regression analysis, which of the following causal relationships are possible?
X causes Y to vary; Y causes X to vary; other variables cause both X and Y to vary
In multiple regression, the coefficients reflect the expected change in:
Y when the associated X value increases by one unit
The weakness of scatterplots is that they
do not actually quantify the relationships between variables
A single variable X can explain a large percentage of the variation in some other variable Y when the two variables are
highly correlated
Regression analysis asks
how a single variable depends on other relevant variables
In multiple regression, the constant
is the expected value of the dependent variable Y when all of the independent variables have the value zero
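A sketch of how the constant and the coefficients of a multiple regression equation are read. The equation and its coefficients (10, 2, and -0.5) are hypothetical, not estimated from any data.

```python
B0, B1, B2 = 10.0, 2.0, -0.5  # hypothetical constant and coefficients


def predict(x1, x2):
    """Expected Y from the hypothetical equation Y = B0 + B1*x1 + B2*x2."""
    return B0 + B1 * x1 + B2 * x2


# Constant: expected Y when every independent variable equals zero.
at_zero = predict(0, 0)

# Coefficient: expected change in Y when x1 rises by one unit, x2 held fixed.
change_x1 = predict(5, 6) - predict(4, 6)
```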
The covariance is not used as much as the correlation because
it is difficult to interpret
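One reason covariance is hard to interpret is that its size depends on the units of measurement, while correlation does not. The sketch below uses made-up price and sales data and hand-written helpers (`covariance`, `correlation`) based on the usual sample formulas.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Covariance rescaled by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


price_dollars = [1.0, 2.0, 3.0, 4.0]   # hypothetical prices
units_sold = [10.0, 8.0, 7.0, 3.0]     # hypothetical sales
price_cents = [100 * p for p in price_dollars]

cov_dollars = covariance(price_dollars, units_sold)
cov_cents = covariance(price_cents, units_sold)   # 100 times larger
corr_dollars = correlation(price_dollars, units_sold)
corr_cents = correlation(price_cents, units_sold)  # unchanged by the unit switch
```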
A correlation value of zero indicates
no linear relationship
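The word "linear" matters: a correlation of zero does not rule out a nonlinear relationship. In this made-up example Y is exactly X squared, yet the correlation is zero because the points are symmetric around X = 0.

```python
from math import sqrt

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v ** 2 for v in x]  # perfect, but nonlinear, dependence on x

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
sxx = sum((a - x_mean) ** 2 for a in x)
syy = sum((b - y_mean) ** 2 for b in y)

r = sxy / sqrt(sxx * syy)  # correlation coefficient
```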
A scatterplot that appears as a shapeless mass of data points indicates
no relationship among the variables
An error term represents the vertical distance from any point to the
population regression line
In linear regression, we fit the least squares line to a set of values (or points on a scatterplot). The distance from the line to the point is called the
residual
The percentage of variation can be interpreted as the fraction of variation of the
response variable explained by the regression line
In choosing the "best-fitting" line through a set in linear regression, we choose the one with the
smallest sum of squared residuals
The standard error of the estimate (Se) is essentially the
standard deviation of the residuals
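The last few cards (residuals, the smallest sum of squared residuals, and the standard error of the estimate) can be tied together in one sketch. The data are made up, and the formulas are the usual ones for simple regression with one explanatory variable, where the standard error divides by n - 2.

```python
from math import sqrt

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical explanatory values
y = [2.0, 4.0, 5.0, 4.0, 5.0]  # hypothetical responses
n = len(x)
x_mean, y_mean = sum(x) / n, sum(y) / n

# Least squares slope and intercept.
slope = (sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
         / sum((a - x_mean) ** 2 for a in x))
intercept = y_mean - slope * x_mean


def sum_squared_residuals(b0, b1):
    """Sum of squared vertical distances from the points to the line b0 + b1*x."""
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))


# Residual: observed Y minus fitted Y for each point.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

sse = sum_squared_residuals(intercept, slope)
sse_other = sum_squared_residuals(intercept, slope + 0.1)  # any other line does worse

# Standard error of the estimate: roughly the standard deviation of the residuals.
se = sqrt(sse / (n - 2))
```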
mean
the arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores
Interquartile Range (IQR)
the difference between the first and third quartiles
Median
the middle score in a distribution; half the scores are above it and half are below it
Mode
the most frequently occurring score(s) in a distribution
In linear regression, a fitted value is
the predicted value of the dependent variable
Given the least squares regression line y = 8 - 3x, which statement is true?
the relationship between x and y is negative
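A quick check of the sign: with a slope of -3, each one-unit increase in x lowers the predicted y by 3. The function name `predict` is illustrative.

```python
def predict(x):
    """Predicted y from the line y = 8 - 3x."""
    return 8 - 3 * x


# Change in predicted y when x goes from 4 to 5.
change_per_unit = predict(5) - predict(4)
```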
population standard deviation
the square root of the population variance
Correlation is a summary measure that indicates
the strength of the linear relationship between pairs of variables
The term autocorrelation refers to
the tendency of time series variables to be related to their own past values
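A sketch of a lag-1 autocorrelation, which measures how strongly each value is related to the value one period earlier. The series is made up, and the formula shown (lagged cross-products over the total sum of squared deviations) is one common convention.

```python
sales = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical trending series

series_mean = sum(sales) / len(sales)
deviations = [v - series_mean for v in sales]

# Products of each deviation with the deviation one period earlier.
lagged_products = sum(d_now * d_prev
                      for d_prev, d_now in zip(deviations, deviations[1:]))
total_squares = sum(d ** 2 for d in deviations)

lag1_autocorrelation = lagged_products / total_squares
```

A steadily trending series like this one gives a clearly positive value, since high values tend to follow high values.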