Math 15 Midterm 1
The null hypothesis for ________________ is that the population means are all equal (equivalently, that the samples are drawn from the same population).
analysis of variance
In a histogram, what does probability equal?
area
The sign test uses the _______________ distribution.
binomial; it tests whether each value is above or below the population median (two possible outcomes).
__________ plots provide a quick visualization of the range of the data and feature a box spanning the first to third quartiles, with a center line at the median
box-and-whisker plot (based on quartiles)
__________ is the standardized measure of risk per unit of return
coefficient of variation
____________ is calculated as the standard deviation divided by the expected return: σ/μ (a one-number measurement of spread)
coefficient of variation
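A minimal Python sketch of σ/μ, assuming the values form the whole population (so `statistics.pstdev` is used). The function name and the data are illustrative only.

```python
import statistics

def coefficient_of_variation(values):
    """Return sigma / mu, a unitless measure of spread relative to the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return sigma / mu

returns = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, sigma 2
cv = coefficient_of_variation(returns)
print(cv)  # 0.4
```

Because σ and μ share the same units, the ratio has none, which is what makes it useful for comparing spread across data sets on different scales.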
T or F: In probability density functions, the values are always non-negative and the total area under the function equals 0
False; the total area under the function equals 1
When there is almost no linear correlation in a scatterplot, what does that indicate about the value of r?
No linear correlation indicates that r is 0 or close to 0. r can never be below -1 or above 1.
The ___________, a measure of variability less sensitive to outliers than s, is the difference between the upper and lower quartiles.
interquartile range
________ is the height of the box and equals Q3 - Q1
interquartile range
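A short Python sketch of the interquartile range on hypothetical data. Tools compute quartiles slightly differently; `method="inclusive"` is assumed here to match Excel's QUARTILE.INC. The second data set swaps the maximum for an outlier to show that the IQR does not move.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # height of the box in a box-and-whisker plot
print(q1, q3, iqr)  # 3.0 7.0 4.0

# Replace the largest value with an outlier: the quartiles stay put
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]
q1o, _, q3o = statistics.quantiles(with_outlier, n=4, method="inclusive")
print(q3o - q1o)  # 4.0
```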
40 former smokers have their breathing rate tested in January and June of the same year. How should the two samples be classified (size and dependence)?
large and dependent
If you are analyzing net worths in the WLS dataset and you break them down by college degree, how should the samples be classified (size and dependence)?
large and independent
The _______ quartile separates the smallest 25% of the data from the remaining 75%.
lower
_________ = (max + min)/2 and is heavily influenced by outliers
midrange
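A small Python sketch contrasting the midrange with the median on hypothetical data; only the largest value changes between the two lists. The `midrange` helper is an illustrative name.

```python
import statistics

def midrange(values):
    """(max + min) / 2: depends only on the two most extreme values."""
    return (max(values) + min(values)) / 2

data = [2, 3, 3, 4, 5]
with_outlier = [2, 3, 3, 4, 50]  # hypothetical outlier replaces 5

print(midrange(data), statistics.median(data))                  # 3.5 3
print(midrange(with_outlier), statistics.median(with_outlier))  # 26.0 3
```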
__________ is a distribution-based measurement that also applies to non-numeric data. It depends on the shape of the distribution; in noisy data it becomes less useful and may not exist.
mode
The chi-square (χ²) distribution (variances) assumes the population is distributed in what way?
normally distributed or close to it
T or F: For the additional college degree variable, the mean can be used.
false
T or F: For the body mass index (2003-5 survey) variable, the mean can be used.
false
T or F: For the population of graduate's high school variable, the mean can be used.
false
True or False: A graph is honest as long as the size of the bars (or lines, or pie slices, or columns) matches the data
false
True or False: An r value close to -1 means that the two variables are not correlated.
false
True or False: An r value close to zero means that the two variables are not correlated
false (r close to zero means no linear correlation; the variables may still be related nonlinearly)
True or False: Sorted histograms use ordinal and numeric data frequently
false
True or False: The "lie ratio" of a graph measures its efficiency or inefficiency.
false
True or False: The best graphs have the most detail
false
True or false: Always use as many variables as you can in a multiple linear regression.
false
True or false: For the number of children variable, the mean can be used.
true
True or false: Ordinal data can be ranked lowest to highest and is not treated as numbers
true
μ
represents the population mean
Σ Xi
represents the sum of all scores in the population: X1 + X2 + X3 + and so on
N
represents the total number of individuals or cases in the population
________________ show bars ordered from low to high or high to low
sorted histogram
In a probability distribution, the probability is given by ___________________
the area under the curve above that range
How do you determine the confidence interval?
- Apply α (one-tailed) or α/2 (two-tailed) to the distribution and determine the critical value(s)
- Convert the critical values into parameter values
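A minimal Python sketch of those two steps for a two-tailed interval on a mean, assuming σ is known (normal/z distribution). `NormalDist().inv_cdf` plays the role of Excel's NORM.S.INV; the function name and sample numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-tailed CI for a population mean when sigma is known."""
    alpha = 1 - confidence
    # Step 1: apply alpha/2 to each tail to get the critical value
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 2: convert the critical value into parameter (mean) values
    margin = z_crit * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(sample_mean=50.0, sigma=10.0, n=100)
print(round(low, 2), round(high, 2))  # 48.04 51.96
```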
How do you test a hypothesis?
- Convert the parameter value from the null hypothesis into a value on the distribution and compare
- Determine the p-value from the distribution; if the p-value is less than α (one-tailed) or α/2 (two-tailed), reject the null hypothesis
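A Python sketch of a two-tailed one-sample z test following those steps, assuming σ is known; the one-tail p-value is compared with α/2. The function name and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test of H0: mu = mu0 (sigma known)."""
    # Convert the null-hypothesis value into a value on the distribution
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # One-tail p-value from the standard normal distribution
    p_one_tail = 1 - NormalDist().cdf(abs(z))
    # Two-tailed test: compare the one-tail p-value with alpha/2
    reject = p_one_tail < alpha / 2
    return z, p_one_tail, reject

z, p, reject = z_test_two_tailed(sample_mean=52.5, mu0=50.0, sigma=10.0, n=100)
print(round(z, 2), round(p, 4), reject)  # 2.5 0.0062 True
```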
In a distribution, the z-score of the mean equals __________
0
What would be the smallest number in a distribution? (Not the lowest frequency, but the smallest number.)
the minimum (sometimes called the 0th quartile); the 1st quartile is the value with 25% of the data below it
How do you find the proportionally weighted mean?
= Σ(pi·xi): multiply each value xi by a weight pi and sum the products; Σpi must equal 1. Note that if pi = ni/N, this is the same as the population weighted mean.
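A minimal Python sketch of Σ(pi·xi). The helper name, the grade weights, and the counts are hypothetical; the second call shows the pi = ni/N case.

```python
import math

def proportionally_weighted_mean(values, weights):
    """Return the sum of p_i * x_i; the weights p_i must sum to 1."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(p * x for p, x in zip(weights, values))

# Hypothetical scores weighted 50%, 30%, 20%
scores = [90, 80, 70]
weights = [0.5, 0.3, 0.2]
print(proportionally_weighted_mean(scores, weights))  # about 83

# With p_i = n_i / N, this equals the population weighted mean
values, counts = [1, 2, 3], [2, 5, 3]
N = sum(counts)
print(proportionally_weighted_mean(values, [n / N for n in counts]))  # about 2.1
```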
Suppose you were trying to find the minimum value for a one-tailed confidence interval of a standard deviation at 99% confidence level. Which Excel formula would you use to find the chi-square value in the calculation?
CHISQ.INV(0.99, n-1). Use 0.99 here: 0.005 and 0.995 are the probabilities for a two-tailed confidence interval, and you use 0.99 instead of 0.01 because, for the standard deviation/variance, the higher critical value gives the lower bound.
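A Python sketch of how that critical value is used, assuming the standard chi-square interval σ ≥ sqrt((n - 1)s² / χ²). Python's standard library has no chi-square inverse, so the critical value is passed in; 21.666 is the tabled value of CHISQ.INV(0.99, 9) for a hypothetical sample of n = 10. The function name is illustrative.

```python
import math

def sigma_lower_bound(s, n, chi2_upper):
    """One-tailed lower bound for a population standard deviation.

    chi2_upper is the chi-square critical value with n - 1 degrees of
    freedom at the confidence level (e.g. Excel CHISQ.INV(0.99, n - 1)).
    The larger critical value is in the denominator, so it gives the lower bound.
    """
    variance_bound = (n - 1) * s**2 / chi2_upper
    return math.sqrt(variance_bound)  # square root turns the variance bound into an SD bound

# Hypothetical sample: n = 10, s = 2.0
print(round(sigma_lower_bound(s=2.0, n=10, chi2_upper=21.666), 3))  # 1.289
```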
What is used to find a confidence interval for the standard deviation of a population?
Chi-Square
Analysis of variance requires use of what type of distribution?
F
population size symbol
N
Suppose you were trying to find the minimum value for a one-tailed confidence interval of a mean at 99% confidence level. Which Excel formula would you use to find the z-score in the calculation?
NORM.S.INV(0.01). Here, you'd use 0.01 as the probability; the resulting negative z gives the lower bound x̅ + z·σ/√n.
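A Python sketch of that calculation, assuming σ is known; `NormalDist().inv_cdf(0.01)` is used as a stand-in for NORM.S.INV(0.01), and the sample values are hypothetical.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.01)  # critical value for the 1% lower tail (negative)

# Hypothetical sample: mean 50, sigma 10, n 100
lower_bound = 50.0 + z * 10.0 / math.sqrt(100)
print(round(z, 3), round(lower_bound, 2))  # -2.326 47.67
```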
_________ orders the bins from highest to lowest frequency and adds a cumulative percentage line
Pareto chart
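A Python sketch of the table behind a Pareto chart: categories sorted from highest to lowest frequency, with relative frequency and a cumulative percentage column. The helper name and categorical data are hypothetical.

```python
from collections import Counter

def pareto_table(observations):
    """Rows of (category, count, relative frequency, cumulative %), highest count first."""
    counts = Counter(observations).most_common()  # sorted highest to lowest frequency
    total = sum(c for _, c in counts)
    rows, running = [], 0
    for category, count in counts:
        running += count
        rows.append((category, count, count / total, 100 * running / total))
    return rows

data = ["A"] * 5 + ["B"] * 3 + ["C"] * 2  # hypothetical categories
for row in pareto_table(data):
    print(row)
# ('A', 5, 0.5, 50.0), then ('B', 3, 0.3, 80.0), then ('C', 2, 0.2, 100.0)
```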
How do you find the standard deviation for a chi-square (χ²) distribution (variances)?
Take the square root of the variance (for a confidence interval, take the square root of each variance bound)
_________ is an example of a distribution: a general function (mathematical or empirical) of a set of x values
a histogram
__________________ establishes a confidence interval for the population mean. (Remember that we only establish confidence intervals for population values, not sample values. We know sample values.)
central limit theorem
_________ (in Excel) works for the standard deviation/variance if the population is known to be normally distributed
chi-square
What are the two methods of predictive statistics?
confidence intervals and hypothesis testing
The cumulative percentage line on a histogram shows what?
everything in that category or to the left of it (a running total); can be displayed as a percentage or a whole number
How is the correlation between two variables expressed?
by the correlation coefficient r, which ranges from -1 to +1
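A from-scratch Python sketch of the Pearson correlation coefficient r on hypothetical data. The third data set is U-shaped: y clearly depends on x, yet r is 0 because r only measures linear correlation. The `pearson_r` name is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient; always between -1 and +1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n  # the centroid (mean x, mean y)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0: perfect positive
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0: perfect negative
print(pearson_r(x, [4, 1, 0, 1, 4]))   # 0.0: related, but not linearly
```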
________ is a bar chart that displays data sorted into categories
histogram
In the WLS data set, if you compared all the cognition scores from 1993 and all the cognition scores from 2003, these two samples would be independent or dependent?
independent. The sample sizes are certainly above 30, so they're large, and since you're comparing all the scores, they're independent.
A one-tailed chi-square (χ²) confidence interval is often used to find what bound on the standard deviation?
maximum
What are the three types of distribution-based measurements?
mean, median, and mode
The centroid on a plot is determined by the ______________ of two variables.
means
__________ is the middle of the distribution. It is useful because it refers to the distribution, it is resistant to outliers, and it can be used for ordinal data (though usually isn't).
median
The median of the lower group equals what?
median of the lower group = Q1
The median of the upper group equals which quartile?
median of the upper group = Q3
How do you compute the geometric mean?
multiply all the values and take the Nth root of the product: (Πxi)^(1/N)
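A direct Python sketch of (Πxi)^(1/N) for positive values; the helper name and data are illustrative.

```python
import math

def geometric_mean(values):
    """Nth root of the product of N positive values: (prod x_i) ** (1/N)."""
    return math.prod(values) ** (1 / len(values))

print(geometric_mean([2, 8]))  # 4.0
```

For long lists the product can overflow, so averaging logarithms is the safer route in practice; the standard library's `statistics.geometric_mean` (Python 3.8+) can be used instead of a hand-written version.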
How do you compute the population weighted mean?
multiply each value xi by the frequency with which it occurs (ni), sum the products, and divide by the total (N = Σni)
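A Python sketch of the population weighted mean Σ(ni·xi)/N on a hypothetical frequency table, checked against the plain mean of the same data written out value by value. The helper name is illustrative.

```python
def population_weighted_mean(values, frequencies):
    """Sum of n_i * x_i divided by N = sum of n_i."""
    N = sum(frequencies)
    return sum(n_i * x for n_i, x in zip(frequencies, values)) / N

values = [1, 2, 3]
frequencies = [2, 5, 3]                # hypothetical counts n_i
raw = [1, 1, 2, 2, 2, 2, 2, 3, 3, 3]  # the same data written out

print(population_weighted_mean(values, frequencies))  # 2.1
print(sum(raw) / len(raw))                            # 2.1
```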
sample size symbol
n
The chi-square (χ²) distribution (variances) has a degree of freedom equal to what?
n-1
________ (in Excel) works for means and any population if σ is known or assumed
normal
What does the mean say about the distribution?
nothing
sample fraction (proportion) symbol
p̂
A straight line on a log-log plot indicates a ______________ relationship between two variables.
power
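Taking logs of y = a·x^b gives log y = log a + b·log x, which is a straight line in (log x, log y) with slope b. A Python sketch on a hypothetical power relationship y = 3x²:

```python
import math

# Hypothetical power relationship y = 3 * x**2
xs = [1, 2, 4, 8, 16]
ys = [3 * x**2 for x in xs]

log_x = [math.log10(x) for x in xs]
log_y = [math.log10(y) for y in ys]

# Slope between consecutive points on the log-log plot
slopes = [(log_y[i + 1] - log_y[i]) / (log_x[i + 1] - log_x[i]) for i in range(len(xs) - 1)]
print([round(s, 6) for s in slopes])  # [2.0, 2.0, 2.0, 2.0]: constant slope equal to the exponent
```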
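In a perfect negative correlation, what does r equal?
r = -1
In a weak negative correlation, what does r equal?
r is negative but close to 0
In a strong correlation, what does r equal?
r is close to 1 (positive) or -1 (negative); r = 1 for a perfect positive correlation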
The chi-square (χ²) distribution (variances) deals with what type of samples?
random
__________ frequency is the frequency divided by the total number of data points. It can be displayed as a percentage or decimal.
relative
We measure the heights of 100 UC/Merced students. Is this a population or a sample?
sample
We measure the sleeping habits of 150 UC/Merced students, all women. Is this a population or a sample?
sample
_______ is a small subset of the population
sample
r values measure only linear correlation, so they are always associated with which type of plot?
scatter plot
Which of the following is based on the "third moment" of a data set? (That is, which of the following requires raising something to the third power?)
skew
A dataframe in R is most closely equivalent to a _______________ in Excel.
spreadsheet
standard deviation formula
s = sqrt( Σ(xi - x̅)² / (n - 1) ): the square root of the sum of squared deviations from the mean, divided by n - 1
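A Python sketch of the sample standard deviation formula, checked against `statistics.stdev`, which also divides by n - 1 (the same convention as Excel's STDEV.S). The helper name and data are hypothetical.

```python
import math
import statistics

def sample_std(values):
    """s = sqrt( sum((x_i - mean)^2) / (n - 1) )."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    return math.sqrt(sum_sq / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(sample_std(data))        # about 2.138
print(statistics.stdev(data))  # library version of the same n - 1 formula
```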
__________ organizes printed data like a histogram; the last digit or digits of each value (the leaves) are written out to form a horizontal bar
stem and leaf plot
_________ (in Excel) works for means if the population is known to be normally distributed; it is useful for small samples because it does not require σ to be known
t
________ is designated as H₀.
the null hypothesis
True or False: The adjusted r2 value is always less than the unadjusted r2 value.
true
T or F: For the cognition score (1993 survey) variable, the mean can be used.
true
T or F: For the number of days in bed (2011 survey) variable, the mean can be used.
true
T or F: For the number of marriages variable, the mean can be used.
true
T or F: For the parental income variable, the mean can be used.
true
True or False: It's more important for a graph to be clear than use its elements efficiently.
true
The __________ quartile separates the largest 25% from the smallest 75%.
upper
What is on the x and y axis of histograms?
x-axis = categories; y-axis = number in each category
The symbol for sample mean is
x̅
__________ is the number of standard deviations an observation lies above or below the mean
z score
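A one-line Python sketch of the z-score (x - mean)/sd with hypothetical numbers; it also shows why the mean itself always has a z-score of 0.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(65, mean=50, sd=10))  # 1.5
print(z_score(50, mean=50, sd=10))  # 0.0: the mean has z = 0
```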
population standard deviation symbol
σ
population variance symbol
σ²