Math 015

Find the geometric mean of the following numbers: 1, 5, 25.

$\sqrt[3]{1 \cdot 5 \cdot 25} = \sqrt[3]{125} = 5$
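The same calculation as a minimal Python sketch. It assumes all values are positive, and `geometric_mean` is a hypothetical helper name. Python's standard library also offers `statistics.geometric_mean` (3.8+).

```python
import math

def geometric_mean(values):
    """Nth root of the product of N positive values (hypothetical helper)."""
    product = math.prod(values)            # 1 * 5 * 25 = 125
    return product ** (1 / len(values))    # cube root for three values

print(geometric_mean([1, 5, 25]))  # about 5 (floating point may show 4.999...)
```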

(arithmetic) mean = $\frac{\sum x_i}{N}$ (4)

add up all the values and divide by the number of values

histograms (1)

is a bar chart that displays data sorted into categories.
◼ bins (categories) are on the x-axis; frequency (number in each category) is on the y-axis
◼ relative frequency is the frequency divided by the total number of data points; it can be displayed as a percentage or a decimal
◼ often these categories are numerical and displayed lowest to highest
◼ ordinal data can be ranked lowest to highest but not treated as numbers
◼ example: type of degree is ordinal; number of years in college is numeric
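A small Python sketch of how the frequency and relative-frequency columns of a histogram can be built. It assumes equal-width numeric bins that start at multiples of the bin width. The scores and the helper name `frequency_table` are hypothetical.

```python
from collections import Counter

def frequency_table(data, bin_width):
    """Return (bin_start, frequency, relative_frequency) rows, lowest bin first.
    Empty bins are simply left out (hypothetical helper)."""
    counts = Counter((x // bin_width) * bin_width for x in data)
    total = len(data)
    return [(start, n, n / total) for start, n in sorted(counts.items())]

scores = [52, 58, 61, 64, 67, 70, 71, 75, 83, 91]   # made-up data
width = 10
for start, n, rel in frequency_table(scores, width):
    print(f"{start}-{start + width - 1}: {n} ({rel:.0%})")
```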

40 former smokers have their breathing rate tested in January and June of the same year. The two samples should be treated as

large/dependent

If you are analyzing net worths in the WLS dataset, and you break them down by college degree, the samples are

large/independent

Linear regression: (5)

finding the "best fit" line, $y = b_0 + b_1 x$
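A Python sketch that takes "best fit" to mean ordinary least squares, which is the usual choice and is assumed here. Under that assumption, b₁ and b₀ come directly from the means. The helper name is hypothetical. Python 3.10+ also provides `statistics.linear_regression`.

```python
def best_fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar   # the line passes through the centroid (x_bar, y_bar)
    return b0, b1

print(best_fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0): y = 1 + 2x
```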

The "lie ratio" of a graph measures its efficiency (or inefficiency).

false

The best graphs have the most detail.

false

Σxᵢ (3)

represents the sum of all scores in the population: $x_1 + x_2 + x_3 + \dots$

N(3)

represents the total number of individuals or cases in the population

sorted histogram (1)

the bars are ordered by frequency, from low to high or high to low (rarely used with ordinal or numeric data, whose bins already have a natural order)

Coefficient of Variation (CV)(3)

a standardized, one-number measurement of spread: the standard deviation divided by the mean, $CV = \sigma/\mu$. In finance, it is read as the risk per unit of expected return.

What is the skew of this distribution?

zero

For the body mass index (2003-5 survey) variable, the mean can be used.

false

For the population of graduate's high school variable, the mean can be used.

false

In a distribution, the z score of the mean equals

zero

Multiple linear correlation (R Assignment 4) (5)

- Results in an equation with one slope per variable: $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots$
- More slopes (more variables) automatically increase r²
  • . . . and you'll end up modeling more than you want!
- To correct, use . . .
  • adjusted r² value: $r^2_{adj} = 1 - (1 - r^2)\frac{n - 1}{n - k - 1}$, always less than the original r² (n = number of data points, k = number of slopes/variables)
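The adjusted r² formula can be written as a one-line Python function. The helper name and the example numbers are hypothetical. The example shows that, for the same r², adding slopes lowers the adjusted value.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 for n data points and k slopes/variables (hypothetical helper)."""
    if n - k - 1 <= 0:
        raise ValueError("need more data points than slopes + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(0.80, n=50, k=3))  # about 0.787
print(adjusted_r2(0.80, n=50, k=5))  # smaller: same r^2, more variables
```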

Nonlinear correlation (5)

- Since we only have a formula for linear correlation . . .
- we have to rewrite the equation relating x and y in linear form to find the correlation.
- Most common examples:
  • exponential relationship: $y = y_0 e^{kx}$; applying ln to both sides gives $\ln y = \ln y_0 + kx$
    - this can also be written $y = y_0 A^x$, where $A = e^k$; applying ln to both sides gives $\ln y = \ln y_0 + x \ln A$
  • power relationship: $y = A x^b$; applying ln to both sides gives $\ln y = \ln A + b \ln x$
  • using log (semi-log) or log-log scales will show these automatically as straight lines: semi-log for exponential, log-log for power (R Assignment 3)
  • . . . and you can use linear regression to get the numbers, but convert them! (for example, the fitted intercept is ln y₀, so $y_0 = e^{\text{intercept}}$)
  • polynomial relationship: use multiple linear correlation with x, x², x³, . . . as variables
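A Python sketch of the exponential case. It regresses ln y on x with ordinary least squares, then converts the intercept back with `exp`. It assumes every y is positive. The helper name and the test data (generated from y = 2e^(0.5x)) are hypothetical.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = y0 * e^(k x) via the linear form ln y = ln y0 + k x (hypothetical helper)."""
    ln_ys = [math.log(y) for y in ys]          # requires every y > 0
    n = len(xs)
    x_bar = sum(xs) / n
    l_bar = sum(ln_ys) / n
    k = (sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, ln_ys))
         / sum((x - x_bar) ** 2 for x in xs))  # slope of the linear form
    ln_y0 = l_bar - k * x_bar                  # intercept of the linear form
    return math.exp(ln_y0), k                  # convert back: y0 = e^(intercept)

xs = [0, 1, 2, 3]
ys = [2 * math.exp(0.5 * x) for x in xs]
print(fit_exponential(xs, ys))  # about (2.0, 0.5)
```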

The mean of this distribution is 70.9 and the standard deviation is 13.6. The coefficient of variation is approximately

0.2 (13.6/70.9 ≈ 0.19)
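The same arithmetic in Python, as a minimal sketch with a hypothetical helper name:

```python
def coefficient_of_variation(std_dev, mean):
    """CV = standard deviation / mean, a unitless measure of spread (hypothetical helper)."""
    return std_dev / mean

cv = coefficient_of_variation(13.6, 70.9)
print(round(cv, 3))   # 0.192, which rounds to 0.2
```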

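standard deviation and variance (3)

1. Find the mean.
2. Subtract the mean from each x value and square the results.
3. Add up the squares and divide by n − 1 for a sample (or by N for a population); this is the variance.
4. Take the square root of the variance to get the standard deviation.
Usually the variance is needed first. Because the deviations are squared, both values are never negative.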
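The four steps above can be sketched in Python as follows. The helper name and data are hypothetical. For checking, the standard library's `statistics.variance`/`statistics.stdev` give the sample versions, and `pvariance`/`pstdev` give the population versions.

```python
import math

def sample_variance_and_sd(values):
    """Sample variance and standard deviation (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n                            # step 1: the mean
    squared_devs = [(x - mean) ** 2 for x in values]  # step 2: squared deviations
    variance = sum(squared_devs) / (n - 1)            # step 3: divide by n - 1 (use n for a population)
    return variance, math.sqrt(variance)              # step 4: square root

variance, sd = sample_variance_and_sd([2, 4, 4, 4, 5, 5, 7, 9])
print(variance, sd)   # about 4.571 and 2.138
```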

Which of the following would be the smallest number in a distribution? (Not the lowest frequency, but the smallest number.)

1st quartile

Interquartile range (3)

Q3 − Q1 = the height of the box on a boxplot

Determine the median of the following numbers: 30, 41, 12, 69, 51, 24, 60.

41

proportionally weighted mean (4)

= $\sum p_i x_i$; multiply each value by some weight ($p_i$); $\sum p_i$ must equal 1. Note that if $p_i = n_i/N$, this is the same as the population weighted mean.
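A Python sketch of both weighted means. It checks that using pᵢ = nᵢ/N gives the same answer as the population weighted mean. The helper names and the data (number of children per family) are hypothetical.

```python
def population_weighted_mean(values, counts):
    """Sum(n_i * x_i) / N, where N = sum(n_i) (hypothetical helper)."""
    N = sum(counts)
    return sum(n * x for n, x in zip(counts, values)) / N

def proportionally_weighted_mean(values, weights):
    """Sum(p_i * x_i); the weights p_i must add up to 1 (hypothetical helper)."""
    if abs(sum(weights) - 1) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(p * x for p, x in zip(weights, values))

values = [1, 2, 3]   # number of children (made-up data)
counts = [5, 3, 2]   # how many families have each value
N = sum(counts)
print(population_weighted_mean(values, counts))                        # about 1.7
print(proportionally_weighted_mean(values, [n / N for n in counts]))   # same, about 1.7
```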

Which of the following is a correct Excel formula?

=MEDIAN(A2:A101) is correct; ranges in Excel are always specified by naming the corners (or top and bottom) of the range, separated by a colon. (A2:101) does not specify the column of the second cell.

In the WLS data set, which of the following is an example of a numeric variable but not ordinal?

BMI, net worth, and cognition score are all numeric (whether or not the mean can be used) and gender isn't ordinal (it can't be ranked).

Suppose you were trying to find the minimum value for a one-tailed confidence interval of a standard deviation at 99% confidence level. Which Excel formula would you use to find the chi-square value in the calculation?

=CHISQ.INV(0.99, n − 1), where the second argument is the degrees of freedom. Use 0.99 here: 0.005 and 0.995 are the probabilities for a two-tailed confidence interval, and you would use 0.99 instead of 0.01 because for the standard deviation/variance, the higher critical value gives you the lower bound.

Which of these distributions is used to find a confidence interval for the standard deviation of a population?

Chi-square is used when finding a confidence interval for the standard deviation of a population.

In the WLS data set, which of the following is an example of an ordinal but not numeric variable?

College degree is ordinal

Analysis of variance requires use of which of these distributions?

F

Assuming this is a probability distribution, the probability of getting a number between 0.9 and 1.5 is

In a probability distribution, the probability is given by the area under the curve above that range of x-values. For 0.9 to 1.5, that area is A2.

Suppose you were trying to find the minimum value for a one-tailed confidence interval of a mean at 99% confidence level. Which Excel formula would you use to find the z-score in the calculation?

NORM.S.INV(0.01). Here, you'd use 0.01 as the probability. (You can also use 0.99.) 0.005 and 0.995 are for a two-sided confidence interval, and 1 or 99 won't work because the probability is always between 0 and 1.
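A Python sketch of the full one-tailed calculation. It assumes σ is known (or assumed), so the normal distribution applies. `statistics.NormalDist().inv_cdf` plays the role of Excel's NORM.S.INV. The helper name and the numbers (x̄ = 100, σ = 15, n = 36) are hypothetical.

```python
import math
from statistics import NormalDist

def one_sided_lower_bound(sample_mean, sigma, n, confidence=0.99):
    """Minimum value of a one-tailed confidence interval for a population mean
    with known sigma (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - confidence)   # like NORM.S.INV(0.01): about -2.326
    return sample_mean + z * sigma / math.sqrt(n)

print(one_sided_lower_bound(100, 15, 36))  # about 94.18
```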

N (4)

N = number of data points

Samples and populations

Sample symbols and population symbols; also some definitions:
⚫ the sample is a (small) subset of the population
⚫ we know the sample statistics, not the population parameters

The Central Limit Theorem allows us to set a confidence interval for

The Central Limit Theorem establishes a confidence interval for the population mean. (Remember that we only establish confidence intervals for population values, not sample values. We know sample values.)

10C8 =

The formula for a combination ($_nC_x$) is $\frac{n!}{(n-x)!\,x!}$, so this is $\frac{10!}{8!\,2!} = \frac{10 \times 9}{2!} = \frac{10 \times 9}{2} = 45$. (The 8! on the bottom cancels out everything in 10! except for the 10 and the 9.)
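The formula above in Python, with a hypothetical helper name. Python 3.8+ also has `math.comb` built in.

```python
import math

def combinations(n, x):
    """nCx = n! / ((n - x)! x!) (hypothetical helper)."""
    return math.factorial(n) // (math.factorial(n - x) * math.factorial(x))

print(combinations(10, 8))   # 45
print(math.comb(10, 8))      # 45
```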

Quartiles and the interquartile range (3)

The lower quartile separates the smallest 25% of the data from the remaining 75%, and the upper quartile separates the largest 25% from the smallest 75%. The interquartile range (IQR), a measure of variability less sensitive to outliers than s, is the difference between the upper and lower quartiles.

The null hypothesis of an analysis of variance is

The null hypothesis for analysis of variance is that the population means are all equal, or (not listed here) that the samples are drawn from the same population.

The null hypothesis is also known as

The null hypothesis is designated H₀. (H₁ is the alternative hypothesis, or claim.)

z score (z) (3)

The number of standard deviations an observation is above or below the mean: $z = (x - \mu)/\sigma$
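A one-line Python version with a hypothetical helper name and made-up numbers. It also illustrates that the z score of the mean itself is zero.

```python
def z_score(x, mean, sd):
    """Standard deviations above (+) or below (-) the mean (hypothetical helper)."""
    return (x - mean) / sd

print(z_score(85, 70, 10))   # 1.5
print(z_score(70, 70, 10))   # 0.0: the mean's z score
```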

In the WLS data set, if you compared all the cognition scores from 1993 and all the cognition scores from 2003, these two samples would be

The sample sizes are certainly above 30, so they're large, and since you're comparing all the scores, they're independent. (If you compared only the people who responded to both surveys, the samples would be dependent.)

The sign test uses the _______________ distribution.

The sign test uses the binomial distribution, as it tests whether numbers are above or below (two possibilities!) the population median.
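A Python sketch of a two-sided sign test built directly on the binomial distribution with p = 1/2. It assumes that values equal to the hypothesized median are dropped, which is a common convention. The helper name and data are hypothetical.

```python
from math import comb

def sign_test_p_value(data, hypothesized_median):
    """Two-sided sign test (hypothetical helper). Under H0, each value lands
    above or below the median with probability 1/2, so the count above is
    binomial(n, 1/2). Values equal to the median are dropped."""
    above = sum(1 for x in data if x > hypothesized_median)
    below = sum(1 for x in data if x < hypothesized_median)
    n = above + below
    k = min(above, below)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for two tails, capped at 1
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Made-up data: 9 of 10 values lie above 10
print(sign_test_p_value([11, 12, 13, 14, 15, 16, 17, 18, 19, 5], 10))  # 0.021484375
```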

estimate the r value for this scatterplot

There is almost no linear correlation here, so the best answer is 0. -0.9 and 0.9 indicate strong linear correlations (either negative or positive) and r can never be below -1 or above 1.

histogram fact (4)

a histogram is an example of a distribution: a general function (mathematical or empirical) of a set of x-values
◼ x-values can be points or a range of points (bins)
◼ y-values can be frequency, relative frequency, or probability
◼ all of today's examples reduce a distribution to a single number

For the additional college degree variable, the mean can be used.

false

What are the two methods of predictive statistics?

confidence interval and hypothesis testing

What does the cumulative percentage line on a histogram show? (1)

everything in that category or to the left of it, displayed as a percentage of the whole

A graph is honest as long as the size of the bars (or lines, or pie slices, or columns) matches the data.

false

Always use as many variables as you can in a multiple linear correlation.

false

An r value close to -1 means that the two variables are not correlated.

false

An r value close to zero means that the two variables are not correlated.

false

Chebyshev's Theorem: (3)

for all distributions, the fraction of data within $\mu \pm K\sigma$ is at least $1 - 1/K^2$ (meaningful for K > 1)
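Chebyshev's bound as a small Python function (hypothetical helper name). Returning 0 for K ≤ 1 is a design choice here, since the bound says nothing useful in that range.

```python
def chebyshev_min_fraction(k):
    """Minimum fraction of ANY distribution within k standard deviations of
    the mean (hypothetical helper). Returns 0.0 for k <= 1, where the bound
    is uninformative."""
    if k <= 1:
        return 0.0
    return 1 - 1 / k ** 2

print(chebyshev_min_fraction(2))  # 0.75
print(chebyshev_min_fraction(3))  # about 0.889
```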

binomial

for fractions

t

for means, if population is known to be normally distributed; useful for small samples as it does not require σ to be known

normal

for means, works on any population if σ is known or assumed

chi-square

for standard deviation/variance, if population is known to be normally distributed

xᵢ (4)

individual data points

Correlation between two variables (2)

is expressed by r, which ranges from −1 to +1

The average of a distribution is also called the ____________.

mean

The centroid on a plot is determined by the ______________ of two variables.

means

The r (sometimes R) value: (2)

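measures linear correlation:
◼ the centroid $(\bar{x}, \bar{y})$ is the point determined by the two means
◼ the formula: $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
◼ r measures how the points fall in the different quadrants determined by the centroid
◼ r values range from −1 to 1
◼ r only measures linear correlations, SO ALWAYS LOOK AT A SCATTER PLOT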
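A Python sketch of the r formula. The helper name and data are hypothetical, and Python 3.10+ also offers `statistics.correlation`. The second example shows why a scatter plot matters: a perfect quadratic relationship still gives r = 0.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient r (hypothetical helper)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n   # the centroid
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    # Products are positive in the upper-right and lower-left quadrants
    # around the centroid, negative in the other two.
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))     # about 0.775
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))   # 0.0, although y = x^2 exactly
```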

How to find quartiles (3)

median = Q2, median of the lower half = Q1, median of the upper half = Q3
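A Python sketch of the median-of-halves method, applied to the numbers from the median card (30, 41, 12, 69, 51, 24, 60). When the count is odd, it assumes the median itself is left out of both halves. That is one convention of several: Excel's QUARTILE.INC and QUARTILE.EXC interpolate, so they can give different values. The helper names are hypothetical.

```python
def median(sorted_vals):
    """Middle value of an already sorted, non-empty list (hypothetical helper)."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves method; needs at least 2 values."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]  # skip the median when n is odd
    return median(lower), median(s), median(upper)

q1, q2, q3 = quartiles([30, 41, 12, 69, 51, 24, 60])
print(q1, q2, q3, q3 - q1)   # 24 41 60 36  (Q1, median, Q3, IQR)
```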

Which of the following values are indicated on a boxplot? Check all that apply.

median and interquartile range

the range of data (3)

For the median, quartiles, quintiles, deciles, and percentiles: rank the data, then divide it at the corresponding fractions.

geometric mean = $(\prod x_i)^{1/N}$ (4)

multiply all the values and take the Nth root of the product

population weighted mean = $\frac{\sum n_i x_i}{N}$ (4)

multiply each value by the number of times it occurs ($n_i$) and divide by the total ($N = \sum n_i$)

Which of the following can be used for non-numeric data?

mode

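stem-and-leaf plot (1)

organizes data like a histogram, but in printed form: the leading digits form the stems, and the last digit or digits of each value (the leaves) are written out in a row, forming a horizontal bar.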

Pareto chart (1)

orders the bins from highest to lowest frequency and adds a cumulative percentage line

μ(3)

population mean

A straight line on a log-log plot indicates a ______________ relationship between two variables.

power

box and whisker plot based on quartiles (3)

a quick visualization of the range of the data: the center line is the median, the box spans the first to third quartiles, and the whiskers (stems) extend to the maximum and minimum. Good for comparing data for similar things; it is often used for control and experimental groups.

negative correlation (2)

r < 0; r = −1 is a perfect negative correlation

weak linear correlation (2)

r close to 0 (r = 0 means no linear correlation)

strong correlation (2)

r close to −1 or 1 (r = 1 is a perfect positive correlation)

sample standard deviation symbol

s

We measure the heights of 100 UC/Merced students. Is this a population or a sample?

sample

We measure the sleeping habits of 150 UC/Merced students, all women. Is this a population or a sample?

sample

Which of the following is based on the "third moment" of a data set? (That is, which of the following requires raising something to the third power?)

skew
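A Python sketch of the third-moment idea: the mean of cubed deviations divided by σ³. This is the population (moment) version, assumed here. Excel's SKEW applies a small-sample adjustment, so its numbers can differ slightly. The helper name and data are hypothetical.

```python
def skewness(values):
    """Population (moment) skewness: mean cubed deviation / sigma^3 (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n   # third moment
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))    # 0.0: symmetric
print(skewness([1, 1, 1, 2, 10]))   # positive: long right tail
```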

A dataframe in R is most closely equivalent to a _______________ in Excel.

spreadsheet

standard deviation formula (3)

$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$: the square root of the sum of squared deviations from the mean, divided by n − 1

For the cognition score (1993 survey) variable, the mean can be used.

true

For the number of children variable, the mean can be used.

true

For the number of days in bed (2011 survey) variable, the mean can be used.

true

For the number of marriages variable, the mean can be used.

true

For the parental income variable, the mean can be used.

true

It's more important for a graph to be clear than to use its elements efficiently.

true

The adjusted r² value is always less than the unadjusted r² value.

true

The symbol for sample mean is

x̄

sample median symbol

θs

standard deviation symbol (3)

σ

Randomness and Probability (6)

• 0 means never, 1 means always, everything else is in between
• P(A and B) = P(A)P(B) for independent events
• Frequentist statistics: probability is the fraction of occurrences in n trials, as n → ∞
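A Python simulation of both ideas at once. It estimates P(heads and heads) for two independent fair coin flips as a fraction of trials, and the estimate approaches 0.5 × 0.5 = 0.25 as n grows. The helper names are hypothetical, and the fixed seed only makes runs repeatable.

```python
import random

def both_heads(rng):
    """Two independent fair coin flips; True only if both come up heads."""
    return rng.random() < 0.5 and rng.random() < 0.5

def estimate_probability(event, trials, seed=1):
    """Frequentist estimate: fraction of trials in which the event occurs
    (hypothetical helper)."""
    rng = random.Random(seed)
    return sum(event(rng) for _ in range(trials)) / trials

for n in (100, 10_000, 100_000):
    print(n, estimate_probability(both_heads, n))  # approaches 0.25
```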

Probability distributions(6)

• In a histogram, probability = area
• Probability density distribution functions:
◼ infinitely finely divided histograms (why yes, this is calculus)
◼ never negative; total area under the function = 1 (so we rarely label the y-axis)
◼ probability is given by the relative area under the function for the given x-values
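A Python sketch of "probability = area". It uses the standard normal curve from `statistics.NormalDist` purely as an example density. The exact area between two x-values (from the cumulative distribution) is compared with a very finely divided "histogram" of thin bars (a midpoint Riemann sum). The helper names are hypothetical.

```python
from statistics import NormalDist

def area_between(dist, a, b):
    """Probability that a value falls between a and b = area under the density."""
    return dist.cdf(b) - dist.cdf(a)

def riemann_area(dist, a, b, bins=10_000):
    """Approximate the same area with many thin bars (hypothetical helper)."""
    width = (b - a) / bins
    return sum(dist.pdf(a + (i + 0.5) * width) * width for i in range(bins))

z = NormalDist(0, 1)            # example density only
print(area_between(z, -1, 1))   # about 0.683
print(riemann_area(z, -1, 1))   # nearly identical
```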

The chi-square (χ²) distribution (variances)

◼ random samples, of course
◼ assumes the population is normally distributed or close to it
◼ degrees of freedom = n − 1
◼ reversed: high variance equals a low chi-square value
◼ take the square root of the variance for the standard deviation
◼ the one-tailed version is often used for a maximum standard deviation

the mean (μ): the average (4)

◼ there are different kinds of mean
◼ the total impact at a single value
◼ advantage: it's a number everyone knows
◼ disadvantages: it doesn't say anything about the distribution, and it is easily affected by outliers

distribution-based measurement (4)

◼ midrange = (max + min)/2; defined by the outliers
◼ mode (most frequently occurring value): the only measurement here that applies to non-numeric data; its usefulness depends on the shape of the distribution, and in noisy data the mode becomes less useful and might not exist
◼ median (θ): the middle of the distribution; useful because it refers to the distribution, resistant to outliers, and can be used for ordinal data but usually isn't
◼ first and third quartiles: the middles of the bottom and top halves
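A Python comparison of these measures on a small made-up data set with one outlier. `statistics.median` and `statistics.multimode` (3.8+) are standard-library functions, and `midrange` is a hypothetical helper.

```python
from statistics import median, multimode

def midrange(values):
    """(max + min) / 2 (hypothetical helper)."""
    return (max(values) + min(values)) / 2

data = [3, 4, 4, 5, 6, 7, 40]                 # 40 is an outlier
print(midrange(data))                         # 21.5: pulled far up by the outlier
print(median(data))                           # 5: resistant to the outlier
print(multimode(data))                        # [4]
print(multimode(["BA", "MA", "BA", "PhD"]))   # ['BA']: mode works on non-numeric data
```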

How do you determine the confidence interval?

⚫ Apply α or α/2 to the distribution and determine the critical value(s)
⚫ Convert the critical values into parameter values

How do you test a hypothesis?

⚫ Convert the parameter value (null hypothesis) into a value on the distribution and compare, or . . .
⚫ determine the p-value via the distribution; if the p-value is less than α (or α/2 for a two-tailed test), reject the null hypothesis
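Both procedures can be sketched in Python for a population mean. The sketch assumes σ is known, so the normal distribution applies, and uses `statistics.NormalDist` for the critical value and the tail area. The helper names and the numbers (σ = 15, n = 36, H₀: μ = 100) are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided interval for a population mean with known sigma (hypothetical helper)."""
    alpha = 1 - confidence
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)   # critical value, about 1.96 at 95%
    margin = z_crit * sigma / math.sqrt(n)       # convert to parameter units
    return sample_mean - margin, sample_mean + margin

def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha=0.05):
    """Test H0: population mean = mu0, with known sigma (hypothetical helper)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))  # parameter value -> value on distribution
    p_tail = 1 - STD_NORMAL.cdf(abs(z))               # area beyond z in one tail
    return z, p_tail, p_tail < alpha / 2              # reject H0 if the tail area < alpha/2

print(z_confidence_interval(103, 15, 36))   # about (98.1, 107.9)
print(z_test_two_tailed(103, 100, 15, 36))  # z about 1.2, tail about 0.115: do not reject
print(z_test_two_tailed(106, 100, 15, 36))  # z about 2.4, tail about 0.008: reject
```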

