Chapter 6

The addition rule

tells us that to obtain the probability of either of two events occurring, we add their individual probabilities and then subtract the probability of both occurring together: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
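
As a small check of the addition rule, the sketch below enumerates one roll of a fair six-sided die. The two events (an even roll, a roll greater than 3) and the helper `p()` are illustrative choices, not part of the original cards.

```r
# Sample space for one roll of a fair six-sided die
sample_space <- 1:6

# Two illustrative events (hypothetical choices for this sketch)
A <- c(2, 4, 6)   # the roll is even
B <- c(4, 5, 6)   # the roll is greater than 3

# Classical probability: favorable outcomes divided by all outcomes
p <- function(event) length(event) / length(sample_space)

# Addition rule: add the two probabilities, subtract the overlap
p_A_or_B_rule <- p(A) + p(B) - p(intersect(A, B))

# Direct count of the outcomes that are in A or in B
p_A_or_B_direct <- p(union(A, B))
```

Both quantities come out to 4/6, because the outcomes 4 and 6 belong to both events and would otherwise be counted twice.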

the cumulative distribution

tells us the probability of a value as large or larger (or as small or smaller) than some specific value.
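
A minimal sketch of a cumulative probability, using the thumb-length mean (60.1 mm) and standard deviation (8.726695) quoted on other cards in this set. Treating thumb length as normally distributed is an assumption made only for this illustration.

```r
# Values quoted elsewhere in this set; normality is assumed for illustration
mean_len <- 60.1
sd_len   <- 8.726695

# Probability of a thumb as small as or smaller than 65.1 mm
p_smaller <- pnorm(65.1, mean = mean_len, sd = sd_len)

# Probability of a thumb larger than 65.1 mm (the upper tail)
p_larger <- pnorm(65.1, mean = mean_len, sd = sd_len, lower.tail = FALSE)
```

The two tails always sum to 1, so either one can be obtained from the other by subtraction.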

rule of subtraction

states that the probability of some event A not happening is one minus the probability of the event happening: $P(\neg A) = 1 - P(A)$

The sample space

the set of possible outcomes for an experiment. We represent these by listing them within a set of squiggly brackets. For a coin flip, the sample space is {heads, tails}. For a six-sided die, the sample space is each of the possible numbers that can appear: {1,2,3,4,5,6}

The standard deviation

the square root of the variance

How would you use R to calculate variance in hwy?

var(mpg$hwy)

classical probability

we compute the probability directly based on our knowledge of the situation.

What is a z-score?

A numeric value that shows how many standard deviations a score is away from the mean; a numeric value that shows a score's relative position in a distribution

z-score

A z score represents the number of standard deviations a score is above (if positive) or below (if negative) the mean. So, the units are standard deviations. A z score of 2 is two standard deviations above the mean. A z score of 0.4 is 0.4 standard deviations above the mean.

Why is the sum of the residuals equal to 0?

Because the mean balances the residuals

Here we have depicted the mean as a vertical blue line. Why is the mean a good model for hwy?

Because the mean is a model that balances the residuals and minimizes the sum of squared residuals
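
The two properties can be checked numerically. This is a sketch on five made-up values standing in for hwy (not the real mpg data); `ss()` is a helper defined here for the illustration.

```r
# Hypothetical values standing in for hwy (not the real mpg data)
y <- c(20, 24, 25, 29, 32)

# Residuals from the mean: negative and positive deviations balance out
resid_mean    <- y - mean(y)
sum_resid     <- sum(resid_mean)

# Sum of squared residuals around any single-number model
ss <- function(model) sum((y - model)^2)

ss_mean   <- ss(mean(y))      # model = mean (26)
ss_median <- ss(median(y))    # model = median (25)
ss_other  <- ss(27)           # model = some other number
```

With these values the sum of squares around the mean is 86, while the median and 27 each give 91, so any model other than the mean leaves more squared error.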

What does the numerator mean in $z_i = \frac{Y_i - \bar{Y}}{s}$?

The deviation of the score from the mean

If we fit a normal curve on the distribution of hwy (see visualization below), what is it that we're modeling with it?

Error around the model for hwy

If you ran the R code below, what would you be able to tell from the output?
Empty.model <- lm(hwy ~ NULL, data = mpg)
anova(Empty.model)

How much error there is around the empty model; the sum of the squared residuals; the sum of squares
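
A sketch of the empty model on five made-up values, so it runs without the mpg data. The data frame `toy` and the column `y` are hypothetical, and the model is written with `~ 1`, the usual intercept-only formula, on the assumption that it fits the same model the cards write as `~ NULL`.

```r
# Hypothetical data frame standing in for mpg (values are made up)
toy <- data.frame(y = c(20, 24, 25, 29, 32))

# Empty (intercept-only) model: the single estimate is the mean of y
Empty.model <- lm(y ~ 1, data = toy)
b0 <- unname(coef(Empty.model))

# Sum of squares computed by hand from the residuals
ss_resid <- sum(resid(Empty.model)^2)

# The same quantity read from the ANOVA table
ss_anova <- anova(Empty.model)["Residuals", "Sum Sq"]
```

Printing `Empty.model` shows the estimate (the mean), while `anova(Empty.model)` reports the sum of squares left over around it.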

The mean of hwy is 23.44. If you wanted to calculate a z score for a hwy of 27, how would it be affected by the standard deviation for hwy?

The larger the standard deviation, the smaller the z score. It stays positive either way, because 27 is above the mean.
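
To see the effect, the sketch below computes the z score for a hwy of 27 under three standard deviations. The mean (23.44) comes from the card; the three standard deviations are hypothetical values chosen only to show the trend.

```r
hwy_mean  <- 23.44          # mean of hwy, from the card
hwy_score <- 27             # the score being standardized

# Hypothetical standard deviations, from small to large
candidate_sds <- c(2, 6, 12)

# z = (score - mean) / sd, computed once per candidate sd
z_scores <- (hwy_score - hwy_mean) / candidate_sds
```

The numerator is fixed at 3.56, so dividing by a larger standard deviation shrinks the z score toward 0 without changing its sign.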

If you've calculated the standard deviation for hwy, what have you found?

Roughly the average deviation from the mean, in highway miles per gallon

Does a wider spread give a score a smaller or larger z-score?

Smaller. For the same distance from the mean, a smaller spread gives a larger z-score, and a larger positive z-score will be closer to the right edge of a distribution.

Which of these measures of spread might be most useful in measuring how far 65.1 mm is from the mean?

Standard Deviation, average error: 8.726695

What is the difference between a z score and a standard deviation?

Standard deviation (SD) is roughly the average deviation of all scores from the mean. It can be seen as an indicator of the spread of the distribution. A z score uses SD as a sort of ruler for measuring how far an individual score is above or below the mean. A z score tells you how many standard deviations a score is from the mean of its distribution, but doesn't tell you what the standard deviation is (or what the mean is).

How can we calculate the total error in a single-number model, or the total variability in a distribution?

Take the absolute value of each residual and then add them up; or square each residual and then add them up

If the simple model of our TinyFingers data was a number other than the mean (62), then which of the following would be true?

The SS is bigger than the SS for the mean.

What does the sample variance (s^2) represent?

The average variability in our distribution, measured as the average squared deviation from the mean

If the z score for your friend's car's highway miles per gallon is found to be .6, what does that mean?

The car's highway miles per gallon is .6 standard deviations larger than the mean for hwy.

What's true of the distribution of any variable, if your model is the mean of that variable?

The distribution of the variable is the same shape as the distribution of its residuals.

If you ran the R code below, what would you be able to tell from the output?
Empty.model <- lm(hwy ~ NULL, data = mpg)
Empty.model

The mean

Let's say we want to compare the Light model for weight gain (WgtGain4 = Light + error) to the empty model (WgtGain4 = mean + error). What does the "mean" in the empty model word equation refer to?

The mean of WgtGain4 for all the mice

If the z score for a mouse's weight gain is -0.7, what does that mean?

The mouse's weight gain is 0.7 standard deviations lower than the mean of WgtGain4.

You probably noticed that the border (the black line representing 65.1 mm) is not labeled 65.1. Instead, it is labeled "z = 0.57." What does this z score mean?

The number of standard deviations that fit between the mean and 65.1 mm
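
The label can be reproduced from the numbers quoted on other cards in this set: a thumb length of 65.1 mm, a mean of 60.1 mm, and a standard deviation of 8.726695. The helper name `z_score` is introduced here for illustration.

```r
# z score: distance from the mean, measured in standard deviations
z_score <- function(score, mean, sd) (score - mean) / sd

# Thumb length of 65.1 mm against the mean and sd quoted in this set
z_thumb <- z_score(65.1, mean = 60.1, sd = 8.726695)

round(z_thumb, 2)   # 0.57
```

The thumb is 5 mm above the mean, and 5 mm is a little more than half of one standard deviation.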

Why do we prefer to use the standard deviation instead of variance when describing the average variability in our distribution?

The standard deviation is in the original units whereas variance is in squared units

Below is the histogram for hwy. What would you get if you were to total up the height (the "count") of all the bars?

The total number of cars in the mpg data frame

The sum of squares gets larger as:

The variation increases; the sample size increases; the spread of the distribution increases

aggregation

The process in which multiple independent variables are summed together, which tends to produce a normal distribution
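
A small simulation sketch of aggregation. The choice of 12 uniform variables per sum, 10,000 repetitions, and the seed are arbitrary assumptions for the illustration.

```r
set.seed(1)  # arbitrary seed so the simulation is repeatable

# Each observation is the sum of 12 independent uniform(0, 1) variables
sums <- replicate(10000, sum(runif(12)))

# Theory for this setup: mean 12 * 0.5 = 6, variance 12 * (1/12) = 1
sim_mean <- mean(sums)
sim_sd   <- sd(sums)

# Share of sums within one standard deviation of the mean;
# a normal distribution would put roughly 68% there
share_within_1sd <- mean(abs(sums - 6) <= 1)
```

Although each uniform variable is flat, a histogram of `sums` (for example `hist(sums)`) looks bell-shaped.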

True or False: Continuous data is always number data.

True

How is variance (s^2) different from the sum of squares?

Variance measures average error (or variability), whereas sum of squares measures total error (or variability); variance is the sum of squares divided by n - 1

How is variance different from the sum of squares?

Variance measures average variability, whereas sum of squares measures total variability; variance is the sum of squares divided by n - 1; variance accounts for sample size, whereas sum of squares grows with sample size
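
The relationship can be sketched on made-up numbers. The vector `y` is hypothetical, and `y_doubled` simply repeats it to mimic a larger sample with the same spread.

```r
# Hypothetical sample (values are made up for illustration)
y <- c(20, 24, 25, 29, 32)
n <- length(y)

# Total variability: sum of squared deviations from the mean
ss <- sum((y - mean(y))^2)

# Average variability: divide by the degrees of freedom, n - 1
variance <- ss / (n - 1)

# Standard deviation: back in the original units
std_dev <- sqrt(variance)

# Same values repeated: twice the sample size, same spread
y_doubled  <- rep(y, 2)
ss_doubled <- sum((y_doubled - mean(y_doubled))^2)
```

Doubling the sample doubles the sum of squares (86 to 172), while `var(y_doubled)` stays close to `var(y)`, which is why variance is the better choice for comparing samples of different sizes.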

What if, in addition to knowing Zelda's thumb length is 65.1 mm, we know also that the mean of the distribution of thumb lengths is 60.1 mm? What does this tell us that we didn't know before we knew the mean?

We now know that this thumb is longer than average and that's about it.

If a data point is very far away from the mean, what would you expect for the residual?

The farther the data point is from the mean, the larger the absolute value of its residual

What is one thing we can do if we want to treat data from Likert-scale questions as numerical data?

You can add up responses across multiple Likert-scale questions; you can take the average across multiple Likert-scale questions

Let's say you've calculated the sum of squares for hwy. What would the advantage be of dividing that number by n - 1 (i.e., dividing it by the df)?

You can use it to compare error across samples of different sizes.

If you want to find your z-score on the personality trait Agreeableness, what 3 pieces of information do you need?

Your Agreeableness score, the mean Agreeableness score, and the standard deviation of Agreeableness scores

An event

a subset of the sample space. In principle it could be one or more of possible outcomes in the sample space, but here we will focus primarily on elementary events which consist of exactly one possible outcome. For example, this could be obtaining heads in a single coin flip, rolling a 4 on a throw of the die, or taking 21 minutes to get home by the new route.

What is a standard deviation?

Roughly the average distance of scores from the mean

A probability distribution

describes the probability of all of the possible outcomes in an experiment

theoretical probability distribution

expresses the likelihood that something will occur

An experiment

is any activity that produces or observes an outcome. Examples are flipping a coin, rolling a 6-sided die, or trying a new route to work to see if it's faster than the old route.

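Which of the following is most likely: p(A), p(A) * p(B), p(A) + p(B), or p(A) * p(B) * p(C)?

p(A) + p(B). Adding probabilities gives the union of events (A or B happening) when A and B cannot occur together, and it is at least as large as p(A). Each product is at most p(A), because multiplying by a probability never makes a number larger.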

You want to know the probability that it will snow and you will win the lottery. How will you calculate this?

p(snow) * p(win). Assuming the two events are independent, this gives the intersection, or overlap, of the probabilities (both events happening).
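
The multiplication can be checked by enumeration on a simpler pair of independent events. The coin-and-die setup below is a hypothetical stand-in for snow and the lottery, whose probabilities are not given here.

```r
# Every combination of one coin flip and one roll of a six-sided die
outcomes <- expand.grid(coin = c("heads", "tails"), die = 1:6)

# Joint probability by counting: share of outcomes with heads AND a 4
p_joint <- mean(outcomes$coin == "heads" & outcomes$die == 4)

# Same probability from the multiplication rule for independent events
p_product <- (1 / 2) * (1 / 6)
```

Exactly 1 of the 12 equally likely combinations satisfies both conditions, which matches 1/2 times 1/6.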

What does the denominator mean in $z_i = \frac{Y_i - \bar{Y}}{s}$?

sample standard deviation

What R code will output the standard deviation for hwy?

sd(mpg$hwy)
sqrt(var(mpg$hwy))
favstats(~ hwy, data = mpg)

If you had to write one line of code to represent the Sum of Absolute Deviations (SAD), which would it be?

sum(abs(resid(TinyEmpty.model)))

If you had to write one line of code to represent the sum of squared deviations (SS), which would it be?

sum(resid(Empty.model)^2)

Which of these lines of R code do you think would give you SS?

sum(resid(Empty.model)^2)

