Statistics 101: Principles of Statistics Study.com

Find the expected value (the amount of money you expect to win per roll) when rolling 2 dice, losing $2 for rolling a 4 or 5, winning $1 for rolling a 2 or 3, and winning $3 for any other number.

-$2(7/36) + $1(3/36) + $3(26/36) = $1.86
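The arithmetic on this card can be checked by listing all 36 rolls. The following is a minimal sketch, assuming two fair six-sided dice; the helper names `expected_value` and `game_payout` are illustrative and not part of the original card.

```python
from fractions import Fraction
from itertools import product

def expected_value(payout):
    """Average payout over the 36 equally likely rolls of two fair dice."""
    rolls = list(product(range(1, 7), repeat=2))
    total = sum(Fraction(payout(a + b)) for a, b in rolls)
    return total / len(rolls)

def game_payout(total):
    # Payout rule from the card: lose $2 on a 4 or 5,
    # win $1 on a 2 or 3, win $3 on any other sum.
    if total in (4, 5):
        return -2
    if total in (2, 3):
        return 1
    return 3

ev = expected_value(game_payout)
print(ev, round(float(ev), 2))
```

Using exact fractions keeps the result as 67/36 until the final rounding step, which avoids accumulating floating-point error.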

Find the area that falls outside z = 1 and z = -1

1. Area between z = 1 and z = -1: 0.68268. 2. 1 - 0.68268 = 0.31732.

Disadvantages of convenience sampling

1. Data bias issues, from both self-selection and researcher bias. 2. Parameter issues, such as creating a sample that doesn't reflect the larger population.

Determining the Area Outside Two Z-Scores

1. Find the area between the two z-scores. 2. Subtract the area from 1.

Determining the Area Between Two Z-Scores

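1. Locate both z-scores on the 'Z-Scores and Normal Curve Areas' table. 2. If the table lists cumulative areas (for example, the area for z = 1 is 0.8413), you may subtract 0.500 from each value to get the area measured from the mean. 3. Subtract the area for the smaller z-score from the area for the larger z-score.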
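The table lookups for the area between and outside two z-scores can be approximated in code. This is a minimal sketch, assuming a standard normal curve and using the error function from Python's `math` module in place of a printed table; the function names are illustrative.

```python
import math

def normal_cdf(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(z_low, z_high):
    # Larger cumulative area minus smaller cumulative area.
    return normal_cdf(z_high) - normal_cdf(z_low)

def area_outside(z_low, z_high):
    # Everything that is not between the two z-scores.
    return 1.0 - area_between(z_low, z_high)

print(round(area_between(-1, 1), 4))
print(round(area_outside(-1, 1), 4))
```

Results may differ from table-based answers in the last decimal place, because printed tables round each entry before the subtraction.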

Advantages of convenience sampling

1. The researcher has easy access to the sample. 2. Data collection can be quick. 3. It requires fewer resources.

Calculating the Z-Score

1. Subtract the mean from the data point. 2. Divide the difference by the standard deviation. If the result is negative, the data point lies to the left of the mean on the graph of the normal distribution.

A long jumper has a mean jump distance of 18 feet with a standard deviation of 2.5. Find the z-score for a jump distance of 23 feet

1. Subtract the mean from the data point: 23 - 18 = 5. 2. Divide the difference by the standard deviation: 5 / 2.5 = 2. The jump is 2 standard deviations to the right of the mean.
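The two steps translate directly into a one-line function. A minimal sketch, with `z_score` as an illustrative name:

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

# Long jumper: mean 18 feet, standard deviation 2.5, jump of 23 feet.
jump_z = z_score(23, 18, 2.5)
print(jump_z)
```

A negative return value means the data point lies to the left of the mean.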

Characteristics of Binomial Experiments

1. The outcomes must be independent, so the probability (P) of success must be the same on each trial. 2. There must be only two possible outcomes. 3. There must be a fixed number of trials.

Find the area that falls between z = 1 and z = -1

1. z = 1: 0.84134; z = -1: 0.15866. 2. 0.84134 - 0.500 = 0.34134; 0.15866 - 0.500 = -0.34134. 3. 0.34134 - (-0.34134) = 0.68268.

The average whisker length of tabby cats is 4.4 inches with a standard deviation of 0.2. Determine the percentage of cats with a whisker length between 4.2 and 4.8 inches.

Z-score for 4.2 inches: (4.2 - 4.4) / 0.2 = -1. Z-score for 4.8 inches: (4.8 - 4.4) / 0.2 = 2. -1.0 on table: 0.15866. 2.0 on table: 0.97725. 0.97725 - 0.15866 = 0.81859 = 81.9% = 82%.

You're packing for vacation. You have 2 red shirts, 3 green shirts, 5 blue shirts, and 1 white shirt. If the first shirt you packed was red, what is the probability that the second shirt is red?

1/10

When two dice are rolled, what is the probability of rolling a sum of 10?

1/12

A coin is flipped twice. What is the probability of getting one heads and one tails?

1/2

If I roll a die and get a three, what is the probability of rolling a three the second time I roll the die?

1/6

On a d20, or 20-sided die, each side shows a number from 1-20. If I roll one three times in a row, what's the probability of rolling a 10 three times in a row?

1/8000

If you draw two cards at random from a standard deck of 52 cards (and don't replace the first one), what is the probability of drawing two spades?

156/2652 = 1/17. The probability of the first spade is 13/52. The probability of drawing a second spade after not replacing the first is 12/51. (13/52) * (12/51) = 156/2652 = 1/17.
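Drawing without replacement means each factor has one fewer matching card and one fewer card overall. The sketch below assumes a well-shuffled deck; `draw_all_matching` is a hypothetical helper name.

```python
from fractions import Fraction

def draw_all_matching(matching, deck_size, draws):
    """Probability that every card drawn (without replacement) is a matching card."""
    prob = Fraction(1)
    for i in range(draws):
        # After i matching cards are gone, both counts shrink by i.
        prob *= Fraction(matching - i, deck_size - i)
    return prob

two_spades = draw_all_matching(13, 52, 2)
two_jacks = draw_all_matching(4, 52, 2)
print(two_spades, two_jacks)
```

The same helper covers any "all draws match" question, such as choosing two red-haired people out of 100 when 50 have red hair.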

When drawing two cards at the same time from a standard deck of 52 cards, what is the probability of drawing a 10 and an ace?

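8/663 (about 1.2%). The two cards can arrive in either order. A 10 followed by an ace has probability (4/52) * (4/51) = 16/2652, and an ace followed by a 10 has the same probability. Add these together and you get 32/2652 = 8/663. Note that adding 4/52 + 4/52 = 2/13 gives the probability that a single card is a 10 or an ace, which is a different question.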

On a d8, or 8-sided die, each side shows a number from 1-8. If you roll two d8s, what is the probability that the sum of the numbers will be 7?

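3/32. This is an example where you use the number of favorable outcomes over the total number of possible outcomes. Favorable outcomes (die 1 : die 2) are 1 : 6, 2 : 5, 3 : 4, 4 : 3, 5 : 2, and 6 : 1, so there are 6 favorable outcomes. There are 8 possible outcomes for die 1 and 8 for die 2, which gives 8 * 8 = 64 possible outcomes in total. Favorable outcomes / total outcomes = 6/64 = 3/32.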
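Counting favorable outcomes by hand is easy to get wrong, so a brute-force enumeration is a useful check. A minimal sketch, assuming fair dice; `prob_sum` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

def prob_sum(target, sides, dice=2):
    """Probability that `dice` fair dice with `sides` faces add up to `target`."""
    outcomes = list(product(range(1, sides + 1), repeat=dice))
    favorable = sum(1 for roll in outcomes if sum(roll) == target)
    return Fraction(favorable, len(outcomes))

d8_seven = prob_sum(7, 8)
d6_ten = prob_sum(10, 6)
print(d8_seven, d6_ten)
```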

Among 100 people, 12 have blonde hair, 50 have red hair, 28 have brown hair, and 10 have black hair. If two people are selected at random, what is the probability that both will have red hair?

49/198

Find the probability of being dealt a five-card hand containing 5 red cards from a standard 52-card deck.

Favorable outcomes: 5 red cards out of the 26 red cards in the deck = 26! / (5!(26 - 5)!) = 65,780. Possible outcomes: 2,598,960. Probability: 65,780 / 2,598,960 = 0.0253.
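The factorial expressions are combinations, which Python exposes as `math.comb` (available in Python 3.8 and later). A short sketch of the card-hand counts; the variable names are illustrative.

```python
import math

# Number of distinct five-card hands from a 52-card deck.
total_hands = math.comb(52, 5)

# Hands made up of 5 of the 26 red cards.
all_red_hands = math.comb(26, 5)

# Hands with all 4 jacks plus 1 of the 48 other cards.
four_jack_hands = math.comb(4, 4) * math.comb(48, 1)

print(all_red_hands / total_hands)
print(four_jack_hands / total_hands)
```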

Quartile

A cut-off point that divides an ordered data set into four groups.

Histogram

A graphical representation of data very similar to a bar graph showing how many times certain numbers appear in the data. "How many times" is called the "frequency."

Spread in Data

A measure of how far the individual values are from the middle of a data set. Examples: range, interquartile range, variance, and standard deviation.

Sample

A part of the larger population that is meant to represent the whole. It is used in inferential statistics.

Box Plot

A representation of data on a number line showing quartiles, a minimum value, a maximum value, extreme values and a range of values.

Ratio

A type of measurement used in statistics to compare two numbers. It uses a colon - for example, 3:1.

Convenience sampling

A type of sampling where the samples are chosen based on how easy it is to obtain them.

Normal Distribution

Also known as the Gaussian distribution, this is a continuous distribution of data. The graph takes the shape of a bell curve centered on the mean, with both sides as mirror images of each other.

Binomial Experiment

An experiment in which there are only two discrete outcomes: success or failure. The outcomes of individual trials are independent of one another.

Find the mode of the following data set: 10, 14, 15, 12, 10, 14, 19, 12, 14

Arranged in increasing order: 10, 10, 12, 12, 14, 14, 14, 15, 19 Mode = most frequent number = 14
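Finding the most frequent value can be sketched with `collections.Counter`. The helper `modes` is an illustrative name; it returns a list because a data set can have more than one mode.

```python
from collections import Counter

def modes(data):
    """All values that occur most often, in increasing order."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

first_set = modes([10, 14, 15, 12, 10, 14, 19, 12, 14])
second_set = modes([1, 2, 3, 4, 1, 2, 3, 1, 5, 7])
print(first_set, second_set)
```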

Binomial Probability Equation

B(x; n, P) = nCx * P^x * (1 - P)^(n - x), where nCx = number of combinations of n trials with x successes, x = number of successes, n = number of trials, and P = probability of success on an individual trial.
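The equation maps onto a few lines of code. A minimal sketch, with `binomial_pmf` as an illustrative name and `math.comb` standing in for nCx:

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

# Exactly 2 of 5 people vote, each with a 50% chance.
two_of_five = binomial_pmf(2, 5, 0.50)

# At least 1 of 5 votes: complement of nobody voting.
at_least_one = 1 - binomial_pmf(0, 5, 0.50)

print(two_of_five, at_least_one)
```

Probabilities for a range of x values are found by summing the function over that range, or by using the complement as in the "at least one" case.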

Calculate the probability of getting at least 12 heads when flipping a fair coin 40 times

Mean = np = 40 x 0.5 = 20. Standard deviation = √(40 x 0.5 x (1 - 0.5)) = 3.2. Z-score of 12: (12 - 20) / 3.2 = -2.5. -2.5 on table: 0.0062. 1 - 0.0062 = 0.9938 = 99%.

Nominal data

Categorical data in which numbers serve only as labels for non-numeric attributes. For example, in a study about types of animals, all puppies are given the code 7.

A population of tropical birds has a mean wing span of 18 inches with a standard deviation of 0.6. Find the wing span of a bird with a z-score of 2.4.

Complete the z-score steps in reverse: 1. Multiply the z-score by the standard deviation: 2.4 * 0.6 = 1.44. 2. Add the mean: 18 + 1.44 = 19.44 inches.
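Reversing the z-score formula gives x = mean + z * standard deviation. A small sketch, with `value_from_z` as an illustrative name:

```python
def value_from_z(z, mean, std_dev):
    """Data value that sits z standard deviations from the mean."""
    return mean + z * std_dev

wing_span = value_from_z(2.4, 18, 0.6)
print(round(wing_span, 2))
```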

In a game of blackjack, a person is dealt 2 cards: a 4 and a 9. Assuming aces count as 1, find the probability that the next card dealt makes the blackjack total greater than 21.

Current total: 13 13 + ace through 8 ≤ 21 (31 total cards because one 4 was already used: 31/50 chance) Any other card: 19/50 = 0.38 = 38% chance

In a game of blackjack, a person is dealt 2 cards: a 9 and a 10. Assuming aces count as 1, find the probability that the next card dealt makes the blackjack total greater than 21.

Current total: 9 + 10 = 19 19 + ace (1) or 19 + 2 ≤ 21 So, any ace or 2 would give a total of 21 or less. (8 total cards: 8/50 chance) Any other card: 42/50 = 0.84 = 84% chance
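Both blackjack cards follow the same counting pattern: remove the dealt cards from the deck, then count the remaining cards that push the total past 21. The sketch below assumes a single 52-card deck, aces counted as 1, and face cards counted as 10; `bust_probability` is a hypothetical helper name.

```python
from fractions import Fraction

def bust_probability(hand):
    """Chance that the next card takes the hand total over 21."""
    # Four cards each for values 1 (ace) through 9.
    deck = {value: 4 for value in range(1, 10)}
    # Sixteen cards worth 10: the 10s, jacks, queens, and kings.
    deck[10] = 16
    for card in hand:
        deck[card] -= 1
    total = sum(hand)
    remaining = sum(deck.values())
    bust_cards = sum(count for value, count in deck.items() if total + value > 21)
    return Fraction(bust_cards, remaining)

print(float(bust_probability([4, 9])))
print(float(bust_probability([9, 10])))
```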

Continuous data

Data that can continually be divided without end and can be any value.

Discrete data

Data that is already at its smallest size and can't be further divided. It can only have certain specific values.

Categorical data

Data that is grouped by topic or category.

Quantitative data

Data that is measurable and that you can put in order.

T-distribution

The distribution to use when we don't know the population standard deviation and we have a small sample.

Find the expected value (the amount of money you are expected to win or lose per roll) when rolling two dice, losing $2 for rolling a 3 or 4, and winning $3 for any other number.

E(X) = -$2(5/36) + $3(31/36) = $2.31

Find the probability of being dealt a five-card hand containing 4 jacks from a standard 52-card deck.

Favorable outcomes: 4 jacks out of the 4 jacks in the deck = 4! / (4!(4 - 4)!) = 1; 1 card out of the 48 remaining cards = 48! / (1!(48 - 1)!) = 48; 1 * 48 = 48. Possible outcomes: 2,598,960. Probability: 48 / 2,598,960 = 1/54,145.

Empirical Rule (68-95-99.7 Rule)

For normally distributed data: --68% of data within 1 standard deviation of the mean. --95% of data within 2 standard deviations of the mean. --99.7% of data within 3 standard deviations of the mean.

Variance

Found by subtracting the mean from each point in a data set, squaring each difference, and calculating the average of the squares. Square root of variance = standard deviation.
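The definition can be followed step by step in code. This sketch divides by the number of points, as the card describes (the population variance); a sample variance would divide by n - 1 instead. The data set is made up for illustration.

```python
import math

def population_variance(data):
    """Average of the squared differences from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)

# Hypothetical data set, chosen so the numbers come out evenly.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
variance = population_variance(sample)
std_dev = math.sqrt(variance)
print(variance, std_dev)
```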

Graph of Normal Distribution

Graph using the normal distribution to approximate the binomial distribution

Find the mode of the following data set: {1, 2, 3, 4, 1, 2, 3, 1, 5, 7}

In increasing order: 1, 1, 1, 2, 2, 3, 3, 4, 5, 7 Mode = most frequent number = 1

Find the interquartile range for the following data set: {70, 68, 84, 85, 82, 81, 90, 92, 94, 72}

In increasing order: 68, 70, 72, 81, 82, 84, 85, 90, 92, 94 Median (quartile 2): 83 Median of 1st half (quartile 1): 72 Median of 2nd half (quartile 3): 90 90 - 72 = 18
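The "median of each half" method used on this card can be sketched as follows. Statistical software often uses other quartile conventions that can give slightly different values; the function names here are illustrative, and the sketch assumes at least two data points.

```python
def median(sorted_values):
    """Middle value of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    if len(data) < 2:
        raise ValueError("need at least two data points")
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    upper = values[-half:]  # with an odd count, the middle value is left out
    return median(lower), median(values), median(upper)

q1, q2, q3 = quartiles([70, 68, 84, 85, 82, 81, 90, 92, 94, 72])
print(q1, q2, q3, q3 - q1)
```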

Ordinal data

Information that can be ordered and ranked. For instance, 1st place, 2nd place, etc.

Interval measurement

Information that is evenly distributed into groups based on how each group differs from each other. For instance, dogs might be between 10-20 lbs, 21-30 lbs, etc.

Using the normal distribution calculate the probability of getting between 25 and 30 heads when flipping a fair coin 40 times

Mean = 40 * 0.5 = 20. Std. dev. = √(40 * 0.5 * (1 - 0.5)) = 3.2. Z-score of 25: (25 - 20) / 3.2 = 1.56. Z-score of 30: (30 - 20) / 3.2 = 3.13. 1.56 on table: 0.941. 3.13 on table: 0.999. 0.999 - 0.941 = 0.058 = 5.8%.

How to find the probability of compound events

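Multiply the probabilities of the individual events together. For dependent events, use the probability of each later event given the earlier ones.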

If your data analysis shows a difference in two sample groups at a level of P<0.20, can you conclude this difference exists in the entire population at the 10% significance level?

No, there is not enough evidence that the difference exists. A p-value known only to be below 0.20 has not been shown to be below 0.10.

Graph of Z-Scores

Normally distributed data Mean = 0 (if z-score = 0, then data point = mean) Standard deviation = 1

Formula for the probability of a single event

P(event) = (number of favorable outcomes) / (number of total outcomes)

Find the probability of being dealt two jacks in a row from a standard deck of 52 cards.

Number of jacks in the deck: 4 Probability of first jack: 4/52 Probability of second jack: 3/51 (4/52) * (3/51) = 12/2652 = 1/221

Stem and Leaf Display

One part of the value is shown on the left side of a display while another part is shown on the right. This type of display shows the actual data while also plotting the distribution of the data.

If your data plots as a linear normal probability with no outliers, which paired test is reasonable?

Paired t-test

Using the binomial probability formula, find the probability that at least 1 person voted out of 5 if each person had a 50% chance of voting.

Probability of zero voting: x = 0, n = 5, P = 0.50. nCx = 5! / (0!(5 - 0)!) = 1. B(x; n, P) = nCx * P^x * (1 - P)^(n - x) = 1 * (0.50)^0 * (1 - 0.50)^(5 - 0) = 0.0313. 1 - 0.0313 = 0.9687 = 96.9% chance that at least 1 person out of 5 voted.

P-value less than the significance level

Reject the null hypothesis

Find the probability of winning $2 on a roll of two dice if you lose $5 for rolling a 4, 5, or 6, and win $2 for rolling anything else.

Roll a 4: 1-3, 3-1, 2-2 (3); Roll a 5: 1-4, 4-1, 2-3, 3-2 (4); Roll a 6: 1-5, 5-1, 2-4, 4-2, 3-3 (5). (3 + 4 + 5) / 36 = 12/36 chance of losing $5. (36 - 12) / 36 = 24/36 = 0.67 = 67% chance of winning $2.

A paired sample from census data is used to determine if the average salary of Boston residents and California residents is different. What variable should be used?

Salary

The average number of stripes on a tabby cat's tail is 8 with a variance of 16. Determine the number of cats that will have fewer than 10 stripes in a population of 100

Standard deviation = square root of variance = 4 Z-score for 10 stripes: (10 - 8) / 4 = 0.5 0.5 on table: 0.69146 0.69146 x 100 = 69.146 = 69 cats

Characteristics of Normal Distribution Graph

Symmetrical bell curve. Centered on the mean, which is equal to the median and the mode. Width dependent on size of standard deviation (larger standard deviation = larger spread)

Bar Graph

The amounts (frequency) of different data groups shown on a graph. This type of plot is very similar to a histogram, except that the horizontal axis is not necessarily numerical.

A paired sample from census data is used to determine if the average salary of Boston residents and California residents is different. State the alternative hypothesis.

The average salary of Boston residents is not equal to that of California residents.

Population

The group from which data will be gathered.

Data

The information that is taken from the designated population being studied.

Hypothesis test decision criteria for the p-value

The null hypothesis is rejected if the p-value is less than or equal to the desired significance level.

Skewed Distribution

The peak in a graph is not centered, but is off to the left or right. The side without the peak forms a tail with fewer observations, and therefore lower frequencies.

Hypothesis Testing Conditions

The population size is at least 20 times larger than the sample. The sampling method must be random. There are only two outcomes for each sample. There are at least 10 successes and 10 failures.

Hypothesis Testing

The procedure used in statistics to see whether a particular hypothesis is acceptable.

Requirements to use the z test for a difference between two proportions

The samples must be independent. n_1*p_1, n_1*q_1, n_2*p_2, and n_2*q_2 must all be greater than or equal to 5.

Median

The value in the middle of an ordered set of numbers. Half of the numbers are above this value and half are below.

Quartile

The values that separate a data set into 4 groups. 2nd quartile: the median of the data. 1st quartile: the median of the first half of the data. 3rd quartile: the median of the second half of the data.

Z-Score

This describes the number of standard deviations a data point is from the mean. It's useful for quickly and accurately determining normal distribution probabilities

Descriptive statistics

This gives you information about data that is descriptive, such as mean, median, and mode.

Inferential statistics

This is information that infers from the data you have gathered. It uses samples to make assumptions about larger populations.

Probability Distribution Function or P(X)

This is used to assign a probability to all possible values of X. It is always a number between 0 and 1.

Binomial Probability Distribution

This is used to calculate probabilities of processes that have success and failure as the two possible outcomes. It's used frequently in real-world problem solving

Bar Graph: Uses

You can use this type of graph to assess the relative frequency of how often things happen, how many people participate in a certain thing, or a value range.

Type I error

You conclude that the null hypothesis is false, but it is actually true.

Type II error

You conclude that the null hypothesis is true, but it is false.

Continuous Probability Distribution

You use this process model when dealing with a number of possible outcomes that you cannot count. If you can count outcomes, you should use discrete probability distribution

The average tail length of tabby cats is 12.5 inches with a standard deviation of 0.5. Determine the number of tabby cats that will have a tail less than 11 inches long in a population of 1,000.

Z-score for 11 inches: (11 - 12.5) / 0.5 = -3 -3.0 on table: 0.00135 0.00135 x 1000 = 1.35 = 1 cat

The average tail length of tabby cats is 12.5 inches with a standard deviation of 0.5. Determine the percentage of cats with a tail length greater than 13 inches

Z-score for 13 inches: (13 - 12.5) / 0.5 = 1 1.0 on table: 0.84134 Subtract from 1 to find percent greater (to the right): 1 - 0.84134 = 0.15866 = 15.9% = 16%

The average body length of tabby cats is 18 inches with a standard deviation of 1.0. Determine the percentage of cats with a body length between 16 and 19 inches.

Z-score for 16 inches: (16 - 18) / 1.0 = -2 Z-score for 19 inches: (19 - 18) / 1.0 = 1 -2.0 on table: 0.02275 1.0 on table: 0.84134 0.84134 - 0.02275 = 0.81859 = 81.9% = 82%

The average life span of tabby cats is 18 years with a standard deviation of 2. Determine the percentage of tabby cats that will live longer than 22 years

Z-score for 22 years: (22 - 18) / 2 = 2. 2.0 on table: 0.97725. Subtract from 1 to find percent greater (to the right): 1 - 0.97725 = 0.02275 = 2.275%.
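The tabby cat problems can also be checked without converting to z-scores by hand. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 and later) and assumes each measurement is normally distributed with the stated mean and standard deviation; the variable names are illustrative.

```python
from statistics import NormalDist

tail = NormalDist(mu=12.5, sigma=0.5)
body = NormalDist(mu=18, sigma=1.0)
life = NormalDist(mu=18, sigma=2)

# Expected count with tails under 11 inches in a population of 1,000.
short_tails = tail.cdf(11) * 1000

# Share with tails longer than 13 inches (area to the right).
long_tails = 1 - tail.cdf(13)

# Share with body length between 16 and 19 inches.
mid_bodies = body.cdf(19) - body.cdf(16)

# Share living longer than 22 years.
long_lived = 1 - life.cdf(22)

print(round(short_tails, 2), round(long_tails, 4),
      round(mid_bodies, 4), round(long_lived, 4))
```

`cdf` returns the cumulative area to the left of a value, which is the same quantity a cumulative z-table provides.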

Using the binomial probability formula, find the probability that 0 or 1 people voted out of 5 if each person had a 50% chance of voting.

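P(0): x = 0, n = 5, P = 0.50; nCx = 5! / (0!(5 - 0)!) = 1; 1 * (0.50)^0 * (1 - 0.50)^(5 - 0) = 0.03125. P(1): x = 1, n = 5, P = 0.50; nCx = 5! / (1!(5 - 1)!) = 5; 5 * (0.50)^1 * (1 - 0.50)^(5 - 1) = 0.15625. P(0 or 1) = 0.03125 + 0.15625 = 0.1875 = 18.8%.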

For what value of alpha should you reject the null hypothesis when the P-value is 0.06?

alpha = 0.10

How to use the normal distribution to approximate binomial distribution

n = number of trials, p = probability of success Mean = np Standard deviation = √(np(1-p)) To solve: calculate z-score(s), find value(s) on normal distribution table, calculate probability
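The approximation steps can be compared against the exact binomial sum. This is a minimal sketch without a continuity correction; it keeps the standard deviation unrounded, so results differ slightly from hand calculations that round it to 3.2. Function names are illustrative.

```python
import math

def normal_cdf(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def approx_at_least(k, n, p):
    """Normal approximation to P(X >= k) for a binomial variable."""
    mean = n * p
    std_dev = math.sqrt(n * p * (1 - p))
    z = (k - mean) / std_dev
    return 1.0 - normal_cdf(z)

def exact_at_least(k, n, p):
    """Exact P(X >= k) by summing binomial probabilities."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x)
               for x in range(k, n + 1))

# At least 12 heads in 40 flips of a fair coin.
approx = approx_at_least(12, 40, 0.5)
exact = exact_at_least(12, 40, 0.5)
print(round(approx, 4), round(exact, 4))
```

Both values come out above 99%, which agrees with the rounded answer from the table method.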

Methods to determine acceptance or rejection of null hypothesis

p-value and region of acceptance

A sample of 300 US cyclists shows 150 are happy with the rules. Find p_hat (an estimate of the proportion of rule-happy US cyclists).

p_hat = 150 / 300 = 0.5

As the significance levels increase ______________.

the width of the confidence interval decreases (the null hypothesis is more likely to be rejected)

If set A = {2, 4, 6, 8, 10, 12} and set B = {3, 6, 9, 12}, find the union of A and B.

{2, 3, 4, 6, 8, 9, 10, 12}

Using the binomial probability formula, find the probability that 2 people voted out of 5 if each person had a 50% chance of voting.

x = 2, n = 5, P = 0.50. nCx = 5! / (2!(5 - 2)!) = 10. B(x; n, P) = nCx * P^x * (1 - P)^(n - x) = 10 * (0.50)^2 * (1 - 0.50)^(5 - 2) = 0.3125 = 31.25%.

Using the binomial probability formula, find the probability that x = 3 if P = 0.30 and n = 4.

x = 3, n = 4, P = 0.30. nCx = 4! / (3!(4 - 3)!) = 4. B(x; n, P) = nCx * P^x * (1 - P)^(n - x) = 4 * (0.30)^3 * (1 - 0.30)^(4 - 3) = 0.0756 = 7.6%.

Find the probability of winning 4 out of 5 games of craps if there is a 0.493 chance of winning each game.

x = 4, n = 5, P = 0.493. nCx = 5! / (4!(5 - 4)!) = 5. B(x; n, P) = nCx * P^x * (1 - P)^(n - x) = 5 * (0.493)^4 * (1 - 0.493)^(5 - 4) = 0.1497 = 15%.

If set A = {2, 4, 6, 8, 10, 12} and set B = {3, 6, 9, 12}, find the intersection of A and B.

{6, 12}
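Python's built-in `set` type supports both operations directly, which makes it a quick way to check set questions like these. The variable names are illustrative.

```python
set_a = {2, 4, 6, 8, 10, 12}
set_b = {3, 6, 9, 12}

# Union: every element that appears in either set.
union = set_a | set_b

# Intersection: only the elements that appear in both sets.
intersection = set_a & set_b

print(sorted(union))
print(sorted(intersection))
```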

