Statistics 101: Principles of Statistics Study.com
Find the expected value (the amount of money you expect to win per roll) when rolling 2 dice, losing $2 for rolling a 4 or 5, winning $1 for rolling a 2 or 3, and winning $3 for any other number.
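E(X) = -$2(7/36) + $1(3/36) + $3(26/36) = 67/36 ≈ $1.86 (a sum of 4 or 5 occurs in 3 + 4 = 7 of the 36 outcomes, a sum of 2 or 3 in 1 + 2 = 3, and any other sum in the remaining 26)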
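A minimal Python sketch that checks this kind of expected-value card by enumerating all 36 equally likely outcomes of two fair six-sided dice. The helper `expected_value` and the payout function are illustrative names, not from the source. Exact fractions avoid rounding until the end.

```python
from fractions import Fraction
from itertools import product

def expected_value(payout):
    """Expected winnings per roll of two fair six-sided dice.

    payout maps the sum of the dice (2-12) to dollars won (negative = lost).
    Each of the 36 ordered outcomes has probability 1/36.
    """
    return sum(Fraction(payout(a + b), 36) for a, b in product(range(1, 7), repeat=2))

def payout_rule(total):
    # Lose $2 on a 4 or 5, win $1 on a 2 or 3, win $3 on anything else
    if total in (4, 5):
        return -2
    if total in (2, 3):
        return 1
    return 3

ev = expected_value(payout_rule)
print(ev, round(float(ev), 2))  # 67/36 1.86
```

The same function handles the card that loses $2 on a 3 or 4 and wins $3 otherwise: `expected_value(lambda t: -2 if t in (3, 4) else 3)` returns `Fraction(83, 36)`, about $2.31.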
Find the area that falls outside z = 1 and z = -1
1. Area between z = 1 and z = -1: 0.68268 2. 1 - 0.68268 = 0.31732
Disadvantages of convenience sampling
1. Bias, both from self-selection and from the researcher. 2. Representativeness issues: the sample may not reflect the larger population.
Determining the Area Outside Two Z-Scores
1. Find the area between the two z-scores. 2. Subtract the area from 1.
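Steps for Determining the Area Between Two Z-Scores
1. Locate each z-score on the 'Z-Scores and Normal Curve Areas' table. 2. If the table gives the area to the left of z (for example, 0.8413 for z = 1), you can subtract 0.500 from each value to get the area measured from the mean; this does not change the final difference. 3. Subtract the smaller z-score's area from the larger z-score's area.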
Advantages of convenience sampling
1. The researcher has easy access to the sample. 2. Data collection can be quick. 3. It requires fewer resources.
Calculating the Z-Score
1. Subtract the mean from the data point. 2. Divide the difference by the standard deviation. If the result is negative, the data point lies to the left of the mean on the graph of the normal distribution.
A long jumper has a mean jump distance of 18 feet with a standard deviation of 2.5 feet. Find the z-score for a jump distance of 23 feet.
1. Subtract the mean from the data point: 23 - 18 = 5 2. Divide the difference by the standard deviation: 5 / 2.5 = 2 The jump is 2 standard deviations to the right of the mean.
Characteristics of Binomial Experiments
1. The trials must be independent. 2. The probability of success (P) must be the same on every trial. 3. There must be only two possible outcomes (success or failure). 4. There must be a fixed number of trials (n).
Find the area that falls between z = 1 and z = -1
1. z = 1: 0.84134; z = -1: 0.15866 2. 0.84134 - 0.500 = 0.34134; 0.15866 - 0.500 = -0.34134 3. 0.34134 - (-0.34134) = 0.68268
The average whisker length of tabby cats is 4.4 inches with a standard deviation of 0.2. Determine the percentage of cats with a whisker length between 4.2 and 4.8 inches.
Z-score for 4.2 inches: (4.2 - 4.4) / 0.2 = -1 Z-score for 4.8 inches: (4.8 - 4.4) / 0.2 = 2 -1.0 on table: 0.15866 2.0 on table: 0.97725 0.97725 - 0.15866 = 0.81859 = 81.9% ≈ 82%
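The printed z-table can be replaced by the standard normal cumulative distribution function, which Python's `math.erf` lets you write directly. A minimal sketch; the function names are illustrative. Because it uses unrounded values, results can differ from table lookups in the last digit (0.68269 here versus the table-based 0.68268).

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z (a cumulative z-table entry)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_between(z_low, z_high):
    """Area between two z-scores: larger cumulative area minus the smaller one."""
    return normal_cdf(z_high) - normal_cdf(z_low)

print(round(area_between(-1, 1), 5))      # 0.68269
print(round(1 - area_between(-1, 1), 5))  # 0.31731 (area outside z = -1 and z = 1)
print(round(area_between(-1, 2), 5))      # 0.81859 (whisker-length example)
```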
You're packing for vacation. You have 2 red shirts, 3 green shirts, 5 blue shirts, and 1 white shirt. If the first shirt you packed was red, what is the probability that the second shirt is red?
1/10
When two dice are rolled, what is the probability of rolling a sum of 10?
1/12
A coin is flipped twice. What is the probability of getting one heads and one tails?
1/2
If I roll a die and get a three, what is the probability of rolling a three the second time I roll the die?
1/6
On a d20, or 20-sided die, each side shows a number from 1-20. If I roll it three times, what's the probability of rolling a 10 all three times?
1/8000
If you draw two cards at random from a standard deck of 52 cards (and don't replace the first one), what is the probability of drawing two spades?
156/2652 = 1/17 The probability of the first spade is 13/52. The probability of drawing a second spade after not replacing the first is 12/51. 13/52 * (12/51) = 156/2652 = 3/51 = 1/17
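Drawing without replacement multiplies conditional probabilities: each draw removes one card of the wanted kind and one card from the deck. A minimal sketch with exact fractions; the name `draw_all_of` is illustrative.

```python
from fractions import Fraction

def draw_all_of(kind_count, deck_size, draws):
    """P(every card drawn is of the wanted kind), drawing without replacement."""
    p = Fraction(1)
    for i in range(draws):
        # After i successful draws, kind_count - i wanted cards remain among deck_size - i cards
        p *= Fraction(kind_count - i, deck_size - i)
    return p

print(draw_all_of(13, 52, 2))  # 1/17 (two spades)
print(draw_all_of(4, 52, 2))   # 1/221 (two jacks)
```

It applies to any "both of the same kind" question; for example, `draw_all_of(50, 100, 2)` gives 49/198 for picking two red-haired people out of 100 when 50 have red hair.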
When drawing two cards at the same time from a standard deck of 52 cards, what is the probability of drawing a 10 and an ace?
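8/663 (about 0.012) A 10 and an ace can be drawn in two orders. P(10 first, then ace) = (4/52)(4/51) = 16/2652. P(ace first, then 10) = (4/52)(4/51) = 16/2652. These two orders are mutually exclusive, so add them: 32/2652 = 8/663. (Equivalently, there are 4 * 4 = 16 favorable pairs out of 1,326 possible two-card pairs.)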
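Because the two cards are drawn together, a brute-force count over every unordered pair of cards is a simple way to check this card. A minimal sketch: suits are ignored since only ranks matter, and card positions keep the four copies of each rank distinct.

```python
from fractions import Fraction
from itertools import combinations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [rank for rank in ranks for _ in range(4)]  # 52 cards, four of each rank

pairs = list(combinations(range(len(deck)), 2))    # all 1,326 two-card hands
favorable = sum(1 for i, j in pairs if {deck[i], deck[j]} == {"10", "A"})

print(favorable, len(pairs), Fraction(favorable, len(pairs)))  # 16 1326 8/663
```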
On a d8, or 8-sided die, each side shows a number from 1-8. If you roll two d8s, what is the probability that the sum of the numbers will be 7?
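3/32 This is an example where you use the number of favorable outcomes over the total number of possible outcomes. Favorable outcomes (die 1 : die 2) are 1 : 6, 2 : 5, 3 : 4, 4 : 3, 5 : 2, and 6 : 1, so there are 6 favorable outcomes. Each die has 8 possible outcomes, giving 8 * 8 = 64 possible outcomes in total. Favorable outcomes / Total outcomes = 6 / 64 = 3 / 32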
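Counting favorable outcomes over all equally likely outcomes is easy to automate for any number of dice with any number of sides. A minimal sketch; `prob_sum` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

def prob_sum(target, sides=6, dice=2):
    """P(the dice total equals target), counting every equally likely ordered roll."""
    rolls = list(product(range(1, sides + 1), repeat=dice))
    favorable = sum(1 for roll in rolls if sum(roll) == target)
    return Fraction(favorable, len(rolls))

print(prob_sum(7, sides=8))  # 3/32 (two d8s)
print(prob_sum(10))          # 1/12 (two ordinary dice)
```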
Among 100 people, 12 have blonde hair, 50 have red hair, 28 have brown hair, and 10 have black hair. If two people are selected at random, what is the probability that both will have red hair?
49/198
Find the probability of being dealt a five-card hand containing 5 red cards from a standard 52-card deck.
Favorable outcomes: choosing 5 of the 26 red cards = 26! / (5!(26 - 5)!) = 65,780 Possible outcomes: 2,598,960 Probability: 65,780/2,598,960 ≈ 0.0253
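Both hand-counting cards in this set (five red cards, four jacks) reduce to combinations, which Python provides as `math.comb` (Python 3.8+). A minimal sketch; the variable names are illustrative.

```python
import math
from fractions import Fraction

total_hands = math.comb(52, 5)                   # 2,598,960 five-card hands

five_red = math.comb(26, 5)                      # choose 5 of the 26 red cards
four_jacks = math.comb(4, 4) * math.comb(48, 1)  # all 4 jacks, plus 1 of 48 non-jacks

print(total_hands, five_red)                     # 2598960 65780
print(round(five_red / total_hands, 4))          # 0.0253
print(Fraction(four_jacks, total_hands))         # 1/54145
```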
Quartile
One of the three cut-off values that divide an ordered data set into four groups of equal size.
Histogram
A graphical representation of data, very similar to a bar graph, showing how many times values within each range of numbers appear in the data. "How many times" is called the "frequency."
Spread in Data
A measure of how far the individual values are from the middle of a data set. Examples: range, interquartile range, variance, and standard deviation.
Sample
A part of the larger population that is meant to represent the whole. It is used in inferential statistics.
Box Plot
A representation of data on a number line showing quartiles, a minimum value, a maximum value, extreme values and a range of values.
Ratio
A type of measurement used in statistics to compare two numbers. It is often written with a colon, for example, 3:1.
Convenience sampling
A type of sampling where the samples are chosen based on how easy it is to obtain them.
Normal Distribution
Also known as the Gaussian distribution, this is a continuous distribution of data. Its graph is a bell curve centered on the mean, with the two sides mirror images of each other.
Binomial Experiment
An experiment in which there are only two discrete outcomes: success or failure. The outcomes of individual trials are independent of one another.
Find the mode of the following data set: 10, 14, 15, 12, 10, 14, 19, 12, 14
Arranged in increasing order: 10, 10, 12, 12, 14, 14, 14, 15, 19 Mode = most frequent number = 14
Binomial Probability Equation
$B(x; n, P) = {}_nC_x \cdot P^x \cdot (1 - P)^{n - x}$, where ${}_nC_x$ = number of combinations of n trials with x successes, x = number of successes, n = number of trials, and P = probability of success on an individual trial
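The formula maps directly onto `math.comb`. A minimal Python sketch (the name `binomial_pmf` is illustrative) that reproduces several of the binomial cards in this set:

```python
import math

def binomial_pmf(x, n, p):
    """B(x; n, P): probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

print(binomial_pmf(2, 5, 0.50))             # 0.3125
print(round(binomial_pmf(3, 4, 0.30), 4))   # 0.0756
print(round(binomial_pmf(4, 5, 0.493), 4))  # 0.1497
print(1 - binomial_pmf(0, 5, 0.50))         # 0.96875 (at least 1 of 5 votes)
```

"At least" and "or" questions combine these terms: for example, `binomial_pmf(0, 5, 0.5) + binomial_pmf(1, 5, 0.5)` gives 0.1875 for 0 or 1 voters out of 5.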
Calculate the probability of getting at least 12 heads when flipping a fair coin 40 times
Mean = np = 40 x 0.5 = 20. Standard deviation = √(40 x 0.5 x (1 - 0.5)) = √10 ≈ 3.2. Z-score of 12: (12 - 20) / 3.2 = -2.5. -2.5 on table: 0.0062. 1 - 0.0062 = 0.9938 ≈ 99%
Nominal data
Categorical data in which any numbers assigned are only labels for non-numeric attributes and carry no numeric meaning. For example, in a study about types of animals, all puppies might be given the code 7.
A population of tropical birds has a mean wing span of 18 inches with a standard deviation of 0.6. Find the wing span of a bird with a z-score of 2.4.
Work the z-score steps in reverse: 1. Multiply the z-score by the standard deviation to get the distance from the mean: 2.4 x 0.6 = 1.44 2. Add that distance to the mean: 18 + 1.44 = 19.44 inches
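Rearranging the z-score formula gives data point = mean + z × standard deviation. A minimal sketch; `value_from_z` is an illustrative name.

```python
def value_from_z(z, mean, sd):
    """Data point that sits z standard deviations from the mean."""
    return mean + z * sd

# Wing-span example: z = 2.4, mean 18 inches, standard deviation 0.6
print(round(value_from_z(2.4, 18, 0.6), 2))  # 19.44
```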
In a game of blackjack, a person is dealt 2 cards: a 4 and a 9. Assuming aces count as 1, find the probability that the next card dealt makes the blackjack total greater than 21.
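Current total: 4 + 9 = 13. An ace through 8 keeps the total at 21 or less: there are 32 such cards, and removing the 4 already dealt leaves 31 of the 50 remaining cards (31/50). Any other card (a 9 or a 10-value card) goes over 21: 19/50 = 0.38 = 38% chance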
In a game of blackjack, a person is dealt 2 cards: a 9 and a 10. Assuming aces count as 1, find the probability that the next card dealt makes the blackjack total greater than 21.
Current total: 9 + 10 = 19. Only an ace (19 + 1 = 20) or a 2 (19 + 2 = 21) keeps the total at 21 or less: 8 such cards, so 8/50. Any other card goes over 21: 42/50 = 0.84 = 84% chance
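Both blackjack cards can be checked by building a single 52-card deck of values (ace = 1, jack/queen/king = 10), removing the dealt cards, and counting the cards that would push the total past 21. A minimal sketch; single-deck play and ace-as-1 follow the cards' assumptions, and `bust_probability` is an illustrative name.

```python
from fractions import Fraction

def bust_probability(dealt):
    """P(the next card takes the hand over 21), with aces counted as 1."""
    # Values 1 (ace) through 9 appear 4 times each; value 10 covers 10, J, Q, K (16 cards)
    deck = [value for value in range(1, 10) for _ in range(4)] + [10] * 16
    for card in dealt:
        deck.remove(card)  # dealt cards are no longer in the deck
    total = sum(dealt)
    busts = sum(1 for value in deck if total + value > 21)
    return Fraction(busts, len(deck))

print(bust_probability([4, 9]))   # 19/50
print(bust_probability([9, 10]))  # 21/25, i.e. 42/50
```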
Continuous data
Data that can take any value within a range; it can be divided into ever finer units without end.
Discrete data
Data that can only take certain separate, specific values (for example, counts) and can't be meaningfully divided into smaller units.
Categorical data
Data that is grouped by topic or category.
Quantitative data
Data that is measurable and that you can put in order.
T-distribution
The distribution to use when we don't know the population standard deviation and we have a small sample.
Find the expected value (the amount of money you are expected to win or lose per roll) when rolling two dice, losing $2 for rolling a 3 or 4, and winning $3 for any other number.
E(X) = -$2(5/36) + $3(31/36) = $2.31
Find the probability of being dealt a five-card hand containing 4 jacks from a standard 52-card deck.
Favorable outcomes: ways to choose all 4 jacks = 4! / (4!(4 - 4)!) = 1; ways to choose the fifth card from the 48 non-jacks = 48! / (1!(48 - 1)!) = 48; 1 * 48 = 48. Possible outcomes: 2,598,960. Probability: 48/2,598,960 = 1/54,145
Empirical Rule (68-95-99.7 Rule)
For normally distributed data: --68% of data within 1 standard deviation of the mean. --95% of data within 2 standard deviations of the mean. --99.7% of data within 3 standard deviations of the mean.
Variance
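Found by subtracting the mean from each data point, squaring each difference, and averaging the squared differences. (For a sample rather than a whole population, divide the sum of the squared differences by n - 1 instead of n.) The square root of the variance is the standard deviation.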
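The steps translate into a short function. A minimal sketch computing the population variance (dividing by n); the data set is a made-up illustration, not from the source.

```python
import math

def population_variance(data):
    """Mean of the squared differences between each point and the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example data (mean = 5)
variance = population_variance(data)
print(variance, math.sqrt(variance))  # 4.0 2.0
```

Python's standard library offers the same calculations as `statistics.pvariance` and `statistics.pstdev` (population), and `statistics.variance` and `statistics.stdev` (sample, dividing by n - 1).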
Graph of Normal Distribution
Graph using the normal distribution to approximate the binomial distribution
Find the mode of the following data set: {1, 2, 3, 4, 1, 2, 3, 1, 5, 7}
In increasing order: 1, 1, 1, 2, 2, 3, 3, 4, 5, 7 Mode = most frequent number = 1
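Python's `statistics.multimode` (Python 3.8+) returns every most-frequent value, so it also handles data sets with more than one mode. A minimal sketch for the two mode cards:

```python
from statistics import multimode

print(multimode([10, 14, 15, 12, 10, 14, 19, 12, 14]))  # [14]
print(multimode([1, 2, 3, 4, 1, 2, 3, 1, 5, 7]))        # [1]
```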
Find the interquartile range for the following data set: {70, 68, 84, 85, 82, 81, 90, 92, 94, 72}
In increasing order: 68, 70, 72, 81, 82, 84, 85, 90, 92, 94 Median (quartile 2): 83 Median of 1st half (quartile 1): 72 Median of 2nd half (quartile 3): 90 90 - 72 = 18
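A minimal sketch of the median-of-halves method used in this card. When the data set has an odd number of values, this sketch assumes the median itself is left out of both halves; other textbooks and software use different quartile conventions and can give slightly different results.

```python
def median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def interquartile_range(values):
    """Q3 - Q1, where Q1 and Q3 are the medians of the lower and upper halves."""
    s = sorted(values)
    if len(s) < 2:
        raise ValueError("need at least two values")
    half = len(s) // 2
    lower, upper = s[:half], s[-half:]  # the middle value (odd n) is in neither half
    return median(upper) - median(lower)

scores = [70, 68, 84, 85, 82, 81, 90, 92, 94, 72]
print(median(scores), interquartile_range(scores))  # 83.0 18
```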
Ordinal data
Information that can be ordered and ranked. For instance, 1st place, 2nd place, etc.
Interval measurement
Information that is evenly distributed into groups based on how each group differs from each other. For instance, dogs might be between 10-20 lbs, 21-30 lbs, etc.
Using the normal distribution calculate the probability of getting between 25 and 30 heads when flipping a fair coin 40 times
Mean = 40 * 0.5 = 20. Std. dev. = √(40 * 0.5 * (1 - 0.5)) ≈ 3.2. Z-score of 25: (25 - 20) / 3.2 ≈ 1.56. Z-score of 30: (30 - 20) / 3.2 ≈ 3.13. 1.56 on table: 0.941. 3.13 on table: 0.999. 0.999 - 0.941 = 0.058 = 5.8%
How to find the probability of compound events
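Multiply the probabilities together. This works directly for independent events; for dependent events (such as drawing cards without replacement), use each event's probability given the events before it.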
If your data analysis shows a difference in two sample groups with P < 0.20, can you conclude this difference exists in the entire population at the 10% significance level?
No. A p-value known only to be below 0.20 may still be larger than 0.10, so there is not enough evidence that the difference exists.
Graph of Z-Scores
Normally distributed data Mean = 0 (if z-score = 0, then data point = mean) Standard deviation = 1
Formula for the probability of a single event
Probability = (number of favorable outcomes) / (number of total outcomes)
Find the probability of being dealt two jacks in a row from a standard deck of 52 cards.
Number of jacks in the deck: 4 Probability of first jack: 4/52 Probability of second jack: 3/51 (4/52) * (3/51) = 12/2652 = 1/221
Stem and Leaf Display
One part of the value is shown on the left side of a display while another part is shown on the right. This type of display shows the actual data while also plotting the distribution of the data.
If your paired data form a roughly linear normal probability plot with no outliers, which paired test is reasonable?
Paired t-test
Using the binomial probability formula, find the probability that at least 1 person out of 5 voted if each person had a 50% chance of voting.
Probability that nobody voted: x = 0, n = 5, P = 0.50. ${}_5C_0 = \frac{5!}{0!(5 - 0)!} = 1$. $B(0; 5, 0.50) = 1 \cdot (0.50)^0 \cdot (1 - 0.50)^{5 - 0} = 0.03125 \approx 0.0313$. $1 - 0.0313 = 0.9687 \approx 96.9\%$ chance that at least 1 person out of 5 voted
P-value less than the significance level
Reject the null hypothesis
Find the probability of winning $2 on a roll of two dice if you lose $5 for rolling a 4, 5, or 6, and win $2 for rolling anything else.
Roll a 4: 1-3, 3-1, 2-2 (3); Roll a 5: 1-4, 4-1, 2-3, 3-2 (4); Roll a 6: 1-5, 5-1, 2-4, 4-2, 3-3 (5) (3 + 4 + 5) / 36 = 12/36 chance of losing $5 36 - 12 = 24/36 = 0.67 = 67% chance of winning $2
A paired sample from census data is used to determine if the average salary of Boston residents and California residents is different. What variable should be used?
Salary
The average number of stripes on a tabby cat's tail is 8 with a variance of 16. Determine the number of cats that will have fewer than 10 stripes in a population of 100
Standard deviation = square root of variance = 4 Z-score for 10 stripes: (10 - 8) / 4 = 0.5 0.5 on table: 0.69146 0.69146 x 100 = 69.146 = 69 cats
Characteristics of Normal Distribution Graph
Symmetrical bell curve. Centered on the mean, which is equal to the median and the mode. Width dependent on size of standard deviation (larger standard deviation = larger spread)
Bar Graph
The amounts (frequency) of different data groups shown on a graph. This type of plot is very similar to a histogram, except that the horizontal axis is not necessarily numerical.
A paired sample from census data is used to determine if the average salary of Boston residents and California residents is different. State the alternative hypothesis.
The average salary of Boston residents is not equal to that of California residents.
Population
The group from which data will be gathered.
Data
The information that is taken from the designated population being studied.
Hypothesis test decision criteria for the p-value
The null hypothesis is rejected if the p-value is less than or equal to the chosen significance level (alpha).
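The decision rule is a single comparison. A minimal sketch (`reject_null` is an illustrative name), applied to a p-value of 0.06 as in the alpha card later in this set:

```python
def reject_null(p_value, alpha):
    """Reject the null hypothesis when the p-value is at most the significance level."""
    return p_value <= alpha

print(reject_null(0.06, 0.10))  # True
print(reject_null(0.06, 0.05))  # False
```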
Skewed Distribution
The peak of the graph is not centered but is off to the left or right. The side away from the peak has a longer tail with fewer observations (lower frequencies).
Hypothesis Testing Conditions
The population size is at least 20 times larger than the sample. The sampling method must be random. There are only two outcomes for each sample. There are at least 10 successes and 10 failures.
Hypothesis Testing
The procedure used in statistics to see whether a particular hypothesis is acceptable.
Requirements to use the z test for a difference between two proportions
The samples must be independent. n_1*p_1, n_1*q_1, n_2*p_2, and n_2*q_2 must all be greater than or equal to 5, where q_1 = 1 - p_1 and q_2 = 1 - p_2.
Median
The value in the middle of an ordered set of numbers. Half of the numbers are above this value and half are below.
Quartile
The values that separate a data set into 4 groups. 1st quartile: the median of the first half of the data. 2nd quartile: the median of the data. 3rd quartile: the median of the second half of the data.
Z-Score
This describes the number of standard deviations a data point is from the mean. It's useful for quickly and accurately determining normal distribution probabilities
Descriptive statistics
Statistics that summarize and describe the data you have, such as the mean, median, and mode.
Inferential statistics
Statistics that draw conclusions beyond the data you have gathered. It uses samples to make inferences about larger populations.
Probability Distribution Function or P(X)
This assigns a probability to every possible value of X. Each probability is a number between 0 and 1, and together they add up to 1.
Binomial Probability Distribution
This is used to calculate probabilities of processes that have success and failure as the two possible outcomes. It's used frequently in real-world problem solving
Bar Graph: Uses
You can use this type of graph to assess the relative frequency of how often things happen, how many people participate in a certain thing, or a value range.
Type I error
You reject the null hypothesis (conclude it is false), but it is actually true.
Type II error
You fail to reject the null hypothesis (treat it as true), but it is actually false.
Continuous Probability Distribution
You use this model when the possible outcomes cannot be counted, such as any value within a range. If you can count the outcomes, use a discrete probability distribution instead.
The average tail length of tabby cats is 12.5 inches with a standard deviation of 0.5. Determine the number of tabby cats that will have a tail shorter than 11 inches in a population of 1,000.
Z-score for 11 inches: (11 - 12.5) / 0.5 = -3 -3.0 on table: 0.00135 0.00135 x 1000 = 1.35 = 1 cat
The average tail length of tabby cats is 12.5 inches with a standard deviation of 0.5. Determine the percentage of cats with a tail length greater than 13 inches
Z-score for 13 inches: (13 - 12.5) / 0.5 = 1 1.0 on table: 0.84134 Subtract from 1 to find percent greater (to the right): 1 - 0.84134 = 0.15866 = 15.9% = 16%
The average body length of tabby cats is 18 inches with a standard deviation of 1.0. Determine the percentage of cats with a body length between 16 and 19 inches.
Z-score for 16 inches: (16 - 18) / 1.0 = -2 Z-score for 19 inches: (19 - 18) / 1.0 = 1 -2.0 on table: 0.02275 1.0 on table: 0.84134 0.84134 - 0.02275 = 0.81859 = 81.9% = 82%
The average life span of tabby cats is 18 years with a standard deviation of 2. Determine the percentage of tabby cats that will live longer than 22 years
Z-score for 22 years: (22 - 18) / 2 = 2 2.0 on table: 0.97725 Subtract from 1 to find the percent greater (to the right): 1 - 0.97725 = 0.02275 = 2.275% ≈ 2.3%
Using the binomial probability formula, find the probability that 0 or 1 people voted out of 5 if each person had a 50% chance of voting.
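$B(0; 5, 0.50) = 1 \cdot (0.50)^0 \cdot (1 - 0.50)^{5} = 0.03125$; $B(1; 5, 0.50) = 5 \cdot (0.50)^1 \cdot (1 - 0.50)^{4} = 0.15625$; $0.03125 + 0.15625 = 0.1875 \approx 18.8\%$ chance that 0 or 1 people voted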
For what value of alpha should you reject the null hypothesis when the P-value is 0.06?
alpha = 0.10 (any alpha of at least 0.06 leads to rejection; at alpha = 0.05 you would not reject)
How to use the normal distribution to approximate binomial distribution
n = number of trials, p = probability of success Mean = np Standard deviation = √(np(1-p)) To solve: calculate z-score(s), find value(s) on normal distribution table, calculate probability
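A minimal sketch of these steps, using an `erf`-based normal CDF in place of the printed table. Like the cards in this set, it applies no continuity correction. Because it uses the unrounded standard deviation √10 ≈ 3.162 instead of 3.2, its answers differ slightly from the table-based ones (about 0.994 rather than 0.9938 for at least 12 heads in 40 flips, and about 0.056 rather than 0.058 for 25 to 30 heads). Function names are illustrative.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def approx_binomial_between(n, p, low, high):
    """Normal approximation of P(low <= successes <= high), without continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return normal_cdf((high - mean) / sd) - normal_cdf((low - mean) / sd)

# At least 12 heads in 40 flips: everything from 12 upward
print(round(approx_binomial_between(40, 0.5, 12, math.inf), 3))
# Between 25 and 30 heads in 40 flips
print(round(approx_binomial_between(40, 0.5, 25, 30), 3))
```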
Methods to determine acceptance or rejection of null hypothesis
p-value and region of acceptance
A sample of 300 US cyclists shows 150 are happy with the rules. Find p_hat (an estimate of the proportion of rule-happy US cyclists).
p_hat = 150 / 300 = 0.5
As the significance levels increase ______________.
the width of the confidence interval decreases (the null hypothesis is more likely to be rejected)
If set A = {2, 4, 6, 8, 10, 12} and set B = {3, 6, 9, 12}, find the union of A and B.
{2, 3, 4, 6, 8, 9, 10, 12}
Using the binomial probability formula, find the probability that 2 people voted out of 5 if each person had a 50% chance of voting.
x = 2, n = 5, P = 0.50. ${}_5C_2 = \frac{5!}{2!(5 - 2)!} = 10$. $B(2; 5, 0.50) = 10 \cdot (0.50)^2 \cdot (1 - 0.50)^{5 - 2} = 0.3125 = 31.25\%$
Using the binomial probability formula, find the probability that x = 3 if P = 0.30 and n = 4.
x = 3, n = 4, P = 0.30. ${}_4C_3 = \frac{4!}{3!(4 - 3)!} = 4$. $B(3; 4, 0.30) = 4 \cdot (0.30)^3 \cdot (1 - 0.30)^{4 - 3} = 0.0756 \approx 7.6\%$
Find the probability of winning 4 out of 5 games of craps if there is a 0.493 chance of winning each game.
x = 4, n = 5, P = 0.493. ${}_5C_4 = \frac{5!}{4!(5 - 4)!} = 5$. $B(4; 5, 0.493) = 5 \cdot (0.493)^4 \cdot (1 - 0.493)^{5 - 4} = 0.1497 \approx 15\%$
If set A = {2, 4, 6, 8, 10, 12} and set B = {3, 6, 9, 12}, Find the intersection of A and B.
{6, 12}
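Python's built-in `set` type implements both operations directly: `|` for union and `&` for intersection. A minimal sketch for the two set cards:

```python
A = {2, 4, 6, 8, 10, 12}
B = {3, 6, 9, 12}

print(sorted(A | B))  # [2, 3, 4, 6, 8, 9, 10, 12] (union)
print(sorted(A & B))  # [6, 12] (intersection)
```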
