Applied Probability and Statistics C955
Box Plot
A graph that displays the highest and lowest quarters of data as whiskers, the middle two quarters of the data as a box, and the median
Composite Number
A whole number greater than 1 that is not prime. It has more than two positive factors, including 1 and itself.
Prime Number
A prime number is a number that has exactly two positive factors: 1 and itself.
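A minimal sketch of the factor-counting definitions of prime and composite numbers. The function names `positive_factors`, `is_prime`, and `is_composite` are illustrative, and trial division is used only because it mirrors the definition.

```python
def positive_factors(n: int) -> list[int]:
    """Return every positive factor of n (n >= 1) by trial division."""
    return [d for d in range(1, n + 1) if n % d == 0]


def is_prime(n: int) -> bool:
    """A prime has exactly two positive factors: 1 and itself."""
    return n >= 2 and len(positive_factors(n)) == 2


def is_composite(n: int) -> bool:
    """A composite has more than two positive factors."""
    return n >= 2 and len(positive_factors(n)) > 2


print(positive_factors(12))  # [1, 2, 3, 4, 6, 12]
print(is_prime(13))          # True
print(is_composite(12))      # True
```

Note that 1 has only one positive factor, so it is neither prime nor composite.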
Frequency Distribution
A record of the number of times data occurs within a certain category
Probability Experiment
A situation in which the probability is being examined
Right Skewed
Positively skewed, tail stretches to the right of the peak
Probability - Event
A collection of desired outcomes
Probability - Outcome
A possible result of an experiment
Unimodal
One mode
Explanatory Variable (Relationship between two variables)
(x) - presumed to cause changes in the response variable. A variable that helps explain or influences changes in a response variable; also known as the independent variable.
Response Variable (Relationship between two variables)
(y) - presumed to be affected by the explanatory variable; also known as the dependent variable.
Law of Total Probability
If you multiply along each branch of a probability tree, the products for all branches sum to 1, or 100%.
Common Metric Conversions
1 L = 1000 mL; 1 kg = 1000 g; 1 g = 1000 mg; 1 mg = 1000 mcg
Conversions between Household and Metric Units
1 cc (cubic centimeter) = 1 mL; 1 oz (volume) = 30 mL; 1 L = 1.057 qt; 1 tsp = 5 mL; 1 kg = 2.2 lbs; 1 oz (weight) = 28.35 g
Unit Conversions for Household Measures of Volume
1 tablespoon = 3 teaspoons; 1 oz = 2 tablespoons; 1 cup = 8 oz; 1 pint = 2 cups; 1 quart = 2 pints; 1 gallon = 4 quarts
Sign rule for Multiplication and division
1. +# x +# = +# 2. -# x -# = +# (A product or quotient of two numbers with the same sign is positive.) 3. -# x +# = -# 4. +# x -# = -# (A product or quotient of two numbers with different signs is negative.)
Causation
A change in one variable creates a change in the other variable. Can be difficult to establish. Established by an experimental study. Association or correlation does not always imply causation. Lurking variables may not be included in the study, but affect the variables that were included.
Skewed Left
A skewed distribution with a tail that stretches left, toward the smaller values. The long tail of the curve is on the negative side of the peak.
Skewed Right
A skewed distribution with a tail that stretches right, toward the larger values. The long tail of the curve is on the positive side of the peak.
Standard Deviation Rule
A standard proportion or percentage of data points that lie within each standard deviation away from the mean for a normal distribution. 68% of the data will fall within 1 standard deviation of the mean, 95% of the data will fall within 2 standard deviations of the mean, and 99.7% of the data will fall within 3 standard deviations of the mean.
Check Sheet
A structured form or table that allows data to be collected by marking how often an event has occurred in a certain interval.
Measure of Central Tendency
A summary measure that is used to describe an entire set of data with one value that represents the middle or center of the distribution. There are 3 main measures: mean, median, and mode.
Stem Plot (Stem and Leaf Plot)
A visual representation of data in which individual data points are plotted to the right of a vertical line, or chart, and the left (stem) shows the interval categories.
Adding Fractions with the Same Denominator
Add or subtract the numerators of all the fractions in the expression. Keep the same denominator! (The temptation to add the two denominators is very strong; resist.) If necessary, reduce the answer.
Stratified Sample
All groups are chosen, only some people within each group are studied
Categorical Data
Also called qualitative data. Consists of values that can be sorted into groups or categories, such as names or labels, and that are not necessarily numerical.
Probability - Fair
An experiment where all outcomes are equally likely.
Outlier
An observation point that is distant from other observations
Negative Correlation
As the explanatory variable increases, the response variable decreases
Positive Correlation
As the explanatory variable increases, the response variable increases
Law of Large Numbers (LLN)
As the number of trials of an experiment increases, the empirical probability gets closer to the true probability
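The Law of Large Numbers can be illustrated with a small simulation. This is only a sketch: it assumes a fair coin (true probability 0.5), and the function name `empirical_probability` and the fixed seed are illustrative choices so that runs are repeatable.

```python
import random


def empirical_probability(trials: int, seed: int = 0) -> float:
    """Flip a simulated fair coin `trials` times and return the share of heads."""
    rng = random.Random(seed)  # seeded generator so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials


# The share of heads tends to settle near 0.5 as the number of trials grows.
for n in (10, 1_000, 100_000):
    print(n, empirical_probability(n))
```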
Mean
Average, extreme values greatly influence the mean in the direction of the skew
Multimodal
A data set that has more than two modes
Grand Total
Bottom Right corner of table. The total size of the dataset
Two-way Frequency Table (Contingency Table)
C -> C; rows show one variable's categories, columns show the other variable's categories
Centigrade/Fahrenheit Conversions
C = (F - 32) x 5/9; F = (C x 9/5) + 32
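The two conversion formulas translate directly into code. A minimal sketch; the function names are illustrative.

```python
def fahrenheit_to_celsius(f: float) -> float:
    """C = (F - 32) x 5/9"""
    return (f - 32) * 5 / 9


def celsius_to_fahrenheit(c: float) -> float:
    """F = (C x 9/5) + 32"""
    return c * 9 / 5 + 32


print(fahrenheit_to_celsius(212))  # 100.0 (boiling point of water)
print(celsius_to_fahrenheit(100))  # 212.0
```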
Complement Formula
Calculating the probability of something not happening. P(not A) = 1-P(A) P(at least one) = 1-P(none)
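A short sketch of the "at least one" form of the complement formula. It assumes the trials are independent, which is what lets P(none) be written as a power; the die-rolling example and the name `at_least_one` are illustrative.

```python
from fractions import Fraction


def at_least_one(p_single: Fraction, trials: int) -> Fraction:
    """P(at least one) = 1 - P(none), assuming independent trials."""
    p_none = (1 - p_single) ** trials
    return 1 - p_none


# Probability of rolling at least one six in four rolls of a fair die.
print(at_least_one(Fraction(1, 6), 4))  # 671/1296
```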
Graphical Displays of Two Variable Data
Categorical -> Categorical (C -> C): two-way table. Categorical -> Quantitative (C -> Q): side-by-side box plots. Quantitative -> Quantitative (Q -> Q): scatterplot.
Dividing Fractions
Change the division sign ( ÷ ) to a multiplication sign ( × ). Write the reciprocal of the second fraction. Multiply the numerators. Multiply the denominators. Write the answer in the form of a fraction. Reduce the fraction to the lowest terms, if necessary. *Mixed Number Fractions:* Change any mixed numbers to improper fractions. Change the division sign ( ÷ ) to a multiplication sign ( × ). Write the reciprocal of the second fraction. Multiply the numerators and denominators as usual. Change the improper fraction to a mixed number. Reduce the fraction to the lowest terms, if necessary.
Adding and Subtracting Mixed Number Fractions
Change the mixed numbers to improper fractions. Find the least common denominator (LCD) if the fractions have different denominators and convert to equivalent fractions with the LCD. Add or subtract the numerators of all the fractions in the expression. Keep the denominator the same. Change improper fractions to a mixed number (if needed). If necessary, reduce the fraction to lowest form. Example: subtract the mixed numbers 8 5/6 − 5 1/2. Step 1: Change the mixed numbers to improper fractions. 8 5/6 = 53/6 and 5 1/2 = 11/2. Step 2: Convert to equivalent fractions with the least common denominator. The LCD is 6, so 53/6 does not need to change; multiply the numerator and denominator of 11/2 by 3: 53/6 − 11/2 = 53/6 − 33/6. Step 3: Subtract like fractions. 53/6 − 33/6 = 20/6. Step 4: Convert the improper fraction to a mixed number. 20/6 = 3 2/6. Finally, reduce the fraction to its lowest form. 3 2/6 = 3 1/3.
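The subtraction 8 5/6 − 5 1/2 can be checked with Python's `fractions.Fraction`, which finds the common denominator and reduces automatically. This is a sketch for positive mixed numbers only; the helper names `mixed_to_improper` and `improper_to_mixed` are illustrative.

```python
from fractions import Fraction


def mixed_to_improper(whole: int, num: int, den: int) -> Fraction:
    """Convert a positive mixed number (whole num/den) to an improper fraction."""
    return Fraction(whole * den + num, den)


def improper_to_mixed(frac: Fraction) -> tuple[int, Fraction]:
    """Split a positive fraction into its whole part and fractional remainder."""
    whole, remainder = divmod(frac.numerator, frac.denominator)
    return whole, Fraction(remainder, frac.denominator)


a = mixed_to_improper(8, 5, 6)  # 53/6
b = mixed_to_improper(5, 1, 2)  # 11/2
diff = a - b                    # 20/6, stored in lowest terms as 10/3
print(improper_to_mixed(diff))  # (3, Fraction(1, 3)), i.e. 3 1/3
```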
Overall Percentages
Computed by dividing each frequency by the grand total
Conditional Percentages
Computed by dividing each joint frequency by the corresponding explanatory variable marginal frequency
U-Shaped
Contains a valley rather than a peak
Probability - Disjoint Events
Contains no common outcomes, cannot happen simultaneously
Theoretical (Classical) Probability
Count the number of outcomes and divide by the total possible outcomes. Theoretical probability requires that the experiment be fair.
Butterfly Method
Cross Multiply, if a/b = c/d then a x d = b x c
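Cross multiplication can be used both to test whether two ratios are equal and to solve for a missing term. A minimal sketch; `is_proportion` and `solve_for_d` are illustrative names, and the inputs are assumed to be non-zero integers.

```python
from fractions import Fraction


def is_proportion(a: int, b: int, c: int, d: int) -> bool:
    """a/b = c/d exactly when the cross products a x d and b x c are equal."""
    return a * d == b * c


def solve_for_d(a: int, b: int, c: int) -> Fraction:
    """Solve a/b = c/d for d: a x d = b x c, so d = (b x c) / a."""
    return Fraction(b * c, a)


print(is_proportion(2, 3, 8, 12))  # True
print(solve_for_d(2, 3, 8))        # 12
```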
Principle of Equality
If you perform equivalent operations to both sides of an equation, the result will always be an equivalent equation.
Bimodal
A data set with two modes
Distributive Property
Distribute the term outside the parentheses to each of the terms inside the parentheses a(b + c) = ab + ac
Transforming Fractions
Divide the least common denominator by the current denominator. Multiply both the numerator and the denominator by this integer. Let's now transform 1/3 and −2/7 each into their equivalent fractions that share a common denominator of 21, which we found in the example above. First, convert 1/3 to an equivalent fraction with a denominator of 21. Divide the LCD (the new denominator) by the current denominator: 21 ÷ 3 = 7. Multiply the numerator and denominator by 7: (1×7)/(3×7) = 7/21.
Regression Equation
Equation modeling relationship between quantitative variables.
Probability - Complement
Everything not in the set
Butterfly Method in Algebraic Equations
Example Solve the following algebraic equation: 23/21 x j = 25/13 . Write the algebraic equation in the form ax/b=c/d Remember that a/b(x)=ax/b since x can be converted into x/1 . Now draw "butterfly wings" around the opposite terms. Multiply the numbers in each of the butterfly wings:
Adding Fractions with Different Denominators
Find the least common denominator (LCD). Use the LCD to find the equivalent fractions and rewrite the expression. Add or subtract the numerators of all the fractions in the expression. Keep the denominator the same—do not add them. If necessary, reduce the answer.
Getting an equation into Slope-Intercept Form
First, make sure the y term is on the left side of the equation. Put all x 's and constants on the right side of the equation. Multiply or divide to make the coefficient of y be 1 .
Empirical Rule
For the normal distribution (bell-shaped distribution), approximately 68% of the measurements are within one standard deviation of the mean, approximately 95% of the measurements are within two standard deviations of the mean, and approximately 99.7% of the measurements are within three standard deviations of the mean.
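A small sketch of how the rule turns a mean and standard deviation into ranges. The function name and the example values (mean 100, standard deviation 15) are illustrative, and the percentages only apply when the data are approximately normal.

```python
def empirical_rule_intervals(mean: float, sd: float) -> dict[str, tuple[float, float]]:
    """Return the ranges within 1, 2, and 3 standard deviations of the mean."""
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }


print(empirical_rule_intervals(100, 15))
# {'68%': (85, 115), '95%': (70, 130), '99.7%': (55, 145)}
```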
Discrete Data
Has distinct values, can be counted, has unconnected points (dots)
Continuous data
Has values within a range, measured (not counted) does not have gaps between data points. (connected lines or curves)
Steps for Combining Terms
Identify like terms. Move like terms next to each other. Add or subtract.
Changing Improper Fractions and Mixed Numbers
Improper fractions can be converted to mixed numbers by following these steps: Write division problem with numerator divided by denominator. Divide to determine quotient and remainder. Write mixed number with the quotient as the whole number and the remainder as the numerator over the same denominator.
Coefficient
In algebra, constants often take the form of coefficients. A coefficient is a number by which a variable is being multiplied. Coefficients are written in front of variables. So, in 16x, 16 is the coefficient and x is the variable. If a variable is without a number in front of it, the coefficient is 1. Though it is not written, there is essentially an invisible 1 in front of any variable without a numerical coefficient.
Potential Problems with Simple Linear Equation
Inappropriate extrapolation: trends do not continue indefinitely. Association is not causation: watch for lurking variables. Sample not representative of the population. Small sample size.
Symmetric
Left is roughly the same as the right
Correlation Coefficient (r)
Measures the strength of the linear relationship between variables. r is always between -1 and 1. The closer r is to 1, the stronger the positive linear correlation. The closer r is to -1, the stronger the negative linear correlation. r = 0 indicates no linear correlation, but that does not rule out non-linear relationships.
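A sketch of computing r from paired data, written out by hand rather than relying on a library. The name `pearson_r` and both data sets are made up for illustration. The second data set follows y = x² exactly, yet gives r = 0, which shows that r only measures linear association.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Correlation coefficient r for paired data of equal length."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))            # 1.0 (perfect positive line)
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))    # 0.0 (perfect curve, no linear trend)
```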
Measures of Spread
Measures used to describe the distance of data from the center of the dataset, such as range and standard deviation
Five-Number Summary (FNS)
Minimum, 1st quartile, median, 3rd quartile, and maximum
Changing Mixed Numbers Into Improper Fractions
Mixed numbers can also be converted to improper fractions by following these steps: Multiply the whole number by the denominator of the fraction. To the product given by step 1, add the number of the numerator. Write the result of step 2 as the numerator of the improper fraction. The denominator of the improper fraction should be the denominator of the original fraction. Simplify the improper fraction by dividing the numerator and denominator by all common factors.
Simple Linear Equation
Models the data with a line, where x is the explanatory variable and y is the response variable. The equation is y = mx + b, where m is the slope and b is the y-intercept. The sign of the slope (+ or -) matches the sign of r.
Multiplying Fractions
Multiply the numerators to obtain a new numerator. Multiply the denominators to obtain a new denominator. Write the answer in fraction form and reduce it to the lowest terms, if necessary. Change any improper fractions to mixed numbers. *Mixed Number Fractions:* Change any mixed numbers to improper fractions. Multiply the numerators to obtain a new numerator. Multiply the denominators to obtain a new denominator and write the answer in fraction form. Change the improper fraction back to a mixed number. Reduce the mixed number to the lowest terms, if necessary.
Direction of Inequalities
Multiplying or dividing both sides by a negative number will cause the direction of an inequality to reverse.
Left Skewed
Negatively skewed, tail stretches to the left of the peak
Qualitative Data
Non-numeric information based on some quality or characteristic
Which measures of center and spread should you use?
Normal, symmetric data: use the mean and standard deviation. Skewed data: use the median and IQR. Categorical data: use the mode and no measure of spread.
Multiples of a number
Numbers that can be obtained by multiplying the given number by 1, 2, 3, 4, etc.
Quantitative Data
Numerical data, consists of data values that are numerical, representing quantities that can be counted or measured.
Simpson's Paradox
Occurs when a trend that appears in separate groups of data disappears or reverses when the groups are combined. Can only occur when the sizes of the groups are inconsistent.
Bias in Sampling
Occurs when the sample frame does not accurately represent the population
1.5 IQR Criterion Rule
Outliers are defined to be any points that are more than 1.5 × IQR above Q3 or below Q1. Upper limit = Q3 + (1.5 × IQR); any point above it is an upper outlier. Lower limit = Q1 − (1.5 × IQR); any point below it is a lower outlier.
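A sketch of the 1.5 × IQR criterion, with quartiles taken as the medians of the lower and upper halves of the sorted data. Leaving the overall median out of both halves when the count is odd is an assumption here; textbooks and software use slightly different quartile conventions. The function names and data are illustrative.

```python
from statistics import median


def quartiles(data: list[float]) -> tuple[float, float, float]:
    """Return (Q1, median, Q3) using the median-of-halves method."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    # For an odd count, skip the middle value so it is in neither half.
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(s), median(upper)


def find_outliers(data: list[float]) -> list[float]:
    """Return points more than 1.5 x IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return [x for x in data if x < lower_limit or x > upper_limit]


values = [1, 3, 4, 5, 6, 7, 8, 30]
print(quartiles(values))      # (3.5, 5.5, 7.5)
print(find_outliers(values))  # [30]
```

Here IQR = 7.5 − 3.5 = 4, so the limits are −2.5 and 13.5, and only 30 falls outside them.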
General Addition Rule
P( A or B) = P(A) + P(B) - P(A and B)
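A worked instance of the general addition rule, using a standard 52-card deck as an assumed example: A = "the card is a king", B = "the card is a heart". Subtracting P(A and B) avoids counting the king of hearts twice.

```python
from fractions import Fraction

p_king = Fraction(4, 52)             # P(A)
p_heart = Fraction(13, 52)           # P(B)
p_king_and_heart = Fraction(1, 52)   # P(A and B): the king of hearts

# P(A or B) = P(A) + P(B) - P(A and B)
p_king_or_heart = p_king + p_heart - p_king_and_heart
print(p_king_or_heart)  # 4/13
```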
Probability Formulas for Disjoint Events
P(A and B) = 0 P(A or B) = P(A) + P(B) P(A|B) = P(B|A) = 0
General Multiplication Rule
P(A and B) = P(A) * P(B|A)
Probability Formulas for Independent Events
P(A and B) = P(A) x P(B) P(A or B) = P(A) + P(B) - [P(A) x P(B)] P(A|B) = P(A) P(B|A) = P(B)
Observational Study
Someone observes what is happening in a situation. There is no treatment, just observation
Empirical Probability (Relative Frequency)
Perform the experiment multiple times (trials), count the number of times the event occurs, and divide by the total number of trials. Empirical probability does not require the experiment be fair.
Simple Random Sample
Participants are randomly chosen from the entire population
Probability Trees
Potential events are represented in a diagram with a branch for each possible outcome, and each branch is labeled with its probability. These values can be used to calculate the overall impact of risk occurrence in a project. The probability of each outcome is found by multiplying along its path. The probability of an event that includes more than one outcome is found by adding the products from each path. By the law of total probability, the sum over all paths equals 1.
Reducing Fractions Using Common Factors
The steps for the common factors method are as follows: Divide numerator and denominator by a common factor. Continue to divide by common factors. Write the reduced fraction. Example: -28/42 = (-28/7) / (42/7) = -4/6. Reduce again: -4/6 = (-4/2) / (6/2) = -2/3.
Association
Relationship between two or more variables. A scatter plot can show the pattern of relationship between quantitative variables. Can be established by an observational study.
Experimental Study
Researchers apply treatment to one group and no treatment to another control group. Causation can be determined in a well-designed, controlled experiment.
Voluntary Sample
Researchers invite everyone from the sampling frame to participate, those who respond are the sample.
Misrepresenting Data with Visualizations
Scale of Axis: The vertical scale should start at 0. Omitting Labels or Units: Leaves the size and categories unspecified. Using a 2-dimensional graph to represent a 1 dimensional measurement: When visualizing data that represents size (big circle vs small circle), our eyes see area. This distorts the true differences we are trying to illustrate.
Non-Linear Relationship
Scatterplot reveals a trend that is not a straight line
No Correlation
Scatterplot reveals no trend between variables
Cluster Sample
Some groups are chosen, all people in the groups are studied
Uniform
Straight, all data appears to be equal
Steps for Solving an Equation With Complex Expressions
1. Substitute any variable's known value for the variable itself. 2. Simplify the expressions on either side of the equation, following the order of operations: distribute; combine like terms; add and subtract constants; complete any other process that serves to simplify the expression. 3. Move terms across the equation, using the Addition and Subtraction Principles of Equality: move all constants to one side of the equation; get all terms with the variable to be solved on the opposite side. 4. Simplify the expressions on either side of the equation: combine like terms on one side, if necessary; add and subtract constants on the other side, if necessary. 5. Isolate the lone variable on one side of the equation, using the Multiplication and Division Principles of Equality; the variable will be across from its value. 6. Check your answer: plug your solution into the original equation and perform the arithmetic on both sides. If the two sides of the equation are equal, you have successfully solved the equation!
Like Terms
Terms that have the same variable raised to the same power; they can be combined using addition and subtraction
Standard Deviation
The average distance each data point is from the mean
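"Average distance from the mean" is an informal description; the usual calculation is the square root of the average squared distance from the mean. A sketch with made-up data, showing the hand calculation next to the standard library's `statistics.pstdev` (population) and `statistics.stdev` (sample, which divides by n − 1).

```python
import math
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]

# By hand: mean, squared distances from the mean, their average, square root.
mean = sum(data) / len(data)
squared_distances = [(x - mean) ** 2 for x in data]
population_sd = math.sqrt(sum(squared_distances) / len(data))

print(population_sd)  # 2.0
print(pstdev(data))   # population standard deviation from the library
print(stdev(data))    # sample standard deviation, slightly larger
```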
Interquartile Range (IQR)
The spread of the middle 50% of the sample or population: the difference between the third and first quartiles. IQR = Q3 − Q1
Population (Sampling Method)
The group you want to study
Greatest Common Factor (GCF)
The largest number that divides all the given numbers evenly.
Least Common Denominator
The least common multiple of the denominators of two or more fractions. Let's determine the least common denominator of 1/3 and −2/7. Ask "do 3 and 7 have a factor in common?" No. So, this is situation 1. Multiply 3 × 7 = 21. 21 is the least common multiple of the numbers 3 and 7, therefore it is the LCD for 1/3 and −2/7.
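A sketch of finding the LCD and rewriting each fraction over it, using 1/3 and −2/7. Fractions are kept as (numerator, denominator) pairs so the unreduced equivalent forms stay visible. The name `to_common_denominator` is illustrative, and `math.lcm` requires Python 3.9 or later.

```python
import math


def to_common_denominator(fractions: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Rewrite (numerator, denominator) pairs over their least common denominator."""
    lcd = math.lcm(*(den for _, den in fractions))
    # Multiply each numerator by (LCD / current denominator).
    return [(num * (lcd // den), lcd) for num, den in fractions]


print(to_common_denominator([(1, 3), (-2, 7)]))  # [(7, 21), (-6, 21)]
```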
Sampling Frame (Sampling Method)
The list of people or things you pull the sample from
Second Quartile
The median of the data set
First Quartile
The median of the lower half of the data set
Third Quartile
The median of the upper half of the data set
Probability - Dependent Events
The occurrence of one event changes the probability of the occurrence of the other event. If A and B are dependent events, then P(A and B) ≠ P(A)•P(B)
Probability - Independent Events
The occurrence of one event does not affect the probability of the other. If A and B are independent events, then P(A and B) = P(A)•P(B)
Conditional Probability
The probability of event B happening given that A has already happened, written P(B|A). P(B|A) = P(A and B) / P(A). Can determine independence: events A and B are independent if either of the following is true: P(B|A) = P(B), or P(A and B) = P(A) x P(B).
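A sketch of the conditional probability formula and both independence checks, using a standard 52-card deck as an assumed example: A = "the card is a king", B = "the card is a heart". The function name `conditional` is illustrative.

```python
from fractions import Fraction


def conditional(p_a_and_b: Fraction, p_a: Fraction) -> Fraction:
    """P(B|A) = P(A and B) / P(A)"""
    return p_a_and_b / p_a


p_king = Fraction(4, 52)             # P(A)
p_heart = Fraction(13, 52)           # P(B)
p_king_and_heart = Fraction(1, 52)   # P(A and B)

p_heart_given_king = conditional(p_king_and_heart, p_king)
print(p_heart_given_king)                    # 1/4
print(p_heart_given_king == p_heart)         # True: P(B|A) = P(B)
print(p_king_and_heart == p_king * p_heart)  # True: P(A and B) = P(A) x P(B)
```

Both checks agree, so drawing a king and drawing a heart are independent events in this example.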
Slope of a line
The slope of a line is the ratio of the vertical change (rise) between two points on the line to the horizontal change (run) between those two points. slope = rise / run = (y2 - y1) / (x2 - x1)
Reducing Fractions Using Prime Factorization
The steps to reduce a fraction through prime factorization are as follows: List the prime factors of both the numerator and denominator. Cancel the factors that are common to both the numerator and denominator. Multiply across the numerator and denominator. Example: 6/8 = (2×3)/(2×2×2) = 3/(2×2) = 3/4.
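The three steps (list prime factors, cancel common ones, multiply what is left) can be sketched as follows. This assumes positive integers; `prime_factors` and `reduce_fraction` are illustrative names, and `math.prod` requires Python 3.8 or later.

```python
from collections import Counter
from math import prod


def prime_factors(n: int) -> list[int]:
    """List the prime factors of a positive integer, with repeats."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def reduce_fraction(num: int, den: int) -> tuple[int, int]:
    """Cancel the prime factors shared by numerator and denominator."""
    num_factors = Counter(prime_factors(num))
    den_factors = Counter(prime_factors(den))
    common = num_factors & den_factors  # factors present in both
    num_left = num_factors - common
    den_left = den_factors - common
    return prod(num_left.elements()), prod(den_left.elements())


print(prime_factors(6), prime_factors(8))  # [2, 3] [2, 2, 2]
print(reduce_fraction(6, 8))               # (3, 4)
```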
Sample (Sampling Method)
The subset of the population that is being studied
Mode
The value that occurs most frequently in a given data set.
Probability - Sample Space
Universe! The set of all possible outcomes
Linear Interpolation
Used to predict data in simple linear equation. It is the predictions between two known data points
Linear Extrapolation
Used to predict data in simple linear equation. It is the predictions of the data outside of the known data, or data larger or smaller than the max/min known data points
Joint Frequencies
Values in the middle of the table; the amount of data falling into both the corresponding row and column
Marginal Frequencies
Values on the right and bottom side of tables. Totals of the corresponding row or column
Quartiles
Values that divide a data set into four equal parts
Addition/Subtraction Principle
We can add or subtract the same number to both sides of an equation and the resulting expression remains equal.
Multiplication/Division Principle
We can multiply or divide both sides of an equation by the same number and the resulting expressions remain equal. (Dividing by 0 is not allowed.)
When can two way tables be used?
When both the explanatory and response variables are categorical
Effect of Outliers
When far off the regression line, outliers weaken r. On a scatterplot, the closer the points are laid out in a line, the stronger the correlation
Distribution of Negative Numbers
When in doubt, change all subtraction operations to the addition of negative numbers.
Prime Factorization
Writing the number as a product of only prime numbers.
InterQuartile Range (IQR)
the difference between the first and third quartiles
Range
the difference between the highest and lowest scores in a distribution
Median
the middle score in a distribution; half the scores are above it and half are below it
Statistics
the science that deals with the collection, classification, analysis, and interpretation of numerical facts or data
Least Common Multiple (LCM)
the smallest positive number that is evenly divisible by each of the given numbers
Slope-Intercept Equation
y = mx + b Where "m" is the slope, "b" is the y-intercept, and "x" and "y" follow the coordinate formula (x,y)
Point-Slope Form
y − y1 = m(x − x1), where m is the slope of the line and (x1, y1) are the coordinates of a known point. Example: A line has a slope of 5 and passes through the point (3, 7). What is the equation of the line in slope-intercept form? Start with the point-slope formula: y − y1 = m(x − x1). Next, fill in the information that is known, with m = 5 and (x1, y1) = (3, 7): y − 7 = 5(x − 3). Finally, use algebra to manipulate the equation into slope-intercept form: y − 7 = 5x − 15, so y = 5x − 8. Starting with just a point and the slope of a line, we can express its linear equation in slope-intercept form.
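Expanding y − y1 = m(x − x1) gives y = mx + (y1 − m·x1), so the y-intercept is b = y1 − m·x1. A minimal sketch that checks the worked example; the function name is illustrative.

```python
def slope_intercept_from_point(m: float, x1: float, y1: float) -> tuple[float, float]:
    """Return (m, b) for the line with slope m through the point (x1, y1)."""
    b = y1 - m * x1  # from y - y1 = m(x - x1)  =>  y = mx + (y1 - m*x1)
    return m, b


m, b = slope_intercept_from_point(5, 3, 7)
print(f"y = {m}x + ({b})")  # y = 5x + (-8)
```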