Stats W21 Midterm
Probability inequalities (equality when disjoint)
0 <= P(AB) <= P(AUB) <= P(A) + P(B); the last inequality is an equality if and only if A and B are disjoint
Axioms of Probability
1) Chances are always at least zero: for any event A, P(A) >= 0. 2) The chance that something happens is 100%: P(S) = 100%. 3) If two events cannot both occur at the same time (they are disjoint or mutually exclusive), the chance that either occurs is the sum of the chances that each occurs: if AB = {}, P(AUB) = P(A) + P(B)
Correlation coefficient (r)
= (X1*Y1 + X2*Y2 + ... + Xn*Yn)/n where X and Y are in standard units. Measures how nearly the data fall on a straight line (a nonlinear curve is summarized badly by r; even if association is strong, if it is nonlinear, r can be small or 0). Any two distinct points lie on a straight line, so for two points |r| = 1. If two variables are perfectly correlated, that does not mean there is a causal connection. Always -1 <= r <= 1: r = 1 for a line with positive slope, r = -1 for a line with negative slope, and r is near 0 if the data do not cluster along a straight line
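A minimal sketch of this computation in Python (the data are invented; SDs divide by n, matching the formula above):

```python
import math

def corr(xs, ys):
    """Correlation coefficient: mean of the products of standard units."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Population SDs (divide by n), as in the SD card in this set.
    sdx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sdy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum((x - mx) / sdx * (y - my) / sdy for x, y in zip(xs, ys)) / n

# Two distinct points always lie exactly on a line, so |r| = 1.
print(corr([1, 2], [3, 5]))  # -> 1.0
```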
SU
= (original value - mean)/SD, or (X - E(X))/SE(X) for random variables. A list converted to standard units has mean 0 and SD 1
Original value
= (value in SU)*SD + mean. To find a normal approximation: convert the endpoints to SUs and find the area under the normal curve between those two points. A secular trend is a linear association (trend) with time. The approximation is not good if (1) the association is nonlinear, (2) the data are heteroscedastic, or (3) there are outliers
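The normal-approximation recipe above can be sketched in Python, using the standard normal CDF via math.erf (the mean, SD, and endpoints below are invented):

```python
import math

def normal_area(lo, hi):
    """Area under the standard normal curve between lo and hi (in SU)."""
    Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return Phi(hi) - Phi(lo)

mean, sd = 100, 15                         # hypothetical list mean and SD
a, b = 85, 115                             # original values
lo, hi = (a - mean) / sd, (b - mean) / sd  # convert endpoints to standard units
print(round(normal_area(lo, hi), 4))       # -> 0.6827 (within 1 SD of the mean)
```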
Expected value of Geometric Distribution
= 1/p
IQR
= 75th percentile - 25th percentile = range of the middle 50%; resistant/insensitive to extremes. IQR = 0 when the 25th and 75th percentiles coincide (the middle 50% of the list is a single repeated value)
A & B are independent if P(AB) =
= P(A) * P(B)
P(AUB)
= P(A) + P(B) - P(AB)
Conditional probability of A given B
= P(A|B) = P(AB)/P(B) = P(B|A)*P(A)/(P(B|A)*P(A) + P(B|Ac)*P(Ac)) (Bayes' rule); also, for any event C, P(AB) = P(ABC) + P(ABCc), so P(A|B) = (P(ABC) + P(ABCc))/P(B)
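Bayes' rule with the partition {A, Ac} can be checked numerically; the disease-testing numbers below are hypothetical:

```python
def bayes(p_b_given_a, p_a, p_b_given_ac):
    """P(A|B) via Bayes' rule, partitioning B over {A, A^c}."""
    p_ac = 1 - p_a
    p_b = p_b_given_a * p_a + p_b_given_ac * p_ac  # total probability of B
    return p_b_given_a * p_a / p_b

# Hypothetical test: 1% prevalence, 95% sensitivity, 5% false-positive rate.
print(round(bayes(0.95, 0.01, 0.05), 4))  # -> 0.161
```

Even with a 95%-sensitive test, the low prevalence keeps P(A|B) small; this is why the denominator's P(Ac) term matters.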
Probability of Geometric Distribution
= P(X = x) = ((1-p)^(x-1))*p
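A quick check of this pmf in Python (p = 1/6, as in rolling a die until the first six, is an invented example); the probabilities sum to 1 and the mean matches the 1/p expected value from the EV card:

```python
def geom_pmf(x, p):
    """P(X = x) = (1-p)^(x-1) * p: first success on draw x."""
    return (1 - p) ** (x - 1) * p

p = 1 / 6  # e.g. rolling a die until the first six
# Truncate the infinite sums; the tail beyond 10,000 terms is negligible.
total = sum(geom_pmf(x, p) for x in range(1, 10_000))
ev = sum(x * geom_pmf(x, p) for x in range(1, 10_000))
print(round(total, 6), round(ev, 4))  # total ~ 1, ev ~ 6 = 1/p
```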
Point of averages (red square)
= a measure of the center of a scatterplot (mean(x), mean(y))
Vertical residual
= difference between the value of Y and the height of the regression line (measure Y) - (estimated Y)
Extrapolation
= estimating value of Y with value bigger/smaller than any observed
Expected value of Hypergeometric Distribution
= n(G/N)
Probability of binomial distribution
= nCk * p^k * (1-p)^(n-k)
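This formula maps directly onto Python's math.comb; the coin-toss example is illustrative:

```python
from math import comb

def binom_pmf(k, n, p):
    """Chance of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Chance of exactly 5 heads in 10 fair coin tosses: 252/1024.
print(round(binom_pmf(5, 10, 0.5), 4))  # -> 0.2461
```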
Regressing Y on X -> (predicted Y in SU)
= r * (measured X in SU)
Regressing X on Y -> (predicted X in SU)
= r * (measured Y in SU)
Slope of regression line
= r*SDy/SDx, where |r| <= 1; the regression line for regressing Y on X is not as steep as the SD line unless |r| = 1
Standard deviation
= sqrt((sum of (x - mean)^2)/n). If mean = 0, SD = RMS. If SD = 0, all numbers in the list are equal, so IQR = 0 and range = 0
Root mean square (rms)
= sqrt((sum of x^2)/(number of entries))
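Both definitions can be sketched together in Python, showing that SD is the rms of the deviations from the mean (the data are invented):

```python
import math

def rms(xs):
    """Root mean square: sqrt of the mean of the squares."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def sd(xs):
    """SD is the rms of the deviations from the mean."""
    m = sum(xs) / len(xs)
    return rms([x - m for x in xs])

print(round(sd([1, 2, 3, 4]), 4))         # deviations -1.5, -0.5, 0.5, 1.5
# When the mean is 0, the deviations ARE the data, so SD = RMS:
print(sd([-2, 0, 2]) == rms([-2, 0, 2]))  # -> True
```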
Standard error of Geometric Distribution
= sqrt(1-p)/p
Exhaustive
A collection of sets is exhaustive of A if every element of A is in at least one of the sets
Regression Line
Passes through the point of averages; it is the line for which the rms of the vertical residuals is smallest. Independent variable = the variable that is regressed upon (x-axis); dependent variable = the variable being regressed (y-axis)
Common Fallacies of Relevance
Positively Relevant, Ad Hominem (personal attack), Bad Motive, Tu Quoque (look who's talking), Two Wrongs Make a Right, Ad Misericordiam (appeal to pity), Ad Populum (bandwagon: it is moral because it is common, not everyone can be wrong), Straw Man, Red Herring, Equivocation, Ad Baculum
Equally likely outcomes
Probability assignments depend on the assertion that no particular outcome is preferred over any other by Nature. Probability of each outcome is 100%/(n possible outcomes). Relies on natural symmetries
Frequency theory
Probability is the limit of the relative frequency with which an event occurs in repeated trials; repeat enough under ideal conditions and the percentage it will occur will converge on a %
Types of data
Qualitative data = ordinal (hot, warm, cold) or categorical (gender, zip code, type of climate). Quantitative data = discrete (countable, ex: annual number of sunny days) or continuous (no minimum spacing between the values, ex: temperature)
Geometric Distribution
The number of random draws with replacement from a 0-1 box until the first time a ticket labeled "1" is drawn is a random variable with a geometric distribution with parameter p=G/N, where G is the number of tickets labeled "1" in the box and N is the total number of tickets in the box
Inconsistency
A. Not A. "Nobody goes there anymore. That place is too crowded."
Bad Motive
Addresses motives of person to attack them
Positively Relevant
Adds weight to assertion
Normal Approximation
Approximates probability by the area under part of a special curve, the normal curve (in SU)
Expected value
As the number of draws grows, the chance that the number of successes is within a fixed range of the EV decreases, but the chance that the percentage of successes is within a fixed range of the expected percentage increases. E(X) = x1*P(X = x1) + x2*P(X = x2) + x3*P(X = x3) + ... The EV of the sample sum of n random draws with or without replacement from a box of labeled tickets is n*(average of the labels on all the tickets in the box) (if the draws are without replacement, the number of draws cannot exceed the number of tickets in the box). If the random variables X and Y are independent, then E(X * Y) = E(X) * E(Y)
Central Limit Theorem
Asserts that normal approximations to the probability distributions of the sample sum and sample mean improve as the number of draws grows, no matter what numbers are on the tickets. The normal curve approximates well if the sample size is large and p is close to 50%, or at least not too close to 0% or 100%. Accuracy does not depend on the number of tickets or on the mean or SD of the tickets
Median
At least half the data are equal to or smaller than the median, and at least half are equal to or larger than the median. ***Choose the left number when there is an even number of data.*** Histogram: the median is where the area is split in half evenly. Harder to skew: you must corrupt half the data to make the median arbitrarily large or small. (ex: whether a country is affluent, typical salary at a job)
Straw Man
Attack the more vulnerable claim as if it refutes the original
Informal fallacies (error in reasoning)
Non sequitur of relevance: He says X is true. He does Y. Anyone who does Y is a bad person. Therefore, X is false. (If A then B. A. Therefore C). Non sequitur of evidence: All Ys are Zs. Mary says X is a Y. Therefore X is a Z. (Need to add if Mary says X is a Y. X is a Y)
P(AB) ? P(A)
P(AB) <= P(A)
If A is a subset of B, P(AB) = ? and P(AUB) = ?
P(AB) = P(A) P(AUB) = P(B)
SU for Random Variables
(X - E(X))/SE(X)
Weak analogy
X is similar to y in some regards. Therefore, everything that is true for x is true for y.
Binomial Probability Histogram
The area of the bins is closest to the area under the normal curve when p is close to 50% and far from 0% and 100%. As the sample size increases, the normal approximation becomes more accurate. The mean and SD of the ticket numbers do not influence how large a sample is needed; skewness does
Ad Hominem
attack person rather than reasoning
Partition
break up a complicated set without double counting
Fallacious
deductive reasoning that is incorrect
Valid
deductive reasoning that is mathematically correct; when premises are true, conclusion must be true
Red Herring
distraction from the real topic
Standard Error of Affine Transformation
does not depend on the additive constant b -> if Y = aX + b, SE(Y) = |a|*SE(X)
Homoscedasticity
equal scatter
Heteroscedasticity
unequal scatter; the amount of scatter differs depending on where you take the slice
Interpolation
estimating within actual range
Two Wrongs Make a Right
fine to do something because someone else did
Ad Populum
it is moral because it is common, not everyone can be wrong
Markov's Inequality for Random Variables
limits the probability that a nonnegative random variable exceeds any multiple of its EV: for a > 0, P(X >= a) <= E(X)/a
Chebychev's Inequality for Random Variables
limits probability that a random variable differs from its EV by multiples of SE P(|X - E(X)| >= kSE(X)) <= 1/k^2
Combinations
nCk = n!/k!(n-k)!
Permutations
nPk = n!/(n-k)!
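Python's math module provides both counts directly; the relation nPk = nCk * k! follows because each k-subset can be ordered in k! ways:

```python
from math import comb, perm, factorial

# nCk: order doesn't matter; nPk: order matters.
print(comb(5, 2))  # -> 10
print(perm(5, 2))  # -> 20
# Choose the k items, then order them:
print(perm(5, 2) == comb(5, 2) * factorial(2))  # -> True
```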
Tu Quoque
person is wrong because they are a hypocrite
Ad Misericordiam
pleading with extenuating circumstances
Questionable cause
post hoc ergo propter hoc (after this, therefore because of this), giving coincidences special significance.
Independent
two events that can both occur in the same trial. The probability of their intersection is the product of their probabilities. The probability of their union is less than the sum of their probabilities, unless at least one of the events has probability zero
Mutually exclusive
two events that cannot both occur in the same trial. The probability of their intersection is zero. The probability of their union is the sum of their probabilities. The occurrence of one is incompatible with the occurrence of the other. Ex: for fixed P(A) and P(B), P(AUB) is largest when A and B are mutually exclusive
Equivocation
use fact that word can have more than one meaning
Binomial distribution
Draws with replacement; the chance of success must be the same in every trial
Football-shaped graphs
work well with r and are summarized well by mean of x, mean of y, SD of x, SD of y
equation of regression line
y = r*(SDy/SDx)*x + [mean(Y) - r*(SDy/SDx)*mean(X)] ***If the regression line was computed correctly, the point of averages of the residual plot will be on the x-axis and the residuals will have no trend (a horizontal line is good): the correlation coefficient between the residuals and X will be zero. If the residuals have a trend and their average is not zero, then the slope of the regression line was computed incorrectly. A residual plot shows heteroscedasticity, nonlinear association, or outliers iff the original scatterplot does. Special cases: r = 0 -> the line is horizontal (slope = 0); r = 1 -> all points fall on a line with positive slope, and the regression line = SD line
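A sketch of the slope/intercept formulas in Python (invented data), checking that the fitted line passes through the point of averages:

```python
import math

def regression_line(xs, ys):
    """Slope r*SDy/SDx and intercept mean(Y) - slope*mean(X)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sdx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sdy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sdx * sdy)
    slope = r * sdy / sdx
    return slope, my - slope * mx

xs, ys = [1, 2, 3, 4], [2, 3, 5, 6]  # hypothetical data
a, b = regression_line(xs, ys)
mx, my = sum(xs) / 4, sum(ys) / 4
# The regression line always passes through (mean(X), mean(Y)).
print(round(a * mx + b, 6) == round(my, 6))  # -> True
```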
Cardinality
# of elements it contains
Probability of Hypergeometric Distribution
(G)C(k) * (N-G)C(n-k) / (N)C(n)
Probability of Negative Binomial Distribution
(k-1)C(r-1) * p^(r-1) * (1-p)^(k-r) * p = (k-1)C(r-1) * p^r * (1-p)^(k-r)
Standard error of Negative Binomial Distribution
(sqrt(r(1-p)))/p
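The pmf, EV, and SE formulas on these cards can be cross-checked numerically (r = 2, p = 0.5 are arbitrary choices):

```python
import math
from math import comb

def negbinom_pmf(k, r, p):
    """Chance the rth success occurs on draw k: (k-1)C(r-1) p^r (1-p)^(k-r)."""
    return comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r)

r, p = 2, 0.5
# EV from the pmf directly (truncated sum; the tail is negligible)...
ev = sum(k * negbinom_pmf(k, r, p) for k in range(r, 200))
print(round(ev, 6))                          # -> 4.0, matching r/p = 2/0.5
# ...and the SE from the closed-form card:
print(round(math.sqrt(r * (1 - p)) / p, 4))  # -> 2.8284
```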
Strategies for Counting
1) divide into smaller, non-overlapping subsets 2) divide by 2 for double counting 3) make a tree
P(not A) = ?
100% - P(A)
Estimating percentiles from histograms
25th = smallest number that is at least as large as 25% of the data; 50th = smallest number that is at least as large as half the data; 75th = smallest number that is at least as large as 75% of the data. In general: choose the number at least as big as the given percentage. pth percentile from a histogram: the approximate point on the horizontal axis such that the area under the histogram to the left of the point is p%
Common formal fallacies
A or B. Therefore, A. (It could be B.) A or B. A. Therefore, not B. (Affirming the Disjunct; both could be true.) Not both A and B are true. Not A. Therefore, B. (Denying the Conjunct; both can be false.) If A then B. B. Therefore, A. (Affirming the Consequent.) If A then B. Not A. Therefore, not B. (Denying the Antecedent.) If A then B. C. Therefore, B. (Non sequitur of evidence; C sounds like A.) If A then B. Not C. Therefore, not A. (Non sequitur of relevance; B sounds like C.) If A then B. A. Therefore, C. (Non sequitur of relevance; C sounds like B.) If A then B. Not B. Therefore, not C. (Non sequitur of relevance.)
Valid Rules of Reasoning
A or not A. (Law of the Excluded Middle.) Not (A and not A). A. Therefore, A or B. A. B. Therefore, A and B. Not A. Therefore, not (A and B). A or B. Not A. Therefore, B. (Denying the Disjunct.) Not (A and B). Therefore, (not A) or (not B). (de Morgan.) Not (A or B). Therefore, (not A) and (not B). (de Morgan.) If A then B. A. Therefore, B. (Affirming the Antecedent.) If A then B. Not B. Therefore, not A. (Denying the Consequent.)
Hypergeometric Distribution
A random sample without replacement of size n from a population of N units. It gives for each k the chance that the sample sum of the labels on the tickets equals k, for a simple random sample of size n from a box of N tickets of which G are labeled "1" and the rest are labeled "0."
Histograms
Base = class interval Area = fraction of data Area of the bin = (fraction of data in the class interval) = (# observations in class interval) / (total # of observations) Height of bin = (relative frequency) / width of class interval OR (fraction of data in the class interval) / (width of class interval)
RMS error of residuals of Y against X = sqrt(1 - r^2) * SD(Y)
Basically SD; it is the rms of vertical residuals from the regression line
The Graph of Averages
Divides a scatterplot into class intervals of the horizontal (x) variables and plots the averages of the Y values in those intervals against the midpoints of the intervals, not a line but a cluster of points
Expected value of binomial distribution
EV = np
The SD Line
Goes through the point of averages. Has slope SDy/SDx if the correlation coefficient r is greater than or equal to zero, and -SDy/SDx if r is negative. When r > 0, most values of Y are above the SD line to the left and below the SD line to the right. When r < 0, most values of Y are below the SD line to the left and above the SD line to the right
Mean
Histogram: Mean is where the histogram would balance; Sum of data/ # data, smallest rms difference *Changing one datum can make the mean arbitrarily large or small* Ex: how much can a family afford to spend on housing
Mode
Histogram: highest bump; most common value (if all occur at once, all #s are the mode)
Inappropriate appeal to authority
If A then B. C. Therefore, B All animals with rabies go crazy. Jessie says my cat has rabies. Thus, my cat will go crazy.
Slippery slope.
If A then B. If B then C. If C then D, etc. Eventually, Z. So, you must prevent A.
Chebyshev's Inequality (for lists)
If the mean of a list of numbers is M and the standard deviation of the list is SD, then for every positive number k, [the fraction of numbers in the list that are k*SD or further from M] <= 1/k^2 Inside a range is at least (1-1/k^2); outside a range is at most (1/k^2) **Use whichever produces the smallest number (more restrictive) **
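The inequality can be verified on any list; the list below is invented, with one extreme value:

```python
import math

def chebyshev_check(xs, k):
    """Fraction of the list k*SD or further from the mean, and the 1/k^2 bound."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    frac = sum(1 for x in xs if abs(x - m) >= k * sd) / n
    return frac, 1 / k ** 2

# Hypothetical list with one outlier; the fraction beyond 2 SDs
# can never exceed 1/4.
frac, bound = chebyshev_check([1, 2, 2, 3, 3, 3, 20], 2)
print(frac <= bound)  # -> True: Chebyshev's inequality always holds
```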
Markov's Inequality (for lists)
If the mean of a list of numbers is M, and the list contains no negative number then [fraction of numbers in the list that are greater than or equal to x] <= M/x (multiply by n for actual #)
Ad Baculum
If you do/don't do something, something bad will happen
Common Fallacies of Evidence
Inappropriate appeal to authority Appeal to ignorance False dichotomy Loaded question Questionable cause Slippery slope Hasty generalization Weak analogy Inconsistency
2 Types of Reasoning
Inductive: Requires correct deductive reasoning; inherently uncertain, generalize from experience Deductive (aka logic): thinking mathematically
False dichotomy
It starts with a premise that is an artificial "either-or". It is possible to do both.
Appeal to ignorance
Lack of evidence that a statement is false is not evidence that the statement is true.
Frequency Tables
Lists frequency (number) or relative frequency (fraction) of observations that fall in various class intervals based on a decided endpoint convention (usually include L boundary and exclude R)
Affine Transformations
Mode/Median/Mean is a*(__ of original) + b Range/SD = |a|*(__ of original) <- not affected by b IQR = a*(__ of original) if a > 0
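A quick numerical check of these rules for the mean and SD (a = -2, b = 10 are arbitrary):

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

xs = [1, 3, 5, 7]  # invented data
a, b = -2, 10
ys = [a * x + b for x in xs]
# The mean transforms as a*mean + b; the SD picks up |a| and ignores b.
print(mean(ys) == a * mean(xs) + b)                   # -> True
print(round(sd(ys), 6) == round(abs(a) * sd(xs), 6))  # -> True
```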
Regression effect
The second score tends to be less extreme than the first (ex: students landing planes)
Skewness and modes
Skew left: mean < median Skew right: mean > median Unimodal: consists of only one "bump" (usually multimodal/bimodal)
Hasty generalization
Some x are (sometimes) A. Therefore, most x are (always) A. Sample could be biased.
Loaded question
Statements that presuppose something. Ex: Did you know that the sun goes around the Earth?
Residual plots (x1,e1), (x2,e2),...,(xn,en)
Tells us whether it is appropriate to use a linear regression, and whether the regression was computed correctly. The vertical residual is e1 = y1 - (a*x1 + b), where y1 is the measurement and (a*x1 + b) is the estimated y value. Points above the regression line are > 0 on the residual plot. If done correctly, the residuals average to zero and have no trend. It is easier to see heteroscedasticity, nonlinear association, and outliers on a residual plot than on a scatterplot
Association
property of 2 or more variables (not the same as causation; the scatter of Y in a slice is smaller than SDy). (-) association: larger-than-average values of one variable tend to go with smaller-than-average values of the other
Expected value of Negative Binomial Distribution
r/p
Sound
reasoning is valid and based on true premises (valid & unsound = factually incorrect b/c one of the premises is false)
Standard error of binomial distribution
sqrt (n(p(1-p)))
Standard error of Hypergeometric Distribution
sqrt((N-n)/(N-1)) * sqrt(n) * sqrt((G/N)*(1-(G/N)))
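The hypergeometric pmf, EV, and SE formulas fit together; a small numerical check (N, G, n below are arbitrary):

```python
import math
from math import comb

def hyper_pmf(k, N, G, n):
    """Chance of k tickets labeled "1" in a simple random sample of size n."""
    return comb(G, k) * comb(N - G, n - k) / comb(N, n)

N, G, n = 20, 8, 5  # hypothetical box: 8 of 20 tickets labeled "1"
ev = n * G / N
se = math.sqrt((N - n) / (N - 1)) * math.sqrt(n) * math.sqrt((G / N) * (1 - G / N))
# The EV computed directly from the pmf matches n*(G/N).
ev_from_pmf = sum(k * hyper_pmf(k, N, G, n) for k in range(n + 1))
print(round(ev_from_pmf, 6) == round(ev, 6))  # -> True
```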
Negative Binomial Distribution
the chance that it takes k draws to get a ticket labeled "1" for the rth time