Stats 3470 Midterm 1 Content
null event
"empty set" A=0 or A{ } an impossible outcome
P(A)
"the chance of event A happens" proportion of times event A occurs in an infinite repetition of the random phenomenon of interest =(# of times A happens per n )(n) n=# of repetitions
breakdown of binomial random variable
(n over k) = n!/[k!(n-k)!] *μ = E(X) = np* *σ²_X = V(X) = np(1-p) = npq*, where *q = 1-p* *σ_X = (npq)^(1/2)*
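A quick Python check of these formulas (the n and p values here are made up, not from the course): summing over the binomial pmf reproduces μ = np and σ² = npq.

```python
from math import comb

n, p = 7, 0.35          # hypothetical example values
q = 1 - p

# Binomial pmf: P(X = k) = (n choose k) * p^k * q^(n-k)
pmf = {k: comb(n, k) * p**k * q**(n - k) for k in range(n + 1)}

mean = sum(k * prob for k, prob in pmf.items())                # should equal n*p
var = sum((k - mean) ** 2 * prob for k, prob in pmf.items())   # should equal n*p*q

print(round(mean, 4), round(n * p, 4))      # both 2.45
print(round(var, 4), round(n * p * q, 4))   # both 1.5925
```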
What issues/problems can arise if we try to generalize the results/conclusions based on convenience and voluntary response samples?
**Biased samples.** Convenience: e.g., if you ask everyone around you while you're in stats class whether they like math, the result will be biased. Voluntary: people who take the initiative to answer create a bias. For example, in a survey on whether we should protect dogs, people who care about dogs are the ones who want to answer, while people who are indifferent won't bother to respond.
P(A) for finite sample space
*= N(A)/N(S) = (# of outcomes in A)/(# of outcomes in S)*, which (when counting with combinations) equals *(n over k count for event A)/(n over k count for the sample space)*
expected value or mean value of the (distribution of)X
*E(X) = "μ_X" = Σ[x·p(x)]*, where X is a discrete random variable: "take each possible value x, multiply it by the probability of getting x, then sum over all values of x."
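In code, "multiply each value by its probability and sum" looks like this (a fair six-sided die is my assumed example, not from the card):

```python
# E(X) = sum of x * p(x) over all possible values x
outcomes = [1, 2, 3, 4, 5, 6]   # fair die: each value has probability 1/6
expected_value = sum(x * (1 / 6) for x in outcomes)
print(expected_value)  # ≈ 3.5
```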
Addition Rule
*P(AUB) = P(A or B) = P(A) + P(B) - P(AnB)* (subtracting P(AnB) removes the double count of outcomes that are in both) *P(AUBUC) = P(A) + P(B) + P(C) - P(AnB) - P(AnC) - P(BnC) + P(AnBnC)*
The conditional probability of A given B has occurred
*P(A|B) = P(A n B)/P(B)* as long as P(B)>0
How to check if A & B are independent
*if P(AnB) = P(A)P(B)* or *if P(A|B) = P(A)* or *if P(B|A) = P(B)*. If any of these are NOT equal, then A & B are NOT independent, so knowing one event does affect the probability of the other event
constraints on p for p(x;p) to be a legitimate pmf?
0 ≤ p ≤ 1, which guarantees 0 ≤ p(x;p) ≤ 1 and p(0;p) + p(1;p) = (1-p) + p = 1
Distribution of a random variable means:
1) all possible values of X (the range of X) 2) the probabilities associated with those values
properties of a PMF
1. 0 ≤ p(x) ≤ 1 2. the sum of all p(x) = 1
How to use calculator to obtain prob. based on binomial distribution
1. 2nd key 2. VARS 3. scroll down to binompdf or binomcdf (pdf for P(X = a); cdf for P(X ≤ a); pdf → pmf as cdf → cdf) 4. enter trials n, p, x value, then enter twice. Ex: 0.35 prob of making a free throw; what's the prob of making exactly 4 of 7 free throws?
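If you want to double-check the calculator, here is a small Python sketch of what binompdf/binomcdf compute (the function names just mimic the calculator), applied to the card's free-throw example:

```python
from math import comb

def binompdf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p), like the calculator's binompdf."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomcdf(n, p, k):
    """P(X <= k), like the calculator's binomcdf."""
    return sum(binompdf(n, p, j) for j in range(k + 1))

# Card's example: p = 0.35 per free throw, exactly 4 makes out of 7 attempts
print(round(binompdf(7, 0.35, 4), 4))  # ≈ 0.1442
```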
what i need to know for expected value or mean value
1. all the possible values of X 2. corresponding probabilities (proportions)
Census
A survey that attempts to include the entire population
Sure event
A=S
Skewed distribution
An asymmetric frequency distribution in which some unusually high or low scores distort the mean to be greater or less than the median.
Range
the difference between the largest and smallest observations
Random Phenomenon
when we know what outcomes could happen, but not which particular outcome will happen
Binomial Random Variable
insert pic. #1 (X = # of successes in a fixed number "n" of Bernoulli trials)
Events A and B are independent if
knowing that event B occurred does NOT change the value of P(A) & gives no info about A. Using conditional prob., P(A|B) = P(AnB)/P(B), but since they are independent the "given B" does NOT matter, so *P(A|B) = P(AnB)/P(B) = P(A)* and *P(B|A) = P(AnB)/P(A) = P(B)*; then, using the multiplication rule, *P(AnB) = P(A)P(B)*
Properties of expectation
Let a, b be constants & h(X) be some fn of X. 1. E(a) = a 2. E(bX) = b·E(X) = b·μ_X 3. E(bX+a) = E(bX) + E(a) = b·E(X) + a = b·μ_X + a = μ_Y, where Y = bX+a 4. E[h(X)] = Σ[h(x)·p(x)]. Also: σ² = V(X) = E[(X-μ_X)²] and σ = [V(X)]^(1/2), where "μ = mean"
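A small sanity check of property 3, E(bX+a) = b·E(X) + a, on a made-up pmf (all numbers here are hypothetical):

```python
pmf = {0: 0.2, 1: 0.5, 2: 0.3}   # hypothetical p(x), sums to 1
a, b = 4, 3                       # arbitrary constants

E_X = sum(x * p for x, p in pmf.items())             # E(X) = 1.1
E_Y = sum((b * x + a) * p for x, p in pmf.items())   # E(bX + a), computed directly

# E_Y and b*E_X + a agree (up to floating-point rounding)
print(round(E_Y, 6), round(b * E_X + a, 6))
```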
Five number summary
minimum, Q1, median, Q3, maximum
Multimodal
more than two clear peaks
n & k in methods of counting
n = # of different options; k = # of these options that are chosen
Mean
numerical average of all the observations; denoted x̄ ("x-bar")
Unimodal
one peak
Permutations vs. combination
ordered sequences vs. unordered sets: for permutations 123 ≠ 321, but for combinations 123 = 321
random variable has a Poisson distribution if the pmf of X is
p(x;λ) = (λ^x)e^(-λ)/x!, x = 0, 1, 2, 3, ..., for some λ > 0 (think of λ as the "expected # of occurrences" per unit of time or per unit of area)
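The pmf above is easy to sketch in Python (λ = 2 is an assumed example value); note the probabilities sum to 1 as a pmf should:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """p(x; lambda) = lambda^x * e^(-lambda) / x!"""
    return lam**x * exp(-lam) / factorial(x)

lam = 2  # hypothetical: 2 expected occurrences per interval
print(round(poisson_pmf(0, lam), 4))  # P(X = 0) = e^-2 ≈ 0.1353
print(round(sum(poisson_pmf(x, lam) for x in range(100)), 6))  # sums to ≈ 1.0
```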
Line graph
shows how the values of the variable change over time
Modified box plot
shows outliers beyond the whiskers (as individual points beyond the tails); often a better choice than a regular box plot
Binomial distribution
the situation where we are interested in counting X = # of successes that occur in a sequence of Bernoulli trials, e.g., a series of coin flips
Distribution of a Variable
tells us (1) all possible values that a variable can take on (2) how frequently these values occur
variance
the average (sort of) of the squared differences between each observation and the sample mean, also referred to as "squared deviation from the mean"; denoted s². In actuality we divide by n-1 rather than by n: *s² = [Σ(xᵢ - x̄)²]/(n-1)*, where x̄ = sample mean
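The n-1 divisor is the detail that trips people up; here is a quick sketch on made-up data, cross-checked against the standard library's `statistics.variance` (which also divides by n-1):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up observations
xbar = sum(data) / len(data)       # sample mean = 5.0

# divide by n-1, not n
s2 = sum((x - xbar) ** 2 for x in data) / (len(data) - 1)

# statistics.variance uses the same n-1 divisor, so the two agree
print(round(s2, 4), round(statistics.variance(data), 4))
```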
standard deviation
(roughly) the average distance between each observation and the sample mean; denoted *s*, where s = √(s²)
Sample space (S)
the collection or set of all possible outcomes of a random phenomenon: S = {O1, O2, O3, O4, ..., Ok}, k ≥ 2
possible methods of counting
1. ordered w/out replacement: *P(k,n)* = "# of permutations of k obj. taken from a set of n distinct objs." 2. ordered w/ replacement: *n^k* 3. unordered w/out replacement, aka the binomial coefficient: *C(k,n) = (n over k)* = "# of combinations of size k from a set of n distinct objs." 4. unordered w/ replacement: *(n+k-1 over k)*
Conditions of a binomial experiment that must be met:
1. sequence of "n" (Bernoulli) trials, where *n is fixed in advance* 2. each trial is identical & results in only *2 possible outcomes*. Success or Failure 3. *Trials are independent.* (knowledge of one trial does NOT influence the outcome of another trial) 4. *p=P(Success) is constant* from trial to trial
units of lambda
λ counts # of occurrences, so if given a rate of occurrences per unit time = a, then for an interval of length t, λ = a·t
Cumulative Distribution Function (CDF)
the probability that X will take a value less than or equal to x: *F(x) = P(X ≤ x)*. Represents the sum of the probabilities of the outcomes up to and including a specific outcome.
Variable
A characteristic of an individual or object
Left skewed distribution
A density curve where the left side of the distribution extends in a long tail; (mean < median)
Right skewed distribution
A density curve where the right side of the distribution extends in a long tail; (mean > median)
Statistic
A number that describes a sample. The value is known once we have taken a sample, but it can vary from sample to sample. We often use it to estimate an unknown parameter.
Parameter
A number that describes the population. A fixed number, but in practice is often unknown
The remaining outcome of a Bernoulli variable
Failure
proposition formula of CDF
For any two values a ≤ b, *P(a ≤ X ≤ b) = P(X ≤ b) - P(X < a) = F(b) - F(a-)*, where "a-" is the largest possible value of X that is strictly less than a
Axes on Line Graph
Horizontal: time; Vertical: response variable (y)
Bayes' Theorem
Let A1, A2, ..., Ak be mutually exclusive and exhaustive events with prior probabilities P(Ai), i = 1, 2, ..., k, and let B be any event for which P(B) > 0. Then the posterior probability of Aj given that B has occurred is: *P(Aj|B) = P(Aj n B)/P(B) = P(Aj n B)/[Sum over i of P(Ai n B)] = P(B|Aj)P(Aj)/[Sum over i of P(B|Ai)P(Ai)]*
variance of X
Let X have pmf p(x) & mean μ. If we let h(X) = (X-μ)², then the variance of X is: *V(X) = E[h(X)] = E[(X-μ)²] = σ² = Σ(xᵢ - μ)²·p(xᵢ)*
Why is random sampling a better approach
Minimizes bias by selecting more representative samples; allows us to quantify sampling error (random variation)
Outliers
Observations that fall outside the overall pattern of a distribution
Bernoulli Family
Only two possible outcomes: X = 1 = success & X = 0 = failure. Suppose p(1) = P(X=1) = p & p(0) = P(X=0) = 1-p. Then we consider p a parameter & the collection of all such pmf's the Bernoulli family: p(x;p) = 1-p if x = 0 (failure); p if x = 1 (success); 0 otherwise
Possible methods of counting
Ordered: 1. w/out replacement: *n!/(n-k)!* 2. w/ replacement: *n^k* Unordered: 1. w/out replacement: n choose k = *n!/[k!(n-k)!]* 2. w/ replacement: (n+k-1) choose k
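All four counts are one-liners with Python's `math.comb`/`math.perm` (n = 5, k = 3 are assumed example values):

```python
from math import comb, perm

n, k = 5, 3   # hypothetical: pick 3 from 5 distinct objects

ordered_without = perm(n, k)          # n!/(n-k)!        = 60
ordered_with = n ** k                 # n^k              = 125
unordered_without = comb(n, k)        # n!/[k!(n-k)!]    = 10
unordered_with = comb(n + k - 1, k)   # (n+k-1 choose k) = 35

print(ordered_without, ordered_with, unordered_without, unordered_with)
```

Note the combination count (10) is smaller than the permutation count (60), matching the "fewer combinations than permutations" card.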
Intersection of A and B
outcomes common to both A and B; denoted *A n B* (an outcome counts ONLY if it is in A and in B as well, so it works like "AND")
Multiplication Rule
*P(A n B) = P(A|B)·P(B) = P(B|A)·P(A)*
general strategy for binomial probability
P(exactly k successes) = [nCk]·[p^k]·[(1-p)^(n-k)] = (# of arrangements)·(p(succ))^(# of succ)·(p(fail))^(# of fails). Ex: P(makes 2 of 3 free throws) = (3C2)(0.9²)(0.1)¹, where 3C2 = "3 choose 2" = *n!/[k!(n-k)!]* = 3, p(succ) = 0.9, p(fail) = 0.1
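The free-throw example above, computed directly (p = 0.9 per attempt, as assumed in the card):

```python
from math import comb

# P(makes exactly 2 of 3) = (3 choose 2) * 0.9^2 * 0.1^1
p_two_of_three = comb(3, 2) * 0.9**2 * 0.1**1
print(round(p_two_of_three, 3))  # 0.243
```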
Random sampling
Sampling that is rooted in probability and as such is a fair method of choosing a sample that avoids systematic deviations from the population.
Convenience Samples
Selection of individuals who are easiest to reach
The outcome of interest in a Bernoulli variable
Success
Population
The entire group of individuals, objects, or units about which we want information
Modality
the number of clear peaks ("hills") in a distribution graph
Sample
The part of the population from which we actually collect information, used to draw conclusions about the whole
Law of Total Probability
If A1, A2, ..., Ak are mutually exclusive and exhaustive events (so A1 U A2 U ... U Ak = S and their probabilities sum to 1), then for any event B: *P(B) = Sum of P(B n Ai) = Sum of P(B|Ai)·P(Ai)* (using the multiplication rule)
There are always less ___ than there are ___ (Comb or Perms)
There are always fewer combinations than there are permutations
Short cut variance formula
V(X) = E(X²) - [E(X)]² = E(X²) - μ², where μ = mean
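A quick check that the shortcut matches the definition V(X) = E[(X-μ)²], on a made-up pmf:

```python
pmf = {1: 0.2, 2: 0.3, 3: 0.5}   # hypothetical p(x), sums to 1

mu = sum(x * p for x, p in pmf.items())        # E(X)  = 2.3
E_X2 = sum(x**2 * p for x, p in pmf.items())   # E(X²) = 5.9

var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())  # definition of V(X)
var_shortcut = E_X2 - mu**2                               # shortcut formula

print(round(var_def, 6), round(var_shortcut, 6))  # both 0.61
```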
Dichotomous or Bernoulli Variable
a categorical variable that has only two possible responses or outcomes
Poisson distribution
a discrete probability distribution describing the likelihood of a particular number of independent events occurring within a particular interval (usually a time interval); similar to the binomial distribution (counts successes over "n" trials), but with success RARE
Symmetric distribution
a distribution in which the data values are distributed symmetrically about the mean
range of Discrete random variables
a finite set of values or a countably infinite sequence of values (0, 1, 2, 3, ...); ex: # of hearts in the 5 cards dealt: {0, ..., 5}
A random variable
a function on the sample space, S, that assigns a number to each outcome in S. More formally, let X represent some random variable; then X is defined as X: S → R
Probability Mass Function (PMF)
a mathematical relation that assigns probabilities to all possible outcomes of a discrete random variable
Complement of A rule
all outcomes NOT in A; denoted *A^c or A'*, with *P(A^c) = 1 - P(A)*
range of Continuous random variable
all possible values in an interval, which can extend from -infinity to infinity; ex: the time it takes your prof. to hit a golf ball, (0, infinity)
Random phenomenon
an activity whose outcome cannot be predicted in advance with certainty; the outcome can be numeric or not
Poisson Distribution Formula
b(x; n, p) = (n over x)(p^x)(1-p)^(n-x) → p(x;λ) = *[(λ^x)·e^(-λ)]/x!* as n → infinity with λ = np held fixed
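You can watch the binomial converge to the Poisson value numerically (λ = 2 and x = 3 are assumed example values):

```python
from math import comb, exp, factorial

lam, x = 2.0, 3   # hypothetical: lambda = 2, evaluate P(X = 3)

poisson = lam**x * exp(-lam) / factorial(x)   # ≈ 0.1804

# Binomial(n, p = lambda/n) approaches the Poisson value as n grows
for n in (10, 100, 10000):
    p = lam / n
    binom = comb(n, x) * p**x * (1 - p)**(n - x)
    print(n, round(binom, 5), "vs Poisson", round(poisson, 5))
```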
Histograms
bar graphs where the values of the variable, divided into ranges (classes) of equal width, go on the horizontal axis and counts go on the vertical axis
mutually exclusive events
events that can't happen at the same time, so AnB = { } (the empty set): "If A, then not B" and "If B, then not A."
parameters
characterization of a distribution that can take on a variety of values
family of probability distributions
the collection of all probability distributions with the same functional form for the pmf, varying according to different values of a parameter
Event
collection of outcomes (a subset of the sample space); typically denoted A or B
Union of A and B
combines all outcomes in A and B; denoted *A U B* (an outcome counts even if it's only in B and not in A, so it works like "OR")
A simple random sample (SRS)
consists of "n" individuals from the population chosen in such a way that every possible collection of "n" individuals is equally likely to be selected
Two types of random variables
discrete & continuous
Think of binomial experiment as
drawing "n" times w/ replacement from a population that consists of proportion "p" =successes & of proportion "1-p" =failures
end of ch.1
end of ch.1
end of ch.2
end of ch.2
Box Plot
graphs the five-number summary; the box spans Q1 to Q3, with a line at the median
General Product Rule
if a job consists of k separate tasks, the ith of which can be done in n_i ways, i = 1, 2, ..., k, then the entire job can be done in n_1·n_2·...·n_k ways
Median
the middle number when the observations are ordered from smallest to largest; denoted m
Mode
the most frequent observation
Interquartile range (IQR)
the range of the middle 50% of the data; IQR = Q3 - Q1
Sample proportion
the relative frequency of the occurrence of successes in a sample; denoted p̂ ("p-hat")
Voluntary response samples
a sample that chooses itself by responding to a general appeal
when a & b are integers & X is a discrete & integer random variable
then: *P(a ≤ X ≤ b) = F(b) - F(a-1)* and *P(X = a) = F(a) - F(a-1)*
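These identities are easy to verify on a small integer-valued pmf (the pmf below is made up for illustration):

```python
pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.3}   # hypothetical integer-valued pmf

def F(x):
    """CDF: F(x) = P(X <= x)."""
    return sum(p for k, p in pmf.items() if k <= x)

a, b = 1, 3
direct = sum(pmf[k] for k in range(a, b + 1))   # P(1 <= X <= 3), summed directly
via_cdf = F(b) - F(a - 1)                       # F(3) - F(0)
print(round(direct, 6), round(via_cdf, 6))      # both 0.9

print(round(F(2) - F(1), 6))                    # P(X = 2) = 0.4
```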
Bimodal
two peaks
Disjoint
when A and B have nothing in common, so *if disjoint, then AnB = { } (the empty set)*
exhaustive events
when every element in S occurs in one of the events example: if you roll a die, you will get a result that is "1", "2", "3", "4", "5", or "6".
standard deviation (SD) of X
σ=(σ²)^(1/2)
proposition formula for variance
σ²_(aX+b) = V(aX+b) = a²·σ²_X → σ_(aX+b) = |a|·σ_X (just take the square root of the top equation)
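A quick numeric check that V(aX+b) = a²·V(X), on a made-up pmf (a = 3, b = 7 are arbitrary):

```python
pmf = {0: 0.25, 1: 0.5, 2: 0.25}   # hypothetical p(x), sums to 1
a, b = 3, 7                         # arbitrary constants

def mean_var(pairs):
    """Return (mean, variance) for (value, probability) pairs."""
    pairs = list(pairs)
    mu = sum(x * p for x, p in pairs)
    return mu, sum((x - mu) ** 2 * p for x, p in pairs)

_, var_X = mean_var(pmf.items())
_, var_Y = mean_var((a * x + b, p) for x, p in pmf.items())  # Y = aX + b

# Shifting by b changes nothing; scaling by a multiplies the variance by a²
print(var_Y, a**2 * var_X)  # both 4.5
```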