Stats Quiz 2
A Sampling Problem
- Consider a population of N elements, of which:
  - r elements are successes, and
  - N - r are failures
- A sample of size n is selected from the population
- Suppose X is the number of successes in the sample
- What is the distribution of X?
- The answer depends on the method of sampling
Discrete random variables
A random variable is discrete if it can assume a countable number of values.
- It either takes:
  - finitely many values, or
  - countably infinitely many values (i.e., its values can be listed in an infinite sequence)
more than
- P(X > 12) = 1 - P(X ≤ 12) = 1 - binomcdf(20, 0.5, 12)
at most (binomial)
- P(X ≤ 2) = binomcdf(10, 1/6, 2)
less than
- P(X < x) = P(X ≤ x - 1) = binomcdf(n, p, x - 1), since X takes integer values
Properties: expectation and variance
- Expectation is the center of the probability distribution of a random variable
- Variance and standard deviation are measures of spread of the probability distribution of a random variable
- The larger the variance/standard deviation, the larger the spread (or dispersion)
- The expected value and standard deviation have the same unit as the random variable, whereas the variance has the squared unit
- Effect of a linear transformation: for any real numbers a and b (see the sketch below),
  1) E(aX ± b) = aE(X) ± b
  2) σ(aX ± b) = |a|σ(X)
- The above properties are valid for continuous random variables as well
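Not part of the original notes: a minimal Python sketch checking the linear-transformation rules on a small discrete distribution (the pmf and the constants a, b below are made up for illustration).

```python
import numpy as np

# Hypothetical pmf for illustration: X takes the values 0..3
x = np.array([0, 1, 2, 3])
p = np.array([0.1, 0.3, 0.4, 0.2])

mu = np.sum(x * p)                          # E(X)
sigma = np.sqrt(np.sum((x - mu) ** 2 * p))  # standard deviation of X

a, b = -3, 5                                # arbitrary constants
y = a * x + b                               # values of Y = aX + b (same probabilities)
mu_y = np.sum(y * p)
sigma_y = np.sqrt(np.sum((y - mu_y) ** 2 * p))

print(mu_y, a * mu + b)                     # equal: E(aX + b) = aE(X) + b
print(sigma_y, abs(a) * sigma)              # equal: σ(aX + b) = |a|σ(X)
```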
With Replacement Sampling
- In with-replacement sampling the selected objects are returned to the population before each draw
- Hence the population size remains N throughout the sampling, and the same object may be selected more than once
- Also, P(success) = r/N remains fixed throughout the sampling, and the draws are independent of each other
- So X, the number of successes in the sample, follows a binomial distribution with parameters n and p = r/N
Without Replacement Sampling
- In without-replacement sampling the selected objects are NOT returned to the population before each draw
- Hence the population size decreases after each draw; any object can be selected at most once
- The draws are not independent of each other
- Hence X, the number of successes in the sample, follows a hypergeometric distribution with parameters N, r, and n (see the simulation sketch below)
- REMARK: if nothing is specified, by "sampling" we usually mean without-replacement sampling
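Not part of the original notes: a simulation sketch (the population and sample sizes N = 20, r = 8, n = 5 are made up) showing that the count of successes is binomial when sampling with replacement and hypergeometric when sampling without replacement.

```python
import numpy as np
from scipy.stats import binom, hypergeom

N, r, n = 20, 8, 5                       # population size, successes, sample size
pop = np.array([1] * r + [0] * (N - r))  # 1 = success, 0 = failure
rng = np.random.default_rng(0)
reps = 50_000

with_rep = np.array([rng.choice(pop, size=n, replace=True).sum() for _ in range(reps)])
without_rep = np.array([rng.choice(pop, size=n, replace=False).sum() for _ in range(reps)])

# Empirical P(X = 3) vs. the two theoretical models
print(np.mean(with_rep == 3), binom.pmf(3, n, r / N))        # binomial(n, p = r/N)
print(np.mean(without_rep == 3), hypergeom.pmf(3, N, r, n))  # hypergeometric(N, r, n)
```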
Chebyshev's and Empirical Rules
- Let X be a discrete random variable with pmf p(x), mean μ, and standard deviation σ
- Depending on the shape of p(x), the following probability statements can be made:

  Interval                   Chebyshev   Empirical
  P(μ - σ < X < μ + σ)       ≥ 0         ≈ 0.68
  P(μ - 2σ < X < μ + 2σ)     ≥ 0.75      ≈ 0.95
  P(μ - 3σ < X < μ + 3σ)     ≥ 0.889     ≈ 0.997

- Chebyshev's rule holds for all types of probability distributions, but the empirical rule is for unimodal, symmetric, mound-shaped probability distributions only
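Not part of the original notes: a quick numerical check using an arbitrary Binomial(50, 0.5), which is roughly mound-shaped, comparing the exact probabilities P(μ - kσ < X < μ + kσ) with the Chebyshev bounds for k = 1, 2, 3.

```python
import numpy as np
from scipy.stats import binom

n, p = 50, 0.5                       # arbitrary mound-shaped example
mu, sigma = n * p, np.sqrt(n * p * (1 - p))
xs = np.arange(n + 1)
pmf = binom.pmf(xs, n, p)

for k in (1, 2, 3):
    exact = pmf[(xs > mu - k * sigma) & (xs < mu + k * sigma)].sum()
    chebyshev = 1 - 1 / k ** 2       # lower bound from Chebyshev's rule
    print(k, round(exact, 3), "Chebyshev bound:", round(chebyshev, 3))
```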
Characteristics of a Hypergeometric Random Variable
- The population contains N elements, r of which are successes and (N - r) of which are failures
- The experiment consists of randomly drawing n elements without replacement from the population of N elements
- The hypergeometric random variable X is the number of successes in the draw of n elements
Computing binomial probability on calculator: p(x) = P(X = x) = binompdf(n, p, x)
- Suppose X is a binomial random variable with parameters n and p
- To compute P(X = x), use binompdf(n, p, x) as follows:
  - Press 2nd & VARS, then choose binompdf(
  - Format of command: binompdf(n, p, x)
  - trials: enter the value of n here
  - p: enter the value of p here
  - x value: enter the value of x here
  - Paste and enter
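Not part of the original notes: the same computation in Python with scipy, as an alternative to the TI command (the values n = 20, p = 0.5, x = 12 reuse the example elsewhere in these notes).

```python
from scipy.stats import binom

# P(X = x) for X ~ Binomial(n, p); mirrors binompdf(n, p, x)
n, p, x = 20, 0.5, 12
print(binom.pmf(x, n, p))   # P(X = 12)
```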
P(X ≤ x) = binomcdf(n, p, x)
- Suppose X is a binomial random variable with parameters n and p. To compute P(X ≤ x), use binomcdf(n, p, x) as follows:
  - Press 2nd & VARS, then choose binomcdf(
  - Format of command: binomcdf(n, p, x)
  - trials: enter the value of n here
  - p: enter the value of p here
  - x value: enter the value of x here
  - Paste and enter
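Not part of the original notes: a Python equivalent of binomcdf using scipy (same example values n = 20, p = 0.5, x = 12), including the complement trick used for "at least" questions.

```python
from scipy.stats import binom

n, p, x = 20, 0.5, 12
print(binom.cdf(x, n, p))            # P(X <= x), mirrors binomcdf(n, p, x)
print(1 - binom.cdf(x - 1, n, p))    # P(X >= x), the "at least" case
```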
Computing Poisson Probability Using Calculator
- Suppose X is a Poisson random variable with parameter λ
- To compute P(X = x), use poissonpdf(λ, x) as follows:
  - Press 2nd & VARS, then choose poissonpdf(
  - Format of command: poissonpdf(λ, x)
  - λ: enter the value of λ here
  - x value: enter the value of x here
  - Paste and enter
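Not part of the original notes: the scipy equivalent of poissonpdf (the values λ = 1.4 and x = 2 are example values, with λ taken from the "between" example later in these notes).

```python
from scipy.stats import poisson

lam, x = 1.4, 2
print(poisson.pmf(x, lam))   # P(X = x), mirrors poissonpdf(λ, x)
```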
Characteristics of a Poisson Random Variable
- The experiment consists of counting the number of times an event occurs during a given unit of time, or in a given area or volume (or any other unit of measurement)
- The probability that an event occurs in a given unit of time, area, or volume is the same for all units
- The number of events that occur in one unit of time, area, or volume is independent of the number that occur in any other mutually exclusive unit
- The mean number of events per unit is denoted by λ, a positive real number
Characteristics of a Binomial Experiment
- The experiment consists of n identical trials
- There are only two possible outcomes on each trial: "success" and "failure"
- The probability of success remains the same from trial to trial. Let P(success) = p and P(failure) = q. Note that q = 1 - p
- The trials are independent
- The binomial random variable X is the number of successes in n trials
- X assumes the values 0, 1, 2, ..., n
Binomial distribution
- The pmf of the binomial distribution is: p(x) = C(n, x) p^x q^(n-x), where
  - p(x) = P(X = x) = probability of x successes,
  - n = number of trials,
  - x = number of successes in n trials (x = 0, 1, ..., n),
  - n - x = number of failures,
  - p = P(success), and q = P(failure) = 1 - p
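Not part of the original notes: a worked check of the pmf formula in Python (the values n = 10, p = 1/6, x = 2 reuse the "at most" example above), computing p(x) directly and comparing it with scipy.

```python
import math
from scipy.stats import binom

n, p, x = 10, 1 / 6, 2
q = 1 - p
direct = math.comb(n, x) * p ** x * q ** (n - x)   # C(n, x) p^x q^(n-x)
print(direct, binom.pmf(x, n, p))                  # the two values agree
```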
remarks
- These two rules are valid for any type of random variable (not only discrete) as long as μ and σ are finite
- In general, Chebyshev's rule can be written as P(μ - kσ < X < μ + kσ) ≥ 1 - 1/k², where k is a real number ≥ 1
P(X ≤ x) = poissoncdf(λ, x)
- Suppose X is a Poisson random variable with parameter λ
- To compute P(X ≤ x), use poissoncdf(λ, x) as follows:
  - Press 2nd & VARS, then choose poissoncdf(
  - Format of command: poissoncdf(λ, x)
  - λ: enter the value of λ here
  - x value: enter the value of x here
  - Paste and enter
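Not part of the original notes: the scipy equivalent of poissoncdf (λ = 1.4 and x = 4 are example values from the "between" card below).

```python
from scipy.stats import poisson

lam, x = 1.4, 4
print(poisson.cdf(x, lam))   # P(X <= x), mirrors poissoncdf(λ, x)
```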
Computation using Calc
Compute C(20, 10):
- Type 20
- Press MATH and choose PROB
- Select nCr
- Type 10. On screen it will show 20 nCr 10
- Press enter
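Not part of the original notes: the same binomial coefficient computed in Python.

```python
import math

print(math.comb(20, 10))   # C(20, 10) = 184756
```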
mean
μ = E(X) = np
at least
P(X ≥ 12) = 1 - P(X ≤ 11) = 1 - binomcdf(20, 0.5, 11)
between 2 and 4 (both inclusive)
P(2 ≤ X ≤ 4) = P(X ≤ 4) - P(X ≤ 1) = poissoncdf(1.4, 4) - poissoncdf(1.4, 1)
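Not part of the original notes: the same "between" probability computed in Python as a difference of cdf values.

```python
from scipy.stats import poisson

lam = 1.4
# P(2 <= X <= 4) = P(X <= 4) - P(X <= 1)
print(poisson.cdf(4, lam) - poisson.cdf(1, lam))
```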
Continuous random variables
A random variable is continuous if it can assume uncountably many values.
- For a continuous random variable X, both of the following hold:
  - X can assume values corresponding to any of the points contained in one or more intervals
  - For any real number c, P(X = c) = 0
Standard Deviation
The standard deviation of X is the positive square root of the variance of X: σ = +√(σ²)
Where to apply the Poisson distribution
Examples:
- number of customers arriving in 20 minutes
- number of strikes per year in the U.S.
- number of defects per lot (group) of DVDs
In general, we use the Poisson distribution for:
- number of events that occur in an interval,
- number of events per unit time,
- number of events per unit length,
- number of events per unit area,
- number of events per unit space, etc.
variance
For a discrete random variable X with pmf p(x) and expectation μ, the variance of X is defined as: σ² = E[(X - μ)²] = Σ (x - μ)² p(x)
- If X takes values x1, x2, ..., xn with probabilities p(x1), p(x2), ..., p(xn) respectively, then σ² = (x1 - μ)² p(x1) + ... + (xn - μ)² p(xn)
expectation
For a discrete random variable X, the expected value of X (or expectation of X) is defined as the sum of the terms value × probability: μ = E(X) = Σ x p(x)
- If X takes values x1, x2, ..., xn with probabilities p(x1), p(x2), ..., p(xn) respectively, then μ = x1 p(x1) + x2 p(x2) + ... + xn p(xn)
- μ represents the average value of X over a large number of repetitions of the experiment, so μ is also called the mean of X, or the mean of the probability distribution of X
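Not part of the original notes: a tiny Python sketch computing μ = Σ x p(x) for a made-up pmf.

```python
# Hypothetical pmf for illustration
x = [0, 1, 2, 3]
p = [0.1, 0.3, 0.4, 0.2]

mu = sum(xi * pi for xi, pi in zip(x, p))   # E(X) = sum of value * probability
print(mu)                                   # 1.7
```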
probability distribution of a discrete random variable
is a graph, table, or formula that specifies the probability associated with each possible value the random variable can assume
- For a discrete random variable X, we define the probability mass function (pmf) as p(x) = P(X = x), for any real number x
- Any pmf p(x) satisfies the following two properties:
  - 0 ≤ p(x) ≤ 1, for any real number x
  - Σ p(x) = 1
Random Variable
is a numerical variable whose values are associated with the outcomes of a random experiment, where one and only one numerical value is assigned to each sample point
- The value taken by a random variable depends on chance
- We usually denote random variables by capital letters such as X, Y, Z, etc.
Standard Deviation
σ = √(npq)
alternative way of computing variance
σ² = Σ x² p(x) - μ²
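Not part of the original notes: a sketch (using the same made-up pmf as above) showing that the definition of variance and the shortcut formula give the same value.

```python
x = [0, 1, 2, 3]
p = [0.1, 0.3, 0.4, 0.2]
mu = sum(xi * pi for xi, pi in zip(x, p))

var_def = sum((xi - mu) ** 2 * pi for xi, pi in zip(x, p))   # σ² = Σ (x - μ)² p(x)
var_alt = sum(xi ** 2 * pi for xi, pi in zip(x, p)) - mu ** 2  # σ² = Σ x² p(x) - μ²
print(var_def, var_alt)   # both give 0.81
```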
Hypergeometric distribution
p(x) = C(r, x) · C(N - r, n - x) / C(N, n)
μ = nr/N;  σ² = r(N - r) n (N - n) / [N²(N - 1)]
Hypergeometric distribution
- p(x) = probability of x successes
- N = total number of elements
- r = number of successes among the N elements
- x = number of successes in the n selections, where max{0, n - (N - r)} ≤ x ≤ min{r, n}
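Not part of the original notes: a scipy sketch for the hypergeometric pmf and mean (the values N = 20, r = 8, n = 5, x = 3 are made up; note that scipy's argument order is population size, successes, draws).

```python
from scipy.stats import hypergeom

N, r, n = 20, 8, 5
x = 3
print(hypergeom.pmf(x, N, r, n))           # P(X = 3)
print(hypergeom.mean(N, r, n), n * r / N)  # both give μ = nr/N
```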