Stats 3470 Midterm 1 Content

null event

"empty set" A=0 or A{ } an impossible outcome

P(A)

"the chance of event A happens" proportion of times event A occurs in an infinite repetition of the random phenomenon of interest =(# of times A happens per n )(n) n=# of repetitions

breakdown of binomial random variable

(n over k) = n!/[k!(n-k)!] *μ = E(X) = np* *σ²_X = V(X) = np(1-p) = npq*, where *q = 1-p* *σ_X = (npq)^(1/2)*
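
The mean and variance formulas above can be checked numerically. A quick sketch in Python (n and p are arbitrary example values, not from the cards):

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): (n choose k) * p^k * (1-p)^(n-k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
q = 1 - p

# Compute E(X) and V(X) directly from the pmf, then compare to np and npq
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
var = sum((k - mean)**2 * binom_pmf(k, n, p) for k in range(n + 1))
print(mean, n * p)      # both equal np = 3.0 (up to rounding)
print(var, n * p * q)   # both equal npq = 2.1 (up to rounding)
```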

What issues/problems can arise if we try to generalize the results/conclusions based on convenience and voluntary response samples?

**Biased samples.** Convenience: e.g., if you ask everyone around you while you're in stats class whether they like math, the result will be biased. Voluntary: if only people with the initiative to answer respond, that can create a bias. For example, in a survey on whether we should protect dogs, the people who care about dogs are the ones who want to answer, while indifferent people won't bother to answer.

P(A) for finite sample space

*= N(A)/N(S) = # of outcomes in A / # of outcomes in S*, which (when outcomes are counted with combinations) equals *(n over k count for event A)/(n over k count for the sample space)*

expected value or mean value of the (distribution of)X

*E(X) = "μ_X" = sum[x*p(x)]*, where X is a discrete random variable. "So take each possible value of x, multiply it by the probability of getting x, then do this for all the values of x and sum."
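
A minimal sketch of the "multiply each value by its probability, then sum" recipe (the pmf below is a made-up example):

```python
# Hypothetical pmf of a discrete random variable X: value -> probability
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

# E(X) = sum of x * p(x) over all possible values x
mean = sum(x * p for x, p in pmf.items())
print(mean)  # 0(.2) + 1(.5) + 2(.3) = 1.1
```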

Addition Rule

*P(AUB) = P(A or B) = P(A) + P(B) - P(AnB)* (subtracting P(AnB) removes the double count of outcomes that belong to both) *P(AUBUC) = P(A) + P(B) + P(C) - P(AnB) - P(AnC) - P(BnC) + P(AnBnC)*
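
The subtraction of the double count can be verified with equally likely outcomes; a small sketch using an assumed fair-die example:

```python
# One roll of a fair die: six equally likely outcomes
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even
B = {4, 5, 6}   # at least 4

def P(E):
    return len(E) / len(S)

# P(A U B) = P(A) + P(B) - P(A n B): subtracting P(A n B) removes the double count
lhs = P(A | B)                  # direct count of the union {2, 4, 5, 6}
rhs = P(A) + P(B) - P(A & B)
print(lhs, rhs)                 # both approximately 4/6
```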

The conditional probability of A given B has occurred

*P(A|B) = P(A n B)/P(B)* as long as P(B)>0

How to check if A & B are independent

*if P(AnB) = P(A)P(B)* or *if P(A|B) = P(A)* or *if P(B|A) = P(B)*. If any of these are NOT equal, then they are NOT independent, so knowing one event does affect the probability of the other event

constraints on p for p(x;p) to be a legitimate pmf?

0 ≤ p(x;p) ≤ 1 and p(0;p) + p(1;p) = (1-p) + p = 1
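
These two legitimacy conditions can be checked in code; a sketch assuming the parameter value p = 0.35:

```python
def bernoulli_pmf(x, p):
    """p(x; p): 1-p for failure (x=0), p for success (x=1), 0 otherwise."""
    if x == 0:
        return 1 - p
    if x == 1:
        return p
    return 0.0

p = 0.35  # assumed example value
# each probability is between 0 and 1, and they sum to (1-p) + p = 1
assert 0 <= bernoulli_pmf(0, p) <= 1 and 0 <= bernoulli_pmf(1, p) <= 1
assert abs(bernoulli_pmf(0, p) + bernoulli_pmf(1, p) - 1) < 1e-12
```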

Distribution of a random variable means:

1) all possible outcomes of X and (the range of X) 2) the probabilities associated with the outcomes

properties of a PMF

1. 0 ≤ p(x) ≤ 1 2. sum of all p(x) = 1

How to use calculator to obtain prob. based on binomial distribution

1. 2nd key 2. VARS 3. scroll down to binompdf or binomcdf (pdf for P(X=a), cdf for P(X≤a); pdf --> pmf just as cdf --> CDF) 4. enter trials (n), p, x value, then enter twice. ex. .35 prob of making a free throw. What's the prob of making exactly 4 of 7 free throws?
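
For checking answers away from the calculator, the same binompdf/binomcdf computations can be sketched in Python (the free-throw numbers are the card's example):

```python
import math

def binompdf(n, p, k):
    """P(X = k), the same quantity the calculator's binompdf returns."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binomcdf(n, p, k):
    """P(X <= k), the running sum of the pmf, like the calculator's binomcdf."""
    return sum(binompdf(n, p, j) for j in range(k + 1))

# Card's example: p = .35 of making a free throw, exactly 4 makes in 7 attempts
print(round(binompdf(7, 0.35, 4), 4))  # 0.1442
```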

what i need to know for expected value or mean value

1. all the possible values of X 2. corresponding probabilities (proportions)

Census

A survey that attempts to include the entire population

Sure event

A=S

Skewed distribution

An asymmetric frequency distribution in which some unusually high (or low) scores distort the mean to be greater (or less) than the median.

Range

the difference between the largest and smallest observations

Random Phenomenon

if we know what outcomes could happen, but not which particular values will happen

Binomial Random Variable

X = # of successes in a sequence of n Bernoulli trials (insert pic.#1)

Events A and B are independent if

knowing that event B occurred does NOT change the value of P(A) & gives no info about A. Using conditional prob., P(A|B) = P(AnB)/P(B), but since they are independent the "given B" does NOT matter, so *P(A|B) = P(AnB)/P(B) = P(A)* and *P(B|A) = P(AnB)/P(A) = P(B)*; then using the multiplication rule, *P(A n B) = P(A)P(B)*

Properties of expectation

let a, b be constants & h(X) be some fn of X. 1. E(a) = a 2. E(bX) = b*E(X) = b*μ_X 3. E(bX+a) = E(bX) + E(a) = b*E(X) + a = b*μ_X + a = μ_Y, where Y = bX+a 4. E[h(X)] = sum[h(x)*p(x)] ..... σ² = V(X) = E[(X-μ_X)²] ..... σ = [V(X)]^(1/2) "μ = mean"
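
Property 3 (linearity, E(bX+a) = b*E(X) + a) can be spot-checked numerically; a sketch with a made-up pmf and constants:

```python
# Assumed example pmf and constants (not from the cards)
pmf = {1: 0.25, 2: 0.25, 3: 0.5}
a, b = 4, 2

def E(h):
    """E[h(X)] = sum of h(x) * p(x) (property 4)."""
    return sum(h(x) * p for x, p in pmf.items())

mu = E(lambda x: x)           # E(X) = 1(.25) + 2(.25) + 3(.5) = 2.25
lhs = E(lambda x: b * x + a)  # E(bX + a) computed directly
print(mu, lhs, b * mu + a)    # lhs matches b*mu + a
```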

Five number summary

minimum, Q1, median, Q3, maximum

Multimodal

more than two clear peaks

n & k in methods of counting

n=# different options k=# of how many can be chosen of these options

Mean

numerical average of all the observations; denoted x with - on top

Unimodal

one peak

Permutations vs. combination

ordered sequences vs. unordered sequences: 123 ≠ 321 vs. 123 = 321

random variable has a Poisson distribution if the pmf of X is

p(x;lambda) = (lambda^x)*e^(-lambda)/x!, x = 0, 1, 2, 3, .... for some lambda > 0 (think of lambda as the "expected # of occurrences" per unit of time or per unit of area)
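
The pmf is easy to code directly; a sketch assuming lambda = 2 expected occurrences per unit of time:

```python
import math

def poisson_pmf(x, lam):
    """p(x; lambda) = lambda^x * e^(-lambda) / x!"""
    return lam**x * math.exp(-lam) / math.factorial(x)

lam = 2  # assumed example rate
print(round(poisson_pmf(0, lam), 4))  # P(X = 0) = e^(-2), about 0.1353
# probabilities over x = 0, 1, 2, ... sum to 1 (truncated here at 50 terms)
print(round(sum(poisson_pmf(x, lam) for x in range(50)), 4))
```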

line graph

shows how the values of the variable change over time

Modified box plot

shows outliers beyond the min/max whiskers (individual points beyond the tails); often a better choice than the regular box plot

Binomial distribution

situation where we are interested in counting X=# of successes that occur in a seq of Bernoulli trials ex. series of coin flips

Distribution of a Variable

tells us (1) all possible values that a variable can take on (2) how frequently these values occur

variance

the average (sort of) of the squared differences between each observation and the sample mean, also referred to as "squared deviation from the mean"; denoted s². In actuality we divide by n-1 rather than by n: *s² = [sum(xi - x̄)²]/(n-1)*, where x̄ = sample mean
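
The n-1 divisor can be checked against Python's statistics module, which uses the same sample-variance definition; the data below are made-up observations:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # assumed example observations
n = len(data)
xbar = sum(data) / n             # sample mean = 5.0

# s^2 = sum((xi - xbar)^2) / (n - 1); note the n - 1 in the denominator
s2 = sum((x - xbar)**2 for x in data) / (n - 1)
print(s2, statistics.variance(data))  # both 32/7
```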

standard deviation

the average distance between each observation and the sample mean and is denoted *s*.

Sample space (S)

the collection or set of all possible outcomes of a random phenomenon: S = {O1, O2, O3, O4, ..., Ok}, k ≥ 2

possible methods of counting

1. ordered w/out replacement: *P(k,n)* = "# of permutations of size k taken from a set of n distinct objs." 2. ordered w/ replacement: *n^k* 3. unordered w/out replacement (aka binomial coefficient): *C(k,n) = (n over k)* = "# of combinations of size k from a set of n distinct objs." 4. unordered w/ replacement: *(n+k-1 over k)*
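
Python's math module implements these formulas directly; a sketch with assumed n = 5 and k = 3:

```python
import math

n, k = 5, 3
print(math.perm(n, k))          # ordered w/out replacement: 5!/(5-3)! = 60
print(n**k)                     # ordered w/ replacement: 5^3 = 125
print(math.comb(n, k))          # unordered w/out replacement: 5!/(3!2!) = 10
print(math.comb(n + k - 1, k))  # unordered w/ replacement: C(7, 3) = 35
```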

Conditions of a binomial experiment that must be met:

1. sequence of "n" (Bernoulli) trials, where *n is fixed in advance* 2. each trial is identical & results in only *2 possible outcomes*. Success or Failure 3. *Trials are independent.* (knowledge of one trial does NOT influence the outcome of another trial) 4. *p=P(Success) is constant* from trial to trial

units of lambda

# of occurrences per interval; so if given (# of occurrences)/time = a, then for an interval of length t, lambda = a*t

Cumulative Distribution Function (CDF)

prob. that the random variable X will be less than or equal to the value x: *F(x) = P(X ≤ x)*. Represents the sum of the probabilities of the outcomes up to and including a specific outcome.

Variable

A characteristic of an individual or object

Left skewed distribution

A density curve where the left side of the distribution extends in a long tail; (mean < median)

Right skewed distribution

A density curve where the right side of the distribution extends in a long tail; (mean > median)

Statistic

A number that describes a sample. The value is known when we have taken a sample, but it can vary from sample to sample. We often use it to estimate an unknown parameter.

Parameter

A number that describes the population. A fixed number, but in practice is often unknown

The remaining outcome of a Bernoulli variable

Failure

proposition formula of CDF

For any two values a ≤ b, *P(a ≤ X ≤ b) = P(X ≤ b) - P(X < a) = F(b) - F(a-)*, where "a-" is the largest possible value of X that is strictly less than a.

Axes on Line Graph

Horizontal: time V: response variable (y)

Bayes' Theorem

Let A1, A2, ..., Ak be mutually exclusive and exhaustive events with prior probabilities P(Ai), i = 1, 2, ..., k, and let B be any event for which P(B) > 0. Then the posterior probability of Aj given B has occurred is given by: *P(Aj|B) = P(Aj n B)/P(B) = P(B|Aj)P(Aj)/Sum P(B|Ai)P(Ai)*
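
The prior-times-likelihood-over-total structure can be sketched in a few lines; all numbers below are hypothetical:

```python
# Hypothetical mutually exclusive & exhaustive events A1, A2, A3
prior = [0.5, 0.3, 0.2]       # P(Ai); must sum to 1
likelihood = [0.1, 0.4, 0.7]  # P(B | Ai)

# Denominator via the law of total probability: P(B) = sum of P(B|Ai)*P(Ai)
p_b = sum(l * p for l, p in zip(likelihood, prior))

# Bayes' theorem: posterior P(Aj | B) = P(B|Aj)*P(Aj) / P(B)
posterior = [l * p / p_b for l, p in zip(likelihood, prior)]
print(p_b, posterior)  # posteriors sum to 1
```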

variance of X

Let X have a pmf p(x) & mean μ. If we let h(X) = (X-μ)², then the variance of X is: *V(X) = E[h(X)] = E[(X-μ)²] = σ² = Σ(xi - μ)²*p(xi)*

Why is random sampling a better approach

Minimizes bias by selecting more representative samples Allows us to quantify sampling error (random variation)

Outliers

Observations that fall outside the overall pattern of a distribution

Bernoulli Family

Only two possible outcomes: X=1=success & X=0=failure. Suppose p(1) = P(X=1) = p & p(0) = P(X=0) = 1-p. Then we consider p a parameter & the collection of all such pmf's the Bernoulli family: p(x;p) = 1-p if x=0 (failure); p if x=1 (success); 0 otherwise

Possible methods of counting

Ordered: 1. w/out replacement: *n!/(n-k)!* 2. w/ replacement: *n^k* Unordered: 1. w/out replacement: n choose k = *n!/[k!(n-k)!]* 2. w/ replacement: (n+k-1) choose k

Intersection of A and B

Outcomes common to both A and B; denoted *A n B* (so even though in A, ONLY count if in B as well, so it works like "AND")

Multiplication Rule

*P(A n B) = P(A|B)xP(B) = P(B|A)xP(A)*

general strategy for binomial probability

P(exactly k successes) = [nCk]*[p^k]*[(1-p)^(n-k)] = (# of arrangements)*(p(succ))^(# of succ)*(p(fail))^(# of fails). ex. .9 prob of making a free throw: P(makes 2 of 3 free throws) = (3C2)(.9^2)(.1)^1, where 3C2 = "3 choose 2" = *n!/[k!(n-k)!]* = 3, p(succ) = .9, p(fail) = .1
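
The free-throw example, computed directly (p = .9 is the card's assumed success probability):

```python
import math

n, k, p = 3, 2, 0.9
# (# of arrangements) * p(succ)^(# of succ) * p(fail)^(# of fails)
prob = math.comb(n, k) * p**k * (1 - p)**(n - k)
print(round(prob, 3))  # 3 * (.9^2) * (.1^1) = 0.243
```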

Random sampling

Sampling that is rooted in probability and as such is a fair method of choosing a sample that avoids systematic deviations from the population.

Convenience Samples

Selection of individuals who are easiest to reach

The outcome of interest in a Bernuuli variable

Success

Population

The entire group of individuals, objects, or units about which we want information

Modality

The hills of a distribution graph

Sample

The part of the population from which we actually collect information, used to draw conclusions about the whole

Law of Total Probability

If A1, A2, ..., Ak are mutually exclusive and exhaustive events (A1 u A2 u ... u Ak = S), then for any event B: *P(B) = Sum of P(B n Ai) = Sum of P(B|Ai)xP(Ai)* (using the multiplication rule)

There are always fewer ___ than there are ___ (Comb or Perms)

There are always fewer combinations than there are permutations

Short cut variance formula

V(X) = E(X²) - [E(X)]² = E(X²) - μ², where μ = mean
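
A quick numerical check that the shortcut agrees with the definitional formula (the pmf is a made-up example):

```python
pmf = {0: 0.3, 1: 0.4, 2: 0.3}  # assumed example pmf

mu = sum(x * p for x, p in pmf.items())      # E(X) = 1.0
ex2 = sum(x**2 * p for x, p in pmf.items())  # E(X^2) = 1.6

definition = sum((x - mu)**2 * p for x, p in pmf.items())  # E[(X - mu)^2]
shortcut = ex2 - mu**2                                     # E(X^2) - mu^2
print(definition, shortcut)  # both approximately 0.6
```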

Dichotomous or Bernoulli Variable

a categorical variable that has only two possible responses or outcomes

Poisson distribution

a discrete probability distribution describing the likelihood of a particular number of independent events within a particular interval (often a time interval). Similar to the binomial distribution in that it counts successes over trials ("n"), but with successes RARE

Symmetric distribution

a distribution in which the data values are uniformly distributed about the mean

range of Discrete random variables

a finite set of values or a countably infinite sequence of values (0, 1, 2, 3, ...). ex. # of hearts in the 5 cards dealt: {0, 1, ..., 5}

A random variable

a function on the sample space, S, that assigns a number to each outcome in S. More formally, let X represent some random variable. Then X is defined as X: S --> R

Probability Mass Function (PMF)

a mathematical relation that assigns probabilities to all possible outcomes of a discrete random variable

Complement of A rule

all outcomes NOT in A; denoted *A^c* or *A'*; the rule: *P(A^c) = 1 - P(A)*

range of Continuous random variable

all possible values in an interval, which can extend from -infinity to infinity (-infinite, infinite). ex. the exact time it takes your prof. to hit a golf ball (0, infinite)

Random phenomenon

an activity whose outcome cannot be predicted in advance w/ certainty can be numeric or not

Poisson Distribution Formula

b(x;n,p) = [(n over x)][(p^x)][(1-p)^(n-x)] --> p(x;lambda) = *[(lambda^x)*e^(-lambda)]/x!* *letting:* lambda = np & n --> infinity
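
The limit can be seen numerically: with large n, small p, and lambda = np, the binomial and Poisson probabilities nearly coincide. A sketch with assumed values:

```python
import math

n, p = 1000, 0.002     # large n, small p (assumed example)
lam = n * p            # lambda = np = 2
k = 3

binom = math.comb(n, k) * p**k * (1 - p)**(n - k)
poisson = lam**k * math.exp(-lam) / math.factorial(k)
print(round(binom, 4), round(poisson, 4))  # very close to each other
```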

Histograms

bar graphs where the values of the variable, divided into ranges (classes) of equal width, go on the horizontal axis and counts go on the vertical axis

mutually exclusive events

can't happen at the same time, so A n B = ∅. "If A, then not B" and "If B, then not A."

parameters

characterization of a distribution that can take on a variety of values

family of probability distributions

collection of all probability distributions w/ the same fn form for the pmf but vary according to different values of a parameter

Event

collection of outcomes (a subset of the sample space); typically denoted A or B

Union of A and B

combines all outcomes in A and B denoted *A u B* (so even though not in A, count if its in B so works like "OR")

A simple random sample (SRS)

consists of "n" individuals from the population chosen in such a way that every possible collection of "n" individuals is equally likely to be selected

Two types of random variables

discrete & continuous

Think of binomial experiment as

drawing "n" times w/ replacement from a population that consists of proportion "p" =successes & of proportion "1-p" =failures

end of ch.1

end of ch.1

end of ch.2

end of ch.2

Box Plot

graphs the five-number summary; the box runs from Q1 to Q3 with a line at the median

General Product Rule

if a job consists of k separate tasks, the ith of which can be done in n_i ways, i = 1, 2, ..., k, then the entire job can be done in n_1*n_2*...*n_k ways

Median

the middle number when the observations are ordered from smallest to largest; denoted m

Mode

the most frequent observation

Interquartile range (IQR)

the range of the middle 50% of the data

Sample proportion

the relative frequency of the occurrence of successes in a sample and is denoted by P with a ^ on top

Voluntary response samples

the sample that chooses itself by responding to a general appeal

when a & b are integers & X is a discrete & integer random variable

then! *P(a ≤ X ≤ b) = F(b) - F(a-1)* and *P(X = a) = F(a) - F(a-1)*. if i understand CDF then i will get this

Bimodal

two peaks

Disjoint

when A and B have nothing in common, so *if disjoint then A n B = ∅*

exhaustive events

when every element in S occurs in one of the events example: if you roll a die, you will get a result that is "1", "2", "3", "4", "5", or "6".

standard deviation (SD) of X

σ=(σ²)^(1/2)

proposition formula for variance

σ²_(aX+b) = V(aX+b) = a²*σ²_X --> σ_(aX+b) = |a|*σ_X (just took the square root of the top eq.)

