Probability & Statistics

General Addition Rule

For any two events (meaning disjoint or not disjoint), A and B, the probability of A or B is: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
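
As a quick check of the rule, here is a minimal sketch using one roll of a fair six-sided die. The events A and B and the helper `prob` are hypothetical choices for illustration, not part of the original card.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event "roll is even"
B = {4, 5, 6}   # event "roll is greater than 3"

def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_union_rule = prob(A) + prob(B) - prob(A & B)

# Direct count of the union, for comparison
p_union_direct = prob(A | B)

print(p_union_rule, p_union_direct)
```

Subtracting P(A ∩ B) removes the outcomes 4 and 6, which would otherwise be counted twice.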

General Addition Rule for Any Two Events

For any two events A and B, P(A or B) = P(A) + P(B) - P(A and B)

General Multiplication Rule

For any two events A and B, the probability of A and B is P(A and B) = P(A) × P(B|A).
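
A small sketch of the rule, assuming a hypothetical example of drawing two cards without replacement from a standard 52-card deck, where A is "first card is an ace" and B is "second card is an ace". The brute-force count is only there to confirm the product.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical deck: ranks 1..13 (1 = ace) in four suits.
deck = [(rank, suit) for rank in range(1, 14) for suit in "SHDC"]

p_a = Fraction(4, 52)           # P(A): first card is an ace
p_b_given_a = Fraction(3, 51)   # P(B|A): second is an ace, given the first was
p_both_rule = p_a * p_b_given_a

# Brute-force check over all ordered pairs of distinct cards
pairs = list(permutations(deck, 2))
both_aces = [pair for pair in pairs if pair[0][0] == 1 and pair[1][0] == 1]
p_both_count = Fraction(len(both_aces), len(pairs))

print(p_both_rule, p_both_count)
```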

General Addition Rule

For any two events, A and B, the probability of A or B is P(A∪B)=P(A)+P(B)-P(A∩B)

General Addition Rule

For any two events, A and B, the probability of A or B is P(A ∪ B) = P(A) + P(B) - P(A and B).

General Multiplication Rule

For any two events A and B, the probability of A and B is P(A ∩ B) = P(A) × P(B|A).

Intervals and Areas of Density Curves

For continuous probability models using a density curve, events are defined over intervals of values, and probability is computed as areas under the density curve.

Empirical (or 68-95-99.7) Rule

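For data sets having a distribution that is approximately bell shaped, the following properties apply: about 68% of values fall within 1 standard deviation of the mean, about 95% fall within 2 standard deviations, and about 99.7% fall within 3 standard deviations.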
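
The percentages in the rule's name can be checked against an exactly normal distribution. This sketch uses Python's standard-library `statistics.NormalDist`; real data that are only approximately bell shaped will match less closely.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area under the curve within k standard deviations of the mean
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
```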

Class Width

Found by subtracting the lower class limit of one class from the lower class limit of the next class; can also be used with upper limits

CLASS PROBLEM: In real estate ads it is found that 64% of homes have garages, 9% have pools, and 28% have a finished basement. 5% have a garage and a pool, 19% have a garage and a basement, 4% have a basement and a pool, and 2% have all three. What percentage of homes do not have any of these three?

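Working outward from the center of a Venn diagram (all values in percent): All three = 2. Garage and pool only = 5 - 2 = 3. Garage and basement only = 19 - 2 = 17. Basement and pool only = 4 - 2 = 2. Garage only = 64 - 2 - 3 - 17 = 42. Pool only = 9 - 2 - 3 - 2 = 2. Basement only = 28 - 2 - 17 - 2 = 7. None = 100 - 42 - 2 - 7 - 3 - 17 - 2 - 2 = 25, so 25% of homes have none of the three.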
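
The same answer follows from the inclusion-exclusion form of the addition rule for three events. A minimal sketch, with variable names chosen here for illustration:

```python
# Percentages from the problem statement
g, p, b = 64, 9, 28        # garage, pool, finished basement
gp, gb, bp = 5, 19, 4      # pairwise overlaps
gpb = 2                    # all three

# Inclusion-exclusion: add singles, subtract pairs, add back the triple
at_least_one = g + p + b - gp - gb - bp + gpb
none = 100 - at_least_one

print(at_least_one, none)
```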

P(A or B) = P(A)+P(B) - P(A and B)

General addition rule for unions of two events

Central Limit Theorem Requirements

Given: 1. The random variable x has a distribution (which may or may not be normal) with mean µ and standard deviation σ 2. Simple random samples all of size n are selected from the population. (The samples are selected so that all possible samples of the same size n have the same chance of being selected.)

Frequency Polygon

Graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes

Line Graph

Graph that shows change over time using lines and data from observations.

distribution

of a variable tells us what values it takes and how often it takes these values; for a categorical variable, it gives either the count or the percent of individuals that fall in each category

ordinal level of measurement

applies to data that can be arranged in order; differences between data are meaningless

nominal level of measurement

applies to data that consist of names, labels, or categories; cannot be ordered

Percentiles

are measures of location. There are 99 percentiles denoted P1, P2, . . . P99, which divide a set of data into 100 groups with about 1% of the values in each group.

class boundaries

are the numbers that separate classes without forming gaps between them.(2.1)

law of large numbers

as an experiment is repeated over and over, the empirical probability of an event approaches the actual probability of the event

the law of large numbers

as the number of repetitions of a probability experiment increases, the proportion with which a certain outcome is observed gets closer to the probability of the outcome

Law of Large Numbers

as you increase the number of times a probability experiment is repeated, the empirical probability (relative frequency) of an event approaches the theoretical probability of the event
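
A simulation makes the idea concrete. This is a sketch assuming a fair coin (theoretical probability 0.5); the function name `relative_frequency` and the fixed seed are illustrative choices.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def relative_frequency(n_flips):
    """Empirical probability of heads in n_flips simulated fair-coin flips."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The relative frequency tends to settle near 0.5 as n grows
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

Small runs can land far from 0.5. The law describes long-run behavior only.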

the law of averages

the mistaken belief that the longer something hasn't happened, the more likely it becomes

Categorical Data

attribute data; puts individuals into a group or category

Pareto Chart

bar graph in which bars are organized from highest to lowest

Empirical (or statistical) Probability

based on observations obtained from probability experiments

bcdf(n, p, k)

binomial cdf on calculator: P(X ≤ k) for n trials with success probability p

normal distributions

bell curve, symmetric, unimodal density curves

Symmetric Distributions

bell-shaped, triangular, uniform (rectangle)

histogram

breaks the range of values of a variable into classes and displays only the count or percent of the observations that fall into each class; no space in between the bars

stratified sampling

divide the entire population into distinct subgroups called strata. The strata are based on a specific characteristic. All members of a stratum share the specific characteristic. Draw random samples from each stratum

Standard Deviation

descriptive measure of the spread of the data from the mean

Interquartile Range (IQR)

difference between the 1st and 3rd quartiles: IQR = Q3 - Q1

interquartile range

difference between the third and first quartiles; values more than 1.5 × IQR below Q1 or above Q3 are flagged as potential outliers
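
A sketch of the 1.5 × IQR check on a small made-up data set. Quartile conventions differ between textbooks and calculators; this uses the default method of Python's `statistics.quantiles`, so other tools may give slightly different fences.

```python
from statistics import quantiles

# Hypothetical data with one suspiciously large value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 40]

q1, q2, q3 = quantiles(data, n=4)   # three cut points: Q1, median, Q3
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```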

Boxplot

displays the 5-number summary as a central box with whiskers that extend to the non-outlying data values

Skewed

distribution of data is skewed if it is not symmetric and extends more to one side than the other

Symmetric

distribution of data is symmetric if the left half of its histogram is roughly a mirror image of its right half

Three Principles of Experimental Design - Randomization

divide into groups to avoid unintentional selection bias

cluster sampling

divide population into pre-existing segments; select random clusters; include every member of each selected cluster

cluster sampling

divide population into sections then randomly select some of those clusters and then choose ALL members from selected clusters

randomized block design

divide subjects with similar characteristics into blocks, and then within each block, randomly assign subjects to treatment groups.(1.3)

cluster sample

divide the population into groups, called clusters, and select all of the members in one or more (but not all) of the clusters.(1.3)

Bar Graphs

divides the data into bars, each bar is one category. The bars do not touch each other

Pie Chart

divides the data up into slices, where each slice represents one category. The size of each slice is determined by the relative frequency of each category. Used only when data represents parts of one whole

inference

drawing conclusions that go beyond the data at hand

Inferential Statistics

drawing inferences; developing and using math tools to make forecasts

Exponential Distribution S(x) =

e^(-λx)

Poisson Distribution: Mx(t) =

e^(λ(e^t - 1))

Treatments of an Experiment

each experimental condition

function

each input has exactly one output

legitimate probability assignment

each probability is between 0 and 1 (inclusive) and the sum of the probabilities is 1

Discrete random variable

either a finite number of values or countable number of values, where "countable" refers to the fact that there might be infinitely many values, but they result from a counting process

symmetry

elements on both sides of a line that have the same shape, size and arrangement

Three Principles of Experimental Design - Replication

ensures that randomization creates groups that resemble each other and increases the chance of detecting differences among treatments

Population

entire group of individuals about which we want information

tree diagram

enumerates each outcome in the sample space

E

event

dependent

events do impact each other's probability

Independent

events do not impact each other's probability

Independent events

events for which the occurrence of one has no impact on the occurrence of the other

mutually exclusive event

events that have no common outcome; two events that cannot occur at the same time

Disjoint Events

events that have no outcomes in common

dependent events

events where the probability of one affects the probability of the other

independent events

events whose probabilities do not affect each other

exploratory data analysis

examining data to describe its main features

building blocks of probability

experiment, outcome, sample space, event

double blind experiments

experiments in which neither the participants nor the people analyzing the results know who is in the control group

clinical trials

experiments that study the effectiveness of medical treatment on actual patients.

Symmetric about the point c if:

f(c + t) = f(c - t) for all t

f(y | x) * f_X(x) =

f(x,y)

Central Limit Theorem Description

for a population with any distribution, the distribution of the sample means approaches a normal distribution as the sample size increases.

Sample Space

for a procedure consists of all possible simple events; that is, the sample space consists of all outcomes that cannot be broken down any further

Three Standard Deviation Rule

for any data set almost all data values fall within three standard deviations of the mean

Random Variable:

function on a probability space S

Joint Conditional Distribution: f(x,y) =

f_X(x) * f(y | X = x) (this factorization holds in general, not only when X and Y are independent)

Joint Distribution: If Independent, f(x,y) =

f_X(x) * f_Y(y)

If X and Y are independent, then f(y | X = x) =

f_Y(y)

gcdf(p,n)

geometric cdf on calculator: P(X ≤ n), the probability that the first success occurs on or before trial n

stem plot

gives a quick picture of the shape of a distribution while including the actual numerical values in the graph

dotplot

graphs a dot for each case against a single axis

blocks

groups of subjects with similar characteristics.(1.3)

quantitative variable

has a value or numerical measurement

disjoint

have no outcomes in common and cannot occur simultaneously

z score

how many standard deviations x lies from the distribution mean

spread

how variable the data are; measured by standard deviation, IQR, variance, range

addition rule

if A and B are disjoint events, then the probability of A or B is P(A or B) = P(A) + P(B)

multiplication rule

if A and B are independent events, then the probability of A and B is P(A and B) = P(A)P(B)

The Multiplication Principle

if event A has a possible outcomes and event B has b possible outcomes, then BOTH events considered together have (a*b) outcomes.

Multiplication Rule

if events A and B are independent, then P(A and B) = P(A)P(B)

factorial symbol (n!)

if n ≥ 0 is an integer, the factorial symbol, n!, is defined as follows: 0! = 1, and for n ≥ 1, n! = n(n-1) · ... · 3 · 2 · 1

transformation

if the data do not appear to follow a linear pattern, it may be necessary to apply a transformation to the data so that the relationship becomes linear

When considering permutations, the order is:

important: (a,b) does not equal (b,a)

sample

in which measurements or observations from part of the population are used

census

in which measurements or observations from the entire population are used

randomized block experiment

individuals are first sorted into blocks, and then a random process is used to assign each individual in the block to one of the treatments

Continuous random variable

infinitely many values, and those values can be associated with measurements on a continuous scale without gaps or interruptions

prior probability

initial probability of a state of nature before sample information is used with Bayes Theorem

For a continuous random variable Mx(t) =

integral (-infinity to infinity): e^(tx) f(x) dx

For a Continuous Random Variable E[X] =

integral (-infinity to infinity): x* f(x) dx

Joint Distribution Cumulative Distribution:

integral (-infinity to x) integral (-infinity to y) f(s,t) dt ds

Continuous Uniform Distribution: F(x) =

integral (a to x) f(t) dt = (x - a) / (b - a), for a ≤ x ≤ b

Marginal Distribution f_X(x) =

integral of f(x,y) dy (continuous); sum over all y of f(x,y) (discrete)

Descriptive statistics

involves methods of organizing, picturing, and summarizing information from samples or populations

inferential statistics

involves methods of using information from a sample to draw conclusions regarding the population

When considering combinations, the order of the elements in the set is:

irrelevant: {a,b} = {b,a}

frequency histogram

is a bar graph that represents the frequency distribution of a data set.(2.1)

census

is a count or measure of an entire population.(1.3)

sampling

is a count or measure of a part of a population, more commonly used in statistical studies.(1.3)

frequency polygon

is a line graph that emphasizes the continuous change in frequencies.(2.1)

parameter

is a numerical description of a population characteristic.(1.1)

statistic

is a numerical description of a sample characteristic.(1.1)

outlier

is a point lying far away from the other data points.

randomization

is a process of randomly assigning subjects to different treatment groups.(1.3)

systematic sample

is a sample in which each member of the population is assigned a number. The members of the population are ordered in some way, a starting number is randomly selected, and then sample members are selected at regular intervals from the starting number.(1.3)

simple random sample

is a sample in which every possible sample of the same size has the same chance of being selected.(1.3)

hypothesis test (or test of significance)

is a standard procedure for testing a claim about a property of a population.

sample

is a subset of a population.(1.1)

frequency distribution

is a table that shows classes or intervals of data entries with a count of the number of entries in each class.(2.1)

blinding

is a technique where the subject does not know whether he or she is receiving a treatment or a placebo.(1.3)

survey

is an investigation of one or more characteristics of a population.(1.3)

Range Rule of Thumb

is based on the principle that for many data sets, the vast majority (such as 95%) of sample values lie within two standard deviations of the mean.

random sample

is one in which every member of the population has an equal chance of being selected.(1.3)

descriptive statistics

is the branch of statistics that involves the organization, summarization, and display of data.(1.1)

inferential statistics

is the branch of statistics that involves using a sample to draw conclusions about a population. A basic tool in the study of inferential statistics is probability.(1.1)

population

is the collection of all outcomes, responses, measurements, or counts that are of interest.(1.1)

sampling error

is the difference between the results of a sample and those of the population.(1.3)

class width

is the distance between lower (or upper) limits of consecutive classes.(2.1)

frequency f

is the number of data entries in the class.(2.1)

relative frequency

is the portion or percentage of the data that falls in that class. To find the relative frequency of a class, divide the frequency f by the sample size n.(2.1)

five-number summary

minimum, 1st quartile, median, 3rd quartile, maximum

Five Number Summary

minimum, Q1, Q2, Q3, maximum

simulation

models random events by using random numbers to specify event outcomes with relative frequencies that correspond to the true real-world relative frequencies we are trying to model

Multimodal

more than two data values occur with the same greatest frequency

Disjoint Events

mutually exclusive, events that have no outcomes in common

Finding a Sample Size for Estimating a Population Mean

n = ( [z(α/2) * σ] / E)^2
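
A sketch of the formula in code, assuming the usual convention of rounding the result up to the next whole number. The function name and the example inputs (95% confidence, σ = 15, margin of error E = 3) are hypothetical; the critical value comes from the standard library's `NormalDist.inv_cdf`.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(confidence, sigma, margin):
    """n = (z(alpha/2) * sigma / E)^2, rounded up to a whole number."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z(alpha/2)
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical inputs: 95% confidence, sigma = 15, margin of error 3
n_needed = sample_size_for_mean(0.95, 15, 3)
print(n_needed)
```

Halving the margin of error roughly quadruples the required sample size, because E is squared in the denominator.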

Given n distinct objects, the number of ways in which the objects may be ordered:

n!

# of ways of choosing a subset without replacement. (no regard to order):

n! / (k! * (n - k)!)

Permutations Rule Formula

n! / (n(1)! * n(2)! * ... * n(k)!)

Choosing an ordered subset of size k without replacement from n objects:

P(n,k) = n! / (n-k)!

integral (0 to infinity) (x^n)(e^-cx)dx =

n! / c^(n+1)

# of ways of ordering n objects of different types:

n! / (n1! * n2! * ... * nt!)
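
A short sketch of this count, using the letters of a word as the objects and each distinct letter as a type. The helper `distinct_orderings` is a name invented for this example.

```python
import math
from collections import Counter

def distinct_orderings(items):
    """n! divided by the factorial of each type's count."""
    counts = Counter(items)
    result = math.factorial(len(items))
    for count in counts.values():
        result //= math.factorial(count)   # division stays exact at each step
    return result

# 11 letters: M x1, I x4, S x4, P x2
print(distinct_orderings("MISSISSIPPI"))
```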

confounding variable

occurs when an experimenter cannot tell the difference between the effects of different factors on a variable.(1.3)

undercoverage

occurs when some groups in the population are left out of the process of choosing the sample

probability

of an outcome is defined as the long-term proportion of times the outcome occurs; a number that indicates how likely the particular outcome is

lurking variable

one for which no data have been collected but that nevertheless has influence on other variables in the study

completely randomized experiment

one in which a random process is used to assign each individual to one of the treatments

Randomized Block Design

one layer of classification not random

Combination

order doesn't matter: nCr = n! / (r! (n-r)!)

Permutation

order matters: nPr = n! / (n-r)!
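
Both counts are available in Python's standard library as `math.perm` and `math.comb` (Python 3.8 or later). The sketch below compares them with the factorial formulas for a hypothetical case of choosing 2 items from 5.

```python
import math

n, r = 5, 2

n_perm = math.perm(n, r)   # ordered selections, nPr
n_comb = math.comb(n, r)   # unordered selections, nCr

# The same values from the factorial formulas
perm_formula = math.factorial(n) // math.factorial(n - r)
comb_formula = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(n_perm, n_comb)
```

Each unordered pair corresponds to r! = 2 ordered pairs, which is why nPr = nCr × r!.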

equally likely outcomes

outcomes that have the same probability of occurring

examining terms for distribution

overall pattern, deviations, shape, center, spread, outlier

Geometric Distribution: Mx(t) =

p / (1-qe^t)

Binomial Distribution: p, q, x, n =

p = probability success, q = probability failure, x = successes, n = independent trials

Test Statistic for Two Proportions - cont

p(1) - p(2) = 0 (assumed in the null hypothesis); phat(1) = x(1) / n(1); phat(2) = x(2) / n(2)

Binomial Distribution with Parameters n & p:

p(x) is the probability that there will be exactly x successes in the n trials
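
A minimal sketch of p(x), using the standard binomial formula C(n, x) · p^x · q^(n-x) with q = 1 - p. The function names and the coin-toss example are illustrative assumptions.

```python
import math

def binom_pmf(x, n, p):
    """P(exactly x successes in n independent trials)."""
    q = 1 - p
    return math.comb(n, x) * p**x * q**(n - x)

def binom_cdf(k, n, p):
    """P(at most k successes): the sum of p(0), ..., p(k)."""
    return sum(binom_pmf(x, n, p) for x in range(k + 1))

# Hypothetical example: 4 tosses of a fair coin, counting heads
n, p = 4, 0.5
exactly_two = binom_pmf(2, n, p)
at_most_two = binom_cdf(2, n, p)
std_dev = math.sqrt(n * p * (1 - p))   # sqrt(npq)

print(exactly_two, at_most_two, std_dev)
```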

x:N(µ,σ)

parameters of a normal curve

z:N(0,1)

parameters of a standard normal curve

Sample

part of the population from which info is obtained

Pooled Sample Proportion

pbar = ( x(1) + x(2) ) / ( n(1) + n(2) ); qbar = 1 - pbar

response bias

people answer questions the way they think you want them answered. There are some questions they simply don't want to answer truthfully.

literacy

percent of adults in a given country who can read and write

Percentiles

percentage of scores / data values that fall below a point

nPr

permutation of n objects taken r at a time

trend

persistent, long term rise or fall

Obtaining P

phat is sometimes given directly; otherwise it must be calculated: phat = x/n

influential points

points that strongly affect the graph of the regression line.

Continuous data

possible values are infinite; no gaps

Levels

possible values of a factor

y-intercept

predicted value when the x variable is zero

joint probabilities

probabilities that correspond to the events represented in the cells of the contingency table

marginal probabilities

probabilities that correspond to the events represented in the margin of the contingency table.

frequentist interpretation of probability

probability of an event proportional to number of times event occurs in a large number of repetitions of the experiment

P(X>n) = q^n

probability of more than n trials before success
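
The shortcut can be checked by summing the geometric probabilities directly. This sketch assumes X counts trials up to and including the first success, so P(X = k) = p · q^(k-1); the values p = 0.2 and n = 5 are arbitrary.

```python
def prob_more_than(n, p):
    """P(X > n): 1 minus the sum of P(X = k) = p * q**(k - 1) for k = 1..n."""
    q = 1 - p
    return 1 - sum(p * q ** (k - 1) for k in range(1, n + 1))

p, n = 0.2, 5
direct = prob_more_than(n, p)   # long way: subtract P(X <= n) from 1
shortcut = (1 - p) ** n         # q^n: the first n trials are all failures

print(direct, shortcut)
```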

institutional review board

protects the rights and welfare of humans subjects participating in research activities

Law of Large Numbers

states that as an experiment is repeated over and over, the observed frequency of an outcome gets closer to its expected frequency.

benford's law, also called the first-digit law

states that for certain kinds of data, the first digits of the data values follow a characteristic, non-uniform frequency pattern. This can be used to assess the legitimacy of certain data. For appropriate data, first digits have the following distribution (with the last value missing):
First digit: 1 2 3 4 5 6 7 8 9
Probability: .301 .176 .125 .097 .079 .067 .058 .051 ?
1. What is the probability that the first digit is 9? The probabilities must sum to 1, so it is 1 - .954 = .046.
2. What is the probability that the first digit is at least 2? By the complement rule, it is 1 - P(first digit is 1) = 1 - .301 = .699.
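
Both answers rest on the probabilities summing to 1, which a few lines of code can confirm. The dictionary below simply copies the eight probabilities listed on the card.

```python
# First-digit probabilities as listed (digit 9 is the missing value)
known = {1: .301, 2: .176, 3: .125, 4: .097,
         5: .079, 6: .067, 7: .058, 8: .051}

# All nine probabilities must sum to 1, so digit 9 gets the remainder
p_nine = round(1 - sum(known.values()), 3)

# "At least 2" is the complement of "first digit is 1"
p_at_least_two = round(1 - known[1], 3)

print(p_nine, p_at_least_two)
```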

law of large numbers

states that in the long-run relative frequency of repeated independent events settles down to the TRUE relative frequency as the number of trials increases.

Sx=sqrt(npq)

std. dev of binomial

q/p^2

variance of a geometric random variable (the standard deviation is sqrt(q)/p)

completely randomized design

subjects are assigned to different treatment groups through random selection.(1.3)

"not"

subtract from 1, 1 - P(example).

least-squares property

sum of the squares of the residuals is the smallest sum possible

Example:

suppose X and Y are independent, with E(X) = 120, σX = 12, E(Y) = 300, σY = 16. Find the mean and standard deviation of 2X - 5Y. E(2X - 5Y) = 2E(X) - 5E(Y) = 2(120) - 5(300) = -1260. V(2X - 5Y) = V(2X) + V(5Y) = 4V(X) + 25V(Y) = 4(144) + 25(256) = 6976, so σ(2X - 5Y) = square root of 6976 ≈ 83.522.
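
The same arithmetic as a small function, as a sketch that assumes X and Y are independent (otherwise a covariance term would be needed). The name `linear_combo` is made up for this example.

```python
import math

def linear_combo(a, mean_x, sd_x, b, mean_y, sd_y):
    """Mean, variance and standard deviation of aX + bY for independent X, Y."""
    mean = a * mean_x + b * mean_y
    variance = a**2 * sd_x**2 + b**2 * sd_y**2   # coefficients are squared
    return mean, variance, math.sqrt(variance)

# 2X - 5Y with E(X) = 120, sd 12 and E(Y) = 300, sd 16
mean, variance, sd = linear_combo(2, 120, 12, -5, 300, 16)
print(mean, variance, round(sd, 3))
```

The variances add even though Y is subtracted, because the coefficient -5 is squared.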

Hypothesis Test Statistic for Two Means: Independent Samples

t = ( ( xbar(1) - xbar(2) ) - ( μ(1) - μ(2) ) ) / sqrt( ( s^2(1) / n(1) ) + ( s^2(2) / n(2) ) )
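
A sketch of the statistic in code, with the standard error taken as the square root of s1²/n1 + s2²/n2. The function name and the sample summaries are hypothetical, and the hypothesized difference μ(1) - μ(2) defaults to 0.

```python
import math

def two_sample_t(xbar1, xbar2, s1, s2, n1, n2, mu_diff=0.0):
    """t statistic for two independent samples (variances not pooled)."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((xbar1 - xbar2) - mu_diff) / standard_error

# Hypothetical summaries: means 10 and 8, sds 2 and 3, sizes 16 and 9
t_stat = two_sample_t(10, 8, 2, 3, 16, 9)
print(round(t_stat, 4))
```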

Test Statistic for Testing a Claim About a Mean (with σ Not Known)

t = (xbar - μ(xbar) ) / (s / (n)^(1/2) )

distribution

tells us what values it takes and how often it takes these values

Distribution

tells what value the variable takes and how often it takes these values.

splitting stem/ trim

terms to slim down the size of your stem plot. helpful when you have large sets of data

marginal change

the amount that it changes when the other variable changes by exactly one unit

intersection

the "and" of some events: the outcomes common to all of them

quotient

the answer of a division problem

product

the answer of a multiplication problem

difference

the answer of a subtraction problem

sum

the answer of an addition problem

mean

the arithmetic average

mean

the arithmetic average of the observations

slope

the average change in the response variable as the explanatory variable increases by one

expectancy

the average number of years a person can expect to live in a given country

mean absolute deviation

the average of the absolute deviations for all the data values in the sample

randomization

the best defense against bias, in which each individual is given a fair, random chance of selection

probability

the chance or likelihood of getting a certain number, word, or object

sample space

the collection of all possible outcome values; has a probability of 1

sample space

the collection of all possible outcomes

sample space

the collection of all possible outcomes; denoted with an S

sample data

the data are from only some of the individuals of interest

population data

the data are from every individual of interest

residual

the difference between an observed value of the response variable and the value predicted by the regression line

sampling error

the difference between measurements from a sample and corresponding measurements from the respective population; caused by the fact that the sample does not perfectly represent the population

interquartile range

the difference between quartiles

Range

the difference between the highest and lowest scores in a distribution

residual

the difference between the observed value and the predicted value of a regression equation; y - y-hat

Conditional Distribution

the distribution of a variable restricting the who to consider only a smaller group of individuals

Type II Error

the error of failing to reject a null hypothesis when in fact it is false (also called a "false negative"). the probability of a Type II error is commonly denoted β and depends on the effect size.

Experimental Study

the factor whose effect is to be assessed is manipulated appropriately by devising a suitable design -conceptual -data creates background

control group

the group that does not receive the experimental treatment.

simulation

the imitation of chance behavior, based on a model that accurately reflects the phenomenon under consideration

dimension

the length, width, or height of a shape

Probability

the likelihood that a possible future event will occur in any given instance of the event; mathematical ratio: what you want to happen to total outcomes of what could happen

negative slope

the line would decrease from left to right

positive slope

the line would increase from left to right

the law of large numbers

the long run relative frequency of repeated independent events settles down to the true probability as the number of trials increases

probability of an outcome

the long-term proportion with which a certain outcome is observed

probability

the long-term relative frequency

income

the mathematical average amount of money a typical family makes in US dollars

expected value

the mean of the discrete random variable

Arithmetic Mean (Mean)

the measure of center obtained by adding the values and dividing the total by the number of values

median

the midpoint of a distribution

simulation component

the most basic situation in which something happens at random

normal probability distribution

the most widely used continuous probability distribution, which plays a central role in statistical inference; can be used to describe many real-life phenomena

sampling variability

the natural tendency of randomly drawn samples to differ

Geometric Distribution: X represents:

the number of failures until the first success

counting principle

the number of possible outcomes in an experiment

z Score (or standardized value)

the number of standard deviations that a given value x is above or below the mean

binomial random variable

the number of successes in n trials of a binomial experiment

relative frequency

the number of times an outcome occurs divided by total number of trials

m

the number of ways that an event E can occur

Mode

the number that occurs most often in a set of data

individuals

the objects described in a set of data.

union

the "or" of two sets: everything that belongs to at least one of them

independent

the outcome of one trial doesn't influence or change the outcome of another

density curve

the overall pattern of a distribution can be described by this

density curve

the overall pattern of a distribution, areas underneath give proportions of observations for the distribution

Sample

the part of the population from which we actually collect information and is used to draw conclusions about the whole

Individuals

the people or objects included in the study

Complement Rule

the probability of an event occurring is 1 minus the probability that it doesn't occur

probability

the probability of an outcome is the proportion of times that it would occur over many repetitions. -often, people expect the outcomes to settle into some regularity much sooner than they actually do.

conditional probability

the probability of one event given another

complement rule

the probability of one event occurring is 1 minus the probability that it does NOT occur: P(A) = 1 - P(A'), where A' is read as "A complement"

The Mode of the Distribution is the point where:

the probability or density function is maximized.

posterior probability

the probability that a hypothesis is true after consideration of the evidence

complement of an event

the probability that an event does not occur; all outcomes in a sample space that are not outcomes in the event

conditional probability

the probability that an event occurs, given that another event has occurred

what does the notation P(A) stand for?

the probability that event A occurs

complement

the probability that something *doesn't* happen

probability

the proportion of times the event occurs in many repeated trials of a random phenomenon (the long-term relative frequency of an event)

block design

the random assignment of units to treatments is carried out separately within each section.

experimental probability

the ratio of the number of times an outcome occurs to the total amount of trials performed

replication

the repetition of an experiment in order to test the validity of its conclusion

outcome

the result of a single performance of an experiment; a set of outcomes is denoted with braces {}

Outcome

the result of a single trial in a probability experiment

nonsampling error

the result of poor sample design, sloppy data collection, faulty measuring instruments, bias in questionnaires, and so on.

slope

the rise of a line divided by the run of the line

random phenomena

the rules and concepts of probability that give us a language to talk and think about ________

sample space

the sample space is the set of all possible outcomes, denoted S. Example: toss a coin three times. The sample space is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. The size of S is denoted |S|. Example: toss a die twice. The sample space is S = {(1,1), (1,2), ..., (1,6), (2,1), (2,2), ..., (2,6), ..., (6,6)}. Example: pull two cards from a well-shuffled deck. How many elements are in the sample space?
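
Small sample spaces like these can be enumerated with `itertools`. For the two-card example the count depends on whether the order of the draw is tracked, so the sketch below shows both counts and leaves that modelling choice open; the deck is assumed to be a standard 52 cards.

```python
from itertools import combinations, permutations, product

# Three coin tosses: every sequence of H and T of length 3
coin_space = ["".join(seq) for seq in product("HT", repeat=3)]

# Two die rolls: ordered pairs (first roll, second roll)
dice_space = list(product(range(1, 7), repeat=2))

# Two cards from an assumed 52-card deck, drawn without replacement
cards = range(52)
ordered_pairs = sum(1 for _ in permutations(cards, 2))     # order tracked
unordered_pairs = sum(1 for _ in combinations(cards, 2))   # order ignored

print(len(coin_space), len(dice_space), ordered_pairs, unordered_pairs)
```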

dependent

the sample values are paired

independent

the sample values selected from one population are not related to or somehow paired or matched with the sample values from the other population.

trial

the sequence of several components representing events that we are pretending will take place

Complement of Event E

the set of all outcomes in a sample space that are not included in event E. The complement of event E is denoted by E′ and is read as "E prime"

Sample Space

the set of all possible outcomes

sample space (S)

the set of all possible outcomes

Sample Space

the set of all possible outcomes of a probability experiment

Sample Space, S

the set of all possible outcomes of a random phenomenon

joint event

the simultaneous occurrence of two events

standard error

the standard deviation of a sampling distribution

statistics

the study of how to collect, organize, analyze, and interpret numerical information from data

Mean

the sum of the observations divided by the number of observations

"something has to happen rule"

the sum of the probabilities of all possible outcomes must be 1

explanatory variables

the treatment (ex. studying or not studying); factors

extrapolation

the use of a regression line to make predictions for the values outside the range of x.

Measure of Center

the value at the center or middle of a data set

median

the value below which 50% of the cases fall

outcome

the value measured, observed, or reported for each trial

Midrange

the value midway between the maximum and minimum values in the original data set

Mode

the value that occurs with the greatest frequency Data set can have one, more than one, or no mode

explanatory variable, predictor variable or independent variable

the x variable

response variable or dependent variable

the y variable

independence (informally)

this happens between two events where the knowing whether or not one event occurs does NOT alter the probability that the other event occurs

Variance of a Data Set

to find, square the standard deviation

scope of inference

to whom the generalization of the inference may be directed

Bimodal

two data values occur with the same greatest frequency

Independent Events

two events in which the outcome of one event does not affect the outcome of the other event.

Mutually exclusive/disjoint

two events that cannot occur simultaneously

complementary

two events that cannot occur together, but one must happen

Disjoint/mutually exclusive

two events that have no outcomes in common (Cannot occur simultaneously)

mutually exclusive events

two events that have no outcomes in common; disjoint events

disjoint events

two events that have no outcomes in common; mutually exclusive events

disjoint

two events that share no outcomes in common, mutually exclusive

Mutually Exclusive Events

two or more events - if no two of them have outcomes in common

confounders

two variables whose effect cannot be distinguished from one another

Unimodal, Bimodal, & Multimodal Distributions

unimodal - has one peak; bimodal - has two peaks; multimodal - has three or more peaks

modes

unimodal - one peak; bimodal - two peaks; multimodal - multiple peaks

multistage sampling

use a variety of sampling methods to create successively smaller groups at each stage. The final sample consists of clusters

treatment

used in experimental studies; it's what is given to a test subject.

randomization

used to assign individuals to the two treatment groups; helps prevent bias in selecting group members

Line Graph

used to indicate a trend over time. Horizontal axis = time. Vertical axis = observed numerical data. Look for: overall pattern or trend, deviations, seasonal variations, pay specific attention to vertical scale

inferential statistics

used to interpret data and draw conclusions of whole based on sample

classical method of assigning probabilities

used when an experiment has equally likely outcomes

simulation

uses methods such as rolling dice or computer generation of random numbers to generate results from an experiment.

Pictogram

uses pictures as part of the representation. Pictures are not often to scale, should not be used

QuaNtitative Variable

values measured on a numerical scale

Factor

variable whose effect on response variable is of interest in the experiment

independence (formally)

when P(B|A)=P(B)

completely randomized design

when all experimental units are allocated at random among all treatments

quartile

when data in a set are arranged in order, quartiles are the numbers that split the data into quarters or fourths

random

when individual outcomes are uncertain but there is a regular distribution of outcomes in a large number of repetitions

confounded

when the effects of one of the two variables cannot be distinguished from the effects of the other

independent

when the results of one variable don't affect another

sampling with replacement

when each selected item is put back before the next selection, so the second draw is exactly like the first

sampling without replacement

when you don't replace the things you select

matched-pairs design

where subjects are paired up according to a similarity. One subject in the pair is randomly selected to receive one treatment while the other subject receives a different treatment.(1.3)

y-intercept

where the graph of a line crosses the y-axis

upper class limit

which is the greatest number that can belong to the class.(2.1)

lower class limit

which is the least number that can belong to the class.(2.1)

where P(A and B) denotes the probability that A and B both occur at the same time as an outcome in a trial of a procedure.

...

where x is the value of the random variable and P(x) is the probability of observing the variable x.

...

μ(xbar) = population mean of all sample means from samples of size n

...

Random samples eliminate bias from the act of choosing a sample, but they can still be wrong because of...

... the variability that results when one chooses at random.

at a hospital, the probability of a patient having surgery is 12%, and obstetric treatment 16% and the probability of both is 2%. What is the probability that a patient will have neither treatment?

.74
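
The answer above follows from the general addition rule and the complement rule. A minimal sketch of that arithmetic; the function name `prob_neither` is illustrative and not part of the original card:

```python
def prob_neither(p_a: float, p_b: float, p_both: float) -> float:
    """P(neither A nor B) = 1 - P(A or B), with P(A or B) from the addition rule."""
    p_either = p_a + p_b - p_both  # general addition rule
    return 1.0 - p_either          # complement rule


# Surgery 12%, obstetric treatment 16%, both 2%.
print(round(prob_neither(0.12, 0.16, 0.02), 2))
```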

If X & Y are independent, then COV[X,Y] =

0

Mx(0) =

1

"At least one"

1 - P(none)

P[A'(giv)B] =

1 - P[A(giv)B]

Exponential Distribution F(x) =

1 - e^(-λx)

1 + 2a + 3a^2 + ........... =

1 / (1-a)^2

1 + 2r + 3r^2 + ........... =

1 / (1-r)^2

Continuous Uniform Distribution: f(x) =

1 / (b - a)

For Discrete Uniform Distribution of N points: p(x) =

1 / N

For Uniform Joint Distribution, pdf =

1 / area of R

Rules of Thumb (1)

1) "and" means multiply when the events are independent -toss a coin three times. what is the probability of all three being tails? -that is, tails first AND tails second AND tails third -since coin flips are independent we multiply, .5x.5x.5=.125

68.26% - 95.44% - 99.74% Rule

1) 68.26% of all observations lie within one standard deviation to either side of the mean 2) 95.44% of all observations lie within two standard deviations 3) 99.74% of all observations lie within three standard deviations

Converting from the kth Percentile to the Corresponding Data Value flowchart

1) Sort the data 2) L = (k/100) * n 3) Is L a whole number? Yes) The value of the kth percentile is midway between the Lth value and the next value in the sorted set of data. Find P(k) by adding the Lth value and the next value and dividing by two No) Change L by rounding it up to the next larger whole number. The value of P(k) is the Lth value, counting from the lowest.
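
The flowchart above can be sketched as a short function. This is one possible reading of the steps, assuming k is an integer strictly between 0 and 100; the name `percentile_value` is hypothetical, and other textbooks and libraries define percentiles differently.

```python
def percentile_value(data, k):
    """Return the k-th percentile using the locator L = (k/100) * n."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    values = sorted(data)                # step 1: sort the data
    n = len(values)
    whole, remainder = divmod(k * n, 100)  # step 2: L = k*n/100, kept in integers
    if remainder == 0:
        # L is a whole number: average the L-th value and the next one.
        return (values[whole - 1] + values[whole]) / 2
    # L is not whole: round up and take that position (1-based).
    return values[whole]                 # index `whole` is position whole + 1


sample = [5, 1, 9, 3, 7, 11, 13, 15]
print(percentile_value(sample, 25))  # L = 2, so midway between 2nd and 3rd values
print(percentile_value(sample, 40))  # L = 3.2, rounds up to the 4th value
```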

Bell-Shaped Distribution

Has a single peak, tapers off at either end, and is approximately symmetric

discrete random variable

Has either a finite or countable number of values. The values can be plotted on a number line with space between each point.

continuous random variable

Has infinitely many values. The values can be plotted on a line in an uninterrupted fashion.

P(A∪B) = P(A) + P(B)

IF Mutually Exclusive

Addition Rule 2

If A and B are NOT mutually exclusive, then P(A or B) = P(A) + P(B) - P(A and B)

General Multiplication Rule

If A and B are any two events, then P(A & B) = P(A) x P(B|A)

addition rule

If A and B are disjoint events, then P(A∪B)=P(A)+P(B)

Multiplication Rule, Upside down U

If A and B are independent events, then the probability of A and B is P(A and B) = P(A) x P(B).

Addition Rule "or"

If A and B are disjoint events, then the probability of A or B is P(A U B) = P(A) + P(B).

Addition Rule

If A and B are disjoint events: P(A or B)=P(A) + P(B)

addition rule

If A and B are disjoint events: P(A or B)=P(A) + P(B)

multiplication rule

If A and B are independent events, then the probability of A and B is P(A∩B)= P(A) X P(B)

addition rule for disjoint events

If E and F are disjoint events, then P(E or F) = P(E) +P(F)

Conditional Independence:

If P[A(giv)B] = P[A] or P[B(giv)A] = P[B]

The 5% Guideline for Cumbersome Calculations

If a sample size is no more than 5% of the size of the population, treat the selections as being independent (even if the selections are made without replacement, so they are technically dependent).

74.5

A statistics student receives a score of 85 on a statistics midterm. If the corresponding z-score equals 1.5 and the standard deviation equals 7, the average score on this exam is __________.
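
The answer comes from rearranging z = (x - mean) / sd into mean = x - z * sd. A minimal sketch; the function name `mean_from_z` is illustrative:

```python
def mean_from_z(x: float, z: float, sd: float) -> float:
    """Solve z = (x - mean) / sd for the mean."""
    return x - z * sd


print(mean_from_z(85, 1.5, 7))  # 85 - 10.5
```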

contingency table

A table that relates two categories of data; two-way table. Variables are placed in rows and columns; each intersection of variables is a cell in the table.

Experiment

A treatment is deliberately imposed on the individuals in order to observe a possible change in the response or variable being measured.

Dotplot

A type of graph where dots are used to represent data. The distribution of the dots can highlight similarities or differences in the data.

Pie Chart

A type of graph which uses a 2 dimensional circle. Sections of the circle are filled out to show differences or similarities in data.

Bar Graph

A type of graph which uses bars to show the differences or similarities in different sets of data.

Random Variable

A variable whose value is a numerical outcome of a random phenomenon.

multiplication rule of counting

If a task consists of a sequence of choices in which there are p selections for the first choice, q selections for the second choice, r selections for the third choice, etc., then the task of making these selections can be done in p∗q∗r∗⋅⋅⋅ ways

Law of Large Numbers

As a procedure is repeated again and again, the relative frequency probability of an event tends to approach the actual probability.

skew

Asymmetry in the distribution of the data values

Uniform Distribution

Basically flat or rectangular

P(B|A) = P(A|B) P(B) / P(A)

Bayes Rule Single Event

P(B|A) = P(A|B) P(B) / [P(A|B) P(B) + P(A|B⁻) P(B⁻)]

Bayes Rule Two Events

Area and Probability

Because the total area under the density curve is equal to 1, there is a correspondence between area and probability.

Class problem: employee bonuses are awarded at the end of the year. Thomas realizes it is possible for him to get a $5000 bonus, but it is unlikely. He is twice as likely to get a $2000 bonus, seven times as likely to get a $1000 bonus, and ten times as likely to get a $500 bonus. -construct the probability distribution for Thomas's bonus (first call the probability of getting a $5000 bonus p)

Bonus: 5000 2000 1000 500 probability: p 2p 7p 10p

Discrete Variable

Breaks between values -counted Example: # of books in a room

Discrete Distribution:

Can only take on values from a finite or countably infinite sequence.

Round-off Rule for Measures of Center

Carry one more decimal place than is present in the original set of values.

Binary variable

Categorical variable with 2 choices such as gender- male or female

The idea of probability

Chance behavior is unpredictable short-term, but has a regular and predictable pattern in the long run.

Variables

Characteristics of the individual to be measured/observed

exhaustive events

If one includes all the possible outcomes, then A and A' are exhaustive because one of them must happen.

Round-Off Rule for Determining Sample Size

If the computed sample size n is not a whole number, round the value of n up to the next larger whole number.

Interviewer Influence

Factors such as tone of voice, body language, dress, gender, authority, and ethnicity of the interviewer might influence responses.

Conditional probability

Find the probability of an event when we have additional information that some other event has already occurred.

Probability of "at least one"

Find the probability that among several trials, we get at least one of some specified event.

Finding the Median

First sort the values (arrange them in order), then follow one of these: 1. If the number of data values is odd, the median is the number located in the exact middle of the list. 2. If the number of data values is even, the median is found by computing the mean of the two middle numbers.
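
The two cases above can be sketched directly. The function name `median_of` is illustrative; Python's standard library also provides `statistics.median`, which gives the same result.

```python
def median_of(data):
    """Median by the sort-then-pick-the-middle procedure."""
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return values[mid]                      # odd count: exact middle value
    return (values[mid - 1] + values[mid]) / 2  # even count: mean of the two middle values


print(median_of([3, 1, 2]))
print(median_of([4, 1, 3, 2]))
```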

negative

For a normal distribution curve, the z value for an x value that is less than µ is always __________.

Round-Off Rule for Sample Size n

If the computed sample size n is not a whole number, round the value of n up to the next larger whole number.

theoretical probability

The mathematical calculation that an event will happen in theory

binomial probability distribution function

The probability of obtaining x successes in n independent trials of a binomial experiment is given by

Conditional Probability

The probability of some event given that some other event occurs.

Unbiased Estimator

The sample variance s² is an unbiased estimator of the population variance σ², which means values of s² tend to target the value of σ² instead of systematically tending to overestimate or underestimate σ².

Point Estimate of the Population Mean

The sample mean xbar is the best point estimate of the population mean µ.

sample proportion

The sample proportion is the best point estimate of the population proportion.

Sample Space S

The set of all possible outcomes of a random process.

Outcomes for a diagnostic test

There are four possible outcomes: - true positive - true negative - false positive - false negative

Subjective Probability

Uses a probability value based on an educated guess or estimate, employing opinions and inexact information.

Inferential Statistics

Using information from a sample to draw conclusions about the population

COV[X,X] =

VAR[X]

VAR[X+Y] =

VAR[X] + VAR[Y] + 2*COV[X,Y]

tree diagram

a display of conditional events or probabilities that is helpful in thinking through conditioning.

skewed

a distribution is this if it's not symmetric and one tail stretches out farther than the other

symmetric

a distribution is this if the two halves on either side of the center look approximately like mirror images of each other

sampling distribution

a distribution of statistics obtained by selecting all the possible samples of a specific size from a population

normal distribution

a family of symmetrical bell-shaped density curves.

scatterplots

a graphed cluster of dots, each of which represents the values of two variables. The slope of the points suggests the direction of the relationship between the two variables. The amount of scatter suggests the strength of the correlation.

block

a group of individuals sharing some common features that might affect the treatment

probability histogram

a histogram in which the horizontal axis corresponds to the value of the random variable and the vertical axis represents the probability of each value of the random variable

regression line

a line that describes how the response variable changes as the explanatory variable changes.

sampling frame

a list of individuals from which a sample is actually selected

probability model

a mathematical description of a random phenomenon consisting of a sample space and a way of assigning probabilities to events

Probability Model

a mathematical description of a random phenomenon consisting of a sample space and way of assigning probability

probability

a measure of how likely it is that some event will occur; science of uncertainty

z score

a measure of how many standard deviations you are away from the norm (average or mean)

probability

a measure of the likelihood of a random phenomenon or chance behavior

Probability

a measure of the likelihood of an event

Standard Deviation

a measure of variability that describes an average distance of every score from the mean

standard deviation

a measure of variability that describes an average distance of every score from the mean

Random variable

a variable (typically represented by x) that has a single numerical value, determined by chance, for each outcome of a procedure

time plot

a plot of a variable that shows each observation against the time at which it was measured

Fundamental Counting Rule

For a sequence of two events in which the first event can occur m ways and the second event can occur n ways, the events together can occur a total of m · n ways.

5-Number Summary

For a set of data, the 5-number summary consists of the minimum value; the first quartile Q1; the median (or second quartile Q2); the third quartile, Q3; and the maximum value.

-two events are disjoint if they cannot both occur

...

Mean

The average value from a set of observations.

Risk

The probability of getting an undesired outcome.

2) The conditions for a binomial distribution are satisfied.

...

fair die

a die where each possible outcome is equally likely

Complements: The Probability of "At Least One"

"At least one" is equivalent to "one or more."

Prevalence

# of diseased individuals / total # of individuals

Confidence Interval for Estimating a Population Standard Deviation or Variance

( [n - 1] * s^2 ) / X(r)^2 < σ^2 < ( [ n - 1 ] * s^2) / X(L)^2

Binomial Distribution: Mx(t) =

(1 - p + pe^t)^n

the pdf of Normal Distribution f(x) =

(1 / (σ * sqrt(2π))) * e^(-(x - μ)^2 / (2σ^2))

Geometric Distribution: E[X] =

(1-p) / p

Geometric Distribution: VAR[X] =

(1-p) / p^2

2. Use the regression equation for predictions only if the linear correlation coefficient r indicates that there is a linear correlation between the two variables (as described in Section 10-2).

...

Basic Probability: A =

(A & B) or (A & B')

DeMorgan's Laws:

(A or B)' = A' & B' ; (A & B)' = A' or B'

animated

(Adj.) Full of life, lively, alive; (part.) moved to action.

For Discrete Uniform Distribution of N points: E[X] =

(N + 1) / 2

For Discrete Uniform Distribution of N points: Var[X] =

(N^2 - 1) / 12

Bayes's Theorem: P[A(giv)B] =

(P[B(giv)A] * P[A]) / (P[B(giv)A] * P[A] + P[B(giv)A'] * P[A'])

Midquartile

(Quartile 3 + Quartile 1) / 2

Semi-interquartile Range

(Quartile 3 - Quartile 1) / 2

Basic Rules for Computing Probability (Rule 2) - Classical Approach to Probability

(Requires Equally Likely Outcomes)

Standard deviation (probability distribution)

([∑x^2 • P(x) ] - µ^2)^(1/2)

Continuous Uniform Distribution: E[X] & VAR[X] =

(a + b) / 2 & (b-a)^2 / 12

where E = t(α/2) * ( s(1)^2 / n(1) + s(2)^2 / n(2) )^(1/2)

...

dissuade

(v.) to persuade not to do something.

culminate

(v.) to reach a high point of development; to end, climax.

cater

(v.) to satisfy the needs of, try to make things easy and pleasant; to supply food and service.

Confidence Interval Estimate of μ(1) - μ(2): Independent Samples

(xbar1 - xbar2) - E < (µ1 - µ2) < (xbar1 - xbar2) + E

Mean Absolute Deviation

(∑|x-xbar|)/n

peevish

(adj.) cross, complaining, irritable; contrary

available

(adj.) Ready for use, at hand.

literate

(adj.) able to read and write; showing an excellent educational background; having knowledge or training.

Indispensable

(adj.) absolutely necessary, not to be neglected

transparent

(adj.) allowing light to pass through; easily recognized or understood; easily seen through or detected

Indignant

(adj.) filled with resentment or anger over something unjust, unworthy, or mean

miscellaneous

(adj.) mixed, of different kinds

unique

(adj.) one of a kind, unequaled; unusual; found only in a given class, place, or situation

mutual

(adj.) shared, felt, or shown equally by two or more

Customary

(adj.) usual, expected, routine.

upright

(adj.) vertical, straight; good, honest; (adv.) in a vertical position

unscathed

(adj.) wholly unharmed, not injured.

poised

(adj.part.) balanced, suspended; calm, controlled; ready for action

downright

(adv.) thoroughly; (adj.) absolute, complete; frank, blunt.

Skewed to the left

(also called negatively skewed) have a longer left tail, mean and median are to the left of the mode

Skewed to the right

(also called positively skewed) have a longer right tail, mean and median are to the right of the mode

Poisson Distribution: p(x) =

(λ parameter) (e^(-λ) * λ^x) / x!

Midrange formula

(maximum value + minimum value) / 2

Range

(maximum value) - (minimum value)

Negative Binomial Distribution p(x):

(r + x - 1 choose x) p^r q^x

Binomial Distribution: p(x) =

(n choose x)(p^x)(q^(n-x))
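
The binomial formula above can be evaluated directly with `math.comb` (available in Python 3.8 and later). A minimal sketch; the function name `binomial_pmf` is illustrative:

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = (n choose x) * p^x * q^(n - x), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, x) * p**x * q ** (n - x)


# Number of heads in 3 fair coin tosses: 1/8, 3/8, 3/8, 1/8.
for heads in range(4):
    print(heads, binomial_pmf(heads, 3, 0.5))
```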

brood

(n.) a family of young animals, especially birds; any group having the same nature or origin; (v.) to think over in a worried, unhappy way.

regime

(n.) a government in power; a form or system of rule or management; a period of rule.

Indifference

(n.) a lack of interest or concern

drone

(n.) a loafer, idler; a buzzing or humming sound; a remote-control device; a male bee. (v.) to make a buzzing sound; to speak in a dull tone of voice.

entrepreneur

(n.) a person who starts up and takes on the risk of a business.

firebrand

(n.) a piece of burning wood; a troublemaker; an extremely energetic or emotional person.

oration

(n.) a public speech for a formal occasion.

plague

(n.) an easily spread disease causing a large number of deaths; a widespread evil; (v.) to annoy or bother

ingredient

(n.) one of the materials in a mixture, recipe, or formula

homicide

(n.) the killing of one person by another

luster

(n.) the quality of giving off light, brightness, glitter, brilliance.

Negative Binomial Distribution Mx(t) =

(p / (1 - qe^t))^r

verify

(v) to establish the truth or accuracy of, confirm.

retard

(v.) To make slow, delay, hold back

lubricate

(v.) to apply oil or grease; to make smooth, slippery, or easier to use

seethe

(v.) to boil or foam; to be excited or disturbed.

singe

(v.) to burn slightly (n.) a burn at the ends or edges.

loom

(v.) to come into view; to appear in exaggerated form. (n.) a machine for weaving.

goad

(v.) to drive or urge on. (n.) something used to drive or urge on.

indulge

(v.) to give in to a wish or desire, give oneself up to.

yearn

(v.) to have a strong and earnest desire.

Class Problem: A card is drawn from a deck of 52 cards. -what is the probability that it is neither a diamond nor an ace? -What is the probability that it is either not a diamond or it is not an ace?

-13 cards are diamonds and 3 more are aces, that leaves 36 cards, so 36/52= .6923 -there is only one card that doesn't fit either category-the ace of diamonds, so 51/52= .9808
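
Both counts above can be confirmed by enumerating a standard 52-card deck. A minimal sketch; the rank and suit labels are arbitrary choices for illustration:

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

# Neither a diamond nor an ace: both conditions must fail.
neither = [card for card in deck if card[1] != "diamonds" and card[0] != "A"]
# Not a diamond OR not an ace: everything except the ace of diamonds.
not_both = [card for card in deck if card[1] != "diamonds" or card[0] != "A"]

p_neither = Fraction(len(neither), len(deck))
p_not_both = Fraction(len(not_both), len(deck))
print(len(neither), round(float(p_neither), 4))
print(len(not_both), round(float(p_not_both), 4))
```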

Random Phenomenon

-An activity whose outcome we can observe or measure but we do not know how it will turn out on any single trial

Events

-An event is some set of outcomes from the sample space. events are denoted by capital letters A,B,C....

Independent Events

-If the knowledge of one event having occurred does not change the probability that the other event occurs

Mutually Exclusive Event

-If they have no outcomes in common -One cannot happen with the other

Multiplication and Division

-Mean, Median, Mode, Range, and SD are affected

Addition and Subtraction

-Mean, Median, and Mode are affected -Cannot subtract SD; Only square, add, and square root -Range is NOT affected

Probability Distribution for a Discrete Random Variable

-Possible values of the discrete random variable together with their respective probabilities

Probability Distribution for a Random Variable

-Possible values of the random variable X together with the probabilities corresponding to those values

Continuous Random Variable

-Random variable that assumes values associated with one or more intervals on the number line

Discrete Random Variable

-Random variable with a countable number of outcomes

Law of Large Numbers

-States that the proportion of successes in the simulation should become, over time, close to the true proportion in population

Event

-Subset of the sample space

Events

-The event A and B is the set of outcomes that belong to both sets (their overlap)

Prosecutor's fallacy

-a man is on trial for a crime, and forensic evidence is found at the scene which implicates him. -a prosecutor has an expert witness testify that the probability of finding this forensic evidence is 1 in 20,000 if the person is innocent -by itself, this argument is misleading... -the defense counters that there are 1,000,000 people in this city and so there are 50 people who could have left this evidence. -thus there is still only a 1 in 50 chance that the defendant is the one that left this evidence -the prosecutor would have to make an argument that significantly narrows down this pool of 50 people, like additional evidence. -this is tantamount to someone winning the lottery, and the prosecutor accusing them of cheating because the odds of winning were so low.

Rules of Thumb (3 continued)

-if there are 23 people in a room, what is the probability that at least two of them have the same b-day? -P(at least 2) = 1 - P(all different) = 1 - (# ways 23 people can all have different bdays / # possible bday assignments for 23 people) = 1 - (365*364*363*...*343)/(365^23) = 1 - .4927 = .5073 = 50.73%
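
The product above is tedious by hand but short in code. A minimal sketch, assuming 365 equally likely birthdays and ignoring leap years; the function name `prob_shared_birthday` is illustrative:

```python
def prob_shared_birthday(people: int, days: int = 365) -> float:
    """P(at least two share a birthday) = 1 - P(all birthdays different)."""
    p_all_different = 1.0
    for i in range(people):
        # The (i + 1)-th person must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different


print(round(prob_shared_birthday(23), 4))
```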

4.3 random variables

-a random variable is a variable that assigns a number to each outcome of an experiment. This is not to be confused with an algebraic variable. -the probability distribution of a random variable is a listing of each possible value of the random variable together with that value's probability -X: X1, X2, X3... -P(X): P1,P2,P3... example: toss a coin 3 times. Let X=the number of heads -X:0,1,2,3 -P(X): 1/8,3/8,3/8,1/8

Better example

-woman visits her doc and gets tested for rare disease -doc indicates that the test is 99% accurate (false positive=1%) -woman tests positive, she concludes there is a 99% chance she has the disease -this is a rare disease, suppose the incidence in the population is 1 in 50,000. -if 50,000 people are tested, we would expect 500 to test positive even though only one person has the disease -thus, even after testing positive, she only has a 1 in 500 chance of having the disease
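
The counting argument above is Bayes' rule in disguise. A minimal sketch of the exact calculation, assuming the test also detects 99% of true cases (the card states only the 1% false-positive rate, so the sensitivity is an assumption); the function name `posterior_given_positive` is illustrative:

```python
def posterior_given_positive(prevalence: float,
                             sensitivity: float,
                             false_positive_rate: float) -> float:
    """P(disease | positive test) by Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = false_positive_rate * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Incidence of 1 in 50,000; sensitivity of 0.99 is assumed, not given.
p = posterior_given_positive(1 / 50_000, 0.99, 0.01)
print(round(p, 5))  # close to the card's rough figure of 1 in 500
```

The exact result is slightly below 1 in 500 because the single true case is added to the roughly 500 false positives in the denominator.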

-It is denoted A^c and may be thought of as the event "not A"

...

-The complement of an event A is the event that A doesn't happen

...

-The event A or B is both sets taken together.

...

-Two events are Independent if the probability of one occurring is not influenced by the other occurring

...

-disjoint events are sometimes called mutually exclusive, since the occurrence of one excludes the possibility of the other occurring

...

-do not confuse independence with disjointness. Independence cannot be illustrated on a Venn diagram.

...

-sum of probabilities = 1 > 20p=1 > p=0.05

...

-the Empty set, denoted ø is the set containing no elements at all

...

3) The conditions np >= 5 and nq >= 5 are both satisfied, so the binomial distribution of sample proportions can be approximated by a normal distribution with µ = np and σ = (npq)^(1/2) . Note: p is the assumed proportion not the sample proportion.

...

4. If the regression equation does not appear to be useful for making predictions, the best predicted value of a variable is its point estimate, which is its sample mean.

...

4.1 randomness

...

4.2 probability models

...

A special symbol (such as an asterisk) is used to identify outliers.

...

About 68% of all values fall within 1 standard deviation of the mean.

...

About 95% of all values fall within 2 standard deviations of the mean.

...

About 99.7% of all values fall within 3 standard deviations of the mean.

...

Compare standard deviations of two different data sets only if they use the same scale and units, and they have means that are approximately the same

...

Complete Regression Analysis 3. Use a histogram and/or normal quantile plot to confirm that the values of the residuals have a distribution that is approximately normal. 4. Consider any effects of a pattern over time.

...

Disadvantage - Is sensitive to every data value, one extreme value can affect it dramatically; is not a resistant measure of center

...

E(X)=(5000)(.05)+(2000)(.10)+(1000)(.35)+(500)(.5)=1050 V(X)=(5000-1050)^2(.05)+(2000-1050)^2(.10)+(1000-1050)^2(.35)+(500-1050)^2(.5)=780,125+90,250+875+151,250=1,022,500

...
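
The expected value and variance of the bonus distribution can be checked with exact fractions, which avoids rounding slips in the squared terms. A minimal sketch; the variable names are illustrative:

```python
from fractions import Fraction

bonuses = [5000, 2000, 1000, 500]
weights = [1, 2, 7, 10]  # p, 2p, 7p, 10p, so 20p = 1 and p = 1/20

total_weight = sum(weights)
probabilities = [Fraction(w, total_weight) for w in weights]

# E(X) = sum of x * P(x); V(X) = sum of (x - E(X))^2 * P(x).
expected = sum(x * p for x, p in zip(bonuses, probabilities))
variance = sum((x - expected) ** 2 * p for x, p in zip(bonuses, probabilities))

print(expected, variance)
```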

Example: P(A)=.2 P(B)= .6 P(A or B)= .68 are A and B independent? (first use addition rule) By the addition rule. P(A and B)=.12 and (.2)(.6)=.12, so A and B are independent.

...
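
The independence check in the example can be sketched as: recover P(A and B) from the addition rule, then compare it with P(A) x P(B). The function name `are_independent` and the tolerance are illustrative choices for handling floating-point rounding.

```python
def are_independent(p_a: float, p_b: float, p_a_or_b: float,
                    tol: float = 1e-9) -> bool:
    """True when P(A and B), found by the addition rule, equals P(A) * P(B)."""
    p_a_and_b = p_a + p_b - p_a_or_b  # rearranged addition rule
    return abs(p_a_and_b - p_a * p_b) <= tol


print(are_independent(0.2, 0.6, 0.68))
```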

For any event A, the probability of A is between 0 and 1 inclusive. That is, 0 <= P(A) <= 1

...

For many data sets, a value is unusual if it differs from the mean by more than two standard deviations

...

Has the same units of measurement as the original data

...

Hypothesis Test Statistic for Matched Pairs

...

If the two populations have radically different variances, then F will be a large number.

...

If |r| ≤ critical value from Table A-6, fail to reject H0 and conclude that there is not sufficient evidence to support the claim of a linear correlation.

...

Margin of Error: E = (upper confidence limit - lower confidence limit) / 2

...

Note that if A and B are independent events, P(B|A) is really the same as P(B).

...

P(A or B) = P(A) + P(B) - P(A and B)

...

P(A) = number of ways A can occur / number of different simple events

...

P(E) = 1 - P(E`) or P(E`) = 1 - P(E)

...

P(F) = 1 - p = q (q = probability of failure)

...

P(x) = nCx p^x(1-p)^(n-x) x = 0,1, 2, ⋅⋅⋅,n where p is the probability of success

...

Population variance: σ² - Square of the population standard deviation σ

...

Redeeming Features (1)very easy to compute (2)reinforces that there are several ways to define the center (3)Avoids confusion with median

...

Round only the final answer, not values in the middle of a calculation.

...

Sample variance: s² - Square of the sample standard deviation s

...

The complement of getting at least one item of a particular type is that you get no items of that type.

...

The probability of an event that is certain to occur is 1.

...

The probability of an impossible event is 0.

...

The solid horizontal line extends only as far as the minimum data value that is not an outlier and the maximum data value that is not an outlier.

...

The symbol β (beta) is used to represent the probability of a type II error.

...

The symbol 𝞪(alpha) is used to represent the probability of a type I error.

...

Use a nonparametric method or bootstrapping If Population is not normally distributed and n ≤ 30

...

Use t distribution if σ not known and normally distributed population or σ not known and n > 30

...

Values close together have a small standard deviation, but values with much more variation have a larger standard deviation

...

We consider rearrangements of distinct items to be different sequences.

...

We consider rearrangements of the same items to be different sequences. (The permutation of ABC is different from CBA and is counted separately.)

...

We select all of the n items (without replacement).

...

We select r of the n items (without replacement).

...

bonus: 5000 2000 1000 500 Probability: .05 .10 .35 .50

...

nCr = n! / [r!(n-r)!]

...

nPr = n!/(n-r)!

...
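
The permutation and combination formulas can be written out with factorials and compared against Python's built-in `math.perm` and `math.comb` (both available in Python 3.8 and later). A minimal sketch; the function names `n_p_r` and `n_c_r` are illustrative:

```python
from math import comb, factorial, perm


def n_p_r(n: int, r: int) -> int:
    """Permutations: ordered selections of r items from n distinct items."""
    return factorial(n) // factorial(n - r)


def n_c_r(n: int, r: int) -> int:
    """Combinations: unordered selections of r items from n distinct items."""
    return factorial(n) // (factorial(r) * factorial(n - r))


print(n_p_r(5, 2), perm(5, 2))
print(n_c_r(5, 2), comb(5, 2))
```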

Requirements for Testing Claims About a Population Mean (with σ Known)

1) The sample is a simple random sample. 2) The value of the population standard deviation σ is known. 3) Either or both of these conditions is satisfied: The population is normally distributed or n > 30.

Requirements for Testing Claims About a Population Mean (with σ Not Known)

1) The sample is a simple random sample. 2) The value of the population standard deviation σ is not known. 3) Either or both of these conditions is satisfied: The population is normally distributed or n > 30.

Requirements for Testing Claims About a Population Proportion p

1) The sample observations are a simple random sample.

Three Conditions for Bernoulli Trials

1) each trial has two possible outcomes; p = success, q = 1-p 2) trials are independent 3) probability of a success remains the same from trial to trial

Cumulative Distribution Function:

F(x) = P[X ≤ x] (Probability to the left of, and including, the point x)

CLASS PROBLEM: The probability of encountering heavy traffic on a Monday is 0.8, and the probability of encountering heavy traffic on a Tuesday is 0.6 1. someone claims the probability of heavy traffic occurring both days is .3, why is this impossible? 2. the person retracts their claim, but insists that Monday and Tuesday are independent of each other. What is the probability of encountering heavy traffic on Monday or Tuesday? 3. What is the probability of encountering heavy traffic at least one Tuesday in a Month? (successive Tuesdays are independent)

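1. By the addition rule, P(M or T)=0.8+0.6-0.3=1.1, which is greater than 1, so the claim is impossible. 2. P(M or T)= P(M)+P(T)-P(M and T)= 0.8+0.6-(0.8)(0.6)=0.92 3. P(at least 1)=1-P(none)=1-(0.4)^4=0.9744 (taking 4 Tuesdays in the month)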
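
The three parts of the traffic problem can be checked numerically. A minimal sketch, assuming 4 Tuesdays in the month as the card's answer does; the variable names are illustrative:

```python
p_mon, p_tue = 0.8, 0.6

# Part 1: P(M or T) cannot exceed 1, so P(M and T) >= P(M) + P(T) - 1.
min_p_both = max(0.0, p_mon + p_tue - 1.0)
claim_is_possible = 0.3 >= min_p_both

# Part 2: under independence, P(M and T) = P(M) * P(T).
p_either = p_mon + p_tue - p_mon * p_tue

# Part 3: at least one heavy-traffic Tuesday among 4 independent Tuesdays.
tuesdays = 4
p_at_least_one = 1.0 - (1.0 - p_tue) ** tuesdays

print(claim_is_possible, round(p_either, 2), round(p_at_least_one, 4))
```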

Complete Regression Analysis

1. Construct a scatterplot and verify that the pattern of the points is approximately a straight-line pattern without outliers. (If there are outliers, consider their effects by comparing results that include the outliers to results that exclude the outliers.) 2. Construct a residual plot and verify that there is no pattern (other than a straight-line pattern) and also verify that the residual plot does not become thicker (or thinner).

Helpful Hints

1. Don't confuse z scores and areas. z scores are points along the horizontal scale, but areas are regions under the normal curve. 2. Choose the correct (right/left) side of the graph. 3. A z score must be negative whenever it is located in the left half of the normal distribution. 4. Areas (or probabilities) are positive or zero values, but they are never negative.

Sample Mean

1. For all populations, the sample mean xbar is an unbiased estimator of the population mean μ, meaning that the distribution of sample means tends to center about the value of the population mean μ. 2. For many populations, the distribution of sample means xbar tends to be more consistent (with less variation) than the distributions of other sample statistics.

Practical Rules Commonly Used

1. For samples of size n larger than 30, the distribution of the sample means can be approximated reasonably well by a normal distribution. The approximation gets closer to a normal distribution as the sample size n becomes larger. 2. If the original population is normally distributed, then for any sample size n, the sample means will be normally distributed (not just the values of n larger than 30). Standard deviation of the sample mean, or standard error of the mean: σ(xbar) = σ / (n)^(1/2)

Binomial Probability Distribution

1.The procedure must have a fixed number of trials. 2. The trials must be independent. 3. Each trial must have all outcomes classified into two categories (commonly, success and failure). 4.The probability of success remains the same in all trials

Using the Regression Equation for Predictions

1.Use the regression equation for predictions only if the graph of the regression line on the scatterplot confirms that the regression line fits the points reasonably well.

Important Properties of the Student t Distribution

1. The Student t distribution is different for different sample sizes (for example, the curves for n = 3 and n = 12 differ). 2. The Student t distribution has the same general symmetric bell shape as the standard normal distribution, but it reflects the greater variability (with wider distributions) that is expected with small samples. 3. The Student t distribution has a mean of t = 0 (just as the standard normal distribution has a mean of z = 0). 4. The standard deviation of the Student t distribution varies with the sample size and is greater than 1 (unlike the standard normal distribution, which has σ = 1). 5. As the sample size n gets larger, the Student t distribution gets closer to the normal distribution.

Properties of the Distribution of the Chi-Square Statistic

1. The chi-square distribution is not symmetric, unlike the normal and Student t distributions. As the number of degrees of freedom increases, the distribution becomes more symmetric. 2. The values of chi-square can be zero or positive, but they cannot be negative. 3. The chi-square distribution is different for each number of degrees of freedom, which is df = n - 1. As the number of degrees of freedom increases, the chi-square distribution approaches a normal distribution. In Table A-4, each critical value of X^2 corresponds to an area given in the top row of the table, and that area represents the cumulative area located to the right of the critical value.

criteria for a binomial probability experiment

1. The experiment is performed a fixed number of times. Each repetition is called a trial. 2. The trials are independent. The outcome of one trial will not affect the outcome of the other trials. 3. For each trial, there are two mutually exclusive (disjoint) outcomes: success or failure. 4. The probability of success is the same for each trial of the experiment.

Rules of probability

1. The probability of any event must be between 0 and 1, inclusive. 0 ≤ P(E) ≤ 1. 2. The sum of the probabilities of all outcomes must equal 1. 3. If E and F are disjoint events, then P(E or F) = P(E) + P(F). If E and F are not disjoint events, then P(E or F) = P(E) + P(F) - P(E and F) 4. If E represents any event and Ec represents the complement of E, then P(Ec) = 1 - P(E) 5. If E and F are independent events, then P(E and F) = P(E)∗P(F)

binomial probability distribution

1. The procedure has a fixed number of trials 2. The trials must be independent. (The outcome of any individual trial doesn't affect the probabilities in the other trials.) 3. Each trial must have all outcomes classified into two categories (commonly referred to as success and failure). 4. The probability of a success remains the same in all trials.

Confidence Interval for Estimating a Population Mean (with σ Known)

1. The sample is a simple random sample. (All samples of the same size have an equal chance of being selected.) 2. The value of the population standard deviation σ is known. 3. Either or both of these conditions is satisfied: The population is normally distributed or n > 30.

Hypothesis Test for Correlation Requirements

1. The sample of paired (x, y) data is a simple random sample of quantitative data. 2. Visual examination of the scatterplot must confirm that the points approximate a straight-line pattern. 3. The outliers must be removed if they are known to be errors. The effects of any other outliers should be considered by calculating r with and without the outliers included.

Comparing Variation in Two Samples Requirements

1. The two populations are independent. 2. The two samples are simple random samples. 3. The two populations are each normally distributed. (This requirement holds regardless of sample size; having n > 30 does not relax it.)

notation used in binomial probability distribution

1. There are n independent trials of the experiment. 2. p denotes the probability of success for each trial, so that 1 - p is the probability of failure for each trial. 3. X denotes the number of successes in n independent trials of the experiment, so 0 ≤ x ≤ n.
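
A minimal sketch of this notation in use. The card does not state the formula itself; the code assumes the standard binomial formula P(x) = C(n, x) * p^x * (1 - p)^(n - x), and the function name and the n = 4, p = 0.5 example are illustrative choices.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for n independent trials with success probability p."""
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Hypothetical example: number of heads in n = 4 fair coin tosses.
probs = [binomial_pmf(x, 4, 0.5) for x in range(5)]
total = sum(probs)  # a legitimate distribution sums to 1
```

The list covers x = 0 through x = n, which is why the range of x runs up to n rather than 1.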

Procedure for Constructing a Confidence Interval for σ or σ^2

1. Verify that the required assumptions are satisfied. 2. Using n - 1 degrees of freedom, refer to Table A-4 or use technology to find the critical values X(r)^2 and X(L)^2 that correspond to the desired confidence level. 3. Evaluate the upper and lower confidence interval limits using this format of the confidence interval: ( [n - 1] * s^2 ) / X(r)^2 < σ^2 < ( [n - 1] * s^2 ) / X(L)^2 4. If a confidence interval estimate of σ is desired, take the square root of the upper and lower confidence interval limits and change σ^2 to σ. 5. Round the resulting confidence interval limits. If using the original set of data to construct a confidence interval, round the confidence interval limits to one more decimal place than is used for the original set of data. If using the sample standard deviation or variance, round the confidence interval limits to the same number of decimal places.

Procedure for Constructing a Confidence Interval for µ (with Known σ)

1. Verify that the requirements are satisfied. 2. Refer to Table A-2 or use technology to find the critical value z(α/2) that corresponds to the desired confidence level 3. Evaluate the margin of error E = z(α/2) * ( σ / (n)^(1/2) ) 4. Find the values of xbar - E and xbar + E. Substitute those values in the general format of the confidence interval 5. Round using the confidence intervals round-off rules.
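
The steps above can be sketched in code, using technology in place of Table A-2. This is an illustration only: the function name and the sample values (xbar = 100, σ = 15, n = 36) are assumptions, not data from the course.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar: float, sigma: float, n: int, confidence: float = 0.95):
    """Confidence interval for mu when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z(alpha/2)
    E = z * sigma / sqrt(n)                  # margin of error
    return xbar - E, xbar + E

# Hypothetical sample: xbar = 100, sigma = 15, n = 36, 95% confidence.
low, high = z_interval(xbar=100, sigma=15, n=36)
```

Rounding is left to the reader so that the round-off rule in step 5 can be applied separately.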

Procedure for Constructing a Confidence Interval for µ (With σ Unknown)

1. Verify that the requirements are satisfied. 2. Using n - 1 degrees of freedom, refer to Table A-3 or use technology to find the critical value t(α/2) that corresponds to the desired confidence level. 3. Evaluate the margin of error E = t(α/2) * ( s / (n)^(1/2) ). 4. Find the values of xbar - E and xbar + E. Substitute those values in the general format for the confidence interval: xbar - E < μ < xbar + E 5. Round the resulting confidence interval limits.

Reasons for Sampling 8

1. lower cost 2. less time 3. provides relevant information 4. population might be destroyed 5. population size might be infinite 6. population might not be available 7. Risk factor 8. Avoid administrative problems

Procedure for Constructing a Confidence Interval for p

1. Verify that the required assumptions are satisfied. (The sample is a simple random sample, the conditions for the binomial distribution are satisfied, and the normal distribution can be used to approximate the distribution of sample proportions because np >= 5 and nq >= 5 are both satisfied.) 2. Refer to Table A-2 and find the critical value z(α/2) that corresponds to the desired confidence level. 3. Evaluate the margin of error E = z(α/2) * ( [p̂ * q̂] / n )^(1/2). 4. Using the value of the calculated margin of error, E, and the value of the sample proportion, p̂, find the values of p̂ - E and p̂ + E. Substitute those values in the general format for the confidence interval: p̂ - E < p < p̂ + E 5. Round the resulting confidence interval limits to three significant digits.
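
A rough sketch of this procedure, assuming the margin of error E = z(α/2) * ( [p̂ * q̂] / n )^(1/2) from the "Margin of Error for Proportions" card. The function name and the sample (60 successes in 100 trials) are hypothetical, and the np >= 5 and nq >= 5 check is applied with p̂ in place of the unknown p.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x: int, n: int, confidence: float = 0.95):
    """Confidence interval for a population proportion p."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # Requirement check, using the sample proportion as a stand-in for p.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("normal approximation requirement not met")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z(alpha/2)
    E = z * sqrt(p_hat * q_hat / n)                     # margin of error
    return p_hat - E, p_hat + E

# Hypothetical sample: 60 successes out of 100.
low, high = proportion_interval(x=60, n=100)
```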

Exponential Distribution: E[X] & VAR[X] =

1/λ & 1/(λ^2)

told: exponential distribution with mean of 3 (λ =)

1/λ = 3, so λ = 1/3 (in general, λ = 1/θ, where θ is the mean)

Rules of Thumb (2)

2) "or" means add when the events are disjoint -roll two dice. What is the probability that the sum of the faces is 5 or 11? -since the sum cannot be 5 and 11 at the same time, these are disjoint outcomes, so we add: P(sum=5)+P(sum=11)=4/36+2/36=1/6
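
The dice example can be checked by listing all 36 equally likely outcomes. This is a small illustrative sketch; the helper name `prob_sum` is made up for the example.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
rolls = list(product(range(1, 7), repeat=2))

def prob_sum(target: int) -> Fraction:
    """Probability that the two faces add up to target."""
    favorable = sum(1 for a, b in rolls if a + b == target)
    return Fraction(favorable, len(rolls))

p5 = prob_sum(5)
p11 = prob_sum(11)
# The events are disjoint, so "or" means add.
p_either = p5 + p11
```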

Rules of Thumb (3)

3) for any probability question, first decide whether it is easier to calculate it directly, or easier to calculate the opposite and subtract from 1. -a coin is tossed 7 times, what is the probability of tails occurring at least once? -easier to answer the opposite: P(tails at least once)=1-P(no tails) -"no tails" means "heads first AND heads second AND..." P(no tails)=.5 x .5 x .5 x .5 x .5 x .5 x .5= .0078 P(at least once)=1-P(no tails)=1-0.0078=.9922
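
As a check on the coin example, the sketch below computes the answer both ways: directly, by counting every sequence of 7 tosses that contains a tail, and through the complement. The variable names are illustrative.

```python
from itertools import product

# Direct route: enumerate all 2^7 = 128 equally likely toss sequences.
tosses = list(product("HT", repeat=7))
direct = sum(1 for t in tosses if "T" in t) / len(tosses)

# Complement route: 1 - P(no tails) = 1 - P(heads on all 7 tosses).
p_no_tails = 0.5 ** 7
via_complement = 1 - p_no_tails
```

Both routes agree, but the complement needs one multiplication chain instead of 128 cases.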

Using the Regression Equation for Predictions cont

3. Use the regression line for predictions only if the data do not go much beyond the scope of the available sample data. (Predicting too far beyond the scope of the available sample data is called extrapolation, and it could result in bad predictions.)

10 - 90 Percentile Range

90th percentile - 10th percentile

Basic Probability: A - B =

A & B'

Ω (Sample Space)

A collection of all outcomes of an experiment, e.g., {Heads, Tails} for a coin toss

Events

A set of outcomes that is a subset of the sample space, e.g., gender of a person, card is black

Type I Error

A Type I error is the mistake of rejecting the null hypothesis when it is actually true.

Type II Error

A Type II error is the mistake of failing to reject the null hypothesis when it is actually false.

z-score

A __________ is the distance between a selected value (x) and the population mean (µ) divided by the population standard deviation (σ).

Example: S = {0,1,2,3,4,5,6,7,8} A = {2,3,6,7} B = {0,3,6,8}

A and B = {3,6}; A or B = {0,2,3,6,7,8}; A^c and B = {0,8}; A^c or B^c = {0,1,2,4,5,7,8}; (A and B)^c = {0,1,2,4,5,7,8}; (A or B)^c = {1,4,5}; A and A^c = ø (the empty set)
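
These answers can be reproduced with Python's built-in set type, where `&` is "and", `|` is "or", and subtracting from S gives the complement. A short sketch:

```python
S = set(range(9))   # sample space {0, 1, ..., 8}
A = {2, 3, 6, 7}
B = {0, 3, 6, 8}

A_c = S - A         # complement of A
B_c = S - B         # complement of B

a_and_b = A & B
a_or_b = A | B
ac_and_b = A_c & B
ac_or_bc = A_c | B_c
not_a_and_b = S - (A & B)
not_a_or_b = S - (A | B)
a_and_ac = A & A_c  # always the empty set
```

Note that A^c or B^c and (A and B)^c come out identical, as the example lists.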

Histogram

A bar graph that shows the frequency of data within equal intervals.

Boxplot skeletal (or regular)

A boxplot (or box-and-whisker-diagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3.

causation

A cause and effect relationship in which one variable controls the changes in another variable.

Factorial Rule

A collection of n different items can be arranged in order n! different ways. (This factorial rule reflects the fact that the first item may be selected in n different ways, the second item may be selected in n - 1 ways, and so on.)

Event

A collection of outcomes. Usually, we identify events in order to attach probabilities to them. We denote events with bold capital letters such as A, B, or C.

left

A computed z for x values to the __________ of the mean is negative.

degrees of freedom

A concept used in tests of statistical significance; the number of observations that are free to vary to produce a known outcome.

Uniform Distribution

A continuous random variable has a uniform distribution if its values are spread evenly over the range of possibilities. The graph of a uniform distribution results in a rectangular shape.

correlation

A correlation exists between two variables when the values of one are associated with the values of the other in some way.

critical value

A critical value is any value that separates the critical region (where we reject the null hypothesis) from the values of the test statistic that do not lead to rejection of the null hypothesis. The critical values depend on the nature of the null hypothesis, the sampling distribution that applies, and the significance level α.

Critical Value

A critical value is the number on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur.

Density Curve

A curve that is on or above the horizontal axis and has an area of exactly 1 underneath it. It describes the overall pattern of a distribution. Merely a model - no set of real data is exactly described by a density curve.

Density Curve

A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve must have a vertical height that is 0 or greater. (That is, the curve cannot fall below the x-axis.)

Trial

A single attempt or realization of a random phenomenon.

Venn Diagram

A diagram showing a sample space S and events as areas within S. Overlaps indicate non-disjoint events.

Venn diagram

A diagram that uses circles contained within a rectangle to display elements of different sets. The rectangle represents the sample space, and circles represent events.

Tree Diagram

A display of conditional events or probabilities that is helpful in thinking through conditioning.

J-Shaped Distribution

A few data values on the left, with frequencies that increase as one moves to the right

Probability

A finite measure with any value from 0 to 1

normal distribution

A function that represents the distribution of variables as a symmetrical bell-shaped graph.

Histogram

A graph that displays the data by using adjacent vertical bars of various heights to represent the frequencies of the classes

Pictogram

A graph that uses pictures as representation instead of actual figures or dots.

Sampling Frame

A list of individuals from which a sample is actually selected.

five- number summary

A list of numbers that lists the minimum, first quartile, median, third-quartile, and the maximum of a data set.

lurking variable

A lurking variable is a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables.

matched pairs design

A matched pairs design is a special case of the randomized block design. It is used when the experiment has only two treatment conditions; and subjects can be grouped into pairs, based on some blocking variable. Then, within each pair, subjects are randomly assigned to different treatments.

Probability Model

A mathematical description of a random phenomenon consisting of two parts: a sample space S, and a way of assigning probabilities to events.

Modified Boxplot Construction

A modified boxplot is constructed with these specifications:

less than

A negative z-score indicates that the corresponding value in the original distribution is __________ the mean.

standard normal curve

A normal distribution with mean of zero and standard deviation of one. Probabilities are given in Table A for values of the standard Normal variable.

Trend

A noted tendency of change on a graph or set of data

Personal/Subjective Probability

A number between 0 and 1 that expresses an individual's judgement of how likely the outcome is.

random variable

A numerical measure of the outcome of a probability experiment, so its value is determined by chance. Random variables are typically denoted using capital letters such as X.

Parameter

A numerical measure that describes an aspect of a population

Statistic

A numerical measure that describes an aspect of a sample

statistic

A numerical measurement describing some characteristic of a sample

Random Phenomenon

A phenomenon is random if we know what outcomes could happen, but not which particular values did or will happen.

Boxplot

A plot of data based on the five number summary. A line is drawn from the minimum observation to Q1; a box is drawn from Q1 to Q3 with a vertical line at the median and a line is drawn from Q3 to the maximum observation.

Empirical Probability

A probability calculated from our knowledge of numerous similar past events.

Theoretical Probability

A probability calculated from understanding the phenomenon in the problem.

Continuous Probability Model

A probability model that assigns probabilities as areas under a density curve; the probability of any event is the area under the curve and above the values on the horizontal axis that make up the event.

Discrete/Categorical Probability Model

A probability model with a sample space made up of a finite list of individual outcomes.

Conditional Probability

A probability that takes into account a given condition.

discrete random variables

A random variable that assumes countable values

random sample

A sample in which every member of the population has an equal chance of being selected

Finite Sample Space

A sample space dealing with either discrete or categorical variables that can take on only certain values.

Statistics

A science that deals with methods of collecting, organizing, and summarizing data in such a way that valid conclusions can be drawn from them

Exhaustive

AT LEAST ONE WILL OCCUR for sure

Class Problem: Out of 125 students surveyed, 12 were accounting majors, 24 were business majors, and 34 were either an accounting major or business major (or both). Draw and label a Venn Diagram

Acc 10 Both 2 Bus 22

Identifying Unusual Results Range Rule of Thumb

According to the range rule of thumb, most values should lie within 2 standard deviations of the mean. We can therefore identify "unusual" values by determining if they lie outside these limits: Maximum usual value = μ + 2σ Minimum usual value = μ - 2σ
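
A small sketch of the range rule of thumb. The function names and the values μ = 100, σ = 15 are hypothetical, chosen only to show the limits in use.

```python
def usual_limits(mu: float, sigma: float):
    """Minimum and maximum usual values: mu - 2*sigma and mu + 2*sigma."""
    return mu - 2 * sigma, mu + 2 * sigma

def is_unusual(x: float, mu: float, sigma: float) -> bool:
    """True when x lies outside the usual limits."""
    low, high = usual_limits(mu, sigma)
    return x < low or x > high

# Hypothetical population with mu = 100 and sigma = 15.
limits = usual_limits(100, 15)
flag_135 = is_unusual(135, 100, 15)  # above the maximum usual value
flag_120 = is_unusual(120, 100, 15)  # inside the limits
```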

Description of mean

Advantages - Is relatively reliable, means of samples drawn from the same population don't vary as much as other measures of center. Takes every data value into account

The outcome of a single individual outcome for a continuous probability model

All continuous probability models assign a probability of 0 to any individual outcome; only intervals of values can have positive probability.

informed consent

All individuals who are subjects in a study must give this before data is collected

association

Although there may be a strong ________________ between variables this does not necessarily imply there is causation.

causation

Although there may be a strong association between variables this does not necessarily imply there is this

Probability Limits

Always express a probability as a fraction or decimal number between 0 and 1.

Legitimate probability assignment

An assignment of probabilities to outcomes is legitimate if... a) each probability is between 0 and 1 (inclusive). b) the sum of the probabilities is 1.

Outlier

An individual observation that falls outside the overall pattern of the graph.

Event

An outcome or set of outcomes of a random phenomenon; a subset of the sample space.

Important Principles of Outliers

An outlier can have a dramatic effect on the mean. An outlier can have a dramatic effect on the standard deviation. An outlier can have a dramatic effect on the scale of the histogram so that the true nature of the distribution is totally obscured.

Outliers

An outlier is a value that lies very far away from the vast majority of the other values in a data set.

Variable

Any characteristic of a person or thing that can be assigned a number or category

0 ≤ P ≤ 1

Any probability is a number between 0 and 1.

Cr(n, k) = (k+n-1)! / (k!(n-1)!)

Combination With Replacement

C(n, k) = (n k) = n! / (k!(n-k)!)

Combination Without Replacement
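
Both counting formulas can be checked against a brute-force count. This is a sketch: the function names and the choice n = 5, k = 3 are illustrative.

```python
from itertools import combinations, combinations_with_replacement
from math import factorial

def c_without_replacement(n: int, k: int) -> int:
    """C(n, k) = n! / (k!(n-k)!)"""
    return factorial(n) // (factorial(k) * factorial(n - k))

def c_with_replacement(n: int, k: int) -> int:
    """Cr(n, k) = (k+n-1)! / (k!(n-1)!)"""
    return factorial(k + n - 1) // (factorial(k) * factorial(n - 1))

n, k = 5, 3
without = c_without_replacement(n, k)
with_repl = c_with_replacement(n, k)

# Brute-force counts over the items 0..n-1 for comparison.
listed_without = len(list(combinations(range(n), k)))
listed_with = len(list(combinations_with_replacement(range(n), k)))
```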

Exhaustive Outcomes:

Combine to the entire probability space. Or, one of the outcomes must occur whenever the experiment is performed.

P(Ã) = 1-P(A)

Complement

Survival Function:

Complement of the Cumulative Distribution Function

Central Limit Theorem - continued

Conclusions: 1. The distribution of sample means xbar will, as the sample size increases, approach a normal distribution. 2. The mean of the sample means is the population mean µ. 3. The standard deviation of all sample means is σ / (n)^(1/2)
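
Conclusions 2 and 3 can be illustrated with a quick simulation. This is only a sketch under assumed settings: the population is uniform on [0, 1) (so µ = 0.5 and σ = (1/12)^(1/2)), each sample has n = 30 values, and 2000 samples are drawn with a fixed seed.

```python
import random
import statistics
from math import sqrt

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # size of each sample (assumed)
num_samples = 2000       # number of samples drawn (assumed)

# Population: uniform on [0, 1), which is not normal.
mu = 0.5
sigma = sqrt(1 / 12)

sample_means = [
    statistics.fmean([rng.random() for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)  # should be near mu
sd_of_means = statistics.stdev(sample_means)    # should be near sigma / sqrt(n)
predicted_sd = sigma / sqrt(n)
```

A histogram of `sample_means` would also look roughly bell shaped, which is conclusion 1.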

P(A|B) = P(A∩B) / P(B)

Conditional Probability

P(B|A) = P(A and B)/P(A)

Conditional probability

Confidence Intervals for Comparing Data Caution

Confidence intervals can be used informally to compare the variation in different data sets, but the overlapping of confidence intervals should not be used for making formal and final conclusions about equality of variances or standard deviations.

Inferential Statistics

Consists of methods for drawing and measuring the reliability of conclusions about a population based on info obtained from a sample of the population

Descriptive Statistics

Consists of methods for organizing and summarizing info

takes all values in an interval of numbers, described by a density curve.

Continuous random variable

Poisson distribution used as a model for:

Counting the number of events of a certain type that occur in a certain period of time

Observational Study compared to a Designed Experiment

Designed experiment - treatments are imposed and experiment is controlled Observational study - experiment is only observed, no treatments imposed

Tree Diagrams

Diagrams that will show P(A) as independent branches, then P(B|A) as branches coming off those branches, etc. until a final event is reached. The probability of any one event occurring can be calculated by multiplying the probabilities of each branch along the way.

Sampling Error

Difference between measurement from a sample and the population because the sample does not perfectly represent the population

Distinguishable

Different order yields a different outcome

random variable with either a finite (whole) number value or a countable number

Discrete random variable

A∩B = Ø (for every pair of events, e.g., A∩B = A∩C = B∩C = Ø)

Disjoint or Mutually Exclusive Events

The Variance is a measure of the:

Dispersion of X about the mean. Will always be > or equal to 0

Bar graph

Displays the distribution of a categorical variable

Bimodal

Distribution has two peaks of about the same height

Unimodal

Distribution with one peak

Stratified Sampling

Divide the entire population into distinct subgroups called strata. The strata are based on a specific characteristic such as age, income, education level, and so on. All members of a stratum share the specific characteristic. Draw random samples from each stratum.

Cluster Sampling

Divide the entire population into pre-existing segments or clusters. The clusters are often geographic. Make a random selection of clusters. Include every member of each selected cluster in the sample.

Descriptive Statistics

Don't draw inferences; present data in a table format

complement

E (event) does not occur

Margin of Error for Proportions

E = z(α/2) * ( [p̂ * q̂] / n ) ^(1/2)

Confidence Interval Estimate of p(1) - p(2)

E = z(α/2) * ( [p̂(1) * q̂(1) / n(1)] + [p̂(2) * q̂(2) / n(2)] )^(1/2)

expected value formula

E = ∑ [x • P(x)]

Properties of Mean and Variance

E(c) = c; V(c) = 0; E(X ± Y) = E(X) ± E(Y); E(cX) = cE(X); V(cX) = c^2 V(X). If X and Y are independent: V(X ± Y) = V(X) + V(Y).

COV[X,Y] =

E[XY] - E[X] * E[Y]

M'x(0) =

E[X]

If X & Y are independent, then E[X*Y] =

E[X] * E[Y]

E[X + Y] =

E[X] + E[Y]

M''x(0) =

E[X^2]

VAR[X] =

E[X^2] - (E[X])^2

Mx(t) =

E[e^(tx)]

Class

Each raw data value is placed into a quantitative or qualitative category

Uniform Probability:

Each sample point has the same probability of occurring

With Replacement

Each draw has the same probabilities as the initial set, because the selected item is returned before the next draw

Roundoff error

Errors in rounding off a decimal. These add up over time.

Disjoint or Mutually Exclusive

Events A and B are disjoint (or mutually exclusive) if they cannot occur at the same time. (That is, disjoint events do not overlap.)

mutually exclusive

Events that cannot occur at the same time.

INDEPENDENCE: two events A and B are independent if P(A and B)=P(A)*P(B)

Example: P(A) = .3, P(B) = .5, P(A and B) = .10. P(A)*P(B) = .15, which does not equal .10, so A and B are not independent.
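
The independence check in this example can be written as a short function. A sketch only: the function name is made up, and `math.isclose` is used because decimal probabilities are stored as floating-point values.

```python
from math import isclose

def are_independent(p_a: float, p_b: float, p_a_and_b: float) -> bool:
    """A and B are independent when P(A and B) equals P(A) * P(B)."""
    return isclose(p_a * p_b, p_a_and_b)

# The card's numbers: P(A) = .3, P(B) = .5, P(A and B) = .10.
card_example = are_independent(0.3, 0.5, 0.10)

# Contrast case: two fair coin tosses, P(both heads) = .25.
coin_example = are_independent(0.5, 0.5, 0.25)
```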

A∪B∪C = Ω

Exhaustive

Negative Binomial Distribution:

Experiment performed repeatedly until the r-th success (X # of failures)

using a normal distribution as an approximation to the binomial probability distribution.

If the conditions of np ≥ 5 and nq ≥ 5 are both satisfied, then probabilities from a binomial probability distribution can be approximated well by using a normal distribution with mean μ = np and standard deviation σ = (n * p * q) ^ (1/2)
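
A sketch comparing an exact binomial probability with its normal approximation. The values n = 100 and p = 0.5 are hypothetical, and the half-unit continuity correction (55.5 instead of 55) is a standard refinement that this card does not mention.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5
q = 1 - p

# Requirement check from the card: np >= 5 and nq >= 5.
requirements_met = n * p >= 5 and n * q >= 5

mu = n * p
sigma = sqrt(n * p * q)

# Exact binomial probability P(X <= 55).
exact = sum(comb(n, x) * p**x * q**(n - x) for x in range(56))

# Normal approximation with continuity correction: P(X <= 55.5).
approx = NormalDist(mu, sigma).cdf(55.5)
```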

Student t Distribution

If the distribution of a population is essentially normal, then the distribution of t = (xbar - μ) / [ s / n^(1/2) ] is a Student t distribution for all samples of size n.

independent events

If the occurrence of one event has no effect on the occurrence of the other

Properties of the F Distribution - continued

If the two populations do have equal variances, then F = s(1)^2 / s(2)^2 will be close to 1 because s(1)^2 and s(2)^2 are close in value.

Disjoint ≠ Independent

If two events are disjoint, then the occurrence of one means the other cannot occur. If events are independent, then the occurrence (or non-occurrence) of one does not change the probability of the other.

standard normal distribution

If we convert values of a normal distribution to a distribution that has a mean of 0 and a standard deviation of 1, this probability distribution is called __________.

Multiplication Principle

If you can do one task in n1 ways and a second task in n2 ways, then both tasks can be done in n1*n2 ways

Hypothesis Test for Correlation Conclusion

If |r| > critical value from Table A-6, reject H0 and conclude that there is sufficient evidence to support the claim of a linear correlation.

Rare Event Rule for Inferential Statistics

If, under a given assumption (such as the assumption that a coin is fair), the probability of a particular observed event (such as 992 heads in 1000 tosses of a coin) is extremely small, we conclude that the assumption is probably not correct.

Rare Event Rule for Inferential Statistics

If, under a given assumption, the probability of a particular observed event is extremely small, we conclude that the assumption is probably not correct.

Examples:

In a game, a die is thrown. Alan pays Sally $1 if the die falls 1, 2, or 3, and $3 if the die falls 4 or 5. If the die falls 6, Sally has to pay Alan $8. What is the expected value and standard deviation of the amount Sally wins? Winnings X: 1, 3, -8, with P(X): 0.5, 0.333, 0.1667. E(X) = (1)(0.5) + (3)(0.333) + (-8)(0.1667) ≈ $0.1667 with the exact probabilities 1/2, 1/3, 1/6 (the rounded probabilities give $0.1654; the gap is roundoff error). V(X) = (1 - 0.1667)^2(0.5) + (3 - 0.1667)^2(0.333) + (-8 - 0.1667)^2(0.1667) ≈ 14.14. Standard deviation = (14.14)^(1/2) ≈ 3.76.
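
The expected value and standard deviation for this game can be checked with exact fractions, which avoids the roundoff difference noted in the example. A minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Sally's winnings and their exact probabilities.
winnings = [1, 3, -8]
probs = [Fraction(3, 6), Fraction(2, 6), Fraction(1, 6)]

# E(X) = sum of x * P(x)
mean = sum(x * p for x, p in zip(winnings, probs))

# V(X) = sum of (x - E(X))^2 * P(x)
variance = sum((x - mean) ** 2 * p for x, p in zip(winnings, probs))

std_dev = sqrt(variance)
```

With exact probabilities the mean is 1/6, about $0.1667, which matches the slide's value rather than the hand calculation with rounded probabilities.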

equal

In a normal distribution, the relationship between the mean, median and mode is __________.

Chi-Square Distribution

In a normally distributed population with variance σ^2, assume that we randomly select independent samples of size n and, for each sample, compute the sample variance s^2 (which is the square of the sample standard deviation s). The sample statistic χ^2 = (n - 1) * s^2 / σ^2 (pronounced chi-square) has a sampling distribution called the chi-square distribution.

The Addition Rule: P(A or B)=P(A) + P(B)-P(A and B) Overlap counted twice, subtract out once.

In an office building of 80 people, 28 work on Saturday, 11 work on Sunday, and 3 people work on both Sunday and Saturday. What is the probability that a person in this office works at least one of these days? P(Sat or Sun)= P(Sat) + P(Sun) - P(Both) = 28/80+11/80-3/80=.45
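
The office example, worked as a short sketch of the addition rule with exact fractions (variable names are illustrative):

```python
from fractions import Fraction

total = 80
p_sat = Fraction(28, total)
p_sun = Fraction(11, total)
p_both = Fraction(3, total)

# The overlap is counted in both P(Sat) and P(Sun), so subtract it once.
p_sat_or_sun = p_sat + p_sun - p_both
as_decimal = float(p_sat_or_sun)
```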

P(A|B) = P(A)

Independent

P(A∩B) = P(A) • P(B)

Independent Events

if P(B|A) = P(B)

Independent events

influential observations

Individual points that substantially change the regression line. Often outliers in the x direction, but they need not have large residuals.

Nonresponse

Individuals either cannot be contacted or refuse to participate. Nonresponse can result in significant undercoverage of a population.

Voluntary Response

Individuals with strong feelings about a subject are more likely than others to respond. Such a study is interesting but not reflective of the population.

voluntary response

Individuals with strong feelings about a subject are more likely than others to respond. Such a study is interesting but not reflective of the population.

P(A∩B) = P(A|B) • P(B)

Intersection (Not necessarily independent)

Description of Range

It is very sensitive to extreme values; therefore not as useful as other measures of variation.

Comparing Variation in Different Samples

It's a good practice to compare two sample standard deviations only when the sample means are approximately the same. When comparing variation in samples with very different means, it is better to use the coefficient of variation, which is defined later in this section.

Converting from the kth Percentile to the Corresponding Data Value

L = (k/100) * n

Range

Largest - smallest (Max - Min)

rules for a discrete probability distribution

Let P(x) denote the probability that the random variable X equals x; 1. then ∑P(x) = 1 2. 0 ≤ P(x) ≤ 1

double-blind

Neither the test subject, nor the administrator knows the treatment given.

equation for computing probability using the classical method

P(E) = (number of ways that E can occur)/ (number of possible outcomes) = m/n

Benford's Law

Mathematical algorithm that accurately predicts that, for many data sets, the first digit of each group of numbers in a random sample will begin with 1 more than a 2, a 2 more than a 3, a 3 more than a 4, and so on. Predicts the percentage of time each digit will appear in a sequence of numbers.

mean of X = multiply each possible value by its probability and then add it up

Mean of a discrete random variable

mean = 1/p

Mean of geometric random variable

Census

Measurements or observations from the entire population are used.

Qualitative

Measures a non-numerical category (aka categorical)

Quantitative

Measures a numerical amount (discrete and continuous)

Quantitative variable

Measures a numerical characteristic such as height

Properties of the Standard Deviation

Measures the variation among data values

If you can do one task n number of ways and a second m number of ways, then both tasks can be done in n*m ways.

Multiplication principle

P(A and B) = P(A)*P(B)

Multiplication rule for independent events

These are counting Rules.....

NOT trying to find probability!

Double-Blind

Neither the individuals in the study nor the observers know which subjects are receiving the treatment.

standard normal distribution

Normal distribution with a mean µ=0 and a standard deviation σ=1 is known as __________.

P(A∪B) = P(A) + P(B) - P(A∩B)

Not Mutually Exclusive

Resistant

Not influenced by size of data, but by positioning of data when set in numerical order.

Systematic Sampling

Number all members of the population sequentially. Then, from a starting point selected at random, include every Nth member of the population in the sample.

Data

Numbers or categories recorded for the observational units in a study

outliers

Numbers that are much greater or much less than the other numbers in the set

Observational Study

Observations and measurements of individuals are conducted in a way that doesn't change the response or the variable being measured.

Placebo Effect

Occurs when a subject receives no treatment but (incorrectly) believes he or she is, in fact, receiving treatment and responds favorably.

Lurking Variable

One [variable] for which no data have been collected but that nevertheless had influence on other variables in the study.

Completely Randomized Experiment

One in which a random process is used to assign each individual to one of the treatments.

One-Tailed Tests

One-tailed tests can occur with a claim of a positive linear correlation or a claim of a negative linear correlation. In such cases, the hypotheses will be as shown here. Left-tailed test: H1: ρ < 0. Right-tailed test: H1: ρ > 0. For these one-tailed tests, the P-value method can be used as in earlier chapters.

Reverse J-Shaped Distribution

Opposite of the J-shaped distribution

Indistinguishable

Order is not important

Descriptive Statistics

Organizing, picturing, and summarizing information from samples or populations

∩ (Intersection)

Outcome that is in one "AND" the other

∪ (Union)

Outcome that is in one "OR" the other

à (Complement)

Outcomes that Do Not occur in A

A-B or A\B

Outcomes that are in A but not in B

Mutually Exclusive Outcomes:

Outcomes that cannot occur simultaneously. (disjoint)

Notation for Probabilities

P - denotes a probability. A, B, and C - denote specific events. P(A) - denotes the probability of event A occurring.

Class problem: Roll a die twice, what is the probability that the number on the second cast is greater than the one on the first cast?

P(2nd>1st) = 15/36=5/12
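
This answer can be confirmed by counting outcomes. A short sketch that enumerates all 36 equally likely pairs of casts:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first cast, second cast) pairs.
rolls = list(product(range(1, 7), repeat=2))

favorable = sum(1 for first, second in rolls if second > first)
p_second_greater = Fraction(favorable, len(rolls))
```

By symmetry, the 6 ties are excluded and the remaining 30 outcomes split evenly, which is where 15 comes from.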

Formal Multiplication Rule

P(A and B) = P(A) • P(B | A)

The Multiplication Rule for Independent Events

P(A and B) = P(A)P(B)

Compound Event

P(A or B) = P (in a single trial, event A occurs or event B occurs or they both occur)

Addition rule for disjoint events

P(A or B) = P(A) + P(B) If two events are disjoint, the probability of getting one or the other is the sum of their individual probabilities.

Rule of Complementary Events

P(A) + P(Ā) = 1; P(Ā) = 1 - P(A); P(A) = 1 - P(Ā)

Basic Rules for Computing Probability (Rule 1) - Relative Frequency Approximation of Probability

P(A) = # of times A occurred / # of times procedure was repeated

P(A and B) (non-independent)

P(A)*P(B|A)

Basic Rules for Computing Probability (Rule 3) - Subjective Probabilities

P(A), the probability of event A, is estimated by using knowledge of the relevant circumstances.

Conditional Probability

P(B|A) = P(A∩B)/P(A). Read as: "the probability of B given A."

Conditional Probability

P(B|A) = P(A and B)/P(A). P(B|A) is read "the probability of B given A."

Independence (Formally)

P(B|A) = P(B) when A and B are independent.

Independence (used formally)

P(B|A) = P(B) when A and B are independent.

When P(A)>0, the conditional probability of event B occurring given A occurs is

P(B|A) = P(A and B) / P(A)

Notation for Conditional Probability

P(B|A) represents the probability of event B occurring after it is assumed that event A has already occurred (read B|A as "B given A.")

Equation for conditional probability

P(B|A)= P(A and B)/P(A)

multiplication rule for n independent events

P(E and F and G and ...) = P(E) ∗ P(F) ∗ P(G) ∗ ...

multiplication rule for independent events

P(E and F) = P(E) ∗ P(F)

general multiplication rule

P(E and F) = P(E) ∗ P(F|E)

general addition rule

P(E or F) = P(E) + P(F) - P(E and F)

"None"

P(E)

Complement Rule

Eᶜ, the complement of E, is the set of outcomes in the sample space that are not included in the outcomes of E.

Classical Probability

P(E) = (# of ways the event can occur) / (total # of outcomes). Use whenever the outcomes in the sample space are equally likely.

Empirical Probability

P(E) = (frequency for the class) / (total frequencies in the distribution) = f / n. Relies on actual experience to determine the likelihood of outcomes.

equation for approximating probabilities using the empirical approach

P(E) ≈ relative frequency of E = (frequency of E)/(number of trials of experiment)

complement rule

P(Ec) = 1 - P(E)

Notation for Binomial Probability Distributions

P(S) = p (p = probability of success)

Calculating probability: Roll a die twice, what is the probability that the sum of the faces will be 8?

P(Sum=8)=5/36

Outlier

a value that differs abnormally from the other observations

Finding the Probability of "At Least One"

P(at least one) = 1 - P(none)

Specificity

P(negative|non-diseased) Want to be as high as possible to avoid false positives.

Sensitivity

P(positive|diseased) Want to be as high as possible, to diagnose.

Probability Formula

P(x) = (n! / ((n - x)! * x!)) * p^x * q^(n-x)
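The formula above is the binomial probability of exactly x successes in n trials, with q = 1 - p. A minimal sketch follows; the function name `binomial_pmf` and the example values n = 5, x = 3, p = 0.5 are assumptions chosen for illustration.

```python
from math import comb

def binomial_pmf(n, x, p):
    """P(x) = n! / ((n - x)! * x!) * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    # comb(n, x) equals n! / ((n - x)! * x!)
    return comb(n, x) * p**x * q**(n - x)

# Illustrative case: exactly 3 heads in 5 tosses of a fair coin.
print(binomial_pmf(5, 3, 0.5))  # 0.3125
```

Summing `binomial_pmf(n, x, p)` over x = 0, ..., n gives 1, as required for a probability distribution.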

Requirements for Probability Distribution

∑P(x) = 1 where x assumes all possible values. 0 <= P(x) <= 1 for every individual value of x.

95.44%

P(µ-2σ<x<µ+2σ)=

99.74%

P(µ-3σ<x<µ+3σ)=

68.26%

P(µ-σ<x<µ+σ)=

Decision Criterion

P-value method: Using the significance level α: If P-value <= α, reject H0. If P-value > α, fail to reject H0.

Positive Predictive Value

PPV = # of true positives / total # positives

P[A or B (giv) C] =

P[A(giv)C] + P[B(giv)C] - P[A & B (giv) C]

P[A or B or C] =

P[A] + P[B] + P[C] - P[A & B] - P[A & C] - P[B & C] + P[A & B & C]

P[A or B] =

P[A] + P[B] - P[A & B]

Conditional Probability: P[B] =

P[B(giv)A] * P[A] + P[B(giv)A'] * P[A']

Conditional Probability: P[B & A] =

P[B(giv)A] * P[A]

Individuals

People/objects included in a statistical study

Rates

Percent or proportions used for each category in data or a graph.

Pr(n, k) = n^k

Permutation With Replacement (Distinguishable)

P(n, k) = n! / (n-k)!

Permutation Without Replacement (Distinguishable)

Observational unit

Person or thing assigned a number or category

Variability

Phenomenon of a variable taking on different values or categories from observational unit to observational unit.

Deviations

Pieces of data which do not follow the graph's overall pattern.

Finding the Point Estimate and E from a Confidence Interval

Point estimate of p: p̂ = (upper confidence limit + lower confidence limit) / 2. Margin of error E = (upper confidence limit - lower confidence limit) / 2

Finding the Point Estimate and E from a Confidence Interval

Point estimate of µ: xbar = (upper confidence limit + lower confidence limit) / 2
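Both cards use the same arithmetic: the point estimate is the midpoint of the interval and the margin of error is half its width. A small sketch, with a hypothetical function name and made-up limits of 98.0 and 99.0:

```python
def point_estimate_and_margin(lower, upper):
    """Recover the point estimate and margin of error E from interval limits."""
    point_estimate = (upper + lower) / 2    # midpoint of the interval
    margin_of_error = (upper - lower) / 2   # half the interval width
    return point_estimate, margin_of_error

# Hypothetical confidence interval limits, used only for illustration.
estimate, E = point_estimate_and_margin(98.0, 99.0)
print(estimate, E)  # 98.5 0.5
```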

P(E)

Probability of an Event

P(E) = # favorable outcomes / total outcomes

Probability of an Event

The P(A) of any event is between 0 and 1 (inclusive)

Probability rule 1

If S is the sample space in a probability model, then P(S)=1

Probability rule 2

Two events A and B are disjoint if they have no outcomes in common. P(A or B)=P(A)+P(B)

Probability rule 3

The probability of the complement of A is 1 - P(A)

Probability rule 4

P(A|B) = Conditional Probability

Probability that A occurs given B is known to occur

probability distribution

Provides the possible values of the random variable and their corresponding probabilities. Can be in the form of a table, graph, or mathematical formula.

Interquartile Range (or IQR)

Quartile 3 - Quartile 1

alternative hypothesis

The alternative hypothesis (denoted by H1 or Ha or HA) is the statement that the parameter has a value that somehow differs from the null hypothesis. The symbolic form of the alternative hypothesis must use one of these symbols: ≠, <, >.

Categorical variable

Records a group designation such as gender

Distribution (of a variable)

Refers to its pattern of variation. With a categorical variable, distribution means the variable's possible categories and the proportion of responses in each.

central limit theorem

Regardless of the population distribution, the sampling distribution of the sample mean is approximately normal if n is large enough (n > 30).

Stemplot

Represents data by separating each value into two parts, with the stem being the larger digit in the value and the leaf being the smaller digit.

Upper Class Limit

Represents the largest data value that can be included in the class

Lower Class Limit

Represents the smallest data value that can be included in the class

Combinations Rule

Requirements: There are n different items available. We select r of the n items (without replacement). We consider rearrangements of the same items to be the same. (The combination of ABC is the same as CBA.)

Permutations Rule (when items are all different)

Requirements: There are n different items available. (This rule does not apply if some of the items are identical to others.)

Permutations Rule (when some items are identical to others)

Requirements: There are n items available, and some items are identical to others.

Truthfulness of Response

Respondents may lie intentionally or inadvertently.

Faulty Recall

Respondents may not accurately remember when or whether an event took place.

Nonsampling Error

Result of poor sample design, sloppy data collection, bias, etc. (human error)

Undercoverage

Results when population members are omitted from the sample frame.

Roundoff Rule for μ, σ, σ^2

Round results by carrying one more decimal place than the number of decimal places used for the random variable x. If the values of x are integers, round µ, σ, and σ^2 to one decimal place.

CLASS PROBLEM: Toss a coin, if it lands heads, roll a die once. If it lands tails, flip the coin one more time. What is the sample space, and what is the size of the sample space?

S = {(H,1), (H,2), ..., (H,6), (T,H), (T,T)}, |S| = 8
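The sample space above can be built by following the two branches of the experiment. This is only a sketch; the variable names are illustrative.

```python
die_faces = range(1, 7)
coin_faces = ("H", "T")

# Heads on the first toss: roll a die once.
heads_branch = [("H", face) for face in die_faces]
# Tails on the first toss: flip the coin one more time.
tails_branch = [("T", second) for second in coin_faces]

sample_space = heads_branch + tails_branch
print(sample_space)
print(len(sample_space))  # 8
```

Note that these 8 outcomes are not equally likely: each (H, face) has probability 1/12, while (T,H) and (T,T) each have probability 1/4.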

Test Statistic for Hypothesis Tests with Two Variances

SEE PAGE 498

Confidence Interval: Independent Samples with σ1 and σ2 Both Known

See page 479

Systematic Random Sample

Select every 'nth' subject (can be problematic if the subject is cyclical)

Convenience Sample

Select from a group of individuals that are easy to reach (severely biased!)

Description of Midrange

Sensitive to extremes, because it uses only the maximum and minimum values, so rarely used

Without Replacement

Set of possibilities reduces by 1 after each selection

0.6915

The area under a normal curve to the left of z=0.5 is __________.

0.8869

The area under a normal curve to the right of z=-1.21 is __________.

0.7012

The area under the normal curve between z=-1 and z=1.08 is __________.
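The three areas above can be reproduced with the standard normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The card answers appear to come from a four-decimal z table, so the last digit may differ by one from the computed values.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area to the left of z = 0.5
left_of_half = z.cdf(0.5)
# Area to the right of z = -1.21 (complement of the area to the left)
right_of_neg_1_21 = 1 - z.cdf(-1.21)
# Area between z = -1 and z = 1.08 (difference of two left-tail areas)
between = z.cdf(1.08) - z.cdf(-1)

print(round(left_of_half, 4))
print(round(right_of_neg_1_21, 4))
print(round(between, 4))
```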

Counts

Specific number for each category used for a graph or set of data.

Law of Large Numbers

States that the long-run relative frequency of repeated independent events gets closer and closer to the true relative frequency as the number of trials increases.
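A quick simulation can illustrate the idea. The sketch below assumes a fair coin (true probability 0.5) and a fixed seed so the run is repeatable; the function name and the trial counts are arbitrary choices.

```python
import random

def relative_frequency_of_heads(num_flips, seed=0):
    """Simulate fair coin flips and return the observed proportion of heads."""
    rng = random.Random(seed)  # seeded generator for repeatable results
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

# The observed proportion tends to settle near 0.5 as the number of flips grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```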

geometric distribution

Success / Failure, trials continue until successful, each outcome is independent, constant probability of success

For a discrete random variable Mx(t) =

Sum over all x of (e^(tx) * p(x))

interpretation of the mean of a discrete random variable

Suppose an experiment is repeated n independent times and the value of the random variable X is recorded. As the number of repetitions of the experiment increases, the mean value of the n trials will approach µₓ, the mean of the random variable X. x̄ = (x₁ + x₂ + ⋅⋅⋅ + xₙ)/n. The difference between x̄ and µₓ gets closer to 0 as n increases.

Bayes's Theorem

Suppose that A₁, A₂, ..., Ak are disjoint events whose probabilities are not 0 and add to exactly 1, i.e. any outcome must be in exactly one of those events. Then, if B is any other event whose probability is not 0 or 1, P(Ai|B) = P(B|Ai)P(Ai) / [P(B|A₁)P(A₁) + ... + P(B|Ak)P(Ak)]
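The theorem can be sketched as a short function: multiply each prior P(Ai) by its likelihood P(B|Ai), then divide by the sum of those products. The function name and the numbers below (a 1% prior, P(B|A₁) = 0.95, P(B|A₂) = 0.10) are hypothetical values chosen only for illustration.

```python
def bayes_posteriors(priors, likelihoods):
    """Return P(Ai|B) for each i, given P(Ai) and P(B|Ai)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B|Ai) * P(Ai)
    total = sum(joint)  # denominator: P(B) by the law of total probability
    return [j / total for j in joint]

# Hypothetical inputs: A1 = diseased, A2 = not diseased, B = positive test.
priors = [0.01, 0.99]
likelihoods = [0.95, 0.10]
posteriors = bayes_posteriors(priors, likelihoods)
print(posteriors)
```

With these made-up inputs, P(A₁|B) is far smaller than P(B|A₁), which is the gap that "confusion of the inverse" ignores.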

Simple Random Samples

Take 'n' measurements from a population so that every sample of size 'n' has an equal chance of being selected and every individual has an equal chance of being included (use the random number table!)

margin of error

The ± value added to and subtracted from a point estimate in order to develop an interval estimate of a population parameter

P-Value

The P-value (or p-value or probability value) is the probability of getting a value of the test statistic that is at least as extreme as the one representing the sample data, assuming that the null hypothesis is true.

Joint Distribution of Random Variables:

The Probability of two or more random variables together as a joint distribution

0.9783

The area under the normal curve to the right of z=-2.02 is __________.

regression line

The best-fitting straight line

regression equation

The best-fitting straight line's equation

Coefficient of Variation

The coefficient of variation (or CV) for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean. Sample CV = s / xbar * 100%. Population CV = σ / µ * 100%.

Sample Space

The collection of all possible outcome values. The sample space has a probability of 1.

Sample Space

The collection of all possible outcomes

Complementary Events

The complement of event A, denoted by Ā, consists of all outcomes in which the event A does not occur

Intuitive Approach to Conditional Probability

The conditional probability of B given A can be found by assuming that event A has occurred, and then calculating the probability that event B will occur.

conditional probability

The conditional probability of B given A, written P(B|A), is the probability that event B will occur given that event A has occurred

Spread

The difference between the highest and lowest data figures.

interquartile range

The difference between the upper and lower quartiles.

deviation

The difference of a data value and the mean of a data set

interquartile range

The difference of the upper and lower quartiles of a data set

Variance

Find the distance of each observation from the mean and square each of these distances. Average the squared distances by dividing their sum by n - 1. This average squared distance is called the variance.
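The steps in this definition translate directly into code. The sketch below uses a made-up data set and a hypothetical function name, and compares the result with `statistics.variance` from the Python standard library, which also divides by n - 1.

```python
import statistics

def sample_variance(values):
    """Sum of squared distances from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_distances = [(x - mean) ** 2 for x in values]
    return sum(squared_distances) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, for illustration only
s2 = sample_variance(data)
s = s2 ** 0.5  # the standard deviation is the square root of the variance

print(s2, statistics.variance(data))
print(s)
```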

Probability Distribution

The distribution of a random variable X that tells us what values X can take and how to assign probabilities to those values.

Standard deviation "sigma" of a density curve

The equals-area point of a density curve.

Type I Error

The error that is committed when a true null hypothesis is rejected erroneously. The probability of a Type I Error is abbreviated with the lowercase Greek letter alpha.

expected value

The expected value of a discrete random variable is denoted by E, and it represents the mean value of the outcomes. It is obtained by finding the value of ∑[x • P(x)].

z = (x - µ)/σ

The formula to convert any normal distribution to the standard normal distribution is __________.

Shape

The general shape of a graph which can be described in a few words.

bell-shaped curve

The graph of a normal probability distribution curve is called a __________.

linear correlation coefficient

The linear correlation coefficient r measures the strength of the linear relationship between the paired quantitative x- and y-values in a sample.

Quartiles

The medians of the two halves of a set of data after the median of the entire set is found.

Center

The midpoint of the distribution of the graph.

null hypothesis (denoted by H0)

The null hypothesis (denoted by H0) is a statement that the value of a population parameter (such as proportion, mean, or standard deviation) is equal to some claimed value. We test the null hypothesis directly. Either reject H0 or fail to reject H0.

P-Value note

The null hypothesis is rejected if the P-value is very small, such as 0.05 or less.

Median

The number in a set of data that half the observations are smaller than and the other half of the observations are larger than.

number of permutations of distinct objects in groups

The number of arrangements of r objects chosen from n objects in which 1. the n objects are distinct 2. repetition of objects is not allowed 3. order is important

Frequency

The number of data values contained in a specific class

degrees of freedom

The number of degrees of freedom for a collection of sample data is the number of sample values that can vary after certain restrictions have been imposed on all data values. The degree of freedom is often abbreviated df. degrees of freedom = n - 1 in this section

number of combinations of n distinct objects taken r at a time

The number of different arrangements of n objects using r ≤ n of them, in which 1. the n objects are distinct 2. repetition of objects is not allowed 3. order is not important

Frequency Distribution

The organization of raw data in table form; consists of classes and frequencies

Outcome

The outcome of a trial is the value measured, observed, or reported for an individual instance of that trial. Outcomes are considered to be either a) discrete if they have distinct values such as heads or tails (even if the values are numerals) or b) continuous if they take on numeric values in some range of possible values

Independent Events

The outcome of one event does not affect the outcome of the second event

independent events

The outcome of one event does not affect the outcome of the second event

Overall pattern

The overall pattern of a graph is the basic trend the graph shows and can lead to a general explanation of the data presented.

µ,σ

The parameters of the normal distribution are __________ and __________.

Mean "mu" of a density curve

The point at which the density curve would be balanced, if it were physical.

Probability Mass:

The probability at a point of a discrete random variable.

Probability

The probability of an event is a number between 0 and 1 that reports the likelihood of the event's occurrence. A probability can be derived from equally likely outcomes, from the long-run proportion of the event's occurrence, or from known proportions. We write P(A) for the probability of the event A.

P(A does not occur) = 1 - P(A)

The probability of an event not occurring is equal to 1 minus the probability of the event happening.

Complement Rule or "at least one"

The probability of an event occurring is 1 minus the probability it doesn't occur. P(A) = 1 - P(Aᶜ), where Aᶜ is the event that A doesn't occur.

General Multiplication Rule for Any Two Events

The probability that both of two events A and B happen together can be found by P(A and B) = P(A)P(B|A)

Conditional Probability

The probability that the second event B occurs given that the first event A has occurred can be found by dividing the probability that both events occurred by the probability that the first event has occurred. The formula is P(B|A) = P(A and B) / P(A)

Probability

The proportion of times an outcome will occur in a very long series of repetitions.

Hidden Bias

The question may be worded in such a way as to elicit a specific response. The order of questions might lead to biased responses. Also, the number of responses on a Likert scale may force responses that do not reflect the respondent's feelings or experience.

range

The range of a numerical data set is a measure of dispersion

Odds

The ratio of the probability of an outcome of a random phenomenon to the probability of that outcome not occurring.

Special Property

The regression line fits the sample points best.

Empirical Rule

The rule gives the approximate percentage of observations within 1 standard deviation (68%), 2 standard deviations (95%), and 3 standard deviations (99.7%) of the mean when the histogram is well approximated by a normal curve

Sample Mean

The sample mean is the best point estimate of the population mean.

Five number summary

The smallest observation, the first quartile, the median, the third quartile, and the largest observation. Usually written from smallest to largest.

Standard Deviation - Important Properties

The standard deviation is a measure of variation of all values from the mean. The value of the standard deviation s is usually positive. The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others). The units of the standard deviation s are the same as the units of the original data values.

standard deviation

The standard deviation of a set of sample values, denoted by s, is a measure of variation of values about the mean. Also known as the Square root of variance or ( { ∑(x-xbar)^2 } / (n-1) )^(1/2)

Standard Normal Distribution

The standard normal distribution is a normal probability distribution with μ = 0 and σ = 1. The total area under its density curve is equal to 1.

Statistics

The study of how to collect, organize, analyze and interpret numerical information from data

∑P = 1

The sum of all probability outcomes is equal to 1.

"Something has to happen rule"

The sum of the probabilities of all possible outcomes of a trial must be 1.

test statistic

The test statistic is a value used in making a decision about the null hypothesis, and is found by converting the sample statistic to a score with the assumption that the null hypothesis is true.

1

The total area under the normal curve is __________.

Population Data

The variable is from every relevant individual

Sample Data

The variable is from some of the relevant individuals

Variance

The variance of a set of values is a measure of variation equal to the square of the standard deviation.

0

The z value of µ for a normal curve is always __________.

Rationale for using n - 1 versus n

There are only n - 1 independent values. With a given mean, only n - 1 values can be freely assigned any number before the last value is determined. Dividing by n - 1 yields better results than dividing by n. It causes s^2 to target σ^2, whereas division by n causes s^2 to underestimate σ^2.

Control Group

This group received a dummy treatment, enabling the researchers to control for the placebo effect. In general, a [ ] group is used to account for the influence of other known or unknown variables that might be an underlying cause of a change in response in the experimental group.

Confusion of the Inverse

To incorrectly believe that P(A|B) and P(B|A) are the same, or to incorrectly use one value for the other, is often called confusion of the inverse.

Combinations Rule

Used when selecting a smaller number from a larger number but the order is NOT important. nCr = n! / (r! (n-r)!), where n = sample size and r = number of objects selected. On calculator: enter amount(n), math, PRB, 3, enter amount(r), enter

Independent Events

Two events A and B are independent events if the fact that A occurs does NOT affect the probability of B occurring. *with replacement = independent events

Dependent and Independent

Two events A and B are independent if the occurrence of one does not affect the probability of the occurrence of the other. (Several events are similarly independent if the occurrence of any does not affect the probabilities of the occurrence of the others.) If A and B are not independent, they are said to be dependent.

Disjoint Events

Two events are disjoint (or mutually exclusive) if they have no outcomes in common.

Disjoint (mutually exclusive)

Two events are disjoint if they share no outcomes in common. If A and B are disjoint, then knowing that A occurs tells us that B cannot occur. Disjoint events are also called "mutually exclusive."

Independence (informally)

Two events are independent if knowing whether one event occurs does not alter the probability of the other event occurring.

Independence (used casually)

Two events are independent if knowing whether one event occurs does not alter the probability that the other event occurs.

Independence (Casually)

Two events are independent if knowing whether one event occurs does not alter the probability that the other event occurs

disjoint (mutually exclusive)

Two events share NO outcomes in common. As a matter of fact, knowing that A occurs tells us that B CANNOT occur.

dependent event

Two events such that the occurrence of one event affects the occurrence of the other event

Mutually Exclusive Events

Two events that cannot occur at the same time (i.e., they have no outcomes in common).

mutually exclusive events

Two events that cannot occur at the same time; no common outcomes

Tails

Two-tailed test: ≠ used. Left-tailed test: < used. Right-tailed test: > used.

Random Sampling

Use a simple random sample from the entire population

Multistage Sampling

Use a variety of sampling methods to create successively smaller groups at each stage. The final sample consists of clusters.

Choosing the Appropriate Distribution

Use the normal (z) distribution If σ known and normally distributed population or σ known and n > 30

Factorial Rule

Use this when you have "n" objects and you want to know how many different ways they can be arranged. n! On calculator: enter amount, math, arrow left to PRB, 4, enter

Fundamental Counting Rule

Use this when you have different positions and you want to know how many options there are within those positions. ___*___*___*___*___*___*___*___*___ (for example, nine positions with 2 options each: 2^9 = 512)

Categorical Frequency Distribution

Used for data that can be placed into specific categories, such as nominal or ordinal level data

Randomization

Used to assign the individuals to the two treatment groups. This helps prevent bias in selecting members for each group.

Class Boundaries

Used to separate the classes so that there are no gaps in the frequency distribution

Permutations Rule

Used when selecting a smaller group from a larger group and you put them in a specific order. *ORDER IS IMPORTANT. nPr = n! / (n-r)!, where n = sample size and r = number of objects selected. On calculator: enter amount(n), math, PRB, 2, enter amount(r), enter

Dot plot

Useful for displaying the distribution of a relatively small data set of a quantitative variable

variance of x = (x1-mean)^2*p1 + ... + (xi-mean)^2*pi

Variance of a discrete random variable

Conclusions in Hypothesis Testing

We always test the null hypothesis. The initial conclusion will always be one of the following: 1. Reject the null hypothesis. 2. Fail to reject the null hypothesis.

0.5

What proportion of the area under a normal curve is to the right of a z-score of zero?

continuous probability distribution

What type of probability distribution is the normal distribution?

Sample Size for Estimating Proportion p

When an estimate p̂ is known: n = (z(α/2)^2 * p̂ * q̂) / E^2. When no estimate of p is known: n = (z(α/2)^2 * 0.25) / E^2.
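Both formulas can be sketched in one function. The function name, the 95% confidence level, and the 0.03 margin of error are assumptions for illustration. Rounding the result up to the next whole number is a common textbook convention that the card itself does not state. The critical value z(α/2) comes from `statistics.NormalDist.inv_cdf` in the Python standard library.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(confidence, margin_of_error, p_hat=None):
    """n = z(alpha/2)^2 * p_hat * q_hat / E^2, using 0.25 when p_hat is unknown."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z(alpha/2)
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    # Assumption: round up so the margin of error is not exceeded.
    return ceil(z**2 * pq / margin_of_error**2)

# Illustrative inputs: 95% confidence, E = 0.03, no prior estimate of p.
print(sample_size_for_proportion(0.95, 0.03))  # 1068
```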

Raw Data

When data are in their original form; little information can be obtained from looking at this

Permutations versus Combinations

When different orderings of the same items are to be counted separately, we have a permutation problem, but when different orderings are not to be counted separately, we have a combination problem.
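The counting rules on these cards map onto functions in Python's standard `math` module (`perm` and `comb` require Python 3.8 or later). The values n = 5 and r = 3 are arbitrary illustrative choices.

```python
from math import comb, factorial, perm

n, r = 5, 3  # illustrative: choose 3 items from 5 distinct items

print(n ** r)        # 125  ordered, with replacement: n^k
print(perm(n, r))    # 60   ordered, without replacement: n! / (n-r)!
print(comb(n, r))    # 10   unordered, without replacement: n! / (r! (n-r)!)
print(factorial(n))  # 120  arrangements of all n items: n!
```

Each combination of 3 items can be ordered in 3! = 6 ways, which is why the permutation count (60) is 6 times the combination count (10).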

Rounding Off Probabilities

When expressing the value of a probability, either give the exact fraction or decimal or round off final decimal results to three significant digits. (Suggestion: When a probability is not a simple fraction such as 2/3 or 5/9, express it as a decimal so that the number can be better understood.)

General Rule for a Compound Event

When finding the probability that event A occurs or event B occurs, find the total number of ways A can occur and the number of ways B can occur, but find that total in such a way that no outcome is counted more than once

Round-Off Rule for Measures of Variation

When rounding the value of a measure of variation, carry one more decimal place than is present in the original set of data.

Grouped Frequency Distribution

When the range of the data is large, the data must be grouped into classes that are more than one unit in width.

Negatively Skewed

When the data values are clustered to the right and taper off to the left

Confounding Variable

When the effects of one [variable] cannot be distinguished from the effects of the other. [ ] variables may be part of the study, or they may be outside lurking variables.

Formal Addition Rule

When event A and event B are NOT mutually exclusive, use P(A or B) = P(A) + P(B) - P(A and B)

Relative Frequency Graphs

When the frequencies can be converted into proportions

Random Phenomenon

When the individual outcomes of a phenomenon are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions.

Skewed to the left

When the left side of the graph extends much farther out than the right side.

cases

When the objects are people in a set of data

Independence Definition

When the outcome of one event cannot influence the outcome of a second event.

Dependent Events

When the outcome or occurrence of the first event affects the outcome or occurrence of the second event in such a way that the probability is changed. *without replacement = dependent events

Positively Skewed

When the peak of the distribution is to the left and the data values taper off to the right

Symmetric distribution

When the right and left sides of the graph are approximately mirror images of each other.

Skewed to the right

When the right side of the graph(containing the half of the observations with larger values) extends much farther out than the left side.

Addition Rule 1

When two events A and B are mutually exclusive, the probability that A or B will occur is P(A or B) = P(A) + P(B)

Multiplication Rule 2

When two events are dependent, the probability of both occurring is P(A and B) = P(A) * P(B|A)

Multiplication Rule 1

When two events are independent, the probability of both occurring is P(A and B) = P(A) * P(B)

Disjoint

When two events have no outcomes in common and so can never occur together.

Tree Diagram

a diagram used to show the total number of possible outcomes in a probability experiment

Round-Off Rule for Confidence Intervals Used to Estimate µ

When using the original set of data, round the confidence interval limits to one more decimal place than used in the original set of data. When the original set of data is unknown and only the summary statistics (n, x̄, s) are used, round the confidence interval limits to the same number of decimal places used for the sample mean.

continuity correction

When we use the normal distribution (which is a continuous probability distribution) as an approximation to the binomial distribution (which is discrete), a continuity correction is made to a discrete whole number x in the binomial distribution by representing the discrete whole number x by the interval from x - 0.5 to x + 0.5 (that is, adding and subtracting 0.5).

Multiple Combinations Rule

When you are taking more than one combination in a problem. nCr * nCr

Mutually Exclusive

Will NEVER occur at the same time

Class problem: John is suing his landlord. If he wins, he will be awarded $6000 and will not have to pay any court costs. If he loses, he will have to pay court fees totaling $200. John has found a lawyer that will represent him for $1200. If he hires this lawyer, there is an 80% chance he will win, and if he represents himself there is only a 60% chance that he will win. Should John hire this lawyer? (Calculate his expected net winnings using the lawyer and his expected net winnings not using the lawyer.)

With lawyer: X = 4800 or -1400, with P(X) = .8 and .2, so E(X) = (4800)(.8) + (-1400)(.2) = 3560. Without lawyer: X = 6000 or -200, with P(X) = .6 and .4, so E(X) = (6000)(.6) + (-200)(.4) = 3520.
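The two expected values can be checked with a short calculation. The function name `expected_value` is illustrative; the dollar amounts and probabilities are the ones given in the problem.

```python
def expected_value(outcomes):
    """E(X) = sum of x * P(x) over (value, probability) pairs."""
    return sum(value * prob for value, prob in outcomes)

# Net winnings with the lawyer: the $1200 fee is paid whether he wins or loses.
with_lawyer = [(6000 - 1200, 0.8), (-200 - 1200, 0.2)]
# Net winnings without the lawyer.
without_lawyer = [(6000, 0.6), (-200, 0.4)]

print(round(expected_value(with_lawyer), 2))     # 3560.0
print(round(expected_value(without_lawyer), 2))  # 3520.0
```

On expected net winnings alone, hiring the lawyer comes out slightly ahead (3560 versus 3520).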

Vague Wording

Words such as "often", "seldom", and "occasionally" mean different things to different people.

Chi-Square Distribution formula

X^2 = ( [ n -1 ] * s^2) / σ^2

Requirements for Testing Claims About σ or σ^2

X^2 = (n - 1) * s^2 / σ^2

Test statistic for standard deviation

X^2 = (n - 1) * s^2 / σ^2

How do you Standardize Normal Distribution: P[r < X < s]

Z = (X-u)/std, so P[r < X < s] = P[(r-u)/std < Z < (s-u)/std]

Replication

[ ] of the experiment on many patients reduces the possibility that the differences in pain relief for the two groups occurred by chance alone.

Variance (probability distribution)

[∑x^2 • P(x) ] - µ^2
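This shortcut formula gives the same result as the definition ∑(x - µ)^2 • P(x). The sketch below checks both on a fair six-sided die, an example chosen only for illustration, using exact fractions to avoid rounding.

```python
from fractions import Fraction

# Probability distribution of one roll of a fair die (illustrative example).
probs = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in probs.items())  # mean: sum of x * P(x)

# Definition: sum of (x - mu)^2 * P(x)
var_definition = sum((x - mu) ** 2 * p for x, p in probs.items())
# Shortcut: [sum of x^2 * P(x)] - mu^2
var_shortcut = sum(x**2 * p for x, p in probs.items()) - mu**2

print(mu)              # 7/2
print(var_definition)  # 35/12
print(var_shortcut)    # 35/12
```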

asymptotic

__________ means that the normal curve gets closer and closer to the x-axis but never actually touches it.

a +ar + ar^2 +............=

a / (1-r)

E[aX + b] =

a E[x] + b

mean and standard deviation of a binomial random variable

a binomial experiment with n independent trials and probability of success p has a mean and standard deviation given by the formulas µₓ = np and σₓ = √(np(1-p))

relative frequencies

a casual term for probability

Variable

a characteristic observed on a sample unit

Variable

a characteristic of an individual

variable

a characteristic of the individual to be measured or observed

event

a collection of outcomes, usually identified to attach probabilities to them; denoted by capital letters such as A,B, or C.

combination

a collection, without regard to order, of n distinct objects without repetition.

event

a combination of outcomes usually for the purpose of attaching a probability to them

placebo

a control, does nothing to the test subject

Probability distribution

a description that gives the probability for each value of the random variable; often expressed in the format of a graph, table, or formula

tree diagram

a diagram to determine a sample space that lists the equally likely outcomes of an experiment

probability of an event

a number between 0 and 1 that reports the likelihood of the event's occurrence; can be derived from equally likely outcomes, the long-run relative frequency of the event's occurrence, or from known probabilities. We write P(A) for the probability of an event

Parameter

a number describing or calculated from a population, usually the actual numerical value is unknown and we must describe the parameter in words

Statistic

a number describing or calculated from a sample, usually the actual numerical value is known

statistic

a number that can be computed from the data.

parameter

a number that describes the population

expected value

a number that makes a rational expression undefined

simulation

a numerical facsimile or representation of a real-world phenomenon

normal quartile plot

a pattern on such a plot that deviates substantially from a straight line indicates that the data are not normal

normal quartile plot

a pattern on such a plot that deviates substantially from a straight line indicates that the data are not Normal.

Seasonal variation

a pattern that repeats itself at known regular intervals of time

Random

a phenomenon is random if any individual outcome is unpredictable, but the distribution of outcomes over many repetitions is known. Example: toss a coin. No single flip is predictable, but many flips will result in approximately half heads and half tails. Remember that random does not mean that each outcome is equally likely; it only means that a particular outcome cannot be predicted with certainty.

Outcome

a possible result of an experiment

subjective probability

a probability obtained on the basis of personal judgment

Parameter

a quantitative measure that describes a characteristic of the Population

Statistic

a quantitative measure that describes a characteristic of the Sample

Caution

a random variable does not share the same properties as an algebraic variable. For an algebraic variable X: X+X+X=3X. For a random variable, each observation of X may turn out differently, so the sum of three observations does not equal 3X and should really be written X1+X2+X3. This distinction matters when calculating variance.
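
The caution above can be checked numerically. The sketch below is only an illustration: the three-value distribution is an assumed example, and the three observations are assumed to be independent. Under those assumptions it compares the variance of 3X (one draw, tripled) with the variance of X1+X2+X3 (three separate draws, added).

```python
from itertools import product

# Assumed example distribution for illustration: value -> probability.
dist = {-2: 0.3, 3: 0.1, 7: 0.6}

def variance(pmf):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

# 3X: a single draw of X, multiplied by 3.
tripled = {3 * x: p for x, p in dist.items()}

# X1 + X2 + X3: three independent draws of X, added together.
summed = {}
for (x1, p1), (x2, p2), (x3, p3) in product(dist.items(), repeat=3):
    total = x1 + x2 + x3
    summed[total] = summed.get(total, 0.0) + p1 * p2 * p3

var_x = variance(dist)
var_tripled = variance(tripled)   # equals 9 * Var(X)
var_summed = variance(summed)     # equals 3 * Var(X)
print(var_x, var_tripled, var_summed)
```

Both quantities have the same mean, but tripling one draw multiplies the variance by 9, while adding three independent draws multiplies it by 3.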

hdi

a ranking of countries, combining various statistics

response variable

a record of the resulting values from each trial that corresponds to what we were interested in

observational study

a researcher observes and measures characteristics of interest of part of a population but does not change existing conditions.(1.3)

stratified random sample

a sample in which the population is first divided into similar, nonoverlapping groups. A simple random sample is then selected from each of the groups

compound

a sequence of simple events

Population

a set of data that form the target of a study. Example: the student body of a school

Sample

a set of data values collected on some of the sampling units. Example: a student

trial

a single attempt or realization of a random event

trial

a single attempt or realization of a random phenomenon

confounding

a situation where the effect of one variable on the response variable cannot be separated from the effect of another variable on the response variable.

unbiased estimator

a statistic whose sampling distribution is centered over the population parameter

Law of Large Numbers

a statistical law stating that as the number of trials increases, the observed relative frequency of an event will more closely approach the theoretical probability of the event.
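
A small simulation can make this law concrete. This is a minimal sketch, assuming a fair coin and a fixed random seed (both are choices made here for illustration); the function name and the checkpoints are hypothetical.

```python
import random

def running_relative_frequency(n_flips, seed=0):
    """Flip a simulated fair coin n_flips times and record the
    relative frequency of heads at a few checkpoints."""
    rng = random.Random(seed)          # seeded so the run is repeatable
    checkpoints = (10, 100, 1000, 10000, 100000)
    heads = 0
    recorded = {}
    for i in range(1, n_flips + 1):
        heads += rng.random() < 0.5    # True counts as 1
        if i in checkpoints:
            recorded[i] = heads / i
    return recorded

freqs = running_relative_frequency(100_000)
for n, f in freqs.items():
    print(n, f)
```

The early checkpoints can sit noticeably away from 0.5, while the later ones tend to settle close to it.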

event

a subset of a sample space

simple random sample

a subset of the population selected in a manner such that every sample of size n from the population has an equal chance of being selected

sample survey

a survey done only on a sample of the population

Observational Study

a survey of an existing population carried out by adopting a sample procedure (pre-existing/ in place)

Census

a survey that attempts to gather data on the entire population

probability model

a table or listing of all the possible outcomes of an experiment, together with the probability of each outcome; must follow the Rules of Probability

Distribution of a Data Set

a table, graph, or formula that provides the values of the observations and how often they occur

binomial distribution

a theoretical distribution of the number of successes in a finite set of independent trials with a constant probability of success

experiment

a treatment is deliberately imposed on the individuals in order to observe a possible change in the response or variable being measured

tree diagram

a tree-shaped diagram that illustrates sequentially the possible outcomes of a given event

Example: the American Veterinary Association claims that the annual cost of medical care for dogs averages $100 with a standard deviation of $30, and the annual cost of medical care for cats averages $120 with a standard deviation of $35. a) What's the expected difference in cost between cats and dogs? b) What's the standard deviation of the difference between cats and dogs? c) If the difference in costs is normally distributed, what's the probability that the medical expenses for a woman's dog are greater than those for her cat?

a) E(C-D)=E(C)-E(D)=120-100=$20 b) V(C-D)=V(C)+V(D)=1225+900=2125 (assuming the two costs are independent), so σ(C-D)=√2125≈$46.10 c) we are told the difference is normal, and we already found the center and spread: C-D ~ N(20, 46.1). P(C-D<0)=P(Z<(0-20)/46.1)=P(Z<-.4338)=.3322
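
The arithmetic in this answer can be reproduced with Python's standard library. This is a sketch under the same assumptions as the worked answer: a cat mean of $120, independent costs, and a normally distributed difference. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mean_cat, sd_cat = 120, 35   # cat mean as used in the worked answer
mean_dog, sd_dog = 100, 30

# a) means subtract
mean_diff = mean_cat - mean_dog

# b) variances add, even for a difference (assumes independence)
sd_diff = sqrt(sd_cat**2 + sd_dog**2)

# c) the dog costs more exactly when C - D < 0
p_dog_more = NormalDist(mean_diff, sd_diff).cdf(0)

print(mean_diff, round(sd_diff, 1), round(p_dog_more, 4))
```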

ratio level of measurement

applies to data that can be arranged in order; differences are meaningful; true zero

record the number of people that walk into a post office each day. a) what is the sample space? b) How do you think the outcomes will be distributed (what shape)

a) S={0,1,2,3,...}, |S|=infinity b) skewed right

Class problem: K, A, and M have completed several relay triathlons. K-swimming, A-bikes, M-runs. Their respective completion times (in hours) have means .77, 1.33, and .9, and their respective standard deviations are .05, .08, and .06. a) what is their expected team finish time? b) what is the standard deviation of the team finish time? c) assume their team finish times are normally distributed. What is the probability that they finish the triathlon 15 minutes earlier than usual?

a) E(K+A+M)=E(K)+E(A)+E(M)=.77+1.33+.9=3 b) V(K+A+M)=V(K)+V(A)+V(M)=.0025+.0064+.0036=.0125, so σ(K+A+M)=√.0125=.1118 c) T ~ N(3, .1118), so P(T<2.75)=P(Z<(2.75-3)/.1118)=P(Z<-2.236)=0.0127
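
The same answer can be checked for any number of legs by summing means and variances. This is a minimal sketch that assumes the three leg times are independent and that the team time is normal, as the problem states; the list layout is a choice made here.

```python
from math import sqrt
from statistics import NormalDist

leg_means = [0.77, 1.33, 0.90]   # swim, bike, run (hours)
leg_sds = [0.05, 0.08, 0.06]

team_mean = sum(leg_means)
team_sd = sqrt(sum(s**2 for s in leg_sds))   # add variances, then take the root

# 15 minutes early is 0.25 hours below the expected finish time
p_early = NormalDist(team_mean, team_sd).cdf(team_mean - 0.25)

print(round(team_mean, 2), round(team_sd, 4), round(p_early, 4))
```

Note that the standard deviations themselves are never added; only the variances are.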

experiment

any process with uncertain results that can be repeated

interval level of measurement

applies to data that can be arranged in order; differences are meaningful

Statistic

descriptive measure for a sample

E[a1 h1(x) + a2 h2(x) + b] =

a1 E[h1(x)] + a2 E[h2(x)] + b

VAR[aX +bY + c] =

a^2VAR[X] + b^2VAR[Y] + 2abCOV[X,Y] (If X & Y are independent, COV[X,Y] = 0)

Var[aX + b] =

a^2Var[X]

integral of a^x =

a^x / ln a + C (for a > 0, a ≠ 1)

simple random sample

abbreviated SRS, this requires that every item in the population has an equal chance to be chosen and that every possible combination of items has an equal chance to exist. No grouping can be involved.

experiment

action whose outcome cannot be determined with certainty

confidential

all individual data on subject must be this

sample space

all possible outcomes of given experiment

population

amount of people in a given area

Probability Experiment

an action, or trial, through which specific results are obtained

permutations

an arrangement or listing in which order or placement is important

permutation

an arrangement in which r objects are chosen from n distinct objects, repetition is not allowed, and order is important.

Array

an arrangement of data in ascending or descending order

mean

an average of n numbers computed by adding some function of the numbers and dividing by some function of n

simple event

an event consisting of just one outcome.

unusual event

an event that has a low probability of occurring, typically less than 5%

random event

an event where we know what outcomes could happen, but not which particular outcome will happen

impossible

an event with a probability of 0

certainty

an event with a probability of 1

simple event

an event with only one outcome

single blind experiments

an experiment in which the participants are unaware of which participants received the treatment

placebo effect

an improvement in health not due to any treatment, but only to the patient's belief that he or she will improve.

factor

an independent variable in statistics

categorical variable

places an individual into one of two or more groups or categories

outcome

an individual result of a component of a simulation; the value measured, observed, or reported or an individual instance of the trial

Sampling Unit

an item or object on which an observation can be recorded example: grade level

Simple Event

an outcome or an event that cannot be further broken down into simpler components

event

an outcome of a random phenomenon; a subset of the sample space

experiment

any activity for which the outcome is uncertain

resistant measure

any aspect of a distribution that is relatively unaffected by changes in the numerical value of a small proportion of the total number of observations, no matter how large these changes are

variable

any characteristic of an individual

event

any collection of outcomes from a probability experiment, consisting of one or more outcomes

Event

any collection of results or outcomes of a procedure

Continuous Variable

takes any numerical value over an interval; measured rather than counted. Example: height

Event

any outcome or set of outcomes of a random phenomenon

Examples:

calculate the mean and standard deviation of the following random variable: X: -2, 3, 7 with P(X): .3, .1, .6. E(X)=(-2)(.3)+(3)(.1)+(7)(.6)=3.9. V(X)=(-2-3.9)^2(.3)+(3-3.9)^2(.1)+(7-3.9)^2(.6)=16.29, so the standard deviation is √16.29≈4.04
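
The calculation in this example can be written out as a short script. It is a minimal sketch using the values and probabilities given in the card; the variable names are illustrative, and the shortcut line uses the ∑[x^2 • P(x)] - µ^2 form of the variance as a cross-check.

```python
from math import sqrt

values = [-2, 3, 7]
probs = [0.3, 0.1, 0.6]

# E(X): probability-weighted sum of the values
mean = sum(x * p for x, p in zip(values, probs))

# V(X): probability-weighted squared distance from the mean
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

# Shortcut form: sum of x^2 * P(x), minus the mean squared
shortcut = sum(x**2 * p for x, p in zip(values, probs)) - mean**2

sd = sqrt(variance)
print(round(mean, 2), round(variance, 2), round(sd, 2))
```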

QuaLitative Variable

can be identified by noting its presence; describes an observation as belonging to one of a set of categories

linear transformations

changes the original variable x into the new variable x(new) given by the equation ***

influential point

changes the regression line if removed from the data.

Response Variable

characteristic of experimental outcome that is to be measured or observed

hypothesis

claim or statement about a property of a population

Population

collection of all individuals or items under consideration in a statistical study

nCr

combination of n objects taken r at a time

variance

common measure of spread about the mean as center

cumulative distribution function

computes probabilities less than or equal to a specified value

Simpson's paradox

conclusions drawn from two or more separate crosstabulations can be reversed when the data are aggregated into a single crosstabulation

qualitative data

consist of attributes, labels, or nonnumerical entries.(1.2)

data

consist of information coming from observations, counts, measurements, or responses.(1.1)

quantitative data

consist of numerical measurements or counts.(1.2)

Event

consists of one or more outcomes and is a subset of the sample space

Three Principles of Experimental Design - Control

control effects due to factors other than ones of primary interest

Principles of experimental design

control, randomize, and repeat

convenience sampling

create a sample by using data from population members that are readily available

ordinal level of measurement

data at this level are qualitative or quantitative. Data at this level can be arranged in order, or ranked, but differences between data entries are not meaningful.(1.2)

nominal level of measurement

data at this level is qualitative only. Data at this level are categorized using names, labels, or qualities. No mathematical computations can be made at this level.(1.2)

ratio level of measurement

data at this measurement are similar to data at the interval level, with the added property that a zero entry is an inherent zero. A ratio of two data values can be formed so that one data value can be meaningfully expressed as a multiple of another.(1.2)

interval level of measurement

data at this measurement can be ordered, and you can calculate meaningful differences between data entries. At the interval level, a zero entry simply represents a position on a scale; the entry is not an inherent zero.(1.2)

Ordinal level

data can be arranged in some order, but differences between values are meaningless

Raw Data

data collected in an investigation and not organized systematically

Nominal Level

data consists of names, labels, or categories only, no order

Univariate Data

data for one variable from a population

Bivariate Data

data for two variables from same population

Descriptive Statistics

data is summarized using numerical and graphical techniques in some useful way

Inferential Statistics

data taken from only a sample is used to generalize to a larger population

4.4 properties of random variables

definitions: expected value (or mean) of a random variable: this is denoted E(X) Variance of a random variable: this is denoted V(X)

Experiment

deliberately imposes some treatment on individuals in order to observe their responses. Used to study whether the treatment causes a change in the response

Probability Distributions

describe what will probably happen instead of what actually did happen, and they are often given in the format of a graph, table, or formula.

qualitative variable

describes an individual by placing the individual into a category or group

quartiles

describe the distribution further. Q1: 1/4 of the data lie at or below it. Q3: 3/4 of the data lie at or below it.

density curve

describes the overall pattern of a distribution, area = 1

standard deviation

describes the variation around the mean

Parameter

descriptive measure for a population

confidence level, degree of confidence, or the confidence coefficient.

is the probability 1 - α (often expressed as the equivalent percentage value) that the confidence interval actually does contain the population parameter, assuming that the estimation process is repeated a large number of times.

significance level (denoted by α)

is the probability that the test statistic will fall in the critical region when the null hypothesis is actually true. This is the same α introduced in Section 7-2. Common choices for α are 0.05, 0.01, and 0.10.

replication

is the repetition of an experiment using a large group of subjects.(1.3)

Statistics

is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions.(1.1)

critical region (or rejection region)

is the set of all values of the test statistic that cause us to reject the null hypothesis.

cumulative frequency

is the sum of the frequency for that class and all previous classes. The cumulative frequency of the last class is equal to the sample size n.(2.1)

midpoint

is the sum of the lower and upper limits of the class divided by two. Sometimes called the class mark and is calculated [(lower class limit)+(upper class limit)]/2. section (2.1)

simulation

is the use of a mathematical or physical model to reproduce the conditions of a situation or process.(1.3)

Classical (or theoretical) Probability

is used when each outcome in a sample space is equally likely to occur

standard deviation

is zero when there is no spread and gets larger as the spread increases

Experimental Units

items or individuals on which the experiment is performed (subjects)

Normally Distributed Variable

its distribution has the shape of a normal curve

Exponential Distribution: k-th moment E[X^k] =

k! / λ^k

Independent

knowing that one event occurs does not change the probability that the other occurs

Exponential Distribution: Mx(t) =

λ / (λ - t), for t < λ

Exponential Distribution with mean (1/λ), f(x) =

λe^(-λx), for x ≥ 0

Poisson Distribution: E[X] & Var[X] =

λ

Ratio Level

like interval, but now there is a natural zero; ratios make sense

Interval Level

like ordinal, but differences between values make sense; data does not have a natural zero or starting point

probability model

lists the possible outcomes of a probability experiment and each outcome's probability

Outliers

lower limit = Q1 - 1.5 x IQR; upper limit = Q3 + 1.5 x IQR. If a data value falls below the lower limit or above the upper limit, it is an outlier.
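
The outlier rule can be sketched as two small functions. The data set and its quartiles below are made up for illustration, and the quartiles are passed in directly because different textbooks and software compute Q1 and Q3 slightly differently; the function names are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return (lower limit, upper limit) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Values strictly below the lower limit or above the upper limit."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

# Made-up data; Q1 and Q3 taken as the medians of the lower and upper halves.
data = [5, 10, 12, 13, 15, 16, 18, 40]
q1, q3 = 11, 17

fences = outlier_fences(q1, q3)
outliers = find_outliers(data, q1, q3)
print(fences, outliers)
```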

box plots

based on the five-number summary; the modified version shows outliers

Representative Sample

matches the characteristics of the population well

lurking variables

may explain the relationship between explanatory and response variables.

voluntary response sampling

may lead to a large amount of bias

Normal Distribution: mean, median, mode =

mean (µ)

Standard Normal Distribution

mean = 0, standard deviation = 1

mean = np

mean of binomial

Standard Normal Distribution has mean and variance of:

mean: 0 and Variance: 1

U, "or," Union

means to add

upside down U, "and," intersection

means to multiply

Median

measure not impacted by extremes

p-value

measure of how rare the sample results would be if Ho were true

time series

measurements of a variable taken at regular intervals over time

correlation

measures the direction and strength of the linear relationship between two quantitative variables.

Z score

measures the number of standard deviations a data value is from the mean

correlation

measuring the strength and direction of the relationship between two numerical variables

2nd Quartile

median of the entire data set

3rd Quartile

median of the portion of the entire data set that lies at or above the median of the entire data set

1st Quartile

median of the portion of the entire data set that lies at or below the median of the entire data set

five number summary

median, quartiles, and the min and max values

stratified sample

members of the population are divided into two or more subsets, called strata, that share a similar characteristic such as age, gender, ethnicity or even political preference. A sample is then randomly selected from each of the strata and ensures that each segment of the population is represented.(1.3)

median

midpoint of the data

Combinations Rule Formula

nCr = n! / ( (n - r)! * r!)

Permutations Rule Formula (when items are all different)

nPr = n! / (n - r)!
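
The two counting formulas can be written directly from their factorial definitions and compared with Python's built-in `math.comb` and `math.perm`. This is a small sketch; the helper names `n_c_r` and `n_p_r` are made up for illustration.

```python
from math import comb, factorial, perm

def n_c_r(n, r):
    """Combinations: order does not matter."""
    return factorial(n) // (factorial(n - r) * factorial(r))

def n_p_r(n, r):
    """Permutations of distinct items: order matters."""
    return factorial(n) // factorial(n - r)

# Choosing 2 items out of 5
print(n_c_r(5, 2), comb(5, 2))   # both give the number of combinations
print(n_p_r(5, 2), perm(5, 2))   # both give the number of permutations
```

Each combination of r items can be ordered in r! ways, which is why nPr = nCr x r!.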

double-blind experiment

neither the subject nor the experimenter knows if the subject is receiving a treatment or a placebo. The experimenter is informed after all the data have been collected. This type of experimental design is preferred by researchers.(1.3)

No Mode

no data value is repeated

Qualitative Variable

non-numerically valued variable

Binomial Distribution: E[X] =

np (mean of the binomial distribution)

Binomial Distribution: Var [X] =

npq, where q = 1 - p

systematic sampling

number all members of the population then from a random starting point, select every kth member

n

number of equally likely outcomes

mortality

number of infants, for every 1000 born who die before the age of 1

Discrete data

number of possible values are finite or countable; gaps between possible values

Z-Score

number of standard deviations an observation is away from the mean

combinations

number of ways to combine items in which order doesn't matter

Percentile formula

(number of values less than x / total number of values) * 100
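
The percentile formula translates into a single counting step. This is a sketch with a made-up data set; the function name `percentile_of` is hypothetical.

```python
def percentile_of(x, data):
    """Percentile of x: share of values strictly less than x, as a percent."""
    below = sum(1 for v in data if v < x)
    return 100 * below / len(data)

scores = [10, 20, 30, 40, 50]   # made-up data
print(percentile_of(30, scores))
```

Two of the five values are less than 30, so 30 sits at the 40th percentile of this data set.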

parameters

numbers that describe a population

statistics

numbers which measure factors of life in a given country

linear correlation coefficient r

numerical measure of the strength of the relationship between two variables representing quantitative data.

parameter

numerical measure that describes an aspect of a population

statistic

numerical measure that describes an aspect of a sample

quantitative variable

numerical values for which arithmetic operations such as adding and averaging make sense

Quantitative Data

numerical variable for which it makes sense to do arithmetic operations; measurements

Quantitative Variable

numerically valued variable

Individual

object described by a set of data. Individuals may be people, but they may also be animals or things

observational study

observations and measurements of individuals are conducted in a way that doesn't change the response or the variable being measured

outliers

observations that lie outside the overall pattern of the distribution

Observational Study

observes individuals and measures variables of interest but does not attempt to influence the responses. Purpose of study is to describe some group or situation

Census

obtain info on the entire population of interest

placebo effect

occurs when a subject reacts favorably to a placebo when in fact, he or she has been given no medicated treatment at all.(1.3)

placebo effect

occurs when a subject receives no treatment but (incorrectly) believes he is receiving treatment and responds favorably

Gambler's fallacy, or "law of averages"

psychological prejudice that assumes observations will behave as expected much sooner than necessary; in other words, thinking an event is "due" or "not due". Examples: playing a different lottery number than last week's winning number because the chances it would come up twice in a row are so small; building your home in the exact spot that a meteor struck, reasoning it would be almost impossible for a meteor to strike in the same place twice; a man brings a bomb on a plane, reasoning "the chances of there being a bomb on a plane are so small, so the chances of there being another one are almost zero".

Confidence Interval for Estimating a Population Proportion p notation

p̂ - E < p < p̂ + E, or p̂ +/- E, or (p̂ - E, p̂ + E)

Confidence Interval for Estimating a Population Proportion p

p̂ - E < p < p̂ + E where E = z(α/2) * ( [p̂ * q̂] / n )^(1/2)
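
The interval can be computed step by step. The sketch below is illustrative: the sample counts are made up, the function name is hypothetical, and z(α/2) is obtained from the standard normal inverse CDF in Python's `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Confidence interval for a population proportion p."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)     # critical value z(alpha/2)
    margin = z * sqrt(p_hat * q_hat / n)        # margin of error E
    return p_hat - margin, p_hat + margin

# Made-up sample: 520 successes in 1000 trials
low, high = proportion_ci(successes=520, n=1000)
print(round(low, 3), round(high, 3))
```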

Geometric Distribution: p(x) =

q^x *p

Discrete Variable

quantitative variable whose possible values form a finite (or countable infinite) set of numbers

Continuous Variable

quantitative variable whose possible values form some interval of numbers

bar graph

quickly compares data in column form, the heights can also show percents

Negative Binomial Distribution E[X] =

r(1-p) / p

Negative Binomial Distribution VAR[X] =

r(1-p) / p^2

coefficient of determination

r^2

Discrete Random Variable

random variable whose possible values form a finite or countably infinite set of numbers

Completely Randomized

randomly place subjects of sample into all experimental groups

randInt(min,max,num)

randomly selects num integers from min to max

confidence interval (or interval estimate)

range (or an interval) of values used to estimate the true value of a population parameter. A confidence interval is sometimes abbreviated as CI.

control group

receives a dummy treatment, enabling the researchers to control for the placebo effect; used to account for the influence of other known or unknown variables that might be an underlying cause of a change in response in the experimental group

replication

reduces the possibility that the differences in pain relief for the two groups occurred by chance alone

Subjective Probability

results from intuition, educated guesses, and estimates

undercoverage

results when population members are omitted from the sample frame

Skewed Distributions

reverse-j, j-shaped, right-skewed, left-skewed

if X is a random variable and a and b are fixed numbers, µ(a+bX) = a + bµ(X)

rule 1 for means

if X and Y are random variables, µ(X+Y) = µ(X) + µ(Y)

rule 2 for means

Range Rule of Thumb for Estimating a Value of the Standard Deviation s

s approx = range/4

statistical significance

said to exist when the probability that the observed findings are due to chance is very low

Q2 (Second Quartile)

same as the median; separates the bottom 50% of sorted values from the top 50%

symmetric

same on both sides

S

sample space

Representative Sample

sample that reflects as closely as possible the relevant characteristics of the population under consideration

Simple Random Sampling

sampling procedure for which each possible sample of a given size is equally likely to be the one obtained

residual plot

scatterplot of the (x, y) values after each of the y-coordinate values has been replaced by the residual value y - ŷ (where ŷ denotes the predicted value of y). That is, a residual plot is a graph of the points (x, y - ŷ).

Hypothesis Test for Two Means: Independent Samples with σ(1) and σ(2) Both Known

see page 479

Confidence Intervals for Matched Pairs

see page 488

linear correlation coefficient formula

see page 520

Q1 (First Quartile)

separates the bottom 25% of sorted values from the top 75%

Q3 (Third Quartile)

separates the bottom 75% of sorted values from the top 25%.

pie chart

shows us the percents or count in relationship to a whole

point estimate

single value (or point) used to approximate a population parameter.

event

specified result that may or may not occur when an experiment is performed

variance

standard deviation squared, a measure of spread

Weighted Mean formula

x bar = ∑ (w • x) / ∑w

explanatory variable

x variable. explains or causes changes in the y variables

For Discrete Random Variable E[X] =

x1*p(x1) + x2*p(x2) + .......

µ+zσ

x=

Confidence Interval for Estimating a Population Mean (with σ Known)

xbar - E < μ < xbar + E or xbar +/- E or (xbar - E, xbar + E) where E = z(α/2) * ( σ / (n)^(1/2) )
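
A short sketch of the same interval in code, assuming σ is known and using made-up sample values; the function name is hypothetical and the critical value comes from the standard normal inverse CDF.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_sigma_known(xbar, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z(alpha/2)
    margin = z * sigma / sqrt(n)              # margin of error E
    return xbar - margin, xbar + margin

# Made-up sample: xbar = 50, sigma = 10, n = 100
low, high = mean_ci_sigma_known(xbar=50.0, sigma=10.0, n=100)
print(round(low, 2), round(high, 2))
```

Because E shrinks with the square root of n, quadrupling the sample size halves the width of the interval.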

confidence interval limits

xbar - E, xbar + E

Mean from a Frequency Distribution

xbar = (∑(f * x)) / ∑f

response variable

y variable. measures an outcome of a study

Regression Equation

yhat = b(0) + b(1) * x

Test Statistic for Two Proportions

z = [ ( phat(1) - phat(2) ) - ( p(1) - p(2) ) ] / ( phat * qhat / n(1) + phat * qhat / n(2) )^(1/2), where phat is the pooled sample proportion and qhat = 1 - phat

Test statistic for proportion

z = (phat - p) / (p * q / n)^(1/2)
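
Both proportion test statistics can be sketched as small functions. The function names and sample counts are made up for illustration, and the two-sample version assumes the usual null hypothesis p(1) - p(2) = 0 with a pooled sample proportion.

```python
from math import sqrt

def one_proportion_z(p_hat, p0, n):
    """z for testing a claim about one proportion (p0 is the claimed value)."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def two_proportion_z(x1, n1, x2, n2):
    """z for comparing two proportions, assuming the null p1 - p2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)            # pooled sample proportion
    se = sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
    return (p1 - p2) / se

z_one = one_proportion_z(p_hat=0.56, p0=0.5, n=100)
z_two = two_proportion_z(x1=60, n1=100, x2=50, n2=100)
print(round(z_one, 2), round(z_two, 2))
```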

z score formula

z = (x - xbar)/standard deviation

Conversion Formula

z = (x - μ) / σ
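
The conversion works in both directions: from a value to its z-score, and from a z-score back to a value. This is a tiny sketch with made-up numbers and hypothetical function names.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

def from_z(z, mu, sigma):
    """Inverse conversion: recover x from its z-score."""
    return mu + z * sigma

# Made-up example: mean 100, standard deviation 15
z = z_score(130, mu=100, sigma=15)
x = from_z(z, mu=100, sigma=15)
print(z, x)
```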

Test Statistic for Testing a Claim About a Mean (with σ Known)

z = (xbar - μ(xbar)) / (σ / (n)^(1/2))

Test statistic for mean

z = (xbar - μ) / (σ / (n)^(1/2)) or t = (xbar - μ) / (s / (n)^(1/2))

Standardized Variable

z-value

(x-µ)/σ

z=

Binomial Distribution: Mean

µ = n • p

Mean of a Probability Distribution

µ = ∑[x • P(x)]

mean of the sample means formula

µ(xbar) = µ

mean of a discrete random variable

µ-sub-x = ∑ [x ∗ P(x)]

Binomial Distribution: Standard Deviation

σ = (n • p • q)^(1/2)

Standard Deviation of a Probability Distribution

σ = (∑[x^2 • P(x)] - µ^2)^(1/2)

standard deviation of a discrete random variable

σ-sub-x = √σ²-sub-x

Binomial Distribution: Variance

σ^2 = n • p • q
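
The binomial mean and variance formulas can be checked against the general probability-distribution definitions. This is a minimal sketch with assumed values n = 10 and p = 0.3; it builds the binomial probabilities from the combinations formula and then applies µ = ∑[x • P(x)] and σ^2 = ∑[(x - µ)^2 • P(x)].

```python
from math import comb, sqrt

def binomial_pmf(n, p):
    """List of P(X = k) for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

n, p = 10, 0.3            # assumed values for illustration
q = 1 - p
pmf = binomial_pmf(n, p)

mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(round(mean, 4), n * p)                  # definition vs. n*p
print(round(variance, 4), n * p * q)          # definition vs. n*p*q
print(round(sqrt(variance), 4))               # standard deviation
```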

Variance (shortcut) of a Probability Distribution

σ^2 = ∑ [x^2 • P(x)] - µ^2

Variance of a Probability Distribution

σ^2 = ∑[(x - µ)^2 • P(x)]

variance of a discrete random variable

σ²-sub-x = ∑ [x² P(x)] -µ²-sub-x

Population Standard Deviation

( ∑ (x - µ)^2 / N )^(1/2)

(Mean of all values) µ

(∑ x)/N

(Mean) x bar

(∑ x)/n

