AP Statistics Chapters 4-6

Basic Principle for Experimental Design

1. Comparison - Use a design that compares two or more treatments. 2. Random Assignment - Use chance to assign experimental units. Create roughly equivalent groups of experimental units at the start of the experiment to balance the effects of other variables among the treatment groups. 3. Control - Keep other variables that might affect the response the same for all groups. (This is not the same as control group.) 4. Replication - Use enough experimental units in each group so the differences can be distinguished from chance.

Scope of Inference

1. Inferences about populations are possible when individuals are randomly selected. 2. Inferences about cause and effect are possible when individuals are randomly assigned to groups.

Criteria for establishing causation when we can't do an experiment.

1. The association is strong. 2. The association is consistent. 3. Larger values of the explanatory variable are associated with stronger responses. 4. The alleged cause precedes the effect in time. 5. The alleged cause is plausible.

Rules of Probability

1. The probability of any event must be between 0 and 1, inclusive. 0 ≤ P(E) ≤ 1. 2. The sum of the probabilities of all outcomes must equal 1. 3. If E and F are disjoint events, then P(E or F) = P(E) + P(F). If E and F are not disjoint events, then P(E or F) = P(E) + P(F) - P(E and F) 4. If E represents any event and Ec represents the complement of E, then P(Ec) = 1 - P(E) 5. If E and F are independent events, then P(E and F) = P(E)∗P(F)

simple random sample problem

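1. Label each individual with a distinct number from 1 to n and put the numbered slips in a hat. 2. Draw slips at random without putting any back, so no individual can be chosen twice, until the sample reaches the required size. When using random digits instead of a hat, skip repeated labels and any numbers that are not in the range 1 to n.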
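
The hat procedure above can be sketched in code. This is a minimal illustration, assuming a hypothetical population labeled 1 to 30 and a sample of size 5; the function name `choose_srs` and the seed are made up for the example.

```python
import random

def choose_srs(labels, n, seed=None):
    """Return n distinct labels so that every group of n is equally likely."""
    rng = random.Random(seed)  # the seed only makes the example repeatable
    # sample() draws without replacement, so repeats cannot occur.
    return rng.sample(list(labels), n)

labels = list(range(1, 31))  # hypothetical population labeled 1-30
sample = choose_srs(labels, 5, seed=1)
print(sorted(sample))
```

Drawing without replacement plays the role of "not putting any back" in the hat.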

Matched pair design

A common form of blocking for comparing just two treatments. In some matched pairs designs, each subject receives both treatments in a random order. In others, the subjects are matched in pairs as closely as possible, and each subject in a pair is randomly assigned to receive one of the treatments. Examples: twin studies; pretest vs. posttest comparisons on the same subjects.

Block

A group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments. When blocks are formed wisely, it is easier to find convincing evidence that one treatment is more effective than another. Blocks are to experiments what strata are to sampling.

Table of random digits (table d)

A long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these properties: • Each entry in the table is equally likely to be any of the 10 digits 0 through 9. • The entries are independent of each other. That is, knowledge of one part of the table gives no information about any other part.

For events A and B related to the same chance process

If A and B are independent and each has a nonzero probability, then they cannot be mutually exclusive.

law of large numbers

If we observe more and more repetitions of any chance process, the proportion of times that a specific outcome occurs approaches a single value.
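
A quick simulation can make the law of large numbers concrete. The sketch below assumes a fair coin and tracks the proportion of heads at a few hypothetical checkpoints; the function name and seed are illustrative only.

```python
import random

def running_proportions(n_flips, checkpoints, seed=0):
    """Flip a simulated fair coin and record the proportion of heads so far."""
    rng = random.Random(seed)
    heads = 0
    result = {}
    for i in range(1, n_flips + 1):
        if rng.random() < 0.5:  # treat this as "heads"
            heads += 1
        if i in checkpoints:
            result[i] = heads / i
    return result

props = running_proportions(100_000, {10, 100, 1_000, 10_000, 100_000})
for flips in sorted(props):
    print(flips, round(props[flips], 4))
```

The early proportions can wander, but the later ones should settle near 0.5.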

Spread of a Linear Transformation

σY = |b|σX

Population (chapter 4)

In a statistical study, the population is the entire group of individuals about which we want information.

Simple random sample (SRS)

The basic random sampling method. An SRS of size n is chosen in such a way that every group of n individuals in the population has an equal chance to be selected as the sample. We often choose an SRS by labeling the members of the population and using random digits to select the sample. Common ways to choose an SRS include drawing names out of a hat, using a technology random number generator, or using a table of random digits. You should be able to describe in detail how to choose an SRS using those methods. Example: the people whose names are randomly drawn from the hat form the simple random sample.

Bias

The design of a statistical study shows bias if it would consistently underestimate or consistently overestimate the value you want to know.

Sampling frame

The list from which a sample is actually chosen.

Wording of questions

The most important influence on the answers given to a survey. Confusing or leading questions can introduce strong bias, and changes in wording can greatly change a survey's outcome. Even the order in which questions are asked matters.

Nonsampling error

The most serious errors in most careful surveys are nonsampling errors. These have nothing to do with choosing a sample—they are present even in a census. Some common examples of nonsampling errors are nonresponse, response bias, and errors due to question wording.

Binomial Coefficient

The number of ways of arranging k successes among n observations
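
The count is given by the standard formula n! / (k!(n - k)!). A minimal sketch, using a hypothetical helper name `binomial_coefficient`:

```python
import math

def binomial_coefficient(n, k):
    """Number of ways to arrange k successes among n observations."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(binomial_coefficient(5, 2))  # 10
```

Python's built-in `math.comb(n, k)` returns the same value.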

Sample (chapter 4)

The part of the population from which we actually collect information; a subset of the individuals in the population. We use information from a sample to draw conclusions about the entire population.

Experimental units

The smallest collection of individuals to which treatments are applied.

Random sampling

The use of chance to select a sample; the central principle of statistical sampling. Letting chance do the sampling avoids bias in how the sample is selected.

Stratified random sample

To select a stratified random sample, first classify the population into groups of similar individuals, called strata. Then choose a separate SRS from each stratum to form the full sample. When the strata are chosen well, stratified random samples tend to give more precise estimates than simple random samples of the same size.
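
As a rough sketch of the two steps (form strata, then take a separate SRS from each), assuming four hypothetical strata of 100 students each and 10 students drawn per stratum:

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate SRS of n_per_stratum members from every stratum."""
    rng = random.Random(seed)
    chosen = {}
    for name, members in strata.items():
        chosen[name] = rng.sample(members, n_per_stratum)
    return chosen

# Hypothetical strata: four grade levels with 100 students each.
grades = ("freshman", "sophomore", "junior", "senior")
strata = {g: [f"{g}_{i}" for i in range(1, 101)] for g in grades}
chosen = stratified_sample(strata, 10, seed=2)
print({name: len(members) for name, members in chosen.items()})
```

Every stratum is guaranteed to be represented, which a single SRS of 40 students would not guarantee.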

symbol for intersection

∩ (means "and")

symbol for union

∪ (means "or")

Geometric Random Variable

Y, where Y = the number of trials required to obtain the first success in a geometric setting

probability model

a description of some chance process that consists of two parts: a sample space S and a probability for each outcome

probability (chapter 5)

a number between 0 and 1 that describes the proportion of times the outcome would occur in a very long series of repetitions

simulation

An imitation of chance behavior, based on a model that accurately reflects the situation. Follows a four-step process: State -- Ask a question of interest about some chance process. Plan -- Describe how to use a chance device to imitate one repetition of the process. Tell what you will record at the end of each repetition. Do -- Perform many repetitions of the simulation. Conclude -- Use the results of your simulation to answer the question of interest.
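
The four steps can be sketched for a made-up question: what is the probability of getting at least one six in four rolls of a fair die? The question, function names, and number of repetitions are assumptions chosen for illustration.

```python
import random

# State: how likely is at least one six in four rolls of a fair die?

def at_least_one_six(rng, rolls=4):
    # Plan: one repetition is four simulated rolls; record whether a six appears.
    return any(rng.randint(1, 6) == 6 for _ in range(rolls))

def simulate(repetitions, seed=0):
    # Do: perform many repetitions and count the successes.
    rng = random.Random(seed)
    hits = sum(at_least_one_six(rng) for _ in range(repetitions))
    return hits / repetitions

# Conclude: compare the estimate with the exact value 1 - (5/6)^4.
estimate = simulate(100_000)
exact = 1 - (5 / 6) ** 4
print(round(estimate, 3), round(exact, 3))
```

With many repetitions the simulated proportion should land close to the exact probability.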

event

Any collection of outcomes from some chance process; a subset of the sample space; usually designated by capital letters (ex. A, B, C, etc.). When all outcomes are equally likely, P(A) = (number of outcomes corresponding to event A)/(total number of outcomes in sample space).

basic probability rules

For any event A, 0 ≤ P(A) ≤ 1. If S is the sample space in a probability model, P(S) = 1. In the case of equally likely outcomes, P(A) = (number of outcomes corresponding to event A)/(total number of outcomes in sample space). Complement rule: P(A^C) = 1 − P(A). Addition rule for mutually exclusive events: If A and B are mutually exclusive, P(A or B) = P(A) + P(B).

Strata

Groups of individuals in a population that are similar in some way that might affect their responses. Ideally, individuals are similar within a stratum and different between strata. Ex. high school grade levels (freshmen, sophomores, juniors, seniors).

Margin of error (chapter 4)

A numerical estimate of how far the sample result is likely to be from the truth about the population due to sampling variability.

Convenience sample

A sample selected by taking the members of the population that are easiest to reach; particularly prone to large bias. A bad method of sampling.

systematic random sample

A sample where the items or people are selected according to a specific time or item interval.

Treatment

A specific condition applied to the individuals in an experiment. If an experiment has several explanatory variables, a treatment is a combination of specific values of these variables.

Level

A specific value of an explanatory variable (factor) in an experiment. For example, if we were studying the effects of advertising, an explanatory variable might be the length of a commercial. Commercials of 30, 45, and 60 seconds would make 3 levels of that one explanatory variable.

Census

A study that attempts to collect data from every individual in the population. Often costs too much and takes too long to be practical.

Sample survey

A study that uses an organized plan to choose a sample that represents some specific population. We base conclusions about the population on data from the sample. You must 1) say exactly what population you want to describe and 2) say exactly what you want to measure - give exact definitions of the variables.

Response bias

A systematic pattern of incorrect responses. Ex. people lie about age, income, etc., or misremember a number of hours. The gender, race, ethnicity, or behavior of the interviewer can also affect people's responses.

Single-blind

An experiment in which either the subjects or those who interact with them and measure the response variable, but not both, know which treatment a subject received.

Double-blind

An experiment in which neither the subjects nor those who interact with them and measure the response variable know which treatment a subject received.

Control group

An experimental group whose primary purpose is to provide a baseline for comparing the effects of the other treatments. Depending on the purpose of the experiment, a control group may be given a placebo or an active treatment. Related: placebo effect.

Replication

An important experimental design principle. Use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between the groups.

Random assignment

An important experimental design principle. Use some chance process to assign experimental units to treatments. This helps create roughly equivalent groups of experimental units at the start of the experiment.

Placebo

An inactive (fake) treatment.

Statistically significant

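An observed effect so large that it would rarely occur by chance. In a well-designed randomized experiment, a statistically significant difference is evidence that the treatments caused it; outside such an experiment, statistical significance alone does not imply causation.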

Experiment

Deliberately imposes some treatment on individuals to measure their responses. Sometimes, the explanatory variables in an experiment are called factors. Many experiments study the joint effects of several factors. In such an experiment, each treatment is formed by combining a specific value (often called a level) of each of the factors. Only a well-designed experiment can establish cause and effect. Example: a taste test.

Placebo effect

Describes the fact that some subjects respond favorably to any treatment, even an inactive one (placebo).

Subjects

Experimental units that are human beings.

difference between nonresponse and voluntary response

Students often misuse the term "voluntary response" to explain why certain individuals don't respond in a sample survey. Their idea is that participation in the survey is optional (voluntary), so anyone can refuse to take part. What the students are describing is nonresponse. Nonresponse can occur only after a sample has been selected. In a voluntary response sample, every individual has opted to take part, so there won't be any nonresponse.

Observational study

Observes individuals and measures variables of interest but does not attempt to influence the responses. Example: a survey.

Nonresponse

Occurs when a selected individual cannot be contacted or refuses to cooperate; an example of a nonsampling error.

Undercoverage

Occurs when some members of the population are left out of the sampling frame; a type of sampling error. Ex. opinion polls conducted by calling landlines miss households that have only cell phones as well as those without any phone

general multiplication rule (probability)

P(A and B) = P(A ∩ B) = P(A) * P(B|A), where P(B|A) is the conditional probability that event B occurs given that A has already occurred
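
A worked example may help. The sketch below assumes a standard 52-card deck and asks for the probability that two cards drawn without replacement are both aces; the variable names are illustrative.

```python
from fractions import Fraction

# A = first card is an ace, B = second card is an ace.
p_a = Fraction(4, 52)          # 4 aces among 52 cards
p_b_given_a = Fraction(3, 51)  # 3 aces left among the remaining 51 cards

p_a_and_b = p_a * p_b_given_a  # general multiplication rule
print(p_a_and_b)  # 1/221
```

The second factor is conditional because removing one ace changes what is left in the deck.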

general addition rule (probability)

P(A or B) = P(A) + P(B) - P(A and B) if A and B are any 2 events resulting from some chance process
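
The rule can be checked by counting outcomes directly. This sketch assumes a standard 52-card deck and two events that are not mutually exclusive: drawing a heart and drawing a face card (J, Q, K). The helper names are made up for the example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely (rank, suit) outcomes

def prob(event):
    """Probability of an event when every card is equally likely."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_heart(card):
    return card[1] == "hearts"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

direct = prob(lambda c: is_heart(c) or is_face(c))
by_rule = prob(is_heart) + prob(is_face) - prob(lambda c: is_heart(c) and is_face(c))
print(direct, by_rule)  # 11/26 11/26
```

Subtracting P(A and B) removes the three face cards of hearts that would otherwise be counted twice.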

addition rule of mutually exclusive events

P(A or B) = P(A) + P(B), if A and B are mutually exclusive

multiplication rule for independent events (probability)

If A and B are independent events, then the probability that A and B both occur is P(A ∩ B) = P(A) * P(B)

Complement rule

P(A^C) = 1 - P(A), where A^C means "not A"

Geometric Probability

P(Y=k)=(1-p)^(k-1)p
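
A short sketch of the formula, assuming a hypothetical success probability p = 0.2. It also checks numerically that the mean of Y is close to 1/p by summing k * P(Y = k) over the first 1000 values of k.

```python
def geometric_pmf(k, p):
    """P(Y = k): k - 1 failures followed by the first success on trial k."""
    return (1 - p) ** (k - 1) * p

p = 0.2  # assumed success probability for the example
p_third_trial = geometric_pmf(3, p)
print(p_third_trial)  # about 0.128 (0.8 * 0.8 * 0.2)

# Truncated sum for the mean; the terms beyond k = 1000 are negligible here.
approx_mean = sum(k * geometric_pmf(k, p) for k in range(1, 1001))
print(approx_mean)  # close to 1 / p = 5
```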

Voluntary response samples

People decide whether to join a sample based on an open invitation; particularly prone to large bias. Call-in polls and many Internet polls rely on voluntary response samples. People who choose to participate in such surveys are usually not representative of some larger population of interest. Voluntary response samples attract people who feel strongly about an issue and who often share the same opinion, so the results tend to underestimate or overestimate the truth. A bad method of sampling.

P(A^C)

Probability of NOT A within the sample space

Randomized block design

Start by forming blocks consisting of individuals that are similar in some way that is important to the response. The random assignment of experimental units to treatments is then carried out separately within each block. In summary: control what you can, block on what you can't control, and randomize to create comparable groups. Blocking reduces the variation in responses within each comparison.
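
A minimal sketch of random assignment carried out separately within each block, assuming two hypothetical blocks of 10 subjects and two treatments labeled "A" and "B":

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Randomly assign treatments within each block, in equal numbers."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # chance decides the order within this block
        for i, unit in enumerate(shuffled):
            # Deal the shuffled units out to the treatments in turn.
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks and treatments.
blocks = {
    "men": [f"man_{i}" for i in range(1, 11)],
    "women": [f"woman_{i}" for i in range(1, 11)],
}
assignment = randomized_block_design(blocks, ["A", "B"], seed=3)
for unit, (block, treatment) in sorted(assignment.items()):
    print(unit, block, treatment)
```

Because the shuffle happens inside each block, every block ends up with the same number of units on each treatment.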

Cluster sample

To take a cluster sample, first divide the population into smaller groups. Ideally, these clusters should mirror the characteristics of the population. Then choose an SRS of the clusters. All individuals in the chosen clusters are included in the sample. Cluster samples are often used for practical reasons: saving time and money. Cluster sampling works best when the clusters look just like the population but on a smaller scale. Cluster samples don't offer the statistical advantage of better information about the population that stratified random samples do, because clusters are often chosen for ease, so they may have as much variability as the population itself. Some people take an SRS from each chosen cluster rather than including all members of the cluster.

Inference about the population

Using information from a sample to draw conclusions about the larger population. Requires that the individuals taking part in a study be randomly selected from the population of interest, because random sampling tends to produce a sample that represents the whole population. We can make inferences about the population only when individuals were randomly selected.

Inference about cause and effect

Using the results of an experiment to conclude that the treatments caused the difference in responses. Requires a well-designed experiment in which the treatments are randomly assigned to the experimental units. The difference must be statistically significant, that is, too large to be explained by chance variation in the random assignment. Well-designed experiments randomly assign individuals to treatment groups. However, most experiments don't select experimental units at random from the larger population, which limits such experiments to inference about cause and effect. Observational studies don't randomly assign individuals to groups, which rules out inference about cause and effect. An observational study that uses random sampling can make an inference about the population. We can make inferences about cause and effect only when individuals were randomly assigned.

Completely randomized design

When the treatments are assigned to all the experimental units completely by chance.

Lack of realism

When the treatments, the subjects, or the environment of an experiment are not realistic. Lack of realism can limit researchers' ability to apply the conclusions of an experiment to the settings of greatest interest. Example: a treatment is tested on rats and assumed to have the same effect on humans.

Confounding

When two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other. To explain confounding, state how the other variable is associated with the explanatory variable and how it affects the response variable.

Adding/Subtracting Constants to a Random Variable

Adds/subtracts a to measures of center and location (mean, median, quartiles, percentiles). Doesn't change the shape or measures of spread (range, IQR, standard deviation).

Binomial Setting

Consists of a fixed number n of independent trials of the same chance process, each resulting in success or failure, with the same probability p of success on each trial

Geometric Setting

Consists of repeated independent trials of the same chance process in which the probability p of success is the same on each trial, and the goal is to count the number of trials it takes to get one success. Because we stop at the first success, there is no fixed number of trials n.

law of averages

Do not mistake this for the law of large numbers. It is the idea that possible outcomes balance out in the short run, e.g. that getting heads on a coin flip six times in a row must be followed by getting tails six times. MYTH.

Well designed experiment

Establishes internal validity, which is one of the most important validities to interrogate when you encounter causal claims. It tells us that changes in the explanatory variable cause changes in the response variable. More precisely, it tells us that this happened for specific individuals in the specific environment of this specific experiment. Well-designed experiments randomly assign individuals to treatment groups. However, most experiments don't select experimental units at random from the larger population, which limits such experiments to inference about cause and effect. Observational studies don't randomly assign individuals to groups, which rules out inference about cause and effect. An observational study that uses random sampling can make an inference about the population.

cluster

A group of individuals that are located near each other. Ideally, individuals are different within a cluster and the clusters are similar to one another.

Discrete Random Variable

Has a fixed set of possible values with gaps between them. Each probability has to be between 0 and 1, and the sum of the probabilities has to equal 1.

Normal Approximation (Large Counts condition)

If X is a count of successes having the binomial distribution with parameters n and p, then when n is large, X is approximately Normally distributed with mean np and standard deviation √(np(1-p)). Use the Normal approximation when np ≥ 10 and n(1-p) ≥ 10.
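
To see how close the approximation can be, the sketch below compares an exact binomial probability with its Normal approximation. The values n = 100, p = 0.5, and the cutoff 55 are assumptions picked for illustration, and no continuity correction is applied.

```python
import math
from statistics import NormalDist

def binomial_cdf(n, p, k):
    """Exact P(X <= k) for a binomial count X with parameters n and p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, p, k = 100, 0.5, 55  # assumed example values
large_counts_ok = n * p >= 10 and n * (1 - p) >= 10

# Normal model with mean np and standard deviation sqrt(np(1-p)).
normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
exact = binomial_cdf(n, p, k)
approx = normal.cdf(k)
print(large_counts_ok, round(exact, 4), round(approx, 4))
```

The two values should be in the same neighborhood; a continuity correction (using 55.5 as the cutoff) would typically narrow the gap.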

Linear Transformation

Involves adding or subtracting a constant, multiplying or dividing by a constant, or both. Y = a + bX is a linear transformation of the random variable X. The probability distribution of Y has the same shape as the probability distribution of X if b > 0. μY = a + bμX. σY = |b|σX (b could be a negative number). Linear transformations have similar effects on other measures of center or location (median, quartiles, percentiles) and spread (range, IQR). Whether we're dealing with data or random variables, the effects of a linear transformation are the same. Results apply to both discrete and continuous random variables.
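
The rules μY = a + bμX and σY = |b|σX can be checked numerically. The sketch assumes a small made-up discrete distribution for X and the hypothetical constants a = 10 and b = -2.

```python
import math

def mean_sd(values, probs):
    """Mean and standard deviation of a discrete random variable."""
    mu = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mu) ** 2 * p for v, p in zip(values, probs))
    return mu, math.sqrt(var)

x_values = [0, 1, 2]       # assumed values of X
probs = [0.25, 0.5, 0.25]  # assumed probabilities
a, b = 10, -2              # Y = a + bX with a negative b

y_values = [a + b * x for x in x_values]
mu_x, sd_x = mean_sd(x_values, probs)
mu_y, sd_y = mean_sd(y_values, probs)

print(mu_y, a + b * mu_x)   # both are the mean of Y
print(sd_y, abs(b) * sd_x)  # both are the standard deviation of Y
```

The negative b changes the sign inside the mean but not the spread, which is why the absolute value appears in σY = |b|σX.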

Binomial Distribution

The probability distribution of a binomial random variable X. It has parameters n and p, where n is the number of trials of the chance process and p is the probability of a success on any one trial. The possible values of X are the whole numbers from 0 to n. To identify a binomial distribution, state n and p.

Geometric Distribution

The probability distribution of a geometric random variable Y, with parameter p, the probability of a success on any trial

Independent Random Variables

knowing the value of one variable tells you nothing about the other

Multiplying/Dividing Constants to a Random Variable

Multiplies/divides measures of center and location (mean, median, quartiles, percentiles) by b. Multiplies/divides measures of spread (range, IQR, standard deviation) by |b|. Doesn't change the shape of the distribution when b > 0.

Binomial Probability

P(X = k) = (n choose k) p^k (1-p)^(n-k), where n = number of trials, p = probability of success, k = number of successes (or use binompdf)
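
A minimal sketch of the calculation behind binompdf, using the standard binomial formula P(X = k) = (n choose k) p^k (1-p)^(n-k). The function name `binomial_pmf` and the example values n = 5, p = 0.5, k = 2 are assumptions.

```python
import math

def binomial_pmf(n, p, k):
    """P(X = k) for a binomial count X with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

print(binomial_pmf(5, 0.5, 2))  # 0.3125, i.e. 10/32

# The probabilities over all possible values 0..n should add to 1.
total = sum(binomial_pmf(5, 0.5, k) for k in range(6))
print(total)
```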

Shape of a Linear Transformation

same as the probability distribution of X if b > 0

intersection

shows A and B

Continuous Random Variable

Takes all values in some interval of numbers, so it has infinitely many possible values. Its probability distribution is described by a density curve, and the probability of an event is the area under the curve (for a Normal distribution, use normcdf). All continuous probability models assign probability 0 to every individual outcome.

Random Variable

takes numerical values determined by the outcome of a chance process

Probability Distribution

Tells us what the possible values of X are and how probabilities are assigned to those values. Can be discrete or continuous.

Variance of a Random Variable

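The "average" squared deviation of the values of the variable from their mean. For a discrete random variable, σX^2 = Σ(xi - μX)^2 pi. Its square root, the standard deviation, gives the typical distance from the mean.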

Mean (Expected Value) of a Random Variable

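The balance point of the probability distribution (density curve or histogram). It is the long-run average value of X; for a discrete random variable, it is the average of the possible values of X weighted by their probabilities.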

10% Condition

The binomial distribution with n trials and probability p of success gives a good approximation to the count of successes in an SRS of size n from a large population containing proportion p of successes, as long as the sample size n is no more than 10% of the population size N. Sampling without replacement makes the trials dependent, but when n ≤ 0.10N the dependence is small enough to treat the trials as independent. Check this condition whenever a sample is drawn without replacement.
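
One way to see why the condition works is to compare the exact probability for sampling without replacement (a hypergeometric calculation, which is not named in these notes) with the binomial approximation. The population size, number of successes, and sample size below are assumed values.

```python
import math

def hypergeometric_pmf(N, K, n, k):
    """Exact P(k successes) in an SRS of size n from N individuals, K of them successes."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def binomial_pmf(n, p, k):
    """Binomial P(X = k), which treats the n draws as independent."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

N, K, n = 1000, 300, 50  # assumed population; n is 5% of N
condition_met = n <= 0.10 * N

exact = hypergeometric_pmf(N, K, n, 15)
approx = binomial_pmf(n, K / N, 15)
print(condition_met, round(exact, 4), round(approx, 4))
```

When the sample is a small fraction of the population, the two probabilities should be close; raising n toward N would make them drift apart.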

Binomial Random Variable

the count X of successes in a binomial setting

Mean of the Difference of Two Random Variables

the difference of their means

conditional probability

the probability that one event happens given that another event is already known to have happened; denoted by P(B|A)

sample space S

the set of all possible outcomes

Standard Deviation of random variable

The square root of the variance; measures the typical distance of the values in the distribution from the mean. For a discrete random variable, σX = √(Σ(xi - μX)^2 pi).

Mean of the Sum of Two Random Variables

The sum of their means: μ(X+Y) = μX + μY. The random variables do not have to be independent for means to add; independence is needed only to add variances.

The Variance of the Sum of Two Independent Random Variables

the sum of their variances

The Variance of the Difference of Two Independent Random Variables

The sum of their variances, not the difference: σ(X-Y)^2 = σX^2 + σY^2. Take the square root to find the standard deviation.
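
That variances add even for a difference can be verified exactly by enumeration. The sketch assumes two independent fair six-sided dice, X and Y, and uses exact fractions; the helper name `variance` is made up for the example.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p_pair = Fraction(1, 36)  # each (x, y) pair is equally likely for independent dice

def variance(values_with_probs):
    """Variance of a discrete random variable given (value, probability) pairs."""
    pairs = list(values_with_probs)
    mu = sum(v * pr for v, pr in pairs)
    return sum((v - mu) ** 2 * pr for v, pr in pairs)

var_die = variance((x, Fraction(1, 6)) for x in faces)
var_sum = variance((x + y, p_pair) for x, y in product(faces, faces))
var_diff = variance((x - y, p_pair) for x, y in product(faces, faces))
print(var_die, var_sum, var_diff)  # 35/12 35/6 35/6
```

Both the sum and the difference have variance 35/12 + 35/12 = 35/6, because subtracting Y adds its variability rather than cancelling it.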

mutually exclusive (disjoint)

Two events that have no outcomes in common and so can never occur together; P(A and B) = 0. Example: on a single coin flip, the events "heads" and "tails" are mutually exclusive, because the result can be one or the other but never both. Mutually exclusive events: 1) Have no outcomes in common 2) Cannot be independent (when each has a nonzero probability) 3) Cannot occur at the same time 4) Have an intersection that is the "empty set"

Large Counts Condition

Use the Normal approximation to a binomial distribution only when np ≥ 10 and n(1-p) ≥ 10

Mean of a Geometric Random Variable

μY = 1/p

Center of a Linear Transformation

μY = a + bμX

independent events (probability)

When the occurrence of one event does not change the probability that the other event will happen; that is, P(A|B) = P(A) and P(B|A) = P(B). Two mutually exclusive events with nonzero probabilities can never be independent, because if one event happens, the other event is guaranteed not to happen (for example, "male" and "pregnant"). Independent events: 1) Cannot be disjoint 2) Are such that the outcome of one event does not influence the outcome of the other

