Stats 1430 Midterm Rumsey

Is X related to Y? - Method 1

Compare the conditional distributions. If they are the same or close, there is no relationship. If they are different, there is a relationship; use percentages to explain it.
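A minimal sketch of Method 1, assuming a made-up two-way table of counts (the class-year groups, the yes/no answers, and all numbers are hypothetical). Each row is converted to percentages so the rows can be compared side by side:

```python
# Hypothetical two-way table of counts: class year (rows) by survey answer (columns).
table = {
    "freshman": {"yes": 30, "no": 70},
    "senior": {"yes": 60, "no": 40},
}

def conditional_distribution(row):
    """Turn one row of counts into proportions that sum to 1."""
    total = sum(row.values())
    return {answer: count / total for answer, count in row.items()}

# One conditional distribution per group.
conditionals = {group: conditional_distribution(row) for group, row in table.items()}

for group, dist in conditionals.items():
    print(group, dist)
```

Here 30% of the hypothetical freshmen and 60% of the hypothetical seniors answer yes, so the conditional distributions differ and the two variables look related.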

process for stratified random sample

Divide the population into subgroups (strata) of interest. Choose a simple random sample (usually the same size) from each subgroup.
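The two steps can be sketched in Python as follows. The population, the stratum names, and the helper `stratified_sample` are all hypothetical; the only library call used is `random.Random.sample`, which draws without replacement:

```python
import random

# Hypothetical population, already divided into strata of interest.
strata = {
    "freshman": [f"F{i}" for i in range(200)],
    "sophomore": [f"S{i}" for i in range(150)],
    "junior": [f"J{i}" for i in range(100)],
}

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of the same size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

sample = stratified_sample(strata, per_stratum=10, seed=1430)

for name, chosen in sample.items():
    print(name, len(chosen))
```

Every stratum contributes the same number of individuals even though the strata differ in size, which is what lets the subgroups be compared equally.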

random sample

Every group of the same size has the same chance of being selected as the sample. Allow no favoritism (bias) by the sampler or the sampled.

complement rule simplified

Everyone who doesn't have characteristic A

T or F: if r = 0, there is no relationship?

False; there's just no linear relationship.
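A small illustration, using made-up data in which y is completely determined by x (y = x^2) and yet r comes out as zero. The helper `correlation` is a hypothetical name and simply applies the usual formula for r:

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r computed from the definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a perfect curved (not linear) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(r)
```

The positive and negative halves of the parabola cancel in the numerator, so r is zero even though knowing x tells you y exactly.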

Confidentiality -

I can track you, but I won't.

Anonymity -

I can't track you.

When finding the correlation between two quantitative variables, you will get the same answer if you switch X and Y. Explain briefly.

If you switch the X's and Y's throughout the entire formula, you get the same answer, by the commutative property of multiplication.
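A quick check of this on made-up numbers. The data and the helper name `correlation` are assumptions for illustration; the function applies the standard formula, in which x and y appear only as a product and as a symmetric pair of sums of squares:

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r computed from the definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Each term multiplies an x-deviation by a y-deviation, so order does not matter.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data.
hours = [1, 2, 3, 4, 5, 6]
score = [52, 60, 57, 71, 74, 80]

r_xy = correlation(hours, score)
r_yx = correlation(score, hours)
print(r_xy, r_yx)
```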

Two questions that determine whether the y-intercept can be interpreted (if yes to both, the y-intercept can be interpreted)

Is there data in the area near x = 0? Does the number make sense?

Correlation is affected by outliers. Explain why, briefly

Looking at the formula for r, correlation is based on the mean of X, the mean of Y, the SD of X, and the SD of Y. All four of these are affected by outliers.
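To see the effect, the sketch below writes r in terms of exactly those four quantities (the two means and the two SDs) and then adds one outlier to a made-up data set. The data and the helper name `correlation` are hypothetical:

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r as the sum of products of z-scores, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data lying exactly on a line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_clean = correlation(xs, ys)

# The same data plus a single outlier at (6, 0).
r_with_outlier = correlation(xs + [6], ys + [0])

print(r_clean, r_with_outlier)
```

One added point moves both means and both SDs, and r drops from essentially 1 to roughly 0.14.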

Marginal distribution =

Looks at one variable at a time (out of grand table)

a good experiment (3 points)...

Makes comparisons; avoids bias; has enough data

histogram

Nice way to see the overall shape of a data set and see patterns. Shows the data broken down into small groups, but quartiles are hard to identify. Gives only a rough idea of center or variability. Hard to compare data sets in detail; good at the big picture.

If line fits well residuals should have:

No pattern: the residuals should scatter randomly about the regression line. No systematic change as X increases (a violation would be Y values fanning out as X increases). No unusually large residuals (outliers in the Y direction). No influential points (outliers in the X direction).

Observational Studies

Observes individuals. Measures variables and makes conclusions and comparisons. Does not attempt to influence the responses.

joint (and) distribution

Overall percentage in each cell (out of the grand total). Sums to one.

A and B are disjoint if

P(A and B) = 0
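As an illustration, assume one roll of a fair six-sided die (the events A, B, and C and the helper `prob` are made up for this sketch). Disjoint events share no outcomes, so the "and" probability is zero:

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (an assumed example).
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

A = {1, 2}      # roll a 1 or 2
B = {5, 6}      # roll a 5 or 6
C = {2, 4, 6}   # roll an even number

p_a_and_b = prob(A & B)  # A and B share no outcomes: disjoint
p_a_and_c = prob(A & C)  # A and C share the outcome 2: not disjoint

print(p_a_and_b, p_a_and_c)
```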

stratified random sample

Purpose: Compare subgroups of the population equally

simple random sample

Purpose: Examine the entire population as it exists

IQR=

Q3-Q1

4 steps to a good survey

Select a good sample. Design a survey that avoids bias. Implement your survey to avoid bias. Analyze your data properly.

Biased Sample: Volunteer (aka) & issue

Self-selected: a call goes out and people enter the study on their own. Issue: no sampling procedure is used. The sample won't represent any population; you usually get mostly strong opinions.

boxplots

Shows skewed vs. symmetric shapes. Limitation: does not show what type of symmetric shape. Easy to determine center and variability. Good for skewed data sets. Easy to see quartiles, but can't see any other breakdown. Easy to compare data sets.

Most common observational study:

Surveys

bias

Systematic favoritism in one direction or the other

conditional distribution=

Take 1 value of 1 variable and break into groups by the other variable

complement rule

The complement of A is the set of all outcomes in the sample space S that are not included in A.
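A short sketch of this definition with Python sets, assuming one roll of a fair six-sided die as the sample space (the event A and the helper `prob` are hypothetical). It also checks the standard consequence P(not A) = 1 - P(A):

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (an assumed example).
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

A = {6}                 # roll a six
A_complement = S - A    # every outcome in S that is not in A

p_a = prob(A)
p_not_a = prob(A_complement)

print(sorted(A_complement), p_a, p_not_a)
```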

Differences in the response must be due to

The treatment or random chance

Simpson's paradox

When you look at two variables you get one relationship, but adding a third variable reverses the relationship.
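A numeric sketch of the reversal, using entirely made-up counts: two treatments (A and B), a success/failure outcome, and case difficulty as the third variable. None of these numbers come from a real study:

```python
# Hypothetical (successes, total) counts for two treatments, split by case difficulty.
counts = {
    "easy": {"A": (90, 100), "B": (800, 1000)},
    "hard": {"A": (300, 1000), "B": (20, 100)},
}

def rate(successes, total):
    return successes / total

# Success rate within each level of the third variable.
within = {
    level: {t: rate(*cell) for t, cell in row.items()}
    for level, row in counts.items()
}

# Success rate ignoring the third variable (pool the counts).
overall = {}
for t in ("A", "B"):
    successes = sum(counts[level][t][0] for level in counts)
    total = sum(counts[level][t][1] for level in counts)
    overall[t] = rate(successes, total)

print(within)
print(overall)
```

Treatment A has the higher success rate among easy cases (90% vs. 80%) and among hard cases (30% vs. 20%), yet B looks better overall (about 75% vs. 35%) because B was given mostly easy cases.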

what variable should be on the x-axis of your scatterplot

variable you're using for prediction

Statistical significance:

when a result is too large to be due to chance (in our opinion)

Experiments

In an experiment the researcher actually gets involved: gives treatments, has controls, and sees the results (at the end you can try to figure out why you got what you got).

confounding variables (aka)

lurking; variables operating in the background that can influence the results and that you didn't take into account

Is correlation affected by outliers and skewness?

yes

It is possible for the first and second quartiles of a data set to be the same

yes

What happens to standard deviation if we add the same number to all values in the data set?

stays the same

standard deviation can equal zero?

true

Direction

uphill or downhill from left to right

Frequencies:

# in each category

Relative frequencies:

% in each category

slope=

change in y per one-unit change in x (x increases by 1)

correlation can be _____ <=r<= ____

-1,1

If all the residuals from a regression line are equal to zero, what is R^2

1
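A minimal check of this, assuming made-up points that lie exactly on the line y = 1 + 2x and using the common definition R^2 = 1 - SSE/SST (sum of squared residuals over total sum of squares about the mean of y). The variable names are hypothetical:

```python
# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed y minus predicted y.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

sse = sum(e ** 2 for e in residuals)        # variation left unexplained by the line
sst = sum((y - mean_y) ** 2 for y in ys)    # total variation in y
r_squared = 1 - sse / sst

print(residuals, r_squared)
```

With every residual equal to zero, SSE is zero and nothing is left unexplained, so R^2 equals 1.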

What happens to standard deviation if you multiply all values by 10? New SD=_________

10 x Old SD
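Both standard deviation rules in this set (adding a constant leaves the SD unchanged; multiplying by 10 multiplies the SD by 10) can be checked on made-up data with `statistics.stdev`. The data values and the shift of 100 are arbitrary choices for illustration:

```python
from statistics import stdev

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

shifted = [x + 100 for x in data]   # add the same number to every value
scaled = [x * 10 for x in data]     # multiply every value by 10

sd = stdev(data)
sd_shifted = stdev(shifted)
sd_scaled = stdev(scaled)

print(sd, sd_shifted, sd_scaled)
```

Shifting moves every value and the mean by the same amount, so the distances from the mean do not change; scaling stretches every distance by the same factor.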

For a correlation __ quantitative variables are needed

2

spot/avoid biased sample, Big Ideas - 2 criteria

A sampling procedure must be used. The sample must represent the entire population (truly random!)

Biased Sample: Undercoverage & issue

A subgroup of the population is excluded from the very beginning. Issue: a sampling procedure is used, but the sample can only represent the remaining population without the subgroup.

complement rule notation

A^c or A'

Implementation: Response Bias

An individual in the sample responds but doesn't give the correct data

implementation: nonresponse

An individual is selected to be in the sample but doesn't respond to the survey

interpretation of correlation

strength + direction of linear relationship between x and y

biased sample: convenience & issue with it

Choose individuals in the easiest way. Issue: a sampling procedure is used (technically), but the sample won't represent any population.

Is X related to Y? - Method 2

Compare conditional distribution to marginal (overall) distribution.

Data Distribution -

all possible values and how often they occur (it's a list showing how data is distributed)

How to avoid response bias

anonymity and confidentiality

Quantitative Data-

counts and measurements

2 things to interpret scatterplot

direction & strength

Independent Variable (aka)

factor; the variable that you're changing to see what happens as a result (the variable being compared)

Control Group-

fake or no treatment or existing treatment (give what is the current solution)

T or F: R squared gives you the percentage of points on the line?

false

T or F, correlation has units?

false

If A and B are disjoint, can they happen at the same time?

no

Categorical Data-

groups (ex: gender)

strength

how close to the line points are

Treatment Group (s)-

individual groups on which each treatment is imposed

if you switch x and y what happens to r

it doesn't change

Suppose everyone at Bob's restaurant gets a $5.00 raise per hour to their existing wages. How does this raise affect the Interquartile Range of the salaries?

it will stay the same
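A quick check with made-up hourly wages (the values below are hypothetical). Quartiles come from `statistics.quantiles` with its default method; textbooks use slightly different quartile conventions, but the conclusion does not depend on which one is used:

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range: Q3 minus Q1."""
    q1, _q2, q3 = quantiles(values, n=4)
    return q3 - q1

# Hypothetical hourly wages before the raise.
wages = [10, 12, 13, 15, 18, 20, 25]
raised = [w + 5 for w in wages]   # everyone gets $5.00 more per hour

iqr_before = iqr(wages)
iqr_after = iqr(raised)

print(iqr_before, iqr_after)
```

Q1 and Q3 both go up by $5.00, so their difference is unchanged.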

"and" probability is also known as?

joint

if the _____ changes, the standard deviation changes

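mean (the SD is measured from the mean; note that adding the same number to every value moves the mean but leaves the SD the same)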

standard deviation is never _____

negative

Can you see the mean on a boxplot?

no

Can you tell what the sample size is from a boxplot?

no

in general, can you recreate the original data values from its histogram?

no

Don't be fooled by a high _________ of respondents. Look for a high __________ of respondents. This is called the ______ _____

number; percentage; response rate

residual=

observed y − predicted y (y − ŷ)
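As a worked example, assume a regression line with intercept 1 and slope 2 and one observed point (3, 8.5); the line, the point, and the helper `predict` are all made up for illustration:

```python
# Hypothetical regression line: predicted y = 1 + 2x.
intercept = 1.0
slope = 2.0

def predict(x):
    """Predicted y (y hat) from the assumed regression line."""
    return intercept + slope * x

# One hypothetical observed data point.
x_obs, y_obs = 3, 8.5

y_hat = predict(x_obs)
residual = y_obs - y_hat   # observed y minus predicted y

print(y_hat, residual)
```

A positive residual means the observed point sits above the line; a negative residual means it sits below.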

standard deviation is affected by ________ and _______

outliers; skewness

correlation

r

all good samples are _________

random

Dependent Variable (aka)

response; the variable that responds, that comes out of the experiment ("what happens?")

standard deviation has the _____ _____ as the original data

same units

A ________ is a set of all possible outcomes of some random process

sample space

Getting Good Survey Results - 2 challenges

select a good sample & collect good data

If there is a high concentration of data in the middle, the IQR is _______

small

