PSYC 5405 Advanced Stats Midterm

Outliers and clusters are examples of _________ _______ to note when describing the scatter plot

Unusual Features

Attributes that may take different values for various individuals

Variables

When examining the relationship between ______, these steps should be taken: - Plot the data and examine any numerical summaries (five number summary, mean, standard deviation) - Describe the scatter plot

Variables

ŷ is the ___________ _____ of the response variable for a given value of the explanatory variable

predicted value

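- Arrange the observations in increasing order and locate the median - The first quartile is the median of the observations to the left of the median - The third quartile is the median of the observations to the right of the median These are the steps to take to calculate ___________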

quartiles
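A minimal Python sketch of the median-of-halves steps on this card. It assumes the overall median is left out of both halves when the number of observations is odd; the function name is illustrative.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    data = sorted(values)              # arrange in increasing order
    n = len(data)
    half = n // 2
    lower = data[:half]                # observations left of the median
    # For odd n, skip the middle observation itself.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)
```

For example, `quartiles([1, 2, 3, 4, 5, 6, 7])` gives a first quartile of 2, a median of 4, and a third quartile of 6.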

Measures the outcome of a study (dependent variable)

response variable

- ____________ observational studies examine existing data for a sample of individuals - ______________ observational studies track individuals into the future

retrospective, prospective

- Assign labels that place individuals into particular groups - Have NO order - Ex: Hair color, zip code, favorite song

Categorical

The median or mean (depending on distribution)

Center

In a roughly symmetric distribution, the mean and median are _______ __________

Close together

_________ _________ _________: Selects a sample by randomly choosing clusters and including each member of the selected clusters in the sample - A _______ is a group

Cluster random sample

- The __________ __ _________ (r²) measures the percent of the variability in the response variable that is accounted for by the least-squares regression line - It is the square of the correlation coefficient r, so it falls between 0 and 1 - We can find the linear regression line and the correlation coefficient by using LinReg on our calculator

Coefficient of determination
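A short sketch of how r and r² could be computed by hand instead of with LinReg, assuming two equal-length lists of numbers. The function names are illustrative, not from the course.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r between two quantitative variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def coefficient_of_determination(xs, ys):
    """r squared: share of the variability in y accounted for by the line."""
    return correlation(xs, ys) ** 2
```

With `xs = [1, 2, 3]` and `ys = [1, 3, 2]`, r is 0.5 and r² is 0.25, so the line accounts for 25% of the variability in y.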

- A _________ ______ of a variable describes the values of that variable among individuals who have a particular value of another variable - Ex: Conditional distribution by sport: Male baseball: 13/36, Female baseball: 23/36, and so on

Conditional Distribution
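A rough sketch of the calculation in the card's example, assuming the two-way table is stored as a nested dictionary. The baseball counts come from the card; the soccer counts and all names are made up for illustration.

```python
def conditional_distribution(table, column):
    """Distribution of the row variable among individuals in one column."""
    counts = {row: cells[column] for row, cells in table.items()}
    total = sum(counts.values())
    return {row: count / total for row, count in counts.items()}

# Baseball counts are from the card; the soccer counts are hypothetical.
table = {
    "male": {"baseball": 13, "soccer": 20},
    "female": {"baseball": 23, "soccer": 10},
}
by_baseball = conditional_distribution(table, "baseball")
```

Here `by_baseball["male"]` is 13/36 and `by_baseball["female"]` is 23/36, matching the card.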

What does the distribution represent?

Context

_______ _____: Selects individuals from the population who are easy to reach

Convenience sample

- Collect data from a representative sample (from the population of interest) - Perform data analysis, keeping probability in mind - Use the results to create inferences about the population How to go from ______ _______ to ____________

Data analysis, inference

- Direction: positive association, negative association, no association - Form: Linear or nonlinear - Strength: Weak, moderate, strong - Unusual Features: Outliers and clusters - Context of the problem These are the things to cover when you _______ ___ ______ _____

Describe the scatter plot

- tells us what values a variable takes and how frequently it takes these values - Ex: Histograms, box plots, dot plots, scatter plots, stem and leaf plots, and line graphs for quantitative data - Ex: Bar graphs, two-way tables, and pie charts for categorical data

Distribution

In the normal distribution with mean m and standard deviation s: - Approximately 68% of observations fall within 1s of m - Approximately 95% of observations fall within 2s of m - Approximately 99.7% of observations fall within 3s of m This is the ______ __________

Empirical rule
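The three percentages can be checked with Python's standard library. This is only a sketch; the function name `share_within` is an illustrative choice.

```python
from statistics import NormalDist

def share_within(k, mean=0.0, sd=1.0):
    """Share of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mean, sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The result does not depend on the mean or standard deviation chosen, which is why the rule applies to every normal curve.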

the median of the observations located to the right of the median in the list

Third quartile

T/F All normal curves are characterized by a bell shape, a single peak, and are symmetrical

True

T/F Correlation is NOT resistant to outliers

True

a _____-_____ _____ describes two categorical variables, organizing counts according to a row variable and a column variable

Two-way Table

________ ________ ______: Consists of people who choose themselves by responding to a general appeal - Often shows bias because people with strong opinions are more likely to respond

Voluntary response sample

Does adding or subtracting the same number n to each observation add or subtract n to the measures of center and location (mean, median, quartiles, percentiles)?

Yes

Does multiplying or dividing each observation by the same number n multiply or divide the measures of center and location by n?

Yes

Does multiplying or dividing each observation by the same number n multiply or divide the measures of spread by |n|?

Yes
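A small sketch to see these transformation rules on a toy data set. The data values and helper names are made up for illustration, and the population standard deviation (`pstdev`) is used only for convenience.

```python
from statistics import mean, median, pstdev

def summarize(values):
    """Center and spread measures for a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": pstdev(values),
        "range": max(values) - min(values),
    }

def shift(values, n):
    """Add n to every observation."""
    return [v + n for v in values]

def scale(values, n):
    """Multiply every observation by n."""
    return [v * n for v in values]

data = [2, 4, 4, 6]                      # hypothetical observations
original = summarize(data)
shifted = summarize(shift(data, 10))     # center moves by 10, spread unchanged
scaled = summarize(scale(data, -3))      # center times -3, spread times |-3|
```

Comparing `original`, `shifted`, and `scaled` shows the mean and median following n while the standard deviation and range follow |n|.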

A scatter plot that displays the residuals on the vertical axis and the explanatory variable on the horizontal axis - If there is no leftover pattern, the regression model is ___________ - If there is a leftover pattern in the residual plot, consider using a different form of _____ _______

appropriate, regression model

The design of a statistical study shows _____ if it is very likely to underestimate or overestimate the value you want to know

bias

two varieties of variables

categorical, quantitative

- Select the rows or columns of interest - Use the data from the table to calculate the conditional distribution of the rows or columns - Make a graph to display the conditional distribution (use a side-by-side bar graph or a segmented bar graph) These are the steps to take to examine or compare _____________ _______________

conditional distributions

the process of organizing, displaying, summarizing, and questioning data

data analysis

The ______ of a sample refers to the method used to choose the sample from the population

design

positive association, negative association, no association these are ways of _________ when describing the scatter plot

direction

For a linear association between two quantitative variables, the correlation (r) measures both the ________ and _______ __ ___ ____________

direction, strength of the association

An _________ deliberately imposes some treatment on individuals in order to observe their responses

experiment

Data always involves ________ and ________

individuals, variables

- Use the data from the table to calculate the marginal distribution of the row or column totals - Create a graph to display the marginal distribution These are the steps to take to examine a ___________ ____________

marginal distribution

a ____ _____ _____ provides a good assessment of the adequacy of the normal model for a set of data

normal probability plot

An _________ _______ observes individuals and measures variables of interest but does not attempt to influence the responses

observational study

A __________ z-score is above the mean, a __________ z-score is below the mean

positive, negative

_________ involves studying a part in order to gain information about the whole

sampling

- Convenience - Voluntary response - Simple random - Multi-stage random - Stratified random - Cluster random - Systematic random These are types of _______ _____

sampling design

________ ________ _____: Consists of n individuals chosen from the population in such a way that every set of n individuals has an equal chance to be the sample actually selected

simple random sample
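A minimal sketch of drawing an SRS in Python, assuming the population is available as a list. The function name and the optional `rng` parameter (for reproducible draws) are illustrative.

```python
import random

def simple_random_sample(population, n, rng=None):
    """Draw n distinct individuals; every set of n is equally likely."""
    rng = rng or random.Random()
    return rng.sample(population, n)

# Hypothetical population of 100 labeled individuals.
sample = simple_random_sample(list(range(100)), 10, random.Random(1))
```

`random.Random.sample` chooses without replacement, so no individual appears twice in the sample.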

- When observations are not possible, ___________ provide an alternate method for producing data - We generate random numbers and assign certain numbers to outcomes based on probability

simulations

The _________________ is susceptible to outliers

standard deviation

average distance between each value and the mean

standard deviation

In a perfectly symmetric distribution, the mean and median are ________

the same

_________ ____ occurs when some groups in the population are left out of the process of choosing the sample

undercoverage bias

The "average" squared deviation

variance
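A sketch of the variance and standard deviation calculations. The cards do not say which divisor the course uses; this assumes the sample convention of dividing by n - 1, which is likely why "average" is in quotes. Function names are illustrative.

```python
from math import sqrt

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed)."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)

def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return sqrt(sample_variance(values))
```

For `[2, 4, 4, 6]` the squared deviations sum to 8, so the variance is 8/3 under this convention.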

a is the _-_______: the value of y when x = 0

y-intercept

The _-_____ tells us how many standard deviations away from the mean an observation falls, and what direction it falls in

z-score

_-_____ have no units

z-scores

Attempts to explain the observed outcomes (independent variable)

Explanatory variable

The science of data

Statistics

- Undercoverage - Nonresponse - Response - Order of choice - Wording of questions These are types of ____

Bias

T/F The mean and the median of a normal curve are not the same

False

the median of the observations located to the left of the median in the list

First quartile

- A _____-______ ______ is a quick summary of the distribution of a data set - It contains the minimum, first quartile, median, third quartile, and maximum - A box plot contains all numbers in a _____-______ ______

Five-Number summary

Linear or nonlinear these are ways of _________ when describing the scatter plot

Form

- Divide the range of data into classes of equal width - Find the count or percent of individuals in each class - Label and scale your axes and draw the histogram These are the steps to take to construct a ___________

Histogram

- graphs that display the distribution of a quantitative variable by showing each interval of the values as a bar - The heights of the bars show the frequencies of values in each interval - show off distributions very clearly - are the most common graph of distribution

Histograms

Objects described in a data set

Individuals

- The difference between the third and first quartiles (Q3 - Q1) - This can also be found using your calculator - It is resistant to outliers - An observation is an outlier if it falls more than 1.5 x IQR above the third quartile or more than 1.5 x IQR below the first quartile

Interquartile range
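The 1.5 x IQR rule can be sketched as follows, assuming quartiles are found by the median-of-halves method with the middle observation left out when the count is odd. The function name is illustrative.

```python
from statistics import median

def iqr_outliers(values):
    """Return observations beyond 1.5 x IQR from the quartiles."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in data if v < low_fence or v > high_fence]
```

For `[1, 2, 3, 4, 5, 6, 7, 50]` the quartiles are 2.5 and 6.5, the IQR is 4, and only 50 lies beyond the upper fence of 12.5.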

_____-_____ ______ ______: The line that makes the sum of the squared residuals as small as possible

Least-Squares Regression Line
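A sketch of the standard closed-form solution for the line ŷ = a + bx that minimizes the sum of squared residuals, using a for the y-intercept and b for the slope as on the other cards. The function name is illustrative.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # the line passes through (mean_x, mean_y)
    return a, b
```

With `xs = [0, 1, 2]` and `ys = [1, 3, 2]` this gives a = 1.5 and b = 0.5.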

- The __________ ________ of one of the categorical variables is the distribution of values of that variable among all individuals described by the table - Ex: Marginal distribution of gender: Male: 48/100 = 48% Female: 52/100 = 52% - The marginal distributions should total to 100%

Marginal Distributions

- The _______ is the average of all individual data values - To find the ______, add all of the observations and divide by the number of observations

Mean

Determine if you should use the mean or median to measure the center of a distribution of data - If the distribution is reasonably symmetric and has no outliers, use the ________ - Outliers have a big impact on the _______ which would cause an inaccurate measure of center (it is not resistant to outliers)

Mean

A normal curve is described by its ________ and _______ _________

Mean, Standard deviation

- Arrange all observations from smallest to largest - If the number of observations is odd, the median is the center observation in the list - If the number of observations is even, the _______ is the average of the two center observations in the list - For n observations in a group, use (n + 1)/2 to find the position of the ________ in the list of observations

Median
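The steps above translate directly into a few lines of Python. This is a sketch with an illustrative function name; the standard library's `statistics.median` does the same job.

```python
def find_median(values):
    """Median by sorting and taking the center observation(s)."""
    data = sorted(values)             # smallest to largest
    n = len(data)
    mid = n // 2
    if n % 2:                         # odd count: the center observation
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2   # even count: average the two centers
```

For `[5, 1, 3]` the position is (3 + 1)/2 = 2, so the median is the second sorted value, 3.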

- The _________ is the midpoint of the distribution - It is the number where half of the observations are smaller and the other half larger

Median

Determine if you should use the mean or median to measure the center of a distribution of data - If the distribution of data is skewed or has outliers, use the _________ - Outliers have little to no effect on the ________, thus maintaining its accuracy (it is resistant to outliers)

Median

____-______ _______ ______: involves the repeated selections of simple random samples within prior random samples

Multi-stage random sample

Does adding or subtracting the same number n to each observation change the shape or measures of spread of the distribution (range, IQR, standard deviation)?

No

Does multiplying or dividing each observation by the same number n change the shape of the distribution?

No

_____________ _____ occurs when an individual chosen for the sample can't be contacted or doesn't cooperate

Nonresponse bias

- The nth _________ of a distribution is the value with n percent of the observations less than it - Ex: 60th _____________ of data is 50. This means that 60% of the data is less than 50 and 40% of the data is 50 or above

Percentile
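A small sketch of the idea behind this definition: count the share of observations that fall below a given value. The function name and data are illustrative.

```python
def percent_below(values, x):
    """Percent of observations strictly less than x."""
    return 100 * sum(v < x for v in values) / len(values)

scores = [10, 20, 30, 40, 50]          # hypothetical data
share = percent_below(scores, 40)      # 3 of the 5 values are below 40
```

Here 60% of the observations are less than 40, so 40 sits at the 60th percentile of this data set under the card's definition.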

_______: The entire group of individuals we want information about ______: A subset of individuals in the population from which we collect data

Population, sample

+ means ________ direction, - means _________ direction

Positive, negative

- Take numerical values for which it is sensible to find an average - Have order - Ex: Age, speed, height

Quantitative

- A _________ ____ displays the relationship between two variables, but only when one of the variables helps explain or predict the other - It is a model for the data; its equation gives us a compact mathematical description of what this model tells us about the relationship between y and x

Regression line

- When data has a _________ overall pattern, we can use a simplified model called a ______ ________ to describe it - Always on or above the horizontal axis - It has an area of exactly 1 underneath it

Regular, Density Curve

- A _________ is the difference between the actual value of y and the value of y predicted by the regression line - It equals y - ŷ

Residual
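A sketch of computing residuals for a line ŷ = a + bx, assuming the intercept a and slope b have already been found. The function name and the example line are illustrative.

```python
def residuals(xs, ys, a, b):
    """Actual y minus predicted y-hat = a + b*x for each observation."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Example line y-hat = 1.5 + 0.5x fitted to three hypothetical points.
res = residuals([0, 1, 2], [1, 3, 2], a=1.5, b=0.5)
```

The residuals here are -0.5, 1.0, and -0.5; for a least-squares line they always sum to zero.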

A _______ ______ is a scatter plot that displays the residuals on the vertical axis and the explanatory variable on the horizontal axis - If there is no leftover pattern, the regression model is appropriate - If there is a leftover pattern in the residual plot, consider using a regression model with a different form.

Residual plot

_______ _____ occurs when the timing of the survey or the identity of the surveyor causes a bias - Also occurs when people do not remember answers or lie

Response bias

2 types of variables to keep in mind when analyzing two or more variables:

Response, Explanatory

When describing distribution of quantitative data, we use the acronym _____

SOCCS

Symmetric, Skewed Right, Skewed Left, Bimodal, Unimodal

Shape

What does SOCCS stand for?

Shape, Outliers, Context, Center, Spread

b is the ________: the amount y is predicted to change when x increases by one

Slope

The range (most of the time) or the standard deviation

Spread

The ______ _________ is the distance from the center to the change-of-curvature points on either side

Standard deviation

- The normal distribution with mean 0 and standard deviation 1 - We obtain this by converting every value into its z-score and representing each data point as its z-score in the distribution - Written N(0,1) This is the ____ ________ ___________

Standard normal distribution

- Another name for z-score is ____________ _____________ - The formula is z = (x - mean) / standard deviation

Standardized value
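The formula as a one-line Python function, with made-up numbers for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exam with mean 70 and standard deviation 10.
above = z_score(85, 70, 10)
below = z_score(55, 70, 10)
```

A score of 85 is 1.5 standard deviations above the mean and a score of 55 is 1.5 below it; the units cancel in the division, which is why z-scores have no units.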

- Separate each observation into a stem and a leaf - A stem includes all but the final digit - A leaf is just the final digit of the number - Write all possible stems from the smallest to the largest in a vertical column - Draw a vertical line to the right of the column - Write each leaf in the row to the right of its corresponding stem - Arrange the leaves in increasing order out from the stem - Provide a key that explains in context what the stems and leaves represent These are the steps on how to make a ____-____-____ ____

Stem-and-Leaf Plot

____-___-_____ ____ are a simple graphical display for small sets of data - They give us a visual of the distribution while including the actual numerical values

Stem-and-Leaf Plots

_________ _________ _______: First classify the population into groups of similar individuals who share characteristics called strata. Then choose a separate SRS in each stratum and combine these SRSs to form the full sample

Stratified Random sample
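A sketch of the two steps on this card, assuming the strata are already known and stored as a dictionary of lists, and that the same number of individuals is drawn from each stratum. Names and data are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, rng=None):
    """Take a separate SRS in each stratum and combine them."""
    rng = rng or random.Random()
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical strata: students grouped by class year.
strata = {"freshman": [1, 2, 3, 4], "senior": [10, 20, 30]}
picked = stratified_sample(strata, 2, random.Random(0))
```

Every stratum is guaranteed to appear in the sample, which an SRS of the whole population does not promise.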

Weak, moderate, strong these are ways of _________ when describing the scatter plot

Strength

- The closer to 1 or -1, the ________ the association - The closer to 0, the ______ the association

Stronger, weaker

__________ _________ _________: Selects a sample from an ordered arrangement of the population by randomly selecting one of the first k individuals and choosing every kth individual thereafter

Systematic random sample
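A sketch of this procedure, assuming the population is already in order as a list. The function name and the optional `rng` parameter are illustrative.

```python
import random

def systematic_sample(population, k, rng=None):
    """Pick a random start among the first k, then every kth individual."""
    rng = rng or random.Random()
    start = rng.randrange(k)          # index 0 .. k-1
    return population[start::k]

# Hypothetical ordered population of 20 individuals.
chosen = systematic_sample(list(range(20)), 5, random.Random(0))
```

With 20 individuals and k = 5 the sample always has 4 members, spaced exactly 5 apart.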

