stats mini exam

empirical rules for normal distribution

About 68% of the data fall within one standard deviation of the mean, about 95% of the data fall within two standard deviations of the mean, and almost all fall within three standard deviations of the mean
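
The percentages above can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from Python's standard library; the helper name `proportion_within` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    # Prints roughly 0.68, 0.95, and 0.997.
    print(f"within {k} sd: {proportion_within(k):.4f}")
```

Because every normal distribution standardizes to this one, the same proportions hold for any mean and standard deviation.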

Describing Quantitative Variables

Always start by making a picture: a histogram or stem-and-leaf plot. A summary of the different values observed for the variable includes the "3 S's": shape, center, and spread. Spread is also known as variation or variability.

Finding area to the right of a z-score

1. Use symmetry. 2. Use the properties of density curves (1 - )
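
Both approaches can be sketched in code. This assumes the density-curve property meant here is that the total area under the curve is 1, so the area to the right is 1 minus the area to the left; the function names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # defaults to mean 0, standard deviation 1

def area_right_by_complement(z):
    """Total area is 1, so area to the right = 1 - area to the left."""
    return 1 - std_normal.cdf(z)

def area_right_by_symmetry(z):
    """The curve is symmetric about 0, so area right of z = area left of -z."""
    return std_normal.cdf(-z)

z = 1.5
print(area_right_by_complement(z), area_right_by_symmetry(z))
```

The two functions should agree up to floating-point rounding for any z.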

a density curve

A mathematical model used to describe the overall pattern of the distribution of a random variable; obtained by rescaling a percent histogram so that the total area under the curve is 1

correlation

A measure of the extent to which two factors vary together, and thus of how well either factor predicts the other.

standard normal distribution

A normal distribution with a mean of 0 and a standard deviation of 1.

coefficient of determination (r^2)

The fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x. A measure of how well the LSRL fits the data. It is the square of the correlation coefficient (r): the fraction of the variance in y (vertical scatter) that can be explained linearly by changes in x (horizontal scatter).
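
As a rough check that the two descriptions agree, the sketch below computes r^2 both as the square of the correlation and as 1 minus the unexplained fraction of the variation in y. The data and the function name are hypothetical.

```python
from math import sqrt
from statistics import mean

def r_squared_two_ways(xs, ys):
    """Return (r squared, 1 - SSE/SST) for the least-squares line of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y (SST)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)                # correlation coefficient
    b = sxy / sxx                            # least-squares slope
    a = y_bar - b * x_bar                    # least-squares intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    return r ** 2, 1 - sse / syy

# Hypothetical data, chosen so the arithmetic is easy to follow.
by_correlation, by_variation = r_squared_two_ways([1, 2, 3, 4], [1, 3, 2, 4])
print(by_correlation, by_variation)
```

For this data both values come out to about 0.64, meaning roughly 64% of the variation in y is explained by the line.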

bar charts

Used when data is divided into categories (discrete data). The bars are separated to show different categories. The height represents the frequency of that category among all individuals. Describe the distribution in a bar chart by comparing the heights of the bars.

extrapolation

Using an LSRL to predict outside the range of the explanatory variable observed in the data. Predictions are not reliable (can lead to ridiculous conclusions if the current linear trend does not continue).

IQR

distance between the first and third quartiles (Q3 - Q1); resistant to outliers and skew because it looks only at the middle 50% of the observations

five-number summary

minimum, Q1, median, Q3, maximum; a quick numerical summary of a quantitative variable, used to create a box plot
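
A minimal sketch of computing the summary follows. It assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data (leaving out the overall median when n is odd). Software packages often use other quartile conventions, so results may differ slightly.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Skip the middle observation when n is odd.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

scores = [7, 1, 3, 9, 5, 11, 13]  # hypothetical data
summary = five_number_summary(scores)
print(summary)  # (1, 3, 7, 11, 13)
```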

parameter

numerical summary of some feature of the population, "mu" and "sigma"

normal QQ plot

plotted for the residuals; we want the points to fall on the line, which shows the residuals are approximately normally distributed and centered at 0

interpret LSRL

on average, for each additional unit of x, the predicted y changes by b units (the slope).

a residual

or error; the vertical distance between an observed and a predicted value of y: ei = yi - yiHat. For a least-squares line, the positive and negative residuals add up to 0.
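
To illustrate that the residuals of a least-squares line sum to 0, here is a small sketch. The data and the helper `fit_lsrl` are made up for the example, and the fit uses the standard least-squares formulas.

```python
from statistics import mean

def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line yHat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

xs = [1, 2, 3, 4, 5]  # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]  # hypothetical response values
a, b = fit_lsrl(xs, ys)

# Residual = observed y minus predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(residuals, sum(residuals))
```

The sum is 0 up to floating-point rounding, even though individual residuals are positive or negative.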

ordinal variable

ordered; ex. letter grade, rankings

categorical variable

places an individual into one of several groups or categories (ordinal or nominal)

re-randomization

reassign individuals into treatment groups and observe the difference between these new groups as a comparison
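
One way to picture re-randomization is to list every possible reassignment of a small data set and compare each new difference in group means with the observed one. The sketch below does this exhaustively. The data, the function name, and the choice to report the proportion of reassignments at least as large as the observed difference are all assumptions for illustration. With larger groups one would usually sample random reassignments instead.

```python
from itertools import combinations
from statistics import mean

def rerandomize(treatment, control):
    """Return the observed difference in means and the proportion of
    reassignments whose difference is at least as large."""
    observed = mean(treatment) - mean(control)
    pooled = treatment + control
    indices = range(len(pooled))
    at_least_as_large = 0
    total = 0
    # Every way of choosing which individuals land in the treatment group.
    for chosen in combinations(indices, len(treatment)):
        chosen_set = set(chosen)
        new_treatment = [pooled[i] for i in chosen]
        new_control = [pooled[i] for i in indices if i not in chosen_set]
        diff = mean(new_treatment) - mean(new_control)
        if diff >= observed - 1e-12:  # small tolerance for rounding
            at_least_as_large += 1
        total += 1
    return observed, at_least_as_large / total

observed, proportion = rerandomize([5, 6, 7], [1, 2, 3])  # hypothetical data
print(observed, proportion)
```

Here only 1 of the 20 possible reassignments gives a difference as large as the observed one, so the observed split would be unusual if group labels did not matter.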

when asked if a linear model is appropriate for your data,

report on the residuals and R^2

Individuals

the objects described by a set of data; n = number of individuals; the rows of the data set

two-way table

A table containing counts for two categorical variables. It has r rows and c columns; each variable can have any number of categories.

2 measures of center

mean and median

fitted LSRL

yHat = a + bx

standardized value (z-score)

z = (x - mu) / sigma
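
A short worked example, with made-up numbers, shows how standardized values let you compare observations measured on different scales. The function name `z_score` is illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical scores on two exams with different scales.
z_exam_a = z_score(85, mu=70, sigma=10)     # 1.5
z_exam_b = z_score(620, mu=500, sigma=100)  # 1.2

print(z_exam_a, z_exam_b)
```

Although 620 is the larger raw score, the exam A score sits further above its mean in standard-deviation units.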

scatter plots

Shows the relationship between two quantitative variables. Points falling closer to a straight line indicate a stronger correlation. Describe by form, direction, and strength of association.

intercept

a = y-bar - b(x-bar)

normal distribution

a family of bell-shaped curves describing the spread of a characteristic throughout a population; each curve is defined by its center and spread (mean, std dev)

z-score

a measure of how many standard deviations an observation is away from the norm (average or mean)

sample

a subset of the population; it needs to be chosen at random to be representative of the population

outliers

a value that deviates from the overall pattern, unusual observation

lurking variable

a variable that is not among the explanatory or response variables in a study but that may influence the response variable

response variable

a variable that measures an outcome or result of a study, y

explanatory variable

a variable that we think explains or causes changes in the response variable, x

table A

allows us to find the area (proportion of observations / probability) to the left of a z-score; these areas are known as cumulative proportions / probabilities

influential observation

an observation that markedly changes the regression line (and so the regression equation) if removed; may or may not be an outlier too

a variable

any characteristic of an individual; can take different values for different individuals; should not be predetermined, must vary; the columns of the data set

standardization

any normal distribution can be transformed into the standard normal distribution; in order to use the table, you need to standardize x to get z. Standardizing allows you to compare observations on different scales by recentering at 0 and rescaling to a standard deviation of 1.

slope

b = r(sy / sx)
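
The slope formula can be tried on a small data set. This sketch assumes the usual companion formula for the intercept, a = y-bar - b(x-bar), and computes r as the average product of standardized values with an n - 1 divisor. The data and the function name are hypothetical.

```python
from statistics import mean, stdev

def lsrl_from_summary(xs, ys):
    """Return (a, b, r) using b = r * (sy / sx) and a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    # Correlation: sum of products of z-scores, divided by n - 1.
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b, r

a, b, r = lsrl_from_summary([1, 2, 3, 4], [2, 1, 4, 3])  # hypothetical data
print(a, b, r)
```

For this data r is about 0.6, and because sx equals sy the slope is also about 0.6, with an intercept of about 1.0.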

1.5 x IQR rule

low outlier: less than Q1 - (1.5 x IQR); high outlier: greater than Q3 + (1.5 x IQR)
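
The rule can be applied mechanically once Q1 and Q3 are known. In this sketch the quartiles are passed in directly, since different quartile conventions give slightly different values. The data and the function names are made up for illustration.

```python
def iqr_fences(q1, q3):
    """Return the (low, high) cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def flag_outliers(values, q1, q3):
    """Return the values that fall outside the fences."""
    low, high = iqr_fences(q1, q3)
    return [v for v in values if v < low or v > high]

data = [1, 3, 5, 7, 9, 11, 40]  # hypothetical data with Q1 = 3 and Q3 = 11
fences = iqr_fences(3, 11)
outliers = flag_outliers(data, q1=3, q3=11)
print(fences, outliers)  # (-9.0, 23.0) [40]
```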

histograms

show the number of individuals that fall in each interval (the height of each bin)

2 measures of spread

standard deviation and IQR

marginal distribution

summarizes each categorical variable independently (row totals, column totals); ignores the potential bivariate relationship between the categorical variables in the table

common distribution shapes

symmetric, skewed, complex / multimodal

quantitative variable

takes numerical values for which arithmetic operations such as adding and averaging make sense across individuals; ex. height, weight, GPA

the distribution of the variable

tells us 1) the possible values or outcomes of a variable and 2) the frequency with which it takes on those values

mean of a density curve

the balance point, at which the curve would balance if made of solid material; if skewed it will get pulled towards the tail

statistic

the corresponding numerical quantity for the sample, x-bar and s

conditional distribution

the distribution of values of one variable among only the individuals who have a given value of the other variable; the distribution of the response variable, given a particular fixed category of the explanatory variable
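
As a sketch of how this works on a two-way table, the code below computes the row totals and then the conditional distribution of the response within each row. The table, its category names, and the function names are all hypothetical.

```python
# Hypothetical two-way table of counts.
# Rows: explanatory variable (class year). Columns: response variable (result).
table = {
    "freshman": {"pass": 30, "fail": 10},
    "senior": {"pass": 45, "fail": 15},
}

def row_totals(table):
    """Totals for the row variable, ignoring the column variable."""
    return {row: sum(cells.values()) for row, cells in table.items()}

def conditional_given_row(table, row):
    """Distribution of the column variable among individuals in one row."""
    total = sum(table[row].values())
    return {col: count / total for col, count in table[row].items()}

print(row_totals(table))
print(conditional_given_row(table, "freshman"))
print(conditional_given_row(table, "senior"))
```

In this made-up table the conditional distributions are identical (75% pass in both rows), which is what no association between the two variables looks like.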

mean

the distribution's "center of mass"; is not robust and is sensitive to extreme values, so it will be pulled toward the skew. The mean or average is denoted by x-bar.

median

the distribution's midpoint (50th percentile); it is robust and resistant to outliers. Be sure to sort from smallest to largest first.
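
A quick comparison on made-up data shows the difference in resistance: replacing one value with an extreme one moves the mean a lot but leaves the median where it was.

```python
from statistics import mean, median

typical = [1, 2, 3, 4, 5]         # hypothetical data
with_outlier = [1, 2, 3, 4, 100]  # same data with one extreme value

mean_before, median_before = mean(typical), median(typical)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(mean_before, median_before)  # 3 3
print(mean_after, median_after)    # 22 3
```

Note that `statistics.median` sorts the data itself, so the manual sorting step is only needed when working by hand.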

population

the entire group of individuals about which we want information

median of a density curve

the equal-areas point, the point that divides the area under the curve in half

least squares regression line

the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible; the "average" line that has the points as close to the line as possible

nominal variable

unordered; ex. position on a team, gender identity

standard deviation

used to describe the variation around the mean; represents the typical distance of an observation from the mean. It uses the mean and is therefore affected by outliers. Denoted by s.
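
The sketch below computes s by hand, assuming the usual sample formula with an n - 1 divisor, and compares it with `statistics.stdev`. The data and the function name are made up, and the last lines show how a single extreme value inflates s.

```python
from math import sqrt
from statistics import mean, stdev

def sample_sd(values):
    """Sample standard deviation: square root of the sum of squared
    deviations from the mean, divided by n - 1."""
    x_bar = mean(values)
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return sqrt(squared_deviations / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
s = sample_sd(data)
s_with_outlier = sample_sd(data + [30])  # add one extreme observation

print(s, stdev(data))  # the two should agree
print(s_with_outlier)  # noticeably larger
```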

clustered bar chart

used to graph a conditional distribution, shows the distribution of the response variable conditioned on the levels of the explanatory variable

residuals vs fitted

want an even, random scatter of the points above and below zero. This indicates that fitting a linear model is appropriate and the residuals have constant variance.

