stats final

$\hat{p} \pm z^* \sqrt{\hat{p}\hat{q}/n}$

A formula used to construct a confidence interval for p when the sample size is large (here q̂ = 1 − p̂).
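
A minimal Python sketch of this interval, assuming the number of successes and n are already known. The function name `one_proportion_ci` is hypothetical, and z* is taken from the standard normal curve using the standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Large-sample interval p_hat ± z* * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    # z* leaves (1 - confidence) / 2 of the area in each tail of the z curve
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


low, high = one_proportion_ci(40, 100)
print(f"({low:.3f}, {high:.3f})")  # (0.304, 0.496)
```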

cumulative relative frequency plot

A graph of a cumulative relative frequency distribution

stem and leaf display

A method of organizing numerical data in which the stem values (leading digit(s) of the observations) are listed in a column, and the leaf (trailing digit(s)) for each observation is then listed beside the corresponding stem. Sometimes stems are repeated to stretch the display.
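
A short sketch of the display for nonnegative whole numbers. It assumes the stem is every digit except the last and the leaf is the last digit, so 37 has stem 3 and leaf 7. `stem_and_leaf` is a hypothetical helper. Stems with no leaves are still listed so gaps in the data stay visible.

```python
from collections import defaultdict


def stem_and_leaf(data: list[int]) -> list[str]:
    """Build one 'stem | leaves' row per stem, from the smallest stem to the largest."""
    leaves = defaultdict(list)
    for value in sorted(data):
        leaves[value // 10].append(value % 10)  # stem = leading digit(s), leaf = last digit
    stems = range(min(leaves), max(leaves) + 1)
    return [f"{stem} | {''.join(str(leaf) for leaf in leaves[stem])}" for stem in stems]


for row in stem_and_leaf([23, 12, 37, 15, 21, 23, 51]):
    print(row)
```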

$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$

A test statistic for testing H0: p = p0 when the sample size is large. The P-value is determined from the standard normal (z) curve.
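
The statistic and its P-value can be sketched in Python as follows. The function name and the `alternative` labels ("two-sided", "greater", "less") are hypothetical choices for this example. The P-value is read from the standard normal curve via `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes: int, n: int, p0: float, alternative: str = "two-sided") -> tuple[float, float]:
    """Large-sample z test of H0: p = p0; returns (z, P-value)."""
    p_hat = successes / n
    q0 = 1 - p0
    z = (p_hat - p0) / sqrt(p0 * q0 / n)
    z_curve = NormalDist()  # standard normal
    if alternative == "greater":   # Ha: p > p0, upper-tail area
        p_value = 1 - z_curve.cdf(z)
    elif alternative == "less":    # Ha: p < p0, lower-tail area
        p_value = z_curve.cdf(z)
    elif alternative == "two-sided":  # Ha: p != p0, both tails
        p_value = 2 * z_curve.cdf(-abs(z))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p_value


z, p = one_proportion_z_test(60, 100, 0.5)
print(f"{z:.2f}, {p:.4f}")  # 2.00, 0.0455
```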

χ² test for independence

The hypothesis test performed to determine whether an association exists between two categorical variables
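
A minimal sketch of how the statistic is computed from a two-way table. Each expected count is (row total × column total) / grand total, and df = (rows − 1)(columns − 1). The helper name is hypothetical. The standard library has no χ² distribution, so the P-value (the upper-tail area) would come from a table or, if SciPy is available, `scipy.stats.chi2.sf(statistic, df)`.

```python
def chi_square_independence(table: list[list[float]]) -> tuple[float, int]:
    """Return the X^2 statistic and its degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected count under H0 (no association): row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


stat, df = chi_square_independence([[10, 20], [20, 10]])
print(round(stat, 3), df)  # 6.667 1
```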

rth percentile

The value such that r% of the observations in the data set fall at or below that value.
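
Percentile conventions differ between textbooks and software. This sketch assumes the nearest-rank method, which is not necessarily the course's rule. It returns the smallest ordered value with at least r% of the data at or below it. `percentile_nearest_rank` is a hypothetical name.

```python
from math import ceil


def percentile_nearest_rank(data: list[float], r: float) -> float:
    """Smallest ordered value with at least r% of the observations at or below it."""
    values = sorted(data)
    if not values:
        raise ValueError("data must not be empty")
    k = max(1, ceil(r * len(values) / 100))  # 1-based rank in the ordered data
    return values[k - 1]


print(percentile_nearest_rank([15, 20, 35, 40, 50], 40))  # 20
```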

one-way frequency table

a compact way of summarizing data on a categorical variable; it gives the number of times each of the possible categories occurs in the data set
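
A one-way table can be sketched with `collections.Counter`. The helper name is hypothetical. Relative frequency (count divided by n) is included alongside the frequency because the two are easy to confuse.

```python
from collections import Counter


def one_way_table(observations: list[str]) -> dict[str, tuple[int, float]]:
    """Map each category to (frequency, relative frequency)."""
    counts = Counter(observations)
    n = len(observations)
    return {category: (count, count / n) for category, count in counts.items()}


print(one_way_table(["A", "B", "A", "C", "A"]))  # {'A': (3, 0.6), 'B': (1, 0.2), 'C': (1, 0.2)}
```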

$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

a formula for constructing a confidence interval for μ1-μ2 when the samples are independently selected and the sample sizes are large or it is reasonable to assume that the population distributions are normal.
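
A sketch of the interval from summary statistics. The standard library has no t quantile function, so t* is passed in by the caller, for example from a t table or from `scipy.stats.t.ppf` if SciPy is available. The degrees-of-freedom helper uses the Welch–Satterthwaite formula, which is one common choice and is assumed here. Both function names are hypothetical.

```python
from math import sqrt


def two_sample_t_ci(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """(xbar1 - xbar2) ± t* * sqrt(s1^2/n1 + s2^2/n2); t_star is supplied by the caller."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    margin = t_star * sqrt(v1 + v2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom (usually rounded down before using a t table)."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


print(round(welch_df(2, 25, 3, 36), 2))  # 58.93
low, high = two_sample_t_ci(10, 2, 25, 8, 3, 36, t_star=2.0)  # t* = 2.0 is illustrative only
print(f"({low:.3f}, {high:.3f})")  # (0.719, 3.281)
```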

$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$

a formula for constructing a confidence interval for p1 − p2 when both sample sizes are large
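
A minimal Python version of this interval, assuming success counts x1 and x2 from two independent samples. `two_proportion_ci` is a hypothetical name, and z* comes from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1: int, n1: int, x2: int, n2: int, confidence: float = 0.95) -> tuple[float, float]:
    """Large-sample interval for p1 - p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # each sample contributes its own p_hat * q_hat / n term (no pooling for an interval)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


low, high = two_proportion_ci(60, 100, 45, 100)
print(f"({low:.3f}, {high:.3f})")  # (0.013, 0.287)
```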

χ² goodness-of-fit test

a hypothesis test performed to determine whether the true category proportions are different from those specified by the given null hypothesis
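
A sketch of the goodness-of-fit statistic, assuming the null hypothesis is given as a list of category proportions. Each expected count is n × (hypothesized proportion) and df = k − 1 for k categories. The P-value is the area to the right of the statistic under that χ² curve, which the standard library cannot compute. The helper name is hypothetical.

```python
def chi_square_gof(observed: list[int], null_proportions: list[float]) -> tuple[float, int]:
    """Return the X^2 goodness-of-fit statistic and its degrees of freedom."""
    if abs(sum(null_proportions) - 1) > 1e-9:
        raise ValueError("null proportions must sum to 1")
    n = sum(observed)
    statistic = 0.0
    for count, p in zip(observed, null_proportions):
        expected = n * p  # expected count when H0 is true
        statistic += (count - expected) ** 2 / expected
    return statistic, len(observed) - 1


stat, df = chi_square_gof([30, 50, 20], [0.25, 0.50, 0.25])
print(stat, df)  # 2.0 2
```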

sample median

a measure of center obtained by ordering the observations from smallest to largest; it is the middle value when n is odd and the average of the two middle values when n is even

trimmed mean

a measure of center in which observations are first ordered from smallest to largest, one or more observations are deleted from each end and the remaining ones are averaged
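
A small comparison of the three measures of center on data with one outlier. The `trimmed_mean` helper is hypothetical and deletes a fixed number of observations from each end. The sample mean and median come from Python's `statistics` module.

```python
from statistics import mean, median


def trimmed_mean(data: list[float], trim_count: int) -> float:
    """Order the data, drop trim_count observations from each end, and average the rest."""
    values = sorted(data)
    if 2 * trim_count >= len(values):
        raise ValueError("cannot trim that many observations")
    kept = values[trim_count:len(values) - trim_count]
    return mean(kept)


data = [1, 2, 3, 4, 100]
# the outlier pulls the mean up to 22; the median and the trimmed mean are both 3
print(mean(data), median(data), trimmed_mean(data, 1))
```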

random variable: discrete or continuous

a numerical variable whose value is determined by the outcome of a chance experiment. It is discrete if its possible values are isolated points along the number line and continuous if its possible values form an entire interval on the number line

time series plot

a picture of numerical data collected over time

Two-Way Frequency Table (contingency table)

a rectangular table used to summarize a bivariate categorical data set; two-way tables are used to compare several populations on the basis of a categorical variable or to identify whether an association exists between two categorical variables

point estimate

a single number, based on sample data, that represents a plausible value of a population characteristic

unbiased statistic

a statistic whose sampling distribution has a mean equal to the value of the population characteristic being estimated

extraneous factor

a variable that is not of interest in the current study but is thought to affect the response variable

confounding variable

a variable that is related both to group membership and to the response variable

confidence interval

an interval that is computed from sample data and provides a range of plausible values for a population characteristic

simple event

any event that consists of a single outcome

statistic

any quantity whose value is computed from sample data

disjoint (mutually exclusive) events

events that have no outcomes in common

t/f a pie chart is most useful for numeric data

f

t/f a statistic is a characteristic of the population

f

t/f a transformation of a variable is accomplished by substituting a function of the variable in place of the variable for further analysis

f

t/f as n grows larger, the mean of the sampling distribution of x̄ gets closer to μ

f

t/f for a continuous random variable x, the height of the density curve over an interval a to b represents the probability that x is between a and b

f

t/f for a chi-squared goodness-of-fit test, the associated P-value is the area under the appropriate chi-squared curve to the left of the calculated value of χ²

f

t/f for data that is skewed to the right, Σ(x − x̄) > 0

f

t/f if the null hypothesis is not rejected, there is strong statistical evidence that the null hypothesis is true

f

t/f in a well designed experiment, the factors are confounded whenever possible

f

t/f in order to decide whether the observed data is compatible with the null hypothesis, the observed cell counts are compared to the cell counts that would be expected when the alternative hypothesis is true

f

t/f p̂1 − p̂2 is a biased estimator of p1 − p2

f

t/f stratified sampling is a sampling method that in no way involves simple random sampling

f

t/f the classical view of probability is based on the law of large numbers

f

t/f the entire collection of individuals or objects about which information is desired is called a sample

f

t/f the relative frequency for a particular category is the number of times the category appears in the data

f

t/f the t confidence interval formula for estimating μ should only be used when the population being sampled is at least approximately normally distributed

f

t/f the value of Pearson's r is always between 0 and 1

f

t/f the variance is the positive square root of the standard deviation

f

direct control

holding extraneous factors constant so that their effects are not confounded with those of the experimental ones

categorical data

individual observations are categorical responses (nonnumerical)

The population variance σ² and standard deviation σ

measures of variability for the entire population; σ is the positive square root of σ²

five number summary

minimum, Q1, median, Q3, maximum
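
Quartile conventions vary. This sketch assumes the "median of each half" rule, in which the median is left out of both halves when n is odd, so other software may report slightly different Q1 and Q3. The helper name is hypothetical and needs at least two observations.

```python
from statistics import median


def five_number_summary(data: list[float]) -> tuple:
    """(minimum, Q1, median, Q3, maximum) using medians of the lower and upper halves."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower_half = values[:half]
    upper_half = values[half + n % 2:]  # skip the middle value when n is odd
    return values[0], median(lower_half), median(values), median(upper_half), values[-1]


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (1, 2.5, 5, 7.5, 9)
```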

P(E ∩ F ) = P(E) ⋅ P(F )

multiplication rule for two independent events

Descriptive Statistics

numerical, graphical, and tabular methods for organizing data

discrete numerical data

possible values are isolated points along the number line

$\hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$

p̂c is the statistic for estimating the common population proportion when p1 = p2
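
A sketch of the pooled estimate and of where it is typically used. When H0: p1 − p2 = 0 is assumed true, the large-sample z statistic replaces both unknown proportions with p̂c. Both function names are hypothetical.

```python
from math import sqrt


def pooled_proportion(x1: int, n1: int, x2: int, n2: int) -> float:
    """p_c = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2), i.e. total successes / total trials."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    return (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z statistic for H0: p1 - p2 = 0, using the pooled estimate in the standard error."""
    pc = pooled_proportion(x1, n1, x2, n2)
    se = sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


print(round(pooled_proportion(60, 100, 45, 150), 3))  # 0.42
print(round(two_proportion_z(60, 100, 45, 150), 2))   # 4.71
```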

Type 1 error

rejection of H0 when H0 is true; the probability of a type I error is denoted by α and is referred to as the significance level for the test

t/f a chi-squared goodness-of-fit test can be used to test hypotheses about the proportions of the population falling into each of the possible categories

t

t/f a placebo is identical in appearance to the treatment of interest but contains no active ingredients

t

t/f all other things being equal, choosing a smaller value of α will increase the probability of making a type II error

t

t/f an event consisting of exactly one outcome is called a simple event

t

t/f by definition, an outlier is more than 1.5 IQR away from the closest quartile

t

t/f for any given data set, the median must be greater than or equal to the lower quartile and less than or equal to the upper quartile

t

t/f for random variables x and y, if y= a+bx, then μy= a+bμx

t

t/f for tests of hypotheses about μ, β decreases as the sample size increases if the level of significance stays the same

t

t/f for two independent samples, $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

t

t/f Pearson's correlation coefficient, r, does not depend on the units of measurement of the two variables

t

t/f selection bias can occur if only volunteers are used in a study

t

t/f the chi-squared test statistic χ² measures the extent to which the observed cell counts differ from those expected when H0 is true

t

t/f the closer p is to 0 or 1, the larger n must be in order for the sampling distribution of p̂ to be approximately normal

t

t/f the confidence interval formula for estimating μ, that is used when n is large, is based on the central limit theorem

t

t/f the least squares line passes through the point (x̄, ȳ)

t

t/f the level of significance of a test is the probability of making a type 1 error, given that the null hypothesis is true

t

t/f the mean of the sampling distribution of p̂ is p no matter how large n is

t

t/f the p-value of an upper tail test is the area to the right of the calculated t value on the appropriate t curve

t

t/f the width of a one-sample confidence interval for μ decreases as the sample size grows larger

t

t/f two samples are said to be independent when the selection of the individuals in one sample has no bearing on the selection of those in the other sample

t

t/f x̄, p̂, s, and s² are point estimates

t

t/f for paired samples, x̄d and x̄1 − x̄2 are always equal

t

A or B, A ∪ B

the event consisting of all outcomes in at least one of the two events

A and B, A ∩ B

the event consisting of outcomes common to both events

treatments

the experimental conditions imposed by the experimenter

μx = np

the mean of a binomial random variable

μx and σx

the mean and standard deviation, respectively, of a random variable x. These quantities describe the center and the extent of spread about the center of the variable's probability distribution

μd

the mean value for the population differences

sample mean (x-bar)

the most frequently used measure of center of a sample. it can be very sensitive to the presence of even a single outlier

sampling distribution

the probability distribution of a statistic: the sampling distribution describes the long run behavior of the statistic

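sampling distribution of x̄

the probability distribution of the sample mean x̄ based on a random sample of size n. Properties of the x̄ sampling distribution: $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, where μ and σ are the population mean and standard deviation, respectively. In addition, when the population distribution is normal, the x̄ sampling distribution is also normal for every n, and by the central limit theorem it is approximately normal whenever n is large.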
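
A small simulation that checks these properties numerically. The population is assumed to be Uniform(0, 1), so μ = 0.5 and σ = √(1/12). The helper name, n = 25, and the number of repetitions are illustrative choices. The sample means should average close to μ, and their spread should be close to σ/√n.

```python
import random
from math import sqrt
from statistics import fmean


def simulate_sample_means(n: int, reps: int, seed: int = 0) -> list[float]:
    """Draw `reps` samples of size n from Uniform(0, 1) and return their means."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]


n = 25
means = simulate_sample_means(n, reps=10_000)
center = fmean(means)
spread = sqrt(fmean((m - center) ** 2 for m in means))

mu, sigma = 0.5, sqrt(1 / 12)  # Uniform(0, 1) population parameters
print(f"mean of x-bars {center:.4f} vs mu {mu}")
print(f"sd of x-bars {spread:.4f} vs sigma/sqrt(n) {sigma / sqrt(n):.4f}")
```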

coefficient of determination: $r^2 = 1 - \frac{\text{SSResid}}{\text{SSTo}}$

the proportion of variation in the observed y values that can be attributed to an approximate linear relationship

$\bar{x}_d$

the sample mean difference

standard deviation about the least squares line: $s_e = \sqrt{\frac{\text{SSResid}}{n - 2}}$

the size of a typical deviation from the least squares line

$s_d$

the standard deviation of the sample differences
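
For paired data, x̄d and sd are computed from the list of differences. The sketch below uses hypothetical before/after measurements and a hypothetical helper name. It also checks that x̄d equals x̄1 − x̄2 for the same pairs.

```python
from statistics import mean, stdev


def paired_summary(sample1: list[float], sample2: list[float]) -> tuple[float, float]:
    """Return (x-bar_d, s_d) for the pairwise differences sample1 - sample2."""
    if len(sample1) != len(sample2):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for a, b in zip(sample1, sample2)]
    return mean(differences), stdev(differences)  # stdev uses the n - 1 divisor


before = [10, 12, 9, 11]
after = [8, 11, 9, 8]
xbar_d, s_d = paired_summary(before, after)
print(f"{xbar_d:.1f} {s_d:.3f}")  # 1.5 1.291
print(mean(before) - mean(after))  # same value as x-bar_d
```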

$\sigma_x = \sqrt{npq}$

the standard deviation of a binomial random variable
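
A numerical check of both binomial formulas (μx = np and σx = √(npq)). The sketch sums directly over the binomial probability distribution using `math.comb`. Function names and the choice n = 10, p = 0.3 are illustrative.

```python
from math import comb, sqrt


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_moments_by_summation(n: int, p: float) -> tuple[float, float]:
    """Mean and standard deviation computed directly from the probability distribution."""
    mu = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
    variance = sum((x - mu) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
    return mu, sqrt(variance)


mu, sigma = binomial_moments_by_summation(10, 0.3)
print(f"{mu:.4f} vs np = {10 * 0.3}")
print(f"{sigma:.4f} vs sqrt(npq) = {sqrt(10 * 0.3 * 0.7):.4f}")  # both about 1.4491
```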

Total Sum of Squares: $\text{SSTo} = \sum (y - \bar{y})^2$

the sum of squared deviations from the sample mean; a measure of the total variation in the observed y values
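
The regression quantities on these cards (SSTo, SSResid, r², and se) can be computed together from a least squares fit. The sketch below uses a small made-up data set and a hypothetical helper name. The line uses slope b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept a = ȳ − b·x̄, so it passes through (x̄, ȳ).

```python
from math import sqrt


def least_squares_summary(x: list[float], y: list[float]) -> dict[str, float]:
    """Fit y-hat = a + b*x by least squares and report the usual summary quantities."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    ss_to = sum((yi - y_bar) ** 2 for yi in y)                         # total variation
    return {
        "a": a,
        "b": b,
        "SSResid": ss_resid,
        "SSTo": ss_to,
        "r^2": 1 - ss_resid / ss_to,
        "s_e": sqrt(ss_resid / (n - 2)),
    }


fit = least_squares_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print({name: round(value, 4) for name, value in fit.items()})
```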

nonresponse bias

the tendency for samples to differ from the population because measurements are not obtained from all individuals selected for inclusion in the sample

blocking

using extraneous factors to create experimental groups that are similar with respect to those factors, thereby filtering out their effect

