HIMA 4075 FINAL (definitions, t/f, mc)

GCMC example (% of data, +/- 1 sd)

"what % of patients have a LOS that is less than 12 days?"
=COUNTIF(A2:A201,"<12")/COUNT(A2:A201)
"if our data represented a perfect normal distribution, what would we expect the percentage of patients who have a LOS less than 12 days to be?"
=NORMDIST(12, mean, sd, 1)
**first find mean, sd, mean-1sd, mean+1sd**
"what percentage of patients have a LOS that falls within 1 SD?"
*below -1 sd* =COUNTIF(A2:A201,"<"&(mean-1sd cell))/COUNT(A2:A201)
*above +1 sd* =COUNTIF(A2:A201,">"&(mean+1sd cell))/COUNT(A2:A201)
*% between* = 1 - below - above
"what is the LOS where 10% of patients fall below?"
sort by LOS; the reference position is n * 0.1
"if our data represented a perfect normal distribution, what would we expect the percentage of patients who have a LOS within 1 SD to be?"
*below -1 sd* =NORMDIST(mean-1sd, mean, sd, 1)
*up to +1 sd* =NORMDIST(mean+1sd, mean, sd, 1)
*% between* = (up to +1 sd) - (below -1 sd)

calculating expected value in chi square

(row total x column total) / grand total

df for chi square

(rows - 1)(columns - 1)
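The expected-value and df formulas for chi square can be sketched in Python. The 2x2 table of observed counts below is hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical 2x2 contingency table of observed counts (rows x columns).
observed = [
    [30, 20],
    [10, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell: (row total x column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1)(columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, round(chi_square, 3), df)
```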

efficiency =

(true positive + true negative) / (true positive + true negative + false positive + false negative)

z score

(x-mean) / sd

poisson distribution probabilities

*avg # of arrivals is 8*
P(9 arrivals) =POISSON.DIST(9,8,0)
P(9 or more arrivals) =1-POISSON.DIST(8,8,1)
P(5 or fewer arrivals) =POISSON.DIST(5,8,1)
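As a cross-check outside Excel, the same three Poisson probabilities can be computed from the formula directly. This is a minimal sketch; the helper names `poisson_pmf` and `poisson_cdf` are made up for illustration and stand in for POISSON.DIST with the last argument set to 0 and 1.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k): accumulate the pmf from 0 up to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

lam = 8  # average number of arrivals

p_exactly_9 = poisson_pmf(9, lam)      # like POISSON.DIST(9,8,0)
p_9_or_more = 1 - poisson_cdf(8, lam)  # like 1-POISSON.DIST(8,8,1)
p_5_or_fewer = poisson_cdf(5, lam)     # like POISSON.DIST(5,8,1)

print(round(p_exactly_9, 4), round(p_9_or_more, 4), round(p_5_or_fewer, 4))
```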

binom distribution probabilities

*n = 5, p = 4%... P(success) is defined as death*
P(all 5 survive) = P(0 die) =BINOMDIST(5,5,0.96,0) or =BINOMDIST(0,5,0.04,0)
P(no more than 1 person dies) = P(0 die) + P(1 dies) =BINOMDIST(1,5,0.04,1)
P(2 or more people die) = P(2 die) + P(3 die) + P(4 die) + P(5 die) OR 1 - P(1 or fewer die) = 1 - P(1 dies) - P(0 die) =1-BINOMDIST(1,5,0.04,1)
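The binomial card can be checked with a short Python sketch using n = 5 and p = 0.04 (death counted as "success"). The helper names are hypothetical. Note that "2 or more die" is the complement of "1 or fewer die".

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n trials)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(k or fewer successes): accumulates from lowest to highest."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 5, 0.04

p_all_survive = binom_pmf(0, n, p)  # P(0 die)
p_at_most_1 = binom_cdf(1, n, p)    # P(0 die) + P(1 dies)
p_2_or_more = 1 - p_at_most_1       # complement of "1 or fewer die"

print(round(p_all_survive, 4), round(p_at_most_1, 4), round(p_2_or_more, 4))
```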

"what is the weight where 20% of the sample falls below this weight?"

*sort file by weight; reference value is (count of weight cells * 0.2)*

t distribution percentages

-/+ 1 sd = approx. 68%; -/+ 2 sd = approx. 93%, so about 7% is outside of -/+ 2 sd (exact percentages depend on df)

In a two-tailed test in which α = .01, the probability of each tail is _____.

.005

A couple is planning on having five children. What is the probability that four out of the five children will be girls?

0.156

What is the probability of rolling a die and obtaining a number that is higher than four or an even number?

0.667
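The 0.667 answer follows from the addition rule for events that are not mutually exclusive (6 is both higher than four and even). A small Python sketch that enumerates the faces of one fair die:

```python
from fractions import Fraction

faces = set(range(1, 7))  # sample space for one fair die

higher_than_four = {f for f in faces if f > 4}  # {5, 6}
even = {f for f in faces if f % 2 == 0}         # {2, 4, 6}

def prob(event):
    """Probability of an event when all faces are equally likely."""
    return Fraction(len(event), len(faces))

# Direct count of the union.
p_union_direct = prob(higher_than_four | even)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_union_rule = prob(higher_than_four) + prob(even) - prob(higher_than_four & even)

print(p_union_direct, float(p_union_rule))
```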

2 aspects of data in analysis setting

1. data are recorded in regard to or about cases or observations 2. data are made up of variables

arguments for dropping nonsignificant variables

1. including nonsignificant variables in a model will not improve the prediction of the dependent variable in any significant way 2. including nonsignificant variables unnecessarily complicates the understanding of the dependent variable with additional independent variables that have no predictive value

Which of these are assumptions of multiple regression?....

1. The y values are statistically independent (they do not depend on one another) 2. The variances of y are equal 3. For each set of xi values there is a subpopulation of y values, which are normally distributed

Which one of the following answers is definitely a wrong answer to a probability question?

1.5

At a local high school, the mean time for the 100-yard dash is 11.25 seconds with a standard deviation of 2.3 seconds for males on the track team. What time would be needed for a runner to be in the top 5% of runners on the team?

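15.045 sec (11.25 + 1.65 × 2.3, the 95th percentile of times; if "top 5%" means the fastest 5%, the cutoff is instead 11.25 - 1.645 × 2.3 ≈ 7.47 sec)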

mutually exclusive

2 or more outcomes that cannot occur simultaneously; mutually exclusive outcomes are not independent

A university's entry-level statistics course has a mean score of 85 with a standard deviation of 12 upon completion. What is the standard deviation of the mean for a sample of 25 students randomly selected to evaluate the effectiveness of the course at teaching statistics?

2.4

For the frequency table below, calculate the mean.
X    f    fX
1    4    4
2    9    18
3    6    18
4    7    28
5    4    20
6    2    12
Total    32    100

3.1
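The mean of a frequency table is the sum of fX divided by the sum of f. A minimal Python sketch using the table from this card:

```python
# Value X mapped to its frequency f, taken from the card's table.
freq = {1: 4, 2: 9, 3: 6, 4: 7, 5: 4, 6: 2}

n = sum(freq.values())                         # sum of f
total = sum(x * f for x, f in freq.items())    # sum of fX

mean = total / n
print(n, total, mean)
```

The result rounds to the 3.1 given as the answer.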

In general, conducting a two-tailed hypothesis test with significance level α = .05 will generate the same conclusion as constructing a _____ C.I.

95%

how to calculate sample size for a proportion (p)

= (critical t squared x p(1-p)) / margin of error squared

how to calculate sample size for a mean (t value)

= (critical t value squared x sd squared) / margin of error squared
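Both sample-size formulas can be sketched as small Python functions. The inputs below (critical values, p, sd, margins of error) are hypothetical examples, and rounding up to the next whole observation is an assumption of this sketch rather than something the cards state.

```python
from math import ceil

def sample_size_proportion(critical, p, margin):
    """n = critical^2 * p(1 - p) / margin^2, rounded up."""
    return ceil(critical ** 2 * p * (1 - p) / margin ** 2)

def sample_size_mean(critical, sd, margin):
    """n = critical^2 * sd^2 / margin^2, rounded up."""
    return ceil(critical ** 2 * sd ** 2 / margin ** 2)

# Hypothetical inputs: critical value 1.96, p = 0.5, margin of error 0.05.
n_prop = sample_size_proportion(1.96, 0.5, 0.05)

# Hypothetical inputs: critical value 2, sd = 10, margin of error 2.
n_mean = sample_size_mean(2.0, 10.0, 2.0)

print(n_prop, n_mean)
```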

what is the coefficient of variation for age?

=(STDEV(age column)/AVERAGE(age column))*100

if asked in excel whether the confidence interval contains mu

=AND(upper limit conf int>$pop$mean, lower limit conf int<$pop$mean)

"mean weight for females"

=AVERAGEIFS(weight column,sex column,1) *if dummy code is 0 = M, F = 1

converting units on excel

=CONVERT(value,"g","lbm") "g" is current unit "lbm" is desired unit

how many respondents are exactly 40 years old?

=COUNTIF(age column, 40)

"how many babies are the mean weight?"

=COUNTIF(all baby weights,"=mean cell")

"how many babies are the mode weight?"

=COUNTIF(all baby weights,"=mode cell")

"what percentage of babies are lighter than 3100 grams?"

=COUNTIF(baby weight column,"<3100")/COUNT(baby weight column) 36.08%

"what percentage of babies are heavier than 3500 grams?"

=COUNTIF(baby weight column,">3500")/COUNT(baby weight column) 37.42%

how many males are exactly 40 years old?

=COUNTIFS(age column, 40, sex column, "Male")

dummy coding equation

=IF(A1="F",0,1)... fill down; F = female = 0 (0 is the value returned when the test is true)

"what percentage of babies are lighter than 3100 grams based on the mean and sd?"

=NORM.DIST(3100, mean weight, sd of weight, 1) 36.09%

"what percentage of babies are heavier than 3500 grams based on the mean and sd?"

=1-NORM.DIST(3500, mean weight, sd of weight, 1) 37.28%
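The observed percentages (COUNTIF/COUNT) and the percentages expected from the mean and sd (NORM.DIST) can be compared side by side in Python. The weights below are hypothetical, not the course data set. The share heavier than a cutoff is one minus the cumulative share below it.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical baby weights in grams.
weights = [2500, 2800, 2950, 3050, 3200, 3300,
           3400, 3450, 3600, 3700, 3900, 4100]

# Observed shares, like COUNTIF(range, "<3100")/COUNT(range).
observed_lighter = sum(1 for w in weights if w < 3100) / len(weights)
observed_heavier = sum(1 for w in weights if w > 3500) / len(weights)

# Shares expected if weights followed a normal curve with the sample mean and sd.
fitted = NormalDist(mean(weights), stdev(weights))
expected_lighter = fitted.cdf(3100)      # like NORM.DIST(3100, mean, sd, 1)
expected_heavier = 1 - fitted.cdf(3500)  # like 1-NORM.DIST(3500, mean, sd, 1)

print(observed_lighter, round(expected_lighter, 4))
print(observed_heavier, round(expected_heavier, 4))
```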

percentile from z score

=NORM.DIST(x, mean, sd, 1)

"what is the weight where 20% of the sample should fall below this weight based on the mean and sd?"

=NORM.INV(0.2, mean weight, sd of weight)

"scores on a particular test are normally distributed with a mean of 70 and an SD of 15. below what score would you expect ___% of the data to fall?"

84%: =NORM.INV(0.84, 70, 15)
16%: =NORM.INV(0.16, 70, 15)
97.5%: =NORM.INV(0.975, 70, 15)
2.5%: =NORM.INV(0.025, 70, 15)
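Python's standard library offers the same lookups: `NormalDist.inv_cdf` plays the role of NORM.INV and `NormalDist.cdf` the role of NORM.DIST with cumulative set to 1. A short sketch for the mean 70, SD 15 example, plus the z-score round trip for a score of 85 (the score 85 is just an illustrative choice):

```python
from statistics import NormalDist

scores = NormalDist(mu=70, sigma=15)

# Score below which each given proportion of the data is expected to fall.
cutoffs = {p: scores.inv_cdf(p) for p in (0.025, 0.16, 0.84, 0.975)}

# Going the other way: z score and percentile for a score of 85.
z_85 = (85 - 70) / 15     # (x - mean) / sd
pct_85 = scores.cdf(85)   # proportion expected below 85

for p, cutoff in cutoffs.items():
    print(p, round(cutoff, 1))
print(z_85, round(pct_85, 4))
```

The 16% and 84% cutoffs sit about one SD either side of the mean, and the 2.5% and 97.5% cutoffs about two SDs either side.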

"what is the weight where 5% of the sample should be heavier than this weight based on the mean and sd?"

=NORM.INV(0.95, mean weight, sd weight) 4275.80

calculating p value for t stat

=T.DIST.2T(| test stat |, df)

sample mean in excel

=AVERAGE(data range)

standard error in excel

=sample sd/SQRT(n) ex. =sample sd/SQRT(30) when n = 30

population standard deviation in excel

=sqrt(pop variance cell)

sample standard deviation in excel

=STDEV.S(data range)

population variance in excel

=sum((x-pop mean)^2*p(x)) column

population mean in excel

=sum(x*p(x)) column

The probability of drawing an ace from the top of a well-shuffled deck is approximately 0.0769 (4/52) given the nature of the deck of cards (there are four aces in a 52-card deck). This type of probability is called _______ probability.

A priori

Which of the following requires a dependent sample t-test?

A study comparing academic achievement of a group of students before and after a learning intervention.

In the ___, R2 is adjusted to give a truer estimate of how much the independent variables in a regression analysis explains the dependent variable.

Adjusted R2

_____ is an example of possible relationships observed with a scatterplot.

All the above

bins/frequency chart calculation

BINS: mean - 3sd, mean - 2sd, mean - 1sd, mean, mean + 1sd, mean + 2sd, mean + 3sd, maximum
FREQUENCY: {=FREQUENCY(values being tested, bins value)}
% of TOTAL OBSV: = frequency cell/total frequency cell

The occurrence of binary events generally follows a known distribution, called the ____ distribution.

Binomial

In hypothesis testing, a relationship could be ___

Causal Association

The correlation coefficient represents the ___ between the two variables , x & y......

Correlation

Which type of hypothesis is the following statement? people who are immunized against the flu will be less likely to contract the flu than those who are not immunized.....

Directional Hypothesis

The F Statistic is a broad picture statistic that intends to test the __ of the entire regression model presented....

Efficacy

Consider the following question. We are interested in predicting how students.... which is the dependent variable

Exam grade ( scored from 0-100)

A histogram would be a good way to show the distribution of ethnicity among the participants of a study.

FALSE

Multiple regression analysis is an extension of the simple linear regression model and it includes only one independent variable

FALSE

two events are independent if the conditional probabilities are not equal to the marginal probabilities

FALSE

R2 is a goodness-of-fit measure and the scale ranges from -1 to 1 .....

False

A type II error occurs when _________.

H0 is false and the researcher fails to reject H0

hypothesis acceptance/rejection examples

H0: the average height of 6-year-old boys is 48 inches
H1: the average height of 6-year-old boys is not equal to 48 inches
1. confidence interval (w/ mean of 49, sd of 10, n of 100)...
LCL = 49 - (2 * 10/sqrt(100)) = 47 inches
UCL = 49 + (2 * 10/sqrt(100)) = 51 inches
since 48 inches is included in the confidence interval, do not reject the null hypothesis. conclude that the average height for 6-year-old boys in this county is the same as the national average.
2. confidence interval (w/ mean of 45, sd of 10, n of 100)...
LCL = 45 - (2 * 10/sqrt(100)) = 43 inches
UCL = 45 + (2 * 10/sqrt(100)) = 47 inches
since 48 inches is not included in the confidence interval, reject the null hypothesis. conclude that the average height for 6-year-old boys in this county is less than the national average.
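The two height examples follow the same steps: build the interval as mean ± critical value × sd/sqrt(n), then check whether the hypothesized value (48 inches) falls inside it. A minimal Python sketch, using the card's rounded critical value of 2; the function names are hypothetical.

```python
from math import sqrt

def confidence_limits(mean, sd, n, critical=2.0):
    """Return (LCL, UCL) = mean -/+ critical * sd / sqrt(n)."""
    margin = critical * sd / sqrt(n)
    return mean - margin, mean + margin

def reject_null(hypothesized, mean, sd, n, critical=2.0):
    """Reject H0 when the hypothesized value lies outside the interval."""
    lcl, ucl = confidence_limits(mean, sd, n, critical)
    return not (lcl <= hypothesized <= ucl)

print(confidence_limits(49, 10, 100), reject_null(48, 49, 10, 100))
print(confidence_limits(45, 10, 100), reject_null(48, 45, 10, 100))
```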

simple addition rule

IF MUTUALLY EXCLUSIVE p(true) = P(true and first) + p(true and second) + p(true and third)

addition rule (general)

IF NOT MUTUALLY EXCLUSIVE p(true or first) = P(true) + p(first) - p(true and first)

__________ are defined as samples selected from different populations where values from one population are not related or linked with values from another population.

Independent samples

Linear regression is a method of organizing data that uses the least squares method to determine which ___ best fits the data

Line

The probability of drawing a card that is an "ace" out of a standard deck of 52 cards is an example of what type of probability?

Marginal

if the p-value of the F test is < 0.05 variances are

NOT equal *two sample assuming unequal* look for experimental + control headings

What measurement level is the variable "military rank"

Ordinal

Which of the following probability is solved using the addition rule

P(X OR Y)

multiplication rule for probability table

P(salary is $50k to $69k and nurse has 5 to 9 years experience) =P(salary is $50k to $69k) * P(nurse has 5 to 9 years experience GIVEN salary is $50k to $69k) GIVEN = AND / p(salary is $50k to $69k)

addition rule for probability table

P(salary is $50k to $69k or nurse has 5 to 9 years experience) =P(salary is $50k to $69k *this is total from row*) + P(nurse has 5 to 9 years experience *this is total from column*) - P(both *this is the intersection*)
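The multiplication and addition rules for a probability table can be sketched in Python with exact fractions. The nurse counts below are hypothetical and exist only to show where the row total, column total, and intersection come from.

```python
from fractions import Fraction

# Hypothetical counts of nurses: salary band (rows) by years of experience (columns).
counts = {
    "under $50k":   {"0 to 4": 20, "5 to 9": 10, "10 or more": 5},
    "$50k to $69k": {"0 to 4": 15, "5 to 9": 30, "10 or more": 15},
    "$70k or more": {"0 to 4": 5,  "5 to 9": 10, "10 or more": 40},
}
grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal probabilities come from the row total and the column total.
p_salary = Fraction(sum(counts["$50k to $69k"].values()), grand_total)
p_exp = Fraction(sum(row["5 to 9"] for row in counts.values()), grand_total)

# Joint probability comes from the intersection cell.
p_both = Fraction(counts["$50k to $69k"]["5 to 9"], grand_total)

# Conditional probability: P(experience GIVEN salary) = P(both) / P(salary).
p_exp_given_salary = p_both / p_salary

# Multiplication rule: P(A and B) = P(A) * P(B given A).
p_and = p_salary * p_exp_given_salary

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_or = p_salary + p_exp - p_both

print(p_exp_given_salary, p_and, p_or)
```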

Information from samples can be used to make estimates of information about _______

Populations

When the concentration of values is on the left side of a distribution with a long tail on the right side of the distribution this is known as a __________ distribution

Positively skewed

____ is any process that randomly leads to one of several results for which the concepts and rules of probability are applicable.

Random experiment

In linear regression, the dependent variable should be measured at the _____

Ratio or interval level

Consider the following question: We are interested in predicting who will do well on the second exam from how long they studied. We collect data on how long everyone studied and then we get everyone's final numerical grade (scored from 0 to 100). What type of statistical test is applicable here? ....

Simple linear regression

_______ are samples drawn by first dividing the population into subsets equal to the number of observations ultimately desired in the sample and then drawing a specific observation from each subset.

Systematic samples

In a Poisson distribution, the mean is equal to the variance.

TRUE

Joint probability refers to the simultaneous occurrence of two or more types of events

TRUE

Statistical analysis is almost universally about relating one variable to another

TRUE

The first postulate suggests that each probability is expressed as a positive real number from zero to one

TRUE

Which of the following conditions must be met in order to conduct an independent samples t-test?

The samples must be randomly selected from two non-overlapping populations.

Which of the following is an example of an assumption of a study

The self-reported body weight was accurate

In a simple linear regression, with a single predictor variable, the probability of the f test will always be the same as the probability of the t test.....

True

In hypothesis testing the researcher test the statements of relationship between two or more variables

True

In linear regression, the F-test tests whether any of the independent variables in a multiple linear regression model are significant

True

The null hypothesis H0 is always the hypothesis being tested ....

True

a confidence interval is the range within which we expect a true population value (e.g. a mean) to lie.

True

If there is, in fact, a relationship between immunization and contracting the flu and the researcher concludes there is no relationship, what type of error have they committed?....

Type 2

In what situation would a researcher want to conduct a one-sample t-test?

When µ is known, σ is unknown, and the population is normally distributed or the sample size is n > 30.

predictor variable

a causal variable whose values are predictive of the values of other variables in a given analysis

ordinal variable

a categorical (nominal) variable that is ordered by magnitude or intensity ex. good, better, best

nominal variable

a categorical variable that is not ordered

dummy variable

a categorical variable that takes on 2 values and is coded 1 and 0; 0 is the omitted (reference) category

scatterplot/scatter graph

a graph that shows the simultaneous distribution of the data points for 2 variables XY graph

A parameter is

a measurable characteristic of population

standard deviation

a measure of overall variation in a set of data; the square root of the variance

variance

a measure of overall variation in a set of data that represents the average squared difference between each value in the data set and the mean of all values

variable

a measure of some attribute for a set of entities, persons, or organizations that takes on more than one value

standard error

a measure of the overall variation in the means from samples of a given size taken from a population; = sd / square root of n

ratio variable

a numerical variable (interval) that has a real zero point may be continuous, like weight may be discrete, like # of people in a waiting room

continuous variable

a numerical variable that can theoretically be infinitely divided (blood pressure)

normal distribution

a probability distribution in which values near the mean are more likely than values farther from the mean; bell-shaped curve (68%, 95%, 99.7%): "approx. 68% of the values lie between -/+ 1 sd from the mean"; continuous numerical data... a distribution of MEASUREMENT, unlike binomial and Poisson, which are distributions of EVENTS or COUNTS *based on scales that are CONTINUOUS, not discrete*

t distribution

a probability distribution similar to the normal distribution but having fewer values near the mean and more in the tails, depending on df assumes a finite number of observations

poisson distribution

a probability distribution that represents the likelihood of a rare event only takes on values for whole numbers "concerned with the number of observations that will occur in a small amount of time or over a region of space"

argument for not dropping insignificant variables

a regression equation implies a causal relationship in which the values of the dependent variable are actually caused by the values of the independent variable set

nonlinear relationship

a relationship b/w 2 variables that does not show evidence of a straight-line relationship knowledge of x will provide a better prediction of y values than no knowledge of x... but bc the relationship isn't linear, simple linear regression will not be useful

matrix

a set of data in contiguous rows and columns

hypothesis

a statement of belief about a population to be assessed, using data from a sample

regression analysis

a statistical analysis that seeks to determine whether a given numerical variable is independent of some set of other numerical variables or 2 level categorical variables ex. cost of hospital stay is independent of the length of hospital stay ex. could assess the independence of the dollar value of all hospital billings and the number of patients admitted for a sample of for-profit and not-for-profit hospitals

correlation coefficient

a statistical index of the relationship between two things (from -1 to +1)

chi-square statistic

a statistical test that assesses whether a categorical variable is independent of one or more other categorical variables

chi-square test

a statistical test used to determine the probability of obtaining observed proportions by chance, under a specific hypothesis are the 2 variables statistically independent of each other??

random sample

a subset of a larger population selected in such a way that every member of the larger population has a known and nonzero likelihood of being included

sample

a subset of a population about which there is an interest selected to determine the values of interest for the population

multiple regression

a technique for determining whether a single numerical variable is independent of 2 or more other numerical or 2 level categorical variables

t test

a test that compares an estimated value from a sample with the standard error for that value used to determine whether a numerical variable is independent of a 2 level categorical variable ex. whether the score people received on a test of knowledge about breast cancer on a 20-point scale is independent of whether those people were specifically and consciously exposed to knowledge about breast cancer or not ex. determine whether the cost of a hospital stay is independent of whether the patient was a member of an HMO or not ex. cost per visit for males and females

regression coefficient

a value by which an independent variable can be multiplied to predict the values of a dependent variable slope

correlation

a value derived from a statistic that describes the relationship between two variables; may range from -1 (perfect negative relationship) to 1 (perfect positive relationship); 0 indicates no relationship

degrees of freedom

a value that designates the number of options that can be exercised before no others are available

categorical variable

a variable whose va

dependent variable

a variable whose values are assumed to be affected or modified by the value of other variables in a given analysis "CAUSED VARIABLE"

independent variable

a variable whose values are assumed to be unaffected by other variables in a given analysis "causal OR predictor variable"

central tendency

a way of referring to the central or midpoint around which a data series clusters (mean, median & mode)

If a value has a positive Z-score, its percentile rank is

above 50

cumulative binomial function

accumulates from lowest to highest

examples of continuous numerical variables are

age; ER waiting time; cost per visit; blood pressure ****ALL THE ABOVE

mean

aka simple average

flip a coin 5 times... what is the probability of

all heads = 0.03125; 4 heads = 0.15625; 3 heads = 0.3125; 2 heads = 0.3125; 1 head = 0.15625; no heads = 0.03125

sample space

all possible outcomes

increasing alpha will always lead to a decrease in

beta

true negative

condition absent with negative test result

false positive

condition absent with positive test result

false negative

condition present with negative test result

true positive

condition present with positive test result

Which is not an example of a point estimate....

confidence interval

The relationship between confidence level and the probability of committing Type I error, α, is _____.

confidence level = (1 - α)

margin of error (measurement error)

critical value x standard error

secondary data

data that have been collected for some purpose other than the study of interest to the researcher but can be accessed by the investigator for purpose ex. patient records, county-by-county statistics on median income level, low birth weights etc.

deletion

deleting cells when there is missing data

dispersion

describes how far values in the data set are from the measure of central tendency (range, variance, sd)

"determine distribution of ER arrivals in an hour" **Poisson**

discrete prob P(X=x) =POISSON.DIST(first value, average (or lambda), 0)
cumulative prob P(X<=x) =POISSON.DIST(first value, average (or lambda), 1)

chi square critical value > test stat

do not reject H0; variables are not related

if the p-value of the F test is > 0.05 variances are

equal *two sample assuming equal* look for experimental + control headings

standard error of estimates

error is how much the researcher is off when the regression line is used to predict particular values; a measure of the variability of the errors (avg error over the entire scatterplot); the lower the error, the higher the degree of linear relationship b/w the 2 variables; the higher the error, the less confidence can be put in the estimate

simple random sample

every member of the population has a known and equal chance of selection

bins

excel designation for the categories into which the =FREQUENCY() function accumulates a data series

type 2 error

failing to reject a false null hypothesis; known only if a specific value for H1 is given; false negative

false negative fraction =

false negative/(false negative + true positive) ex. negative affected fetus/total affected fetus

false positive fraction =

false positive/(false positive + true negative) ex. positive unaffected fetus/total unaffected fetus
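The false negative fraction, false positive fraction, and efficiency all come from the same four cells of a screening table. A short Python sketch with hypothetical counts:

```python
# Hypothetical screening-test counts.
true_positive = 90    # condition present, positive test
false_negative = 10   # condition present, negative test
false_positive = 30   # condition absent, positive test
true_negative = 870   # condition absent, negative test

# Share of affected cases the test misses.
false_negative_fraction = false_negative / (false_negative + true_positive)

# Share of unaffected cases the test flags anyway.
false_positive_fraction = false_positive / (false_positive + true_negative)

# Efficiency: share of all results that are correct.
total = true_positive + true_negative + false_positive + false_negative
efficiency = (true_positive + true_negative) / total

print(false_negative_fraction, round(false_positive_fraction, 4), efficiency)
```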

A type I error is known as _______.

falsely claiming an effect when it actually does not exist

most common binary event

flipping a coin

effectiveness of dummy variable =

for treatment a = slope of x1*# + slope of dummy variable 1*1 + slope of dummy variable 2*0 + intercept
for treatment b = slope of x1*# + slope of dummy variable 1*0 + slope of dummy variable 2*1 + intercept
for treatment c = slope of x1*# + slope of dummy variable 1*0 + slope of dummy variable 2*0 + intercept
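The three prediction equations differ only in which dummy variable is switched on; treatment c is the omitted category with both dummies at 0. A minimal Python sketch with hypothetical regression coefficients (not output from any course data set):

```python
# Hypothetical regression coefficients.
coef = {"intercept": 10.0, "x1": 2.0, "dummy1": 5.0, "dummy2": -3.0}

def predict(x1, treatment):
    """Predicted y for treatment 'a', 'b', or 'c' (c is the omitted category)."""
    dummy1 = 1 if treatment == "a" else 0
    dummy2 = 1 if treatment == "b" else 0
    return (coef["x1"] * x1
            + coef["dummy1"] * dummy1
            + coef["dummy2"] * dummy2
            + coef["intercept"])

for t in ("a", "b", "c"):
    print(t, predict(4, t))
```

With the same x1, the gap between treatments a and c equals the slope of dummy variable 1, and the gap between b and c equals the slope of dummy variable 2.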

effectiveness for dummy variable + interaction =

for treatment a = slope of x1*# + slope of dummy variable 1*1 + slope of dummy variable 2*0 + slope of interaction 1*1 + slope of interaction 2*0 + intercept
for treatment b = slope of x1*# + slope of dummy variable 1*0 + slope of dummy variable 2*1 + slope of interaction 1*0 + slope of interaction 2*1 + intercept
for treatment c = slope of x1*# + slope of dummy variable 1*0 + slope of dummy variable 2*0 + slope of interaction 1*0 + slope of interaction 2*0 + intercept

independence

formally, the understanding that conditional probabilities equal marginal probabilities; the recognition that 2 or more variables are not dependent on each other

IQ lecture example (% of data, proportions)

from IQ score (given variable)...
proportion below column will = NORM.DIST(A2, mean, sd, 1)
percent below column will just be the proportion below values as %
proportion above column will = 1-NORM.DIST(A2, mean, sd, 1)
percent above column will just be the proportion above values as %

a scatterplot is a way to...

graphically display a collection of points, each having the value of one variable determining the position on the x axis and the value of the other variable determining the position on the y axis

calculation of coefficients on excel

headings: stay, LOS (x), total charges (y), x-xbar, y-ybar
slope (b1) = SUMPRODUCT(x-xbar column, y-ybar column) / SUMSQ(x-xbar column)
intercept (b0) = y average - (x average * slope)

when inputting interaction on excel, match up interaction 1 with dummy variable 1

i.e. if dummy variable 1, use actual age # for interaction 1 cells and 0 for interaction 2 column if dummy variable 2, use 0 for interaction 1 and actual age # for interaction 2 column

if chi square data is not given in the form of a table... make a pivot table

if gender and exceed exercise threshold are the variables, put gender in rows, exercise in columns, and exercise in count

imputation

impute-or fill in missing cell values could use the mean of available data to fill in, median etc.

control group

in an experiment, the group that is not exposed to the treatment; contrasts with the experimental group and serves as a comparison for evaluating the effect of the treatment

negative linear relationship

increase in x = decrease in y; knowing something about x allows you to predict y better than having no knowledge of x

positive linear relationship

increase in x = increase in y; knowing something about x allows you to predict y better than having no knowledge of x

As the sample size __, the standard error will ___, because it is the standard deviation divided by the square root of the sample size.

increases, decrease

reduce type 2 error by

increasing sample size *larger sample size = lower level of beta*

coefficient b0

intercept of the line represents where the line crosses the y axis OR the value of y when x = 0

confidence interval

interval on the number scale within which a population value is expected to lie with some predetermined probability (95%) "the range within which we expect a true population value to lie" established from samples and used to predict pop value

in general, if the chi square value is as large as the number of cells in the contingency table...

it is likely that the null hypothesis will be rejected at the 5% level of significance

Based on the formula for the standard error of the mean, you can tell that the _________, the smaller the standard error of the mean.

larger the sample size

best fitting line

line determined by an independent variable that passes closest to the values of a dependent variable in a 2D graph usually defined as the line that minimizes the sum of squared differences b/w the line and the values of the dependent variable for all the values of the independent variable

t-test, assuming equal variance

male vs female ages

Confidence intervals are calculated as the point estimate ± _______.

margin of error

skewed left

mean < median tail is to the left

skewed right

mean > median tail is to the right

parameter

measure of a characteristic of a population ex. mu, N

statistic

measure of characteristic of a sample estimates a parameter ex. x-bar, n

what would be an appropriate measure of central tendency to use when looking at height of children at the age of 5.

mode median mean ****ALL THE ABOVE

*for f test* say the mean for females is 12.133 and mean for males is 18.588...

one tail hypothesis - H0: the mean of males is greater than the mean for females; H1: the mean of females is greater than the mean for males
two tail hypothesis - H0: the mean of males and females is equal; H1: the mean of males and females is NOT equal

In a __ test, the region of rejection of H0 is at only one end of the continuum

one tailed

interval variable

an ordinal variable that has equal intervals; ex. age groups divided into 5-year categories (65 to 69, 70 to 74, etc.), NOT categories like young, middle-aged, and old --> these are ordinal variables w/o equal intervals

conditional probability and bayes's theorem

p(a | b) = p(a and b) / p(b)

conditional probability for independence

p(a | b) = p(a)

discrete numerical data

produced by a counting action and represent measures that can be made in discrete individual units only (no fractions, always whole #)

r square

proportion of variance in a dependent variable that can be accounted for or explained by knowledge of variation in an independent variable or variables

coefficient of determination

r^2 = SSR/SST; the coefficient of multiple determination b/w a dependent variable and the independent variables; tells how much of the variability in y is explained by x; can range from 0 to 1; "the proportion of variation in y that can be predicted using x"

adjusted r square

r^2 is adjusted to give a truer estimate of how much the independent variables in a regression analysis explain the dependent variable, taking into account the # of independent variables; to make the adjustment: 1 - ((standard error of estimate)^2 / (standard dev of y)^2)
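The adjustment on this card gives the same number as the common textbook form 1 - (1 - r^2)(n - 1)/(n - k - 1), provided the standard error of estimate uses n - k - 1 in its denominator and the standard deviation of y uses n - 1. A small Python sketch with hypothetical sums of squares:

```python
# Hypothetical regression summary values.
sse = 20.0   # sum of squared errors (residual)
sst = 100.0  # sum of squares total
n = 25       # number of observations
k = 2        # number of independent variables

r_squared = 1 - sse / sst

# Card's form: 1 - (standard error of estimate)^2 / (standard dev of y)^2.
see_squared = sse / (n - k - 1)
var_y = sst / (n - 1)
adjusted_from_card = 1 - see_squared / var_y

# Textbook form, for comparison.
adjusted_textbook = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(r_squared, round(adjusted_from_card, 4), round(adjusted_textbook, 4))
```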

statistical significance

refers to a statistical test result that leads to the rejection of the implicit or explicit hypothesis of independence between 2 or more variables

chi square critical value < test stat

reject H0; variables are related

type 1 error

rejecting the null hypothesis when it is true; always set by the level of confidence; false positive

The research analyst must evaluate data for ___

relevance, reliability, and validity not... potential

multiple r

represents the strength of the linear relationship b/w the actual and the estimated values for y scale ranges from -1 to 1 -1 = good inverse relationship 1 = good direct relationship

chi square test tends to be ___ skewed

right

The population from which the sample is actually drawn is known as the:

sample population

the two possible outcomes from the flip of a coin- heads or tails- are frequently called the _____ for this process.

sample space

A confidence interval is defined as an interval of values calculated from _______ to estimate ________.

sample statistics; the value of a population parameter you calculate a statistic to estimate the value of the population

coefficient b1

slope of the line rise over run (one unit)

no relationship

small x values = large and small y values large x values = large and small y values knowing something about x is no better than knowing nothing about x in predicting y values

"smoking policy is related to type of institution"

smoking policy is dependent on the type of institution

"smoking policy is not related to type of institution"

smoking policy is independent of the type of institution

as sample size increases

standard error decreases

linear regression

statistical technique for relating a dependent numerical variable to an independent numerical or 2-level categorical variable that generally assumes a straight-line relationship uses the least-squares method to determine which line best fits the data

hypothesis testing (statistics)

statistics are used to assign a probability to the likelihood that 1. a particular statistical value could have come from some specifiable population 2. 2 (or more) sample (or population) groups are different from one another 3. a measure taken from a sample (or population) is of a particular value

inferential statistics

statistics from a sample are used to make inferences about the populations from which the samples were drawn converting information about a sample into intelligent guesses about a population

"the big picture"

statistics is used to determine whether a value found in a sample could be assumed to have come from a population with certain characteristics

paired samples t-test

subject, after program, before program

slope (b1) =

sum of (x-xbar)*(y-ybar) / sum of (x-xbar)^2

sum of squares total (sst)

sum of (y-ybar)^2
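The slope and sum-of-squares cards above can be worked through on a tiny data set. The x and y values below are hypothetical; the intercept, SSR, and r^2 lines use the definitions given elsewhere in this set (b0 = ybar - b1 * xbar, r^2 = SSR/SST).

```python
# Hypothetical data: x = length of stay, y = total charges (in $1,000s).
x = [1, 2, 3, 4, 5]
y = [5, 8, 9, 12, 13]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Slope: sum of (x - xbar)(y - ybar) / sum of (x - xbar)^2.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx

# Intercept: ybar - slope * xbar.
b0 = y_bar - b1 * x_bar

# Sum of squares total and sum of squares due to regression.
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
r_squared = ssr / sst

print(round(b1, 3), round(b0, 3), round(sst, 3), round(r_squared, 4))
```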

For a left-tailed t-test with α = .10 and n = 25, the rejection zone is ______.

t < −1.318

small expected values in chi square

tend to inflate chi square value and increase likelihood of rejection

t test for related data aka paired 2 sample for means

test of whether 2 measurements for the same group of people are similar or different; look for before, after, difference headings; t stat = mean difference in before and after / standard error of differences
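The paired t statistic is the mean of the before/after differences divided by the standard error of those differences. A minimal Python sketch with hypothetical scores for four subjects:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical before/after measurements for the same four subjects.
before = [10, 12, 14, 16]
after = [12, 15, 15, 18]

# Difference column: after minus before for each subject.
differences = [a - b for a, b in zip(after, before)]

mean_difference = mean(differences)

# Standard error of the differences: sample sd / sqrt(n).
standard_error = stdev(differences) / sqrt(len(differences))

t_stat = mean_difference / standard_error
df = len(differences) - 1

print(mean_difference, round(standard_error, 4), round(t_stat, 3), df)
```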

the central task of determining independence is...

the establishment of confidence limits and the testing of hypotheses about the data

The margin of error is calculated by multiplying the critical value of t for a two-tailed test with α level, tα/2 by _____.

the estimated standard error of the mean

When the 95% confidence interval includes the hypothesized population parameter under H0, the correct conclusion is that

the evidence is not strong enough to support the claim that there is an effect at α = .05.

population

the group of persons or organizations about which there is an interest

alpha

the level of TYPE 1 ERROR usually set @ 0.05 or 0.01

beta

the level of TYPE 2 ERROR usually not known unless a specific value is stated for H1

joint probability

the likelihood of 2 simultaneously occurring events ex. the likelihood that a person will come to an ER during the day and will come for a true emergency

probability

the likelihood of an outcome of an event

empirical probability

the likelihood of the occurrence of an event that can be determined only on the basis of historical data about similar events that have already occurred

The coefficient of determination, r², tells us _______.

the percentage of variance overlapping between variables X and Y

binomial distribution

the probability distribution that represents the accumulation of a Bernoulli distribution for any number of trials and any value of the probability of success (an outcome of 1)

marginal probability

the probability of some outcome without regard to any other event; associated with a single event; outcomes are mutually exclusive. ex. the likelihood that a person who arrives at an ER will come for a true emergency vs a nonemergency condition

outcome

the result of an event; an observation to which a probability can be assigned

trend line

the single line through an XY scatterplot that provides the best linear or nonlinear fit to the data

statistics

the study of samples, characteristics of samples, and the ways in which inferences about populations may be drawn from those samples

ANOVA (analysis of variance)

the test used to determine whether a numerical variable is independent of 1 or more categorical variables that may take on more than 2 values

critical value

the value that the test statistic must exceed in order to reject the null hypothesis (e.g., chi-square, t, F)

failing to reject the null hypothesis in chi square (p value above alpha)

the variables are not related

rejecting the null hypothesis in chi square (p value below alpha)

the variables are related

Discrete numerical variables are typically the result of the counting of:

things; activities or organizations; persons; ALL OF THE ABOVE ***

probability tables that look like chi square...

to fill in cells, divide corresponding original cell by total from the column on the outside of the table, sum the percentages from the rows/columns

The following are properties of the normal distribution except?

total area under the curve is less than one

a scatterplot is a way to graphically display a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis

true

multiple regression analysis helps us understand the relationship between several independent or predictor variables and a dependent or outcome variable

true

specificity = true negative fraction =

true negative/(true negative + false positive) ex. negative unaffected fetus/total unaffected fetus

negative predictive value =

true negatives/(true negative + false negative)

sensitivity = true positive fraction =

true positive/(true positive + false negative) ex. positive affected fetus/total affected fetus

positive predictive value =

true positive/(true positive + false positive)
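
The four formulas above (sensitivity, specificity, positive and negative predictive value) can be sketched in one helper. The function name and the counts below are hypothetical, chosen only to illustrate the arithmetic.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return the four screening-test measures from a 2x2 table of counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive fraction
        "specificity": tn / (tn + fp),  # true negative fraction
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts: 100 affected (90 test positive), 100 unaffected (80 test negative).
m = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in m.items():
    print(name, round(value, 3))
```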

The rule of thumb says the absolute value of t should be greater than __ to be able to reject the null hypothesis

two

A non-directional hypothesis test is also known as a _______.

two-tailed test

descriptive statistics

used to characterize either a total population or a sample from that population

f test

used to compare equality of two variances (experimental group variance / control group variance)

A statistic is a _____ derived from a sample that can be used to infer something about a population

value

numerical variable

variable that is measured on a number scale; may be continuous or discrete

write up for interaction

we are able to (reject/do not reject) the null hypothesis. regression results indicate that the overall model (significantly/insignificantly) predicts treatment effectiveness and interaction effectiveness, r2 = ___, adjusted r2 = ___, p = ___. this model accounts for (adjusted r2)% of the variance in treatment effectiveness and interaction effectiveness when adjusting for the number of variables included in our model.

write up for dummy variable

we are able to (reject/not reject) the null hypothesis. regression results indicate that the overall model (significantly/insignificantly) predicts (y value). r2 = ___, adjusted r2 = ___, f(x) = ___, p = ___. The model accounts for (adjusted r2)% of the variance in treatment effectiveness when adjusting for the number of variables included in our model.

mathematical independence

when marginal probabilities equal conditional probabilities in a joint probability table
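
A small check of this definition, assuming a 2x2 table of counts in the style of the ER example (time of arrival by type of visit). The counts are made up so that the two variables come out independent.

```python
import math

# Hypothetical counts of ER arrivals: (time of day, type of visit).
counts = {
    ("day", "emergency"): 30,
    ("day", "nonemergency"): 70,
    ("night", "emergency"): 15,
    ("night", "nonemergency"): 35,
}
total = sum(counts.values())

# Marginal probability of an emergency, ignoring time of day.
p_emergency = (counts[("day", "emergency")] + counts[("night", "emergency")]) / total

# Conditional probability of an emergency given a daytime arrival.
day_total = counts[("day", "emergency")] + counts[("day", "nonemergency")]
p_emergency_given_day = counts[("day", "emergency")] / day_total

# Independence: the marginal equals the conditional.
independent = math.isclose(p_emergency, p_emergency_given_day)
print(p_emergency, p_emergency_given_day, independent)
```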

dummy variable and interaction

the interaction is x1*x2; when the dummy variable is 0, the interaction is 0; when the dummy variable is 1, the interaction equals the continuous variable

interaction

where we allow slope to adjust
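
A sketch of how a dummy variable and its interaction term let the slope adjust, assuming a model of the form y = b0 + b1*x + b2*dummy + b3*(x*dummy). The coefficient values are hypothetical, not taken from any course dataset.

```python
def predict(x, dummy, b0=1.0, b1=2.0, b2=3.0, b3=0.5):
    """Regression prediction with a dummy variable and an interaction term."""
    interaction = x * dummy  # 0 when dummy is 0, equal to x when dummy is 1
    return b0 + b1 * x + b2 * dummy + b3 * interaction


# Slope when dummy = 0 is b1; slope when dummy = 1 is b1 + b3.
slope_dummy0 = predict(5, 0) - predict(4, 0)
slope_dummy1 = predict(5, 1) - predict(4, 1)
print(slope_dummy0, slope_dummy1)
```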

regression write up

with our model we can use x to predict y. Using x we were able to account for (r2)% of the variance in y, R^2 = ___, F(1,498) = 1782, p = ___. for each additional unit increase in x, y increased/decreased by approximately ___ units (look at coefficients). given our regression equation, x (for example) of ___ would have an expected y of ___ & x of ___ would have an expected y of ___. **to solve the regression equation and explain... do y = b1x + b0 chart (y column, slope column, x column, intercept column); y column = slope*x + intercept; slope = coefficient x cell; intercept = coefficient intercept cell; x = input whichever #s

A random sample is selected from a population with µ = 60 and the sample is subjected to an experimental procedure. If the sample s = 6, which set of sample characteristics is most likely to lead to a decision to reject H0?

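xbar = 63, n = 16. use the t test statistic: t = (xbar - population mean) / (s / square root of n) = (63 - 60) / (6 / 4) = 2. use t.inv.2t (df = 15) to find the critical value and compare; the larger |t| is, the more likely H0 is rejected. whether t = 2 exceeds the critical value depends on α and the number of tails (two-tailed α = .05 with df = 15 gives about 2.131; one-tailed α = .05 gives about 1.753)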
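
The test statistic for this card can be reproduced as a quick check, using the values given in the question (xbar = 63, µ = 60, s = 6, n = 16). The function name is a hypothetical helper.

```python
import math


def one_sample_t(xbar, mu, s, n):
    """t = (sample mean - population mean) / (s / sqrt(n))."""
    return (xbar - mu) / (s / math.sqrt(n))


t = one_sample_t(xbar=63, mu=60, s=6, n=16)
print(t)  # the standard error is 6 / 4 = 1.5, so t = 3 / 1.5
```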

linear regression equation (line of best fit)

y = b1x + b0

to solve the regression equation and explain in write up... do y = b1x + b0 chart (y column, slope column, x column, intercept column)

y column = slope*x + intercept; slope = coefficient x cell; intercept = coefficient intercept cell; x = input whichever #s

intercept (b0) =

ybar - (b1 * xbar)
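
A minimal least-squares sketch for the line y = b1x + b0, computing the slope as sum of (x-xbar)*(y-ybar) / sum of (x-xbar)^2 and the intercept as ybar - b1*xbar. The data points are hypothetical and lie exactly on y = 2x + 1.

```python
def least_squares(xs, ys):
    """Return (b1, b0), the slope and intercept of the best-fit line."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx          # slope
    b0 = ybar - b1 * xbar   # intercept
    return b1, b0


# Hypothetical data generated from y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
b1, b0 = least_squares(xs, ys)
print(b1, b0)
```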

When conducting a one-sample t-test, what is the rejection zone for a two-tailed test with α = .05 and n = 26?

|t| > 2.06 (in Excel, use T.INV.2T to solve this question)

For the equality of variances test for two independent samples t-test, what is the H1?

σ₁² ≠ σ₂²

The range of Pearson's r is ______.

−1 ≤ r ≤ 1

In a normal distribution the middle 95% of the population is bound by

−1.96 < Z < 1.96
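
The 1.96 cutoff can be checked with the Python standard library's `statistics.NormalDist`, used here as a stand-in for the Excel functions on the other cards.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Leaving 2.5% in each tail puts the upper bound at the 97.5th percentile.
z = standard_normal.inv_cdf(0.975)

# Area under the curve between -1.96 and 1.96.
middle = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)
print(round(z, 2), round(middle, 3))
```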

