Epi/Bio Test #1

a special meaning = result of a defined calculation

estimate

Explanatory Studies attempt to provide insight into ____. What are the two studied within this category?

etiology or find better treatments *can be experimental or observational*

Does CI of a mean quantify variability?

no - depends on not only the spread but also the sample size 95% of the time you expect the population mean to be within the CI

If you said to one of the voters, "only a fool would vote for cake" is this considered accurate data?

no - intimidation and swaying their vote this will skew the data

Nonparametric tests based on ranks

no assumptions about population distribution -work by ignoring actual data values and *analyze ranks* instead ranks high to low - ensure test won't be affected by outliers *doesn't assume any particular distribution*

multiple linear regression - the null hypothesis is that the parameter provides

no information so that the parameter value for that parameter = 0

Is the SD the same as the SEM?

no, the SEM is always smaller than the SD; bigger sample = smaller SEM; more variation = bigger SD

Can the mean and median be computed when n=1? n=2?

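yes to both: with n=1 the mean and median both equal that single value (not very useful); with n=2 both equal the average of the two values (median = take the average with an even number of values)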

In relative risk, what does the p value depend on?

sample size and how far the relative risk is from 1.0

Random (representative) sample

assumption that the sample of data analyzed is sampled from a larger population of data at which you want to generalize

Odds ratio = Log of odds ratio =

asymmetrical symmetrical

pooled data is not independent

independent subjects

95% CI if they do not overlap, that means

p value is way less than 0.05 = significant

two groups of subjects are selected - one with the exposure and the other without - observes over time to determine incidence rates

prospective study (longitudinal study)

If a patient lives past the time frame of the study - how is this recorded?

*censored out* cannot tell how long patient would have lived and *cannot include any information after the study has been completed*

The Odds Ratio what is best summarized? what is being compared?

*case-control studies* *probability*: probability event will occur is the fraction of times you expect to see that event *odds*: probability that the event will occur divided by the probability that the event will not occur

multiple linear regression model do not distinguish between the _____ and the other ____ .

(x) variable you really care about (x) variables you are adjusting for (covariates)

Logistic Regression

*2 possible outcomes* *uses logarithms*

data capable of possessing *any value in a given range* - what type of variable is this? give an example.

*Continuous Variable* ex: BP, temperature, height and weight (there are distinct range between each order)

Comparison of linear regression and correlation

*Correlation*: don't have to think about cause and effect (just *relationship*) - *doesn't matter which is X and Y* *Regression*: need to think about *cause and effect* *best line that predicts Y from X*

Median and Percentiles - which variables are appropriate here.

*NOT* nominal - can't order them or find a median value

categorized in ordered groups with equivalent intervals between the variables but zero point is meaningful - what type of variable is this? example?

*Ratio Variable* - *zero is meaningful* cannot be negative!! ex: body weight, temperature in kelvin, mass, distance the ratio here can be calculated (determine the risk/odds of something)

SEM - Standard Error of the Mean

*SEM = (SD/square root n)* *does not tell* you *scatter or variability* but based on #observations *larger the sample size = reduced, close to zero (smaller SEM)* (ex: average IQ of PA student in this room - not very likely it is the true average until you add more and more people)
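
A minimal sketch of the SEM formula above, using made-up scores (the numbers and the helper name `sem` are illustrative assumptions, not from the course material). It shows that repeating the same spread with more observations shrinks the SEM while the SD stays roughly the same.

```python
import math
import statistics

def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return sd / math.sqrt(n)

small = [98, 102, 100, 104, 96]  # hypothetical scores, n = 5
large = small * 4                # same spread of values, n = 20

print("n=5 : SD =", statistics.stdev(small), "SEM =", sem(small))
print("n=20: SD =", statistics.stdev(large), "SEM =", sem(large))
```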

two ends of the CI are called

*confidence limits = value* (high/low) *(one limit to the next = range)*

association between two continuous variables can be quantified by the ____. What can be quantified? what values are used?

*correlation coefficient, r* quantify *direction and magnitude of linear correlation* *-1 to 1* (ex: insulin sensitivity in men)

End point

*defined consistently!* ex: all cause mortality or do you want patients to be censored for other deaths? creates ambiguity if not transparent about censors

One-way ANOVA how does this work? what would the null hypothesis be here?

*determine sum of squares by fitting alternative models* null = all populations share the same mean, any difference is due to chance

In the statistical sense, a confounder is an ______ that correlates with _____.

*extraneous variable* *correlated with both dependent and independent variable* gotta know

What is the goal of linear regression?

*find values for the intercept and slope that are most likely to be correct and quantify the imprecision with the CIs*

Multiple Linear regression

*finds linear equations that best predicts Y from multiple independent variables*

Model with more parameters almost always ______

*fit the sample data better* but do not always reflect population

SEM is not a measure of the data, but rather

*how well you know the population mean*

One-way ANOVA compares

*means of three or more groups* assuming all values are *gaussian (parametric)* ex: non-runner, recreational runners and elite runners *logarithm* before analysis

Median survival is *undefined* when

*more than half* of the subjects are still *alive* at the end of the study can only be measured when half of the patients die!

Mann-Whitney what is this? how does this work? assumption?

*nonparametric* test to compare *two unpaired groups* and to compute a p value must rank each group - calculate the mean rank of each group *samples are randomly sampled and are independent*

What is the formula for power? what is this also known as?

*power = 1-beta (false negative rate)* SENSITIVITY

One-tailed p value

*predict prior* - based on data, physical limitations, common sense tells you that the difference can only go in one direction ex: anti-DM meds - should have a decrease in glucose levels

Statistical hypothesis testing is used in _____ but rare in ______.

*quality control* rare in exploratory research

Fitting model to data

*regression* fits a model to data can adjust value of parameters to make predictions - *model is fit to the data* "model are shoes and data are feet"

Entry criteria should this change? why or why not? example?

*should not change over enrollment in a study* ex: improved diagnostic tool would identify subjects earlier and affect survival time --> if you have a new tool to detect new cancer earlier, will change the study completely

Correlation does not necessarily imply

*simple causality* --> two variables can be correlated without direct causation

Gaussian distribution is formed only when each

*source of variability is independent and additive with others* gotta know

When the mean equals 0 and the SD equals 1.0 the Gaussian distribution curve is called

*standard normal curve*

Median Survival

*summarize entire survival curve by one value (median)* ex: good way to compare different treatments - cancer treatments and see how it affected survival get the median value - see how long it takes for *50% of the patients to have died (cross x-axis)*

Multiple comparison of CI applies to ______. if CI crosses 0 =

*the ENTIRE FAMILY of comparison* - not just each individual interval if CI crosses 0 and the values are the same = not statistically significant

*Two populations have the same standard deviation even if*

*their means are distinct*

CI is inversely proportional

*to the square root of the sample size*

Censoring should only be

*unrelated to survival* ex: if patient is cured or is too sick to want to continue still related to the disease progression and will skew/bias results

SD will equal zero when _____. When will SD be negative?

*values are identical* *NEVER NEGATIVE* gotta know

What is the z value?

*z = (value - mean) / SD* z = *number of SD away from the mean*
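
A small sketch of the z value as the number of SDs a value lies from the mean. The IQ-style numbers (mean 100, SD 15) and the function name `z_score` are assumptions chosen for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

print(z_score(130, 100, 15))  # 2 SD above the mean
print(z_score(85, 100, 15))   # 1 SD below the mean
```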

Examples of when to use pair t test

comparing -twins -same person - opposite sides of the body -same subject (before/after) -income level

What are multiple regression methods used for?

-assess *impact* of one variable while *adjusting* for others -create *equations* for predictions -understand how *various variables impact outcome*

Most variation is likely due to biologic variability

-circadian variations (temperature change in day/night) -aging -alterations in activity -diet -mood

An experiment has more power to

-find big effect -when the data are very tight (little variation)

Goals of Multivariable regression in the lab? observational studies?

-in lab, you can control all the variables -observational studies = outcome measured may be affected by multiple variables

Accurate data

-measured the same way -mortality -voting poll - if there is intimidation or wording the question weird

Correlation : what assumptions can be made?

-paired samples -independent observations -x values were not used to compute y values -LINEAR -no outliers (accounted for but not included when computing correlations)

Censored Survival Data

-rolling enrollment period - taking new patients later enrolled subjects - not followed a long loss of follow up or surviving past the study period

Can the SD ever equal zero? can it be negative?

0 = only if every single value came back the same negative = NO

The proportion cannot go below ____ or above _____. The further away from _____, the more lopsided the range will be.

0.0 1.0 0.50

The more comparison, the more difficult to find significance when corrected - why?

0.05 gets divided up more and more
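
One common way the 0.05 "gets divided up" is the Bonferroni correction. The card does not name the method, so treating it as Bonferroni is an assumption here, as is the helper name `per_comparison_alpha`.

```python
def per_comparison_alpha(n_comparisons, family_alpha=0.05):
    """Bonferroni-style threshold: split the family alpha evenly across comparisons."""
    return family_alpha / n_comparisons

for k in (1, 5, 10):
    # The more comparisons, the smaller the p value each one must reach.
    print(k, "comparisons -> each p value must be below", per_comparison_alpha(k))
```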

With K independent comparisons, the chance that all will be not significant is _____. The chance that one or more will be statistically significant ____.

0.95^K 1.0 - 0.95^K (increase comparisons, you are likely to find statistically significant results by CHANCE)
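
A quick sketch of the arithmetic, assuming the comparisons are independent and every null hypothesis is true. The function name is hypothetical.

```python
def chance_of_any_false_positive(k, alpha=0.05):
    """1 - (1 - alpha)^k: chance that at least one of k comparisons is 'significant' by chance."""
    return 1.0 - (1.0 - alpha) ** k

for k in (1, 3, 10, 20):
    print(k, "comparisons:", round(chance_of_any_false_positive(k), 3))
```

With 3 comparisons the chance is already about 0.14 rather than 0.05.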

What are the 3 approaches to sample size?

1) *Ad hoc* - collecting data *as it comes in* (if you do not get results you want, keep looking until you find one = type 1 error!) 2) *Conventional* - choose sample size, collect data, analyze - *no adjustments* 3) *Adaptive trials* - *modify* study (ex: shorten or lengthen clinical trial)

To interpret the CI, you must accept the following assumptions:

1) *Random sample* - representative of the population. (do we represent PA students across the board? - more of a convenience sample) 2) *Independent observations* - each value has to be separate from another ex: siblings have to be independent - can skew the data 3) *accurate data* - cosmo IQ test is not accurate or all rectal temperatures 4) *assessing an event you really care about* 5) *the population is distributed in a Gaussian manner* - normally distributed data

What 3 things to you need to calculate the CI?

1) SD 2) number of values in the sample 3) degree of confidence
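
A rough sketch of how those three inputs set the width of the CI, with the sample mean supplying the center. It uses a normal (z) critical value, which is a large-sample approximation and an assumption here; small samples would use a t distribution instead. The function name and numbers are illustrative.

```python
import math
from statistics import NormalDist

def approx_ci(mean, sd, n, confidence=0.95):
    """Approximate CI of a mean: mean +/- z * SD / sqrt(n) (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

for level in (0.90, 0.95, 0.99):
    low, high = approx_ci(mean=100, sd=15, n=25, confidence=level)
    print(f"{level:.0%} CI: {low:.1f} to {high:.1f}")

# Quadrupling n halves the width (width is proportional to 1 / sqrt(n)).
print(approx_ci(100, 15, 100))
```

The 90% interval comes out narrowest and the 99% interval widest.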

T/F Multiple comparisons tests use data from all groups, even when comparing two

TRUE

T/F The alpha value is not influenced by the data

TRUE

What are the ways to wring statistically significance out of data?

1) change definition of outcome 2) different criteria for inclusion/exclusion 3) arbitrarily remove outliers 4) clump or separate groups differently 5) different statistical tests

Statistical hypothesis testing automates decision making following what 3 steps

1) define threshold (*p value*) 2) significance level of the test (*alpha*) - usually 0.05 3) hypothesis testing

What are the 3 types of study design?

1) descriptive 2) Experimental 3) Observational

Two-way ANOVA Interaction First Factor Second Factor what is the null hypothesis for these?

1) null = there is no interaction between the two factors 2) the population means are identical for each category of the first factor (running) 3) the population means are identical for each category of the second factor (age)

Simple linear regression Multiple linear regression

1) one X variable 2) two or more X variables

What assumptions are made when analyzing the results of a t test?

1) random (or representative) samples 2) independent observations (can be before, after, not related) 3) accurate data 4) values in population are distributed in gaussian manner

How is the CI calculated?

1) sample mean - best estimate of the population mean 2) standard deviation - the width of CI is proportional to the sample SD, want small SD

Null for correlation = Null for linear regression =

1) there is no correlation between X and Y 2) a horizontal line is correct (mean different things but the p values are identical)

"fits the best" "fits a simple model"

1) through graph of data points 2) determine the most likely values of the parameters that define that model (slope/intercept)

What are two examples of binomial studies? what are they comparing?

2 options - (yes/no, male/female) 1) Fisher's exact test = compares two unmatched (unpaired) groups 2) McNemar's Test = compares two matched (paired) groups

*SD quantifies variability* - can be computed from

2 or more

What is the rule of thumb for standard deviation?

2/3 of the observations in a population usually lie within the range defined by the mean minus 1 SD to the mean plus 1 SD 2 SD out = 95% of values falling in that (refers to most people but not everyone)

What is the minimum amount of subject needed for sample size?

30!

If the null hypothesis was true (no difference between 3 groups) - there would be ____% chance that each t-test would yield a significant p value

5% *with 3 comparisons, this would be WAY HIGHER*

Familywise Error Rate comparison made individually *without correction* for multiple comparisons

95% CI (assuming null hypothesis is true = not significant) --> other 5% applies to each comparison

If you go 2 standard deviation from the mean, what percent of the sample size will fall into this category?

95%!

Which is the most narrow? 90% CI, 95% CI, and 99% CI?

90% is the most narrow; 99% is the widest (wider margin of error)

SEM can be used to construct

confidence intervals

collection of statistical models which the observed variance in a particular variable is partitioned into components attributable to different sources of variation

ANOVA

provides statistical test whether or not the means of *several groups* are all equal

ANOVA -generalizes t-test to more than two groups

If you have 3 groups you want to compare x, y, z to, which test would you use?

ANOVA!

caused by *any factor* that consistently alters the results - not just the preconceived notions of the experimenter

Bias ex: temperature of the room, if the AC went out.

categorical outcomes with *two distinct possible outcomes*. What is this? Give an example.

Binomial Variable - only TWO answer CHOICES ex: male/female, yes/no, alive/dead (*no order to magnitude, not numeric*)

T/F Intervals between values using ordinal variables may not be equal.

TRUE (ex: poor, fair, good, very good)

The values around the line =

confidence intervals

As you increase sample size, you shrink the

CI

two possible outcomes summarized as a proportion or a fraction, or a range with upper and lower limits

CI of proportion

compares observed and expected numbers of subjects in each category

Chi-square test

categorical outcomes with *more than two possible outcomes*. What is this? Give an example.

Nominal Variables - think "name" ex: eye color, blood type (usually *not numbers, no order or magnitude*)

A relative risk between 0 and 1.0 means the risk _____ with treatment.

DECREASES no difference = 1, if CI crosses 1 then it is not significant

Dependent Variable Independent Variable

Dependent = Y Independent = X (one or more predictors)

T/F You can use a t test to compare three or more groups

FALSE

T/F The line that predicts Y from X is the same as the line that predicts X from Y

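FALSE (the two lines differ unless the correlation is perfect: predicting Y from X minimizes vertical distances from the line, predicting X from Y minimizes horizontal distances)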

clumping or separating groups differently

Gerrymandering

Why can't relative risk be calculated from case-control data?

INVALID we do not have data of entire population -incidence can't be calculated from this data

reciprocal absolute risk reduction

Number needed to treat how many people do I need to treat to actually make a difference? *lower the better*
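
A minimal sketch of NNT as the reciprocal of the absolute risk reduction. The risks (20% untreated, 15% treated) and the function name are hypothetical.

```python
def number_needed_to_treat(control_risk, treated_risk):
    """NNT = 1 / absolute risk reduction (control risk minus treated risk)."""
    arr = control_risk - treated_risk
    if arr <= 0:
        raise ValueError("treatment does not reduce risk, so NNT is not defined")
    return 1.0 / arr

# Hypothetical risks: 20% without treatment, 15% with treatment.
print(round(number_needed_to_treat(0.20, 0.15)))  # treat about 20 people to prevent one event
```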

The larger the SD, the _____ the CI. why?

larger! mean is less certain (more variation = larger SD)

Mean, SD, SEM - which variables are appropriate here.

Interval and Ratio variables

continuous spaced with *equal intervals* or distances and the zero point is meaningless- what type of variable? give an example.

Interval variable ex: IQ *zero is not meaningful* - does not mean that there is nothing there (for example, zero degrees F - there is still temperature) can be negative!

The higher the degree of CI, the _____ CI.

LARGER want to have higher percentage to ensure you include the true mean if you only had 50% - more likely to make a mistake and the mean will not be included (type 1 error)

The simplest robust statistic is the

MEDIAN *not influenced by outliers* (mean is not robust because influenced by outliers)

methods that simultaneously compare several outcomes at once or one outcome and several independent variables

MULTIVARIATE

when comparing two unpaired groups, which test should you use?

Mann-Whitney

the value that occurs most commonly in the data set

Mode not used too frequently - does not tell us anything about the center

If mom has twins and does drugs while pregnant or they both have a genetic anomaly - can this be used in the CI?

NO these factors are not independent of each other

Does it matter whether the two groups have different numbers of observations?

NO *t test does not require equal n*

Could you graph when someone gets the flu on a survival curve?

NO - that is a recurrent event for some individuals

In paired t test - if the interval of difference in measurements of two matched subjects include 0 =

NOT significant

Statistical significance of a 95% CI for the difference between two means includes zero =

NOT significant

express *rank and order* matters - what type of variable? Give examples.

Ordinal - think "order" ex: pain scale 1-10 or restaurant ratings 1-5 (the numerical order matters but they do not quantify anything - for example, someone could have pain at a 10 but that someone with the same pain could be at a 6 - subjective data)

numerical representation of the *degree to which random variation* alone could account for the differences observed between groups or data being compared

P value

When is a test statistically significant?

P value is less than the alpha level

Two-tailed p value - Why are these used more often? when is this rejected?

P values are larger and more conservative (more consistent) reject: statistic fall into either tail of sampling distribution

the odds that a patient has the disease, taking into account both the test results and prior knowledge of patient

POST-TEST ODDS

Assumptions gives the multiple comparisons test more

POWER

the odds that the patient has the disease determined from information you know before testing

PRE-TEST ODDS

The Odds Ratio what are the ranges? Probability and odds

Probability = 0-1 Odds = any positive number or 0

*If p value < alpha, you should _____ the null hypothesis*

REJECT

the fraction of the variance shared between two variables - *coefficient of determination*

R^2 (percentage of variation - one variable to another)

the square of the correlation coefficient is an easier value to interpret than r

R^2 *will always be positive - between 0 and 1*
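
A short sketch computing r by hand and squaring it. The data points and the function name `pearson_r` are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r, always between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]   # hypothetical X values
ys = [2, 4, 5, 4, 5]   # hypothetical Y values

r = pearson_r(xs, ys)
print("r   =", round(r, 3))
print("r^2 =", round(r ** 2, 3))  # fraction of variance shared by X and Y

# Flipping the sign of Y flips the sign of r but leaves r^2 unchanged.
print(round(pearson_r(xs, [-y for y in ys]), 3))
```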

Coefficient of variation - which variables are appropriate here.

Ratio Variables

The width of CI depends on ____.

SAMPLE SIZE larger = narrow CI (closer to the population mean and more confident) Wide CI = outliers

SD is affected by: SEM is wider with:

SD = affected by spread of variability SEM = wide with a small group

What does the SD quantify? SEM quantify?

SD = variability of data around the mean SEM = take into account the scatter and the sample size

works in the *opposite direction* - start with one set of data (sample) and make inferences about the overall population or model

Statistics

T/F A high p value does not prove the null hypothesis

TRUE

T/F A small effect in a study with a large sample size can have the same p value as a large effect in a small study

TRUE

T/F Chance alone can cause a cluster

TRUE

T/F It can be counterproductive to reach a definitive conclusion of statistical significance.

TRUE

T/F The MEAN is sensitive to outliers.

TRUE outliers (high or low) skew the data ex: someone who has a fever or hypothermic - skew the data higher/lower Median will cancel out those outliers.

T/F All Gaussian distributions can be converted to a standard normal distribution

TRUE!

If you keep running the same experiment over and over until you find a "difference" - what type of error is this?

TYPE 1 - rejecting the null hypothesis when it is actually true! (false positive)

The goal to compare every mean with every other mean - results include CI and conclusions about statistical significance.

Tukey's Test

When is the mode useful? not useful?

USEFUL = variables that can only be expressed as integers NOT useful = continuous variable

Which test compares two paired groups?

Wilcoxon Matched-Pairs Signed-Rank Test

Is the gaussian distribution the same as normal distribution?

YES

If you found a 1 mmHg difference in your experiment and the p value was 0.0001 - would this be considered significant?

YES doesn't matter the size of the observed effect

Can there be more than one independent and dependent variable? If so, give an example.

Yes! ex: if you want to monitor the age and the gender of cardiovascular death in patients with CHF independent = age and gender dependent = CV death (could also add in stroke or other outcomes as well)

Predictive Value of a Test abnormal = normal =

abnormal = incorrectly identified normal = actual negative/positive *depends on population*

Experimental Study what is this? what is included? example?

active intervention of the investigator control group = not getting anything or getting gold standard experimental group = getting new medication ex: controlled trial clinical trial

What are the advantages and disadvantages of case-control studies?

advantage: quick, small sample size from previously recorded data disadvantage: need to pick the right controls to control extraneous factors (age, gender, socioeconomic factors) that might confuse results (can lead to bias)

Independent Observation what is it? what should be excluded?

all subjects are sampled from the same population and independent of each other ex: does not include twins or siblings (genetic factors or similar environmental factors) husband and wife may be excluded 1/2 from one city 1/2 from another

Multivariate Analysis what is this? example?

allows you to isolate the results to just look at what is needed ex: HTN, DM, HPL - all of these can cause heart disease allows you to focus on physical activity and coronary events

Confidence Intervals

always centered on the sample mean - total population mean cannot be known ex: n=130 v n=12 (larger = smaller SEM = better)

If a patient presents to the ER with abdominal pain - what is the likelihood they will have appendicitis?

appendicitis and vomiting - very low appendicitis and RLQ pain = more likely *two signs are equally helpful in distinguishing appendicitis from non appendicitis when they are present but not when absent*

5-year survival what is this comparing? what type of curve is being used? example?

arbitrary compare treatments and see the survival rate Kaplan Meier curves ex: 5 year survival with cancer

relative risk v attributable risk

attributable = subtracting one from another If the CI crosses 0 = not statistically significant

risk of having disease progression in the case

attributable risk

When things only go to 100% and when you have a small number - farther you get from 50% mark

atypical CI

X and Y values are not intertwined - what would you want to avoid using?

avoid things where you know one directly affects the other ex: glucose and A1C use insulin sensitivity and fatty acids instead

Case Control Design what is this? example?

begin with the outcome - looking for future of people who share outcome RETROSPECTIVE ex: lung cancer those who develop v did not develop looks at risk factors

Most scatter in biologic and clinical studies is likely caused by

biologic variation

Cross Sectional Design what is this?

blends case-control and follow-up cohort - assess outcomes, descriptive features and potential predictors *slice in time or prevalence survey* less common!

two groups of subjects are selected - one group has disease and the other does not but are selected to be similar in many ways (controls) - investigators look back to determine risks

case-control study (ex: lung cancer v not)

Two-Way ANOVA and Beyond

categorized by two ways (type of runner and age) simultaneously tests 3 null hypotheses and 3 p values

Starting time what is this? example? what type of data is used?

clearly *defined* ex: first hospital admission, diagnosis, birth, death, etc. - OBJECTIVE DATA need to have concrete *documentation* and no recall bias from patients pt who die early should not be removed - may skew results and create bias

Idea there if you have 2 measurements and one is far from the mean - the second is more likely to be

closer to the average

smaller SEM =

closer to the average of the population (BIG SAMPLE) *smaller than the SD!!!*

Repeated-Measures ANOVA are used to

collect data in 3 experiments 1) measurements made repeatedly 2) subjects recruited as matched sets - each receives different intervention 3) lab experiment run several times

Retrospective study =

collecting data after the event or outcome has already occurred ex: collecting data on death of premature babies (22-25 weeks gestation)

Wilcoxon's Test what does this test? how does this work? assumption?

compares *two paired* groups - tests the null that there is no difference -rank the absolute value of all differences temporarily -add up ranks of positive and negative differences -compute differences between those two sums *assumes that the pairs are random and independent*
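
The ranking steps above can be sketched as follows. This only computes the two rank sums, not the p value. Dropping zero differences and averaging tied ranks are common conventions assumed here, and the before/after values are hypothetical.

```python
def signed_rank_sums(before, after):
    """Return (sum of ranks of positive differences, sum of ranks of negative differences)."""
    # Paired differences; zero differences are dropped (an assumed convention).
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ordered = sorted(diffs, key=abs)  # rank by absolute value, smallest first

    # Assign ranks 1..n, giving tied absolute values their average rank.
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    positive = sum(r for d, r in zip(ordered, ranks) if d > 0)
    negative = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return positive, negative

before = [10, 12, 9, 14, 11]   # hypothetical paired measurements
after = [12, 11, 13, 14, 8]

pos, neg = signed_rank_sums(before, after)
print("positive rank sum:", pos, "negative rank sum:", neg, "difference:", pos - neg)
```

If the null hypothesis were true, the two sums would be expected to be about equal.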

occurs when factors relate to both the characteristic under scrutiny and the outcome appear as competing explanations

confounding (ex: cigar and male pattern baldness - age all the time could be a confounder in this study)

If the p value in a normality test is large, this means the data is ______ with gaussian distribution.

consistent *cannot reject the null*

Frequency distribution? what is this? what variables does this apply to?

counting how many subjects there are this applies to all variables (nominal, ordinal, interval, ratio)

single sample of subjects is selected without regard to either the disease or the risk factor - divided into two groups based on previous exposure

cross-sectional study

"censor" refers to the

data

Continuous Data - what is it? examples?

deals with results that are continuous ex: BP, enzyme activity, IQ, hemoglobin, oxygen saturation., temperature usually graphed!

Survival data what are you interested in?

death - only *happens once* graph will show percentage survival as function of time

the outcome variable. What is this? Give an example.

dependent variable ex: what the outcome was - ex: grade

Correlation Coefficient r value -1 to 1 = 0 = + = - =

depending on slope of the line 0 = no linear relationship, cannot draw a line, completely scattered results + = variables increase and decrease together (proportional) - = inversely related (one goes up as the other goes down) *the further the r value is from 0 = stronger relationship and less likely due to chance alone*

How does SD actually factor into the bell shaped curve?

depends how many SD out from the mean = *1 each way = 68%* *2 each way = 95%* *3 each way = 99.7%* there are a few outliers but *95% confidence interval* whole curve = 100%
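
These percentages can be checked with the error function, assuming the data follow an ideal Gaussian distribution. The function name is hypothetical.

```python
import math

def fraction_within(k):
    """Fraction of a Gaussian population within k SD of the mean (both directions)."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

The results are about 68%, 95% and 99.7%.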

Chi Square test what is combined? small p value means? large p value?

discrepancies between observed and expected to calculate p value small = due to some factor other than chance large = does not deviate from theoretical distribution anymore than expected by chance

Frequency Distribution

divides the range of values into a set of smaller ranges and graph smaller categories = better idea of the spread of the data larger = less detail both show the same data

You will make fewer errors when sample sizes are close to

equal = large differences (probably non-gaussian data)

comparing groups together - better concrete conclusion about X causing Y

explanatory studies

For two tail test - if the null hypothesis for value of the test statistic

falling into either tail of its sampling distribution

If you choose to do a one-tailed p value incorrectly =

find statistically significant results even though they are not = type 1 error, false positive

What are the goals of multiple linear regression?

fits model to data *find the values for the coefficients that make the model come as close as possible to predict data* -best-fit value -p value for every independent variable

Parametric what distribution? small samples?

gaussian population small samples = misleading not very robust to violations of the Gaussian assumption

Degree of Confidence

give a wider interval to make sure it is within that particular range

Familywise Error Rate multiple with corrections - what is the goal? what type of error is most common here?

goal = 95% chance obtaining zero statistically significant results --> other 5% broken up depending on comparison more likely to make Type II error (false negative)

R2: the higher the R2, ____. What if R2 = 1?

higher = means that one independent variable is really describing the relationship with the dependent R2 = 1 --> no variability and all data points are on the line

defined as a frequency distribution plotted as a bar graph

histogram

Box-and-whisker

horizontal = mean/median vertical = outer limits = quartiles (25% or 75%) *does not have to be centered around the median min v max - can change depending how they represent that data

The regression line fits the data better than a _____ line. What does R2 look like here?

horizontal line sum of squares is lower

Survival analysis must take into account

how long each subject is known to have been alive and following the protocol of the study

In linear regression model, R2 =

how much variance is accounted for *low = model is not good at representing variability* (independent variable not good at estimating sensitivity)

computes a range that you can be 95% sure would contain the experimental result if the null hypothesis were true

hypothesis testing approach

If the model is correct, there is an

ideal population for each of these odds ratios

If you set alpha low, you will make fewer type 1 errors

if null is true, only a *small* chance that the result will be mistaken for significant

If you set the alpha very low, you will make many more type II errors

if the null hypothesis is false, *greater* chance that you will not find significant difference (false negative)

rate of new cases of disease

incidence

If the p value in a normality test is small, this means the data is ______ with gaussian distribution.

inconsistent (can reject the null hypothesis --> samples are NOT sampled from Gaussian population) *outlier not from Gaussian distribution

Type 1 Error

incorrect rejection of the true null hypothesis (*false positive*)

Type II Error

incorrectly retaining a false null hypothesis (*false negative*)

Doing multiple t-tests can result in ______. what test would be more useful at comparing two+ means?

increase *type 1 error (false positive)* ANOVA!!

What would increase the SD?

increase the difference between the mean and the values (more spread out values); the number of observations does not systematically change the SD (more observations shrink the SEM, not the SD)

If the odds ratio is way less than 1.0 =

increase in independent variable is associated with *decrease likelihood of the outcome occurring*

If the odds ratio is higher than 1.0 =

increase in independent variable is associated with *increase likelihood of the outcome occurring*

Power of a normality test =

increases with sample size

How do you narrow the CI?

increase the number of observations you have; you want a narrow range to pin down the actual population value

Relative risk goes down with

increased activity

experimental variable, the variable that is changed in the experiment. What is this? Give an example.

independent variable ex: grade of male v female students or age range of students

If the odds ratio is near 1.0 =

independent variable = little impact

If you have new analgesics at different doses - what is the independent variable? dependent variable?

independent: doses dependent: pain level

Each observed value must be _____ and expected value must be _____.

integer; can be a fraction *sum of all observed values must equal sum of all expected values*

Multiple comparison tests account for ______ comparisons

intertwined

explanatory =

interventional study (author detailing experience with cases, practices, treatments or making comparisons)

Pre-test Odds what must you know? what is the calculation

know the patient history! *post-test odds = pre-test odds ∙ sensitivity/(1-specificity) = pre-test odds ∙ likelihood ratio*

The CI has to be _____ if you want to be 99% certain that the true mean lies within it.

larger ex: 8.2-11.5 instead of saying the value is 10 or having a smaller range

If you increase the sample size, is the SEM expected to get larger, smaller to stay the same?

larger sample size = smaller SEM

Bayesian Logic what is this?

likelihood; probability v odds. Probability = fraction of the time you expect to see the event (*0-1*). Odds = probability the event will occur divided by the probability that the event will not occur (*0 - infinity*). Posttest odds = Pretest odds ∙ sensitivity/(1-specificity) = Pretest odds ∙ Likelihood ratio
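
The posttest odds formula on this card can be sketched in a few lines. This is a minimal illustration, not a clinical tool; the function names and the pretest probability, sensitivity and specificity values are made-up assumptions.

```python
def probability_to_odds(p):
    """Convert a probability (0-1) to odds (0-infinity)."""
    return p / (1 - p)


def odds_to_probability(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)


def posttest_probability(pretest_prob, sensitivity, specificity):
    """Posttest odds = pretest odds * likelihood ratio, returned as a probability."""
    likelihood_ratio = sensitivity / (1 - specificity)
    posttest_odds = probability_to_odds(pretest_prob) * likelihood_ratio
    return odds_to_probability(posttest_odds)


# Hypothetical values: 10% pretest probability, 90% sensitivity, 80% specificity
result = posttest_probability(0.10, 0.90, 0.80)
print(round(result, 3))
```

With these assumed numbers the likelihood ratio is 4.5, so a 10% pretest probability rises to roughly one in three after a positive test.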

based on symptoms for a patient, how likely is it if they have this finding that they are going to have the disease in question?

likelihood ratio

the probability of obtaining a positive test result in a patient with the disease divided by the probability of obtaining positive test result in a patient without disease

likelihood ratio

mathematical relationship between model parameters and the outcome

linear

When using the larger two-tailed p value, you are less likely to

make a type 1 error

When can you compute *mean* survival time? when would you use *median* survivals?

mean - you would have to wait until *all the subjects have died*, and there is usually no time to wait that long; otherwise you can't calculate the mean. median - you just have to hit the *50%* mark (unambiguous)

Mean v Median

mean = average median = middle value
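
A short sketch of the difference, using Python's standard `statistics` module. The data values are invented for illustration; the point is that one extreme value moves the mean far more than the median.

```python
from statistics import mean, median

# Hypothetical measurements
values = [4, 5, 5, 6, 7]
with_outlier = values + [60]  # same data plus one extreme value

mean_before = mean(values)          # average of the five values
median_before = median(values)      # middle value
mean_after = mean(with_outlier)     # pulled upward by the outlier
median_after = median(with_outlier) # even n: average of the two middle values

print(mean_before, median_before)
print(mean_after, median_after)
```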

Overlapping error bars and the t test

mean you can't conclude the degree of significance - they just show variability

What is the most important part of a peer reviewed article?

methodology! very important - review the materials and methods and details should read this section first - focus on the design

Survival curve what is it? what is graphed here?

misleading name - plots time to any well-defined end point or event (positive or negative) that cannot recur - need to know the exact date the event happened (can look at death, birth or when someone was diagnosed with HIV)

larger the population --> more ____ --> ____ p value ----> less likely _____ error.

more power small p value less Type II error

The larger the SD, the

more spread out the distribution of data about the mean

If you increase the sample size by a factor of 4, your CI will

narrow by a factor of 2
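
The factor of 2 comes from the square root of the sample size in the SEM (SD divided by the square root of n). A rough sketch, assuming a fixed SD and the large-sample multiplier 1.96 for a 95% CI; with small samples a t multiplier would be used, so the factor is only approximately 2. The function name and numbers are illustrative.

```python
import math


def ci_half_width(sd, n, z=1.96):
    """Approximate 95% CI half-width for a mean: z * SD / sqrt(n)."""
    return z * sd / math.sqrt(n)


sd = 10.0                            # assumed standard deviation
width_n25 = ci_half_width(sd, 25)    # original sample size
width_n100 = ci_half_width(sd, 100)  # four times as many observations
ratio = width_n25 / width_n100       # how much narrower the CI became

print(width_n25, width_n100, ratio)
```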

Average survival doesn't change during the study

nature of disease changing very quickly (flu)- may no longer apply *supportive care should not change* over the period of the study

Add or subtract - which variables are appropriate here.

need to have *definite difference* in values *Ratio and Interval variables*

null hypothesis what is it? do researchers want this to be true or false?

neither one of the factors should skew the results one way or another (coin flip = 50:50); researchers want it to be false to show that there is a difference in the two things being tested (ex: HTN meds - want to reject the null hypothesis to show that the drug is working. If the null were true, the drug would have no more effect on BP than the placebo)

Gaussian Distribution what is it? horizontal shows? vertical shows?

normal distribution ---> bell curve if you have a large enough sample set, everything will revert back to the average horizontal = various values that can be observed Vertical = quantifies their relative frequency (highest at the mean = more people hitting the mean value)

used to quantify how much a data set deviates from the expectations of a Gaussian distribution

normality tests (check to see if truly Gaussian - increase in values increases the extremes and more like gaussian curve)

If 95% CI does contain the value of the null hypothesis, the results must

not be statistically significant

If it is under power, that means

not enough to show there was a difference when there was one

Nonparametric what distribution? small samples?

not gaussian population! --> no assumptions about the population distribution (data may not be normally distributed); small samples = misleading, little power with small samples

A normality test can demonstrate that the deviation from the Gaussian ideal is

not more than you'd expect to see with chance alone

If you want to find the average IQ of a PA student in this room - can you apply to every PA program in the country?

not really, but increasing the sample size will give you a better estimate of the mean = lower SEM

Normality test what distribution? small samples?

not sure which distribution small sample = not very helpful need large samples!!

The number expected (exposed) =

null hypothesis

there is *no difference* in the population parameters among the groups being compared and any observed differences are simply a result of *random variation in the data* rather than results of actual disparity in the data itself

null hypothesis

The p value is computed assuming that the

null hypothesis is true - *not the probability that it is true*

NNT

number needed to treat (how many people you need to treat for one additional person to have a positive outcome)
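
NNT is conventionally computed as 1 divided by the absolute risk reduction (control risk minus treated risk), rounded up. A minimal sketch with assumed risks; the function name and the numbers are illustrative only.

```python
import math


def number_needed_to_treat(risk_control, risk_treated):
    """NNT = 1 / absolute risk reduction, rounded up to a whole person."""
    absolute_risk_reduction = risk_control - risk_treated
    if absolute_risk_reduction <= 0:
        raise ValueError("treatment does not reduce risk; NNT is not defined here")
    # round() first so floating-point noise does not push ceil() up by one
    return math.ceil(round(1 / absolute_risk_reduction, 9))


# Hypothetical risks: 28% of controls and 16% of treated subjects have the bad outcome
nnt = number_needed_to_treat(0.28, 0.16)
print(nnt)
```

With these assumed risks the absolute risk reduction is 0.12, so about 9 people need to be treated for one additional person to benefit.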

which tests are most useful when trying to explain links between disease states and their outcomes or certain exposures and outcomes

observational *2 different groups without active intervention*

Odds - you are looking Probability - you are looking

odds = retrospective probability = looking forward *can be used together, yet different

If the observed difference went in the direction predicted by the experimental hypothesis,

one tailed P value is half the two-tailed P value

The *paired t test* compares two matched or paired groups when the

outcome is *continuous* (ex: looking at patients before and after treatment in pairs)

The MEDIAN is not affected by

outliers!! this can help determine if the mean has been skewed or not (those do not skew the middle number - bell shape curves)

Unpaired t-test, ANOVA paired test assumes Gaussian distribution - can be defined as

parameters

parametric v nonparametric

parametric = assumes the data were sampled from a normally distributed (Gaussian) population; nonparametric = makes no assumption about the population distribution

Scatter plot what is this? horizontal position? vertical position?

plot of all values and a smaller random subset *Horizontal* = arbitrary and avoid overlap. commonly shows mean/median *Vertical* = represents the measured value not useful for larger data sets!

If all the subgroup comparisons are defined in advance

possible to correct for many comparisons - less error (0.05/# subgroups)

sample size is related to

power

probability that the test will reject the null hypothesis when the null hypothesis is false

power (probability of not committing a Type II error or making a false negative decision)

results are based on the characteristics of the test and the prevalence of the disease in the population being studied

predictive value of a test -if positive, what chance pt really has the disease? -if negative, what chance pt does not really have the disease?
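
To see why predictive value depends on prevalence as well as on the test itself, here is a small sketch. The function name, the 90% sensitivity and specificity, and the two prevalence values are assumptions chosen for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # positive test: chance pt has the disease
    npv = true_neg / (true_neg + false_neg)  # negative test: chance pt is disease free
    return ppv, npv


# Same hypothetical test applied to a rare and a common disease
ppv_rare, npv_rare = predictive_values(0.90, 0.90, prevalence=0.01)
ppv_common, npv_common = predictive_values(0.90, 0.90, prevalence=0.50)

print(round(ppv_rare, 3), round(ppv_common, 3))
```

The same test gives a much lower positive predictive value when the disease is rare, because most positives are then false positives.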

fraction of the group that has the disease

prevalence ex: smoking - divided into smoke v no smoke

According to relative risk, anything greater than 1 ______ and less than 1 is ______. What happens if the CI crosses 1?

puts you at risk; protective; CI crosses 1 = not statistically significant

Error? what is this? can be due to? example?

random chance - natural variation over time/*variability* can be due to imprecision or experimental error ex: measuring temperature at different times throughout the day or measuring temperature from a different location (oral v rectal)

Y = mx+b what does this not account for?

random variation standard assumption that random variability = gaussian distribution (higher slope = steeper line)

In relative risk , if the null hypothesis is true, there is less than a 0.01% chance of _____.

randomly picking subjects with such a large difference in incidence rates (0.01% chance that it is due to chance alone)

Descriptive Study what is this? what is not included?

record events, observations and activities; does not include: detailed explanations of the cause of disease or evidence to evaluate the efficacy of new treatments

a measure of the strength of association between a particular exposure (risk factor) or intervention and an outcome

relative risk

the ratio between two proportions - progression in the treatment group/progression in the placebo group

relative risk (ex: subjects with AZT: progression in tx = 16%, progression in placebo = 28%, 16%/28% = 0.57, so subjects treated with AZT were 57% as likely as the placebo group to have disease progression)
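
The arithmetic in this card can be checked in a couple of lines, reading the 16 and 28 as percentages of each group whose disease progressed (an assumption, since the card gives bare numbers). The function name is illustrative.

```python
def relative_risk(risk_treated, risk_control):
    """Ratio of two proportions: risk in the treated group / risk in the control group."""
    return risk_treated / risk_control


# Assumed proportions with disease progression: 16% on treatment, 28% on placebo
rr = relative_risk(0.16, 0.28)
print(f"{rr:.2f}")  # 0.57
```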

Error Bars

represent variability - indicate uncertainty; can represent SD, SEM or CI - check the legend to see which one it is referring to

The larger the odds ratio, when the results are repeated and make sense biologically =

results are more likely to be true

the probability of something happening to you?

risk

If you have a worried mother who wants to know the risk of her baby dying premature, how would you give a confident answer?

sample size representative of general population - CI calculated -nothing is absolute!

sample will contain a smaller or larger fraction of people voting than does the overall population

sampling error if you pull votes from a tiny town - will not apply to the entire state or country

Multiple comparison tests after ANOVA

see which group means are statistically distinguishable; *many comparisons at alpha = 0.05 each increase the likelihood of finding a false positive (Type 1 error)* --> correcting for this decreases power (more type 2 errors)

the fraction of all those with the disease who get a positive result

sensitivity (measures how well a test identifies those WITH DISEASE)

what is the formula for likelihood ratio?

sensitivity / (1-specificity )

the accuracy of a diagnostic test is quantified by its

sensitivity and specificity
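
Sensitivity, specificity and the likelihood ratio can all be read off a 2x2 table of test results against true disease status. A minimal sketch; the function name and the counts are hypothetical.

```python
def diagnostic_accuracy(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity, likelihood ratio) from 2x2 counts."""
    sensitivity = true_pos / (true_pos + false_neg)  # positives among the diseased
    specificity = true_neg / (true_neg + false_pos)  # negatives among the healthy
    # Undefined (division by zero) if specificity is exactly 1
    likelihood_ratio = sensitivity / (1 - specificity)
    return sensitivity, specificity, likelihood_ratio


# Hypothetical study: 100 people with the disease, 200 without
sens, spec, lr = diagnostic_accuracy(true_pos=90, false_neg=10, true_neg=160, false_pos=40)
print(sens, spec, round(lr, 2))
```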

predict how the change in one value can predict the change in another value

simple linear regression *finds the line that best predicts Y from X*
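
A least-squares fit and its R2 can be sketched in plain Python. R2 compares the scatter around the fitted line with the scatter around a horizontal line at the mean of Y. The function name and the data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope, intercept and R2 for Y predicted from X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squares around a horizontal line through the mean of Y
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    # Sum of squares around the fitted regression line
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_resid / ss_total
    return slope, intercept, r_squared


# Hypothetical data lying close to a straight line
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r_squared = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2), round(r_squared, 3))
```

Because these points sit almost exactly on a line, the residual sum of squares is tiny compared with the total, and R2 comes out close to 1.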

What should be done if the normality test fails?

small p value = data from another identifiable distribution *run an outlier test* *switch to nonparametric tests* = do not assume gaussian

The larger the sample size, the ___ the CI. why?

smaller! mean is more certain (closer to the population value - more reasonably sure it is accurate)

If you increase the sample size, is the SD expected to get larger, smaller or stay the same?

stay about the same. the SD quantifies scatter among the values, which does not depend on sample size; more values just estimate it more precisely (the SEM is what gets smaller)

the fraction of those without the disease who get a negative test result

specificity (measures how well the test identifies those who DON'T have the disease)

the variation among values expressed in the same units as the data

standard deviation

Follow-Up Design what is this? example?

start with people who have yet to experience the outcome; PROSPECTIVE - time intensive; Cohort - dividing up based on risk factors and looking at outcomes ex: smoke v no smoke

If a 95% CI does not contain the value of the null hypothesis, then the results must be

statistically significant

If the CI for the difference between two means does not include zero (null hypothesis) then the result must be

statistically significant

If the CI for the ratio of two proportions does not include 1.0 (null hypothesis), then the result must be

statistically significant

Actuarial method

survival time graph *x-axis divided up into regular interval*

Kaplan Meier method

survival time is recalculated with each patient death *preferred method unless study is very large* *account for all censored patients!* - all start at day one based on when they started the study
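
A bare-bones sketch of the Kaplan Meier idea: the survival fraction is recalculated at each observed death, and censored subjects stay in the at-risk count until the time they leave. The function name and the follow-up data are hypothetical, and this omits confidence bands and everything else a real analysis package would provide.

```python
def kaplan_meier(times, died):
    """Return [(time, fraction surviving)] recalculated at each death time.

    times: follow-up time for each subject, counted from their own day one
    died:  True if the death was observed, False if the subject was censored
    """
    death_times = sorted({t for t, d in zip(times, died) if d})
    survival = 1.0
    curve = []
    for t in death_times:
        # Everyone still followed at time t, including those censored at t
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, d in zip(times, died) if ti == t and d)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical study of six subjects; two are censored (at times 3 and 8)
times = [2, 3, 3, 5, 8, 9]
died = [True, True, False, True, False, True]
curve = kaplan_meier(times, died)
for t, s in curve:
    print(t, round(s, 3))
```

The censored subjects never produce a drop in the curve, but they do shrink the at-risk denominator for later deaths, which is how they are accounted for.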

SEM, if they overlap that means

the P value is greater than 0.05 (not significant)

The larger the risk or rate ratio, the stronger

the association (more likely to occur) ex:

Multiple comparison tests can declare ________ but do not _____.

the difference between pairs of means to be statistically significant but DO NOT compute exact p values

The difference between ordinary and repeated measures ANOVA is the same as

the difference between unpaired v paired t-test

5% significance level applies to

the entire set (family) of comparisons - family wise significance level

For starting time - intention to treat is considered

the standard - assigned treatment even if not given

The smaller the p value

the stronger the evidence to *reject* the null hypothesis; *the differences observed are unlikely to be due to chance alone* *Statistically Significant!!!*

If the CI includes 0, this means

there is no statistically significant relationship (difference) between the two values

A wider standard deviation means

there is wider variation - data points are more spread out

Bonferroni Correction

to achieve the family wise error rate = divide the value of alpha by # comparisons - declare a comparison statistically significant only when its p value is less than alpha / # comparisons

Why do you want a narrow CI?

to pin down the true mean value more precisely (a narrow range is a more precise estimate)

A model with too few parameters = too many?

too few = won't fit the sample data well too many = will fit well but CI will be wide (not as confident)

If any of the assumptions are violated in the CI, what will happen?

too optimistic = too narrow (true CI will be wider than what is calculated)

the mean of most of the values - ignoring the highest and lowest (usually percentage) - what is this? example?

trimmed mean ex: olympic judges - take out the outliers not done frequently in medical literature

binomial distribution

two possible outcomes

Familywise Error Rate - more comparisons without correction = larger chances for

type I error

Bonferroni correction increase risk of what type of error? why?

type II - do not find a difference when there actually is one - the p value threshold is so small that it is hard to reach statistical significance (ex: 20 comparisons: w/o correction = about 64% chance of obtaining one or more statistically significant results; w/ correction = 0.05/20 = 0.0025 threshold for each comparison)
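
The 20-comparison example can be checked directly. This sketch assumes the comparisons are independent and every null hypothesis is true; the variable names are illustrative.

```python
alpha = 0.05
comparisons = 20

# Chance of one or more false positives with no correction
familywise_uncorrected = 1 - (1 - alpha) ** comparisons

# Bonferroni: each comparison is tested against alpha / number of comparisons
per_comparison_alpha = alpha / comparisons

# Chance of one or more false positives after the correction
familywise_corrected = 1 - (1 - per_comparison_alpha) ** comparisons

print(round(familywise_uncorrected, 3))
print(per_comparison_alpha)
print(round(familywise_corrected, 3))
```

The correction brings the family wise error rate back under 5%, at the cost of a much stricter threshold for each individual comparison, which is where the extra type II errors come from.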

Non-parametric tests are more robust = more likely to make ____ error

type II error (will not see difference when there is one, false negative)

an underpowered study is wasted effort because

type II error *true effect may go undetected*

What is wrong with computing several t-tests?

unpaired - would require comparisons for each running group (3 times) the more you do the experiment, the more likely to make a *type 1 error* multi comparison = *greater chance of observing one or more significant p values by chance*

compares the means of two groups, assuming data were sampled from a gaussian population

unpaired t test (*comparing two means of two separate groups*)

If you were polling people to vote - how could the sample not be random? what would this lead to?

using phone books and auto registrations to contact people to see who they would vote for back in the day - this was all the rich people which excludes a lot of the population = not representative of the entire population BIAS

"Normal Distribution"

values based on the bell curve ex: normal sodium (135-145) - give a range due to normal variation even if they are out of the range, they can just be outlier if still close to the range

Unpaired t test does not distinguish _____ among subjects from differences caused by treatment

variabilities *independent values = no twins, siblings, couples, etc.

If the 95% CI in a linear regression is 16.7-57.7, what does this mean?

wide, but does not include 0 *strong evidence unlikely due to chance of random sampling*

Is a 99% CI wider or narrower than a 90% CI?

wider - in order to get the outliers
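
The widening can be seen in the multiplier applied to the SEM. A small sketch using the standard normal distribution from Python's `statistics` module; it assumes the large-sample (z) approximation rather than the t distribution, and the function name is illustrative.

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Two-sided critical value of the standard normal for a given confidence level."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)


z90 = z_multiplier(0.90)
z95 = z_multiplier(0.95)
z99 = z_multiplier(0.99)

# CI = mean +/- multiplier * SEM, so a bigger multiplier means a wider interval
print(round(z90, 3), round(z95, 3), round(z99, 3))
```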

If an experiment is watching the survival rate of cancer patients but one patient gets into a car accident, Would this be censored?

yes - they did not die from the cancer (removed from the analyses but they *MUST BE ACCOUNTED FOR*)

If you have the alpha value at 0.05 and the p value is 0.049 - does that mean it is statistically significant and 0.051 does not? what type of error could this be?

yes, but not really - 0.051 may be a type II error (false negative) - should run the experiment again

Is a larger sample size better? why?

yes, will get closer to the true value (CI will get smaller and more confident that is the true mean)

One-tailed p value should only be used if

you have *predicted which group will have larger mean BEFORE collecting data* if the other group ends up higher, you would have to attribute that to chance = "not statistically significant" because you weren't testing for it in the first place (can redo experiment again)

What is multiple comparison error?

you keep running an experiment until you find the answer you are looking for or find a difference

