E1: Epi/Bio (All)


can you compute a coefficient of variation with interval variables?

no, you can only compute a coefficient of variation with ratio variables -with interval variables you can get SD, SEM, add, subtract, median/percentiles, and frequency distribution

gaussian distribution is called a normal distribution in the statistical sense but does this also refer to normal in the clinical sense?

no! it would not make sense to define clinically normal limits symmetrically around the mean (temperature, lab values, etc)

A study is looking at the effect of a new energy drink on exercise capability, measured in each patient pre and post drink. They run a normality test and get a p value of 0.02 (less than 0.05). They proceed with a paired t test. Is this appropriate?

no; they should be using Wilcoxon's matched-pairs test -since they got a low p value on their normality test, their sample differs from Gaussian data, and those differences are likely not due to chance alone, so they should use a nonparametric test -a paired test because the values are measured in the same patients
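
A minimal sketch of this decision rule in Python with scipy (the numbers are made up, not from the question): test the paired differences for normality, then pick the paired t test or Wilcoxon's matched-pairs test accordingly.

```python
from scipy import stats

# Hypothetical pre/post exercise-capability scores for the same patients (made-up data)
pre = [12.1, 10.4, 11.8, 9.9, 13.2, 10.7, 12.5, 11.1]
post = [13.0, 10.9, 12.6, 10.2, 20.5, 11.1, 13.4, 11.8]  # one extreme value skews the differences

diffs = [b - a for a, b in zip(pre, post)]

# Normality test on the paired differences (null: differences are sampled from a Gaussian population)
_, p_norm = stats.shapiro(diffs)

if p_norm < 0.05:
    # Low p value: data differ from Gaussian, so use the nonparametric paired test
    stat, p = stats.wilcoxon(pre, post)
    print(f"Wilcoxon matched-pairs: p = {p:.3f}")
else:
    # Normality not rejected: the parametric paired t test is appropriate
    stat, p = stats.ttest_rel(pre, post)
    print(f"Paired t test: p = {p:.3f}")
```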

Nominal variables

nominal variable: categorical outcomes with more than two possible outcomes -*no consideration of order or magnitude* -*usually not numeric* (blood typing, eye color)

by definition the CI is always centered around the ____

sample mean -this is saying how confident you are that the values you come up with are similar to the true population values

Histogram

A frequency distribution plotted as a bar graph

Ratio Variables

categorized in ordered groups with equivalent intervals between the variables and a zero point that is meaningful (example: body weight, temperature in kelvin, mass, distance) - Can make sense to compute the ratio of two ratio variables - The difference between ratio variables and their ratio can be calculated

a study that includes a *group of subjects that already have the disease/condition you want to study* and the *other group does not* but both groups have very similar characteristics. *Investigators look back in time to determine if possible risk factors were present or not*. This is describing what kind of study

*case-control study (retrospective study)* ex: looking back at people who have lung cancer; was there a common risk factor that could have contributed to them getting this disease?

if you have 300 subjects in your study and at 1 year out you have 200 subjects, at 2 years you have 180 subjects, at 3 years out you have 150 subjects and at 4 years out you have 130 subjects. What is the median survival?

*3 years* since this is how long it takes until half of the subjects have died (only 150 of the original 300 remain at 3 years)

which approach to sample size is *not recommended*?

*ad hoc* -collecting and analyzing some data; if CIs are not as narrow as you like or results are not statistically significant, *collect more data and reanalyze*--> p values and CIs cannot be interpreted with this method; *not valid* -the *conventional approach is better* (choose a sample size from a power calculation, then collect data and analyze--> no adjustments) -the adaptive trials approach has interim analyses while the trial is proceeding; may shorten or lengthen the trial; gaining acceptance with larger clinical trials

if someone is looking at how many males vs females are in a class this would be considered a study using a ______ variable

*binomial (dichotomous) variable* because there are only two possible outcomes and order and magnitude do not matter; you are either male or female -this is just a specific category of nominal variables -another example would be yes/no answers

if you were on rotations and saw a patient with a very rare disease and wanted to publish an article about what you saw. what kind of study would this be?

*case study*; this is a descriptive study in that you are just describing what you observed -not talking about the cause of the disease or how to treat the disease or comparing it to anything -N=1 since just one person -if you saw the same rare disease in multiple people you could publish a *case series*

test that *compares the observed and expected* values of subjects in each category to see if there is a difference between them

*chi-square goodness-of-fit test* -a small p value in this test would say that some factor besides chance is accounting for the discrepancies between observed and expected -deaths of firefighters during different tasks example (null said that the deaths occur randomly, not during a particular event)
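
A rough illustration with scipy (made-up counts, not the actual firefighter data):

```python
from scipy.stats import chisquare

# Hypothetical deaths observed during 4 task categories vs. the counts expected
# if deaths occurred randomly (equally likely across tasks)
observed = [30, 15, 40, 15]
expected = [25, 25, 25, 25]  # null hypothesis: deaths occur randomly across tasks

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
# A small p value suggests something besides chance accounts for the
# discrepancy between observed and expected counts.
```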

association between two *continuous* variables can be quantified as

*correlation coefficient (r)* -as you change one variable how does it affect the other? -if as one variable increases, the other increases as well you will see a positive correlation -*correlation does not necessarily imply simple causality* (ex: increased life span in people with more telephones); both increasing together but no direct causation

how can you correct for multiple comparisons

*familywise error rate* -the usual 5% chance applies to the entire family of comparisons--->less likely to make a type I error doing this but more likely to make a type II now ex: if you run 5 comparisons, divide the level of significance by the number of comparisons; 0.05/5= 0.01; each comparison would have a significance level of 0.01

when would you use a non-parametric test/non-gaussian test?

*if you fail a normality test* (if your normality test gives you a *low p value* your sample set differs a lot from normally distributed data; these differences from gaussian data are not just due to chance alone) when performing a normality test your null hypothesis says that the data are sampled from a Gaussian population *power of a normality test increases with sample size*

incidence vs prevalence

*incidence*: rate of *new* cases of disease *prevalence*: fraction of group that has the disease; *snapshot of the population that has the disease in question*

when looking at survival rates among cancer patients, one of the participants in your study gets hit by a car and dies before the study has ended. You still include them in your study. What is this called?

*intention to treat* -including people even when they didn't have the outcome you were looking for -if you were to take this patient out of the calculations this would be *censoring data*; you can do this but must explain why in the study

looking at the body temperature of different species in *celsius* is what kind of variable?

*interval variables* -a difference means the same thing all the way along the scale no matter where you start but 0°C doesn't mean anything -can calculate the difference between these variables (90°C - 80°C) but a ratio is not helpful (100°C is not twice as hot as 50°C)

preferred method for creating a survival table and how is it done (know)

*kaplan-meier method* - survival time is recalculated with each patient death (preferred method unless study is very large) -calculate the fraction of patients who survived on a particular day and divide the number alive at the end of the day by the number alive at the beginning of the day; *accounts for censored patients* - Done by computer. -time zero: day that patient entered the study (can expand for years based on the enrollment period for the study).
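
A bare-bones sketch of the product-limit idea in Python (toy event times; it ignores ties between deaths and censoring on the same day). A real analysis would use a survival package or stats software.

```python
# Toy Kaplan-Meier calculation: the survival fraction is the running product of
# (number alive at end of day) / (number alive at start of day) at each death time.
# Event times in days; censored=True means the patient left the study alive.
patients = [(5, False), (8, False), (12, True), (15, False), (20, True), (22, False)]

n_at_risk = len(patients)
survival = 1.0
for time, censored in sorted(patients):
    if censored:
        # Censored patients drop out of the risk set but do not change the survival estimate
        n_at_risk -= 1
    else:
        survival *= (n_at_risk - 1) / n_at_risk  # fraction surviving this death time
        n_at_risk -= 1
        print(f"day {time}: estimated survival = {survival:.3f}")
```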

what is the advantage of using the median over mean in some cases?

*means can be sensitive to outliers* whereas medians are not influenced by outliers

if someone is looking at blood typing this is a _____ variable

*nominal!* there is no order or magnitude and there are more than two possible outcomes (type AB, type A, type B, etc) "nom think name" just labeling what you are measuring; if you assign a number it is not meaningful; *telling you a category or type* eye color is another example

if you are running a study that is looking at *restaurant ratings* on yelp (1-5 stars) with 1 being unsatisfied and 5 being very satisfied what kind of variable is this?

*ordinal variable* because you are *expressing rank and the order matters* -names and orders what you are measuring -*intervals between these values may not be equal* because *you can't tell how far apart the data are*; it's hard to define the exact difference between "satisfied" vs "very satisfied" -other examples include pain scales, level of education, ranking football teams

dependent variable

*outcome variable* -based off of whatever the independent variable is -what is measured during a study -in a study looking at the efficacy of new analgesics at different doses, the DV would be the *change in pain scale*

confounding factors

*outside influences that affect a study; things that researchers didn't take into account that could be biasing the study* ex: looking at whether people who smoke cigars are more likely to be bald -they only looked at men, who are more likely to smoke cigars anyways -then looked at a specific age range of men and got a completely different odds ratio; means age is a confounding factor

two groups of subjects are selected; one with exposure to second hand smoke and one without exposure to second hand smoke. You watch these groups over time to see the incidence of disease in the exposed versus unexposed group. This is what kind of study?

*prospective study (longitudinal study)* -these studies observe *over time* to determine the *incidence* of rates in the two groups -usually more labor intensive

number needed to treat (NNT)

*the reciprocal of absolute risk reduction* -in the HIV example the absolute risk reduction was 12% so the *NNT is 1/0.12= 8.33* -*meaning for every 8 patients receiving treatment you would expect disease progression to be prevented in 1 patient*
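
The arithmetic, spelled out in Python (the two risk values below are hypothetical; only the 12% absolute risk reduction comes from the card):

```python
# NNT is the reciprocal of the absolute risk reduction (ARR)
risk_placebo = 0.22   # hypothetical progression rate in the placebo group
risk_treated = 0.10   # hypothetical progression rate in the treatment group

arr = risk_placebo - risk_treated   # 0.12 -> a 12% absolute risk reduction
nnt = 1 / arr                       # 1 / 0.12 = 8.33
print(f"ARR = {arr:.2f}, NNT = {nnt:.2f}")  # treat ~8-9 patients to prevent progression in 1
```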

the null hypothesis always states

*there is no difference between the things that you are comparing* -observed differences are simply a result of random variation in the data; just due to chance alone -researchers are often trying to prove the null hypothesis false and say that there is in fact a significant difference

what type of error are you likely to make if you do *multiple comparisons*?

*type I error* (saying there is a difference and rejecting null when there actually is no difference) -if you continue making comparisons you will eventually find some statistically significant results just by chance

type I error vs type II error

*type I error*: *incorrect rejection of the null hypothesis* (saying there is a difference between the groups when there actually is not)--> *false positive* *type II error*: *incorrectly retaining a false null hypothesis* (saying there is no difference between the groups when there actually is a difference there)--> *false negative*

Experimental study

- A single sample of subjects is selected and randomly divided into two groups - Each group gets a different treatment (or no treatment) - Outcomes or incidence is observed

Estimate

- A special meaning in statistics - result of a defined calculation - The value computed from your sample is only an estimate of the true value in the population (which you can't know)

Examples of paired t test

- A variable measured in each subject before and after an intervention - Subjects are recruited as pairs and matched for variables - one of each pair receives one intervention, the other an alternative treatment - Twins or siblings are recruited as pairs and receive different treatment - The same subject having a measurement performed via two different methods (BP via manual auscultation or automated machine)

Introduction of an article

- Background information on the topic - Rationale for the study - May review previous literature on topic

Case-control design

- Begin with the outcome - Look for features of people who share that outcome - Case - subject who already has the outcome of interest **Uses retrospective data!! This is useful for disease states that are rare.

Cross-sectional design

- Blends the case-control and follow-up - Begins with a population or cohort - Makes simultaneous assessment of outcomes, descriptive features and potential predictors - "Slice in time" or prevalence survey

How are CIs and statistical hypothesis testing related?

- CI computes a range that you can be 95% sure contains the population value - The hypothesis testing approach computes a range that you can be 95% sure would contain the experimental result if the null hypothesis were true *Any result within this range is considered not to be statistically significant *Any result outside this range is considered statistically significant

What are explanatory studies?

- Comparison - Attempt to provide insight into etiology or find better treatments - Can be experimental or observational

Observational studies

- Comparisons to examine and explain medical mysteries - Researchers are bystanders *Examine the natural course of health events *Gather data about subjects *Classify and sort the data - 3 main types: Case-control, Follow-up, Cross-sectional

Summary or abstract of an article

- Concise statement of the goal or hypothesis of the study - How the endeavor was undertaken - Highlights the results - not detailed, author putting "best foot forward" - Concluding thoughts - "Structured abstracts" - vary from journal to journal

Experimental studies

- Controlled trial, clinical trial - Active intervention of the investigator *Example of a prospective study

Continuous data

- Deals with results that are continuous - BP, enzyme activity, IQ, hemoglobin, oxygen saturation, temperature, etc. - First step should be to visualize it (graph it) - More common than other types of data

Results of an article

- Found in both text and in accompanying tables, charts, graphs and figures - Analysis and some interpretation of the data - Essential information to make an informed reader decision

Discussion or comment of an article

- Further analysis of results - Results and conclusions of other studies may be compared and contrasted - May present oversights and mistakes - May build case to strengthen and support results - Most speculative section

What are the goals of linear regression?

- Goal is to find values for the intercept and slope that are most likely to be correct and to quantify the imprecision with the CIs - Think of the model graphically - Linear regression finds the line that best predicts Y from X - Considers the vertical distances of the points from the line *Ex: Finding someone's GFR based on their creatinine level.
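
A minimal sketch with scipy's linregress (the creatinine/GFR pairs are made up for illustration; a real creatinine-GFR relationship would not be this tidy or linear):

```python
from scipy.stats import linregress

# Hypothetical paired measurements: serum creatinine (X, mg/dL) and measured GFR (Y, mL/min)
creatinine = [0.7, 0.9, 1.1, 1.4, 1.8, 2.3, 3.0]
gfr =        [105, 92,  80,  63,  48,  35,  24]

fit = linregress(creatinine, gfr)  # best-fit slope/intercept for predicting Y from X
print(f"slope = {fit.slope:.1f}, intercept = {fit.intercept:.1f}, r = {fit.rvalue:.3f}")

# Predict Y (GFR) for a new X value within the range of the data
new_creatinine = 1.5
predicted_gfr = fit.intercept + fit.slope * new_creatinine
print(f"predicted GFR at creatinine {new_creatinine}: {predicted_gfr:.1f}")
```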

Odds ratio for each independent variable

- If the odds ratio is near 1.0, then that independent variable has little impact on the outcome - If the odds ratio is much higher than 1.0, then an increase in that independent variable is associated with an increased likelihood of the outcome occurring - If the odds ratio is much less than 1.0, then an increase in that independent variable is associated with a decreased likelihood of the event occurring

Outlier tests- interpreting a high p value

- If the p value is high you have no evidence that the extreme value came from a different distribution than the rest - Does not prove that the value came from the same distribution as the others - There is no strong evidence that the value came from a different distribution - cannot reject the null hypothesis that the data is sampled from a Gaussian distribution

Outlier tests- interpreting a low p value

- If the p value is small then you will conclude that the outlier is not from the same distribution as the other values - Reject the null hypothesis that the data is sampled from a Gaussian distribution

Adaptive trials approach to sample size

- Interim analyses performed while the trial is proceeding are used to decide the course of the study - Gaining acceptance in designing large clinical trials - May shorten or lengthen a trial

Median survival

- It can be convenient to summarize an entire survival curve by one value - median survival - The middle value (50th percentile) of a set of numbers - How long it takes until half of the subjects have died *Median survival is undefined when more than half the subjects are still alive at the end of the study - in that case median survival is greater than the last duration of time plotted on the survival curve

why is it better to use the median survival time rather than the mean/average survival time?

- Mean survival time can only be computed when there are no censored observations and the study continues long enough for all subjects to have died - Median survival can be computed when some observations are censored and when the study ends before all subjects have died - Once half of all subjects have died the median survival is unambiguous

References or bibliography of an article

- Most conspicuous in their absence - Can give a clue as to how diligently authors researched and reviewed the literature - Can offer direct links to other related primary research

Goals of Multiple Regression

- Multiple regression fits the model to the data to find the values for the coefficients that make the model come as close as possible to predicting the actual data - Reports the best-fit value for each parameter, along with a CI - One p value is reported for each independent variable - The null hypothesis is that the parameter provides no information to the model so that the β (parameter) value for that parameter equals 0

Regression

- Need to think about cause and effect - Finds the best line that predicts Y from X (Line is not the same as the line that predicts X from Y) - Lipid content (X) affected insulin sensitivity (Y)

Nonparametric tests

- Nonparametric tests make no assumption about the population distribution - Most commonly work by ignoring the actual data values and instead analyzing only their ranks - The unpaired t test, ANOVA, and paired t test assume a Gaussian distribution, which can be defined by parameters, so they are parametric - Simple nonparametric tests rank values from low to high and analyze those ranks (ignoring the values) - Ensures the test won't be affected by outliers and doesn't assume any particular distribution

Correlation and linear regression are related

- Null for correlation is that there is no correlation between X and Y - Null for linear regression is that a horizontal line is correct - Two nulls are essentially equivalent - P values for correlation and linear regression of this data set are identical

Correlation

- Quantifies the degree to which two variables are related - does not fit a line - Tells you the extent (and direction) that one variable tends to change when the other one does - Don't have to think about cause and effect - Doesn't matter which variable is X or Y

The following assumptions are made when analyzing the results of a t test

- Random (or representative) samples - Independent observations - Accurate data - The values in the population are distributed in a Gaussian manner - *Two populations have the same standard deviation, even if their means are distinct

Assumptions of survival analysis

- Random (representative) sample - Independent subjects - Entry criteria are consistent - End point defined consistently - Starting time clearly defined - Censoring unrelated to survival - Average survival doesn't change during the study

Robust statistics

- Rather than eliminate outliers, some statistical methods are designed so they have little effect - Methods of data analysis that are not very affected by outliers are called "robust" - The simplest robust statistic is the median

Fitting model to data

- Regression fits a model to data - Adjusts the values of the parameters in the model to make predictions of the model come as close as possible to the actual data - Regression does not fit data to a model, but rather the model is fit to the data - Models are shoes, data are feet

What's wrong with computing several t tests?

- Run into the problem with multiple comparisons - the more groups compared, the greater the chance of observing one or more significant p values by chance - If the null was true - no difference between the three groups - there would be a 5% chance that each t test would yield a significant p value - With three or more comparisons, the chance that any one (or more) will be significant would be far higher than 5%

clinical studies

- Sample of patients studied is rarely random sample of the larger population - Patients are representative of other similar patients - Extrapolation is still useful - Room for disagreement about the precise definition of population - Though population may be defined vaguely, still use the data to make conclusions about a larger group

CI size (width) depends on

- Sample size - the larger the sample size, the smaller the CI because the mean is more certain - Size of the SD - the larger the standard deviation, the larger the CI because the mean is less certain - Degree of confidence you want (e.g. 95%, 90%) - the higher the degree of confidence, the larger the CI ***The CI has to be larger if you want to be 99% certain that the true mean lies within it

Bonferroni correction

- Simplest way to control the familywise error rate is to divide the value of α by the number of comparisons - Then define any of the comparisons as statistically significant only when its p value is less than that ratio - If an experiment makes 20 comparisons, there is roughly a 65% chance of obtaining one or more statistically significant results by chance alone - If the Bonferroni correction is used, a result is only declared statistically significant when its p value is less than 0.05/20 = 0.0025 - This increases risk of Type II error --> more difficult to find significance
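
The correction itself is one line of arithmetic; a quick sketch with hypothetical p values:

```python
# Bonferroni correction: divide alpha by the number of comparisons in the family
alpha = 0.05
n_comparisons = 20
threshold = alpha / n_comparisons   # 0.05 / 20 = 0.0025

p_values = [0.04, 0.001, 0.20, 0.0031, 0.0004]   # hypothetical p values from some of the comparisons
for p in p_values:
    # declare significance only when the p value is below alpha / n
    print(f"p = {p}: significant after Bonferroni correction? {p < threshold}")
```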

Cross-sectional study

- Single sample of subjects is selected without regard to either the disease or the risk factor - Divided into two groups based on previous exposure to the risk factor - Prevalence of disease is compared between the two groups

Testing for Normality

- Statistical tests can be used to quantify how much a data set deviates from the expectations of a Gaussian distribution - These tests are called normality tests - First step is to quantify how far a set of values differs from the predictions of a Gaussian distribution - Many complex tests with complex names

Multivariate

- Term is used inconsistently - Sometimes refers to methods that simultaneously compare several outcomes at once (correct) - Sometimes refers to the methods used when there is one outcome and several independent variables - really multivariable

Ideal Gaussian Distribution

- The horizontal axis shows various values that can be observed - The vertical axis quantifies their relative frequency - The mean is the center of the Gaussian distribution - Distribution is symmetrical so the mean and median are the same *SD is a measure of the spread or width of the distribution *The area under the curve represents the entire population

P value

- The p value is a numeric representation of the degree to which random variation alone could account for the differences observed between groups or data being compared - A study that finds a p value of 0.05 asserts that there is a 5% chance of obtaining a result as extreme or more extreme than the actual observed or measured value by chance alone - The smaller the p value, the stronger the evidence to dispute the null hypothesis - Any difference observed in the study is more likely to be real, rather than due to chance alone *alpha= significance level (usually 0.05)

Standard Error of the Mean (SEM)

- The ratio of the SD divided by the square root of the sample size is called the standard error of the mean (SEM) - SEM = SD/√n - SEM does not directly quantify scatter or variability - Can be small even when the SD is large - The larger the sample size, the smaller the value of the SEM * Reduces close to 0 with very large sample size - little doubt as to the mean of the population
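
Worked on made-up values with numpy (scipy.stats.sem gives the same answer):

```python
import numpy as np

values = np.array([4.1, 5.0, 4.7, 5.3, 4.4, 5.8, 4.9, 5.1])  # made-up sample

sd = values.std(ddof=1)            # sample SD: quantifies scatter, not shrunk by a larger n
sem = sd / np.sqrt(len(values))    # SEM = SD / sqrt(n): shrinks as the sample size grows
print(f"SD = {sd:.3f}, SEM = {sem:.3f}")
```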

Two-tailed p values are used more often than one-tailed p values for the following reasons

- The relationship between p values and confidence intervals is more straightforward with two-tailed p values - Two-tailed p values are larger (more conservative)- Since many experiments do not completely comply with all the assumptions on which the statistical calculations are based, many p values are smaller than they ought to be. Using the larger two-tailed p value partially corrects for this - Some tests compare three or more groups, which makes the concept of tails inappropriate (more precisely, the p value has more than two tails). A two-tailed P value is more consistent with p values reported by these tests

what defines *how confident you are that the true population mean falls within a given range of values*

confidence interval

CI of a mean is computed from 4 values

- The sample mean- Best estimate of the population mean. CI is centered on the sample mean - The standard deviation- If the data are widely scattered (large SD) the sample mean is likely to be farther from the population mean. If the data are tight (small SD) the sample mean is likely to be closer to the population mean. Width of CI is proportional to the sample SD - Sample size- In a larger sample the sample mean is likely to be quite close to the population mean. In a small sample the sample mean is likely to be further from the population mean. CI is inversely proportional to the square root of the sample size - Degree of confidence- If you wish to have more confidence, a wider interval must be established
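
A sketch of the calculation using the t distribution in scipy (made-up data); all four ingredients appear: sample mean, SD (via the SEM), sample size, and the chosen degree of confidence.

```python
import numpy as np
from scipy import stats

data = np.array([98.2, 98.6, 98.9, 97.9, 98.4, 98.7, 98.1, 98.5])  # made-up measurements

mean = data.mean()
sem = stats.sem(data)          # SD / sqrt(n)
df = len(data) - 1             # degrees of freedom for the t distribution

# CI is centered on the sample mean; asking for 99% instead of 95% widens it
ci95 = stats.t.interval(0.95, df, loc=mean, scale=sem)
ci99 = stats.t.interval(0.99, df, loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI = {ci95}, 99% CI = {ci99}")
```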

Assumptions of the One-Way ANOVA

- The samples are randomly selected (or at least representative) - The observations within each sample were obtained independently - The data were sampled from populations that approximate Gaussian distribution - The SD of all populations are identical **Same assumptions as the unpaired t test

Multiple regression methods are used for several purposes

- They assess the impact of one variable, while adjusting for others - To create an equation for making useful predictions - To understand scientifically how various variables might impact an outcome

CI versus confidence limits

- Two ends of the CI are called the confidence limits - the limits are a value - CI extends from one limit to the other - it is a range

Case-control study

- Two groups of subjects are selected - One has the disease or condition (cases) - The other does not have the disease or condition, but are selected to be similar in many ways (controls) - Investigators look back to determine possible risk factors

Prospective study (longitudinal study)

- Two groups of subjects are selected - one with the exposure and the other without - Observes over time to determine the incidence rates in the two groups

The Standard Normal Distribution

- When the mean equals 0 and the SD equals 1.0 the Gaussian distribution curve is called a standard normal curve - All Gaussian distributions can be converted to a standard normal distribution - z = (Value - Mean)/SD - Variable "z" is the number of SD away from the mean - The next table shows the fraction of a normal distribution between -z and +z for values of z
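
A quick check of the z idea with scipy (the value, mean, and SD are made up):

```python
from scipy.stats import norm

value, mean, sd = 112.0, 100.0, 8.0   # hypothetical observation, population mean and SD
z = (value - mean) / sd               # z = number of SDs away from the mean

# fraction of a standard normal distribution falling between -z and +z
fraction_within = norm.cdf(z) - norm.cdf(-z)
print(f"z = {z:.2f}, fraction within +/-z = {fraction_within:.3f}")
```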

independent variable

- experimental or predictor variable -*can be manipulated in an experiment* - changed to have an effect on a dependent variable -in a study looking at the efficacy of a new analgesic at different doses, the IV would be *the doses of the medications*

standard deviation

- variation among values expressed in the same units as the data. -*used to measure the spread of data about the mean* -*larger the SD, the more spread out the distribution of data about the mean*

will a 99% or 90% CI have a wider range? (know)

-*99% CI will have a wider range*; if you are that confident you need to give yourself more wiggle room -if your CI is 90% you can have a narrower range since you are giving yourself room to be wrong

when is logistic regression used?

-*describes relationship between one outcome variable and 1 or more exposure variables (independent variables)* -*outcome variable is always binary (only two outcomes) and exposure variables can be binary or continuous* -ex: outcome variable is obesity (binary; you are either obese or not obese) and exposure variables are age (continuous) and smoking status (binary) -logistic regression computes an odds ratio for each independent variable

pitfalls of the p value

-*does not convey information regarding the size of the observed effect* (small effect in study w/ large sample size can have the same p value as a large effect in a small study) -the more variables or endpoints in a study, the more likely one of them will come up statistically significant by chance alone (multiple comparisons error)

What happens if you set the p value (alpha) lower than 0.05?

-*you will be less likely to make a type I error* and say that there is a difference when there actually is not (false positive) HOWEVER -*you will be more likely to make type II errors* and say that there is no difference when there actually is (false negative) -if you want to set a very low p value you need a *large effect size*, meaning the difference between the groups is so large that it will be easy to find

SD vs SEM (know)

-SD is a measure of the *spread* of data -SEM is measuring how *well you know the true population mean*; *depends on the sample size (N) and SD*

which studies use odds ratio and which studies use relative risk? (know)

-case control or retrospective use odds ratio -prospective studies use relative risk

*tests used to describe one sample*; NOT running any sort of hypotheses here; just describing what you find (assuming normally distributed data)

-frequency distribution (just count how many times you get same value over and over) -sample *mean* -minimum and maximum value and range -25th and 75th percentile - SD

if you run a one-way ANOVA test and get a p value of 0.03 (there is a statistical difference between the groups somewhere), what do you do next to find out exactly what pairs are statistically distinguishable?

-multiple comparison tests

One-Way ANOVA

One-Way ANOVA compares the means of three or more groups assuming that all values are sampled from Gaussian populations *Compares all the groups at once!

simple vs multiple linear regression

-multiple linear regression is *used when there are two or more X variables* -looking at how one variable is influenced by several other variables -ex: lead decreases kidney function; kidney function decreases with age; most people accumulate small amounts of lead as they get older -can accumulation of lead explain some of the decrease in kidney function with aging?

null hypothesis with correlation and linear regression

-null for correlation is that there is no correlation between X and Y -null for linear regression is that a horizontal line is correct (no correlation) --> horizontal with a slope of 0. p values would be identical

probability vs statistics

-probability *starts with the general case (population or model)* and then *predicts what would happen in many samples* (general to specific, population to sample, model to data) -statistics works in opposite direction; *start with one set of data (sample) and make inferences about the overall population or model* (specific to general, sample to population, data to model)

relative risk vs absolute risk reduction (attributable risk)

-relative risk is a ratio between two proportions (progression in tx group/progression in placebo group) -absolute risk reduction is a difference between the % of progression in both groups (placebo group - treatment group)

to interpret the CI of a mean what assumptions must be accepted?

-that it is a random/representative sample (no convenience sampling) -independent observations were made (all subjects sampled from the same population and selected independently of the others) -accurate data -assessing an event you really care about -population is distributed in gaussian manner

after running a retrospective study on people with cholera and whether they were vaccinated or not you get an odds ratio of 0.25. 95% CI is 0.12 to 0.54. what does this mean and is it significant?

-this means that people who are vaccinated are 25% as likely to get cholera as unvaccinated people; this means people with the vaccine are protected; their odds of getting cholera are much less -this is significant because the CI does not include 1 -in vaccination studies can subtract 1-0.25=0.75 and say that vaccine is 75% effective in preventing cholera

two tailed vs one tailed test

-two tailed test includes both sides of the gaussian curve; better to use this if you are not sure which way the values are going to go (either group can have the larger mean) -one tailed test only includes one side of the gaussian curve; when using a one tailed test you usually have an idea of which values will be higher and which values will be lower; *you must predict which group will have the larger mean before collecting any data*

What number are we looking for with Attributable risk?

0 - If the confidence interval contains 0, then the data is not statistically significant.

When looking at the relative risk, what number are we looking for to determine significance?

1 - If the confidence interval includes 1, then the data is not statistically significant. *Anything above 1 is higher risk!

1. if a 95% CI *does not contain the value of the null hypothesis*, then the results will ___a.___ with a p value __b.____ 2. if a 95% CI *does contain the value of the null hypothesis*then the results will ___c.___ with a p value __d.___

1. a. be statistically significant b. <0.05 2. c. not be statistically significant (null would be true meaning no difference between groups) d. >0.05

1. 1 SD out from the mean ____ individuals should fall within these values 2. 2 SD out from the mean ____ individuals should fall within these values 3. 3 SD out from the mean ____ individuals should fall within these values

1. *68%* of data points should be 1 SD above or below the mean 2. *95%* of data points should be 2 SD above or below the mean 3. *99%* of the data points should be 3 SD above or below the mean

1. if looking at the *difference* between two means what within the CI will tell you if there is significance or not? 2. if looking at *two proportions* between groups what within the CI will tell you if there is significance or not?

1. *zero* within the CI tells you there is no significance (if the CI includes zero, this means that the two groups were the exact same so subtracting them gives you zero) 2. *one* within the CI tells you there is no significance (if the CI includes one this means that the groups are the same because dividing them gives you one) wouldn't even need to look at p values in this case

1. what correlation coefficient will you see if two variables do not vary together at all? 2. what correlation coefficient will you see if two variables tend to increase or decrease together? 3. what correlation coefficient will you see if two variables are inversely related (one goes up and the other goes down)

1. 0 (will see a horizontal line); as values get closer to 0 you will see more scatter 2. positive (will see a line going up to the right) 3. negative (will see a line going down to the right)

1. low p value from an outlier test means 2. high p value from an outlier test means

1. a *small p value* allows you to conclude that the *outlier is not from the same distribution as the other values*-->reject the null hypotheses that data is sampled from Gaussian and *switch to non-parametric* 2. a high p value means that there is *no evidence that the extreme value came from a different distribution* than the rest-->cannot reject the null-->*use parametric tests*

1. *non-parametric* test to compare *two unpaired groups* 2. *non-parametric* test to compare *two paired groups*

1. mann-whitney 2. Wilcoxon

1. with what test are you more likely to make a type 1 error? 2. with what test are you more likely to make a type 2 error?

1. one-tailed p value test (you would end up rejecting the null hypothesis when there is actually no difference between groups) 2. two-tailed p value test (you would end up retaining a false null hypothesis and saying there is no difference when there actually is) *it is usually better to make a type II error than it is a type I*

horse runs 100 races and wins 25 times. what is the probability vs odds ratio

1. probability is the fraction of times you expect to see that event in many trials -probability is 25/100 = 0.25 2. odds ratio is the probability of the event happening divided by the probability of the event not happening -odds ratio = 0.25/(1-0.25) = 0.25/0.75 = 0.333, so 1 win to 3 losses
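
The same arithmetic in a few lines of Python:

```python
wins, races = 25, 100

probability = wins / races               # 0.25
odds = probability / (1 - probability)   # 0.25 / 0.75 = 0.333..., i.e. 1 win to 3 losses
print(f"probability = {probability}, odds = {odds:.3f}")
```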

with normally distributed data: 1. what test is best to use when explaining or *predicting one variable from another* 2. what test is best to use when explaining or predicting one variable from *several others*

1. simple linear regression or simple nonlinear regression 2. multiple linear regression or multiple nonlinear regression

1. the larger the sample size, the _____ the CI 2. the larger the SD, the ____ the CI 3. the higher the degree of confidence, the ____ the CI

1. smaller (more narrow) -can be more certain of the mean since sample would be closer to actual population; if sample size increases by factor of 4, CI is expected to narrow by factor of 2 (inversely proportional to the sq root of the sample size) 2. larger -with more variation, the mean is less certain 3. larger (wider) -the CI has to be larger if you want to be 99% certain that the true mean lies within it

1. can mean or median be negative or equal 0 2. can SD be negative or equal 0

1. yes 2. SD cannot be negative; it can be 0 if all the values are the same

if you measured the average weight within a certain community and found that the mean weight was 50 kg with a SD of 7kg, 95% of the population should weigh how much?

36 to 64 kg (2 SD above and below the mean)
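
Spelled out (and easy to adapt for 1 or 3 SD):

```python
mean_kg, sd_kg = 50, 7

lower = mean_kg - 2 * sd_kg   # 50 - 14 = 36 kg
upper = mean_kg + 2 * sd_kg   # 50 + 14 = 64 kg
print(f"~95% of weights expected between {lower} and {upper} kg")
```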

Different kinds of multiple regression

A family of methods with the specific type of regression used depending on what kind of outcome was measured - can all be "multiple" - The models all predict an outcome, called the dependent variable (Y), from one or more predictors, called the independent variables.

Model

A mathematical model is an equation or set of equations that describe, represent, or approximate a physical, chemical, or biological state or process - Fitting models to data and simulating data from models

How do the parameters affect the model?

A useful comparison must take into account the number of parameters fit by each model - A model with too few parameters won't fit the sample data well - A model with too many parameters will fit the sample data well, but the CIs of the parameters will be wide

Advantages and challenges of Case-control studies

Advantages of case-control studies: - Can be done relatively quickly with a relatively small sample size from previously recorded data Challenge with case-control studies: - To pick the right controls - Idea is to control for extraneous factors that might confuse the results, but not control away the effects sought How do you know when to accept the results? - Results are more likely true when the odds ratio is large, when the results are repeated, and when the results make sense biologically

What is an example of a confounder?

Age -Anything that may increase the risk

What is the difference between a p value and alpha?

Alpha is set in stone (0.05 usually) and the p value will differ from one sample to the next.

Risk

Balancing the benefits of treatment plans against potential liabilities. **NEED to know the denominator- the number of people at risk!!

What happens to the CI when you add more groups?

CI between means become wider

If I wanted to shrink a confidence interval, what could I do?

Can increase sample size, decrease standard deviation or decrease degree of confidence (go from 95 to 90)!!

Conventional approach to sample size

Choose a sample size (power calculation), collect data and then analyze- no adjustments

Ad hoc approach to sample size

Collect and analyze some data, if CIs are not as narrow as you like or the results are not statistically significant, collect more data and reanalyze - P values and CIs cannot be interpreted with this method - not valid **Ad hoc is NOT recommended!!

Unpaired t test

Compares the means of two groups, assuming data were sampled from a Gaussian population. - Type of Parametric test!!

Paired t test

Compares two matched or paired groups when the outcome is continuous. Experiments are often designed so that the same patients or experimental preparations are measured before and after an intervention - Data should not be analyzed with an unpaired t test or a nonparametric test - Unpaired tests do not distinguish variability among subjects from differences caused by treatment

All normality tests compute what?

Compute a p value that answers the following question: - If you randomly sample from a Gaussian population, what is the probability of obtaining a sample that deviates from a Gaussian distribution as much (or more so) as this sample does?

What is the correlation coefficient?

Direction and magnitude of linear correlation can be quantified with a correlation coefficient (r) - Values range from -1 to 1 - If the correlation coefficient is 0, then the 2 variables do not vary together at all. - If + the variables tend to increase or decrease together. - If negative the 2 variables are inversely related. - If r = 1, the 2 variables are perfectly correlated.
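
A sketch with scipy's pearsonr on made-up x/y values:

```python
from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5, 6, 7]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9]   # made-up values that tend to rise with x

r, p = pearsonr(x, y)
print(f"r = {r:.3f}, r^2 = {r**2:.3f}, p = {p:.4f}")
# r near +1: variables increase together; near -1: inversely related; near 0: no linear relationship
```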

Why would you not want to do multiple t tests?

Doing multiple two-sample t-tests would result in an increased chance of committing a type I error - ANOVAs are useful in comparing two, three, or more means

Multiple linear regression

Finds the linear equation that best predicts Y from multiple independent variables

best test for comparing *binomial variables* (yes/no) in *unmatched group* (know)

Fisher's exact test -tells you if there is a difference in the proportions between the yes and no answers; related to the chi-square test -if the binomial groups were matched/paired, use McNemar's
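
A sketch with scipy on a hypothetical 2x2 table of yes/no counts from two unmatched groups:

```python
from scipy.stats import fisher_exact

# Rows: group A, group B; columns: "yes" count, "no" count (made-up numbers)
table = [[12, 18],
         [25,  9]]

odds_ratio, p = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.4f}")
# A small p value suggests the proportion of "yes" answers differs between the groups.
```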

What can be predicted in a linear regression?

For any value of X, the model can predict the Y value. - Works if assumption is made that the relationship between X and Y is linear within a defined range of X values.

What does it mean if a p value is large in a Chi-square goodness-of-fit test?

If the p value is large, the observed distribution does not deviate from the theoretical distribution any more than expected by chance.

Two-tailed test

In general a test is called two-tailed if the null hypothesis is rejected for values of the test statistic falling into either tail of its sampling distribution *This is when you do not know which direction it will go. Ex: don't know if the cancer treatment will make the patient better or worse. **2 tail test --> have to split that 0.05 significance level between the 2 sides, so 0.025 on each side (more conservative, need more extreme results) --> less likely to reject a false null hypothesis

Type II error

Incorrectly retaining a false null hypothesis ***False negative*** - There was a difference, but did not find it.

Multiple subgroups is a type of what?

It is a form of multiple comparisons

One-tailed test

It is called one-sided or one-tailed if the null hypothesis is rejected only for values of the test statistic falling into one specified tail of its sampling distribution - When you know what direction it will go in. Ex: hypertension meds will either stay the same or decrease the patient's BP.

Methodology of an article

Labeled "Materials and Methods", "Patients and Methods". Details: - Patient populations studied - Study designs - Data collection techniques - Analytical and evaluative procedures used ***Often skipped, but VERY important!

Why is the median robust and the mean is not?

Median is not affected by outliers, while the mean is very sensitive to outliers.

Interpreting the coefficients in Multiple linear regression

Multiple linear regression models do not distinguish between the X variable(s) you really care about and the other X variable(s) that you are adjusting for (covariates) - Distinction is made when interpreting the results

A study is comparing pain management with one group taking ibuprofen and the other taking acetaminophen. A normality test is done and it says that their p value is 0.06 (above 0.05). They decide to proceed with a Mann-whitney test. Is this appropriate?

No; should be using an unpaired t test! -since they got a high p value, they cannot reject the null hypothesis that their data came from a Gaussian distribution, so they should be using a parametric test -more specifically an unpaired t test since they are comparing two unmatched groups

Multivariate analysis example (physical activity protects women against heart disease)- confounding variables

Physical activity protects women against heart disease. Loads of opportunity for confounding - Age - Other health related factors: Smoking, MVI use, ETOH

What are Descriptive studies?

Record events, observations and activities **Do not provide detailed explanations of the causes of disease **Do not offer evidence needed to evaluate the efficacy of new treatments - Example - Kaposi's sarcoma - Can be starting point for more elaborate studies

if you increase your sample size would SD get smaller or larger or stay the same?

SD is not affected by sample size -it only quantifies scatter of the data **SEM would get smaller with a larger sample size

what can be used to construct confidence intervals around the sample mean?

SEM (looking at how close your sample mean is to the mean of the entire population)

The width of the CI depends on what?

Sample size!! - A larger sample size produces a narrower CI - By stating the desired width of the CI, the number of subjects needed can be calculated.

Linear regression

"Linear" has a special meaning here: it describes the mathematical relationship between the model parameters and the outcome

Follow-up design

Start with people who have not yet experienced the outcome - Commonly referred to as a cohort - Follow the cohort until the outcome appears

5 year survival

Survival with cancer is often quantified as 5-year survival - A matter of tradition, somewhat arbitrary.

Simple versus multiple linear regression

Term simple linear regression means that there is only one X variable - multiple linear regression is used when there are two or more X variables

One-way ANOVA is based on the assumption of what?

That all the data are sampled from populations with the same SD, even if their means differ. - Assumptions give the multiple comparisons test more power.

What does it mean if the relative risk is above 1?

That the treatment is worse than the placebo

Origin of the Gaussian Distribution

The Gaussian bell-shape distribution is the basis for much of statistics. Happens because random factors tend to offset each other - Many values end up near the center (the mean) - Fewer values end up farther away from the mean - Very few values end up very far from the mean *Data plotted on a frequency distribution tends to result in a symmetrical, bell-shaped curve *The curve is idealized as the Gaussian distribution

A paired t test looks at what?

The difference in measurements between 2 matched subjects or a measurement made before and after an experimental intervention.

Type I error

The incorrect rejection of the true null hypothesis. ***False positive*** - Rejecting the null hypothesis when there really wasn't a difference

When approaching an article which section should you concentrate on?

The methods section! - Look to see if they are excluding anyone from the study

Null Hypothesis

The null hypothesis states that there is no difference in population parameters among the groups being compared and that any observed differences are simply a result of random variation in the data rather than a result of actual disparity in the data itself

Relative risk

The ratio between two proportions- progression in the treatment group/progression in the placebo group. - A relative risk between 0 and 1 means the risk decreases with treatment!

What is Relative risk?

a measure of the strength of association between a particular exposure (risk factor) or intervention and an outcome - Larger the risk or rate ratio, the stronger the association

Wilcoxon's test

a nonparametric test that compares two paired groups- tests the null that there is no difference. - Assumes that the pairs are random and independent- no assumption of Gaussian distribution.

Mann-Whitney test

a nonparametric test to compare two unpaired groups to compute a p value. - Ranks all the values without paying attention to which of the 2 groups the values come from. Sum the ranks in each group. Calculate the mean rank of each group. Calculate p value.
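
A sketch with scipy on two made-up unpaired groups:

```python
from scipy.stats import mannwhitneyu

group1 = [3.1, 4.5, 2.8, 5.0, 3.9, 4.2]   # made-up, unpaired samples
group2 = [5.5, 6.1, 4.9, 7.2, 6.6, 5.8]

# Ranks all values together, ignoring group labels, then compares the rank sums
stat, p = mannwhitneyu(group1, group2, alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")
```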

R² (R squared)

The square of the correlation coefficient - The fraction of the variance shared between 2 variables. - Always between 0 and 1 **Also called Coefficient of determination!

Tail values

The test is named after the "tail" of data under the far left and far right of a bell-shaped normal data distribution - The terminology is extended to tests relating to distributions other than normal *Both one- and two-tail p values are based on the same null hypothesis.

Two-Way ANOVA

Two-way ANOVA simultaneously tests three null hypotheses and computes 3 p values. Interaction- the null is that there is no interaction between the 2 factors. - First factor - the null is that the population means are identical for each category of the first factor - Second factor - the null is that the population means are identical for each category of the second factor

Repeated-measures ANOVA

Use repeated-measures ANOVA to analyze data collected in 3 kinds of experiments - Measurements are made repeatedly in each subject - Subjects are recruited as matched sets and each subject in the set receives a different intervention - A laboratory experiment is run several times, each time with several treatments handled in parallel

Logistic regression

Used when there are 2 possible outcomes

Familywise error rate

When each comparison is made individually without any correction for multiple comparisons, the traditional 5% significance level applies to each - this is the per-comparison error rate - Chance that random sampling would lead this particular comparison to an incorrect conclusion that the difference is statistically significant when this particular null hypothesis is true - More comparisons = larger chances for Type I error - The familywise error rate, in contrast, applies the 5% to the entire family of comparisons (e.g., via the Bonferroni correction)

Actuarial method

X-axis is divided up into regular intervals - Creating a survival table.

What does a One-way ANOVA test compute?

a single p value testing the null hypothesis that all groups were sampled from populations with identical means. - Multiple comparison tests after ANOVA allows for digging deeper to see which pairs of groups are statistically distinguishable
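
A sketch with scipy on three made-up groups; if the single p value is small, the next step would be multiple comparison tests:

```python
from scipy.stats import f_oneway

# Three made-up groups assumed to be sampled from Gaussian populations with equal SDs
group_a = [10.1, 11.3, 9.8, 10.7, 11.0]
group_b = [12.2, 13.0, 12.6, 13.4, 12.1]
group_c = [10.4, 10.9, 11.2, 10.0, 10.6]

f_stat, p = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p:.4f}")
# Small p value: at least one group mean differs -> run multiple comparison tests next
```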

What is a confounder?

an extraneous variable that correlates (positively or negatively) with both the dependent variable and the independent variable. Ex: Age

when using normally distributed data: what test is best to use when comparing three or more groups

anova! -one-way anova if comparing three or more *unmatched/unpaired* groups -repeated-measure anova followed by multiple comparisons tests if comparing three or more *matched/paired* groups

if an experiment wants to make 20 comparisons, what should be done in order to avoid a multiple comparisons error?

apply the family wise error rate by performing the bonferroni correction 0.05/20=0.0025 so a result is only declared significant if it has a p value less than 0.0025 -*this decreases risk of type I error but increases risk of type II since it will be more difficult to find a difference if one exists*

Binomial (Dichotomous) variables

binomial variable (dichotomous variable): more specific nominal variable; categorical outcomes with *only two distinct possible outcomes* - No consideration of order or magnitude - Usually not numeric (gender, yes/no)

graph that uses horizontal lines to mark the median of each group and represents quartiles

box-and-whisker plot (larger the box, the more variable) -middle line of box= 50th percentile (median of entire set) -bottom line of box=1st quartile (median of lower half of the set; 25th percentile) -top line of the box= 3rd quartile (median of upper half of the set; 75th percentile) *if 25th and 75th percentiles are very close to your mean tells you there's less variation* **can extend lines out to represent 5th and 95th percentiles or plot highest and lowest values

how can you remove the influence of outliers from the mean?

by using a *trimmed mean* -ignores the highest and lowest values (usually a percentage) - Removes influence of outliers.
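
scipy has a trimmed mean built in; a sketch on made-up data with one obvious outlier:

```python
import numpy as np
from scipy.stats import trim_mean

values = np.array([4.8, 5.1, 5.0, 4.9, 5.2, 5.1, 25.0])   # made-up data with one big outlier

print(f"mean = {values.mean():.2f}")                        # pulled up by the outlier
print(f"median = {np.median(values):.2f}")                  # robust to the outlier
print(f"20% trimmed mean = {trim_mean(values, 0.20):.2f}")  # drops the top and bottom 20% of values
```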

a study looking at whether people with cholera are less likely to have been vaccinated than those who did not get cholera is what kind of study?

case control (retrospective) -disease is cholera and looking back in time to see if people with cholera did not have the vaccine -null=there is no difference in vaccination rates between people who did and did not have cholera

which type of study usually has more controls compared to cases?

case control studies -this is in order to make sure you have a good comparison and account for outliers

3 types of observational studies

case-control (begin with the outcome; subject already has outcome of interest and looking back in time at risk factors) follow-up/cohort (following specific cohort until outcome appears) cross-sectional

a *small SEM* suggests that the sample mean is _____ to the population mean

close! *SEM is always smaller than SD (SEM is the SD divided by the square root of n)

small SD means more people are ____ to the average

closer! this means there is less variation from the mean

Analysis of variance (ANOVA)

collection of statistical models in which the observed variance in a particular variable is partitioned into components attributable to different sources of variation - In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are all equal (Generalizes t-test to more than two groups)

Interval variable

continuous, spaced with equal intervals or distances; the zero point is not considered meaningful (example: IQ) - A difference (interval) means the same thing all the way along the scale, no matter where you start (example: 1°C) - Computing the difference between two values can make sense with interval variables (90°C versus 80°C) - Calculating the ratio of two variables is not helpful - definition of zero is arbitrary *0°C is not "no temperature" *100°C is not twice as hot as 50°C

when looking at women who are physically active and comparing their risk for coronary events, the researchers perform an age-adjusted analysis. What is this an example of?

controlling for a confounding factor -if they had 18 year olds compared to 65 year olds, obviously the risk would be lower in the 18 year olds and their physical activity would probably be higher also -controlling for age keeps the data from getting skewed

a "slice in time" or prevalence survey is what kind of study?

cross-sectional -*looking at single point in time for prevalence to see if disease is present based off risk factors that are already there* -starting at a certain point in time and looking for prevalence of disease -ex: looking at the number of people who have a cold right now in class

as power increases, chance of making a type II error _____

decreases! - the probability of a type II error occurring is called the false negative rate (beta) - power = 1 − β (aka sensitivity)

probability

we desire statistical calculations to yield definite conclusions, but all statistics can do is report probabilities -probability: the fraction of times you expect to see that event in many trials (ranges 0-1)

error and bias

error: refers to *variability* bias: caused by any factor that consistently alters the results not just preconceived notions of the experimenter - Biased measurements tend to result from systematic errors.

Informal sequential approach

experiment continues when the result is not significant and stops when it reaches significance. Should not be used- invalid.

explanatory studies include

experimental and observational -observational: group people up and watch what happens -experimental: making an intervention; doing something with the groups and seeing what happens

experimental study vs prospective study

experimental: -single sample selected and randomly divided into two groups -each group gets a different treatment (or no treatment) -*researchers control who is exposed and not exposed* prospective: -groups are selected based off if they have already been exposed or not; just lumping people together based off of exposure

Ordinal variable

express rank and order matters (though not the exact value) (pain scale, level of education, restaurant ratings 1-5 stars) - Intervals between values may not be equal (e.g., Poor, fair, good, very good)

if you are following smokers to see how many of them develop COPD this is an example of what kind of study?

follow-up/cohort study -you are following a specific cohort until the outcome appears -this is a type of prospective design

when would you use a wilcoxon's rank-sum test?

for non-parametric data -compares two unmatched (independent) groups by analyzing the ranks of the values; the non-parametric counterpart of the unpaired t test

if you get a p value of 0.02 (with your significance level being 0.05) this means

if the null hypothesis were true (there is no difference between groups), the probability of getting your result by just chance alone is about 2%--> it is likely not due to just chance alone so you can reject the null hypothesis in this case

the width of the CI is proportional to the sample SD meaning that

if you have a larger SD (more variability) then your CI gets wider

is it better to have higher or lower NNT?

lower! the fewer people you have to treat in order to see a benefit, the more effective the medication/treatment
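As a rough worked example (using the AZT numbers that appear later in this set): NNT = 1 / absolute risk reduction, so with an ARR of 12% (0.12), NNT = 1/0.12 ≈ 8.3, meaning you would need to treat roughly 8-9 patients to prevent one additional disease progression.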

are results that fall within the CI range statistically significant?

no -a 95% CI says that you are 95% confident that the population value falls within the given range -if the null hypothesis were true, your CI range would include the null value-->no significant difference between the groups (avg body temp example: when the CI did not include 37°C, there was a significant difference) *any result outside of the CI range is considered significant*

does CI of a mean quantify variability?

no -CI are not the spread of values like SD -depends on the spread of values AND sample size -if CI is 95%: this means 95% of the time you expect the population mean to be within the CI
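A minimal sketch in Python of a 95% CI for a mean (hypothetical data), showing that the interval depends on both the spread of the values and the sample size:

import numpy as np
from scipy import stats

# hypothetical sample of 20 oral temperatures (deg C)
rng = np.random.default_rng(1)
sample = rng.normal(loc=36.8, scale=0.4, size=20)

n = len(sample)
mean = sample.mean()
sem = sample.std(ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)          # two-sided 95% critical value
ci = (mean - t_crit * sem, mean + t_crit * sem)
print(ci)   # widens with a larger SD, narrows with a larger n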

Non-parametric tests are for

non-normal distributions. - Use non-parametric tests when you get a small p value during a normality test. - Presence of 1 or a few outliers might be causing the normality test to fail- need to run an outlier test.

are parametric or non-parametric tests more robust?

non-parametric tests are for non-normal distributed data and *are more robust* -these tests are more conservative-->*less likely to make a type I error but also more likely to make a type II error*

Parametric tests are for

normal distributions

if a graph does not have a "bell shape" distribution does this mean it is not gaussian?

not necessarily -*an ideal gaussian distribution extends to very low negative numbers and very high positive values* (extreme tails) -in science you often don't have these extremes -even when all the data are normally distributed, *you rarely see bell-shaped curves unless the sample size is enormous*

if a study of acupuncture for osteoarthritis showed that pain decreased in most patients does this mean that the acupuncture worked?

not necessarily; we tend to ignore alternative explanations -placebo effect -patients want to be polite -other changes in tx (ASA, exercise) -subject with worsening pain may be excluded for variety of reasons -pain from osteoarthritis varies day to day

what is the odds ratio and what type of study should it be used in?

odds = *the probability that the event will occur divided by the probability that the event will not occur* -To convert probability to odds, divide probability by 1-probability -the odds ratio is the ratio of two odds (e.g., the odds of exposure in cases divided by the odds of exposure in controls) *used in case control studies*
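A minimal sketch in Python of an odds ratio from a made-up 2x2 case-control table (all counts hypothetical):

# hypothetical 2x2 table from a case-control study
#                   exposed   unexposed
# cases (disease)     a=40       b=60
# controls            c=20       d=80
a, b, c, d = 40, 60, 20, 80

odds_cases    = a / b            # odds of exposure among cases
odds_controls = c / d            # odds of exposure among controls
odds_ratio    = odds_cases / odds_controls
print(odds_ratio)                # about 2.67: exposure is more common among cases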

statistical test for normally distributed data that tells us whether or not the means of several groups are all equal; generalizes t-test to more than two groups

one-way ANOVA -best for comparing two, three, or more means; tells you if there is at least some difference between some of the groups -ex: looking at LH hormone levels in non-runners, recreational runners, and elite runners
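A minimal sketch in Python of that example (the LH values are hypothetical), assuming scipy is available:

from scipy import stats

# hypothetical LH levels in three groups of runners
non_runners  = [12.1, 10.8, 11.5, 13.0, 12.4]
recreational = [10.2,  9.8, 11.0, 10.5,  9.9]
elite        = [ 7.9,  8.4,  7.5,  8.8,  8.1]

f_stat, p_value = stats.f_oneway(non_runners, recreational, elite)
print(f_stat, p_value)   # a small p value: at least one group mean differs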

the difference between unpaired and paired t tests is the same for _____ and ____

one-way and repeated-measures ANOVA -ANOVA just looks at three or more groups; t tests only look at two groups

when can using the mode be useful?

only useful with variables that can be expressed as integers or whole numbers - does not always assess the center of a distribution - not useful with continuous variables assessed to at least several digits of accuracy (each value will be unique)

(assuming normally distributed data) what kind of test would be appropriate if you were doing a *pre and post test in the same person or a test on siblings, subjects recruited as pairs, or a mom and child*?

paired t test -best when wanting to show that there is some type of relationship between the two groups being tested
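A minimal sketch in Python of a paired t test on hypothetical pre/post scores measured in the same patients:

from scipy import stats

# hypothetical pre/post scores measured in the same 6 patients
pre  = [22, 25, 19, 30, 27, 24]
post = [25, 27, 20, 33, 29, 26]

t_stat, p_value = stats.ttest_rel(pre, post)   # paired (related-samples) t test
print(t_stat, p_value)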

if you are running a study that's trying to rate the most accurate way to monitor temperature (oral vs rectal) or the best way to measure BP (auscultation or invasive) what would be the best test (assuming this is normally distributed data)

paired t test -comparing matched groups meaning the values are measured in the same patient

censoring survival data

people are taken out of calculations for certain reasons and not included in the final results (they move, they stop following up, they survive past the study period, etc) -important to talk about this in your study because if you started with 80 people but only did calculations on 65 this would look suspicious -if you censor certain subjects it is important to note how long they were actually in the study for

what tells researchers how many people they need to include in their study to find the difference they are looking for?

power! -a way to calculate minimum sample size required to detect an effect of a given size -if a study says they were "under powered" this means they didn't have enough people to find a difference if one should have been there
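A minimal sketch in Python of a sample-size calculation from a power analysis (the effect size, alpha, and power here are hypothetical planning numbers), assuming statsmodels is available:

from statsmodels.stats.power import TTestIndPower

# hypothetical planning numbers: medium effect size, alpha 0.05, 80% power
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(n_per_group)   # roughly 64 subjects per group to detect this difference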

political polls

random sample of voters (sample) is polled -results used to make conclusions about entire population of voters

when measuring body weight of individuals in a weight loss program this is a ____ variable

ratio! -zero is meaningful (0 means no body weight at all, and negative body weight is impossible) -equal intervals still -other examples include: temperature in Kelvin, mass, distance, blood sugar

Continuous variable

represent data capable of possessing any value in a given range (BP, temperature, weight)

Attributable risk

the extra risk attributable to the exposure; in the HIV/AZT study it is the risk of progression in the placebo group minus the risk in the treatment group

what kind of tests work by ignoring the actual data values and instead analyze only their ranks?

simple non-parametric tests -these tests make no assumption about the population distribution -rank values from low to high and analyze those rankings

what is used to *identify how closely our sample mean approximates the mean of the population at large*? (how close you are to the population average)

standard error of the mean (SEM) -the larger the sample size, the smaller the SEM; less error means you are closer to the actual population mean

a study looked at the effectiveness of zodovudine (AZT) in treating asymptomatic people with HIV. They randomized the groups and had one receive the medication and one receive a placebo. They wanted to know if treatment with AZT reduces progression of the disease. disease progressed in 16% of the patients receiving AZT and the disease progressed in 28% of the patients receiving the placebo. What is the attributable risk/absolute risk reduction?

the attributable risk/absolute risk reduction is *28%-16%=12%*, with a 95% CI of 6.7-17.3% and p<0.001 -extrapolating to the population of HIV patients, we are 95% confident that treatment will reduce progression of the disease by between 6.7% and 17.3% -since this is a difference, you would not want 0 in the CI; 0 would mean the groups are exactly the same-->no significance
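A minimal sketch in Python of this calculation; the group sizes below are hypothetical (about 450 per arm), chosen only so the normal-approximation CI roughly matches the one quoted above:

import numpy as np

n_azt, n_placebo = 450, 450          # hypothetical group sizes
p_azt, p_placebo = 0.16, 0.28        # progression proportions from the card

arr = p_placebo - p_azt              # absolute risk reduction = 0.12 (12%)
se  = np.sqrt(p_azt * (1 - p_azt) / n_azt +
              p_placebo * (1 - p_placebo) / n_placebo)
ci  = (arr - 1.96 * se, arr + 1.96 * se)
print(arr, ci)                       # about 12%, CI roughly 6.6%-17.4%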

what does it mean if you get a high p value from your normality test

the data are not inconsistent with Gaussian distribution -cannot reject the null; assume that your data is following gaussian distribution -this does NOT prove that data were sampled from Gaussian distribution it just demonstrates that deviation from Gaussian is not more than you would expect to see with just chance alone

What does it mean if the p value from the normality test is large?

the data are not inconsistent with Gaussian distribution - cannot reject the null - A normality test cannot prove the data were sampled from a Gaussian distribution - A normality test can demonstrate that the deviation from the Gaussian ideal is not more than you'd expect to see with chance alone - Power of a normality test increases with sample size

Tukey's test

the goal is to compare every mean with every other mean- results include CIs and conclusions about statistical significance. *Multiple comparison test
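A minimal sketch in Python of a Tukey multiple-comparison test (values and group labels are hypothetical), assuming statsmodels is available:

from statsmodels.stats.multicomp import pairwise_tukeyhsd

# hypothetical measurements and their group labels
values = [12, 11, 13, 10, 9, 10, 8, 7, 8]
groups = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

result = pairwise_tukeyhsd(endog=values, groups=groups, alpha=0.05)
print(result)   # a CI and a reject/fail-to-reject decision for every pair of means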

why is the median robust and the mean is not?

the median is more robust because it's *less sensitive to outliers* so you are *less likely to make a type I error* -non-parametric tests use the median rather than the mean; they can use the median because it's just looking at rank

what does it mean if you get a low p value from your normality test

the null hypothesis is saying that the data are sampled from a Gaussian distribution -*a small p value allows you to reject the null hypothesis and accept that the data are not sampled from Gaussian population* -->switch to non-parametric test now
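A minimal sketch in Python of a normality test on clearly non-Gaussian (hypothetical) data, using the Shapiro-Wilk test from scipy:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.exponential(scale=2.0, size=40)   # hypothetical, skewed data

w_stat, p_value = stats.shapiro(sample)        # Shapiro-Wilk normality test
print(p_value)   # a small p value: reject the null that the data are Gaussian
                 # --> switch to a non-parametric test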

power of a statistical test

the probability that the *test will reject the null hypothesis when the null hypothesis is false*; *probability of not making a type II error* (making a false negative decision) -*the ability to detect a difference if one should be there*

If the CI for the ratio of 2 proportions does not include 1.0 (the null hypothesis), then

the result must be statistically significant

If the CI for the difference between 2 means does NOT include zero (the null hypothesis), then

the result must be statistically significant (p<0.05)

Linear regression

to "fit the best line" through the graph of data points -wants to determine the most likely values of the parameters that define that model (slope and intercept) -finds the line that best predicts Y from X

Correlation quantifies the association between

two continuous (interval or ratio) variables
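A minimal sketch in Python tying these last two cards together (the x/y values are hypothetical): correlation quantifies the association, while regression finds the line that best predicts Y from X.

from scipy import stats

# hypothetical paired measurements of two continuous variables
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 4.3, 5.9, 8.2, 9.8, 12.1]

r, p_corr = stats.pearsonr(x, y)     # correlation: strength of the association
fit = stats.linregress(x, y)         # regression: best-fit slope and intercept
print(r, p_corr)
print(fit.slope, fit.intercept)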

using parametric tests on non-normally distributed data puts you at risk for what kind of error?

type I -using these tests on data that is not normally distributed will make it easier for you to find a difference that isn't actually there

definition of significant

unlikely to happen just by chance

what is the best test to do for normally distributed data that is looking at a new medication vs a placebo

unpaired t test -you are comparing two unmatched groups another example of using this test would be taking oral temperatures in patients who take acetaminophen vs patients who take ibuprofen
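A minimal sketch in Python of an unpaired t test on two unmatched groups (the temperatures are hypothetical):

from scipy import stats

# hypothetical oral temperatures (deg C) in two unmatched groups
acetaminophen = [37.1, 36.9, 37.4, 37.0, 37.2, 36.8]
ibuprofen     = [36.8, 36.7, 37.0, 36.6, 36.9, 36.7]

t_stat, p_value = stats.ttest_ind(acetaminophen, ibuprofen)   # unpaired t test
print(t_stat, p_value)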

A Chi-square test is useful for what kind of data?

useful for nominal (categorical) data -compares the observed frequencies to the expected (theoretical) frequencies -the null hypothesis is that there is no difference between observed and expected counts; a small p value (a lot of deviation between expected and observed) lets you reject the null
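A minimal sketch in Python of a chi-square test on a made-up 2x2 table of counts:

from scipy import stats

# hypothetical 2x2 table of counts (nominal data)
#              improved   not improved
# treatment       30           20
# placebo         18           32
table = [[30, 20], [18, 32]]

chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(chi2, p_value)   # a small p: observed counts deviate from expected counts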

quality control

using results from a sample and referring it back to entire group ex: factory makes a lot of items (population) but randomly selects a few items to test (sample) *results obtained from sample are used to make inferences about entire population*; sample is expected to be representative of entire population

relative risk example

using the HIV study example -relative risk is the ratio between two proportions; *progression in the treatment group/progression in the placebo group* -16%/28%=0.57, so *subjects treated with AZT were 57% as likely as placebo subjects to have disease progression* -*a relative risk between 0 and 1.0 means the risk decreases with treatment* -if the relative risk is 0 then there is no risk of disease progression at all; if it is 1 then the groups are exactly the same-->no significance (don't want 1 in the CI)

when polling 100 voters before an election, you gather that 33 people would vote for your candidate. If the 95% CI extends from 0.24-0.42 what does this mean?

we are 95% sure that if we extrapolated this out to the entire population, we would be getting somewhere between 24% to 42% of the vote
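A minimal sketch in Python checking this interval with a standard proportion CI (assuming statsmodels is available); the Wald ("normal") interval for 33/100 comes out close to the quoted 0.24-0.42:

from statsmodels.stats.proportion import proportion_confint

# 33 of 100 polled voters support the candidate
low, high = proportion_confint(count=33, nobs=100, alpha=0.05, method="normal")
print(low, high)   # roughly 0.24 to 0.42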

when are the mean and median very similar in value?

when data is normally distributed (gaussian)

When does Confounding occur?

when factors that relate to both the characteristic under scrutiny and the outcome appear as competing explanations for the result. Confounders will skew the data - Ex: Age

when is it appropriate to use a one-tail p value?

when previous data, physical limitations, or common sense tells you that the *difference, if any, can only go in one direction* only choose this test if -you predicted which group will have the larger mean before collecting any data -if the other group ends up having a larger mean you would have attributed that difference to chance and called the difference "not statistically significant"

you perform a hypertension study on two groups and randomly give one group the antihypertensive medication and the other a placebo. Can you use a one-tail p value for this?

yes -you have an idea which values will be higher and lower meaning *you can assume that the group getting the antihypertensive medication will have a lower mean blood pressure compared to the placebo group*

how do you calculate the relative risk for a case-control study?

you can't calculate relative risk from a case-control study because you *can't calculate incidence from this data*; don't have data on the entire population -use odds ratio instead

