Biostatistics Terminology

Variance

How far a set of numbers is spread out from the mean; for a sample, the sum of the squared differences between each value and the mean, divided by the number of values minus one.
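
As a rough illustration of the definition above, the sketch below computes the sample variance by hand and checks it against Python's standard library. The data values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

mean = sum(values) / len(values)
# Sum of squared differences from the mean, divided by (n - 1).
sample_variance = sum((x - mean) ** 2 for x in values) / (len(values) - 1)
# The standard deviation is the square root of the variance.
sample_sd = math.sqrt(sample_variance)

# The standard library computes the same quantities.
assert math.isclose(sample_variance, statistics.variance(values))
assert math.isclose(sample_sd, statistics.stdev(values))
```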

Construct Validity

How well did it measure a specific construct?

n>30

If n>30 and a continuous variable is NOT approximately normal, parametric tests can be used (with caution). This should be explicitly stated by the author.

Unstandardized Regression Coefficient (b)

Indicates the average change in the dependent variable associated with a 1-unit change in the independent variable when controlling for all other variables.

External Validity

Is it generalizable beyond the sample?

Outlier

An observation point that is distant from other observations.

Sample Size

1. Statistics calculated from small samples (n<30) make it harder to detect a difference; a finding of no difference could simply mean that the study was underpowered.
2. Statistics calculated from large samples are more resistant to chance because more people from the population are represented in the sample.
3. Because most statistical tests are a function of sample size, it is easier to detect a statistical difference with a large sample (BUT the difference may have no practical significance).

Interquartile range

A measure of the 'middle fifty' in the data set; where the bulk of the values exist.
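
A minimal sketch of the calculation, assuming hypothetical data: the interquartile range is the third quartile minus the first. Note that software packages use slightly different quartile conventions; this example relies on the default method of `statistics.quantiles`, so other tools may give somewhat different cut points.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical data

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

# The IQR spans the middle fifty percent of the values.
iqr = q3 - q1
print(q1, q3, iqr)
```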

Power

Ability of a test to detect a difference when a difference exists.

Bonferroni Correction

An approach to reducing the statistical significance level when multiple comparisons are made on the same set of data.
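
One common way to apply this correction is to divide alpha by the number of comparisons and judge each p-value against the reduced level. The sketch below assumes three hypothetical p-values from tests on the same data set.

```python
alpha = 0.05
p_values = [0.001, 0.02, 0.04]  # hypothetical p-values from three comparisons

# Reduced significance level: alpha divided by the number of comparisons.
adjusted_alpha = alpha / len(p_values)

# Each comparison is judged against the reduced level.
significant = [p < adjusted_alpha for p in p_values]
print(adjusted_alpha, significant)
```

Here 0.02 and 0.04 would pass an uncorrected 0.05 threshold but not the corrected one.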

Confidence Interval

A range of values, estimated from the sample, that will contain the population parameter (e.g. the population mean) a specified proportion of the time, typically either 95% or 99% of the time.
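
A rough sketch of a 95% confidence interval for a mean, using hypothetical data and a normal approximation (mean plus or minus about 1.96 standard errors). For small samples a t-distribution critical value is usually more appropriate; the normal value is used here only to stay within the standard library.

```python
import math
import statistics

sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 4.7, 5.9, 5.2]  # hypothetical

n = len(sample)
mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

# Critical value for 95% confidence under a normal approximation (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

lower = mean - z * standard_error
upper = mean + z * standard_error
print(round(lower, 3), round(upper, 3))
```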

Contingency Table

Frequency distribution of multiple variables represented in a single table. How can you measure the strength of an association between an exposure and a disease?
- Relative Risk (RR): most commonly used in cohort studies.
- Odds Ratio (OR): most commonly used in case-control studies.
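
To make the two measures concrete, the sketch below computes RR and OR from a 2x2 contingency table. The cell counts and the names `a`, `b`, `c`, `d` are hypothetical and follow the usual layout (rows: exposed / non-exposed; columns: diseased / not diseased).

```python
# Hypothetical 2x2 table:
#                 diseased   not diseased
# exposed            a=30        b=70
# non-exposed        c=10        d=90
a, b, c, d = 30, 70, 10, 90

# Relative risk: risk of disease in the exposed divided by risk in the non-exposed.
risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)
relative_risk = risk_exposed / risk_unexposed

# Odds ratio: odds of disease in the exposed divided by odds in the non-exposed.
odds_ratio = (a / b) / (c / d)

print(relative_risk, odds_ratio)
```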

Hypothesis Testing

1. Define alpha (generally 0.05 = 5/100 or 1/20 chance of incorrectly rejecting the null hypothesis).
2. Conduct the statistical test (generates a p-value).
3. Compare alpha to the p-value generated by the test. P-value < alpha means the results are statistically significant.
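
The three steps can be sketched end to end as follows. The card does not name a particular test, so this example assumes an exact permutation test on the difference in two group means, chosen only because it needs nothing beyond the standard library; the measurements are hypothetical.

```python
from itertools import combinations
from statistics import mean

alpha = 0.05  # Step 1: define alpha.

group_a = [12, 15, 14, 16, 18]  # hypothetical measurements
group_b = [8, 9, 11, 10, 7]

observed = abs(mean(group_a) - mean(group_b))

# Step 2: exact permutation test. Under the null hypothesis the group labels
# are interchangeable, so every relabeling is enumerated and the p-value is
# the share with a mean difference at least as large as the observed one.
pooled = group_a + group_b
indices = range(len(pooled))
extreme = 0
total = 0
for chosen in combinations(indices, len(group_a)):
    relabeled_a = [pooled[i] for i in chosen]
    relabeled_b = [pooled[i] for i in indices if i not in chosen]
    total += 1
    if abs(mean(relabeled_a) - mean(relabeled_b)) >= observed - 1e-12:
        extreme += 1
p_value = extreme / total

# Step 3: compare the p-value to alpha.
significant = p_value < alpha
print(p_value, significant)
```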

Confounding Variable

An extraneous variable that correlates with the dependent variable and at least one independent variable.

Independent Variable

An input, which may be varied or simply observed by the researcher. Sometimes called an experimental or predictor variable.

Proportion

A mathematical value describing the numerator, which is also included in the denominator. A type of ratio where the numerator is included in the denominator: x/(x+y).

Dichotomous

A nominal variable that contains only two categories or levels.

Mann Whitney U

A non-parametric statistical test used in biostatistics to compare differences between two independent groups. It is often used as an alternative to the independent samples t-test when the data do not meet the assumptions of normality or when the sample sizes are small.

Kruskal-Wallis

A non-parametric statistical test used in biostatistics to determine whether there are statistically significant differences between the medians of three or more independent groups. It is an extension of the Mann-Whitney U test for more than two groups and serves as a non-parametric alternative to the one-way ANOVA.

Pearson Correlation Coefficient

A number (rp) that represents the strength of the association/correlation between two variables.

Statistic

A number or quantity that is calculated from a sample of data.

Parameter

A number that is calculated from an entire population.

Measure of Central Tendency

A single value that attempts to describe the central position of a set of data.

Correlated

A statistical relationship existing between two variables or datasets that reflects a dependence between the two.

Correlation

A statistical relationship that reflects the association between two variables. NEED TO REPORT BOTH STRENGTH AND DIRECTION.
1. Strength:
- ranges from -1.00 to 1.00
- closer to 1 or -1 = stronger
- closer to 0 = weaker
2. Direction:
- Positive: as x increases, y increases; OR as x decreases, y decreases.
- Negative: as x increases, y decreases; OR as x decreases, y increases.
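
As an illustration of strength and direction, the sketch below computes the Pearson correlation coefficient by hand for two hypothetical data sets, one rising with x and one falling. The helper name `pearson_r` is made up for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]        # hypothetical data
y_up = [2, 4, 5, 4, 5]     # tends to rise as x rises
y_down = [5, 4, 5, 4, 2]   # tends to fall as x rises

r_positive = pearson_r(x, y_up)
r_negative = pearson_r(x, y_down)
print(round(r_positive, 3), round(r_negative, 3))
```

The two coefficients have the same magnitude (same strength) but opposite signs (opposite direction).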

Sample

A subset or group drawn from the population.

Normal Distribution

A symmetric, bell-shaped distribution for a continuous variable; 68% of observations fall within 1 standard deviation of the mean, 95% fall within 2 standard deviations of the mean, and 99.7% fall within 3 standard deviations of the mean.
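
The 68 / 95 / 99.7 figures can be checked numerically. This sketch uses the standard normal distribution from Python's standard library and computes the probability of falling within k standard deviations of the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean.
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}
for k, probability in within.items():
    print(k, round(probability, 4))
```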

Frequency Distribution

A table or graph that illustrates how frequently each value appears in the data set.

One Sample T-Test

A test that compares the mean score of a sample to a known value, typically the population mean.

Paired T-Test

A test that examines the difference between paired values in two samples. Common Types:
1. Same individual in both groups.
- Matched sample (e.g. diseased sample, healthy sample)
- Time-based (e.g. measure at time 1, measure at time 2); pre/post measurement is most common.
2. Matched individuals (i.e. for each individual in group A, a near-identical individual matched on certain characteristics is placed in group B)

Analysis of variance (ANOVA)

A test to determine the probability that the difference in three or more group means happened by chance.

Student T-Test

A test to determine the probability that the difference in two group means happened by chance. Also called the independent samples t-test or two sample t-test.

Categorical Variable

A type of variable that can take on one of a limited, fixed number of possible values, representing different categories or groups. Can be nominal or ordinal (usually)

Measure of dispersion/variation

A value that describes how the data are dispersed around the measure of central tendency, or the extent to which individual values differ from the mean, median, or mode.

Predictor

A variable that is used to predict or explain changes in the outcome. Also known as an independent variable or explanatory variable. Can be continuous or categorical.

Proxy Variables

A variable that serves in place of an unobservable or immeasurable variable.

Frequency Measures

Counts, Ratios, Proportions, Prevalence, and Incidence.

Nonparametric

Data for which the probability distribution is unknown or known not to be normal.

Parametric

Data with an underlying normal distribution.

Reliability

Degree to which results are stable and consistent.

Type I Error

Detecting a difference when there isn't one. "False alarm"

Interval

Each value on the scale has a unique meaning, the values can be rank ordered, and they are equally spaced.

Ratio

As a level of measurement: each value on the scale has a unique meaning, the values can be rank ordered, they are equally spaced, and the scale has a minimum value of zero (a true zero point). As a frequency measure: the value obtained when one quantity (x) is divided by another quantity (y).

Standardized Regression Coefficient (beta)

Each variable is transformed to have a mean of 0 and standard deviation of 1 so that the regression coefficients represent how predictive the variable is.
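
A small sketch of the idea for the simplest case of a single predictor, using hypothetical data. It shows two equivalent routes: rescaling the unstandardized slope by the ratio of standard deviations, and refitting after converting both variables to z-scores (mean 0, standard deviation 1). With several predictors the same transformation applies, but the fit itself needs a multiple regression routine that is not shown here.

```python
import statistics

x = [1, 2, 3, 4, 5]  # hypothetical predictor
y = [2, 4, 5, 4, 5]  # hypothetical outcome

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)

# Unstandardized slope from least squares.
slope = (
    sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    / sum((xi - mean_x) ** 2 for xi in x)
)

# Route 1: rescale the slope by the ratio of standard deviations.
beta_rescaled = slope * sd_x / sd_y

# Route 2: z-score both variables, then fit the slope again.
zx = [(xi - mean_x) / sd_x for xi in x]
zy = [(yi - mean_y) / sd_y for yi in y]
beta_from_z = sum(u * v for u, v in zip(zx, zy)) / sum(u * u for u in zx)

print(slope, round(beta_rescaled, 4), round(beta_from_z, 4))
```

With one predictor, the standardized coefficient equals the Pearson correlation between x and y.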

Selection Bias

Error associated with how participants are selected for studies.

Publication Bias

Error associated with selectively submitting (researchers) and/or publishing (journals) research.

Validity

Extent to which a measure actually represents what it claims to measure.

Type II Error

Failing to detect a difference when there is one. "Missing it"

P<alpha

Findings suggest that there is a relationship.

P>alpha

Findings suggest that there does not appear to be a relationship.

Prevalence

Measure of all individuals affected by the condition at a particular time. Proportion of cases in the population at a given time. How widespread a condition is.

Incidence

Measure of new individuals who acquire the condition over a period of time. Rate of occurrences of new cases over a period of time. Risk of contracting a condition.

Power Analysis

Method for determining how large a sample size must be to detect a difference if in fact a difference exists.
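
As one hedged example, the sketch below uses the standard normal-approximation formula for comparing two group means, n per group = 2 * ((z_alpha + z_beta) * sigma / delta)^2. The standard deviation `sigma` and the target difference `delta` are hypothetical planning values; dedicated software typically refines this with the t-distribution.

```python
import math
from statistics import NormalDist

alpha = 0.05   # two-sided significance level
power = 0.80   # desired probability of detecting a true difference
sigma = 10.0   # assumed common standard deviation (hypothetical)
delta = 5.0    # smallest difference in means worth detecting (hypothetical)

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
z_beta = NormalDist().inv_cdf(power)           # about 0.84

# Required sample size per group, rounded up to a whole participant.
n_per_group = math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)
print(n_per_group)
```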

Nominal

Named categories.

Standard Deviation

On average, how much individual values differ from the mean; the square root of the variance.

Ordinal

Ordered categories.

Outcome

The result or endpoint that is measured in a study or experiment.

Beta

Probability of incorrectly failing to reject the null hypothesis (a Type II error). In regression, beta instead denotes the standardized coefficient, which is standardized to allow for comparison across variables.

Alpha

Probability of incorrectly rejecting the null hypothesis.

P-value

Probability of obtaining a statistic at least as extreme as the one observed by chance alone, if the null hypothesis is true.
1. Failure to find a statistically significant result (p-value > alpha) means that our observed data can be explained by chance.
2. P-values do not indicate the importance or magnitude of the finding. A smaller p-value simply means the finding is less likely to have happened by chance.

Specificity

Proportion of negatives that are correctly identified.

Sensitivity

Proportion of positives that are correctly identified.
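
Sensitivity and specificity can both be read off a table of test results against true status. The counts below are hypothetical.

```python
# Hypothetical screening results compared with true disease status.
true_positives = 90    # diseased, test positive
false_negatives = 10   # diseased, test negative
true_negatives = 160   # not diseased, test negative
false_positives = 40   # not diseased, test positive

# Sensitivity: proportion of actual positives correctly identified.
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity: proportion of actual negatives correctly identified.
specificity = true_negatives / (true_negatives + false_positives)

print(sensitivity, specificity)
```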

Interpreting RR and OR with Confidence Intervals

Relative Risk (RR) and Odds Ratio (OR) are typically reported with 95% Confidence Intervals (CI).
- The CI determines statistical significance. If the confidence interval includes the null value (i.e. 'no difference'), then the results are not statistically significant.
- For ratios (RR or OR), the null value is 1. If the 95% CI for the OR or RR does not contain 1.0, we can conclude that there is a statistically significant association between the exposure and the disease.
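
A sketch of the check described above, assuming a hypothetical 2x2 table and the common log-based (Woolf) approximation for the confidence interval of an odds ratio. Other interval methods exist and may give slightly different bounds.

```python
import math

# Hypothetical 2x2 table: exposed (a diseased, b not), non-exposed (c diseased, d not).
a, b, c, d = 30, 70, 10, 90

odds_ratio = (a * d) / (b * c)

# Standard error of ln(OR) under the log-based approximation.
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

z = 1.96  # approximate critical value for a 95% interval
lower = math.exp(math.log(odds_ratio) - z * se_log_or)
upper = math.exp(math.log(odds_ratio) + z * se_log_or)

# The result is statistically significant if the interval excludes 1.
excludes_one = lower > 1.0 or upper < 1.0
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2), excludes_one)
```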

Statistical Control

Separating out the effect of one independent variable from the remaining independent variables to reduce the effect of confounding.

Descriptive Statistics

Statistics that describe the sample without attempting to generalize the results to other groups or the population.

Inferential Statistics

Statistics that infer the likelihood that the results can be generalized to the population. Allows us to make inferences based on the sample that we have studied.

Alternative Hypothesis

The proposed hypothesis or idea, typically that there is a difference, relationship, or effect.

Bias

Systematic error introduced by selecting or encouraging one outcome over others.

Counts

The absolute number of an 'event' or condition occurring in a specified area, in a specified time period, in a specified population.

R-squared

The proportion of variance in the dependent variable accounted for by a regression model. Indicates how close the data points are to the regression line.

Mean

The average value.

Test-retest Reliability

The degree to which test or instrument scores are consistent from one point in time to the next (the test taker and test conditions must be the same at both points in time).

Range

The difference between the largest and smallest value in the data set.

Population

The entire collection of people, animals, cells, or other things from which we collect data.

Recall Bias

The error associated with remembering.

Interrater Reliability

The extent of agreement between two or more raters.

Relative Risk

The likelihood of developing the disease/outcome given that one is exposed divided by the likelihood of developing the disease/outcome given that one is non-exposed.

Probability

The likelihood that an event will occur.

Measure of Disease Association

The magnitude of the effect of an exposure on an outcome.

Median

The middle value.

Mode

The most frequent value.

Kruskal-Wallis One-Way ANOVA

The non-parametric equivalent of the ANOVA.

Wilcoxon Matched-Pair Test

The non-parametric equivalent of the paired t-test.

Mann-Whitney U

The non-parametric equivalent of the student t-test.

Spearman Correlation Coefficient

The nonparametric equivalent (rs) of the Pearson correlation coefficient.

Frequency

The number of times a value appears in the data set.

Odds

The number of ways an event can occur relative to the number of ways an event cannot occur.

Independent

The occurrence of one variable does not influence the probability of another variable.

Odds Ratio

The odds of disease in exposed versus odds of disease in non-exposed.

Null Hypothesis

The opposite of the hypothesis proposed, typically that there is no difference, relationship, or effect.

Dependent Variable

The output, outcome or effect of interest.

Binomial Distribution

The probability distribution for a binomial variable (i.e. a variable that has only two possible values) with fixed probabilities that add up to one.
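
For a concrete picture, the sketch below evaluates the binomial probability mass function, P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), for the number of "successes" in n independent two-outcome trials. The values of `n` and `p` are hypothetical, and `binomial_pmf` is a helper name made up for this example.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # hypothetical: 10 trials, success probability 0.5

pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The probabilities over all possible counts add up to one.
total_probability = sum(pmf)
print(round(pmf[5], 4), round(total_probability, 4))
```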

Multiple Regression

Using two or more independent variables to predict a dependent variable.

Continuous

Variables that can take on any value in a given range; typically interval or ratio variables and sometimes ordinal variables.

Discrete

Variables that have a finite number of possible values; typically nominal or ordinal variables.

Internal Validity

Was the study designed well and did it limit or control for possible confounders?

Interviewer Bias

When the interviewer influences the participant response during an interview (e.g. giving certain reactions/social cues, presenting questions a certain way).

