Epidemiology Exam 1 Chapter 2

Types of distribution curves

- Normal Distribution/Gaussian - standard normal - Same mean and different dispersions - Skewed distributions - measures of central tendency - median (better measure of central tendency for skewed distributions)

Graphical presentations

- Used to summarize key aspects of the data set - Types of graphs: - Bar chart - Line graph - Pie chart

parameter

- a variable for describing a population - measurable attribute of a population - designated by a Greek letter, e.g. μ (mu) for the population mean

examples of populations

- all of the inhabitants of a country (e.g. China) - all of the people who live in a city (e.g. NY) - all students currently enrolled in a particular university - all of the people diagnosed with a disease such as type 2 diabetes or lung cancer

Nominal scales

- are qualitative and consist of categories that are not ordered (whereas ordered data has categories like best to worst) - include dichotomous scales ex: race, religion, gender, car ownership, cell phone provider

reasons for multimodal distributions of health outcomes:

- changes in lifestyle and immune status of the host - latency effects: latency refers to the time period bw initial exposure and a measurable response, i.e. the occurrence of conditions such as chronic diseases that have long latency periods and occur later in life

Rationale for using samples

- improved parameter estimates - possible cost savings

Curvilinear (inverted U-shape) scatterplot

- it's possible for scatterplots to conform to nonlinear shapes, like a curved line - in these, the linear correlation bw X and Y is essentially 0 (r = -0.09), indicating that there is no linear association - However, nonlinear curves do not imply that there is no relationship bw two variables, ONLY that their relationship is nonlinear
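
A minimal sketch of this point, assuming made-up data that follow an exact inverted U (y = 9 - x^2). The helper `pearson_r` is a hypothetical name written for this illustration; it shows that r can be 0 even though Y is completely determined by X.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(spread_x * spread_y)

# Hypothetical inverted-U data: Y rises, peaks at X = 0, then falls.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [9 - x ** 2 for x in xs]

r_curved = pearson_r(xs, ys)
print(f"r for the inverted-U data: {r_curved:.2f}")
```

The rising left half and the falling right half cancel each other in the linear measure, which is why a near-zero r should prompt a look at the scatterplot before concluding that the variables are unrelated.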

Types of scatter plots

- perfect direct linear association - perfect inverse linear association - no association - positive relationship (r = 0.7) - curvilinear (inverted U-shape)

simple random sampling

- samples are selected by a random process - unbiased: the average of the sample estimates over all possible samples (of size n from N) is equal to the population parameter - example with a mean: x̄ = sample mean, μ = population mean; the mean of all sample means of size n from N will be equal to the population mean
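
The unbiasedness property can be checked directly on a tiny population. The sketch below assumes a hypothetical population of N = 5 ages and enumerates every possible sample of size n = 2 (drawn without replacement); the values are illustrative only.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of N = 5 ages (illustrative values only).
population = [20, 30, 40, 50, 60]
n = 2  # sample size

# Every possible sample of size n from N, drawn without replacement.
all_samples = list(combinations(population, n))
sample_means = [mean(sample) for sample in all_samples]

mu = mean(population)                      # population mean
mean_of_sample_means = mean(sample_means)  # average over all possible samples

print(len(all_samples), mu, mean_of_sample_means)
```

Any single sample mean may be above or below μ; it is the average across all possible samples that matches the population mean.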

Random Sampling

- simple random sampling - stratified random sampling

Symmetrical (non-skewed) distributions

- the mean and median are identical and can be used interchangeably - general rule: the arithmetic mean is preferred over median as a measure of central tendency

stratified random sampling

- uses over-sampling of strata in order to ensure that a sufficient number of individuals from a particular stratum are included in the final sample - can improve parameter estimates for large, complex population, especially when there is substantial variability among subgroups
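
A minimal sketch of over-sampling a small stratum, assuming a hypothetical population of 1000 people split into two strata. The function name `stratified_sample` and the stratum labels are made up for illustration.

```python
import random

def stratified_sample(strata, sizes, seed=0):
    """Draw a simple random sample of the requested size within each stratum."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}

# Hypothetical population: the minority stratum is 5% of 1000 people.
strata = {
    "majority": [f"maj_{i}" for i in range(950)],
    "minority": [f"min_{i}" for i in range(50)],
}

# Over-sample the minority stratum: 25 of 100 sampled people (25%).
sizes = {"majority": 75, "minority": 25}
sample = stratified_sample(strata, sizes)

print({name: len(members) for name, members in sample.items()})
```

A simple random sample of 100 from this population would be expected to include only about 5 minority-stratum members, which is the shortfall that over-sampling is meant to avoid.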

Ordinal scales

-are categorical data that can be ordered and ranked, but are still qualitative; you are able to assign #'s/ranks even though your data is qualitative - the intervals between each point on the scale are not equal intervals - use bar graphs ex: socioeconomic status, occupational prestige, level of educational attainment, self-perception of health (strongly agree, agree, etc.)

Ratio scale

-most used scale in Epi - the ratio scale has a true zero point, so one can create ratio comparisons - ex: the Kelvin scale: 0 K represents the absence of all heat, therefore we can say 200 K is twice as hot as 100 K

Measures of variation/dispersion/spread

-range -mean deviation -variance and standard deviation

in a generic contingency table

A = exposure is present and disease is present B = exposure is present and disease is absent C = exposure is absent and disease is present D = exposure is absent and disease is absent where A+B+C+D = all study subjects

contingency table

Another method for demonstrating associations A type of table that tabulates data according to two dimensions - there is an exposure variable (like viewing or not viewing alcoholic beverage commercials) - and an outcome variable (like whether study subjects engage in binge drinking) - column and row totals are known as marginal totals
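
The A/B/C/D layout and its marginal totals can be sketched as follows. The counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical cell counts (illustrative values only).
A = 30  # exposure present, disease present
B = 70  # exposure present, disease absent
C = 10  # exposure absent,  disease present
D = 90  # exposure absent,  disease absent

# Marginal totals: row totals by exposure, column totals by outcome.
row_totals = {"exposed": A + B, "unexposed": C + D}
col_totals = {"disease": A + C, "no disease": B + D}
total = A + B + C + D  # all study subjects

print(f"{'':<10}{'Disease':>10}{'No disease':>12}{'Total':>8}")
print(f"{'Exposed':<10}{A:>10}{B:>12}{row_totals['exposed']:>8}")
print(f"{'Unexposed':<10}{C:>10}{D:>12}{row_totals['unexposed']:>8}")
print(f"{'Total':<10}{col_totals['disease']:>10}{col_totals['no disease']:>12}{total:>8}")
```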

Drawback of simple random sampling

Most large populations in the US and other countries are comprised of numerous subgroups, which an epidemiologist may want to investigate the characteristics of. Unfortunately, when a simple random sample of a large population is selected, members of subgroups of interest may not appear in sufficient numbers in the chosen sample to permit statistical analyses of them.... Stratified random sampling offers a work-around for this problem

parameter estimation

Recall that epidemiologists use statistics to estimate parameters... 2 types of estimates are - point estimate - interval estimate

interval estimate

Uses a range of values for estimation of a parameter; in other words, it is a range of values that, with a certain level of confidence, contains the parameter ex: a common choice is the 95% confidence interval, meaning that one is 95% confident that the interval contains the parameter of interest
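
One way to compute such an interval for a mean is sketched below. It assumes the large-sample normal approximation (x̄ ± z · s/√n), which is not stated on the card; for small samples a t-based interval is the more usual choice. The function name and the ages are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(data)
    x_bar = mean(data)                      # point estimate of mu
    standard_error = stdev(data) / sqrt(n)  # s / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return x_bar - z * standard_error, x_bar + z * standard_error

# Hypothetical sample of ages (illustrative values only).
ages = [34, 41, 29, 50, 38, 45, 33, 47, 36, 40]
lower, upper = mean_confidence_interval(ages)

print(f"point estimate: {mean(ages):.1f}")
print(f"95% interval estimate: ({lower:.1f}, {upper:.1f})")
```

The point estimate sits at the center of the interval, and raising the confidence level widens it.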

point estimate

Uses a single value for estimation of a parameter ex: is the use of x-bar (the sample mean) to estimate mu (the corresponding population mean)

Estimation

Using sample-based data to infer conclusions about the population - thus x̄ can be used as an estimate for μ (the pop. mean)

pie chart

a circle or pie - shows the proportion of cases according to several categories - the size of each piece of the pie is proportional to the frequency of cases - the pie chart demonstrates relative importance of each subcategory

population

a collection of people who share common observable characteristics

Epidemic Curve

a graphic plotting of the distribution of cases by time of onset; it's a unimodal curve - there is a baseline mean of cases over 5 years for ex (blue line), which tells you when most cases occurred - aids in identifying the cause of a disease outbreak - helps you understand the outbreak and the distribution of cases - so you can prevent it from happening again ex: Foster Farms Salmonella outbreak

Sample

a sub-group that has been selected, by using one of several methods, from the population

bar chart

a type of graph that shows the frequency of cases for categories of a discrete variable - height of each bar represents frequency of cases for each category - ex: qualitative, discrete variable such as a Yes/No variable - along base/x-axis of bar chart are categories of the variable: Sex, Injection drug use?, Shared needle?

Normal distribution: Standard normal distribution

a type of normal distribution with: a mean of 0 and a standard deviation of 1 unit - is created by standardizing a normal distribution (subtracting the mean and dividing by the standard deviation) - keeps the bell shape, so the 68-95-99.7 rule applies

Mean (x̄)

also called arithmetic mean or average - in distribution curves, the mean is the location on the X-axis

subthreshold phase of dose-response curve

at the very beginning, before the threshold on the curve - suggests that at low levels of dosage, no or minimal effect occurs

example of parameter

average age of the population

Interval scale

consist of continuous data with equal intervals between points on the measurement scale without a true zero point - therefore we cannot calculate ratios ex: IQ, Fahrenheit temperature - the Fahrenheit scale does not have a true zero point, which is why 100 degrees F is not twice as hot as 50 degrees F

qualitative data

do not have numerical values or rankings - measured on a categorical scale ex: marital status, sex, occupation (have no natural ordering)

Distributions with Multimodal Curves

has several peaks in the frequency of a condition

frequency tables

helpful in identifying outliers, extreme values - one of the most convenient ways to summarize or display data in a grouped format - after tabulating data in freq. table, an epidemiologist might plot the data graphically as a bar chart, histogram, line graph or pie chart

positive relationship (ex r = 0.7) between variables scatterplot

if the relationship is fairly strong with r = 0.7, then the points cluster fairly closely around a straight line - if an oval were to be drawn around the points, the oval would be cigar-shaped

68-95-99.7 rule (empirical rule)

in a normal distribution model, about 68% of values fall within 1 standard deviation of the mean, about 95% fall within 2 standard deviations of the mean, and about 99.7% fall within 3 standard deviations of the mean
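
The three percentages can be reproduced from the standard normal distribution. This sketch uses Python's `statistics.NormalDist`; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

Because the rule is stated in standard deviation units, the same proportions hold for any normal distribution, whatever its mean and standard deviation.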

sampling bias in non-random sampling:

individuals who have been selected are not representative of the population to which epidemiologists would like to generalize the results of the research. (often times, only people who are interested in the survey topic respond to the survey)

when r is negative, the association is

inverse, meaning that the value of one variable will increase if the value of the other variable decreases

Distribution Curves

is a graph that is constructed from the frequencies of the values of a variable - can take various forms, like symmetric and non-symmetric (skewed) - are described in terms of central tendency (mean, median, mode) and dispersion/spread (SD, range, percentile, quartiles)

Standard Deviation of a sample (s)

is a measure of the dispersion (spread) - measure of how much your data varies from the mean - to determine the standard deviation of a sample, you take the square root of the variance
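
A worked sketch of the range, variance, and standard deviation of a sample. The card does not give the formula, so this assumes the usual sample variance with an n - 1 denominator; the data values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical sample (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = mean(data)
squared_deviations = [(x - x_bar) ** 2 for x in data]

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
s_squared = sum(squared_deviations) / (len(data) - 1)

# Sample standard deviation: square root of the variance.
s = sqrt(s_squared)

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

print(f"range = {data_range}, s^2 = {s_squared:.2f}, s = {s:.2f}")
```

The hand computation agrees with the standard library's `statistics.variance` and `statistics.stdev`, which use the same n - 1 denominator.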

Normal (Gaussian) Distribution

is a symmetrical distribution where the mean, median and mode are identical and fall exactly in the middle of the distribution

dose-response curve

is the plot of dose-response relationship, which is a type of correlative association bw an exposure (like a dose of a toxic chemical) and an effect (like a biologic outcome) ex: dose response relationship bw # of cigs smoked daily and mortality from lung cancer

continuous variable

made up of continuous data - have infinite number of possible values along a continuum - ex: heart rate, blood cholesterol, blood sugar levels, age, height, weight

discrete variable

made up of discrete data - have finite or countable number of values ex: household size (# of ppl who reside in house) # of doctor visits

Pearson correlation coefficient

measure of strength and direction of linear relationship between 2 continuous variables - varies from -1 to 0 to +1
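
A minimal sketch of how r can be computed, assuming the standard deviation-based formula (sum of cross-products of deviations divided by the square root of the product of the summed squared deviations). The function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(spread_x * spread_y)

xs = [1, 2, 3, 4, 5]
direct = pearson_r(xs, [2 * x + 1 for x in xs])    # Y rises with X on a line
inverse = pearson_r(xs, [10 - 3 * x for x in xs])  # Y falls as X rises on a line
noisy = pearson_r(xs, [2, 1, 4, 3, 5])             # positive but imperfect

print(f"direct: {direct:.2f}, inverse: {inverse:.2f}, noisy: {noisy:.2f}")
```

The two exact straight-line cases land at the ends of the scale, while the scattered case falls between 0 and +1.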

median

middle point of a set of numbers - to find: re-write the numbers in the data set from lowest to highest; the middle number is the median - if the data set has an even number of values, average the 2 middle numbers and that is your median
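
The procedure above can be sketched as a short function. The name `median_of` is made up for illustration; Python's built-in `statistics.median` does the same job.

```python
def median_of(values):
    """Median by sorting, then taking the middle value (or averaging the middle two)."""
    ordered = sorted(values)  # lowest to highest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd count: the single middle number
    # Even count: average the two middle numbers.
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_example = median_of([7, 1, 5])      # ordered: 1, 5, 7
even_example = median_of([8, 2, 6, 4])  # ordered: 2, 4, 6, 8

print(odd_example, even_example)
```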

Measures of central tendency (or location)

mode, median, mean

Stanley Stevens' measurement scales

nominal, ordinal, interval, ratio - in 1946, Stevens wrote that before conducting a data analysis, one should choose an analysis that is appropriate to the scale of measurement being used

N is designated as the

number in the population

n is designated as the

number in the sample

mode

number occurring most frequently in a set or distribution of numbers aka the category in a frequency distribution that has the highest frequency of cases

Statistics

numbers that describe a sample

Scatter Plot Diagram

plots two variables, one on the X axis (horizontal) and one on the Y axis (vertical) - the measurements for each case or individual subject are plotted as a single data point (dot)

when r is positive, the association is

positive, when one variable increases so does the other

non-random sampling

prone to sampling bias bc of self-selection in internet surveys, media based polling, etc. - convenience sampling - systematic sampling - cluster sampling

examples of simple random sampling

random digit dialing for phone surveys, drawing a name from hat, or lists that include a large diverse population, like licensed drivers

Threshold dose-response curve

refers to the lowest dose at which a particular response may occur

Analyses of Bivariate Associations examines the

relationships between two variables ex: - scatter plots - correlation coefficients - contingency tables

quantitative data

reported as numerical quantities - obtained by counting or taking measurements (ex is measuring patient's height)

μ represents the

population mean, the average of a population (ex is average age)

Histogram

similar to bar chart, used for continuous variables - used to display the frequency distributions for grouped categories of a continuous variable - coding procedures are applied to convert continuous variables into categories on the x-axis, ex ages: 15-19, 20-24, 25-29, 30-34
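
A minimal sketch of the coding step, assuming 5-year age groups that start at 15 as in the example categories. The helper `age_group` and the list of ages are hypothetical.

```python
from collections import Counter

def age_group(age, start=15, width=5):
    """Code a continuous age into a labeled category such as '15-19'."""
    lower = start + ((age - start) // width) * width
    return f"{lower}-{lower + width - 1}"

# Hypothetical ages (illustrative values only).
ages = [15, 17, 19, 20, 22, 24, 24, 26, 29, 31, 33, 34]

# Frequency of cases in each category: the bar heights of the histogram.
counts = Counter(age_group(age) for age in ages)

for label in sorted(counts):
    print(f"{label}: {'#' * counts[label]}")
```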

skewed distributions

skewed data are not equally distributed on both sides of the distribution - so it's NOT a symmetrical distribution - they are either right or left skewed: determined by the direction that the tail of the distribution is pointing - in these cases most of the data is not necessarily close to the mean as we saw in normal distributions

a stratum is a

subgroup of the population -ex: a population can be stratified by racial or ethnic group, age category or socioeconomic status

Example of n and N:

suppose an epidemiologist wants to study the health characteristics of racial or ethnic subgroups that are uncommon in the general population - the size of n is limited by our available budget - if n is small (which is often the case) in comparison to N, then only a few individuals from the minority group will enter the sample.

Remember that an association between two variables signifies ONLY that they are related and NOT

that the association is causal

simple random sampling is unbiased meaning

that the average of the sample estimates over all possible samples is equal to the population parameter

as r gets closer to -1 or +1

the association becomes stronger

Measures of central tendency of a skewed distribution

the mean, median and mode all have different values in a skewed distribution - and the median is a more appropriate measure of central tendency than the mean in a skewed distribution
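
A small illustration with a hypothetical right-skewed data set, where a single large value pulls the mean toward the tail while the median stays near the bulk of the data.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values are small, one is large.
data = [1, 2, 2, 2, 3, 4, 5, 6, 20]

data_mean = mean(data)
data_median = median(data)
data_mode = mode(data)

print(f"mode = {data_mode}, median = {data_median}, mean = {data_mean}")
```

Here the three measures take different values (mode < median < mean), and the mean is the one most affected by the extreme value of 20.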

Range

the range is the difference between the highest and lowest value in a group of numbers highest - lowest = range

cluster sampling

the researcher divides the population into separate groups (rather than individuals), called clusters. Then, a simple random sample of clusters is selected from the population. -a common method for sampling - can produce cost-savings (more parsimonious than random sampling) - creates unbiased estimates of parameters.
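
A minimal sketch of one-stage cluster sampling, assuming a hypothetical population organized into 10 clinics of 20 patients each. All names and sizes are invented for illustration.

```python
import random

# Hypothetical clusters: 10 clinics, each with 20 patients.
clusters = {
    f"clinic_{i}": [f"clinic_{i}_patient_{j}" for j in range(20)]
    for i in range(10)
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Simple random sample of clusters (not of individuals).
chosen_clusters = rng.sample(sorted(clusters), 3)

# Everyone in a chosen cluster enters the sample.
sample = [person for name in chosen_clusters for person in clusters[name]]

print(chosen_clusters, len(sample))
```

Only the selected clinics need to be visited, which is where the cost savings come from.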

x̄ represents

the sample estimate of μ (the sample mean)

in a scatter plot, the closer the points lie with respect to the straight line of best fit through them (the regression line)...

the stronger the association between variables X and Y is

the closer r gets to 0,

the weaker the association becomes

if r = 0

there is no linear association

Same mean, different dispersions

these two have the same mean (ie location on X-axis) and different dispersions

significance of dose-response relationship

this relationship is one of the indicators used to assess whether a suspected exposure has a causal effect on an adverse health outcome - ex: the dose-response relationship bw # of cigs smoked daily and rates of lung cancer mortality - this dose-response relationship was one of the considerations that led to the conclusion that smoking is a cause of lung cancer mortality

Universe

total set of elements from which a sample is selected

Line graph

used to display trends - points of graph have been joined by a line - a single point represents the frequency of cases for each category of a variable - when using more than one line, the epidemiologist is able to demonstrate comparisons among subgroups ex: time trends

systematic sampling

uses a systematic procedure to select a sample of a fixed size from a sampling frame (a complete list of people who constitute the population) - feasible when a sampling frame such as a list of names is available - but may not be representative of the sampling frame ex: an epidemiologist wants to select a sample of 100 individuals from an alphabetical list that contains 2000 names
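
The card does not say how the 100 names are chosen. One common procedure, assumed here, is to compute the sampling interval (2000 / 100 = 20), pick a random starting point within the first interval, and then take every 20th name. The function name and the placeholder names are hypothetical.

```python
import random

def systematic_sample(frame, sample_size, seed=0):
    """Take every k-th entry from the sampling frame after a random start."""
    interval = len(frame) // sample_size             # k = 2000 // 100 = 20
    start = random.Random(seed).randrange(interval)  # random start in 0..k-1
    return frame[start::interval][:sample_size]

# Hypothetical alphabetical list of 2000 names (the sampling frame).
frame = [f"name_{i:04d}" for i in range(2000)]
sample = systematic_sample(frame, 100)

print(len(sample), sample[:3])
```

If the list has a pattern that repeats at the same spacing as the interval, the resulting sample may not be representative of the frame.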

convenience sampling

uses available groups selected by an arbitrary and easily performed method. - highly likely to be biased - not appropriate for application of inferential statistics - but can be helpful in descriptive studies and for suggesting add'l research ex: a group of patients who receive medical service from a physician who is treating them for a chronic disease

Variance of a sample (s^2)

variance is the degree of variability in a set of numbers. - it answers the question below in a mathematical way: "How different are the data points from one another?"

right skewed distribution

when most of the data is on the left side - tail of distribution trails off to the right - positively skewed

left-skewed distribution

when most of the data is on the right side - tail of distribution trails off to the left - negatively skewed

linear state of dose-response curve

where an increase in response is proportional to an increase in dose

