PubHlth 223: Biostats Exam 1

modality

number of areas of pronounced density (peaks) in the observations -unimodal: 1 peak -bimodal: 2 peaks -multimodal: more than 2 peaks -uniform: no pronounced peaks

variance: why do we use squared deviations in the calculation of variance?

-so that deviations equally far above and below the mean contribute equally -so that larger deviations from the mean weigh more heavily
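
A minimal sketch of both points, using made-up data values: the raw deviations from the mean cancel to zero, while the squared deviations are all non-negative and give the largest deviation the most weight.

```python
# Hypothetical data, chosen only for illustration.
data = [1, 3, 5, 11]
mean = sum(data) / len(data)

# Raw deviations: values below and above the mean cancel out.
deviations = [x - mean for x in data]
sum_of_deviations = sum(deviations)

# Squared deviations: signs disappear, and larger deviations dominate.
squared_deviations = [d ** 2 for d in deviations]

print(deviations)          # [-4.0, -2.0, 0.0, 6.0]
print(sum_of_deviations)   # 0.0
print(squared_deviations)  # [16.0, 4.0, 0.0, 36.0]
```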

descriptive statistics for numerical data

-central tendency measures: mean and median -spread of data measures: variance, SD, interquartile range

cluster sampling

-challenge: it may not be practical or economical to list all individuals that may be sampled -clusters represent groupings that can be enumerated -a cluster may not be made up of homogeneous observations -sample clusters and study all cases within each sampled cluster -analysis methods are more complex because they must account for multiple layers of variability

independent outcomes

-knowing the outcome of one random process provides no useful information about the outcome of the second -ex: knowing that a coin landed on heads (outcome of the first random process) provides no information about how the second toss will land, so the outcomes of two coin tosses are independent

stratified sample

-list all possible cases in the population -classify each into a stratum -the cases within a stratum are expected to be similar with respect to some underlying characteristic that may relate to the response variable of interest -take a simple random sample from each stratum

simple random sampling

-list all possible cases in the population -randomly select cases to be studied -there is no implied connection between the cases that are selected beyond the fact that they come from the specified population
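
The simple random and stratified designs can be sketched as follows. The population, the stratum labels ("urban", "rural"), the sample sizes, and the function names are all made up for illustration; this is one possible way to express the steps, not a prescribed procedure.

```python
import random
from collections import defaultdict

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 cases, each tagged with a stratum label.
population = [(case_id, "urban" if case_id < 60 else "rural")
              for case_id in range(100)]

def simple_random_sample(cases, k):
    """Pick k cases, each equally likely, with no regard to stratum."""
    return rng.sample(cases, k)

def stratified_sample(cases, k_per_stratum):
    """Group cases by stratum, then take a simple random sample from each."""
    strata = defaultdict(list)
    for case in cases:
        strata[case[1]].append(case)
    return {label: rng.sample(members, k_per_stratum)
            for label, members in strata.items()}

srs = simple_random_sample(population, 10)
stratified = stratified_sample(population, 5)
```

The simple random sample may over- or under-represent a stratum by chance; the stratified sample guarantees 5 cases from each.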

rules for probability distribution

-the events listed must be disjoint (mutually exclusive) -all possible outcomes must be delineated (listed) -each probability must be between 0 and 1 -the probabilities must total 1
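
The two numeric rules can be checked mechanically. The sketch below uses a hypothetical helper, `is_valid_distribution`, and a fair die as the example; whether the outcomes are disjoint and exhaustive still has to be judged from how they were defined.

```python
import math

def is_valid_distribution(probabilities):
    """Check the two numeric rules: each value in [0, 1] and a total of 1.

    `probabilities` maps each outcome to its probability. Whether the
    outcomes are disjoint and cover every possibility cannot be checked
    numerically; that depends on how the outcomes were defined.
    """
    values = list(probabilities.values())
    each_in_range = all(0 <= p <= 1 for p in values)
    totals_one = math.isclose(sum(values), 1.0)  # tolerate rounding error
    return each_in_range and totals_one

fair_die = {face: 1 / 6 for face in range(1, 7)}
print(is_valid_distribution(fair_die))              # True
print(is_valid_distribution({"a": 0.5, "b": 0.6}))  # False: totals 1.1
```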

histogram

-visual of data density for continuous numerical data -higher bars represent where the data are relatively more common -displays shape of the data distribution -need to carefully consider "bin width"

what are 3 key decisions in hypothesis testing

1. the null hypothesis 2. the alternative hypothesis 3. the type 1 error rate you are willing to tolerate (how low the p-value has to be to conclude that the null should be rejected; usually 5%, i.e. p < 0.05)

what are the two ways to use normal distribution?

1. uses related to specific values that may be observed: standardize observed data using the normal distribution 2. uses related to summary statistics: determine the probability of an observed result using the normal distribution

sampling

a feature of both observational and experimental studies. accurate interpretation of statistical analyses relies on understanding who got into the study and how. types: -simple random sample -stratified sample -cluster sample -multistage sample

random process

a situation in which we know what outcomes could happen, but we don't know which particular outcome will happen (coin toss, die roll)

explanatory variable

a variable that might cause changes in the response variable

standard error

measure of the variability of an estimated statistic (the SD of the sampling distribution of a statistic) -a function of both the variability in the data and the sample size -larger sample size = smaller standard error -reflects increasing precision as more information is observed
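
For the specific case of a sample mean, the standard error is commonly computed as the sample SD divided by the square root of the sample size. The sketch below assumes that case; the function name and the numbers are illustrative only.

```python
import math

def standard_error_of_mean(sample_sd, n):
    """Standard error of a sample mean: SD of the data over sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sample_sd / math.sqrt(n)

# Same variability in the data, four times the sample size:
se_small = standard_error_of_mean(10, 25)    # 2.0
se_large = standard_error_of_mean(10, 100)   # 1.0
```

Quadrupling the sample size halves the standard error, which is the "larger sample size = smaller standard error" point in numbers.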

disjoint

mutually exclusive

complementary events

mutually exclusive events whose probabilities add up to 1 -ex: the complement of event D (rolling a 2 or a 3) is D^c (rolling a 1, 4, 5, or 6)

robust statistics

not significantly impacted by extreme values (outliers) or skewness -median and IQR are more robust to skewness and outliers than mean and SD

descriptive statistics for categorical data

number and percent

what type of variable would GPA be? (categorical or numerical; continuous or discrete)

numerical, continuous

how do you assess a histogram's normality?

overlay it with a normal distribution curve

joint probability

probability of a particular outcome of one random process AND a particular outcome of a second random process (inside contingency table)

marginal probability

probability of an outcome of one random process regardless of the outcome of another process (the totals on a contingency table)
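
A small sketch of how joint and marginal probabilities come out of a contingency table. The counts, group labels, and function names are invented for illustration.

```python
from fractions import Fraction

# Hypothetical counts for a 2x2 contingency table (group x response).
counts = {
    ("treated", "response"): 30,
    ("treated", "no response"): 20,
    ("control", "response"): 15,
    ("control", "no response"): 35,
}
total = sum(counts.values())

def joint(group, outcome):
    """P(group AND outcome): one inside cell divided by the grand total."""
    return Fraction(counts[(group, outcome)], total)

def marginal_outcome(outcome):
    """P(outcome) regardless of group: a column total over the grand total."""
    return sum(joint(g, o) for (g, o) in counts if o == outcome)

def marginal_group(group):
    """P(group) regardless of outcome: a row total over the grand total."""
    return sum(joint(g, o) for (g, o) in counts if g == group)

print(joint("treated", "response"))   # 3/10
print(marginal_outcome("response"))   # 9/20
print(marginal_group("treated"))      # 1/2
```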

probability

proportion of times a particular outcome (event) would be observed if the random process were repeated many times

1-sided hypothesis testing

question is whether a summary statistic is far enough away from a hypothetical value in one specified direction (greater? less?) - ex: is the response rate higher in the treated group?

2-sided hypothesis testing

question is whether a summary statistic is different from a hypothetical value -ex: is the response rate different between groups? (allowing for it to be higher or lower)

primary tools for controlling confounding

-randomization: ideally distributes a similar population of study participants to each treatment -blocking (stratified randomization): forces study participants with a particular characteristic to be evenly distributed across treatment groups

P-value

the probability, if the null hypothesis is true, that a study would produce a result at least as favorable to the alternative hypothesis as the observed data -use the p-value to decide whether to reject or fail to reject the null hypothesis
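
As a rough sketch of how 1-sided and 2-sided p-values differ, the code below assumes the test statistic is a z value that follows a standard normal distribution when the null is true. That assumption, the observed value of 2.1, and the function names are illustrative and not taken from the course material.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def one_sided_p_value(z):
    """P(Z >= z) under the null: evidence in the 'greater' direction only."""
    return 1 - standard_normal.cdf(z)

def two_sided_p_value(z):
    """P(|Z| >= |z|) under the null: evidence in either direction."""
    return 2 * (1 - standard_normal.cdf(abs(z)))

z_observed = 2.1   # hypothetical test statistic
alpha = 0.05       # prespecified significance level

p_one = one_sided_p_value(z_observed)
p_two = two_sided_p_value(z_observed)
reject_null = p_two < alpha
```

For a positive z, the 2-sided p-value is twice the 1-sided one, so the same data can be significant in a 1-sided test and not in a 2-sided test.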

observational research

-retrospective: data are collected after events have taken place -prospective: individuals are identified and information is collected as events unfold

which is larger (mean or median) under left skew and under right skew?

-right skew: mean > median -left skew: mean < median
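
A quick check with made-up numbers: one extreme value in the tail pulls the mean toward it, while the median barely moves.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # hypothetical: long right tail
left_skewed = [-x for x in right_skewed]   # mirror image: long left tail

right_mean = statistics.mean(right_skewed)      # 4.75
right_median = statistics.median(right_skewed)  # 3.0
left_mean = statistics.mean(left_skewed)        # -4.75
left_median = statistics.median(left_skewed)    # -3.0
```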

variance (S^2)

roughly the average squared deviation from the mean

multistage sampling

similar to cluster sampling, but there is a second stage of sampling in which cases within each sampled cluster are themselves sampled -analysis methods are more complex to account for multiple layers of variability

sample

subset of the population that is used to estimate characteristics of the entire population -much more commonly studied than the full population -used to calculate statistics that strive to accurately estimate parameters

intensity map

graphical summary of two variables when one of the variables is geographic location

central limit theorem

the distribution of many statistics derived from repeated (e.g. simulated) samples will converge on a normal distribution -as sample size increases, the distribution of the sample mean more closely approximates a normal distribution -sample means concentrate around the center of the distribution, and the SD (spread) of the sample mean decreases: sample size up, spread down -assumes the observations in the sample are independent and the sample is large
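
The "sample size up, spread down" behavior can be seen in a small simulation. This is only a sketch: the exponential population (mean 1), the sample sizes 5 and 50, the 2000 repetitions, and the fixed seed are all arbitrary choices made for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def simulate_sample_means(sample_size, repetitions=2000):
    """Draw repeated samples from a right-skewed (exponential) population
    with mean 1 and return the mean of each sample."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(repetitions)
    ]

means_n5 = simulate_sample_means(5)
means_n50 = simulate_sample_means(50)

# Spread of the sample means shrinks as the sample size grows.
spread_n5 = statistics.stdev(means_n5)
spread_n50 = statistics.stdev(means_n50)
center_n50 = statistics.fmean(means_n50)
```

Plotting histograms of `means_n5` and `means_n50` would show the second looking narrower and more bell-shaped, even though the underlying population is skewed.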

z-score

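the number of standard deviations an observation is above or below the mean -ex: for bone mineral density, the number of standard deviations the measurement is above or below the age-matched mean bone mineral density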
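
In code, the calculation is a subtraction and a division. The measurement, reference mean, and reference SD below are invented numbers, not real bone mineral density values.

```python
def z_score(measurement, reference_mean, reference_sd):
    """Number of SDs the measurement lies above (+) or below (-) the mean."""
    if reference_sd <= 0:
        raise ValueError("reference_sd must be positive")
    return (measurement - reference_mean) / reference_sd

# Hypothetical measurement compared with a hypothetical age-matched group.
z = z_score(measurement=0.75, reference_mean=1.0, reference_sd=0.125)
print(z)  # -2.0, i.e. two SDs below the age-matched mean
```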

event

the particular outcome of a random process that we want to know the probability of

outcome

the possible results of a random process

statistical inference

the practice of drawing conclusions about a population from a sample of data, recognizing that the sample has been observed in the context of random variation -while a given sample of data may not always lead us to a correct conclusion, statistical inference gives us tools to control and evaluate how often these errors occur

interquartile range

the range between Q1 and Q3: the amount of spread observed in the central bulk of the data -percentile: value below which a specified percent of observations fall -quartiles: the 3 cut points that delineate quarters of the observations
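
A minimal sketch using Python's standard library. The data are made up, and `statistics.quantiles` uses its default "exclusive" method here; other software may use a different quartile convention and give slightly different cut points.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical observations

# n=4 asks for the 3 cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3)  # 3.0 6.0 9.0
print(iqr)         # 6.0
```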

dot plot

useful for visualizing one numerical variable -darker colors represent areas where there are more observations

segmented bar plot

visual representation of 2x2 contingency table

mosaic plot

visual representation of a 2x2 contingency table. each box has a width and height corresponding to the relative proportion of observations in a particular cell

bar plot

visualizes a single categorical variable -can also be used for a single discrete numerical variable -categories can go in any order along the x-axis (unless the variable is ordinal, in which case they stay in order)

relative frequency bar plot

visualize a single categorical variable in proportions rather than numbers

stacked dot plot

visualizes one variable -higher stacks represent areas where there are more observations

pie chart

visualizes a single categorical variable -sections represent the proportion of the total sample in a particular category -sections are usually ordered sequentially by proportion

skewness

when a distribution has a long tail on one side; the distribution is skewed toward whichever side the tail is on

single process outcome

when only one event is of interest -what is the probability of A? -what is the probability of A or B? (addition rule) -disjoint outcomes: mutually exclusive -non-disjoint outcomes: can occur together

what is calculated in a population? what is calculated in a sample?

population --> parameter; sample --> statistic

type 1 error

rejecting the null hypothesis when it is true (H0 is true, reject H0)

what determines location and what determines spread in normal distribution?

location is determined by the mean; spread is determined by the SD

visual summary of categorical data

bar plot, relative frequency bar plot, pie chart

numerical data

can take any numerical value; it makes sense to perform mathematical operations on the values (would it make sense to add or subtract them?) and they can be placed in ascending or descending order -continuous numerical: any number is possible within a range (blood pressure, weight) -discrete numerical: integers, often naturally ordered, with no possible values in between (number of hospital stays)

categorical data

can be sorted into groups or categories -ordinal categorical: has a clear order -regular (nominal) categorical: groups or levels with no clear order (blue, orange, purple)

non-disjoint

can happen at the same time -ex: the sum of two dice can be both 2 and even

disjoint

cannot happen at the same time (mutually exclusive outcomes) -ex: the sum of two dice cannot be both 2 and 12
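
The two dice examples can be worked through by enumerating all 36 equally likely rolls. The sketch assumes fair dice and uses the general addition rule, P(A or B) = P(A) + P(B) - P(A and B); the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for the sum of two fair dice.
sums = [a + b for a, b in product(range(1, 7), repeat=2)]

def prob(event):
    """Probability that the sum satisfies `event` (a true/false function)."""
    return Fraction(sum(1 for s in sums if event(s)), len(sums))

def is_two(s):
    return s == 2

def is_twelve(s):
    return s == 12

def is_even(s):
    return s % 2 == 0

# Disjoint: a sum cannot be 2 and 12 at once, so the overlap term is 0.
p_two_or_twelve = prob(is_two) + prob(is_twelve)

# Non-disjoint: a sum of 2 is also even, so subtract the overlap once.
p_two_and_even = prob(lambda s: is_two(s) and is_even(s))
p_two_or_even = prob(is_two) + prob(is_even) - p_two_and_even

print(p_two_or_twelve)  # 1/18
print(p_two_or_even)    # 1/2
```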

would zip code be categorical or numerical?

categorical --> it wouldn't make sense to add or subtract zip codes

standard deviation

describes how concentrated the data are around the mean -SD is a rescaling of the variance back into the scale of the original data -SD is the square root of the variance
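
The relationship between variance and SD can be checked with Python's standard library. The data are hypothetical, and `statistics.variance` and `statistics.stdev` compute the sample versions (n - 1 divisor).

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values with mean 5

variance = statistics.variance(data)   # sample variance, in squared units
sd = statistics.stdev(data)            # sample SD, in the original units

# SD is the square root of the variance.
same = math.isclose(sd, math.sqrt(variance))
```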

sampling distribution

distribution of values for a statistic for all possible samples from the same population

population

entire group that researchers are interested in understanding

response variable

the variable expected to be affected by the explanatory variable

type 2 error

failing to reject the null hypothesis when the alternative hypothesis is actually true (HA is true, fail to reject H0)

bar graph

for discrete numerical data. also works for categorical -displays shape of data distribution

what is the 68-95-99.7 rule

for normally distributed data -about 68% falls within 1 SD of the mean -about 95% falls within 2 SD of the mean -about 99.7% falls within 3 SD of the mean
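
The three percentages can be recovered from the standard normal distribution using `statistics.NormalDist` from the standard library; the helper name `share_within` is made up for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

def share_within(k):
    """Share of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```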

scatter plot

graphical summary of two variables. each dot is a case with two pieces of information

differences between histogram and bar graph

-histogram: continuous numerical variable on the x-axis, grouped into bins; frequency count on the y-axis; no gaps between bars -bar graph: discrete numerical variable on the x-axis; counts on the y-axis; gaps between bars; values jump from one category to the next instead of falling into bins

purpose of identifying outliers in a box plot

-identify skew in the distribution -may identify data collection/entry errors -provide insight into interesting features of the data -provide caution: extreme values may distort our understanding of central tendency and variability

when is something deemed statistically significant?

if the p-value is less than some prespecified value

dependent

knowing the outcome of one random process provides some information about the outcome of the second random process -ex: knowing that the first card drawn from a deck (and not replaced) is an ace does provide information for determining the probability of drawing an ace on the second draw
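
The card example in numbers, assuming a standard 52-card deck with 4 aces and drawing without replacement:

```python
from fractions import Fraction

# First draw from a full deck.
p_first_ace = Fraction(4, 52)

# Second draw depends on what the first card was (it is not replaced).
p_second_ace_given_first_ace = Fraction(3, 51)
p_second_ace_given_first_not_ace = Fraction(4, 51)

# Probability that both cards are aces.
p_both_aces = p_first_ace * p_second_ace_given_first_ace

print(p_second_ace_given_first_ace)      # 1/17
print(p_second_ace_given_first_not_ace)  # 4/51
print(p_both_aces)                       # 1/221
```

Because the two conditional probabilities differ, the first draw carries information about the second, which is what makes the draws dependent.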

alpha

level of significance in hypothesis testing, used to determine whether H0 can be rejected -the probability of committing a type 1 error that we are willing to accept
