Stats Prelim 1
statistics
a body of principles for designing the process of data collection and making inferences about the population from information in the sample
experiment
a study where a set of conditions under a SPECIFIC PROTOCOL is established to evaluate the implications for response variables - the researcher controls the conditions, so an experiment can show causality
observational study
a study where the researcher observes and CANNOT CONTROL the conditions to which the observational units are exposed
what does a simple random sample result in
a) every unit of the population has the same probability of being included in the sample b) the units chosen for the sample are chosen independently from one another
double blind experiment
an experiment where neither the subjects nor the individuals measuring the response know which treatments were assigned to which subjects
sample
any subset of measurements from the population that are actually collected - data from part of population
measurement/response bias
bias incurred when a method of observation produces values different from the true value of the obs unit e.g. an uncalibrated instrument
selection bias
bias incurred when a sample systematically excludes some part of the population- the sample doesn't represent the population
nonresponse bias
bias incurred when data are not obtained from all observational units selected for the study e.g. a self-selected sample- the people who are most motivated (good or bad) are the ones who write reviews
bar chart is good for what kind of variable (categorical or quantitative)
categorical
types of variables (2)
categorical, quantitative
categorical variable
a characteristic or trait that can only be assigned to categories
what are the three alternatives to simple random sampling
cluster sampling, stratified sampling, systematic sampling
what does relative frequency allow for
comparison of data with different sample sizes
census
data from entire population
frequency distribution
display of the frequency of each category in categorical data
cluster sampling
divide the population of units into distinct subgroups or clusters. the CLUSTERS are randomly sampled and ALL units in the selected clusters are observed
stratified sampling
divide the population of units into distinct subgroups or strata and then sample independently from each stratum - sample every group but not everyone in each group
single blind experiment
experiment where subjects do not know which treatment they receive but the individuals measuring the response know which treatments were assigned to which subjects
what happens if the population is systematically arranged? e.g. students standing with friends
leads to selection bias- in order for systematic sampling to be unbiased, population must be in random arrangement
what is an example of selection bias
1. looking for the average level of happiness in ithaca but not asking people at cornell 2. a mail survey meant for all US people- excludes homeless people
can an observational study show causality
no
convenience sampling
no random mechanism e.g. sampling only through phone calls- only sampling people with telephones
extraneous variable
a variable that is not an explanatory variable but affects the response variable
effect of marital status on the wellbeing of older adults in china is an example of an observational study or experiment?
observational study
sampling without replacement
once a unit is selected for the sample it may not be sampled again
what kinds of tests often fall subject to nonresponse bias
online surveys- the only people in the sample are people that want to fill out the survey
mode
peak of histogram
experimental unit
physical entity or subject exposed to the treatment independently from other units; the unit to which the treatment is applied - the treatment condition affects 1 unit independently
example of stratified sampling
pop of units = cornell undergrads, strata = freshman, soph, junior, senior, take a random sample from each stratum
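The stratified example above can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `stratified_sample`, the made-up student IDs, and the per-stratum sample sizes are all assumptions chosen for the example.

```python
import random

def stratified_sample(strata, sizes, rng=random):
    """Draw an independent simple random sample from every stratum.

    strata: dict mapping stratum name -> list of units (hypothetical data)
    sizes:  dict mapping stratum name -> how many units to draw from it
    """
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical population of units: 40 made-up student IDs per class year.
undergrads = {
    "freshman": [f"fr{i}" for i in range(40)],
    "soph": [f"so{i}" for i in range(40)],
    "junior": [f"jr{i}" for i in range(40)],
    "senior": [f"sr{i}" for i in range(40)],
}
# Sample sizes are allowed to differ between strata.
sizes = {"freshman": 4, "soph": 3, "junior": 3, "senior": 2}

sample = stratified_sample(undergrads, sizes, random.Random(0))
```

Every stratum appears in the result, but only some units from each stratum are drawn, which matches "sample every group but not everyone in each group".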
dot plot useful for when:
quantitative data when there are relatively few (less than 20-25) observations
what kind of data are dot plots, histograms, stem and leaf diagrams, boxplots good for
quantitative variables
graphical summary of data with two continuous variables
scatter plot
graphical summary of data with one categorical, one continuous
side by side boxplot
descriptive statistics
the branch of statistics that is concerned with collecting, summarizing and describing data
inferential statistics
the branch of statistics that is concerned with making inferences about the population from data contained in a sample
variables
the characteristics that have been measured or observed
population of units
the collection of units in which we have scientific interest
observational units (cases)
the entity from which we observe and measure characteristics
number of replications
the number of experimental units to which a treatment has been independently applied
sample size (n)
the number of observations in the sample
frequency
the number of occurrences (count) of each category
observational unit
the physical entity from which a response variable is measured, could possibly be a sample from the experimental unit on which the response is measured e.g. happiness on a scale of 1-5, observational unit = person
randomization
the random assignment of treatments to experimental units
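A rough sketch of randomization in Python, assuming a balanced design where each treatment goes to the same number of units. The helper name `randomize`, the plot labels, and the treatment names are hypothetical.

```python
import random
from collections import Counter

def randomize(units, treatments, rng=random):
    """Randomly assign treatments to experimental units.

    Shuffles the units, then deals the treatments out in turn, so each
    treatment is applied to roughly the same number of units.
    """
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

# Hypothetical experimental units and treatments.
plots = [f"plot{i}" for i in range(1, 7)]
assignment = randomize(plots, ["fertilizer", "control"], random.Random(3))

# Number of replications: how many units received each treatment.
replications = Counter(assignment.values())
```

With 6 plots and 2 treatments, each treatment ends up with 3 replications.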
treatments
the set of circumstances created for the experiment in response to the research hypothesis
simple random sample
the simplest way to draw samples for statistical inference - a simple random sample of n units is a sample in which every possible set of n units has THE SAME chance of being selected
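A small Python sketch of a simple random sample, drawn without replacement. The five-unit population and the name `simple_random_sample` are made up for illustration; `random.sample` and `math.comb` are standard library calls.

```python
import math
import random

def simple_random_sample(units, n, rng=random):
    """Draw n distinct units so that every set of n units is equally likely."""
    return rng.sample(units, n)

# Hypothetical population of N = 5 units.
population = ["a", "b", "c", "d", "e"]
n = 2

sample = simple_random_sample(population, n, random.Random(42))

# Number of possible sets of n units, and the chance of any one set.
num_possible_samples = math.comb(len(population), n)  # 5 choose 2 = 10
chance_of_each_sample = 1 / num_possible_samples
```

With N = 5 and n = 2 there are 10 possible samples, so each has a 1 in 10 chance of being the one selected.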
bias in sampling
the tendency in samples to differ from the population from which they were drawn in a systematic way
response variable
the variable of primary interest in the experiment
explanatory variable
the variables that have values controlled by the experimenter (independent variable)
why repeat treatment
to show that the effect of the treatment is a result of the treatment not the individual who got treated
1. all the trees in sapsucker 2. the species of all the trees in sapsucker 3. the species of all the trees within 100 feet of sapsucker woods pond - what is the population of units, the population and the sample
trees in sapsucker = population of units, species of trees in sapsucker = population, species of all the trees near the pond = sample
true or false: obs unit can be an experimental unit but not always
true
true or false: you can have different sample sizes in each stratum?
true
1. all the undergraduate students at cornell 2. the height of all undergraduate students at cornell 3. the height of undergraduate students taking ILR Stats - what is the population of units, the population and the sample
undergrads at cornell = population of units, height of undergrads = population, height of undergrads in ILRST = sample
blocking
using extraneous variables to create groups (blocks) that are similar- all experimental conditions (treatments) are then tried in each block - accounts for different conditions in each block (extraneous variable recognized) e.g. different types of soil - how does the treatment do in the wet block - how does the treatment do in the dry block
quantitative variable
variable that is naturally numeric and for which arithmetic operations make sense
confounding variables
variables whose effects can't be distinguished from one another
population
(statistical) the set of all MEASUREMENTS or record of some QUALITATIVE trait corresponding to each unit in the collection of units
ch. 2
-----
what are the two types of quantitative variables
1. continuous- takes values in any interval e.g. physical measurements (height, weight) 2. discrete- takes values that are distinct numbers (integers) e.g. counts (bacteria in a petri dish), age in years
what are the two types of categorical data
1. ordinal- categories have intrinsic order e.g. movie ratings 2. nominal- categories are assigned a numerical code without any intrinsic order e.g. blood type A=1, B=2, AB=3, O=4
what are the different kinds of biases in sampling (3)
1. selection bias 2. measurement or response bias 3. nonresponse bias
what are the 6 steps of the data analysis process
1. set clearly defined goals for the research study 2. make a plan of what data to collect and how to collect it 3. collect the data 4. data summary and preliminary analysis 5. apply appropriate methods for formal data analysis 6. interpret the info and draw conclusions
what do dotplots show
frequency of observations in a sample
relative frequency
frequency/ total number of observations (count/n)
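As a quick illustration of count/n, here is a short Python sketch. The blood-type observations are invented for the example and `relative_frequencies` is a hypothetical helper name.

```python
from collections import Counter

def relative_frequencies(observations):
    """Return {category: count / n} for a list of categorical observations."""
    n = len(observations)
    counts = Counter(observations)  # frequency (count) of each category
    return {category: count / n for category, count in counts.items()}

# Made-up sample of n = 8 blood types.
blood_types = ["A", "O", "O", "B", "A", "O", "AB", "O"]
rel_freq = relative_frequencies(blood_types)
```

Because the relative frequencies always sum to 1, two samples of different sizes can be compared on the same scale.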
example of cluster sampling
freshmen at cornell are divided into clusters based on residence hall (dickson, donlon, mews, etc.) - dickson and donlon are selected and then all students are sampled from those two halls
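The residence hall example can be sketched in Python as follows. The student IDs and the function name `cluster_sample` are assumptions for illustration only; which halls get picked depends on the random draw.

```python
import random

def cluster_sample(clusters, num_clusters, rng=random):
    """Randomly choose whole clusters, then observe ALL units in each one.

    clusters: dict mapping cluster name -> list of units (hypothetical data)
    """
    chosen = rng.sample(sorted(clusters), num_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical freshmen grouped by residence hall.
halls = {
    "dickson": ["d1", "d2", "d3"],
    "donlon": ["n1", "n2"],
    "mews": ["m1", "m2", "m3", "m4"],
}

chosen_halls, sampled_students = cluster_sample(halls, 2, random.Random(1))
```

The randomness is in which clusters are chosen; inside a chosen cluster nobody is left out.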
systematic sampling
given a list or map of the units, randomly choose a starting unit and then sample every kth unit, where k = N/n (population size divided by sample size)
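A minimal Python sketch of systematic sampling, under two assumptions that are not in the notes: the random start is chosen among the first k units, and k is rounded down when N/n is not a whole number. The name `systematic_sample` and the numbered roster are hypothetical.

```python
import random

def systematic_sample(units, n, rng=random):
    """Pick a random start among the first k units, then take every kth unit."""
    N = len(units)
    k = N // n                 # sampling interval k = N / n, rounded down
    start = rng.randrange(k)   # random starting position in 0 .. k-1
    return [units[start + i * k] for i in range(n)]

# Hypothetical list of N = 100 units, numbered 1 to 100.
roster = list(range(1, 101))
sample = systematic_sample(roster, 10, random.Random(0))
```

Here k = 100/10 = 10, so the sampled units are always exactly 10 positions apart. If the roster is arranged systematically, for example friends standing together, that fixed spacing is what can introduce selection bias.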
what is the graphical summary of data with two categorical variables
grouped or segmented bar chart
clusters should be homogeneous or heterogeneous
heterogeneous
direct control
holding extraneous variables constant so that their effects arent confounded with those of the treatments
strata should be homogeneous or heterogeneous
homogeneous (e.g. all freshmen)
replication
independent repetition of a treatment to two or more experimental units