Surveys, Samples, Experiments, Data Displays, Descriptive Statistics
when skewness is present in a set of data, which descriptive summary measures are most appropriate
IQR and median
if high concentration of data in the middle...
IQR is small
volunteer/self-selected bias
a call goes out and people enter the study on their own - no sampling procedure and does not represent any population
undercoverage bias
a subgroup of the population is excluded from the very beginning - sampling procedure is used but can only represent the remaining population without the subgroup
response bias
an individual in the sample responds but does not give the correct data
nonresponse bias
an individual is selected to be in the sample but does not respond to the survey - judge it by the percentage of nonrespondents, not the raw number
statistical significance
a difference large enough that it is unlikely to be due to random chance alone, so it is attributed to the treatment
sample mean
average (x̅) - does not have to be one of the numbers in the data set
convenience bias
choose individuals in the easiest way - sampling procedure but does not represent any population
stratified random sample
compare subgroups of the population equally - divide population into subgroups - choose a simple random sample from each subgroup
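A minimal sketch of the stratified procedure described above, using only the standard library. The `students` list, the two class-year strata, and the helper name `stratified_sample` are hypothetical, chosen only for illustration.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=0):
    """Divide the population into subgroups, then draw a simple
    random sample of the same size from each subgroup."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    # random.sample draws without replacement within each stratum
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 60 freshmen and 40 seniors
students = [(f"student{i}", "freshman" if i < 60 else "senior")
            for i in range(100)]

sample = stratified_sample(students, stratum_of=lambda s: s[1], n_per_stratum=5)
for stratum, chosen in sample.items():
    print(stratum, [name for name, _ in chosen])
```

Each subgroup contributes the same number of individuals, which is what allows the subgroups to be compared equally.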
a flat histogram...
contains more variability
categorical data
data falling into groups
random sample
each group of the same size has the same chance of being selected as the sample
simple random sample
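sample drawn from the entire population as it exists, without first dividing it into subgroups - every possible sample of the same size has the same chance of being selected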
independent variable
factor, x
if the mean of a data set is large, the standard deviation has to be large also
false
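A quick counterexample, sketched with Python's `statistics` module. The three values are made up for illustration: the mean is large, yet the values sit close together, so the standard deviation is small.

```python
import statistics

# Hypothetical data: large values packed tightly together
data = [1000, 1001, 1002]

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation

print("mean:", mean)
print("standard deviation:", sd)
```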
control group
set of subjects given a fake treatment (placebo) or no treatment, used for comparison with the treatment group
experiment
imposes some treatment on subjects and observes their responses
data distribution
list of all possible outcomes and how often they occur
if data is symmetric...
mean and median are similar
skewed right
mean is greater than median
skewed left
mean is less than median
if there are very few large values in a data set compared to the rest of the data...
mean will be greater than median
if there are very few small values in a data set compared to the rest of the data...
mean will be less than median
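The skewness cards above can be checked on small made-up data sets. This is only a sketch: both lists are hypothetical, one with a single very large value and one with a single very small value.

```python
import statistics

# Hypothetical data: one very large value pulls the mean up (skewed right)
right_skewed = [1, 2, 2, 3, 3, 3, 4, 50]
# Hypothetical data: one very small value pulls the mean down (skewed left)
left_skewed = [1, 47, 48, 48, 49, 49, 49, 50]

for name, data in [("skewed right", right_skewed), ("skewed left", left_skewed)]:
    print(name, "mean:", statistics.mean(data), "median:", statistics.median(data))
```

In both cases the median stays near the bulk of the data while the mean moves toward the extreme value.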
quantitative data
measurements and counts - numbers, money
from boxplot, can see...
median, IQR, and whether or not data is skewed
five number summary
minimum, Q1, median (Q2), Q3, maximum
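One way to compute a five number summary in Python, as a sketch. Textbooks and software use slightly different quartile rules; this version assumes the "inclusive" method of `statistics.quantiles`, and the data values and the helper name `five_number_summary` are made up.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).
    Quartile conventions vary; this assumes the 'inclusive' method."""
    ordered = sorted(data)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, q2, q3, ordered[-1])

# Hypothetical data, deliberately unsorted
data = [7, 1, 9, 3, 5, 2, 8, 4, 6]
summary = five_number_summary(data)
print(summary)
```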
treatment group
set of subjects given a certain treatment
which is more affected by skewness, IQR or standard deviation
standard deviation
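A small illustration of why the standard deviation reacts more than the IQR. The two lists are hypothetical and differ only in their largest value; the quartiles assume the "inclusive" method of `statistics.quantiles`.

```python
import statistics

def iqr(data):
    """Interquartile range, Q3 - Q1 (assumes the 'inclusive' quartile method)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

# Hypothetical data: the second list replaces the maximum with an extreme value
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed = [1, 2, 3, 4, 5, 6, 7, 8, 90]

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    print(name, "IQR:", iqr(data), "SD:", round(statistics.stdev(data), 2))
```

The IQR depends only on the middle 50% of the data, so the extreme value leaves it unchanged, while the standard deviation grows sharply.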
...can affect graph
starting point and number of bars
most common observational study
survey
bias
systematic favoritism
timing
timing of survey can affect results
an outlier in a data set can significantly affect the value of the mean but not the median
true
you can have two data sets with the same mean but different standard deviation
true
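A sketch with two made-up data sets that share a mean of 5 but are spread out very differently.

```python
import statistics

# Hypothetical data sets with the same mean
tight = [4, 5, 6]
spread = [0, 5, 10]

print("means:", statistics.mean(tight), statistics.mean(spread))
print("standard deviations:", statistics.stdev(tight), statistics.stdev(spread))
```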
type of survey
type of survey conducted can affect the results
median is least affected when...
extreme values (outliers) are added to the data set
confounding variable
variable not studied that can affect results
spot/avoid problems in bar graphs
watch the scale and look for sample size
histogram
way to graph quantitative data
if you add 10 to every value of a data set, standard deviation...
will be the same
if you multiply every value of a data set by 10, standard deviation...
will be multiplied by 10
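Both cards can be checked numerically. This is a sketch on a made-up data set: adding 10 shifts every value without changing the spread, while multiplying by 10 stretches the spread by the same factor.

```python
import statistics

# Hypothetical data set
data = [2, 4, 4, 6]
shifted = [x + 10 for x in data]  # add 10 to every value
scaled = [x * 10 for x in data]   # multiply every value by 10

sd = statistics.stdev(data)
sd_shifted = statistics.stdev(shifted)
sd_scaled = statistics.stdev(scaled)

print("original:", sd)
print("after adding 10:", sd_shifted)
print("after multiplying by 10:", sd_scaled)
```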
question wording
wording of a survey question can affect the results
standard deviation of the data set 1, 1, 1, 1
0
which set of four numbers would give you the largest standard deviation
1, 1, 4, 4
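The two cards above can be verified directly. The original answer choices are not listed here, so the sets marked as hypothetical below are assumptions added only to give 1, 1, 4, 4 something to be compared against.

```python
import statistics

candidates = {
    "1, 1, 1, 1": [1, 1, 1, 1],  # identical values: no spread at all
    "1, 2, 3, 4": [1, 2, 3, 4],  # hypothetical alternative
    "2, 2, 3, 3": [2, 2, 3, 3],  # hypothetical alternative
    "1, 1, 4, 4": [1, 1, 4, 4],  # values pushed as far from the mean as possible
}

sds = {label: statistics.stdev(values) for label, values in candidates.items()}
largest = max(sds, key=sds.get)

for label, sd in sds.items():
    print(label, "->", round(sd, 3))
print("largest standard deviation:", largest)
```

Values that sit far from the mean, on both sides, produce the largest standard deviation.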
spot/avoid bias
1. sampling procedure must be used 2. sample must represent the entire population
Q1
1st quartile, 25th percentile
Q2
2nd quartile, 50th percentile, median
Q3
3rd quartile, 75th percentile
boxplots...
cannot tell what sample size is - bigger boxes do not mean more data - cannot see mean
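To illustrate the sample-size point, here is a sketch with two made-up data sets of different sizes that share one five number summary, so their boxplots would look identical. Quartiles assume the "inclusive" method of `statistics.quantiles`; the helper name is hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum), 'inclusive' quartile method."""
    ordered = sorted(data)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, q2, q3, ordered[-1])

# Hypothetical data sets of different sizes
small = [1, 3, 5, 7, 9]
large = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(len(small), five_number_summary(small))
print(len(large), five_number_summary(large))
```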
interquartile range (IQR)
distance taken up by the middle 50% of the data - calculated as Q3 minus Q1
good experiments...
make comparisons - avoid bias - have enough data (collected randomly)
standard deviation
used to measure how concentrated the data are around the mean - same units as original data - never negative - can equal zero - affected by outliers and skewness
frequencies
number (count) in each category - shown in a table or a bar graph of counts
median
number that splits the ordered data in half - does not have to be one of the numbers in the data set
observational study
observes individuals, measures variables and makes conclusions or comparisons, and does not attempt to influence responses
relative frequencies
percentage in each category - shown in a table, pie chart, or bar graph of percentages
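Frequencies and relative frequencies for categorical data can be tabulated as follows. This is a sketch only; the survey responses are made up.

```python
from collections import Counter

# Hypothetical categorical data: answers to a yes/no survey question
responses = ["yes", "no", "yes", "yes", "undecided", "no", "yes", "no"]

frequencies = Counter(responses)   # number in each category
total = sum(frequencies.values())
relative = {category: 100 * count / total  # percentage in each category
            for category, count in frequencies.items()}

for category in frequencies:
    print(category, frequencies[category], f"{relative[category]}%")
```

The relative frequencies always add up to 100%, which makes groups of different sizes easier to compare.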
in general, from a histogram, you cannot...
recreate the original data values
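A sketch of why the original values are lost: two different made-up data sets fall into the same bins with the same counts, so they would produce the same histogram. The bin width of 10 and the helper name `histogram_counts` are assumptions for illustration.

```python
from collections import Counter

def histogram_counts(data, bin_width):
    """Count how many values fall in each bin, keyed by the bin's left edge."""
    return Counter((x // bin_width) * bin_width for x in data)

# Hypothetical data sets with different values
data_a = [1, 2, 11, 12, 13]
data_b = [4, 9, 10, 15, 19]

print(dict(histogram_counts(data_a, 10)))
print(dict(histogram_counts(data_b, 10)))
```

Changing the bin width (or the starting point) would regroup the values and could change the shape of the graph.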
impact of bias
reduces credibility of results
dependent variable
response, y