STAT 220 Midterm 1

Event

A subset of a sample space

Histogram

A plot of the distribution of a numerical variable; the range of values is divided into class intervals (bins) and each bar shows how many observations fall in that bin

How to deal with low response rate

You cannot take a new sample to replace the non-respondents, because the newly sampled subjects who answer are still respondents, not non-respondents. Make more effort to reach the non-respondents. Always check the response rate; a low one may mean the study is not trustworthy

Nominal variable

Categorical and categories do not have a natural order

Ordinal Variable

Categorical and has ordered categories

boxplot (box and whisker plot)

Graph based on the five-number summary (minimum, Q1, median, Q3, maximum); know how to draw one

Bin width

Affects how precisely a histogram shows the data: narrower bins show more detail, wider bins show less; a width of 1 is a common choice

68% rule

In a normal distribution, about 68% of the cases lie within ONE standard deviation of the mean (on either side)

95% rule

In a normal distribution, about 95% of the cases lie within TWO standard deviations of the mean (on either side)
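
The 68% and 95% rules can be checked numerically. The sketch below uses the standard normal distribution from Python's standard library; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

def share_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Area between -k and +k under the standard normal curve
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_one = share_within(1)  # roughly 0.68
within_two = share_within(2)  # roughly 0.95
```

The same proportions hold for any normal distribution, since they depend only on the number of standard deviations from the mean.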

Quartile

One of the three markers (Q1, median, Q3) that divide the sorted data into four equal parts

Numerical Variables

Variables whose values represent quantities; either continuous or discrete

Case

What we collect data from (every row represents a case)

random assignment vs random sampling

Whether a result generalizes from the data to a larger population depends on whether the data came from a random sample. Whether a cause-and-effect relationship can be inferred depends on whether the subjects were randomly assigned to the control and treatment groups

Confounder

a factor associated with some outcome that confuses or confounds the determination of true cause and effect

Mosaic plot

a modified segmented bar graph in which the width of each rectangle is proportional to the number of individuals in the corresponding category; maintains absolute count

Statistic

a number describing a characteristic of a sample; can be computed based on your sample

Parameter

a number describing a characteristic of the population

observational study

a study based on data in which no manipulation of factors has been employed; can show association only, not causation

Categorical Variables

any variable that is not quantitative; nominal and ordinal

Histogram in a density scale

bar area = proportion of observations in the bin
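
A minimal sketch of the density scale, assuming made-up data and bin edges: each bar's height is the bin's proportion divided by the bin width, so the bar's area equals the proportion and the areas sum to 1. The function name `density_histogram` is hypothetical.

```python
def density_histogram(values, edges):
    """Return (proportion, density height) for each bin [edges[i], edges[i+1])."""
    n = len(values)
    bars = []
    for left, right in zip(edges, edges[1:]):
        count = sum(1 for x in values if left <= x < right)
        proportion = count / n
        height = proportion / (right - left)  # so that height * width = proportion
        bars.append((proportion, height))
    return bars

data = [1, 2, 2, 3, 5, 6, 7, 9]   # illustrative values
edges = [0, 4, 10]                # unequal bin widths on purpose
bars = density_histogram(data, edges)

# Total bar area (height * width summed over bins) should be 1
total_area = sum(h * (r - l) for (_, h), l, r in zip(bars, edges, edges[1:]))
```

Both bins hold half of the data here, but the wider bin gets a shorter bar, which is the point of the density scale.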

non-response bias

bias introduced to a sample when a large fraction of those sampled fail to respond

Rows represent

cases

Variable

characteristic of a case that varies; describes a single case

retrospective study

collect data after events have taken place

Multistage sampling

combine the three basic sampling methods (simple random, stratified, cluster) in any way

graph for categorical v categorical

contingency tables, segmented/standardized bar plots, mosaic plots

Bad sampling methods

convenience samples (those who are easily available) and voluntary response samples (e.g. online polls)

Outliers

extreme values that don't seem to belong with the rest of the data

What to look for in scatter plots

form of the relationship, direction, strength, deviations from pattern

marginal totals

the row totals and column totals of a contingency table; each set gives the distribution of one of the two variables on its own

Segmented bar plot

graphical display of contingency table information (two categorical variables)

Cluster sampling

divide the entire population into subgroups (clusters), randomly choose some of the clusters, and collect data on every unit in the chosen clusters

bell-shaped distribution

highest point occurs in the middle and tails go off equally to the left and right

Difference between bar plot and histograms:

histograms = numerical data, bar plots = categorical data

When are two variables independent

if the row proportions are the same in every row (equivalently, the column proportions are the same in every column), the variables are independent

Placebo effect

improvement resulting from the mere expectation of improvement

Clusters

indicate data may be comprised of several distinct kinds of individuals

mean < median impact on graph

left skewed

Probability

likelihood that a particular event will occur

Forms of relationship

linear, no relation, nonlinear

The mean and median of a symmetric distribution:

mean = median

Common measure of center

mean and median

Mean v median

mean is more easily changed with one data point

mean and median in skewed distribution

mean is pulled toward the longer tail (because it is more sensitive to extreme values)
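
A small illustration with made-up numbers: adding one extreme value moves the mean a long way toward that value, while the median barely changes.

```python
from statistics import mean, median

values = [30, 32, 35, 38, 40]     # illustrative, roughly symmetric data
with_outlier = values + [400]     # one extreme value in the right tail

mean_before, median_before = mean(values), median(values)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean jumps from 35 to about 95.8; the median only moves from 35 to 36.5
```

After the extreme value is added the data are right skewed, and the mean ends up above the median.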

center of histogram

mean or median

Standard deviation (concept)

measure that describes an average distance of every score from the mean

Properties of SD

measures spread about the mean and should be used only when the mean is the measure of center; very sensitive to outliers

Cells

the interior entries of a contingency table (the counts for each combination of categories), as opposed to the row and column totals

five number summary

minimum, Q1, median, Q3, maximum

What to look for in a histogram

modality, skewness, outliers, center, spread

Blocking

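more sophisticated design technique for experiments: first group subjects into blocks by a variable thought to affect the response, then randomly assign treatments within each block (know the steps)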

Unimodal

one mode/peak

frequentist interpretation of probability

the probability of an event is the proportion of times the event occurs in a large number of repetitions of the experiment
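
The frequentist idea can be sketched with a simulation. This example assumes a fair six-sided die and the event "the roll is even"; the function name and the seed are arbitrary choices for illustration.

```python
import random

def relative_frequency(event, trials, rng):
    """Proportion of simulated die rolls for which `event` is true."""
    hits = sum(1 for _ in range(trials) if event(rng.randint(1, 6)))
    return hits / trials

rng = random.Random(220)  # fixed seed so the run is repeatable
estimate = relative_frequency(lambda roll: roll % 2 == 0, 100_000, rng)

# With many repetitions the relative frequency settles near the true value 0.5
```

With only a handful of rolls the proportion can be far from 0.5; it is the long-run behavior that defines the probability.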

continuous variable

quantitative, with values that can fall anywhere in an interval (not countable)

discrete variable

quantitative, with values that form a set of separate numbers (0, 1, ...)

Simple random sampling

random selection; every sampling unit has a known and equal chance of being selected (con: impractical for large populations)

how do you combat confounding

randomization (balances confounders across groups); restricting comparisons to small, homogeneous groups

Mean > median impact on graph

right skewed

standardized bar plot

same as segmented bar plot but uses proportions, not absolute count

graph for numerical v numerical

scatterplot

Strength of association of variables

seen by how much scatter there is around main form

two-way contingency table

summarizes data on two categorical variables as a two-dimensional table of rows and columns

graph for numerical v categorical

side by side boxplots, histograms on the same horizontal axis

standard deviation formula

sqrt( (sum of squared deviations from the mean) / (n - 1) )
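
A direct translation of the formula, using made-up scores. The function name `sample_sd` is hypothetical; the standard library's `statistics.stdev` computes the same quantity.

```python
import math

def sample_sd(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)  # squared deviations from the mean
    return math.sqrt(sum_sq / (n - 1))

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data, mean = 5
sd = sample_sd(scores)              # sqrt(32 / 7), about 2.14
```

Squaring the result gives the sample variance, 32 / 7.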

prospective study

study that identifies individuals and collects information as events unfold

left-skewed distribution

tail extends to the left

right skewed distribution

tail extends to the right

Mean of histogram

the balance point of the histogram

Sample

the part of the population we actually examine and have data on

Sample Space

the set of all possible outcomes of an experiment

Trimodal

three modes/peaks

column total

total number in each column

Row total

total number in each row

Bimodal

two modes/peaks

single categorical variable

use a pie or bar chart, bar plot is most common

1.5 IQR Rule

used for identifying outliers: any values that are more than 1.5 times the IQR lower than the first quartile or higher than the third quartile are called outliers
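
A sketch of the five-number summary and the 1.5 IQR rule on made-up data. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of `statistics.quantiles`, which may not match the convention used in the course.

```python
from statistics import quantiles

data = sorted([1, 3, 4, 5, 6, 7, 8, 9, 30])   # illustrative values

# Quartiles (assumed convention: inclusive method)
q1, med, q3 = quantiles(data, n=4, method="inclusive")
summary = (data[0], q1, med, q3, data[-1])     # min, Q1, median, Q3, max

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values beyond either fence are flagged as outliers
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

Here Q1 = 4 and Q3 = 8, so the IQR is 4, the fences are -2 and 14, and only the value 30 is flagged.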

Columns represent

variables

Simpson's Paradox

when averages are taken within different groups, they can appear to contradict the overall averages: a trend seen in every group can reverse once the groups are combined
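
A small numeric illustration of the paradox. The counts are invented for this example: treatment A has the higher success rate in both the "easy" and the "hard" group, yet B has the higher rate overall, because A was mostly used on hard cases.

```python
# (successes, total) for each treatment within each group; made-up counts
counts = {
    "A": {"easy": (9, 10), "hard": (30, 100)},
    "B": {"easy": (80, 100), "hard": (2, 10)},
}

def rate(successes, total):
    return successes / total

# Success rate within each group
within = {
    group: {t: rate(*counts[t][group]) for t in counts}
    for group in ("easy", "hard")
}

# Success rate after pooling the groups
overall = {
    t: rate(sum(s for s, _ in groups.values()), sum(n for _, n in groups.values()))
    for t, groups in counts.items()
}
```

Within groups A wins (0.9 vs 0.8 and 0.3 vs 0.2), but overall A is 39/110 and B is 82/110. Group difficulty acts as a confounder.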

negative association

when one variable increases or becomes larger, the other does the opposite

positive association

when one variable increases or becomes larger, the other does the same

Principles of experimenting

Replicate (collecting a large enough sample to make sure the difference in outcome of groups is not by chance), control, randomization, blocking

Variance

Measures how far the data points typically lie from the mean (the average of all data points), as the average squared deviation; formula: standard deviation squared

Modes of a histogram

Number of peaks

Types of variables

Numerical and Categorical

Sampling keywords

Population, sample, parameter, statistic

Interquartile Range (IQR)

Q3-Q1

Common measures of spread

Range = max-min, SD, IQR

What do you use the mean and median for in a graph

Tell if the graph is skewed and by how much

Skewness

Describes how asymmetric the distribution is: which side has the longer tail

Multimodal

distributions with more than two modes/peaks

stratified sampling

divide the population into subgroups (strata), then perform simple random sampling within each stratum
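
A minimal sketch of stratified sampling, assuming a made-up population of students labeled by year. The function name, the population, and the choice of five units per stratum are all hypothetical.

```python
import random

def stratified_sample(population, stratum_of, per_stratum, rng):
    """Group units into strata, then draw a simple random sample from each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    # rng.sample draws without replacement, so every unit in a stratum is equally likely
    return {name: rng.sample(units, per_stratum) for name, units in strata.items()}

# Hypothetical population: 60 first-years and 40 seniors, stored as (year, id)
population = [("first-year", i) for i in range(60)] + [("senior", i) for i in range(40)]
rng = random.Random(220)  # fixed seed so the sketch is repeatable
sample = stratified_sample(population, lambda unit: unit[0], 5, rng)
```

Every stratum is guaranteed to appear in the sample, which a single simple random sample of ten students would not guarantee.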

column proportions

dividing by column total to find proportions (normalized to 1)

overall proportions

dividing by grand total to find proportions (normalized to 1)

row proportions

dividing by row total to find proportions (normalized to 1)
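
The three kinds of proportions can be computed from one contingency table. The counts below are invented, and the table is stored as a plain dictionary of rows for illustration.

```python
# Made-up two-way table: rows = group, columns = outcome
table = {
    "treatment": {"improved": 30, "not improved": 20},
    "control":   {"improved": 15, "not improved": 35},
}

columns = list(next(iter(table.values())))
row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {col: sum(table[row][col] for row in table) for col in columns}
grand_total = sum(row_totals.values())

# Each cell divided by its row total, its column total, or the grand total
row_props = {r: {c: n / row_totals[r] for c, n in cells.items()} for r, cells in table.items()}
col_props = {r: {c: n / col_totals[c] for c, n in cells.items()} for r, cells in table.items()}
overall_props = {r: {c: n / grand_total for c, n in cells.items()} for r, cells in table.items()}
```

Row proportions sum to 1 across each row, column proportions sum to 1 down each column, and overall proportions sum to 1 over the whole table. In this example the row proportions differ between rows (0.6 vs 0.3 improved), so the two variables are associated rather than independent.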

Population

the entire group of individuals we are interested in

how to establish causality

establish correlation, establish time order, rule out alternative explanations

intersection

event that both A and B occur (A ∩ B)

Union

event that A or B (or both) occurs (A ∪ B)
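
Events are subsets of the sample space, so intersection and union map directly onto Python set operations. This sketch assumes one roll of a fair die, so every outcome is equally likely; the events A and B are chosen only for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a die
A = {2, 4, 6}                        # event: roll is even
B = {4, 5, 6}                        # event: roll is at least 4

both = A & B      # intersection: outcomes in A and in B
either = A | B    # union: outcomes in A or B (or both)

def prob(event):
    """Probability of an event when all outcomes are equally likely (assumption)."""
    return Fraction(len(event), len(sample_space))

p_both, p_either = prob(both), prob(either)
```

Here the intersection is {4, 6} with probability 1/3, and the union is {2, 4, 5, 6} with probability 2/3.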

double-blind experiment

experiment in which neither the experimenter nor the participants know which participants received which treatment

single-blind experiment

experiment in which the participants do not know which treatment they received, but the experimenter does

