UCI Stats 7 Terms + Formulas

Process of Discovery

1. asking the right questions
2. collecting useful data, which includes deciding how much is needed
3. summarizing and analyzing data, with the goal of answering the questions
4. making decisions and generalizations based on observed data
5. turning the data and subsequent decisions into new knowledge

Empirical Rule

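for bell-shaped (approximately normal) data, about 68% of values fall within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations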
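The three percentages can be checked against an exact normal curve. This is a minimal sketch, assuming a perfectly normal distribution; the function name `share_within` is hypothetical and not part of the course material.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Area under the curve between -k and +k standard deviations.
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

Real data that is only roughly bell-shaped will match these figures only approximately.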

Variable

a characteristic that differs from one individual to the next. may be numerical or categorical

Statistics

a collection of procedures and principles for gathering data and analyzing information to help people make decisions when faced with uncertainty

Statistically Significant

a difference large enough to be unlikely to have occurred in the sample if there were no relationship or difference in the population. does not necessarily have practical significance or importance

Z-score

the number of standard deviations a value lies above or below the mean: z = (value - mean) / standard deviation
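As a small worked example, a z-score can be computed as the distance from the mean divided by the standard deviation. The function name `z_score` and the numbers used are illustrative assumptions.

```python
def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


# Hypothetical example: a score of 130 when the mean is 100 and the SD is 15.
print(z_score(130, 100, 15))  # 2.0, i.e. two standard deviations above the mean
```

A negative result means the value is below the mean.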

Margin of Error

a number added to and subtracted from the sample information to produce an interval that is 95% certain to contain the true value for the population
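The card does not give a formula. One common approximation for a 95% margin of error on a sample proportion is the conservative value 1/sqrt(n). The sketch below assumes that approximation; the names `conservative_margin` and `interval` are hypothetical.

```python
import math


def conservative_margin(n: int) -> float:
    """Approximate 95% margin of error for a sample proportion (assumed 1/sqrt(n) rule)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1 / math.sqrt(n)


def interval(sample_proportion: float, n: int) -> tuple[float, float]:
    """Add and subtract the margin of error from the sample proportion."""
    margin = conservative_margin(n)
    return (sample_proportion - margin, sample_proportion + margin)


# Hypothetical poll: 55% of 1600 respondents agree.
low, high = interval(0.55, 1600)
print(f"{low:.3f} to {high:.3f}")
```

With 1600 respondents the margin is 0.025, so the interval runs from about 52.5% to 57.5%.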

Practical Significance (Practical Importance)

a statistically significant difference that is also large enough to matter in practice

Observational Study

a study in which participants are merely observed and measured

Sample Survey

a survey where investigators gather opinions or other information from each individual included in the sample

Ordinal Variable

categorical variables that can be ordered (ex. drink sizes from small to large)

Population Data

collected when all individuals in a population are measured

Sample Data

collected when measurements are taken from a subset of a population

Population of Interest

collection of all individuals about which information is desired

Dataset

complete set of raw data

Categorical Variable

data consisting of group or category names. the categories need not have a logical ordering (when they do, the variable is ordinal)

Quantitative Variable (Measurement Variable/Numerical Variable)

data consisting of numerical measurements or counts. does not include numbers that do not follow an order (ex. Social Security numbers)

Census

data is collected from all members of a population

Response Variable (Outcome Variable)

dependent variable (y)

Distribution

describes how often possible responses occur

Location

describes the center, average (either mean or median)

Spread

describes variability (either standard deviation or IQR)

Skewed to the right

description of a shape where data values are concentrated at the left of the graph, with a longer tail extending to the right

Skewed to the left

description of a shape where data values are concentrated at the right of the graph, with a longer tail extending to the left

boxplot (box and whisker plot)

displays information given in a five-number summary, good for seeing location, spread, symmetry vs skewed, outliers, and comparing. not good for judging shape.

Random Assignment

each participant has a specified probability of being assigned to each treatment

Continuous variable

every value within some interval is a possible response. does not skip numbers, even the ones with really long and ugly decimals

Dotplot

graphs a dot for each data value on a number line. easy to see individual data values, easy to make, but gets cluttered with large sample size

Shape

the overall pattern of the distribution as seen in a graph (ex. symmetric or skewed, unimodal or bimodal)

Explanatory Variable

independent variable (x) helps explain response variable but does not always have a causal relationship

Observation

individual measurement of an observational unit

Poll

investigators gather opinions or other information from each individual included in the sample

risk

likelihood of a bad outcome that can be estimated using the past rate for that outcome

Relative frequency distribution

lists categories similar to a frequency distribution but counts by percentages/proportions
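A relative frequency distribution can be built by counting each category and dividing by the total. This is a minimal sketch; the function name `relative_frequencies` and the drink-size data are made up for illustration.

```python
from collections import Counter


def relative_frequencies(labels: list[str]) -> dict[str, float]:
    """Map each category to its proportion of all observations."""
    counts = Counter(labels)  # the frequency distribution (raw counts)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Hypothetical raw data: drink sizes ordered by four customers.
orders = ["small", "large", "small", "medium"]
print(relative_frequencies(orders))
```

The proportions always sum to 1; multiply by 100 to report percentages.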

Three Summary Characteristics

location, spread, shape

Nonparticipation bias (nonresponse bias)

many people who are selected for the sample do not respond to key survey questions or at all. people who actually participate are those who feel strongly about issues.

Margin of Sampling Error

margin of error in polls, term used to distinguish it from other sources of errors and biases that can distort results

Lower Quartile

median of the lower half of an ordered numerical list

Upper Quartile

median of the upper half of an ordered numerical list

Median

middle value of an ordered numerical list. with an even number of values, the average of the two middle values

Five-Number Summary

minimum, Q1, median, Q3, maximum
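The five values can be computed using the "median of each half" definition of the quartiles. This sketch assumes the convention that the overall median is left out of both halves when the number of values is odd; other textbooks and software use slightly different quartile rules. The function name `five_number_summary` is hypothetical.

```python
from statistics import median


def five_number_summary(values: list[float]) -> tuple:
    """Return (minimum, Q1, median, Q3, maximum) using medians of the two halves."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    # Assumed convention: skip the middle value when n is odd.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))
```

These five numbers are what a boxplot draws.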

Mode

most frequent value in a data set

Percentile

the kth percentile is the number that has k% of the data values at or below it

Raw Data

numbers and category labels that have been collected but have not yet been processed in any way

Unimodal

one peak in the graph

Placebo

pill or treatment designed to look like active treatment but with no active ingredients

Data

plural word referring to numbers or non-numerical labels collected from a set of entities (people, cities, etc)

Stem-and-leaf plot

presents all individual values. poorly suited to large sample sizes and restricted in the choice of intervals

Multiple Testing (Multiple Comparisons)

refers to the fact that researchers often test many different hypotheses in the same study

Self-Selected Sample (Volunteer Sample)

sample made up of people who choose to participate, rather than being selected randomly

Histogram

similar to bar graph, can be used for any number of data values, good for large sets of data, flexibility with intervals, not informative when sample size is small

Observational Unit

single individual entity (ex. a person) in a study

Treatment

specific regimen or procedure assigned to participants by the experimenter

Summary Statistics

statistics that summarize a great deal of numerical information about a distribution, such as the mean and the standard deviation

Randomized Experiment

study in which treatments are randomly assigned to participants

Random Sample

subset of the population selected so that every individual has a specified probability of being part of the sample

Bar Graphs

summarizes one or two categorical variables, useful for making comparisons for two variables

Parameter

summary measure of population data

Statistic

summary measure of sample data

Rate

the number of times something occurs per number of opportunities for it to occur

base rate/baseline risk

the rate/risk at a beginning time period or under specific conditions

Sample Size

total number of observational units in the sample

Bimodal

two peaks in the graph

Pie Chart

used for a single categorical variable if there are not too many categories

Frequency distribution

used for categorical variables, lists frequencies (how often it occurs) for all categories

Outlier

values that are unusually large or small

Confounding variables

variable that is not the main concern of the study but may be partially responsible for the observed results

false positive (data snooping)

when researchers do multiple comparisons, they can get statistically significant findings by mistake
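A short calculation shows why this happens. The sketch below assumes that every test uses a 5% significance level, that the tests are independent, and that no real effect exists in any of them; under those assumptions the chance of at least one false positive among k tests is 1 - (1 - 0.05)^k. The function name is hypothetical.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among independent tests with no real effects."""
    if num_tests < 0:
        raise ValueError("number of tests cannot be negative")
    # Probability that every test correctly finds nothing, subtracted from 1.
    return 1 - (1 - alpha) ** num_tests


for k in (1, 5, 20):
    print(f"{k} tests: {chance_of_false_positive(k):.0%}")
```

Under these assumptions, 20 tests give roughly a 64% chance of at least one statistically significant result arising by chance alone.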

