PSYC210 - Final - Statistical Concepts

multiple regression

a statistical method that includes two or more predictor variables in the equation of a regression line to predict changes in a criterion variable; allows the researcher to quantify additional variance, interpret slope when holding other variables constant, identify stronger predictors, and model interactions among variables

independent variable (IV)

the variable that is manipulated in an experiment; conditions are actively created and assigned by researcher; the "presumed cause" or experimental variable

criterion variable

the variable with unknown values, but that can be predicted or estimated, given known values of the predictor variable; would be the dependent variable in an experiment; otherwise represented as Y

predictor variable

the variable with values that are known and can be used to predict values of another variable; would be the independent variable in an experiment; otherwise represented as X

between-groups variance

the variation caused by an independent variable (plus some error due to chance and variability); the variation attributed to mean differences between groups

statistics

a branch of mathematics used to summarize, analyze, and interpret a group of numbers or observations

z (as in z-scores)

a calculation that allows us to conceptually compare on a common standardized scale, things that were originally on different scales; accomplished by standardizing the data by setting the mean to 0 and the standard deviation to 1
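
The standardization described above can be sketched in a few lines of Python. The data sets and the helper name `z_scores` are hypothetical, and using the population standard deviation (`pstdev`) is an assumption of this example; a course may standardize with the sample standard deviation instead.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Standardize scores so the result has mean 0 and standard deviation 1."""
    m = mean(scores)
    sd = pstdev(scores)  # population SD; an assumption of this sketch
    return [(x - m) / sd for x in scores]

exam_a = [60, 70, 80]  # hypothetical scores on a 100-point exam
exam_b = [12, 15, 18]  # hypothetical scores on a 20-point quiz

za = z_scores(exam_a)
zb = z_scores(exam_b)
print(za)
print(zb)
```

Although the two sets were measured on different scales, their z-scores are directly comparable because both now share a mean of 0 and a standard deviation of 1.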

parameter

a characteristic that describes a population; usually numeric; ex. UNC average weight (as opposed to PSYC210 average weight)

statistic

a characteristic that describes a sample; usually numeric; ex. PSYC210 average weight (as opposed to UNC average weight)

critical value

a cutoff value that defines the boundaries beyond which a proportion of α or less (e.g., less than 5% when α = 0.05) of sample means can be obtained if the null hypothesis is true; sample means obtained beyond this result in a decision to reject the null hypothesis

outlier

a data point that diverges greatly from the overall pattern of data; can be defined as a score 3 or more standard deviations from the sample mean

null hypothesis

a statement about a population parameter, such as the population mean, that is assumed to be true; expects that there is no effect, no difference, no change, no relationship, no happenings

hypothesis

a statement or proposed explanation for an observation, phenomenon, or scientific problem that can be tested or observed; often a statement about the value for a parameter in a population

alternate hypothesis

a statement that directly contradicts a null hypothesis by stating that the actual value of a population parameter is less than, greater than, or not equal to the value stated in the null hypothesis; expects that there was an effect, difference, change, relationship, or something happening

pairwise comparison

a statistical comparison for the difference between two group means; all possible such comparisons are evaluated in a post hoc test for an ANOVA

effect size

a statistical measure of the size of an observed effect in a population, which allows researchers to describe how far scores shifted in the population, or the percentage of variance that can be explained by a given variable

positively skewed (distribution)

a distribution of scores where a few outliers are substantially larger (toward the right tail in a graph) than most other scores; just a few larger observations are pulling the data up

negatively skewed (distribution)

a distribution of scores where a few outliers are substantially smaller (toward the left tail in a graph) than most other scores

multimodal distribution (2 = bimodal distribution)

a distribution of scores where more than one score occurs most often or most frequently, such that there are multiple modes; can still be discussed even if one or more "modes" occurs slightly less often than the actual arithmetic mode

rectangular distribution (nonmodal)

a distribution of scores with no mode, such that all scores occur at the same frequency; graphically presents a flat line at the top of the distribution

leptokurtic (distribution)

a distribution whose shape is more peaked or pointed, as the scores are more clustered

platykurtic (distribution)

a distribution whose shape is flatter, as the scores are more spread out

frequency polygon

a dot-and-line graph used to summarize the frequency of continuous data at the midpoint of each interval; most useful for interval or ratio level data with continuous data

pie chart

a graphical display in the shape of a circle that is used to summarize the relative percent of discrete and categorical data into sectors; rarely used in professional statistics; most useful for nominal or ordinal level data

scatter plot (scatter diagram; scatter gram)

a graphical display of discrete data points (x, y) used to summarize the relationship between two variables

histogram

a graphical display used to summarize the frequency of continuous data that are distributed in numeric intervals; graphically expressed with no space between categories; most useful for interval or ratio level data with continuous data

bar graph (bar chart)

a graphical display used to summarize the frequency of discrete and categorical data that are distributed in whole units or classes; graphically expressed with space between categories; most useful for nominal or ordinal level data

box plot (box and whisker plot)

a graphical display where a rectangle or square is constructed using vertical lines at Q1 and Q3 as sides, with a vertical line drawn at the median and horizontal lines drawn from Q1 to the lowest non-outlier and from Q3 to the highest non-outlier; most useful for interval or ratio level data

stem-and-leaf plot (stem-and-leaf display)

a graphical display where each individual score from an original set of data is listed; organized such that the common digits shared by all scores are listed to the left of the vertical line, showing the first digit or digits for each number in each row, and the remaining digits for each score are listed to the right of the vertical line, showing the last digit or digits for each number in each row; rarely used in more formal statistics literature; most useful for interval or ratio level data with continuous variables

two-tailed test (nondirectional test)

a hypothesis test where the alternative hypothesis is stated as "not equal to" (≠)

one-tailed test (directional test)

a hypothesis test where the alternative hypothesis is stated as "greater than" (>) or "less than" (<) a value in the null hypothesis; conveys that the effect is expected in a certain direction; rarely used in real-life practice

Cohen's d

a measure for effect size in terms of the number of standard deviations that mean scores have shifted above or below the population mean stated by the null hypothesis; the larger the value, the larger the effect in the population
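
As a rough illustration, Cohen's d for a single sample can be computed as the mean difference divided by the sample standard deviation. The data, the null value of 6, and the function name `cohens_d` are hypothetical; dividing by the sample SD is an assumption of this sketch, and the population SD would be used when it is known.

```python
from statistics import mean, stdev

def cohens_d(sample, mu_null):
    """Estimate d = (M - mu) / SD, using the sample SD as the divisor."""
    return (mean(sample) - mu_null) / stdev(sample)

sample = [4, 6, 8, 10, 12]  # hypothetical scores
d = cohens_d(sample, 6)     # 6 is a hypothetical null-hypothesis mean
print(round(d, 3))
```

Here the sample mean sits a little under two-thirds of a standard deviation above the value stated in the null hypothesis.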

proportion of variance

a measure of effect size in terms of the proportion or percentage of variability in a dependent variable that can be explained or accounted for by a treatment

kurtosis

a measure of how peaked or flat the distribution is, relative to a normal distribution

slope (regression)

a measure of the change in Y relative to the change in X; positive when X and Y change in the same direction, negative when X and Y change in opposite directions

variability

a measure of the dispersion, spread, or clustering of scores in a distribution; describes how much variation exists around a measure of central tendency; ex. range, variance, standard deviation

standard deviation (SD)

a measure of variability for the average distance that scores deviate from the mean of their distribution; considered very reliable and relatively unaffected by sample size, but can be affected by outliers

variance

a measure of variability for the average squared distance that scores deviate from the mean of their distribution

point-biserial correlation coefficient

a measure used to determine the direction and strength of the linear relationship of one factor that is continuous and a second factor that is dichotomous; represented by r[pb]

phi correlation coefficient

a measure used to determine the direction and strength of the linear relationship of two dichotomous factors on a nominal scale of measurement; represented by r[φ]

coefficient of determination

measures the proportion of variance in one factor that can be explained by known values of a second factor

Pearson correlation coefficient (product moment)

a measure used to determine the direction and strength of the linear relationship of two factors in which the data for both factors are measured on an interval or ratio scale of measurement; represented by r; |r| = 1 indicates a perfect relationship, |r| = 0.3 a moderate relationship, and r = 0 no linear relationship
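
A minimal sketch of the computation, assuming the usual sum-of-products formula r = SP / sqrt(SSx * SSy). The data and the function name `pearson_r` are hypothetical. Squaring r gives the coefficient of determination.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r = SP / sqrt(SSx * SSy)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of products
    ss_x = sum((a - mx) ** 2 for a in x)                 # sum of squares for X
    ss_y = sum((b - my) ** 2 for b in y)                 # sum of squares for Y
    return sp / sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]  # hypothetical scores on the first factor
y = [2, 4, 5, 4, 5]  # hypothetical scores on the second factor

r = pearson_r(x, y)
r_squared = r ** 2   # coefficient of determination
print(round(r, 3), round(r_squared, 3))
```

With these numbers r is about 0.775, so roughly 60% of the variance in one factor can be explained by the other.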

Spearman rank-order correlation coefficient

a measure used to determine the direction and strength of the linear relationship of two ranked ordinal level factors; represented by r[s]

significance testing (hypothesis testing; null hypothesis significance testing)

a method for testing a claim or hypothesis about a parameter in a population, using data measured in a sample; works through testing some hypothesis by determining the likelihood that a sample statistic could have been selected, if the hypothesis regarding the population parameter were true

negative correlation

a negative value for r, indicating that the values of two factors change in different directions, meaning that as the values of one factor increase, the values of the second factor decrease

t distribution

a normal-like distribution with greater variability in the tails because the sample variance is substituted for the population variance to estimate the standard error

proportion

a part or portion of all measured data, the sum of all of which for a distribution of data is 1.0

positive correlation

a positive value for r, indicating that the values of two factors change in the same direction, meaning that as the values of one factor increase/decrease, the values of the second factor also increase/decrease

F-distribution

a positively skewed distribution derived from a sampling distribution of F-ratios

Tukey's Honestly Significant Difference (HSD) test

a post hoc test for analysis of variance that determines the minimum difference between two means that is necessary for significance, and then determines which pairwise comparisons show a significant difference in means

independent outcomes

a probability relationship where the probability of one outcome does not affect the probability of the second outcome

conditional outcomes (dependent outcomes)

a probability relationship where the probability of one outcome is dependent on the occurrence of other outcomes

complementary outcomes

a probability relationship where the sum of the probabilities for two outcomes is equal to 1, such that these outcomes are exhaustive of all possible outcomes and constitute 100% of the sample space

mutually exclusive outcomes

a probability relationship where two outcomes can't occur together, so that the probability of them occurring together is 0

reverse causality

a problem that arises when the direction of causality between two factors can be in either direction

restriction of range

a problem that arises when the range of data for one or both correlated factors in a sample is limited or restricted, compared to the range of data in the population from which the sample was selected

quasi-experiment

a research approach in which levels are not assigned to participants, usually because it is impossible or unethical to assign conditions; comparisons can be made, but causality cannot be inferred; non-experimental

correlational method

a research approach that observes behaviors (with no attempt to change or manipulate any variables) and attempts to understand the relationship between those behaviors; non-experimental

experiment

a research approach that specifically controls the conditions under which observations are made to isolate cause-and-effect relationships between variables

between-subjects design

a research design in which independent samples are selected, so that different participants are observed at each level of a factor

within-subjects design

a research design in which the same participants are observed across many groups but not necessarily before and after a treatment

repeated measures design

a research design in which the same participants are observed in each group or treatment

complete factorial design

a research design where each level of one factor is combined or crossed with each level of the other factor, with participants observed in each cell or combination of levels

population

a set of all individuals, items, or data of interest; the group about which scientists will generalize; ex. UNC as a whole (as opposed to PSYC210 class)

grouped data (frequency)

a set of scores distributed into intervals, where the frequency of each score can fall into any given interval; categories are inclusive and exhaustive, such that every score can be represented, not duplicated, and not overlapping; useful for data with more than 10 individual categories, because those categories can be narrowed into 7 to 10 intervals; important not to skip any categories (can mark empty ones with zero)

ungrouped data (frequency)

a set of scores or categories distributed individually, where the frequency of each individual score or category is counted; most useful for data with fewer than 10 individual categories

sample

a set of selected individuals, items, or data taken from a population of interest; ex. PSYC210 class (as opposed to UNC as a whole)

main effect

a source of variation associated with mean differences across the levels of a single factor; the influence of one independent variable on the dependent variable

interaction

a source of variation associated with the variance of group means across the combination of levels of two factors; a measure of how cell means at each level of one factor change across the levels of a second factor; the extent to which the influence of one variable depends on another

post hoc test

a statistical procedure computed following a significant ANOVA to determine which pair(s) of group means differ significantly; only necessary when more than two groups are present (with two groups, there is only ONE pair of group means to compare, so they must significantly differ); protects against the alpha inflation that would result from repeated individual t-tests

interval estimation

a statistical procedure in which a sample of data is used to find the range of possible values within which a population parameter is likely to be contained; ideally creates as small a range as is possible

estimation

a statistical procedure in which a sample statistic is used to estimate the value of an unknown population parameter

point estimation

a statistical procedure that involves the use of a sample statistic to estimate a population parameter at a single value

least squares method

a statistical procedure used to compute the slope (b) and y-intercept (a) of the best-fitting straight line to a set of data points; minimizes the sum of squared vertical distances between the regression line and the actual data points
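
The slope and y-intercept can be sketched as follows, assuming the standard formulas b = SP / SSx and a = My - b * Mx. The data and the function name `least_squares` are hypothetical.

```python
def least_squares(x, y):
    """Return (a, b): the y-intercept and slope of the best-fitting line."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # sum of products
    ss_x = sum((xi - mx) ** 2 for xi in x)                   # sum of squares for X
    b = sp / ss_x      # slope
    a = my - b * mx    # y-intercept
    return a, b

x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical criterion values

a, b = least_squares(x, y)
predicted = a + b * 6  # predicted Y when X = 6
print(round(a, 2), round(b, 2), round(predicted, 2))
```

For these data the fitted line is approximately Y = 2.2 + 0.6X.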

correlation

a statistical procedure used to describe the strength and direction of the linear relationship between two factors

single sample z-test (one-independent sample z-test)

a statistical procedure used to test hypotheses concerning the mean in a single population with a known variance

single sample t-test (one-independent sample t-test)

a statistical procedure used to test hypotheses concerning the mean in a single population with an unknown variance

factorial ANOVA

a statistical procedure used to test hypotheses concerning the variance of groups created by combining the levels of two or more factors; can detect whether or not outcomes based on 2+ factors are different from just a sum of the effects of the individual factors themselves; used when the variance in any one population is unknown

one-way ANOVA (one-way between-subjects ANOVA)

a statistical procedure used to test hypotheses for one factor with two or more levels concerning the variance among the group means; used when different participants are observed at each level of a factor and the variance in any one population is unknown

analysis of variance (ANOVA; F-test)

a statistical procedure used to test hypotheses for one or more factors concerning the variance among two or more group means, where the variance in one or more populations is unknown

independent groups t-test (two-independent sample t-test)

a statistical procedure used to test hypotheses concerning the difference between two population means, where the variance in one or both populations is unknown

omnibus test

a statistical test that tests whether the explained variance in a set of data is significantly greater than the unexplained variance, overall; only returns whether or not at least one mean is not like the others; ex. ANOVA

linear regression

a statistical procedure used to determine the equation of a regression line to a set of data points and the extent to which the regression equation can be used to predict values of one factor

simple frequency distribution

a summary display for the frequency of each individual score in a distribution (using ungrouped data), or the frequency of scores falling within defined groups or intervals in a distribution (grouped data)

relative percentage

a summary display that distributes the percentage of scores occurring in each class interval relative to all scores distributed; reported as [#]%

relative frequency

a summary display that distributes the proportion of scores occurring in each interval of a frequency distribution; computed as the frequency in each interval divided by the total number of frequencies recorded; reported as a proportion

cumulative relative frequency

a summary display that distributes the sum of relative frequencies across a series of intervals; can be added from the bottom up or the top down in a frequency distribution; reported as the proportion of scores above or below a certain point

cumulative frequency

a summary display that distributes the sum of frequencies across a series of intervals; can be added from the bottom up or top down in a frequency distribution

cumulative percentage

a summary display that distributes the sum of relative percentages across a series of intervals; presented from the bottom up as a percentile rank
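
The frequency summaries defined in this group of cards can be built from the same tally. The following is a small sketch using hypothetical ungrouped scores, accumulating from the lowest score upward.

```python
from collections import Counter
from itertools import accumulate

scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # hypothetical ungrouped data

counts = Counter(scores)
values = sorted(counts)                    # each distinct score, low to high
freq = [counts[v] for v in values]         # simple frequency
n = sum(freq)

rel_freq = [f / n for f in freq]           # relative frequency (proportion)
rel_pct = [100 * f / n for f in freq]      # relative percentage
cum_freq = list(accumulate(freq))          # cumulative frequency
cum_pct = [100 * c / n for c in cum_freq]  # cumulative percentage

for row in zip(values, freq, rel_freq, rel_pct, cum_freq, cum_pct):
    print(row)
```

The relative frequencies sum to 1.0 and the last cumulative percentage is 100, which is a quick check that no score was skipped or counted twice.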

central limit theorem

a theorem that states that regardless of the distribution of scores in a population, the sampling distribution of sample means selected from that population will be approximately normally distributed; covers each aspect of the sampling distribution of means, addressing the center, spread, and shape

sampling distribution

a theoretical distribution that shows the frequency of each possible value of a statistic calculated from a certain sample size, given that the null hypothesis is true, the population was sampled randomly and with replacement, and all possible samples have been taken; will be normal when drawn from a normal population; will be normal when large-n samples are drawn from a non-normal population; ex. the distribution of the values for IQR if we draw a huge number of n-sized samples, replacing them each time

normal distribution (symmetrical; Gaussian; bell-shaped)

a theoretical distribution with data that are unimodal and symmetrically distributed around the mean, median, and mode, so that most values occur in the middle of the distribution; has asymptotic tails that go out in either direction to infinity

correlation coefficient

a value that measures the strength and direction of the linear relationship (correlation) between two factors; ranges from -1.0 to +1.0

quantitative variable

a variable that varies by amount; measured numerically; often collected by measuring or counting; produces a measurement with a non-arbitrary meaning; ex. age, weight

qualitative variable

a variable that varies by class; often represented as a label; describes non-numeric aspects of phenomena; involves categories or designations to arbitrary groups assigned an arbitrary number; can only be discrete variables; ex. brands of cereal, gender, color

quasi-independent variable

a variable whose levels are non-randomly assigned to participants; differentiates the groups or conditions being compared

estimated standard error

an estimate of the standard deviation of a sampling distribution of sample means selected from a population with an unknown variance; estimate of the standard distance that sample means can be expected to deviate from the value of the population mean stated in the null hypothesis

standard error of estimate

an estimate of the standard deviation or distance that a set of data points falls from the regression line

matched-pairs design (matched-subjects design)

an experimental method in which pairs of participants are matched (either experimentally or naturally) based on common characteristics or traits that they share

z (as in z-tests)

an inferential statistic used to determine the number of standard deviations in a standard normal distribution that a sample mean deviates from the population mean stated in the null hypothesis

t-obtained

an inferential statistic used to determine the number of standard deviations in a t-distribution that a sample mean deviates from the mean value or mean difference stated in the null hypothesis
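
For a single sample, the obtained t can be sketched as the mean difference divided by the estimated standard error (the sample SD divided by the square root of n). The data, the null value of 6, and the function name `t_obtained` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_obtained(sample, mu_null):
    """t = (M - mu) / (s / sqrt(n)) for a single sample."""
    n = len(sample)
    est_se = stdev(sample) / sqrt(n)  # estimated standard error
    return (mean(sample) - mu_null) / est_se

sample = [4, 6, 8, 10, 12]  # hypothetical scores
t = t_obtained(sample, 6)   # 6 is a hypothetical null-hypothesis mean
df = len(sample) - 1        # degrees of freedom for this test
print(round(t, 3), df)
```

The obtained value would then be compared with the critical value for 4 degrees of freedom at the chosen alpha level.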

paired samples t-test (related samples t-test)

an inferential statistic used to test hypotheses concerning two related samples selected from populations in which the variance in one or both populations is unknown

confound (third variable)

an unanticipated variable not accounted for in a research study that could be causing or associated with observed changes in one or more measured variables

variable

any characteristic or condition that changes or has different values for different individuals

biased estimator

any sample statistic obtained from a randomly selected sample that does not equal the value of its respective population parameter on average

unbiased estimator

any sample statistic obtained from a randomly selected sample that equals the value of its respective population parameter on average

continuous variable

data measured along a continuum at any place beyond the decimal point; can be measured in whole units or fractional units; ex. weight, age

discrete variable

data measured in whole units or categories that are not distributed along a continuum; organized into categories and cannot be organized into infinite fractions; can only be measured in whole numbers; ex. number of children (b/c you can't have half a child)

ratio level (ratio scale)

data measured where a set of values has a true zero and equidistance; can discuss true numerical comparisons; will seldom involve discrete and almost always continuous variables; one of two "scale level" types; ex. weight, height, time, calories

anticipating main effect (in factorial ANOVA)

determining possible significance by comparing marginal means in a factorial ANOVA; significance is likely when differences are found; on a graph, significance is likely when averages for individual factor level lines are different

anticipating interaction (in factorial ANOVA)

determining possible significance by comparing mean differences between cells (by subtracting across) in a factorial ANOVA; significance is likely when differences are found; on a graph, significance is likely when, if extended outward infinitely, the lines representing two factors would cross

statistical significance

found when the null hypothesis is rejected, because there is a low probability of the observed outcome happening by chance alone if the null hypothesis is really true; does not imply importance, weightiness, or worth of finding

nominal level (nominal scale)

measurements where a number is assigned to represent something or someone; typically categorical variables that have been coded by assigning arbitrary numbers; will always involve discrete variables; ex. gender

interval level (interval scale)

measurements where the values have no true zero and are equidistant from one another; associated with a defined order; will sometimes involve discrete and sometimes continuous variables; one of two "scale level" types; ex. Likert-type scales, temperature, latitude, longitude

ordinal level (ordinal scale)

measurements where values convey order or "rank" alone, such that the space between the numbers is not necessarily equal; will always involve discrete variables; ex. ranks in a race

inferential statistics

procedures that allow researchers to infer or generalize observations made with samples to the larger population from which they were selected; ex. z-test, t-test

descriptive statistics

procedures used to summarize, organize, and make sense of a set of scores or observations; typically represented graphically, in tabular form, or as a summary; taking a set of numbers and organizing or summarizing them in some way; ex. table, graph, giving an average

mean calculations (adding or subtracting a constant to every number)

produces the same effect as adding/subtracting that constant to the mean itself

mean calculations (multiplying or dividing by a constant every number)

produces the same effect as multiplying/dividing the mean itself by that constant
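
Both rules about the mean are easy to check numerically. The scores and constants below are hypothetical.

```python
from statistics import mean

scores = [2, 4, 6]  # hypothetical scores
m = mean(scores)

shifted = mean([x + 10 for x in scores])  # add 10 to every score
scaled = mean([x * 3 for x in scores])    # multiply every score by 3

print(m, shifted, scaled)
```

The shifted mean equals the original mean plus 10, and the scaled mean equals the original mean times 3.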

central tendency

statistical measures for locating a single score that is most representative or descriptive of all scores in a distribution; ex. mean, median, mode

experimentwise alpha

the alpha level (as in probability of committing a Type I error) for all tests, when multiple tests are conducted on the same data; minimized by performing an ANOVA instead of multiple t-tests
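
To see why repeated t-tests inflate alpha, a common approximation is 1 - (1 - α)^c for c comparisons. This sketch assumes the comparisons are independent, which is a simplification for pairwise tests on the same data; the function name `experimentwise_alpha` is hypothetical.

```python
from math import comb

def experimentwise_alpha(alpha, n_tests):
    """Approximate chance of at least one Type I error across n_tests tests,
    assuming the tests are independent (a simplifying assumption)."""
    return 1 - (1 - alpha) ** n_tests

k_groups = 3
n_pairs = comb(k_groups, 2)  # number of pairwise comparisons among 3 groups
ew_alpha = experimentwise_alpha(0.05, n_pairs)
print(n_pairs, round(ew_alpha, 4))
```

With three groups there are three pairwise comparisons, and the approximate experimentwise alpha rises from 0.05 to about 0.14.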

random sampling

the assumption that each individual in the population has an equal chance of selection; keeps probability constant across populations

sampling with replacement

the assumption that each selected individual or score is returned to the population before another is selected; keeps probability constant across populations

regression line

the best-fitting straight line to a set of data points, which is best-fitting when it minimizes the sum of squared vertical distances of the data points from it

cell (ANOVA)

the combination of one level from each factor; in a research study, represents one group

pooled sample standard deviation

the combined sample standard deviation of two samples; estimates the standard deviation for the differences between two population means

correlation does not imply causation

the concept that events or statistics that happen to coincide with each other are not necessarily involved in a cause and effect relationship

significance level

the criterion of judgment upon which a decision is made regarding the value stated in a null hypothesis; based on the probability of obtaining a statistic measured in a sample if the value stated in the null hypothesis were true

effect

the difference between a sample mean and the population mean stated in the null hypothesis; is not significant when the null hypothesis is retained and is significant when the null hypothesis is rejected

range

the difference between the largest and the smallest values in a data set; somewhat crude and unreliable; does not describe how evenly distributed scores are or where they are clustered

within-group variance

the difference in spread of scores within each group, due to error (individual differences, sampling error, uncontrolled variables, etc.); the variation attributed to mean differences within each group; cannot be attributed to or caused by having different groups, so is also known as error variation

sampling error

the discrepancy between a population parameter and a sample statistic

binomial distribution

the distribution of probabilities for each outcome of a bivariate/dichotomous/binomial random variable (that is, any random variable with only two possible, mutually exclusive outcomes); appears closer to normal as sample size increases

probability distribution

the distribution of probabilities for each outcome of a random variable, the sum of which is equal to 1.0

probability

the frequency of times an outcome occurs divided by the total number of possible outcomes

confidence interval (CI)

the interval or range of possible values within which an unknown population parameter is likely to be contained

alpha level

the level of significance or criterion for a hypothesis test; the largest probability of committing a Type 1 error that researchers will allow and still decide to reject the null hypothesis; defines the low probability that serves as a cut-off value and determines what is considered "unlikely"; generally set at 0.05 or 0.01

margin of error

the maximum expected difference between the true population parameter and a sample estimate of that parameter; equal to half the width of a given confidence interval
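
As a sketch of how the margin of error relates to a confidence interval, the following computes a z-based interval for a mean. It assumes the population standard deviation is known; the numbers and the function name `z_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper, margin) for a mean with known population SD."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed cutoff
    margin = z_crit * sigma / sqrt(n)                        # margin of error
    return sample_mean - margin, sample_mean + margin, margin

# Hypothetical values: M = 100, sigma = 15, n = 36
lower, upper, margin = z_interval(100, 15, 36)
print(round(lower, 2), round(upper, 2), round(margin, 2))
```

The margin of error is exactly half the width of the interval, matching the definition above.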

expected value

the mean or average expected outcome for a given random variable; the sum of the products for each random outcome times the probability of its occurrence; when equal to the parameter value, the statistic is unbiased
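
The sum-of-products rule can be illustrated with a fair six-sided die, a hypothetical example chosen because every outcome has probability 1/6. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Probability distribution for one roll of a fair die
distribution = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

# Expected value: sum of each outcome times its probability
expected = sum(x * p for x, p in distribution.items())
print(expected, float(expected))
```

The probabilities sum to 1, and the expected value is 3.5 even though no single roll can produce that number.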

median (Q2; second quartile)

the middle value in a distribution of data listed in numeric order; resistant to extreme observations; appropriate for ordinal level data; should be chosen for use when the distribution is skewed or there is a significant outlier

degrees of freedom

the number of scores in a sample that are free to vary; the number of independent unrestricted scores

p value

the probability of obtaining a sample outcome at least as extreme as the one observed, given that the value stated in the null hypothesis for a population parameter is true; this value is compared to the significance level to make a decision regarding the null hypothesis

increasing power

the probability of rejecting a false null can be increased by increasing distance between the two groups' means (designing experiment to exaggerate differences), increasing alpha, using a one-tailed test (directional hypothesis; not done in practice), decreasing variability (using more reliable measurements), or increasing sample size

power

the probability of rejecting a false null; the probability that a randomly selected sample will show the null hypothesis is false when the null hypothesis is, in fact, false

Type I error

the probability of rejecting a null hypothesis that is actually true; directly controlled for by stating an alpha level; described as a "false positive"

Type II error

the probability of retaining a null hypothesis that is actually false; specified as a beta level; described as a "false negative"

level of confidence

the probability or likelihood that an interval estimate will contain an unknown population parameter

interquartile range (IQR)

the range of a distribution of scores falling within the upper (Q3) and lower (Q1) quartiles of a distribution; resistant to skews, outliers, and extreme scores, but fluctuates between samples

empirical rule

the rule that states that for any normally distributed set of data, at least 99.7% of data lie within 3 SD of the mean, at least 95% of data lie within 2 SD of the mean, and at least 68% of data lie within 1 SD of the mean
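
The three percentages can be checked against the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `proportion_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def proportion_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within = {k: proportion_within(k) for k in (1, 2, 3)}
for k, prop in within.items():
    print(k, round(prop, 4))
```

The computed proportions are slightly above 68%, 95%, and 99.7%, which is why the rule is phrased as "at least".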

third quartile (Q3)

the score after which data fall in the top 25% of a distribution of scores

first quartile (Q1)

the score before which data fall in the bottom 25% of a distribution of scores

continuity correction

the solution to the problem that normal distributions are continuous, but binomial values are discrete variables; involves remembering that, for a discrete variable, the category actually starts halfway between itself and the previous category; ex. in a category with possible values 10, 11, 12, the probability of getting 11 or higher actually starts (theoretically) at 10.5
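
The effect of the correction can be seen by comparing an exact binomial probability with two normal approximations. The scenario below (20 trials with p = 0.5, probability of 11 or more) is hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5  # hypothetical binomial: 20 trials, probability 0.5

# Exact binomial probability of 11 or more
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(11, n + 1))

# Normal approximation with the same mean and standard deviation
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
corrected = 1 - approx.cdf(10.5)   # category "11" starts at 10.5
uncorrected = 1 - approx.cdf(11)   # ignores the continuity correction

print(round(exact, 4), round(corrected, 4), round(uncorrected, 4))
```

Starting the category at 10.5 brings the normal approximation much closer to the exact binomial probability than starting at 11.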

mean (average)

the sum of a set of scores in a distribution, divided by the total number of scores summed; not resistant to extreme observations; should be chosen for use when all measures of central tendency seem equally valid or when the data are continuous and approximately normal

sum of squares

the sum of the squared deviations of scores from their mean; the numerator in the variance formula
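
A short sketch showing how the sum of squares feeds the variance and standard deviation. The scores are hypothetical, and dividing by n - 1 (the sample variance) is an assumption of this example; a population variance would divide by N.

```python
from math import sqrt
from statistics import variance

scores = [2, 4, 4, 6, 9]  # hypothetical sample
n = len(scores)
m = sum(scores) / n

ss = sum((x - m) ** 2 for x in scores)  # sum of squares (numerator)
sample_variance = ss / (n - 1)          # divide by degrees of freedom
sample_sd = sqrt(sample_variance)

print(ss, sample_variance, round(sample_sd, 3))
print(variance(scores))  # library cross-check
```

The hand computation agrees with `statistics.variance`, which also divides by n - 1.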

F

the test statistic for an ANOVA; the mean square between groups divided by the mean square within groups; is always positive; determines how large or disproportionate the differences are between the groups, compared to within the groups alone by chance
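
The ratio can be sketched for a one-way between-subjects design as follows. The three groups of scores and the function name `f_ratio` are hypothetical.

```python
def f_ratio(groups):
    """F = mean square between groups / mean square within groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups: group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups: scores around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical scores, 3 groups
F = f_ratio(groups)
print(F)
```

Here the group means are far apart relative to the spread inside each group, so the mean square between groups (27) is much larger than the mean square within groups (1).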

mode

the value in a data set that occurs most often or most frequently; only valid option for nominal level data; should be chosen for use when the distribution is bimodal

obtained value

the value of a test statistic that is compared to the critical value(s) for a hypothesis test in order to make a decision; when this exceeds a critical value, the null hypothesis is rejected, and otherwise, the null hypothesis is retained

y-intercept (regression)

the value of the criterion variable (Y) when the predictor variable (X) equals 0

dependent variable (DV)

the variable that is believed to change in the presence of the other variable; the "presumed effect"; the results or outcomes being examined

