Statistics C797

Inference:

A conclusion about a population drawn from results based on a sample of data from that population.

Covariate:

A continuous variable used to adjust the mean scores of groups; a method for control of extraneous variation.

Scatter diagram:

A diagram that graphically represents the relationship of two ordinal-, interval-, or ratio-level variables to each other. The diagram is typically presented with correlation coefficients.

Directional hypothesis:

A directional hypothesis is a prediction regarding a positive or negative change, relationship, or difference between two variables of a population.

Bar graph:

A graph used for nominal or ordinal data. A space separates the bars.

Box plots:

A graphic display that uses descriptive statistics based on percentiles.

Bell shaped:

A graphical shape, typical of the normal distribution.

Sample:

A group selected from the population in the hope that the smaller group will be representative of the entire population.

Mean:

A measure of central tendency. It is the arithmetic average of a set of data.

Standard deviation:

A measure of dispersion of scores around the mean. It is the square root of the variance.
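
To make the relationship between the two dispersion measures concrete, here is a minimal Python sketch (NumPy assumed available; the scores are invented):

```python
import numpy as np

scores = np.array([4, 8, 6, 5, 3, 7])

variance = scores.var(ddof=1)  # sample variance (divides by n - 1)
std_dev = np.sqrt(variance)    # standard deviation = square root of the variance

print(variance, std_dev)
print(scores.std(ddof=1))      # agrees with std_dev
```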

Goodness-of-fit statistic:

A measure of how well the data fit the model; compares the observed probabilities to those predicted by the model.

Variance:

A measure of the dispersion of scores around the mean. It is equal to the standard deviation squared.

Correlation coefficient:

A measure of the extent to which the variation in one variable is related to the variation in another variable. Values range from -1 to +1.

F:

A measure of the ratio of between-group variability to within-group variability produced by analysis of variance.

Partial correlation:

A measure of the relationship between two variables after statistically controlling for the influence of some other variable(s) on both variables being correlated.

Skewness:

A measure of the degree of asymmetry of a distribution.

Kurtosis:

A measure of whether the curve is normal, flat, or peaked.

Variable:

A measured characteristic that can take on different values.

Indicator:

A measured variable in structural equation models; may also be called a manifest variable.

Nominal measure:

A measurement scale in which the numbers have no intrinsic meaning but are merely used to label different categories. Ethnic identity, religion, and health insurance status (e.g., none, Medicaid, Medicare, private) are all examples of nominal-level data.

Ratio scale:

A measurement scale in which there are both equal intervals between units and a true zero. Most biologic measures (e.g., weight, pulse rate) are ratio-level variables.

Likert scale:

A measurement scale that asks respondents to register the level to which they agree or disagree with a set of statements. There are typically five to seven response categories that range from strongly disagree to strongly agree. Other descriptors, such as level of satisfaction, may be used instead of level of agreement. Likert data are usually treated as ordinal because the responses can be rank-ordered, but they are nominal if the scale is not set up as a rank order.

Ordinal scale:

A measurement scale that ranks participants on some variable. The interval between the ranks does not necessarily have to be equal. Examples of ordinal variables are scale items that measure any subjective state (e.g., happiness: very happy, somewhat happy, somewhat unhappy, very unhappy; attitude: strongly agree, somewhat agree, somewhat disagree, strongly disagree; and military rank: general, colonel, sergeant, private).

Dummy coding (indicator coding):

A method of assigning nominal-level variables 1s and 0s to denote the presence or absence of the value.
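
A hypothetical sketch of dummy coding with pandas (assumed available; the insurance categories echo the nominal-measure example above):

```python
import pandas as pd

df = pd.DataFrame({"insurance": ["none", "Medicaid", "Medicare", "private"]})

# One 1/0 indicator column per category; drop_first=True keeps the first
# category as the reference (comparison) group.
dummies = pd.get_dummies(df["insurance"], prefix="ins", drop_first=True)
print(dummies)
```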

Indicator coding (dummy coding):

A method of coding nominal-level variables using 1s and 0s. Reflects a comparison of the control group mean with other group means.

Dichotomous variable:

A nominal variable having only two categories.

Wilcoxon matched-pairs signed rank test:

A non-parametric technique analogous to the paired t test. Used to compare paired measures. Tests for differences between matched pairs. Variables: One group measured on the same continuous DV at Time 1 and Time 2.

Kruskal-Wallis H-test:

A nonparametric alternative to ANOVA. It is used to determine the difference in the medians between three or more groups. Variables: One categorical with three or more groups; One continuous DV.
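
A minimal sketch with scipy.stats (assumed available; the group values are fabricated):

```python
from scipy import stats

group_a = [12, 15, 14, 10]
group_b = [22, 25, 19, 24]
group_c = [17, 16, 18, 20]

h_stat, p_value = stats.kruskal(group_a, group_b, group_c)
print(h_stat, p_value)  # a small p-value suggests the group medians differ
```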

Analysis of variance (ANOVA):

A parametric statistical technique used to compare the means of three or more groups as defined by one or more factors.

MANOVA - One- or two-way

A parametric statistical technique that tests differences among two or more groups on a combination of DVs. Creates a new summary DV that is a linear combination of the original DVs. Variables: one or more categorical IVs; two or more related DVs.

Analysis of variance (ANOVA) - Between-Within

A parametric statistical technique used to compare the means of three or more groups as defined by one or more factors. Variables: one between-groups categorical IV (nominal or ordinal, with two or more groups), one within-groups categorical IV (one group measured two or more times), and one continuous DV (interval or ratio data).

Analysis of variance (ANOVA) - 1 way

A parametric statistical technique used to compare the means of three or more groups as defined by one or more factors. Variables: one categorical IV (nominal or ordinal data) with three or more categories and one continuous DV (interval or ratio data).

Analysis of variance (ANOVA) - 2 way

A parametric statistical technique used to compare the means of three or more groups as defined by one or more factors. Variables: two categorical IVs (nominal or ordinal data, each with two or more categories) and one continuous DV (interval or ratio data).

Correlated t test (paired t test):

A parametric test to compare two matched pairs of scores.

Paired t test:

A parametric test to compare two sets of paired scores.

Pearson correlation coefficient:

A parametric test used to determine whether a linear association exists between two variables measured on an interval or ratio scale. The variables need to be normally distributed.
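
For illustration, a short scipy.stats sketch (x and y are made-up interval-level data):

```python
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r, p_value = stats.pearsonr(x, y)
print(r, p_value)  # r near +1 indicates a strong positive linear association
```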

Independent t test:

A parametric test used to determine whether the means of two independent groups are significantly different from each other.

Probability:

A quantitative description of the likely occurrence of a specific event, conventionally expressed on a scale from 0 to 1.

Confidence interval:

A range of values around a measurement that describes how precise the measurement is: the wider the range, the less precise the estimate.
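
A minimal sketch of a 95% confidence interval for a mean, assuming SciPy is available and using invented data:

```python
import numpy as np
from scipy import stats

data = np.array([5.1, 4.9, 6.2, 5.8, 5.5, 5.0, 6.0, 5.4])

mean = data.mean()
sem = stats.sem(data)                          # standard error of the mean
t_crit = stats.t.ppf(0.975, df=len(data) - 1)  # two-tailed 95% critical value

print(mean - t_crit * sem, mean + t_crit * sem)  # the wider, the less precise
```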

Interval-level measurement:

A rank-order scale with equal intervals between units but no true zero. IQ scores, SAT scores, and GRE scores are all examples of interval-level data.

Causal relationship:

A relationship in which one or more variables are presumed to cause changes in another variable.

Experiment:

A research study with the following characteristics: an intervention that the investigator controls and that only some groups receive, random selection of participants into the study, and random assignment of participants to intervention and control groups.

Independent random sample:

A sample in which the value of the variables for each subject is not related to the value of the variables for the other subjects and in which each subject has an equal chance of being selected to be in the study.

Phi:

A shortcut method of calculating the Pearson correlation coefficient when both variables are dichotomous.

Critical value:

A specific point on the test distribution that is compared to the test statistic to determine whether to accept or reject the null hypothesis.

Nondirectional hypothesis:

A specific statement that a difference exists between groups or a relationship exists between variables, with no specification of the direction of the difference or relationship.

Correlation matrix:

A square symmetric matrix containing correlations between pairs of variables.

Significance test:

A statistical calculation that assigns a probability to a statistical estimate; a small probability implies a significant result.

Regression:

A statistical method that makes use of the correlation between two variables and the notion of a straight line to develop a prediction equation.
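
A least-squares sketch using scipy.stats.linregress (SciPy assumed; the data are invented):

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]

result = stats.linregress(x, y)

# Prediction equation: Y-hat = a + bX
print(result.intercept, result.slope)       # a (intercept) and b (slope)
print(result.intercept + result.slope * 7)  # predicted Y for X = 7
```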

Mixed design:

A study that includes between- and within-group factors.

Frequency distribution:

A systematic array of data together with a count of the raw frequency that each value occurs, the relative frequency with which it occurs, and the cumulative frequency with which it occurs.

Logistic regression:

A technique designed to determine which variables predict the probability of an event.
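
A hedged sketch with statsmodels (assumed installed); the tiny data set is invented, and the exponentiated coefficients connect to the Exp(B) entry below:

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])  # predictor
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])  # event occurred (1) or not (0)

X = sm.add_constant(x)                   # adds the intercept term
model = sm.Logit(y, X).fit(disp=0)

print(model.params)          # b coefficients
print(np.exp(model.params))  # Exp(B): the odds ratios
```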

Biserial correlation:

A technique used when one variable is dichotomized and the other is continuous to estimate what the correlation between the two variables would be if the dichotomized variable were continuous.

Levene's test:

A test for homogeneity of variance that tests the null hypothesis that the variances of two or more distributions are equal. It is less sensitive to departures from normality than are other tests of this type. If p > .05, the variances are assumed to be similar or equal. If p < .05, the variances are NOT similar or equal.
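
In scipy.stats (assumed available) this might look like the following; the group values are invented, and center="median" is the robust variant:

```python
from scipy import stats

group_1 = [10, 12, 11, 13, 12, 11]
group_2 = [8, 15, 6, 18, 9, 16]

stat, p = stats.levene(group_1, group_2, center="median")
print(stat, p)  # p > .05: variances assumed equal; p < .05: they differ
```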

Box's M test

A test of the assumption that the variance-covariance matrices are equal across all levels of the between-subjects factor in a repeated-measures analysis of variance.

One-tailed test of significance:

A test used with a directional hypothesis that proposes extreme values are in one tail of the distribution.

Two-tailed test of significance:

A test used with a nondirectional hypothesis, in which extreme values are assumed to occur in either tail of the distribution.

Normal distribution:

A theoretical probability distribution in which the horizontal axis represents all possible values of a variable and the vertical axis represents the probability that these values will occur. Normal distributions are unimodal.

Normal curve:

A theoretically perfect frequency polygon in which the mean, median, and mode all coincide in the center, and which takes the form of a symmetrical bell-shaped curve.

Wald statistic:

A value tested for significance in logistic regression.

Extraneous variable:

A variable that confounds the relationship between the dependent variable and the independent variable.

Histogram:

A way of graphically displaying ordinal-, interval-, and ratio-level data. It shows the shape of the distribution.

Cross-tabulation:

A way of presenting the relationship between two variables in a table format; the rows and columns of the table are labeled with the values of the variables.

Type II error:

Accepting the null hypothesis when it is false.

Standard regression:

All the independent variables are entered together.

Fisher's exact test:

An alternative to chi-square for 2 × 2 tables when sample size and expected frequencies are small.

Analysis of covariance (ANCOVA) :

An analysis of variance technique that allows comparison of group means after controlling for the effect of the covariate. Used for two-group pre-test/post-test where the pre-test is treated as a covariate to 'control' for pre-existing differences between the groups. Variables: one or two categorical IVs; one continuous DV; one continuous covariate.

Multivariate analysis of variance:

An analysis of variance with more than one dependent variable.

Compound symmetry:

An assumption made when conducting a repeated-measures ANOVA. It means the correlations and variances across the measurements are equivalent.

Positively skewed distribution:

An asymmetric distribution with a disproportionate number of cases with a low value. The tail of this distribution points to the right. Also known as a right-skewed distribution.

One-way analysis of variance:

Analysis of variance with one factor (independent variable) with three or more levels.

Negative relationship:

As the value of one variable increases, the value of the other decreases. Also called an inverse relationship.

Randomization:

Assignment of individuals to groups by chance (i.e., every subject has an equal chance of being assigned to a specific group).

Negatively skewed distribution:

Asymmetric distribution that has a disproportionate number of cases with high values and a tail that points to the left. Also called a left-skewed distribution.

Empirical study:

Based on observation or experience.

Continuous variable:

Can take on any possible value within a range. Example: weight (e.g., 152.5 lb) can take on any value within the range. Non-example: number of children, which can take on only specific values (0, 1, 2, and so on); a value of 1.2 children does not make any conceptual sense.

Listwise deletion:

Cases (subjects) are dropped from analysis if they have any missing data.

Parameters:

Characteristics of the population.

Data set:

Collection of different values of all the variables used to measure the characteristics of the sample or population.

Positive relationship:

Commonly referred to as a "direct" relationship. The values of x and y increase or decrease together. As x increases, y also increases.

Percentile:

Describes the relative position of a score.

Mutually exclusive and exhaustive categories:

Each participant (e.g., item, event) can fit into one and only one category, and each participant (e.g., item, event) fits into a category.

Outliers:

Extreme values of a variable that are at the tail end of the distribution of the data. Sometimes outliers are defined as being greater than ±3.0 standard deviations from the mean.

Hypothesis:

Formal statement of the expected relationships between variables or differences between groups.

Adjusted group mean:

Group mean scores that have been adjusted for the effect of the covariate on the dependent variable.

Exp(B):

In logistic regression, the exponentiated coefficient, e^b, which is interpreted as the odds ratio.

Beta coefficients:

In a regression equation, these are the estimates resulting from an analysis performed on variables that have been standardized so that they have variances of 1.

Probability value (p-value):

In a statistical hypothesis test, the likelihood of getting the value of the statistic by chance alone.

Factor(s):

In an ANOVA model, the independent variable(s) that defines the groups whose means are being compared.

Pairwise deletion:

In correlational analyses, cases (subjects) are excluded when they are missing one of the two variables being correlated.

Multicollinearity:

Interrelatedness of independent variables.

Blinding:

Keeping subjects and observers unaware of treatment assignments.

Dependent variable:

Measures the effect of some other variable.

Coefficient of variation:

Measures the spread of a set of data as a proportion of its mean; usually expressed as a percentage.

Chi-square goodness of fit

Non-parametric test comparing a one-sample proportion to a hypothesized value. Variable: one categorical variable with two or more categories.

Chi-square for Independence

Non-parametric test of the association between two categorical variables, comparing the observed frequencies in a cross-tabulation with the frequencies expected if the variables were independent. Variables: two categorical variables with two or more categories each.
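
A sketch on an invented 2 × 2 cross-tabulation, assuming scipy.stats:

```python
from scipy import stats

observed = [[30, 10],   # rows and columns are the categories
            [20, 40]]   # of the two variables

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(chi2, p, dof)
print(expected)  # frequencies expected if the variables were independent
```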

Cochran's Q test

Non-parametric test comparing one sample over time (e.g., three or more measurement periods). Variables: three or more dichotomous variables measuring the presence or absence (Yes = 1, No = 0) of a characteristic at Time 1, Time 2, Time 3, etc.

McNemar's Test

Non-parametric test comparing one sample over time (matched, repeated measure or pre-post test); Variables: Two categorical measuring the presence or absence (Yes=1, No=0) of the same characteristic at Time 1 and Time 2

Chi-square:

Non-parametric test comparing the frequencies of categories of items in a sample to the frequencies that are expected in the population.

Kappa Measure of Agreement

Non-parametric test measuring the proportion of agreement between two raters/tests. Variables: two categorical variables with an equal number of categories (e.g., each rater scores the diagnosis as present: Yes = 1, No = 0).

Sample size:

Number of subjects included in the study.

Within-sample independence:

Observations within the sample are independent of each other.

T-Test (Student's t-test) - Independent

Parametric test for differences in the mean between two groups. Variables: one categorical IV with two levels (e.g., males, females) and one continuous DV (ratio or interval data).

T-Test (Student's t-test) - Paired

Parametric test for differences in the mean within one group at different times. Variables: one categorical IV with two levels (Time 1, Time 2) and one continuous DV (ratio or interval data).
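
A minimal scipy.stats sketch covering both t-test variants (all values invented):

```python
from scipy import stats

time_1 = [98, 102, 95, 110, 105, 99]
time_2 = [94, 99, 93, 104, 100, 97]

t_stat, p_value = stats.ttest_rel(time_1, time_2)  # paired version
print(t_stat, p_value)

# The independent version for two separate groups would be:
# stats.ttest_ind(group_1, group_2)
```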

R:

Pearson's correlation coefficient (conventionally written as a lowercase r; a capital R usually denotes the multiple correlation).

Heteroscedasticity:

Refers to situations in which the variability of the dependent variable is not equivalent across the values of the independent variable.

Type I error:

Rejecting the null hypothesis when it is true.

Eta:

Sometimes called the correlation ratio. It can be used to measure a nonlinear relationship. Eta-squared is used as the effect size in t-Tests and one-way ANOVA. The range of values is from 0 to 1.

Significance level:

Specifies the risk of rejecting the null hypothesis when it is true.

R2:

Squared multiple correlation; the amount of variance accounted for in the dependent variable by a combination of independent variables.

z-scores:

Standardized scores calculated by subtracting the mean from an individual score and dividing the result by the standard deviation; represents the deviation from the mean in a normal distribution.
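
Computed by hand in Python (NumPy assumed; the scores are invented):

```python
import numpy as np

scores = np.array([60, 70, 80, 90, 100])

z = (scores - scores.mean()) / scores.std(ddof=1)
print(z)         # deviations from the mean in standard-deviation units
print(z.mean())  # 0 by construction
```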

Parametric tests:

Statistical tests based on assumptions that the sample is representative of the population and that the scores are normally distributed. Examples include t-tests and ANOVA. These tests are used with ratio or interval data, and have more statistical power than nonparametric tests.

Nonparametric tests:

Statistical tests designed to be used when the data fail to meet one or more of the assumptions required of parametric tests. These tests are "distribution free" but usually have less power than parametric tests. Examples include Chi-Square, Fisher exact probability, Mann-Whitney, Wilcoxon, Kruskal-Wallis.

Descriptive statistics:

Statistics used to summarize and describe data.

Within-subjects designs:

Subjects serve as their own controls. Subjects are measured more than once on the same variable, or subjects are exposed to more than one treatment.

Post hoc tests:

Tests of paired comparisons made when an overall test, such as an ANOVA, is statistically significant. Post hoc tests are used to control for the problems caused by multiple comparisons. Many post hoc tests are available, including Tukey's honestly significant difference (HSD) test and the Bonferroni test.

p-value:

The actual probability of getting the obtained results or results even more extreme. The smaller the p-value, the more statistically significant (i.e., the less likely the result is due to chance).

Measurement:

The assignment of numerals to objects or events, according to a set of rules.

Meaningfulness:

The clinical or substantive meaning of the results of statistical analysis.

Coefficient of determination:

The correlation coefficient squared (r2); a measure of the variance shared by the two variables; a measure of the "meaningfulness" of the relationship.

Reliability:

The degree of consistency with which an instrument measures what it purports to measure. Reliability can be broken down into test-retest reliability, interrater reliability, and internal consistency.

Efficiency:

The degree to which the test result and the diagnosis agree, that is, the overall accuracy of a test in measuring true findings; expressed as a percentage.

Range:

The difference between the maximum and minimum values in a distribution.

Homogeneity of regression:

The direction and strength of the relationship between the covariate and the dependent variable must be similar in each group.

Population:

The entire group having some characteristic (e.g., all people with depression, all residents of the United States). Often a sample is taken of the population and then the results are generalized to that population.

Validity:

The extent to which an instrument measures what it intends to measure; the extent to which the measurements are "true."

Internal validity:

The extent to which the findings of a study truly and accurately represent the relationship between the independent variable(s) and the dependent variable.

Generalizability:

The extent to which the research findings can be applied to situations beyond those of the immediate group that was studied. The extent to which the findings can be used to make inferences about the population that the sample came from.

External validity:

The extent to which the results of a study can be generalized to other populations or settings than the sample that was studied.

Statistics:

The field of study that is concerned with obtaining, describing, and interpreting data; the characteristics of samples.

Quartile:

The four "quarters" of the data distribution. The first quartile is the 25th percentile, the second quartile is the 50th percentile, the third quartile is the 75th percentile, and the fourth quartile is the 100th percentile.

Ratio-level measurement:

The highest level of measurement. In addition to equal intervals between data points, there is an absolute zero.

Alternative hypothesis (Ha) :

The hypothesis that states a statistically significant relationship exists between the variables. It is the hypothesis opposite to the null hypothesis. It is also referred to as the "acting" hypothesis or the research hypothesis.

Null hypothesis:

The hypothesis that states that two or more variables being compared will not be related to each other (i.e., no significant relationship between the variables will be found).

Conditional probability:

The likelihood an event will occur given the knowledge that another event has already occurred.

Regression line:

The line of best fit formed by the mathematical technique called the method of least squares.

Nominal:

The lowest level of measurement; consists of organizing data into discrete units that represent "name" of something or a category.

Effect size:

The magnitude of the impact made by the independent variable on the dependent variable.

Median:

The middle value or subject in a set of ordered numbers.

Mode:

The most frequently occurring number or category.

MANCOVA

The multivariate extension of ANCOVA, in which the DVs are adjusted by one or more continuous covariates to remove their relationship from the DVs before assessing differences on the IVs. Variables: one or more categorical IVs, two or more related DVs, and one or more covariates.

Mann-Whitney U-test:

The nonparametric analogue of the independent t test. It is used to determine the statistical significance of the difference in the medians of two independent groups. Used when there are outliers. Variables: One categorical with two groups; one continuous dependent variable (DV).
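
A short sketch with scipy.stats (group values fabricated):

```python
from scipy import stats

group_1 = [3, 5, 4, 6, 7, 5]
group_2 = [9, 8, 10, 7, 11, 12]

u_stat, p_value = stats.mannwhitneyu(group_1, group_2, alternative="two-sided")
print(u_stat, p_value)
```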

n:

The number of participants in specific subgroups.

Degrees of freedom:

The number of values in a statistic that can vary given what is already known about the other values and the sum of the values.

Alpha level (α-level):

The p-value defined by the researcher as being statistically significant. It is the chance that the researcher is willing to take of committing a type I error. The most commonly used α-levels are .05, .01, and .10.

Intercept constant (a):

The point at which the regression line intercepts the Y-axis.

Ŷ (Y-hat):

The predicted score in a regression equation.

Line chart:

The preferred type of chart to show many changes over time for many periods of time, or to place emphasis on a specific factor.

Alpha:

The probability of making a type I error.

Beta (β-level):

The probability of making a type II error.

Between-group variance:

A measure of the deviation of group means from the grand mean.

Odds ratio:

The ratio of the odds of an event in one group to the odds of the event in another group, where the odds are the probability of occurrence over the probability of nonoccurrence.
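
Worked through on an invented 2 × 2 table:

```python
# Counts:        event  no event
# exposed          20        80
# unexposed        10        90

odds_exposed = 20 / 80    # probability of occurrence over nonoccurrence
odds_unexposed = 10 / 90

print(odds_exposed / odds_unexposed)  # odds ratio = 2.25
```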

Likelihood:

The probability of the observed results given the parameter estimates.

Power:

The probability that a false null hypothesis will be correctly rejected by the test. It is denoted by 1 - β.

Estimation:

The procedure for testing a model whereby a sample of observed data is used to make estimations of population parameters.

Negative predictive value:

The proportion of people who tested negative for the disease who truly do not have the disease, that is, the proportion of negative results that are "true negatives"; expressed as a percentage.

Sensitivity:

The proportion of people with disease who have a positive test result.

Positive predictive value:

The proportion of people who tested positive for the disease who truly have the disease, that is, the proportion of positive results that are "true positives"; expressed as a percentage.

Specificity:

The proportion of people without the disease who have a negative test result.
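
The four test-accuracy measures defined in this glossary (sensitivity, specificity, positive and negative predictive value) can all be computed from one invented 2 × 2 table:

```python
tp, fp = 90, 15  # test positive: with disease / without disease
fn, tn = 10, 85  # test negative: with disease / without disease

sensitivity = tp / (tp + fn)  # with disease who test positive
specificity = tn / (tn + fp)  # without disease who test negative
ppv = tp / (tp + fp)          # positives that are true positives
npv = tn / (tn + fn)          # negatives that are true negatives

print(sensitivity, specificity, ppv, npv)
```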

Interquartile range:

The range of values extending from the 25th to the 75th percentile.

Regression coefficient (b):

The rate of change in Y with a one-unit change in X.

Multiple correlation:

The relationship between one dependent variable and a weighted composite of independent variables.

Hierarchical regression:

The researcher determines the order of entry of the variables into the equation. Variables may be entered one at a time or in subsets.

Adjusted R2:

The statistical measure of how close the data are to the fitted regression line adjusted for the number of subjects and variables.

Sum of squares:

The sum of the squared deviations of each of the scores around a respective mean.

N:

The total number of participants in a study across all groups.

Multiple group comparisons:

The two most common are a priori (before the fact) and post hoc (after the fact) comparisons of group means.

Independent variable:

The variable that influences the dependent variable. In experimental designs, the treatment is manipulated.

Regression sum of squares:

The variance that is accounted for by the variables in the equation.

Graphs:

The visual representations of frequency distributions.

Control group:

Used for comparison in an experimental or quasi-experimental study.

Likelihood Ratio Test

Used to compare the goodness of fit of two models: the null and the alternative. The test is based on the likelihood ratio, which expresses how many times more likely the data are under one model than under the other.

Missing values:

Values that are missing from a variable for some participants. These values may be missing because the participant refused to answer certain questions or because certain questions do not apply to the participant (e.g., the question, "Are you pregnant?" would be missing for male study participants).

Stepwise regression:

Variables are entered into the equation based on their measured relationship to the dependent variable. Methods include forward entry, backward removal, and a combination of forward and backward called stepwise.

Within-groups variance:

Variation of scores within the respective groups; represents the error term in analysis of variance.

Central limit theorem:

When many samples are drawn from a population, the means of these samples tend to be normally distributed.
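
A quick simulation sketch (NumPy assumed): even for a skewed exponential population, the means of repeated samples cluster in a roughly normal shape:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# 10,000 samples, each of size n = 30, from a skewed population (mean = 1.0)
samples = rng.exponential(scale=1.0, size=(10_000, 30))
sample_means = samples.mean(axis=1)

print(sample_means.mean())  # close to the population mean, 1.0
print(sample_means.std())   # close to 1.0 / sqrt(30), about 0.18
```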

Homogeneity of variance:

When there are no significant differences in the variance of the values of the dependent variable within two or more groups that are being compared with each other. It is also called homoscedasticity.

Standard scores:

z-scores; represent the deviation of scores around the mean in a distribution with a mean of "0" and a standard deviation of "1."

