Chapter 12 - Analyzing Data
- estimation of parameter - hypothesis testing
Inferential statistical procedures are divided into two types
positive correlation
Correlation in which high scores for one variable are paired with high scores for the other variable, or low scores for one variable are paired with low scores for the other variable.
negative correlation
Correlation in which high scores for one variable are paired with low scores for the other variable.
probability
Likelihood that an event will occur, given all possible outcomes
negative; positive
Raw scores below the mean have a _____ deviation, whereas raw scores above the mean have a ______ deviation
Source (source of variation), SS (sums of squares), df (degrees of freedom), MS (mean square), and F (F-statistic)
5 columns in ANOVA summary
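As a rough illustration of how the five columns fit together, the sketch below builds a one-way ANOVA summary table. The helper name `anova_summary` and the scores are hypothetical, and F is taken as the between-group mean square divided by the within-group mean square.

```python
def anova_summary(groups):
    """Return the rows of a one-way ANOVA summary table (hypothetical helper)."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]

    # SS: sums of squares for each source of variation
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    # df: degrees of freedom for each source
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)

    # MS: mean square = SS / df; F = MS between / MS within
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_stat = ms_between / ms_within

    return [
        ("Between groups", ss_between, df_between, ms_between, f_stat),
        ("Within groups", ss_within, df_within, ms_within, None),
    ]


# Made-up scores for three independent groups
rows = anova_summary([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"{'Source':<16}{'SS':>8}{'df':>4}{'MS':>8}{'F':>8}")
for source, ss, df, ms, f in rows:
    f_text = "" if f is None else f"{f:.2f}"
    print(f"{source:<16}{ss:>8.2f}{df:>4}{ms:>8.2f}{f_text:>8}")
```

Only the between-groups row carries an F value, which is why the F column holds a single number.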
negative
A (positive/ negative) correlation reflects an inverse relationship between two variables.
central tendency; dispersion
A ________ descriptive stat will approximate the location of the center of the data while a ________ descriptive stat will depict the variability or spread
Pearson Product-Moment Correlation
A common correlational technique researchers use to examine the relationship between two variables using ratio/interval data
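A minimal sketch of the Pearson coefficient, assuming two equal-length lists of interval/ratio scores. The function name `pearson_r` and the data are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of the deviated scores
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviated scores for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


hours_studied = [1, 2, 3, 4, 5]   # hypothetical data
exam_scores = [52, 60, 61, 70, 75]
errors_made = [9, 8, 6, 5, 2]

print(pearson_r(hours_studied, exam_scores))  # positive: high pairs with high
print(pearson_r(hours_studied, errors_made))  # negative: high pairs with low
```

The sign gives the direction of the relationship; the absolute value gives its strength.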
skewed distribution
A distribution of scores with a few outlying observations in either direction.
mean
A measure of central tendency (aka descriptive statistic) that is calculated by adding up all of the scores and dividing the sum by the number of scores
mean
A measure of central tendency calculated by summing a set of scores and dividing the sum by the total number of scores
correlation
A measure that defines the relationship between two variables
level of confidence
A probability level in which the null hypothesis can be rejected with confidence and the research hypothesis can be accepted with confidence
confidence interval
A range of values that has some specified probability (e.g., 0.95 or 0.99) of including a particular population parameter.
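One way to picture an interval estimate is the sketch below, which builds a confidence interval around a sample mean. It assumes a normal approximation (small samples would usually use the t distribution instead); the function name `mean_confidence_interval` and the scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(scores, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Assumption: normal approximation, so this is only a rough sketch
    for small samples.
    """
    n = len(scores)
    sample_mean = mean(scores)
    standard_error = stdev(scores) / sqrt(n)
    # z value that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * standard_error, sample_mean + z * standard_error


scores = [72, 75, 78, 80, 85, 90, 68, 74]  # made-up sample
print(mean_confidence_interval(scores, 0.95))
print(mean_confidence_interval(scores, 0.99))  # more confidence, wider interval
```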
correlation matrix
A specialized form of a correlation table that presents all possible combinations of the study variables
Analysis of variance (ANOVA)
An inferential procedure used to determine whether there is a significant difference among three or more group means
- The dependent variable being measured on an interval/ratio scale that is normally distributed in the population - Groups being mutually exclusive (independent of each other)
General assumptions associated with parametric procedures include
less
If a number of samples are randomly drawn from a target population, the mean varies (more/ less) compared with the median and mode.
larger
If the between-group variance is (smaller/larger) than the within-group variance, scores in the different groups are far enough apart for a researcher to conclude that the group means are significantly different
smaller than or about the same as
If the between-group variance is ________ the within-group variance, the group means are not significantly different.
assumptions associated with a t test include those associated with parametric procedures: - dependent variable being measured on an interval/ratio scale that is normally distributed in the population - groups being independent of each other AND - both groups having similar variances
Assumptions with a t test
homogeneity of variance
Situation in which the variances of the dependent variable do not differ significantly between or among groups.
descriptive statistics
Statistics that describe and summarize data.
inferential statistics
Statistics that generalize findings from a sample to a population.
- arrange all scores in rank order - odd # = middle number is median - even #= add the two closest middle points and divide by two
Steps in order to calculate the median
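The steps above can be sketched as a short function. The name `median_score` is made up for illustration.

```python
def median_score(scores):
    """Median following the rank-order steps: sort, then take the middle."""
    ordered = sorted(scores)          # arrange all scores in rank order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the middle score
    # even count: average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_score([7, 3, 5]))        # odd number of scores
print(median_score([7, 3, 5, 4]))     # even number of scores
```

With an even number of scores the result may not be one of the scores in the distribution.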
true
T/F Distribution of scores can have more than one mode
TRUE! (if the data set has an even number of scores, you have to add the two middle scores and divide by two)
T/F The median is not necessarily one of the scores in the distribution.
FALSE!
T/F The median is sensitive to extreme scores
false (it is based on only two values in the distribution)
T/F The range is a stable measurement
true!
T/F The range is extremely sensitive to outliers
false
T/F The standard deviation (SD) is not sensitive to outliers
true
T/F The standard deviation (SD) takes into account all of the scores in the distribution
x̄ ; ∑
The symbol that represents the mean is ______; the symbol to denote sum of values is _____
median
To describe a skewed distribution, the researcher chooses the (median/mode/mean) to give a balanced picture of the extreme scores or outliers
measures of central tendency and measures of dispersion
Two groups of procedures that are found within descriptive statistics
descriptive; inferential
Two groups of statistical procedures that can be applied to a study to describe, analyze, and interpret quantitative data
two-way ANOVA
Type of ANOVA in which there are two independent variables and several levels
one-way ANOVA
Type of ANOVA in which there is one independent variable with several levels
nonparametric
Type of statistical test that makes no assumptions about the shape of the distribution and is referred to as a "distribution-free" test.
- range - variance - standard deviation - percentile - interquartile range
Types of Measures of dispersion
- mean - median - mode
Types of measures of central tendency that describe the location or approximate the center of a distribution of data
deviated scores
What are variance and standard deviation based on?
both groups must have similar variances with respect to the dependent variable
What is homogeneity of variance with t test?
always somewhere in between the mode and the mean
Where is the median located for a skewed distribution?
the larger the F stat, the greater the probability of finding a statistically significant difference among group means
How does the F stat relate to the probability of finding a statistically significant difference
Mean square of between-group variance / mean square of within-group variance
How is the F value calculated
mean is "pulled" in the direction of those extreme values.
How is the mean affected by outliers (a data point isolated from other data points)?
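A quick check with made-up scores shows the pull: adding one extreme value shifts the mean noticeably while the median stays put.

```python
from statistics import mean, median

scores = [10, 11, 12, 12, 13]        # hypothetical scores, no outlier
with_outlier = scores + [60]         # same scores plus one extreme value

print("mean:  ", mean(scores), "->", round(mean(with_outlier), 2))
print("median:", median(scores), "->", median(with_outlier))
```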
examined to determine whether between-group variance is greater than within-group variance
How is variation examined in ANOVA?
on a nominal scale
How would an IV be measured in ANOVA?
separate variance t test (takes into account the fact that the variances are not equal)
T test that is used if a violation of the assumption of homogeneity of variance exists and the variances are significantly different
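A sketch of a t statistic that does not pool the variances, commonly known as Welch's t test. The function name `separate_variance_t` and the data are hypothetical; the degrees of freedom use the Welch-Satterthwaite approximation.

```python
from math import sqrt
from statistics import mean, variance


def separate_variance_t(group_a, group_b):
    """t statistic and approximate df when group variances are unequal."""
    n_a, n_b = len(group_a), len(group_b)
    # Each group's sample variance (n - 1 denominator) divided by its own n
    term_a = variance(group_a) / n_a
    term_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(term_a + term_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (term_a + term_b) ** 2 / (
        term_a ** 2 / (n_a - 1) + term_b ** 2 / (n_b - 1)
    )
    return t, df


# Made-up groups: the second is far more spread out than the first
t, df = separate_variance_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 2))
```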
true
T/F Because the mode requires only a frequency count, it can be applied to any set of data at the nominal, ordinal, or interval-ratio level of measurement.
true
T/F It is not uncommon for researchers to use parametric procedures for data that are measured on an ordinal scale or data that violate the assumption of normality.
true
T/F Range does not take into account variations in scores between extremes
true (take absolute value)
T/F The direction of the relationship does not affect the strength of the relationship
true
T/F The variance takes into account all of the scores in the distribution
true
T/F The variance is sensitive to outliers
false (the median is based on ranks, and nominal data cannot be ranked; nominal data are things like marital status or religious affiliation)
T/F You should apply median to nominal set of data
mode
The (median/mode/mean) is always used in nominal data
range
The (range, SD, variance) although quick and simple to obtain, is not very reliable
SD
The (range, SD, variance) is based on the squared deviated scores and, by taking the square root, returns them to their original units of measure
research design; type of data collected
The ______ and _______ determine selection of appropriate statistical procedures
estimation of parameters
Type of inferential stat that is based on data collected from a study sample, evaluated by a single number or an interval
inferential statistics
Type of statistic that focuses on the process of selecting a sample and using the information to make generalizations to a population
dependent/matched t test
Type of t test that tests for situations in which scores in the first group can be paired with a score in the second group and the means of the two related groups are compared
dependent/ matched t test
Type of t test used in pretest-posttest design, in which a single group of subjects is measured twice
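For a pretest-posttest design, the t statistic can be sketched from the paired differences. The function name `dependent_t` and the scores are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def dependent_t(first, second):
    """t statistic and df for paired scores (same subjects measured twice)."""
    # Work with the difference within each pair, not the raw scores
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


pretest = [10, 12, 9, 11]     # hypothetical scores
posttest = [12, 14, 10, 15]
t, df = dependent_t(pretest, posttest)
print(round(t, 3), df)
```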
accuracy of a statistic and test a hypothesis
What does probability help to evaluate?
how severely the assumptions are broken
What does the degree of risk for a robust procedure depend on?
Is the difference between the two sample means large enough to conclude that the population means are different from one another?
What does the statistical question become with a T test?
allows researchers to make inferences about the larger population; normally you can't sample everyone in a population
What does the use of inferential stats do for a researcher? Why is this important?
the average squared deviation from the mean
What does variance mean mathematically?
symmetrical about the mean and is unimodal
What is a normal curve?
probability
What is central to hypothesis testing?
You have to have numbers! Otherwise you can't add or divide.
What is needed to calculate the mean?
OUTLIERS! Sensitive to extreme values
What is the mean of a data set extremely sensitive to?
answer research questions and/or test hypotheses
What is the purpose of data analysis?
relative positions remain constant in moving away from the peak toward the tail (i.e. mode towards the peak, median ALWAYS in the middle, and mean towards the tail)
What remains constant regarding the mean, median, and mode for skewed data?
in experimental and quasi-experimental designs
When are t tests commonly used?
nominal or ordinal data
When is a nonparametric statistical test usually used?
when both variables are ratio/interval measures
When is Pearson Product-Moment Correlation used?
nominal data
When is mode an appropriate measure of central tendency to use?
when the data represent either an interval or ratio scale
When is the mean the preferred measure of central tendency to use?
nominal
What sort of data does chi-square analysis use?
measures of central tendency
Subtype of descriptive statistics that describe the location or approximate the center of a distribution of data
Point estimation
Type of estimation of parameter in which the estimate consists of a single value
inferential statistics
Group of statistical procedures that will make generalizations about populations based on data collected from samples
If randomization is used
How can independence be met for parametric procedures?
Each deviated score is squared, so that all numbers become positive
How do researchers stop the deviation sum from being zero?
off-centered peaks and longer tails in one direction
How does a Skewed Distribution look?
outliers
A data point isolated from other data points
symmetrical distribution
A distribution of scores in which the mean, median, and mode are all the same.
median (50th percentile)
A measure of central tendency that is the middle score or midpoint of a distribution
range
A measure of variability that is the difference between the lowest and highest values in a distribution.
range
A measurement of dispersion that is the simplest. It is calculated by subtracting the lowest score in the distribution from the highest score
chi-square analysis
A nonparametric procedure used to assess whether a relationship exists between two nominal-level variables; symbolized as χ²
parameter
A numerical characteristic of a population (e.g., population mean, population standard deviation)
population; sample
A parameter is a characteristic of a ____, whereas a statistic is a characteristic of a _____.
Analysis of Variance (ANOVA)
A parametric procedure used to test whether there is a difference among three or more group means.
t- test
A popular parametric procedure for assessing whether two group means are significantly different from one another.
robust procedure/statistic
A statistical procedure that is appropriate even when its assumptions have been violated
F stat
A three- or four-digit number that indicates the size of the difference between the groups relative to the size of the variation within each group.
t-test
An inferential statistical procedure used to determine whether the means of two groups are significantly different
close to the low score values
Because the mean lies closest to the tail in a skewed distribution, the mean score in negatively skewed distributions lies toward what score values?
the high score values
Because the mean lies closest to the tail in a skewed distribution, the mean score in positively skewed distributions lies toward what score values?
NO, only for symmetric distributions
Can you use SD as a dispersion for skewed data?
symmetrical distribution
Data distribution in which the mean, median, and mode are all the same
skewed distribution
Data distribution that has outlying observations in either direction, above or below the mean
outlier
Data point isolated from other data points; extreme score in a data set
unimodal; bimodal
Data with a single mode are called _____; data with two modes, _____
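A small sketch using `statistics.multimode` (Python 3.8+) to label a distribution by its number of modes. The helper name `describe_modes` and the "multimodal" fallback label are additions for illustration.

```python
from statistics import multimode


def describe_modes(scores):
    """Return the mode(s) and a label based on how many there are."""
    modes = multimode(scores)  # every value tied for the highest frequency
    labels = {1: "unimodal", 2: "bimodal"}
    # Caveat: if every score occurs equally often, all values are returned
    # and the fallback label below applies.
    return modes, labels.get(len(modes), "multimodal")


print(describe_modes([1, 2, 2, 3]))        # one most frequent score
print(describe_modes([1, 2, 2, 3, 3, 4]))  # two scores tie for most frequent
```

Because it only needs a frequency count, the same function works on nominal labels such as `["A", "B", "B"]`.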
measures of dispersion
Descriptive statistics that depict the spread or variability among a set of numerical data.
measures of central tendency
Descriptive statistics that describe the location or approximate center of a distribution of data.
variance
Despite its relative stability, the (range, SD, variance) is not widely used because it cannot be employed in many statistical analyses.
degree of freedom
Element that is often reported with a t test that represents the freedom of a score's value to vary given what is known about the other scores and the sum of scores
Descriptive statistics
Group of statistical procedures that will describe, organize, and summarize data
variance
Measure of variability, which is the average squared deviation from the mean.
takes into account each score in the distribution
One major characteristic of the mean
it DOES NOT take into account each score in the distribution
One major characteristic of the median
level of confidence
Probability level in which the research hypothesis is accepted with confidence. A 0.05 level of confidence is the standard among researchers.
robust
Referring to results from statistical analyses that are close to being valid even though the researcher does not rigidly adhere to assumptions associated with parametric procedures.
mode
Regardless of the purpose of the study, the researcher uses the (median/mode/mean) as a preliminary indicator of central tendency
level of measurement, shape, form of the distribution
Researchers choose a measure of central tendency depending on the ________, __________, or __________ of data and research objective.
Measures of dispersion
Subtype of Descriptive statistics that depict the spread or variability among a set of numerical data
0
The closer the coefficient is to ___, the lower or weaker the correlation
-1 or +1
The closer the coefficient is to ____, the higher or stronger the correlation.
deviated score
The difference between a raw score and the mean of that distribution (x = X − x̄)
number of groups being compared (ANOVA is 3 or more; t-test is 2)
The difference between the t-test and ANOVA
between-group variance
The distance among group means
probability (p)
The likelihood that an event will occur, given all possible outcomes
validity
The magnitude of p does not indicate the amount of _____ associated with each research hypothesis
less powerful than parametric tests
The major drawback to nonparametric tests
do not
The mean, median, and mode (do /do not) coincide in skewed distributions
do
The mean, median, and mode (do /do not) coincide in symmetrical distributions
mode
The measure of central tendency that is the most frequently occurring score in a distribution
median (because it always falls between the mean and the mode)
The most appropriate measure of central tendency for describing a skewed distribution
standard deviation (SD)
The most commonly reported measure of dispersion and the most stable
chi-square
The most commonly reported nonparametric statistic that compares the frequency of an observed occurrence (actual number in each category) with the frequency of an expected occurrence (based on theory or past experience).
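The observed-versus-expected comparison can be sketched as below, assuming the expected frequencies are supplied by the researcher. The function name `chi_square` and the counts are hypothetical.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one count per category")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # degrees of freedom for a single set of categories
    return statistic, df


observed = [18, 22, 20]   # actual number counted in each category (made up)
expected = [20, 20, 20]   # frequencies expected from theory (assumed)
print(chi_square(observed, expected))
```

The larger the gap between observed and expected counts, the larger the statistic.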
standard deviation (SD)
The most frequently used measure of variability; the distance a score varies from the mean.
single value in the F column
The most important number in the ANOVA summary column
MEAN !
The most stable/ reliable measure of central tendency and is the most commonly used
-1.0 to +1.0
The range for correlation coefficient
mode
The score or value that occurs most frequently in a distribution; a measure of central tendency used most often with nominal-level data.
hypothesis testing
The second type of inferential statistical procedure, which involves stating a hypothesis concerning the parameter, sampling the population of interest, and making objective decisions about the sample results of the study.
standard deviation
The square root of the variance
0
The sum of the deviated scores in a distribution is always equal to...
variance
The sum of the squared deviations divided by the number of scores
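A worked sketch of the definition, dividing by the number of scores as stated on the card (many textbooks divide by N - 1 for a sample instead). The function name `variance_and_sd` and the scores are made up.

```python
from math import sqrt


def variance_and_sd(scores):
    """Deviated scores, variance (divide by N), and standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]       # deviated scores
    # Squaring makes every deviation positive, so the sum is no longer zero
    variance = sum(d ** 2 for d in deviations) / n
    return deviations, variance, sqrt(variance)   # SD = square root of variance


deviations, variance, sd = variance_and_sd([2, 4, 4, 4, 5, 5, 7, 9])
print(sum(deviations))  # deviated scores cancel out
print(variance, sd)
```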
parametric statistical test
Type of statistical test that requires assumptions to be met for statistical findings to be valid
independent t test
Type of T test that is used when scores in one group have no logical relationship with scores in the other group.
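A minimal sketch of the independent t statistic, assuming the two groups have similar variances so a pooled variance can be used. The function name `independent_t` and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and df for two unrelated groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its own df
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


control = [1, 2, 3, 4, 5]      # made-up scores
treatment = [3, 4, 5, 6, 7]
t, df = independent_t(control, treatment)
print(round(t, 3), df)
```

The question the statistic answers is whether the gap between the two sample means is large relative to the variation within the groups.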
negative correlation (0 to -1)
Type of correlation in which low scores on one variable are paired with high scores on the other variable
positive correlation (0 to +1)
Type of correlation in which high scores on one variable are paired with high scores on the other variable
interval estimate
Type of estimation of parameter in which the estimate consists of an interval; need to specify the amount of confidence
when the data have a skewed distribution, meaning there is an outlier in either direction (because the median is not affected by the outliers that skew the data as much as the other measures would be)
When is the median an appropriate measure of central tendency to use?
level of confidence
When looking at probability, what must be established in order to determine if the outcome is statistically significant?
closest to the tail of the curve (where the relatively few extreme scores are located; the mean is an average and the average will be thrown off accordingly by the outlier)
Where is the mean located for a skewed distribution?
closest to the peak of the curve (this is where the most frequent scores are found and the mode is a measure of the most frequent number)
Where is the mode located for a skewed distribution?
median
Which measure of central tendency is an ordinal statistic based on ranks?
Used to assess whether a relationship exists between 2 categorical values (used for variables that are on a nominal scale)
Why is chi-square used?
it takes into account every single score in the distribution
Why is the mean a more precise and stable measure than the median or mode?
Statistic
A descriptive index calculated from sample data as an estimate of a population parameter