Data Analysis: Descriptive and Inferential Statistics - Chapter 14
unimodal
1 peak
Index of variability includes...
1. Range: highest value minus lowest value; 2. Standard deviation (SD): average deviation of scores in a distribution
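A minimal sketch of both indexes, using Python's standard `statistics` module. The scores are hypothetical, invented only for illustration. Note that the SD can be computed by dividing by N (population form) or by N - 1 (sample form); which one a textbook means is an assumption to check.

```python
import statistics

# Hypothetical scores, invented for illustration only
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest value minus lowest value
score_range = max(scores) - min(scores)

# Standard deviation: typical distance of scores from the mean
sd_population = statistics.pstdev(scores)  # divides by N
sd_sample = statistics.stdev(scores)       # divides by N - 1

print("range:", score_range)
print("SD (population form):", sd_population)
print("SD (sample form):", round(sd_sample, 3))
```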
Inferential Statistics
1. Used to make objective decisions about population parameters using sample data; 2. Based on laws of probability; 3. Uses the concept of theoretical distributions
Descriptive Statistics Critical Thinking Decision Path (p.311)
1. What type of measurement is being used? 2. If NOMINAL, use frequency distribution, range, and mode. 3. If ORDINAL, use range, percentile, rank order coefficients of correlation, mode, and median. 4. If INTERVAL, use mean, median, mode, standard deviation, range, and percentile. 5. If RATIO, use mode, median, mean, range, percentile, and standard deviation.
Hypothesis-Testing Procedures
1. Select an appropriate test statistic. 2. Establish the significance criterion (e.g., α = .05). 3. Compute the test statistic with actual data. 4. Calculate degrees of freedom (df) for the test statistic. 5. Obtain a critical value for the statistical test (e.g., from a table). 6. Compare the computed test statistic to the tabled value. 7. Make the decision to accept (more precisely, fail to reject) or reject the null hypothesis.
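The steps above can be sketched for one case: an independent-groups t-test with pooled variance. The two groups of scores are hypothetical, and the critical value 2.306 is the standard tabled value for a two-tailed test at α = .05 with df = 8. This is a sketch under those assumptions, not the textbook's own worked example.

```python
import math
import statistics

# Hypothetical scores for two independent groups (illustration only)
group_a = [5, 7, 8, 6, 9]
group_b = [3, 4, 6, 5, 2]

# Step 1: test statistic chosen = independent-groups t (pooled variance)
# Step 2: significance criterion
alpha = 0.05

# Step 3: compute the test statistic with the data
n_a, n_b = len(group_a), len(group_b)
var_a = statistics.variance(group_a)  # sample variance (N - 1)
var_b = statistics.variance(group_b)
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_stat = (statistics.mean(group_a) - statistics.mean(group_b)) / std_error

# Step 4: degrees of freedom
df = n_a + n_b - 2

# Step 5: critical value from a t table (two-tailed, alpha = .05, df = 8)
critical_t = 2.306

# Steps 6-7: compare and decide
reject_null = abs(t_stat) > critical_t
print(f"t({df}) = {t_stat:.2f}, reject null: {reject_null}")
```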
t-test
1. Tests the difference between two means. 2. Two forms: t-test for independent groups (between-subjects test), e.g., means for men vs. women; t-test for dependent (paired) groups (within-subjects test), e.g., means for patients before and after surgery, or differences within couples in bereavement reactions.
bimodal
2 peaks
multimodal
2+ peaks
frequency distribution
A systematic arrangement of numeric values on a variable from lowest to highest, and a count of the number of times (and/or percentage) each value was obtained. Frequency distributions can be described in terms of shape, central tendency, and variability. They can be presented in a table (Ns and percentages) or graphically (e.g., frequency polygons).
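A minimal sketch of a frequency distribution presented as a table of counts and percentages. The scores are hypothetical; `freq_table` is just an illustrative name.

```python
from collections import Counter

# Hypothetical scores on one variable (illustration only)
scores = [3, 1, 2, 3, 3, 2, 5, 4, 3, 2]

counts = Counter(scores)
n = len(scores)

# Arrange values from lowest to highest with count (N) and percentage
freq_table = [(value, counts[value], 100 * counts[value] / n)
              for value in sorted(counts)]

print("value  N   percent")
for value, count, pct in freq_table:
    print(f"{value:>5} {count:>2} {pct:>8.1f}%")
```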
Contingency Table
A two-dimensional frequency distribution; frequencies of two variables are cross-tabulated. "Cells" at the intersection of rows and columns display counts and percentages. Variables are usually nominal or ordinal.
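A small sketch of cross-tabulating two nominal variables. The records (gender by smoking status) are hypothetical, and percentages here are of the total N; row or column percentages are equally common choices.

```python
from collections import Counter

# Hypothetical (gender, smoker) observations, illustration only
records = [
    ("female", "yes"), ("female", "no"), ("male", "yes"), ("male", "yes"),
    ("female", "no"), ("male", "no"), ("female", "no"), ("male", "yes"),
]

cells = Counter(records)                      # count per (row, column) cell
rows = sorted({row for row, _ in records})
cols = sorted({col for _, col in records})
n = len(records)

# Each cell shows a count and its percentage of the total
for row in rows:
    for col in cols:
        count = cells[(row, col)]
        print(f"{row:>6} / smoker={col:<3}: {count} ({100 * count / n:.1f}%)")
```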
continuous variable
If a variable can take on any value between its minimum value and its maximum value, it is called a continuous variable; otherwise, it is called a discrete variable. An example clarifies the difference. Suppose the fire department mandates that all firefighters must weigh between 150 and 250 pounds. The weight of a firefighter would be an example of a continuous variable, since a firefighter's weight could take on any value between 150 and 250 pounds.
Pearson correlation coefficient (Pearson r; Pearson product moment correlation coefficient) (p.327)
Pearson's r is both a descriptive and an inferential statistic. It ranges from -1 (perfect negative relationship) through 0 (no relationship) to +1 (perfect positive relationship). As an inferential statistic, it tests that the relationship between two variables is not zero.
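A sketch of computing Pearson's r by hand, plus the t statistic commonly used to test whether r differs from zero (t = r·√(n − 2) / √(1 − r²), with n − 2 degrees of freedom). The data and the function name `pearson_r` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical paired observations (illustration only)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = pearson_r(hours, score)

# Inferential use: t statistic for testing that r is not zero
n = len(hours)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(f"r = {r:.3f}, t({n - 2}) = {t_stat:.3f}")
```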
Shapes of distribution
Symmetry: a distribution is either symmetric or skewed (asymmetric). Positive skew: the long tail points to the right. Negative skew: the long tail points to the left.
analysis of variance (ANOVA) (slide 38)
Tests the difference between more than 2 means. One-way ANOVA (e.g., 3 groups): e.g., test of competency ratings among patients with 3 TBI severity levels (mild, moderate, severe). Multifactor (e.g., two-way) ANOVA: e.g., test of competency ratings among patients with 3 TBI levels and by gender. Repeated measures ANOVA (RM-ANOVA), within subjects: e.g., test of competency ratings among patients with 3 TBI levels and at three different times post injury.
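A rough sketch of the one-way ANOVA case: the F statistic is the between-groups mean square divided by the within-groups mean square. The competency ratings for the three TBI severity groups are hypothetical numbers made up for illustration.

```python
# Hypothetical competency ratings by TBI severity (illustration only)
groups = {
    "mild": [8, 9, 7],
    "moderate": [6, 5, 7],
    "severe": [3, 4, 2],
}

all_scores = [s for g in groups.values() for s in g]
grand_mean = sum(all_scores) / len(all_scores)

# Between-groups sum of squares: group means vs. grand mean
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                 for g in groups.values())

# Within-groups sum of squares: scores vs. their own group mean
ss_within = sum((s - sum(g) / len(g)) ** 2
                for g in groups.values() for s in g)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```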
t statistic (slide 37)
Tests the difference between two means. t-test for independent groups (between-subjects test): e.g., means for men vs. women. t-test for dependent (paired) groups (within-subjects test): e.g., means for patients before and after surgery, or differences within couples in bereavement reactions.
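A sketch of the dependent (paired) case, where the t statistic is computed on the within-person differences. The before/after scores are hypothetical, and 2.776 is the standard tabled critical value for a two-tailed test at α = .05 with df = 4.

```python
import math
import statistics

# Hypothetical scores for the same 5 patients (illustration only)
before = [8, 7, 9, 6, 8]
after = [6, 5, 8, 5, 5]

# Paired test works on the difference within each pair
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
std_error = statistics.stdev(diffs) / math.sqrt(n)  # sample SD of differences

t_stat = mean_diff / std_error
df = n - 1

critical_t = 2.776  # two-tailed, alpha = .05, df = 4 (from a t table)
print(f"t({df}) = {t_stat:.3f}, significant: {abs(t_stat) > critical_t}")
```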
chi-square (χ²) (slide 39)
Tests the difference in proportions in categories within a contingency table. Compares observed frequencies in each cell with expected frequencies, that is, the frequencies expected if there were no relationship.
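A sketch of the observed-versus-expected comparison for a 2 × 2 table. The observed counts are hypothetical. Each expected frequency is (row total × column total) / N, and 3.841 is the standard tabled critical value for α = .05 with df = 1.

```python
# Hypothetical observed counts in a 2 x 2 contingency table
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Frequency expected if the two variables were unrelated
        expected = row_totals[i] * col_totals[j] / n
        chi_sq += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.841  # alpha = .05, df = 1 (from a chi-square table)
print(f"chi-square({df}) = {chi_sq:.2f}, significant: {chi_sq > critical_value}")
```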
Bivariate Descriptive Statistics
Used for describing the relationship between two variables (in quantitative research only). Two common approaches: contingency tables (crosstabs) and correlation coefficients.
parameter (p.318, slide 15)
a characteristic of a population
statistic (p.318)
a characteristic of a sample
statistic
a descriptor for a sample (e.g., the average age of menses for female students at McGill University)
multivariate statistics (p.328)
a form of statistics encompassing the simultaneous observation and analysis of more than one outcome variable.
example of ratio scale
a person's weight (a person who weighs 200 pounds is twice as heavy as someone who weighs 100 pounds)
factor analysis
a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in four observed variables mainly reflect the variations in two unobserved variables.
multivariate analysis of variance (MANOVA)
a statistical test procedure for comparing multivariate (population) means of several groups. As a multivariate procedure, it is used when there are two or more dependent variables, although statistical reports provide individual p-values for each dependent variable in order to test for statistical significance. It helps to answer: Do changes in the independent variable(s) have significant effects on the dependent variables? What are the interactions among the dependent variables? And among the independent variables?
Fisher's exact probability test
a test used to compare frequencies when samples are small and expected frequencies are less than six in each cell
factor analysis (p.328)
a type of validity that uses a statistical procedure for determining the underlying dimensions or components of a variable
categorical variable
a variable that can take on one of a limited, and usually fixed, number of possible values, thus assigning each individual to a particular group or "category."
effect size (slide 42)
an important concept in POWER ANALYSIS; effect size indexes summarize the STRENGTH of the relationship between variables or of the difference between two groups. In a comparison of two group means (i.e., in a t-test situation), the effect size index is d. By convention: d ≤ .20, small effect; d = .50, moderate effect; d ≥ .80, large effect.
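A sketch of the d index for two group means, computed here as the mean difference divided by the pooled standard deviation (one common form of Cohen's d; other variants exist). The scores are hypothetical.

```python
import math
import statistics

# Hypothetical scores for two groups (illustration only)
group_a = [5, 7, 8, 6, 9]
group_b = [3, 4, 6, 5, 2]

n_a, n_b = len(group_a), len(group_b)
pooled_var = ((n_a - 1) * statistics.variance(group_a)
              + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
pooled_sd = math.sqrt(pooled_var)

# d: difference between means expressed in standard deviation units
d = (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

is_large = abs(d) >= 0.80  # conventional threshold for a large effect
print(f"d = {d:.2f}, large effect: {is_large}")
```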
Inferential statistics are used to...
analyze the data collected, test hypotheses, and answer the research questions in a research study (p.310)
logistic regression
analyzes relationships between two or more independent variables and a nominal dependent variable; yields an odds ratio (OR), the odds of an outcome occurring given one condition versus the odds of it occurring given a different condition. The OR is calculated after first removing (statistically controlling) the effects of confounding variables.
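A sketch of the odds ratio itself, computed from a hypothetical 2 × 2 table. This is the crude (unadjusted) OR; the statistical control of confounding variables that logistic regression provides is not shown here.

```python
# Hypothetical counts (illustration only)
exposed_with_outcome = 20
exposed_without_outcome = 80
unexposed_with_outcome = 10
unexposed_without_outcome = 90

# Odds = number with the outcome / number without it
odds_exposed = exposed_with_outcome / exposed_without_outcome
odds_unexposed = unexposed_with_outcome / unexposed_without_outcome

# Crude odds ratio: odds under one condition vs. the other
odds_ratio = odds_exposed / odds_unexposed
print(f"OR = {odds_ratio:.2f}")
```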
measures of variability (p.316)
answer questions such as "is the sample homogeneous or heterogeneous?"
standard deviation or SD (p.318)
based on the concept of the normal curve, it is a measure of the average deviation of the scores from the mean
nominal
categories that are not more or less, but are different from one another in some way; mutually exclusive and exhaustive categories
nominal (p.312)
classification, categorical; the categories are mutually exclusive (the variable either has or does not have the characteristic), the numbers assigned to each category are only labels
inferential statistics (p.310)
statistical procedures that allow researchers to estimate how reliably they can make predictions and generalize findings based on data
measures of central tendency (p.314)
describe the pattern of response among a sample, including mean, median, and mode
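A quick sketch of the three measures using Python's standard `statistics` module; the scores are hypothetical.

```python
import statistics

# Hypothetical scores (illustration only)
scores = [2, 3, 3, 4, 5, 7, 11]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle score
mode_score = statistics.mode(scores)      # most frequent score

print(mean_score, median_score, mode_score)
```

Here the mean is pulled above the median by the one high score (11), which is typical of a positively skewed set of scores.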
type II error (p.321, slide 34)
failure to reject a null hypothesis when it should be rejected; a false-negative result. The risk of this error is beta (β).
ratio scale measurement
highest level of measurement; a continuum of values with an absolute zero point (e.g., from sea level)
peakedness
how sharp the peak is
central tendency
index of "typicalness" of a set of scores that comes from center of the distribution
semiquartile range (p.318)
indicates the range of the middle 50% of the scores
multiple regression (p.328)
measure of the relationship between one interval-level dependent variable and several independent variables; canonical correlation is used when there is more than one dependent variable
Descriptive techniques include....
measures of central tendency, such as mean, median, and mode; measures of variability, such as range and standard deviation (SD); and correlation techniques, such as scatter plots. (p.310)
nonparametric statistics (p.323)
not based on the estimation of population parameters, so they involve less restrictive assumptions
modality
number of peaks
dichotomous variable (p.312)
only has two true values, such as true/false or yes/no
** Tests of Relationships
p.326
** Critiquing Criteria
p.330
** Key Terms
p.332
descriptive statistics (p.310)
procedures that allow researchers to describe and summarize data
How would heterogeneity show itself in members of a class?
races, religions, ethnicity, gender, geographic area, etc.
interval (p.312)
rank ordering with EQUAL INTERVALS
ratio (p.312)
rank ordering with equal intervals and absolute zero
null hypothesis
refers to a general statement or default position that there is no relationship between two measured phenomena.
type I error (p.321)
rejection of a null hypothesis when it should not be rejected; a false-positive result. The risk of this error is controlled by the level of significance (alpha).
ordinal (p.312)
relative rankings
percentile (p.318)
represents the percentage of cases the score exceeds
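A sketch of that definition: the percentile of a score is taken here as the percentage of cases falling strictly below it. The scores and the function name `percentile_rank` are hypothetical, and other percentile conventions (for example, counting ties as half) exist.

```python
def percentile_rank(score, scores):
    """Percentage of cases that the given score exceeds."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical test scores (illustration only)
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

rank_85 = percentile_rank(85, scores)
print(f"A score of 85 exceeds {rank_85:.0f}% of cases")
```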
To compensate for the use of nonprobability sampling methods...
researchers employ such techniques as sample size estimation using power analysis (p.319)
interval measurement (p.313)
shows rankings of events or variables on a scale with equal intervals between numbers
ratio measurement (p.313)
shows rankings of events or variables on scales with equal intervals and absolute zeros
most frequently used measure of variability (p.318)
standard deviation
hypothesis testing involves..
statistical decision-making to either accept (more precisely, fail to reject) the null hypothesis or reject the null hypothesis. Researchers compute a test statistic with their data and then determine whether the statistic falls beyond the critical value, that is, within the critical region of the relevant theoretical distribution. Values in the critical region indicate that the null hypothesis is improbable, at a specified probability level.
multivariate statistics
statistical procedures for analyzing relationships among 3 or more variables; two procedures commonly used in nursing research are (1) multiple regression and (2) analysis of covariance (ANCOVA)
nonparametric statistics
statistics not based on parameterized families of probability distributions. They include both descriptive and inferential statistics
normal curve (p.316)
symmetrical about the mean and unimodal; the mean, median, and mode are equal
correlation
tests that the relationship between two variables is not zero.
mean (p.315)
the arithmetical average of all the scores (add all of the values in a distribution and divide by the total number of values)
sampling error (p.321)
the basis for statistical probability, even when samples are randomly selected there is a possibility of sampling error
scientific hypothesis
the initial building block in the scientific method. Many describe it as an "educated guess," based on prior knowledge and observation. While this is true, the definition can be expanded: a hypothesis also includes an explanation of why the guess may be correct, according to the National Science Teachers Association.
mode (p.315)
the most frequent value in a distribution
modality (p.315)
the number of modes contained in a distribution
frequency distribution (p.313)
the number of times each event occurs
degrees of freedom (DOF)
the number of values in the final calculation of a statistic that are free to vary.
level of significance (alpha level) (p.322)
the probability of making a type I error, i.e., the probability of rejecting a null hypothesis that should not be rejected
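If p = .30, what does this mean?
if the null hypothesis were true, results at least as extreme as mine would occur by chance 30 times out of 100 (30%)
If p < .01, what does this mean?
if the null hypothesis were true, results at least as extreme as mine would occur by chance fewer than 1 time out of 100 (less than 1%)
If p < .05...
if the null hypothesis were true, results at least as extreme as mine would occur by chance fewer than 5 times out of 100 (less than 5%)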
measurement (p.310)
the process of ASSIGNING NUMBERS to variables or events according to rules; every variable or event that is assigned a specific number must be similar to every other variable or event assigned that number (e.g., if male = 1 and female = 2, every male is coded 1 and every female is coded 2)
median (p.315)
the score where 50% of the scores are above it and 50% of the scores are below it
range (p.316)
the simplest but most unstable measure of variability; the difference between the highest and lowest scores
levels of measurement (p.311)
there are 4 levels of measurement: 1. nominal 2. ordinal 3. interval 4. ratio. The level of measurement of each variable determines the type of statistic that can be used to answer a research question or test a hypothesis.
bivariate statistics
used for describing the relationship between two variables in Quantitative Research only, two common approaches include: (1) contingency tables (Crosstabs) (2) correlation coefficients
nominal measurement (p.312)
used to classify variables or events into categories; these categories are mutually exclusive
Z score (p.318)
used to compare measurements in standard units
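A sketch of the conversion: a z score is (score - mean) / SD, so it expresses a score as the number of standard deviations above or below the mean. The scores are hypothetical, and using the population form of the SD (`pstdev`) is an assumption; the sample form is also used.

```python
import statistics

# Hypothetical scores (illustration only)
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean_score = statistics.mean(scores)
sd = statistics.pstdev(scores)  # population SD; an assumption for this sketch

def z_score(x):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean_score) / sd

print(z_score(9))  # a score above the mean gives a positive z
print(z_score(4))  # a score below the mean gives a negative z
```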
parametric statistics (p.318)
used to estimate population parameters
analysis of covariance (ANCOVA) (p.325)
used to measure differences among group means, but it also uses a statistical technique to equate the groups under study on an important variable
multiple regression
used to predict a dependent variable based on two or more independent (predictor) variables; the dependent variable is continuous (interval or ratio-level data), and the predictor variables are continuous (interval or ratio) or dichotomous
ordinal measurement (p.312)
used to show relative rankings of variables or events
mutually exclusive (p.313)
variable either has or does not have the characteristic
What kind of statistics are typically normally distributed?
weight, height, IQ, etc.