Test 3 ch. 18-27


Type 1 error

-False positive: the null hypothesis is rejected when it is true. We conclude that a real difference exists when in reality it is due to chance.
-A Type I error occurs when we erroneously reject the null hypothesis because of sampling or statistical error. We assert that our research hypothesis was supported when in actuality it was not.
-The alpha level is the maximum acceptable risk of making a Type I error.

How to determine which t-test to use

-Are the scores for the two means from the same subjects?
--Yes: paired t-test
--No: Are there the same number of people in the two groups?
---Yes: equal variance t-test
---No: Are the variances of the two groups different?
----Yes: unequal variance t-test
----No: equal variance t-test
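The decision steps above can be sketched as a small Python function. This is a hypothetical helper, not an SPSS feature: the name `choose_t_test` is made up, and using a Levene's test p value below .05 to answer "are the variances different?" is an assumption borrowed from the SPSS card later in this set.

```python
def choose_t_test(same_subjects: bool, n1: int, n2: int,
                  levene_p: float = 1.0, alpha: float = 0.05) -> str:
    """Hypothetical helper that walks the t-test decision steps."""
    # Step 1: both means come from the same subjects -> paired t-test
    if same_subjects:
        return "paired"
    # Step 2: equal group sizes -> equal variance t-test
    if n1 == n2:
        return "equal variance"
    # Step 3: unequal sizes -> ask whether the variances differ
    # (assumption: a Levene's test p value below alpha means they do)
    if levene_p < alpha:
        return "unequal variance"
    return "equal variance"


print(choose_t_test(True, 12, 12))                  # paired
print(choose_t_test(False, 10, 9, levene_p=0.01))   # unequal variance
print(choose_t_test(False, 10, 9, levene_p=0.40))   # equal variance
```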

The more the variables overlap...

the more variance they share (the proportion of shared variance is r²) and the more closely they are interrelated

What are the types of parametric tests?

-Unpaired t-test (two independent groups)
-Paired t-test (two related scores)
-One-way ANOVA (three or more independent groups)
-One-way repeated measures ANOVA (three or more related scores)

F-statistic

-A ratio of the between-group variance to the error (within-group) variance
-When the null hypothesis is true and no treatment effect exists, all of the variance in the sample is due to error, so the between-group and within-group variances are about equal and F is close to 1.0
-When the null hypothesis is false and the treatment effect is significant, the between-group variance is larger than the within-group variance, so F is greater than 1.0
-The larger the F-statistic, the greater the variability between the group means relative to the variability within the groups
-A larger numerator (between-group variance) over a smaller denominator (within-group variance) gives a larger F-statistic
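A minimal sketch of the F ratio for a one-way ANOVA, using only Python's standard library. It computes the between-group and within-group mean squares directly; the scores are invented for illustration, and the p value (which requires the F distribution) is left to SPSS.

```python
from statistics import mean


def one_way_f(*groups):
    """F = between-group mean square / within-group (error) mean square."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    k, n_total = len(groups), len(scores)
    # Between-group variability: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group (error) variability: how far scores are from their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Invented scores: well-separated group means -> large F
print(one_way_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))   # 27.0
# Invented scores: identical group means -> F of 0, well below 1.0
print(one_way_f([1, 5, 9], [2, 5, 8], [3, 5, 7]))   # 0.0
```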

Power efficiency

-A statistical test's relative ability to identify significant differences for a given sample size
-A nonparametric test usually needs a larger sample to be as powerful as a parametric one
-Power efficiency is expressed as a percentage
-Example: if a nonparametric test needs a sample of 100 to achieve the same power as a parametric test with a sample of 50, its power efficiency is 50/100 = 50%
-A power efficiency of 70% means that to achieve equal power with the nonparametric test, we would need 10 subjects for every 7 under the parametric analysis
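The two examples in this card are plain arithmetic, and a short Python sketch reproduces them. The function names are hypothetical, and rounding the required sample size up to a whole subject is an assumption.

```python
import math


def power_efficiency(n_parametric: int, n_nonparametric: int) -> float:
    """Percent: parametric n divided by the nonparametric n needed for equal power."""
    return 100 * n_parametric / n_nonparametric


def nonparametric_n_needed(n_parametric: int, efficiency_pct: float) -> int:
    """Subjects the nonparametric test needs to match the parametric test's power."""
    # Assumption: round up, since a fraction of a subject cannot be recruited
    return math.ceil(n_parametric * 100 / efficiency_pct)


print(power_efficiency(50, 100))      # 50.0
print(nonparametric_n_needed(7, 70))  # 10
```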

What does a t-test do?

-A t-test compares the means of two groups and determines how likely it is that the difference between them is due to chance alone rather than to the "treatment" (the independent variable)

Labeling an ICC model

-Based on single vs. average ratings
--Most reliability studies compare scores from individual raters, but sometimes the mean of several raters or ratings is used as the unit of reliability (e.g., with unstable measurements)
-Classified using two numbers in parentheses
--The first number is the model (1, 2, or 3)
--The second number indicates whether the unit is a single measurement (1) or the average of several measurements (k)
--Examples: Model (2,k), Model (3,1), Model (1,k)
--Always indicate the type of ICC being used

ICC advantages

-Can assess reliability among two or more sets of ratings, giving it broad clinical applicability
-Does not require the same number of raters for each subject, which provides more flexibility in clinical studies
-Designed for interval/ratio data, but can be applied to ordinal data when intervals are assumed to be equal
-Can also be used with dichotomous variables (e.g., presence or absence of impairment)
-There are three ICC models but six different "formulas"; base the selection on the purpose of the study, the study design, and the type of measurements (DVs)

What is a correlational relationship?

-Changes in one variable accompany changes in another
-Such covariation does not mean that changes in one variable cause changes in the other

Independent sample t-test

-Compares the means of two samples that are selected independently of each other (the subjects in the two groups are different people, e.g., one group of 10 people and another group of 9)
-When to use:
--Statistical difference between two groups
--Statistical difference between two interventions
--Statistical difference between two change scores
-When not to use: comparing more than 2 groups, a continuous outcome that is not normally distributed, ordinal/ranked data

What to consider when choosing a nonparametric test

-Data/measures DO NOT appear to meet the assumptions of population normality and homogeneity of variance
-Variables may not have been studied sufficiently to support these assumptions
-Nonparametric tests require fewer statistical assumptions
-Small clinical samples or convenience samples cannot automatically be assumed to represent a larger normal distribution
-Some pathological populations have "built-in" skew (e.g., people with Parkinson's disease on a motor coordination test)
-Nonparametric tests can also be appropriate for interval and ratio data when distributions are skewed or the sample is small
-Data are ordinal (e.g., MMT, FIM scores)

Confidence interval

-A range of values, computed from the sample, that is expected to contain the "true" population mean with a stated level of confidence
-The "wiggle room" around the sample mean within which the population mean should fall
-Based on the sample mean and its standard error (sX̄, the standard deviation of the sampling distribution of means)
-A 95% CI extends 1.96 standard errors on either side of the sample mean. Expressed as a probability: "We are 95% confident that the population mean falls within this interval." For example, if X̄ = 40 and sX̄ = 1.36, the 95% CI is 40 ± (1.96)(1.36), or 37.33 to 42.67
-Formula: CI = X̄ ± (z)(sX̄)
--z for 95% = 1.96
--z for 99% = 2.576
-The confidence limits get wider as the confidence level increases (a 99% CI extends 2.576 standard errors on either side of the mean)
-The choice of a 95% or 99% CI depends on the nature of the variables being studied and the level of confidence the researcher wants. For greater confidence, use the 99% CI, at the cost of a wider interval
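The worked example above can be checked with a few lines of Python. This is a sketch assuming the large-sample z-based interval from the formula on this card (small samples would normally use a t value instead); the function name is hypothetical.

```python
# z values for the two confidence levels named in this card
Z_VALUES = {95: 1.96, 99: 2.576}


def confidence_interval(sample_mean: float, standard_error: float, level: int = 95):
    """CI = mean +/- z * standard error of the mean."""
    margin = Z_VALUES[level] * standard_error
    return sample_mean - margin, sample_mean + margin


low, high = confidence_interval(40, 1.36)
print(round(low, 2), round(high, 2))        # 37.33 42.67
low99, high99 = confidence_interval(40, 1.36, level=99)
print(round(low99, 2), round(high99, 2))    # 36.5 43.5
```

The wider 99% interval illustrates the trade-off described above: more confidence, less precision.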

effect size

-Distribution-based approach used to assess the degree of change in a group of patients, to determine the effectiveness of an intervention and to generalize the results to others
-Effect size
--Standardized measure of change from baseline to final measurement
--Provides information on the magnitude of change in standardized units
--Not affected by sample size, but may vary among samples with different baseline variability
--Calculated as mean change / SD of baseline scores and evaluated as: 0.20 to < 0.50 = small change; 0.50 to 0.80 = moderate change; > 0.80 = large change
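A sketch of the calculation in Python, assuming the SD meant here is the sample standard deviation (n - 1 denominator) of the baseline scores. The scores are invented, and the `describe` cutoffs are the ones listed on this card; the label for values under 0.20 is an assumption.

```python
from statistics import mean, stdev


def effect_size(baseline, final):
    """Mean change divided by the SD of the baseline scores."""
    changes = [f - b for b, f in zip(baseline, final)]
    return mean(changes) / stdev(baseline)


def describe(es: float) -> str:
    """Label an effect size with the cutoffs listed on this card."""
    es = abs(es)
    if es > 0.80:
        return "large"
    if es >= 0.50:
        return "moderate"
    if es >= 0.20:
        return "small"
    return "below small"  # assumption: the card gives no label under 0.20


# Invented baseline and final scores for five patients
baseline = [10, 12, 14, 16, 18]
final = [13, 15, 17, 19, 21]
es = effect_size(baseline, final)
print(round(es, 2), describe(es))   # 0.95 large
```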

Ruling in and ruling out

-High levels of sensitivity and specificity provide a certain level of confidence in interpretation
-SpPin (Spin Rules In): with high Specificity, a Positive test rules IN the diagnosis
--A highly specific test correctly identifies most of the patients who do not have the disorder; because it is so good at identifying those who are "normal," someone who gets a "+" result likely does have the disorder
-SnNout (Snout Rules Out): with high Sensitivity, a Negative test rules OUT the diagnosis
--A highly sensitive test correctly identifies most of the patients who do have the disorder; because it is so good at finding those who have the condition, someone who gets a "-" result likely does not have it
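Sensitivity and specificity come from a 2 x 2 table of test results against the true diagnosis. The counts below are hypothetical; they only show which cells feed each proportion.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity and specificity from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)  # proportion of people WITH the disorder who test +
    specificity = tn / (tn + fp)  # proportion of people WITHOUT the disorder who test -
    return sensitivity, specificity


# Hypothetical counts: 50 patients with the disorder, 50 without
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=40, fp=10)
print(sens, spec)   # 0.9 0.8
```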

How do you know which t-test result to read in SPSS?

-Independent samples t-test, equal variances: read the "Equal variances assumed" row when the significance of Levene's test is > 0.05, meaning there is no significant difference between the group variances (homogeneity of variance can be assumed)
-Independent samples t-test, unequal variances: read the "Equal variances not assumed" row when the significance of Levene's test is < 0.05, meaning there is a significant difference between the group variances

Low ICC.. why?

-A low ICC is usually due to rater error
-One reason: raters do not agree
--The ICC is an average based on variance across all raters, so disagreement could involve all raters, some raters, or only one rater
--Think of it as a correlation across all raters, not the reliability of any individual rater
--There could be an interaction effect between a rater and a subject
-Second reason: variance
--The variability of subject scores is small; remember that ↑ variability can contribute to ↑ reliability
--Decreased variability can come from a very homogeneous sample, from raters who are very lenient or strict, or from a rating system that falls within a restricted range

Choosing an ICC model

-Must select the model before you implement your study
-If you are doing instrument development research, select Model 2, because you want to demonstrate that your tool can be used with confidence by all equally trained clinicians
-If it is your thesis group administering a performance-based measure, everyone needs to be able to administer and score the measure, so use Model 3 for inter-rater reliability
-Model 1 is seldom used; example: establishing reliability among raters who are "doing an important job in a process we need to have confidence in" (e.g., RCT grant reviewers)

measuring change

-We need to document change in meaningful ways for ourselves, third-party payers, patients, and referral sources
-Qualitative descriptions of change are not acceptable
-Responsiveness: the ability of an instrument to measure true clinical change
--The ratio of "signal" (true change) to "noise" (error/variability) of the tool

What are the major features of a correlational relationship?

-No independent variables are manipulated
-Two or more dependent variables are measured, and a relationship between them is established
-Correlational relationships can be used for predictive purposes
--A PREDICTOR VARIABLE can be used to predict the value of a CRITERION VARIABLE
-Correlational research cannot be used to establish causal relationships among variables

Nonparametric data

-Not as reflective of the population as parametric tests (fewer assumptions are made about the population distribution)
-Used with nominal and ordinal data
-Used when the assumptions for parametric tests are not met

Intraclass correlation coefficient

-Not exactly the same as a correlation coefficient (CC); it differs in that:
--A CC explains shared variance (relationship), not agreement
--A CC is bivariate and can only correlate two ratings or raters at a time
--A CC has limitations in how variance is calculated (r vs. r²)
-The ICC ranges from 0.00 to 1.00
-Calculated using variance estimates obtained through ANOVA, so it reflects BOTH the degree of correlation and agreement (reproducibility)

What is a causal relationship?

-One variable directly or indirectly causes changes in another
-Can be unidirectional (changes in A cause changes in B, but not the other way around)
-Can be bidirectional (changes in A cause changes in B, and changes in B cause changes in A)

What are the 3 types of t-test?

-Paired t-test (correlated or dependent t-test)
--E.g., pretest and posttest
-Unpaired t-test (independent samples t-test)
--Equal variance
--Unequal variance

pre test/post test probabilities

-Pre-test probability (before testing): our "best guess," based on history-taking, preliminary screening, or other subjective procedures, to begin to rule certain conditions in or out
--This is what we think the problem may be before any formal testing
--Can help you decide which further testing to do
-Post-test probability (after testing): a good test yields a high post-test probability, that is, a high likelihood that a diagnosis/condition is confirmed after formal testing

Interpreting ICC

-Range 0.00 to 1.00
-Interpretation depends on the nature of the measured variable in terms of its stability and the precision required to make sound clinical judgments
-No absolute standard value; suggested guidelines: ICC > .75 = good reliability, ICC < .75 = poor to moderate reliability. For many clinical measurements, however, an ICC above .90 may be needed

Homogeneity of variance

-SPSS automatically runs a test ("Levene's Test") that compares the two groups for homogeneity of variance
-SPSS shows the Levene's Test results in the same table as the t-test. You interpret its p value (significance) to decide whether to use the results for the equal variance test or the unequal variance test
-The goal is to confirm that the variances of the two groups are similar (homogeneous)

Wilcoxon Signed-Ranks Test

-Similar to the sign test, but includes both the direction (+ or -) of each difference AND the relative magnitude of the difference
-Usually an ordinal scale with > 2 values
-Subtract the side-by-side paired values and write down each difference. A pair of equal scores has a difference of 0 and is removed from the analysis, thus reducing the N
-Rank the absolute values of the remaining differences, then sum the ranks of the positive differences and of the negative differences separately
-It does not matter which column is used as the reference, as long as the same order is used consistently
-Null = expect half the differences to be positive and half to be negative; reject the null if the sums of the positive and negative ranks differ too much (i.e., if the smaller sum is too small)
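A standard-library Python sketch of the steps on this card: compute the paired differences, drop zero differences, rank the absolute differences (ties get averaged ranks), and sum the ranks by sign. The scores are invented, and the significance lookup for the smaller sum is left to SPSS or a table.

```python
def average_ranks(values):
    """Rank smallest to largest; tied values share the mean of their ranks."""
    ranks = []
    for v in values:
        below = sum(1 for w in values if w < v)
        tied = sum(1 for w in values if w == v)
        # tied scores occupy positions below+1 .. below+tied; use their mean
        ranks.append(below + (tied + 1) / 2)
    return ranks


def wilcoxon_signed_ranks(before, after):
    """Return (sum of positive ranks, sum of negative ranks, N after dropping ties)."""
    diffs = [a - b for b, a in zip(before, after)]
    diffs = [d for d in diffs if d != 0]          # zero differences are removed
    ranks = average_ranks([abs(d) for d in diffs])
    pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return pos, neg, len(diffs)


before = [10, 12, 9, 15, 11, 14]
after = [13, 12, 12, 14, 16, 18]
print(wilcoxon_signed_ranks(before, after))   # (14.0, 1.0, 5)
```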

Restriction of use of nonparametric test

-Some type of randomization procedure is used in forming groups
-Can't be adapted to sophisticated clinical designs
-Data must be at least at the ordinal level; the underlying distribution must be rankable in some way
--Examples: Likert scales; measures of strength; opinion scales ("Strongly Disagree" to "Strongly Agree")
-For nominal scale data, use Chi-square (upcoming lecture!)
-Less sensitive than parametric tests because they rank values rather than use the "true" measures
-Many researchers do apply parametric tests to ordinal measures, especially to total scores (which have more variance)
-Nonparametric tests are not as powerful and may need a larger sample size to reach the same level of power as a parametric test

What are the two types of "r"?

-The statistical test for parametric (interval and ratio) data is the Pearson product-moment correlation, or "Pearson r."
-The statistical test for nonparametric (ordinal) data is the Spearman rank-order correlation, or "Spearman rho."
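A sketch of both coefficients in plain Python. Spearman rho is computed here as Pearson r applied to the ranks (ties get averaged ranks), which is one standard way to define it. The data are invented to show that a perfectly consistent but curved (monotonic) relationship gives rho = 1.00 while r falls a little short of 1.00.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation for interval/ratio data."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank smallest to largest; tied values share the mean of their ranks."""
    ranks = []
    for v in values:
        below = sum(1 for w in values if w < v)
        tied = sum(1 for w in values if w == v)
        ranks.append(below + (tied + 1) / 2)
    return ranks


def spearman_rho(x, y):
    """Spearman rho as Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson_r(x, y), 3), spearman_rho(x, y))   # 0.981 1.0
```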

standard error of the mean

-Tells you how accurate your estimate of the mean is likely to be
-When you take a sample of observations from a population and calculate the sample mean, you are estimating the parametric mean, or mean of all of the individuals in the population
-Your sample mean won't be exactly equal to the parametric mean that you're trying to estimate, but you'll have an idea of how close your sample mean is likely to be
-If your sample size is small, your estimate of the mean won't be as good as an estimate based on a larger sample size
-Formula: sX̄ = s / √n, where s is the sample standard deviation and n is the sample size
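The formula translates directly into Python with the standard library. The scores are invented; repeating them four times keeps the spread about the same while raising n to 20, which shows the standard error shrinking as the sample grows.

```python
from math import sqrt
from statistics import stdev


def standard_error(sample):
    """s / sqrt(n), using the sample SD (n - 1 denominator)."""
    return stdev(sample) / sqrt(len(sample))


sample = [4, 8, 6, 5, 7]                      # invented scores, n = 5
print(round(standard_error(sample), 4))       # 0.7071
print(round(standard_error(sample * 4), 4))   # 0.3244 (similar spread, n = 20)
```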

Sampling error

-There will always be some error in a study, no matter how well designed
-Sampling error is the degree to which sample values differ from actual population values
-The sampling error of the mean for any single sample is the sample mean minus the population mean
-Our duty is to minimize it, then provide a measure of what it probably was
-The greater the sampling error, the less accurately the sample reflects the population
-The solution for sampling error is to recruit a larger sample that better reflects the target population (inclusion/exclusion criteria)

One vs. Two tailed test

-A two-tailed test says, "I don't know how these groups are going to differ; I just know they will differ in some way" (y ≠ x)
--The rejection region is split between both extremes (tails) of the bell curve
-A one-tailed test says, "I think the direction of the difference is going to be y > x" (or x > y)
--The rejection region is in only one tail
-"Tail" refers to an extreme end of the bell curve

correlation coefficients

-Usually expressed as a two-place decimal number, such as r = -.23 or r = .82
-Reflects how closely the data points cluster around a straight line (the line of best fit)
-The closer the points lie to the line, the higher the absolute value of the coefficient (up to ±1.00)
-The more "scatter" away from the line, the lower the coefficient

Kruskal-Wallis One-Way ANOVA by Ranks

-Used when 3 or more groups are compared (k ≥ 3)
-Nonparametric version of a one-way ANOVA
-Generates an H-statistic, which is evaluated against the chi-square distribution (df = k - 1), so SPSS labels it Chi-Square rather than H in the output
-Example (Table 22.4): effect of three modalities (ice, heat, ultrasound) on relief of chronic back pain; scores are change scores for pain level from pre-treatment to post-treatment
-Null = equal distribution of ranks under the three conditions (i.e., there is no difference in the mean ranks of the pain change scores across the three conditions)
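A standard-library sketch of the H statistic, H = 12 / (N(N + 1)) × Σ(R²/n) - 3(N + 1), where R is each group's rank sum. It omits the correction for ties that statistical packages typically apply, and the scores are invented (they are not the values from Table 22.4).

```python
def average_ranks(values):
    """Rank smallest to largest; tied values share the mean of their ranks."""
    ranks = []
    for v in values:
        below = sum(1 for w in values if w < v)
        tied = sum(1 for w in values if w == v)
        ranks.append(below + (tied + 1) / 2)
    return ranks


def kruskal_wallis_h(*groups):
    """H statistic without the tie correction statistical packages usually add."""
    scores = [x for g in groups for x in g]
    ranks = average_ranks(scores)          # rank all groups combined
    n_total = len(scores)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])   # R for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


ice, heat, ultrasound = [1, 2, 3], [4, 5, 6], [7, 8, 9]   # invented change scores
h = kruskal_wallis_h(ice, heat, ultrasound)
print(round(h, 2), "with df =", 3 - 1)   # 7.2 with df = 2
```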

When is correlational research used?

-When gathering data in the early stages of research
-When manipulating an independent variable is impossible or unethical
-When you are relating two or more naturally occurring variables

ANOVA (analysis of variance)

-An inferential statistical test for comparing the means of three or more groups
-Tests differences between means for normally distributed variables
-A one-way ANOVA has one IV (factor) with three or more levels
-Does not tell which groups are significantly different, only that they differ
-If the ANOVA result is nonsignificant, there is no need to go further with multiple comparison tests

3 things you must be able to interpret in the correlation

-be able to interpret strength, direction, and significance

Type 2 error

-False negative
-The null hypothesis is not rejected when it is false. We conclude that differences are due to chance when in fact they are not
-A Type II error occurs when we erroneously fail to reject H0 (and reject our H1) when in fact the null should have been rejected. Our hypothesis was right, but we chalked the result up to chance when it was not
-This can lead us to ignore an effective treatment

phi coefficient

-Used to correlate dichotomies, i.e., when both X and Y are dichotomous (e.g., male/female)
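For a 2 x 2 table with cells a, b (first row) and c, d (second row), phi = (ad - bc) / √((a + b)(c + d)(a + c)(b + d)). A minimal Python sketch follows; the counts are hypothetical.

```python
from math import sqrt


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """phi for a 2x2 table laid out as [[a, b], [c, d]]."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


print(phi_coefficient(10, 0, 0, 10))             # 1.0 (perfect association)
print(round(phi_coefficient(20, 10, 5, 25), 2))  # 0.51
```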

Unequal variance independent sample t-test

-Used when the variances of the two groups are different, whether the number of subjects in the two groups is different or the same

Equal variance independent sample t-test

-Used when the number of subjects in the two groups (conditions or measurements) is the same, or when the variances of the two groups are similar
-Uses a pooled variance

Parametric tests

-Tests that represent the population (they estimate population parameters)
-Give the data analysis more strength; they use scaled (interval or ratio) data
-Based on assumptions: samples are randomly drawn from a population with a normal distribution, and variances in the samples are homogeneous

point biserial correlation

-used when one variable is dichotomous and the other is continuous -something like the size of the CVA and the degree of spasticity

What 3 pieces of information do you need to calculate a t-test?

1. The difference between the means
2. The standard deviation of each group (the spread of scores around each group mean)
3. The number of subjects in each group
Use these statistics to compute the degrees of freedom (df), i.e., the number of values in a distribution that are free to vary: n - 1 for a single group or paired scores, and n1 + n2 - 2 for an equal variance independent t-test.
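Given these three pieces of information, the t statistic can be computed directly. This sketch shows the pooled (equal variance) formula and, as an assumption about what the unequal variance version uses, the Welch formula with Welch-Satterthwaite degrees of freedom. The means, SDs, and group sizes are invented.

```python
from math import sqrt


def pooled_t(m1, m2, s1, s2, n1, n2):
    """Equal variance (pooled) t and its df = n1 + n2 - 2."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)  # pooled variance
    t = (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(m1, m2, s1, s2, n1, n2):
    """Unequal variance (Welch) t with Welch-Satterthwaite df."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t, df = pooled_t(10, 8, 2, 2, 10, 10)   # invented summary statistics
print(round(t, 3), df)                   # 2.236 18
```

With equal SDs and equal group sizes, as here, both formulas give the same t and df; they diverge when the variances or group sizes differ.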

curvilinear relationship

A relationship in which increases in the values of the first variable are accompanied by both increases and decreases in the values of the second variable. The scatterplot may be curved because, as X increases, Y starts out low, increases, and then decreases again.

What is the maximum risk of a Type I error generally accepted in social science research?

Alpha level .05

Statistical vs. Clinical Significance

-Clinical significance can be considered for p values > .05 and ≤ .10
--This means a p value of .07 is not statistically significant, but you may infer clinical significance

Paired sample t-test

-Concerned with the difference between the mean scores of a single sample of subjects measured at two different times (such as before and after treatment) or on two different measures
-Can also compare the average scores of subjects who are paired in some way (e.g., siblings, or subjects matched on some characteristic)
-When to use:
--Statistical difference between two time points
--Statistical difference between two conditions
--Statistical difference between two measurements
--Statistical difference between a matched pair
-When you cannot use it: unpaired data, comparing more than 2 groups, a continuous outcome that is not normally distributed, ordinal/ranked data

Correlation with a scatterplot

-Each dot represents one person; r = correlation coefficient
-Positive correlation: as A goes up (x axis), B goes up (y axis)
-Negative correlation: as A goes up (x axis), B goes down (y axis)
-The more tightly the dots cluster along a line, the stronger the correlation: r = ±1.00 is a perfect correlation, and r = .00 indicates no (linear) correlation

the statistic for an ANOVA is called

F-statistic

Correlation implies causation (true or false?)

FALSE: correlation only looks at the association between two variables. It is a symmetric ("mirror image") relationship and does not establish causation.

Mann-Whitney U Test

-For 2 independent samples
-Tests the null hypothesis that two independent samples come from the same population
-Analogous to the parametric t-test for independent samples; like the unpaired t-test, the U test does not require the groups to be the same size
-In this analysis, R is the sum of the ranks of a group (R1 = group 1, R2 = group 2)
-The first step is to combine both groups and rank all scores in order of increasing size
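A standard-library sketch of the steps above: combine and rank both groups, sum the ranks for each group (R1, R2), then compute U1 = n1·n2 + n1(n1 + 1)/2 - R1 and U2 = n1·n2 - U1, reporting the smaller. The scores are invented, and the p value lookup is left to SPSS or a table.

```python
def average_ranks(values):
    """Rank smallest to largest; tied values share the mean of their ranks."""
    ranks = []
    for v in values:
        below = sum(1 for w in values if w < v)
        tied = sum(1 for w in values if w == v)
        ranks.append(below + (tied + 1) / 2)
    return ranks


def mann_whitney_u(group1, group2):
    """Return R1, R2 and U (the smaller of U1 and U2)."""
    combined = list(group1) + list(group2)   # step 1: combine both groups and rank
    ranks = average_ranks(combined)
    n1, n2 = len(group1), len(group2)
    r1 = sum(ranks[:n1])
    r2 = sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1
    return r1, r2, min(u1, u2)


print(mann_whitney_u([3, 5, 8], [1, 2, 4, 6]))   # (15.0, 13.0, 3.0)
```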

Sign Test

-For testing the difference between two correlated samples
-Analogous to the parametric t-test for correlated or paired samples
-Useful when quantification is impossible or unfeasible and when subjective ratings are necessary
-Very simple (no math involved; all you use is a + or - symbol/sign); uses subjective binomial data (more/less; higher/lower; larger/smaller)
-You simply count the number of "+" signs and "-" signs; any ties are counted as 0 and are removed from the analysis, thus reducing the N
-Null = expect half the differences to be positive and half to be negative; reject the null if the numbers of positive and negative signs differ significantly
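Counting signs is all the sign test needs; the p value then comes from the binomial distribution with a probability of .5 for each sign. This exact two-tailed version is a sketch using only the standard library (`math.comb` needs Python 3.8+), and the list of signs is invented.

```python
from math import comb


def sign_test(signs):
    """Exact two-tailed sign test on a list of '+', '-' and '0' (tie) entries."""
    plus = signs.count("+")
    minus = signs.count("-")
    n = plus + minus                 # ties ('0') are dropped, reducing N
    smaller = min(plus, minus)
    # Probability of a split at least this lopsided if + and - are equally likely
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return plus, minus, min(2 * tail, 1.0)


signs = ["+", "+", "+", "+", "+", "+", "-", "+", "0", "+"]
print(sign_test(signs))   # (8, 1, 0.0390625)
```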

how to know if there is generally a relationship in correlations

Generally: 0.00 to .25 = little or no relationship; .25 to .50 = fair relationship; .50 to .75 = moderate to good; .75 to .99 = good to substantial

What are inferential statistics?

-Inferential statistics are designed to address objectives, questions, and hypotheses in studies, allowing inference from the study sample to the target population. They help us to:
--Identify relationships
--Examine predictions
--Determine differences among groups
-We are drawing "conclusions" about the population based on sample data ("making inferences about what will happen")

What are the types of nonparametric tests?

-Mann-Whitney U test (two independent groups; counterpart of the unpaired t-test)
-Sign test & Wilcoxon signed-ranks test (two related scores; counterpart of the paired t-test)
-Kruskal-Wallis one-way ANOVA by ranks (three or more independent groups; counterpart of the one-way ANOVA)
-Friedman two-way ANOVA by ranks (three or more related scores; counterpart of the one-way repeated measures ANOVA)

Use of validity "numbers"

-Measurement validity is essential for making evidence-based decisions in clinical practice
-Used for:
--Ensuring diagnostic accuracy
--Evaluating change and progress related to treatment
-Terminology is confusing; many different terms are used to signify the same thing
-Clinicians must appraise information for its applicability to their patients
-Affects definitions of "clinical significance" vs. "statistical significance": "How much better is 'better'?"

ICC random and fixed effects model 1

Model 1 (limited use)
-Each subject is assessed by a different set of k raters across different trials
-Rater is considered a random effect (raters are considered randomly chosen from a larger population of raters)
-Since the raters differ from subject to subject, you are assessing only the variance (difference) among subjects

ICC random and fixed effect model 2

Model 2 (most commonly used)
-Used to establish inter-rater reliability, particularly in the development and testing of a new measure
-Each subject is assessed by the same set of raters
-Raters and subjects are (theoretically) random effects, so results can be generalized to other raters with similar characteristics

ICC mixed effects model 3

Model 3 (most applicable to thesis groups)
-Each subject is assessed by the same set of raters, but these raters are the only raters of interest
-No intent to generalize beyond the raters involved
-Used when the investigator wants to establish that specific raters were reliable in their data collection (e.g., our thesis group)
-A "mixed" model because "rater" is considered a fixed effect (raters were purposely selected), while subjects are still considered a random effect
-Also appropriate for intra-rater reliability, as the ratings of a single rater cannot be generalized to other raters
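Because the ICC is calculated from ANOVA variance estimates, the single-rating forms of Model 2 and Model 3 can be sketched from a two-way (subjects x raters) breakdown. The formulas follow the commonly cited Shrout and Fleiss definitions; treat this as an illustration to check against your textbook, not a replacement for SPSS. In the invented ratings, rater B always scores one point higher: this constant bias does not lower ICC(3,1), which reflects consistency among these specific raters, but it does lower ICC(2,1), which also counts rater disagreement in absolute scores.

```python
def icc_models_2_and_3(ratings):
    """ICC(2,1) and ICC(3,1) from a subjects x raters table (two-way ANOVA)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]                      # per subject
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]  # per rater
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters
    bms = ss_subjects / (n - 1)                 # between-subjects mean square
    jms = ss_raters / (k - 1)                   # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))        # residual (error) mean square
    icc2 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc3 = (bms - ems) / (bms + (k - 1) * ems)
    return icc2, icc3


# Invented data: 4 subjects, 2 raters; rater B always scores 1 point higher
ratings = [[1, 2], [3, 4], [5, 6], [7, 8]]
icc2, icc3 = icc_models_2_and_3(ratings)
print(round(icc2, 3), round(icc3, 3))   # 0.93 1.0
```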

Procedure for ranking score

-Nonparametric tests are based on ranking scores
-Rank from smallest to largest; "1" goes to the smallest score
-Highest rank = N (i.e., the sample size)
-In case of tied scores, add the ranks those scores occupy (e.g., ranks 3, 4, 5) and divide by the number of tied scores (e.g., 12 / 3 = 4). This value becomes the rank for each of the tied scores, and the next score gets the next unused rank (here, 6)
-If positions 5 and 6 are shared, each tied score gets 5.5, and the next rank up is 7
-Yes, SPSS does this for us! But without SPSS, it is possible to do nonparametric analyses by hand
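The ranking procedure above can be sketched in a few lines of Python. Each score's rank is the number of scores below it plus the average position of its tie group, which is the same as adding the shared ranks and dividing by the number of tied scores. The scores are invented to match the two tie examples on this card.

```python
def average_ranks(values):
    """Rank smallest to largest ("1" = smallest); ties share the mean of their ranks."""
    ranks = []
    for v in values:
        below = sum(1 for w in values if w < v)   # scores ranked ahead of v
        tied = sum(1 for w in values if w == v)   # scores sharing v's value
        # tied scores occupy ranks below+1 .. below+tied; each gets their mean
        ranks.append(below + (tied + 1) / 2)
    return ranks


print(average_ranks([2, 4, 7, 7, 7, 9]))         # [1.0, 2.0, 4.0, 4.0, 4.0, 6.0]
print(average_ranks([2, 4, 6, 8, 10, 10, 12]))   # [1.0, 2.0, 3.0, 4.0, 5.5, 5.5, 7.0]
```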

ICC models (models are classified according to how the raters are chosen and assigned to subjects)

Primary models:
-Random and fixed effects (Model 1, Model 2, Model 3)
-One-way ANOVA (Model 1)
-Repeated measures ANOVA (Model 2, Model 3)

When to use a one way ANOVA

-Statistical difference between two or more groups
-Statistical difference between two or more interventions
-Statistical difference between two or more change scores
Assumptions (requirements):
-Dependent variable that is continuous (i.e., interval or ratio level)
-Independent samples/groups (i.e., independence of observations); there is no relationship between the subjects in each sample
-Random sample of data from the population
-Normal distribution (approximately) of the dependent variable for each group
-Homogeneity of variances (i.e., variances approximately equal across groups); when this assumption is violated and the group sample sizes differ, use post hoc tests that do not assume equal variances
-No outliers

Why should you not infer causality from correlational data?

-Third-variable problem
--There may be an unmeasured variable that actually causes the changes in observed behavior
-Directionality problem
--It is not always possible to specify the direction in which a causal arrow points

Probability

Used as a means of prediction. Represents what should happen, not what will happen. Uses the normal distribution of the characteristic in order to predict.

What does correlation ask?

What is the relation between A and B? B and C? Always a pairwise observation

z-score

-A measure of how many standard deviations a score is from the norm (average or mean)
-Calculated as the distance between the score and the sample mean, divided by the SD
-Used to calculate confidence intervals
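A minimal Python sketch of the z-score definition above, using the sample mean and sample SD of an invented set of scores. The link to confidence intervals is that 1.96 and 2.576 are the z values that cut off the middle 95% and 99% of the normal curve.

```python
from statistics import mean, stdev


def z_score(score, scores):
    """(score - sample mean) / sample SD."""
    return (score - mean(scores)) / stdev(scores)


scores = [4, 8, 6, 5, 7]               # invented; mean 6, SD about 1.58
print(round(z_score(8, scores), 2))    # 1.26
print(z_score(6, scores))              # 0.0 (a score at the mean)
```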

