exam2

notes

-linear reg: trying to predict b from a (predicting weight from height, is it possible? see if correlated (if not correlated, dont run regression), then look at anova to see if signif, then look at predictor equation -multiple reg: tell it to do certain variables ahead of others -regressions take all data points and creates line of best fit -r^2 is amount of variance explained by predictor variable -multiple variables will have multiple r^2 and explain %s of the variance -can be used to calculate ORs

(CH9: HOW DO WE FIND DATA SOURCES?) Sampling

a key component of a research project. If not done properly, research results can become meaningless. -Sampling techniques are numerous. Some of the more common sampling techniques include the following (random, stratified, systematic, and cluster are probability techniques; convenience, voluntary response, and snowball are non-probability techniques): -convenience -cluster -random -stratified -systematic -voluntary response -snowball

Friedman Test

nonparametric version of a one-way ANOVA with repeated measures

Mann-Whitney U Test

(Mann-Whitney-Wilcoxon, Wilcoxon rank-sum test, or Wilcoxon-Mann-Whitney test). A nonparametric version of a t-test.

CORRELATED T-TEST

(Paired sample, repeated measures) The paired-samples t-test is used to determine whether the mean difference between paired observations is statistically different from zero. >see 3 tables; Refer back to the T-Table. With df = 30, the t-statistic must equal or exceed 2.042 to be significant at the two-tailed .05 level.

Articles: What Does the Law Say about Relative Risks?

(this shows that just bc a result is stat signif doesn't mean you're going to buy it) Relative Risks (RR) and Odds Ratios (OR) are deemed significant at the .05 level if the RR and/or OR is above 1.0 and both limits of the 95% confidence interval (CI) are above 1.0; or the RR and/or OR is below 1.0 and both limits of the 95% CI are below 1.0. However, an article stated that: "plaintiffs must offer a statistically significant study, with a risk ratio (RR) greater than 2, to show general causation. A RR ≤ 2 can be a strong practical argument against specific causation in many cases." "we are looking for a relative risk of 3 or more, particularly if it is biologically implausible or if it's a brand new finding." "If you see a 10-fold RR and it's replicated and it's a good study with biological backup, like we have with cigarettes and lung cancer, you can draw a strong inference. If it's a 1.5 relative risk, and it's only one study and even a very good one, you scratch your chin and say maybe." "suggesting that a study should show a four-fold increased risk, while another says suggesting that a single epidemiologic study would not be persuasive unless the lower limit of its 95% confidence interval exclude 3.0" "relative risks of less than 2.0 may readily reflect some unperceived bias or confounding factor, those over 5.0 are unlikely to do so."

Correlation Coefficient

-A correlation analysis produces a correlation coefficient (r). -Coefficient Value vs Strength of Association: >0.1 - 0.3=small correlation >0.3 - 0.5=medium/moderate correlation >greater than 0.5=large/strong correlation

In Order For Regression to be Useful:

-Must have a moderate to strong correlation ( .6 to .7) -Must have a significant ANOVA.

COMMON DEFINITIONS AND FORMULAS ASSOCIATED WITH BIOSTATISTICS

-TP: true positive -TN: true negative -FP: false positive -FN: false negative -Sensitivity (SENS) = TP / (TP+FN) -Specificity (SPEC) = TN / (FP+TN) -Prevalence = (TP+FN) / (TP+FN+FP+TN) -Predictive value positive = TP / (TP+FP) -Predictive value negative = TN / (FN+TN) -Positive Likelihood Ratio = SENS / (1-SPEC) -Negative Likelihood Ratio = (1-SENS) / SPEC -Pre-test Probability = Prevalence -Pre-test Odds = Pre-test Prob / (1 - Pre-test Prob) -Post-test Odds = Pre-test Odds x Likelihood Ratio -Post-test Probability = Post-test Odds / (1 + Post-test Odds)
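The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the course material: the function name `diagnostic_summary` and the dictionary keys are made up for this sketch, and the counts are the scoliosis numbers from EXAMPLE 2 elsewhere in these notes.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Compute the screening-test measures from the four 2x2 cell counts."""
    sens = tp / (tp + fn)                 # Sensitivity
    spec = tn / (fp + tn)                 # Specificity
    prevalence = (tp + fn) / (tp + fn + fp + tn)
    ppv = tp / (tp + fp)                  # Predictive value positive
    npv = tn / (fn + tn)                  # Predictive value negative
    lr_pos = sens / (1 - spec)            # Positive likelihood ratio
    lr_neg = (1 - sens) / spec            # Negative likelihood ratio
    pre_test_odds = prevalence / (1 - prevalence)  # pre-test probability = prevalence
    post_test_odds = pre_test_odds * lr_pos
    post_test_prob = post_test_odds / (1 + post_test_odds)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "prevalence": prevalence,
        "ppv": ppv,
        "npv": npv,
        "lr_positive": lr_pos,
        "lr_negative": lr_neg,
        "post_test_probability": post_test_prob,
    }


# Counts taken from the scoliosis example in these notes.
result = diagnostic_summary(tp=150, fp=130, fn=10, tn=50)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

When the pre-test probability is set to the prevalence in the same table, the post-test probability after a positive result works out to the same number as the predictive value positive, which is a handy check on the arithmetic.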

Pearson Correlation SPSS Output

-The Kolmogorov-Smirnov and Shapiro-Wilk tests are used to determine if the data sets have a normal shape. -The result of the Shapiro-Wilk test is most commonly used. -Examine the Sig. column. These values indicate the p values. -Note that all p values are greater than .05, indicating no significant deviation from normal. -notes: K is used if data set is 50+, S is used if less than 50

(CH12)

-see examples in literature if want; last pic shows that by picking ball #23 or 30, you already have significance to begin with...tis is why a lot of false positives occur (you may have just had a weird sample, even tho normal distributed, and still find signif)

Chi-Square for Association Output:

-see table in book >The Chi-square statistic is 5.33. The P value is 0.021. This result is significant at p < 0.05. >Observed frequencies were: Males: 30 for video 1; 10 for video 2; Females: 20 for video 1; 20 for video 2. >Expected frequencies: To get the expected value for a cell, multiply its row total (40) by its column total (50), and then divide by the overall total (80), which gives 25. >How this result may appear in the literature: χ2 (1) = 5.33, p = .021. >test q: what kind of test is this? Chi square; how many groups? 2 (bc df=n-1) chi statistic? 5.33, tells us nothing by itself; pval? 0.021; is it signif at 0.05? yes, signif at 0.01? no
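The expected-frequency rule and the chi-square statistic for this table can be checked with a short sketch. The function name `chi_square_2x2` is made up for illustration, and the p value shortcut used here (via `math.erfc`) is only valid for a 2x2 table with df = 1; it is not how SPSS is documented to compute it.

```python
import math


def chi_square_2x2(observed):
    """Return expected counts, chi-square statistic, and p value for a 2x2 table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count for each cell = row total * column total / overall total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # With df = 1, chi-square is a squared z score, so the p value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value


# Rows: males, females. Columns: video 1, video 2 (counts from the card above).
expected, chi2, p_value = chi_square_2x2([[30, 10], [20, 20]])
print(expected)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```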

Chi Square Output (SPSS)

-see table in book The Pearson Chi Square result indicates a two-tailed p value of .021.

Pearson Correlation SPSS Output: Positive Correlation

-see table in book The correlation coefficient = .522. This is a positive correlation. The two-tailed p value associated with this correlation = .001 (Note that it is common to report out to three decimal places with the rounding up of the third decimal from zero to one.)

Interpretation of Effect Size

0.2 = small 0.5 = medium 0.8 = large Reporting: t(30) = 3.00, p = .01, d = .566

practice qs:

1. Research involving observation of subjects in natural settings.: QUALITATIVE 2. A value describing the amount of spread in a data set.:VARIANCE 3. Variables that are on the interval or ratio level. CONTINUOUS VARIABLE 4. Statistical tests designed to compare mean differences after a significant analysis of variance is found.:POST HOC TESTS -True or False?: 5.Alpha (α) is the probability of a Type I error given that the null hypothesis is true. TRUE 6.The p-value is compared to the alpha level set by the researcher. : TRUE 7. Given a z distribution and a two tailed approach, with alpha set at .05, 95 percent of the samples drawn from the population will fall between +- 2.0 standard deviations. FALSE

z-score

A measure of how many standard deviations an individual data point is from the mean.

Wilcoxon's signed rank test

A non-parametric version of the paired t-test.

bonferroni post hoc

A type of post hoc test.

Reliability Co-efficient

A value indicating the repeatability of a data set.

Levene's Test

An F-test used to determine if the variances from data sets are significantly different.

Bonferroni adjustment

An adjustment to the Alpha level due to multiple statistical tests applied to the same sample. The alpha level is divided by the # of statistical analyses being conducted.
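A minimal sketch of the adjustment described above. The function names and the three p values are hypothetical, chosen only to show how the per-test threshold changes.

```python
def bonferroni_alpha(alpha, n_tests):
    """Divide the overall alpha by the number of statistical tests."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which p values fall below the Bonferroni-adjusted alpha."""
    adjusted = bonferroni_alpha(alpha, len(p_values))
    return [p < adjusted for p in p_values]


# Hypothetical p values from three tests run on the same sample.
p_values = [0.01, 0.02, 0.04]
print(bonferroni_alpha(0.05, len(p_values)))
print(significant_after_bonferroni(p_values))
```

With three tests the threshold drops from .05 to about .0167, so only the first of these hypothetical results would still count as significant.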

ANOVA

Analysis of variance. Examines variations btw 2 or more sets of interval or ratio data.

RELATIVE RISK:

Comparison of the probability of an event occurring in different groups. (The probability of drawing an ace out of a deck of cards is 4/52.)

EXAMPLE OF INDEPENDENT T-TEST

Continuous variables were analyzed by means of Student's t-test and the Wilcoxon rank-sum test. Categorical variables were analyzed with the use of Fisher's exact test, and the Holm procedure was used to correct for testing associations with multiple clinical variables. Sensitivity, specificity, and negative and positive predictive values were calculated with the use of established methods. Two-sided P values of less than 0.05 were considered to indicate statistical significance. Confidence intervals for proportions are reported as two-sided exact binomial 95% confidence intervals

Adjusted OR

Most ORs are generated through logistic regression. If confounding variables are suspected, the logistic regression can be programmed to treat these variables as covariates. An adjusted OR has taken into account the confounding variables identified by the researcher.

Stem and Leaf Plot

Notice the sideways shape of the data. Make a visual assessment as to whether or not this shape resembles a normal distribution. Examples:
Frequency   Stem & Leaf
1.00        1 . 4
5.00        1 . 56789
3.00        2 . 023
1.00        2 . 8
Stem width: 100.00
Each leaf: 1 case(s)
Frequency   Stem & Leaf
1.00        0 . 9
3.00        1 . 034
5.00        1 . 56789
1.00        2 . 0
Stem width: 100.00
Each leaf: 1 case(s)

What Does the Law Say about Relative Risks?

Numerous sources suggest that ORs and RRs should be at least 2.0 before being considered an indication of a relationship.

Kruskal-Wallis Test

The non-parametric equivalent to the one-way ANOVA.

Calculating Effect Size

To calculate an effect size (d or Cohen's d) for a paired-samples t-test: divide the mean difference by the standard deviation of the difference. d = M / sd d = 3.0 / 5.3 = .566
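The calculation above can be sketched as follows. Both function names are made up for illustration; the summary values (3.0 and 5.3) are the ones on this card, while the before/after scores are hypothetical.

```python
import statistics


def cohens_d_from_summary(mean_diff, sd_diff):
    """Cohen's d for paired samples: mean difference / SD of the differences."""
    return mean_diff / sd_diff


def cohens_d_paired(before, after):
    """Same effect size computed from raw paired scores."""
    diffs = [a - b for a, b in zip(after, before)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


# Values from the card: mean difference 3.0, SD of the differences 5.3.
print(round(cohens_d_from_summary(3.0, 5.3), 3))

# Hypothetical paired scores, for illustration only.
print(cohens_d_paired(before=[10, 12, 14], after=[11, 14, 17]))
```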

Dependent T-Test

Used to compare two related samples -see book for formula

interval/ratio scale

Values that express consistent magnitude between each value. Ratio data possess an absolute zero value. Interval data do not possess an absolute zero.

EXAMPLE 2: Calculating Sensitivity, specificity, positive and negative predictive value for this example. "Does 5 degrees of trunk angle indicate the presence of scoliosis? Gold standard revealed that 47% really had scoliosis."

a(TP): 150 b(FP): 130 c(FN): 10 d(TN): 50 160 subjects really had scoliosis. 180 subjects did not have scoliosis. Sensitivity: 150/160 = .94 Specificity: 50/180 = .28 PPV: 150/280 = .54 Only about 1/2 of those testing positive actually have scoliosis. NPV: 50/60 = .83 About 4/5 who tested negative were w/o scoliosis.

cronbach alpha reliability coefficient

coefficient of reliability relating to the internal consistency of a questionnaire or test.

notes

the more data points you have, the tinier the differences (z-scores) you can detect as signif; the more you repeat something, the more the result regresses to the mean -test Q: if you had a binomial split of 59/41 with 100 trials vs 59/41 with 1000 trials, which is more likely to be stat signif? the one with 1000 trials (bc with more trials the split should regress closer to the expected 50/50, so the same 59/41 split sits more standard errors away from it)
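The test question can be checked numerically. This sketch uses the normal approximation to the binomial, which is an assumption of this example rather than something stated in the notes; the function name is made up.

```python
import math


def proportion_z_test(successes, n, p0=0.5):
    """Two-tailed z test of an observed proportion against p0 (normal approximation)."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p value
    return z, p_value


for successes, n in [(59, 100), (590, 1000)]:
    z, p = proportion_z_test(successes, n)
    print(f"{successes}/{n}: z = {z:.2f}, p = {p:.4f}")
```

The proportion is identical in both cases, but the standard error shrinks as n grows, so the 59% result sits 1.8 standard errors from 50% with 100 trials and well beyond 1.96 with 1000 trials.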

Example of ANOVA results

see table >Results indicate a significant F value with p = .001 >This could be reported as: F=16, df (2,35), p = .001 >notes: we're interested in the between groups row; this says there is a signif difference btw groups, but doesn't say which groups (could be one pair, or even all three); so do a post hoc test (ex: Tukey, Scheffé, Dunn, Newman-Keuls, and Bonferroni)

Test of Homogeneity of Variances

see table >This result indicates that the p value (sig) = .199. >Since p is not less than .05, we conclude that the variances are not significantly different, and we can proceed with the parametric version of the one-way ANOVA.

EXAMPLE: No Differences Between Groups

see table Where, F = ANOVA Coefficient = MST / MSE MST = Mean sum of squares due to treatment MSE = Mean sum of squares due to error. df = With ANOVA there are two degrees of freedom. The between groups df refer to the number of data sets - 1. The within groups degrees of freedom is based on a formula using the number of subjects. •These results indicate the probability of obtaining the observed F-value if the null hypothesis is true. •The p-value exceeds .05. Therefore, we do not have reason to suspect mean differences other than by chance.

EXAMPLE 2: Significant Difference is Present

see table and Fstat chart >These results indicate the probability of obtaining the observed F-value if the null hypothesis is true. >The p-value is lower than .05. Therefore, we have reason to suspect mean differences may not be due to chance alone. >Reporting these results: F(2, 38) = 16, p < .001. The following table illustrates the relationship between degrees of freedom and significant F values. >notes: >F table doesn't give you an option (have to set alpha at 0.05); compare df of numerator and denominator (# subjects); if you have 38 you can't go to 40, have to go LOWER to 30 (can never go higher); tells you the value that the Fstat has to equal or exceed to be significant; the more subjects, the smaller the Fstat can be to still be significant

see

survey handout

Linear Regression

technique for modeling the relationship btw a dependent variable (y) and one or more explanatory variables (X).

Regression

test that determines the degree of relationship btw variables.

F statistic

test to determine the relationship btw the variances of 2 or more samples.

Kolmogorov-Smirnov Test

test used to determine if the shape of a data set approaches a normal distribution. Used when the # of subjects is greater than 50

Shapiro-Wilk

test used to determine if the shape of a data set approaches a normal distribution. Used when the number of subjects is less than 50

dependent variable

variable that is being measured.

Random sample

when members of a pop have an equal chance at being selected. Random # generators are often used to select subjects.

Cluster sampling

when the pop is divided into clusters/groups. Clusters are then randomly selected.

Normal Q-Q Plots

• Make a visual assessment about the relationship between the data points and the line. Are the data points close to the line or scattered away from the line? -notes: line of best fit produced by computer...look at how data points fit into line or if there are outliers; xaxis is observed values, y axis is expected/normal values

REPEATED MEASURES ANOVA

•A repeated measures ANOVA is used when three or more data sets are being compared, and repeated measurement accounts for the data. •Must test for the assumption of sphericity. •Sphericity indicates the degree of within-subjects factor variance. (to see if variances differ significantly) •If the sphericity is violated, the degrees of freedom will need to be adjusted.

Sphericity Is Violated

•When sphericity is violated, researchers most often refer to the results of the Greenhouse-Geisser correction. •Note that the Greenhouse-Geisser correction decreases the df in order to adjust for the violation. -see table -If the Greenhouse-Geisser F statistic is significant, appropriate post hoc analyses will be conducted.

Use Caution When Interpreting Odds Ratios

•When the risk of the outcome is very low (a rare outcome), an odds ratio will produce a similar value to the RR. •As risk rises, the odds ratio moves farther from 1.0 than the RR and overestimates the true relative risk.
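A short sketch of why the caution matters. The risks below are hypothetical and were picked so that the relative risk is 2.0 in both scenarios; the function names are made up for this example.

```python
def relative_risk(risk_exposed, risk_unexposed):
    """Ratio of the two event probabilities."""
    return risk_exposed / risk_unexposed


def odds_ratio(risk_exposed, risk_unexposed):
    """Ratio of the two odds, where odds = p / (1 - p)."""
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical risks: a rare outcome (2% vs 1%) and a common outcome (60% vs 30%).
for label, exposed, unexposed in [("rare", 0.02, 0.01), ("common", 0.60, 0.30)]:
    rr = relative_risk(exposed, unexposed)
    or_ = odds_ratio(exposed, unexposed)
    print(f"{label}: RR = {rr:.2f}, OR = {or_:.2f}")
```

For the rare outcome the OR stays close to the RR of 2, while for the common outcome the same doubling of risk shows up as an OR of about 3.5.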

EXAMPLE 3: Sensitivity and specificity of rapid HIV testing of pregnant women in India.

"CONCLUSION: In this relatively low HIV prevalence population of pregnant women in India, the sensitivity of the rapid HIV tests varied, when compared to a dual EIA algorithm. In general, the specificity of all the rapid tests was excellent, with very few false positive HIV tests."

example of proof that educated students/statisticians dont always understand

"Suppose 0.8% of women who get mammograms have breast cancer (8/1000). In (approximately) 90% of women with breast cancer, the mammogram will correctly detect it (sensitivity). (That's the statistical power of the test. This is an estimate, since it's hard to tell how many cancers are missed if we don't know they're there.) However, among women with no breast cancer at all, about 7% will get a positive reading on the mammogram, leading to further tests and biopsies and so on. If you get a positive mammogram result, what are the chances you have breast cancer? The answer is 9%. This is referred to as the positive predictive value. If you administer questions like this one to statistics students and scientific methodology instructors, more than a third fail. If you ask doctors, two thirds fail. They erroneously conclude that a p<0.05 result implies a 95% chance that the result is true - but as you can see in these examples, the likelihood of a positive result being true depends on what proportion of hypotheses tested are true. And we are very fortunate that only a small proportion of women have breast cancer at any given time." -Using MedCalc (free online medical calculator), we can see how these conclusions were reached. We simply enter the number of true positives, false positives, true negatives and false negatives. Several outcomes are presented. Note in the replication of the Reinhart example, our calculation supports the positive predictive value = 9.04%. -with low prevalence, even a positive test is often not a true positive; even educated people don't understand this (too many researchers get their results and run w/ them, w/o looking at other factors) -The Susan G. Komen Foundation addressed the issue of accuracy of mammograms in finding breast cancer, especially in women ages 50 and older: "The more mammograms a woman has, the more likely she is to have a false positive result that will require follow-up tests. Studies have shown the chances of having a false positive result after 10 yearly mammograms are about 50 to 60%. Getting a false positive result can cause fear and worry (and result in more testing). However, this does not outweigh the benefit of mammography for most women. The goal of mammography is to find as many cancers as possible, not to avoid false positive results."
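The roughly 9% answer in the mammogram example can be reproduced from the three quoted percentages. This is only a sketch: the function names are made up, and the second calculation assumes each yearly mammogram is an independent event with the same 7% false positive rate, which is a simplifying assumption of this example and not something the quoted sources state.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Share of positive results that are true positives."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


def chance_of_any_false_positive(false_positive_rate, n_tests):
    """Probability of at least one false positive across n independent tests."""
    return 1 - (1 - false_positive_rate) ** n_tests


# Figures quoted in the card: 0.8% prevalence, 90% sensitivity, 7% false positives.
ppv = positive_predictive_value(0.008, 0.90, 0.07)
print(f"PPV = {ppv:.1%}")

# Assumption: 10 independent yearly screens, each with a 7% false positive rate.
print(f"Any false positive in 10 screens = {chance_of_any_false_positive(0.07, 10):.1%}")
```

Working from whole-person counts instead of exact percentages (as a calculator like MedCalc requires) shifts the PPV slightly, which is why the card reports a value a little different from this one.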

Example of Chi Square

"To examine baseline differences in participant characteristics among groups, we conducted X^2 tests or Fisher's Exact Tests for categorical variables and a univariate analysis of variance (ANOVA) for age. Kolmogorov- Smirnov tests confirmed that all continuous variables were normally distributed."

NUMBER NEEDED TO HARM (NNH)

# of indivs that need to be treated before one additional harmful effect occurs.

(CH11: Biostatistics NOT TESTED ON THIS CHAPTER)NUMBER NEEDED TO TREAT (NNT):

# of indivs that need to be treated before one additional positive benefit occurs.

chi-square goodness of fit

(X^2) A nonparametric statistical technique used to compare expected versus observed outcomes.

chi square test of independence

(X^2) A nonparametric statistical test used to determine if responses from two groups are independent.

Convenience sampling

(aka Non-Probability Sampling) occurs when the samples from a population are selected based on convenience without attention to randomization.

control variable

(aka covariate) A variable that is controlled by the researcher in an attempt to prevent such a variable from unduly influencing a dependent variable.

Student's t-test

(aka t-test) measures the difference btw means of two independent samples.

RELATIVE RISKS AND ODDS RATIOS

-A great deal of medical research examines the association between a condition and a possible risk factor. -Odds Ratios (OR) are reported when the experimental design involves a retrospective approach (case-control). -"OR's were developed to deal with case-control studies examining conditions that are rare or slow developing.....where prospective studies would be impractical." Boslaugh, S. (2013) Statistics in a Nutshell. -Relative risks (RR) are reported when the experimental design involves a prospective approach (cohort).

(CH10: INTERPRETING STATISTICAL TESTS RESULTS) Types of statistical analyses

-ANCOVA -ANOVA -Bonferroni adjustment -bonferroni post hoc -chi-square goodness of fit -chi square test of independence -control variable -correlation -cronbach alpha reliability coefficient -dependent variable -F statistic -Friedman Test -Incidence -independent variable(s) -interval/ratio scale -Kolmogorov-Smirnov Test -Kruskal-Wallis Test -Levene's Test -Linear Regression -MANOVA -MANCOVA -Mann-Whitney U Test -Paired T-test -Regression -Reliability Co-efficient -Repeated measures design -Shapiro-Wilk -Student's t-test -Validity -Variable -Welch Test -Wilcoxon's signed rank test -z-score

The Pearson correlation or the Spearman rank-order correlation

-Both tests calculate a coefficient: the Pearson correlation reports r, and the Spearman rank-order correlation reports ρ (pronounced "rho") -The coefficient is a measure of the strength and direction of the association between two variables. -Example: correlation btw GRE scores and GPA.

notes

-CI estimates the range in which the real pop value is likely to fall -control group is NOT the placebo group -the higher the df, the lower the t stat value can be to be significant -power tells you how many subjects you need; if you want higher power, you need more subjects; too many subjects can still be a problem -68% of t-values/statistics will be within +/- 1 SD; the smaller the sample, the farther out on the curve the t-value needs to be (probably farther than 1.96 SD) -curve never touches x-axis...so there's always a possibility you got a weird data set...you can never prove something -t/f question: are more data points always better? no -CI for RR and OR: if the interval doesn't straddle 1, you have stat signif; if it straddles 1, you do NOT have stat signif; this is dif than for the ttest bc a ratio can't go below 0, so the no-effect value is 1 rather than 0 -for ttest: a CI for the MEAN DIF that doesn't straddle 0 is signif; if both values are above zero or both below zero then it's signif -Greater # subjects, the smaller the tstat/fstat can be..ex: higher df=closer to 1.96 -if i increase the n, would t stat need to be larger or smaller to be signif? could be smaller and still be signif -if one tail, tstat can be smaller too (1.6 instead of 1.9) -dependent, correlated, and paired t-test all mean the same thing -if get signif in levenes=variances are not similar enough -test q: what's the penalty from unequal variances in a ttest (violates assumptions)? Reduces number of degrees of freedom, meaning your tstatistic has to be bigger (so the mean difference has to be bigger or the SD smaller) -if fewer subjects (or df)=bigger tstat needs to be (to get big tstat: bigger mean difference w small variance) >this is why w few subjects you get false neg (type 2 error) where it says no signif but there is significance (should've done power analysis to see how many subjects you need) -with small number of coin flips, may have signif, but effect size isn't big enough -type 1: 51 heads 49 tails could be false positive if 10,000 flips -don't use Mann-Whitney U if didn't violate assumptions -use Fisher's when cell size is 5 or less -test q: Holm is another type of Bonferroni correction (has similar purpose to it); a way to correct for doing multiple tests -if p val is 0.01, is it signif at 0.02 level? Yes bc it's below 0.02 -if see paired ttest want to do Bonferroni -if test three times (or even pre and post), but 3 dif groups: repeated measures ANOVA (like dependent ttest/correlated) -levenes: can use mean difference to test if statistically vs clinically signif -as prev goes up, PPV goes up; if low prev, likely to get false pos -CI: as you increase the confidence percentage, your interval widens, ex: 20% confident, so you say the answer is A; 99% confident, you say the answer is A, B, or C

One sample t-test (Z test)

-Comparison between a population and sample. -Not found very often in the literature because population data is not readily available. -Difference between the z-test and t-test: >t-statistic uses the sample standard deviation to estimate the standard error >z-statistic uses the population standard deviation >general form: (sample mean - population mean) / standard error, where standard error = SD / sqrt(n)

SURVEYS

-Prior to using a survey instrument, the validity and reliability of the instrument must be measured. -Existing survey instruments should report the validity and reliability associated with the instrument. If the survey instrument is being used for the first time, the initial assessment should be a measure of the validity and reliability of the instrument. -A vast majority of surveys use a Likert scale that involves subjects selecting from predetermined choices that are linked to a numerical value. For example, a common Likert scale survey statement associated with student evaluations might read, rate 1 to 5, 5 being the best. -These numerical values are ordinal in nature. The values represent nothing more than a rank order. -Therefore, descriptive statistics, and a non-parametric analysis of the responses, usually a form of Chi Square or a rank based test such as the Mann-Whitney-Wilcoxon (MWW) test or a Kruskal-Wallis test, is the typical statistical protocol. -Realize that if each statement is analyzed, each analysis carries with it an alpha level, usually .05, that will become inflated with each additional statement analysis. Thus, it is important for the survey instrument to not only be valid and reliable, but also be limited to the minimum number of statements needed to address the hypothesis. -Likert responses that all address the same question can be summed and the sums treated as interval data. However, this process is complicated and beyond the scope of this text. -SPSS offers an excellent resource pertaining to the survey process.

Does the regression model fit the data?

-R is the correlation coefficient. -R indicates the correlation strength between the variables. -R^2 value = the proportion of variance explained by the independent variable. This value is an indicator of how well your prediction equation will perform. -Regression analysis produces an ANOVA -The ANOVA indicates if the predictive power is significant.
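To make R, R^2, and the prediction equation concrete, here is a minimal least-squares sketch for one predictor. The function name and the height/weight values are hypothetical and exist only for illustration; a statistics package such as SPSS would also report the ANOVA table, which this sketch does not compute.

```python
import math


def simple_linear_regression(x, y):
    """Return slope, intercept, r, and r squared for a one-predictor regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # line of best fit
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)         # correlation coefficient
    return slope, intercept, r, r ** 2     # r squared = proportion of variance explained


# Hypothetical data: predicting weight (kg) from height (cm).
heights = [150, 160, 165, 170, 180, 185]
weights = [52, 58, 63, 66, 75, 80]
slope, intercept, r, r_squared = simple_linear_regression(heights, weights)
print(f"weight = {intercept:.1f} + {slope:.2f} * height")
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```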

Reliability

-Reliability tests produce a reliability coefficient. -The coefficient value (R) will lie between 0-1. **A capital R also symbolizes regression. -Common interpretations of reliability coefficients: >.90 and above: Excellent reliability >.80 - .90 Very good for a classroom test >.70 - .80 Good >.60 - .70 Somewhat low >.50 - .60 Low >.50 or below- Questionable reliability -notes: ex: cronbach's alpha or coefficient

Chi Square goodness-of-fit test (Pearson's Goodness of Fit Test)

-The chi-square goodness-of-fit test is a nonparametric test used to determine whether observed sample frequencies differ significantly from known frequencies. -The chi-square goodness-of-fit test is used with one dichotomous, nominal or ordinal categorical variable. -The chi-square goodness-of-fit test is often used with Likert scale data. -The chi-square goodness-of-fit test is comparing observed outcomes to known expected outcomes. Since expected outcomes are known, only observed outcomes are collected.

Independent T-test

-The independent-samples t-test is used to determine if a difference exists between the means of two independent groups on a continuous dependent variable. -This test has other names, including the independent t-test, independent-measures t-test, Student t-test, between-subjects t-test and unpaired t-test. -Assumptions: >A single continuous dependent variable that consists of interval or ratio data, but a case has been made to use t-tests with ordinal data. >A single categorical independent variable that consists of two independent groups. >There cannot be any relationship between the participants in each group. Data from group one cannot be influenced by the members in group two, and vice versa. >Homogeneity of Variance assumptions are met. The difference between variances for both populations is statistically insignificant. >Normal distribution assumptions are met. >The two independent samples are equal in size. -An independent-samples t-test will produce a p-value. The p-value is compared to the alpha value, usually .05. -If homogeneity of variance assumptions are not met, use a Welch's t-test (does not require equality of variance). -There are also t-tests to use with unequal sample sizes.

ONE-WAY MANOVA

-The one-way multivariate analysis of variance (MANOVA) is an extension of the ANOVA with the addition of another dependent variable, for a total of two or more dependent variables. -Reporting a significant MANOVA result: F(3, 68) = 20.554, p < .001; partial η2 = .627 -A significant MANOVA result will be followed by univariate ANOVAs. -For any univariate ANOVAs that are significant, appropriate post hoc tests will follow (Tukey, Bonferroni, etc.) -Assumptions are numerous and beyond the scope of this text. >notes: there are at least 2 dependent measures; >if a researcher reports they did 3 dif analyses of variance, the approp response would be an adjustment of alpha: TRUE >if ttests calculated, the approp response would be adjusted alpha: true?

CHI SQUARE (X^2)

-The two most common chi square tests (X^2) are: >X^2 goodness of fit >X^2 independence (association) -The Chi-square statistic (X^2) is used to determine if distributions of categorical variables differ from one another. -P-values are calculated. -Must have a minimum of 5 responses per cell to run a Chi-square test. -Fisher's test is used if a cell size is less than 5. -Reported as chi-square statistic (χ2) with df in parentheses, ex: χ2(2) = 12.580, p = .001.

Statistical Packages

-There are many statistical packages used to examine data sets (SPSS, SAS, R, etc). The use of the Statistical Package for the Social Sciences (SPSS) is common. -Note that SPSS outputs label the test statistic value as "Statistic", the degrees of freedom as "df", and the p value as "Sig."

Validity

-There are several types of validity assessment tools (face, concurrent, construct, etc.). The most common assessment of validity is face validity. >Face validity is often simply described. There is not a value associated with validity unless a statistical evaluation, for ex: a correlation, is presented as evidence.

ANALYSIS OF VARIANCE (ANOVA) ONE WAY ANOVA:

-Used to evaluate the differences between three or more independent means. -This acronym can be confusing. The ANOVA is comparing the variances and the means within each group and the variances between each group. -One dependent variable -One independent variable with three or more levels -Determine if your data are normally distributed >Visual inspection of Q-Q Plots or other graphical methods can be done. However, this type of interpretation is difficult. >Use the Kolmogorov-Smirnov test, or the Shapiro-Wilk test if sample sizes are fewer than 50. >Use Levene's test to determine if the data have equality of variances. If not signif, then can continue with parametric one-way ANOVA

Correlation

-aka "r" -Correlation tests are used to determine how strong of a relationship exists between two sets of data. -The data need to be continuous. -The relationship can be positive or negative. -Values will be between -1 and +1. -Correlations do not indicate causation. -Pearson correlation is a parametric test. -Spearman correlation is a nonparametric test.

The change in critical t values as degrees of freedom change.

-df infinity: tval=1.96 -df 6: tval=2.45 -df 3: tval=3.18 >as df increases, critical tval decreases

Article examples

-one article adjusted alpha due to a large sample size (lowered it to 0.001); this is good to do -Why Most Published Research Findings Are False: "There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical models; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies to the most modern molecular research. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims. However, this should not be surprising. It can be proven that most claimed research findings are false."

Chi Square Goodness of Fit to determine if the ratio of colors in a pack of M&M's is consistent with the ratios indicated by the company

-see 5 studies in book
S1: X2 = 61.083; df = 5; two-tailed p value is less than 0.0001
S2: N = 200; df = 5; X2 = 102.5; p = .00
S3: X2 = 17.3531; df = 5; p = 0.0039
S4: N = 100; df = 5; X2 = 11.3; p = .0457
S5: N = 200; df = 5; X2 = 21.4; p = .00068
>notes: to discuss these results and possible reasons for finding a signif dif in all 5 studies, you could say the reason may be location, the time you bought them, the pack could've come from a day where they were just making one color in the morning, etc

another Levene example

-see table -In the example below, the Levene's test is not significant (p=.13). Therefore we conclude that the variances are not statistically different. We then use the row titled Equal Variances Assumed when interpreting the t- test.

Table: Levene's Test Result

-see table >notes: >produces an F statistic; the value doesn't mean anything to us unless we compare it to the data/df, but the computer does it for you; if you get a signif Levene's test it means the variances are significantly different (which is not what you want), so go to equal variances not assumed; the t stat is the same for equal variances assumed vs equal variances not assumed, but df is lower for equal variances not assumed >if significant value, use equal variances not assumed >would say mean diffs were signif w/ p=0.034 or 0.039; mean dif & standard error stay the same >95% confident that, for the whole population, the real mean difference is somewhere btw 0.495 and 12 (also shows that it's signif bc both values are above 0); if the lower value was negative, it would say that we are 95% confident the mean diff could be opposite and the other group actually had better scores (whereas before you thought the first group had better scores); if it could've been opposite, this means not signif bc a range from a neg value to a pos value spans 0 >test q: what's the name of the levene table graph? WELCH TEST

REPEATED MEASURES ANOVA SPSS OUTPUT

-see table •As reported in the literature: F(2, 38) = 3.77, p < .01 >test qs: what is the F stat? 3.77; p val? Less than 0.01; Is sphericity assumed? Yes; effect size? 0.81, large; how many groups compared? 3 (bc df 2), 39 subjects; what is partial η2? The effect size (it is strong in this case) Effect Size: •There are four possible effect size tests for a one-way repeated measures ANOVA. •The most common effect size test is the partial eta squared (η2). •Example as reported in the literature: F(2, 38) = 3.77, p < .01, partial η2 = .810 •Post hoc tests will follow any significant repeated measures ANOVA -Example of Repeated Measures ANOVA: comparing tailoring methods of multimedia-based fall prevention education for community-dwelling older adults >"To examine baseline differences in participant characteristics among groups, we conducted χ2 tests or Fisher's Exact Tests for categorical variables and a univariate analysis of variance (ANOVA) for age. Kolmogorov-Smirnov tests (Chakravarti, Laha, & Roy, 1967) confirmed that all continuous variables were normally distributed. We conducted a mixed-design repeated measures ANOVA to investigate Group (i.e., authenticity, motivational, or control group) × Time (pretest or posttest) effects on fall threats knowledge."

Symmetric Measures

-see table in book -Phi and Cramer's V are indications of the strength of the association between the two variables.

Spearman Correlation: Nonparametric measure of statistical dependence between two variables. SPSS Output

-see table in book -The correlation coefficient = .799. -This is a positive correlation. -The two-tailed p value associated with this correlation = .001 -Remember, correlation does not mean causation.

Spearman Correlation SPSS Output: Test for Normality:

-see table in book -These tests reveal a significant deviation from normality. Therefore, the Spearman (nonparametric) test is appropriate.

Table: Descriptive Statistics (Male and Female Samples)

-see table in book >Alphas for skewness and kurtosis assessments are usually set at .01 >Because these calculations lead to standardized values, we can compare these values to our normal distribution curve at the .01 level (±2.58). None of the values below exceed ±2.58, so we do not have significance.
>Skewness (Females) = .962 / .374 = 2.57
>Skewness (Males) = .037 / .374 = .099
>Kurtosis (Females) = .611 / .733 = 0.834
>Kurtosis (Males) = -1.058 / .733 = -1.44
-notes: take the value and divide it by the standard error
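-notes: a minimal Python sketch of the "value divided by standard error" check, using the four values from this card. The function names and the dictionary layout are illustrative assumptions, not SPSS output.

```python
# z = statistic / standard error; flag a value only when |z| exceeds 2.58 (alpha = .01)
Z_CRITICAL_01 = 2.58


def shape_z_score(statistic, standard_error):
    """Standardize a skewness or kurtosis value by its standard error."""
    return statistic / standard_error


def is_significant(z, critical=Z_CRITICAL_01):
    """True when the z-score falls outside the +/- critical range."""
    return abs(z) > critical


# (statistic, standard error) pairs taken from the card above
values = {
    "skewness_females": (0.962, 0.374),
    "skewness_males": (0.037, 0.374),
    "kurtosis_females": (0.611, 0.733),
    "kurtosis_males": (-1.058, 0.733),
}

results = {name: shape_z_score(stat, se) for name, (stat, se) in values.items()}

for name, z in results.items():
    print(f"{name}: z = {z:.3f}, significant at .01 = {is_significant(z)}")
```

None of the four z-scores falls outside ±2.58, which matches the card's conclusion.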

Do we really need to worry about skewness and kurtosis?

>Skewness and kurtosis values are very dependent on the sample size. >The larger the sample, the smaller the skewness and kurtosis values. >Extreme skewness or kurtosis values are red flags when using parametric tests.

Pearson Correlation SPSS Output: Negative Correlation

-see table in book The correlation coefficient = -.522. This is a negative correlation. The two-tailed p value associated with this correlation = .000 >notes: >the above two tables are the same data, just one is done GRE vs GPA and the other is GPA vs GRE >test q: n=30, which test to use? Shapiro-Wilk bc less than 50; what is the correlation coefficient? -0.522, a strong (negative) correlation; correlation btw GRE and GRE? 1; how many data pts? 1320 >deciding if data is normally dist depends on alpha; ex: if you set alpha at 0.05 and the normality test p value is below 0.05 you should run a nonparametric test >nonparam tests rank order the data

Tests of Normality

-see table in book These tests reveal a non-significant deviation from normality. The Pearson (parametric) test is appropriate. -note: these sigfigs are normal enough to run parametric tests

(CH7)Why Statistics Can Be Perplexing

>Statistics and probability are not intuitive. >We tend to jump to & make overly strong conclusions from limited data. >We have incorrect intuitive feelings about probability. >We see patterns in random data. >We don't realize coincidences are common.

NUMBERS NEEDED TO TREAT & NUMBERS NEEDED TO HARM

>NNT: Is it worthwhile to treat? How many patients need to be treated before you achieve one additional positive result? Also see NNH (Numbers Needed to Harm)
>NNT = 100 / ARR, with ARR expressed as a percent (same as 1 / ARR with ARR as a proportion)
>ARR (absolute risk reduction) = improvement rate in intervention group - improvement rate in control group
>Example 1: 100 patients given drug, 100 placebo
>Drug group: 80/100 improved
>Placebo group: 60/100 improved
>ARR = 80% - 60% = 20%
>NNT = 100/20 = 5
>Example 2: Drug Group: 998 live, 2 die
>Placebo Group: 996 live, 4 die
>RRR (relative risk reduction) = 50% !! (deaths reduced from 4 to 2, or 50%)
>ARR = 998/1000 - 996/1000 = 99.8% - 99.6% = 0.2%
>But NNT = 100/0.2 = 500
>So you would need to subject 500 subjects to treatment in order to have one more survive.
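-notes: a small Python sketch of the two NNT examples on this card. It assumes ARR is handled as a proportion (so NNT = 1 / ARR); the function names are made up for illustration.

```python
def absolute_risk_reduction(good_treated, n_treated, good_control, n_control):
    """Difference in the rate of good outcomes (treated minus control), as a proportion."""
    return good_treated / n_treated - good_control / n_control


def number_needed_to_treat(arr):
    """NNT = 1 / ARR when ARR is a proportion (equivalently 100 / ARR in percent)."""
    if arr == 0:
        raise ValueError("ARR is zero, so NNT is undefined")
    return 1.0 / abs(arr)


# Example 1: 80/100 improved on drug vs 60/100 on placebo
arr_1 = absolute_risk_reduction(80, 100, 60, 100)
nnt_1 = number_needed_to_treat(arr_1)

# Example 2: 998/1000 survive on drug vs 996/1000 on placebo
arr_2 = absolute_risk_reduction(998, 1000, 996, 1000)
nnt_2 = number_needed_to_treat(arr_2)

print(f"Example 1: ARR = {arr_1:.3f}, NNT = {nnt_1:.0f}")
print(f"Example 2: ARR = {arr_2:.3f}, NNT = {nnt_2:.0f}")
```

The second example shows why a 50% relative risk reduction can still mean treating hundreds of people per extra survivor.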

Types of t-tests

>One sample t-test (Z test); notes: don't confuse w/ zscore >Independent T-test >Dependent T-Test

SUMMARY

>Random patterns happen all of the time. >Small samples can sometimes hide a significant pattern. >We can have large differences that are not deemed statistically significant. >We can have small differences that are deemed statistically significant. >The larger the sample size the more likely we will have a significant finding, simply due to the largeness of the sample.

Common Assumptions associated with t- tests

>Random sampling >Interval or ratio data >Normality >Homogeneity of Variance

Tests of Normality

>As discussed in Chapter 6, a common assumption of parametric statistical techniques is that data sets approach normal distributions. >Skewness and kurtosis values indicate the shape of a data set. >An examination of the skewness and kurtosis values allows for a determination of normality to be made. >Two of the most common tests used to assess skewness and kurtosis are the Kolmogorov-Smirnov test (used with 50+ subjects) and the Shapiro-Wilk test (used with fewer than 50 subjects). >The Anderson-Darling test and the Cramér-von Mises test are also seen in the literature. >Often, stem and leaf plots and QQ plots are used to assess normality. >Each test has its own strengths and weaknesses. All are useful when interpreting normality of the data sets. >As seen in the following table, a p value (sig) is produced for each data set. >A significant p-value, usually less than .01, indicates a data set that is not normal in shape. >(see table) In the example below, Data Set 1 has been found to be abnormal in shape. Therefore, statistical tests requiring normality cannot be used.

SUMMARY: What Are We to Do?

>Do our best to understand research methods and its limits >Recognize that we all have biases that need to be removed, and the research method is misunderstood; conflicting results are not only common but predictable based on the tenuous foundations supporting both Frequentist and Bayesian approaches. >the best we can do with findings is to talk about probabilities.

Other Ways To Test For Normal Distributions:

>HISTOGRAMS: Researchers sometimes simply eyeball histograms generated from the descriptive statistics. If the data approach a normal distribution shape, the researcher assumes that the assumption has been met. However, this type of casual evaluation should be avoided. >QQ PLOTS: A Q-Q Plot will produce a diagonal line with the data points dispersed along this line. The closer the data points are to the line, the more normal the distribution. Again, this is an eyeball test. >SHAPIRO-WILK TEST: This test is designed to evaluate the shape of the data. A p-value will be produced for each set of data. Interpret the p-value against the .05 level. If the p-value is less than .05 the data are not normally shaped. Nonparametric tests would then be used. The Shapiro-Wilk test is used with small sample sizes, usually less than N=50. (See Table below) >KOLMOGOROV-SMIRNOV TEST: This test is designed to evaluate the shape of the data. A p-value will be produced for each set of data. Interpret the p-value against the .05 level. If the p-value is less than .05 the data are not normally shaped. Nonparametric tests would then be used. The Kolmogorov-Smirnov test is used with large samples of N=50 or more. >SKEWNESS AND KURTOSIS Z-SCORES: Included in the descriptive statistics table are skewness and kurtosis values and their standard error values. Simply divide the value by the standard error to produce a z-score. (See Table below) Most sources indicate that the statistical significance level be set at .01 (z-score of ±2.58). If the z-score is within ±2.58, the data are treated as normally distributed. >If normality is violated researchers sometimes transform the data. But in most cases, a non-parametric test is used. -notes: most tests for assumptions set alpha at 0.01, which means the data really has to be misshapen to get a signif value; skewness means the data is shifted in a pos or neg direction

Effect size associated with a t-test

>Remember that a p-value does not tell us anything about the size of an effect. >To calculate the effect size, divide the mean difference between the groups by the pooled standard deviation. >see two formulas >( || ) means the absolute value (negative value becomes a positive value) Group Statistics SPSS Output (see table) >S pooled = √[(6²(30-1) + 5²(30-1)) / (30+30-2)] = 5.5 >d = |85-75| / 5.5 = 1.82 >This effect size would be labeled as "strong".

A second example using the random number generator:

>TOTAL: Heads = 59, Tails = 41
>Ho = 50/50 chance
>Observed: 59/100 or .59
>Variance of the proportion: p(1-p)/n = .5(.5)/100 = .0025
>sd = square root (√) of variance = .05
>z = (proportion - prob.) / sd = (.59 - .5) / .05 = 1.80 (p = .07)
>z does not equal or exceed the .05 critical value of ±1.96, so we fail to reject the null. (not signif; no evidence the coin is biased)
>Or: z = (59/100 - .5) / √[(.5 × .5) / 100] = .09 / .05 = 1.8
>Critical value of a two-tailed z-test = 1.96
>Since 1.8 is less than 1.96, we do not have significance at the .05 level.
>What if we again use the 59% heads finding, but assume this occurred over 1000 tosses?
>N = 1000 >Observed heads = 590/1000 or 59% >P = .5 >sd = .0158 >z = 5.69
>This z-score far exceeds the 1.96 value needed to be significant at the .05 level.
>Now let's look at increasing the sample to 10,000, with an outcome of H = 51%, T = 49%
>N = 10,000 >Observed heads = 5100/10,000 or 51% >sd = .005 >z = 2
>z exceeds the .05 value of ±1.96, so this outcome is significant at the .05 level.
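-notes: the coin-flip z calculations on this card can be sketched in Python as below. The function name is an assumption for illustration; the two-tailed p value uses the normal approximation via `math.erfc` from the standard library.

```python
import math


def one_proportion_z(successes, n, p0=0.5):
    """z test for an observed proportion against a hypothesized probability p0."""
    p_hat = successes / n
    sd = math.sqrt(p0 * (1 - p0) / n)          # sd of the proportion under the null
    z = (p_hat - p0) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed, normal approximation
    return z, p_value


# (heads, tosses) for the three scenarios on the card
scenarios = [(59, 100), (590, 1000), (5100, 10000)]

for heads, tosses in scenarios:
    z, p = one_proportion_z(heads, tosses)
    verdict = "significant" if abs(z) >= 1.96 else "not significant"
    print(f"{heads}/{tosses}: z = {z:.2f}, p = {p:.4f} -> {verdict} at .05")
```

The same 59% becomes significant once the number of tosses grows, and even 51% clears 1.96 at 10,000 tosses, which is the sample-size point the card is making.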

Test for Homogeneity (Equality) of Variance

>The SPSS generated results from a Levene's test for equality of variances is presented below. >The first two columns indicate the Levene's F value and the p value (Sig). >A p value of .09 is not significant at the .05 level, so variances are assumed to be equal. >(see table) >(t) refers to the t statistic. This is the value produced by the t-statistic formula. >(df) refers to the degrees of freedom. >(Sig. two-tailed) refers to the p value, given a two tailed test. >The p value (p = .005) indicates the probability of obtaining the t value of 3.00 if the null hypothesis is correct. >Using the T-Table, with 78 degrees of freedom and alpha set at .05, the t statistic (3.00) is greater than the table value of 1.99. Therefore, the t-statistic is significant at the .05 level. >The 95% confidence intervals indicate that we are 95% confidence that the real mean difference lies somewhere between the two values presented. Note that the mean difference value from the data resides within the 95% confidence intervals. >In the example found in this table, a researcher would summarize the results as follows: >There was a statistically significant difference, t (78) = 3.00, p = .005. >The 95% confidence interval of the difference refers to a range indicating 95% certainty that the population mean difference would be found within the range. >Referring to the 95% confidence interval can be much more useful to the researcher when interpreting the results. >Notice that if the variances are not assumed to be equal, the degrees of freedom are reduced. Given the t-test formula, a reduction in the degrees of freedom will affect the p value, given a certain t statistic value. >See the following T Distribution table for values associated with one and two- tailed t tests. >Note that a one-tailed test places all of the alpha percentage (.05) at one tail, not split between two tails. 
So, given the same df, the values needed for significance with a one-tailed test are lower than those associated with a two-tailed test. >Note that given the same alpha and the same tail assignment, as the degrees of freedom increase, the threshold for the t statistic decreases. >SPSS analyses will produce an exact p value. So, tables like this are no longer necessary. -notes: for t-table if tstat you find in the table is higher than value, you get statistical signif; if 10 df, you have to have a higher tvalue than 70df; if infinity, tstat is ~1.96; the more df, or the more subjects, the smaller the tstat has to be to be signif (and the more it starts to look like a bell curve); as go up df, the lower the tstat(at df 1 tstat is 12.7, at infinity its 1.96)

Strength of association

>The chi-square test does not indicate the effect size, or magnitude of the association. >Two measures that do provide measures of effect size are Phi (φ) and Cramer's V.
Measure: Value, Approx. Sig.
Phi: .322, .02
Cramer's V: .322, .02
>Recall that effect sizes are often interpreted using the following guide: 0.2 = small effect, 0.5 = medium effect, 0.8 = large effect >The strength of association results may appear in the literature as: φ = 0.375, p = .02. -notes: effect size is done only if you find a signif pval

Homogeneity of Variance:

>The equality of variances is sometimes referred to as homogeneity of variance or homoscedasticity. >Levene's test: An F test used to assess the equality of variances between data sets. >A p-value will be produced. > Interpret the p-value against the .05 level. >If the p-value is less than .05 the data are violating the assumption of homogeneity of variance. >Often, statistical outputs will produce p-values to be used when the assumption of homogeneity of variance is met and when the assumption is not met. -In the example in the next table, the Levene's test is significant at the .01 level. Therefore, we conclude that the variances are statistically different. We must then use the row titled Equal Variances Not Assumed when interpreting the t- test.

Important Facts to Remember:

>The research method does not allow us to determine, w/o any doubt, that there is or is not a real difference. We can only talk about probabilities associated with differences we find. >Researchers always run the risk of false findings. >P values simply measure the degree of surprise associated with a finding. >P values do not indicate effect size. >Alpha levels are arbitrary. >Too little data or too much data can produce false findings. >Tainted data sets produce tainted results. (Garbage in, garbage out) >Inflated alphas may occur with multiple comparisons. >Statistical power analyses are useful when determining the size of a data set. But, power calculations are not common. >Confounding variables are usually present and uncontrolled. >Replications of results are not common. >There is widespread misuse of statistical methods and interpretation of results. >The Baysian approach can be difficult to understand.

Determining Significance of Skewness and Kurtosis

>There is no gold standard for assessing the peakedness and symmetry of a data set. >Using descriptive statistics, a z score for skewness and kurtosis can be calculated by dividing the statistical value by the standard error.

(CH8: What can go wrong with statistics?) Possible Statistical Outcomes:

>There is no real difference and we find no significant difference. >There is no real difference but we find a significant difference. (Type I error) >There is a real difference but we fail to find a significant difference. (Type II Error) >There is a real difference and we find a significant difference.

Independent T-test

>Used to compare two independent samples >formula: mean1-mean2 divided by square root of (see pic in book for formula) -plays role: mean difs, SD, and # of subjects -notes: smaller variances=more likely larger tstat; more likley going to be signif

Binomial distribution

>Using a random number generator, a data set of 100 coin flips was recorded: Coin flip (1= Heads; 2= Tails). (see pic?) Notice the patterns of heads. At one point heads occurred nine times in a row. >TOTALS: 1: 54; 2: 46 >Is the coin biased? >We can calculate a z score and, using a z table, determine if 54/100 heads is statistically significant at the .05 level. >Null Hypothesis (Ho) = 50/50 chance >Observed: 54/100 = .54 >Variance of the proportion: p(1-p)/n >.5(.5)/100 = .0025 >sd = square root of variance = .05 >z = (proportion - prob.) / sd: (.54 - .5) / .05 = 0.80 >We compare this to the known quantity of a two-tailed z for significance at the .05 level (1.96). >z does not equal or exceed 1.96, so we fail to reject the null. (it's not statistically signif; no evidence the coin is biased)

Calculating and reporting an effect size

>p values do not tell us about the size of any effect. >Another way to look at effect size is to think about whether or not the significant differences found should be considered "clinically different". If a drug can reduce systolic blood pressure and the p value associated with the study is less than .05, we still need to consider if the reduction is clinically significant. This is a judgment call. >In an effort to help the researcher make a judgment about the size of any effect, Cohen's d is often calculated. >Cohen's d = the mean difference between the groups divided by the pooled standard deviation. >Effect sizes are most commonly interpreted as follows: 0.2 = small, 0.5 = medium, 0.8 = large >Cohen's d is only used when the researcher has determined that homogeneity of variance is present. Effect sizes are subject-specific and therefore must be viewed with caution.
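-notes: a short Python sketch of Cohen's d with a pooled standard deviation. The group means, SDs, and sample sizes below are hypothetical numbers chosen only to show the calculation, and the function names are assumptions.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    numerator = sd1 ** 2 * (n1 - 1) + sd2 ** 2 * (n2 - 1)
    return math.sqrt(numerator / (n1 + n2 - 2))


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Absolute mean difference divided by the pooled standard deviation."""
    return abs(mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


def label_effect(d):
    """Rough labels using the 0.2 / 0.5 / 0.8 guide from the card."""
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"


# Hypothetical example: systolic blood pressure in two groups of 25
d = cohens_d(mean1=120, sd1=10, n1=25, mean2=126, sd2=12, n2=25)
print(f"d = {d:.2f} ({label_effect(d)})")
```

The "negligible" label for values under 0.2 is an added convenience here, not part of the card's guide.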

for laerd project

>statistical test selector can help tell you what test to use >SPSS statistics >for 1st assignment: follow "setting up your data" to create your data set >dep t-test doesn't have Levene's test so don't need to include it >if it asks what to do w/ outliers just ignore -project: don't need the group statistics SPSS output table after the first time he sees it; whatever test is in the book for the test, put that in the assignment; Laerd is not giving Levene's for the correlated t-test rn but will give it for the indep t-test; for correlated put all 3 paired samples stuff

MANCOVA

A MANOVA design that attempts to control confounding variables.

Variable

A condition or factor that is present in a research design.

Validity

A measure of how well a test or measurement instrument measures what it is supposed to measure.

ODDS RATIO:

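Odds are the ratio of the probability that an event will occur to the probability that it will not occur. (The odds of drawing an ace out of a deck of cards are 4/48.) An odds ratio compares the odds of an event in one group to the odds of the same event in another group.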

Welch Test

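A version of the independent t-test used when the two data sets have unequal variances and/or unequal sample sizes. It is still a parametric test; it simply does not assume equal variances (the "equal variances not assumed" row in SPSS output).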

Histograms:

A visual assessment of histograms is sometimes used as a way to assess the shape of a data set. -notes: can sometime eyeball histogram or stem and leaf(view as curve turned on side) to see normal distrib, but its better to do statistical analysis

ANCOVA

Analysis of covariance. Examines variations btw 2 or more sets of interval or ratio data while controlling for variables that may confound the data.

Pearson's Chi-square Test for Independence

Example: A physician's office staff is trying to decide if male and female patients differ in their choice between two instructional videos. -Ho: The proportion of patients that select each video is independent of gender. -Chi Square Formula: for each cell, (Observed frequency - Expected frequency)^2 DIVIDED BY Expected, then add up all cells: X2 = Σ ((O-E)^2)/E >see pic of formula in book -If the computed test statistic is large, then the observed and expected values are not close and the model is a poor fit to the data.
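-notes: a minimal Python sketch of the chi-square test for independence. The 2 x 2 counts (gender by video choice) are hypothetical, the function name is an assumption, and the p value shortcut with `math.erfc` only holds for df = 1, so the function returns `None` for larger tables.

```python
import math


def chi_square_independence(observed):
    """Return (chi-square, df, p) for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count
            chi2 += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    # With df = 1, chi-square is a squared z-score, so the p value is two-tailed normal
    p_value = math.erfc(math.sqrt(chi2 / 2)) if df == 1 else None
    return chi2, df, p_value


# Hypothetical counts: rows = male, female; columns = video A, video B
observed = [[30, 20],
            [15, 35]]

chi2, df, p_value = chi_square_independence(observed)
print(f"X2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
```

With these made-up counts the statistic is large relative to df = 1, so the null of independence would be rejected at the .05 level.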

Example of paired t-test and Bonferroni Adjustment

Fear of falling among people who have sustained a stroke: A 6-month longitudinal pilot study. "To address Objective 1.....we used paired t tests. We again used paired t tests..... Or Pearson's x2 analyses (or Fisher's Exact Test if the cell count was <5), as appropriate. To control for multiple t tests, we completed a Bonferroni correction (.05 divided by the number of comparisons). We included four comparisons (using the four variables anxiety, depression, balance, and QoL); statistical significance was thus defined as .0125)"

MANN WHITNEY U TEST

Nonparametric Test for Two Independent Means >Tests the medians. >Mann-Whitney U test holds the following assumptions: oThe populations do not follow any specific parameterized distributions. oThe populations of interest have the same shape. oThe populations are independent of each other. ("Asymptotic" means that the p-value approaches the real value as sample size increases.) >see table: The Asym. Sig (p values)= .01 Given alpha set at .05, the decision is to reject the null.

What Happens when the ANOVA Reaches Significance?

Post Hoc Tests (Multiple Comparison Procedures) -The significant ANOVA tells us that there is a difference, but the ANOVA does not tell us where that difference occurs. oIs the difference between data set one and two? oIs the difference between data set two and three? oIs the difference between data set one and three? oOr, are all three data sets significantly different from each other? -A post hoc (after the fact) analysis follows a significant finding. -There are several post hoc analyses from which to choose. -Selection of a post hoc test is dependent on the type of data sets and is beyond the scope of this book. -In the example below (see table), a Tukey HSD test is used. >Examine the p value column for significance. Note that a p value less than .05 was found between level 1 and level 3. No other comparisons produce p values less than .05.; notes: level 1 vs level 2 does not have a signif diff (0.345), nor does level 2 vs level 3 (0.777); level 1 vs level 3 are significantly different (0.047) -Other common post hoc tests: Scheffé, Dunn, Newman-Keuls, and Bonferroni.

Example of Test for Normality

RCT comparing tailoring methods of multimedia-based fall prevention education for community-dwelling older adults: "To examine baseline differences in participant characteristics among groups, we conducted x2 tests or Fisher's Exact Tests for categorical variables and a univariate analysis of variance (ANOVA) for age. Kolmogorov- Smirnov tests confirmed that all continuous variables were normally distributed." -notes: small cell sizes, like only 4 data in that group? Is a red flag; ex: only 2 people said first video was good, so cell size would only be 2 which is bad

EXAMPLE:

RR=1.3, CI: 1.1 - 2.4. This would be interpreted as the data indicating a 30% increase in risk, with 95% confidence that the real risk is most likely between a 10% increase and a 140% increase.
RR=1.3, CI: .90 - 2.4. This would be interpreted as the data indicating a 30% increase in risk. However, the 95% confidence interval indicates that the real risk is most likely between a 10% decrease and a 140% increase. Since the real risk could actually decrease or increase, the RR is not considered significant.
RR=0.5, CI: 0.3 - 0.9. This would be interpreted as the data indicating a 50% decrease in risk, with 95% confidence that the real decrease in risk is most likely between 10% and 70%.
RR=0.5, CI: 0.3 - 1.4. This would be interpreted as the data indicating a 50% decrease in risk. However, we are 95% confident that the real risk could decrease by as much as 70% or increase by as much as 40%. Thus, the RR of 0.5 is not significant.
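-notes: the four readings above follow one rule (is 1.0 inside the interval?), which can be sketched in Python. The function name and the returned dictionary keys are assumptions for illustration.

```python
def interpret_ratio(ratio, ci_low, ci_high):
    """Turn an RR or OR and its 95% CI into percent changes and a significance flag."""

    def pct_change(value):
        # 1.3 -> +30.0 (% increase); 0.7 -> -30.0 (% decrease)
        return round((value - 1.0) * 100, 1)

    # Significant only when the whole interval sits on one side of 1.0
    significant = ci_low > 1.0 or ci_high < 1.0
    return {
        "point_pct": pct_change(ratio),
        "low_pct": pct_change(ci_low),
        "high_pct": pct_change(ci_high),
        "significant": significant,
    }


examples = [(1.3, 1.1, 2.4), (1.3, 0.90, 2.4), (0.5, 0.3, 0.9), (0.5, 0.3, 1.4)]

for rr, low, high in examples:
    result = interpret_ratio(rr, low, high)
    print(f"RR={rr}, CI {low}-{high}: {result}")
```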

Voluntary response sample

occurs when members of a pop are given the opportunity to become subjects.

Snowball Sampling

occurs when the researcher starts with a subject or a group of subjects who are asked to recommend other potential subjects who possess the necessary qualifications to serve as a subject.

How Do We Know if Assumptions Are Violated?

Table: Tests of Normality >The Kolmogorov-Smirnov test and the Shapiro-Wilk test are commonly used to assess normality of the data. >Both tests produce p values. >If any of the p values are less than .05, the data are not normally shaped and the researcher should use the appropriate nonparametric test. -see table in book: Note in the example above that the p values for the sample data are less than .05. -notes: you dont want significance in these tests bc signif means the data is not normallly dist/is mishapened; can use 0.05 or 0.01 alpha

Repeated measures design

Tests designed to evaluate two or more data sets that come from the same subject sample.

Systematic sampling

refers to an attempt at random sampling where an ordered system of selection is determined.

Stratified random sampling

similar to cluster sampling. A pop is stratified by some criteria. Random selection from each stratum is then conducted.

Incidence

The probability of occurrence of a factor in a pop in a specified period of time.

PREVALENCE

The proportion of a population possessing a specific characteristic in a given time period.

independent variable(s)

The variable(s) that may influence the dependent variables.

Hazard and Hazard-Ratios

These analyses look at risk over a specific time period.

Cox Proportional Hazard:

This analysis uses regression to PREDICT a cumulative hazard over a specific time period. Cox regressions produce coefficients that relate to hazard. Positive coefficients indicating a worse prognosis and negative coefficient indicating a better prognosis.

ANOVA

Where: >F=ANOVA Coefficient=MST / MSE >MST=Mean sum of squares due to treatment >MSE=Mean sum of squares due to error. >df = With this particular experimental design, there are three groups that result in two degrees of freedom. The between groups df refer to the number of data sets - 1. The within groups degrees of freedom is based on a formula using the number of subjects

What if only 10% of the subjects really had scoliosis? (34/340)

a(TP): 32 b(FP): 221 c(FN): 2 d(TN): 85
Sen: 32/34 = .94 Spec: 85/306 = .28 PPV: 32/253 = .13 (only 13% of those testing positive really had scoliosis) NPV: 85/87 = .98
-With a test having SEN = .80 and SPEC = .90:
Pretest Probability: 1%, 10%, 50%, 90%
PPV: 7.5%, 47.1%, 88.9%, 98.6%
NPV: 99.8%, 97.6%, 81.8%, 33.3%
"The lower the pretest probability of a disease, the lower the predictive value of a positive test will be regardless of the sensitivity and specificity."
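-notes: the PPV/NPV row for a test with sensitivity .80 and specificity .90 can be reproduced with a few lines of Python. The function name is an assumption; the pretest probabilities are the four on this card.

```python
def predictive_values(sensitivity, specificity, pretest_probability):
    """Return (PPV, NPV) for a test applied at a given pretest probability."""
    true_pos = sensitivity * pretest_probability
    false_neg = (1 - sensitivity) * pretest_probability
    true_neg = specificity * (1 - pretest_probability)
    false_pos = (1 - specificity) * (1 - pretest_probability)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


table = {}
for pretest in (0.01, 0.10, 0.50, 0.90):
    ppv, npv = predictive_values(0.80, 0.90, pretest)
    table[pretest] = (round(ppv * 100, 1), round(npv * 100, 1))
    print(f"pretest {pretest:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Holding sensitivity and specificity fixed, the PPV climbs and the NPV falls as the pretest probability rises, which is the point of the quoted sentence.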

Chi-square test for independence (or association).

applied when you have two categorical variables from a single pop Example: Do males and females have different preferences when shown an introductory video on skin cancer?

MANOVA

evaluates the relationship btw 3 or more levels of an independent variable and 2 or more dependent variables.

Paired T-test

examines means and variances of two related samples.

correlation

examines the degree of the relationship btw variables which tend to be associated in a way not expected by chance.

Problems with Convenience Sampling

its one of the most common types of sampling used in the research process. However, "doing statistical testing in a convenience sample is pointless since the assumptions about probability sampling are violated." Is It Right To Test For Significant Differences in Convenience Samples? >Convenience sampling cannot be taken to be representative of the pop; do not produce representative results. >Convenience samples often contain outliers that can adversely affect statistical testing.

RRs and ORs are interpreted as follows:

o .01 to .99 values indicate a decrease in risk or odds. o 1.01-infinity indicates an increase in risk or odds. o The value of 1.0 indicates no increase or decrease in risk or odds. o A RR or OR of 1.3 indicates a 30% increase in the risk or odds. o An RR or OR of 2.0 indicates a 100% increase in risk or odds. We would say that the risk or odds doubled. o An RR or OR of 0.7 indicates a 30% decrease in risk or odds. -If the range contains the value 1.0, the RR or OR is not significant. -95% confidence intervals (CI) are always reported with RRs and ORs. -The 95% confidence interval indicates the range containing the population RR or OR. -The researcher is 95% confident in the accuracy of this range.

Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value

•A diagnostic test can have four possible outcomes: o True positive (a) o False positive (b) o False negative (c) o True negative (d)
-Sensitivity: detecting a true positive = a/(a+c) = true positives / (true positives + false negatives)
-Specificity: detecting a true negative = d/(d+b) = true negatives / (true negatives + false positives)
-If a person tests positive, how likely is it that the person really has the disease?
-Positive Predictive Value: a/(a+b)
-Negative Predictive Value: d/(c+d)
-100% specificity correctly identifies all patients without the disease.
-70% specificity correctly reports 70% of patients without the disease as true negatives, but 30% of patients without the disease are incorrectly identified as test positive (false positives).
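-notes: a minimal Python sketch of the four formulas, using the a/b/c/d counts from the scoliosis card in these notes (a = 32, b = 221, c = 2, d = 85). The function name and dictionary keys are illustrative assumptions.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute the four diagnostic measures from 2 x 2 cell counts (a, b, c, d)."""
    return {
        "sensitivity": tp / (tp + fn),   # a / (a + c)
        "specificity": tn / (tn + fp),   # d / (d + b)
        "ppv": tp / (tp + fp),           # a / (a + b)
        "npv": tn / (tn + fn),           # d / (c + d)
    }


metrics = diagnostic_metrics(tp=32, fp=221, fn=2, tn=85)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```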

TWO-WAY ANOVA

•A two-way ANOVA is used when the experimental design contains two independent variables.
•For example, we may be interested in examining differences in critical thinking knowledge by gender and year in college (first, second, third).
•The assumptions for a one-way ANOVA also apply to the two-way ANOVA.
•The SPSS version of a two-way ANOVA will provide:
o descriptive statistics
o Levene's test results
o the results of the simple main effects. The simple main effects (univariate tests) provide p values for the two independent variables.
•The two-way ANOVA will determine whether either of the two independent variables or their interaction is statistically significant.
•The table labeled Tests of Between-Subjects Effects is used to determine significance.
Example of a two-way ANOVA (see table):
•The independent variables are Gender (sex; 2 levels) and Educational Level (year in college; 3 levels).
•The dependent variable is the score on a critical thinking test.
•The factorial design is 2 (gender) X 3 (educational level).
•Levene's test for equality of variances: F = 3.26, Sig. (p value) = .071. This is not significant at the .05 level, so we can assume that the variances are not significantly different. Therefore, we can proceed with the parametric two-way ANOVA.
•Results:
o Gender (G) p value = .548
o Education Level (EL) p value = .01
o G X EL interaction p value = .400
>Main effects: there is no significant difference between genders (p = .548); there is a significant difference between educational levels (p = .01). Critical thinking changes across educational level but not across gender.
>Interaction: the interaction between gender and educational level is not significant (p = .400). The interaction asks, for example, whether the difference between levels 2 and 3 for males is different from the difference between levels 2 and 3 for females.
>A main effect is the effect of one factor alone; an interaction is the combined effect of the two factors.
>If there had been a significant gender difference, would a post hoc test be needed? No, because only two groups are being compared (1 vs. 2). A post hoc test is needed for educational level because it has 3 levels.
Interpretation:
•Main Effect of Gender: not significant at the .05 level.
•Main Effect of Education Level: significant (p = .01).
•Gender X Education Level: this refers to a possible interaction. The interaction is not significant at the .05 level. An interaction occurs when the effect of one independent variable is not the same across all levels of the other independent variable. In this example, we do not have a significant interaction.
•Since we have a significant main effect for educational level, a post hoc test would be used to determine where the significant differences exist.
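
The decision rule in this example (compare each p value with .05, then ask for a post hoc test only when a significant factor has more than two levels) can be written out as a short Python sketch. The function name `interpret_two_way` and the dictionary layout are assumptions made for illustration; the p values and level counts are the ones from the example. The sketch does not cover how to follow up a significant interaction.

```python
ALPHA = 0.05  # significance level used in these notes


def interpret_two_way(effects, alpha=ALPHA):
    """Flag each effect as significant or not, and whether a post hoc test is needed.

    effects maps an effect name to (p_value, number_of_levels);
    number_of_levels is None for the interaction term.
    """
    report = {}
    for name, (p_value, levels) in effects.items():
        significant = p_value < alpha
        # A post hoc test only makes sense for a significant factor with 3+ levels
        needs_post_hoc = significant and levels is not None and levels > 2
        report[name] = {"significant": significant, "post_hoc": needs_post_hoc}
    return report


effects = {
    "gender": (0.548, 2),
    "education_level": (0.01, 3),
    "gender_x_education": (0.400, None),
}
report = interpret_two_way(effects)
for name, result in report.items():
    print(name, result)
```

Only educational level is flagged as significant and in need of a post hoc test, which matches the interpretation above.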

EFFECT SIZE

•Effect size for an ANOVA result is often calculated as omega squared (ω²) or eta squared (η²).
•ω² = (SSb − (dfb)(MSw)) / (SSt + MSw)
•Using the ANOVA printout:
>SSb: between-groups sum of squares
>dfb: between-groups degrees of freedom
>MSw: within-groups mean square
>SSt: total sum of squares
-see table:
•ω² = (7.763 − (2)(.275)) / (18.224 + .275) = .389
•Therefore, the effect size (ω²) = .389 (moderate/medium effect)
•Results as reported in the literature: F(2, 38) = 14.11, p < .01, ω² = .389.
•Tukey post hoc analyses revealed a significant value at the .05 level for the Level 1-Level 3 comparison.
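
As a quick check of the arithmetic, the omega squared formula can be sketched in Python. The function name `omega_squared` is an illustrative choice; the four input values are the ones quoted from the ANOVA printout above. Because those printed values are already rounded, the result agrees with the reported figure only to about two decimal places.

```python
def omega_squared(ss_between, df_between, ms_within, ss_total):
    """Omega squared = (SSb - dfb * MSw) / (SSt + MSw)."""
    return (ss_between - df_between * ms_within) / (ss_total + ms_within)


# Values taken from the ANOVA printout in the notes
w2 = omega_squared(ss_between=7.763, df_between=2, ms_within=0.275, ss_total=18.224)
print(round(w2, 2))  # 0.39
```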

LINEAR REGRESSION

•The analysis first tests for a correlation between the variables.
•It produces a line of best fit. Given a graph of all data points, the regression analysis produces the line that lies as close as possible to all of the data points.
•This line is then used to create a regression equation.
•It produces an ANOVA, which is used to determine whether the prediction equation is significant.
•The equation allows for the prediction of y from x. For example, if we are interested in predicting weight (y) from height (x), a particular height can be placed in the regression equation to obtain a predicted weight.
•Regression is used for predicting one variable based on one or more existing variables (with more than one predictor, it is a multiple regression).
o Are cholesterol levels explained by cholesterol consumption?
o Are triglyceride levels explained by cholesterol consumption?
•The value you are predicting is the dependent variable, and the value you know is the independent variable.
•The analysis determines how much of the variation in the dependent variable is explained by the independent variable.
•Often, your goal is not to make predictions but to determine whether differences in your independent variable can help explain the differences in your dependent variable.
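
To make the line of best fit concrete, here is a minimal least-squares sketch in plain Python. The function name `fit_line` and the height/weight numbers are hypothetical, invented only for illustration; statistical software such as SPSS reports the same slope, intercept and R² along with the significance tests, which this sketch leaves out.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of variance in y explained by x
    return slope, intercept, r_squared


# Hypothetical data: height in inches (x) and weight in pounds (y)
heights = [60, 62, 64, 66, 68, 70]
weights = [115, 122, 130, 139, 150, 160]

slope, intercept, r2 = fit_line(heights, weights)
predicted_weight = slope * 65 + intercept  # place a height of 65 in the equation
print(f"weight = {slope:.2f} * height + {intercept:.2f}, R^2 = {r2:.3f}")
print(f"predicted weight at 65 in: {predicted_weight:.1f}")
```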

Kruskal-Wallis H test

•Nonparametric version of a one-way ANOVA
•Compares medians
•The output provides much less information about the data than a one-way ANOVA does.
•Assessment of boxplots can be used as evidence of similar score distributions.
-see table
-A significant result is followed by appropriate post hoc analyses.

ANALYSIS OF COVARIANCE (ANCOVA)

•The ANCOVA is used for a statistical design that would otherwise employ an ANOVA, with an added attempt to control for a covariate that may be affecting the results of a simple ANOVA. For example, you may suspect that pre-treatment exercise affects post-treatment cholesterol levels. The pre-treatment exercise values can be used as the covariate.
•There needs to be a linear relationship between the covariate and the dependent variable at all levels of the independent variable.
•The results will look similar to those of an ANOVA.

Example of a Linear Regression Output

•The Model Summary Table (see table) indicates the correlation (R).
•The R² value indicates the amount of total variation explained by the independent variable.
•Adjusted R² is an estimate of the explained variance expected in the population.
•The ANOVA Table (see table) is used to determine whether the regression line that has been determined as the best fit is useful (statistically significant) when predicting the dependent variable.
•The Coefficients Table (see table) provides the necessary values to construct a regression equation.
•The linear regression equation: Y' = bX + a
•Y' = the dependent variable (predicted score)
•b = the slope of the line (the B value in the Coefficients Table)
•a = the Y intercept
•X = the explanatory variable
•So, Y' = 20.5X + 13.344
•For example, assume we want to predict a student's Physician Assistant (PA) licensure exam score based on that student's GPA (3.6) in a PA program.
•Y' = 20.5(3.6) + 13.344 = 87.144 (this means the student is predicted to score about 87 on the exam)
-see examples in the literature if desired
-notes: ex.: multiple regression with 8-10 skinfold sites to predict body density; ex.: predicting survival from non-small cell lung cancer
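
The prediction step can be checked with a couple of lines of Python. The slope (20.5) and intercept (13.344) are the values given in the example; the function name `predict_exam_score` is just an illustrative label.

```python
SLOPE = 20.5        # b, from the Coefficients Table in the example
INTERCEPT = 13.344  # a, the Y intercept


def predict_exam_score(gpa):
    """Apply the regression equation Y' = bX + a to a GPA."""
    return SLOPE * gpa + INTERCEPT


score = predict_exam_score(3.6)
print(round(score, 3))  # 87.144
```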

