Stats Exam 3 (t-tests, correlations, stat. sig.)

Line of best fit:

The regression line that best fits the observed scores and minimizes the error in prediction.

path analysis

- form of multiple regression - helps make cause and effect conclusions - directionality - examines the direction of relationships by first assuming some theoretical relationship between variables and then testing whether the hypothesized direction of these relationships is consistent with the actual data - exploratory - ex: We know the variables are related, but path analysis helps determine how they are related and which one is causing the other. Are typing skills causing high job performance, or vice versa?

medium effect size

0.2 < d < 0.8

Step 4: Make a decision

Use the value of the test statistic to make a decision about the null hypothesis. - If the probability of obtaining a sample mean is less than or equal to 5% when the null hypothesis is true, then the decision is to reject the null hypothesis. - If the probability of obtaining a sample mean is greater than 5% when the null hypothesis is true, then the decision is to fail to reject the null hypothesis.

alternative/research hypothesis

a definite statement that a relationship exists between variables - "is different", "are positively related" - represents an inequality - usually refers to the sample - can be directly tested - is the explicit hypothesis (not the implied) - can be directional or nondirectional

one-tailed test

a directional hypothesis - assumes a difference in a particular direction (Group 1 will score higher than Group 2)

Action Plan of testing significance - Step 1

a statement of the null hypothesis

linear regression (regression)

a statistical procedure used to determine the equation of a regression line to a set of data points and to determine the extent to which the regression equation can be used to predict values of one factor, given known values of a second factor in a population

regression is an extension of a

correlation

small effect size

d < 0.2

large effect size

d > 0.8

criterion variable

dependent variable (y) - the variable with unknown values that can be predicted or estimated, given known values of the predictor variable

Action Plan of testing significance - Step 5

determination of the value needed for rejection of the null hypothesis using the appropriate table of critical values for the particular statistic

phi correlation coefficient

determine direction/strength of the linear relationship between 2 dichotomous factors (nominal/nominal)

the distance between each individual point and the regression line is considered _________________ because ______________________

errors because it means the prediction was wrong; it was some distance from the right answer.

data points

pairs of values for x and y

sample results must generalize to

population

directional research (alternative) hypothesis

reflects a difference between groups and the direction of the difference is specified - ex: the average score of 12th graders is greater than the average score of 9th graders on a memory test - A > B or B < A

in a t-test, the rejection areas are where on the distribution?

the tails

obtained value (test statistic value)

the value that results from the application of a statistical test

correlational method

treat each factor like a dependent variable and measure the relationship between each pair of variables

systematic variables

two factors work together to cause a change

scatter plot

used to show relationship between 2 variables (looking for a pattern)

general formula for the regression line

Y' = bX + a - Y' is the predicted score - b is the slope, or direction, of the line - X is the score being used as the predictor - a is the point at which the line crosses the y-axis - we know X, we are trying to predict Y
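
The formula above can be illustrated with a minimal Python sketch. It assumes the usual least-squares estimates (slope = sum of cross-products divided by the sum of squares of X, intercept = mean of Y minus slope times mean of X); the function names and the GPA values are hypothetical and only serve as an example.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least-squares line Y' = bX + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations of X.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x              # slope
    a = mean_y - b * mean_x    # y-intercept
    return b, a


def predict(x, b, a):
    """Predicted score Y' for a known score X."""
    return b * x + a


# Hypothetical data: high school GPA (X) and college GPA (Y).
hs_gpa = [2.0, 2.5, 3.0, 3.5, 4.0]
college_gpa = [1.8, 2.4, 2.6, 3.3, 3.4]

b, a = fit_line(hs_gpa, college_gpa)
print(f"Y' = {b:.2f}X + {a:.2f}")
print(f"Predicted college GPA for X = 3.2: {predict(3.2, b, a):.2f}")
```

Once b and a are known, predicting Y for a new X is a single multiplication and addition.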

Action Plan of testing significance - Step 7/8

* if the obtained is more extreme than the critical value = reject the null * if the obtained value doesn't exceed the critical value = fail to reject the null
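
The decision rule above can be sketched as a small Python function. The name `decide` and the comparison of absolute values (which assumes a two-tailed test) are illustrative assumptions, not part of the original cards.

```python
def decide(obtained, critical):
    """Compare an obtained test statistic with a critical value.

    Assumes a two-tailed test, so only the magnitudes are compared.
    """
    if abs(obtained) > abs(critical):
        return "reject the null"
    return "fail to reject the null"


# Hypothetical values: an obtained t of 2.59 against a critical t of 2.048.
print(decide(2.59, 2.048))
print(decide(1.20, 2.048))
```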

Step 2: Set the Criteria for Decision

Set the criteria for a decision by stating the level of significance for a hypothesis test. The level of significance is typically set at 5% in behavioral research studies.

homogeneity of variance assumption

t-test makes the assumption that the amount of variability in each of the two groups is equal

Prediction

the computation of future outcomes based on a knowledge of present ones.

the lower the standard error of estimate....

the higher the correlation between the 2 variables (and the better the prediction)

hypothesis testing

the method of evaluating samples to learn more about characteristics in a given population - systematic way to test claims/ideas about a group/population

correlation coefficient

used to measure the strength and direction of the linear relationship between 2 factors * r = -1 to +1
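
As a rough illustration of how r ends up between -1 and +1, here is a minimal sketch of the Pearson formula (sum of cross-products divided by the square root of the product of the two sums of squares). The function name and sample data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical data: a perfect positive and a perfect negative relationship.
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [6, 4, 2]))
```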

how many groups are you dealing with in a t-test? The same participants are being tested more than once?

2 (dependent samples)

how many variables are you dealing with in a t-test?

Why should we think in terms of "failing to reject" the null rather than just accepting it?

- Researchers "fail to reject" instead of accepting the null because in behavioral research nothing is set in stone and most results are subject to change. - It is never wise to "prove" anything, since you can never know for certain whether a result was due to chance.

meta-analysis

- Researchers combine data from several studies to examine patterns and trends. - powerful tool to help organize disparate information and guide policy decision making. - helps to compensate for error - often combine all the separate effect sizes from many studies to get a representative estimate of the generalizable relationship between variables.

significance vs. meaningfulness

1. Statistical significance is not very meaningful unless the study has a sound conceptual base that lends some meaning to the significance of the outcome 2. Statistical significance cannot be interpreted independently of the context within which it occurs. 3. While statistical significance is important as a concept, it is not the end-all and not the only goal of scientific research. This is why we test hypotheses, not try to prove them.

positive correlation

A correlation where as one variable increases, the other also increases, or as one decreases so does the other. Both variables move in the same direction.

Step 3: Compute the test statistic/ run the test

Compute the test statistic, which tells us how many standard deviations a sample mean is from the population mean. The larger the value of the test statistic, the farther the distance a sample mean is from the population mean stated in the null hypothesis.

how regression works

Data are collected on past events (such as the existing relationship between two variables) and then applied to a future event given knowledge of only one variable. - The higher the absolute value of the correlation coefficient, regardless of whether it is positive or negative, the more accurate the prediction is of one variable from the other based on that correlation.

Regression equation

The equation that defines the points and the line that are closest to the observed scores. - ex: To predict college GPA from high school GPA, we have to create a regression equation and use that to plot what is called a regression line.

significance level

The risk associated with not being 100% positive that what you observe in an experiment is due to the treatment or what was being tested - significant findings occur at 0.05 level (p<0.05) = differences found were not due to chance

standard deviation vs standard error

The standard deviation (SD) measures the amount of variability, or dispersion, from the individual data values to the mean, while the standard error of the mean (SEM) measures how far the sample mean (average) of the data is likely to be from the true population mean
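
The difference can be seen numerically in a short sketch. It assumes the common formula SEM = SD / sqrt(n) with the sample (n - 1) standard deviation; the function name and scores are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(scores):
    """SEM = sample standard deviation divided by the square root of n."""
    return statistics.stdev(scores) / math.sqrt(len(scores))


scores = [1, 2, 3, 4, 5]  # hypothetical sample
print(f"SD  = {statistics.stdev(scores):.3f}")          # spread of individual scores
print(f"SEM = {standard_error_of_mean(scores):.3f}")    # expected spread of the sample mean
```

Because the SD is divided by sqrt(n), the SEM shrinks as the sample size grows even when the SD stays the same.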

Cohen's d

a measure of effect size that assesses the difference between two means in terms of standard deviation, not standard error - numerical representation of the effect size (small, medium, large) - threshold of 0.2 and 0.8
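
A minimal sketch of the calculation for two independent groups follows. It assumes the pooled standard deviation as the denominator (the cards do not say which standard deviation is used), and it treats the boundary values 0.2 and 0.8 as belonging to the larger category, which is also an assumption. Function names and data are hypothetical.

```python
import math
import statistics


def cohens_d(group1, group2):
    """Difference between two means in pooled standard deviation units."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.variance(group1)
    var2 = statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd


def effect_size_label(d):
    """Classify d using the 0.2 and 0.8 thresholds."""
    size = abs(d)
    if size < 0.2:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


d = cohens_d([2, 4, 6], [1, 3, 5])  # hypothetical scores
print(f"d = {d:.2f} ({effect_size_label(d)})")
```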

linearity

assumption that the best way to describe a pattern of data is using a straight line

Spearman rank order correlation coefficient

determine the relationship between 2 ranked factors (ordinal/ordinal)
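
One common way to compute it, sketched here under the assumption that there are no tied ranks, is r_s = 1 - 6Σd² / (n(n² - 1)), where d is the difference between the two ranks of each pair. The function name and the ranks are hypothetical.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman rank-order correlation from two lists of ranks (no ties assumed)."""
    n = len(ranks_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Hypothetical ranks given to five participants by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_rho(judge_a, judge_b))
```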

a perfect regression is going to result in

high regression variance - the weaker the regression, the lower the regression variance

t-test

measures differences between the means of a sample and a population - assumes the groups are independent of each other - assumes the amount of variability in each of the two groups is equal - assumes both groups are normally distributed

5% region

outcomes are due to something other than chance - we are 95% confident that the result is not due to chance

Action Plan of testing significance - Step 3

selection of the appropriate test statistic

Action Plan of testing significance - Step 2

setting the level of risk associated with the null hypothesis

We can further evaluate the relative contribution of each predictor variable by evaluating the

significance of the added contribution of each factor

beta weights

standardized weights

critical value

the point beyond which the obtained outcomes are judged to be so rare that the conclusion is that the obtained outcome is not due to chance but to some other factor

which values indicate a stronger relationship (correlation)?

values closer to ±1 (that is, closer to either -1 or +1)

which values indicate a weak relationship (correlation)?

values closer to r = 0

residual variation

variability due to unknown or uncontrolled variables; error term - high when regression variance is low

zero correlation

no linear pattern/relationship between 2 factors

restriction of range

occurs when the range of data measured in a sample is restricted/smaller than the range of data in the general population

causality

one factor causes changes in a second factor

nondirectional research (alternative) hypothesis

reflects a difference between groups, but the direction of the difference is not specified - ex: the average score of 9th graders is different from the average score of 12th graders on a memory test

When is a t-test used?

- when researchers are interested in finding out whether there was a difference in the average scores of one (or more) variable(s) between the two groups - The t test is called independent because the two groups were not related in any way - Each participant in the study was tested only once

Multiple regression:

* A statistical technique whereby several variables are used to predict one * calculating the independent contributions of multiple predictors in explaining a criterion variable and comparing the different contributions - find out by running a factor analysis - Any variable you add has to make a unique contribution to understanding the dependent variable. - don't want the predictors to covary - The additional variable needs to explain differences in the predicted variable that the first predictor does not. - The different weights (estimates of the independent relationships with the criterion variable) are standardized, so they can be compared to see which predictor is the strongest, which is the weakest, and so on.

r(28) = 0.44, p < 0.05

* r = test statistic / correlation coefficient * 28 = number of degrees of freedom (for a correlation, df = n - 2) * sample size = 30 * 0.44 = obtained value (strong relationship since it's above 0.3) * p < 0.05 = the probability is less than 5% on any one test of the null that the relationship between the 2 variables is due to chance alone * conclusion: significant relationship
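
The reported result can be checked with a short sketch. It assumes the standard conversion of r to a t statistic, t = r * sqrt(df / (1 - r²)) with df = n - 2 (so df = 28 corresponds to n = 30), and a two-tailed critical t of 2.048 for df = 28 at the .05 level taken from a standard t table. The cards do not say which method was used, so this is one plausible route, and the names are hypothetical.

```python
import math


def t_from_r(r, n):
    """Convert a correlation r from n pairs into a t statistic with df = n - 2."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


# Two-tailed critical value for alpha = .05 and df = 28 (standard t table).
CRITICAL_T_DF28 = 2.048

t, df = t_from_r(0.44, 30)
significant = abs(t) > CRITICAL_T_DF28
print(f"r({df}) = 0.44, t = {t:.2f}, significant at .05: {significant}")
```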

What is a fishing expedition (in research)? Why should you avoid this?

- A fishing expedition occurs in research when the researcher wants to study so many variables with no structure or correlation. - It should be avoided because it is sloppy research, and it is the responsibility of the researcher to and within the framework of the study - Research needs to be specific, clear, and testable.

t distribution

- a sampling distribution in which the estimated standard error is computed using the sample variance in the formula. - As sample size increases, the sample variance more closely approximates the population variance. - The result is that there is less variability in the tails as sample size increases. - So the shape of the t distribution changes (the tails approach the x-axis faster) as the sample size is increased.

hypothesis

- educated guess based on previous research - reflect general problem/question that is the motivation of the research - a problem statement that can be examined - hypothesis testing deals with a sample and the results are generalized to the larger population

degrees of freedom (df)

- how much wiggle room we have - has to be correct in order for statistical calculations to be correct (accurate data entry) - as sample size increases, the degrees of freedom also increase - formula: n-1 - The concept is applied to statistics calculated from sample data, and refers to the number of values free to vary. For example, if you know the mean for a sample of 10 values, and you know/ can choose 9 of the values...you then will also know the 10th value (by algebra). Only 9 of those values are free to vary (N-1) - you are free to have any random number up to the last one - used to compensate between the sample size and the population size
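
The "free to vary" idea in the example above can be shown with a few lines of Python. The function name and the values are hypothetical: once the mean of 10 values and 9 of the values are fixed, the 10th is forced.

```python
def last_value(known_values, mean, n):
    """Return the one remaining value implied by a known mean and n - 1 known values."""
    return n * mean - sum(known_values)


# Hypothetical sample: the mean of 10 values is 5.5 and nine of them are known.
nine_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(last_value(nine_values, mean=5.5, n=10))  # the 10th value is not free to vary
```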

effect size

- measure of how strongly variables relate to one another (magnitude of difference between 2 groups) - usually calculated as cohen's d (tells you if the effect is small, medium or large) - sample size is not taken into account - if it's small, one approach is to increase the sample size

what makes a good hypothesis

- must be stated in declarative form, not a question - clear, forceful statement - assumes an expected relationship between variables - reflects a theory or literature on which they're based - should be brief and to the point - must be testable (contain measurable variables)

Null Hypothesis (H0)

- represents no relationship/difference between the variables that you're studying - "no difference", "no relationship" - represents an equality - always refers to the population - must be indirectly tested - is the implied hypothesis (not the explicit)

structural equation modeling

- specific kind of path analysis - used to present the results in a graphical representation of the relationships among all of the different factors under consideration. - you can actually see what relates to what and with what degree of strength. - also allows you to use factors ** confirmatory

what does "significant" mean in stats?

- the probability that the results are not due to random chance - but you can never be 100% certain that it is not due to chance

The Big Rule(s) When It Comes to Using Multiple Predictor Variables

1. when selecting a variable to predict an outcome, select a predictor variable (X) that is related to the criterion variable (Y). 2. select variables that are independent or uncorrelated with one another but are both related to the outcome or predicted (Y) variable - each one makes as distinct a contribution as possible to predicting the dependent or predicted variable

Four Steps of Hypothesis Testing

1. state the hypothesis 2. set the criteria for a decision 3. compute the test statistic 4. make a decision

how many groups are you dealing with in a t-test? The same participants are NOT being tested more than once?

2 (independent samples)

Regression line:

The line drawn based on values in a regression equation. Also known as the line of best fit - the line associated with the smallest total value for the sum of squares is the best fitting line - reflects our best guess as to what score on the Y variable (college GPA) would be predicted by a score on the X variable (high school GPA). - Given the regression line, we can use it to predict any future score (the prediction is exact only when the correlation is perfect).

Type I error (alpha)

Type I error occurs when you reject the null hypothesis when it is true. - The incorrect decision is to reject a true null hypothesis: a "false positive" finding. - It is represented as an alpha (symbolized as α).

Step 1: State Hypothesis

We state the value of a population mean in a null hypothesis and presume it is true. This acts as a starting point so that we can decide whether or not the null hypothesis is likely to be true.

sampling error

a measure of how well a sample approximates the characteristics of a population

two-tailed test

a nondirectional hypothesis - assumes a difference but no particular direction - establish probability levels for rejecting or not rejecting the null hypothesis

Factor analysis

a technique looking at the correlations among a bunch of variables and whether some variables correlate more strongly with some than others. - seeks to identify variables that fall under the same factor - Each factor represents several variables, and factors turn out to be more efficient than individual variables at representing broad concepts in certain studies. - factor: groups of variables that are all related to each other likely represent some single concept/factor - exploratory

The standardized beta coefficient, β

accounts for the unique, distinctive contribution of each predictor variable, excluding any overlap with other predictor variables - the larger the beta, the more influence the factor has in predicting values of y

the purpose of the null

acts as both a starting point and as a benchmark against which the actual outcomes of a study can be measured - starting point = accepted as true (absence of contradicting info) - differences are due to chance/something else

negative correlation

as one variable increases, the other decreases

confidence interval

best estimate of the range of a population value that we can come up with given the sample * highly reliable tests have narrower confidence intervals

outlier

can obscure the relationship by altering the strength and direction

one sample t-test

compares the mean score of a sample with another score (sometimes the population mean)

Action Plan of testing significance - Step 6

comparison of the obtained value with the critical values

Action Plan of testing significance - Step 4

computation of the test statistic value (obtained value)

Type II error

failing to reject a false null hypothesis - the probability of accepting a null hypothesis when it is false - related to sample size - sensitive to numbers

in order to increase representativeness....

increase sample size

increase power

increase sample size and increase the effect size (increase rejection zones)

predictor variable

independent variable (x) - the variable with values that are known and can be used to predict values of another variable.

coefficient of determination (r^2)

is the percentage of error that is reduced in the relationship between variables. - Used to analyze how differences in one variable can be explained by a difference in a second variable - Another measure of error

Statistical power

is the probability of a hypothesis test of finding an effect if there is an effect to be found. - one tailed is more powerful than two tailed

t statistic

known as t observed or t obtained, is an inferential statistic used to determine the number of standard deviations in a t distribution that a sample mean deviates from the mean value or mean difference stated in the null hypothesis. - mean = 50 (0-100)
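
For a single sample, this can be sketched as t = (M - μ) / (s / sqrt(n)), where s / sqrt(n) is the estimated standard error and df = n - 1. The function name, the scores, and the null-hypothesis mean of 50 used here are hypothetical.

```python
import math
import statistics


def one_sample_t(scores, null_mean):
    """Return (t, df) for a one-sample t-test against the mean stated in the null."""
    n = len(scores)
    estimated_standard_error = statistics.stdev(scores) / math.sqrt(n)
    t = (statistics.mean(scores) - null_mean) / estimated_standard_error
    return t, n - 1


scores = [52, 54, 56, 58, 60]  # hypothetical sample
t, df = one_sample_t(scores, null_mean=50)
print(f"t({df}) = {t:.2f}")
```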

correlation coefficient can act as its own.....

test statistic

Homoscedasticity Assumption

the assumption of constant variance among data points - we assume there's an equal (homo) variance or scatter (scedasticity) of data points dispersed along the regression line - uniformly spread apart

standard error of estimate

the average amount that each data point differs from the predicted data point - reflects average error along the line of regression - tells us how much imprecision there is in our estimate
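
A minimal sketch of the calculation, assuming the usual formula sqrt(SS_residual / (n - 2)) and a regression line Y' = bX + a that has already been fitted. The function name, data, and the slope and intercept values are hypothetical.

```python
import math


def standard_error_of_estimate(xs, ys, b, a):
    """Typical distance between observed Y scores and the predicted scores Y' = bX + a."""
    ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss_residual / (len(xs) - 2))


xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]
# b = 1.4 and a = 0.5 are the least-squares values for these hypothetical points.
print(f"{standard_error_of_estimate(xs, ys, b=1.4, a=0.5):.3f}")
# Points that fall exactly on the line give a standard error of estimate of 0.
print(standard_error_of_estimate([1, 2, 3, 4], [3, 5, 7, 9], b=2, a=1))
```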

regression line

the best fitting straight line to a set of data points (the line that minimizes the distance of all data points that fall from it) - the closer a set of data points falls to the regression line, the stronger the correlation

the higher the absolute magnitude of the correlation between two variables...

the better the prediction

statistical significance

the degree of risk you're willing to take that you will reject a null hypothesis when it is actually true

error in prediction

the distance between each individual data point and the regression line - a direct reflection of the correlation between the two variables.

covariance

the extent to which the values of 2 factors vary together - how much variance they share - how much the two spreads overlap (look at venn diagram)

purpose of the alternative hypothesis

the results of the test are compared with what you expect if you were wrong (the null hypothesis)

you can use correlations to predict

the values of one variable based on the value of another - The basic idea is to use a set of previously collected data (such as data on variables X and Y), calculate how correlated these variables are with one another, and then use that correlation and the knowledge of X to predict Y.

point-biserial correlation coefficient

to measure the strength & direction of the linear relationship between one factor that is continuous and one factor that is dichotomous

normality

to test for linear correlations, we must assume that the data points are normally distributed (must form a bivariate normal distribution)

Correlation

used to describe the strength and direction of the linear relationship between 2 factors - visualized with a scatter plot

Pearson correlation coefficient

used to determine the strength and direction of the relationship between two factors (for interval and ratio data) - measures the variance in the distance that data points fall from the regression line

inferential statistics

used to infer something about the population based on sample characteristics - must infer from a smaller sample to the larger population 1. must select a representative sample 2. administer test/experiment 3. conclusion = chance? not due to chance?
