MATH 317 - Comprehensive (Module Notes)
Guidelines for interpreting Pearson's correlation coefficient
Remember that these values are guidelines; whether an association is strong will also depend on what you are measuring.
Normal Distribution
- A normal distribution is not skewed.
- It is perfectly symmetrical, and the mean is exactly at the peak.
Shapiro-Wilk Test
- Null hypothesis: the data are normally distributed.
- Reject the null hypothesis if p < 0.05 and conclude the data are not normally distributed.
- Fail to reject the null if p > 0.05 and conclude the data are consistent with a normal distribution.
The process of changing an X value into a z-score involves creating a signed number, called a z-score, such that:
- The sign of the z-score (+ or -) identifies whether the X value is located above the mean (positive) or below the mean (negative).
- The numerical value of the z-score corresponds to the number of standard deviations between X and the mean of the distribution: z = (X - mean) / SD.
- In addition to knowing the basic definition of a z-score and the formula for a z-score, it is useful to be able to visualize z-scores as locations in a distribution.
- Remember, z = 0 is in the center (at the mean), and the extreme tails correspond to z-scores of approximately -2.00 on the left and +2.00 on the right. Although more extreme z-score values are possible, most of the distribution is contained between z = -2.00 and z = +2.00.
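The course works in SPSS, but as a quick illustration, here is a minimal Python sketch of the z-score formula above; the sample values are invented:

    import numpy as np

    scores = np.array([4, 7, 8, 10, 11])   # hypothetical sample of X values
    mean = scores.mean()
    sd = scores.std(ddof=1)                 # sample standard deviation

    x = 10
    z = (x - mean) / sd   # sign tells you above (+) or below (-) the mean;
    print(z)              # magnitude = number of SDs between X and the mean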
Kolmogorov-Smirnov (KS) Test and the Shapiro-Wilk Test.
- Two well-known tests of normality.
- The Shapiro-Wilk Test is more appropriate for small sample sizes (< 50), but can also handle sample sizes as large as 2000.
- Some researchers no longer consider the KS test valid (it is more of historical interest) and don't really use it anymore; SPSS still calculates it and presents it as part of the table by default.
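As a hedged illustration (the notes use SPSS; this SciPy sketch with made-up data is just the equivalent idea):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    data = rng.normal(loc=50, scale=10, size=40)   # hypothetical small sample (< 50)

    # Shapiro-Wilk: preferred for small samples
    w, p_sw = stats.shapiro(data)

    # Kolmogorov-Smirnov against a standard normal after standardizing.
    # Note: estimating the mean/SD from the data makes this p-value approximate
    # (strictly, the Lilliefors correction would be needed).
    z = (data - data.mean()) / data.std(ddof=1)
    d, p_ks = stats.kstest(z, "norm")

    print(p_sw, p_ks)   # p > 0.05 -> no evidence against normality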
How do you check normality visually?
- histogram (an effective graphical method for showing skewness and kurtosis)
- stem-and-leaf plot
- box plot (easy to visualize if the data are being pulled in one direction or another)
- p-p plot (compares cumulative probability to an ideal test distribution)
- normal q-q plot (compares quantiles to an ideal distribution -- DOTS SHOULD BE ALONG THE LINE)
Checking normality numerically and statistically
- Shapiro-Wilk test
- skewness and kurtosis values
- skewness and kurtosis z-values
Skewness and kurtosis values
- Should be as close to ZERO as possible.
- The normal distribution is symmetric and has a skewness value of 0. A distribution with a significant positive skewness has a long right tail. A distribution with a significant negative skewness has a long left tail.
Skewness and kurtosis z-values
- Should be between -1.96 and +1.96.
- Take the skewness or kurtosis value and divide it by its standard error.
- If the result is greater than +1.96 or less than -1.96, reject the null and conclude there is significant skewness (or kurtosis).
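A sketch of this z-value calculation in Python; the standard-error formulas below are the common large-sample approximations (SPSS uses slightly different small-sample formulas, so results will differ a little):

    import numpy as np
    from scipy import stats

    data = np.random.default_rng(1).normal(size=60)   # hypothetical data
    n = len(data)

    skew = stats.skew(data)
    kurt = stats.kurtosis(data)     # excess kurtosis (normal distribution = 0)

    se_skew = np.sqrt(6.0 / n)      # approximate standard error of skewness
    se_kurt = np.sqrt(24.0 / n)     # approximate standard error of kurtosis

    z_skew = skew / se_skew
    z_kurt = kurt / se_kurt
    # |z| > 1.96 -> reject the null of zero skewness/kurtosis at alpha = .05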
Leptokurtosis (positive kurtosis)
- Tails are fatter than the normal distribution.
- Positive kurtosis indicates that the data exhibit more extreme outliers than a normal distribution.
Examples of ratio variables
- Temperature measured in Kelvin is a ratio variable, as 0 Kelvin (often called absolute zero) indicates that there is no temperature whatsoever.
- Height, mass, distance, etc.
Examples of interval variables
- Temperature measured in degrees Celsius or Fahrenheit (the difference between 20C and 30C is the same as between 30C and 40C).
- Temperature measured in degrees Celsius or Fahrenheit is NOT a ratio variable.
- Date of birth.
Left skew (negative skew)
- the long "tail" is on the negative side of the peak - it is sometimes referred to as, "skewed to the left" (the long tail is on the left hand side) - the mean is also on the left of the peak. The mean is LESS than the median
Right skew (positive skew)
- The long tail is on the positive side of the peak; sometimes referred to as "skewed to the right".
- The mean is to the right of the peak value; the mean is GREATER than the median.
Platykurtosis (negative kurtosis)
- Very thin tails compared to the normal distribution.
- Negative kurtosis indicates that the data exhibit fewer extreme outliers than a normal distribution.
typical applications of the one-sample t-test
1) testing a sample against a pre-defined (or generally assumed) value
2) testing a sample against an expected value
3) testing a sample against common sense or expectations
4) testing the results of a replicated experiment against the original study
Two general types of statistic that are used to describe data
1. Measures of central tendency
2. Measures of spread
Factors affecting power
1. Sample size: the larger the sample size, the higher the power.
2. Variance: smaller variance yields higher power.
3. Alpha level (or significance level): the lower (more stringent) the significance level, the lower the power. Power is lower at the 0.01 level than at the 0.05 level. The stronger the evidence needed to reject the null hypothesis, the lower the chance that the null hypothesis will be rejected.
4. Standard deviation: the smaller the standard deviation, the higher the power.
In testing reliability, the closer the correlation is to ___, the more reliable the scores on the scale are
1.0
Phi is best for what kind of table?
2x2
CI Example: Suppose that we have a good sample (found using good sampling techniques) of 45 people who work in a particular city. It took people in our sample an average of 21 minutes to get to work one-way. The standard deviation was 9 minutes.
95% of the time, when we calculate a confidence interval in this way, the true mean will be between the two values. 5% of the time, it will not. Because the true mean (population mean) is an unknown value, we don't know if we are in the 5% or the 95%. BUT 95% is pretty good so we say something like, "We are 95% confident that the mean time it takes all workers in this city to get to work is between 18.3 and 23.7 minutes." This is a common shorthand for the idea that the calculations "work" 95% of the time.
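A small Python check that reproduces the interval quoted above from n = 45, mean = 21, SD = 9 (SciPy is used only for the t critical value):

    import numpy as np
    from scipy import stats

    n, mean, sd = 45, 21.0, 9.0
    se = sd / np.sqrt(n)                     # standard error of the mean
    t_crit = stats.t.ppf(0.975, df=n - 1)    # two-sided 95% critical value
    lower, upper = mean - t_crit * se, mean + t_crit * se
    print(round(lower, 1), round(upper, 1))  # -> 18.3 23.7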
when does a monotonic relationship exist?
A monotonic relationship exists when either the variables increase in value together, or as one variable's value increases, the other variable's value decreases. While there are a number of ways to check whether a monotonic relationship exists between your two variables, a scatterplot allows you to visually inspect for monotonicity.
Sample
A subset of the population from which data are collected
What does the z-score tell you?
A z-score tells you where a score lies on a normal distribution curve. A z-score of zero tells you the value is exactly average, while a score of +3 tells you that the value is much higher than average.
How does ANOVA answer "Are the group means different?"
ANOVA helps to answer this question by splitting up the sources of variability. If the between-group variability is large relative to the within-group variability, then the group means are different. If the between-group variability is not large enough, then there is not enough evidence to claim the groups are different.
What distribution does ANOVA use?
ANOVA uses the F-distribution because it examines a ratio of variances; the sampling distribution of that ratio follows an F-distribution. Therefore, we use the F-statistic, rather than a t-statistic, to determine statistical significance.
In regard to R square, what value is considered a good prediction?
Above 70% is considered to be a good prediction.
T-tests
All the tests in the t-test family compare differences in mean scores of continuous-level (interval or ratio), normally distributed data.
Means
Always report the mean (average value) along with a measure of variability (standard deviation(s) or standard error of the mean).
ANOVA
Analysis of variance (ANOVA) is a statistical tool that allows researchers to compare more than 2 groups (multiple groups) to see if there are differences between these groups.
Why don't we use a 99% CI?
As the confidence level increases, the margin of error increases. That means the interval is wider. So, it may be that the interval is so large it is useless!
ASSUMPTIONS for ONE-WAY ANOVA
Assumption #1: The independent variable (factor) consists of two or more independent groups.
Assumption #2: The dependent variable is interval or ratio level (i.e., continuous).
Assumption #3: The dependent variable is normally distributed for each of the populations defined by the different levels of the factor.
Assumption #4: Variances of the dependent variable are the same for all populations (homogeneity of variance).
Assumption #5: Cases represent random samples from the populations, and the scores on the test variable are independent of each other.
Assumption #6: There are no significant outliers.
Assumptions of Spearman correlation
Assumption #1: The two variables should be measured on an ordinal, interval or ratio scale.
Assumption #2: The two variables represent paired observations.
Assumption #3: There is a monotonic relationship between the two variables.
Assumptions of Pearson's correlation
Assumption #1: The variables must be either interval or ratio measurements.
Assumption #2: The variables must be approximately normally distributed.
Assumption #3: There is a linear relationship between the two variables.
Assumption #4: Outliers are either kept to a minimum or are removed entirely.
Assumption #5: There is homoscedasticity of the data.
Assumptions of one-sample t-test
Assumption #1: Your dependent variable should be measured at the interval or ratio level (i.e., continuous data).
Assumption #2: The data are independent (i.e., not correlated/related), which means that there is no relationship between the observations. This is more of a study design issue than something you can test for, but it is an important assumption of the one-sample t-test.
Assumption #3: There should be no significant outliers. The problem with outliers is that they can have a negative effect on the one-sample t-test, reducing the accuracy of your results.
Assumption #4: Your dependent variable should be approximately normally distributed.
Assumptions of Kruskal-Wallis H test
Assumption #1: Your dependent variable should be measured at the ordinal or continuous level (i.e., interval or ratio).
Assumption #2: Your independent variable should consist of two or more categorical, independent groups. Typically, a Kruskal-Wallis H test is used when you have three or more categorical, independent groups, but it can be used for just two groups (a Mann-Whitney U test is more commonly used for two groups).
Assumption #3: You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves.
Assumption #4: In order to know how to interpret the results from a Kruskal-Wallis H test, you have to determine whether the distributions in each group (i.e., the distribution of scores for each group of the independent variable) have the same shape (which also means the same variability).
Assumptions of Wilcoxon signed rank test
Assumption #1: Your dependent variable should be measured at the ordinal or continuous level.
Assumption #2: Your independent variable should consist of two categorical, "related groups" or "matched pairs".
Assumption #3: The distribution of the differences between the two related groups (e.g., the reaction time in a room with "blue lighting" versus a room with "red lighting") needs to be symmetrical in shape. If the distribution of differences is symmetrically shaped, you can analyze your study using the Wilcoxon signed-rank test. In practice, checking this assumption just adds a little more time to your analysis, requiring a few more clicks in SPSS Statistics and a little more thought about your data, but it is not a difficult task. Do not be surprised if this assumption is violated (i.e., not met) when analyzing your own data; this is not uncommon with real-world data, as opposed to textbook examples, which often only show you how to carry out a Wilcoxon signed-rank test when everything goes well!
Assumptions of Mann-Whitney U-test
Assumption #1: Your dependent variable should be measured at the ordinal or continuous level.
Assumption #2: Your independent variable should consist of two categorical, independent groups.
Assumption #3: You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves.
Assumption #4: A Mann-Whitney U test can be used when your dependent variable is not normally distributed.
Assumptions of an independent samples t-test
Assumption #1: Your dependent variable should be measured on a continuous scale (i.e., at the interval or ratio level).
Assumption #2: Your independent variable should consist of two categorical, independent groups.
Assumption #3: You should have independence of observations, meaning there is no relationship between the observations in each group or between the groups themselves (if this assumption fails, you need a different statistical test, such as a paired-samples t-test).
Assumption #4: There should be no significant outliers (they can reduce the validity of your results).
Assumption #5: Your dependent variable should be approximately normally distributed for each group of the independent variable (checked with the Shapiro-Wilk test).
Assumption #6: There needs to be homogeneity of variances, which means the variance within each of the populations is equal. This is the same as "equality of variance"; it is a measure of spread. You can test this assumption in SPSS Statistics using Levene's test for homogeneity of variances.
Assumptions of paired samples t-test
Assumption #1: Your dependent variable should be measured on a continuous scale (i.e., at the interval or ratio level).
Assumption #2: Your independent variable should consist of two categorical, "related groups" or "matched pairs". "Related groups" indicates that the same subjects are present in both groups; this is possible because each subject has been measured on two occasions on the same dependent variable.
Assumption #3: There should be no significant outliers in the differences between the two related groups.
Assumption #4: The distribution of the differences in the dependent variable between the two related groups should be approximately normally distributed.
Assumptions of Chi Square
Assumption #1: Your two variables should be measured at an ordinal or nominal level (i.e., categorical data).
Assumption #2: Your two variables should consist of two or more categorical, independent groups.
Assumption #3: At least 80% of expected cell counts are greater than 5. SPSS will tell you whether this assumption is met; if it is not, perform Fisher's exact test or merge categories where sensible.
** Assumptions #1 and #2 are required to do a chi-square test; #3 tells you whether you can use the results of the test.
Assumptions of Simple Linear Regression
Assumption #1: Your two variables should be measured at the interval or ratio level (i.e., they are continuous).
Assumption #2: There needs to be a linear relationship between the two variables.
Assumption #3: There should be no significant outliers.
Assumption #4: You should have independence of observations, which you can easily check using the Durbin-Watson statistic.
Assumption #5: Your data need to show homoscedasticity, which is where the variances along the line of best fit remain similar as you move along the line.
Assumption #6: Finally, you need to check that the residuals (errors) of the regression line are approximately normally distributed. Two common methods to check this assumption are a histogram (with a superimposed normal curve) and a Normal P-P plot.
**** Assumptions #2 and #3 should be checked first, before moving on to assumptions #4, #5 and #6.
What does the chi-square statistic compare?
Chi Square statistic compares the tallies or counts of categorical responses between two (or more) independent groups. (note: Chi square tests can only be used on actual numbers and not on percentages, proportions, means, etc.).
Is a Chi-square test parametric or non-parametric?
Chi-squared tests are considered non-parametric tests, where both the IV and DV are categorical variables.
Confidence intervals
Confidence intervals for means are intervals constructed using a procedure that will contain the population mean a specified proportion of the time, typically either 95% or 99% of the time. These intervals are referred to as 95% and 99% confidence intervals respectively. The general idea of any confidence interval is that we have an unknown value in the population and we want to get a good estimate of its value. Using the theory associated with sampling distributions and the empirical rule, we are able to come up with a range of possible values, and this is called a "confidence interval".
Data don't have to be perfectly normal, but should be approximately normally distributed, and this should be the case for each ______
DV
Kurtosis
Describes the "peakness" or the "flatness" of a distribution, which measures of the extent to which there are outliers
what question does multiple linear regression answer?
Do a number of IVs (X variables) predict the DV? **The DV is quantitative, while some IVs can be qualitative.
what question does simple linear regression answer?
Does the IV predict the DV?
Independent samples t-test
Evaluates the difference between the means of two independent groups. With an independent samples t-test, each case must have scores on two variables: the grouping variable and the test variable.
- E.g., grouping variable = men/women, test variable = a dietary quality measure.
- The t-test evaluates whether mean dietary quality differs between men and women.
Paired samples t-test
Evaluates whether the mean of the difference between these 2 variables is significantly different from zero. It is applicable to two types of studies: repeated-measures and matched-subjects designs
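As a rough SciPy illustration of both t-test variants just described (all scores are invented):

    from scipy import stats

    # Independent samples: two separate groups, each measured once
    men = [60, 72, 65, 70, 68]      # hypothetical dietary-quality scores
    women = [75, 70, 78, 72, 74]
    t_ind, p_ind = stats.ttest_ind(men, women)

    # Paired samples: the same subjects measured twice; tests whether the
    # mean of the differences is significantly different from zero
    before = [12, 15, 11, 14, 13]
    after = [14, 17, 12, 15, 16]
    t_rel, p_rel = stats.ttest_rel(before, after)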
T/F: variables in correlation can be both qualitative and quantitative
F; variables can ONLY be quantitative
Frequencies
Frequency data should be summarized in the text with appropriate measures such as percents, proportions, or ratios.
______ causes a change in _____ and it isn't possible that ______ could cause a change in ______
IV; DV; DV; IV
How do you determine if your two distributions (of the variables) have the same shape?
If they do have the same shape, you can use the Mann-Whitney U test to compare the medians of your dependent variable (e.g., test score) for the two groups (e.g., males and females) of the independent variable (e.g., gender) you are interested in. However, if your two distributions have different shapes, you can only use the Mann-Whitney U test to compare mean ranks.
What question does correlation answer?
Is there a relationship between 2 or more variables?
Spearman correlation coefficient
It is denoted by the symbol r_s.
Does the Pearson correlation coefficient indicate the slope of the line?
It is important to realize that the Pearson correlation coefficient, r, does not represent the slope of the line of best fit. Therefore, if you get a Pearson correlation coefficient of +1 this does not mean that for every unit increase in one variable there is a unit increase in another. It simply means that there is no variation between the data points and the line of best fit.
Linear regression
It is used to predict the value of a variable based on the value of another variable.
what is the next step up after correlation?
Linear regression
Applications of the paired samples t-test:
- Matched-subjects design with an intervention
- Matched-subjects design with no intervention
- Repeated measures design with an intervention
- Repeated measures design with no intervention
Skewness
Measures the symmetry of a distribution
Different applications of the one-sample t-test are distinguished by the choice of the test value. The test value can be:
- Midpoint on the test variable
- Chance level of performance on the test variable
- Average value of the test variable based on past research
Do the two variables have to be measured in the same units?
No, the two variables can be measured in entirely different units. The calculations for Pearson's correlation coefficient were designed such that the units of measurement do not affect the calculation. This allows the correlation coefficient to be comparable and not influenced by the units of the variables used.
Can you use any type of variable for Pearson's correlation coefficient?
No, the two variables have to be measured on either an interval or ratio scale. However, both variables do not need to be measured on the same scale (e.g., one variable can be ratio and one can be interval). If you have ordinal data, you will want to use Spearman's rank-order correlation or a Kendall's Tau Correlation instead of the Pearson product-moment correlation.
Can the KWH test tell us which specific groups of your independent variable are statistically significantly different from each other?
No; it only tells you that at least two groups were different. Since you may have three, four, five or more groups in your study design, determining which of these groups differ from each other is important. You can do this using a post hoc test (N.B., we discuss post hoc tests later in this guide).
Another name for categorical variables
Qualitative variables
In the model summary, what is R?
R (correlation coefficient) is the square root of R-Squared and is the correlation between the observed and predicted values of the dependent variable. It measures the strength and direction of the relationship.
In the model summary, what is R square?
R square (the coefficient of determination) is the proportion of variance in the dependent variable that can be explained by the independent variables. It is an overall measure of the strength of association and does not reflect the extent to which any particular independent variable is associated with the dependent variable. Informally, it describes how closely the data fall to the line of best fit.
T/F: the 2 variables in simple linear regression are quantitative
T
parametric statistics (assume data are normally distributed)
T-tests (two sample, or paired), F-tests, Analysis of variance (ANOVA), Pearson correlation
Cronbach's alpha
Test used to assess the internal consistency of a scale by computing the intercorrelations among responses to scale items; values of .70 or higher are interpreted as acceptable internal consistency
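A minimal Python sketch of the standard alpha formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores); the response matrix is invented:

    import numpy as np

    def cronbach_alpha(items: np.ndarray) -> float:
        """items: rows = respondents, columns = scale items."""
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
        total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
        return (k / (k - 1)) * (1 - item_vars / total_var)

    # hypothetical 5 respondents x 3 items
    responses = np.array([[4, 5, 4], [3, 4, 3], [5, 5, 4], [2, 3, 2], [4, 4, 5]])
    print(cronbach_alpha(responses))   # .70 or higher = acceptable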
What's the difference between a one-sample t-test and an independent-sample t-test?
The one-sample t-test checks whether the mean score in a sample differs from a fixed, hypothesized value; the independent samples t-test checks whether the means of two independent groups differ from each other.
Values of Pearson Correlation Coefficient
The Pearson correlation coefficient, r, can take a range of values from +1 to -1
For Pearson correlation, does the variables measured have to be IV and/or DV?
The Pearson product-moment correlation does not take into consideration whether a variable has been classified as a dependent or independent variable. It treats all variables equally. This is because the Pearson correlation coefficient makes no account of any theory behind why you chose the two variables to compare. Thus, the value of r would be the same.
Methods of inferential statistics
The methods of inferential statistics are (1) the estimation of parameter(s) and (2) testing of statistical hypotheses.
One-Sample T-Test
The one-sample t-test compares the mean score found in an observed sample to a hypothetically assumed value. Typically the hypothetically assumed value is the population mean or some other theoretically derived value.
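A hedged SciPy sketch of this test (sample scores and test value are invented):

    from scipy import stats

    sample = [21, 19, 24, 23, 18, 22, 25, 20]   # hypothetical sample scores
    test_value = 20                             # hypothesized population mean

    t, p = stats.ttest_1samp(sample, popmean=test_value)
    # p < 0.05 -> sample mean differs significantly from the test value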
The relationship between r and the association strength of the two variables measured in the Pearson correlation
The stronger the association of the two variables, the closer the Pearson correlation coefficient, r, will be to either +1 or -1, depending on whether the relationship is positive or negative, respectively. A value of +1 or -1 means that all your data points fall on the line of best fit; no data points show any variation away from this line. Values for r between +1 and -1 (for example, r = 0.8 or r = -0.4) indicate that there is variation around the line of best fit. The closer the value of r is to 0, the greater the variation around the line of best fit.
In linear regression, what variable do we use to predict the DV?
The variable we are using to predict the other variable's value is called the independent variable (or sometimes, the predictor variable).
In linear regression, what variable do we want to predict?
The variable we want to predict is called the dependent variable (or sometimes, the outcome variable).
Interpreting z-score in regard to the percentage
The z-score at the center of the curve is zero. Z-scores to the right of the mean are positive and z-scores to the left of the mean are negative. If you look up a score in the z-table, you can tell what percentage of the population is above or below your score. For example, a z-score of 2.0 corresponds to .9772 in the z-table, which converts to 97.72% of the population falling below that score.
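The 97.72% figure can be checked directly from the normal CDF; a one-line SciPy illustration:

    from scipy import stats

    p = stats.norm.cdf(2.0)   # proportion of the population below z = 2.0
    print(p)                  # 0.9772... -> 97.72%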
How can you detect a linear relationship for Pearson correlation?
To test to see whether your two variables form a linear relationship you simply need to plot them on a graph (a scatterplot, for example) and visually inspect the graph's shape. It is not appropriate to analyze a non-linear relationship using a Pearson product-moment correlation.
Type 1 error and p-value
Type I error is related to the p-value. A p-value of 0.05 means there is a 5% chance that you have committed a type I error. Therefore, the lower the p-value, the less likely it is that you have committed a type I error.
Non-parametric statistics (do NOT assume data are normally distributed)
Wilcoxon Rank Sum Test, Wilcoxon Signed Rank Test, Kruskal-Wallis Test, Spearman's Rank Correlation
Multiple linear regression equation
Y = m1X1 + m2X2 + m3X3 + ... + b
Regression Equation
Y = mX + b, or DV = slope(IV) + constant
where Y = DV, X = IV, m = slope, and b = constant (intercept)
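A quick SciPy sketch of fitting this equation (x and y values are invented):

    from scipy import stats

    x = [1, 2, 3, 4, 5]             # hypothetical IV values
    y = [2.0, 4.1, 5.9, 8.2, 9.8]   # hypothetical DV values

    res = stats.linregress(x, y)
    # Y = mX + b -> predicted = res.slope * X + res.intercept
    print(res.slope, res.intercept, res.rvalue**2)   # m, b, and R square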
Descriptive statistics are limited in that they only allow you to make summaries about the people or objects that you have actually measured.
You cannot use the data you have collected to generalize to other people or objects (i.e., using data from a sample to infer the properties/parameters of a population).
How are z-scores expressed?
Z-scores are expressed in terms of standard deviations from their means. As a result, these z-scores have a distribution with a mean of 0 and a standard deviation of 1. Technically, a z-score is the number of standard deviations from the mean value of the reference population (a population whose known values have been recorded).
A one-sample t-test evaluates whether the mean on a test variable is significantly different from a constant, which is also known as what?
a "test value" in SPSS
interrater reliability
a measure of agreement between different raters' scores
test-retest reliability
a measure of the stability of scores on a scale over time
An r value less than 0 indicates ______
a negative association; as the value of one variable increases, the value of the other variable decreases
Mann-Whitney U-test
a nonparametric test to compare differences between two independent groups when the dependent variable is either ordinal or continuous, but not normally distributed. This test is used to evaluate whether two samples are likely to derive from the same population (i.e., that the two populations have the same shape). Some investigators interpret this test as comparing the medians between the two populations.
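A hedged SciPy illustration of this test with invented scores:

    from scipy import stats

    group_a = [3, 5, 4, 6, 2]   # hypothetical ordinal / non-normal scores
    group_b = [7, 8, 6, 9, 7]

    u, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")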
Snowball Sampling
a nonprobability sampling strategy in which participants recruit others into the sample
An r value greater than 0 indicates ____________
a positive association; as the value of one variable increases, so does the value of the other variable.
Convenience Sampling
a type of nonprobability sample made up of those volunteers or others who are readily available and willing to participate
Quota Sampling
a type of nonprobability sampling that results in the sample representing key subpopulations based on characteristics such as age, gender, and ethnicity
Cluster Sampling
a type of probability sampling in which groups or clusters are randomly selected instead of individuals
Stratified Random Sample
a type of probability sampling that results in the sample representing key subpopulations based on characteristics such as age, gender, and ethnicity
Kruskal-Wallis H test
a.k.a. "one-way ANOVA on ranks" it is a rank-based nonparametric test that can be used to determine if there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable. It is considered the nonparametric alternative to the one-way ANOVA, and an extension of the Mann-Whitney U test to allow the comparison of more than two independent groups. In other words, the Kruskal-Wallis test is roughly an ANOVA for small sample sizes or an ordinal outcome variable.
One-way ANOVA
allows us to analyze mean differences between two or more groups on a between-subjects factor. If you have just 2 groups, you can do a t-test. ANOVA is an extended t-test for MORE than 2 groups. Must have scores on two variables: a factor and a dependent variable. The factor divides individuals into two or more groups or levels, while the dependent variable differentiates individuals on a quantitative dimension. The ANOVA uses the F-statistic, which evaluates whether the group means on the dependent variable differ significantly from each other.
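As a rough illustration (the course itself uses SPSS), a one-way ANOVA in SciPy with invented scores for three levels of a factor:

    from scipy import stats

    low = [21, 24, 23, 22]      # hypothetical scores at three factor levels
    medium = [26, 27, 25, 28]
    high = [30, 29, 31, 32]

    f, p = stats.f_oneway(low, medium, high)   # F = between / within variability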
Adjusted R square
an adjustment of the R-squared that penalizes the addition of extraneous predictors to the model. Adjusted R-squared is computed using the formula 1 - (1 - R^2)(N - 1)/(N - k - 1), where k is the number of predictors *** used in multiple linear regression
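A worked instance of the formula above, with invented values:

    r_sq, n, k = 0.64, 50, 3   # hypothetical R square, sample size, predictor count
    adj = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)
    print(adj)                 # 0.616... (slightly below the raw R square)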
Hypothesis testing (inferential statistics) and confidence intervals assume that the outcome (or dependent variable) is __________________________
approximately normally distributed
Pearson (product-moment) correlation
attempts to draw a line of best fit through the data of two variables
Homoscedasticity
basically means that the variances along the line of best fit remain similar as you move along the line. Your data must show homoscedasticity for you to run a Pearson product-moment correlation.
Basically, categorical variables yield data in the _______ and numerical variables yield data in ________
categories; numerical form.
what does multiple linear regression control?
controls for confounding variables (holds them constant), such as gender, age, ethnicity, income, etc.
As descriptive statistics, z-scores ________
describe exactly where each individual is located
When we use ______ statistics, it is useful to summarize our group of data using a combination of tabulated description (i.e., tables), graphical description (i.e., graphs and charts) and statistical commentary (i.e., a discussion of the results).
descriptive
The fact that z-scores identify exact locations within a distribution means that z-scores can be used as ______ statistics and as _______ statistics.
descriptive; inferential
As inferential statistics, z-scores _______
determine whether a specific sample is representative of its population, or is extreme & unrepresentative
two main methods of assessing normality
graphically and numerically
If the Sig. value of the Shapiro-Wilk Test is _____ than 0.05, the data are normal. If it is ____ 0.05, the data significantly deviate from a normal distribution.
greater; below
content validity
inclusion of all aspects of a construct by items on a scale or measure
Two types of continuous variables
interval and ratio
Ratio variables
interval variables, but with the added condition that 0 (zero) on the measurement scale indicates that there is none of that variable (temperature measured in degrees Celsius or Fahrenheit is not a ratio variable because 0C does not mean there is no temperature). The name "ratio" reflects the fact that you can use the ratio of measurements: for example, a distance of ten meters is twice the distance of 5 meters.
Pearson (product-moment) correlation coefficient
is a measure of the strength of a linear association between two variables, denoted by r. It indicates how far away all the data points are from the line of best fit (i.e., how well the data points fit this new model/line of best fit).
Spearman rank-order correlation
is a nonparametric measure of the strength and direction of association that exists between two variables measured on at least an ordinal scale. The test is used for either ordinal variables or for continuous data that has failed the assumptions necessary for conducting the Pearson's product-moment correlation.
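A short SciPy sketch computing both the Pearson and Spearman coefficients on the same invented paired data:

    from scipy import stats

    x = [1, 2, 3, 4, 5, 6]                  # hypothetical paired observations
    y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

    r, p_pearson = stats.pearsonr(x, y)      # linear association
    rs, p_spearman = stats.spearmanr(x, y)   # monotonic (rank-based) association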
when is a chi square (X2) statistic used?
it is used to investigate whether distributions of categorical variables differ from one another.
Power
the likelihood of distinguishing an actual effect from one of chance; the likelihood that the test correctly rejects the null hypothesis (i.e., "proves" your hypothesis). For example, a study with 80% power has an 80% chance of yielding significant results when the effect truly exists. Power is also described as the probability of avoiding a type II error.
descriptive statistics include ______ and _______
means; frequencies
Cramer's V is best for what kind of table?
more than 2x2 tables
Dichotomous variables
nominal variables which have only two categories or levels (e.g., gender (male or female), or asking a person whether they own a mobile phone, categorizing ownership as "Yes" or "No")
Three types of categorical variables
nominal, ordinal or dichotomous
A type I error (false-positive or alpha (α) error)
occurs if an investigator rejects a null hypothesis that is actually true in the population (you erroneously reject the null hypothesis) You are finding a difference that does not exist.
A type II error (false-negative or beta (β) error)
occurs if the investigator fails to reject a null hypothesis that is actually false in the population. You are NOT finding a difference when it actually exists.
Repeated measures design
participant is assessed on 2 occasions or under 2 conditions. Primary question: whether the mean difference between the scores on the 2 occasions is significantly different from zero.
Matched-subjects design
participants are paired, and each participant in a pair is assessed once on a measure. Primary question: whether the mean difference in scores between the 2 conditions differs significantly from zero.
Subpopulation
portion or subgroup of the population
criterion validity
positive correlation between scale scores and a behavioral measure
concurrent validity
positive correlation between scale scores and a current behavior that is related to the construct assessed by the scale
convergent validity
positive relationship between two scales measuring the same or similar constructs
Another name for continuous variables
quantitative variables
Two-way ANOVA
relates two between-subjects factors to a dependent variable.
Non-probability sampling (nonrandom sampling)
sampling procedure that does not use random selection
Probability sampling (random sampling)
sampling procedure that uses random selection (a process of selecting a sample in which all members of a population or a subpopulation have an equal chance of being selected)
The Central Limit Theorem
states that the sampling distribution of the sample means approaches a normal distribution as the sample size gets larger — no matter what the shape of the population distribution. This fact holds especially true for sample sizes over 30.
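A small simulation sketch of the theorem (the population distribution and sizes are arbitrary choices for illustration):

    import numpy as np

    rng = np.random.default_rng(42)
    # The population here is heavily skewed (exponential), yet the
    # distribution of sample means is approximately normal for n = 50.
    means = [rng.exponential(scale=2.0, size=50).mean() for _ in range(10_000)]
    print(np.mean(means), np.std(means))   # ~2.0 and ~2.0 / sqrt(50)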
Properties of samples, such as the mean or standard deviation, are not called parameters, but ______
statistics
inferential statistics
techniques that allow us to use samples to generalize about the populations from which the samples were drawn (hypothesis testing). It is, therefore, important that the sample accurately represents the population; the process of achieving this is called sampling (see the previous section on probability and non-probability sampling). Inferential statistics arise out of the fact that sampling naturally incurs sampling error, so a sample is not expected to perfectly represent the population.
Major decision in using one-sample t-test is choosing the _____________.
test value
Phi and Cramer's V are tests of what?
tests of the strength of association
An r value of 0 indicates _________
that there is no association between the two variables.
internal consistency
the consistency of participant responses to all the items in a scale
Population
the group that a researcher is interested in examining, defined by specific characteristics such as residency, occupation, gender, or age
How can an investigator reduce their likelihood of having type I and II errors?
the investigator can reduce the likelihood of these errors by increasing the sample size (the larger the sample, the less likely it is to differ substantially from the population)
Wilcoxon signed-rank test
the nonparametric test equivalent to the paired samples (dependent samples) t-test. The Wilcoxon signed-rank test does not assume normality in the data, and thus can be used when this assumption has been violated and the use of the dependent t-test is inappropriate. It is used to compare two sets of scores that come from paired data (same participants). This can occur when we wish to investigate any change in scores from one time point to another, or when individuals are subjected to more than one condition (before and after).
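A hedged SciPy illustration with invented paired scores:

    from scipy import stats

    before = [10, 12, 9, 14, 11, 13]   # hypothetical scores, occasion 1
    after = [12, 14, 10, 15, 13, 16]   # same participants, occasion 2

    w, p = stats.wilcoxon(before, after)   # tests the paired differences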
Sampling bias
the sampling does not represent the population
Descriptive statistics
the term given to the analysis of data that helps describe, show or summarize data in a meaningful way such that, for example, patterns might emerge from the data. Descriptive statistics do not, however, allow us to make conclusions beyond the data we have analyzed or reach conclusions regarding any hypotheses we might have made.
What does the z-score value tell us?
the value of the z-score tells exactly where the score is located relative to all the other scores in the distribution
Dependent variable (DV)
the variable being tested and measured in a scientific experiment.
Independent variables (IV)
the variable that is changed or controlled in a scientific experiment to test the effects on the dependent variable
Measures of central tendency
these are ways of describing the central position of a frequency distribution for a group of data (e.g., the distribution and pattern of marks scored by 100 students from the lowest to the highest). We can describe this central position using a number of statistics, including the mode, median, and mean.
Measures of spread
these are ways of summarizing a group of data by describing how spread out the scores are. For example, the mean score of our 100 students may be 65 out of 100. However, not all students will have scored 65 marks. Rather, their scores will be spread out. Some will be lower and others higher. Measures of spread help us to summarize how spread out these scores are. To describe this spread, a number of statistics are available to us, including the range, quartiles, absolute deviation, variance and standard deviation.
Simple Random Sample
type of probability sampling in which every single member of the population has an equal chance of being selected for the sample
chi-square test for independence (Pearson's chi-square test or the chi-square test of association)
used to determine whether the observed values for the cells differ significantly from the corresponding expected values for the cells. It is also used to determine if there is a relationship between two categorical variables.
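A hedged SciPy sketch with an invented 2x2 table of observed counts:

    import numpy as np
    from scipy import stats

    observed = np.array([[30, 10],
                         [20, 25]])   # hypothetical contingency table

    chi2, p, dof, expected = stats.chi2_contingency(observed)
    # 'expected' lets you check the "80% of expected counts > 5" assumption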
Interval variables
variables whose central characteristic is that they can be measured along a continuum and have a numerical value
Ordinal variables
variables that have two or more categories, just like nominal variables, only the categories can also be ordered or ranked (e.g., cancer staging (stage 1, 2, 3, 4), where the order is meaningful but the differences between stages are not quantifiable)
Nominal variables
variables that have two or more categories, but which do not have an intrinsic order (e.g. type of property-- houses, condos, co-ops, bungalows)
construct validity
whether a measure mirrors the characteristics of a hypothetical construct; can be assessed in multiple ways
face validity
whether a particular measure seems to be appropriate as a way to assess a construct
For a normal distribution, the value of the kurtosis statistic is ______
zero