Stats Chapter 2

Give the mean and standard deviation when you give the correlation

Don't just give the correlation by itself.

The z-scores for every x and y observation are calculated when finding the correlation

Yes: r is computed from the standardized values (z-scores) of x and y.

Correlation Fact 2

• Because r uses the standardized values of the observations, r does not change when we change the units of measurement (a linear transformation) of x, y, or both. Measuring height in inches rather than centimeters and weight in pounds rather than kilograms does not change the correlation between height and weight. The correlation r itself has no unit of measurement; it is just a number.
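
As a rough check of this fact, the sketch below computes r for the same measurements in two unit systems. The heights and weights are made-up illustrative values, and `correlation` is a small helper written for this example (it uses the sums-of-products form, which is algebraically the same as averaging products of z-scores).

```python
import math

def correlation(xs, ys):
    """Pearson correlation r of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: heights in inches, weights in pounds.
heights_in = [60, 62, 65, 68, 72]
weights_lb = [110, 120, 140, 155, 180]

# The same measurements in centimeters and kilograms.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]

r_imperial = correlation(heights_in, weights_lb)
r_metric = correlation(heights_cm, weights_kg)

# The two values agree up to floating-point rounding.
print(r_imperial, r_metric)
```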

Correlation Fact 1

• Correlation makes no use of the distinction between explanatory and response variables. It makes no difference which variable you call x and which you call y in calculating the correlation.

Correlation Fact 4

• Correlation measures the strength of only the linear relationship between two variables. Correlation does not describe curved relationships between variables, no matter how strong they are.
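
A small sketch of this fact, using made-up data: y is determined exactly by x through y = x^2, yet r comes out as zero because the relationship is curved rather than linear. The `correlation` helper is written for this example only.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect curved (quadratic) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curved = correlation(xs, ys)

# r is 0 here even though knowing x tells you y exactly.
print(r_curved)
```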

Correlation Fact 3

• Correlation requires that both variables be quantitative. For example, we cannot calculate a correlation between the incomes of a group of people and what city they live in, because city is a categorical variable.

Correlation Fact 5

• Like the mean and standard deviation, the correlation is not resistant: r is strongly affected by a few outlying observations. Use r with caution when outliers appear in the scatterplot.
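
To illustrate how sensitive r can be, the sketch below takes five made-up points that lie exactly on a line and then adds a single outlying point. The data and the `correlation` helper are assumptions for illustration only.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: five points exactly on the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_clean = correlation(xs, ys)

# Add one outlying observation at (10, 0).
r_outlier = correlation(xs + [10], ys + [0])

# r drops from 1 to roughly -0.25 because of a single point.
print(r_clean, r_outlier)
```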

Arithmetic Mean

(the sum of all its values)/(the number of values)

Describe Overall Pattern of Relationship?

We describe the relationship by examining the form, direction, and strength of the association. We look for an overall pattern ...
• Form: linear, curved, clusters, no pattern
• Direction: positive, negative, no direction
• Strength: how closely the points fit the "form"

----------------------------------2.3-----------------------------

------------------------------------------------------------------

Correlation

Denoted r, the correlation measures the direction and strength of the linear relationship between two quantitative variables. It is one of the most common and most useful statistics: a single number that describes the degree of relationship between two variables.
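
One way to compute r, sketched here under the usual textbook definition: standardize each x and each y into a z-score, multiply the paired z-scores, add the products, and divide by n - 1. The study-hours and exam-score data are invented for illustration, and the function names are this example's own.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = sum of (z_x * z_y) over all cases, divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    zx = [(x - mx) / sx for x in xs]  # z-scores of x
    zy = [(y - my) / sy for y in ys]  # z-scores of y
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Hypothetical data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]

r = correlation(hours, score)
print(round(r, 2))  # roughly 0.98: a strong positive linear relationship
```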

KEY CHARACTERISTICS OF DATA FOR RELATIONSHIPS

A description of the key characteristics of a data set that will be used to explore a relationship between two variables should include:
• Cases. Identify the cases and how many there are in the data set.
• Label. Identify what is used as a label variable if one is present.
• Categorical or quantitative. Classify each variable as categorical or quantitative.
• Values. Identify the possible values for each variable.
• Explanatory or response. If appropriate, classify each variable as explanatory or response.

Response Variable

A response variable measures an outcome of a study

Scatter plot

A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual.

Standardization:

Allows us to compare correlations between data sets where variables are measured in different units or when variables are different.

Explanatory Variable

An explanatory variable explains or causes changes in the response variable.

IMPORTANT

Correlation should only be computed for X and Y when their relationship is linear.

How to Convert from quantitative to categorical

Example: values of 1-3 become "small", 4-7 become "medium", and 8-10 become "large". Grouping the numbers into size classes converts a quantitative variable into a categorical one.
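
The grouping above could be sketched as follows, assuming the values are whole numbers from 1 to 10. The function name `size_category` and the sample values are hypothetical.

```python
from collections import Counter

def size_category(value):
    """Map a whole number from 1 to 10 to a size category."""
    if 1 <= value <= 3:
        return "small"
    if 4 <= value <= 7:
        return "medium"
    if 8 <= value <= 10:
        return "large"
    raise ValueError(f"value out of range 1-10: {value}")

# Hypothetical quantitative observations.
values = [2, 5, 9, 7, 1, 10, 4]

# The same observations as a categorical variable.
categories = [size_category(v) for v in values]

# Categorical data are summarized with counts rather than a mean.
counts = Counter(categories)
print(counts)
```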

HOW TO DETERMINE EXPLANATORY VS RESPONSE

The explanatory variable comes first chronologically.

If no explanatory or response variables

If there is no explanatory-response distinction, either variable can go on the horizontal axis.

What is Chapter 2?

In Chapter 1 we learned to use graphical and numerical methods to describe the distribution of a single variable. Many of the interesting examples of the use of statistics involve relationships between pairs of variables. Learning ways to describe relationships with graphical and numerical methods is the focus of this chapter.

transformation

In data analysis, a transformation is the replacement of a variable by a function of that variable: for example, replacing a variable x by the square root of x or the logarithm of x. In a stronger sense, a transformation is a replacement that changes the shape of a distribution or relationship.

Categorical --> Quantitative

In many situations, we measure a collection of categorical variables and then combine them in a scale that can be viewed as a quantitative variable. The PSQI is an example. We can also turn the tables in the other direction.

No relationship:

Knowing X tells you nothing about Y.

How To Add Categorical Variables to Scatterplots

Mark different categories of dots with different colors and symbols.

In Section 2.2 we focus on graphical descriptions.

Section 2.3 and Section 2.4 move on to numerical summaries for these relationships.

Anti-Log

Taking the anti-log of a number undoes the operation of taking the log. Therefore, since log10(1000) = 3, the antilog10 of 3 is 1,000. Taking the antilog of X raises the base of the logarithm in question to X.
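
A brief sketch of the same idea using Python's standard `math` module: the anti-log is just the base raised to the given power, so it reverses the log. The variable names are this example's own.

```python
import math

# Base-10 log of 1000, then its anti-log (10 raised to that power).
log_value = math.log10(1000)
antilog_value = 10 ** log_value

# The same round trip with the natural log (base e).
natural_log = math.log(5)
natural_antilog = math.exp(natural_log)

# Each anti-log recovers the starting number, up to rounding.
print(log_value, antilog_value, natural_antilog)
```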

Central Tendency

The center or middle of a distribution. There are many measures of central tendency. The most common are the mean, median, and mode. Others include the trimean, trimmed mean, and geometric mean.

Geometric Mean

The geometric mean is a measure of central tendency. The geometric mean of n numbers is obtained by multiplying all of them together and then taking the nth root of the product. For example, for the numbers 1, 10, and 100, the product is 1 x 10 x 100 = 1,000. Since there are three numbers, we take the cube root of the product (1,000), which is equal to 10.
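
The calculation above can be sketched in two equivalent ways: the nth root of the product, or the anti-log of the mean of the logs. Both function names are hypothetical, and the inputs are assumed to be positive numbers.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    product = 1.0
    for v in values:
        product *= v
    return product ** (1 / len(values))

def geometric_mean_via_logs(values):
    """Anti-log of the arithmetic mean of the natural logs."""
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)

numbers = [1, 10, 100]
gm = geometric_mean(numbers)
gm_logs = geometric_mean_via_logs(numbers)

# Both are approximately 10 (floating-point rounding may show 9.999...).
print(gm, gm_logs)
```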

Log Transformation

The log transformation can be used to make highly skewed distributions less skewed. This can be valuable both for making patterns in the data more interpretable and for helping to meet the assumptions of inferential statistics.
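
As an illustration, the sketch below compares a simple skewness measure before and after a base-10 log transformation. The data are invented so that each value is roughly three times the previous one, and `skewness` is a helper defined here (the average cubed deviation divided by the cubed standard deviation), not a function from any particular statistics package.

```python
import math

def skewness(values):
    """Average cubed deviation divided by the cubed standard deviation."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return sum((v - m) ** 3 for v in values) / n / sd ** 3

# Hypothetical, strongly right-skewed data.
raw = [1, 3, 10, 30, 100, 300, 1000]
logged = [math.log10(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)

# The raw data have a long right tail; the logged data are close to symmetric.
print(skew_raw, skew_log)
```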

What is our fundamental graphical tool for displaying the relationship between two quantitative variables?

The scatterplot is our fundamental graphical tool for displaying the relationship between two quantitative variables.

Negatively Associated

Two variables are negatively associated when above-average values of one tend to accompany below-average values of the other, and vice versa. (Going down from left to right)

Positively Associated

Two variables are positively associated when above-average values of one tend to accompany above-average values of the other and below-average values also tend to occur together. (Going up from left to right)

ASSOCIATION BETWEEN VARIABLES

Two variables measured on the same cases are associated if knowing the values of one of the variables tells you something about the values of the other variable that you would not know without this information.

Tip # 1

When analyzing data to draw conclusions it is important to carefully consider the best way to summarize the data. Just because a variable is measured as a quantitative variable, it does not necessarily follow that the best summary is based on the mean (or the median). As the previous example illustrates, converting a quantitative variable to a categorical variable is a very useful option to keep in mind.

Important 1

When we study relationships between two variables, it is not sufficient to collect data on the two variables. A key idea for this chapter is that both variables must be measured on the same cases.

Natural Logarithm

a logarithm to the base e (2.71828...).

Correlation Coefficient

a number between −1 and +1 calculated so as to represent the linear dependence of two variables or sets of data.

Algorithm

a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.

log_base(X) = p

base = the base number / X = the result of raising the base to the power p / p = the power you raise the base to (base^p = X)

Correlation describes the strength of a relationship more precisely, as opposed to...

just "strong" or "weak"

Weak Relationship between variables

points are more distant from a straight line

Strong Relationship between variables

points form more of a straight line

