module 5 statistics part 1


Which graphical display is most appropriate when both variables are categorical? a) Scatterplot. b) Two-way table. c) Side-by-side box plot. d) Coordinate plane.

b) Two-way table. Correct. The correct answer is b. If both variables are categorical, we display them in a two-way table (also known as a contingency table).

Which outlier would you expect to have a greater influence on the value of the correlation coefficient? a. A b. B c. The influence of A and B would

A. Correct. The answer is a. Outlier A is farther away, so it would have a greater effect on the value of the correlation coefficient.

What determines the location of a dot on a scatterplot?

A dot is placed on a scatterplot according to its x- and y-values.

For the following two questions, refer to the boxplots below. The statistics refer to bone-density test scores for two groups of patients. Which group has the greater percentage of patients with bone-density scores less than 0? a. Group A b. Group B c. Cannot be determined

Group A has the greater percentage of patients with bone-density scores less than 0.

In a two-way table, what does the sum of the joint frequencies in one row equal? a) The quantitative variable b) A marginal frequency c) The correlation coefficient d) The number of individuals in the placebo group

A marginal frequency Correct. The correct answer is b. In a two-way table, the sum of the joint frequencies in one row equals a marginal frequency.

A scatterplot is the appropriate graphical display when both variables are quantitative.

A scatterplot is useful for which type of data? a. Both variables are categorical. b. One variable is categorical, one variable is quantitative. c. Both variables are quantitative. d. One variable is discrete, one variable is continuous.

A scatterplot is useful when both variables in the data set are quantitative.

C→C : Two-way Tables

A two-way table, also known as a two-way frequency table or contingency table, is used to show the relationship between two categorical variables (C→C); the rows show the categories of one variable, and the columns show the categories of the other variable. In the table above, the cells in yellow show joint frequencies*. These represent the number of instances that fall in both the corresponding row and column. For example, the cell in the "Male" row and "With Autism" column counts the number of males with autism. The green cells show marginal frequencies*. These are equal to the sum of the number of individuals in the corresponding row or column. For example, the cell in the "Totals" column and "Female" row shows the total number of females in the study. It may be helpful to remember that marginal frequencies appear in the margins of the table. The bottom-right cell (in both the "Totals" column and the "Totals" row) gives the total number of individuals in the study.

A researcher wants to study the possible relationship between birth order and personality. Identify the explanatory variable and the response variable. Enter in "explanatory" or "response" for your answer. Birth Order: Effect on personality:

Birth order: explanatory. The explanatory variable is birth order. Effect on personality: response. The response variable is effect on personality.

A recent study examined the possible relationship between consumption of salt and blood pressure. Identify the explanatory variable and the response variable. Enter in "explanatory" or "response" for your answer. Blood pressure: Consumption of salt:

Blood pressure: response. Blood pressure is the response variable. Consumption of salt: explanatory. Consumption of salt is the explanatory variable.

Which of the points in the scatterplot above are outliers? Point A Point B Points A and B Points C and D Points B and D

Both data point A and data point B are outliers in this scatterplot.

What does " C→C " refer to?

Both the explanatory and response variables are categorical. Correct. The correct answer is a. The notation " C→C " refers to a situation in which both the explanatory and response variables are categorical.

What does " Q→Q " refer to? a) There is a strong correlation between the two quantitative variables. b) There is a correlation between two qualitative variables. c) Both the explanatory and response variables are quantitative. d) Both the explanatory and response variables are qualitative.

Both the explanatory and response variables are quantitative. Correct. The correct answer is c. The notation "Q→Q" means that both the explanatory and response variables are quantitative.

By removing the red outlier from this data set, we would expect the strength of the linear relationship to decrease. True or False?

Conditional percentages Correct. The answer is d. Both variables are categorical (C→C) so we will use a two-way table. Therefore, our numerical measure will be conditional percentages.

A cardiac care center recently implemented a new cardiac rehabilitation program. The center was interested in determining if this new program increased a patient's risk of having another heart attack within two years. Group A was exposed to the new rehabilitation regimen, while Group B received the traditional rehabilitation program. Which numerical measure could be used to analyze the data?
Heart attack within 2 years
            Yes    No    Total
Group A      11    39       50
Group B      29    21       50
Total        40    60      100
a. Marginal Frequency b. Relative Frequency c. Joint Frequency d. Conditional percentages

Conditional percentages. Correct. The answer is d. In this study both variables are categorical (C→C), so a two-way table is used to present the data. Therefore, our numerical measure will be conditional percentages.
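As a rough illustration of that answer, the conditional percentages for the cardiac rehabilitation table can be computed within each group. This is a minimal sketch: the counts come from the table in the question, while the names `counts` and `conditional_percentages` are made up for the example.

```python
# Counts from the cardiac rehabilitation table:
# (heart attack within 2 years, no heart attack within 2 years).
counts = {
    "Group A": (11, 39),
    "Group B": (29, 21),
}

def conditional_percentages(yes, no):
    """Return the percentage of 'yes' and 'no' outcomes within one group."""
    total = yes + no
    return 100 * yes / total, 100 * no / total

# Condition on the explanatory variable (the group) by using each row's total.
results = {group: conditional_percentages(*row) for group, row in counts.items()}

for group, (pct_yes, pct_no) in results.items():
    print(f"{group}: {pct_yes:.0f}% had a heart attack, {pct_no:.0f}% did not")
```

Comparing the two "had a heart attack" percentages across groups is what lets us judge whether the outcome differs by program.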

An exercise physiologist was interested in how age affects reaction time. He sampled a population ranging in age from 10 years old to 70 years old. Subjects were asked to complete a simple reaction time test. The test consisted of subjects sitting in front of a computer, and clicking a mouse when the image of a circle on a screen changed color from black to red. Subjects were asked to attempt the task 5 times. The times from the five trials were averaged to give each subject an overall reaction time score. Which numerical measure could be used to analyze the strength of the relationship between age and reaction time? a. Correlation coefficient b. Conditional percentages c. Mean d. Five-number Summary

Correlation coefficient. Correct. The answer is a. In this study both variables are quantitative (Q→Q) and form paired data. The strength of a correlation can be measured by calculating the correlation coefficient.

The strength of the correlation between two variables can be measured by which statistic? a. Relationship factor b. Correlation coefficient c. Strength test d. Trending statistic

The correlation coefficient is a statistic that measures the strength of the correlation between two variables.

Are there any outliers? Please type in the letter that corresponds with your answer. Point A Point B Point C They are all outliers. There are no outliers.

Data point C would be considered an outlier based on the data set information presented in this scatterplot.

Step 2. Plot the data points on the coordinate plane

Each dot on the scatterplot represents one patient. Each patient's BMI and cholesterol data can be seen as an ordered pair: the BMI measurement is the x-value and the cholesterol level is the y-value. Here are these points plotted on a coordinate plane:

A study is to be performed on the possible relationship between brain volume and IQ scores. A scatterplot will be constructed to examine the relationship between the variables. What is the explanatory variable and the response variable in this scatterplot? Explanatory: Response:

Explanatory: brain volume. Response: IQ scores.

The larger the sample size, the greater the effect an outlier will have on correlation. True or False? a. True b. False.

False. Correct. This is a false statement. The smaller the sample size, the greater the effect of the outlier.

One of the challenges with young patients who suffer from acute asthma is delivering medication during asthma attacks. Young patients often have difficulty managing traditional delivery methods, such as inhalers. A study was done to determine whether using a nebulizer to deliver medication reduced the duration of asthma attacks in pediatric patients. Patients were split into two groups. One group was given traditional inhalers as a method to deliver medication during an attack. The other group was given the nebulizer. The effectiveness of both delivery methods was measured by comparing the time it took for an attack to subside. Which numerical measure would present the maximum and minimum amount of time that it took for attacks to subside for both delivery methods? a. Relative Frequency b. Five-number Summary c. Correlation d. Conditional percentages

Five-number Summary. Correct. The answer is b. In this study one variable is categorical, and the other is quantitative (C→Q). The five-number summary shows five important statistical values: minimum, first quartile, median, third quartile, and maximum. Therefore, the five-number summary would be the best choice to look at the minimum and maximum values for both groups.

Consider a study on whether a certain medication improves kidney function in patients with chronic kidney disease. The Glomerular Filtration Rate (GFR) was collected for two groups: a treatment group and a placebo group. The researchers want to compare the middle 50% of the data for both groups. Which numerical measure would identify the two values that 50% of the data falls between for both groups? a. Median b. Five-number summary c. Mode d. Joint Frequencies

Five-number summary. Correct. The answer is b. In this study one variable is categorical and the other is quantitative (C→Q). The five-number summary shows five important statistical values: minimum, first quartile, median, third quartile, and maximum. 50% of the data falls between the first (Q1) and third (Q3) quartiles, so the researchers should look at the five-number summary to determine the values of Q1 and Q3 for both groups, as these values define the middle 50% of the data.
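A five-number summary can be sketched in a few lines with Python's standard library. The scores below are hypothetical (the GFR data from the study is not given), and the "inclusive" quartile method is an assumption: textbooks and software use slightly different quartile conventions, so Q1 and Q3 may differ a little from hand calculations.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical scores for one group, for illustration only.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(scores)
print(summary)

# The middle 50% of the data lies between Q1 and Q3.
middle_half = (summary[1], summary[3])
print(middle_half)
```

In a C→Q study, the same function would be applied once per category (for example, once for the treatment group and once for the placebo group) and the results compared.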

FORM

If a scatterplot has a pattern of points that form a reasonably straight line, we describe it as linear. If the points form a pattern that is more curved than straight, we say it is nonlinear, or curvilinear.

DIRECTION

If a scatterplot shows a pattern of points that increase from the lower left corner of the graph to the upper right corner, we say that there is a positive correlation between the two variables: when the x-variable increases, the y-variable increases. When the pattern goes from the upper left corner of the graph to the lower right corner, we say that there is a negative correlation between the two variables: when the x-variable increases, the y-variable decreases.

Conditional Percentages in a Two-way Frequency Table

If both variables are categorical, a two-way frequency table is used to display the data. The following example of a two-way table shows the association between blood pressure and survival for a group of men. In this example, blood pressure is the explanatory variable and survival is the response variable.
Blood pressure (explanatory variable)    Died    Survived    Total
Normal (≤120/80 mm Hg)                     21        2655     2676
High (≥140/90 mm Hg)                       55        3283     3338
Total                                      76        5938     6014

C→Q : Five-number Summary

If the explanatory variable is categorical and the response variable is quantitative, we can use descriptive statistics, namely the five-number summary, for the quantitative variable, and compare the statistics for each of the categories. You might have already guessed this since you know that the side-by-side box plots are used for the C→Q classification.

relative frequencies.

If we calculate the percentage that each cell is of the total, the results are called relative frequencies.

In a two-way table, which best describes the sum of all of the joint frequencies? a) The total number of individuals in the study. b) The marginal frequency. c) The correlation coefficient. d) The total number of individuals in the treatment group.

In a two-way table, the sum of all of the joint frequencies is equal to the total number of individuals in the study.
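The relationships among joint, marginal, and relative frequencies can be checked numerically. The sketch below uses the blood pressure and survival counts from this module's two-way table example; the variable names are chosen for the example only.

```python
# Joint frequencies: one count per (row category, column category) cell.
joint = {
    "Normal BP": {"Died": 21, "Survived": 2655},
    "High BP": {"Died": 55, "Survived": 3283},
}

# Marginal frequencies for rows: the sum of the joint frequencies in each row.
row_totals = {row: sum(cells.values()) for row, cells in joint.items()}

# Marginal frequencies for columns: the sum of each column's joint frequencies.
column_totals = {}
for cells in joint.values():
    for column, count in cells.items():
        column_totals[column] = column_totals.get(column, 0) + count

# The sum of all joint frequencies is the total number of individuals.
grand_total = sum(row_totals.values())

# Relative frequencies: each cell as a percentage of the grand total.
relative = {
    row: {column: 100 * count / grand_total for column, count in cells.items()}
    for row, cells in joint.items()
}

print(row_totals, column_totals, grand_total)
```

Because every individual falls in exactly one cell, the relative frequencies add up to 100%.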

What is the five-number summary for the treatment group? Minimum: Q1: Median: Q3: Maximum:

The five-number summary for the treatment group is: minimum −1.19, Q1 −0.51, median 0.225, Q3 1.42, maximum 2.24.

Examine the graph below:

Notice that on the graphs below we have plotted the ordered pairs, as well as a line. This line is called the "line of best fit" or the "regression line," and it will be discussed in more detail in a later module. The line of best fit is the line that minimizes the overall distance between the points and the line itself (formally, the sum of the squared vertical distances). While we can draw many lines on the scatterplot, the line of best fit is the line that is, overall, the closest to all the points in the scatterplot. The closer the points are overall to the line of best fit, the stronger the correlation (or linear relationship) will be. We are including it in these graphs to give you a better feel for the strength of the linear relationship. This scatterplot has 21 data points, including the outlier that is above and to the left of the rest of the data. There is a moderately strong positive correlation when this outlier is included.

On a horizontal side-by-side box plot, what is displayed on the horizontal axis? a) The categorical, response variable. b) The categorical, explanatory variable. c) The quantitative, response variable.

On a horizontal side-by-side box plot, the quantitative, response variable is displayed on the horizontal axis.

What is the explanatory variable and the response variable in this scatterplot? Percent Body Fat ( % ) Body Mass Index (BMI)

Percent Body Fat (%): response. Body Mass Index (BMI): explanatory.

A recent study was conducted to determine if driving performance was influenced by texting. Specifically, the research investigated how many seconds it takes for a driver to respond when a leading car hits the brakes. Identify the explanatory variable and the response variable. Enter in "explanatory" or "response" for your answer. Response time measured in seconds: Presence of distraction from texting:

Response time measured in seconds: response. The response variable is response time measured in seconds. Presence of distraction from texting: explanatory. The explanatory variable is presence of distraction from texting.

Positive Correlation

Scatterplot (a) shows a positive correlation between the variables because as the x-variable increases, the y-variable increases.

Negative Correlation

Scatterplot (b) shows a negative correlation because as the x-variable increases, the y-variable decreases.

No Correlation

Scatterplot (c) shows no correlation*; there is no apparent overall trend between the two variables.

Which graphical display is most appropriate when both variables are quantitative? a) Horizontal side-by-side box plot b) Two-way table. c) Scatterplot. d) Vertical side-by-side box plot.

Scatterplot. Correct. The correct answer is c. If both variables are quantitative, a scatterplot is usually the best choice to display their relationship.

The Smell & Taste Treatment and Research Foundation conducted a study to investigate whether smell can affect learning. Subjects completed mazes multiple times while wearing masks. They completed the pencil-and-paper mazes three times wearing floral-scented masks, and three times with unscented masks. Participants were assigned at random to wear the floral mask during the first three trials or during the last three trials. For each trial, researchers recorded the time it took to complete the maze and the subject's impression of the mask's scent: positive, negative, or neutral. Identify the explanatory variable and the response variable. Enter in "explanatory" or "response" for your answer. Scent: Time it takes to complete the maze:

Scent: explanatory. The explanatory variable is scent. Time it takes to complete the maze: response. The response variable is the time it takes to complete the maze.

A drug company is testing how their two drugs (Drug A and Drug B) affect systolic blood pressure (measured in mm Hg, which is a continuous scale). Which of the following graphical displays should be used? a) Two-way table b) Scatterplot c) Side-by-side box plot d) Contingency table

Side-by-side box plot Correct. The correct answer is c. A side-by-side box plot is the best visual display to use.

Which graphical display is most appropriate when the explanatory variable is categorical and the response variable is quantitative? a) Two-way table. b) Scatterplot. c) Contingency table. d) Side-by-side box plot.

Side-by-side box plot. Correct. The correct answer is d. If the explanatory variable is categorical and the response variable is quantitative, we can use side-by-side box plots to display them.

Horizontal Side-by-Side Box Plot

Side-by-side box plots do not need to be presented vertically. Here is an example of a horizontal side-by-side box plot that compares data from a treatment group to a placebo* group: Here, the categories of the explanatory variable (Treatment Group and Placebo Group) are listed one above the other. The quantitative response variable is displayed on the horizontal axis.

Which variable, explanatory or response, is displayed on the x-axis on side-by-side boxplots?

Side-by-side boxplots can be horizontal or vertical, so either variable (explanatory or response) can be displayed on the x-axis.

response variable*

The affected variable is called the response variable*. In a randomized experiment, the researcher manipulates values of the explanatory variable and measures the resulting changes in the response variable.

Which statement is the most appropriate interpretation of this scatterplot? Enter the letter that corresponds to your answer choice. a. Countries with lower infant mortality rates tend to have shorter life expectancies. b. Countries with lower infant mortality rates tend to have longer life expectancies. c. Countries with higher infant mortality rates tend to have longer life expectancies. d. There is no clear relationship between infant mortality rates and life expectancy.

The answer is b. Countries with lower infant mortality rates tend to have longer life expectancies.

Which point do you think would be the most influential on the analysis of the two variables x and y ? Point A Point B Point C Point D

The answer is b. Data point B would be the most influential on the analysis of the two variables x and y, as this data point has the greatest divergence from the data set.

Describe the association between the number of TVs and the birth rate; is it positive or negative? Please type in the response that corresponds with your answer.

The association between the number of TVs and the birth rate is negative.

When both variables are quantitative, what best describes the template on which the data is displayed? a) A table b) A number line c) A flowchart d) The coordinate plane

The coordinate plane Correct. The correct answer is d. When both the explanatory variable and the response variable are quantitative, the data is displayed on a scatterplot, which is created on the coordinate plane.

What can two-way tables be used to find? a) The marginal frequencies of certain outcomes. b) The relative frequencies of certain outcomes c) The joint frequencies of certain outcomes. d) All of the above.

The correct answer is d. All of the above. All of these things can be seen or calculated from a two-way table.

The correct answer is d. Both the explanatory and response variables are categorical, so a two-way table is the best graphical display to use.

To construct a scatterplot to see any potential relationship between variables, follow the instructions below: Step 1. Collect data

The data being collected should have two variables. These variables will correspond to the x-axis and y-axis of your scatterplot. Determine the explanatory variable (a suspected "cause") and the response variable (a suspected "effect"). For example, a study has been conducted to examine a possible relationship between body mass index (BMI) and total cholesterol level; that is, the researchers are investigating whether BMI affects a patient's total cholesterol level. Therefore, BMI is the explanatory variable on the x-axis and cholesterol is the response variable on the y-axis. Here is the data that was collected:
Patient       Body Mass Index (BMI)    Total Cholesterol Level (mg/dl)
Patient #1    28                       210
Patient #2    20                       185
Patient #3    25                       207
Patient #4    29                       226
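To make the data concrete, the four patients can be turned into ordered pairs and summarized with a correlation coefficient. This is only a sketch: the module does not give a formula, so the code assumes the standard Pearson correlation coefficient, and `pearson_r` is a name chosen for the example. With just four patients, the result is illustrative rather than meaningful.

```python
from math import sqrt

# Data from the table: BMI is the explanatory (x) variable,
# total cholesterol (mg/dl) is the response (y) variable.
bmi = [28, 20, 25, 29]
cholesterol = [210, 185, 207, 226]

# Each patient becomes one ordered pair (x, y) on the scatterplot.
points = list(zip(bmi, cholesterol))

def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumed formula, not from the module)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(bmi, cholesterol)
print(points)
print(round(r, 3))  # about 0.953: a strong positive linear relationship
```

A value close to 1 matches what the plotted points suggest: patients with a higher BMI tend to have higher cholesterol in this small sample.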

A public health study is measuring how height is passed down genetically from parents to offspring. In particular, the study has selected a sample of 25-year-olds, and their biological parents. The study is examining the relationship between average height of the biological parents (measured in inches) and the height of the offspring (measured in inches). Which of the following labels best fits the explanatory and response variables? a) The explanatory variable is "Height of Mother" and the response variable is "Height of Father." b) The explanatory variable is "Average Height of Parents" and the response variable is "Height of Offspring." c) The explanatory variable is "Height" and the response variable is "Inches." d) The explanatory variable is "Height of Offspring" and the response variable is "Average Height of Parents."

The explanatory variable is "Average Height of Parents" and the response variable is "Height of Offspring."

A drug company is testing how their two drugs (Drug A and Drug B) affect systolic blood pressure (measured in mm Hg, which is a continuous scale). Which of the following labels best fits the explanatory and response variables? a) The explanatory variable is "Systolic Blood Pressure" and the response variable is "Drug." b) The explanatory variable is "Drug A" and the response variable is "Drug B." c) The explanatory variable is "Systolic Blood Pressure" and the response variable is "mm Hg." d) The explanatory variable is "Drug" and the response variable is "Systolic Blood Pressure."

The correct answer is d. The explanatory variable is "Drug" and the response variable is "Systolic Blood Pressure." (Answer a, which reverses the two variables, is incorrect.)

A hospital is studying the effectiveness of two different heartburn treatments (Treatment A and Treatment B) administered daily for a week. The results are measured after one week of treatment, by placing a patient into one of two groups: heartburn subsided OR heartburn remained. Which of the following labels best fits the explanatory and response variables? a) The explanatory variable is "Presence of Heartburn" and the response variable is "Treatment Type." b) The explanatory variable is "Treatment A" and the response variable is "Treatment B." c) The explanatory variable is "Heartburn Remained" and the response variable is "Heartburn Subsided." d) The explanatory variable is "Treatment Type" and the response variable is "Presence of Heartburn."

The explanatory variable is "Treatment Type" and the response variable is "Presence of Heartburn."

A hospital hires an independent consulting firm to perform a study about patients with high blood pressure, and the medicine they are being prescribed. The study is examining the relationship between a patient's starting blood pressure when they entered the treatment program and the dosage of blood pressure medicine they are prescribed during their treatment. For this study: What is the explanatory variable? Is the explanatory variable categorical or quantitative? What is the response variable? Is it categorical or quantitative? What graphical display should be used to show the results of the study?

The explanatory variable is patient's starting blood pressure. The explanatory variable is a quantitative variable. The response variable is the dosage of blood pressure medicine they are prescribed. The response variable is also a quantitative variable. As both the explanatory and response variables are quantitative (Q→Q) , a scatterplot would be an appropriate graphical display.

Name the explanatory variable and the response variable. Please type in the letter that corresponds with your answer. Number of TVs Birth Rate Explanatory variable: Response variable:

The explanatory variable is the number of TVs. The response variable is the birth rate.

Five-number Summary

The far left side of the box plot represents the minimum value of the data set. In the above example, this is 60. The left whisker represents the lower 25% of the data. In this example, that is from 60 to 70. That means 25% of students scored between 60 and 70 on the algebra exam. The first part of the box represents the next 25% of the data. It is shaded blue in the above box plot. So another 25% of the students scored between the Q1 of 70 and the median of 75. The line in the middle of the box is the median (Q2). That means 50% of the data is below this point, and 50% of the data is above this point. In the above example, 50% of students scored below 75 and 50% scored above 75. The second part of the box represents the next 25% of the data. It is shaded orange in the above box plot. In our example, we see that 25% of the students scored between the median of 75 and the Q3 of 85. The right whisker represents the upper 25% of the data. In the above example, we can see that 25% of students scored between 85 and 100 on the algebra exam. The far right side of the box plot represents the maximum value of the data set. In the above example, this is 100.

Describe the form of the relationship; is it linear, curvilinear, or is there no association? Please type in the response that corresponds with your answer.

The form of the relationship is linear.

Describe the overall pattern, using the three characteristics of scatterplots.

The overall pattern is moderately strong, negative, and linear.

STRENGTH

The pattern of points tells us about the strength of the correlation between the variables. If the points form a tightly grouped pattern, we say there is a strong correlation between the variables. If the points are loosely scattered and are not tightly grouped, we say there is a weak correlation or no correlation between the variables.

Which group has less variation in its data? A. Treatment group B. Placebo group

The placebo group has less variation in its data. Here, variation is measured by the range: the difference between the maximum and the minimum. For the treatment group this is equal to 2.24 − (−1.19) = 3.43; for the placebo group this is equal to 0.78 − (−2.29) = 3.07. Since 3.07 is less than 3.43, the placebo group has less variation in its data.

When both variables are quantitative, how is a scatterplot created? a) The points are plotted on a coordinate plane based on their ordered pairs. b) A line is drawn connecting all of the points to one another. c) A box is drawn from the first quartile to the third quartile, with a line in the middle at the median. d) A table is created, where each cell counts the frequency of the corresponding row and column occurring.

The points are plotted on a coordinate plane based on their ordered pairs. Correct. The correct answer is a. When both variables are quantitative, a scatterplot is created by plotting the points on a coordinate plane based on their ordered pairs.

Researchers want to investigate whether taking aspirin regularly reduces the risk of heart attack. Four hundred men between the ages of 50 and 84 are recruited as participants. The men are divided randomly into two groups: one group will take aspirin, and the other group will take a placebo. Each man takes one pill each day for three years, but he does not know whether he is taking aspirin or the placebo. At the end of the study, researchers count the number of men in each group who have had heart attacks. Identify the following values for this study: population, sample, explanatory variable, response variable.

Population: men aged 50 to 84. Sample: the 400 men who participated. Explanatory variable: oral medication (aspirin or placebo). Response variable: whether a subject had a heart attack.

Outliers and Correlation

The presence of an outlier may affect the correlation coefficient, depending on the placement of the outlier and the sample size of the data. If an outlier falls far from the regression line, it can weaken the correlation and move the correlation coefficient closer to 0 . If this outlier is removed, there will be a stronger relationship between the two variables. On the other hand, if an outlier falls near the regression line, it will have a diminished effect on the correlation, and may even strengthen the correlation.
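The effect of an outlier that falls far from the overall trend can be seen in a small experiment. The data below are hypothetical (ten points that rise steadily, plus one point far above the trend), and `pearson_r` assumes the standard Pearson correlation formula, which this module does not spell out.

```python
from math import sqrt

def pearson_r(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)

# Hypothetical data with a clear positive linear trend.
base = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 6),
        (6, 7), (7, 8), (8, 9), (9, 10), (10, 12)]

# Hypothetical outlier: a small x-value paired with a very large y-value.
outlier = (2, 14)

r_without = pearson_r(base)
r_with = pearson_r(base + [outlier])

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

Including the outlier pulls the coefficient toward 0, and removing it restores the stronger relationship, which is the pattern described above. With only eleven points the shift is large; in a much bigger sample a single point would carry less weight.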

correlation*.

The relationship between the two variables is referred to as the correlation*.

Describing the Relationship Between Two Quantitative Variables

The relationship between two quantitative variables ( Q→Q ) can be described by looking at a scatterplot of the two variables. We use three characteristics to describe the relationship: direction, form, and strength.

Q→Q : Scatterplots

The relationship between two variables that are both quantitative can be displayed in a scatterplot*. A scatterplot is a graphical display on the coordinate plane* that typically shows the explanatory variable on the x-axis and the response variable on the y-axis. The values of x and y for each subject are represented by a point on the scatterplot. As we've seen earlier, every point on a coordinate plane can be represented by an ordered pair*, (x, y). Here, the x-value is typically the explanatory variable's value for a piece of data, and the y-value is the corresponding value for the response variable. A simple way to remember this fact is that the term "explanatory" has an "x" in it. The points in this scatterplot show a trend: as waist circumference increases, arm circumference increases. The relationship between the two variables is referred to as the correlation*.

Once we remove the outlier, however, the correlation becomes stronger.

The scatterplot above is the same as the earlier graph, but with the outlier removed. The 20 remaining points have a very strong positive correlation. Note that by comparing the two graphs (with the outlier and without the outlier), in the second graph, the points are overall closer to the line of best fit, indicating a stronger correlation.

Which group has the greater median bone density score? A. Treatment group B. Placebo group

The treatment group has the greater median bone density score: 0.225 is greater than −0.4, as illustrated by the median lines drawn in the box plots.

Which group has a larger Interquartile range? A. Treatment group B. Placebo group

The treatment group has a larger interquartile range. The interquartile range is the difference between Q3 and Q1. The interquartile range for the treatment group is 1.42−(−0.51)=1.93. The interquartile range for the placebo group is −0.11−(−0.83)=0.72. The interquartile range is illustrated by the length of both boxes.
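The range and interquartile range comparisons for the two groups can be reproduced from their five-number summaries. The values below are the ones quoted in these answers for the bone density box plots; the helper name `spread` is made up for the example.

```python
# Five-number summaries: (minimum, Q1, median, Q3, maximum).
summaries = {
    "Treatment": (-1.19, -0.51, 0.225, 1.42, 2.24),
    "Placebo": (-2.29, -0.83, -0.4, -0.11, 0.78),
}

def spread(summary):
    """Return (range, interquartile range), rounded to two decimals."""
    minimum, q1, _median, q3, maximum = summary
    return round(maximum - minimum, 2), round(q3 - q1, 2)

spreads = {group: spread(summary) for group, summary in summaries.items()}

for group, (data_range, iqr) in spreads.items():
    print(f"{group}: range = {data_range}, IQR = {iqr}")
```

The treatment group has both the larger range and the larger IQR, so its box is longer and its whiskers span a wider interval.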

Are there any possible outliers? Give the approximate coordinates.

There are two possible outliers: (92, 47) and (52, 41).

marginal frequencies*.

These are equal to the sum of the number of individuals in the corresponding row or column. For example, data in the "Totals" column and "Female" row shows the total number of females in the study. It may be helpful to remember that marginal frequencies appear in the margins of the table.

joint frequencies*

These represent the total number of instances that fall in both the corresponding row and column. For example, data in the "Male" row and "With Autism" column counts the number of males with autism.

The presence of outliers has little effect on correlation. True or False? a. True b. False

This is a false statement. The presence of outliers can greatly influence the correlation and the value of the correlation coefficient.

True or False? A scatterplot always shows the explanatory variable on the horizontal, or x-axis.

This is a true statement. A scatterplot always shows the explanatory variable on the horizontal, or x-axis.

To find percentages in a C→C table

To analyze this contingency table, we can calculate proportions for each cell in the table. If we want to examine how the outcome of the response variable is explained by the value of the explanatory variable, we calculate the relative frequencies, or conditional percentages, for each row of the table. We calculate the percentages for each row because the explanatory variable's values are found in the first two rows of the table. If the table had been constructed with the explanatory variable's values in the first two columns, we would calculate the percentages for each column instead. To find a conditional percentage, divide the number of instances by the total number of individuals in the corresponding explanatory category. For example, in the table above, to find the percentage of people with normal BP who died, divide the number who died by the total number of people with normal BP: 21/2676 ≈ 0.8%.
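As a minimal sketch of that division, using only the two counts quoted in the card (the function and variable names are illustrative assumptions):

```python
def conditional_percentage(count, category_total):
    """Percentage of one explanatory category that falls in a given cell."""
    return 100 * count / category_total

# Counts quoted above: 21 deaths among 2676 people with normal BP.
died_normal_bp = 21
total_normal_bp = 2676

pct = conditional_percentage(died_normal_bp, total_normal_bp)
print(f"{pct:.1f}%")  # 0.8%
```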

conditional column percentages.

To determine relative frequency for each cell, we divide the data by the corresponding column's total number of individuals. The percentages obtained are called conditional column percentages.

Overall Percentages

We can also calculate percentages for the whole table. Here, each cell's data is divided by the total number of individuals. By calculating the overall percentages for the whole table, we are determining the relative frequency of each combination of activities: for example, what percentage of people are women who participated in a yoga class?
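The sketch below contrasts conditional column percentages with overall percentages. The counts and the "cycling" category are hypothetical, made up purely for illustration; they are not from the table the card refers to.

```python
# Hypothetical joint frequencies (rows: gender, columns: class attended).
counts = {
    "women": {"yoga": 30, "cycling": 20},
    "men": {"yoga": 10, "cycling": 40},
}

# Grand total: every individual in the table.
grand_total = sum(sum(row.values()) for row in counts.values())

# Column totals (marginal frequencies for each class).
column_totals = {}
for row in counts.values():
    for col, n in row.items():
        column_totals[col] = column_totals.get(col, 0) + n

# Overall percentage: each cell divided by the grand total.
overall_pct = {
    (r, c): 100 * n / grand_total
    for r, row in counts.items()
    for c, n in row.items()
}

# Conditional column percentage: each cell divided by its column total.
column_pct = {
    (r, c): 100 * n / column_totals[c]
    for r, row in counts.items()
    for c, n in row.items()
}

print(overall_pct[("women", "yoga")])  # 30.0 (share of all people)
print(column_pct[("women", "yoga")])   # 75.0 (share of yoga participants)
```

The same cell gives two different percentages because the denominator changes: the whole table in one case, a single column in the other.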

The Effects of Outliers on the Analysis of Two Quantitative Variables

When analyzing scatterplot displays of two quantitative variables, ( Q→Q ), we must be aware of the possibility that one or more outliers can dramatically influence the results. An outlier* in a scatterplot is a point that does not fit the overall trend of the other points. It is usually far away from the other points and is easy to spot. There may be more than one outlier, but if several points are grouped together away from the majority of points, we call them a cluster*, not outliers. Consider the following scatterplot:

explanatory variable*

When one variable causes change in another, we call the first variable the explanatory variable*.

Nursing Connections: Explanatory and Response Variables in Nursing

When reviewing evidence, multiple variables may be involved. In these situations, a nurse might be asked to compare data collected from two different groups or sets of patients, for example, patients who received a flu vaccine versus patients who did not get the vaccine. In this scenario, a nurse may track how many of the total patients contracted the flu and then compare this between the two groups. The results of this categorical data cannot be plotted on a single line graph but could be displayed using side-by-side line graphs. These comparisons or graphs may then be used to provide education to the community on the importance of getting a yearly flu vaccine.

conditional percentages.

When the relative frequencies are calculated from the row total or the column total, they are called conditional percentages.

C→Q : Side-by-side Box Plots

When two variables (one categorical explanatory variable and one quantitative response variable) are to be displayed, side-by-side box plots are an effective choice. Both box plots are displayed on the same graph, giving a graphical comparison of the two groups. Vertical Side-by-side Box Plot Example: In this example, gender is a categorical variable and height is a quantitative variable. Gender is also the explanatory variable and height is the response variable. These side-by-side box plots show that, in general, males are about six inches taller than females. There is about the same amount of variation within the heights of each gender.

Graphical Displays for Two-way Data

When we analyze data that has two variables, if both variables are categorical ( C→C ), we display them in a two-way frequency table (or contingency table). If the explanatory variable is categorical and the response variable is quantitative ( C→Q ), we can use side-by-side box plots to display them. If both variables are quantitative ( Q→Q ), a scatterplot is usually the best choice to display their relationship.
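The rule above can be written as a small lookup. This is only a sketch: the function name `choose_display` and the string labels are assumptions, and combinations the card does not mention simply return `None`.

```python
# Maps (explanatory type, response type) to the display named in the card.
DISPLAYS = {
    ("categorical", "categorical"): "two-way frequency table",    # C→C
    ("categorical", "quantitative"): "side-by-side box plots",    # C→Q
    ("quantitative", "quantitative"): "scatterplot",              # Q→Q
}

def choose_display(explanatory_type, response_type):
    """Return the suggested display, or None if the card does not cover the case."""
    return DISPLAYS.get((explanatory_type, response_type))

print(choose_display("categorical", "quantitative"))  # side-by-side box plots
```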

Relationships Between Two Variables on a Scatterplot ( Q→Q ) Summarizing Distributions of Two Variables

When we have two quantitative variables that are from paired data (that is, each x -value is paired with a particular y -value), we can use a scatterplot to display the data. The values of the explanatory variable appear on the x -axis, and the values of the response variable appear on the y -axis. Each pair of ( x , y ) values appears as a point on the scatterplot, and when we see a completed scatterplot, it forms a picture of the data that shows us the relationship between the variables. When we look at the picture, we look for an overall pattern and whether there are any deviations (outliers) from the pattern. We also can describe the scatterplot by the direction, the form, and the strength of the relationship.

If all of the points line up in a perfectly straight line in a negative direction (down and to the right), it is said to be

a perfectly linear negative correlation.

An outlier* in a scatterplot is

a point that does not fit the overall trend of the other points. It is usually far away from the other points and is easy to spot.

If the points are close to a straight line in the negative direction, but do not form a perfectly straight line, we say it is

a strong negative correlation.

A study is being conducted comparing age and running speed for a sample of people of a variety of ages. Which of the following best describes the relationship between age and running speed? a) Running speed is the explanatory variable, while age is the response variable. b) Both age and running speed are explanatory variables. c) Age is the explanatory variable, while running speed is the response variable. d) Both age and running speed are response variables.

The answer is c. Age is the explanatory variable, and running speed is the response variable.

A side-by-side box plot: a) is always horizontal b) is always vertical c) can be either horizontal or vertical d) whether it is horizontal or vertical depends upon whether there is a positive or negative correlation.

Can be either horizontal or vertical. The answer is c. A side-by-side box plot can be either horizontal or vertical.

Which group has more patients? Group A Group B Cannot be determined

It cannot be determined which group has more patients.

Side-by-side box plots are a good choice for two-variable data where the explanatory variable is ____________ data and the response variable is ____________ data.

categorical, quantitative.

The relationship between the x -variable and the y -variable is called _____________.

correlation

The presence of the outlier in this scatterplot _______________ the strength of the linear relationship in this scatterplot.

decreases Correct. The answer is a. The presence of the outlier in this scatterplot decreases the strength of the linear relationship.

Scatterplots can also be characterized by how closely the points follow a clear form. For example,

if all of the points line up in a perfectly straight line in a positive direction (up and to the right), it is said to be a perfectly linear positive correlation.

Frequency counts and joint frequencies

The frequency counts in each cell of the table are the joint frequencies. The totals in each row and column are the marginal frequencies.
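To make the relationship concrete, the sketch below derives marginal frequencies from joint frequencies. The table, its labels, and its counts are hypothetical and chosen only for illustration.

```python
# Hypothetical joint frequencies: one count per (row, column) cell.
joint = {
    "group_1": {"yes": 12, "no": 38},
    "group_2": {"yes": 7, "no": 43},
}

# Row marginal frequencies: sum of the joint frequencies in each row.
row_totals = {r: sum(cells.values()) for r, cells in joint.items()}

# Column marginal frequencies: sum of the joint frequencies in each column.
column_totals = {}
for cells in joint.values():
    for c, n in cells.items():
        column_totals[c] = column_totals.get(c, 0) + n

# The grand total is the same whether rows or columns are summed.
grand_total = sum(row_totals.values())

print(row_totals)     # {'group_1': 50, 'group_2': 50}
print(column_totals)  # {'yes': 19, 'no': 81}
print(grand_total)    # 100
```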

What does " C→Q " refer to?

The notation " C→Q " means the explanatory variable is categorical and the response variable is quantitative.

How strong is the relationship? Is it weak, moderate, or strong? Please type in the response that corresponds with your answer.

The relationship is moderate.

cluster

There may be more than one outlier, but if several points are grouped together away from the majority of points, we call them a cluster*, not outliers. Consider the following scatterplot: There is one outlier in this data set: the data point in the upper left corner of the scatterplot. If it is included in the analysis of the data, it will skew the measures of correlation. If it is not included, there is an almost perfect strong positive linear correlation between the explanatory variable (shoe size) and the response variable (height). This example illustrates the importance of plotting the data and looking for patterns when analyzing two quantitative variables.
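The effect of a single outlier on the correlation coefficient can be sketched numerically. The shoe sizes and heights below are hypothetical, invented to mimic the pattern described (an upper-left outlier); they are not the data behind the scatterplot. The helper `pearson_r` computes the standard Pearson correlation coefficient by hand.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired values xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data: shoe size (x) and height in inches (y).
shoe_size = [6, 7, 8, 9, 10, 11, 12]
height = [62, 64, 66, 67, 70, 71, 74]

# Hypothetical outlier in the upper left: small shoe size, large height.
outlier_x, outlier_y = 5, 78

r_without = pearson_r(shoe_size, height)
r_with = pearson_r(shoe_size + [outlier_x], height + [outlier_y])

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

With these made-up numbers, one point is enough to pull the coefficient from nearly 1 down to a weak positive value, which is why plotting the data first matters.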

An experimental unit

is a single object or individual to be measured.

When working with two-variable data, if both variables are quantitative, what is the most appropriate choice to display the data? Side-by-side boxplots Scatterplot Bar chart Two-way frequency table Histogram

Correct. When working with two-variable data, if both variables are quantitative, a scatterplot is the most appropriate choice to display the data.

There are four examples of scatterplots, representing ( Q→Q ):

positive correlation, negative correlation, no correlation, non-linear relationship

A scatterplot helps to show any

potential relationships between two quantitative variables, ( Q→Q ), represented by their x - and y -coordinates on a coordinate plane. Data points are plotted as dots on the coordinate plane, and the concentration, dispersion, or overall trend of these dots shows whether there is a relationship between the variables and what that relationship looks like.

A scatterplot is a good choice to display two-variable data that are both __________ variables.

quantitative

Numerical Analysis for C→Q and Q→Q

there is one categorical, explanatory variable and one quantitative,

The different values of the explanatory variable are called

treatments

If the points are close to a straight line in the positive direction, but do not form a perfectly straight line,

we say it is a strong positive correlation.

When analyzing a possible relationship for two-variable data, if both variables are categorical, what is the most appropriate choice to display the data? Side-by-side boxplots Scatterplot Bar chart Two-way frequency table Histogram

A two-way frequency table is the most appropriate way to graphically display a possible relationship for two-variable data, when both variables are categorical.

When working with two-variable data, if the explanatory variable is categorical and the response variable is quantitative, what is the most appropriate choice to display the data? Side-by-side boxplots Scatterplot Bar chart Two-way frequency table Histogram

When working with two-variable data, if one variable is categorical and the other is quantitative, a side-by-side boxplot is the most appropriate way to display the data.
