WGU Stats Module 5: Statistics with 2 Variables
different types of frequencies
joint frequency: numbers inside the boxes; marginal frequency: totals on the margins; relative frequency: each cell as a % of the total
A cardiac care center recently implemented a new cardiac rehabilitation program. The center was interested in determining if this new program increased a patient's risk of having another heart attack within two years. Group A was exposed to the new rehabilitation regimen, while Group B received the traditional rehabilitation program. Which numerical measure could be used to analyze the data?
Conditional percentages. In this study both variables are categorical (C→C), so a two-way table is used to present the data. Therefore, the numerical measure will be conditional percentages.
An exercise physiologist was interested in how age affects reaction time. He sampled a population ranging in age from 10 years old to 70 years old. Subjects were asked to complete a simple reaction time test. The test consisted of subjects sitting in front of a computer, and clicking a mouse when the image of a circle on a screen changed color from black to red. Subjects were asked to attempt the task 5 times. The times from the five trials were averaged to give each subject an overall reaction time score. Which numerical measure could be used to analyze the strength of the relationship between age and reaction time?
Correlation coefficient. In this study both variables are quantitative (Q→Q) and form paired data. The strength of a linear correlation can be measured by calculating the correlation coefficient.
relative frequencies vs conditional percentages
If we calculate the percentage that each cell is of the total, the results are called relative frequencies. When the relative frequencies are calculated from the row total or the column total, they are called conditional percentages.
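The difference between the two calculations can be sketched in a few lines of Python. The counts below are hypothetical, invented only for illustration, and the function names are not from the course material.

```python
# Hypothetical counts, invented for illustration only.
# Rows are the explanatory groups; columns are the outcomes.
table = {
    "Group A": {"Heart attack": 10, "No heart attack": 90},
    "Group B": {"Heart attack": 20, "No heart attack": 80},
}

def relative_frequencies(table):
    """Each cell as a percentage of the grand total of the study."""
    grand_total = sum(sum(row.values()) for row in table.values())
    return {
        group: {outcome: 100 * n / grand_total for outcome, n in row.items()}
        for group, row in table.items()
    }

def conditional_row_percentages(table):
    """Each cell as a percentage of its own row total."""
    result = {}
    for group, row in table.items():
        row_total = sum(row.values())
        result[group] = {outcome: 100 * n / row_total for outcome, n in row.items()}
    return result

print(relative_frequencies(table))
print(conditional_row_percentages(table))
```

With these made-up counts, the relative frequencies across the whole table add up to 100%, while the conditional percentages add up to 100% within each row.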
perfect linear vs strong linear
If all of the points line up in a perfectly straight line in a positive direction (up and to the right), it is said to be a perfectly linear positive correlation. If the points are close to a straight line in the positive direction, but do not form a perfectly straight line, we say it is a strong positive correlation.
A hospital is studying the effectiveness of two different heartburn treatments (Treatment A and Treatment B) administered daily for a week. The results are measured after one week of treatment, by placing a patient into one of two groups: heartburn subsided OR heartburn remained. Which numerical measure could be used to analyze the data?
Both variables are categorical (C→C) so we will use a two-way table. Therefore, our numerical measure will be conditional percentages.
One of the challenges with young patients who suffer from acute asthma is delivering medication during asthma attacks. Young patients often have difficulty managing traditional delivery methods, such as inhalers. A study was done to determine whether using a nebulizer to deliver medication reduced the duration of asthma attacks in pediatric patients. Patients were split into two groups. One group was given traditional inhalers as a method to deliver medication during an attack. The other group was given the nebulizer. The effectiveness of both delivery methods was measured by comparing the time it took for an attack to subside. Which numerical measure would present the maximum and minimum amount of time that it took for attacks to subside for both delivery methods?
Five-number summary. In this study one variable is categorical and the other is quantitative (C→Q). The five-number summary shows five important statistical values: minimum, first quartile, median, third quartile, and maximum. Therefore, the five-number summary would be the best choice to look at the minimum and maximum values for both groups.
Consider a study on whether a certain medication improves kidney function in patients with chronic kidney disease. The Glomerular Filtration Rate (GFR) was collected for two groups: a treatment group and a placebo group. The researchers want to compare the middle 50% of the data for both groups. Which numerical measure would identify the two values that 50% of the data falls between for both groups?
Five-number summary. In this study one variable is categorical and the other is quantitative (C→Q). The five-number summary shows five important statistical values: minimum, first quartile, median, third quartile, and maximum. 50% of the data falls between the first quartile (Q1) and the third quartile (Q3), so the researchers should look at the five-number summary to determine the values of Q1 and Q3 for both groups, as these values define the middle 50% of the data.
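A minimal sketch of a five-number summary computed per group is shown here. The GFR readings are hypothetical, and the quartiles use the "median of each half" convention, which is only one of several conventions; software packages may report slightly different Q1 and Q3 values.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves convention.

    Assumes at least two values. For an odd count, the middle value is
    left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return {
        "min": data[0],
        "Q1": median(lower),
        "median": median(data),
        "Q3": median(upper),
        "max": data[-1],
    }

# Hypothetical GFR readings, invented for illustration only.
groups = {
    "treatment": [52, 55, 58, 60, 63, 66, 70, 74],
    "placebo": [45, 48, 50, 53, 55, 57, 60, 62],
}

for name, readings in groups.items():
    summary = five_number_summary(readings)
    # The middle 50% of the data lies between Q1 and Q3.
    print(name, summary)
```

Comparing the Q1 and Q3 values of the two groups side by side is the numerical counterpart of comparing the boxes in side-by-side box plots.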
form
If a scatterplot has a pattern of points that form a reasonably straight line, we describe it as linear. If the points form a pattern that is more curved than straight, we say it is nonlinear, or curvilinear. Is it straight or curved?
A hospital hires an independent consulting firm to perform a study about patients with high blood pressure and the medicine they are being prescribed. The study is examining the relationship between a patient's starting blood pressure when they entered the treatment program and the dosage of blood pressure medicine they are prescribed during their treatment. For this study: What is the explanatory variable? Is the explanatory variable categorical or quantitative? What is the response variable? Is it categorical or quantitative? What graphical display should be used to show the results of the study?
The explanatory variable is patient's starting blood pressure. The explanatory variable is a quantitative variable. The response variable is the dosage of blood pressure medicine they are prescribed. The response variable is also a quantitative variable. As both the explanatory and response variables are quantitative (Q→Q) , a scatterplot would be an appropriate graphical display.
strength
The pattern of points tells us about the strength of the correlation between the variables. If the points form a tightly grouped pattern, we say there is a strong correlation between the variables. If the points are loosely scattered and are not tightly grouped, we say there is a weak correlation or no correlation between the variables. Are the dots close together?
marginal frequencies
These are equal to the sum of the number of individuals in the corresponding row or column. For example, data in the "Totals" column and "Female" row shows the total number of females in the study. It may be helpful to remember that marginal frequencies appear in the margins of the table.
joint frequencies
These represent the number of individuals that fall in both the corresponding row and column. For example, data in the "Male" row and "With Autism" column counts the number of males with autism.
2 types of variables
explanatory variable and response variable: the explanatory variable influences the response variable
Describing the Relationship Between Two Quantitative Variables
We use three characteristics to describe the relationship: direction, form, and strength.
In a two-way table, which best describes the sum of all of the joint frequencies?
a) The total number of individuals in the study.
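A short sketch of why this holds: the marginal frequencies are sums of joint frequencies, so adding up either margin, or every cell directly, gives the same study total. The counts below are hypothetical and reuse the row and column labels from the autism example only for illustration.

```python
# Hypothetical joint frequencies, invented for illustration only.
joint = {
    ("Male", "With autism"): 12,
    ("Male", "Without autism"): 88,
    ("Female", "With autism"): 4,
    ("Female", "Without autism"): 96,
}

def marginal_totals(joint):
    """Sum the joint frequencies along each row and each column."""
    row_totals, col_totals = {}, {}
    for (row, col), count in joint.items():
        row_totals[row] = row_totals.get(row, 0) + count
        col_totals[col] = col_totals.get(col, 0) + count
    return row_totals, col_totals

row_totals, col_totals = marginal_totals(joint)
study_total = sum(joint.values())

print(row_totals)   # marginal frequencies for the rows
print(col_totals)   # marginal frequencies for the columns
print(study_total)  # sum of all joint frequencies
```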
negative correlation
as the x-variable increases, the y-variable decreases.
positive correlation on a scatterplot
as the x-variable increases, the y-variable increases.
What is the name for several data points that are grouped together away from the rest of the results?
cluster
percentages obtained from COLUMNS
conditional column percentages
nonlinear relationship
curvilinear
treatments
different values of the explanatory variable
numerical analysis for C→C
Divide each number in the boxes (joint frequency) by its row total, not by the grand total of the study. The percentages in each row should add up to 100%. These are called conditional percentages. Use ROWS, not columns: work ACROSS, not down to the bottom.
no correlation
no apparent overall trend between the two variables.
Researchers want to investigate whether taking aspirin regularly reduces the risk of heart attack. Four-hundred men between the ages of 50 and 84 are recruited as participants. The men are divided randomly into two groups: one group will take aspirin, and the other group will take a placebo. Each man takes one pill each day for three years, but he does not know whether he is taking aspirin or the placebo. At the end of the study, researchers count the number of men in each group who have had heart attacks.
Population: men aged 50-84. Sample: the 400 men recruited. Explanatory variable: oral medication (aspirin or placebo). Response variable: whether the subject had a heart attack.
numerical analysis for Q→Q
Positive correlation: as one variable increases, the other increases. The strength of a linear relationship can be measured using the correlation coefficient. A correlation coefficient is a number from −1 to 1. The closer the correlation coefficient is to 0, the weaker the linear relationship. A correlation coefficient at or near −1 represents a strong negative linear correlation. A correlation coefficient at or near 1 represents a strong positive linear correlation.
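As a rough illustration, the correlation coefficient r can be computed from paired data with the standard Pearson formula. The ages and reaction times below are hypothetical, and the function name is an assumption chosen for this sketch.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired data, invented for illustration only.
ages = [10, 20, 30, 40, 50, 60, 70]                   # explanatory (x)
reaction_ms = [220, 230, 245, 250, 270, 285, 300]     # response (y)

r = correlation_coefficient(ages, reaction_ms)
print(round(r, 3))
```

A value of r near 1 here would indicate a strong positive linear relationship in the made-up data; the sign gives the direction and the distance from 0 gives the strength.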
direction
Positive or negative. Which way is it going?
If both variables are quantitative (Q→Q)
Use a scatterplot, with the explanatory variable on the x-axis and the response variable on the y-axis. The relationship between the two variables is referred to as the correlation.
A study was done of the amount of nicotine (mg) per cigarette in three types of cigarettes: king-size, 100-mm menthol, and 100-mm non-menthol. Which type of graphical display would be the best choice for this data?
Side-by-side box plots. Because the variables are amount of nicotine (quantitative) and type of cigarette (categorical), side-by-side box plots are the best choice to display the data.
If the explanatory variable is categorical and the response variable is quantitative (C→Q)
side-by-side box plots
experimental unit
single object or individual to be measured
the greater the deviation of the outlier from the line
the greater the impact on the correlation coefficient
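One way to see this effect is to add a single extra point to perfectly linear data and watch r change. This is only a sketch with hypothetical numbers; the helper `pearson_r` is defined here for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data lying exactly on the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

r_base = pearson_r(xs, ys)               # no outlier
r_near = pearson_r(xs + [6], ys + [9])   # extra point slightly off the line
r_far = pearson_r(xs + [6], ys + [0])    # extra point far off the line

print(round(r_base, 3), round(r_near, 3), round(r_far, 3))
```

In this made-up data set, the point that sits farther from the line pulls r much closer to 0 than the point that sits only slightly off the line.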
if both variables are categorical (C→C)
two-way frequency table (contingency table)
Numerical Analysis for C→Q
use the 5 number summary for each group
Given a linear relationship between two variables, which r-value indicates a strong, negative relationship between the two variables?
−0.9564. This is correct because it is the closest value to −1.0. Correlation coefficients are between −1.0 and +1.0, so any value less than −1.0 is not a valid correlation coefficient.