Ch. 3

T or F: It is easier to identify relationships between two categorical variables when looking at the percentages in the respective joint categories than when looking at the count data in the respective joint categories.

True. Counts can make a relationship hard to see when the total number of observations differs greatly between categories.
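A minimal sketch of why percentages help, using made-up counts (the groups and numbers below are hypothetical, not from the chapter). The raw counts suggest Group B has far more "Yes" answers, while the row percentages show Group A is actually twice as likely to answer "Yes".

```python
# Hypothetical counts (not from the source): two groups of very different size.
counts = {
    "Group A": {"Yes": 30, "No": 70},    # 100 observations in total
    "Group B": {"Yes": 150, "No": 850},  # 1000 observations in total
}

def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result

percentages = row_percentages(counts)
for row, cells in percentages.items():
    print(row, {col: round(p, 1) for col, p in cells.items()})
```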

If the correlation of variables is close to 0, then we expect to see

a cluster of points with no apparent relationship on the scatterplot

Examples of comparison problems include: -salary broken down by male and female, -cost of living broken down by region of the country, -recovery rate of patients broken down by placebo and actual drug, -starting salary of graduates broken down by major, -all of the above

all of the above

The limitation of covariance as a descriptive measure of association is that it

is very sensitive to the units of the variables

The tool that provides useful information about a data set by breaking it down into categories is the

pivot table

We are usually on the lookout for large correlations near:

-1 and +1

A useful way of comparing the distribution of a numerical variable across categories of some categorical variable is: A. side by side pivot table B. side by side box plot C. both

Both side by side pivot table and box plot

Which of the following patterns on a scatterplot are most likely to indicate a correlation close to 0 between the respective variables? Select all that apply. Graph A: points along an upward-sloping line Graph B: points along a downward-sloping line Graph C: points are hovering around in a non-descript "cloud" Graph D: points slope downward then upward in a parabola shape

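C and D. C: the points show no pattern, so the correlation is close to 0. D: the downward-sloping half (negative association) and the upward-sloping half (positive association) cancel out, so the linear correlation is close to 0 even though the variables are clearly related.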
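The parabola case (Graph D) can be checked numerically. This is a small sketch with illustrative data chosen here (x from -5 to 5, y = x squared), and `pearson` is a helper written for this example rather than a library function.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(-5, 6))             # symmetric around 0
parabola = [x ** 2 for x in xs]     # slopes downward, then upward
line = [2 * x + 1 for x in xs]      # upward-sloping straight line

r_parabola = pearson(xs, parabola)  # near 0 despite a perfect curved relationship
r_line = pearson(xs, line)          # near +1
print(r_parabola, r_line)
```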

Correlation and covariance measure:

The strength and direction of a linear relationship between 2 numerical variables

In a focus group, a group of people is asked whether or not they would buy a new smartphone for a suggested retail price of $1000. Their answers are reported by gender in the table below.
Gender   No       Yes      Total
Female   65.0%    35.0%    100%
Male     44.1%    55.9%    100%
TOTAL    54.55%   45.45%   100%
Based on this data, what can we conclude about the relationship between gender and interest in the product?

There appears to be a relationship between the variables, because the percentage of males who would buy the new smartphone (55.9%) is much higher than the percentage of females (35.0%). A majority of females said "No" while a majority of males said "Yes".

Tables used to display counts of a categorical variable are called: -cross tabs -contingency tables -neither of above -both of above

both

To examine the relationship between ZIP code and household income, we can use: -counts, percentages, and corresponding charts -scatterplots -correlations -box-whisker plots

Box-whisker plots. ZIP code is a categorical variable and household income is a numerical variable, so a box-whisker plot by category is the right tool. Counts are suited to categorical variables; scatterplots and correlations are suited to pairs of numerical variables.

If the standard deviation of X is 15, the covariance of X and Y is 94.5, the coefficient of correlation r = 0.90, then the variance of Y is 7.0.

False. The standard deviation of Y is 94.5 / (0.90 × 15) = 7.0, so the variance of Y is 49.
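A quick check of this card, assuming the usual relationship r = Cov(X, Y) / (sd_X × sd_Y). The variable names are chosen for this sketch; the numbers come from the question.

```python
# Values given in the question.
sd_x = 15.0
cov_xy = 94.5
r = 0.90

# Rearranging r = cov / (sd_x * sd_y) to solve for sd_y.
sd_y = cov_xy / (r * sd_x)
var_y = sd_y ** 2  # variance is the square of the standard deviation

print(round(sd_y, 4), round(var_y, 4))
```

The value 7.0 in the statement is the standard deviation of Y, not its variance, which is why the statement is false.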

One characteristic of "paired variables" is

They have the same number of observations.

The most common data format is

stacked

A line or curve superimposed on a scatterplot to quantify an apparent relationship is known as a(n): -slope -trendline

trendline

The data set below contains data on different medical conditions from the 2015 Medical Expenditure Panel Survey by the Agency for Healthcare Research and Quality (HW 2 Data.xlsx). The variable "Care Male" lists the number of males (in thousands) who received care for the respective medical condition. Likewise, the variable "Care Female" lists the number of females (in thousands) who received care. The variables "Total Male" and "Total Female" list the number of males and females who reported the respective condition. You may assume that the difference between the two variables is the number of people who did not seek and/or receive care. You want to know whether there is a relationship between gender and seeking care for each of the conditions (i.e., are males more or less likely to seek care for cancer than females). For which conditions does gender appear to have a relationship with whether or not the person sought and/or received care? (See graph.)
-Cancer
-Complications of surgery or device
-Hernias
-Hypertension
-Non-malignant neoplasm
-Poisoning by medical and non-medical substances

-Complications of surgery or device, -Hernias, -Non-malignant neoplasm, -Poisoning by medical and non-medical substances. To check, calculate percentages: males or females who sought care divided by total males or females with the condition.
Cancer: numbers are identical
Complications: higher % of females
Hernias: higher % of females
Hypertension: almost the same
Non-malignant neoplasm: higher % of males
Poisoning: higher % of males

Correlation and covariance can be used to examine relationships between numerical variables as well as for categorical variables that have been coded numerically.

False

We do not even try to interpret correlations numerically except possibly to check whether they are positive or negative.

False

Relationships between two variables are less evident when counts are expressed as percentages of row totals or column totals.

False, especially for categorical variables.

Correlation can be affected by the measurement scales applied to X and Y variables.

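False. Correlation is unitless, so changing the units in which X or Y is measured does not change it; covariance is the measure that is sensitive to units.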
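A small sketch of the difference between the two measures, using made-up data (the hours and salary figures below are hypothetical). Salary is expressed once in dollars and once in thousands of dollars; `covariance` and `correlation` are helpers written for this example.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Correlation = covariance divided by the product of standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data, for illustration only.
hours = [35, 40, 45, 50, 55]
salary_dollars = [40000, 52000, 50000, 61000, 70000]
salary_thousands = [s / 1000 for s in salary_dollars]  # same data, new units

cov_dollars = covariance(hours, salary_dollars)
cov_thousands = covariance(hours, salary_thousands)
corr_dollars = correlation(hours, salary_dollars)
corr_thousands = correlation(hours, salary_thousands)

print(cov_dollars, cov_thousands)    # differ by a factor of 1000
print(corr_dollars, corr_thousands)  # essentially identical
```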

Below you will find current annual salary data and related information for 30 employees at Gamma Technologies, Inc. These data include each selected employee's gender (1 for female; 0 for male), age, number of years of relevant work experience prior to employment at Gamma, number of years of employment at Gamma, the number of years of post-secondary education, and annual salary. The tables of correlations and covariances are presented below.
Correlations:
       IQ        Age        Prior exp.  Gamma exp.  Edu       Salary
IQ     1.000
Age    -0.111    1.000
PE     0.054     0.800      1.000
GE     -0.203    0.616      0.587       1.000
Edu    -0.039    0.518      0.434       0.342       1.000
Sal    -0.154    0.623      0.723       0.780       0.617     1.000
Covariances:
       IQ        Age        Prior exp.  Gamma exp.  Edu       Salary
IQ     0.259
Age    -0.633    134.051
PE     0.117     39.060     19.045
GE     -0.700    72.047     17.413      49.421
Edu    -0.033    9.951      3.140       3.987       2.947
Sal    -1825.97  249700.35  78699.75    14303.29    24747.68  5846..
Which two variables have the strongest positive linear relationship? -Salary and Age -Prior experience and Age

Prior experience and Age. From the correlations table: Salary and Age = 0.623, Prior experience and Age = 0.800, and 0.800 > 0.623.
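The same lookup can be sketched in code by scanning the off-diagonal entries of the correlations table. The values are copied from the question; the list layout and variable names are choices made for this sketch.

```python
labels = ["IQ", "Age", "Prior exp.", "Gamma exp.", "Edu", "Salary"]

# Lower triangle of the correlations table, one row per variable.
lower = [
    [1.000],
    [-0.111, 1.000],
    [0.054, 0.800, 1.000],
    [-0.203, 0.616, 0.587, 1.000],
    [-0.039, 0.518, 0.434, 0.342, 1.000],
    [-0.154, 0.623, 0.723, 0.780, 0.617, 1.000],
]

# Collect every off-diagonal pair (j < i skips the 1.000 diagonal).
pairs = [
    (lower[i][j], labels[i], labels[j])
    for i in range(len(labels))
    for j in range(i)
]

# Tuples compare by their first element, so max() picks the largest correlation.
strongest = max(pairs)
print(strongest)
```

Across the whole table, not only the two listed options, Prior experience and Age is the pair with the largest positive correlation.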

An economic development researcher wants to understand the relationship between the average monthly expenditure on utilities for households in a particular middle-class neighborhood and each of the following household variables: -family size, -approximate location of the household within the neighborhood, -indication of whether those surveyed owned or rented their home, -gross annual income of the first household wage earner, -gross annual income of the second household wage earner (if applicable), -size of the monthly home mortgage or rent payment, -the total indebtedness (excluding the value of a home mortgage) of the household. (See table for data.) Relationship between Debt and Utilities: 0.778. How would you interpret this relationship?

There is a positive linear relationship between debt and utility payments, because 0.778 is greater than 0.

A trend line on a scatterplot is a line or a curve that "fits" the scatter as well as possible.

True

Correlation is a single-number summary of a scatterplot.

True

Counts for categorical variable are often expressed as percentages of the total.

True

If the coefficient of correlation r = 0.80, the standard deviations of X and Y are 20 and 25, respectively, then Cov(X, Y) must be 400.

True

If the standard deviations of X and Y are 15.5 and 10.8, respectively, and the covariance of X and Y is 128.8, then the coefficient of correlation r is approximately 0.77.

True
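Both of the preceding numeric true/false cards follow from r = Cov(X, Y) / (sd_X × sd_Y). A short sketch that checks them, with helper names invented for this example:

```python
def cov_from_corr(r, sd_x, sd_y):
    """Covariance implied by a correlation and two standard deviations."""
    return r * sd_x * sd_y

def corr_from_cov(cov_xy, sd_x, sd_y):
    """Correlation implied by a covariance and two standard deviations."""
    return cov_xy / (sd_x * sd_y)

# Card 1: r = 0.80, standard deviations 20 and 25.
cov_card = cov_from_corr(0.80, 20, 25)

# Card 2: standard deviations 15.5 and 10.8, covariance 128.8.
r_card = corr_from_cov(128.8, 15.5, 10.8)

print(round(cov_card, 2), round(r_card, 2))
```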

It is possible that the data points are close to a curve and have a correlation close to 0, because correlation is relevant only for measuring linear relationships.

True

Problems in data analysis where we want to compare a numerical variable across two or more subpopulations are called comparison problems.

True

Side-by-side box plots allow you to quickly see how two or more categories of a numerical variable compare.

True

Statisticians often refer to the pivot tables that display counts as contingency tables or crosstabs

True

Strongly related variables can have a correlation close to zero if the relationship is nonlinear.

True

The advantage that the coefficient of correlation has over the covariance is that the former has a set lower and upper limit.

True

The correlation between two variables is unitless and is always between -1 and +1.

True

The scatterplot is a graphical technique used to make apparent the relationship between two numerical variables

True

To form a scatterplot of X versus Y, X and Y must be paired variables.

True

We must specify appropriate bins for side-by-side histograms in order to make fair comparisons of distributions by category.

True

Scatterplots are also referred to as

X-Y charts

A scatterplot allows one to see: A. whether there is any relationship between two variables B. what type of relationship there is between two variables C. Both (a) and (b) are correct

both A and B are correct

Displaying all correlations between 0.6 and 0.999 on a scatterplot as green and all correlations between -1.0 and -0.6 as red is known as

conditional formatting

Which of the following are considered numerical summary measures?

correlation and covariance

We study relationships among numerical variables using

correlation, covariances, and scatterplots

To examine relationships between two categorical variables, we can use

counts and corresponding charts of the counts

Correlation is useful only for

measuring the strength of a linear relationship

We can infer that there is a strong linear relationship between two numerical variables when:

the points on a scatterplot cluster tightly around an upward-sloping OR downward-sloping straight line

An example of a joint category of two variables is the count of all non-drinkers who are also nonsmokers.

true

For a sample of 100 employees, the covariance between annual salary (measured in $) and hours worked per week (measured in hours) is equal to 809.43, while the covariance between annual salary (measured in $) and sick days per year (measured in days) is equal to 450.86. Which of the following statements is correct? -The relationship between annual salary and hours worked per week is stronger than the relationship between annual salary and sick days per year. -We need additional info to determine which relationship is stronger

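We need additional info. Covariances depend on the units of the variables, so the two values cannot be compared directly; the correlations are needed to judge which relationship is stronger.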

