DS - Chapter 4, 5, and 6


5. 1. 27 This chart to the right summarizes explanations given for missing work. The data are the explanations given for 100 absences by employees on the assembly​ line, administration, and supervising managers. The explanations are classified as​ medical, family​ emergency, or other. ​(a) For which group are absences due to medical reasons most​ common? ​(b) Are the two variables​ associated? How can you​ tell?

(a): Assembly Line (b): Yes​, because the bars and the segments of the given chart are not approximately identical for the three groups

5. 1. 47 The table below shows percentages of men and women employed in four industries. Use the table to complete parts (a) through (c) below.
Industry      Men   Women
Advertising   33%   67%
Publishing    36%   64%
Law firms     42%   58%
Banking       64%   36%
(a) Is there association between the gender of the employee and the industry? How can you tell? (b) Interpret the association (or lack thereof). Select all that apply.

(a): F. Yes, because the percentages differ among the columns.
(b): A. It means that the percentages of men in different industries are not all the same. E. One consequence is that some industries have a higher proportion of male employees than female employees, assuming that approximately 50% of the workforce is male. F. One consequence is that some industries have a higher proportion of female employees than male employees, assuming that approximately 50% of the workforce is female.
*Use JMP and create a table.* Find the chi-squared statistic and Cramer's V if these data were derived from n = 400 employees, with 100 in each industry. Repeat the process for data derived from n = 1,200 employees, with 300 in each industry. Which statistic changes? Which remains the same?
For n = 400 employees, with 100 in each industry, χ² = 23.92 (rounded to two decimal places) and V = 0.24 (rounded to two decimal places).
If instead the data were derived from n = 1,200 employees, with 300 in each industry, which statistic changes and which remains the same? Answer: C. Chi-squared is 3 times larger and Cramer's V is unchanged.
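The JMP result can be cross-checked by hand. The sketch below is not the JMP procedure; it assumes the percentages convert to whole counts (33 men and 67 women out of 100 in advertising, and so on), and the function names are made up for illustration.

```python
from math import sqrt

def chi_squared(table):
    """Pearson chi-squared statistic for a table of counts (a list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table):
    """Cramer's V: chi-squared rescaled by n and the smaller table dimension."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi_squared(table) / (n * k))

# (percent men, percent women) for advertising, publishing, law firms, banking
percents = [(33, 67), (36, 64), (42, 58), (64, 36)]

def counts(per_industry):
    """Turn the percentages into counts for a given number of employees per industry."""
    return [[m * per_industry // 100, w * per_industry // 100] for m, w in percents]

small = counts(100)   # n = 400
large = counts(300)   # n = 1,200

print(round(chi_squared(small), 2), round(cramers_v(small), 2))
print(round(chi_squared(large), 2), round(cramers_v(large), 2))
```

Tripling every count triples chi-squared, while Cramer's V divides by n and so stays at 0.24.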

4. 1. 1 Mode

A mode of a numerical variable identifies the position of an isolated cluster of values in the distribution. For a categorical​ variable, the mode is the most common category. In either​ case, the mode is the position of a peak in the histogram.

5. 1. 21 T/F If the percentage of female job candidates who are hired is larger than the percentage of male candidates who are​ hired, then there is association between the categorical variables Sex​ (male, female) and Hire​ (yes, no)

Answer: The statement is true. Reason= Two categorical variables are associated if the conditional distribution of one variable depends on the value of the other. Since the Hire variable depends on the Sex​ variable, the two variables are associated.

5. 1. 33 A study of purchases at a​ 24-hour supermarket recorded two categorical​ variables: the time of the purchase​ (8 A.M to 8 P.M vs. late​ night) and whether the purchase was made by someone with children present. Would you expect these variables to be​ associated?

Answer: Yes. Fewer shoppers with children present would be expected during late night. Reason= Late night purchases made by someone with children present are going to be much less common than daytime purchases made with children​ present, because most children are asleep at night.​ Thus, it is likely that these two variables will be associated.

4. 1. 23 If the median size used by 550 songs is 3.5 ​MB, will these all fit on a device that has 2 GB of​ storage? Can you​ tell?

From the given​ information, it is not possible to determine if these songs will fit on the device.

4. 1. 17 False: (The empirical rule indicates that the range from ȳ − s up to ȳ + s holds two-thirds of the distribution of any numerical variable.) True: (The empirical rule works well only when the distribution of the numerical variable is unimodal and symmetric.)

If the distribution of a numerical variable is bell-shaped (symmetric and unimodal), the empirical rule uses the standard deviation s to describe how the data cluster around the mean ȳ. According to the empirical rule, 68% (about two-thirds) of the data lie within one standard deviation of the mean, 95% of the data lie within two standard deviations, and almost all of the data fall within three standard deviations of the mean. Since not all distributions are symmetric, the statement is false.
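A quick simulation can make the 68%, 95%, and "almost all" figures concrete. This sketch assumes simulated bell-shaped data (the mean of 50 and standard deviation of 10 are arbitrary choices, not values from the exercise).

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical bell-shaped data: 10,000 draws from a normal distribution.
data = [random.gauss(50, 10) for _ in range(10_000)]
ybar, s = mean(data), stdev(data)

def share_within(k):
    """Fraction of the data lying within k standard deviations of the mean."""
    return sum(abs(y - ybar) <= k * s for y in data) / len(data)

for k in (1, 2, 3):
    print(k, round(share_within(k), 3))
```

For data that are skewed or bimodal, the same function would generally give shares that do not match the rule, which is why the rule is limited to bell-shaped distributions.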

4. 1. 7 Variance

The average squared deviation from the mean is called the variance. The symbol s² is used for the variance, with the exponent 2 to remind you that the deviations are squared before averaging them.
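As a small worked example, the sketch below computes the variance step by step. It assumes the usual sample convention of dividing by n − 1 (the convention Python's `statistics.variance` follows), and the data values are made up.

```python
from statistics import mean, variance

values = [2, 4, 4, 5, 7, 8]  # hypothetical data

ybar = mean(values)
# Square each deviation from the mean before averaging.
squared_deviations = [(y - ybar) ** 2 for y in values]
# Sample convention (an assumption here): divide by n - 1 rather than n.
s2 = sum(squared_deviations) / (len(values) - 1)

print(s2, variance(values))
```

The squared deviations are 9, 1, 1, 0, 4, and 9, which sum to 24, so s² = 24 / 5 = 4.8.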

4. 1. 3 Interquartile Range (IQR)

The distance from the lower quartile to the upper quartile on a boxplot is known as the interquartile range. This is the length of the box in the boxplot.
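A minimal sketch of the calculation, using made-up data. Software packages use slightly different quartile conventions, so the "inclusive" method chosen here is an assumption and other tools may report slightly different quartiles.

```python
from statistics import quantiles

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data

# n=4 splits the data into quarters and returns the three cut points.
q1, med, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # length of the box in a boxplot

print(q1, med, q3, iqr)
```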

4. 1. 5 Skewed

The extremes at the left and right of a histogram where the bars become short locate the tails of the distribution. If one tail of the distribution stretches out farther than the​ other, the distribution is skewed.

5. 1. 18 T/F If a variable X is associated with a variable​ Y, then Y is caused by X.

The statement is false. Association does not imply causation.

6. 1. 25 If the correlation between number of customers and sales in dollars in retail stores is r = ​0.6, then what would be the correlation if the sales were measured in thousands of​ dollars? In​ euros? (1 euro is worth about 1.2 to 1.5​ dollars.)

What would be the correlation if the sales were measured in thousands of​ dollars? Choose the correct answer below. Answer: C. There would be no change. Reason= Because the correlation has no​ units, it is unaffected by the scale of measurement. Recall that the correlation has no units and always lies between −1 and +1. What would be the correlation if the sales were measured in euros instead of​ dollars? Choose the correct answer below. Answer: There would be no change. Reason=Because the correlation has no​ units, it is unaffected by the scale of measurement. Recall that the correlation has no units and always lies between −1 and +1.
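The claim is easy to check numerically. In this sketch the customer counts and sales figures are invented, and the exchange rate of 1.3 dollars per euro is just one value inside the range the question mentions.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data for six stores.
customers = [120, 95, 150, 80, 110, 135]
sales_dollars = [2400.0, 2100.0, 2600.0, 1500.0, 2900.0, 2500.0]

# Changing units only multiplies every value by a positive constant.
sales_thousands = [s / 1000 for s in sales_dollars]
sales_euros = [s / 1.3 for s in sales_dollars]  # assumed exchange rate

r_dollars = correlation(customers, sales_dollars)
r_thousands = correlation(customers, sales_thousands)
r_euros = correlation(customers, sales_euros)
print(r_dollars, r_thousands, r_euros)
```

All three values agree up to floating-point rounding, because the scale factor appears in both the numerator and the denominator of r and cancels.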

The data available below give the amount of CO2 produced in 40 nations along with the level of economic activity during a recent year. CO2 emissions are given in millions of metric tons. Economic activity is given by the gross domestic product​ (GDP), a summary of overall economic output. Complete parts​ (a) through​ (d) below.

(a) Make a scatterplot of CO2 emissions and GDP. Which variable have you used as the response and which as the explanatory variable? Which scatterplot below shows the data? *Make a scatterplot with JMP.* Answer: The response variable is CO2 emissions, and the explanatory variable is GDP.
(b) Describe any association between CO2 emissions and GDP. Answer: D. Ignoring any outliers, there is a strong positive linear association.
(c) Find the correlation between CO2 emissions and GDP. Answer: corr(x, y) = 0.661 (rounded to three decimal places).
(d) Which cases are outliers? How does the correlation change if outliers are removed? List all the points that are outliers, using GDP as the first coordinate and CO2 emissions as the second coordinate. Answer: (10034.3, 7888.4), (11438.4, 8589.4). If the outliers are removed, the correlation increases to 0.876.

5. 1. 39-T A service station near an interstate highway sells three grades of​ gasoline: regular,​ plus, and premium. During the last​ week, the manager counted the number of cars that purchased these types of gasoline. He kept the counts separate for weekdays and the weekend. The accompanying data table has two categorical variables. One distinguishes weekdays from​ weekends, and the other indicates the type of gas​ (regular, plus, or​ premium). Complete parts​ (a) through​ (e) below ​ (a) Find the contingency table defined by the day of the week and the type of gas. Include the marginal distributions. Complete the contingency table​ below, using counts throughout.

(a): Download the data and input it into JMP; then, in Distribution, put grades in "Y" and day in "By". (b): See the frequencies on the side (in JMP). (c): Now change the distribution: put days in "Y" and grades in "By". (d): No, because these conditional distributions are not directly comparable. (e): Weekends; the percentage of all sales on those days that are premium sales is greatest.

4. 1. 51 The figure shows the histogram of the annual tuition at 69 top undergraduate business schools. (a)Estimate from the figure the center and spread of the data. Are the usual notions of center and spread useful for these​ data? (b)Describe the shape of the histogram. (c)If you were only shown the​ boxplot, would you be able to identify the shape of the distribution of these​ data? (d)Can you think of an explanation for the shape of the​ histogram?

(a): The median is about $14,000 and the interquartile range is about $22,000. No, because they do not capture the bimodal nature of the data. (b): The shape of the histogram is bimodal. (c): No. (d): The first mode represents public schools and the second mode represents private schools.

6. 1. 3 Match the value of the correlation to the data in the scatterplot. (a) r = 0 (b) r = 0.5 (c) r = 0.8 (d) r = −0.6

(a): IV. Reason= When r equals​ 0, there is no pattern among the points. The scatterplot to the right is an example of a scatterplot with r approximately equal to 0. (b):III. Reason= The larger the magnitude of​ r, the tighter the points cluster along the diagonal line. The scatterplot to the right is an example of a scatterplot with r approximately equal to 0.5. (c): I. Reason= The larger the magnitude of​ r, the tighter the points cluster along the diagonal line. The scatterplot to the right is an example of a scatterplot with r approximately equal to 0.8. (d): II. Reason= The larger the magnitude of​ r, the tighter the points cluster along the diagonal line. The scatterplot to the right is an example of a scatterplot with r approximately equal to −0.6.

5. 1. 24 The accompanying table summarizes the status of 1000 loans made by a bank. Each loan either ended in default or was repaid. Loans were divided into large (more than $50,000) or small size.
        Repaid   Default
Large   30       10
Small   900      20
(a) What would it mean to find association between these variables? (b) Does the table show association? (You should not need to do much calculation.)

(a): Large and small loans have different chances of being repaid. (b): ​Yes, because the payment statuses among large and small loans are not approximately the same.
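The comparison behind (b) is a comparison of conditional distributions: the share of defaults within each loan size. The sketch below uses the four counts exactly as printed in the table (they sum to 960, not the 1000 stated in the question, so the rates are illustrative only).

```python
# Counts as printed in the table.
loans = {
    "Large": {"Repaid": 30, "Default": 10},
    "Small": {"Repaid": 900, "Default": 20},
}

def default_rate(size):
    """Share of loans of the given size that ended in default."""
    row = loans[size]
    return row["Default"] / (row["Repaid"] + row["Default"])

for size in loans:
    print(size, round(100 * default_rate(size), 1), "% default")
```

With these counts, 25% of large loans default against roughly 2% of small loans, so the payment status clearly depends on loan size.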

6. 1. 23 States in a country are allowed to set their own rates for sales taxes as well as taxes on services, such as phone calls. The scatterplot shown below graphs the state taxes charged for wireless phone calls (as a percentage) versus the state sales taxes (also as a percentage). Complete parts a through d. (a) Describe the association, if any, that you find in the scatterplot. There is ____. (b) Estimate the correlation between the two variables from the plot. Is it positive, negative, or zero? Is it closer to zero or to ±0.5? The correlation is ____. It is closer to ____. (c) What is the effect of the cluster of states with no sales tax in the lower left corner on the association? If these were excluded from the analysis, would the correlation change? (d) Would it be appropriate to conclude that states that have high sales tax charge more for services like wireless phone use?

(a): a weak positive association. (b): positive. ; 0.5. (c): The cluster of states increases the correlation. ; A. If the cluster of states were excluded from the​ analysis, the correlation would decrease. (d): D. No. When the outliers are​ excluded, the association is too weak to arrive at this conclusion.

4. 1. 11 False: (The boxplot shows the mean plus or minus one standard deviation of the data.) True: (The boxplot shows the​ median, with the lower edge at the 25th percentile point and the upper edge at the 75th percentile point.)

A boxplot is a graphical summary of a numerical variable that shows the​ five-number summary of a variable in a graph. Vertical lines locate the median and quartiles. Joining these lines with horizontal lines forms a box. The span of the box locates the middle half of the​ data, and the length of the box is equal to the interquartile range​ (IQR).

6. 1. 13 T/F If the correlation between the growth of a stock and the growth of the economy as a whole is close to​ 1, then this would be a good stock to hold during a recession when the economy shrinks.

Answer: C. False. The value of the stock would fall along with the economy. It would be better to have one that was negatively related to the overall economy. Reason=The correlation can reach​ 1, but only if all the data fall exactly on a diagonal line. Since the correlation is almost equal to​ 1, the relationship between the stock and the economy is strong positive. The​ stock's growth will match that of the economy. During​ recession, the economy will​ shrink, or​ decline, and the stock will also decline. A stock that has a negative​ relationship, or r<​0, would rise during recession.

6. 1. 33 Which data do you think produce a larger correlation between the weight and the price of​ diamonds: using a collection of gems of various​ cuts, colors, and​ clarities, or a collection of stones that have the same​ cut, color, and​ clarity?

Answer: D. The correlation is larger among stones of the same​ cuts, colors and clarities. These factors add variation around the correlation line. By forcing these to be the​ same, the pattern is more consistent. Reason= It is assumed that the stones with similar attributes will have similar weights and​ prices, whereas, the gems that vary will be less consistent and have different weights and prices. The similar stones will have a consistent pattern and as such a larger correlation.

4. 1. 45 A survey in 2006 reported that the median household net worth in a country was​ $93,100 in 2004. In​ contrast, the mean household net worth was​ $448,200. How is it possible for the mean to be so much larger than the​ median?

Answer: The distribution of income is very​ right-skewed, with the upper tail reaching out to very high incomes. Reason= Income distributions are typically​ right-skewed, with the upper tail reaching out to very high incomes. As a result the mean is much larger than the median.
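A tiny made-up example shows how one very large value pulls the mean far above the median. The figures below are hypothetical (in thousands of dollars) and are not the survey data.

```python
from statistics import mean, median

# Hypothetical household values in thousands of dollars; one household is far out in the upper tail.
net_worth = [20, 45, 60, 93, 110, 150, 2500]

print("median:", median(net_worth))
print("mean:", round(mean(net_worth), 1))
```

The median only depends on the middle value, while the mean adds in the size of the extreme value, so a long right tail drags the mean upward.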

5. 1. 19 T/F If the categorical variable that identifies the supervising manager is associated with the categorical variable that indicates a problem with processing​ orders, then the manager is causing the problems.

Answer: False. Due to the possible presence of a lurking variable, association cannot be interpreted as causation. Reason= In all data where there is an association, it is possible that a lurking variable (a concealed variable that affects the apparent relationship between two other variables) exists. Thus, it cannot be assumed that association indicates causation. In the given situation, it cannot be assumed that the association between the supervising manager and a problem with processing orders indicates that the manager is causing the problems.

4. 1. 9 z-score

A​ z-score is the distance from the mean of a set of​ data, counted as a number of standard deviations.
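In code, the definition is a one-line calculation. The sketch below uses made-up data and the sample standard deviation; `z_score` is a hypothetical helper name.

```python
from statistics import mean, stdev

values = [2, 4, 4, 5, 7, 8]  # hypothetical data with mean 5

def z_score(y, data):
    """Distance of y from the mean of data, in units of standard deviations."""
    return (y - mean(data)) / stdev(data)

print(z_score(5, values))  # a value at the mean has z = 0
print(z_score(8, values))  # a value above the mean has a positive z
```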

5. 1. 46 After a collapse of the stock​ market, a business newspaper polled its readers and asked whether they expected another big drop in the market during the next 12 months. A contingency table of the responses is available below. ​(a)Quantify the amount of association between the​ respondents' stock ownership and expectation about the chance for another big drop in stock prices. ​(b)Reduce the table by combining the counts of very likely and somewhat likely and the counts of not very likely and not likely at​ all, so that the table has three​ rows: likely, not​ likely, and unsure. Compare the amount of association in this table to that in the original table.

(a) Compute the chi-squared statistic for the table. χ² = 3.0 (rounded to one decimal place). Compute Cramer's V for the table. V = 0.09 (rounded to two decimal places). Describe the association between the respondents' stock ownership and the expectation about the chance for another big drop in stock prices. Answer: The association is weak.
(b) Compute the chi-squared statistic for the reduced table. χ² = 0.5 (rounded to one decimal place). Compute Cramer's V for the table. V = 0.04 (rounded to two decimal places). Describe the association between the respondents' stock ownership and the expectation about the chance for another big drop in stock prices. Answer: The association is weak. Compare the amount of association in this table to that in the original table. Answer: The associations in both tables are weak.

6. 1. 39-T Each​ month, a government agency releases its latest estimate of construction activity in the housing industry. A key measure is the percentage change in the number of new homes under construction. Does the release of this number come with a change in the stock​ market? The accompanying data show the percentage change in the number of​ new, privately owned housing units started each​ month, as reported by the government agency. The data also include the percentage change in the​ S&P 500 index on the same day the government agency releases the housing results. Complete parts a through c.

(a) Draw the scatterplot of the percentage change in the S&P 500 index on the percentage change in the number of new housing units started. *Use JMP to create the scatterplot.* Answer: B. Describe the association. Answer: E. There is little or no association.
(b) Find the correlation between the two variables in the scatterplot. What does the size of the correlation suggest about the strength of the association between these variables? *In JMP, show the line; the answer is under Summary Statistics, in the Value for Correlation.* Answer: −0.101 (rounded to three decimal places). Describe the correlation. Answer: C. The size of the correlation suggests the association is slightly negative.
(c) Suppose you know that there was a 5% increase in the number of new homes. From what you've seen, can you anticipate movements in the stock market? Answer: C. No, because the association is too weak to be useful for prediction.

6. 1. 31 The timeplot shown to the right shows the values of two indices of the economy in a​ country: Inflation​ (left axis, in​ red, measured as the​ year-over-year change in a price​ index) and the Survey of Consumer Sentiment​ (right axis, in​ blue, from a​ university). Both series are monthly and cover the time period January 2004 through December 2005. Complete parts a through e.

​(a) From the​ chart, do you think that the two sequences are​ associated? Answer: C. Yes.​ Overall, they appear to move in opposite directions. ​ (b) The scatterplot shown to the right displays the same time​ series, with Consumer Sentiment plotted versus Inflation. Does this scatterplot change your impression of the association between the​ two? Answer: D. No. The graph shows a negative association. ​ (c) Estimate the correlation between these two series. Answer: The correlation r is approximately −0.30. ​ (d) When looking at the relationship between two time​ series, what are the advantages of these two​ plots? Each shows some​ things, but hides others. Which helps you visually estimate the​ correlation? Which tells you the timing of the extreme lows and highs of each​ series? Answer: B. The timeplot shows the timing of the events. The scatterplot shows the contemporaneous association more clearly and reveals the linear association. ​ (e) Does either plot prove that inflation causes changes in consumer​ sentiment? Answer: C. No. The plots are only able to show​ association, not causation. Other factors in the economy could cause both series to move.

6. 1. 47-T The data in the accompanying table describe housing prices near a large city. Each of the 40 data points of this data table describes a region of the metropolitan area. The column labeled Selling Price gives the median price for homes sold in that area during 1999 in thousands of dollars. The column labeled Crime Rate gives the number of crimes committed in that​ area, per​ 100,000 residents. Complete parts a through e.

(a) Make a scatterplot of the selling price on the crime rate. Which observation stands out from the others? Is this outlier unusual in terms of either marginal distribution? *Make the scatterplot with JMP.* Answer: the plot with selling price on the y-axis. Review the scatterplot to determine which observation stands out from the others. Compare its x-coordinate to the other x-values and its y-coordinate to the other y-values to determine if this outlier is unusual in terms of either marginal distribution. Answer: B. The observation at (36.81, 96.719) is an outlier. This data point is unusual in terms of crime rate, but not selling price.
(b) Find the correlation using all of the data as shown in the prior scatterplot. Answer: corr(x, y) = −0.224 (rounded to three decimal places).
(c) Exclude the distinct outlier and redraw the scatterplot focused on the rest of the data. Does your impression of the relationship between the crime rate and the selling price change? Answer: C. Yes. The new scatterplot shows the pattern more clearly. There is now a weak negative trend that appears to be curved.
(d) Compute the correlation without the outlier. Answer: corr(x, y) = −0.430 (rounded to three decimal places).
(e) Can we conclude from the correlation that crimes in this area cause a rise or fall in the value of real estate? Answer: A. No. The correlation measures the association between the two variables. It does not signify causation.

6. 1. 45-T The accompanying data report characteristics of 15 types of cars sold in a certain country last year. One column gives the official mileage​ (combined MPG), and another gives the rated horsepower. Complete parts​ (a) through​ (e) below.

(a) Make a scatterplot of the two variables. Which variable makes the most sense to put on the x-axis, and which belongs on the y-axis? Answer: C. Horsepower makes more sense on the x-axis, given that one would want to understand variation on MPG. MPG belongs on the y-axis. Make a scatterplot. *Use JMP to make the scatterplot.* Answer: A.
(b) Describe any pattern in the plot. Be sure to identify any outliers. Answer: C. There is a strong, negative linear association with some variation. There are no outliers.
(c) Find the correlation between these two variables. Answer: r = −0.852 (rounded to three decimal places).
(d) Interpret the correlation in the context of these data. Does the correlation provide a good summary of the strength of the relationship? Answer: C. The correlation shows that as horsepower increases, the MPG decreases. The correlation does provide a good summary of the strength because the scatterplot indicates that there is a strong linear association, with little variation.
(e) Use the correlation line to estimate the mileage of a car with 200 horsepower. Does this seem like a sensible procedure? What if the car has a 1.6 liter engine? Answer: The estimated mileage of a car with 200 horsepower is 27 MPG (rounded to the nearest integer). Does this seem like a sensible procedure? Answer: B. This seems like a sensible procedure because the scatterplot appears to have a linear association. The estimated mileage of a car with a 1.6 liter engine is 25 MPG.

5. 1. 53 The data to the right compare the​ on-time arrival performance of two​ airlines, X and Y. The table shows the status of 13,767 arrivals for one year. Complete parts​ (a) through​ (c) below.

(a) On the basis of this initial summary, find the percentages (row or column) that are appropriate to comparing the on-time arrival rates of the two airlines. Which arrives on time more often? (Round to one decimal place as needed.)
          Airline X        Airline Y
On time   6,064 (80.9%)    5,030 (80.2%)
Delayed   1,431 (19.1%)    1,242 (19.8%)
Total     100%             100%
Which airline arrives on time more often? Answer: X
(b) The next two tables organize these same flights by destination. The first also shows arrival time and the second shows airline. Does it appear that a lurking variable might be at work here? How can you tell? Answer: Yes, because the on-time rate for flights to Denver is 81.2%, whereas the on-time rate for flights to Philadelphia is 80.6%. This is important since most of airline X's flights go to Denver, whereas most of airline Y's flights go to Philadelphia. (Rounded to one decimal place.)
(c) Each cell of the following table shows the number of on-time arrivals for each airline at each destination. Is the destination a lurking factor behind the original 2×2 table? *Find by dividing the numbers on this chart by those in the second chart in (b).* Answer: Yes, because airline Y has a better on-time arrival rate in Denver (80.9% for X vs. 84.7% for Y) and in Philadelphia (76.8% for X vs. 80.9% for Y).
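The column percentages in part (a) can be reproduced from the four counts in the table. This sketch covers only the aggregate comparison; the per-destination counts needed for parts (b) and (c) are not listed here, so they are left out.

```python
# Arrival counts from the table in part (a).
arrivals = {
    "X": {"on_time": 6064, "delayed": 1431},
    "Y": {"on_time": 5030, "delayed": 1242},
}

def on_time_rate(airline):
    """Share of an airline's arrivals that were on time (a column percentage)."""
    counts = arrivals[airline]
    return counts["on_time"] / (counts["on_time"] + counts["delayed"])

total_arrivals = sum(sum(counts.values()) for counts in arrivals.values())
print("total arrivals:", total_arrivals)
for airline in arrivals:
    print(airline, round(100 * on_time_rate(airline), 1), "% on time")
```

Airline X comes out ahead in the aggregate even though part (c) shows airline Y ahead at each destination. That reversal is the sign of a lurking variable.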

