Incorrect Questions - Organizing, Visualizing, and Describing Data
Which of the following is a potential problem with interpreting a correlation coefficient? a. Outliers. b. Spurious correlation. c. Both outliers and spurious correlation.
Both outliers and spurious correlation. Both are potential problems with interpreting correlation coefficients.
linear interpolation
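Py ≈ Xk + (Ly − k) × (Xk+1 − Xk), where Ly is the location of the percentile and k is the integer part of Ly. For example, if L20 falls between the 2nd and 3rd observations: P20 ≈ X2 + (L20 − 2) × (X3 − X2)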
spurious correlation
a relationship between two variables that is actually caused by a third factor
An analyst uses a software program to analyze unstructured data—specifically, management's earnings call transcript for one of the companies in her research coverage. The program scans the words in each sentence of the transcript and then classifies the sentences as having negative, neutral, or positive sentiment. The resulting set of sentiment data would most likely be characterized as: a. ordinal data. b. discrete data. c. nominal data.
a. ordinal data. Ordinal data are categorical values that can be logically ordered or ranked. In this case, the classification of sentences in the earnings call transcript into three categories (negative, neutral, or positive) describes ordinal data, as the data can be logically ordered from positive to negative. B is incorrect because discrete data are numerical values that result from a counting process. In this case, the analyst is categorizing sentences (i.e., unstructured data) from the earnings call transcript as having negative, neutral, or positive sentiment. Thus, these categorical data do not represent discrete data. C is incorrect because nominal data are categorical values that are not amenable to being organized in a logical order. In this case, the classification of unstructured data (i.e., sentences from the earnings call transcript) into three categories (negative, neutral, or positive) describes ordinal (not nominal) data, as the data can be logically ordered from positive to negative.
standard deviation
the square root of the variance
location of a percentile
Lp = (n+1)(P/100)
An analyst is using the data in the following exhibit to prepare a statistical report. Portfolio's Deviations from Benchmark Return for a 12-Year Period (%): Year 1: 2.48; Year 2: −2.59; Year 3: 9.47; Year 4: −0.55; Year 5: −1.69; Year 6: −0.89; Year 7: −9.19; Year 8: −5.11; Year 9: 1.33; Year 10: 6.84; Year 11: 3.04; Year 12: 4.72. The cumulative relative frequency for the bin −1.71% ≤ x < 2.03% is closest to: a. 0.250. b. 0.333. c. 0.583.
0.583. The cumulative relative frequency of a bin identifies the fraction of observations that are less than the upper limit of the given bin. It is determined by summing the relative frequencies from the lowest bin up to and including the given bin. Here, 7 of the 12 observations (−9.19, −5.11, −2.59, −1.69, −0.89, −0.55, and 1.33) are below 2.03%, so the cumulative relative frequency is 7/12 = 0.583.
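The count can be checked with a short Python sketch. The variable names are illustrative, not from the source; the data are the 12 deviations in the exhibit.

```python
# Deviations from benchmark (%), Years 1-12, copied from the exhibit.
deviations = [2.48, -2.59, 9.47, -0.55, -1.69, -0.89,
              -9.19, -5.11, 1.33, 6.84, 3.04, 4.72]

upper_limit = 2.03  # upper limit of the bin -1.71% <= x < 2.03%

# Cumulative relative frequency = share of observations below the bin's upper limit.
below = sum(1 for x in deviations if x < upper_limit)
cumulative_relative_frequency = below / len(deviations)

print(below, round(cumulative_relative_frequency, 3))  # 7 0.583
```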
A fund had the following experience over the past 10 years (annual return): Year 1: 4.5%; Year 2: 6.0%; Year 3: 1.5%; Year 4: −2.0%; Year 5: 0.0%; Year 6: 4.5%; Year 7: 3.5%; Year 8: 2.5%; Year 9: 5.5%; Year 10: 4.0%. The target semideviation of the returns over the 10 years if the target is 2% is closest to: a. 1.42%. b. 1.50%. c. 2.01%.
1.50%. Take all returns at or below the target, square each one's deviation from the target, sum the squares, divide by n − 1, and take the square root. Here the returns below the 2% target are 1.5%, −2.0%, and 0.0%, so the squared deviations are 0.25, 16.00, and 4.00, which sum to 20.25. Target semideviation = √[20.25/(10 − 1)] = √2.25 = 1.50%.
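A minimal Python sketch of the same steps, assuming the convention stated in the card (only returns at or below the target count, and the divisor is n − 1 for all n observations). Variable names are illustrative.

```python
import math

# Annual fund returns (%), Years 1-10, copied from the question.
returns = [4.5, 6.0, 1.5, -2.0, 0.0, 4.5, 3.5, 2.5, 5.5, 4.0]
target = 2.0  # target return (%)

# Squared deviations from the target, for returns at or below the target only.
squared_shortfalls = [(r - target) ** 2 for r in returns if r <= target]

# Divide by n - 1, where n counts ALL observations, then take the square root.
target_semideviation = math.sqrt(sum(squared_shortfalls) / (len(returns) - 1))

print(round(target_semideviation, 2))  # 1.5
```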
The following exhibit shows the annual MSCI World Index total returns for a 10-year period. Year 1: 15.25%; Year 2: 10.02%; Year 3: 20.65%; Year 4: 9.57%; Year 5: −40.33%; Year 6: 30.79%; Year 7: 12.34%; Year 8: −5.02%; Year 9: 16.54%; Year 10: 27.37%. For Year 6-Year 10, the mean absolute deviation of the MSCI World Index total returns is closest to: a. 10.20%. b. 12.74%. c. 16.40%.
10.20%. First, sum the Year 6-Year 10 returns and divide by n = 5 to find the arithmetic mean of 16.40%. Then calculate the absolute value of the difference between each year's return and that mean (14.39, 4.06, 21.42, 0.14, and 10.97), sum the results (50.98), and divide by n to find the MAD: 50.98/5 = 10.20%.
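The two steps can be sketched in Python as follows. Variable names are illustrative; the data are the Year 6-Year 10 returns from the exhibit.

```python
# MSCI World Index total returns (%), Years 6-10, copied from the exhibit.
returns = [30.79, 12.34, -5.02, 16.54, 27.37]

# Step 1: arithmetic mean of the five returns.
mean_return = sum(returns) / len(returns)

# Step 2: average of the absolute deviations from that mean.
mad = sum(abs(r - mean_return) for r in returns) / len(returns)

print(round(mean_return, 2), round(mad, 2))  # 16.4 10.2
```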
A fixed-income portfolio manager creates a contingency table of the number of bonds held in her portfolio by sector and bond rating. The contingency table is presented here, with counts listed as (A, AA, AAA): Communication Services (25, 32, 27); Consumer Staples (30, 25, 25); Energy (100, 85, 30); Health Care (200, 100, 63); Utilities (22, 28, 14). The relative frequency of AA rated energy bonds, based on the total count, is closest to: a. 10.5%. b. 31.5%. c. 39.5%.
10.5%. The relative frequency for any value in the table based on the total count is calculated by dividing that value by the total count. Therefore, the relative frequency for AA rated energy bonds is calculated as 85/806 = 10.5%.
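As a quick check of the 806 total and the 85/806 ratio, here is a small Python sketch. The nested-dictionary layout is an illustrative choice, not something the source prescribes.

```python
# Bond counts by sector and rating, copied from the contingency table.
holdings = {
    "Communication Services": {"A": 25, "AA": 32, "AAA": 27},
    "Consumer Staples": {"A": 30, "AA": 25, "AAA": 25},
    "Energy": {"A": 100, "AA": 85, "AAA": 30},
    "Health Care": {"A": 200, "AA": 100, "AAA": 63},
    "Utilities": {"A": 22, "AA": 28, "AAA": 14},
}

# Total count across every cell of the table.
total_count = sum(sum(ratings.values()) for ratings in holdings.values())

# Relative frequency of one cell, based on the total count.
relative_frequency = holdings["Energy"]["AA"] / total_count

print(total_count, f"{relative_frequency:.1%}")  # 806 10.5%
```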
The following exhibit shows the annual returns for Fund Y (%). Year 1: 19.5; Year 2: −1.9; Year 3: 19.7; Year 4: 35.0; Year 5: 5.7. The geometric mean return for Fund Y is closest to: a. 14.9%. b. 15.6%. c. 19.5%.
14.9%. The geometric mean return for Fund Y is found as follows: Fund Y = [(1 + 0.195) × (1 − 0.019) × (1 + 0.197) × (1 + 0.350) × (1 + 0.057)]^(1/5) − 1 = 14.9%.
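The same calculation as a minimal Python sketch (requires Python 3.8+ for `math.prod`). Variable names are illustrative.

```python
import math

# Fund Y annual returns as decimals, Years 1-5, copied from the exhibit.
returns = [0.195, -0.019, 0.197, 0.350, 0.057]

# Compound the growth factors (1 + r), take the n-th root, subtract 1.
growth = math.prod(1 + r for r in returns)
geometric_mean = growth ** (1 / len(returns)) - 1

print(f"{geometric_mean:.1%}")  # 14.9%
```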
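A fund had the following experience over the past 10 years (annual return): Year 1: 4.5%; Year 2: 6.0%; Year 3: 1.5%; Year 4: −2.0%; Year 5: 0.0%; Year 6: 4.5%; Year 7: 3.5%; Year 8: 2.5%; Year 9: 5.5%; Year 10: 4.0%. The standard deviation of the 10 years of returns is closest to: a. 2.40%. b. 2.53%. c. 7.58%.
2.53%. The mean return is 3.0%, and the squared deviations from the mean sum to 57.5. The sample variance is 57.5/(10 − 1) = 6.39, and its square root is the sample standard deviation, 2.53%. Dividing by n instead of n − 1 gives 2.40%.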
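A short Python sketch using the standard library's `statistics` module shows how the n − 1 and n divisors produce two of the answer choices. Variable names are illustrative.

```python
import statistics

# Annual fund returns (%), Years 1-10, copied from the question.
returns = [4.5, 6.0, 1.5, -2.0, 0.0, 4.5, 3.5, 2.5, 5.5, 4.0]

sample_sd = statistics.stdev(returns)       # divides by n - 1
population_sd = statistics.pstdev(returns)  # divides by n

print(round(sample_sd, 2), round(population_sd, 2))  # 2.53 2.4
```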
The following exhibit shows the annual MSCI World Index total returns for a 10-year period. Year 1: 15.25%; Year 2: 10.02%; Year 3: 20.65%; Year 4: 9.57%; Year 5: −40.33%; Year 6: 30.79%; Year 7: 12.34%; Year 8: −5.02%; Year 9: 16.54%; Year 10: 27.37%. The fourth quintile return for the MSCI World Index is closest to: a. 20.65%. b. 26.03%. c. 27.37%.
26.03%. Quintiles divide a distribution into fifths, with the fourth quintile occurring at the point at which 80% of the observations lie below it. The fourth quintile is equivalent to the 80th percentile. To find the yth percentile (Py), we first must determine its location. The formula for the location (Ly) of a yth percentile in an array with n entries sorted in ascending order is Ly = (n + 1) × (y/100). In this case, n = 10 and y = 80, so L80 = (10 + 1) × (80/100) = 11 × 0.8 = 8.8. With the data arranged in ascending order (−40.33%, −5.02%, 9.57%, 10.02%, 12.34%, 15.25%, 16.54%, 20.65%, 27.37%, and 30.79%), the 8.8th position would be between the 8th and 9th entries, 20.65% and 27.37%, respectively. Using linear interpolation, P80 = X8 + (L80 − 8) × (X9 − X8) = 20.65% + 0.8 × (27.37% − 20.65%) = 26.03%.
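The location formula and the interpolation step can be combined in a small Python sketch. The function name `percentile` and the clamping of locations that fall outside the data are assumptions made for illustration; the card itself covers only the interior case.

```python
def percentile(values, y):
    """y-th percentile using L = (n + 1) * y / 100 with linear interpolation."""
    data = sorted(values)
    n = len(data)
    location = (n + 1) * y / 100
    k = int(location)  # integer part of the location (1-based position)
    # Assumption: clamp to the smallest/largest value if the location is out of range.
    if k < 1:
        return data[0]
    if k >= n:
        return data[-1]
    # Interpolate between the k-th and (k+1)-th sorted observations.
    return data[k - 1] + (location - k) * (data[k] - data[k - 1])


# MSCI World Index total returns (%), Years 1-10, copied from the exhibit.
msci_returns = [15.25, 10.02, 20.65, 9.57, -40.33,
                30.79, 12.34, -5.02, 16.54, 27.37]

print(round(percentile(msci_returns, 80), 2))  # 26.03
```

Other libraries use different percentile conventions by default, so their results may not match this formula exactly.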
coefficient of variation
Standard deviation divided by the mean
When analyzing investment returns, which of the following statements is correct? a. The geometric mean will exceed the arithmetic mean for a series with non-zero variance. b. The geometric mean measures an investment's compound rate of growth over multiple periods. c. The arithmetic mean measures an investment's terminal value over multiple periods.
The geometric mean measures an investment's compound rate of growth over multiple periods. B is correct. The geometric mean compounds the periodic returns of every period, giving the investor a more accurate measure of the terminal value of an investment.
Which of the following data types would be classified as being categorical? a. Discrete b. Nominal c. Continuous
b. Nominal. B is correct. Categorical data (or qualitative data) are values that describe a quality or characteristic of a group of observations and therefore can be used as labels to divide a dataset into groups to summarize and visualize. The two types of categorical data are nominal data and ordinal data. Nominal data are categorical values that are not amenable to being organized in a logical order, while ordinal data are categorical values that can be logically ordered or ranked. A is incorrect because discrete data would be classified as numerical data (not categorical data). C is incorrect because continuous data would be classified as numerical data (not categorical data).
An equity analyst gathers total returns for three country equity indexes over the past four years. The data are presented below, listed as (Index A, Index B, Index C): Year t−3 (15.56%, 11.84%, −4.34%); Year t−2 (−4.12%, −6.96%, 9.32%); Year t−1 (11.19%, 10.29%, −12.72%); Year t (8.98%, 6.32%, 21.44%). Each individual row of data in the table can be best characterized as: a. panel data. b. time-series data. c. cross-sectional data.
cross-sectional data. Cross-sectional data are observations of a specific variable from multiple observational units at a given point in time. Each row of data in the table represents cross-sectional data. The specific variable is annual total return, the multiple observational units are the three countries' indexes, and the given point in time is the time period indicated by the particular row. A is incorrect because panel data consist of observations through time on one or more variables for multiple observational units. The entire table of data is an example of panel data showing annual total returns (the variable) for three country indexes (the observational units) by year. B is incorrect because time-series data are a sequence of observations of a specific variable collected over time and at discrete and typically equally spaced intervals of time, such as daily, weekly, monthly, annually, and quarterly. In this case, each column (not row) is a time series of data that represents annual total return (the specific variable) for a given country index, and it is measured annually (the discrete interval of time).
A portfolio manager invests €5,000 annually in a security for four years at the prices shown in the following exhibit. Purchase Price of Security (€ per unit): Year 1: 62.00; Year 2: 76.00; Year 3: 84.00; Year 4: 90.00. The average price is best represented as the: a. harmonic mean of €76.48. b. geometric mean of €77.26. c. arithmetic average of €78.00.
Harmonic mean of €76.48. The harmonic mean is appropriate for determining the average price per unit when the same amount is invested each period. It is calculated by summing the reciprocals of the prices, averaging that sum by dividing by the number of prices, and then taking the reciprocal of the average: 4/[(1/62.00) + (1/76.00) + (1/84.00) + (1/90.00)] = €76.48.
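To see why the harmonic mean is the right average here, the sketch below computes the average cost per unit directly (total euros invested divided by total units bought) and compares it with `statistics.harmonic_mean`. Variable names are illustrative.

```python
import statistics

# Purchase prices (EUR per unit), Years 1-4, copied from the exhibit.
prices = [62.00, 76.00, 84.00, 90.00]
annual_investment = 5000.0  # EUR invested each year

# Units bought each year differ because the same amount is spent at different prices.
units_bought = sum(annual_investment / p for p in prices)

# Average cost per unit = total invested / total units.
average_cost = annual_investment * len(prices) / units_bought

harmonic = statistics.harmonic_mean(prices)

print(round(average_cost, 2), round(harmonic, 2))  # 76.48 76.48
```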
Equity return distributions are best described as being: a. leptokurtic. b. platykurtic. c. mesokurtic.
Leptokurtic. Most equity return distributions are best described as being leptokurtic (i.e., more peaked than normal, with fatter tails).
For a positively skewed unimodal distribution, which of the following measures is most accurately described as the largest? a. Median. b. Mean. c. Mode.
Mean. For a positively skewed unimodal distribution, the mode is less than the median, which is less than the mean.
A two-dimensional rectangular array would be most suitable for organizing a collection of raw: a. panel data. b. time-series data. c. cross-sectional data.
panel data. Panel data consist of observations through time on one or more variables for multiple observational units. A two-dimensional rectangular array, or data table, would be suitable here as it is comprised of columns to hold the variable(s) for the observational units and rows to hold the observations through time. B is incorrect because a one-dimensional (not a two-dimensional rectangular) array would be most suitable for organizing a collection of data of the same data type, such as the time-series data from a single variable. C is incorrect because a one-dimensional (not a two-dimensional rectangular) array would be most suitable for organizing a collection of data of the same data type, such as the same variable for multiple observational units at a given point in time (cross-sectional data).
Investors should be most attracted to return distributions that are: a. negatively skewed. b. positively skewed. c. normal.
positively skewed. Investors should be attracted by a positive skew (distribution skewed to the right) because the mean return falls above the median. Relative to the mean return, positive skew amounts to a limited, though frequent, downside compared with a somewhat unlimited, but less frequent, upside.
An analyst examined a cross-section of annual returns for 252 stocks and calculated the following statistics: Arithmetic Average: 9.986%; Geometric Mean: 9.909%; Variance: 0.001723; Skewness: 0.704; Excess Kurtosis: 0.503. This distribution is best described as: a. negatively skewed. b. having no skewness. c. positively skewed.
positively skewed. C is correct. The skewness is positive, so it is right-skewed (positively skewed).
variance
standard deviation squared
An equity analyst gathers total returns for three country equity indexes over the past four years. The data are presented below, listed as (Index A, Index B, Index C): Year t−3 (15.56%, 11.84%, −4.34%); Year t−2 (−4.12%, −6.96%, 9.32%); Year t−1 (11.19%, 10.29%, −12.72%); Year t (8.98%, 6.32%, 21.44%). Each individual column of data in the table can be best characterized as: a. panel data. b. time-series data. c. cross-sectional data.
time-series data. Time-series data are a sequence of observations of a specific variable collected over time and at discrete and typically equally spaced intervals of time, such as daily, weekly, monthly, annually, and quarterly. In this case, each column is a time series of data that represents annual total return (the specific variable) for a given country index, and it is measured annually (the discrete interval of time). A is incorrect because panel data consist of observations through time on one or more variables for multiple observational units. The entire table of data is an example of panel data showing annual total returns (the variable) for three country indexes (the observational units) by year. C is incorrect because cross-sectional data are a list of the observations of a specific variable from multiple observational units at a given point in time. Each row (not column) of data in the table represents cross-sectional data.
A line chart with two variables (for example, revenues and earnings per share) is best suited for visualizing: a. the joint variation in the variables. b. underlying trends in the variables over time. c. the degree of correlation between the variables.
underlying trends in the variables over time. An important benefit of a line chart is that it facilitates showing changes in the data and underlying trends in a clear and concise way. Often a line chart is used to display the changes in data series over time. A is incorrect because a scatter plot, not a line chart, is used to visualize the joint variation in two numerical variables. C is incorrect because a heat map, not a line chart, is used to visualize the values of joint frequencies among categorical variables.
A tree-map is best suited to illustrate: a. underlying trends over time. b. joint variations in two variables. c. value differences of categorical groups.
value differences of categorical groups. A tree-map is a graphical tool used to display and compare categorical data. It consists of a set of colored rectangles to represent distinct groups, and the area of each rectangle is proportional to the value of the corresponding group. A is incorrect because a line chart, not a tree-map, is used to display the change in a data series over time. B is incorrect because a scatter plot, not a tree-map, is used to visualize the joint variation in two numerical variables.