Statistics - Chapter 4 Exam #2

Trimmed Mean

To calculate the trimmed mean, first remove the highest and lowest k percent of the observations, then average what remains. For example, suppose we want a 5 percent trimmed mean (k = 0.05) for n = 33 P/E ratios. To determine how many observations to trim from each end, multiply k by n: 0.05 × 33 = 1.65, which rounds to 2 observations. So we remove the two smallest and two largest observations before averaging the remaining values. In this example, the trimmed mean mitigates the effect of the very high values but still exceeds the median.
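The trimming step can be sketched in Python as follows. This is a minimal sketch rather than a definitive implementation: the function name `trimmed_mean`, the round-half-up rule applied to k × n, and the sample data are assumptions made for illustration (the data are not the P/E ratios mentioned above).

```python
def trimmed_mean(values, k):
    """Mean after dropping round(k * n) observations from each end."""
    data = sorted(values)
    n = len(data)
    trim = int(k * n + 0.5)  # assumed rounding rule: 1.65 -> 2
    if 2 * trim >= n:
        raise ValueError("k would trim away every observation")
    kept = data[trim:n - trim]
    return sum(kept) / len(kept)


# Hypothetical data with one very high value
sample = [2, 4, 5, 6, 7, 8, 9, 10, 11, 60]
print(sum(sample) / len(sample))   # ordinary mean, pulled up by 60
print(trimmed_mean(sample, 0.10))  # drops 2 and 60 before averaging
```

With n = 33 and k = 0.05 the same rule trims two observations from each end, matching the worked example.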

Mean

A familiar measure of center.

Growth Rates

A variation on the geometric mean used to find the average growth rate for a time series.
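A small sketch of the calculation, assuming the common form GR = (x_n / x_1)^(1/(n − 1)) − 1, where x_1 and x_n are the first and last values of the series. The function name and the revenue figures are hypothetical.

```python
def average_growth_rate(series):
    """Average per-period growth rate from the first and last values."""
    n = len(series)
    if n < 2 or series[0] <= 0 or series[-1] <= 0:
        raise ValueError("need at least two positive values")
    return (series[-1] / series[0]) ** (1 / (n - 1)) - 1


# Hypothetical revenue growing 10 percent per period
revenue = [100.0, 110.0, 121.0]
print(round(average_growth_rate(revenue), 4))
```

Only the endpoints enter the formula, so the intermediate values do not affect the result.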

Variation

The "spread" of data points about the center of the distribution in a sample. Measures of variability covered in this set include the range, mean absolute deviation, variance, standard deviation, and coefficient of variation.

Group Mean and Standard Deviation

Each interval j has a midpoint mj and a frequency fj. We calculate the estimated mean by multiplying the midpoint of each class by its class frequency, summing over all k classes, and dividing by the sample size n.
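The grouped calculation might look like the sketch below. The mean follows the description above; the grouped standard deviation is an assumption here, using the usual form sqrt(Σ fj (mj − mean)² / (n − 1)). The midpoints and frequencies are made up.

```python
import math


def grouped_mean_std(midpoints, freqs):
    """Estimated mean and standard deviation from class midpoints and frequencies."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    # Assumed sample form: divide by n - 1
    var = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / (n - 1)
    return mean, math.sqrt(var)


# Hypothetical classes 0-10, 10-20, 20-30
midpoints = [5, 15, 25]
freqs = [2, 5, 3]
mean, std = grouped_mean_std(midpoints, freqs)
print(mean, round(std, 3))
```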

Negative z

means the observation is to the left of the mean

Positive z

means the observation is to the right of the mean

standardized variable

redefines each observation in terms of the number of standard deviations from the mean.
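As an illustration, the sketch below standardizes a small made-up sample using z = (x − mean) / s. Using the sample standard deviation (`statistics.stdev`) is an assumption; with population parameters the same idea applies with μ and σ.

```python
import statistics


def z_scores(values):
    """Number of standard deviations each observation lies from the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (assumed)
    return [(x - mean) / s for x in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
z = z_scores(data)
print([round(v, 2) for v in z])
```

Observations below the mean get a negative z and those above it get a positive z.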

Estimating Sigma

For a normal distribution, the range of values is almost 6σ (from μ − 3σ to μ + 3σ). If you know the range R (high − low), you can estimate the standard deviation as σ ≈ R/6. This is useful for approximating the standard deviation when only R is known. The estimate depends on the assumption of normality.

Chebyshev's Theorem

For any population with mean μ and standard deviation σ, the percentage of observations that lie within k standard deviations of the mean must be at least 100[1 − 1/k²], for k > 1. For k = 2 standard deviations, 100[1 − 1/2²] = 75%, so at least 75.0% will lie within μ ± 2σ. For k = 3 standard deviations, 100[1 − 1/3²] = 88.9%, so at least 88.9% will lie within μ ± 3σ.
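The bound is easy to check numerically. In this sketch the function name is hypothetical, and k is restricted to values above 1 because the bound says nothing useful otherwise.

```python
def chebyshev_bound(k):
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 100 * (1 - 1 / k ** 2)


for k in (2, 3):
    print(k, round(chebyshev_bound(k), 1))
```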

Method of Medians

For small data sets, find quartiles using method of medians: Step 1: Sort the observations. Step 2: Find the median Q2. Step 3: Find the median of the data values that lie below Q2. Step 4: Find the median of the data values that lie above Q2.
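The four steps can be sketched as below. One detail is an assumption: when n is odd, the median itself is left out of both halves. The function name and data are hypothetical, and the sketch expects at least two observations.

```python
import statistics


def quartiles_by_medians(values):
    """Return (Q1, Q2, Q3) using the method of medians."""
    data = sorted(values)                    # Step 1: sort
    n = len(data)
    q2 = statistics.median(data)             # Step 2: median
    half = n // 2
    lower = data[:half]                      # values below Q2
    upper = data[half + 1:] if n % 2 else data[half:]  # values above Q2
    q1 = statistics.median(lower)            # Step 3
    q3 = statistics.median(upper)            # Step 4
    return q1, q2, q3


print(quartiles_by_medians([8, 3, 5, 1, 7, 2, 6, 4]))
print(quartiles_by_medians([7, 3, 5, 1, 2, 6, 4]))
```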

Weighted Mean

Is a sum that assigns each data value a weight wj that represents a fraction of the total (the k weights must sum to 1)
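A minimal sketch of the weighted mean, with a check that the weights sum to 1. The function name, scores, and weights are made up for illustration.

```python
def weighted_mean(values, weights):
    """Sum of w_j * x_j, where the weights are fractions that sum to 1."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * x for x, w in zip(values, weights))


# Hypothetical course grade: exam 50%, homework 30%, quiz 20%
scores = [80, 90, 70]
weights = [0.5, 0.3, 0.2]
print(weighted_mean(scores, weights))
```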

Box-Plot Midhinge

The average of the first and third quartiles. Formula: Midhinge = (Q1 + Q3)/2. Median < Midhinge: skewed right (longer right tail). Median = Midhinge: symmetric (tails equal). Median > Midhinge: skewed left (longer left tail).
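The comparison of median and midhinge can be expressed as a short sketch, taking the midhinge as the average of Q1 and Q3. The function names and quartile values are hypothetical, and the exact-equality test for "symmetric" is a simplification that real data rarely satisfy.

```python
def midhinge(q1, q3):
    """Average of the first and third quartiles."""
    return (q1 + q3) / 2


def skew_hint(median, q1, q3):
    """Rough shape label from comparing the median with the midhinge."""
    mh = midhinge(q1, q3)
    if median < mh:
        return "skewed right"
    if median > mh:
        return "skewed left"
    return "symmetric"


print(midhinge(10, 20), skew_hint(12, 10, 20))
```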

Range

Is the difference between the largest and smallest observations: range = xmax - xmin

Shape

A key characteristic of numerical data. Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal? Compare the mean and median or look at the histogram to determine the degree of skewness. Figure 4.10 shows prototype population shapes with varying degrees of skewness.

Variability

A key characteristic of numerical data. How much dispersion is there in the data? How spread out are the data values? Are there unusual values?

Center

A key characteristic of numerical data. Where are the data values concentrated? What seem to be typical or middle data values? Is there central tendency?

Midrange

The midrange is the point halfway between the lowest and highest values of X. It is easy to use but sensitive to extreme data values. Useful when you have xmin and xmax. Formula: Midrange = (xmin + xmax)/2.

Mode

The most frequently occurring data value. May have multiple modes or no mode. The mode is most useful for discrete or categorical data with only a few distinct data values. For continuous data or data with a wide range, the mode is rarely useful.

The Empirical Rule

The normal distribution is symmetric and is also known as the bell-shaped curve. The Empirical Rule states that for data from a normal distribution, we expect the interval μ ± kσ to contain a known percentage of the data. For k = 1, about 68.26% will lie within μ ± 1σ. For k = 2, about 95.44% will lie within μ ± 2σ. For k = 3, about 99.73% will lie within μ ± 3σ.
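These percentages can be reproduced from the standard normal distribution, since the share within k standard deviations equals erf(k / √2). The function name in this sketch is hypothetical. Direct computation gives 68.27% and 95.45% for k = 1 and k = 2, which differ in the last digit from the figures quoted above, most likely because those come from rounded table values.

```python
import math


def normal_pct_within(k):
    """Percentage of a normal distribution within k standard deviations of the mean."""
    return 100 * math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(normal_pct_within(k), 2))
```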

Variance and Standard Deviation

The population variance is defined as the sum of squared deviations from the mean divided by the population size. For a sample, we replace the population mean with the sample mean and divide by n − 1 to get the sample variance. The standard deviation is the square root of the variance. It is a single number that helps us understand how individual values in a data set vary from the mean.
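The two definitions can be compared side by side in a short sketch. The function names and data are made up; the sample version divides by n − 1.

```python
import math


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
pop_sd = math.sqrt(population_variance(data))
samp_sd = math.sqrt(sample_variance(data))
print(pop_sd, round(samp_sd, 3))
```

The sample standard deviation is slightly larger because of the smaller divisor.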

Correlation Coefficient

The sample correlation coefficient is a statistic that describes the degree of linearity between paired observations on two quantitative variables X and Y.

Mean Absolute Deviation (MAD)

This statistic reveals the average distance from the center. Absolute values must be used since otherwise the deviations around the mean would sum to zero
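A brief sketch, assuming the form MAD = Σ|x − mean| / n. It also shows why absolute values are needed: the raw deviations cancel out. The function name and data are hypothetical.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)
raw_sum = sum(x - mean for x in data)  # signed deviations cancel
print(raw_sum, mean_absolute_deviation(data))
```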

Coefficient of Variation

The CV is the standard deviation expressed as a percent of the mean. In some data sets, the standard deviation can actually exceed the mean.
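A sketch of the calculation, assuming CV = 100 × s / mean with the sample standard deviation. The made-up data include one large value so that the standard deviation exceeds the mean and the CV rises above 100 percent.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation expressed as a percent of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * statistics.stdev(values) / mean


data = [1, 1, 1, 1, 26]  # hypothetical, strongly skewed
cv = coefficient_of_variation(data)
print(round(cv, 1))
```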

Covariance

The covariance of two random variables X and Y (denoted σXY) measures the degree to which the values of X and Y change together. The correlation coefficient is the covariance divided by the product of the standard deviations of X and Y. Population formula: σXY = Σ(xi − μX)(yi − μY) / N. Sample formula: sXY = Σ(xi − x̄)(yi − ȳ) / (n − 1).
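The relationship between covariance and correlation can be sketched as follows, using the sample forms with an n − 1 divisor (an assumption for this example). The function names and paired data are hypothetical.

```python
import math


def sample_covariance(xs, ys):
    """Sum of cross-products of deviations, divided by n - 1."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sx = math.sqrt(sample_covariance(xs, xs))  # cov(X, X) is the variance of X
    sy = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sx * sy)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]  # perfectly linear in xs
print(sample_covariance(xs, ys), round(sample_correlation(xs, ys), 6))
```

Unlike the covariance, the correlation is unit-free and always falls between −1 and +1.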

Geometric Mean

The geometric mean (G) is a multiplicative average.
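A minimal sketch, assuming the usual definition G = (x1 · x2 · ... · xn)^(1/n) for positive values. The function name and data are hypothetical; `math.prod` requires Python 3.8 or later.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if any(x <= 0 for x in values):
        raise ValueError("all values must be positive")
    return math.prod(values) ** (1 / len(values))


print(geometric_mean([2, 8]))           # compare with arithmetic mean 5
print(round(geometric_mean([1, 3, 9]), 6))
```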

Median

The median (M) is the 50th percentile or midpoint of the ordered sample data. M separates the upper and lower halves of the ordered observations. If n is odd, the median is the middle observation in the ordered data set. If n is even, the median is the average of the middle two observations in the ordered data set.

Deciles

are data that have been divided into 10 groups.

Percentiles

are data that have been divided into 100 groups. For example, you score in the 83rd percentile on a standardized test. That means that 83% of the test-takers scored below you. Percentiles may be used to establish benchmarks for comparison purposes (e.g. health care, manufacturing, and banking industries use 5th, 25th, 50th, 75th and 90th percentiles). Quartiles (25, 50, and 75 percent) are commonly used to assess financial performance and stock portfolios. Percentiles can be used in employee merit evaluation and salary benchmarking.

Quartiles

Scale points that divide the sorted data into four groups of approximately equal size. The three values that separate the four groups are called Q1, Q2, and Q3, respectively. The second quartile Q2 is the median, a measure of central tendency. Q1 and Q3 measure dispersion since the interquartile range Q3 − Q1 measures the degree of spread in the middle 50 percent of data values. The first quartile Q1 is the median of the data values below Q2, and the third quartile Q3 is the median of the data values above Q2.

Quintiles

are data that have been divided into 5 groups.

Box Plot

Based on the five-number summary (xmin, Q1, Q2, Q3, xmax), a box plot shows variability and shape. Use quartiles to detect unusual data points by defining fences. Inner fences: lower = Q1 − 1.5(Q3 − Q1), upper = Q3 + 1.5(Q3 − Q1). Outer fences: lower = Q1 − 3.0(Q3 − Q1), upper = Q3 + 3.0(Q3 − Q1). Values outside the inner fences are unusual, while those outside the outer fences are outliers.
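A sketch of the fence calculation, assuming inner fences at 1.5 times the interquartile range beyond Q1 and Q3 and outer fences at 3.0 times. The function names, labels, and quartile values are hypothetical.

```python
def fences(q1, q3):
    """Inner and outer fences as (lower, upper) pairs."""
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }


def classify(x, q1, q3):
    """Label a value as typical, unusual, or outlier relative to the fences."""
    f = fences(q1, q3)
    lo_in, hi_in = f["inner"]
    lo_out, hi_out = f["outer"]
    if x < lo_out or x > hi_out:
        return "outlier"
    if x < lo_in or x > hi_in:
        return "unusual"
    return "typical"


print(fences(10, 20))
print([classify(x, 10, 20) for x in (15, 40, 60)])
```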

