Statistics Chapter 3 (Bentley)


Point Estimator

A sample statistic used to estimate the corresponding population parameter.

Distribution Shape: Skewness

An important measure of the shape of a distribution.
Symmetric (Not Skewed)
-Skewness is zero
-Mean and median are equal
Moderately Skewed Left
-Skewness is negative
-Mean will usually be less than the median
Moderately Skewed Right
-Skewness is positive
-Mean will usually be more than the median
Highly Skewed Right
-Skewness is positive (often above 1.0)
-Mean will usually be more than the median

Covariance

An objective numerical measure of the linear association between two variables; its sign reveals the direction of the relationship.
-A positive value indicates a positive linear relationship between the two variables; on average, if x is above its mean, then y tends to be above its mean, and vice versa.
-A negative value indicates a negative linear relationship between the two variables; on average, if x is above its mean, then y tends to be below its mean, and vice versa.
-A value of zero indicates that x and y have no linear relationship.
Its magnitude is difficult to interpret because it is sensitive to the units of measurement.
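A minimal Python sketch of the sample covariance, assuming the usual sample formula that divides the sum of cross-deviations by n-1. The function name `sample_covariance` is an illustrative choice, not something taken from the chapter.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of (xi - mean_x)(yi - mean_y), divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross / (n - 1)


# y rises with x, so the covariance is positive
print(sample_covariance([1, 2, 3], [2, 4, 6]))  # 2.0
# y falls as x rises, so the covariance is negative
print(sample_covariance([1, 2, 3], [6, 4, 2]))  # -2.0
```

For the population covariance, the same sum would be divided by N instead of n-1.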

Trimmed Mean

Another measure of central location, sometimes used when extreme values are present. Obtained by deleting a percentage of the smallest and largest values from a data set and then computing the mean of the remaining values.
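A short Python sketch of a trimmed mean. It assumes the stated percentage is removed from each end of the sorted data and that the number of values dropped is rounded down; both conventions are assumptions, since the definition above does not fix them.

```python
def trimmed_mean(data, percent):
    """Mean after dropping `percent` percent of the values from EACH end."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= percent < 50:
        raise ValueError("percent must be in [0, 50)")
    values = sorted(data)
    k = int(len(values) * percent / 100)  # values dropped per end, rounded down
    kept = values[k:len(values) - k]
    return sum(kept) / len(kept)


# The extreme value 100 is dropped along with the smallest value
print(trimmed_mean([100, 1, 2, 3, 4], 20))  # 3.0
# With no trimming the extreme value pulls the mean up
print(trimmed_mean([100, 1, 2, 3, 4], 0))   # 22.0
```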

Calculating the pth Percentile

1. Arrange the data in ascending (smallest to largest) order.
2. Locate the approximate position of the percentile by calculating Lp: Lp=(n+1)(p/100), where Lp indicates the location of the desired pth percentile and n is the sample size. For the population percentile, replace n by N. For example, we set p=50 for the median, as it is the 50th percentile.
3. Once you find the value for Lp, observe whether or not Lp is an integer:
-If Lp is an integer, then Lp denotes the location of the pth percentile. For instance, if L20 is equal to 2, then the 20th percentile is equal to the second observation in the ordered data set.
-If Lp is not an integer, we need to interpolate between two observations to approximate the desired percentile. So if L20 is equal to 2.25, then we need to interpolate 25% of the distance between the second and third observations in order to find the 20th percentile.
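The location-and-interpolation steps above can be sketched in Python as follows. The function name `percentile_by_location` is illustrative, and rejecting locations outside 1..n is an added assumption, because the steps do not say what to do when Lp falls outside the data.

```python
def percentile_by_location(data, p):
    """pth percentile using Lp = (n + 1)(p / 100) with linear interpolation."""
    values = sorted(data)                 # step 1: ascending order
    n = len(values)
    loc = (n + 1) * p / 100               # step 2: location Lp (1-based)
    if loc < 1 or loc > n:
        raise ValueError("Lp falls outside the data for this p and n")
    lower = int(loc)                      # whole part of Lp
    frac = loc - lower                    # fractional part of Lp
    if frac == 0:                         # step 3a: Lp is an integer
        return values[lower - 1]
    # step 3b: interpolate between observations `lower` and `lower + 1`
    return values[lower - 1] + frac * (values[lower] - values[lower - 1])


# n = 4 and p = 45 give Lp = 2.25: 25% of the way from the 2nd to the 3rd value
print(percentile_by_location([1, 3, 7, 9], 45))  # 4.0
# n = 7 and p = 50 give Lp = 4: the median is the 4th value
print(percentile_by_location([10, 20, 30, 40, 50, 60, 70], 50))  # 40
```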

Constructing a Box Plot

1. Plot the five-number summary values in ascending order on the horizontal axis.
2. Draw a box encompassing the first and third quartiles.
3. Draw a dashed vertical line in the box at the median.
4. To determine if a given observation is an outlier, first calculate the difference between Q3 and Q1. This difference is called the interquartile range or IQR. Therefore, the length of the box is equal to the IQR and the span of the box contains the middle half of the data. Draw a line ("whisker") that extends from Q1 to the minimum data value that is not farther than 1.5 x IQR from Q1. Similarly, draw a line that extends from Q3 to the maximum data value that is not farther than 1.5 x IQR from Q3.
5. Use an asterisk to indicate points that are farther than 1.5 x IQR from the box. These points are considered outliers.
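The whisker and outlier rules in steps 4 and 5 can be sketched in Python. This sketch assumes Q1 and Q3 have already been computed and are passed in; the function name `whiskers_and_outliers` and the sample quartile values are hypothetical.

```python
def whiskers_and_outliers(data, q1, q3):
    """Return (lower whisker end, upper whisker end, outliers) for a box plot."""
    iqr = q3 - q1                          # length of the box
    low_limit = q1 - 1.5 * iqr             # farthest a whisker may reach below Q1
    high_limit = q3 + 1.5 * iqr            # farthest a whisker may reach above Q3
    inside = [x for x in data if low_limit <= x <= high_limit]
    outliers = sorted(x for x in data if x < low_limit or x > high_limit)
    return min(inside), max(inside), outliers


# Assumed quartiles Q1 = 11 and Q3 = 17, so IQR = 6 and the limits are 2 and 26
data = [1, 10, 12, 14, 15, 16, 18, 40]
print(whiskers_and_outliers(data, 11, 17))  # (10, 18, [1, 40])
```

The values 1 and 40 lie more than 1.5 x IQR from the box, so they would be drawn as asterisks rather than included in the whiskers.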

Five-Number Summary

1. Smallest Value 2. First Quartile 3. Median 4. Third Quartile 5. Largest Value

Compact Formula for the Sample Mean

x̄ = (x1 + x2 + ... + xn)/n = (Σxi)/n

Population Mean Formula

μ = (Σxi)/N

Chebyshev's Theorem

Named for the Russian mathematician Pafnuty Chebyshev, who found bounds for the proportion of the data that lie within a specified number of standard deviations from the mean. For any data set, the proportion of observations that lie within k standard deviations from the mean is at least 1-1/(k^2), where k is any number greater than 1. Its main advantage is that it applies to all data sets, regardless of the shape of the distribution. However, it results in conservative bounds for the percentage of observations falling in a particular interval. The actual percentage of observations lying in the interval may in fact be much larger.
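As a quick illustration, the bound 1-1/(k^2) can be evaluated for a few values of k. The function name `chebyshev_lower_bound` is illustrative only.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


print(chebyshev_lower_bound(2))            # 0.75, so at least 75% within 2 standard deviations
print(round(chebyshev_lower_bound(3), 4))  # 0.8889, so at least about 88.9% within 3
```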

Empirical Rule

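Can be used to determine the approximate percentage of data values that lie within a specified number of standard deviations of the mean when the distribution is symmetric and bell-shaped: approximately 68% within one standard deviation, 95% within two, and 99.7% within three.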

Standardizing the Data

Converting sample data into z-scores.

Correlation Coefficient

Correlation is a measure of linear association and not necessarily causation. Just because two variables are highly correlated does not mean that one variable is the cause of the other. The coefficient can take on values between -1 and +1. Values near -1 indicate a strong negative linear relationship. Values near +1 indicate a strong positive linear relationship. The closer the correlation is to zero, the weaker the linear relationship.
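A minimal Python sketch of the sample correlation coefficient, assuming the standard definition: the sample covariance divided by the product of the two sample standard deviations. The function name `sample_correlation` is an illustrative choice.

```python
import math


def sample_correlation(x, y):
    """Sample correlation: covariance / (std dev of x * std dev of y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / (s_x * s_y)


print(sample_correlation([1, 2, 3], [2, 4, 6]))  # 1.0, a perfect positive linear relationship
print(sample_correlation([1, 2, 3], [6, 4, 2]))  # -1.0, a perfect negative linear relationship
```

Unlike the covariance, the result is unit-free, which is why it is easier to interpret.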

Sharpe Ratio

Developed by Nobel Laureate William Sharpe. Originally called the "reward-to-variability" ratio. Used to characterize how well the return of an asset compensates for the risk the investor takes; it measures the extra reward per unit of risk. (Want to pick investments with high Sharpe ratios.)
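The card does not spell out the formula, so the sketch below assumes the standard definition: the mean return in excess of the risk-free rate, divided by the standard deviation of the returns. The parameter names and the sample figures are hypothetical.

```python
def sharpe_ratio(mean_return, risk_free_rate, std_dev):
    """Extra reward (return above the risk-free rate) per unit of risk."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (mean_return - risk_free_rate) / std_dev


# Hypothetical funds, all figures in percent
print(sharpe_ratio(10.0, 2.0, 16.0))  # 0.5
print(sharpe_ratio(8.0, 2.0, 8.0))    # 0.75, the more attractive of the two by this measure
```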

Measures of Distribution Shape, Relative Location, and Detecting Outliers

-Distribution Shape
-z-Scores
-Chebyshev's Theorem
-Empirical Rule
-Detecting Outliers

Percentile (the pth)

Divides a data set into two parts:
-Approximately p percent of the observations have values less than the pth percentile.
-Approximately (100-p) percent of the observations have values greater than the pth percentile.
Provides information about how the data are spread over the interval from the smallest value to the largest value.
Calculating:
1. Arrange the data in ascending order.
2. Compute index i, the position of the pth percentile: i=(p/100)n
3. If i is not an integer, round up. The pth percentile is the value in the ith position.
4. If i is an integer, the pth percentile is the average of the values in positions i and i+1.
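The index method above can be sketched in Python as follows. The function name `percentile_by_index` is illustrative, and restricting p to values strictly between 0 and 100 is an added assumption so that position i+1 always exists.

```python
import math


def percentile_by_index(data, p):
    """pth percentile using the index i = (p / 100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)                  # step 1: ascending order
    i = p * len(values) / 100              # step 2: index i
    if i != int(i):                        # step 3: not an integer, so round up
        return values[math.ceil(i) - 1]
    i = int(i)                             # step 4: average positions i and i + 1
    return (values[i - 1] + values[i]) / 2


data = [10, 20, 30, 40, 50, 60, 70, 80]
print(percentile_by_index(data, 50))  # 45.0, since i = 4 and we average the 4th and 5th values
print(percentile_by_index(data, 30))  # 30, since i = 2.4 rounds up to the 3rd value
```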

Outliers

Extremely small or large values.

LOOK UP AND PRACTICE EQUATIONS

-Sample MAD
-Population MAD
-Sample variance and sample standard deviation
-Population variance and population standard deviation
-Sample CV
-Population CV
-The Sharpe Ratio
-The Empirical Rule
-Z-Score
-Calculating the Mean and the Variance for a Frequency Distribution
-The Covariances (Sample and Population)
-The Correlation Coefficient (Sample and Population)

Skewed

If skewness is positive (skewed right), the mean is typically greater than the median. If skewness is negative (skewed left), the mean is typically less than the median.

Coefficient of Variation

Indicates how large the standard deviation is in relation to the mean.

Box Plot (or "Box-and-Whisker Plot")

A convenient way to graphically display the minimum value (Min), the quartiles (Q1, Q2, and Q3), and the maximum value (Max) of a data set; that is, a graphical summary of data based on the five-number summary. Also used to informally gauge the shape of the distribution. Provides another way to identify outliers.

Variance

Is a measure of dispersion that utilizes all the data. It is based on the difference between the value of each observation and the mean: the average of the squared differences between each data value and the mean. Is useful in comparing the dispersion of two or more variables.

Mean Absolute Deviation (MAD)

Is an average of the absolute differences between the observations and the mean.
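A minimal Python sketch of the MAD. It assumes the sum of absolute deviations is divided by the number of observations; the function name is an illustrative choice.

```python
def mean_absolute_deviation(data):
    """Average absolute difference between each observation and the mean."""
    n = len(data)
    if n == 0:
        raise ValueError("data must not be empty")
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n


# Mean is 3; absolute deviations are 2, 1, 0, 1, 2, which average to 1.2
print(mean_absolute_deviation([1, 2, 3, 4, 5]))
```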

Mean (or "Average")

Is calculated by adding up the values of all data points and dividing by the number of data points in the population or sample; the average of all the data values. The most commonly used measure of central location, and perhaps the most important. Its weakness is that it is influenced by outliers.

Median

Is the middle value of a data set. To calculate, arrange the data in ascending order (smallest to largest) and then take:
-The middle value if the number of observations is odd
-The average of the two middle values if the number of observations is even
Whenever the data set has extreme values (outliers), this is the preferred measure of central location.

Standard Deviation

Is the positive square root of the variance. It is measured in the same units as the data, making it more easily interpreted than the variance.
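A short Python sketch of the variance and standard deviation. It assumes the usual convention that the sample variance divides by n-1 and the population variance divides by N; the `population` flag is an illustrative parameter.

```python
import math


def variance(data, population=False):
    """Average squared deviation from the mean (n - 1 for a sample, N for a population)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    squared = sum((x - mean) ** 2 for x in data)
    return squared / n if population else squared / (n - 1)


def standard_deviation(data, population=False):
    """Positive square root of the variance, in the same units as the data."""
    return math.sqrt(variance(data, population))


data = [2, 4, 4, 4, 5, 5, 7, 9]                    # mean 5, squared deviations sum to 32
print(variance(data, population=True))             # 4.0
print(standard_deviation(data, population=True))   # 2.0
print(round(variance(data), 4))                    # 4.5714, the sample variance 32/7
```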

Arithmetic Mean

Is the primary measure of central location. Generally referred to as the mean or the average.

Range

Is the simplest measure of dispersion. The difference between the maximum (Max) and the minimum (Min) values in a data set: Range = Max - Min. Not considered a good measure of dispersion because it focuses solely on the extreme values and ignores every other observation in the data set.

Mode

Is the value that occurs most frequently. A data set can have one mode (unimodal), two modes (bimodal), more than two modes (multimodal), or no mode at all.

Descriptive Statistics: Numerical Measures

-Measures of Distribution Shape, Relative Location, and Detecting Outliers
-Exploratory Data Analysis
-Measures of Association Between Two Variables

Detecting Outliers

Outlier-an unusually small or unusually large value in a data set. A data value with a z-score less than -3 or greater than +3 might be considered an outlier. It might be: -an incorrectly recorded data value -a data value that was incorrectly included in the data set -a correctly recorded data value that belongs in the data set

Mean-Variance Analysis

Postulates that we measure the performance of an asset by its rate of return and evaluate this rate of return in terms of its reward (mean) and risk (variance). In general, investments with higher average returns are also associated with higher risk.

Measures of Dispersion

-Range
-Interquartile Range
-Variance
-Standard Deviation
-Coefficient of Variation

Central Location

Relates to the way quantitative data tend to cluster around some middle or central value. Measures of central location attempt to find a typical or central value that describes the data. Examples: finding a typical value for the return on an investment, the number of defects in a production process, the salary of a business graduate, the rental price in a neighborhood, the number of customers at a local convenience store, etc.

Weighted Mean

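Relevant when some observations contribute more than others. =ΣwiXi, where wi is the weight of the ith observation and the weights sum to 1; if they do not sum to 1, divide by Σwi.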
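A minimal Python sketch of the weighted mean. It divides by the sum of the weights, which matches the plain sum of weighted values whenever the weights already add up to 1. The function name and the grade example are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical course grade: a midterm worth 25% and a final worth 75%
print(weighted_mean([80, 90], [0.25, 0.75]))  # 87.5
# The same result when the weights are given as points rather than proportions
print(weighted_mean([80, 90], [1, 3]))        # 87.5
```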

Coefficient of Variation (CV)

Serves as a relative measure of dispersion and adjusts for differences in the magnitudes of the means. Calculated by dividing a data set's standard deviation by its mean. A unitless measure that allows for direct comparisons of mean-adjusted dispersion across different data sets.
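A short Python sketch of the sample CV using the standard library's `statistics` module, where `statistics.stdev` is the sample standard deviation. The two data sets are made up to show why a unitless measure helps.

```python
import statistics


def coefficient_of_variation(data):
    """Sample standard deviation divided by the sample mean."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(data) / mean


# Both hypothetical data sets have a standard deviation of 1,
# but relative to their means the first is far more dispersed.
print(coefficient_of_variation([1, 2, 3]))        # 0.5
print(coefficient_of_variation([101, 102, 103]))  # about 0.0098
```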

Quartiles

Specific percentiles. First Quartile=25th Percentile Second Quartile=50th Percentile Third Quartile=75th Percentile

Interquartile Range (IQR)

The difference between Q3 and Q1, which is used to determine an outlier when constructing a box plot.

Interquartile Range

The difference between the third and the first quartile. The range for the middle 50% of the data. Overcomes the sensitivity to extreme data values.

Symmetric

The mean, the median, and the mode are all equal (if unimodal)

Sample Statistics

The measures computed for data from a sample.

Population Parameters

The measures computed for the data from a population.

Parameter

A numerical measure that describes a population, such as the population mean.

Statistic

A numerical measure that describes a sample, such as the sample mean.

Exploratory Data Analysis

These procedures enable us to use simple arithmetic and easy-to-draw pictures to summarize data.

Measures of Association Between Two Variables

Thus far we have examined numerical methods used to summarize the data for one variable at a time. Often a manager or decision maker is interested in the relationship between two variables. Two descriptive measures of the relationship between two variables are the covariance and the correlation coefficient.

Z-Score

Used to find the relative position of a sample value within the data set by dividing the deviation of the sample value from the mean by the standard deviation. Often called the standardized value. A data value less than the sample mean will have a z-score less than zero. A data value greater than the sample mean will have a z-score greater than zero. A data value equal to the sample mean will have a z-score of zero.
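A minimal Python sketch of standardizing a sample, using the sample mean and sample standard deviation from the standard library. The helper `flag_outliers` applies the rule of thumb that a z-score below -3 or above +3 might indicate an outlier; both function names and the data are illustrative.

```python
import statistics


def z_scores(data):
    """z = (x - sample mean) / sample standard deviation, for each x."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    if s == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mean) / s for x in data]


def flag_outliers(data, threshold=3.0):
    """Values whose z-score is farther than `threshold` from zero."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > threshold]


print(z_scores([1, 2, 3]))             # [-1.0, 0.0, 1.0]
# Nineteen values of 10 and a single 100: only the 100 has |z| > 3
print(flag_outliers([10] * 19 + [100]))  # [100]
```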

