ISDS Chapter 3

Mean

($3,000,000/5) = $600,000

The Sample Variance

- Can never be negative because the differences are squared
- Will equal zero only if all observations have the same value (no variation)
- The sum of the squared differences around the mean divided by the sample size minus 1

A Box Plot

- Graphically display the distribution of a data set.
- Compare two or more distributions.
- Identify outliers in a data set.

Shape of Boxplots

- If data are symmetric around the median, then the box and central line are centered between the endpoints
- Can be shown in either a vertical or horizontal orientation

Data Analysis

- Is objective
- Should report the summary measures that best describe and communicate the important aspects of the data set

Data Interpretation

- Is subjective
- Should be done in a fair, neutral and clear manner

The Population Standard Deviation σ

- Measures variation in the population
- Calculation is similar to that of the sample standard deviation
- As with the sample statistics, the population standard deviation is the square root of the population variance

The Population Standard Deviation σ

- Most commonly used measure of variation
- Shows variation about the mean
- Is the square root of the population variance
- Has the same units as the original data

The Sample Standard Deviation

- Most commonly used measure of variation
- Shows variation about the mean
- Is the square root of the variance
- Has the same units as the original data

Median

- Not affected by extreme values
- When the data set contains an odd number of values: the middle value
- When the data set contains an even number of values: the average of the 2 middle values

The Arithmetic Mean

- Often just called the "mean"
- The most common measure of central tendency

Rules when Calculating the Ranked Position

- Rule 1: If the result is a whole number, then it is the ranked position to use.
- Rule 2: If the result is a fractional half (e.g. 2.5, 7.5, 8.5, etc.), then average the two corresponding data values.
- Rule 3: If the result is neither a whole number nor a fractional half, then round the result to the nearest integer to find the ranked position.
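The three rules above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function names are hypothetical, and the position formulas (n + 1)/4 for Q1 and 3(n + 1)/4 for Q3 are an assumption, since the cards do not state which positions the rules are applied to.

```python
def value_at_position(ranked, position):
    """Return the value at a 1-based ranked position using Rules 1-3.

    `ranked` must already be sorted in ascending order.
    """
    whole = int(position)
    frac = position - whole
    if frac == 0:
        # Rule 1: a whole number is the ranked position itself.
        return ranked[whole - 1]
    if frac == 0.5:
        # Rule 2: a fractional half averages the two neighbouring values.
        return (ranked[whole - 1] + ranked[whole]) / 2
    # Rule 3: otherwise round to the nearest integer position.
    nearest = whole + 1 if frac > 0.5 else whole
    return ranked[nearest - 1]


def quartiles(values):
    """Return (Q1, Q3), assuming positions (n + 1)/4 and 3(n + 1)/4."""
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two values")
    q1 = value_at_position(ranked, (n + 1) / 4)
    q3 = value_at_position(ranked, 3 * (n + 1) / 4)
    return q1, q3


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q3 = quartiles(data)  # positions 2.5 and 7.5, so Rule 2 applies to both
print(q1, q3, q3 - q1)    # Q1, Q3 and the IQR
```

With nine values both positions are fractional halves, so each quartile is the average of two neighbouring ranked values.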

Numerical Descriptive Measures with Ethical Considerations

- Should document both good and bad results
- Should be presented in a fair, objective and neutral manner
- Should not use inappropriate summary measures to distort facts

Mean

- Sum of values divided by the number of values
- Affected by extreme values (outliers)

The Sample Standard Deviation

- In bell-shaped data, the majority of observations lie within 1 standard deviation of the mean (from -1 to +1)
- Shows variation about the mean
- Is the square root of the variance

Measures of Variation

- The more the data are spread out, the greater the range, variance, and standard deviation.
- The more the data are concentrated, the smaller the range, variance, and standard deviation.
- If the values are all the same (no variation), all these measures will be zero.
- None of these measures is ever negative.

Mode

- Value that occurs most often
- Not affected by extreme values
- Used for either numerical or categorical (nominal) data
- There may be several modes, or there may be none

Z-Score

- The __________ is useful in identifying outliers
- The larger the absolute value of the ________, the greater the distance from the value to the mean

The five numbers that help describe the center, spread and shape of data

1) Xsmallest
2) First Quartile (Q1)
3) Median (Q2)
4) Third Quartile (Q3)
5) Xlargest

Steps to Compute Sample Standard Deviation

1. Compute the difference between each value and the mean.
2. Square each difference.
3. Add the squared differences.
4. Divide this total by n-1 to get the sample variance.
5. Take the square root of the sample variance to get the sample standard deviation.
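The five steps above map directly onto a few lines of Python. This is a sketch for illustration; the function name and the sample data are made up, and in practice the standard library's `statistics.stdev` gives the same result.

```python
import math


def sample_std(values):
    """Sample standard deviation, following the five steps one by one."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to divide by n - 1")
    mean = sum(values) / n
    diffs = [x - mean for x in values]       # Step 1: difference from the mean
    squared = [d ** 2 for d in diffs]        # Step 2: square each difference
    sum_of_squares = sum(squared)            # Step 3: add the squared differences
    variance = sum_of_squares / (n - 1)      # Step 4: divide by n - 1
    return math.sqrt(variance)               # Step 5: take the square root


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values: mean 5, sum of squares 32
print(sample_std(data))          # square root of 32/7
```

Dividing by n-1 rather than n is what distinguishes the sample variance from the population variance.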

Range, Variance, and Standard Deviation

3 Measures of variation

IQR

= Q3-Q1

Range

= Xlargest - Xsmallest

The Boxplot

A graphical display of the data based on the five-number summary

Variation and Shape

A data set can be characterized by its ______

Extreme Outlier

A data value is considered an _________ if its Z-score is less than -3.0 or greater than +3.0.

The IQR

A measure of variability that is not influenced by outliers or extreme values

Mean

Acts like the "balance point" for the data set

Quartiles, Five-Number Summary, Boxplot

Another way to describe numerical data

The Empirical Rule

Approximately 68% of the data in a bell-shaped distribution is within 1 standard deviation of the mean, or µ ± 1σ

The Empirical Rule

Approximates the variation of data in a bell-shaped distribution

The Sample Variance

Average (approximately) of squared deviations of values from the mean

Shape of a Distribution

Describes how data are distributed

Sample

Descriptive statistics discussed previously described a _______, not the population.

Mean

Generally used, unless extreme values (outliers) exist.

Measures of Variation

Give information on the spread or variability or dispersion of the data values.

Most Data Sets

Have a pattern that looks approximately like a bell with a peak of values somewhere in the middle (bell shaped curve).

Population Mean, Population Variance, and Population Standard Deviation

Important population parameters are the _________, __________, and ________

Median

In an ordered array, the _______ is the "middle" number (50% above, 50% below)

Middle Fifty

Interval between Q1 and Q3 sometimes called the _______

Shape

Is described by 2 measures: skewness and kurtosis

Shape

Is either symmetrical or skewed

Q2

Is the median, 50% of values are higher and 50% are lower

The Sample Variance

Takes into account how all the data values are distributed

Left Skewed

Long tail to left caused by extremely low values, pulls down the mean so it is less than the median

Right-Skewed

Long tail to the right caused by extremely high values which pull the mean upward so mean is greater than median

Symmetric

Mean = Median

Left Skewed

The mean is to the left of (less than) the median

Right Skewed

The mean is to the right of (greater than) the median
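The relationship between mean and median described in the skewness cards can be checked on small data sets. The sketch below uses a hypothetical helper and made-up data; comparing the mean to the median only suggests the direction of skew, it is not a formal skewness statistic.

```python
from statistics import mean, median


def mean_vs_median(values):
    """Describe where the mean sits relative to the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "mean > median (suggests right skew)"
    if m < med:
        return "mean < median (suggests left skew)"
    return "mean = median (consistent with symmetry)"


print(mean_vs_median([1, 2, 3, 4, 100]))     # one extremely high value
print(mean_vs_median([1, 97, 98, 99, 100]))  # one extremely low value
print(mean_vs_median([1, 2, 3, 4, 5]))       # evenly spaced values
```

In the first list the single value 100 pulls the mean up to 22 while the median stays at 3, which is the pattern the right-skewed cards describe.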

Resistant Measures

Measures like the median, Q1, Q3, and IQR that are not influenced by outliers are called __________

Mean, Median, and Mode

Measures of central tendency

IQR

Measures the spread in the middle 50% of the data; also called the midspread

Skewness

Measures the amount of asymmetry in a distribution

Kurtosis

Measures the relative concentration of values in the center of a distribution as compared with the tails

Variation

Measures the spread or dispersion of values

Symmetrical Data Sets

Median and mean are the same; a bell-shaped distribution is one example

Mode

Most frequent value

Median

Often used because it is not sensitive to extreme values. For example, median home prices may be reported for a region.

Third Quartile

Only 25% of the observations are greater than the _________

Mean

The only measure of central tendency in which all values play an equal role (which is why outliers affect it)

µ

Population mean

N

Population size

First Quartile

Q1, is the value for which 25% of the observations are smaller and 75% are larger

Second Quartile

Q2 is the same as the median (50% of the observations are smaller and 50% are larger)

The IQR

Q3 - Q1 and measures the spread in the middle 50% of the data

Quartile Measures

Quartiles split the ranked data into 4 segments with an equal number of values per segment

Symmetric

Right and left tails are equal, so mean = median

The Range

Simplest measure of variation: the difference between the largest and the smallest values

Comparing Standard Deviations

Gives an idea of how the data are dispersed around the mean, expressed as a number of standard deviations from the mean

SS

Sum of squares: the numerator of the variance formula, i.e. the sum of all squared differences between the x values and the mean

Parameter

Summary measures describing a population, called _________, are denoted with Greek letters.

Variance and Standard Deviation

The 2 common measures of variation

Midspread

The IQR is also called the ________ because it covers the middle 50% of the data

Larger

The _______ the absolute value of the Z-score, the farther the data value is from the mean.

The Variation

The amount of dispersion or scattering of values

The Central Tendency

The extent to which all the data values group around a typical or central value

The Five Number Summary

The five numbers that help describe the center, spread and shape of data

Median Position

(n + 1)/2, where n is the number of data points. Note that this gives the position in the ranked data, not the value.
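To make the difference between position and value concrete, here is a small sketch that finds the median through its ranked position. The function name is hypothetical; `statistics.median` in the standard library returns the same values.

```python
def median_by_position(values):
    """Median found via the ranked position (n + 1) / 2."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("need at least one value")
    position = (n + 1) / 2           # 1-based position, not the value itself
    lower = int(position)
    if position == lower:
        # Odd n: the position is a whole number, so take that value.
        return ranked[lower - 1]
    # Even n: the position ends in .5, so average the two middle values.
    return (ranked[lower - 1] + ranked[lower]) / 2


print(median_by_position([3, 1, 2]))     # position 2 in the ranked data
print(median_by_position([4, 1, 3, 2]))  # position 2.5 in the ranked data
```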

Z-Score

The number of standard deviations a data value is from the mean.

The Shape

The pattern of the distribution of values from the lowest value to the highest value

Central Tendency, Variation, and Shape

The ways to measure

Z-Score

To compute _____, subtract the mean and divide by the standard deviation.
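As a rough sketch of this calculation, the code below computes Z-scores and flags values whose Z-score is below -3.0 or above +3.0. The function names and the data are illustrative, and the use of the sample mean and sample standard deviation (`statistics.stdev`) is an assumption.

```python
from statistics import mean, stdev


def z_scores(values):
    """Z-score of each value: subtract the mean, divide by the standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (assumption)
    if s == 0:
        raise ValueError("all values are identical, so Z-scores are undefined")
    return [(x - m) / s for x in values]


def extreme_outliers(values, threshold=3.0):
    """Values whose Z-score is less than -threshold or greater than +threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


data = [10] * 19 + [100]       # made-up data with one very large value
print(extreme_outliers(data))  # only the value 100 is flagged
```

With a small sample no value can reach a Z-score of 3, so this rule is only informative for data sets with more than about ten values.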

The Empirical Rule

Used to examine the variability in distributions: in symmetric bell-shaped data, values cluster around the mean and median; in right-skewed data, values cluster to the left of the mean; in left-skewed data, values cluster to the right of the mean

Ignores the way in which the data is distributed

Why the range can be misleading

95%

Approximately ___% of the data in a bell-shaped distribution lies within two standard deviations of the mean (µ ± 2σ), which implies that about 1 of 20 values will be beyond two standard deviations from the mean in either direction

99.7%

____% of the data in a bell-shaped distribution lies within three standard deviations of the mean, or µ ± 3σ
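The Empirical Rule percentages can be checked against an exact normal distribution, taken here as the idealized bell shape (an assumption; real data will only match approximately). The helper name is hypothetical, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist


def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The three shares come out close to 68%, 95% and 99.7%, the figures quoted on the cards.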

Q1

Divides the smallest 25% of values from the other 75%

Q3

Divides the smallest 75% of values from the largest 25%

Population Mean

The sum of the values in the population divided by the population size, N
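Written out with the symbols used on these cards (µ, N, X_i, σ), the population mean and the related population variance and standard deviation take the following standard textbook forms. The notation is shown for reference and is not quoted from the original set.

```latex
\mu = \frac{\sum_{i=1}^{N} X_i}{N}
\qquad
\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}
\qquad
\sigma = \sqrt{\sigma^2}
```

Unlike the sample variance, the population variance divides by N rather than n-1.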

Xi

The ith value of the variable X

Median

The middle value of the ranked data

