01.02 Describing Data

Examples of Shape

1. Symmetric 2. Skewed right (a graph that has a long tail on the right side of the data set) 3. Skewed left (a graph that has a long tail on the left side of the data set)

If the scores on a 22-point quiz from a class have been gathered and you know the five-number summary for the class is 5, 12, 14, 17, and 21, you can tell the following:

1. The minimum score in the class was 5 points. 2. The first quartile (Q1) is 12. Twenty-five percent of students earned 12 points or less, and 75% earned 12 points or more. 3. The median is 14. Fifty percent of students earned 14 points or less and fifty percent of students earned 14 points or more. 4. The third quartile (Q3) is 17. Seventy-five percent of students earned 17 points or less, and 25% earned 17 points or more. 5. The maximum score in the class was 21 points.

Symmetric

A graph in which the left and right sides are mirror images of each other, either exactly or approximately

Skewed left

A graph that has a long tail on the left side of the data set

Skewed right

A graph that has a long tail on the right side of the data set

Resistant value

A value that is not changed by adding extreme values to the data set

Interquartile Range (IQR)

Another way to measure the spread of a data set; unlike the range, it is resistant because extreme values have little or no effect on it (IQR = Q3 − Q1)

Outliers

Any unusual parts of the data set that do not fit the pattern of the data set. Informally, outliers can be found by looking at the data or graph, but when you use this method to describe the data, you have to say, "the outliers appear to be ...." Formally, an outlier is a point that falls more than 1.5 times the IQR above the third quartile or below the first quartile. Lower limit: outlier < Q1 − 1.5(IQR). Upper limit: outlier > Q3 + 1.5(IQR).

Standard deviation of a population

Calculated by finding an average of the squared deviations and then taking its square root

Standard deviation

Can be used to describe the spread; measures the average distance of the observations from the mean; not a resistant measure of spread

Five-number summary of a distribution

Consists of the smallest number, the first quartile, the median, the third quartile, and the largest number, written in order. The summary is: Minimum, Q1, Median, Q3, Maximum.

Finding first quartile

Find the median of the lower half of the data (the lower half of the data does not include the median of the data)

Finding the third quartile

Find the median of the upper half of the data (the upper half of the data does not include the median of the data)

Resistance

How a measure is influenced by extreme values

Outlier Example (solution)

IQR = Q3 − Q1 = 20 − 10 = 10. Outlier limits: Q1 − 1.5(IQR) = 10 − 1.5(10) = 10 − 15 = −5, and Q3 + 1.5(IQR) = 20 + 1.5(10) = 20 + 15 = 35. Therefore, any value less than −5 or greater than 35 is an outlier. Because 51 is greater than 35, it is an outlier. We indicate the outlier with a dot, and only draw the whisker to the next greatest value, which is 22.
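The 1.5 × IQR rule in this solution can be checked with a few lines of Python. This is only a sketch: the function name `outlier_fences` is made up for illustration, and the salary list is the 15-value data set from Outlier Example (Part 1), with Q1 = 10 and Q3 = 20 taken from the card above.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits; values outside them count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Salary data from Outlier Example (Part 1), including the new value 51.
salaries = [8, 8, 8, 10, 10, 10, 16, 18, 18, 18, 18, 20, 20, 22, 51]

lower, upper = outlier_fences(q1=10, q3=20)
outliers = [x for x in salaries if x < lower or x > upper]

# The upper whisker stops at the largest value that is not an outlier.
whisker_end = max(x for x in salaries if x <= upper)

print(lower, upper)   # -5.0 35.0
print(outliers)       # [51]
print(whisker_end)    # 22
```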

Range

Maximum value − Minimum value

Outlier Example (Part 1)

Mean: To find the new mean, add 51 to the total salaries and add 1 to the number of employees: (204 + 51)/(14 + 1) = 255/15 = 17.00. The mean has increased by nearly $2.50! Median: To find the new median, list the values in ascending order. Now that there are 15 values, the median is the middle (or eighth) value. 8, 8, 8, 10, 10, 10, 16, 18, 18, 18, 18, 20, 20, 22, 51. The new median is 18.00. The median has increased by 1. Standard deviation: Because the mean has changed, the standard deviation calculation will change, as well, to about 10.31. This means that most of the employees should be making between $6.69 and $27.31.
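A quick way to verify these figures is Python's standard `statistics` module. The sketch below assumes the original 14 salaries are the listed values without 51. Note that the quoted 10.31 matches the population formula (`statistics.pstdev`, dividing by n); the sample formula (`statistics.stdev`, dividing by n − 1) would give about 10.67 instead.

```python
import statistics

# The 14 original salaries, then the same list with the new value 51 added.
salaries = [8, 8, 8, 10, 10, 10, 16, 18, 18, 18, 18, 20, 20, 22]
with_outlier = salaries + [51]

for label, data in (("before", salaries), ("after", with_outlier)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.pstdev(data)  # population formula: divide by n
    print(f"{label}: mean={mean:.2f} median={median:.2f} sd={sd:.2f}")

# The "after" line prints: mean=17.00 median=18.00 sd=10.31
```

The mean moves by about 2.43 and the median by 1, which illustrates why the median is called the more resistant measure of center.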

NBA salaries (US$, millions): 17.1, 5.8, 5.0, 4.5, 4.3, 4.2, 3.1, 2.1, 2.0, 1.0, 1.0, 0.8, 0.7, 0.3. Find the five-number summary of the data set.

Minimum: 0.3; Quartile 1: 1.0; Median: 2.6; Quartile 3: 4.5; Maximum: 17.1
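The same summary can be reproduced in code. The sketch below uses the median-of-halves method described in the "Finding first quartile" and "Finding the third quartile" cards; the function name `five_number_summary` is an illustrative choice, and library quartile functions may use other conventions and return slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]               # lower half, excluding the median if n is odd
    upper = data[len(data) - half:]   # upper half, excluding the median if n is odd
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )

nba = [17.1, 5.8, 5.0, 4.5, 4.3, 4.2, 3.1, 2.1, 2.0, 1.0, 1.0, 0.8, 0.7, 0.3]
print(five_number_summary(nba))
```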

Mode

Mode is simply the number in a data set that occurs most often. It is not used frequently at this level of statistics

Variance

The average squared distance from the mean

Center

The center of a data set is described by either the mean or the median of the set of values. Unless the data set is symmetric, the median, rather than the mean, should be used to describe the center, because the median is a resistant measure whereas the mean is not.

Uniform

The data do not appear to have any distinct modes; there are no clear peaks on the graph

Bimodal

The data have exactly two clear modes, shown by two peaks of similar size on the graph

Multimodal

The data have multiple modes, shown by more than two peaks of similar size on the graph

Unimodal

The data set has one clear mode, shown by one peak on the graph

Population

The entire group of individuals about which we want information

Median

The median is the number that falls in the middle when the numbers are arranged in order from least to greatest

Mean

The most common measure of center is the mean, which is the arithmetic average of a set of data

Modes

The number that occurs most frequently in a set; can be used to describe data as the number of peaks represented in a display; peaks represent possible modes

First quartile (Q1)

The point at which 25% of the data is below that point and 75% is above that point

Second quartile (median)

The point at which 50% of the data is below and 50% is above that point

Third quartile (Q3)

The point at which 75% of the data is below and 25% is above that point

Shape

The overall form that a graph of the data takes; the graph may be a histogram, stem-and-leaf plot, dotplot, or boxplot

Spread

The spread of a data set is used to describe the variability in the data. One way to describe spread is to find the range of the data, subtracting the smallest point of data from the largest point of data.

Standard deviation of a sample

The square root of the variance
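The population and sample versions differ only in the divisor used for the variance. The sketch below follows the usual convention (divide by n for a population, by n − 1 for a sample); the function names and the small data set are made up for illustration.

```python
import math

def variance(values, sample=False):
    """Average squared deviation from the mean (n - 1 divisor for a sample)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

def standard_deviation(values, sample=False):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values, mean = 5

print(standard_deviation(data))               # population: 2.0
print(standard_deviation(data, sample=True))  # sample: slightly larger
```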

Percentiles

The values that divide a rank-ordered set of elements into 100 equal parts

Outlier Example (Part 2)

Using a boxplot to represent data graphically can often help you to recognize outliers. In a boxplot, outliers fall significantly below the first quartile, or significantly above the third quartile. We measure the significance according to the interquartile range (IQR), which is Q3 − Q1, and it is another measure of spread. An outlier is a point that falls more than 1.5 times the IQR above the third quartile or below the first quartile. Lower limit: outlier < Q1 − 1.5(IQR). Upper limit: outlier > Q3 + 1.5(IQR).

Does standard deviation describe the spread?

Yes, the more spread out the data, the greater the standard deviation

Is standard deviation based on mean?

Yes, which means that standard deviation is not resistant, because the mean is not resistant

Quartiles

A specific type of percentile

A percentile

A value that describes how one value in a data set compares with all other values in the set

Does the value of the mean influence the magnitude of the standard deviation?

No. Imagine adding 5 to every value in the data set. Would that change the spread of the data? No, but it would change the value of the mean.
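The thought experiment on this card is easy to try. In the sketch below the five data values are made up for illustration; adding 5 to each one moves the mean by 5 but leaves the standard deviation where it was.

```python
import math
import statistics

data = [8, 10, 16, 18, 22]            # illustrative values
shifted = [x + 5 for x in data]       # add 5 to every value

mean_change = statistics.mean(shifted) - statistics.mean(data)
sd_before = statistics.pstdev(data)
sd_after = statistics.pstdev(shifted)

print(math.isclose(mean_change, 5))          # True: the mean shifts by 5
print(math.isclose(sd_before, sd_after))     # True: the spread is unchanged
```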

Does standard deviation depend on the size of the data set?

No. Because it is the average distance from the mean, adding more values does not necessarily change the standard deviation.
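One way to see this is to repeat every value in a data set, which doubles its size without changing how far the values sit from the mean. The sketch below uses made-up values and the population formula (`statistics.pstdev`); with the sample formula the two results would differ slightly because of the n − 1 divisor.

```python
import math
import statistics

data = [8, 10, 16, 18, 22]   # illustrative values
doubled = data * 2           # same values, twice as many observations

sd_small = statistics.pstdev(data)
sd_large = statistics.pstdev(doubled)

print(len(data), len(doubled))            # 5 10
print(math.isclose(sd_small, sd_large))   # True: size alone does not change it
```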

Mathematically, how are outliers found using the interquartile range?

Outlier < Q1 − 1.5(IQR) → an outlier includes anything less than this value AND Outlier > Q3 + 1.5(IQR) → an outlier includes anything greater than this value.

Sample

Part of the population from which information is collected; used to draw conclusions about the entire population

SOCS

Shape, Outliers, Center, Spread

