Statistics

Mean/median graphs

(a) When a data set is skewed to the right, the mean is generally greater than the median. (b) When a data set is approximately symmetric, the mean and median will be approximately equal. (c) When a data set is skewed to the left, the mean is generally less than the median.
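The relationship between skewness and the mean/median ordering can be checked on small data sets. The sketch below uses made-up values (they are illustrative assumptions, not data from the text) and Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one is very large.
right_skewed = [1, 2, 2, 3, 3, 4, 20]
# Mirror image of the data above, so the long tail is on the left.
left_skewed = [21 - x for x in right_skewed]
# Hypothetical symmetric data.
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, data in [("right-skewed", right_skewed),
                   ("symmetric", symmetric),
                   ("left-skewed", left_skewed)]:
    print(name, "mean =", mean(data), "median =", median(data))
```

The extreme value pulls the mean toward the tail, while the median stays near the bulk of the data.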

Re-read about z scores in chapter 3

A National Center for Health Statistics study states that the mean height for adult men in the United States is μ = 69.4 inches, with a standard deviation of σ = 3.1 inches. The mean height for adult women is μ = 63.8 inches, with a standard deviation of σ = 2.8 inches. Who is taller relative to their gender, a man 73 inches tall, or a woman 68 inches tall? Solution We compute the z-scores for the two heights using z = (x - μ)/σ. For the man, z = (73 - 69.4)/3.1 = 1.16. For the woman, z = (68 - 63.8)/2.8 = 1.50. The height of the 73-inch man is 1.16 standard deviations above the mean height for men. The height of the 68-inch woman is 1.50 standard deviations above the mean height for women. Therefore, the woman is taller, relative to the population of women, than the man is, relative to the population of men.
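The z-score comparison in this example can be reproduced with a few lines of Python. The function name `z_score` is an illustrative choice; the means and standard deviations are the ones stated in the problem.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (or below) the mean mu."""
    return (x - mu) / sigma

z_man = z_score(73, mu=69.4, sigma=3.1)
z_woman = z_score(68, mu=63.8, sigma=2.8)

print(round(z_man, 2), round(z_woman, 2))  # prints: 1.16 1.5
```

Because the z-score for the woman is larger, her height is further above her group's mean in standard-deviation units.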

EXAMPLE 1.2 Determining whether a sample is a simple random sample

A physical education professor wants to study the physical fitness levels of students at her university. There are 20,000 students enrolled at the university, and she wants to draw a sample of size 100 to take a physical fitness test. She obtains a list of all 20,000 students, numbered from 1 to 20,000. She uses a computer random number generator to generate 100 random integers between 1 and 20,000, then invites the 100 students corresponding to those numbers to participate in the study. Is this a simple random sample? Solution Yes, this is a simple random sample because any group of 100 students would have been equally likely to have been chosen.

Resistant

A statistic is resistant if its value is not affected much by extreme values (large or small) in the data set.

Read 3.11 from ebook

Computing the population variance

Class width

Class width = (Largest data value - Smallest data value) / Number of classes
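A minimal sketch of the class width calculation follows. The numbers passed in are hypothetical, and the function name `class_width` is an assumption for illustration. Note that the subtraction is done before the division.

```python
def class_width(largest, smallest, number_of_classes):
    """(largest - smallest) divided by the number of classes."""
    return (largest - smallest) / number_of_classes

# Hypothetical data set whose values run from 30 to 79, split into 7 classes.
width = class_width(largest=79, smallest=30, number_of_classes=7)
print(width)  # 49 / 7 = 7.0
```

In practice the result is often rounded up to a convenient number; that rounding step is common practice and is not part of the formula on this card.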

Qualitative variables come in two types

Ordinal: a qualitative variable whose categories have a natural ordering. The letter grade received in a class, such as A, B, C, D, or F, is an ordinal variable. Nominal: a qualitative variable whose categories have no natural ordering.

Sometimes it is desirable to construct a bar graph in which the categories are presented in order of frequency or relative frequency, with the largest frequency or relative frequency on the left and the smallest one on the right. Such a graph is called a

Pareto chart
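The ordering used by a Pareto chart can be sketched in text form. The survey responses below are hypothetical, and the bars are drawn with `#` characters rather than a plotting library.

```python
from collections import Counter

# Hypothetical responses to "How do you commute?"
responses = ["bus", "car", "car", "bike", "car",
             "bus", "walk", "car", "bus", "car"]

# most_common() returns (category, frequency) pairs, largest frequency first.
pareto_order = Counter(responses).most_common()

for category, frequency in pareto_order:
    print(f"{category:<6}{'#' * frequency}")
```

In a drawn Pareto chart, the first category printed here would be the leftmost bar.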

Median

Procedure for Computing the Median Step 1: Arrange the data values in increasing order. Step 2: Determine the number of data values, n. Step 3: If n is odd: The median is the middle number. If n is even: The median is the average of the middle two numbers.
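The three-step procedure above translates directly into code. This is one possible sketch; the function name `compute_median` is an illustrative choice.

```python
def compute_median(values):
    """Median following the three-step procedure."""
    ordered = sorted(values)          # Step 1: arrange in increasing order
    n = len(ordered)                  # Step 2: count the data values
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:                    # Step 3, n odd: the middle number
        return ordered[middle]
    # Step 3, n even: average of the two middle numbers
    return (ordered[middle - 1] + ordered[middle]) / 2

print(compute_median([5, 1, 3]))            # odd n
print(compute_median([3, 4, 6, 5, 4, 2]))   # even n
```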

Quantitative variables can be either

discrete or continuous

Relative frequency distributions

Relative frequency= Frequency/Sum of all frequencies
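As a small illustration of the formula, the sketch below builds a relative frequency distribution from a list of letter grades. The grades are hypothetical data invented for the example.

```python
from collections import Counter

# Hypothetical letter grades for ten students.
grades = ["A", "B", "B", "C", "B", "A", "C", "B", "D", "B"]

frequencies = Counter(grades)              # frequency of each category
total = sum(frequencies.values())          # sum of all frequencies

# Relative frequency = frequency / sum of all frequencies
relative_frequencies = {grade: count / total
                        for grade, count in frequencies.items()}

for grade in sorted(relative_frequencies):
    print(grade, frequencies[grade], relative_frequencies[grade])
```

The relative frequencies always add up to 1, apart from floating-point rounding.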

EXAMPLE 1.13 Distinguishing between discrete and continuous variables Which of the following variables are discrete and which are continuous? The age of a person at his or her last birthday The height of a person The number of siblings a person has The distance a person commutes to work

Solution Age at a person's last birthday is discrete. The possible values are 0, 1, 2, and so forth. Height is continuous. A person's height is not restricted to any list of values. Number of siblings is discrete. The possible values are 0, 1, 2, and so forth. Distance commuted to work is continuous. It is not restricted to any list of values.

Which of the following variables are qualitative and which are quantitative? A person's age A person's place of birth The mileage (in miles per gallon) of a car The color of a car

Solution Age is quantitative. It tells how much time has elapsed since the person was born. Place of birth is qualitative. Mileage is quantitative. It tells how many miles a car will go on a gallon of gasoline. Color is qualitative.

Distinguishing between ordinal and nominal variables Which of the following variables are ordinal and which are nominal? State of residence Gender Letter grade in a statistics class (A, B, C, D, or F) Size of soft drink ordered at a fast-food restaurant (small, medium, or large)

Solution State of residence is nominal. There is no natural ordering to the states. Gender is nominal. Letter grade in a statistics class is ordinal. The order, from high to low, is A, B, C, D, F. Size of soft drink is ordinal.

Compute the range of a data set

Solution The largest value for San Francisco is 63 and the smallest is 51. The range for San Francisco is 63 - 51 = 12. The largest value for St. Louis is 79 and the smallest is 30. The range for St. Louis is 79 - 30 = 49. The range is much larger for St. Louis, which indicates that the spread in the temperatures is much greater there.

Continuous variables

can, in principle, take on any value within some interval. For example, height is a continuous variable because someone's height can be 68, or 68.1, or 68.1452389 inches. The possible values for height are not restricted to a list.

Standard deviation on calculator

Sx (the sample standard deviation)

Deviation

The difference between a population value, x, and the population mean, μ, is x-μ. This difference is called a deviation. Values less than the mean will have negative deviations, and values greater than the mean will have positive deviations. If we were simply to add the deviations, the positive and the negative ones would cancel out. So we square the deviations to make them all positive. Data sets with a lot of spread will have many large squared deviations, while those with less spread will have smaller squared deviations. The average of the squared deviations is the population variance.
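The description above (find each deviation, square it, then average the squares) can be sketched as follows. The population values are hypothetical, and `population_variance` is an illustrative name.

```python
def population_variance(values):
    """Average of the squared deviations from the population mean."""
    n = len(values)
    mu = sum(values) / n                               # population mean
    squared_deviations = [(x - mu) ** 2 for x in values]
    return sum(squared_deviations) / n                 # divide by N for a population

# Hypothetical population of eight values with mean 5.
population = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(population))    # squared deviations sum to 32; 32 / 8 = 4.0

# When every value is the same, all deviations are 0 and so is the variance.
print(population_variance([3, 3, 3]))
```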

Computing the standard deviation

The lifetimes, in hours, of six batteries (first presented in Example 3.12) were 3, 4, 6, 5, 4, and 2. Find the standard deviation of the battery lifetimes. Solution In Example 3.12, we computed the sample variance to be s² = 2. The sample standard deviation is therefore s = √(s²) = √2 = 1.414.
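The battery calculation can be verified numerically. The sketch below recomputes the sample variance from the six lifetimes, dividing by n - 1 as is standard for a sample, and then takes the square root.

```python
from math import sqrt

lifetimes = [3, 4, 6, 5, 4, 2]    # battery lifetimes in hours, from the example

n = len(lifetimes)
x_bar = sum(lifetimes) / n                                          # sample mean
sample_variance = sum((x - x_bar) ** 2 for x in lifetimes) / (n - 1)
s = sqrt(sample_variance)                                           # sample standard deviation

print(sample_variance, round(s, 3))  # prints: 2.0 1.414
```

The variance is in squared hours, while the standard deviation is back in hours.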

CAUTION

The population variance will never be negative. It will be equal to zero if all the values in a population are the same. Otherwise, the population variance will be positive.

EXAMPLE 1.3 Determining whether a sample is a simple random sample

The professor in Example 1.2 now wants to draw a sample of 50 students to fill out a questionnaire about which sports they play. The professor's 10:00 a.m. class has 50 students. She uses the first 20 minutes of class to have the students fill out the questionnaire. Is this a simple random sample? Solution No. A simple random sample is like a lottery, in which each student in the population has an equal chance to be part of the sample. In this case, only the students in a particular class had a chance to be in the sample.

Range

The range of a data set is the difference between the largest value and the smallest value. Range = Largest value - Smallest value

EXAMPLE 1.1 Choosing a simple random sample

There are 300 employees in a certain company. The Human Resources department wants to draw a simple random sample of 20 employees to fill out a questionnaire about their attitudes toward their jobs. Describe how technology can be used to draw this sample. Solution Step 1: Make a list of all 300 employees, and number them from 1 to 300. Step 2: Use a random number generator on a computer or a calculator to generate 20 random numbers between 1 and 300. The employees who correspond to these numbers comprise the sample.
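The two steps in this solution can be sketched with Python's standard `random` module. Using `random.sample`, which draws without replacement so that no employee number repeats, is an assumption about how the random number generator should be used; the example itself only says to generate 20 random numbers.

```python
import random

# Step 1: number the employees from 1 to 300.
employee_numbers = range(1, 301)

# Step 2: draw 20 distinct numbers; every group of 20 is equally likely.
sample = random.sample(employee_numbers, k=20)

print(sorted(sample))
```

The output differs on every run, since the selection is random.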

The Standard Deviation

There is a problem with using the variance as a measure of spread. Because the variance is computed using squared deviations, the units of the variance are the squared units of the data. For example, in Example 3.12, the units of the data are hours, and the units of variance are squared hours. In most situations, it is better to use a measure of spread that has the same units as the data. We do this simply by taking the square root of the variance. The quantity thus obtained is called the standard deviation. The standard deviation of a sample is denoted s, and the standard deviation of a population is denoted σ.

EXAMPLE 1.4 In a simple random sample, all samples are equally likely

To play the Colorado Lottery Lotto game, you must select six numbers from 1 to 42. Then lottery officials draw a simple random sample of six numbers from 1 to 42. If your six numbers match the ones in the simple random sample, you win the jackpot. Sally plays the lottery and chooses the numbers 1, 2, 3, 4, 5, 6. Her friend George says that this isn't a good choice, since it is very unlikely that a random sample will turn up the first six numbers. Is he right? Solution No. It is true that the combination 1, 2, 3, 4, 5, 6 is unlikely, but every other combination is equally unlikely. In a simple random sample of size 6, every collection of six numbers is equally likely (or equally unlikely) to come up. So Sally has the same chance as anyone to win the jackpot.

Variance

When a data set has a small amount of spread, like the San Francisco temperatures, most of the values will be close to the mean. When a data set has a larger amount of spread, more of the data values will be far from the mean. The variance is a measure of how far the values in a data set are from the mean, on the average. We will describe how to compute the variance of a population.

z-Scores and the Empirical Rule

When a population has a histogram that is approximately bell-shaped, then Approximately 68% of the data will have z-scores between -1 and 1. Approximately 95% of the data will have z-scores between -2 and 2. All, or almost all, of the data will have z-scores between -3 and 3. It is best, therefore, to use z-scores only for populations that are approximately bell-shaped.
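The percentages in the Empirical Rule can be checked against an exactly normal distribution, which is an idealized stand-in for an "approximately bell-shaped" population (that idealization is an assumption of this sketch). It uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal population with z-scores between -k and k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

The three proportions come out close to 68%, 95%, and 99.7%, matching the rule.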

Qualitative variables

also called categorical variables, classify individuals into categories. For example, college major and gender are qualitative variables.

Quantitative variables

are numerical and tell how much or how many of something there is. Height and score on an exam are examples of quantitative variables.

Discrete variables

are those whose possible values can be listed. Often, discrete variables result from counting something, so the possible values of the variable are 0, 1, 2, and so forth.

simple random sampling

every member of the population has an equal chance of being selected, and every possible sample of size n is equally likely to be chosen.

pie chart

is an alternative to the bar graph for displaying relative frequency information

population

is the entire collection of individuals about which information is sought.

Statistics

is the study of procedures for collecting, describing, and drawing conclusions from information.

DO check your understanding on page 13

mmkayy

is the entire collection of individuals about which information is sought. Example: if you wanted to study the weight of adult women, the population would be all the adult women in the world.

population

Variables can be divided into two types

qualitative and quantitative

Sample mean

x̄

Population mean

μ

population standard deviation

σ

