Chapter 3 & 4

Sample arithmetic mean

Computed using sample data. The sample mean is a statistic

Median

Divides the lower 50% of the data from the upper 50%. It is a special case of the general concept called the percentile.

Most widely used 3 measures of central tendency

Mean, median, and mode

An insurance company crashed four cars of the same model at 5 miles per hour. The costs of repair for each of the four crashes were $442, $440, $462, and $206. Compute the mean, median, and mode cost of repair.

Mean: $387.50; Median: $441.00; No mode
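
The answer above can be checked with a short Python sketch. It assumes the standard-library `statistics` module and treats a data set in which no value repeats as having no mode, which is this set's convention rather than a universal one.

```python
from collections import Counter
from statistics import mean, median

costs = [442, 440, 462, 206]  # repair costs in dollars, from the card above

mean_cost = mean(costs)      # sum of the values divided by n: 1550 / 4
median_cost = median(costs)  # n is even, so the two middle sorted values are averaged

# Mode: the most frequent value(s). Report no mode when no value
# occurs more than once (the convention used in this set).
counts = Counter(costs)
highest = max(counts.values())
modes = [value for value, count in counts.items() if count == highest] if highest > 1 else []

print(f"Mean: ${mean_cost:.2f}  Median: ${median_cost:.2f}  Mode: {modes or 'none'}")
```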

Quartiles

The most common percentiles. They divide the data into fourths (four equal parts).

Weighted mean formula

Multiplying each value of the variable by its corresponding weight, summing these products, and dividing the result by the sum of the weights.
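
A minimal sketch of this formula in Python follows. The function name `weighted_mean` and the grade and credit-hour numbers are hypothetical, chosen only to illustrate the arithmetic.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical GPA example: grade points weighted by credit hours.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 1]

gpa = weighted_mean(grade_points, credit_hours)  # (12 + 12 + 2) / 8
print(gpa)
```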

What does it mean if r=​0?

No linear relationship exists between the variables.

Dispersion

The degree to which the data are spread out

What does it mean to say that two variables are positively​ associated?

There is a linear relationship between the​ variables, and whenever the value of one variable​ increases, the value of the other variable increases.

Another measure of central tendency is the trimmed mean. It is computed by determining the mean of a data set after deleting the smallest and largest observed values. Compute the trimmed mean for the data given in the accompanying table. Is the trimmed mean resistant to changes in the extreme values in the given​ data?

Trimmed mean: 0.875

True or​ False: When comparing two​ populations, the larger the standard​ deviation, the more dispersion the distribution​ has, provided that the variable of interest from the two populations has the same unit of measure.

True

The sum of all deviations about the mean must equal

Zero

The standard deviation is used in conjunction with the mean to numerically describe distributions that are

bell shaped and symmetric

Exploratory data analysis

Exploring data through summaries; the approach was defined by John Tukey

How to check for outliers

(1) Determine Q1 and Q3 (2) Compute the IQR (3) Determine the fences, which serve as cutoff points for determining outliers - Lower fence = Q1 - 1.5(IQR) - Upper fence = Q3 + 1.5(IQR) (4) If a data value is less than the lower fence or greater than the upper fence, it is an outlier
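
The fence check can be sketched in Python as below, using the standard fences Q1 - 1.5(IQR) and Q3 + 1.5(IQR). The quartiles are passed in as arguments because textbooks and software compute quartiles with slightly different conventions. The data and quartile values are hypothetical.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Values below the lower fence or above the upper fence."""
    lower, upper = fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

# Hypothetical data; Q1 and Q3 are the medians of the lower and upper halves.
data = [12, 15, 16, 18, 19, 21, 22, 60]
q1, q3 = 15.5, 21.5

print(fences(q1, q3))
print(find_outliers(data, q1, q3))
```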

2nd Quartile

Q2. Divides the bottom 50% from the top 50%. Equal to the 50th percentile. Equal to the median of the entire set of data.

3rd Quartile

Q3. Divides the bottom 75% from the top 25%. Equal to the 75th percentile

Z-score

Represents the distance that a data value is from the mean in terms of the number of standard deviations. It is obtained by subtracting the mean from the data value and dividing this result by the standard deviation. There is both a population and sample z-score. It is unitless, with a mean 0 and SD 1.
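
As a rough illustration, the sketch below standardizes a small hypothetical population and checks that the resulting z-scores have mean 0 and standard deviation 1. The helper name `z_score` is an assumption, not something from the course.

```python
from statistics import fmean, pstdev

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population
mu = fmean(data)                 # population mean
sigma = pstdev(data)             # population standard deviation

z_values = [z_score(x, mu, sigma) for x in data]
print(z_values)
print(fmean(z_values), pstdev(z_values))  # mean and SD of the z-scores
```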

Mode

The most frequent observation of the variable that occurs in the data set. Can be qualitative or quantitative data.

What makes the range less desirable than the standard deviation as a measure of​ dispersion?

The range does not use all the observations. The range of a variable is the difference between the largest data value and the smallest data value. The range is less desirable than the standard deviation as a measure of dispersion because it is computed using only two values in the data set​ (the largest and​ smallest).

Is the mean pulse rate of sample 1 (76) an overestimate​ of, an underestimate​ of, or equal to the population​ mean (73.6)?

The sample mean overestimates the population mean

Median

The value that lies in the middle of the data when arranged in ascending order.

The 5th percentile of the weight of males 36 months of age in a certain city is 11.0 kg.

5​% of​ 36-month-old males weigh 11.0 kg or​ less, and 95​% of​ 36-month-old males weigh more than 11.0 kg.

Scatter Diagram

A graph that shows the relationship between 2 quantitative variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram. The explanatory variable is plotted on the horizontal axis, and the response variable is plotted on the vertical axis.

Average

A measure of central tendency that numerically describes the typical data value

A histogram of a set of data indicates that the distribution of the data is skewed right. Which measure of central tendency will likely be​ larger, the mean or the​ median? Why?

The mean will likely be larger because the extreme values in the right tail tend to pull the mean in the direction of the tail.

Weighted Mean

Used when certain data values have a higher importance or weight associated with them. Example: GPA, with the weights equal to the number of credit hours in each course. The value of the variable is equal to the grade converted to a point value

Negatively Associated

When two variables are linearly related and above-average values of one variable are associated with below-average values of the other variable (and vice versa)

Positively Associated

When two variables are linearly related and above-average values of one variable are associated with above-average values of the other. Likewise, below-average values of one variable are associated with below-average values of the other.

Dividing by n results in...

an underestimate of the population variance, so we divide by a smaller number (n-1) to increase the estimate

Since raw data cannot be retrieved from a frequency table, we assume that, within each class, the mean of the data values is equal to

the class midpoint, then multiply the class midpoint by the frequency, and this product is expected to be close to the sum of the data that lie within each class, and repeat the process for each class and sum the results. This sum approximates the sum of all the data
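
The procedure can be sketched as follows. The class midpoints and frequencies are hypothetical, and the result is only an approximation of the mean of the underlying raw data.

```python
# Hypothetical frequency table: one midpoint and one frequency per class.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]

n = sum(frequencies)  # total number of observations

# midpoint * frequency approximates the sum of the data within each class.
approx_sum = sum(x * f for x, f in zip(midpoints, frequencies))

approx_mean = approx_sum / n
print(approx_mean)
```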

Steps in finding the median of a data set

(1) Arrange the data in ascending order (2) Determine the number of observations, n (3) Determine the observation in the middle of the data set - If the number of observations is odd, then the median is the data value that is exactly in the middle of the data set. That is, the median is the observation that lies in the (n + 1)/2 position - If the number of observations is even, then the median is the mean of the 2 middle observations in the data set. That is, the median is the mean of the observations that lie in the n/2 position and the (n/2) + 1 position
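
The steps above can be sketched in Python. Positions in the steps are counted from 1, while Python lists are indexed from 0, hence the `- 1` adjustments. The function name and sample values are hypothetical.

```python
def median_by_position(data):
    """Median found by sorting and locating the middle position(s)."""
    ordered = sorted(data)  # step 1: ascending order
    n = len(ordered)        # step 2: number of observations
    if n == 0:
        raise ValueError("data must not be empty")
    if n % 2 == 1:
        # Odd n: the observation in position (n + 1) / 2.
        return ordered[(n + 1) // 2 - 1]
    # Even n: mean of the observations in positions n/2 and n/2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median_by_position([7, 3, 9]))     # odd number of observations
print(median_by_position([7, 3, 9, 1]))  # even number of observations
```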

Find the population mean or sample mean as indicated. Sample: 23, 14, 1, 6, 21

13

The median for the given set of six ordered data values is 27.5: 7 12 21 __ 41 48. What is the missing value?

34

The following data for a random sample of banks in two cities represent the ATM fees for using another bank's ATM. Compute the range and sample standard deviation for ATM fees for each city. Which city has the most dispersion based on range? Which city has more dispersion based on the standard deviation? City A (1.50, 1.00, 1.50, 1.50, 1.50) City B (2.25, 1.00, 1.75, 0.00, 2.00)

City A range = 0.50; City B range = 2.25; City A SD = 0.22; City B SD = 0.91
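
These values can be reproduced with a short sketch that assumes Python's standard-library `statistics.stdev`, which divides by n - 1 as the sample standard deviation requires. The helper `data_range` is a hypothetical name.

```python
from statistics import stdev

city_a = [1.50, 1.00, 1.50, 1.50, 1.50]
city_b = [2.25, 1.00, 1.75, 0.00, 2.00]

def data_range(values):
    """Largest data value minus smallest data value."""
    return max(values) - min(values)

for name, fees in (("City A", city_a), ("City B", city_b)):
    print(f"{name}: range = {data_range(fees):.2f}, SD = {stdev(fees):.2f}")
```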

Which city has the most dispersion based on standard​ deviation?

City B, because it has a higher standard deviation

Which city has the most dispersion based on​ range?

City B​, because it has a higher range.

Arithmetic mean

Computed by determining the sum of all the values of the variable in the data set and dividing by the number of observations. Generally referred to as the mean

Sample variance

Computed by determining the sum of the squared deviations about the sample mean and dividing the result by n-1.

Population arithmetic mean (μ, pronounced "mew")

Computed using all the individuals in a population. The population mean is a parameter

Outliers

Extreme observations. Should always be checked for in data analysis. When encountered, their origins should be investigated. Can occur by chance, or error in measurement, sampling, and data entry. Are sometimes common within a population, which can cause outliers in sampling.

Multimodal

If a data set has 3 or more data values that occur with the highest frequency

No mode

If no observation occurs more than once

kth percentile

Is a value such that k % of observations are less than or equal to the value

For the histogram on the right determine whether the mean is greater​ than, less​ than, or approximately equal to the median. Justify your answer.

Mean < M because the histogram is skewed left

The following data represent the pulse rates​ (beats per​ minute) of nine students enrolled in a statistics course. Treat the nine students as a population: 60, 63, 65, 69, 71, 77, 82, 86, 89.

Population mean: 73.6

The following data represent the pulse rates​ (beats per​ minute) of nine students enrolled in a statistics course. Treat the nine students as a population. (68, 72, 79, 88, 60, 77, 86, 65, 73)

Population variance: 76.8; Population SD: 8.8
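
A quick check of this answer, assuming the standard-library functions `statistics.pvariance` and `statistics.pstdev`, both of which divide by N because the data are treated as a population:

```python
from statistics import pstdev, pvariance

pulse_rates = [68, 72, 79, 88, 60, 77, 86, 65, 73]  # treated as a population

pop_variance = pvariance(pulse_rates)  # sum of squared deviations divided by N
pop_sd = pstdev(pulse_rates)           # square root of the population variance

print(round(pop_variance, 1), round(pop_sd, 1))
```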

1st Quartile

Q1. Divides the bottom 25% from the top 75%. Equal to the 25th percentile

Interquartile Range

Resistant to extreme values, so it is the preferred measure of dispersion based on quartiles. Is the range of the middle 50% of the observations. IQR = Q3 - Q1. Similar to the SD and range in that the more spread out a set of data is, the higher the IQR will be.

Sample 1: 60, 86, 82; Sample 2: 89, 82, 65

Sample 1 mean: 76; Sample 2 mean: 78.7

Find the sample variance and standard deviation. (23, 12, 6, 7, 10)

Sample variance = 46.3; SD = 6.8
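
The computation can be worked step by step, as in this sketch. It divides the sum of squared deviations about the sample mean by n - 1; the variable names are only illustrative.

```python
from math import sqrt

sample = [23, 12, 6, 7, 10]
n = len(sample)

x_bar = sum(sample) / n  # sample mean: 58 / 5

# Squared deviation of each observation about the sample mean.
squared_deviations = [(x - x_bar) ** 2 for x in sample]

sample_variance = sum(squared_deviations) / (n - 1)  # divide by n - 1, not n
sample_sd = sqrt(sample_variance)

print(round(sample_variance, 1), round(sample_sd, 1))
```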

Determine the sample variance and sample standard deviation of the following two simple random samples of size 3. Sample 1: (86, 65, 88)

Sample variance: 162.3; Sample SD: 12.7

Determine the sample variance and sample standard deviation of the following two simple random samples of size 3. Sample 2: (79, 65, 77)

Sample variance: 57.3; Sample SD: 7.6

n represents

Size of sample

A random sample of 15 college students were asked "How many hours per week typically do you work outside the home?" Their responses are shown on the right. Determine the shape of the distribution of hours worked by drawing a frequency histogram and computing the mean and median. Which measure of central tendency better describes hours worked? (2, 8, 9, 10, 11, 17, 18, 18, 19, 21, 21, 24, 25, 26, 32)

Shape: symmetric; Mean: 17.4; Median: 18. The mean better describes the data.

Explain the circumstances for which the interquartile range is the preferred measure of dispersion. What is an advantage that the standard deviation has over the interquartile​ range?

The interquartile range is preferred when the data are skewed or have outliers. An advantage of the standard deviation is that it uses all the observations in its computation.

Is the mean pulse rate of sample 2 (78.7) an overestimate​ of, an underestimate​ of, or equal to the population​ mean (73.6)?

The sample mean overestimates the population mean

Response Variable

The variable whose value can be explained by the value of the explanatory or predictor variable. It is a dependent variable, while the explanatory variable is independent. Ex: the speed of a golf club head would be the explanatory variable to the distance the golf ball travels, which would be the response variable.

What does it mean to say that the linear correlation coefficient between two variables equals​ 1? What would the scatter diagram look​ like?

When the linear correlation coefficient is​ 1, there is a perfect positive linear relation between the two variables. The scatter diagram would contain points that all lie on a line with a positive slope.

Resistant

When the value of a numerical summary of data is not substantially affected by extreme values (very large or very small)

Population standard deviation

σ; obtained by taking the square root of the population variance

Because the Empirical Rule requires that the distribution be bell shaped, while Chebyshev's Inequality applies to all distributions, the Empirical Rule provides results that are more

precise

3 numerical measures for describing dispersion, or the spread, of data

range, variance, and standard deviation

Sample standard deviation

s; obtained by taking the square root of the sample variance

The most popular methods for numerically describing the distribution of a variable

standard deviation and the mean, because these 2 measures are used for most types of statistical inference

Roman letters are used to represent

statistics

The procedure for approximating the variance and standard deviation from grouped data is similar to

that of finding the mean from grouped data; and because we do not have access to the original data, the variance is approximate

If data have a distribution that is bell shaped, the Empirical Rule can be used to determine

the % of data that will lie within k standard deviations of the mean

The approximate mean from grouped data is

close to, but generally not exactly equal to, the actual mean, because each class is represented by its midpoint

The further an observation is from the mean...

the larger the absolute value of the deviation

When the word average is used in the media, it usually refers to

the mean

We use M to represent

the median

The larger the standard deviation

the more dispersion the distribution has, provided that the variable of interest from the 2 populations has the same unit of measure

The mean measures the center of the distribution, while the standard deviation measures

the spread of the distribution

The Greek letter capital sigma tells us

the terms are to be added

The standard deviation is the typical deviation from the

mean

The median is resistant while the...

mean is not resistant

Symmetric

mean roughly equal to median

Skewed right

mean substantially larger than median

Skewed left

mean substantially smaller than median

Degrees of Freedom

n-1, because the first n-1 observations have freedom to be whatever value they wish, but the nth value has no freedom. It must be whatever value forces the sum of the deviations about the mean to equal zero.
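
A small sketch of this idea: once the mean and the first n - 1 observations are known, the last observation is forced. The numbers reuse the sample 23, 14, 1, 6, 21 (mean 13) from elsewhere in this set; the variable names are hypothetical.

```python
first_values = [23, 14, 1, 6]  # the first n - 1 observations, free to vary
sample_mean = 13
n = len(first_values) + 1

# The nth value must make the total equal n * mean.
last_value = n * sample_mean - sum(first_values)

# With that value in place, the deviations about the mean sum to zero.
deviations = [x - sample_mean for x in first_values + [last_value]]

print(last_value, sum(deviations))
```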

We cannot determine the value of the mean or median of data that are:

nominal (only the mode can be determined)

Variance is based on the

Deviation about the mean

True or​ False: A data set will always have exactly one mode.

False

Bimodal

If a data set has 2 data values that occur with the highest frequency

The histogram on the right represents the connection time in seconds to an internet provider. Determine which measure of central tendency better describes the​ "center" of the distribution.

Median

Biased

Whenever a statistic consistently over or underestimates a parameter

Range

Simplest measure of dispersion. The data must be quantitative. Also written as R. Is the difference between the largest data value and the smallest data value. Range = R = Largest data value - smallest data value. Is affected by extreme values.

N represents

Size of population

Greek letters are used to represent

parameters

Chebyshev's Inequality

used to determine a lower bound on the % of observations that lie within 'k' standard deviations of the mean, where k > 1. The bound is obtained regardless of the basic shape of the distribution (skewed left, right, or symmetric)
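
The bound itself is at least (1 - 1/k^2) * 100% of the observations. A minimal Python sketch, with a hypothetical function name:

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Inequality requires k > 1")
    return (1 - 1 / k**2) * 100

for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1f}%")
```

For a bell-shaped distribution the Empirical Rule gives sharper figures (about 95% within 2 standard deviations), while Chebyshev's bound holds for any shape.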

To obtain an unbiased estimate of population variance...

we divide the sum of the squared deviations about the sample mean by n-1
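
The effect can be seen on a tiny example. The sketch below assumes a hypothetical population of three values and lists every ordered sample of size 2 drawn with replacement; it then averages the variance estimates obtained by dividing by n and by n - 1.

```python
from itertools import product
from statistics import fmean, pvariance, variance

population = [1, 2, 3]                 # hypothetical population
sigma_squared = pvariance(population)  # true population variance, 2/3

# All 9 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

# pvariance divides by n; variance divides by n - 1.
avg_divide_by_n = fmean([pvariance(s) for s in samples])
avg_divide_by_n_minus_1 = fmean([variance(s) for s in samples])

print(sigma_squared, avg_divide_by_n, avg_divide_by_n_minus_1)
```

Averaged over all samples, dividing by n falls below the population variance, while dividing by n - 1 matches it.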

Violent crimes include​ rape, robbery,​ assault, and homicide. The following is a summary of the​ violent-crime rate​ (violent crimes per​ 100,000 population) for all states of a country in a certain year. Q1 =273.8​, Q2 = 388.5​, Q3 = 529.1

​25% of the states have a​ violent-crime rate that is 273.8 crimes per​ 100,000 population or less.​ 50% of the states have a​ violent-crime rate that is 388.5 crimes per​ 100,000 population or less.​ 75% of the states have a​ violent-crime rate that is 529.1 crimes per​ 100,000 population or less. IQR = 255.3 (The middle​ 50% of all observations have a range of 255.3 crimes per​ 100,000 population.)

The 90th percentile of the length of newborn females in a certain city is 54.3 cm.

90% of newborn females have a length of 54.3 cm or less, and 10% of newborn females have a length that is more than 54.3 cm.

Is the trimmed mean resistant to changes in the extreme values for the given​ data?

​Yes, because changing the extreme values does not change the trimmed mean.

