Statistics Ch 3 Review


Variance

of a variable, is the square of the standard deviation

Comparing two populations

If we are comparing two populations, then the larger the standard deviation, the more dispersion the distribution has, provided that the variable of interest from the two populations has the same unit of measure

Summary

This chapter concentrated on describing distributions numerically. Measures of central tendency are used to indicate the typical value in a distribution. Three measures of central tendency were discussed. The mean measures the center of gravity of the distribution. The data must be quantitative to compute the mean. The median separates the bottom 50% of the data from the top 50%. The data must be at least ordinal to compute the median. The mode measures the most frequent observation. The data can be either quantitative or qualitative to compute the mode. The median is resistant to extreme values, while the mean is not. Measures of dispersion describe the spread in the data. The range is the difference between the highest and lowest data values. The standard deviation is based on the average squared deviation about the mean. The variance is the square of the standard deviation. The range, standard deviation, and variance are not resistant. The mean and standard deviation are used in many types of statistical inference. The mean, median, and mode can be approximated from grouped data. The standard deviation can also be approximated from grouped data. We can determine the relative position of an observation in a data set using z-scores and percentiles. A z-score denotes how many standard deviations an observation is from the mean. Percentiles determine the percent of observations that lie above and below an observation. The interquartile range is a resistant measure of dispersion. The upper and lower fences can be used to identify potential outliers. Any potential outlier must be investigated to determine whether it was the result of a data-entry error or some other error in the data-collection process, or is an unusual value in the data set. The five-number summary provides an idea about the center and spread of a data set through the median and the interquartile range.
The length of the tails in the distribution can be determined from the smallest and largest data values. The five-number summary is used to construct boxplots. Boxplots can be used to describe the shape of the distribution and to visualize outliers.

Biased

Whenever a statistic consistently underestimates a parameter, it is said to be this. To obtain an unbiased estimate of the population variance, we divide the sum of squared deviations about the sample mean by n-1. Ex.) Suppose you work for a carnival in which you must guess a person's age. After 20 people come to your booth, you notice that you have a tendency to underestimate people's ages (you guess too low). What would you do about this? Your guesses were originally biased, so to remove the bias, you increase your guess. This is what dividing by n-1 in the sample variance formula accomplishes.
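The effect of the divisor can be checked by brute force. The sketch below assumes a hypothetical three-value population (chosen only to keep the arithmetic small) and enumerates every ordered sample of size 2 drawn with replacement; the function name `average_estimate` is illustrative and not from the chapter.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population, used only for illustration.
population = [1, 2, 3]
N = len(population)
mu = Fraction(sum(population), N)
sigma_sq = sum((x - mu) ** 2 for x in population) / N  # population variance

def average_estimate(divisor_offset):
    """Average variance estimate over all ordered samples of size 2,
    dividing the sum of squared deviations by n - divisor_offset."""
    samples = list(product(population, repeat=2))
    total = Fraction(0)
    for sample in samples:
        n = len(sample)
        xbar = Fraction(sum(sample), n)
        ss = sum((x - xbar) ** 2 for x in sample)
        total += ss / (n - divisor_offset)
    return total / len(samples)

divide_by_n = average_estimate(0)          # too small on average (biased)
divide_by_n_minus_1 = average_estimate(1)  # equals the population variance

print(sigma_sq, divide_by_n, divide_by_n_minus_1)  # 2/3 1/3 2/3
```

In this small case, dividing by n gives estimates that average to half the true variance, while dividing by n-1 averages to exactly the population variance.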

Caution: mean

Whenever you hear the word average, be aware that the word may not always be referring to the mean. One average could be used to support one position, while another average could be used to support a different position

Resistant

a numerical summary of data is said to be this, if extreme values (very large or small) relative to the data do not affect its value substantially

Nominal data

are qualitative data that cannot be written in any meaningful order. We cannot determine the mean or median of data of this kind. The only measure of central tendency that can be determined for it is the mode

Quartiles

divide data sets into fourths, or four equal parts

Outliers

extreme observations are referred to as these. These distort both the mean and the standard deviation, because neither is resistant. Because these measures often form the basis for most statistical inference, any conclusions drawn from a set of data that contains these can be flawed

Note

if a data set has many observations that are "far" from the mean, the sum of the squared deviations will be large, and therefore the standard deviation will be large

Multimodal

if a data set has three or more data values that occur with the highest frequency, the data set is said to be this

Bimodal

if a data set has two modes

Using the Empirical Rule

if data have a distribution that is bell shaped, the Empirical Rule can be used to determine the percentage of data that will lie within k standard deviations of the mean
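The percentages usually quoted for the Empirical Rule (roughly 68%, 95%, and 99.7% of the data within 1, 2, and 3 standard deviations of the mean) can be checked against an exactly normal curve. This is a sketch using Python's standard `statistics.NormalDist`; it assumes a perfectly normal distribution, so real bell-shaped data will only match approximately. The helper name `percent_within` is illustrative.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def percent_within(k):
    """Percent of a normal distribution lying within k standard deviations of the mean."""
    return 100 * (standard_normal.cdf(k) - standard_normal.cdf(-k))

for k in (1, 2, 3):
    print(k, round(percent_within(k), 2))
```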

No mode

if no observation occurs more than once, we say the data has this

Population variance

is σ² (sigma squared), the square of the population standard deviation

Exploratory data analysis

is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task

Class midpoint

is found by adding consecutive lower class limits and dividing the results by 2
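A minimal sketch of the midpoint rule, assuming hypothetical classes 0-9, 10-19, 20-29, and 30-39 (the class limits are made up for illustration):

```python
# Hypothetical lower class limits. The last entry (40) is the lower limit
# of the class that would follow 30-39; it is needed for the final midpoint.
lower_limits = [0, 10, 20, 30, 40]

# Midpoint of a class = (its lower limit + the next class's lower limit) / 2
midpoints = [(a + b) / 2 for a, b in zip(lower_limits, lower_limits[1:])]

print(midpoints)  # [5.0, 15.0, 25.0, 35.0]
```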

Sample variance

is s² (s squared), the square of the sample standard deviation

Identifying the shape of a distribution from a boxplot (or from a histogram)

is subjective. When identifying the shape of a distribution from a graph, be sure to support your opinion.

Dispersion

is the degree to which the data are spread out

Population standard deviation

of a variable is the square root of the sum of squared deviations about the population mean divided by the number of observations in the population, N. That is, it is the square root of the mean of the squared deviations about the population mean.

Arithmetic mean

of a variable, is computed by adding all the values of the variable in the data set and dividing by the number of observations

Range (R)

of a variable, is the difference between the largest and the smallest data value. That is, Range = R = largest data value - smallest data value

Mode

of a variable, is the most frequent observation of the variable that occurs in the data set

Median

of a variable, is the value that lies in the middle of the data when arranged in ascending order. We use M to represent this.

Sample standard deviation

s, of a variable is the square root of the sum of squared deviations about the sample mean divided by n-1, where n is the sample size
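The sample and population standard deviations differ only in the divisor (n-1 versus N). The sketch below computes both from the definitions and cross-checks them against Python's standard `statistics` module; the data values and the helper name `std_dev` are hypothetical.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

def std_dev(values, sample):
    """Square root of the sum of squared deviations about the mean,
    divided by n-1 (sample=True) or by N (sample=False)."""
    mean = sum(values) / len(values)
    ss = sum((x - mean) ** 2 for x in values)
    divisor = len(values) - 1 if sample else len(values)
    return math.sqrt(ss / divisor)

sigma = std_dev(data, sample=False)  # data treated as the whole population
s = std_dev(data, sample=True)       # data treated as a sample

# Cross-check against the standard library.
assert math.isclose(sigma, statistics.pstdev(data))
assert math.isclose(s, statistics.stdev(data))

print(sigma, s)
```

For the same data, s is always a little larger than the population value because it divides by a smaller number.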

Mean

the arithmetic mean is generally referred to as this

Describe the distribution

this will mean to describe its shape (skewed left, skewed right, symmetric), its center (mean or median), and its spread (standard deviation or interquartile range)

Population arithmetic mean

μ (pronounced "mew"), is computed using all the individuals in a population. This is also a parameter.

Standard deviation

uses all the data values in the computations

Degrees of freedom

we call n-1 this, because the first n-1 observations have freedom to be whatever they wish, but the nth value has no freedom. It must be whatever value forces the sum of the deviations about the mean to equal zero
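A short sketch of the "no freedom" idea, using a hypothetical sample of four values: once the mean and the first n-1 deviations are known, the last deviation is forced.

```python
data = [3, 7, 8, 12]  # hypothetical sample
n = len(data)
mean = sum(data) / n  # 7.5

# Suppose only the first n-1 deviations about the mean are known.
known_deviations = [x - mean for x in data[:-1]]

# All n deviations must sum to zero, so the last one has no freedom.
forced_last_deviation = -sum(known_deviations)

print(forced_last_deviation, data[-1] - mean)  # 4.5 4.5
```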

What do we use to represent statistics and parameters?

we usually use Greek letters to represent parameters and Roman letters to represent statistics

Steps in Finding the Median of a Data Set

1.) Arrange the data in ascending order
2.) Determine the number of observations, n
3.) Determine the observation in the middle of the data set
If the number of observations is odd, then the median is the data value exactly in the middle of the data set. That is, the median is the observation that lies in the (n+1)/2 position.
If the number of observations is even, then the median is the mean of the two middle observations in the data set. That is, the median is the mean of the observations that lie in the n/2 position and the (n/2)+1 position.
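The steps above can be sketched in Python as follows. Positions in the steps are counted from 1, while Python lists are indexed from 0, so each position is shifted down by one; the function name and sample values are illustrative.

```python
def median(values):
    """Median following the three steps: sort, count, pick the middle."""
    ordered = sorted(values)  # step 1: ascending order
    n = len(ordered)          # step 2: number of observations
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: position (n+1)/2 is list index n // 2.
        return ordered[middle]
    # Even n: positions n/2 and (n/2)+1 are indices middle-1 and middle.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```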

Three numerical measures for describing the dispersion of data

1.) Range 2.) Standard deviation 3.) Variance

Sample arithmetic mean

x̄ (x with a line above it, pronounced "x-bar"), is computed using sample data. This is a statistic.

