Chapter 2: Methods of describing data

What is the Empirical rule to describe data?

The Empirical rule applies only to mound-shaped, symmetric data sets. It says that:
a. ≈ 68% of the data falls within 1 s.d. of the mean
b. ≈ 95% of the data falls within 2 s.d. of the mean
c. ≈ 99.7% (essentially all) of the data falls within 3 s.d. of the mean
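
As a rough check, the Empirical-rule percentages are the proportions for an exactly normal (bell-shaped) distribution, which is the assumption this sketch makes. For such a distribution, the fraction within k standard deviations of the mean is erf(k/√2). The function name is illustrative.

```python
import math

def normal_within_k_sd(k: float) -> float:
    """Fraction of an exactly normal distribution lying within k s.d. of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints about 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3.
    print(f"within {k} s.d.: {normal_within_k_sd(k):.4f}")
```

Real mound-shaped data only approximate these values, which is why the rule is stated with ≈.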

What is the mean?

The mean is the sum of measurements divided by the number of measurements contained in the data set. It represents the average (or typical) value of the data set.
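
In symbols, x̄ = ∑xᵢ / n. A minimal Python sketch of that formula (the data values are made up for illustration):

```python
def sample_mean(data: list[float]) -> float:
    """x̄ = (sum of the measurements) / (number of measurements)."""
    if not data:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(data) / len(data)

print(sample_mean([2, 4, 4, 5, 10]))  # 25 / 5 = 5.0
```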

What are the measures of central tendency?

The measures of central tendency are the mean, median and mode

What are the measures of variability?

The measures of variability are range, variance, standard deviation and interquartile range

What is the pth percentile?

The pth percentile is a number such that p% of the measurements fall below the pth percentile, and (100-p)% fall above it
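
The definition can be checked by counting: a value x sits at roughly the pth percentile when p% of the measurements are below it. The sketch below computes that percentage for a hypothetical set of exam scores. Textbooks and software use several slightly different conventions for computing percentiles from data, so this is one simple reading of the definition, not a standard algorithm.

```python
def percent_below(data: list[float], x: float) -> float:
    """Percentage of measurements strictly below x."""
    if not data:
        raise ValueError("need at least one measurement")
    return 100 * sum(1 for v in data if v < x) / len(data)

scores = [52, 60, 61, 68, 70, 74, 77, 81, 88, 95]
# 7 of the 10 scores are below 81, so 81 is at about the 70th percentile.
print(percent_below(scores, 81))  # 70.0
```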

What are the pros and cons of using a range?

The range is easy to compute and understand, but it is an insensitive measure of data variation: it depends only on the two most extreme values, so two data sets can have the same range yet differ greatly with respect to data variation.
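
For example, the two small made-up data sets below have the same range but different variances, so the range alone hides the difference in spread. This uses Python's standard `statistics.variance` (the sample variance).

```python
from statistics import variance

a = [1, 5, 5, 5, 9]  # values bunched at the center
b = [1, 1, 5, 9, 9]  # values pushed toward the extremes

for name, data in (("a", a), ("b", b)):
    data_range = max(data) - min(data)
    print(name, "range:", data_range, "sample variance:", variance(data))
# Both ranges are 8, but the sample variances are 8 and 16.
```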

What are the notations for sample mean and the population mean?

The sample mean is denoted by x bar (x̄) and the population mean is denoted by mu (μ)

How are dot plots used to describe quantitative data?

The x-axis in a dot plot acts as a scale for a quantitative variable (e.g., a percentage). Each dot represents a single observation; when more than one observation has the same value, the dots are stacked on top of each other.

How are histograms used to describe quantitative data?

The x-axis on a histogram shows class intervals that each have the same width, while the y-axis shows either the class frequency or class relative frequency
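
Building a histogram starts with counting how many observations fall into each class. The sketch below assumes equal-width classes that include their lower endpoint and exclude their upper endpoint (a common convention, but an assumption here); the start, width, and data are hypothetical.

```python
def class_frequencies(data, start, width, num_classes):
    """Count observations in the classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * num_classes
    for x in data:
        i = int((x - start) // width)  # index of the class containing x
        if not 0 <= i < num_classes:
            raise ValueError(f"{x} lies outside the classes")
        counts[i] += 1
    return counts

data = [12, 15, 18, 21, 22, 27, 33, 38]
freqs = class_frequencies(data, start=10, width=10, num_classes=3)
rel_freqs = [f / len(data) for f in freqs]
print(freqs)      # [3, 3, 2] for classes [10, 20), [20, 30), [30, 40)
print(rel_freqs)  # [0.375, 0.375, 0.25]
```

Drawing these counts (or relative frequencies) as adjacent bars over the classes gives the histogram.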

What are quartiles? List them

These are percentiles that split a data set into four quarters - The lower quartile (QL, 25th percentile), the middle quartile (QM, 50th percentile), and the upper quartile (QU, 75th percentile)
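
Python's standard library can compute quartiles with `statistics.quantiles(data, n=4)`, which returns the three cut points QL, QM and QU. Its default ("exclusive") method interpolates at position (n + 1)p in the ordered data, so results may differ slightly from a textbook or calculator that uses another convention. The data here are hypothetical.

```python
import statistics

data = [13, 1, 9, 3, 11, 5, 7]  # order does not matter; quantiles() sorts internally
q_l, q_m, q_u = statistics.quantiles(data, n=4)
print(q_l, q_m, q_u)      # 3.0 7.0 11.0
print("IQR:", q_u - q_l)  # IQR: 8.0
```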

What is meant by measures of relative standing and list the types

Measures of relative standing describe where a value lies relative to the rest of the data set. They include i) percentiles and ii) z-scores.

What are the symbols for variance and standard deviation in both sample and population parameters?

Sample: variance s^2, standard deviation s. Population: variance σ^2, standard deviation σ.

What is variance and how do you calculate it?

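Variance is a measure of dispersion that captures how the measurements vary about the mean. The sample variance is s^2 = ∑(xᵢ − x̄)^2 / (n − 1), i.e., the sum of the squared deviations from the mean divided by n − 1. (The population variance σ^2 divides the sum of squared deviations from μ by N instead.)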
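
A direct translation of the sample-variance formula into Python, as a minimal sketch (the data are made up); the standard library's `statistics.variance` computes the same quantity.

```python
def sample_variance(data: list[float]) -> float:
    """s^2 = sum of (x_i - x̄)^2, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two measurements")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

# Mean is 5; squared deviations 9 + 1 + 1 + 0 + 25 = 36; 36 / 4 = 9.0
print(sample_variance([2, 4, 4, 5, 10]))  # 9.0
```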

What are some ways to describe quantitative data?

We can use three graphical methods to describe quantitative data, namely: 1. dot plots, 2. stem-and-leaf displays, and 3. histograms.

What are z-scores?

Z-scores specify the relative location of a measurement and they make use of the mean and the standard deviation of the data set in order to do so

Calculator hack to find standard deviation

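1) Set the mode to STAT.
2) Press 1-VAR.
3) Input all the values, then SHIFT 1 → Data → AC → SHIFT 1 → Var.
4) Read off the standard deviation: σx is the population standard deviation σ (divides by n), while sx is the sample standard deviation s (divides by n − 1).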
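
The same two quantities can be checked with Python's standard `statistics` module: `stdev` divides by n − 1 (the sample standard deviation s, shown as sx on the calculator), while `pstdev` divides by n (the population standard deviation σ, shown as σx). The data are hypothetical.

```python
import statistics

data = [2, 4, 4, 5, 10]
print("sample s (sx):", statistics.stdev(data))       # sqrt(36 / 4) = 3.0
print("population σ (σx):", statistics.pstdev(data))  # sqrt(36 / 5), about 2.683
```

For a sample, use s; the two values get closer together as n grows.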

What are the steps of drawing a box plot?

1. Calculate QL and QU and draw a box with its ends at QL and QU.
2. Find the IQR (QU − QL) and mark the median (QM) inside the box.
3. Locate the inner fences: lower inner fence = QL − 1.5(IQR), upper inner fence = QU + 1.5(IQR).
4. Locate the outer fences: lower outer fence = QL − 3(IQR), upper outer fence = QU + 3(IQR).
5. Draw whiskers from the box to the most extreme observations that lie within the inner fences.
6. Mark any value that lies between an inner fence and the corresponding outer fence with ✴ (potential outlier). Mark any value that lies beyond an outer fence with ○ (definite outlier).
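
The fence arithmetic in the box-plot steps can be sketched in Python. This assumes quartiles from `statistics.quantiles` (default method), which may differ slightly from the quartile convention used in class, and treats a value exactly on an outer fence as a potential outlier; the data are hypothetical.

```python
import statistics

def classify_outliers(data):
    """Return (potential, definite) outliers using inner and outer fences."""
    q_l, _, q_u = statistics.quantiles(data, n=4)
    iqr = q_u - q_l
    lower_inner, upper_inner = q_l - 1.5 * iqr, q_u + 1.5 * iqr
    lower_outer, upper_outer = q_l - 3 * iqr, q_u + 3 * iqr
    # Between an inner fence and the matching outer fence: potential outlier (✴).
    potential = [x for x in data
                 if lower_outer <= x < lower_inner or upper_inner < x <= upper_outer]
    # Beyond an outer fence: definite outlier (○).
    definite = [x for x in data if x < lower_outer or x > upper_outer]
    return potential, definite

data = [10, 11, 12, 13, 14, 15, 16, 17, 18, 30, 40]
# QL = 12, QU = 18, IQR = 6; inner fences 3 and 27; outer fences -6 and 36.
print(classify_outliers(data))  # ([30], [40])
```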

What are the two main types of numerical measures to describe the general nature of a quantitative data set?

1. Central tendency
2. Variability
Central tendency is the tendency for the data to cluster, or center, about certain numerical values. Variability refers to the spread of the data.

For a summary table, explain what i) class ii) class frequency iii) class relative frequency and iv) class percentage is

i) A class is one of the categories into which qualitative data can be classified.
ii) Class frequency is the number of observations in the data set that fall into a particular class.
iii) Class relative frequency is the class frequency divided by the total number of observations in the data set; that is, class frequency ÷ n.
iv) Class percentage is the class relative frequency × 100.
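
A summary table can be computed from raw category labels. A minimal sketch using `collections.Counter`, with hypothetical blood-type data; classes are listed from most to least frequent.

```python
from collections import Counter

def summary_table(observations):
    """Rows of (class, frequency, relative frequency, percentage)."""
    n = len(observations)
    counts = Counter(observations)
    return [(cls, f, f / n, 100 * f / n) for cls, f in counts.most_common()]

blood_types = ["O", "A", "O", "B", "O", "A", "O", "AB"]
for row in summary_table(blood_types):
    print(row)
# ('O', 4, 0.5, 50.0)
# ('A', 2, 0.25, 25.0)
# ('B', 1, 0.125, 12.5)
# ('AB', 1, 0.125, 12.5)
```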

What does a higher or lower z-score mean?

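The larger the absolute value of a z-score, the farther the observation lies from the mean (center). A positive z-score means the observation is above the mean; a negative z-score means it is below the mean.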

What is Chebyshev's Rule to describe data?

Chebyshev's rule applies to any data set, whatever the shape of its distribution. It states that:
a. It provides no useful information about the fraction of measurements that fall within 1 s.d. of the mean, i.e., within the interval (x̄ − s, x̄ + s) for a sample or (μ − σ, μ + σ) for a population.
b. At least 3/4 of the measurements fall within 2 s.d. of the mean: (x̄ − 2s, x̄ + 2s) or (μ − 2σ, μ + 2σ).
c. At least 8/9 of the measurements fall within 3 s.d. of the mean: (x̄ − 3s, x̄ + 3s) or (μ − 3σ, μ + 3σ).
d. For any number k > 1, at least 1 − 1/k^2 of the measurements fall within k s.d. of the mean: (x̄ − ks, x̄ + ks) or (μ − kσ, μ + kσ).
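
Chebyshev's bound 1 − 1/k^2 can be compared with the actual fraction of a data set lying within k standard deviations of its mean. This sketch uses a small made-up, strongly right-skewed data set and the sample standard deviation s; the bound holds even though the data are far from mound-shaped.

```python
def chebyshev_bound(k: float) -> float:
    """Minimum fraction of any data set within k s.d. of the mean (useful only for k > 1)."""
    return max(0.0, 1 - 1 / k**2)

def fraction_within(data: list[float], k: float) -> float:
    """Actual fraction of measurements in the interval x̄ ± k*s."""
    n = len(data)
    mean = sum(data) / n
    s = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5
    return sum(1 for x in data if abs(x - mean) <= k * s) / n

skewed = [1, 1, 1, 2, 2, 3, 4, 8, 15, 40]
for k in (2, 3):
    print(k, chebyshev_bound(k), fraction_within(skewed, k))
# Guaranteed: at least 0.75 and about 0.889; actual fractions here: 0.9 and 1.0.
```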

What does it mean when data is skewed?

Data is said to be skewed if one tail of the distribution carries more extreme observations than the other tail

How are pie charts used to describe qualitative data?

Each slice in the pie chart represents a class (a category), and the size of each slice is proportional to the class relative frequency.
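
Because a full circle is 360°, each slice's central angle is its class relative frequency × 360°. A minimal sketch with hypothetical class frequencies:

```python
def slice_angles(frequencies: dict[str, int]) -> dict[str, float]:
    """Central angle (degrees) of each slice = class relative frequency × 360."""
    n = sum(frequencies.values())
    return {cls: 360 * f / n for cls, f in frequencies.items()}

print(slice_angles({"O": 4, "A": 2, "B": 1, "AB": 1}))
# {'O': 180.0, 'A': 90.0, 'B': 45.0, 'AB': 45.0}
```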

What are some ways to describe qualitative data?

For qualitative data:
Tabular method: 1. Summary table
Graphical methods: 1. Pie chart 2. Bar chart 3. Pareto diagram

Compare left skewness, right skewness, and symmetry in terms of median and mean

If the data is skewed to the left, the mean is to the left of the median. If the data is skewed to the right, the mean is to the right of the median. If the data is symmetric, the mean and the median are equal.
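
Small made-up data sets illustrate the pattern, using `statistics.mean` and `statistics.median`. This is the typical relationship rather than a guarantee; unusual data sets can break it.

```python
import statistics

samples = {
    "right-skewed": [1, 2, 2, 3, 20],    # long tail to the right
    "left-skewed": [2, 19, 20, 20, 21],  # long tail to the left
    "symmetric": [1, 2, 3, 4, 5],
}
for shape, data in samples.items():
    print(shape, "mean:", statistics.mean(data), "median:", statistics.median(data))
# right-skewed: mean 5.6 > median 2
# left-skewed: mean 16.4 < median 20
# symmetric: mean 3 = median 3
```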

How are stem-and-leaf displays used to describe quantitative data?

Each observation is split into a 'stem' and a 'leaf'. The stems are listed in order in a column to the left of a vertical bar, and each leaf is placed in its stem's row, to the right of the bar. For two-digit data, for example, the stem is the tens digit and the leaf is the units digit.
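
A minimal sketch of a stem-and-leaf display, assuming non-negative integers with the units digit as the leaf (other data need a different split). Stems with no leaves are still printed so gaps in the data remain visible. The data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data: list[int]) -> list[str]:
    """One text row per stem: 'stem | leaves', leaves in increasing order."""
    rows = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)  # e.g. 47 -> stem 4, leaf 7
        rows[stem].append(leaf)
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # include empty stems
        leaves = "".join(str(leaf) for leaf in rows[stem])
        lines.append(f"{stem:>3} | {leaves}")
    return lines

for line in stem_and_leaf([42, 38, 55, 47, 41, 63, 39, 58, 44]):
    print(line)
#   3 | 89
#   4 | 1247
#   5 | 58
#   6 | 3
```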

What are some useful methods of detecting outliers?

One graphical method of detecting outliers is the box plot; one numerical method is the z-score.

What are outliers typically attributable to?

Outliers are typically attributable to one of the following causes:
1. The measurement is observed, recorded, or entered incorrectly.
2. The measurement is correct but represents a rare (chance) event.
3. The measurement comes from a different population.

What is an outlier?

Outliers are observations that are unusually large or small, relative to the other values in the data set

What is the range?

Range refers to the largest measurement minus the smallest measurement

What is the formula for calculating the z-score? (List formulas for both sample z-scores and population z-scores)

Sample: z = (x − x̄) / s. Population: z = (x − μ) / σ. Here x is any measurement in the data set.
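
A small Python sketch of both formulas using the standard `statistics` module; the `population` flag is a hypothetical parameter chosen here to switch between s and σ, and the data are made up.

```python
import statistics

def z_score(x: float, data: list[float], population: bool = False) -> float:
    """(x - x̄) / s for a sample, or (x - μ) / σ when data is the whole population."""
    center = statistics.mean(data)
    spread = statistics.pstdev(data) if population else statistics.stdev(data)
    return (x - center) / spread

data = [2, 4, 4, 5, 10]
print(z_score(10, data))  # (10 - 5) / 3, about 1.67: above the mean
print(z_score(2, data))   # (2 - 5) / 3 = -1.0: below the mean
```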

What is standard deviation and how do you calculate it?

Standard deviation measures how far the measurements typically lie from the mean of the data set, in the same units as the data. It is the positive square root of the variance: s = √(s^2) for a sample, and σ = √(σ^2) for a population.

What is the difference between linear and area based scaling in pie charts?

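With linear (1-dimensional) scaling, the pie's diameter is scaled in proportion to the quantity being compared, so the area, πr^2, grows with the square of that quantity and exaggerates differences. With area-based (2-dimensional) scaling, the area πr^2 is scaled in proportion to the quantity, so the diameter grows only with its square root.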
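
To see why the two scalings look different, compare a hypothetical pie representing a total of 400 with a reference pie of radius 1 representing 100. Under linear scaling the radius quadruples, so the area grows 16-fold; under area-based scaling the area quadruples and the radius only doubles. The function names are illustrative.

```python
import math

def radius_linear(total: float, base_total: float, base_radius: float = 1.0) -> float:
    """Linear scaling: the radius (and diameter) grows in proportion to the total."""
    return base_radius * total / base_total

def radius_area(total: float, base_total: float, base_radius: float = 1.0) -> float:
    """Area-based scaling: the area pi*r^2 grows in proportion to the total."""
    return base_radius * math.sqrt(total / base_total)

for scale in (radius_linear, radius_area):
    r = scale(400, 100)
    # Area ratio relative to the reference pie is (pi * r^2) / (pi * 1^2) = r^2.
    print(scale.__name__, "radius ratio:", r, "area ratio:", r ** 2)
# radius_linear radius ratio: 4.0 area ratio: 16.0
# radius_area radius ratio: 2.0 area ratio: 4.0
```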

How are Pareto diagrams used to describe qualitative data?

The bars are arranged in descending order of height from left to right. Each bar represents a class, and its height represents the class frequency, class relative frequency, or class percentage. (Basically a bar graph with the bars sorted from tallest to shortest.)
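
The only step that distinguishes a Pareto diagram from an ordinary bar graph is sorting the classes by frequency. A minimal sketch of that ordering step with hypothetical defect counts:

```python
def pareto_order(frequencies: dict[str, int]) -> list[tuple[str, int]]:
    """Classes sorted from the tallest bar (most frequent) to the shortest."""
    return sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

defects = {"scratch": 12, "dent": 30, "crack": 5, "stain": 18}
print(pareto_order(defects))
# [('dent', 30), ('stain', 18), ('scratch', 12), ('crack', 5)]
```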

How are bar graphs used to describe qualitative data?

The classes (categories) are represented by the bars, where the height of each bar can either be the class frequency, class relative frequency, or the class percentage

What is the median?

The median is the middle value in an ordered sequence. If n is odd, the median is the middle number. If n is even, the median is the mean of the two middle numbers.
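
A minimal Python sketch of the odd/even rule (the standard library's `statistics.median` follows the same rule); the data are hypothetical.

```python
def median(data: list[float]) -> float:
    """Middle value of the ordered data; mean of the two middle values when n is even."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # (3 + 5) / 2 = 4.0
```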

What is the mode?

The mode is the value that occurs most frequently in a set of data.
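
A data set can have more than one mode when several values tie for the highest frequency. This sketch returns all of them; Python 3.8+ also provides `statistics.multimode`, which behaves similarly. The data are hypothetical.

```python
from collections import Counter

def modes(data: list[float]) -> list[float]:
    """All values tied for the highest frequency, in order of first appearance."""
    if not data:
        return []
    counts = Counter(data)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes([3, 7, 7, 2, 9]))  # [7]
print(modes([1, 1, 4, 4, 6]))  # [1, 4]
```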

