01.02: DESCRIBING DATA (AP Stat)

counter

i is simply a__________.

following terms

xi represents each of the________________.

$\frac{\sum_{i=1}^{n} x_i}{n}$

x̅ =

mean

x̅ represents the_______.

summation

Σ, or sigma, is the notation for______________(meaning, add up all the terms).
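
The notation on the cards above (the counter i, the terms xi, the number of terms n, and the summation Σ) can be sketched in Python. This is only an illustration: the function name `mean` and the `scores` data are made up for the example.

```python
def mean(values):
    """Arithmetic mean: add up every term x_i, then divide by n."""
    n = len(values)                # n: the number of terms in the data set
    total = 0.0
    for i in range(1, n + 1):      # i is simply a counter running from 1 to n
        x_i = values[i - 1]        # x_i is the i-th term (Python lists start at 0)
        total += x_i               # the summation: add up all the terms
    return total / n


scores = [4, 8, 15, 16, 23, 42]    # made-up data
x_bar = mean(scores)
print(x_bar)  # 18.0
```

The loop is written out to mirror the summation symbol; in everyday code `sum(values) / len(values)` does the same thing.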

population variance, population mean, i-th element from the population, elements

$\sigma^2 = \frac{1}{n}\sum (X_i - \mu)^2$, where $\sigma^2$ is the________________, $\mu$ is the_________________, $X_i$ is the_________________, and $n$ is the number of____________in the population.

compares

A percentile is simply a value that describes how one value in a data set_______________with all other values in the set.

standard deviation

Although range and the five-number summary can give you information about the spread of the terms in a data set,________________is more commonly used in statistics.

magnitude

Although standard deviation is dependent on the mean for its calculation, the value of the mean does not influence the_______________of the standard deviation.

peaks, peaks, modes

Another way to describe data is to discuss the number of____________represented in a display. These___________represent possible_________.

interquartile range (IQR), resistant

Another way to describe the spread of a data set is to find the_________________, which, unlike range, is____________because it is not affected by extreme values.

percentiles

Assume the elements in a data set are rank-ordered from the smallest to the largest. The values that divide a rank-ordered set of elements into 100 equal parts are called_______________.

The data have exactly two clear modes, shown by two peaks of similar size on the graph.

Bimodal:

median

But, if data have extreme values (such as the difference between a millionaire and someone making $20,000 a year), you need to look at the______________, because you'll find a much more accurate representation of the middle of the sample.

mean

Clearly, the standard deviation is influenced dramatically by the outlier. This is because the standard deviation is based on the_________, which is not resistant.

mean, median

C—Center: The center of a data set is described by either the_________or the____________of the set of values.

median, IQR

For a skewed distribution, use the____________for the measure of center and the_______for spread.

Q3 - Q1

IQR =

mean, standard deviation

If a set of data has a symmetric distribution, then use the_________for the measure of center and the_____________for spread.

mean, middle

If n is even, the median will be the________of the_________two numbers.
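
A minimal sketch of the median rule for odd and even n, assuming the data are a plain list of numbers. The function name `median` is chosen for the example.

```python
def median(values):
    """Middle value of the sorted data; mean of the middle two if n is even."""
    ordered = sorted(values)       # arrange from least to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                 # odd n: a single middle value exists
        return ordered[mid]
    # even n: take the mean of the two middle numbers
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))       # 3
print(median([7, 1, 3, 5]))    # 4.0
```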

$\sigma^2 = \frac{1}{n}\sum (X_i - \mu)^2$

In a population, variance is the average squared deviation from the population mean, as defined by the following formula:
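
The population variance formula can be sketched step by step as follows. The function name and the data are assumptions made for the example, and the list is treated as the whole population.

```python
def population_variance(values):
    """Average squared deviation from the population mean."""
    n = len(values)                                  # number of elements
    mu = sum(values) / n                             # population mean
    squared_deviations = [(x - mu) ** 2 for x in values]
    return sum(squared_deviations) / n               # divide by n, not n - 1


data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up population, mean 5
print(population_variance(data))  # 4.0
```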

mean, median, C

In statistics, there are two primary ways to describe the center of a distribution or data set: the__________and the___________. These represent the "__" in the SOCS data analysis strategy.

"the outliers appear to be ...."

Informally, outliers can be found by looking at the data or graph; but when you use this method to describe the data, you have to say,

Outlier < Q1 − 1.5(IQR): any value less than this is an outlier. Outlier > Q3 + 1.5(IQR): any value greater than this is an outlier.

Mathematically, outliers are found using the interquartile range.
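
As a rough sketch, the 1.5(IQR) rule might be coded like this. It assumes at least two data values and finds the quartiles as medians of the lower and upper halves, leaving out the overall median when n is odd. The function name `find_outliers` and the data are made up.

```python
import statistics


def find_outliers(values):
    """Return the values outside Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])     # median of the lower half
    q3 = statistics.median(ordered[-half:])    # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in ordered if x < low_fence or x > high_fence]


data = [10, 12, 13, 15, 16, 18, 19, 50]
print(find_outliers(data))  # [50]
```

Here Q1 = 12.5 and Q3 = 18.5, so the IQR is 6 and the fences are 3.5 and 27.5; only 50 falls outside them.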

center, spread

Measures of__________and__________are ways that data can be analyzed.

most often

Mode is simply the number in a data set that occurs_____________.

the number that occurs most frequently in a set

Mode:
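
One possible way to find the mode is to count occurrences, as sketched below. The function name `modes` is an assumption. It returns a list because a data set can have more than one most frequent value.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often in the data set."""
    counts = Counter(values)              # value -> number of occurrences
    highest = max(counts.values())        # the largest count
    return [value for value, count in counts.items() if count == highest]


print(modes([2, 3, 3, 5, 7]))   # [3]
print(modes([1, 1, 2, 4, 4]))   # [1, 4]
```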

The data have multiple modes, shown by more than two peaks of similar size on the graph.

Multimodal:

adding up all the numbers

Notice that the summation notation is equivalent to___________________.

range

One way to describe spread is to find the_____________of the data, subtracting the smallest point of data from the largest point of data.

Outliers are any unusual parts of the data set that do not fit the pattern of the data set.

O—Outliers:

the entire group of individuals about which we want information

Population:

IQR

Q3 - Q1 =

number of terms

The letter n represents the________________in the data set.

middle, least, greatest

The median is the number that falls in the___________when the numbers are arranged in order from__________to_____________.

arithmetic average of a set of data

The most common measure of center is the mean, which is the...

50%, 50%

The second quartile (the median) is the point at which__________of the data is below and________is above that point.

$\sigma_x = \sqrt{\frac{1}{n}\sum (X_i - \mu)^2}$

The standard deviation of a population is calculated by finding an average of the squared deviations and then taking its square root:

$s_x = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$

The standard deviation of a sample, not an entire population, is calculated using a slightly different formula:
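
The two standard deviation formulas differ only in the divisor, which a short sketch can make visible. The function names and data are assumptions for the example.

```python
import math


def population_sd(values):
    """Square root of the average squared deviation (divide by n)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_sd(values):
    """Same calculation, but divide by n - 1 because the data are a sample."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data))  # 2.0
print(sample_sd(data))      # slightly larger, since the divisor is 7 instead of 8
```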

75%, 25%

The third quartile (Q3 or $Q_3$) is the point at which_________of the data is below and___________is above that point.

median, mean

The____________is resistant, but the__________is not resistant.
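
A quick way to see resistance is to add one extreme value and compare how far each measure moves. The incomes below are invented for the illustration; `statistics.mean` and `statistics.median` come from Python's standard library.

```python
import statistics

incomes = [20_000, 25_000, 30_000, 35_000, 40_000]   # made-up incomes
with_outlier = incomes + [1_000_000]                  # add one millionaire

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, median_before)   # both 30,000 before the extreme value
print(mean_after, median_after)     # the mean jumps; the median barely moves
```

With the extreme value included, the mean rises to roughly 191,667 while the median only shifts to 32,500.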

standard deviation, variance, interquartile range, range

There are two primary ways to describe the spread of a distribution or data set: the____________________and the_________. Two additional measures of spread are the________________and, the less commonly used statistic,__________.

The shape the graph takes (which includes histograms, stem-and-leaf plots, dotplots, or boxplots).

S—Shape:

variability

S—Spread: The spread of a data set is used to describe the______________in the data.

influenced by extreme values

The concept of resistance refers to how a measure is...

25%, 75%

The first quartile (Q1 or $Q_1$) is the point at which__________of the data is below that point and__________is above that point.

Minimum, Q1, Median, Q3, Maximum

The five-number summary of a distribution consists of:
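
The five values can be collected in one pass, as in this sketch. It assumes at least two data values and uses the median-of-halves approach for the quartiles; software packages may use slightly different quartile rules. The function name is made up.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return (
        ordered[0],                            # minimum
        statistics.median(ordered[:half]),     # Q1: median of the lower half
        statistics.median(ordered),            # median
        statistics.median(ordered[-half:]),    # Q3: median of the upper half
        ordered[-1],                           # maximum
    )


print(five_number_summary([1, 3, 5, 7, 9, 11, 13]))  # (1, 3, 7, 11, 13)
```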

Maximum value − Minimum value

Range =

not resistant

Range: The range is_______________.

S—Shape, O—Outliers, C—Center, S—Spread

SOCS:

part of the population from which information is collected; used to draw conclusions about the entire population

Sample:

Data are skewed to the left; the "tail" of the data is on the left side.

Skewed Left:

Data are skewed to the right; the "tail" of the data is on the right side.

Skewed Right:

the average distance of a value from the mean of the data

Standard Deviation

not resistant

Standard Deviation: The standard deviation is___________________.

greater

Standard deviation describes the spread. The more spread out the data, the_____________the standard deviation.

size, average distance from the mean

Standard deviation is independent of the___________of the data set. Because it is the__________________________, adding more values does not necessarily change the standard deviation.
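
One way to check that size alone does not drive the standard deviation is to repeat every value in a data set and recompute. The sketch below uses the population formula through `statistics.pstdev`; the data are made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
doubled = data * 2                       # same values, twice as many of them

sd_small = statistics.pstdev(data)       # population standard deviation
sd_large = statistics.pstdev(doubled)

print(sd_small, sd_large)                # the two results agree
```

The data set doubles in size, but the typical distance from the mean is unchanged, so the standard deviation stays at 2.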

not a resistant measure of spread

Standard deviation is...

Data are not skewed right or left; they are spaced relatively evenly on either side of a peak and generally appear symmetric.

Symmetric:

Symmetric (only if the graph is exactly symmetric) or roughly symmetric (a graph in which the left and right sides are mirror images); skewed right (a graph that has a long tail on the right side of the data set); skewed left (a graph that has a long tail on the left side of the data set)

S—Shape Descriptions include the following:

median, median

To find the first quartile, find the__________of the lower half of the data (the lower half of the data does not include the___________of the data).

median, median

To find the third quartile, find the___________of the upper half of the data (the upper half of the data does not include the___________of the data).
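
The "median of each half" procedure for Q1 and Q3 could be sketched like this, assuming at least two data values. When n is odd, the slicing leaves the overall median out of both halves. The function name `quartiles` is made up.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                     # values below the median
    upper = ordered[len(ordered) - half:]      # values above the median
    return statistics.median(lower), statistics.median(upper)


print(quartiles([1, 2, 3, 4, 5]))  # (1.5, 4.5)
```

In the example the median 3 belongs to neither half, so Q1 is the median of [1, 2] and Q3 is the median of [4, 5]. The IQR is then Q3 − Q1 = 3.0.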

The data do not appear to have any distinct modes; there are no clear peaks on the graph.

Uniform:

The data set has one clear mode, shown by one peak on the graph.

Unimodal:

median, resistant measure

Unless the data set is symmetric, the___________, rather than the mean, should be used to describe the center, because the median is a________________________ whereas the mean is not.

average squared distance from the mean

Variance is the...

resistant

When a value is not changed by adding extreme values to the data set, it is said to be______________.

shape; Skewed Right, Skewed Left, Symmetric

When examining the___________of a display, there are three key terms with which you must become familiar:

first

When i=1, x1 is the_________term in the data set.

second

When i=2, x2 is the____________term in the data set.

third

When i=3, x3 is the_____________term in the data set.

$s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$

When using a sample rather than a population, the formula for variance is expressed as:
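
Python's standard library has both versions of the variance, which makes the effect of the n − 1 divisor easy to compare. In this sketch the same made-up numbers are treated first as a whole population and then as a sample.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]              # made-up data, mean 5

pop_var = statistics.pvariance(values)         # divides by n (population)
samp_var = statistics.variance(values)         # divides by n - 1 (sample)

print(pop_var, samp_var)
```

The squared deviations sum to 32, so the population variance is 32/8 = 4 and the sample variance is 32/7, about 4.57.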

SOCS

When working with quantitative data, you will be asked to make observations and analyze information from a data set. A simple mnemonic you can use to look at data strategically is_________.

mean

Whenever data appear to be symmetric, using the___________to analyze the data is a good choice.

$\frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$

Without summation notation: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} =$

Standard deviation, average distance of the observations from the mean

___________________can also be used to describe the spread. The standard deviation measures the_________________________________________.

