IM 6th Grade Unit 8 - Data Sets & Distribution
Measure of Center
The _______________ for a data distribution is a number that can be thought of as the middle or typical value of the distribution. The mean can also be seen as ________________ that "balances" the points in a data set.
Interquartile Range (IQR)
The _______________ of a data set is a measure of spread of its distribution. It is the difference between the third quartile (Q3) and the first quartile (Q1).
Mean Absolute Deviation (MAD)
This measures the spread in a distribution. It is the mean (or average) of the distances of the data points from the mean of the distribution. It is called the mean absolute deviation because the distance of a data point from the mean is the absolute value of its deviation from the mean.
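The steps in this definition can be sketched in a few lines of Python. This is only an illustration: the function name mean_absolute_deviation is made up here, and the data set (80, 90, 100) is borrowed from the Mean entry.

```python
def mean_absolute_deviation(values):
    # Step 1: find the mean of the data set.
    mean = sum(values) / len(values)
    # Step 2: find each value's distance from the mean (always positive or zero).
    distances = [abs(v - mean) for v in values]
    # Step 3: find the mean of those distances.
    return sum(distances) / len(distances)

scores = [80, 90, 100]
mad = mean_absolute_deviation(scores)
print(round(mad, 2))  # 6.67
```

The mean is 90, so the distances are 10, 0, and 10. Their mean is 20 divided by 3, or about 6.67.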
Categorical data
...are data where the values are categories. For example, the breeds of 10 different dogs (chihuahua, St. Bernard, German Shepherd, etc.). Another example is the colors of 100 different flowers.
Box plot
A _____________ is a representation of a data set that shows the five-number summary. It shows the first quartile (Q1) and the third quartile (Q3) as the left and right sides of a rectangle or a box. The median (Q2) is shown as a vertical segment (a line that goes from the top of the box to the bottom) inside the box. The "whiskers" (just straight horizontal lines) on the sides represent the bottom quarter (lowest or first fourth) and the top quarter (highest or last fourth) of the data. They always extend to the minimum (smallest) and maximum (largest) values of the data set.
Mean
Also known as the average of a data set...is the value you get by adding up all of the values in the set and dividing by the number of values in the set. For example: If your data set has 80, 90, and 100, you add up 80 + 90 + 100, which equals 270. Then you divide 270 by 3 (because you added 3 values), and the average, or _______, is 90.
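The same calculation can be checked with a short Python sketch. The function name mean is chosen here just for illustration.

```python
def mean(values):
    # Add up all of the values, then divide by how many values there are.
    return sum(values) / len(values)

scores = [80, 90, 100]
print(sum(scores))   # 270
print(mean(scores))  # 90.0
```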
Average
Also known as the mean of a data set...is the value you get by adding up all of the values in the set and dividing by the number of values in the set. For example: If your data set has 80, 90, and 100, you add up 80 + 90 + 100, which equals 270. Then you divide 270 by 3 (because you added 3 values), and the mean, or _______, is 90.
Center
For a symmetrical or almost symmetrical data distribution, it is the value around which the distribution is symmetrical. We also use the idea of center for distributions that are not symmetrical, for example by finding the mean or the median.
Frequency
In statistics, it is the number of times a particular data value occurs in a data set. When that number is expressed as a fraction of the total number of data values, then it is called the relative frequency (can also be expressed in decimal or percentage). Ex: There were 21 dogs in the park, some white, some brown, some black. The table shows the frequency and the relative frequency of each color.
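Counting frequency and relative frequency can be sketched in Python. Important: the color counts below (7 white, 9 brown, 5 black) are made up for illustration and only match the total of 21 dogs. They are not the numbers from the table mentioned above.

```python
from collections import Counter

# Hypothetical data: 21 dogs. These counts are invented for illustration.
dog_colors = ["white"] * 7 + ["brown"] * 9 + ["black"] * 5

# Frequency: how many times each color occurs.
frequency = Counter(dog_colors)

# Relative frequency: each count as a fraction of the total number of dogs.
total = len(dog_colors)
relative_frequency = {color: count / total for color, count in frequency.items()}

for color, count in frequency.items():
    # Show the count, the fraction, and the percentage for each color.
    print(color, count, f"{count}/{total}", f"{relative_frequency[color]:.0%}")
```

The relative frequencies always add up to 1 (or 100%), because together the categories cover every data value.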
Range
The _________ of a data set is the difference between the maximum (biggest value) and the minimum (smallest value). For example: For the data set (5, 8, 9, 11, 15), the _________ is 10, which is the difference between the maximum/biggest value of 15 and the minimum/smallest value of 5, or 15 - 5 = 10.
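The example above can be checked with a short Python sketch that uses the built-in max and min functions.

```python
data = [5, 8, 9, 11, 15]

# Subtract the smallest value from the biggest value.
data_range = max(data) - min(data)
print(data_range)  # 10
```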
Quartile
The ____________ for a data set are three numbers that divide the data into fourths (4 parts). The median divides the set into two halves, and the first ________ (Q1) is the median of the lower half (or first half). The second _________ (Q2) is the median itself, and the third _________ (Q3) is the median of the upper half (second half).
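The "median of each half" method can be sketched in Python. This sketch makes two assumptions: the data set has at least 2 values, and when the number of values is odd, the middle value is left out of both halves. The function names are made up for illustration, and the data set is borrowed from the Median entry.

```python
def median(ordered):
    # Expects a list that is already sorted from smallest to biggest.
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # values below the median
    upper_half = ordered[(n + 1) // 2 :]  # values above the median
    return median(lower_half), median(ordered), median(upper_half)

q1, q2, q3 = quartiles([2, 5, 8, 8, 10, 11, 14, 17])
print(q1, q2, q3)  # 6.5 9.0 12.5
print(q3 - q1)     # 6.0, the interquartile range (IQR)
```

Here the lower half is (2, 5, 8, 8) and the upper half is (10, 11, 14, 17), so Q1 is 6.5 and Q3 is 12.5.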
Median
The _______ of a data set is the middle value when data values are listed in order. If the number of values is even, it is the mean/average of the two middle values. For example: If we have a data set of 5 values, which is an odd number, (3, 5, 7, 10, 15), 7 is the _______ because it is in the middle. If we have a data set of 8 values, which is an even number, (2, 5, 8, 8, 10, 11, 14, 17), the _______ will be the mean/average of 8 & 10, which is 9.
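Both cases (odd and even number of values) can be sketched in Python. The function name median is chosen here for illustration.

```python
def median(values):
    # List the values in order first.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: take the one in the middle.
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 5, 7, 10, 15]))             # 7
print(median([2, 5, 8, 8, 10, 11, 14, 17]))  # 9.0
```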
Statistical question
a question that can only be answered by using data and where we expect the data to have variability, meaning different answers. EXAMPLES: Who is the most popular musical artist at your school? When do students in your class typically eat dinner? Which classroom in your school has the most books? NON-EXAMPLES: How many ears do you have? What color is FMS?
Histogram
a way of representing a numerical data set by grouping the data into bins (a range of numbers) and showing how many values are in each bin with a vertical bar graph.
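Grouping data into bins can be sketched in Python. The assumptions here: each bin is 5 wide, a bin includes its left end but not its right end (so 10 goes in the bin from 10 to under 15), and bins with no values are simply not listed. The data set is borrowed from the Median entry, and the function name bin_counts is made up for illustration.

```python
def bin_counts(values, bin_width):
    counts = {}
    for v in values:
        # Find where this value's bin starts, for example 8 -> 5 and 10 -> 10.
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    # Put the bins in order from smallest to biggest.
    return dict(sorted(counts.items()))

values = [2, 5, 8, 8, 10, 11, 14, 17]
counts = bin_counts(values, 5)

for start, count in counts.items():
    # One "#" for each value in the bin.
    print(f"{start:>2} to under {start + 5:>2}: " + "#" * count)
```

This prints the bars sideways as rows of "#" marks. A real histogram draws the same counts as vertical bars.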
Numerical data
also called measurement or quantitative data, are data where the values are numbers, measurements, or quantities. For example, the weights of 10 different dogs, or the ages of a group of people (with different ages).
Distribution
for a numerical or categorical data set, it tells you how many of each value or each category there are in the data set.
Spread
it tells you how "spread out" the data values are. The greater the ________, the greater the variability. For example, if one dot plot has data from 2 all the way to 10 while another has data only from 5 to 7, then the first dot plot has the wider ________.
Dot plot
sometimes called a line plot...is a way to represent the distribution of a numerical data set by placing dots above specific numbers on a number line.
Variability
the tendency of a data set (a group of data) to have different data values (different answers).