Chapter 2 "Organizing and Summarizing Data"
Frequency distributions
*A frequency distribution is a collection of observations (or data) presented in an organized manner by listing the frequency of each observation's occurrence. Observations can be sorted into class intervals or remain ungrouped. Regardless of how the data are grouped, the frequency of the data is reported for each value or class interval.* -are one way to organize data collected from research. Frequency distributions help display how spread out or how similar data are and make it easy to identify any extreme values, or outliers. -Frequency distributions are a way of organizing raw data, that is, the data that we collect during a research project, to make it easier to get a snapshot of what we have. They also make it easier to see how spread out or how similar our data are.
*Raw data have no meaning until they are organized and analyzed.*
...
Bar graph: Used with qualitative data
Histogram: Uses a continuous scale for the x-axis and plots quantitative data
Scatter plot: Uses points to plot paired quantitative data
Line graph: Used to graph two quantitative matched variables, with a line connecting each point to imply that the data are continuous
Stem and leaf plot: Looks like a bar graph turned on its side; gives a quick visual representation of the data
Pie chart: Used to show percentages of a whole
...
Class intervals: Equal-size groups in a grouped frequency distribution
Relative frequency: The frequency of each class divided by the total number of observations
...
You gather data for the average temperature each month over the course of a year. You want to represent your data to show the continuous change in temperature across the year. The best way to visually represent your data is by using a line graph.
You conduct a survey of people in a coffee shop to determine how frequently people order drinks in the small, medium, and large sizes. The best way to visually represent your data is by using a bar graph.
...
Histogram
A bar graph often used to display quantitative data. -A histogram looks like a bar graph but it isn't! True, it uses the same bar-like rectangles, but in this case, the x-axis is a continuous scale. For this reason, a histogram is used with quantitative data on the interval or the ratio scale. The y-axis still represents the frequency of observations at each x-axis value.
Pie Charts
A circular graph divided into sectors that represent qualitative data categories such as colors, races, and genders. -Pie charts are used to show a percentage or proportion of a whole. They are used with qualitative or categorical data and show simply the frequency of observations from each category. Pie charts can only be used when it is appropriate to show what share of all observations falls into each category.
Outliers
A data entry that is "very different" from the other entries in a data set.
Scatter Plots
A graph that represents the relationship between paired data, where each entry in one data set corresponds to an entry in the second data set. -A scatter plot is used to graph matched or paired quantitative data. A value from variable one is matched to the corresponding value on variable two, and a point is plotted where the two values intersect on the graph. For example, if we tracked rats' weight at a particular age, we could plot each point on a graph. If a rat is 5 weeks old and weighs 1.8 grams, then we can plot the point at (5, 1.8). Scatter plots allow us to visually see relationships in our data (for example, do younger rats always have smaller weights?).
Line Graphs
A line graph is similar to a scatter plot in that we use it to graph two quantitative, matched variables. The difference here is that a line connects each point, and implies that the data is continuous in nature.
Stem and Leaf Plots
A stem and leaf plot gives a quick visual of our data. The stem goes to the left of the vertical line, and the leaves go to the right. For each observation in this distribution, the stem is the value in the tens place, and the leaf is the value in the ones place. For example, if we had the data set of 16, 35, 37, 43, 44, 47, 48, 51, 51, 52, 52, 54, 55, 57, 60, 62, 62, 64, 64, 65, 66, 68, 68, 71, 76, 76, 77, 78, 83, 85, 87, 90, 96, our stem and leaf plot would look like this:
1 | 6
2 |
3 | 5 7
4 | 3 4 7 8
5 | 1 1 2 2 4 5 7
6 | 0 2 2 4 4 5 6 8 8
7 | 1 6 6 7 8
8 | 3 5 7
9 | 0 6
We include the 20s even though we don't have any observations in that category. Once we complete the stem and leaf plot, we can quickly see which category has the largest frequency of observations: the 60s. In fact, the stem and leaf plot looks visually like a bar graph turned on its side.
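A stem and leaf plot for this data set can also be built programmatically. The following is a minimal Python sketch, not a standard library routine: the function name `stem_and_leaf` is an illustrative assumption, and it assumes whole-number observations so that the tens place is the stem and the ones place is the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: [leaves]} for whole numbers, keeping empty stems."""
    ordered = sorted(values)
    groups = defaultdict(list)
    for value in ordered:
        stem, leaf = divmod(value, 10)  # tens place, ones place
        groups[stem].append(leaf)
    lowest, highest = ordered[0] // 10, ordered[-1] // 10
    # Include every consecutive stem, even one with no observations.
    return {stem: groups.get(stem, []) for stem in range(lowest, highest + 1)}


data = [16, 35, 37, 43, 44, 47, 48, 51, 51, 52, 52, 54, 55, 57, 60, 62, 62,
        64, 64, 65, 66, 68, 68, 71, 76, 76, 77, 78, 83, 85, 87, 90, 96]

plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")

# The stem with the most leaves is the category with the largest frequency.
busiest_stem = max(plot, key=lambda stem: len(plot[stem]))
```

Here `busiest_stem` identifies the row with the most leaves, which corresponds to the 60s in this data set.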
Class Intervals
A unit used to group data entries. It is also called an interval. -Class intervals are often used in frequency distributions, especially when the number of observations is large or there are many distinct values. Class intervals must be the same size, must be consecutive, and must include all individual observations (or data). Likewise, each observation can be included in only one class interval.
Bar Graphs
Bar graphs are used with qualitative or categorical data. A bar graph simply presents the count for each category, that is, the frequency of responses in each group.
Qualitative Data
Consists of attributes, labels, or nonnumeric entries.
Quantitative Data
Consists of numerical measurements or counts.
Frequency Distribution Table – Cumulative Frequency and Cumulative Relative Frequency
Cumulative frequency and cumulative relative frequency show us at a glance how many or what proportion of our observations fall at or below a certain class interval. Both columns start at the lowest class and work up to the largest class, each class accumulating the observations from all classes before it. Cumulative frequency is a count of how many observations fall at or below a given class, and the final entry should be the total number of observations overall. Cumulative relative frequency is the proportion of observations that fall at or below a given class, and the final entry should equal 1.000.
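The two cumulative columns can be sketched in a few lines of Python. The class intervals and frequencies below are hypothetical values invented for illustration only; they are not taken from the chapter's data set.

```python
from itertools import accumulate

# Hypothetical classes and frequencies (illustration only).
classes = ["90-99", "100-109", "110-119", "120-129"]
freq = [2, 5, 9, 4]

n = sum(freq)  # total number of observations

# Running total of the frequencies, lowest class first.
cum_freq = list(accumulate(freq))

# Each running total divided by the overall total.
cum_rel_freq = [round(cf / n, 3) for cf in cum_freq]

for label, f, cf, crf in zip(classes, freq, cum_freq, cum_rel_freq):
    print(f"{label:>8} {f:>3} {cf:>4} {crf:.3f}")
```

The last entry of `cum_freq` matches `n`, and the last entry of `cum_rel_freq` is 1.000, which is a quick way to check a hand-built table.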
Visually Presenting Data
Data can be organized and presented visually using graphs. There are several types of graphs that can be used to illustrate the data. Caution: only certain graphs can be used with certain types of data! In other words, you can't just choose the graph you like best to illustrate your data. You must choose a graph that can appropriately showcase the type of data with which you are dealing.
Relative Frequency
Proportion (often expressed as a percentage) of the data in a particular class.
Steps to creating a frequency distribution:
Step 1: Order the data from smallest to largest.
Step 2: Decide where the lowest class interval should begin. For example, with our data above, the lowest value is 93. Thus, we might choose 93 to be the start of our lowest interval, or we might choose 90 to keep nice round numbers. It doesn't matter where we start; it only matters that our class intervals are the same size and that they capture all of our data.
Step 3:
- Decide the size of the class interval. There is no precise way of doing this. The class intervals should be neither too many nor too few, and neither too wide nor too narrow. A suggested method is to divide the range of the data (that is, the highest value minus the lowest value) by the number of class intervals desired. In our example, the range is 205 - 93 = 112. Let's say we want to have approximately ten class intervals. To find the approximate size of each interval, we divide the range by the number of desired class intervals: 112/10 = 11.2.
- Obviously, 11.2 is an awkward size for our class intervals, since we have only whole numbers in our data. This is just an approximation of what size we should use, and we can make our class interval any size we want. Let's say that we simplify and set our class interval size to 10.
Step 4: Now that we've chosen intervals of 10, we can begin to build our table. The class interval goes on the left, and each interval must be the same size. Also, all consecutive class intervals must be included, even if there are no observations in that interval. The frequency (f), or number, of observations goes in the right column. For example, our first class interval is 90-99. The only observations that fall within that interval are 93 and 95. Therefore, the frequency for that class interval is 2.
Step 5: Complete the frequency distribution table.
Step 6: The frequency distribution allows us to quickly skim our data to see which class interval had the most observations.
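The steps above can be sketched in Python. This is a minimal illustration under stated assumptions: the `scores` list is hypothetical, made up only so that its lowest value is 93, its highest is 205, and 93 and 95 are the only values in the 90-99 interval, as in the worked example. The function name `frequency_distribution` is likewise an illustrative choice, and the sketch assumes whole-number data with a starting point at or below the lowest value.

```python
def frequency_distribution(values, start, width):
    """Return (lower, upper, frequency) rows for equal-width class intervals."""
    ordered = sorted(values)                          # Step 1
    n_classes = (ordered[-1] - start) // width + 1    # enough classes to reach the maximum
    counts = [0] * n_classes                          # empty intervals stay at 0
    for value in ordered:
        counts[(value - start) // width] += 1         # Step 4: tally each observation
    return [(start + i * width, start + (i + 1) * width - 1, count)
            for i, count in enumerate(counts)]


# Hypothetical data (illustration only): min 93, max 205.
scores = [93, 95, 104, 110, 112, 118, 121, 125, 127, 133, 136,
          140, 148, 152, 167, 171, 189, 205]

data_range = max(scores) - min(scores)   # 205 - 93
approx_width = data_range / 10           # about ten classes desired (Step 3)

table = frequency_distribution(scores, start=90, width=10)  # Steps 2-5
for lower, upper, f in table:
    print(f"{lower}-{upper}: {f}")

# Step 6: the interval with the most observations (first one if tied).
modal_row = max(table, key=lambda row: row[2])
```

Starting at 90 with a width of 10 produces twelve intervals, from 90-99 through 200-209, including the empty 190-199 interval.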
Follow these steps to create a frequency distribution:
Step 1: Order the data from smallest to largest.
Step 2: Decide where the lowest class interval should begin. It could start with the least value in the data set or an even lesser value, as long as all values will be included.
Step 3: Decide the size of the class interval.
Step 4: Build the frequency table with the class intervals on the left and the frequency, or number, of observations in the next column. Be sure to include all class intervals, even if a particular interval has no observations.
Step 5: Complete the table.
Descriptive statistics
describe the information in the data set.
Relative frequency
is the ratio of observations in each class interval to the total number of observations. This value is always between 0 and 1; it equals 1 only if every observation falls in a single class interval.
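As a quick check of this definition, here is a minimal Python sketch. The class intervals and frequencies are hypothetical values invented for illustration only.

```python
# Hypothetical frequency table (illustration only): class interval -> frequency.
freq = {"90-99": 2, "100-109": 5, "110-119": 9, "120-129": 4}

total = sum(freq.values())  # total number of observations

# Relative frequency: each class frequency divided by the total.
rel_freq = {label: count / total for label, count in freq.items()}

for label, proportion in rel_freq.items():
    print(f"{label}: {proportion:.3f} ({proportion:.1%})")
```

Each value lies between 0 and 1, and the relative frequencies across all classes add up to 1.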