Statistics Chapter 2
Basic shapes of a distribution
1. Uniform 2. Symmetrical, but not uniform 3. Skewed to the right 4. Skewed to the left
Probability distribution
A theoretical distribution used to predict the probabilities of particular data values occurring in a population.
Grouped frequency distribution
Data are often grouped into ranges of values. The classes are ranges of possible values. Grouped distributions are more common and take more skill to create.
Two basic types of frequency distributions
1. Grouped distributions 2. Ungrouped distributions
Stacked bar graph
Similar to a side-by-side graph but the data is stacked instead of side by side. Allows the reader to view different groups in a category as one in order to make comparisons between the categorical data.
Upper class limit
The largest number that can belong to a particular class. The upper limit of each class is determined so that the classes do not overlap.
Bar graphs
Another way to display QUALITATIVE data. Bar graphs are used to represent categorical data. The height of the bar represents the amount of data in that category. The horizontal axis contains the qualitative categories, and the vertical axis represents the frequency of each category. Because the bars represent categories, the width of each bar is meaningless, and the bars usually do not touch. However, to avoid misrepresenting the data, the bars should be of uniform width.
Purpose of graphs
Graphs have several advantages over other forms of data display such as lists, ordered arrays, texts, or tables. -Graphs convey information immediately. -Graphs can have more impact than text, lists or tables. -Graphs are persuasive. -Graphs can often bring out hidden relationships and general trends. -Graphs can be more attractive and interesting to view.
Steps for constructing a frequency distribution
1. Decide how many classes should be in the distribution: there are typically between 5 and 20 classes in a frequency distribution. Several different methods can be used to determine the number of classes that will show the data most clearly. 2. Choose an appropriate class width: in some cases, the data set easily lends itself to natural divisions, such as decades or years. 3. Find the class limits (lower and upper). 4. Determine the frequency of each class: make a tally mark for each piece of data in the appropriate class. Count the marks to find the total frequency for each class.
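The four steps above can be sketched in code. This is only an illustration: the function name, the sample data, and the choice of 5 classes of width 10 are made up, and the upper limit formula assumes whole-number data.

```python
def frequency_distribution(data, first_lower, width, num_classes):
    """Return (lower limit, upper limit, frequency) for each class."""
    rows = []
    for i in range(num_classes):
        lower = first_lower + i * width
        # Assumes whole-number data, so the upper limit sits 1 below
        # the next lower limit and the classes do not overlap.
        upper = lower + width - 1
        # Counting replaces the tally marks described in step 4.
        freq = sum(1 for x in data if lower <= x <= upper)
        rows.append((lower, upper, freq))
    return rows

# Hypothetical data set of 12 whole numbers.
data = [12, 15, 21, 23, 27, 30, 34, 38, 41, 45, 47, 52]
table = frequency_distribution(data, first_lower=10, width=10, num_classes=5)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

The frequencies should add up to the number of data values, which is a quick check that no value fell outside the classes.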
Frequency histogram
A bar graph of a frequency distribution. To construct a frequency histogram: 1. Find the class boundaries of the frequency distribution. 2. Mark the class boundaries of every class on the horizontal axis, which is a real number line. 3. The width of the bars represents the width of each class. 4. The bars should touch since the upper class boundary of one class is the same as the lower class boundary of the next class. 5. The bars should be uniform in width; thus, histograms are only appropriate for frequency distributions that have classes of uniform width. 6. The height of each bar represents the frequency of the class; thus, frequency is graphed on the vertical axis. Although we used class boundaries to draw the histogram, it is appropriate to use either the class boundaries or the midpoints when labeling the x-axis of a frequency histogram.
Class
A category of data in a frequency distribution
Distribution
A distribution is a way to describe the structure of a particular data set or population.
Ungrouped frequency distribution
A frequency distribution where each category represents a single value and its frequencies (f), or counts of data values, are listed for each category. Letter grades (A, B, C, D, F). Each data value has its own category or class. It would be strange to group letter grades since there are only 5 classes.
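A minimal sketch of an ungrouped frequency distribution for letter grades. The list of grades is invented for illustration; each distinct value gets its own class.

```python
from collections import Counter

# Hypothetical letter grades for ten students.
grades = ["B", "A", "C", "B", "F", "A", "B", "D", "C", "B"]

counts = Counter(grades)
# List every category in its natural order, one class per value.
distribution = [(grade, counts[grade]) for grade in "ABCDF"]
for grade, f in distribution:
    print(grade, f)
```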
Characteristics of a good graph
A good graph should be able to stand alone. -A title is important- should describe topic -Legend -Labels and scales -Source should be included
Line graph
A line graph is used to show specific trends in data, normally over time, and how two variables are related to one another. To construct a line graph, the x-axis will represent the independent variable in the data given and the y-axis will represent the dependent variable. A point will mark where each x-value is associated with its corresponding y-value. A line will then be used to join the data points in order.
Time series graph
A picture of how data changes over time, with time as the variable on the horizontal axis. Ex: Consumer price index between the years 1920 and 1990. Common type: line graph. Time series study: A historian gathers data over the past hundred years to determine how the number of immigrants coming into the US has changed over the years.
Ogive
An ogive is another type of line graph which depicts cumulative frequency of each class from a frequency table. Begin by tabulating the cumulative frequencies for each class. Unlike creating a frequency polygon, we only include an extra class at the lower end for this graph, giving it a frequency of 0. Next, plot a point at the cumulative frequency for each class directly above its upper class boundary. The ogive is created by joining the points together with line segments.
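The points of an ogive can be computed as follows. This is a sketch with made-up class boundaries and frequencies; the function name is hypothetical. The extra class at the lower end contributes the starting point with cumulative frequency 0.

```python
from itertools import accumulate

def ogive_points(boundaries, frequencies):
    """boundaries: (lower boundary, upper boundary) for each class, in order."""
    # The extra lower class ends at the first lower boundary with cumulative frequency 0.
    points = [(boundaries[0][0], 0)]
    for (_, upper), cumulative in zip(boundaries, accumulate(frequencies)):
        # One point above each upper class boundary.
        points.append((upper, cumulative))
    return points

# Hypothetical classes 10-19, 20-29, 30-39 with frequencies 2, 3, 3.
points = ogive_points([(9.5, 19.5), (19.5, 29.5), (29.5, 39.5)], [2, 3, 3])
print(points)
```

Joining these points in order with line segments gives the ogive.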
Line graphs
Depict the change in value over time. Constructed by joining data points in order with line segments.
Frequency distribution
Display of the values that occur in a data set and how often each value, or range of values, occurs. The objective is to provide an overview of the data.
Dot plots
Displays the data without grouping certain points together like a stem and leaf plot does. Instead, only data which are exactly the same appear together. As such, these plots are useful for identifying extreme values and clusters in data sets. Because a dot plot is just what its name suggests- a dot representing each data value on a number line- it provides an easy way to visually spot data trends.
Relative frequency
The fraction or percentage of the data set that falls into a particular class. It is calculated by dividing the class frequency by the sample size. Useful because fractions or percentages make it easier to quickly analyze the data set as a whole. Relative frequency = f / n, where f = class frequency and n = sample size (the sum of the frequencies f_i of all classes).
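A short sketch of the calculation, using made-up class frequencies. The function name is hypothetical.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency f by the sample size n."""
    n = sum(frequencies)  # sample size = sum of all class frequencies
    return [f / n for f in frequencies]

# Hypothetical class frequencies; n = 12.
rel = relative_frequencies([2, 3, 3, 3, 1])
for r in rel:
    print(f"{r:.3f}  ({r:.1%})")
```

The relative frequencies of all classes should add up to 1 (or 100%), apart from rounding.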
Cross-sectional graph
Picture of the data at a given moment in time. Neither axis will have a variable of time in this case.
Stem and leaf plots
Retains the original data. The leaves are usually the last digit in each data value and the stems are the remaining digits. For example, in the number 189, 9 is the leaf and 18 is the stem. Be sure to include a legend. Steps for creating a stem and leaf plot: 1. Create two columns, one on the left for stems and one on the right for leaves. 2. List each of the stems that occur in the data set in numerical order. Each stem is normally listed only once; however, the stems are sometimes listed two or more times if splitting the leaves would make the data set's features clearer. 3. List each leaf next to its stem. Each leaf will be listed as many times as it occurs in the original data set. There should be as many leaves as data values. Be sure to line the leaves up in straight columns so that the table is visually accurate. 4. Create a key to guide interpretation of the stem and leaf plot. 5. The leaves may be put in order, if desired, to create an ordered stem and leaf plot.
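The steps above can be sketched for nonnegative whole numbers, taking the last digit as the leaf. The data values and function names are invented for illustration; this sketch only lists stems that occur in the data and does not split stems.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem to its ordered list of leaves (last digits)."""
    plot = defaultdict(list)
    for value in sorted(data):          # sorting gives an ordered plot
        stem, leaf = divmod(value, 10)  # 189 -> stem 18, leaf 9
        plot[stem].append(leaf)
    return dict(plot)

def format_plot(plot):
    """Render one 'stem | leaves' row per stem."""
    rows = [f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in plot.items()]
    return "\n".join(rows)

# Hypothetical data set.
plot = stem_and_leaf([189, 182, 175, 171, 193, 189])
print(format_plot(plot))
print("Key: 18 | 9 = 189")
```

Each leaf appears once per occurrence, so the number of leaves equals the number of data values.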
Class boundaries
Split the difference in the gap between the upper limit of one class and the lower limit of the next class. The value that lies halfway between the upper limit of one class and the lower limit of the next class. To find a class boundary, add the upper limit of that class to the lower limit of the next class and divide by 2. After finding one class boundary, ADD (or subtract) the class width to find the next class boundary. The boundaries of a class are typically given in interval form; lower boundary - upper boundary
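A sketch of the boundary calculation, assuming the classes are listed in order and the gap between consecutive classes is the same throughout. The class limits and the function name are made up for illustration.

```python
def class_boundaries(limits):
    """limits: (lower limit, upper limit) for each class, in ascending order."""
    # Gap between the upper limit of one class and the lower limit of the next.
    gap = limits[1][0] - limits[0][1]
    half = gap / 2
    # Splitting the gap moves each limit outward by half the gap.
    return [(lower - half, upper + half) for lower, upper in limits]

# Hypothetical classes 10-19, 20-29, 30-39.
bounds = class_boundaries([(10, 19), (20, 29), (30, 39)])
print(bounds)
```

Here (19 + 20) / 2 = 19.5, and adding the class width of 10 gives the next boundary, 29.5.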
Pareto chart
A bar graph with the bars arranged from largest to smallest (descending order). Pareto charts are typically used with NOMINAL data. The reason for this is that if a Pareto chart were created from ordinal or quantitative data, the values on the x-axis might seem out of order after the bars were rearranged from largest to smallest. Example: Ivy league college enrollment
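The ordering step of a Pareto chart is a simple sort by frequency. The categories and counts below are invented placeholders, and the function name is hypothetical.

```python
def pareto_order(counts):
    """Return (category, frequency) pairs sorted from largest to smallest."""
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical nominal categories with made-up frequencies.
ordered = pareto_order({"Red": 4, "Blue": 9, "Green": 6})
print(ordered)
```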
Symmetrical, but not uniform distribution
The data lies evenly on both sides of the distribution. The right and left side of the curve, histogram, etc., are mirror images of each other.
Finding the class width
The difference between the lower limits or upper limits of two consecutive classes of a frequency distribution. Begin by subtracting the lowest number in the data set from the highest number in the data set and dividing the difference by the number of classes. Round up. (H - L) / # of classes = class width
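A minimal sketch of the formula with made-up data. The function name is hypothetical. Note one assumption: math.ceil leaves a quotient that is already a whole number unchanged, and in that case a slightly larger width may be needed so that the highest value still falls inside the last class.

```python
import math

def class_width(data, num_classes):
    """(H - L) / number of classes, rounded up."""
    return math.ceil((max(data) - min(data)) / num_classes)

# Hypothetical data: H = 52, L = 12, so (52 - 12) / 6 is about 6.67.
data = [12, 15, 21, 23, 27, 30, 34, 38, 41, 45, 47, 52]
width = class_width(data, 6)
print(width)
```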
Uniform distribution
The frequency of each class is relatively the same. The distribution will have a RECTANGULAR shape.
Skewed to the right
The majority of the data falls on the left of the distribution. Also, the right side of the distribution will extend farther out than the extension on the left side. The definitions seem backward; that's because the names are based on what happens to the mean of the distribution, not where the majority of the data lies.
Skewed to the left
The majority of the data falls on the right of the distribution. Also, the left side of the distribution will extend farther out than the extension on the right side. The definitions seem backward; that's because the names are based on what happens to the mean of the distribution, not where the majority of the data lies.
Midpoint
The midpoint, or class mark, of a class is the sum of the lower and upper limits of the class divided by 2. The midpoints are often used for estimating the average value in each class. Class midpoint = (Lower limit + Upper Limit) / 2 Once you find the first midpoint, you can add the class width to it to find the remaining midpoints.
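The midpoint formula in code, using made-up class limits. The function name is hypothetical.

```python
def class_midpoints(limits):
    """(lower limit + upper limit) / 2 for each class."""
    return [(lower + upper) / 2 for lower, upper in limits]

# Hypothetical classes 10-19, 20-29, 30-39 (class width 10).
mids = class_midpoints([(10, 19), (20, 29), (30, 39)])
print(mids)
```

Consecutive midpoints differ by the class width, which matches the shortcut of adding the class width to the first midpoint.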
Frequencies
The numbers/counts of data values in the categories of a frequency distribution.
Lower class limit
The smallest number that can belong to a particular class. Using the minimum data value, or a smaller number, as the lower limit of the first class is a good place to begin. You should choose the first lower limit so that reasonable classes will be produced, and it should have the same number of decimal places as the largest number of decimal places in the data. After choosing the lower limit of the first class, keep adding the class width until you have the desired number of lower class limits.
Cumulative frequency
The sum of frequencies of a given class and all previous classes. The cumulative frequency of the last class equals the sample size.
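A short sketch with made-up class frequencies, using a running total.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed in class order.
frequencies = [2, 3, 3, 3, 1]

# Each entry is the frequency of its class plus all previous classes.
cumulative = list(accumulate(frequencies))
print(cumulative)
```

The last entry equals the sample size, here 12.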
Relative frequency histogram
There are times at which it is beneficial to display the relative frequency of a distribution. A relative frequency histogram is identical to a regular histogram, except that the heights of the bars represent the relative frequencies of each class rather than simply the frequencies. It is appropriate to label a relative frequency histogram with either decimals or percentages.
Side-by-side graph
Used when we want to create a bar graph that compares different groups. To do so, create a bar for each class and for each category. Identify the bars in some way, such as different colors, to denote which bars represent a given class. In this type of graph, it is important to include a legend that denotes which color represents which category.
Pie charts
Useful for displaying a frequency table where the x variable is discrete. A pie chart shows how large each category is in relation to the whole; it is created from a frequency distribution by using the RELATIVE FREQUENCIES. First, compute the relative frequency of each category by dividing its frequency by the total. The size, or central angle, of each wedge in the pie chart is then calculated by multiplying 360 degrees by the relative frequency of each class and rounding to the nearest whole degree.
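The central angle calculation can be sketched like this. The frequencies and the function name are made up for illustration. Because each angle is rounded separately, the rounded angles may not always add up to exactly 360.

```python
def central_angles(frequencies):
    """360 degrees times the relative frequency, rounded to a whole degree."""
    n = sum(frequencies)
    # Note: Python's round() sends exact .5 ties to the nearest even number.
    return [round(360 * f / n) for f in frequencies]

# Hypothetical category frequencies; n = 20.
angles = central_angles([10, 5, 3, 2])
print(angles)
```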
Frequency polygon
Using the class midpoints, we can also construct what is called a frequency polygon. A frequency polygon is a visual display of the frequencies of each class using the midpoints from the histogram. Steps for constructing a frequency polygon: 1. Mark the class boundaries on the x-axis and the frequencies on the y-axis. Note, extra classes at the lower and upper ends will be added, each having a frequency of 0. This allows the figure to be a closed plane figure bounded by straight lines, which in turn is the definition of a polygon. 2. Add the midpoints to the x-axis and plot a point at the frequency of each class directly above its midpoint. 3. Join each point to the next with a line segment.
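The points of a frequency polygon can be computed as below. This is a sketch that assumes classes of uniform width; the midpoints, frequencies, and function name are made up for illustration.

```python
def frequency_polygon_points(midpoints, frequencies, width):
    """Return (midpoint, frequency) points, closed off with two zero classes."""
    # Extra classes one class width below and above, each with frequency 0.
    xs = [midpoints[0] - width] + list(midpoints) + [midpoints[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))

# Hypothetical classes 10-19, 20-29, 30-39 with frequencies 2, 3, 3.
polygon = frequency_polygon_points([14.5, 24.5, 34.5], [2, 3, 3], width=10)
print(polygon)
```

Joining each point to the next with a line segment gives the polygon, which starts and ends on the x-axis.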
Analyzing a graph
When you are analyzing a graph, you are first trying to determine the overall pattern of the data. Is it symmetrical, but not uniform? Does the majority of the data lie to one side or the other? Is the frequency the same for all categories? We also need to determine whether a graph represents a time-series or a cross-sectional study. You must be careful when looking at graphs, because sometimes they can be misleading. If you stretch or shrink the scale on the y-axis, the shape of the graph may change dramatically. A line that rises gently on one scale might look very steep with a different scale. Choose a scale that best represents the data. If there are large differences between the data values, then the graph should accurately reflect the differences. On the other hand, if the difference in the data values is small, then the graph should reflect this as well.
Sample size
n = sample size. The sample size for a frequency distribution can be found by adding all of the class frequencies together.