Statistics Chapter 2 (Bentley)

Contingency Tables and Scatter Diagrams

Two methods for summarizing the data for two variables simultaneously.

Frequency Distribution

(For qualitative data) groups the data into categories and records the number of observations that fall into each category. Shows the frequency (or number) of items in each of several non-overlapping classes. The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
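
As a minimal sketch of the definition above, the following counts how many observations fall into each category. The `purchases` list is made-up data used only for illustration.

```python
from collections import Counter

# Made-up sample of a qualitative variable (brand purchased)
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Coke", "Pepsi", "Sprite", "Coke"]

# Each observation falls into exactly one non-overlapping category
frequency = Counter(purchases)

# Print the frequency distribution, most frequent category first
for category, count in frequency.most_common():
    print(f"{category:<8}{count}")
```

The frequencies always sum to the total number of observations, which is a quick check that no observation was missed or counted twice.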

Relative Frequency Distribution

(Of each category) equals the proportion (fraction) of observations in each category. A category's relative frequency is calculated by dividing its frequency by the total number of observations. The sum of the relative frequencies should equal one (or a value very close to one due to rounding). Identifies the proportion (or fraction) of observations that falls into each class; that is, Relative Frequency = Class Frequency / Total Number of Observations.
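
The calculation described above (class frequency divided by the total number of observations) might be sketched as follows. The function name `relative_frequencies` and the `grades` data are hypothetical, chosen only for this example.

```python
from collections import Counter

def relative_frequencies(observations):
    """Return {category: class frequency / total number of observations}."""
    counts = Counter(observations)
    total = len(observations)
    return {category: count / total for category, count in counts.items()}

grades = ["A", "B", "B", "C", "A", "B", "C", "B", "A", "B"]  # made-up data
rel_freq = relative_frequencies(grades)

for category in sorted(rel_freq):
    print(f"{category}: {rel_freq[category]:.2f}")
```

Multiplying each value by 100 gives the corresponding percent frequency.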

Frequency Distribution (Guidelines for Determining the Number of Classes)

- Use between 5 and 20 classes.
- Data sets with a larger number of elements usually require a larger number of classes.
- Smaller data sets usually require fewer classes.

Frequency Distribution (Guidelines for Determining the Width of Each Class)

- Use classes of equal width.
- Approximate Class Width = (Largest Data Value - Smallest Data Value) / Number of Classes
Note: Making the classes the same width reduces the chance of inappropriate interpretations.
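
A short sketch of the class-width formula above. The `audit_times` values are illustrative only, and rounding the result up to a convenient whole number is a common practice assumed here, not a rule stated in these notes.

```python
import math

def approximate_class_width(data, number_of_classes):
    """(Largest data value - smallest data value) / number of classes."""
    return (max(data) - min(data)) / number_of_classes

# Illustrative data: 20 whole-number measurements
audit_times = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
               22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

width = approximate_class_width(audit_times, number_of_classes=5)  # (33 - 12) / 5
rounded_width = math.ceil(width)  # assumed convention: round up to a convenient width

print(width, rounded_width)
```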

Scatterplot Graph May Reveal...

1. A linear relationship exists between the two variables.
2. A curvilinear relationship exists between the two variables.
3. No relationship exists between the two variables.

Guidelines for Constructing a Frequency Distribution

1. Classes are mutually exclusive.
2. Classes are exhaustive.
3. The total number of classes in a frequency distribution usually ranges from 5 to 20.
4. Once we choose the number of classes for a raw data set, we can then approximate the width of each class by using the formula (Largest Value - Smallest Value) / Number of Classes.

Frequency Distribution (Three Steps to Define Classes with Quantitative Data)

1. Determine the number of non-overlapping classes.
2. Determine the width of each class.
3. Determine the class limits.

Cautionary Comment When Constructing or Interpreting Charts or Graphs

1. The simplest graph should be used for a given set of data. Strive for clarity and avoid unnecessary adornments.
2. Axes should be clearly marked with the numbers of their respective scales; each axis should be labeled.
3. The scale on the vertical axis should begin at zero. Moreover, the vertical axis should not be given a very high value as an upper limit.

Pie Chart

A segmented circle whose segments portray the relative frequencies of the categories of some qualitative variable. Commonly used graphical device for presenting relative frequency and percent frequency distributions for qualitative data. First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.

Stem-and-Leaf Display (Leaf Units)

A single digit is used to define each leaf. In the preceding example, the leaf unit was 1. Leaf units may be 100, 10, 1, 0.1, and so on. Where the leaf unit is not shown, it is assumed to equal 1. The leaf unit indicates how to multiply the stem-and-leaf numbers in order to approximate the original data.

Contingency Table

A tabular summary of data for two variables. Can be used when:
- One variable is qualitative and the other is quantitative
- Both variables are qualitative
- Both variables are quantitative
The left and top margin labels define the classes for the two variables.
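
A minimal sketch of building such a table by counting pairs of classes. The `restaurants` data (a quality rating paired with a price range) is invented for illustration; the row and column labels play the role of the left and top margin labels.

```python
from collections import Counter

# Made-up observations: (quality rating, meal price range)
restaurants = [
    ("Good", "$10-19"), ("Very Good", "$20-29"), ("Good", "$10-19"),
    ("Excellent", "$30-39"), ("Very Good", "$10-19"), ("Excellent", "$20-29"),
    ("Good", "$20-29"), ("Very Good", "$20-29"),
]

# Each cell counts the observations that share a (row class, column class) pair
cell_counts = Counter(restaurants)
rows = ["Good", "Very Good", "Excellent"]   # left margin labels
cols = ["$10-19", "$20-29", "$30-39"]       # top margin labels

print(f"{'':<10}" + "".join(f"{c:>8}" for c in cols))
for r in rows:
    # Counter returns 0 for pairs that never occur
    print(f"{r:<10}" + "".join(f"{cell_counts[(r, c)]:>8}" for c in cols))
```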

Frequency Distribution (Guidelines for Determining the Class Limits)

Class limits must be chosen so that each data item belongs to one and only one class. The lower class limit identifies the smallest possible data value assigned to the class. The upper class limit identifies the largest possible data value assigned to the class. The appropriate values for the class limits depend on the level of accuracy of the data. Note: An open-end class requires only a lower class limit or an upper class limit.
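
The following sketch shows one way to apply class limits so that each data item belongs to one and only one class. It assumes whole-number data, so each upper limit is one less than the next lower limit; the function name, the starting limit of 10, and the data are all illustrative choices.

```python
def quantitative_frequency_distribution(data, lower_limit, class_width, number_of_classes):
    """Count whole-number data in equal-width classes with inclusive limits."""
    distribution = []
    for i in range(number_of_classes):
        lower = lower_limit + i * class_width
        upper = lower + class_width - 1  # inclusive upper limit (whole-number data assumed)
        count = sum(1 for x in data if lower <= x <= upper)
        distribution.append(((lower, upper), count))
    return distribution

# Illustrative data: 20 whole-number measurements
audit_times = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
               22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

dist = quantitative_frequency_distribution(
    audit_times, lower_limit=10, class_width=5, number_of_classes=5
)
for (lower, upper), count in dist:
    print(f"{lower}-{upper}: {count}")
```

For data recorded to one decimal place, the limits would need to be stated to the same accuracy (for example 10.0-14.9), which is what "depend on the level of accuracy of the data" refers to.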

Polygon

Connects a series of neighboring points where each point represents the midpoint of a particular class and its associated frequency or relative frequency.

Exploratory Data Analysis

Consists of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly.

Contingency Table: Simpson's Paradox

Data in two or more contingency tables are often aggregated to produce a summary contingency table. We must be careful in drawing conclusions about the relationship between the two variables in the aggregated contingency table. In some cases, the conclusions based upon an aggregated contingency table can be completely reversed if we look at the unaggregated data. The reversal of conclusions based on aggregated and unaggregated data is called Simpson's paradox.
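
A small numeric sketch of the reversal described above. All counts are invented for illustration: option A has the higher success rate within each group, yet option B has the higher rate once the two tables are aggregated.

```python
# Hypothetical (successes, trials) for options A and B in two groups
results = {
    "Group 1": {"A": (9, 10), "B": (80, 100)},
    "Group 2": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    """Proportion of successes."""
    return successes / trials

# Unaggregated view: A beats B inside every group
for group, table in results.items():
    print(group, {option: round(rate(*counts), 2) for option, counts in table.items()})

# Aggregated view: add successes and trials across groups
aggregated = {
    option: tuple(sum(results[g][option][i] for g in results) for i in range(2))
    for option in ("A", "B")
}
print("Aggregated", {option: round(rate(*counts), 2) for option, counts in aggregated.items()})
```

The reversal arises because the two options are spread very unevenly across the groups, so the aggregated rates are dominated by different groups.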

Bar Chart

Depicts the frequency or the relative frequency for each category of the qualitative variable as a series of horizontal or vertical bars, the lengths of which are proportional to the values that are to be depicted. On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes. For the other axis, a frequency, relative frequency, or percent frequency scale can be used. Using a bar of fixed width drawn above each class label, we extend the height appropriately. The bars are separated to emphasize the fact that each class is a separate category.

Stretched Stem-and-Leaf Display

If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display vertically by using two stems for each leading digit(s). Whenever a stem value is stated twice, the first value corresponds to leaf values of 0-4, and the second value corresponds to leaf values of 5-9.
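
A sketch of the stretching rule above, assuming non-negative whole numbers with a leaf unit of 1. The function name and the `scores` data are hypothetical.

```python
def stretched_stem_and_leaf(values):
    """Split each stem into two rows: leaves 0-4 first, then leaves 5-9."""
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = 0 if leaf <= 4 else 1  # 0 = first row of the stem, 1 = second row
        rows.setdefault((stem, half), []).append(leaf)
    return rows

scores = [70, 72, 74, 75, 76, 78, 79, 81, 83, 85, 88]  # made-up data
rows = stretched_stem_and_leaf(scores)

for (stem, half), leaves in sorted(rows.items()):
    print(f"{stem} | {' '.join(map(str, leaves))}")
```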

Frequency Distribution (Note on Number of Classes and Class Width)

In practice, the number of classes and the appropriate class width are determined by trial and error. Once a possible number of classes is chosen, the appropriate class width is found. The process can be repeated for a different number of classes. Ultimately, the analyst uses judgement to determine the combination of the number of classes and class width that provides the best frequency distribution for summarizing the data.

Pareto Diagram

In quality control, bar charts are used to identify the most important causes of problems. When the bars are arranged in descending order of height from left to right (with the most frequently occurring cause appearing first), the bar chart is called a Pareto diagram. Named for its founder, Vilfredo Pareto, an Italian economist.
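
The ordering that defines a Pareto diagram can be sketched with a simple text chart. The `defect_causes` data is invented for illustration.

```python
from collections import Counter

# Made-up quality-control log: one recorded cause per defect
defect_causes = ["scratch", "dent", "scratch", "misalignment", "scratch",
                 "dent", "scratch", "paint", "dent", "scratch"]

# most_common() sorts causes by descending frequency, which is the Pareto ordering
pareto_order = Counter(defect_causes).most_common()

for cause, count in pareto_order:
    print(f"{cause:<14}{'#' * count} ({count})")
```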

Classes

Intervals

Ogive

Is a graph that plots the cumulative frequency or the cumulative relative frequency of each class against the upper limit of the corresponding class. A graph of a cumulative distribution. The data values are shown on the horizontal axis. Shown on the vertical axis are the:
- cumulative frequencies, or
- cumulative relative frequencies, or
- cumulative percent frequencies
The frequency (one of the above) of each class is plotted as a point. The plotted points are connected by straight lines.

Scatter Diagram

Is a graphical presentation of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other variable is shown on the vertical axis. The general pattern of the plotted points suggests the overall relationship between the variables.

Scatterplot

Is a graphical tool that helps in determining whether or not two quantitative variables are related in some systematic way. Each point in the diagram represents a pair of known or observed values of the two variables.

Histogram

Is a series of rectangles where the width and height of each rectangle represent the class width and frequency (or relative frequency) of the respective class. Another common graphical presentation of quantitative data. The variable of interest is placed on the horizontal axis. A rectangle is drawn above each class interval with its height corresponding to the interval's frequency, relative frequency, or percent frequency. Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.

Percent Frequency

Is the percent (%) of observations in a category; it equals the relative frequency of the category multiplied by 100.

Dot Plot

One of the simplest graphical summaries of data. A horizontal axis shows the range of data values. Then each data value is represented by a dot placed above the axis.

Trendline

Provides an approximation of the relationship.
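
One common way to obtain a trendline is a least-squares straight line; that choice is an assumption here, since the definition above does not name a fitting method. The function name and the data are illustrative.

```python
def fit_trendline(x, y):
    """Least-squares line y = intercept + slope * x (assumed fitting method)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up paired observations of two quantitative variables
commercials = [1, 2, 3, 4, 5]
sales = [41, 46, 49, 56, 58]

slope, intercept = fit_trendline(commercials, sales)
print(f"sales = {intercept:.1f} + {slope:.1f} * commercials")
```

A positive slope suggests a positive linear relationship in the scatter diagram; a slope near zero suggests no linear relationship.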

Cumulative Frequency Distribution

Records the number of observations that fall below the upper limit of each class.

Stem-and-Leaf Display

Shows both the rank order and shape of the distribution of the data. It is similar to a histogram on its side, but it has the advantage of showing the actual data values. The first digits of each data item are arranged to the left of a vertical line. To the right of the vertical line we record the last digit for each item in rank order. Each line in the display is referred to as a stem. Each digit on a stem is a leaf.
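
A minimal sketch of the construction above, assuming non-negative whole numbers. The `leaf_unit` parameter is a hypothetical name for the leaf unit (1, 10, 100, ...); digits below the leaf unit are dropped, so the display only approximates the original data when the leaf unit is larger than 1.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Return {stem: [leaves in rank order]}; each leaf is a single digit."""
    stems = defaultdict(list)
    for value in sorted(values):
        scaled = value // leaf_unit            # drop digits below the leaf unit
        stems[scaled // 10].append(scaled % 10)  # leading digits | last digit
    return dict(stems)

scores = [68, 72, 75, 81, 83, 85, 85, 92, 97, 106]  # made-up data
display = stem_and_leaf(scores)

for stem in sorted(display):
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in display[stem])}")
```

With `leaf_unit=10`, a value such as 1565 would appear as stem 15 and leaf 6, which is read back as approximately 1560.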

Cumulative Percent Frequency Distribution

Shows the percentage of items with values less than or equal to the upper limit of each class.

Cumulative Relative Frequency Distribution

Shows the proportion of items with values less than or equal to the upper limit of each class.

Histograms Showing Skewness

Symmetric
- Left tail is the mirror image of the right tail (ex. height and weight of people)
Moderately Skewed Left
- A longer tail to the left (ex. exam scores)
Moderately Skewed Right
- A longer tail to the right (ex. housing values)
Highly Skewed Right
- A very long tail to the right (ex. executive salaries)

Cumulative Distributions

The last entry in a cumulative frequency distribution always equals the total number of observations. The last entry in a cumulative relative frequency distribution always equals 1.00. The last entry in a cumulative percent frequency distribution always equals 100.
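
The three cumulative distributions, and the "last entry" facts above, can be checked with a short sketch. The class upper limits and frequencies are illustrative; pairing each upper limit with its cumulative value also gives the points one would plot for an ogive.

```python
from itertools import accumulate

# Illustrative classes (upper limits) and their frequencies
class_upper_limits = [14, 19, 24, 29, 34]
frequencies = [4, 8, 5, 2, 1]

cumulative = list(accumulate(frequencies))  # running total of frequencies
total = cumulative[-1]                      # last entry = total number of observations
cumulative_relative = [c / total for c in cumulative]
cumulative_percent = [100 * c / total for c in cumulative]

for limit, cf, crf, cpf in zip(class_upper_limits, cumulative,
                               cumulative_relative, cumulative_percent):
    print(f"<= {limit}: {cf:>3} {crf:>6.2f} {cpf:>6.1f}")
```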

