Understandable Statistics - Chapter 2

Pareto chart

A Pareto chart is a bar graph in which the bar height represents the frequency of an event and the bars are arranged from left to right in order of decreasing height. Pareto charts are among the many techniques used in quality-control programs such as Total Quality Management (TQM), associated with Dr. W. Edwards Deming. Because the categories appear in decreasing order of frequency, the chart shows their order of importance.
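
The ordering rule above can be sketched in a few lines of Python. This is a minimal illustration, not a method from the textbook: the `defects` list and the `pareto_order` helper are hypothetical names, and the alphabetical tie-break is an assumption.

```python
from collections import Counter

# Hypothetical defect log, made up for illustration only.
defects = ["scratch", "dent", "scratch", "crack", "scratch", "dent"]

def pareto_order(events):
    """Return (category, frequency) pairs sorted by decreasing frequency."""
    counts = Counter(events)
    # Highest frequency first; ties are broken alphabetically (an assumption).
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

pareto = pareto_order(defects)

# Print a text version of the chart, tallest bar first.
for category, freq in pareto:
    print(f"{category:<8} {'#' * freq} ({freq})")
```

Reading the output from top to bottom corresponds to reading the bars of a Pareto chart from left to right.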

Dotplot

A display technique somewhat similar to a histogram. In a dotplot, the data values are displayed along the horizontal axis. A dot is then plotted over each data value in the data set (stacked dots).

Frequency table

A frequency table partitions data (typically a large set of quantitative data) into classes or intervals and shows how many data values are in each class. The classes are constructed so that each data value falls into exactly one class, so the table shows how the data are distributed across the classes. 1) Decide how many classes to use. 2) Find the class width. 3) Determine the lower and upper class limits for each class. 4) Tally the data into the classes (count the data values that fall into each class). 5) Find the frequency for each class (count the tally marks corresponding to that class). 6) Compute the midpoint (class mark) for each class. 7) Determine the class boundaries.
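
Steps 3 through 6 above can be sketched as follows. This is a minimal sketch that assumes whole-number data and takes the number of classes and the class width as given. The function name `frequency_table` and the `commute_miles` values are hypothetical.

```python
def frequency_table(data, num_classes, class_width):
    """Tally whole-number data into classes.

    Returns one (lower_limit, upper_limit, frequency, midpoint) row per class.
    """
    lower = min(data)  # smallest data value starts the first class
    table = []
    for _ in range(num_classes):
        upper = lower + class_width - 1  # upper limit for whole-number data
        freq = sum(1 for x in data if lower <= x <= upper)  # tally the class
        midpoint = (lower + upper) / 2  # class mark
        table.append((lower, upper, freq, midpoint))
        lower += class_width  # lower limit of the next class
    return table

# Hypothetical data, made up for illustration only.
commute_miles = [1, 4, 7, 9, 12, 13, 16, 18, 21, 24, 25, 30]
rows = frequency_table(commute_miles, num_classes=4, class_width=8)

for lower, upper, freq, midpoint in rows:
    print(f"{lower:>2}-{upper:<2}  f={freq}  midpoint={midpoint}")
```

Each data value lands in exactly one class, so the frequencies add up to the number of data values.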

Relative-frequency table

A relative-frequency table shows the proportion of all data values that fall into each class. 1) Make a frequency table. 2) Compute the relative frequency f/n for each class, where f is the class frequency and n is the total sample size.

Stem-and-leaf display

A stem-and-leaf display is a method of EDA (exploratory data analysis) that is used to rank-order and arrange data into groups. 1) Divide the digits of each data value into two parts. The leftmost part is called the stem and the rightmost part is called the leaf. 2) Align all the stems in a vertical column from smallest to largest. Draw a vertical line to the right of all the stems. 3) Place all the leaves with the same stem in the same row as the stem, and arrange the leaves in increasing order. 4) Use a label to indicate the magnitude of the numbers in the display. We include the decimal position in the label rather than with the stems or leaves. The finished display shows the shape of the distribution and any outliers (data errors or simply unusual data values).
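
The steps above can be sketched in Python. This minimal example assumes non-negative whole numbers, with the ones digit as the leaf and the remaining digits as the stem. The `stem_and_leaf` helper and the `weights` data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display as strings."""
    groups = defaultdict(list)
    for v in sorted(values):          # sorting puts leaves in increasing order
        stem, leaf = divmod(v, 10)    # stem = leading digits, leaf = ones digit
        groups[stem].append(leaf)
    lines = []
    # Include every stem between the smallest and largest so gaps stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Hypothetical data, made up for illustration only.
weights = [27, 12, 35, 21, 18, 27, 30, 15]
lines = stem_and_leaf(weights)

print("1 | 2 represents 12")  # label giving the magnitude of the numbers
print("\n".join(lines))
```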

Ogive

An ogive is a graph that displays cumulative frequencies. 1) Make a frequency table showing class boundaries and cumulative frequencies. 2) For each class, make a dot over the upper class boundary at the height of the cumulative class frequency. Connect these dots with line segments. 3) By convention, an ogive begins on the horizontal axis at the lower class boundary of the first class.
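
The points that make up an ogive can be computed as in the sketch below. It assumes equal-width classes. The `ogive_points` helper and the sample frequencies are hypothetical.

```python
from itertools import accumulate

def ogive_points(first_lower_boundary, class_width, class_frequencies):
    """Return the (x, cumulative frequency) points of an ogive."""
    # By convention the ogive starts on the horizontal axis (height 0)
    # at the lower class boundary of the first class.
    points = [(first_lower_boundary, 0)]
    for i, total in enumerate(accumulate(class_frequencies), start=1):
        upper_boundary = first_lower_boundary + i * class_width
        points.append((upper_boundary, total))  # dot over the upper boundary
    return points

# Hypothetical classes with boundaries 0.5, 8.5, 16.5, 24.5, 32.5.
points = ogive_points(0.5, 8, [3, 4, 3, 2])
print(points)
```

Connecting consecutive points with line segments gives the ogive. The last point's height equals the total number of data values.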

Robustness

Analysis that is not influenced much by extreme data values.

Clustered bar graphs

A bar graph in which two or more bars are grouped side by side to compare values of the variable being displayed across groups (such as the life spans of men and women).

Bar Graph

Bar graphs can be used to display quantitative or qualitative data. 1) Bars can be vertical or horizontal. 2) Bars are of uniform width and uniformly spaced. 3) The lengths of the bars represent values of the variable being displayed, the frequency of occurrence, or the percentage of occurrence. The same measurement scale is used for the length of each bar. 4) The graph is well annotated with a title, labels for each bar, and a vertical scale or the actual value for the length of each bar. With qualitative data, the frequency or percentage of occurrence can be displayed. With quantitative data, the measurement itself can be displayed, as was done in the bar graph showing life expectancy.

Back-to-back stem plot

Two data sets share one column of stems: the leaves for one data set are written to the right of the stems, and the leaves for the other are written to the left.

EDA

Exploratory data analysis, developed by John W. Tukey, is particularly useful for detecting patterns and extreme data values. It is designed to help us explore a data set, ask questions we had not thought of before, or pursue leads in many directions. EDA methods are especially useful when our data have been gathered for general interest and observation of subjects, that is, when we have data but no specific question in mind.

Quantitative data

Histogram, stem-and-leaf display

Probability distribution

If a random sample is large enough, we can estimate the probability of an event by the relative frequency of the event. The relative frequencies of all the classes then estimate the probability distribution.

Pie chart or circle graph

In a circle graph or pie chart, wedges of a circle visually display proportional parts of the total population that share a common characteristic. The total quantity, or 100%, is represented by the entire circle. Wedges are usually labeled with their percentages of the total. Note: to get the central angle of a wedge, multiply its fraction of the total (the percent divided by 100) by 360 degrees. Circle graphs display how a total is divided into several categories. The circle graph is very appropriate for qualitative data, or any data for which percentage of occurrence makes sense. Circle graphs are most effective when the number of categories or wedges is 10 or fewer.
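
The wedge-size rule can be checked with a short calculation. In this minimal sketch, the `wedge_angles` helper and the `commute_counts` data are hypothetical.

```python
def wedge_angles(category_counts):
    """Map each category to its central angle in degrees."""
    total = sum(category_counts.values())
    # Fraction of the total times 360 degrees gives the size of the wedge.
    return {name: count / total * 360 for name, count in category_counts.items()}

# Hypothetical survey counts, made up for illustration only.
commute_counts = {"bus": 10, "car": 25, "bike": 5}
angles = wedge_angles(commute_counts)

for name, angle in angles.items():
    print(f"{name}: {angle} degrees")
```

The angles always add up to 360 degrees because the fractions add up to 1.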

Class

A class is one of the smaller intervals into which a large set of quantitative data is organized. Five to 15 classes are usually used in a frequency table. If you use fewer than five classes, you risk losing too much information. If you use more than 15 classes, the data may not be sufficiently summarized.

Time-series graph

In a time-series graph, data are plotted in order of occurrence at regular intervals over a period of time. Time is on the horizontal scale and the variable being measured is on the vertical scale. Connect the data points with line segments.

Outliers

Outliers in a data set are data values that are very different from other measurements in the data set. They may indicate data recording errors, or they may be valid but so unusual that they should be examined separately from the rest of the data by people familiar with both the field and the purpose of the study.

Histogram

A histogram provides an effective visual display of frequency table data. The data must be quantitative. 1) Make a frequency table (including relative frequencies). 2) Place class boundaries on the horizontal axis and frequencies or relative frequencies on the vertical axis. 3) For each class of the frequency table, draw a bar whose width extends between the corresponding class boundaries. For histograms, the height of each bar is the corresponding class frequency. If class boundaries look awkward as labels, the lower class limits are sometimes used as labels instead, with the convention that a data value falling on a class limit is included in the next higher class. Another option is to label each bar with the class midpoint. If the data came from a random sample, the histogram will have a distribution shape that is reasonably similar to that of the population.

Relative-frequency histogram

A relative-frequency histogram provides an effective visual display of relative-frequency table data. The data must be quantitative. 1) Make a frequency table (including relative frequencies). 2) Place class boundaries on the horizontal axis and relative frequencies on the vertical axis. 3) For each class of the frequency table, draw a bar whose width extends between the corresponding class boundaries. For relative-frequency histograms, the height of each bar is the corresponding class relative frequency.

Skewed left

Refers to a histogram in which one tail is stretched out longer than the other. The direction of skewness is on the side of the longer tail. So, if the longer tail is on the left, we say the histogram is skewed to the left.

Skewed right

Refers to a histogram in which one tail is stretched out longer than the other. The direction of skewness is on the side of the longer tail. So, if the longer tail is on the right, we say the histogram is skewed to the right.

Relative frequency of a class

Relative frequency of a class is the proportion of all data values that fall into that class. Relative frequency = f / n (class frequency / total of all frequencies)

Class midpoint (class mark)

The center of each class is called the midpoint (or class mark). The midpoint is often used as a representative value of the entire class. Add the lower and upper limits of one class and divide by 2.

Class frequency

The class frequency for a class is the number of tally marks corresponding to that class.

Class width

The class width is the difference between the lower class limit of one class and the lower class limit of the next class. To compute it, divide (largest data value - smallest data value) by the desired number of classes, then increase the computed value to the next highest whole number.
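
The class-width calculation can be sketched as below for whole-number data. The helper name `class_width` is hypothetical. The sketch reads "next highest whole number" as strictly greater, so a computed value of exactly 4 becomes 5. That reading is an assumption, chosen because it keeps the largest data value inside the last class when the classes start at the smallest data value.

```python
import math

def class_width(data, num_classes):
    """Class width: (largest - smallest) / classes, raised to the next whole number."""
    raw = (max(data) - min(data)) / num_classes
    # floor(raw) + 1 is strictly greater than raw, even when raw is whole.
    return math.floor(raw) + 1

# Hypothetical data, made up for illustration only.
width = class_width([1, 4, 7, 9, 12, 13, 16, 18, 21, 24, 25, 30], num_classes=4)
print(width)  # (30 - 1) / 4 = 7.25, raised to 8
```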

Cumulative frequency

The cumulative frequency for a class is the sum of the frequencies for that class and all previous classes.

Class, lower limit, upper limit

The lower class limit is the lowest data value that can fit in a class. The upper class limit is the highest data value that can fit in a class. Use the smallest data value as the lower class limit of the first class. Add the class width to that value to find the lower class limit for the second class. Follow this pattern to establish all the lower class limits. Then fill in the upper class limits so that the classes span the entire range of data. Class limits are possible data values.

Class boundaries

There is a space between the upper limit of one class and the lower limit of the next class. The halfway points of these intervals are called class boundaries. Class boundaries are not possible data values.
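
Class boundaries can be derived from class limits as in this minimal sketch. It assumes at least two classes with the same gap between consecutive classes. The `class_boundaries` helper and the sample limits are hypothetical.

```python
def class_boundaries(class_limits):
    """Convert (lower limit, upper limit) pairs into class boundaries.

    class_limits must be in increasing order and contain at least two classes.
    """
    # Gap between the upper limit of the first class and the
    # lower limit of the second class.
    gap = class_limits[1][0] - class_limits[0][1]
    half = gap / 2  # boundaries sit halfway across the gap
    return [(lower - half, upper + half) for lower, upper in class_limits]

# Hypothetical whole-number class limits, made up for illustration only.
limits = [(1, 8), (9, 16), (17, 24), (25, 32)]
boundaries = class_boundaries(limits)
print(boundaries)
```

With whole-number limits the gap is 1, so each boundary ends in .5 and cannot coincide with a data value.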

Mound-shaped symmetric distribution

This term refers to a histogram in which both sides are (more or less) the same when the graph is folded vertically down the middle.

Uniform distribution

This term refers to a histogram in which every class has equal frequency. From one point of view, a uniform distribution is symmetrical with the added property that the bars are of the same height.

Bimodal distribution

This term refers to a histogram in which the two classes with the largest frequencies are separated by at least one class. The top two frequencies of these classes may have slightly different values. This type of situation sometimes indicates that we are sampling from two different populations.
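
A rough check of this definition against a list of class frequencies might look like the sketch below. The `looks_bimodal` helper is hypothetical and deliberately crude: it applies the definition literally, requires at least two classes, and resolves ties in favor of the earlier class (an assumption).

```python
def looks_bimodal(class_frequencies):
    """True if the two largest class frequencies have at least one class between them."""
    # Class indices ordered from largest frequency to smallest.
    order = sorted(
        range(len(class_frequencies)),
        key=lambda i: class_frequencies[i],
        reverse=True,
    )
    first, second = order[0], order[1]
    # An index difference of 2 or more means at least one class lies between them.
    return abs(first - second) >= 2

print(looks_bimodal([2, 9, 4, 3, 8, 1]))  # peaks at classes 2 and 5
print(looks_bimodal([1, 5, 9, 8, 3]))     # top two classes are adjacent
```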

Time series

Time-series data consist of measurements of the same variable for the same subject taken at regular intervals over a period of time.

Stem; leaf

To make a stem-and-leaf display, we break the digits of each data value into two parts. The left group of digits is called a stem, and the remaining group of digits on the right is called a leaf. We are free to choose the number of digits to be included in the stem.

How to split a stem

When a stem has many leaves, it is useful to split the stem into two lines or more. For two lines per stem, place leaves 0 to 4 on the first line and leaves 5 to 9 on the next line.
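
Splitting one stem into two lines can be sketched as follows. The `split_stem` helper and the sample leaves are hypothetical, and single-digit leaves are assumed.

```python
def split_stem(stem, leaves):
    """Return two display lines for one stem: leaves 0-4, then leaves 5-9."""
    ordered = sorted(leaves)
    low = [leaf for leaf in ordered if leaf <= 4]   # first line
    high = [leaf for leaf in ordered if leaf >= 5]  # second line
    return [
        (f"{stem} | " + " ".join(map(str, low))).rstrip(),
        (f"{stem} | " + " ".join(map(str, high))).rstrip(),
    ]

# Hypothetical leaves for stem 2, made up for illustration only.
split_lines = split_stem(2, [7, 1, 3, 9, 5, 0, 4, 8])
print("\n".join(split_lines))
```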

Changing scale

Whenever you use a change in scale in a graphic, warn the viewer by using a squiggle on the changed axis. Sometimes, if a single bar is unusually long, the bar length is compressed with a squiggle in the bar itself.

~

Approximately equal to. Use this symbol when a relative frequency has been rounded. The total of the relative frequencies should be 1. However, rounded results may make the total slightly higher or lower than 1.
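
The effect of rounding on the total can be seen in a small example. In this minimal sketch, the `rounded_relative_frequencies` helper and the choice of two decimal places are hypothetical.

```python
def rounded_relative_frequencies(class_frequencies, places=2):
    """Relative frequency f / n for each class, rounded to the given places."""
    n = sum(class_frequencies)  # total of all frequencies
    return [round(f / n, places) for f in class_frequencies]

# Three classes with one data value each: every relative frequency is 1/3.
rel = rounded_relative_frequencies([1, 1, 1])
total = sum(rel)

print(rel)    # each 1/3 is rounded to 0.33
print(total)  # close to 0.99 rather than exactly 1
```

Before rounding, the three relative frequencies add up to 1. After rounding, the total falls slightly short, which is why the rounded values are written with an approximation symbol.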

Qualitative data

Bar graph, Pareto chart, pie chart
