Chapter 2

Guidelines for constructing good graphics:

- Title and label the graphic axes clearly, providing explanations if needed. Include units of measurement and a data source when appropriate.
- Avoid distortion and never lie about the data.
- Minimize the amount of white space in the graph; use the available space to let the data stand out. If scales are truncated, be sure to clearly indicate this to the reader.
- Avoid clutter, such as excessive gridlines and unnecessary backgrounds or pictures, so as not to distract the reader.
- Avoid 3-D, which may look nice but distracts the reader and often leads to misinterpretation.
- Don't use more than one design in the same graphic. Sometimes graphs use a different design in one portion of the graph to draw attention to that area; don't try to force the reader to any specific part of the graph. Let the data speak for itself.
- Avoid relative graphs that are devoid of data or scales.

The 2 most commonly used tactics for focusing on the "wow factor" instead of the data

3-D graphs and pictograms (graphs that use pictures to represent the data)

When constructing frequency distributions, we typically want the number of classes to be between

5 and 20

Pareto chart

A bar graph whose bars are drawn in decreasing order of frequency or relative frequency. Helps prioritize categories for decision-making purposes in areas such as quality control, human resources, and marketing.
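
The ordering step behind a Pareto chart can be sketched in a few lines. This is only an illustration: the function name `pareto_order` and the defect counts are hypothetical, not taken from the chapter.

```python
def pareto_order(frequencies):
    """Return (category, frequency) pairs sorted from most to least frequent."""
    return sorted(frequencies.items(), key=lambda item: item[1], reverse=True)


# Hypothetical quality-control data: defect type -> number of occurrences.
defects = {"scratch": 12, "dent": 30, "misalignment": 7, "discoloration": 18}

ordered = pareto_order(defects)
for category, count in ordered:
    print(f"{category:<15}{count}")
```

Drawing the bars in the order returned by `pareto_order` puts the category most worth attention first.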

Pie chart

A circle divided into sectors. Each sector represents a category of data. The area of each sector is proportional to the frequency of the category. Typically used to present the relative frequency of qualitative data. In most cases the data are nominal, but ordinal data can also be displayed. To create the chart by hand, determine the degrees for each sector (multiply the relative frequency by 360), then draw the sectors using a protractor. Can only be created if all the categories of the variable under consideration are represented, so that all the data are represented.
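
The angle of each sector is the category's relative frequency multiplied by 360 degrees. A minimal sketch of that arithmetic, assuming a hypothetical helper `sector_degrees` and made-up counts:

```python
def sector_degrees(frequencies):
    """Map each category to its sector angle: relative frequency * 360."""
    total = sum(frequencies.values())
    return {category: freq / total * 360 for category, freq in frequencies.items()}


# Hypothetical counts for three categories.
counts = {"A": 10, "B": 20, "C": 10}

degrees = sector_degrees(counts)
for category, angle in degrees.items():
    print(f"{category}: {angle:.1f} degrees")
```

Because every category is included, the angles add up to the full 360 degrees of the circle.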

Since quantitative data can be ordered (written in ascending or descending order), they can be summarized in:

A cumulative frequency distribution and a cumulative relative frequency distribution

Construction of a stem-and-leaf plot

(1) The stem of a data value will consist of the digits to the left of the rightmost digit. The leaf of a data value will be the rightmost digit.
(2) Write the stems in a vertical column in increasing order. Draw a vertical line to the right of the stems.
(3) Write each leaf corresponding to the stems to the right of the vertical line.
(4) Within each stem, rearrange the leaves in ascending order, title the plot, and provide a legend to indicate what the values represent.
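
The construction steps can be sketched as follows. This is an illustration only: it assumes nonnegative whole-number data, and the names `stem_and_leaf` and `render` and the sample values are hypothetical.

```python
def stem_and_leaf(values):
    """Group sorted values by stem (all digits but the last) and leaf (last digit)."""
    stems = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 27 -> stem 2, leaf 7
        stems.setdefault(stem, []).append(leaf)
    # Keep every stem between the smallest and largest, even if it has no leaves.
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}


def render(plot):
    """Format the plot with stems in a column and leaves to the right of a bar."""
    lines = [f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}"
             for stem, leaves in plot.items()]
    lines.append("Legend: 1 | 5 represents 15")
    return "\n".join(lines)


data = [23, 15, 31, 27, 19, 35, 22, 15]  # hypothetical observations
plot = stem_and_leaf(data)
print(render(plot))
```

Sorting the data first means the leaves within each stem are already in ascending order, which covers step (4).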

Ogive

A graph that represents the cumulative frequency or cumulative relative frequency for the class. It is constructed by plotting points whose x-coordinates are the upper class limits and whose y-coordinates are the cumulative frequencies or cumulative relative frequencies. After the points for each class are plotted, line segments are drawn connecting consecutive points. An additional line segment is drawn connecting the point for the first class to the horizontal axis at a location representing the upper limit of the class that would precede the first class (if it existed)
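
The points of an ogive can be computed directly from the upper class limits and the class frequencies. A minimal sketch, assuming equal class widths; the function name `ogive_points` and the classes used are hypothetical.

```python
from itertools import accumulate


def ogive_points(upper_limits, frequencies, class_width):
    """Return (x, y) points: upper class limit vs. cumulative frequency.

    The first point sits on the horizontal axis at the upper limit of the
    class that would precede the first class.
    """
    start = (upper_limits[0] - class_width, 0)
    return [start] + list(zip(upper_limits, accumulate(frequencies)))


# Hypothetical classes 20-29, 30-39, 40-49, 50-59 with their frequencies.
points = ogive_points([29, 39, 49, 59], [3, 7, 5, 2], class_width=10)
print(points)
```

Connecting consecutive points with line segments gives the ogive; dividing each y-value by the total number of observations gives the cumulative relative frequency version.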

Time-series data

Data where the value of a variable is measured at different points in time

Determining the class width

Decide on the number of classes. Generally, there should be between 5 and 20 classes. The smaller the data set, the fewer classes you should have. Use the formula class width ≈ (largest data value - smallest data value) / (number of classes), then round the value up to a convenient number. Rounding up may result in fewer classes than were originally intended.
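
The class width calculation, (largest data value - smallest data value) / (number of classes) rounded up, can be sketched as below. Rounding up to the next whole number is an assumption made here for illustration; a "convenient number" could just as well be a multiple of 5 or 10. The function name and data are hypothetical.

```python
import math


def class_width(data, num_classes):
    """Return the raw width and the width rounded up to a whole number."""
    raw = (max(data) - min(data)) / num_classes
    return raw, math.ceil(raw)


data = [12, 15, 21, 27, 33, 38, 44, 49, 52, 58]  # hypothetical observations
raw, width = class_width(data, num_classes=5)
print(f"raw width = {raw}, rounded up = {width}")
```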

Cumulative frequency distribution

Displays aggregate frequency of the category. For discrete data, it displays the total number of observations less than or equal to the category. For continuous data, it displays the total number of observations less than or equal to the upper class limit of a class

Classes

The categories by which data are grouped

If the data are discrete, but there are many different values of the variables or if the values are continuous, then the categories of data (classes):

must be created using intervals of numbers

When a data set consists of a large number of different discrete data values or when the data set consists of continuous data:

no predetermined classes for the corresponding frequency distribution exist, so the classes must be created by using intervals of numbers. Each interval represents a class.

Skewed left distribution

The tail to the left of the peak is longer than the tail to the right of the peak

Frequency polygons

A graph that uses points, connected by line segments, to represent the frequencies for the classes. It is constructed by plotting a point above each class midpoint on a horizontal axis at a height equal to the frequency of the class. After the points for each class are plotted, line segments are drawn connecting consecutive points. Two additional line segments are drawn connecting each end of the graph with the horizontal axis. Provides the same information as a histogram.

Stem-and-leaf plot

A way to represent quantitative data graphically. We use the digits to the left of the rightmost digit to form the stem. Each rightmost digit forms a leaf.

Graphs

Allow us to see the data and get a sense of what the data are saying about the individuals in the study. Pictures represent data better than tables, because they are easier to remember.

Choosing the lower class limit on the first class

Choose the smallest observation in the data set or a convenient number slightly lower than the smallest observation in the data set.

Side-by-side bar graph

Compares 2 sets of data. Data sets are compared by using relative frequencies, because different sample or population sizes make comparisons using frequencies difficult or misleading. When making comparisons, relative frequencies alone are not sufficient, so sample size should also be considered (30,000 out of 40,000, or 75%, is more convincing than 3 out of 4, also 75%).

Bar graph

Constructed by labeling each category of data on either the horizontal or vertical axis and the frequency or relative frequency of the category on the other axis. Rectangles of equal width are drawn for each category. The height of each rectangle represents the category's frequency or relative frequency. One of the most common devices for graphically representing qualitative data; both nominal and ordinal data can be easily displayed.

Cumulative relative frequency distribution

Displays the proportion (or percentage) of observations less than or equal to the category for discrete data and the proportion (or percentage) of observations less than or equal to the upper class limit for continuous data.
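
Both cumulative distributions come from running totals over the frequency column. A minimal sketch with hypothetical names and frequencies:

```python
from itertools import accumulate


def cumulative_distributions(frequencies):
    """Return (cumulative frequencies, cumulative relative frequencies)."""
    cumulative = list(accumulate(frequencies))
    total = cumulative[-1]  # the last running total is the number of observations
    cumulative_relative = [c / total for c in cumulative]
    return cumulative, cumulative_relative


frequencies = [3, 7, 5, 2]  # hypothetical class frequencies, in class order
cumulative, cumulative_relative = cumulative_distributions(frequencies)
for c, r in zip(cumulative, cumulative_relative):
    print(f"{c:>3}  {r:.3f}")
```

The final cumulative frequency equals the number of observations, and the final cumulative relative frequency equals 1.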

Dot plots

Drawn by placing each observation horizontally in increasing order and placing a dot above the observation each time it is observed. Limited in usefulness, but can be used to quickly visualize the data

Class midpoint

Found by adding consecutive lower class limits and dividing the result by 2. Used to draw frequency polygons.
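
As a quick illustration of the midpoint rule, assuming equal class widths (the function name and class limits below are hypothetical):

```python
def class_midpoints(lower_limits, class_width):
    """Midpoint = (lower limit + next class's lower limit) / 2."""
    return [(lower + (lower + class_width)) / 2 for lower in lower_limits]


# Hypothetical classes 20-29, 30-39, 40-49.
midpoints = class_midpoints([20, 30, 40], class_width=10)
print(midpoints)
```

Using the class width to find the next lower limit also gives a midpoint for the last class, which has no following class in the table.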

Histogram

Is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same and the rectangles touch each other. Used to represent quantitative data (similar to a bar graph used to represent qualitative data).
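
The heights of the rectangles are the class frequencies, which can be tallied as in this sketch. It assumes equal-width classes starting at a chosen lower class limit; the function name and data are hypothetical.

```python
def class_frequencies(data, first_lower_limit, class_width, num_classes):
    """Count how many observations fall in each equal-width class."""
    counts = [0] * num_classes
    for x in data:
        index = int((x - first_lower_limit) // class_width)
        if 0 <= index < num_classes:  # ignore values outside the classes
            counts[index] += 1
    return counts


data = [21, 25, 33, 38, 39, 41, 47, 52]  # hypothetical observations
heights = class_frequencies(data, first_lower_limit=20, class_width=10, num_classes=4)
print(heights)
```

Dividing each count by `len(data)` would give the heights for a relative frequency histogram instead.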

Frequency distribution

Lists each category of data and the number of occurrences for each category of data. Data is still qualitative although numerical. The sum of the frequency column should equal the total number of observations.

Relative frequency distribution

Lists each category of data together with the relative frequency. The relative frequencies should sum to 1 (the sum may differ slightly from 1 in decimal form due to rounding).
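
A relative frequency distribution can be built from raw observations by counting each category and dividing by the total. A minimal sketch using Python's `collections.Counter`; the function name and the observations are hypothetical.

```python
from collections import Counter


def relative_frequency_distribution(observations):
    """Map each category to frequency / total number of observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


observations = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]
distribution = relative_frequency_distribution(observations)
for category, rel_freq in distribution.items():
    print(f"{category:<6}{rel_freq:.3f}")
```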

Disadvantage of stem-and-leaf plot

Loses its usefulness when data sets are large or when they consist of a large range of values. The steps listed for creating them sometimes must be modified to meet the needs of the data (each leaf must be a single digit), which could result in altered data due to rounding to meet the needs of the graph. Also, you could be forced to use a class width of 1.0 even though a larger "width" may be more desirable.

The most common graphical misrepresentation of data is accomplished through

Manipulation of the scale of the graph, typically in the form of an inconsistent scale or a misplaced origin.

Horizontal bars

May be preferred when category names are lengthy.

Time-series plot

Obtained by plotting the time in which a variable is measured on the horizontal axis and the corresponding value of the variable on the vertical axis. Line segments are then drawn connecting the points. Useful in identifying trends in data over time

Qualitative (categorical) data

Provides measures that categorize or classify an individual. When collected, we are often interested in determining the number of individuals observed within each category

Split stem graphs

Rather than using one stem for the class of data 10-19, two stems are used, one for 10-14 interval and the second for the 15-19 interval. Reduces bunching of data and reveals the distribution of data better.
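
Splitting each stem can be sketched as below, with leaves 0-4 on the first copy of the stem and leaves 5-9 on the second. This assumes nonnegative whole-number data; the function name, the "low"/"high" labels, and the data are hypothetical.

```python
def split_stem_and_leaf(values):
    """Group values by (stem, half), where half is 'low' (0-4) or 'high' (5-9)."""
    plot = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = "low" if leaf <= 4 else "high"
        plot.setdefault((stem, half), []).append(leaf)
    return plot


data = [10, 12, 14, 15, 17, 19, 21, 26]  # hypothetical observations
split_plot = split_stem_and_leaf(data)
for (stem, half), leaves in split_plot.items():
    print(f"{stem} ({half}) | {' '.join(str(leaf) for leaf in leaves)}")
```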

Symmetric

If the histogram is split down the middle, the right and left sides are mirror images. Can be uniform or bell-shaped.

Why is the use of 3-D effects strongly discouraged?

Such graphs are often difficult to read, add little value to the graph, and distract the reader from the data

What is the advantage of having uniform width in bars and classes?

The area of the bar is then proportional to its height, so we can simply compare the heights of the bars for the different quantities

Advantage of stem-and-leaf plot

The raw data can be retrieved from the plot, unlike from a frequency distribution or histogram

Class width

The difference between consecutive lower class limits. Should be the same for each class, with the exception of an open-ended table

Relative frequency

The proportion (or percent) of observations within a category, found using the formula: relative frequency = frequency / (sum of all frequencies)

Open-ended table

The first class has no lower class limit or the last class does not have an upper class limit

Uniform distribution

The frequency of each value of the variable is evenly spread out across the values of the variable

Bell-shaped distribution

The highest frequency occurs in the middle and frequencies tail off to the left and right of the middle

Upper class limit

The largest value within the class

Lower class limit

The smallest value within the class

Skewed right distribution

The tail to the right of the peak is longer than the tail to the left of the peak

Distribution shapes

The way through which a variable can be described. Classified as symmetric, skewed left, or skewed right.

Deceptive graphs

They purposely attempt to create an incorrect impression

Misleading graphs

They unintentionally create an incorrect impression

Response bias

When wording of a survey question affects the response of the individual

How to prevent misinterpretation of data in a graph:

Increments between tick marks should remain constant, and scales for comparative graphs should be the same. Readers will usually assume that the baseline, or zero point, is at the bottom of the graph. Starting the graph at a higher or lower value can be misleading.
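
One of these checks, constant spacing between tick marks, is easy to automate. A rough sketch with a hypothetical helper name and made-up tick values:

```python
def has_constant_increments(ticks, tolerance=1e-9):
    """Return True when every gap between consecutive tick marks is the same."""
    steps = [b - a for a, b in zip(ticks, ticks[1:])]
    return all(abs(step - steps[0]) <= tolerance for step in steps)


consistent = has_constant_increments([0, 10, 20, 30])
inconsistent = has_constant_increments([0, 10, 20, 50])
print(consistent, inconsistent)
```

A scale that fails this check has an inconsistent scale and deserves a closer look before the graph is trusted.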

When a data set consists of a relatively small number of different discrete data values:

the classes for the corresponding frequency distribution are predetermined to be those data values

If the data are discrete and there are relatively few different values of the variable, then the categories of data (called classes):

will be the observations (as in qualitative data).

