Stat: Chapter 2 Organizing Data
Continuous variable
A variable whose possible values form some interval of numbers.
What are groups of quantitative data organized into?
classes, categories or bins
Discrete variable
A variable whose possible values can be listed, even though the list may continue indefinitely. A discrete variable usually involves a count of something.
Stem-and-Leaf Diagrams
Also known as a stemplot. Is often easier to construct than either a frequency distribution or a histogram and generally displays more information.
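As a rough illustration of how a stemplot is built, the sketch below assumes non-negative whole-number data, with the last digit as the leaf and the remaining digits as the stem. The function name `stem_and_leaf` and the scores are made up for this example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative whole numbers.

    Assumption: the leaf is the last digit, the stem is everything before it.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    rows = []
    # Walk every stem from smallest to largest so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        rows.append(f"{stem} | {' '.join(str(d) for d in leaves[stem])}")
    return rows

scores = [52, 47, 61, 58, 49, 75, 63, 55, 41, 60]  # hypothetical data
diagram = stem_and_leaf(scores)
for row in diagram:
    print(row)
```

Unlike a frequency distribution, each row still shows the individual observations, which is why the diagram generally displays more information.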
Dotplot
Another type of graphical display for quantitative data. These are particularly useful for showing the relative positions of the data in a data set or for comparing two or more data sets.
variable
Any characteristic whose value may change from one person or thing to another
data set
Collection of all observations for a particular variable
Cutpoint Grouping
Consists of a range of values. The smallest value that could go in a class is called the lower cutpoint of the class, and the smallest value that could go in the next-higher class is called the upper cutpoint of the class. Note that the lower cutpoint of a class is the same as its lower limit and that the upper cutpoint of a class is the same as the lower limit of the next higher class. This method is particularly useful when the data are continuous and are expressed with decimals.
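A minimal sketch of cutpoint grouping, assuming equal-width classes in which each class includes its lower cutpoint and excludes its upper cutpoint. The function name `cutpoint_grouping`, its parameters, and the weights are hypothetical.

```python
def cutpoint_grouping(values, first_cutpoint, width, num_classes):
    """Count observations in classes of the form lower <= x < upper."""
    counts = [0] * num_classes
    for v in values:
        index = int((v - first_cutpoint) // width)
        if not 0 <= index < num_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    # Each row: (lower cutpoint, upper cutpoint, frequency).
    return [
        (first_cutpoint + i * width, first_cutpoint + (i + 1) * width, counts[i])
        for i in range(num_classes)
    ]

weights = [129.2, 130.0, 135.5, 139.9, 140.0, 144.8, 150.3]  # hypothetical data
weight_table = cutpoint_grouping(weights, first_cutpoint=120, width=10, num_classes=4)
for lower, upper, freq in weight_table:
    print(f"{lower} to under {upper}: {freq}")
```

Note that 130.0 is counted in the class starting at 130, not in the class ending there, because the upper cutpoint of one class is the lower cutpoint of the next.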
Histogram
Displays the classes of the quantitative data on a horizontal axis and the frequencies (relative frequencies, percents) of each class on a vertical axis. The frequency (relative frequency, percent) of each class is represented by a vertical bar whose height is equal to the frequency (relative frequency, percent) of that class. The bars should be positioned so that they touch each other. For single-value grouping, use the distinct values of the observations to label the bars, with each such value centered under its bar. For limit grouping or cutpoint grouping, use the lower class limits (or, equivalently, lower class cutpoints) to label the bars. Note: Some statisticians and technologies use class marks or class midpoints centered under the bars.
bar chart
Displays the distinct values of the qualitative data on a horizontal axis and the relative frequencies (or frequencies or percents) of those values on a vertical axis. The relative frequency of each distinct value is represented by a vertical bar whose height is equal to the relative frequency of that value. The bars should be positioned so that they do not touch each other.
observation
Each individual piece of data
Population and Sample Distributions
For a simple random sample, the sample distribution approximates the population distribution (i.e., the distribution of the variable under consideration). The larger the sample size, the better the approximation tends to be. There is only one population distribution. Sample distributions will vary from sample to sample. Population data does not change, but sample data will.
truncated graphs
Graphs where part of one of the axes has been cut off or truncated. In a bar graph, this truncation causes the bars to be out of proportion and hence creates a misleading impression.
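To see why truncation puts bars out of proportion, the sketch below compares the ratio of two drawn bar heights with and without a truncated vertical axis. The function name `drawn_height_ratio` and the numbers are invented for illustration.

```python
def drawn_height_ratio(value_a, value_b, axis_start=0):
    """Ratio of the drawn height of bar B to bar A when the axis starts at axis_start."""
    return (value_b - axis_start) / (value_a - axis_start)

# Two hypothetical values that differ by 12.5 percent.
honest = drawn_height_ratio(80, 90)                    # axis starts at 0
truncated = drawn_height_ratio(80, 90, axis_start=75)  # axis cut off below 75

print(honest)     # 1.125
print(truncated)  # 3.0
```

With the axis starting at 75, the second bar is drawn three times as tall as the first even though the underlying value is only 12.5 percent larger.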
frequency distribution of qualitative data
Is a listing of the distinct values and their frequencies.
relative-frequency distribution of qualitative data
Is a listing of the distinct values and their relative frequencies.
Distribution of a data set
Is a table, graph, or formula that provides the values of the observations and how often they occur. An important aspect of the distribution of a quantitative data set is its shape.
relative-frequency
Is the ratio of the frequency to the total number of observations; that is, the frequency divided by the total number of observations.
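A small sketch of the calculation, using Python's `collections.Counter` to get the frequencies first. The function name `relative_frequencies` and the data are made up for this example.

```python
from collections import Counter

def relative_frequencies(observations):
    """Map each distinct value to frequency / total number of observations."""
    counts = Counter(observations)
    n = len(observations)
    return {value: count / n for value, count in counts.items()}

parties = ["D", "R", "D", "O", "R", "D", "D", "R"]  # hypothetical qualitative data
rel = relative_frequencies(parties)
percents = {value: r * 100 for value, r in rel.items()}

print(rel)       # {'D': 0.5, 'R': 0.375, 'O': 0.125}
print(percents)  # {'D': 50.0, 'R': 37.5, 'O': 12.5}
```

The relative frequencies sum to 1, and multiplying each by 100 gives the corresponding percent.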
Why are relative-frequency distributions better than frequency distributions for comparing two data sets?
Relative frequencies always lie between 0 and 1 and hence provide a standard for comparison.
Single-Value Grouping
A grouping method in which each class represents a single possible value. Particularly suitable for discrete data in which there are only a small number of distinct values.
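A minimal sketch of single-value grouping for discrete data, where every distinct value is its own class. The function name `single_value_grouping` and the counts of TV sets are hypothetical.

```python
from collections import Counter

def single_value_grouping(values):
    """Return (value, frequency) pairs, one class per distinct value."""
    counts = Counter(values)
    return [(value, counts[value]) for value in sorted(counts)]

tv_sets = [1, 2, 2, 0, 3, 2, 1, 1, 4, 2]  # hypothetical: TV sets per household
tv_table = single_value_grouping(tv_sets)
for value, freq in tv_table:
    print(value, freq)
```

This only stays readable when the number of distinct values is small; with many distinct values, limit grouping or cutpoint grouping is the usual choice.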
Class mark
The average of the two class limits of a class.
Class midpoint
The average of the two cutpoints of a class.
Class width
In cutpoint grouping, the difference between the upper and lower cutpoints of a class.
Class width
In limit grouping, the difference between the lower limit of a class and the lower limit of the next-higher class.
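The class mark, class midpoint, and class width definitions can be checked with a few lines of arithmetic. The sketch below uses made-up classes: 30-39 followed by 40-49 for limit grouping, and 30 to under 40 for cutpoint grouping. All function names are hypothetical.

```python
def class_mark(lower_limit, upper_limit):
    """Average of the two class limits (limit grouping)."""
    return (lower_limit + upper_limit) / 2

def class_midpoint(lower_cutpoint, upper_cutpoint):
    """Average of the two cutpoints (cutpoint grouping)."""
    return (lower_cutpoint + upper_cutpoint) / 2

def width_from_limits(lower_limit, next_lower_limit):
    """Lower limit of the next-higher class minus lower limit of this class."""
    return next_lower_limit - lower_limit

def width_from_cutpoints(lower_cutpoint, upper_cutpoint):
    """Upper cutpoint minus lower cutpoint."""
    return upper_cutpoint - lower_cutpoint

mark = class_mark(30, 39)                     # class 30-39
midpoint = class_midpoint(30, 40)             # class 30 to under 40
limit_width = width_from_limits(30, 40)       # classes 30-39 and 40-49
cutpoint_width = width_from_cutpoints(30, 40)

print(mark, midpoint, limit_width, cutpoint_width)  # 34.5 35.0 10 10
```

The class 30-39 has width 10, not 9, because the width is measured to the lower limit of the next class rather than to the class's own upper limit.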
population distribution
The distribution of population data.
sample distribution
The distribution of sample data.
Qualitative (Categorical) variable
The individual observations are categorical responses. The observations basically yield non-numerical information. If the observations are numbers, it doesn't make sense to add, subtract, multiply, or divide them.
Quantitative (Numerical) variable
The individual observations yield numerical information where numerical operations generally have meaning. Two types: discrete and continuous.
Upper class limit
The largest value that could go in a class.
frequency
The number of times a particular distinct value occurs (count).
Lower class cutpoint
The smallest value that could go in a class.
Lower class limit
The smallest value that could go in a class.
Upper class cutpoint
The smallest value that could go in the next-higher class (equivalent to the lower cutpoint of the next-higher class).
Sample data
The values of a variable for a sample of the population.
Population data
The values of a variable for the entire population. Also known as census data.
What are some methods used to group quantitative data into classes?
Three of the most common methods are: single-value grouping, limit grouping, and cutpoint grouping.
data
Values of a variable.
Modality
When considering the shape of a distribution, you should observe its number of peaks (highest points). A distribution is unimodal if it has one peak; bimodal if it has two peaks; and multimodal if it has three or more peaks.
Limit Grouping
With this method, each class consists of a range of values. The smallest value that could go in a class is called the lower limit of the class, and the largest value that could go into the class is called the upper limit of the class. It is particularly useful when the data are expressed as whole numbers and there are too many distinct values to employ single-value grouping.
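A minimal sketch of limit grouping, assuming whole-number data and equal-width classes such as 30-39, 40-49, and so on. The function name `limit_grouping`, its parameters, and the data are made up for illustration.

```python
def limit_grouping(values, first_lower_limit, width, num_classes):
    """Return (lower limit, upper limit, frequency) for each class."""
    classes = []
    for i in range(num_classes):
        lower = first_lower_limit + i * width
        upper = lower + width - 1  # largest whole number that fits in the class
        freq = sum(1 for v in values if lower <= v <= upper)
        classes.append((lower, upper, freq))
    return classes

days = [36, 38, 41, 45, 47, 50, 52, 55, 57, 64]  # hypothetical whole-number data
days_table = limit_grouping(days, first_lower_limit=30, width=10, num_classes=4)
for lower, upper, freq in days_table:
    print(f"{lower}-{upper}: {freq}")
```

Both limits belong to the class here, which works because the data are whole numbers; a value such as 39.5 would fall between classes, and that is the situation cutpoint grouping is meant for.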
pictogram
a symbol representing an object or concept by illustration
percentage
a relative frequency expressed as a percent; obtained by multiplying the relative frequency by 100
improper scaling
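scaling a graph or pictogram so that the sizes drawn are not proportional to the values they represent, which gives a misleading visual impression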