Statistics: Chapter 2
Quantitative data
data that is numerical in nature and has no natural categories. Instead, the data is organized into intervals called classes. Classes are defined in order to group the data.
Qualitative data
data that is organized into categories. Data items are then grouped into their respective categories.
Organizing data
collect similar data items into groups: categories for qualitative data, or intervals (classes) for quantitative data.
We organize our data to get a feel for
how the data is distributed.
A histogram is symmetric if
its right half is a mirror image of its left half.
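A rough sketch of the mirror-image idea, assuming the histogram is represented as a list of class frequencies read left to right. The function name is_symmetric and the sample frequencies are made up for illustration; real data is rarely exactly symmetric, so this only checks the ideal case.

```python
def is_symmetric(frequencies):
    """Return True if the list of bar heights reads the same in both directions."""
    return frequencies == frequencies[::-1]

# Hypothetical bar heights, invented for illustration.
print(is_symmetric([2, 5, 9, 5, 2]))  # right half mirrors the left half
print(is_symmetric([1, 4, 9, 5, 2]))  # halves differ
```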
Examples of quantitative data:
miles per gallon a vehicle gets; numerical grades.
Frequency distribution
organizes data by partitioning the data into categories or classes and lists the frequency of each category or class.
Relative frequency distribution
organizes data by partitioning the data into classes & lists the relative frequency of each class.
Class width
the difference between two consecutive lower class limits. Class width is a constant. Simply determine the class width by taking the difference between any two consecutive lower class limits.
Upper class limit
the largest possible data value that can belong to that class.
The frequency of a category or class is
the number of data values in that category or class.
The relative frequency of a category or class is
the proportion of data values in the category or class relative to the total number of data values.
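As a small illustration, relative frequency can be computed by dividing each frequency by the total number of data values. The function name relative_frequencies and the sample counts below are assumptions made up for this sketch.

```python
def relative_frequencies(frequencies):
    """Convert a {category_or_class: frequency} mapping to relative frequencies."""
    total = sum(frequencies.values())  # total number of data values
    if total == 0:
        raise ValueError("there are no data values")
    return {label: count / total for label, count in frequencies.items()}

# Hypothetical frequencies, invented for illustration.
example = {"A": 4, "B": 10, "C": 6}
rel = relative_frequencies(example)
for label, value in rel.items():
    print(label, value)
```

The relative frequencies of all categories or classes should add up to 1.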
Lower class limit
the smallest possible data value that can belong to that class.
Raw data
when data is collected in its original form from an observation or experiment it is referred to as raw data. It is very difficult to draw information which in such a state.
When do you stop creating classes?
when you are sure that the largest data value falls into the last class.
Examples of qualitative data:
Letter grades: A, B, C, D. Quality of a product: poor, average, good, great. On a scale of 1 to 10, how do you feel?
Example of classes: 0-10, 11-21, 22-32, 33-43
The class width is 11 - 0 = 11, 22 - 11 = 11, 33 - 22 = 11.
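The same subtraction can be sketched in code using the lower class limits from the example above. The helper name class_width is hypothetical; it also checks that the width is constant across all consecutive pairs.

```python
def class_width(lower_limits):
    """Return the common difference between consecutive lower class limits."""
    widths = {b - a for a, b in zip(lower_limits, lower_limits[1:])}
    if len(widths) != 1:
        raise ValueError("class width is not constant")
    return widths.pop()

# Lower class limits of the classes 0-10, 11-21, 22-32, 33-43.
lower_limits = [0, 11, 22, 33]
width = class_width(lower_limits)
print(width)
```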
Reasons for constructing a frequency distribution:
To organize data in a meaningful way. To enable the reader to determine the nature or shape of how the data is distributed. To facilitate computational procedures for measures of average & spread. To enable the researcher to draw charts & graphs. To make comparisons among different data sets.
Categories or classes do not overlap.
Any given piece of data falls into exactly one.
Rules for classes
Class width is constant. All data falls into exactly one class. There are no gaps between classes. Class limits should have the same number of decimal places as the actual data.
How do we determine classes?
First, determine the class width. This will be given. Next, decide on the first lower class limit. This can be some "convenient" number that is less than or equal to the smallest data value. Next, determine the remaining lower class limits by successively adding the class width to each lower class limit to obtain the next lower class limit. Finally, determine the upper class limit of each class.
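The steps above can be sketched as follows. This is only an illustration under stated assumptions: the function name build_classes, the parameter step (the smallest unit of the data, 1 for whole numbers), and the sample data are all made up here. The upper limit is taken to be one step below the next lower limit, which matches the example classes 0-10, 11-21, and so on.

```python
def build_classes(data, width, first_lower, step=1):
    """Build (lower, upper) class limits covering every data value.

    Assumption: each upper limit is one `step` below the next lower limit.
    """
    if first_lower > min(data):
        raise ValueError("first lower limit must not exceed the smallest data value")
    classes = []
    lower = first_lower
    while True:
        upper = lower + width - step
        classes.append((lower, upper))
        # Stop once the largest data value falls into the last class.
        if upper >= max(data):
            break
        lower += width  # next lower limit = current lower limit + class width
    return classes

# Hypothetical whole-number data, invented for illustration.
data = [3, 15, 27, 40, 8, 22]
classes = build_classes(data, width=11, first_lower=0)
print(classes)
```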
How to create frequency & relative frequency distributions for qualitative data:
For qualitative data, identify the categories & determine the frequency or relative frequency of each category.
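A minimal sketch of this for letter grades, using Python's collections.Counter to count each category. The list of grades is hypothetical, invented for illustration.

```python
from collections import Counter

# Hypothetical letter grades, invented for illustration.
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]

freq = Counter(grades)  # frequency of each category
total = len(grades)     # total number of data values
rel_freq = {category: count / total for category, count in freq.items()}

for category in sorted(freq):
    print(f"{category}: frequency={freq[category]}, "
          f"relative frequency={rel_freq[category]:.2f}")
```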
How to create frequency & relative frequency distributions for quantitative data:
When data is quantitative in nature, we don't have clearly defined categories to group data. So we determine the classes or intervals in which we will group the data.
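Once the classes are chosen, each data value is counted in the one class it falls into. A rough sketch, assuming classes are given as (lower limit, upper limit) pairs; the function name tally and the data values are made up for illustration, and the classes are the example classes 0-10, 11-21, 22-32, 33-43.

```python
def tally(data, classes):
    """Count how many data values fall into each (lower, upper) class."""
    counts = {c: 0 for c in classes}
    for value in data:
        for lower, upper in classes:
            if lower <= value <= upper:
                counts[(lower, upper)] += 1
                break
        else:
            # Every data value must fall into exactly one class.
            raise ValueError(f"{value} does not fall into any class")
    return counts

classes = [(0, 10), (11, 21), (22, 32), (33, 43)]
# Hypothetical whole-number data, invented for illustration.
data = [3, 15, 27, 40, 8, 22, 19, 35, 10, 11]
freq = tally(data, classes)
for (lower, upper), count in freq.items():
    print(f"{lower}-{upper}: {count}")
```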
Why a histogram?
a histogram can give us a visual idea of how the data is distributed. Knowing how data is distributed can help us answer questions about that data and, in certain cases, may help us make inferences about the data. It also allows us to visually compare classes.
Histogram
a type of graph used to visualize quantitative data. This is basically a graph where each class is identified with a bar whose height is based on how many data values fall in that class.
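To show the idea without a plotting library, here is a simple text version in which each class gets a bar of asterisks whose length equals its frequency. The bars are drawn sideways rather than upright, and the function name text_histogram and the frequencies are hypothetical, invented for illustration.

```python
def text_histogram(frequencies):
    """Return one line per class: the class label and a bar of '*' characters."""
    lines = []
    for label, count in frequencies.items():
        lines.append(f"{label:>7} | {'*' * count}")
    return "\n".join(lines)

# Hypothetical class frequencies, invented for illustration.
freq = {"0-10": 3, "11-21": 5, "22-32": 2, "33-43": 1}
chart = text_histogram(freq)
print(chart)
```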
skewed
being heavily weighted to one side, with a visible tail on the other side. The tail points in the direction of skewness.
If working with qualitative data, the term to use is
category.