Stats - displaying data
What is meant by modality?
Modality describes whether a distribution of scores has one or more peaks, or high points (a peak is the interval or score that occurs most often). If there is one peak, the distribution is said to be unimodal. If there are two distinct peaks, the distribution is said to be bimodal.
Assume the following are the heights in inches of a sample of 20 people. Are these scores unimodal or bimodal, symmetrical or skewed, and if skewed, positive or negative?
64, 63, 65, 61, 64, 67, 64, 64, 65, 63, 59, 69, 62, 68, 64, 65, 63, 64, 60, 66
Ans: Unimodal and symmetrical
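One way to check this answer is to tally the scores and see where the counts peak. The following is a minimal sketch using Python's `collections.Counter`; the helper name `modes` and the symmetry check are illustrative assumptions, not part of the original exercise. Note that this simple check only finds scores tied for the highest frequency, so it would miss a second, lower peak.

```python
from collections import Counter

heights = [64, 63, 65, 61, 64, 67, 64, 64, 65, 63,
           59, 69, 62, 68, 64, 65, 63, 64, 60, 66]

def modes(scores):
    """Return the score(s) that occur most often (hypothetical helper)."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(s for s, c in counts.items() if c == top)

counts = Counter(heights)
peak = modes(heights)  # scores with the highest frequency

# Symmetry check: compare counts at equal distances on either side of the peak.
center = peak[0]
max_offset = max(center - min(heights), max(heights) - center)
symmetric = all(counts[center - d] == counts[center + d]
                for d in range(1, max_offset + 1))

print(peak, symmetric)
```

A single entry in `peak` suggests a unimodal distribution, and `symmetric` is true only when the counts mirror each other exactly around that peak.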
What is the mid-point?
When presenting the data, it often is easier to identify the intervals by the midpoint rather than by presenting the actual intervals. The midpoint is exactly what the term implies, the middle score of the interval.
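As a small worked example, the midpoint can be computed as the average of the lowest and highest score in the interval. The intervals below are hypothetical whole-number intervals chosen only for illustration.

```python
def midpoint(lower, upper):
    """Middle score of an interval, given its lowest and highest score."""
    return (lower + upper) / 2

# Hypothetical whole-number intervals, each three scores wide.
intervals = [(27, 29), (30, 32), (33, 35)]
midpoints = [midpoint(lo, hi) for lo, hi in intervals]
print(midpoints)  # [28.0, 31.0, 34.0]
```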
Assume the following are the heights of another sample of 20 people. Are these scores unimodal or bimodal, symmetrical or skewed, and if skewed, positive or negative?
64, 67, 64, 64, 71, 63, 62, 65, 64, 66, 69, 64, 68, 64, 65, 66, 68, 67, 69, 70
Ans: Unimodal and skewed (The skew is positive since the tail is to the right.)
Assume the following are the heights of yet another sample of 20 people. Are these scores unimodal or bimodal, symmetrical or skewed, and if skewed, positive or negative?
68, 63, 64, 64, 64, 65, 67, 66, 69, 67, 74, 78, 76, 75, 70, 67, 71, 72, 73, 77
Ans: Bimodal and skewed (The skew again is positive since the tail is to the right.)
What are the real upper and lower limits?
Often, data are not rounded to whole numbers. In the above example, the time to run the 200 yards might have been recorded to the half, tenth, or even the hundredth of a second. This requires that the upper and lower real limits be established. In the above example, those values would be set at the 'half-way point' between adjacent intervals.
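The 'half-way point' idea can be sketched in a few lines: each real limit sits half a measurement unit beyond the stated limit. The function name `real_limits`, the `unit` parameter, and the example intervals are assumptions made for illustration.

```python
def real_limits(lower, upper, unit=1):
    """Real lower and upper limits: half a measurement unit beyond the stated limits."""
    half = unit / 2
    return (lower - half, upper + half)

# Hypothetical whole-number intervals, each three scores wide.
intervals = [(27, 29), (30, 32), (33, 35)]
limits = [real_limits(lo, hi) for lo, hi in intervals]
print(limits)  # [(26.5, 29.5), (29.5, 32.5), (32.5, 35.5)]
```

The upper real limit of one interval equals the lower real limit of the next, so the intervals cover a continuous scale without gaps.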
Is an outlier always an extreme point separated from the rest of distribution? Can an outlier be either larger or smaller? Can there be more than one outlier?
Ans: An outlier is the term used to describe any score (larger or smaller) that is separated by an 'unusual distance' from the rest of the distribution. Usually, the term 'outlier' refers to a single score, but, on occasion, the term will be used in the plural to describe a group of scores that seem to be from a different population. For example, if a researcher had a large number of women's heights recorded and noticed that 4 or 5 scores were much higher (taller), he might correctly assume that the 4 or 5 outliers were male heights accidentally included in the data.
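The notes do not define 'unusual distance' numerically. One common convention, used here purely as an assumption, flags scores more than 1.5 times the interquartile range beyond the quartiles. The heights below are hypothetical and the helper name `iqr_outliers` is made up for this sketch.

```python
import statistics

def iqr_outliers(scores, k=1.5):
    """Scores lying more than k * IQR below Q1 or above Q3 (one common convention)."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    spread = q3 - q1
    low_fence, high_fence = q1 - k * spread, q3 + k * spread
    return [s for s in scores if s < low_fence or s > high_fence]

# Hypothetical heights in inches: mostly clustered, with two much taller scores.
heights = [62, 63, 63, 64, 64, 64, 65, 65, 66, 66, 67, 75, 76]
print(iqr_outliers(heights))  # [75, 76]
```

Because the check looks at both fences, it can report outliers that are smaller or larger than the rest, and more than one at a time.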
Do upper and lower real limits apply only if the data are recorded with decimal places?
Ans: No, because with most measurement data, it is up to the experimenter to decide whether the data will be collected in (rounded to) whole numbers (e.g. 2 cm, 3 cm, 4 cm) or decimals (e.g. 2.10 cm, 3.32 cm, 4.29 cm, etc.). Thus, the upper and lower real limits give an indication of what the whole number intervals would look like if the continuous data were recorded in decimal form.
Why would an interval size of 3 but not 5 or 7 be used if the range is, say, 14 and the sample size is 30?
Ans: The number of intervals is a decision made by the researcher. There is no hard and fast rule, so intervals of 3, 5 or 7 could be used. Of the two factors, range and sample size, the range of the scores is the more important in deciding the number of intervals. Thus, if the range were only 10 but the sample size large, the interval size chosen would probably be 2, resulting in 5 intervals. In the example used for the above question, the sample size is small (30) and the range is 14, so an interval size of 3 is probably most appropriate: 5 intervals of size 3 span 15, which is slightly larger than the range.
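The arithmetic in this answer can be sketched as follows. The function name `intervals_needed` is an assumption, and it mirrors the answer's own reasoning (range divided by interval size, rounded up) rather than any fixed rule.

```python
import math

def intervals_needed(score_range, interval_size):
    """Smallest number of equal-size intervals whose total width covers the range."""
    return math.ceil(score_range / interval_size)

for size in (3, 5, 7):
    n = intervals_needed(14, size)
    print(f"size {size}: {n} intervals spanning {n * size}")
```

With a range of 14, a size of 3 gives 5 intervals, while sizes of 5 or 7 leave only 3 or 2 intervals, which is usually too coarse to show the shape of the distribution.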
What are some of the most common terms used to describe the shapes of distributions?
Distributions of scores can fall into any number of sizes and shapes. For this course, we will discuss only two properties, symmetry and modality.
What is meant by a symmetrical distribution?
If the distribution of scores has the same shape on both sides of the center, it is considered symmetrical.
What are positively and negatively skewed distributions?
If the scores tail off to one side or the other, it is a skewed distribution. If it tails off to the right, the distribution is positively skewed. If it tails off to the left, the distribution is negatively skewed.
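The direction of skew can also be checked numerically. The sketch below uses the moment coefficient of skewness, a standard measure that is not mentioned in these notes and is brought in here as an assumption; a positive value corresponds to a tail on the right, a negative value to a tail on the left. The three lists are the height samples from the exercises in these notes.

```python
def skewness(scores):
    """Moment coefficient of skewness: > 0 tails right, < 0 tails left, 0 symmetrical."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    return m3 / m2 ** 1.5

sample_1 = [64, 63, 65, 61, 64, 67, 64, 64, 65, 63,
            59, 69, 62, 68, 64, 65, 63, 64, 60, 66]
sample_2 = [64, 67, 64, 64, 71, 63, 62, 65, 64, 66,
            69, 64, 68, 64, 65, 66, 68, 67, 69, 70]
sample_3 = [68, 63, 64, 64, 64, 65, 67, 66, 69, 67,
            74, 78, 76, 75, 70, 67, 71, 72, 73, 77]

for name, sample in (("sample 1", sample_1), ("sample 2", sample_2), ("sample 3", sample_3)):
    print(name, round(skewness(sample), 3))
```

The first sample should come out at zero and the other two positive, which agrees with the answers given for those exercises.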
What should be done if the score (say, 29.5 in the following example) is exactly the same value of the real upper/lower limits? In which interval should we put 29.5?
Intervals: 26.5-29.5, 29.5-32.5, 32.5-35.5
Ans: It is very rare that a value will fall exactly on the interval limit, but if it does, either follow the general convention and round to the even number, or just flip a coin to determine whether that score goes in the higher or lower interval.
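Reading 'round to the even number' as the round-half-to-even convention (an interpretation, not something the notes spell out), the placement can be sketched with Python's built-in `round()`, which applies that convention to exact .5 values. The helper name `interval_for` is hypothetical.

```python
def interval_for(score, intervals):
    """Place a score using whole-number interval limits; ties go to the even neighbour."""
    nearest = round(score)  # Python's round() sends exact .5 values to the even integer
    for lo, hi in intervals:
        if lo <= nearest <= hi:
            return (lo, hi)
    return None  # score falls outside every interval

# Whole-number intervals whose real limits are 26.5-29.5, 29.5-32.5, 32.5-35.5.
intervals = [(27, 29), (30, 32), (33, 35)]
print(interval_for(29.5, intervals))  # (30, 32)
print(interval_for(32.5, intervals))  # (30, 32)
```

Here 29.5 rounds up to 30 and 32.5 rounds down to 32, so ties are not pushed systematically into the higher or the lower interval.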
What might the final summary of the data look like? What are some alternate ways to present data?
Many times the researcher wishes to present the full summary of the data in table form, including additional information such as cumulative frequency, percentage in each interval and cumulative percentage. Such a summary table would look as follows:
In sum, there are many other ways to summarize and present data. There are two general guidelines that are often used for deciding the number of intervals: 10 intervals if there is a large amount of data, or approximately the square root of the number of scores recorded. However, the overriding factors always should be: 1) use natural breaks in the number system (0-9, 10-19, etc.) that make the presentation as simple as possible, and 2) present the data and information in a manner that makes the data interpretable (aids understanding and communicates with the reader).
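The summary columns named above (frequency, cumulative frequency, percentage and cumulative percentage) can be computed with a short script. This is a sketch only, not the table from the original notes: it applies the idea to the first sample of 20 heights from these notes, one row per score, and the function name `summary_table` is an assumption.

```python
from collections import Counter

heights = [64, 63, 65, 61, 64, 67, 64, 64, 65, 63,
           59, 69, 62, 68, 64, 65, 63, 64, 60, 66]

def summary_table(scores):
    """Rows of (score, frequency, cumulative frequency, percent, cumulative percent)."""
    counts = Counter(scores)
    total = len(scores)
    rows, running = [], 0
    for score in range(min(scores), max(scores) + 1):
        freq = counts[score]
        running += freq  # cumulative frequency up to and including this score
        rows.append((score, freq, running, 100 * freq / total, 100 * running / total))
    return rows

table = summary_table(heights)
print(f"{'score':>5} {'f':>3} {'cum f':>5} {'%':>6} {'cum %':>6}")
for score, f, cum_f, pct, cum_pct in table:
    print(f"{score:>5} {f:>3} {cum_f:>5} {pct:>6.1f} {cum_pct:>6.1f}")
```

The last row's cumulative frequency should equal the sample size and its cumulative percentage should be 100, which is a quick check that no score was missed.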
What are the 'rules' regarding the size of intervals?
Often, researchers wish to group the data into a smaller number of intervals so that a more concise summary can be obtained. This is especially useful if the range and the number of scores recorded are large. Note: There is no single 'correct' number of intervals, since communication with the reader is the ultimate determinant. However, when there is a large number of raw scores, the rule of thumb is to group the data into approximately 10 intervals. Another rule of thumb is to take the square root of the number of scores recorded and use that value as a guide for the number of intervals. Thus, if 256 scores were recorded, approximately 16 intervals would be recommended. If you decided that the best way to present the above data (27 scores) would be to group it into five intervals (Note: the square root of 27 is approximately 5), the grouped frequency distribution would be as follows. The intervals must all be the same size and include the highest and lowest scores. In this example, the best interval size will be three. (Again, if the range, the overall spread of the scores, were larger, the interval size chosen might have been 4 or 5 or more.)
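The square-root rule of thumb is easy to check with a couple of lines. The function name `suggested_intervals` is an assumption, and the result is only a starting guide, as the text stresses.

```python
import math

def suggested_intervals(n_scores):
    """Square-root rule of thumb: roughly sqrt(n) intervals for n recorded scores."""
    return round(math.sqrt(n_scores))

print(suggested_intervals(256))  # 16
print(suggested_intervals(27))   # 5
```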
Construct a frequency distribution using the above data
The easiest way is to list by one-second categories, starting with the shortest time and increasing by one second until you get to the longest. After listing the full range of times recorded, then count the number (frequency) of children who ran the distance in that number of seconds. (Note: It does not matter whether you start at the 'top' or 'bottom' with the shortest time. We'll start at the 'top' to be consistent with your text.)
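The listing procedure can be sketched as below. The run times are hypothetical values in whole seconds, made up for illustration because the exercise data are not reproduced here; the function name `frequency_distribution` is also an assumption.

```python
from collections import Counter

# Hypothetical run times in whole seconds (not the data from the original exercise).
times = [31, 29, 33, 30, 31, 35, 30, 31, 29, 31]

def frequency_distribution(scores):
    """List every value from lowest to highest with its frequency, including zeros."""
    counts = Counter(scores)
    return [(value, counts[value]) for value in range(min(scores), max(scores) + 1)]

# Shortest time at the top, increasing by one second per row.
for value, freq in frequency_distribution(times):
    print(f"{value:>3} {freq}")
```

Listing the full range means that times nobody recorded still appear, with a frequency of zero, so gaps in the distribution stay visible.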
What is a histogram?
The final step is to draw a picture, or histogram, of the data that have been collected. The scores recorded (in this example, the time to run 200 yds.) are presented on the horizontal (X) axis and the frequency of occurrence on the vertical (Y) axis.
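A rough text version of such a picture can be produced without any plotting library. The sketch below uses the first sample of 20 heights from these notes rather than the run times, puts the scores along the bottom and the frequency up the side, and marks each count with an asterisk; the function name `text_histogram` is an assumption.

```python
from collections import Counter

heights = [64, 63, 65, 61, 64, 67, 64, 64, 65, 63,
           59, 69, 62, 68, 64, 65, 63, 64, 60, 66]

def text_histogram(scores):
    """Rows of a text histogram: frequency on the vertical axis, scores on the horizontal."""
    counts = Counter(scores)
    values = range(min(scores), max(scores) + 1)
    top = max(counts.values())
    rows = []
    for level in range(top, 0, -1):  # draw from the highest frequency down to 1
        bars = "".join("  * " if counts[v] >= level else "    " for v in values)
        rows.append(f"{level:>2} |{bars}")
    rows.append("   +" + "----" * len(values))                 # horizontal axis
    rows.append("    " + "".join(f"{v:>3} " for v in values))  # score labels
    return rows

rows = text_histogram(heights)
print("\n".join(rows))
```

Each column of asterisks is as tall as the frequency of that score, so the single peak at 64 and the matching tails on either side are visible at a glance.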