Descriptive statistics

CURVES AND DISTRIBUTIONS: Each distribution has at least 3 important parts

▪ The peak, or highest point
▪ The tails, or the ends of the curve
▪ The baseline, or the location where the cases reach zero
▪ Symmetrical ▸ both sides are identical
▪ Asymptotic ▸ the tails of the curve never touch the baseline

Skewness & Kurtosis Statistics: Certain statistics assume that the data being used are normally distributed. If this assumption is incorrect, the results from these statistics will have some degree of error. The amount of error depends on how far the distribution departs from normality.

For both skewness and kurtosis statistics, the results revolve around a value for the normal curve. The normal curve is set to zero (0), so the calculated coefficient can be positive (+) or negative (−) when the curve is not perfectly normal. The positive or negative sign simply tells us direction, not the "amount" of non-normality; the size of the coefficient describes how far from normality the shape of the curve is. Since curves are rarely perfectly normal (or zero), some leeway must be given in using statistics that assume normality, or else we would rarely be able to use them. This means we must find a point (±) around zero that can be interpreted as "the maximum difference we can tolerate." In other words, a cut-off point must be established, and everything beyond that point (further from zero) is defined as "not normal."

Kurtosis

Kurtosis is the relative height of the center of the curve. There are three basic kinds of kurtosis:
▪ Mesokurtic (the normal curve) ▸ the curve sits right in the middle of the other two
▪ Leptokurtic (more peaked than the normal curve) ▸ the curve is higher, with less variability among the scores (more homogeneous)
▪ Platykurtic (less peaked than the normal curve) ▸ the curve is flatter, with more variability among the scores (more heterogeneous)

Describing the distribution by shape/normal curve:

One common assumption among many statistical tests is the assumption of normality. When the test we use carries this assumption, we must examine the data and determine whether the assumption is met.

Skewness:

Skewness is the lack of symmetry in a curve. There are two kinds of skewness:
▪ Positive skew (the longest tail goes to the right) ▸ scores cluster around the low end of the distribution, with few at the high end
▪ Negative skew (the longest tail goes to the left) ▸ scores cluster around the high end of the distribution, with few at the low end

Talking about Dispersion: Homogeneity v. Heterogeneity

When cases are generally close to each other, they are homogeneous; a dispersion measure score of zero would mean all cases are the same. When cases are wide apart, they are heterogeneous; a large dispersion measure score means the cases are dissimilar. [The size of the dispersion score is relative; there is no real maximum.]

HOW DO YOU KNOW IF A DISTRIBUTION IS NOT NORMAL?

While there are many forms of curves that do not resemble the normal curve, there are a few major ways in which a curve can differ from the normal curve shape:
▪ The number of tails in a distribution
▪ The unequal lengths of the two tails (or asymmetry), called skewness
▪ The relative height (or peakedness) of the curve, called kurtosis

Individual scores can be given a position in a normal curve relative to their group mean so they can be compared. This concept is called "standard scores" or "z-scores."

Z-scores are used to compare the relative position of individual scores within a group or across multiple groups. They express the deviation of a score (X) from the mean (M) in terms of the standard deviation (s): z = (X − M) / s.
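As a rough illustration, the sketch below applies that relationship to an invented score, group mean, and standard deviation.

```python
# Minimal sketch of a z-score: the score, mean, and SD here are invented.
def z_score(x, mean, sd):
    """Express how many standard deviations a raw score sits from the group mean."""
    return (x - mean) / sd

# A score of 85 in a group with M = 70 and s = 10
print(z_score(85, 70, 10))  # 1.5 -> 1.5 standard deviations above the mean
```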

Comparison of Different Variables with Different Scales: The "Coefficient of Relative Variation." The problem of comparing variables with different scales can be resolved by dividing the standard deviation (S) by its group mean.

Example:
Worry about being a crime victim: M = 3.75, S = 2.27; 2.27 ÷ 3.75 = 0.605
Personal interaction learning: M = 165.92, S = 41.67; 41.67 ÷ 165.92 = 0.251
Personal interaction learning actually has more homogeneous scores than worry about crime, even though its original standard deviation was much larger.
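A minimal sketch of that calculation, using the figures from the example above (the helper name relative_variation is just for illustration):

```python
# Coefficient of relative variation: CRV = S / M
def relative_variation(sd, mean):
    return sd / mean

crime_worry = relative_variation(2.27, 3.75)      # ~0.605
interaction = relative_variation(41.67, 165.92)   # ~0.251
print(round(crime_worry, 3), round(interaction, 3))
# The smaller coefficient for personal interaction learning marks it as the more homogeneous variable.
```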

For the most part, you won't see a measure of variability like standard deviation reported by itself. It is almost always reported with the mean. In the narrative of a report you might see: Children who viewed the violent cartoon displayed more aggressive responses (M = 12.45, SD = 3.7) than those who viewed the control cartoon (M = 4.22, SD = 1.04).

From the above excerpt you will notice that there is greater variability in the group that saw the violent cartoon (SD = 3.7 vs. SD = 1.04); their scores are not as tightly packed together as the scores in the control group. We can read this as suggesting that typical scores in the violent-cartoon group fall roughly between 8.75 and 16.15 (12.45 ± 3.7), while typical scores in the control group fall roughly between 3.18 and 5.26 (4.22 ± 1.04), a much smaller interval than that of the violent group.

Any score beyond ±1.0 is, by definition, no longer characteristic of a normal curve. The closer the score is to zero, the closer the curve is to perfectly normal.

Statistical Results
Kurtosis: a leptokurtic curve has a score beyond +1.0; a platykurtic curve has a score beyond −1.0 (again, don't read the negative sign as "size").
Skewness: a positively skewed curve has a score beyond +1.0; a negatively skewed curve has a score beyond −1.0.
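As a hedged sketch of how the ±1.0 rule of thumb described above might be applied in practice, the snippet below uses SciPy's skewness and (excess) kurtosis functions, which treat a perfectly normal curve as 0, matching the convention in these notes. The sample scores are invented.

```python
from scipy.stats import skew, kurtosis

scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9]   # invented data

skew_stat = skew(scores)
kurt_stat = kurtosis(scores)               # excess kurtosis: normal curve = 0

for name, value in [("skewness", skew_stat), ("kurtosis", kurt_stat)]:
    if abs(value) > 1.0:
        print(f"{name} = {value:.2f}: beyond +/-1.0, treat the distribution as not normal")
    else:
        print(f"{name} = {value:.2f}: within +/-1.0, acceptably close to normal")
```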

Range: Since we are trying to describe how spread out the data are, one simple way to do this is to locate the two furthest endpoints and measure the distance between them, much like measuring the length of a board. This is what the range does. First, identify the two most extreme scores. Second, find the upper real limit of the highest score (X-max) and the lower real limit of the lowest score (X-min). Third, subtract the lower real limit of X-min from the upper real limit of X-max. The result is the range.
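A minimal sketch of those three steps, assuming whole-number scores measured to the nearest unit so that each real limit extends 0.5 beyond the observed score:

```python
# Range using real limits; the scores and the one-unit measurement assumption are illustrative.
def range_with_real_limits(scores, unit=1):
    upper_real_limit = max(scores) + unit / 2   # upper real limit of X-max
    lower_real_limit = min(scores) - unit / 2   # lower real limit of X-min
    return upper_real_limit - lower_real_limit

print(range_with_real_limits([3, 7, 7, 9, 12]))  # 12.5 - 2.5 = 10.0
```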

The strength of using the range to describe variability is that it is a straightforward and simple calculation to perform. The weakness is that it relies on only two scores in the data, completely ignoring all of the other values. So, if the two endpoints were very extreme values but all the data in the middle were tightly packed together, the range would not be a good measure of variability. Because it ignores all the scores in the middle, it is considered a "crude" measure of variability.

Standard Deviation and Variance: Not every score is equal to the mean; in fact, most scores are not. So there is some distance between a score and the mean, and that distance is called a deviation. Different scores have different deviations because they are closer to or further from the mean than one another. These two measures, the variance and the standard deviation, try to report an average: the average distance of a score from the mean. One way to get rid of negative signs is to square each value (you could also take the absolute values, add them, and divide by the number of cases). When you multiply a deviation by itself, the negative sign goes away. So that is what we do next: square the deviations. Remember, the goal is to find an average distance from the mean. Now we can average those squared deviations. We simply add up the values in the (X − μ)² column and divide by N. With Σ(X − μ)² = 38 and N = 4: Σ(X − μ)² / N = 38 / 4 = 9.5.

This new average, the average of the squared distances, is called the variance. How do we interpret it? We would say it represents the average squared distance (deviation) from the mean. It doesn't make much sense to talk about average distance in terms of squared values, so we undo the square by taking the square root. By taking the square root of 9.5 we are talking about the average distance from the mean, NOT the average squared distance from the mean: the square root of 9.5 is about 3.08. [We are undoing what we did above.] When we take the square root of the variance, the result is called the standard deviation. The standard deviation is a creative way of saying the average distance from the mean. The upshot of these two measures is that they use every score, unlike the range and the interquartile range. So the standard deviation is generally considered the superior measure of variability when it is available.
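A minimal sketch of the whole calculation. The four scores are invented, but they were chosen so the totals match the figures in the text (Σ(X − μ)² = 38, N = 4):

```python
import math

scores = [1, 1, 6, 8]                                  # invented data, mean = 4.0
mu = sum(scores) / len(scores)

squared_deviations = [(x - mu) ** 2 for x in scores]   # [9.0, 9.0, 4.0, 16.0]
variance = sum(squared_deviations) / len(scores)       # 38 / 4 = 9.5
std_dev = math.sqrt(variance)                          # sqrt(9.5) is about 3.08

print(variance, round(std_dev, 2))
```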

Which of the following choices represents the typical range of scores for the following data, where M = 60 and SD = 13?
a. 53.5 - 66.5
b. 47 - 73 ← correct
c. 34 - 86

To figure this out, we use the standard deviation, which is the average distance that any score is from the mean. A typical score would be found somewhere around the mean, within the boundaries of the standard deviation. Given that SD = 13, a typical score falls somewhere within 60 ± 13, which translates to a range of 47 to 73.
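The same arithmetic as a tiny sketch (the helper name typical_range is just for illustration):

```python
def typical_range(mean, sd):
    """Typical scores fall within one standard deviation of the mean."""
    return mean - sd, mean + sd

print(typical_range(60, 13))  # (47, 73) -> answer b
```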

Median:
• Defined: the mid-point of an ordered distribution
• Used for locating the middle case(s)
• The scores must be ordered from least to greatest; the middle score is the median, and if there are two middle scores, those two scores are averaged (as sketched below)
• The best measure of central tendency for ordinal level variables; it can also be used for interval and ratio level data
• Used for locating central tendency when there are extreme values in continuous data
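A minimal sketch of locating the median, including the even-N case where the two middle scores are averaged (the data are invented):

```python
def median(scores):
    ordered = sorted(scores)          # must be ordered from least to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # average the two middle scores

print(median([7, 1, 5, 3]))     # 4.0 (average of 3 and 5)
print(median([7, 1, 5, 3, 9]))  # 5
```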

When reporting the median we follow the same general rules as with the mean. When referring to directional differences in the medians between groups we simply use the word "median" in the narrative of the report/paper. When we present the median with a numerical value we use the italicized abbreviation Mdn inside parentheses.

Mean:
• Defined: the arithmetic average of a distribution
• Used for finding individuals' share OR the balance point between numerical values
• Sensitive to extreme scores, which greatly influence the mean (see the sketch below)
• The preferred measure of central tendency for continuous data (interval and ratio level variables)
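A quick sketch of that sensitivity, using invented scores with one extreme value:

```python
import statistics

scores = [4, 5, 5, 6, 95]            # one extreme score
print(statistics.mean(scores))       # 23 -> dragged upward by the outlier
print(statistics.median(scores))     # 5  -> still describes the typical case
```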

When writing a report it is important to follow certain conventions. We follow the style set forth in the APA Publication Manual. For instance, when describing the mean for a group, you write out the word "mean" in the narrative and use an italicized M in parentheses when presenting it with an actual value. For instance, if you were describing gender differences in how many violent victimizations were experienced by sampled respondents, you might say the following: According to the survey responses, men were more likely to experience violent victimizations than women, as their means were higher (M = 4.1 for men, M = 3.6 for women).

Determining Variability / Measuring Dispersion: • Range • Standard Deviation • Variance

While measures of central tendency are informative, they sometimes do not adequately represent a distribution and can give misleading results. This is particularly true when two distributions are being compared. We particularly rely on measures of dispersion when making comparisons and looking for relationships between multiple groups or between two variables.

Mode: When discussing the mode, there are no approved abbreviations in the APA Publication Manual. Generally, the mode is reported in the narrative of the text and not in parentheses. Typically, we discuss the mode in reference to a frequency distribution table that has been presented in the report/paper.

• Defined: the category with the largest number of cases
• Used to locate the dominant or most pervasive category/value
• Nominal level measures can only rely on the mode as a measure of central tendency
• Because the mode is an analytical feature of nominal level measures, it can also be used by all other levels of measurement

MEASURES OF CENTRAL TENDENCY

• Mode • Median • Mean

How does this relate to scoring?

▪ The peak of a normal curve has the highest probability of a score occurring
▪ Each end of a normal curve has a low probability of a score occurring
▪ Scores between the ends and the peak of a normal curve have an increasing probability of occurring as the point on the curve moves toward the peak
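As a hedged illustration of that pattern, the snippet below prints the standard normal density at a few points; the density is highest at the peak (z = 0) and shrinks toward the tails. The choice of z values is arbitrary.

```python
from scipy.stats import norm

for z in [-3, -2, -1, 0, 1, 2, 3]:
    print(f"z = {z:+d}: density = {norm.pdf(z):.3f}")
# peaks near 0.399 at z = 0 and falls to about 0.004 at z = +/-3
```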

