Quiz 1
Probability of error
If there is a 68% chance of being correct, there is also a 32% chance of being incorrect. This is referred to as the probability of error and is written as p < .32 (the probability of error is less than .32). (Vincent, William J. Statistics in Kinesiology, 4th ed. Human Kinetics, 2012.)
Confidence interval (application)
An alternative to the null hypothesis statistical test (NHST) is to use confidence intervals to make inferences about parameters. A confidence interval is an interval constructed around a statistic. The approach is based on the same underlying statistical model as the NHST, but instead of making a binary decision about the acceptability of H0, the analyst simply calculates an interval within which the population value is estimated to lie. Recall the standard error of the mean from earlier in this chapter (equation 7.01): SEM = s / √N = 10 / √50 ≈ 1.4. From the normal curve, we can construct a 95% CI using a Z score of 1.96 such that 95% CI = X̄ ± 1.96(SEM). In this example, the 95% CI = 35 ± 1.96(1.4) = 32.3 to 37.7 centimeters. We estimate, then, with 95% confidence, that the true population mean lies somewhere between 32.3 and 37.7 centimeters. This is an inferential calculation because we are estimating a parameter from a statistic. (In practice, we will use a statistic called t instead of Z, which will be explained in chapter 10.) We can apply the same logic to construct confidence intervals about other statistics. Imagine that the mean difference in BMI between biomechanists and exercise physiologists is 3 kilograms per square meter and that the standard error of mean differences is 2 kilograms per square meter. The 95% CI about the mean difference is 3 ± 1.96(2) = −0.9 to 6.9 kilograms per square meter. Notice that the 95% CI includes zero, which indicates that we are not confident that the true population mean difference is different from zero. This is tantamount to saying that we fail to reject H0. Data that result in a 95% CI that includes the null value will result in a failure to reject H0 when the comparable NHST is performed. Similarly, data that result in a 95% CI that excludes the null value will result in a rejection of H0 when the comparable NHST is performed. Why worry about confidence intervals? First, confidence intervals are useful even when performing a NHST because they provide more information than a simple binary decision on the null hypothesis. Second, many statisticians and theorists argue that the NHST is based on flawed logic. When we calculate a p value, the p does not represent the probability that H0 is false (which is what we would really like for it to mean). Instead, p reflects the probability that we could have obtained data this extreme, or more extreme, if H0 is true; these are not equivalent statements about p and H0. Further, many argue that all null hypotheses are false. That is, all variables are related, even if only in a trivially small way. The researcher's job then is simply to estimate the size of the effect and to construct confidence limits about the estimate. Academic arguments about the validity of the NHST have been going on for many decades. (For example, see the provocatively titled book The Cult of Statistical Significance by Ziliak and McCloskey, 2008.) We will not enter that fight but will simply note that, at a minimum, one needs to understand both the NHST and the use of confidence intervals, because both are used in the kinesiology literature and a competent consumer of that literature must be familiar with these ideas.
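The arithmetic above is easy to check with a few lines of code. A minimal Python sketch, using the values quoted in this entry (sit-and-reach mean of 35 cm with s = 10 and N = 50, and the BMI mean difference of 3 with a standard error of 2); the function name is illustrative, not from the text:

```python
import math

def confidence_interval(estimate, standard_error, z=1.96):
    """Lower and upper limits of a CI: estimate +/- z * SE."""
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width

# Sit-and-reach example: SEM = s / sqrt(N) = 10 / sqrt(50), about 1.4 cm
sem = 10 / math.sqrt(50)
print(confidence_interval(35, sem))   # about (32.2, 37.8); with SEM rounded to 1.4 the text gets 32.3-37.7

# BMI mean-difference example: difference = 3, SE of the difference = 2
print(confidence_interval(3, 2))      # about (-0.9, 6.9); the interval includes zero
```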
Class of data-ordinal data
An ordinal scale, sometimes called a rank order scale, gives quantitative order to the variables but does not indicate how much better one score is than another. In a physical education class, placement on a ladder tournament is an example of an ordinal scale. The person on top of the ladder tournament has performed better than the person ranked second, but no indication is given of how much better. The top two persons may be very close in skill, and both may be considerably more skilled than the person in third place, but the ordinal scale does not provide that information. It renders only the order of the players, not their absolute abilities. Differences between the positions on an ordinal scale may be unequal. If 10 people are placed in order from short to tall, then numbered 1 (shortest) to 10 (tallest), the values of 1 to 10 would represent ordinal data. Ten is taller than 9, and 9 is taller than 8, but the data do not reveal how much taller. Clinicians often use 0-to-10 ordinal scales to quantify pain. Similarly, in exercise physiology the 6-to-20 Borg scale is an ordinal scale that is used to quantify the rating of perceived exertion during exercise. (Vincent)
Variance
Both the range and the interquartile range consider only two points of data in determining variability. They do not include the values of the scores between the high and low, or between the Q1 and Q3 data points. Another way to assess variability that does consider the values of each data point is to determine the distance of each raw score from the mean of the data. This distance is called deviation (d). The computation of deviation scores for two sets of data, each of which has a mean of 25, is demonstrated in table 5.1. The sum of the deviations around the mean will always equal zero. This is true regardless of the size of the scores. This is one way to verify the accuracy of the mean. If the deviations do not sum to zero, the mean is incorrect. In table 5.1 the sum of all deviations around the mean equals zero in both the X and the Y examples; therefore, we can be assured that our calculations of the means are correct. In the second example (Y), the deviation scores are larger and the range is larger (RX = 27 − 23 = 4 and RY = 35 − 15 = 20). In fact, as a comparison of the ranges shows, the Y data are five times more variable than the X data. The interquartile range comparisons confirm this conclusion: IQRX = 26 − 24 = 2 and IQRY = 30 − 20 = 10. The relationship between the range and the interquartile range is always consistent when the data from the two sets are normally distributed. If we sum the absolute values of the deviations (ignoring the direction of the deviation) from the mean, we find that the total deviation of X is 6 and the total deviation of Y is 30, or five times more variable than X. However, using the sum of the absolute values to quantify variability is problematic in a couple of ways. First, absolute values do not lend themselves to the proper algebraic manipulations that are required in subsequent chapters. Second, in these examples the signs of the deviations have meaning; they indicate whether the raw score is above or below the mean. Because we need to know this, we cannot ignore the signs without losing information about the data. However, negative signs can be eliminated in another way. If we simply square the deviations, the squared values are all positive. Then we can calculate the average of the squared deviations. This process forms the basis for how variability is typically quantified statistically and is more useful than the range, interquartile range, and sum of absolute deviations. The variance is the average of the squared deviations from the mean. The symbol V is used for variance in this text; other texts may use the symbol S². In algebraic terms, variance is V = Σ(X − X̄)² / N (equation 5.02). Table 5.2 shows how the variance is determined for the previous examples of X and Y. Note that this is a sums of squares type of calculation. That is, as part of the calculation, we add up squared values. This is common in many statistical calculations, and we revisit it in subsequent chapters. (Vincent)
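The deviation-score logic and equation 5.02 translate directly into code. A short sketch; the X and Y lists below are assumed stand-ins chosen to match the summary values quoted from tables 5.1 and 5.2 (means of 25, variances of 2 and 50), since the tables themselves are not reproduced here:

```python
def deviations(scores):
    """Deviation (d) of each raw score from the mean; these always sum to zero."""
    mean = sum(scores) / len(scores)
    return [x - mean for x in scores]

def variance(scores):
    """Equation 5.02: the average of the squared deviations from the mean."""
    return sum(d ** 2 for d in deviations(scores)) / len(scores)

X = [23, 24, 25, 26, 27]   # mean 25, range 4, sum of absolute deviations 6
Y = [15, 20, 25, 30, 35]   # mean 25, range 20, sum of absolute deviations 30

print(sum(deviations(X)), sum(deviations(Y)))   # both 0, which checks the means
print(variance(X), variance(Y))                 # 2.0 and 50.0
```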
Simple Frequency Distributions
Determining the Percentile From the Score: What is the percentile rank of a student who made seven baskets? To answer this question, we need to compute the fraction of scores that are equal to or less than 7. We do so by adding the numbers in the frequency column (f) from the score (X) of 7 down to the bottom. If several percentile calculations are to be performed, it is helpful to create a cumulative frequency column. The cumulative frequency column in table 3.2 indicates that 43 people made seven or fewer baskets. Sixty persons took the test, so 43/60 of the students made seven or fewer baskets. Converted to decimals, 43/60 = .716, and .716 × 100 = 71.6. Therefore, a person who scored seven baskets ranks equal to or better than about 72% of those who took the test.
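The same computation in a single function (the 43 and 60 come from the worked example above; note that 43/60 rounds to 71.7% rather than the truncated 71.6%):

```python
def percentile_rank(cumulative_frequency, n_total):
    """Percent of test takers scoring at or below a given raw score."""
    return cumulative_frequency / n_total * 100

# 43 of the 60 students made seven or fewer baskets
print(round(percentile_rank(43, 60), 1))   # 71.7, i.e. roughly the 72nd percentile
```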
External validity
External validity refers to the ability to generalize the results of the experiment to the population from which the samples were drawn. If a sample is not random, then it may not represent the population from which it was drawn. It is also possible that other factors (intervening variables) that were controlled in the experiment may not be controlled in the population. The very fact that the experiment was tightly controlled may make it difficult to generalize the results to an actual situation in which these variables are left free to influence performance. (Vincent)
Internal validity
Internal validity refers to the design of the study itself; it is a measure of the control within the experiment to ascertain that the results are due to the treatment that was applied. Sometimes when people take motor skill tests, they improve their performance simply by taking the test twice. If a pretest-treatment-posttest design is conducted, the subjects may show improvements on the posttest because they learned specific test-taking techniques while taking the pretest. If these improvements are attributed to treatment when in fact they are due to practice on the pretest, an error has been made. To find out if the changes are due to the treatment or to practice on the test, the researchers could use a control group. The control group would also take both the pre- and posttest but would not receive the treatment; the control subjects may or may not show posttest improvement. Analyzing the posttest differences between the experimental group and the control group helps us sort out how much improvement is due to (a) the treatment and (b) the learning effect from the pretest. (Vincent)
Parametric
Interval and ratio scales are classified as parametric.
Nonparametric
Nominal and ordinal scales are called nonparametric because they do not meet the assumption of normality. (Vincent)
Two tail test
One way to state the null hypothesis is to predict that the difference between two population means is zero (μ1 − μ2 = 0) and that small differences in either direction (plus or minus) on the sample means are considered to be chance occurrences. The direction, or sign, of the difference is not important because we do not know before we collect data which mean will be larger. We are simply looking for differences in either direction. Under these conditions, the null hypothesis is tested with a two-tailed test (see figure 7.4). If α = .05, the 5% rejection area is divided between the two tails of the curve; each tail includes 2.5% of the area under the curve (α/2). Use the null hypothesis and a two-tailed test when prior research or logical reasoning does not clearly indicate that a significant difference between the mean values should be expected.
Rectangular curves
Rectangular curves occur when the frequency for each of the scores in the middle of the data set is the same. (Vincent)
Skewed curve
Sometimes the data result in a curve that is not normal; that is, the tails of the curve are not symmetrical. When a disproportionate number of the subjects score toward one end of the scale, the curve is skewed. The data in table 2.6 for parallel bar dips show that a larger number of the subjects scored at the bottom of the scale than at the top. A few stronger subjects raised the average by performing 30 or more dips. When the data from table 2.6 are plotted, as in figure 2.7, the hump or mode of the curve is pushed to the left, and the tail on the right is longer than the tail on the left. The curve has a positive skew because the long tail points in a positive direction on the abscissa. Conversely, a curve has a negative skew when the long tail points in the negative direction on the abscissa. Chapter 6 presents a method for calculating the amount of skewness in a data set. (Vincent)
One tail test formula !
Sometimes the review of literature or reasoned logic strongly suggests that a difference does exist between two mean values. The researcher is confident that the direction of the difference is well established but is not sure of the size of the difference. In this case, the researcher may test the research hypothesis (H1), but such a situation is rare. The evidence suggesting the direction of the mean difference must be strong to justify testing H1; the opinion of the investigator alone is not sufficient. The researcher predicts that two population means are not equal. By convention, the mean expected to be larger is designated as X̄1. Because the first mean is predicted to be greater than the second, the direction of the difference is established as positive. Because the difference to be tested is always positive (X̄1 − X̄2 > 0), we are interested only in the positive side of the normal curve. If an observational comparison of sample means shows X̄2 to be larger than (even by the slightest amount) or equal to X̄1, H1 is rejected. The one-tailed test places the full 5% of the alpha area representing error at one end of the curve (see figure 7.5). The Z score that represents this point (1.65) is lower than the Z score for a two-tailed test (1.96). Therefore, it is easier to find significant differences when a one-tailed test is used. For this reason, the one-tailed test is more powerful, or more likely to find significant differences, than the two-tailed test if the direction of your hypothesis is correct. If in reality X̄1 < X̄2, you have zero power to detect that difference. In practice, journal editors and reviewers generally frown upon one-tailed tests, and most tests are two-tailed. However, most research is not conducted in a theoretical vacuum, and researchers typically think in a one-tailed way. Further, sometimes researchers may care only whether one treatment works better than another. Say a therapist is examining a new therapy and comparing it with the current standard of care. The new therapy would be adopted only if it was shown to be superior to the old therapy. An equal or worse performance by the new therapy would not lead to its adoption. Here a one-tailed test seems defensible.
Class of data- frequency data
Subjects are simply classified into one of the categories and then counted. Data grouped this way are sometimes called frequency data because the scale indicates the frequency, or the number of times an event happens, for each category. For example, a teacher classified students as male or female and then counted each category. The results were 17 males and 19 females. The values 17 and 19 represent the frequencies of the two categories, male and female.
Standard deviation
The calculation of variance shown in table 5.2 suggests that Y is 25 times more variable than X (50/2 = 25), whereas we previously concluded from the range and interquartile range that Y was only 5 times more variable than X. This discrepancy is the result of squaring the deviation scores. To bring the value for the variance in line with other measures of variability (and with the unit values of the original raw data), we compute the square root of the variance. The resulting value is called the standard deviation because it is standardized with the unit values of the original raw data. The standard deviation is the square root of the average of the squared deviations from the mean (i.e., it is the square root of the variance). This definition applies to a population of scores; the standard deviation of a sample is discussed later in this chapter. These values are now consistent with the range and interquartile range because the standard deviation of Y is five times as large as the standard deviation of X (1.4 × 5 ≈ 7.1). This statistic gives an accurate and mathematically correct description of the variability of the group while considering each data point and its deviation from the mean. The standard deviation is very useful because many advanced statistical techniques are based on a comparison of the mean as the measure of central tendency and the standard deviation as the measure of variability. The method just described for calculating the standard deviation is called the definition method; it is derived from the verbal definition of standard deviation. But this is a cumbersome and lengthy procedure, especially when N is large and the mean is not a whole number. Under these conditions a high probability of mathematical error exists during the calculation. When N is large, the use of a computer is necessary to save time and eliminate arithmetic errors. (Vincent)
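A minimal sketch of the definition method, reusing the assumed X and Y stand-ins from the variance example above:

```python
import math

def std_dev(scores):
    """Population standard deviation: the square root of the variance."""
    mean = sum(scores) / len(scores)
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))

X = [23, 24, 25, 26, 27]   # variance 2
Y = [15, 20, 25, 30, 35]   # variance 50

print(round(std_dev(X), 2), round(std_dev(Y), 2))   # about 1.41 and 7.07; Y is 5 times as variable as X
```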
Central Limit Theorem
The central limit theorem can be defined in a variety of ways. However, explained simply, a sum of random numbers becomes normally distributed as more and more of the random numbers are added together (Smith, 1997). To see how this works, imagine that a random number generator spits out 20 numbers and you calculate and save the mean value of those 20 numbers. Then you repeat this process over and over so you have a lot of those means sitting in a pot. If you then made a frequency distribution of those means, it would be about normal (Gaussian). The ubiquity of the normal distribution stems from the idea that randomness leads to Gaussian distributions. The central limit theorem is a key underpinning of our use of statistics. (Vincent)
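A quick simulation of the pot-of-means idea described above. This is only a sketch; the choice of uniform random numbers, 20 values per mean, and 10,000 repetitions is arbitrary:

```python
import random
import statistics

# Each trial: average 20 uniform random numbers; repeat to fill the "pot" of means.
means = [statistics.mean(random.random() for _ in range(20)) for _ in range(10_000)]

# The distribution of these means is approximately normal: centered near 0.5,
# with a spread close to the theoretical value sqrt(1/12) / sqrt(20), about 0.065.
print(round(statistics.mean(means), 3), round(statistics.stdev(means), 3))

# Crude text histogram; counts rise toward the middle and taper at the tails.
for lo in [0.40 + 0.01 * i for i in range(20)]:
    count = sum(lo <= m < lo + 0.01 for m in means)
    print(f"{lo:.2f} {'#' * (count // 25)}")
```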
Class of data-ratio
The most complete scale of measurement is the ratio scale. This scale is based on order, has equal distance between scale points, and uses zero to represent the absence of value. All units are equidistant from each other, and proportional, or ratio, comparisons are appropriate. All measurements of distance, force, or time are based on ratio scales. Twenty kilograms is twice the mass of 10 kilograms, and 100 meters is twice as far as 50 meters. A negative score is not possible on a ratio scale. A person cannot run a race in negative seconds, weigh less than 0 kilograms, or score negative points. (Vincent)
Normal curve
The most widely known curve in statistics is the normal curve. This uniquely shaped curve, which was first described by mathematician Karl Gauss (1777-1855), is sometimes referred to as a Gaussian curve, or a bell-shaped curve. Nature generally behaves according to rule. Karl Friedrich Gauss discovered that fact and formulated his discovery in a mathematical expression of normal distribution . . . and this curve has ever since become the sine qua non of the statistician. (Leedy, 1980, p. 25) A normal curve is characterized by symmetrical distribution of data about the center of the curve in a special manner. The mean (average), the median (50th percentile), and the mode (score with the highest frequency) are all located at the middle of the curve. The frequency of scores declines in a predictable manner as the scores deviate farther and farther from the center of the curve. All normal curves are bilaterally symmetrical and are usually shaped like a bell, but not all symmetrical curves are normal. When data are identified as normal, the special characteristics of the normal curve may be used to make statements about the distribution of the scores. Many of the variables measured in kinesiology are normally distributed, so kinesiology researchers need to understand the normal curve and how it is used. Chapter 6 discusses the special characteristics of the normal curve in detail. A typical normal curve is presented in figure 2.4. Notice that the two ends, or tails, of the curve are symmetrical and that they represent the scores at the low and high extremes of the scale. When scores approach the extreme values on the data scale, the frequency declines. This is demonstrated by the data represented in table 2.4. Most subjects score in the middle range. We extensively use the normal distribution in statistics, which suggests that the distribution turns up a lot in nature. Of course, we could be using the normal distribution when we shouldn't; that is, maybe things that we think are Gaussian really aren't. Nonetheless, the commonness of the normal distribution in statistics is based on the central limit theorem. (Vincent)
Type 1 error
Traditional statistical hypothesis testing sets up a binary decision process: we either accept or reject the null hypothesis. When we perform our calculations, we get a p value; this value tells us the probability that we could have obtained the data that we did if H0 is true. We then reject H0 if we get a p value that is too small. How small is too small? That is defined a priori by your alpha level. Because we are making our decision regarding H0 based upon probabilities, we know that we may be wrong. Table 7.2 is the Orwellian-sounding truth table: rejecting H0 when H0 is true is a type I error (α); rejecting H0 when H0 is false is a correct decision; failing to reject H0 when H0 is true is a correct decision; and failing to reject H0 when H0 is false is a type II error (β). For any statistical decision, either we make a correct decision or we make an error. If we reject H0, we have either correctly rejected H0 or we have committed what is called a type I error. If we commit a type I error, we have said that the independent variable affects the dependent variable, or that a relationship between the independent variable and the dependent variable exists, when in reality no such effect or relationship exists. This is a false positive.
U-shaped curve
A U-shaped curve is the result of a high frequency of values at the extremes of the scale and a low frequency in the middle. (Vincent)
Mesokurtic
When most scores fall in the midrange and the frequency of the scores tapers off symmetrically toward the tails, the familiar bell-shaped curve occurs. This is referred to as a mesokurtic (meso meaning middle and kurtic meaning curve) curve. (Vincent)
Alpha
The area under the normal curve that represents the probability of error is called alpha (α). Alpha is the level of chance occurrence. In statistics, this is sometimes called the error factor (i.e., the probability of being wrong because of chance occurrences that are not controlled). Alpha is directly related to Z because it is the area under the normal curve that extends beyond a given Z value. Remember that the standard error of the mean is a standard deviation on a normal curve. From our previous sit-and-reach example, the standard error was 1.4 centimeters. If we include 2 SE above and below μ [35 ± (2 × 1.4), or 32.2-37.8 cm], we increase our level of confidence from about 68% to better than 95% and decrease the error factor from about 32% (p < .32) to about 5% (p < .05). To be completely accurate when we use the 95% LOC, or p < .05, we should not go quite as far as 2 Z scores away from the mean. In table A.1 in appendix A, the value in the center of the table that represents the 95% confidence interval is 47.50 (95/2 = 47.50, because table A.1 represents only half of the curve). This corresponds to a Z score of 1.96. The value 1.96 is the number of Z scores above and below the sample mean that accurately represents the 95% LOC, or p = .05. The correct estimate of μ at p = .05 is 35 ± (1.96 × 1.4), or 35 ± 2.7 centimeters (32.3-37.7). (Vincent)
Floor effect
A floor effect can occur when there is a limit on how low scores can go and scores tend to bunch around that limit. The example that follows actually illustrates the opposite case, a ceiling effect: grades in a graduate-level statistics course cannot be higher than 100% (assuming no extra credit), and we would expect most students to score above 80% on a typical exam, but (hopefully) only a small number of students would score below 70%. Here we might anticipate ceiling effects as scores tend to bunch above 80% and produce a negative skew to the data. (Vincent)
Confidence interval formulas!!!
The range between these limits is called a confidence interval (CI). In our example, the 68% confidence interval is about 33.6 to 36.4 centimeters and the 95% CI is about 32.3 to 37.7 centimeters. A similar calculation could be made for the 99% LOC by using table A.1 to find the value that reads 49.5 (99/2 = 49.5). This exact value is not found in the table. Because 49.5 is halfway between 49.49 and 49.51 in the table, we choose the higher value (49.51), which gives us slightly better odds. The Z score correlate of 49.51 is 2.58. To achieve the 99% LOC, we multiply the standard error of the mean by ±2.58. The estimate of the population mean at the 99% LOC (p = .01) is 35 ± (2.58 × 1.4), or 35 ± 3.6 centimeters. Thus, the 99% CI is 31.4 to 38.6 centimeters. This may be expressed as 31.4 ≤ μ ≤ 38.6, p = .01. Likewise, we could establish the 90% LOC by looking up 45% (90/2 = 45) in table A.1. The Z score correlate of 45% is 1.65, so μ = 35 ± (1.65 × 1.4), or 35 ± 2.3 centimeters (p = .10). Note that percent values from table A.1 are rounded to thousandths. The level of confidence (chances of being correct) and probability of error (chances of being incorrect) always add to 100%, but by tradition the level of confidence is reported as a percentage and the probability of error (p) is reported as a decimal. The Z values used to determine p at the most common levels of confidence are listed in table 7.1. By far, the most common level of confidence used is the 95% CI. Other values may be determined for any level of confidence by referring to table A.1 in appendix A. The generalized equation for determining the limits of a population mean based on one sample for any level of confidence is μ = X̄ ± Z(SEM) (equation 7.02), where Z is the Z score that will produce the desired probability of error (i.e., Z = 1.65 for p = .10, 1.96 for p = .05, and 2.58 for p = .01).
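Equation 7.02 applied at the three common levels of confidence, using the sit-and-reach values from this entry (sample mean 35, SEM 1.4); a sketch, with the function name chosen for illustration:

```python
def mean_confidence_limits(sample_mean, sem, z):
    """Equation 7.02: estimate mu as sample_mean +/- Z * SEM."""
    return sample_mean - z * sem, sample_mean + z * sem

Z_FOR_LOC = {90: 1.65, 95: 1.96, 99: 2.58}   # Z values for common levels of confidence

for loc, z in Z_FOR_LOC.items():
    low, high = mean_confidence_limits(35, 1.4, z)
    print(f"{loc}% CI: {low:.1f} to {high:.1f} cm")
# 90%: 32.7 to 37.3, 95%: 32.3 to 37.7, 99%: 31.4 to 38.6
```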
Skewness and kurtosis
Consequently, it is critical that we know whether the data deviate from normality. Skewness is a measure of the bilateral symmetry of the data, and kurtosis is a measure of the relative peakedness of the curve of the data. By observing a graph of the data and identifying the three measures of central tendency, we can get a general idea of the skewness of the data; however, this method is not exact (see figure 4.1 on p. 56). Using Z scores, we can obtain a numerical value that indicates the amount of skewness or kurtosis in any set of data. Because Z scores are a standardized measure of the deviation of each raw score from the mean, we can use Z scores to determine whether the raw scores are equally distributed around the mean. When the data are completely normal, or bilaterally symmetrical, the sum of the Z scores above the mean is equal but opposite in sign to the sum of the Z scores below the mean. The positive and negative values cancel each other out, and the grand sum of the Z scores is zero. If we take the third moment (the cube of the Z scores, or Z³), we can accentuate the extreme values of Z, but the signs of the Z values remain the same. This places greater weight on the extreme scores and permits a numeric evaluation of the amount of skewness. Computing the average of the Z³ scores produces a raw score value for skewness. The formula for calculating the raw value for skewness (equation 6.04) is skewness = ΣZ³ / N. When the mean of Z³ is zero, the data are normal. When the mean of Z³ is positive, the data are skewed positive. When the mean of Z³ is negative, the data are skewed negative. This effect can be seen by examining the data presented in table 6.1. Notice that the data are skewed negative. When these data are graphed (see figure 6.7), the skewness is easily observed. Kurtosis may also be calculated from Z scores. By taking the fourth moment of the Z scores, the extreme Z values are again accentuated but the signs are all converted to positive. When the average of the Z⁴ values is 3.0, the curve is normal. To make the units equal for both skewness and kurtosis, the mean of Z⁴ is typically reduced by 3.0. The formula for calculating the raw value for kurtosis (equation 6.05) is kurtosis = (ΣZ⁴ / N) − 3.0. A score of 0 indicates completely normal kurtosis, or a mesokurtic curve, just as a score of 0 for skewness indicates complete bilateral symmetry. When the raw score for kurtosis is greater than 0.0, the curve is leptokurtic (more peaked than normal), and when the raw score is less than 0.0, the curve is platykurtic (flatter than normal). Raw skewness and kurtosis scores are not easily interpreted because a raw score alone does not indicate a position on a known scale. But when raw scores are converted to Z scores, they are easy to interpret. To convert the raw scores for skewness (equation 6.04) or kurtosis (equation 6.05) to Z scores for skewness or kurtosis, we divide the raw scores by a factor called the standard error. Standard error is a type of standard deviation. (We explore standard error in more detail in subsequent chapters.) When we divide the raw skewness and kurtosis scores by the appropriate standard error, the result is a Z score. The standard error (SE) for skewness is SEskew = √(6/N), and the standard error for kurtosis is SEkurt = √(24/N).
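Equations 6.04 and 6.05 and their standard errors in code. A sketch; the 10 scores below are an assumed example (bunched high with a low outlier), not the table 6.1 data:

```python
import math

def z_scores(scores):
    """Convert raw scores to Z scores using the population standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return [(x - mean) / sd for x in scores]

def skewness(scores):
    """Equation 6.04: the mean of the cubed Z scores."""
    z = z_scores(scores)
    return sum(zi ** 3 for zi in z) / len(z)

def kurtosis(scores):
    """Equation 6.05: the mean of the Z scores raised to the fourth power, minus 3.0."""
    z = z_scores(scores)
    return sum(zi ** 4 for zi in z) / len(z) - 3.0

def z_skew(scores):
    """Raw skewness divided by its standard error, sqrt(6/N)."""
    return skewness(scores) / math.sqrt(6 / len(scores))

def z_kurt(scores):
    """Raw kurtosis divided by its standard error, sqrt(24/N)."""
    return kurtosis(scores) / math.sqrt(24 / len(scores))

data = [2, 5, 6, 7, 7, 8, 8, 8, 9, 9]
print(round(skewness(data), 2))                        # negative, so the data are skewed negative
print(round(kurtosis(data), 2))                        # positive here, i.e. somewhat leptokurtic
print(round(z_skew(data), 2), round(z_kurt(data), 2))  # the Z-score versions of each
```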
Percentile
A percentile is defined as a point or position on a continuous scale of 100 theoretical divisions such that a certain fraction of the population of raw scores lies at or below that point. A score at the 75th percentile is equal to or surpasses three-fourths of the other scores in the raw data set. The 33rd percentile is equal to or better than about one-third of the scores but is surpassed by two-thirds of the scores. A student's percentile score on a test indicates how the student's score compares with the scores of the other test takers, not what fraction of a perfect score of 100% was earned. (Vincent 38)
Dependent data
A dependent variable is dependent on the effects of one or more other variables. In an experimental study, the dependent variable is that which is measured. Under these conditions, the researcher examines the effect of the independent variable on the dependent variable. In the creatine study, the dependent variable might be peak power from a Wingate anaerobic power test. The researcher then might test the hypothesis that creatine supplementation causes an increase in anaerobic power. If the study is well designed and controlled, and if the creatine-supplemented subjects score higher (on average) than the placebo subjects, then the hypothesis is supported. For the stretching example in the previous paragraph, the dependent variable might be lower back and hamstring flexibility as measured by the sit-and-reach test. (Vincent)
Independent data
The independent variable refers to the variable that is manipulated or controlled by the researcher. This is sometimes called the treatment. A sports nutrition researcher might randomize subjects to receive either a creatine supplement or a placebo. Here we might call this independent variable "supplement condition" and say that it has two levels (creatine vs. placebo). Similarly, a researcher might be interested in examining the effect of stretching duration on range of motion of the lower back and hamstrings. Subjects could be randomized to one of three groups that perform a hamstring stretch for 15, 30, or 60 seconds. The independent variable might be called stretching duration, and it has three levels. A dependent variable is dependent on the effects of one or more other variables. (Vincent)
Determining the percentile from the score
What is the percentile rank for a raw score of 6 baskets? Remember, a percentile is a point on a continuous scale of 100 theoretical divisions such that a certain fraction of the population of raw scores lies at or below that point. To calculate the percentile for a score of 6 baskets, we need to determine what fraction of scores falls at or below 6 baskets. Percentiles are not based on the value of the individual scores but on the order of the scores. In table 3.1, the value of the bottom score could be changed to 2 without affecting the percentile divisions because it would still be in the same order in the group. The question we must ask is this: How many scores fall at or below 6 in the ordered list of scores? Counting from the score of 6 down, we note that nine scores fall at or below 6. There are 15 scores, so the person who scored 6 baskets is 9/15 of the way from the bottom of the scale to the top. This fraction is first converted into a decimal (9/15 = .60), and then the decimal is multiplied by 100 to convert it into a percent (.60 × 100 = 60%). Thus, a score of 6 on this test falls 60% of the way from the bottom of the scale to the top and is classified as the 60th percentile. Any single score could be converted in a similar manner, but several scores were obtained by more than one person. For example, three persons each received a score of 5. What is their percentile rank? Using the method described previously, we could calculate that the first score of 5, the sixth score from the bottom, represents the 40th percentile (6/15 = .40, .40 × 100 = 40%). But the top score of 5 is eight scores from the bottom and represents approximately the 53rd percentile (8/15 × 100 = 53.3%). Do we then conclude that the persons who each made five baskets scored between the 40th and the 53rd percentile ranks? No. Because we define percentile as a fraction at or below a given score, we conclude that all are equal to or below a score of 5. And because they all performed equally, they should all receive the same percentile as a fraction of scores at or below a given score. Therefore, all three have a percentile score of 53.3.
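The counting rule above in code. The free throw list is an assumed stand-in, built to be consistent with the worked counts (15 scores, nine at or below 6, eight at or below 5), since table 3.1 is not reproduced here:

```python
def percentile_of(score, scores):
    """Percent of scores at or below the given score (the definition used in the text)."""
    at_or_below = sum(1 for s in scores if s <= score)
    return at_or_below / len(scores) * 100

# Assumed stand-in for the table 3.1 free throw data (15 players)
free_throws = [1, 2, 3, 4, 4, 5, 5, 5, 6, 7, 8, 9, 10, 11, 12]

print(round(percentile_of(6, free_throws), 1))   # 60.0 -> the 60th percentile
print(round(percentile_of(5, free_throws), 1))   # 53.3 -> all three players who scored 5 get this rank
```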
Class of data-interval
An interval scale has equal units, or intervals, of measurement (that is, the same distance exists between each division of the scale) but has no absolute zero point. Because zero does not represent the absence of value, it is not appropriate to say that one point on the scale is twice, three times, or half as large as another point. The Fahrenheit scale for measuring temperature is an example of an interval scale: 60° is 10° hotter than 50° and 10° cooler than 70° (the intervals between the data points are equal), but 100° is not twice as hot as 50°. This is because 0° does not indicate the complete absence of heat. In athletics, interval scores are used to judge performances in sports such as ice skating, gymnastics, diving, and synchronized swimming. A 9.0 in gymnastics is halfway between 10.0 and 8.0, but it is not necessarily twice as good as a 4.5. A score of 0 does not mean the absence of skill; it means that the performance did not contain sufficient skill to be awarded any points. (Vincent)
Level of confidence
The level of confidence (LOC) is a percentage figure that establishes the probability that a statement is correct. It is based on the characteristics of the normal curve. In the previous example, the estimate of the population mean (μ) is accurate at the 68% LOC because we included 1 SEM (i.e., 1 Z) above and 1 SEM below the predicted population mean. However, if there is a 68% chance of being correct, there is also a 32% chance of being incorrect. (Vincent)
Measurement through Objectivity
Objectivity means that the data are collected without bias by the investigator. Bias can be detected by comparing an investigator's scores against those of an expert or panel of experts. Objectivity is sometimes referred to as inter-rater reliability.
Bimodal curve
The mode is the score with the highest frequency. On the normal curve, a single mode is always in the middle, but some distributions of data have two or more modes and hence are called bimodal, or multimodal. If one mode is higher than the other, the modes are referred to as the major and minor modes. When such data distributions are plotted, they have two or more humps representing the clustering of scores. Bimodal curves are not normal curves. A bimodal curve is shown in figure 2.6. (Vincent)
Determining the Score From the Percentile
The most common computation in working with percentiles is calculating the percentile from the raw score, but sometimes the opposite calculation is required. If a coach determines that the 60th percentile is the cutoff point on a test for selecting athletes for a varsity team, the coach needs to know what raw score is represented by the 60th percentile. Frequently, grades or class divisions are made on the basis of percentiles. A class may be divided into thirds, with the top third receiving one practice schedule, the middle third another, and the bottom third still another. To determine which students should be included in each group, we must establish the raw scores equivalent to the percentile points. The technique of determining the raw score that matches a given percentile point is the opposite of that for finding the percentile from the raw score. For example, the basketball coach who collected the free throw data (see table 3.1) can take the top two-thirds of the team to away games. If the coach uses the free throw score as the criterion for determining who makes the traveling squad, how many baskets must a player make to qualify? The coach must ask, "Which score defines the bottom third of the team?" All players with scores above that point are in the top two-thirds and qualify for the traveling team. This is a simple problem with 15 players because it is easy to determine that the top 10 players represent the upper two-thirds of the team. But by thinking through the process with simple data we learn the concepts that may be applied to more difficult data. The percentile equivalent to 1/3 is found by converting the fraction to a decimal (1/3 = .333) and multiplying this decimal equivalent by the total number of subjects in the group (.333 × 15 = 5). This value (5) is the number of scores from the bottom, not the raw score value of 5 baskets made. To determine the raw score that is equivalent to a given percentile (P), convert the percentile to a decimal, multiply the decimal equivalent by the number of scores in the group (N), and count that many scores from the bottom up. Counting up five scores from the bottom, we find that a score of 4 free throws represents a percentile score of 33.3. Any player who made four or fewer free throws is included in the bottom third of the group, and any player scoring more than four free throws is included in the top two-thirds of the group. So five free throws is the criterion for making the traveling squad. With small values of N and discrete data, it may be necessary to find the score closest to a given percentile if none of the raw scores falls exactly at that point. If the product of P × N is not an integer, round off to the nearest integer to determine the count from the bottom. In table 3.1, the 50th percentile is 7.5 scores from the bottom (0.50 × 15 = 7.5), but there are no half scores. So we round 7.5 to 8 and count 8 scores from the bottom. The eighth score up is 5. Therefore, 5 is the closest score to the 50th percentile.
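The reverse calculation (P × N, rounded to the nearest integer, counted up from the bottom) as a small function, reusing the same assumed free throw stand-in:

```python
def score_at_percentile(p, scores):
    """Raw score closest to percentile p (as a decimal): count round(p * N) scores up from the bottom."""
    ordered = sorted(scores)
    position = round(p * len(ordered))       # number of scores from the bottom
    return ordered[max(position, 1) - 1]     # convert the count to a zero-based index

free_throws = [1, 2, 3, 4, 4, 5, 5, 5, 6, 7, 8, 9, 10, 11, 12]

print(score_at_percentile(1/3, free_throws))    # 4 -> the bottom third ends at 4 free throws
print(score_at_percentile(0.50, free_throws))   # 0.50 * 15 = 7.5, rounds to 8 -> a score of 5
```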
Class of data-nominal scales
In nominal scales, subjects are grouped into mutually exclusive categories without qualitative differentiation between the categories. Subjects are simply classified into one of the categories and then counted. (Vincent) Some nominal scales have only two categories, such as male or female, or yes and no. Others, such as an ethnicity scale, have more than two divisions. Nominal scales do not place qualitative value differences on the categories of the variable. However, numbers are assigned to the categories. The size of the number does not indicate an amount, but rather is used to indicate category assignment in a data file. For example, we might assign the number 0 to males and 1 to females. Here, the choice of number is completely arbitrary.
Ceiling effect
A ceiling effect occurs when there is a limit on how high scores can be and scores tend to bunch around that limit. (Vincent)
Leptokurtic
The opposite of the platykurtic curve is a leptokurtic curve, which results when the range of the group is limited and many scores are close to the middle. The differences among mesokurtic, platykurtic, and leptokurtic curves are shown in figure 2.5. (Vincent)
Z scores
A Z score is a raw score expressed in standard deviation units. If the standard deviation of the scores in figure 6.2 is 25, then 1 standard deviation unit is equivalent to 25 pounds on the raw score scale. A score of 200 lies 25 raw score units, or 1 standard deviation unit or 1 Z score, above the mean. The raw score of 200 is equivalent to a Z score of +1. Likewise, a raw score of 150 (25 raw units, or 1 standard deviation unit, below the mean) has a Z score of −1. A unique characteristic of the normal curve is that the percentage of area under the curve between the mean and any Z score is known and constant. When the Z score has been calculated, the percentage of the area (which is the same as the percentage of raw scores) between the mean and the Z score can be determined. In any normal curve, 34.13% of the scores lies between the mean and 1 Z score in either the positive or negative direction. Therefore, when we say that most of the population falls between the mean and ±1 Z score, we are really saying that 68.26% (2 × 34.13 = 68.26), or about two-thirds, of the population falls between these two limits. This is true for any variable on any data provided that the distribution is normal. Figure 6.3 demonstrates this concept.
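A short sketch of the Z-score conversion and the area rule it relies on. The mean of 175 pounds is inferred from the description (200 and 150 each lie 1 standard deviation of 25 from the mean), not quoted directly:

```python
import math

def z_score(raw, mean, sd):
    """A raw score expressed in standard deviation units."""
    return (raw - mean) / sd

def area_mean_to_z(z):
    """Proportion of a normal distribution lying between the mean and z (one side)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

print(z_score(200, 175, 25), z_score(150, 175, 25))   # 1.0 and -1.0
print(round(area_mean_to_z(1.0) * 100, 2))            # 34.13% of scores between the mean and 1 Z
print(round(2 * area_mean_to_z(1.0) * 100, 2))        # about 68.3%, i.e. roughly two-thirds within +/-1 Z
```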
J-curve
A J-curve results when frequency is high at one end of the scale, decreases rapidly, and then flattens and reduces to almost zero at the other end of the scale. This curve is different from a straight line, in which frequency decreases uniformly from one end of the scale to the other. J-curves can be positive or negative in direction, depending on the orientation of the lower tail of the curve. If the tail points to the positive end of the X-axis, the curve is positive. J-curves may also be inverted. (Vincent)
Standard score
A standard score is a score that is derived from raw data and has a known basis for comparison. In percents, the center is 50% and the range is 0 to 100%. A middle-aged male may be able to consume 40 milliliters of oxygen per kilogram of body weight per minute. This is the raw score. Without more information, the score is difficult to evaluate. But if we compare this value with other values in the population of all males of like age and calculate a percentile score of 65, then we know that the man's oxygen consumption is equal to or better than that of 65% of people in that population. Raw scores are measured values. Standard scores, which are derived from raw scores, provide more information than do raw scores. Standard scores allow us to (a) evaluate raw scores and (b) compare two sets of data that are based on different units of measurement. Which is the better score, 150 feet on the softball throw or 30 sit-ups in a minute? With only this information, it is impossible to tell. We must ask, What was the range? What was the average? But when we compare a percentile score of 57 on the softball throw with a percentile score of 35 on the sit-up test, the answer is clear. Not only are the relative values of the scores apparent, but we also know the middle score (50) and the range (0-100). The student who received these scores is better on the softball throw than on sit-ups when compared with the other students in the class. The conversion of raw scores to standard scores, of which percentiles are just one example, is a common technique in statistics. It is most useful in evaluating data. (Chapter 6 discusses other standard scores.) Percentiles may present a problem of interpretation when we consider the extreme ends of the scale for data sets with large numbers of scores. The problem results from imposing a percentile scale on interval or ratio data. For example, figure 3.1 shows a graph of the number of sit-ups performed in 1 minute by 1,000 seventh-grade boys. Both the number of sit-ups (raw scores) and the percentile divisions have been plotted. Note that a score of 30 sit-ups is equal to the 50th percentile. If a boy scores 30 sit-ups on the first test and increases his raw score to 35 on a second trial, his percentile score would increase from 50 to about 75, or 25 percentage points. However, if he was already at the upper end of the scale and made the same 5 sit-up improvement from 45 to 50, his percentile would increase only about three points, from 97 to near 100. This phenomenon is an example of the ceiling effect. Teachers and coaches must account for this when they grade on improvement. It is much easier to reduce the time it takes to run the 100-meter dash from 12.0 seconds to 11.5 seconds than it is to reduce it from 10.5 seconds to 10.0 seconds. An improvement of 0.5 seconds by a runner at the higher level of performance (i.e., lower-time scores) represents more effort and achievement than does the same time improvement in a runner at the middle of the scale. Learning curves typically start slow, accelerate in the middle, and plateau as we approach high-level performance. The ceiling effect is demonstrated by the plateau in the curve. It is more difficult to improve at the top of the learning curve than at the beginning or in the middle. This difficulty in interpretation should not deter us from using percentiles, but we must recognize the ceiling effect when considering scores that represent high levels of performance, particularly if improvement is the basis for evaluation.
(Vincent)
Type 2 error
Your statistical decision may have been to fail to reject H0. This may be correct, or you may have committed what is called a type II error. A type II error occurs when you accept H0 when H0 is false. This is a false negative. You miss an effect or relationship when one exists.
Type 1 and 2 error
Keep in mind a couple of points when thinking about type I and type II errors. First, you can commit only one type of statistical error for any statistical decision. If your statistical decision is to reject H0, you know that either you were right or you committed a type I error; you cannot commit a type II error if you have rejected H0. Similarly, if your statistical decision is to fail to reject H0, either you have made a correct decision or you have made a type II error; you cannot commit a type I error if your statistical decision was to fail to reject H0. Second, you will not know whether you have committed an error. All you will know is that you either accepted or rejected H0. Notice in table 7.2 that type I error is associated with the term α and type II error is associated with the term β. These are probabilities. α was introduced earlier in this chapter as a probability of error. More specifically, it is the probability, or risk, of committing a type I error if the null hypothesis is true. Beta (β) is the probability, or risk, of committing a type II error if the null hypothesis is false. Remember that you set alpha, which is typically set at .05. When you set alpha at .05, you indicate that you are willing to take a 5% chance of committing a type I error (if H0 is true). You can also set beta; we will come back to this later. Statistical power, related to β, is the probability of rejecting H0 when H0 is false. Power is the probability of finding an effect or relationship if one exists. It is mathematically defined as 1 − β. Power is a good thing. You can do two things to increase power in a study. First, you can control the noise in the data; we talk about noise and ways to minimize it in subsequent chapters. Second, you can increase sample size. The larger the N, the more statistical power you will have. Here is an overview of how the process works. You set α (typically at .05, but no rule says that it has to be .05). You run your statistical analysis and calculate a p value. The p value is the probability that you would have obtained the data that you did if the null hypothesis is true. If your p value is less than or equal to α, your statistical decision is to reject H0. In statistical jargon, when H0 is rejected we often say that the result is statistically significant, or sometimes just that it is significant. That is, the probability of obtaining the data that you did if H0 is true is less than or equal to the risk you were willing to take of committing a type I error. In contrast, if your p value is greater than α, you accept H0. Here we might say that the result is not statistically significant. As is discussed in subsequent chapters, just because a result is statistically significant does not necessarily mean that the effect is of practical significance. An effect may be real in the sense that it is greater than zero, but it might still be trivially small in terms of usefulness. Neophyte researchers are sometimes accused of making type I errors in their zeal to find significant differences.
But failure to find a difference does not render the research worthless. It is just as important to know that differences do not exist as it is to know that differences do exist. The experienced and competent researcher is honestly seeking the truth, and a correct conclusion from a quality research project is valuable even if the research hypothesis is rejected. Table 7.3 demonstrates the conditions under which type I and type II errors may be made. The dilemma facing the researcher is that one can never absolutely know which, if either, type of error is being made. The experimental design provides the means to determine the odds of each type of error, but complete assurance is never possible. The researcher must decide which type of error is the most costly and then protect against it. If concluding that a difference exists when it does not (a type I error) is likely to risk human life or commit large amounts of resources to a false conclusion, then this is an expensive error and should be avoided. But if differences do exist, and the study fails to find them (a type II error), consumers of the research will never be able to take advantage of knowledge that may be helpful in solving problems. (Vincent)
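A tiny sketch of the decision rule and the power relationship described in these entries; the alpha, p values, and beta here are arbitrary illustrations, not values from the text:

```python
def decide(p_value, alpha=0.05):
    """Reject H0 when p <= alpha; otherwise fail to reject H0."""
    if p_value <= alpha:
        return "reject H0 (statistically significant)"
    return "fail to reject H0 (not statistically significant)"

print(decide(0.03))   # reject H0 -> either a correct decision or a type I error
print(decide(0.12))   # fail to reject H0 -> either a correct decision or a type II error

beta = 0.20           # assumed risk of a type II error
power = 1 - beta      # probability of detecting an effect that really exists
print(power)          # 0.8
```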
Platykurtic curve
When the results have a very wide range of scores with low frequencies in the midrange, the curve is called platykurtic, or flat. The opposite of the platykurtic curve is a leptokurtic curve. (Vincent)