Descriptive Statistics (Continued)
Considering the area under the normal curve in terms of probabilities is
very useful for researchers.
Many kinds of standard scores exist other than
z and T scores.
Box plots
(sometimes called box-and-whisker diagrams) graphically display the five-number summary of a distribution.
Outliers
scores or measurements that differ by such large amounts from those of other individuals in a group that they must be given careful consideration as special cases.
The normal curve is
symmetrical and bell-shaped.
The median is the most appropriate average to calculate when
the data result in a skewed distribution.
scatter plots illustrate
the degree of relationship between variables.
The five-number summary consists of the lowest score, Q1, the median, Q3, and the highest score. The interquartile range (IQR) is
the difference between the third and first quartiles (IQR = Q3 - Q1).
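As a minimal sketch of the five-number summary and the IQR, the Python below uses illustrative scores and the standard library's `statistics.quantiles`. The helper name `five_number_summary` and the choice of the "inclusive" quartile method are assumptions; other quartile conventions can give slightly different values for Q1 and Q3.

```python
import statistics

def five_number_summary(scores):
    """Return (lowest, Q1, median, Q3, highest) for a list of scores."""
    ordered = sorted(scores)
    # Assumption: the "inclusive" method; textbooks and software differ
    # slightly in how they interpolate quartiles.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

scores = [70, 74, 82, 86, 88, 90]  # illustrative scores
low, q1, med, q3, high = five_number_summary(scores)
iqr = q3 - q1  # interquartile range: Q3 - Q1
print(low, q1, med, q3, high, iqr)
```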
In the normal distribution, the mean, median, and mode are identical so the mean falls at
the exact center of the curve.
In all, 99.7% of the observations fall within
three standard deviations of the mean.
Five number summary
useful way to describe a skewed distribution.
What would the z score be for a raw score of 52 in a distribution with a mean raw score of 50 and a standard deviation of 2?
z = +1
Modes
do not present very much useful information and are not often used in educational research.
The median, in short, is the
midpoint.
Negative
relationship indicated when high scores on one variable are accompanied by low scores on the other variable or when low scores on one are accompanied by high scores on the other variable.
When a distribution is skewed, the shape of the distribution and the variability can be described by
reporting several percentiles.
In a normal distribution,
the large majority of the scores are concentrated in the middle, and the scores decrease in frequency the further they are from the middle.
Almost all the scores in a normal distribution lie between
the mean and plus or minus three standard deviations.
The standard deviation and its "brother",
the variance, measure the spread of scores from the mean.
Median
the point below and above which 50 percent of the scores in a distribution fall.
If a distribution is normal and we know the mean and the standard deviation of the distribution,
we can determine the percentage of scores that lie above or below any given score.
Example of a percentile:
you receive a score of 630 on the GRE and receive a percentile of 84 meaning that 84% of those who took this test scored lower than you did.
For example: We have previously shown that approximately 34 percent of the area in a normal distribution lies between the mean and 1 SD above it. Because 50 percent of the scores fall above the mean, roughly 16 percent of the scores lie more than 1 SD above the mean (50 - 34 = 16).
If we express 16 percent as a decimal and interpret it as a probability, we can say that the probability of randomly selecting an individual from the population who has a score 1 SD or more above the mean is .16.
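The arithmetic above can be checked with a short sketch using Python's standard-library `statistics.NormalDist`. The variable names are illustrative, and the sketch assumes a perfectly normal distribution, so the exact areas (about .3413 and .1587) round to the .34 and .16 quoted in the text.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Area under the curve below a score 1 SD above the mean.
area_below = standard_normal.cdf(1.0)

# Area between the mean and 1 SD above it (half the curve lies below the mean).
area_mean_to_1sd = area_below - 0.5

# Probability of randomly selecting a score 1 SD or more above the mean.
prob_above = 1 - area_below

print(round(area_mean_to_1sd, 2), round(prob_above, 2))
```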
Here we do not just connect a series of dots that represent the actual frequencies of the observed distribution.
Instead, we show a generalized distribution of scores that is not limited to one specific set of data.
Look at the following scores: A: 19, 20, 25, 32, 39 B: 2, 3, 25, 30, 75
The mean of both is 27. The median of both is 25.
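A brief sketch in Python makes the point concrete: the two sets of scores share a mean and a median but differ sharply in spread. The use of the population standard deviation (`statistics.pstdev`, which divides by the number of scores) is an assumption; a sample formula would give somewhat larger values.

```python
import statistics

a = [19, 20, 25, 32, 39]
b = [2, 3, 25, 30, 75]

summary = {}
for name, scores in (("A", a), ("B", b)):
    summary[name] = {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "range": max(scores) - min(scores),
        # Assumption: population SD (divide by n), not the sample SD.
        "sd": statistics.pstdev(scores),
    }
    print(name, summary[name])
```

Both distributions report the same two averages, yet B has a far larger range and standard deviation than A.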
The total area under the normal curve represents
all the scores in a normal distribution.
Probability
another important characteristic of the normal distribution is that the percentages associated with the areas under the curve can be thought of as probabilities.
The distribution of some human traits (e.g., height, weight)
approximates this normal curve.
Many other human traits, such as spatial ability, manual dexterity and creativity
are often assumed to do so.
Correlation coefficients
designated by the symbol r, express the degree of relationship that exists between two sets of scores.
Researchers need some way to
determine whether a relationship exists in data
Box plots are especially useful for
displaying two or more distributions.
Correlation coefficients range
from -1.00 to +1.00. The closer we get to either of these extremes, the stronger the relationship between the two variables. The closer we get to .00, the weaker the relationship.
If a set of scores is normally distributed, we can interpret any score if we know
how far, in standard deviation units, it is from the mean.
Example of a probability:
if there is a probability that an event will occur 25 percent of the time, it is said to have a probability of .25.
An important point about the standard deviation is that
in a normal distribution of scores, the mean plus or minus three standard deviations will encompass more than 99 percent (about 99.7 percent) of all the scores in the distribution.
Pearson product-moment correlation
most commonly used correlation coefficient represented by lower case r.
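As a rough sketch of how r is computed, the Python below implements the usual product-moment formula (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the data are hypothetical, chosen only to show one positive and one negative relationship.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: hours studied and two sets of test scores.
hours = [1, 2, 3, 4, 5]
scores_rising = [52, 55, 61, 64, 70]   # high goes with high: positive r
scores_falling = [70, 64, 61, 55, 52]  # high goes with low: negative r

r_positive = pearson_r(hours, scores_rising)
r_negative = pearson_r(hours, scores_falling)
print(round(r_positive, 2), round(r_negative, 2))
```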
Mode
most frequent score in a distribution. Score attained by more students than any other score.
Standard Deviation
most useful estimate of variability. It is a single number that represents the spread of a distribution.
To convert a z score to a T score:
multiply the z score by 10 and add 50.
Raw scores below the mean in a distribution convert to
negative z scores, which can be awkward to work with.
This is crucial to converting z scores to
percentages and probabilities.
scatter plot
pictorial representation of the relationship between two quantitative variables.
In a distribution that contains an odd number of scores, the median is
the middlemost score (provided the scores are listed in order).
Doing this is based on the assumption that
the trait being scored does distribute according to the normal curve.
Measures of central tendency are useful for summarizing scores in a distribution; but
they are not sufficient.
When actual data do not approximate the curve,
they can be changed to do so.
Because the curve is symmetrical,
50 percent of the scores fall on each side of the mean.
The median is the
50th percentile.
In a distribution that contains an even number of scores, the median is the point halfway between the two middlemost scores. For example, in the distribution 70, 74, 82, 86, 88, 90, the median is
84
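The odd and even cases of the median can be sketched in a few lines of Python. The helper name `median_of` is hypothetical; the result is compared with the standard library's `statistics.median` as a sanity check.

```python
import statistics

def median_of(scores):
    """Middlemost score (odd count) or halfway point of the two middle scores (even count)."""
    ordered = sorted(scores)  # the scores must be in order first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

even_scores = [70, 74, 82, 86, 88, 90]  # six scores: halfway between 82 and 86
odd_scores = [70, 74, 82, 86, 88]       # five scores: the third score

print(median_of(even_scores), median_of(odd_scores))
```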
The distributions, however, differ considerably in what statisticians call variability, so
averages can be misleading.
The mode, median, and mean.
Each of these represents a type of average or typical score attained by a group of individuals on some measure.
We can also determine how an individual's score compares to all the other scores in a normal distribution.
For example: if a person's score lies exactly one standard deviation above the mean, we know that approximately 84 percent (50 + 34) of all the other scores in the distribution fall below the individual's score.
Some ways to summarize categorical data:
Frequency Table
In any normal distribution, 68 percent of the scores fall within one standard deviation of the mean.
Half of these, 34 percent, will fall within one standard deviation above the mean and the other half (34 percent) will fall within one standard deviation below the mean.
There needs to be a way to measure the
Spread or variability that exists within a distribution.
Percentile in a set of numbers is a value such that
a certain percentage of the numbers fall below and the rest of the numbers fall above it.
A more common way to describe a numerical distribution is
a combination of the mean (a measure of center) and the standard deviation (a measure of spread).
T scores have
a mean of 50 and a standard deviation of 10.
measures of central tendency (averages) -
enable researchers to communicate scores in a frequency distribution with a single number.
Standard deviation: as with the mean,
every score in the distribution is used in its calculation.
All of the percentages associated with areas under a normal curve can be
expressed in decimal form and viewed as probability statements.
Boxplots illustrate another way that
graphs can effectively convey information
Z scores example: Not all z scores fall exactly one or two (etc.) standard deviations from the mean. We can use the following formula to calculate these kinds of z scores:
z score = (raw score - mean) / standard deviation
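A minimal Python sketch of this formula follows. The function name `z_score` is hypothetical; the first call uses a raw score of 52 with a mean of 50 and a standard deviation of 2, and the second uses an illustrative raw score that does not fall a whole number of standard deviations from the mean.

```python
def z_score(raw_score, mean, standard_deviation):
    """Distance of a raw score from the mean, in standard deviation units."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw_score - mean) / standard_deviation

print(z_score(52, 50, 2))  # one standard deviation above the mean
print(z_score(47, 50, 2))  # one and a half standard deviations below the mean
```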
Another 27 percent of the observed scores fall between one and two standard deviations from the mean.
Hence 95 percent (68 plus 27 percent) fall within two standard deviations of the mean.
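These percentages can be checked against the normal curve itself. The sketch below uses Python's standard-library `statistics.NormalDist` and assumes an exactly normal distribution; the dictionary name `percent_within` is illustrative. The exact figures are about 68.3, 95.4, and 99.7 percent.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Percentage of scores within k standard deviations of the mean.
percent_within = {}
for k in (1, 2, 3):
    area = standard_normal.cdf(k) - standard_normal.cdf(-k)
    percent_within[k] = 100 * area
    print(k, round(percent_within[k], 1))
```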
Two very different distributions might have the same median: A: 98, 90, 84, 82, 76 B: 90, 87, 84, 64, 41
Here the median of both is 84.
example of Range
subtract 11 (lowest score) from 89 (highest score) to get a range of 78.
Percentages Under the Normal Curve
This is one of the most useful characteristics of the normal distribution.
A Probability is
a percent stated in decimal form and refers to the likelihood of an event occurring.
The normal curve is based on
a precise mathematical equation
The Range gives
a quick but rough estimate of variability.
Mean
add up all the scores in a distribution and divide this sum by the total number of scores
One way to avoid negative z scores is to
convert them to T scores.
Researchers often draw a smooth curve instead of
a series of straight lines in a frequency polygon
A raw score that is exactly one standard deviation above the mean represents
a z score of +1.
A raw score that is exactly at the mean represents
a z score of zero.
Standard Scores
derived score that uses a common scale to indicate how an individual compares to other individuals in a group.
T scores, for example:
z = -2. To convert to a T score, multiply -2 by 10 to get -20, then add 50 to get 30. T score = 30.
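The conversion can be written as a one-line rule, T = 10z + 50. The sketch below is a minimal illustration in Python; the function name `t_score` is hypothetical.

```python
def t_score(z):
    """Convert a z score to a T score (mean 50, standard deviation 10)."""
    return 10 * z + 50

print(t_score(-2))   # two standard deviations below the mean
print(t_score(0))    # exactly at the mean
print(t_score(1.5))  # one and a half standard deviations above the mean
```

Negative z scores become positive T scores under this rule, so a score at z = -2 is reported as 30.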
The mode, median, and mean: which is best?
It depends. Mean - only one of the three that uses all the information in a distribution and is generally preferred over the other two. However, the mean is unduly influenced by extreme scores. In these cases the median gives a more accurate indication of the typical score in a distribution.
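To illustrate how one extreme score pulls the mean but not the median, here is a short Python sketch. The scores are hypothetical and were chosen only so the arithmetic is easy to follow.

```python
import statistics

typical = [20, 22, 25, 27, 31]       # hypothetical scores, no extreme values
with_extreme = [20, 22, 25, 27, 96]  # same scores, but the highest is now extreme

mean_typical = statistics.mean(typical)
mean_extreme = statistics.mean(with_extreme)
median_typical = statistics.median(typical)
median_extreme = statistics.median(with_extreme)

print(mean_typical, median_typical)  # mean and median agree
print(mean_extreme, median_extreme)  # mean shifts upward, median does not
```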
For example: If the mean of a normal distribution is 100 and the standard deviation is 15, what scores lie one standard deviation above the mean and one standard deviation below the mean?
One standard deviation above the mean: 115. One standard deviation below the mean: 85.
Scores in a distribution might have identical means and medians but
be quite different in other ways.
Range
distance between the highest and lowest scores in a distribution.
The smooth curve is known as
a distribution curve.
bimodal distribution -
distribution with two modes
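A small Python sketch shows how a bimodal distribution can be detected. It uses the standard library's `statistics.multimode` (available in Python 3.8 and later), which returns every score tied for most frequent; the scores themselves are hypothetical.

```python
import statistics

# Hypothetical scores: 75 and 90 each occur three times.
scores = [70, 75, 75, 75, 80, 85, 90, 90, 90, 95]

modes = statistics.multimode(scores)  # all scores tied for most frequent
is_bimodal = len(modes) == 2

print(modes, is_bimodal)
```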
Normal distribution
distributions of data that tend to follow a certain specific shape.
Researchers use such probability statements to
precisely state the probability of an observed score relative to other scores in a normal distribution.
Z scores permit comparison of
raw scores on different tests.
Positive
relationship indicated when high scores on one variable are accompanied by high scores on the other variable or when low scores on one are accompanied by low scores on the other variable.
Most randomly selected samples will have scores that
resemble the normal distribution.
Frequency Table
shows the frequency with which each type of category is mentioned, for example, on a questionnaire (e.g., a table titled "Frequency and Percentage of Responses to Questionnaire").
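A frequency table of this kind can be sketched with Python's `collections.Counter`. The questionnaire responses below are hypothetical, and the layout (category, frequency, percentage) is one reasonable arrangement rather than a prescribed format.

```python
from collections import Counter

# Hypothetical responses to a single questionnaire item.
responses = [
    "Agree", "Agree", "Disagree", "Neutral", "Agree",
    "Disagree", "Agree", "Neutral", "Disagree", "Agree",
]

frequencies = Counter(responses)
total = len(responses)

# Percentage of all responses that fall in each category.
percentages = {category: 100 * count / total
               for category, count in frequencies.items()}

print(f"{'Response':<10}{'Frequency':>10}{'Percent':>10}")
for category, count in frequencies.most_common():
    print(f"{category:<10}{count:>10}{percentages[category]:>10.1f}")
```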
Z Scores
simplest of the standard scores and represents how far the raw score is from the mean in standard deviation units.
T Scores
simply z scores expressed in a different form.
Other important percentiles are
the 25th percentile, also known as the first quartile (Q1) and the 75th percentile, the third quartile (Q3).
The three most common measures of central tendency (averages) are
the mode, median, and mean.