SSCI 210-4


Xi

Read as "X-sub-i"; refers to any single score, the "ith" score.

computing z scores

1. Subtract the value of the mean (X̄) from the value of the score (Xi). 2. Divide the quantity found in step 1 by the value of the standard deviation (s). The result is the Z-score equivalent for this raw score: Z = (Xi - X̄) / s. When we compute Z scores, we convert the original units of measurement (IQ scores, inches, dollars, etc.) to Z scores and thus "standardize" the normal curve to a distribution that has a mean of 0 and a standard deviation of 1. The mean of the empirical normal distribution is converted to 0, its standard deviation to 1, and all values are expressed in Z-score form.
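A minimal Python sketch of the two steps above. The function name `z_score` and the IQ example (mean 100, s = 15) are illustrative assumptions, not part of the course material.

```python
def z_score(x_i, mean, s):
    """Step 1: subtract the mean from the score. Step 2: divide by s."""
    return (x_i - mean) / s

# Hypothetical IQ distribution with a mean of 100 and s = 15
print(z_score(115, 100, 15))  # 1.0  (one standard deviation above the mean)
print(z_score(70, 100, 15))   # -2.0 (two standard deviations below the mean)
```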

categories of variables must:

1. Be mutually exclusive (each case must fit into one and only one category).
2. Be exhaustive (there must be a category for every case).
3. Include elements that are similar (the cases in each category must be similar to each other).

limitations of Q and R

A basic limitation of both Q and R is that they use only two scores and, thus, do not use all of the available information. Also, neither statistic provides any information on how far the scores are from each other or from some central point, such as the mean.

samples

A carefully chosen subset of a population. In inferential statistics, information is gathered from a sample and then generalized to a population.

theory

A generalized explanation of the relationship between two or more variables.

pie chart

A graph used for discrete variables with only a few categories. A circle (the pie) is divided into segments ("slices") proportional in size to the percentage of cases in each category of the variable. Pie charts are an excellent way of displaying the relative sizes of the categories of a variable.

bar charts

A graph used for discrete variables. Categories are represented by bars of equal width, the height of each corresponding to the number (or percentage) of cases in the category. The categories of the variable are arrayed along the horizontal axis (or abscissa) and frequencies, or percentages if you prefer, along the vertical axis. The heights of the bars are proportional to the relative frequencies of the categories and, like pie charts, bar charts are visual equivalents of frequency distributions. If a variable has more than four or five categories, the bar chart is preferred, since the pie chart gets very crowded with many categories. Bar charts are particularly effective for displaying the relative frequencies of two or more categories of a variable when you want to emphasize some comparisons.

histograms

A graph used for interval-ratio variables. Class intervals are represented by contiguous bars of equal width (equal to the class limits), the height of each corresponding to the number (or percentage) of cases in the interval. Histograms look a lot like bar charts and, in fact, are constructed in much the same way. However, histograms use real limits rather than stated limits, and the categories or scores of the variable border each other, as if they merged into each other in a continuous series. Therefore, these graphs are most appropriate for continuous interval-ratio-level variables, but they are commonly used for discrete interval-ratio-level variables as well.

Line charts (or frequency polygons)

A graph used for interval-ratio variables. Class intervals are represented by dots placed over the midpoints, the height of each corresponding to the number (or percentage) of cases in the interval, and all dots are connected by straight lines. Line charts are similar to histograms but use midpoints rather than real limits to represent the frequencies. Because the line connecting the dots is continuous, these graphs are especially appropriate for continuous interval-ratio-level variables, but they are frequently used with discrete interval-ratio-level variables as well.

percentile

A point below which a specific percentage of the cases fall; a commonly used statistic for reporting position. To find a percentile, first arrange the scores in order. Next, multiply the number of cases (N) by the proportional value of the percentile (for example, 0.60 for the 60th percentile). The resultant value identifies the number of the case that marks the percentile. In terms of percentiles, the median is the 50th percentile.
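The case-counting procedure above, sketched in Python. The card does not say what to do when N times the proportion is not a whole number; this sketch assumes you round up to the next whole case, which is only one convention (statistical software often interpolates instead). The function name and scores are hypothetical.

```python
import math

def percentile_score(scores, proportion):
    """Return the score of the case at N * proportion, counting from the lowest score."""
    ordered = sorted(scores)                        # arrange scores from low to high
    # round() trims floating-point noise before rounding up (an assumed convention)
    case_number = math.ceil(round(len(ordered) * proportion, 9))
    case_number = max(case_number, 1)               # there is no "0th" case
    return ordered[case_number - 1]                 # case numbers start at 1

scores = [12, 3, 7, 19, 5, 10, 15, 8, 1, 14]        # hypothetical, N = 10
print(percentile_score(scores, 0.60))  # 60th percentile: score of the 6th case
print(percentile_score(scores, 0.10))  # 1st decile: score of the 1st case
```

The same function works for deciles and quartiles, since each is just a percentile with a particular proportion.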

statistics

A set of mathematical techniques for organizing and analyzing data.

hypothesis

A specific statement, derived from a theory, about the relationship between variables.

percentage change

A statistic that expresses the magnitude of change in a variable from time 1 to time 2. It is useful for measuring social change because it tells us how much a variable has increased or decreased over a certain span of time. To compute this statistic, we need the scores of a variable at two different points in time. The scores could be in the form of frequencies, rates, or percentages. percentage change = ((f2 - f1) / f1) * 100, where f1 is the score at time 1 and f2 is the score at time 2.
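A short Python version of the percentage change formula. The function name and the counts (200 and 250) are hypothetical; note that the difference f2 - f1 must be computed before dividing by f1.

```python
def percentage_change(f1, f2):
    """((f2 - f1) / f1) * 100, with f1 at time 1 and f2 at time 2."""
    return (f2 - f1) * 100 / f1

print(percentage_change(200, 250))  # 25.0  (an increase)
print(percentage_change(250, 200))  # -20.0 (a decrease)
```

A negative result signals a decrease; the same absolute change can give different percentages depending on which time point is the base.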

normal curve

A theoretical distribution of scores that is symmetrical, unimodal, and bell-shaped. The standard normal curve always has a mean of 0 and a standard deviation of 1. In combination with the mean and standard deviation, it is used to make precise descriptive statements about empirical distributions; this description of empirical distributions based on our knowledge of the theoretical normal curve is one of its most important uses. The normal curve is also central to the theory that underlies inferential statistics. It is often called the "bell curve," and grading "on the curve" means that the instructor wants the scores to follow a specific pattern: the modal grade is a C, and there will be equal numbers of As and Fs, and of Bs and Ds. In other words, the distribution of grades should look like a bell or a mound.

independent variable

A variable that is identified as a cause. The independent variable is thought to cause the dependent variable.

dependent variable

A variable that is identified as an effect or outcome. The dependent variable is thought to be caused by the independent variable.

discrete

A variable with a basic unit of measurement that cannot be subdivided (for example, number of people per household). The scores of discrete variables will be 0, 1, 2, 3, or some other whole integer.

continuous

A variable with a unit of measurement that can be subdivided infinitely (for example, time). Because we cannot work with infinitely long numbers, we are always approximating and rounding off the scores; that is, we must report the scores on continuous variables as if they were discrete.

cumulative frequency

An optional column in a frequency distribution that displays the number of cases within an interval and all preceding intervals. These columns allow us to tell at a glance how many cases fall below a given score or class interval in the distribution. Cumulative columns are quite useful when the researcher wants to make a point about how cases are spread across the range of scores.

cumulative percentage

An optional column in a frequency distribution that displays the percentage of cases within an interval and all preceding intervals. These columns allow us to tell at a glance what percentage of cases fall below a given score or class interval in the distribution. Cumulative columns are quite useful when the researcher wants to make a point about how cases are spread across the range of scores.
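Both cumulative columns can be built from the ordinary frequency column. This Python sketch assumes the frequencies are listed from the lowest interval upward; the function name and the counts are hypothetical.

```python
from itertools import accumulate

def cumulative_columns(frequencies):
    """Return (cumulative frequency, cumulative percentage) columns.

    `frequencies` must be ordered from the lowest interval to the highest.
    """
    n = sum(frequencies)
    cf = list(accumulate(frequencies))       # running total of the frequencies
    cpct = [c * 100 / n for c in cf]         # running total as a percentage of N
    return cf, cpct

cf, cpct = cumulative_columns([5, 10, 20, 15])   # hypothetical counts, N = 50
print(cf)    # [5, 15, 35, 50]
print(cpct)  # [10.0, 30.0, 70.0, 100.0]
```

The last cumulative frequency always equals N, and the last cumulative percentage always equals 100, which is a handy check on the table.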

research

Any process of gathering information systematically and carefully to answer questions or test theories. Research can take numerous forms, and statistics are relevant for quantitative research projects, or projects that collect information in the form of numbers or data.

variable

Any trait that can change values from case to case. Examples include gender, age, income, and political party affiliation.

frequency distributions

Count the number of times each category or score of the variable occurs and display the frequencies in table format. Note that the table has a title and clearly labeled categories, and it reports the total number of cases (N) at the bottom of the frequency column. These items must be included in all frequency distributions.
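A minimal Python sketch of building a frequency distribution that follows the rules above: a title, labeled categories, and N reported at the bottom of the frequency column. The survey responses are hypothetical.

```python
from collections import Counter

def frequency_distribution(scores):
    """Count how often each category occurs and return the counts plus N."""
    counts = Counter(scores)
    n = sum(counts.values())
    return counts, n

party = ["Dem", "Rep", "Ind", "Dem", "Dem", "Rep"]   # hypothetical responses
counts, n = frequency_distribution(party)

print("Party Affiliation (hypothetical data)")      # title
for category, f in counts.most_common():            # labeled categories
    print(f"{category:<5}{f:>3}")
print(f"{'N =':<5}{n:>3}")                          # total at the bottom
```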

Deciles

Deciles divide the distribution of scores into tenths. The first decile is the point below which 10% of the cases fall and is equivalent to the 10th percentile. The fifth decile is the same as the 50th percentile, which is the same as the median. Multiplying N by the proportional value of the percentile, decile, or quartile identifies the case, and it is the score of that case that actually marks the location.

statistics can be used to

Demonstrate the connection between smoking and cancer.
Measure political preferences, including the popularity of specific candidates for office.
Track attitudes about gay marriage, abortion, and other controversial issues over time.
Compare the cost of living (housing prices and rents, the cost of groceries and gas, health care, and so forth) between different localities (cities, states, and even nations).

range limitations

First, almost any sizeable distribution will contain some scores that are atypically high and/or low (outliers) compared to most of the scores. Thus, R might exaggerate the amount of dispersion for most of the scores in the distribution. Also, R yields no information about the variation of the scores between the highest and lowest scores.

data

Information expressed as numbers.

Quartiles

Quartiles divide the distribution into quarters: the first quartile is the same as the 25th percentile, the second quartile is the 50th percentile, and the third quartile is the 75th percentile. Multiplying N by the proportional value of the percentile, decile, or quartile identifies the case, and it is the score of that case that actually marks the location.

quantitative research

Research projects that collect data or information in the form of numbers.

outliers

Scores that are very high or very low compared to most scores. Distributions with outliers have a skew.

z scores

Standard scores; the way scores are expressed after they have been standardized to the theoretical normal curve. To work with values that are not exact multiples of the standard deviation, we must convert the original scores into units of the standard deviation. The original scores could be in any unit of measurement (feet, IQ, dollars), but Z scores always have the same values for their mean (0) and standard deviation (1).

measures of dispersion

Statistics that indicate the amount of variety, diversity, or heterogeneity in a distribution of scores.

measures of central tendency

Statistics that summarize a distribution of scores by reporting the most typical or representative value of the distribution.

measures of association

Statistics that summarize the strength and direction of the relationship between variables. They help us disentangle the connections between variables, trace the ways in which some variables might affect others, and predict scores on one variable from scores on another. Measures of association can give us valuable information about relationships between variables and help us understand how one variable might cause another.

data reduction

Summarizing many scores with a few statistics; the basic goal of univariate descriptive statistics.

mode

The most common value in a distribution or the largest category of a variable. The mode is a simple statistic, most useful when you are interested in the most common score and when you are working with nominal-level variables; it is the only measure of central tendency for nominal-level variables. The mode has several limitations. First, distributions can have no mode at all (when all scores have the same frequency) or so many modes that the statistic becomes meaningless. Second, the modal score of ordinal or interval-ratio variables may not be central to the distribution as a whole (most common does not necessarily identify the center of the distribution).
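A Python sketch that reports every mode, including the "no mode" and "many modes" cases mentioned above. Returning an empty list for "no mode" is an assumed convention for this sketch; Python's own `statistics.multimode` instead returns every value when all frequencies tie.

```python
from collections import Counter

def modes(scores):
    """Return every score tied for the highest frequency.

    An empty list means "no mode": every score occurs equally often.
    """
    counts = Counter(scores)
    top = max(counts.values())
    winners = [score for score, f in counts.items() if f == top]
    if len(winners) == len(counts) and len(counts) > 1:
        return []                  # all scores share the same frequency
    return winners

print(modes([2, 3, 3, 5, 7]))   # [3]
print(modes([1, 1, 4, 4, 9]))   # [1, 4]  (bimodal)
print(modes([1, 2, 3, 4]))      # []      (no mode)
```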

rates

The number of actual occurrences of some phenomenon or trait divided by the number of possible occurrences per some unit of time. Rates are usually multiplied by some power of 10.
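A small Python sketch of a rate. It assumes both counts refer to the same period (for example, one year) and scales by 1,000, a common but arbitrary choice of power of 10; the birth and population figures are hypothetical.

```python
def rate(actual, possible, per=1000):
    """Actual occurrences / possible occurrences, multiplied by a power of 10."""
    return actual * per / possible

# Hypothetical city: 500 births in one year among 100,000 residents
print(rate(500, 100_000))              # 5.0 births per 1,000 residents
print(rate(500, 100_000, per=100_000)) # 500.0 births per 100,000 residents
```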

three characteristics of the mean

1. The mean balances all the scores: the deviations of the scores above the mean exactly offset the deviations of the scores below it. 2. The mean minimizes the variation of the scores. This is called the "least-squares" principle: the mean is the point in a distribution around which the variation of the scores (as indicated by the squared differences) is minimized. If the differences between the scores and the mean are squared and then added, the resultant sum will be less than the sum of the squared differences between the scores and any other point in the distribution. This least-squares principle underlines the fact that the mean is closer to all of the scores than the other measures of central tendency. This characteristic of the mean is also important for the statistical techniques of correlation and regression, topics taken up toward the end of the text. 3. The mean is affected by all scores and can be misleading if the distribution has outliers. All the scores in a distribution are included in its calculation ("to find the mean, add up all the scores and divide by N"). This is an advantage because the mean uses all the available information. On the other hand, when a distribution has outliers (some extremely high or low scores), the mean may become misleading: it may not represent the central or typical score. The point to remember is that the mean will be pulled in the direction of the outlying scores relative to the median. With a positive skew, the mean will be greater in value than the median, and just the opposite will occur with a negative skew. Why is this problematic? Because the median uses only the middle cases, it will always reflect the center of the distribution. The mean, because it uses all cases (including outliers), may be much higher or lower than the bulk of the scores and give a false impression of centrality.
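A Python sketch that checks all three characteristics on a small hypothetical set of scores. It uses `mean` and `median` from the standard-library `statistics` module; the helper name `sum_of_squared_differences` is made up for this illustration.

```python
from statistics import mean, median

def sum_of_squared_differences(scores, point):
    """Sum of (score - point) ** 2 over every score."""
    return sum((x - point) ** 2 for x in scores)

scores = [2, 4, 6, 8, 10]        # hypothetical scores, mean = 6
m = mean(scores)

# 1. The mean balances the scores: the deviations sum to zero.
print(sum(x - m for x in scores))              # 0

# 2. Least squares: any other point gives a larger sum of squared differences.
print(sum_of_squared_differences(scores, m))   # 40
print(sum_of_squared_differences(scores, 5))   # 45
print(sum_of_squared_differences(scores, 7))   # 45

# 3. One outlier pulls the mean upward (positive skew) but leaves the median alone.
with_outlier = [2, 4, 6, 8, 100]
print(mean(with_outlier), median(with_outlier))  # 24 6
```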

dispersion

The amount of variety, or heterogeneity, in a distribution of scores.

descriptive statistics

The branch of statistics concerned with (1) summarizing the distribution of a single variable (univariate) or (2) measuring the relationship between two or more variables (bivariate or multivariate).

inferential statistics

The branch of statistics concerned with making generalizations from samples to populations. Social scientists almost never have the resources or time to test every case in a population, so they use this class of techniques, which uses information from samples to make inferences about populations.

class intervals

The categories used in the frequency distributions for interval-ratio variables.

real class limits

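The class intervals of a frequency distribution when stated as continuous categories. For whole-number stated limits, subtract 0.5 from the lower stated limit and add 0.5 to the upper stated limit (for example, the stated interval 10-14 has real limits of 9.5-14.5).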

stated class limits

The class intervals of a frequency distribution when stated as discrete categories; consecutive intervals are separated by a distance of one unit (for example, 10-14 and 15-19).

deviations

The distance between the score and the mean (Xi - X̄). The value of the deviations increases as the differences between the scores and the mean increase. If the scores are clustered around the mean, the deviations will be small; if the scores are more spread out, or more varied, the deviations will be larger. The sum of the deviations would be a logical basis for a measure of dispersion, but the positive deviations always equal the negative deviations, so the sum is always zero. Statisticians get around this by squaring each of the deviations: all values then become positive, because a negative number multiplied by itself becomes positive. Thus, a statistic based on the sum of the squared deviations has the properties we want in a good measure of dispersion. However, the sum of the squared deviations increases with sample size: the larger the number of scores, the greater the value of the measure. This would make it very difficult to compare the relative variability of distributions based on samples of different sizes. We can solve this problem by dividing the sum of the squared deviations by N (sample size), thus standardizing for samples of different sizes.
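A Python sketch of the reasoning above, using hypothetical scores. Doubling the number of cases without changing their spread doubles the sum of squared deviations, while dividing by N gives the same value for both samples. The function name is an illustrative assumption.

```python
def deviation_summary(scores):
    """Return (sum of deviations, sum of squared deviations, that sum / N)."""
    n = len(scores)
    m = sum(scores) / n
    d = [x - m for x in scores]          # deviations Xi - mean
    ss = sum(x ** 2 for x in d)          # squaring makes every deviation positive
    return sum(d), ss, ss / n

small = [4, 6, 8]       # hypothetical scores, mean 6
doubled = small * 2     # same spread, twice as many cases

print(deviation_summary(small))    # deviations sum to 0; SS = 8
print(deviation_summary(doubled))  # deviations sum to 0; SS = 16; SS / N unchanged
```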

interquartile range (Q)

The distance from the third quartile to the first quartile: Q = Q3 - Q1. It avoids some of the problems associated with R by considering only the middle 50% of the cases in a distribution. To find Q, first arrange the scores from highest to lowest and then divide the distribution into quarters (as distinct from halves when locating the median). The first quartile is the point below which 25% of the cases fall and above which 75% of the cases fall. The second quartile divides the distribution into halves (thus, it is equal to the median). The third quartile is the point below which 75% of the cases fall and above which 25% of the cases fall. The interquartile range essentially extracts the middle 50% of the distribution and, like R, is based on only two scores. Q is interpreted in the same way as R: the greater its value, the greater the dispersion. Unlike the range, Q avoids the problem of being based on the most extreme scores, but it also fails to yield any information about the variation of the scores other than the two on which it is based.
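A Python sketch of Q that locates each quartile by the N times proportion method from the percentile card. Rounding a fractional case number up is an assumption of this sketch, not a rule stated on the card; the scores (including the outlier 40) are hypothetical.

```python
import math

def quartile(scores, proportion):
    """Score of the case at N * proportion (rounded up: an assumed convention)."""
    ordered = sorted(scores)
    case = max(1, math.ceil(round(len(ordered) * proportion, 9)))
    return ordered[case - 1]

def interquartile_range(scores):
    """Q = Q3 - Q1: the spread of the middle 50% of the cases."""
    return quartile(scores, 0.75) - quartile(scores, 0.25)

scores = [1, 3, 5, 7, 8, 10, 12, 14, 15, 19, 40]   # hypothetical, N = 11
print(quartile(scores, 0.25), quartile(scores, 0.75))  # Q1 and Q3
print(interquartile_range(scores))   # Q ignores the outlier 40
print(max(scores) - min(scores))     # R is inflated by it
```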

skew

The extent to which a distribution of scores has a few scores that are extremely high (positive skew) or extremely low (negative skew).

meaning of the standard deviation is expressed in three ways:

The first and most important involves the normal curve. A second way of thinking about the standard deviation is as an index of dispersion that increases in value as the distribution becomes more variable. In other words, the standard deviation is higher for more diverse distributions and lower for less diverse distributions. A third way to get a feel for the meaning of the standard deviation is by comparing one distribution with another. You might also do this when comparing one group with another or the same variable at two different times.

range (R)

The highest score minus the lowest score: R = High Score - Low Score. The range is easy to calculate and is perhaps most useful as a quick and general indicator of variability. The statistic is also easy to interpret: the greater the value of the range, the greater the distance from high to low score, and the greater the dispersion in the distribution.
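A two-line Python version of R, with hypothetical scores showing the limitation noted on the "range limitations" card: a single outlier can make R exaggerate the dispersion. The name `score_range` is used to avoid shadowing Python's built-in `range`.

```python
def score_range(scores):
    """R = high score - low score."""
    return max(scores) - min(scores)

print(score_range([3, 5, 6, 7, 9]))      # 6
print(score_range([3, 5, 6, 7, 9, 60]))  # 57: one outlier inflates R
```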

level of measurement

The mathematical characteristic of a variable and the major criterion for selecting statistical techniques. Variables can be measured at any of three levels (nominal, ordinal, and interval-ratio), each permitting certain mathematical operations and statistical techniques. Level of measurement is crucial because our statistical analysis must match the mathematical characteristics of our variables.

Univariate Descriptive Statistics

percentages, averages, and graphs

percentage

The number of cases in a category of a variable divided by the number of cases in all categories of the variable, the entire quantity multiplied by 100. Percentages are extremely useful statistics because they supply a frame of reference by standardizing the raw frequencies to the base 100. The mathematical definition of a percentage is percentage (%) = (f / N) * 100, where f is the frequency of the category and N is the total number of cases. Percentages are easier to read and comprehend than raw frequencies, and a column for percentages is commonly added to frequency distributions for variables at all levels of measurement. The preference for percentages is based solely on ease of communication.
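The percentage formula in Python, applied to a hypothetical category with 15 of 60 cases.

```python
def percentage(f, n):
    """percentage = (f / N) * 100."""
    return f * 100 / n

print(percentage(15, 60))  # 25.0
```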

ratios

The number of cases in one category divided by the number of cases in some other category: ratio = f1 / f2. Ratios are especially useful for comparing the relative sizes of different categories of a variable. They can be very economical ways of expressing relative size, because they tell us exactly how much one category outnumbers the other. Ratios are often multiplied by some power of 10 to eliminate decimal points, and, to ensure clarity, the comparison units for the ratio are often expressed as well.
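A Python sketch of a ratio, with an optional power of 10 to remove the decimal point. The counts (150 women, 100 men) are hypothetical.

```python
def ratio(f1, f2, per=1):
    """ratio = f1 / f2, optionally multiplied by a power of 10."""
    return f1 * per / f2

print(ratio(150, 100))           # 1.5   women per man
print(ratio(150, 100, per=100))  # 150.0 women per 100 men
```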

proportions

The number of cases in one category of a variable divided by the number of cases in all categories of the variable. Proportions vary from 0.00 to 1.00: they standardize results to a base of 1.00 instead of to the base of 100 used for percentages. A proportion is the same as a percentage except that we do not multiply by 100: proportion = f / N. Proportions are used less frequently, generally when we are working with probabilities.
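A Python sketch that turns hypothetical category counts into proportions. The proportions sum to 1.00 (apart from floating-point rounding), just as percentages sum to 100.

```python
def proportions(counts):
    """proportion = f / N for each category."""
    n = sum(counts.values())
    return {category: f / n for category, f in counts.items()}

props = proportions({"yes": 30, "no": 50, "unsure": 20})   # hypothetical, N = 100
print(props)  # {'yes': 0.3, 'no': 0.5, 'unsure': 0.2}
```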

Finding the Total Area above and below a Score

The plus sign of a Z score indicates that the score should be placed above (to the right of) the mean. To find the area below a positive Z score, the area between the score and the mean (column b) must be added to the area below the mean. As noted earlier, the normal curve is symmetrical (unskewed) and its mean is equal to its median; therefore, the area below the mean (just like the area below the median) is 50%. To find the area below a negative score, we use the right-hand column, "Area Beyond Z" (column c). The area in which we are interested is depicted in Figure 5.5, and we must determine the size of the shaded area.
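Instead of reading the normal curve table, the same areas can be computed with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). This is a swapped-in technique, not the table procedure itself; the results should agree with the table to four decimal places.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_below(z):
    """Proportion of the standard normal curve below a Z score."""
    return standard_normal.cdf(z)

def area_above(z):
    """Proportion of the standard normal curve above a Z score."""
    return 1 - standard_normal.cdf(z)

print(round(area_below(1.00), 4))   # 0.8413 = 0.5000 below the mean + 0.3413 (column b)
print(round(area_below(-1.00), 4))  # 0.1587, the "Area Beyond Z" (column c)
print(round(area_above(1.00), 4))   # 0.1587
```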

midpoints

The point exactly halfway between the upper and lower limits of a class interval. It can be found by dividing the sum of the upper and lower limits by 2.
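The midpoint formula in Python. With the hypothetical interval 10-14, the stated limits and the real limits (9.5-14.5) give the same midpoint.

```python
def midpoint(lower, upper):
    """(lower limit + upper limit) / 2."""
    return (lower + upper) / 2

print(midpoint(10, 14))      # 12.0 (stated limits)
print(midpoint(9.5, 14.5))   # 12.0 (real limits)
```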

median

The point in a distribution of scores above and below which exactly half of the cases fall; it is always at the exact center of a distribution of scores. With an odd number of cases, the median is the score of the middle case. With an even number of cases, however, there will be two middle cases, and in this situation the median is defined as the score exactly halfway between the scores of the two middle cases. The median cannot be calculated for variables measured at the nominal level because it requires that scores be ranked from high to low, and nominal-level variables cannot be ordered or ranked. The median can be found for either ordinal or interval-ratio data but is generally more appropriate for the former. The median belongs to a class of statistics that measure position or location. It is sometimes useful to locate other points as well: the scores that split the distribution into thirds or fourths, or the point below which a certain percentage of the cases fall (as in standardized tests).
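A Python sketch of the odd and even cases described above, with hypothetical scores. Python's `statistics.median` follows the same rule; the hand-written version just makes the steps visible.

```python
def median_score(scores):
    """Middle score for odd N; halfway between the two middle scores for even N."""
    ordered = sorted(scores)            # scores must be ranked first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median_score([7, 1, 5]))       # 5
print(median_score([7, 1, 5, 10]))   # 6.0 (halfway between 5 and 7)
```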

interval-ratio level of measurement

The scores of variables measured at the interval-ratio level are actual numbers that can be analyzed with all possible statistical techniques. This means that we can add or multiply the scores, compute averages or square roots, or perform any other mathematical operation. Interval-ratio-level variables have equal intervals from score to score, and they have true zero points.

standard deviation

The statistic computed by summing the squared deviations of the scores around the mean, dividing by N, and, finally, taking the square root of the result: s = √[Σ(Xi - X̄)² / N]. The most important and useful descriptive measure of dispersion; s represents the standard deviation of a sample, and σ represents the standard deviation of a population.
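A Python sketch of the standard deviation exactly as defined above (dividing by N), checked against the standard library's `statistics.pstdev`, which also divides by N. Note that `statistics.stdev` divides by N - 1 and would give a different value. The scores are hypothetical.

```python
import math
from statistics import pstdev

def standard_deviation(scores):
    """s = sqrt(sum of squared deviations / N)."""
    n = len(scores)
    m = sum(scores) / n
    variance = sum((x - m) ** 2 for x in scores) / n   # s squared
    return math.sqrt(variance)

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical, mean = 5, variance = 4
print(standard_deviation(scores))   # 2.0
print(pstdev(scores))               # 2.0
```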

variance

The sum of the squared deviations of the scores around the mean, divided by N: s² = Σ(Xi - X̄)² / N. A measure of dispersion used primarily in inferential statistics and also in correlation and regression techniques, and a central concept in the design of some measures of association; s² represents the variance of a sample, and σ² represents the variance of a population.

population

The total collection of all cases in which the researcher is interested. Populations can theoretically range from enormous ("all humanity") to quite small (all sophomores on your campus) but are usually fairly large.

using graphs to present data

Graphs are particularly useful for communicating the overall shape of a distribution and for highlighting any clustering of cases in a particular range of scores. Pie charts and bar charts are appropriate for discrete variables at any level of measurement. Histograms and line charts (or frequency polygons) are used with both discrete and continuous interval-ratio variables but are particularly appropriate for the latter.

Using the Normal Curve to Estimate Probabilities

To estimate the probability of an event, we must first be able to define what would constitute a "success." To determine a probability, a fraction must be established, with the numerator equal to the number of events that would constitute a success and the denominator equal to the total number of possible events where a success could theoretically occur: probability = # successes / # events. In the social sciences, probabilities are usually expressed as proportions. Probabilities have an exact meaning: over the long run, the events that we define as successes will bear a certain proportional relationship to the total number of events. Like proportions, probabilities range from 0.00 (meaning that the event has absolutely no chance of occurrence) to 1.00 (a certainty). As the value of the probability increases, the likelihood that the defined event will occur also increases.
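A Python sketch of the probability fraction, using Python's `fractions.Fraction` to keep the value exact before converting it to a proportion. The card-deck example is an illustration chosen for this sketch, not from the course material.

```python
from fractions import Fraction

def probability(successes, events):
    """probability = # successes / # events, kept as an exact fraction."""
    return Fraction(successes, events)

p_ace = probability(4, 52)     # drawing an ace from a standard 52-card deck
p_red = probability(26, 52)    # drawing a red card
print(p_ace, float(p_ace))     # 1/13, about 0.0769 as a proportion
print(p_red, float(p_red))     # 1/2, 0.5 as a proportion
```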

a good measure of dispersion should:

1. Use all the scores in the distribution: the statistic should use all the information available.
2. Describe the average or typical deviation of the scores: the statistic should give us an idea about how far the scores are from each other or from the center of the distribution.
3. Increase in value as the scores become more diverse: this is a very handy feature when comparing distributions because it lets us tell at a glance which is more variable. The higher the numerical value of the statistic, the greater the dispersion.

nominal level of measurement

Variables measured at the nominal level (such as gender) have non-numerical "scores" or categories. Statistical analysis with nominal-level variables is limited to comparing the relative sizes of the categories. Nominal variables are rudimentary, but there are criteria that we need to observe in order to measure them adequately.

ordinal level of measurement

Variables measured at the ordinal level are more sophisticated than nominal-level variables. They have scores or categories that can be ranked from high to low, so, in addition to classifying cases into categories, we can describe the categories in terms of "more or less" with respect to each other. The major limitation of the ordinal level of measurement is that the scores have no absolute or objective meaning: they only represent position with respect to other scores. Our options for statistical analysis with ordinal-level variables are limited by the fact that we don't know the exact distances from score to score, so, strictly speaking, statistics such as the average or mean are not permitted. The most sophisticated mathematical operation fully justified with an ordinal variable is ranking categories and cases.

Finding Areas between Two Scores

When the scores are on opposite sides of the mean, the area between them can be found by adding the areas between each score and the mean. When the scores of interest are on the same side of the mean, a different procedure must be followed to determine the area between them. To find the area between two scores on the same side of the mean, find the area between each score and the mean, and then subtract the smaller area from the larger. The same technique would be followed if both scores had been below the mean.
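A Python sketch of the add-or-subtract rule above. It computes each "area between the score and the mean" with the standard library's `statistics.NormalDist` rather than looking it up in the table, which is a swapped-in technique; the helper names are hypothetical.

```python
from statistics import NormalDist

def area_mean_to_z(z):
    """Area between the mean and a Z score (the same for +z and -z)."""
    return abs(NormalDist().cdf(z) - 0.5)

def area_between(z1, z2):
    """Area between two Z scores, following the rule for opposite and same sides."""
    a1, a2 = area_mean_to_z(z1), area_mean_to_z(z2)
    if (z1 < 0) != (z2 < 0):      # opposite sides of the mean: add the two areas
        return a1 + a2
    return abs(a1 - a2)           # same side: subtract the smaller from the larger

print(round(area_between(-1, 1), 4))   # 0.6827 (opposite sides)
print(round(area_between(1, 2), 4))    # 0.1359 (both above the mean)
print(round(area_between(-2, -1), 4))  # 0.1359 (both below the mean)
```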

Choosing a Measure of Central Tendency

You should consider two main criteria when choosing a measure of central tendency. First, make sure that you know the level of measurement of the variable in question. This will generally tell you whether you should report the mode, median, or mean. Second, consider the definitions of the three measures of central tendency and remember that they provide different types of information. They will be the same value only under certain specific conditions (namely, for symmetrical distributions with one mode), and each has its own message to report. In many circumstances, you might want to report all three.

normal curve table

A detailed description of the area between a Z score and the mean of any standardized normal distribution. The normal curve table consists of three columns, with Z scores in the left-hand column (column a), areas between the Z score and the mean in the middle (column b), and areas beyond the Z score in the right-hand column (column c). To find the area between any Z score and the mean, go down the Z-score column until you find the score, then read the area in column b. The table presents areas in the form of proportions, but we can easily translate these into percentages by multiplying them by 100. The third column in the table presents "Areas Beyond Z": these are areas above positive scores or below negative scores. Remember that the areas in Appendix A will be the same for Z scores of the same numerical value regardless of sign.
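A Python sketch that regenerates a few rows of the three-column table with `statistics.NormalDist`. The choice of Z values is arbitrary, and using `abs(z)` reflects the point above that the areas are the same regardless of sign.

```python
from statistics import NormalDist

def table_row(z):
    """Return (column a, column b, column c) for a Z score, as proportions."""
    below = NormalDist().cdf(abs(z))     # sign does not change the areas
    return abs(z), below - 0.5, 1 - below

print("  Z      b       c")
for z in (0.00, 0.50, 1.00, 1.96, 3.00):
    a, b, c = table_row(z)
    print(f"{a:.2f}  {b:.4f}  {c:.4f}")
```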

probability and the normal curve

The normal curve allows us to estimate the likelihood of selecting a case that has a score within a certain range. Normally, we would next establish a fraction with the numerator equal to the number of subjects with scores in the defined range and the denominator equal to the total number of subjects. However, if the empirical distribution is normal in form, we can skip this step, because the probabilities, in proportion form, are already stated in the normal curve table. The shape of the normal curve is such that most cases are clustered around the mean and decline in frequency as we move farther away from the mean value, either to the right or to the left. In fact, given what we know about the normal curve, the probability that a randomly selected case will have a score within ±1 standard deviation of the mean is 0.6826. Rounding off, we can say that 68 out of 100 cases (a little more than two-thirds of all cases) selected over the long run will have a score within ±1 standard deviation, or Z score, of the mean. The probabilities are high that any randomly selected case will have a score close in value to the mean. In contrast, the probability of the case having a score beyond 3 standard deviations from the mean is very low. The general point to remember is that cases with scores close to the mean are common, and cases with scores that are far above or below the mean are rare.
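A Python sketch of the probability of selecting a case within ±k standard deviations of the mean, using `statistics.NormalDist`. The exact value for ±1 rounds to 0.6827; the table-based 0.6826 comes from adding two rounded column-b entries (0.3413 + 0.3413).

```python
from statistics import NormalDist

def prob_within(k):
    """P(-k < Z < k) for a case randomly selected from a normal distribution."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    # probability within +/-k standard deviations, then beyond them
    print(k, round(prob_within(k), 4), round(1 - prob_within(k), 4))
```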

mean

The arithmetic average; by far the most commonly used measure of central tendency. It reports the average score of a distribution and is calculated by dividing the sum of the scores by the number of scores (N). Because computation of the mean requires addition and division, it should be used with variables measured at the interval-ratio level. However, researchers do calculate the mean for variables measured at the ordinal level, because the mean is much more flexible than the median and is a central feature of many interesting and powerful advanced statistical techniques. Thus, if the researcher plans to do any more than merely describe his or her data, the mean will probably be the preferred measure of central tendency even for ordinal-level variables.

Frequency distributions for ordinal-level variables

constructed in the same way as for nominal-level variables

Bivariate and Multivariate Descriptive Statistics

Designed to help us understand the relationship between two or more variables. We can use these statistics to investigate two matters of central theoretical and practical importance to any science: causation and prediction.

three commonly used measures of central tendency

The mode, median, and mean. They summarize a distribution of scores by describing the most common score (the mode), the score of the middle case (the median), or the average score (the mean) of that distribution, and so they can reduce huge arrays of data to a single, easily understood number. Even though they share a common purpose, the three measures of central tendency are different statistics, and they will have the same value only under certain conditions (for example, in a symmetrical, unimodal distribution). They vary in terms of the level of measurement they require and, perhaps more importantly, in how they define central tendency. They will not necessarily identify the same score or case as "typical." Thus, your choice of an appropriate measure of central tendency will depend, in part, on exactly what information you want to convey.
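To see how the three measures can disagree, the Python sketch below computes all of them for a small, hypothetical right-skewed set of scores using the standard library's `statistics` module (Python 3.8+ for `multimode`). The single high score pulls the mean above the mode and median.

```python
from statistics import mean, median, multimode


def central_tendency(scores):
    """Return the mode(s), median, and mean of a list of scores."""
    return {
        "mode": multimode(scores),  # list, since a distribution can have several modes
        "median": median(scores),   # score of the middle case
        "mean": mean(scores),       # sum of scores divided by N
    }


scores = [1, 2, 2, 3, 10]  # hypothetical, skewed by one high score
print(central_tendency(scores))
```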

Frequency Distributions for Interval-Ratio-Level Variables

More complex than for nominal and ordinal variables. Interval-ratio variables usually have a wide range of scores, which means the researcher must collapse or group categories (class intervals) to produce reasonably compact tables.
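A minimal Python sketch of grouping scores into equal-width class intervals. It assumes whole-number scores and that the researcher picks the starting point, the interval width, and the number of intervals; the function name and the ages are hypothetical.

```python
def group_scores(scores, lowest, width, n_intervals):
    """Count whole-number scores in equal-width class intervals lower..lower+width-1."""
    table = []
    for i in range(n_intervals):
        lower = lowest + i * width
        upper = lower + width - 1
        f = sum(1 for x in scores if lower <= x <= upper)
        table.append((f"{lower}-{upper}", f))
    return table


ages = [18, 19, 19, 21, 22, 24, 25, 27, 28, 29]  # hypothetical ages
for interval, f in group_scores(ages, lowest=15, width=5, n_intervals=3):
    print(interval, f)
```

Wider intervals give a more compact table but hide more detail about the exact scores, which is the trade-off the entry describes.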

Ratios, rates, and percentage change

Statistics used to summarize results simply and clearly. They may be used independently or with frequency distributions, and they may be computed for variables at any level of measurement.
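The entry does not spell out the formulas, so the Python sketch below uses the common textbook definitions as an assumption: a ratio is f1 / f2, a rate is actual occurrences divided by possible occurrences (scaled here per 1,000, a hypothetical choice), and percentage change is ((f2 - f1) / f1) × 100, where f1 is the earlier value. All numbers are hypothetical.

```python
def ratio(f1, f2):
    """Ratio: frequency of one category divided by the frequency of another."""
    return f1 / f2


def rate(actual, possible, per=1000):
    """Rate: actual occurrences over possible occurrences, scaled per `per` cases."""
    return actual * per / possible


def percentage_change(f1, f2):
    """Percentage change from an earlier value f1 to a later value f2."""
    return (f2 - f1) / f1 * 100


print(ratio(150, 50))               # 3.0
print(rate(12, 4000))               # 3.0 per 1,000
print(percentage_change(200, 250))  # 25.0
```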

Σ (uppercase Greek letter sigma)

the summation of

open-ended intervals

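An interval with no stated upper or lower limit (for example, "65 and older"). Open-ended intervals let a table include a few very high or very low scores without adding a string of sparsely populated intervals. However, the table then omits information about the exact scores included in the open-ended interval, so this technique should not be used indiscriminately.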

Benefits of frequency distributions and graphs

They summarize the overall shape of a distribution of scores in a way that can be quickly comprehended. Beyond that, two additional kinds of statistics are extremely useful: some idea of the typical or average case in the distribution (central tendency) and some idea of how much variety there is in the distribution (dispersion).

Properties of a normal curve

The normal curve is unimodal (i.e., it has a single mode, or peak), perfectly smooth, and symmetrical (unskewed), so its mean, median, and mode are all exactly the same value. It is bell-shaped, and its tails extend infinitely in both directions. The crucial point about the normal curve is that on any normal curve, distances along the horizontal axis, when measured in standard deviations from the mean, always encompass exactly the same proportion of the total area under the curve. For example, the distance between one standard deviation above the mean and one standard deviation below the mean (±1 standard deviation) encompasses 68.26% of the total area, regardless of the trait being measured and the numerical values of the mean and standard deviation. We can use these relationships between distance from the mean and area to describe empirical distributions that are at least approximately normal. The position of individual scores can then be described with respect to the mean, the distribution as a whole, or any other score in the distribution.
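The claim that the area depends only on distance in standard deviations can be checked numerically. The Python sketch below uses `statistics.NormalDist` (Python 3.8+) with three hypothetical distributions whose means and standard deviations differ widely; the area within ±1, ±2, and ±3 standard deviations comes out the same each time (about 68.3%, 95.4%, and 99.7%).

```python
from statistics import NormalDist


def area_within(mu, sigma, k):
    """Proportion of area under a normal curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# Hypothetical normal distributions (e.g., IQ-like, heights in inches, dollars)
for mu, sigma in [(100, 15), (68, 3), (50_000, 12_000)]:
    areas = [round(area_within(mu, sigma, k), 4) for k in (1, 2, 3)]
    print(f"mean={mu}, s={sigma}: {areas}")
```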

boxplot or a box and whiskers plot

A graph that presents information about the central tendency and dispersion of a variable. It provides a helpful way to visualize and analyze dispersion and gives us an opportunity to apply some of our growing array of statistical tools. Boxplots use the median, the range (R), and the interquartile range (Q) to depict both central tendency and variability, and they also display any outliers or extreme scores that might be included in the distribution. The box stretches from the 3rd quartile at the top to the 1st quartile at the bottom, so the height of the box reflects the value of Q, the interquartile range. The horizontal line through the box is drawn at the median (Md). The T-shaped lines (or "whiskers") reflect the range of the scores: they extend 1.5 times the height of the box or to the high and low scores, whichever is closer to the box. In a boxplot, an "outlier" is any score outside the whiskers, that is, any score more than 1.5 times the height of the box beyond the box in either direction. Scores more than three times the height of the box beyond the box in either direction are called "extreme outliers." Like R, Q, and s, boxplots are most useful when we want to compare the distribution of a variable across different conditions or times.
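The numbers behind a boxplot can be sketched in Python as below. This is an illustration, not any particular program's algorithm: quartiles come from `statistics.quantiles` with its default "exclusive" method (other software may compute quartiles slightly differently), and the whisker ends follow this entry's wording literally. The function name and the scores are hypothetical.

```python
from statistics import quantiles


def boxplot_summary(scores):
    """Quartiles, Q, whisker ends, and outliers for a boxplot of `scores`."""
    q1, md, q3 = quantiles(scores, n=4)  # 1st quartile, median, 3rd quartile
    q = q3 - q1                          # interquartile range = height of the box
    low_fence, high_fence = q1 - 1.5 * q, q3 + 1.5 * q
    extreme_low, extreme_high = q1 - 3 * q, q3 + 3 * q
    return {
        "q1": q1, "median": md, "q3": q3, "Q": q,
        # Whisker stops at 1.5 box heights or at the low/high score, whichever is closer.
        "whisker_low": max(min(scores), low_fence),
        "whisker_high": min(max(scores), high_fence),
        "outliers": sorted(x for x in scores if x < low_fence or x > high_fence),
        "extreme_outliers": sorted(x for x in scores if x < extreme_low or x > extreme_high),
    }


print(boxplot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30]))  # hypothetical scores
```

Many programs instead end each whisker at the most extreme score that lies inside the 1.5-box-height fence; the outlier rules are the same either way.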
