Planning Quiz 3

Individual

An object described by data

Boxplot

Another graphical option for quantitative data is a boxplot (sometimes called a box-and-whisker plot). Boxplots provide a quick summary of the center and variability of a distribution. Boxplots do not display each individual value in a distribution, and they don't show GAPS, CLUSTERS, or PEAKS. Boxplots are especially effective for comparing the distribution of a quantitative variable in two or more groups. A boxplot is a visual representation of the 5-NUMBER SUMMARY: it summarizes a distribution by displaying the location of 5 important values within the distribution. If there are no outliers, you can draw the whiskers to the Max and Min data values.

Key Characteristics of a Data Set

-Who? What cases do the data describe? How many cases does a data set have?
-What? How many variables does the data set have? What are the exact definitions of these variables? What are the units of measurement for each quantitative variable?
-Why? What purpose do the data have? Do the data contain the information needed to answer the questions of interest?

Displaying distributions with graphs

1.) Exploratory Data Analysis
2.) Graphs for categorical variables
-Bar graphs
-Pie charts
3.) Graphs for quantitative variables
-Histograms
-Stem plots
In any graph, look for the overall pattern and for clear departures from that pattern.

Examining distributions 2

A distribution is SYMMETRIC if the right and left sides of the graph are approximately mirror images of each other. A distribution is skewed to the right (right-skewed) if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side. It is skewed to the left (left-skewed) if the left side of the graph is much longer than the right side.

Examining distributions

In any graph of data, look for the OVERALL PATTERN and for striking DEVIATIONS from that pattern. You can describe the overall pattern by its SHAPE, CENTER, AND SPREAD. An important kind of deviation is an outlier, an individual that falls outside the overall pattern.

To construct a stem plot

Separate each observation into a stem (all but the rightmost digit) and a leaf (the remaining digit). Write the stems in a vertical column; draw a vertical line to the right of the stems. Write each leaf in the row to the right of its stem; order leaves if desired.
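
The steps above can be sketched in Python. This is only an illustration: it assumes the observations are non-negative whole numbers, the data values are made up, and the function name `stem_plot` is hypothetical.

```python
from collections import defaultdict

def stem_plot(values):
    """Return the rows of a stem plot for non-negative whole numbers."""
    rows = defaultdict(list)
    for v in sorted(values):
        # stem = all but the rightmost digit, leaf = the rightmost digit
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    lines = []
    # List every stem from smallest to largest, even if it has no leaves
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

data = [12, 15, 21, 23, 23, 38, 40]  # made-up observations
for line in stem_plot(data):
    print(line)
```

Because the values are sorted first, the leaves in each row come out in increasing order.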

Frequency Table

Shows the count of individuals having each data value

Relative Frequency Table

Shows the proportion or percent of individuals having each data value
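
A frequency table and a relative frequency table can be built together. The sketch below uses made-up data, and `frequency_table` is a hypothetical helper name, not something from the course materials.

```python
from collections import Counter

def frequency_table(values):
    """Map each data value to (count, proportion of all individuals)."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, count / n) for value, count in sorted(counts.items())}

data = [2, 3, 2, 5, 2, 3, 3, 2]  # made-up data values
for value, (count, proportion) in frequency_table(data).items():
    print(f"{value}: count = {count}, relative frequency = {proportion:.1%}")
```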

Quantitative Variable

Takes numerical values for which arithmetic operations make sense. You can use a dotplot, stemplot, or histogram to display the distribution of a quantitative variable. The distribution of a quantitative variable tells us what values the variable takes on and how often it takes those values. You can use a bar chart or pie chart to display categorical data. A dot plot is the simplest graph for displaying quantitative data. You can describe the overall pattern of a distribution by its shape, center, and variability. An important kind of departure is an outlier, a value that falls outside the overall pattern.

Standard Deviation

The standard deviation (Sx) measures the typical distance of the values in a distribution from the mean; that is, Sx measures variation about the mean. To find the standard deviation of a quantitative data set with n values:
1. Find the mean of the distribution.
2. Calculate the deviation of each value from the mean: deviation = value − mean.
3. Square each of the deviations.
4. Add all the squared deviations, divide by n − 1, and take the square root.
If we summarize the center of a distribution with the mean, we should use the standard deviation to describe the variation of data values around the mean.
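
The four steps translate almost line for line into code. This is a minimal sketch with made-up data; the name `sample_sd` is chosen here only for illustration.

```python
import math

def sample_sd(values):
    """Standard deviation Sx, dividing by n - 1 as in the steps above."""
    n = len(values)
    mean = sum(values) / n                      # step 1: find the mean
    deviations = [v - mean for v in values]     # step 2: value - mean
    squared = [d ** 2 for d in deviations]      # step 3: square each deviation
    return math.sqrt(sum(squared) / (n - 1))    # step 4: add, divide, square root

data = [2, 4, 4, 6, 9]  # made-up values with mean 5
print(round(sample_sd(data), 4))
```

For this data the squared deviations add up to 28, so Sx is the square root of 28 / 4 = 7.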

Distribution of a Variable

To examine a single variable, we graphically display its distribution. The distribution of a variable tells us what values it takes and how often it takes these values. Distributions can be displayed using a variety of graphical tools. The proper choice of graph depends on the nature of the variable.

3rd Quartile (Q3)

the median of the data values that are to the right of the median in the ordered list

1st Quartile (Q1)

the median of the data values that are to the left of the median in the ordered list

Cases

the objects described by a set of data. Cases may be customers, companies, subjects in a study, units in an experiment, or other objects.

Statistics

the science and art of collecting, analyzing, and drawing conclusions from data

Measuring Center cont...

A measure of center (or variability) is resistant if it isn't influenced by unusually large or unusually small values in a distribution. The mean is not resistant to outliers. The median is resistant to outliers (robust). Which measure, the mean or the median, should we report as the center of a distribution? That depends on both the shape of the distribution and whether there are any outliers. If a distribution of quantitative data is roughly symmetric and has no outliers, use the mean to measure center. If the distribution is strongly skewed or has outliers, use the median to measure center.
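
A small made-up example can make resistance concrete. The sketch below uses Python's standard `statistics` module and changes one value into an extreme one.

```python
from statistics import mean, median

typical = [1, 2, 3, 4, 5]        # made-up data, roughly symmetric
with_outlier = [1, 2, 3, 4, 50]  # same data with the largest value made extreme

print(mean(typical), median(typical))            # both equal 3
print(mean(with_outlier), median(with_outlier))  # mean jumps to 12, median stays 3
```

The single extreme value pulls the mean far up the scale, while the median does not move.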

Standardized Score (Z-Score)

A percentile is one way to describe the location of an individual in a distribution of quantitative data. Another way is to give the standardized score (z-score) for the observed value. The standardized score (z-score) for an individual value in a distribution tells us how many standard deviations from the mean the value falls, and in what direction. To find the standardized score (z-score), compute z = (value − mean) / (standard deviation).
-Values larger than the mean have positive z-scores
-Values smaller than the mean have negative z-scores
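
The z-score formula is a one-line calculation. In this sketch the test scores, mean, and standard deviation are made up for illustration.

```python
def z_score(value, mean, sd):
    """How many standard deviations the value lies from the mean, with sign."""
    return (value - mean) / sd

# Made-up class: mean score 80, standard deviation 6
print(z_score(86, 80, 6))  # 1.0, one standard deviation above the mean
print(z_score(71, 80, 6))  # -1.5, one and a half standard deviations below
```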

Statistical Problem-Solving Process

Ask Questions: Clarify the research problem and ask one or more valid statistics questions.
Collect Data: Design and carry out an appropriate plan to collect the data.
Analyze Data: Use appropriate graphical and numerical methods to analyze the data.
Interpret Results: Draw conclusions based on the data analysis. Be sure to answer the research question(s).

Exploratory Data Analysis

Begin by examining each variable by itself. Then move on to study the relationships among the variables. Begin with a graph or graphs. Then add numerical summaries of specific aspects of the data.

Measuring Variability

Being able to describe the shape and center of a distribution is a great start. However, two distributions can have the same shape and center but still look quite different. For example, 2 distributions can both be symmetric and single-peaked, with similar centers, and yet differ in their variability.

Properties of the standard deviation as a measure of variability

Sx is always greater than or equal to 0. Sx = 0 only when there is no variability, that is, when all values in a distribution are the same. Larger values of Sx indicate greater variation from the mean of a distribution. Sx is NOT resistant. The use of squared deviations makes Sx even more sensitive than the mean x̄ to extreme values in a distribution. Sx measures variation about the mean. It should be used only when the mean is chosen as the measure of center.

Histogram cont...

Histograms can be used to compare the distribution of a quantitative variable in two or more groups. It's a good idea to use RELATIVE FREQUENCIES (percents or proportions) when comparing, especially if the groups have different sizes. Be sure to use the same intervals when making comparative histograms so the graphs can be drawn using a common horizontal axis scale.

Comparing Mean and Median

The mean and median of a roughly SYMMETRIC distribution are close together. If the distribution is exactly symmetric, the mean and median are exactly the same. In a skewed distribution, the mean is usually farther out in the long tail than is the median. The mean and median measure center in different ways, and both are useful.

Choosing measures of center and variability

The median and IQR are usually better than the mean and standard deviation for describing a skewed distribution or a distribution with outliers. Use the mean and Sx only for roughly symmetric distributions that don't have outliers.

How to make a boxplot.

The median and quartiles divide the distribution roughly into quarters. This leads to a new way to display quantitative data, the boxplot.
1.) Draw and label a number line that includes the range of the distribution.
2.) Draw a central box from Q1 to Q3.
3.) Mark the median M inside the box.
4.) Extend lines (whiskers) from the box out to the minimum and maximum values that are not outliers.

Quartiles

The quartiles of a distribution are the values that divide the ordered data set into four groups having roughly the same number of values. To find the quartiles, arrange the data values from smallest to largest and find the median.
-Q1 lies one-quarter of the way up the list
-Q2 is the median, halfway up the list
-Q3 lies three-quarters of the way up the list
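
One way to compute the quartiles is sketched below, using the "median of each half" definition from the Q1 and Q3 cards in this set. The data are made up, `quartiles` is a hypothetical helper name, and the sketch assumes that when n is odd the overall median is left out of both halves. Statistical software often uses other conventions and may give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-each-half method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]             # values to the left of the median
    upper = ordered[half + (n % 2):]   # values to the right of the median
    return median(lower), median(ordered), median(upper)

data = [1, 3, 4, 6, 7, 9, 12, 15, 20]  # made-up values, n = 9
print(quartiles(data))  # (3.5, 7, 13.5)
```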

Range

The range of a distribution is the distance between the minimum value and the maximum value. Range = Max − Min. The range is NOT a resistant measure of variability. It depends only on the maximum and minimum values, which may be outliers.

Measuring Center

The two most common ways to measure center are the median and the mean. Of the two, the mean is the more commonly used.

Variability

There are several ways to measure the variability of a distribution. The 3 most common are:
-range
-interquartile range
-standard deviation
We can avoid the impact of extreme values on our measure of variability by focusing on the middle of the distribution. To do so, order the data values from smallest to largest and find the quartiles.

Cumulative Relative Frequency Graph

There are some interesting graphs that can be made with percentiles. One of the most common starts with a frequency table for a quantitative variable and expands it to include cumulative frequency and cumulative relative frequency. A cumulative relative frequency graph plots a point corresponding to the cumulative relative frequency in each interval at the smallest value of the next interval, starting with a point at a height of 0% at the smallest value of the first interval. Consecutive points are then connected with a line segment to form the graph. A cumulative relative frequency graph can be used to describe the position of an individual within a distribution or to locate a specified percentile of the distribution.
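
The points of a cumulative relative frequency graph can be computed from the interval counts. The sketch below assumes equal-width intervals; the counts are made up, and the function name is hypothetical.

```python
from itertools import accumulate

def cumulative_relative_frequency_points(start, width, counts):
    """Return (x, cumulative percent) points for equal-width intervals.

    start  : smallest value of the first interval
    width  : common width of the intervals
    counts : frequency in each interval, in order
    """
    total = sum(counts)
    points = [(start, 0.0)]  # the graph starts at a height of 0%
    for i, running in enumerate(accumulate(counts), start=1):
        # Plot each cumulative percent at the smallest value of the next interval
        points.append((start + i * width, 100 * running / total))
    return points

# Made-up frequency table: intervals 0-10, 10-20, 20-30 with counts 2, 3, 3
print(cumulative_relative_frequency_points(0, 10, [2, 3, 3]))
```

Connecting consecutive points with line segments gives the graph; the last point always sits at 100%.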

Splitting Stems

We can get a better picture of the data by splitting stems, which lets us see the shape of the distribution more clearly. You can use a back-to-back stem plot with common stems to compare the distribution of a quantitative variable in two groups. The leaves on each side are placed in order, moving out from the common stem.

Variable

a special characteristic of a case (characteristic of the individual). We can summarize a variable's distribution with a Frequency Table or a Relative Frequency Table. Different cases can have different values of a variable. We construct a set of data by first deciding which cases or units we want to study. For each case, we record information about characteristics that we call variables.

Label

a special variable used in some data sets to distinguish the different cases.

Outlier

a value (observation) that falls outside the overall pattern of a distribution

Interquartile Range (IQR)

measures the variability in the middle half of the distribution; it is the distance between the 1st and 3rd quartiles of a distribution. In symbols, IQR = Q3 − Q1. Ex. Q1 = 5.5 days, Q3 = 21.5 days, so IQR = 16 days. The quartiles and the interquartile range are resistant because they are not affected by a few extreme values. Besides serving as a measure of variability, the IQR is used as a ruler (a rule of thumb) for identifying outliers.

Bar Graph

represents categories as bars whose heights show the category counts or percents.

Stem plots

Another simple type of graph for displaying quantitative data is a stemplot (also called a stem-and-leaf plot). Stem plots separate each observation into a stem and a leaf that are then plotted to display the distribution while maintaining the original values of the variable. A stem plot shows each data value separated into two parts:
-a stem, which consists of all but the final digit
-a leaf, the final digit
The stems are ordered from lowest to highest and arranged in a vertical column. The leaves are arranged in increasing order out from the appropriate stems.

Pie Chart

shows the distribution of a categorical variable as a "pie" whose slices are sized by the counts or percents for the categories.

Histogram

shows the distribution of a quantitative variable by using bars. Each interval (class) is shown as a bar, and the heights of the bars show the frequencies or relative frequencies of values in each interval. Plotting every individual value can make it difficult to see the overall pattern for large data sets, so we often get a cleaner picture of the distribution by grouping nearby values. Histograms with more intervals show more detail but may have a less clear overall pattern. The choice of intervals in a histogram can affect the appearance of a distribution.

Time plot

shows behavior over time. Time is always on the horizontal axis, and the variable being measured is on the vertical axis. Look for an overall pattern (trend) and deviations from this trend. Connecting the data points by lines may emphasize this trend. Look for patterns that repeat at known regular intervals (seasonal variations).

Dot Plot

shows each data value as a dot above its location on a number line

Median

the midpoint of a distribution, the number such that about half the observations are smaller and about half are larger. We could report the value in the "middle" of a distribution as its center. That's the idea of the median. The median is resistant to outliers (robust). To find the median, arrange the data values from smallest to largest. If the number n of data values is odd, the median is the middle value in the ordered list. If the number n of data values is even, the median is the average of the two middle values in the ordered list.
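
The odd and even cases can be written out directly. This is a minimal sketch with made-up data; Python's built-in `statistics.median` does the same job.

```python
def median(values):
    """Middle value of the ordered list, or the average of the two middle values."""
    ordered = sorted(values)  # arrange from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd n: the single middle value
    return (ordered[middle - 1] + ordered[middle]) / 2  # even n: average the two

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```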

Mean

The mean x̄ (pronounced "x-bar") of a quantitative set of data is the average of all n data values: x̄ = (sum of all data values) / n. The mean is not resistant to outliers. In mathematics, the capital Greek letter sigma, Σ, is short for "add them all up." Therefore, the formula for the mean can be written in more compact notation: x̄ = (Σ xᵢ) / n.

5-Number Summary (Boxplot)

The five-number summary of a distribution of quantitative data consists of the minimum (smallest observation), the first quartile Q1, the median, the third quartile Q3, and the maximum (largest observation), written in order from smallest to largest (Min-Q1-M-Q3-Max). The minimum and maximum values alone tell us little about the distribution as a whole. Likewise, the median and quartiles tell us little about the tails of a distribution. To get a quick summary of both center and spread, combine all 5 numbers.

The 1.5 x IQR Rule

Call an observation an outlier if it falls more than 1.5 × IQR above Q3 or below Q1.
Low outliers < Q1 − 1.5 × IQR
High outliers > Q3 + 1.5 × IQR
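
The rule can be checked with a short sketch. It reuses the quartiles from the IQR card in this set (Q1 = 5.5 days, Q3 = 21.5 days); the list of observations is made up, and both function names are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the (low, high) cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values lying below the low cutoff or above the high cutoff."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

print(outlier_fences(5.5, 21.5))  # (-18.5, 45.5), since 1.5 x 16 = 24

days = [1, 6, 14, 22, 45, 51]  # made-up observations
print(find_outliers(days, 5.5, 21.5))  # [51]
```

Here 45 is large but still inside the cutoff of 45.5, so only 51 is flagged.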

Percentile

Describes location in a distribution (one way to describe an individual's location in a distribution is to calculate a percentile). An individual's percentile is the percent of values in a distribution that are less than the individual's data value. Ex. Because 21 of the 25 observations (84%) are below her score, Jenny is at the 84th percentile in the class's test score distribution. Be careful with your language when describing percentiles. Percentiles are specific locations in a distribution, so an observation isn't "IN" the 84th percentile. Rather, it is "AT" the 84th percentile.
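
The percentile definition above amounts to counting the values below the individual's value. In this sketch the 25 scores are invented so that 21 of them fall below a score of 86, mirroring the Jenny example; the actual class scores are not given in these notes.

```python
def percentile_of(value, values):
    """Percent of values in the distribution that are less than the given value."""
    below = sum(1 for v in values if v < value)
    return 100 * below / len(values)

# Invented scores: 21 values from 65 to 85, then 86, 87, 88, 90 (25 in total)
scores = list(range(65, 86)) + [86, 87, 88, 90]
print(percentile_of(86, scores))  # 84.0, so a score of 86 is AT the 84th percentile
```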

Large data sets and/or quantitative variables that take many values

Divide the possible values into CLASSES or intervals of equal widths. Count how many observations fall into each interval. Instead of counts, one may also use percents. Draw a picture representing the distribution, with each bar's height equal to the number (or percent) of observations in its interval.
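
Counting observations into equal-width classes can be sketched as follows. The data, starting value, and class width are made up, `class_counts` is a hypothetical name, and the sketch assumes each class includes its lower boundary but not its upper one.

```python
def class_counts(values, start, width, num_classes):
    """Count observations in classes [start, start + width), [start + width, ...), etc."""
    counts = [0] * num_classes
    for v in values:
        index = int((v - start) // width)  # which class the value falls into
        if 0 <= index < num_classes:       # ignore values outside the classes
            counts[index] += 1
    return counts

data = [3, 7, 12, 15, 18, 21, 24, 29]  # made-up observations
counts = class_counts(data, start=0, width=10, num_classes=3)
percents = [100 * c / len(data) for c in counts]
print(counts)    # [2, 3, 3]
print(percents)  # [25.0, 37.5, 37.5]
```

Either list can serve as the bar heights of the histogram.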

Categorical Variable

Places an individual into one of several groups or categories. The distribution of a categorical variable lists the categories and gives the count or percent of individuals who fall into each category.

Distribution

The distribution of a variable tells us what values the variable takes and how often it takes these values. A variable generally takes values that vary from one individual to another. The distribution of a variable describes the pattern of variation of these values.

