statistics mod 4 part 1

Stem plots, also called stem-and-leaf plots, are another way to show a data set and its distribution or shape.

A stem plot is constructed by separating each data value into a stem (usually the left-most digit) and a leaf (usually the right-most digit). For example, if 36 is a data point, the 3 would be the stem and the 6 would be the leaf. The data is arranged in two columns, stems and leaves, with a vertical line separating the columns.
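
The stem/leaf split described above can be sketched in a few lines of Python. This is only an illustration: the function name `stem_plot` and the sample ages are made up for the example, and it assumes non-negative whole numbers where the last digit is the leaf.

```python
def stem_plot(values):
    """Return the rows of a stem plot as strings, e.g. '3 | 4 6'."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # 36 -> stem 3, leaf 6
        rows.setdefault(stem, []).append(leaf)
    lines = []
    # List every stem from smallest to largest, including stems with no leaves.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

ages = [18, 18, 21, 36, 34, 25]  # hypothetical sample, not from the course
for line in stem_plot(ages):
    print(line)
```

Sorting first means the leaves in each row come out in increasing order, which is how stem plots are usually read.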

Using this information we can now calculate the values that 68% of the data will fall between:

Answer: 68% of the values will fall between 267 and 293 days.

Are there any outliers in this data set? Enter your answer as "yes" or "no".

Any values that are greater than 233, or less than 25, are outliers. As there are no data values that meet either of those criteria, there are no outliers in this data set.

The Standard Deviation (Empirical) Rule

Approximately 68% of all values are within 1 standard deviation of the mean. Approximately 95% of all values are within 2 standard deviations of the mean. Approximately 99.7% of all values are within 3 standard deviations of the mean. From the Standard Deviation Rule, we can calculate all of the parts of the bell curve. It is important to memorize the Standard Deviation Rule. You should also know how to calculate the other percentages, or you can memorize those.
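
As a rough illustration, the rule can be turned into intervals around the mean. The Python sketch below uses a made-up function name, `empirical_rule_intervals`, and takes the gestation figures used elsewhere in this set (mean 280 days, standard deviation 13 days) as inputs. It assumes the data is approximately normal.

```python
def empirical_rule_intervals(mean, sd):
    """Map each approximate coverage percent to its (low, high) interval."""
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}  # SDs from the mean -> percent of values
    return {pct: (mean - k * sd, mean + k * sd) for k, pct in coverage.items()}

intervals = empirical_rule_intervals(280, 13)
for pct, (low, high) in intervals.items():
    print(f"about {pct}% of values fall between {low} and {high}")
```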

Distributions are often categorized into two different types

Data Distributions: Symmetry versus Skewness

Find the interquartile range ( IQR ) of the following data set: {1, 1, 7, 8, 17, 18, 21, 27, 42, 45, 45, 46, 46, 54, 57, 78}

IQR = Q3 − Q1 = 46 − 12.5 = 33.5.

Measures of spread* (variability),

Measures of spread* (variability), including the range and the standard deviation, can tell you this. As previously discussed, the mean, median, and mode describe the center of a data set. Similarly, we use the range and the standard deviation to describe the spread of a data set.

Take a look at the histogram below, which presents a range of test scores, and decipher which bar in the graph displays the mode.

Notice how the bar above Score of 90% is the tallest. With the given data, we can assume 12 students scored a 90% on the test—more than any other test score achieved. Therefore, the mode of this data would be Scores of 90% because that value appears most often.

measure of central tendency*

Numerical summaries can be used to define a measure of central tendency*, or the measure of the center of a data set. Measures of the center of a data set are the mean*, median*, and mode*.

What is the value above which any data values are outliers?

Outliers are defined as any points that are more than 1.5× IQR above Q3 or below Q1. To find the value above which any data values are outliers, multiply the interquartile range (IQR) by 1.5. (IQR) ×1.5=52×1.5=78. Add 78 to the third quartile =(Q3)+78=155+78=233.

What is the interquartile range ( IQR ) of this data set?

Subtract the first quartile (Q1) from third Quartile (Q3) to determine the interquartile range, or (IQR) . (Q3)−(Q1)=155−103=52.

Refer to the stem plot above. What is the most frequent age represented in this data set?

The age that was most frequent in this sample is 18. The stem 1 has four leaves with a value of 8. No other value has four leaves.

In a population, suppose that: the mean resting body temperature is 98.6 degrees and the standard deviation is .8 degrees. Assuming a normal distribution: The temperatures that fall within two standard deviations from the mean will range from __________ to _____________ ? a) 97.8 degrees to 99.4 degrees b) 97 degrees to 100.2 degrees c) 97.8 degrees to 100.2 degrees d) 97 degrees to 99.4 degrees

We know that the standard deviation is .8 degrees, so two standard deviations is equal to 1.6 degrees (.8×2=1.6). To obtain the values that are two standard deviations above and below the mean, we do the following: To obtain the first value, we will subtract 1.6 degrees from the mean: 98.6 degrees − 1.6 degrees = 97 degrees. To obtain the second value we will add 1.6 degrees to the mean: 98.6 degrees + 1.6 degrees = 100.2 degrees. Therefore, the temperatures that will fall within 2 standard deviations of the mean will range from 97 degrees to 100.2 degrees.

data set

A data set is any collection of numerical values, such as measurements, observations, or survey responses. For example, if we measure the heights (in centimeters) of ten randomly selected people, we could have the following data set:

Identify the five-number summary for the following data set: {10, 50, 14, 49, 81}

a. The minimum of the data set is 10, the first quartile is 12, the median is 49, the third quartile is 65.5, and the maximum is 81.
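
A short Python sketch can reproduce this answer. It assumes the "median of each half" method for quartiles, which matches the answers in this set (software packages often use other quartile conventions and may give slightly different values). The function name `five_number_summary` is chosen for this example.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # With an odd count, the middle value is left out of both halves.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

print(five_number_summary([10, 50, 14, 49, 81]))
```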

The difference between both of these values and the mean is equal to 26. Next, we divide this value by our standard deviation of 13. The result is equal to 2. Therefore, 254 and 306 fall two standard deviations away from our mean of 280.

Refer to the dot plot above. What is the minimum value in the data set?

10

Refer to the dot plot above. How many patients recorded a cholesterol level of 60mg/dl ?

5

Refer to the dot plot above. What is the maximum value in this data set?

90

In a population, suppose that: the mean resting body temperature is 98.6 degrees and the standard deviation is .8 degrees. Assuming a normal distribution: What percent of this population would you expect to have a temperature below 96.2 ?

96.2 degrees is 2.4 degrees lower than the mean temperature of 98.6. (98.6−96.2=2.4). We know that the standard deviation is .8 degrees, so dividing the temperature difference of 2.4 degrees by .8, we get a value of 3. This tells us that 96.2 degrees is three standard deviations lower than the mean. Using the Standard Deviation Rule, we know that 0.15% of the population falls below three standard deviations from the mean. Therefore, the percent of the population that we would expect to have a temperature below 96.2 degrees is .15%.
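
The "how many standard deviations away" step is a single division, sketched here in Python. The function name `sds_from_mean` is made up for this example. Rounding is applied to the temperature case only because decimals such as 98.6 and 0.8 are not stored exactly in floating point.

```python
def sds_from_mean(value, mean, sd):
    """Number of standard deviations between value and mean (negative = below the mean)."""
    return (value - mean) / sd

# Rounding hides tiny floating-point error in the decimal inputs.
print(round(sds_from_mean(96.2, 98.6, 0.8), 6))  # temperature example
print(sds_from_mean(306, 280, 13))               # gestation example
```

A negative result means the value lies below the mean, so the temperature example comes out at three standard deviations below.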

bell curve

A bell curve, or normal distribution*, has a very specific shape. Normal distributions will be discussed later in this module. For now, it is important to note that just because a histogram is symmetric does not make it normal!

Now that we know that these two values are two standard deviations from the mean of 280 days, we can use the Standard Deviation Rule to determine what percentage of pregnancies will last between 254 and 306 days.

As illustrated in the distribution curve above, the Standard Deviation Rule states that 13.5% of the data falls between one and two standard deviations from the mean on each side. Knowing this, we can calculate the percentage of data that falls within two standard deviations of the mean (between 254 and 306 days) by adding up the percentage values in the shaded areas of the curve: 13.5 + 34 + 34 + 13.5 = 95%. Answer: 95% of the values will fall between 254 and 306 days. In other words, the Standard Deviation Rule tells us that approximately 95% of all data will fall within two standard deviations of the mean.

c. What is the median of the set of values above?

The median is the middle value of a data set. After reordering the above values from lowest to greatest, the two middle values are 21 and 25, so the median is (21+25)÷2=46÷2=23. Note: the values must be put in order before finding the median.

The data set below displays the LDL cholesterol levels for 19 subjects. To complete the exercise, fill in the blanks with the corresponding answer. Enter only numerical digit(s) such as "5" in the blank. What is the second quartile (Q2) of this data set?

First arrange the data from least to greatest: 78,85,87,90,103,104,113,114,117,117,123,135,145,154,155,156,159,170,179. Find the median, or midpoint, of the data set: 78,85,87,90,103,104,113,114,117 | 117 | 123,135,145,154,155,156,159,170,179. As there is an odd number of data points in this data set, the median is the middle value, 117. The median, also called the 2nd quartile (Q2), is equal to 117.

Refer to the histogram above. How many people have heights greater than 170 cm?

From the histogram we can determine that there are 6 adults that have a height between 170 cm and 179 cm and there are 2 adults with heights between 180 cm and 189 cm. Therefore, there are 8 total adults with heights greater than 170 cm.

What is the value below which any data values are outliers?

Outliers are defined as any points that are more than 1.5× IQR above Q3 or below Q1. To find the value below which any data values are outliers, multiply the interquartile range (IQR) by 1.5. (IQR) ×1.5=52×1.5=78. Subtract 78 from the first quartile =(Q1)−78=103−78=25.

The 1.5 IQR Criterion: Identifying Outliers

Outliers are defined to be any points that are more than 1.5 × IQR above Q3 or below Q1. The rule used to identify outliers in a data set is called the 1.5 IQR Criterion. The following graphic illustrates how outliers fall either to the left of Q1 − 1.5 IQR, or to the right of Q3 + 1.5 IQR.

Find the second quartile (Q2) for the following data set: {41, 76, 16, 8}

Q2, also known as the median, is the midpoint of the data set. Here, with an even number of values, Q2 falls midway between 16 and 41, which is 28.5.

Stem Plot* (Stem-and-leaf Plot)

Type of data: quantitative. Useful to display: the distribution or shape of data according to place values.

Which of the following most accurately describes a quartile? a) The difference between Q3 and Q1. b) Four equally sized groups of numbers within a set. c) Values that provide a cut-off between four equally sized groups of numbers within a set. d) There are four in a set: the minimum, the mean, the median, and the maximum.

The answer is c. Quartiles are values that divide a set into four equally sized groups.

What is the chance that the pregnancy will last longer than 306 days?

Since we know that 100% of the data is included in the bell curve and that the curve is symmetric, we can also find the percentages for parts of the curve. From the previous question, we know that 306 is two standard deviations from the mean. To answer we want to calculate the percentages of the parts of the curve that are greater than two standard deviations from the mean. The percentage values under the curve that are greater than two standard deviations from the mean are illustrated in the distribution curve below. add the percentages 2.35 + 0.15 = 2.5 Answer: 2.5% of pregnancies will last longer than 306 days.
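
The band percentages used in these answers (34, 13.5, 2.35, and 0.15) all follow from the three memorized figures and the symmetry of the curve. A Python sketch under that assumption is shown below; the names `WITHIN`, `band_between`, and `tail_beyond` are invented for this example.

```python
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}  # approximate percent within k SDs of the mean

def band_between(k_inner, k_outer):
    """Percent on ONE side of the mean between k_inner and k_outer SDs (0 = the mean)."""
    inner = WITHIN[k_inner] if k_inner else 0.0
    return round((WITHIN[k_outer] - inner) / 2, 2)

def tail_beyond(k):
    """Percent on ONE side of the mean lying more than k SDs away."""
    return round((100.0 - WITHIN[k]) / 2, 2)

print(band_between(0, 1), band_between(1, 2), band_between(2, 3), tail_beyond(3))
print(tail_beyond(2))  # share of pregnancies longer than 306 days
```

Each difference is halved because the curve is symmetric, so the same share lies above and below the mean.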

Follow these steps to find the interquartile range of a data set.

Steps to Find the Interquartile Range
Step 1: Put the data set in order from least to greatest. 1,1,2,3,4,4,5,6,7,9,9,9
Step 2: Find the median, or midpoint, of the data set. This can also be called the second quartile (Q2). 1,1,2,3,4,4 | 5,6,7,9,9,9 Q2=4.5
Step 3: Identify the median of the lower half of the data set and label it as Q1 (the first quartile). 1,1,2 | 3,4,4 | 5,6,7,9,9,9 In this case, the median of the lower half of the data set is midway between 2 and 3, which averages to 2.5. Q1=2.5
Step 4: Identify the median of the upper half of the data set and label it as Q3 (the third quartile). 1,1,2 | 3,4,4 | 5,6,7 | 9,9,9 In this case, the median of the upper half of the data set is midway between 7 and 9, which averages to 8. Q3=8
Step 5: Subtract Q1 from Q3 to determine the interquartile range, or IQR. IQR = Q3 − Q1 = 8 − 2.5 = 5.5
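
These steps can be sketched in Python as follows. The function name `quartiles` is made up for this example, and the code assumes the same "median of each half" convention used in the steps (other quartile conventions exist and can give slightly different values).

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)               # put the data in order
    half = len(data) // 2
    lower = data[:half]                 # values below the midpoint
    # With an odd count, the middle value is left out of both halves.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(data), median(upper)

data = [1, 1, 2, 3, 4, 4, 5, 6, 7, 9, 9, 9]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                           # subtract Q1 from Q3
print(q1, q2, q3, iqr)
```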

Steps to Identify Outliers in a Data set

Recall the data set from the previous example, placed in order from least to greatest. 1,1,2,3,4,4,5,6,7,9,9,9
Step 1: Find the interquartile range, or IQR, for this data set. Q3=8 Q1=2.5 IQR = Q3 − Q1 = 8 − 2.5 = 5.5
Step 2: Multiply the IQR by 1.5. IQR × 1.5 = 5.5 × 1.5 = 8.25
Step 3: Add the result to Q3. 8 + 8.25 = 16.25 Any values greater than this number, 16.25, are outliers.
Step 4: Subtract the result from Q1. 2.5 − 8.25 = −5.75 Any values less than this number, −5.75, are outliers.
Step 5: Review the data set. 1,1,2,3,4,4,5,6,7,9,9,9 Given there are no data points less than −5.75 or greater than 16.25, this data set does not contain any outliers.
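
The outlier check can be sketched in Python once Q1 and Q3 are known. The function names `outlier_fences` and `find_outliers` are invented for this example, and the quartiles are passed in directly rather than computed.

```python
def outlier_fences(q1, q3):
    """Return (lower cutoff, upper cutoff) under the 1.5 IQR criterion."""
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step

def find_outliers(values, q1, q3):
    """Return the values that fall outside the two cutoffs."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

data = [1, 1, 2, 3, 4, 4, 5, 6, 7, 9, 9, 9]
print(outlier_fences(2.5, 8))       # cutoffs for the example above
print(find_outliers(data, 2.5, 8))  # an empty list means no outliers
```

The same function reproduces the cholesterol cutoffs when called with Q1=103 and Q3=155.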

How would you describe the distribution above: A. Bimodal B. Skewed left C. Skewed right D. Symmetric

The answer is a. The distribution above is bimodal.

Refer to the histogram above. This histogram represents data that is: a. Skewed left b. Skewed right c. Uniformly distributed d. Symmetrically distributed

The answer is d. Based on the shape of the data in this histogram, the data is symmetrically distributed.

Having recently been acquainted with data by the Lewin Group on the need for emergency supplies, your medical center ER has asked you to order units of red blood cells for emergency surgery. You want to be sure you have enough units of red blood cells for worst-case scenarios. What statistic from the Lewin Group data will you use as a guide?

The answer is e. Mean, minimum, and standard deviation are incorrect as you are concerned with "worst-case scenarios." Range is not correct as it is influenced by the minimum. Therefore, maximum is correct in determining if you will have enough blood for "worst-case scenarios" as it will provide a benchmark for the most units of blood that were used in a medical emergency or treatment for one patient.

Find the first quartile (Q1) for the following data set: {73, 80, 77, 8, 39, 90, 63, 18, 66}

The correct answer is a. Q1 is the median of the "lower half" of the data set. Here, Q1 falls midway between 18 and 39, which is 28.5.

Find the third quartile (Q3) for the following data set: {13, 19, 71, 71, 76, 53, 22, 64, 56, 61, 24}

The correct answer is d. Q3 is the median of the "upper half" of the data set. Here, Q3 falls on the value 71.

The average length of human gestation is 40 weeks, or 280 days, with a standard deviation of 13 days (assume a normal distribution). Knowing this information, and using the Standard Deviation Rule, let's answer the following questions. 68% of the data is between what two values?

The curve above illustrates this distribution. Using the Standard Deviation Rule, we know that 34% of the data is within one standard deviation above the mean (+1 SD), and 34% of the data is within one standard deviation below the mean (−1 SD). Therefore, according to the Standard Deviation Rule, 68% of the data falls within one standard deviation of the mean. Using this information we can now calculate the values that 68% of the data will fall between:

Five-number Summary

The five-number summary* lists the minimum, first quartile, median, third quartile, and maximum in a data set. The five-number summary will be represented by a graphical display that we will learn about later in Module 4 called a box plot. The following table is a five-number summary of systolic blood pressure data taken from a sample population of patients.
Statistic: Data Value
Minimum: 105
Q1: 119.5
Q2 (Median): 126.5
Q3: 138.5
Maximum: 169

The average female height in the U.S. in 2010 was 63.8 inches, with a standard deviation of 2.7 inches. Assuming a normal distribution: 95% of the data is between what values? Enter the letter that corresponds with your answer choice. a) 55.8 inches and 62.9 inches b) 55.8 inches and 69.2 inches c) 58.4 inches and 69.2 inches d) 58.4 inches and 62.9 inches

The key to solving this problem is using what we know from the Standard Deviation Rule, specifically that 95% of the data will fall within 2 standard deviations of the mean. Since we know that the standard deviation is 2.7 inches, we can calculate that 2 standard deviations is equal to 5.4 inches (2.7×2). Now knowing this value, we can determine the values that 95% of the data will fall between. To obtain the first value, we will subtract 5.4 inches from the mean: 63.8 inches − 5.4 inches = 58.4 inches. To obtain the second value we will add 5.4 inches to the mean: 63.8 inches + 5.4 inches = 69.2 inches. Therefore, 95% of the data will fall between 58.4 inches and 69.2 inches.

What is the maximum pulse rate in this data set?

The maximum pulse rate in this data set is 98.

Refer to the stem plot above. What is the maximum value in this data set?

The maximum value in this stem plot is 62, represented by the last value in the bottom row of the stem plot.

What is the mean of the set of values above?

The mean of a set of values equals the sum of all the values divided by how many values there are in the data set. In this case, the mean is equal to 8÷16, which equals 0.5.

The data set below displays the pulse rates for 15 women. To complete the exercise, fill in the blanks with the corresponding answer. Enter only decimal points and numerical digit(s) such as " 5 " in the blank. What is the minimum pulse rate in this data set?

The minimum pulse rate in this data set is 62.

Median

The second measure of central tendency is the median*. The median is the "halfway" point of a set of values; an equal number of values will fall above and below the median of a data set. Unlike the mean, the median is not overly influenced by extreme values in the data set, so we can use the median when the data is skewed. Therefore, we say that the median is a resistant measure of center. To properly find the median, values must be first sorted from smallest to largest. Odd Number of Values in a Data Set: Below is a set of values representing pulse rates.

Standard Deviation

The standard deviation* tells you how far, on average, the data points are from the mean. In this course, we will not focus on how to calculate the standard deviation (we can use computers to do this for us) but rather on building an intuition for using the standard deviation to measure how spread out the data is in a dataset. Standard deviation is a measurement that is used for symmetric data.

The average female height in the U.S. in 2010 was 63.8 inches, with a standard deviation of 2.7 inches. Assuming a normal distribution: What percent of the data is between 61.1 and 66.5 inches?

These two values, 61.1 and 66.5 inches, are one standard deviation below and one standard deviation above the mean. Using the Standard Deviation Rule, we know that 68% of the data falls within one standard deviation of the mean. Therefore, the answer is 68%.

True or False? The standard deviation for data points above the mean is different from data points below the mean.

This is a false statement. There is one standard deviation for a data set. You might think of it as the average distance of all the data points from the mean, both above and below.

A histogram where more data falls to the right of the peak of the histogram is also referred to as positively skewed. True or False?

This is a true statement. A distribution that is skewed right is where the long tail of the curve is on the positive side of the peak. This type of distribution is also referred to as positively skewed.

You are constructing a stem plot. The value of one of your data points is 82. The stem for this data point would be 8, and the leaf would be 2. True or false?

This is a true statement. A stem plot is constructed by separating each data value into a stem (usually the left-most digit) and a leaf (usually the right-most digit).

The average of the smallest two numbers in a data set of seven numbers is the first quartile. True or false?

This statement is false. In a data set of seven numbers, the second number is the value at the first quartile, not the average of the first two numbers.

Which of the following is true for a female whose height is at least 3 standard deviations from the mean? a) Her height is below 71.9 inches b) Her height is above 55.7 inches c) Her height is either above 55.7 inches or below 71.9 inches d) Her height is either below 55.7 inches or above 71.9 inches

Three standard deviations is equal to 8.1 inches (2.7×3). Now knowing this value, we can determine the values that are three standard deviations from the mean. To obtain the first value, we will subtract 8.1 inches from the mean: 63.8 inches − 8.1 inches =55.7. To obtain the second value we will add 8.1 inches to the mean: 63.8 inches + 8.1 inches =71.9 inches. Therefore, the height of this female will either be below 55.7 inches or above 71.9 inches.

What percentage of pregnancies will range between 254 and 306 days?

To answer this question, first, we have to figure out how far both 254 and 306 are from the mean.

What is the first quartile (Q1) of this data set?

To find the first quartile (Q1), identify the median of the lower half of the data set. 78,85,87,90 | 103 | 104,113,114,117 | 117 | 123,135,145,154,155,156,159,170,179. As this point is 103, the median of the lower half of the data set, or first quartile (Q1), is 103.

What is the third quartile (Q3) of this data set?

To find the third quartile (Q3), identify the median of the upper half of the data set. 78,85,87,90,103,104,113,114,117 | 117 | 123,135,145,154 | 155 | 156,159,170,179. The median of the upper half of the data set, or third quartile (Q3), is 155.

In a population, suppose that: the mean resting body temperature is 98.6 degrees and the standard deviation is .8 degrees. Assuming a normal distribution: What percent of the values will fall within one standard deviation of the mean?

Using the Standard Deviation Rule, we know that 68% of the data falls within one standard deviation of the mean. Therefore, the answer is 68%.

In a dot plot, depending on the data collected, every data value or groups of data values can be represented by a dot. True or False?

This is a false statement. In a dot plot, each data value is represented as a point, or dot, on the graph.

According to the Lewin Group, victims of car accidents need from 4 to 40 units of red blood cells when receiving medical treatment. Based on their data, the range of units required is 36. What do you think the standard deviation is? (Source: The Lewin Group, Inc. cited Jeffrey McCullough, M.D., Center for Molecular and Cellular Therapy, University of Minnesota.)

The answer is c. The standard deviation cannot be larger than 36 because the standard deviation has to be smaller than the range. The standard deviation cannot be the same as the range, because only if all the data points have the same value can the standard deviation and range be the same. The standard deviation cannot be less than 0 because standard deviation is never negative. Therefore, for this set of data, the standard deviation must be less than 36.

Refer to the stem plot above. How many stems are represented in this stem plot?

There are a total of 6 stems in this stem plot.

What is the range of pulse rates in this data set?

The range is the difference between the smallest and greatest values (minimum and maximum) of a data set. The minimum value is 62; the maximum is 98. 98−62=36

As displayed in the graph below, in a bell-shaped curve, also known as a normal distribution, the Standard Deviation Rule* states

that 68% of the data will fall within 1 standard deviation of the mean, 95% of the data will fall within 2 standard deviations of the mean, and 99.7% of the data will fall within 3 standard deviations of the mean. A greater standard deviation means that the data is more spread out.

To calculate the mean,

the values in a data set are simply added together and divided by the number of available values. Let's use the ten heights from the example on the previous page. Step 1: To find the mean, the first step is to add all of the values together. 172.7+168.3+182.9+167.6+189.2+177.8+185.4+166.4+193.7+165.1=1769.1. Step 2: Next, you divide that sum (1769.1) by the number of values in the data set (10). 1769.1÷10=176.91 The mean of the above data set equals 176.91.
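
The two steps can be checked with a small Python sketch. The function name `mean` and the variable `heights_cm` are chosen for this example; the result is rounded to two decimals because sums of decimal values can carry tiny floating-point error.

```python
heights_cm = [172.7, 168.3, 182.9, 167.6, 189.2, 177.8, 185.4, 166.4, 193.7, 165.1]

def mean(values):
    """Add all of the values together, then divide by how many there are."""
    return sum(values) / len(values)

print(round(mean(heights_cm), 2))
```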

Dot plot

Type of data: quantitative. Useful to display: the distribution of data, particularly clusters, gaps, and outliers*. Most useful for smaller data sets.

valid data* is

valid data* is data resulting from a test that accurately measures what it is intended to measure. For instance, if a test reflects an accurate measurement of a student's abilities, it is said to be valid.

Quartiles*

values that divide a data set into four equally sized groups

Symmetric Distributions

A symmetric distribution is a common type of frequency distribution. As the name indicates, symmetric distributions are symmetrical, with the left half of the histogram being roughly identical to the right half. In other words, if you cut the histogram in half, each side would be a near-perfect mirror image of each other. This symmetry*, or type of distribution, is illustrated in the histogram below. Notice how the middle value(s) is the most frequent. The values decrease in a symmetrical manner, to the right and left of the center of the histogram.

Bar Chart

A bar chart displays categorical data that is distributed over groups or categories. Bar charts are a useful way to compare data among categories. For example, a bar chart would be an appropriate graphical display to show how many people are from each state, as states are an example of discrete categories (categorical data). Rather than pieces of a pie, bar charts graphically illustrate data using bars. There is a bar for each category. The height of the bar is determined by the number of values in that category. The height could also represent the relative frequency* or the percentage. Here is an example of a bar chart to represent the number of sales for Company XYZ by each month. The categorical variable of the months of the year is along the x-axis. The y-axis represents how many sales were completed in that month.

examples of quantitative and qualitative data

Healthcare statistics such as age, height, weight, blood pressure, and blood cholesterol level are examples of quantitative (numerical) data. Healthcare statistics such as gender, ethnicity, marital status, and eye color are examples of qualitative (categorical) data.

If you have difficulty determining how many intervals to use, you can refer to the rough guidelines laid out in the chart below:

Fewer than 50 measurements: divide the data into 5 to 7 intervals.
50 to 100 measurements: divide the data into 6 to 10 intervals.
100 to 250 measurements: divide the data into 7 to 12 intervals.
Greater than 250 measurements: divide the data into 10 to 20 intervals.
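
The guideline can be written as a small lookup, sketched below in Python. The function name `suggested_intervals` is made up for this example. The chart's ranges overlap at exactly 100 measurements, so the code assumes that the lower range wins at that boundary.

```python
def suggested_intervals(n_measurements):
    """Return a rough (fewest, most) number of histogram intervals for n measurements."""
    if n_measurements < 50:
        return (5, 7)
    if n_measurements <= 100:   # assumption: exactly 100 falls in the 50-to-100 row
        return (6, 10)
    if n_measurements <= 250:
        return (7, 12)
    return (10, 20)

for n in (18, 75, 200, 1000):
    print(n, suggested_intervals(n))
```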

Refer to the histogram above. How many more people have a cholesterol level that falls between 220 and 240 than people that have a cholesterol level that falls between 160 and 180 in this sample?

The number of people that have a cholesterol level that falls between 220 and 240 in this sample is equal to 90. The number of people that have a cholesterol level between 160 and 180 is 20. 90−20=70. Therefore, there are 70 more people.

Refer to the histogram above. How many adults in this sample have a cholesterol level that is less than 180 ?

There are 30 adults in this sample that have a cholesterol level that is less than 180. 10 have a level that is less than 160, and 20 have a level that falls between 160 and 180.

A histogram is a graph that displays quantitative data. The vertical bars in a histogram show the counts or numbers in each

interval* (a set of numbers between two specific values). A comparison of the intervals, or a review of the graph as a whole, helps the audience understand the information presented.

Refer to the stem plot above. What is the minimum value in the data set?

18. The minimum value in this stem plot is 18, represented by the first value in the top row of the stem plot.

The nurse manager at Dr. Winston's practice is surveying patients with regard to their level of satisfaction with the time spent waiting for their appointment. The dot plot below illustrates the survey results for last Friday. Based on the dot plot, how many patients responded to the survey on this particular Friday?

21. The number of patients who responded is equal to the number of dots in the graph.

Refer to the dot plot above. What cholesterol level was most often recorded among this group of patients?

50

The distinction between a histogram and a bar chart is an important distinction to make.

As previously discussed, a bar chart displays categorical data that is distributed over groups or categories, while a histogram displays how quantitative data is distributed over various intervals. For example, a histogram would be appropriate to display how many people fall in various intervals of heights, as height is an example of quantitative data. In other words, a histogram is used to display frequencies or relative frequencies for quantitative data; in contrast, a bar chart is used to display frequencies (i.e., counts) or relative frequencies for categorical data.

Using a pie chart for the same blood typing data

Additional observations: We can make some additional observations about the data based on these displays. The most common type of blood is Type O+ (38%), followed by Type A+ (34%), Type B+ (9%), and Type O- (7%). Rh+ (84%) is more common than Rh- (16%).

Skewed Distribution

Distributions can also be asymmetric. A skewed distribution* is the term used to describe a distribution that has a "long tail" on one side of the peak. In other words, the distribution is lopsided and does not have a symmetrical shape. Skewness* is used to measure the asymmetry of a distribution. When more data falls further to the left of the peak, it is known as skewed left*. This type of distribution is also referred to as negatively skewed*.

Dot plots: example and description

A dot plot shows each data value as a point, distributed along a horizontal axis. Dot plots are useful because they show the distribution of a data set, as every data value is represented by a dot. The table below shows the number of emergency room visits each day for 20 days. Construct a dot plot for the data.
Step 1: Arrange the data set in order from least to greatest. 2,13,14,20,23,25,31,32,32,32,32,33,36,43,44,44,45,51,52,57
Step 2: Draw a horizontal axis with a scale* from 0 to 60.
Step 3: Place a dot above the horizontal axis for each data point in the table. Here, each "dot" is represented by the letter x. Any repeated values (such as 32 or 44, which have red boxes around them) should be represented by a mark for each value, stacked vertically.
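
A text-only version of this dot plot can be sketched in Python. For simplicity it prints one row per distinct value with the marks running sideways instead of stacking them above a horizontal axis. The function name `dot_plot_rows` is made up for this example.

```python
from collections import Counter

er_visits = [2, 13, 14, 20, 23, 25, 31, 32, 32, 32, 32, 33, 36,
             43, 44, 44, 45, 51, 52, 57]

def dot_plot_rows(values):
    """Return one text row per distinct value, with one 'x' per occurrence."""
    counts = Counter(values)
    return [f"{value:>3} | {'x' * counts[value]}" for value in sorted(counts)]

for row in dot_plot_rows(er_visits):
    print(row)
```

Repeated values such as 32 and 44 show up as longer rows, which is the text equivalent of a taller stack of dots.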

Graphical Displays for Quantitative Data

Dot plot, stem plot, box plot, histogram

Refer to the histogram above. How many people have heights between 140 cm and 170 cm?

From the histogram we can determine that there are 2 adults that have a height between 140 cm and 149 cm, there are 0 adults with a height between 150 cm and 159 cm, and there are 8 adults with heights between 160 cm and 169 cm. Therefore, there are 10 total adults with heights between 140 cm and 169 cm.

Refer to the histogram above. How many adults are in the group represented by this histogram?

From the histogram, we can determine that there are 18 total adults represented in this distribution.

Refer to the histogram above. How many intervals are in this histogram?

From the histogram, we can determine there are 5 intervals. They are: 140−149, 150−159, 160−169, 170−179, and 180−189.

Refer to the stem plot above. Is the value 34 contained in this data set? (Enter Yes or No)

From the stem plot there is a stem labeled 3, and within this stem there is a leaf labeled 4. Therefore the value 34 is included in this data set.

Nursing Connections: Histograms in Nursing

Healthcare facilities such as hospitals typically track the average length of stay of their patients. Histograms may be used to monitor the average length of stay of patients over a given time period. This information is typically shared at staff meetings throughout the facility. Staffing, budgets, and supplies often are determined based on a unit's average length of stay.

Histograms vs. Bar Charts

Histogram: displays frequencies or relative frequencies for quantitative data
Bar Chart: displays frequencies (i.e., counts) or relative frequencies for categorical data

example graph of a histogram

Histograms allow team members and stakeholders to view a significant amount of data at one time, and to see how data is distributed across various intervals of values. The histogram's bars represent the values or intervals in the study. The height of each bar shows how many observations or events fall into each interval. The shape of the graph illustrates how the data is distributed.

Even Number of Values in a Data Set

Now, let's say we have a data set with only 14 values. In this data set, the median would fall in between 72 and 73 (indicated by the red line). In the case of an even number of total values, the median is the halfway point between the two middle values. This median can be calculated by adding those two middle values and dividing by two. (72+73)÷2 = 145÷2 = 72.5 So the median of the second data set is 72.5.

Nursing Connections: Dot Plots and Stem Plots in Nursing

Nurse managers have the responsibility for collecting a wide variety of information about their units. Dot plots and stem plots may be used to visualize some of this collected information. The number of patients admitted with certain medical diagnosis or the number of patients admitted by particular medical providers often are documented in a dot or stem plot. This information is typically tracked to help with budgeting and staffing of units.

Pie Charts

Pie charts, or circle graphs, are often used to show data as parts of a circle. A pie chart has sections or slices. Each slice represents a category of data, and the size of each slice corresponds to the share of the total as a percent. The sum of all of the percents should add up to 100% (or close to it because of rounding.) To create a pie chart, the percentage of the whole that each category represents must be calculated from the raw data. Consider the following data distribution from a sample population representing the number of hours exercised per week.
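As a rough sketch of the percentage step, the snippet below converts raw counts into each category's share of the whole. The category labels and counts are hypothetical (the original table is not reproduced here), and `to_percentages` is an illustrative helper name.

```python
# Hypothetical counts of people by hours exercised per week (not from the source).
counts = {"0-2 hours": 10, "3-5 hours": 25, "6-8 hours": 10, "9+ hours": 5}

def to_percentages(counts):
    """Convert raw category counts to percentages of the total, rounded to 0.1."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1)
            for category, n in counts.items()}

percentages = to_percentages(counts)
for category, pct in percentages.items():
    print(f"{category}: {pct}%")
```

Each percentage determines the size of that category's slice, and the percentages should sum to 100% apart from small rounding differences.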

Communicating Calculations

Previously, we've roughly estimated the center, spread, and possible outliers of data. For more precise results, we calculate these measures. Reliable and valid data calculations give rise to more accurate and precise results. This precision allows for additional graphical displays for quantitative data. Rather than rough estimations, we can display and report precise figures that measure spread, center, and other summary values of data.

Box Plot

Type of data: quantitative. Useful for displaying: the center*, spread, and outliers in a given data set.

Histogram

Type of data: quantitative. Useful for displaying: the distribution (shape* and spread) of quantitative data.

Types of Data

Quantitative data*
Categorical data*, also called qualitative data

Interquartile Range

Quartiles are widely used measures when dealing with data sets. Quartiles* are values that divide a data set into four equally sized groups. There is one median per data set that splits the data into two equally sized groups. Similarly, a data set has three quartiles that split the data into four equally sized groups. The interquartile range* measures the difference between the third quartile and the first quartile. To illustrate how to find the first and third quartiles, we will use the data from a research poll that asked individuals how many alcoholic drinks they have per week.
Example: Polling 12 people about how many alcoholic drinks they have per week might yield the following data: 9, 1, 7, 5, 4, 3, 6, 2, 9, 1, 9, 4
First, sort the data from least to greatest: 1, 1, 2, 3, 4, 4, 5, 6, 7, 9, 9, 9
Then separate the sorted data into four equal groups. The first quartile falls between the lowest quarter of the data and the top three quarters. The second quartile is the median. The third quartile falls between the highest quarter of the data and the bottom three quarters. As we have 12 values in our data set, placing the data into four quarters results in three data values in each quarter. Values that occur more than once are included as many times as they occur. The interquartile range is an indicator of the spread of a sample and can also help identify outliers. Outliers are data points (numbers) that are far away from the other data points. It is helpful to identify any outliers and determine whether they should be used.
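A minimal sketch of the quartile calculation for the polling data follows. It assumes the "median of each half" convention, in which the first quartile is the median of the lower half of the sorted data and the third quartile is the median of the upper half. Other textbooks and software use slightly different conventions and may give different values. The function name `quartiles` is illustrative.

```python
from statistics import median

# Drinks per week reported by the 12 people polled (from the example).
drinks = [9, 1, 7, 5, 4, 3, 6, 2, 9, 1, 9, 4]

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd number of values, the middle value is left out of both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles(drinks)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

Under this convention the poll data gives Q1 = 2.5, a median of 4.5, Q3 = 8, and an interquartile range of 5.5.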

The histogram above is known as which of the following? A. Bimodal B. Skewed left C. Uniform D. Skewed right

The answer is c. The histogram above displays a uniform distribution.

Range

Range is the difference between the smallest (minimum) and greatest (maximum) values of a data set. The minimum is the smallest value in a data set. The maximum is just the opposite; it's the greatest value in a data set. Minimum, maximum, and range are measurements often used to bring clarity and scope to a set of values.
Example: Below is a set of data. 18, 11, 3, 26, 13, 40, 31, 5, 12, 45, 52, 22, 17, 33, 8
To find the minimum, maximum, and range, it is helpful to first sort the values from smallest to largest. 3, 5, 8, 11, 12, 13, 17, 18, 22, 26, 31, 33, 40, 45, 52
After the data is sorted, the minimum and maximum fall at either end of the data set. In the data set shown, the minimum is 3 and the maximum is 52. To find the range, simply subtract the minimum from the maximum. Range = maximum − minimum = 52 − 3 = 49. The range of the above data set equals 49.
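The same calculation can be checked with a short Python sketch; the variable names are illustrative.

```python
# The data set from the range example.
values = [18, 11, 3, 26, 13, 40, 31, 5, 12, 45, 52, 22, 17, 33, 8]

# Sorting places the minimum first and the maximum last.
ordered = sorted(values)
minimum, maximum = ordered[0], ordered[-1]

# Range = maximum - minimum
data_range = maximum - minimum
print(minimum, maximum, data_range)
```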

Reliable data*

Reliable data* is both consistent and repeatable. If you were to administer the same test to the same person three times and the scores were similar each time, the test could be categorized as reliable. If the results varied greatly, the test would be unreliable.

skewed right*,

Similarly, a data set can be skewed right*, when more data falls further to the right of the peak of the histogram. This type of distribution is also referred to as positively skewed*.

statistics*

Statistics provides the tools that are used to analyze and summarize large quantities of numerical data. With statistics, we can interpret and present data in a concise, comprehensive, and easily understandable way.

Steps to find the median

Step 1: Sort the data from lowest value to greatest value.
Step 2: Count the total number of values in the data set. If there is an odd number of values, the halfway point will fall directly on a value, and that value is the median.
64, 66, 69, 69, 70, 72, 72, 72 (this data point has a red circle drawn around it), 73, 74, 75, 76, 79, 80, 81
There are 15 values in total, which means the median, or halfway point of the data set, is 72. (There are seven values below the circled 72 and seven values above it.)
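The steps can be sketched as a small Python function. The function name `find_median` is illustrative. The even-length branch follows the usual rule of averaging the two middle values, and the 14-value list used to show it is an assumption made for illustration, not data from the source.

```python
def find_median(values):
    """Sort the values, count them, and return the halfway point."""
    ordered = sorted(values)           # Step 1: sort lowest to greatest
    n = len(ordered)                   # Step 2: count the values
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]         # odd count: the middle value itself
    # even count: halfway between the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

# The 15 values from the example.
values_15 = [64, 66, 69, 69, 70, 72, 72, 72, 73, 74, 75, 76, 79, 80, 81]
print(find_median(values_15))

# Assumed 14-value variant (one 72 removed) to show the even case.
values_14 = [64, 66, 69, 69, 70, 72, 72, 73, 74, 75, 76, 79, 80, 81]
print(find_median(values_14))
```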

Data that is skewed left is: A. Negatively skewed B. Positively skewed C. Symmetric D. Normally distributed

The answer is a. Data that is skewed left is negatively skewed.

A histogram with a symmetric distribution that has a "valley" rather than a peak is described as: A. Unimodal B. U-shaped C. Uniform D. Bell shaped

The answer is b. A histogram with a symmetric distribution that has a "valley" rather than a peak is described as U-shaped.

Data that is skewed right is: A. Negatively skewed B. Positively skewed C. Symmetric D. Normally distributed

The answer is b. Data that is skewed right is positively skewed.

Which of the following does not describe a symmetric histogram? A. Bell-shaped B. U-shaped C. Positively Skewed D. Uniform

The answer is c. A symmetric distribution has a right and left half that are relatively equal. A positively skewed histogram would have two completely different halves.

Refer to the histogram above. The center of the data is approximately a. 45.0 mm Hg b. 55.0 mm Hg c. 65.0 mm Hg d. 75.0 mm Hg

The answer is c. The center of the data in this histogram is approximately 65.0 mm Hg.

What can we infer about this data? A. There are two peaks in the data set B. The data is negatively skewed C. The data is positively skewed D. All intervals in the distribution have the same number of observations

The answer is d. All intervals in the distribution have the same number of observations.

This histogram represents data that is: A. Skewed left, negatively skewed B. Skewed left, positively skewed C. Skewed right, negatively skewed D. Skewed right, positively skewed

The answer is d. This histogram represents data that is skewed right, or positively skewed.

mean

The mean* is one of the most useful measures of central tendency. The mean, also known as the average, is a single value that represents the center of a set of data values. The mean can be substantially influenced by one or more extreme values in a data set (think skewed data), so it is best used when the data is roughly symmetric. Therefore, we say that the mean is not a resistant measure of center. To calculate the mean, the values in a data set are simply added together and divided by the number of values. Let's use the ten heights from the example on the previous page. Step 1: To find the mean, the first step is to add all of the values together.
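To illustrate why the mean is not a resistant measure of center, here is a small sketch with hypothetical values (not the ten heights mentioned above, which are not reproduced here). Adding a single extreme value shifts the mean noticeably, while the median moves much less.

```python
from statistics import median

def mean(values):
    """Add all of the values together, then divide by how many there are."""
    return sum(values) / len(values)

# Hypothetical data, chosen only for illustration.
typical = [5, 6, 6, 7, 7]
with_extreme = typical + [15]   # the same data plus one extreme value

print(mean(typical), median(typical))
print(mean(with_extreme), median(with_extreme))
```

Here the extreme value raises the mean from 6.2 to about 7.67, while the median only moves from 6 to 6.5.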

What is the median of the set of values above?

The median is the middle value of a data set. After sorting the above values from lowest to greatest, the two middle values are 0 and 2. So, the equation reads (0+2)÷2 = 2÷2 = 1.

Limitations of Mode

The mode has the following important limitation: There can be more than one! In fact, if all of the intervals contained the same number of scores, every value would be the mode, rendering that measure useless. For this reason, you should always graph your data. The other two measures of central tendency, median and mean, may not tell you this information about a data set's distribution as well as a graph. Below are three graphs, each with a mean of approximately 3.33. Notice the difference in shape. The graph on the left is unimodal*, the graph in the middle has a uniform distribution, and the graph on the right is bimodal.

What is the mode of the set of values above?

The mode is the value in a dataset that occurs most frequently. In the above dataset, 7 appears twice, (more than any other value) so the mode is 7.
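As a short sketch, Python's standard library can find the mode, including the case where several values tie. The data below is hypothetical, chosen so that 7 appears twice; the original set of values is not reproduced here.

```python
from statistics import multimode

# Hypothetical values in which 7 occurs twice and everything else once.
values = [3, 7, 1, 7, 9]
modes = multimode(values)      # every value tied for most frequent

# A data set can have more than one mode.
tied = multimode([1, 1, 2, 2, 3])

print(modes, tied)
```

`multimode` returns a list so that ties are not hidden: the first data set has the single mode 7, and the second has two modes, 1 and 2.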

mode

The mode* is the third and final measurement of central tendency. The mode represents the value that occurs most often in a data set. The mode is only relevant if a data set has values that are repeated, and unlike mean and median, there can be more than one mode in a data set. Below is a set of values.

Graphical Displays: Describing Distributions

The shape of the graphical display
The spread of the data
The maximum and minimum values
Any values that could be outliers

Refer to the histogram above. What is the size (range) of each of the intervals in this histogram?

The size (range) of each of the intervals in this histogram is 20.

What is the sum of the set of values above?

The sum of the values above is equal to 8. You need to be able to calculate the sum in order to calculate the mean of a dataset.

Describing the Distribution of Categorical Data

There are a variety of different ways to describe the distribution of a categorical variable. Consider the following bar chart that illustrates a distribution of blood types and Rh factor:
Bar Chart Summary
The bar chart shows the frequency distribution* for the categorical variable. In the above chart, the categorical variable is Blood Type.

When constructing a histogram, whether or not you include the minimum and maximum values is left to your discretion. True or False?

This is a false statement. A histogram needs to encompass all of the data collected, therefore both the minimum and the maximum values should be represented in the graph.

The first step in building a dot plot is to draw a horizontal scale. True or False?

This is a false statement. In order to determine the scale of the horizontal axis, we need to know what the maximum and minimum values are, so the first step should be arranging the data in increasing order.

Refer to the histogram above. The distribution of this histogram is positively skewed. True or false? (Enter True or False)

This is a false statement. The distribution of this histogram is negatively skewed as the long tail of the graph is to the left of the peak.

Refer to the stem plot below. The number of leaves in this stem plot is 6. True or False?

This is a false statement. The leaves in a stem plot are on the right side of the plot. Therefore, the number of leaves in this stem plot is 66.

A left skewed distribution is where the tail of the distribution is on the right-hand side, and the tail is longer than the left-hand side. True or False?

This is a false statement. This describes a right skewed distribution.

To create a histogram,

To create a histogram, data is collected using a check sheet*. It is necessary to decide what intervals, or values, you are going to use and note how the data is distributed. The intervals in a histogram need to encompass all of the data collected. It is important to make sure the minimum* and maximum* values are accounted for, as well as every value in between. Additionally, make sure that the intervals are comparable and exhaustive. The intervals should run consecutively so that all data is accounted for and the visual representation of the graph is accurate. It is also important to use an appropriate number of intervals;
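The counting behind a histogram can be sketched as below. This is an illustration under stated assumptions: it reuses the 20 emergency room visit counts from the dot plot example and picks six consecutive intervals of width 10 (0-9, 10-19, and so on up to 50-59). The function name `count_by_interval` is made up for this sketch.

```python
def count_by_interval(values, start, width, n_intervals):
    """Count how many values fall into each consecutive, equal-width interval."""
    counts = [0] * n_intervals
    for value in values:
        index = (value - start) // width
        # The intervals must cover every value, including the minimum and maximum.
        if not 0 <= index < n_intervals:
            raise ValueError(f"{value} is not covered by the intervals")
        counts[index] += 1
    return counts

# Emergency room visits per day for 20 days (from the dot plot example).
visits = [2, 13, 14, 20, 23, 25, 31, 32, 32, 32, 32, 33,
          36, 43, 44, 44, 45, 51, 52, 57]

counts = count_by_interval(visits, start=0, width=10, n_intervals=6)
for i, count in enumerate(counts):
    low = i * 10
    print(f"{low:>2}-{low + 9:<2} | " + "x" * count)
```

Each count becomes the height of one bar. Raising an error for an uncovered value reflects the requirement that the intervals account for all of the data.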

Nursing Connections: The Importance of Recognizing Normal versus Skewed Distributions in the Field

Understanding normal versus skewed distributions of data is an important concept for nurses to comprehend. Data does not always fall in a perfect bell curve. A skewed distribution does not mean that there is something wrong with the data. The skewed information may show that the information falls to the right or left of the mean, which often is a point of significance. For example, consider that you are a nurse manager and you are collecting information on how long it takes a nurse to complete a task. You find that one nurse takes 15 minutes to complete the task, while five other nurses take between 5 and 7 minutes to complete the same task. The data you have collected is not normally distributed but rather skewed. This analysis would indicate that you may need to investigate why it took the one nurse so much longer to complete the task.

Importance of Center and Spread

When describing quantitative data, center and spread are two important characteristics. The center of a set of quantitative data is a point that represents the "middle" of the data. As we will see, this can be measured in many different ways. There are also multiple measurements that are used to describe spread. Spread, in general, is a way to describe the dispersion of quantitative data. Is all of the data clustered around one point, or is it spread out? Consider the following data set of systolic blood pressures:

Quantitative data*

also called numerical data, consists of data values that are numerical, representing quantities that can be counted or measured.

Categorical data*

also called qualitative data*, consists of data that are groups, such as names or labels, and are not necessarily numerical. It is possible for numbers to be used as categorical data. For example, the numbers on the uniforms of basketball players are categorical, because they are used to identify players, and they do not measure a quantity. Zip codes are another example of numbers that do not measure quantity but are used to categorize different locations by the postal system.

Refer to the dot plot above. Is the value of 0 in the data set? (Enter Yes or No)

no

numerical summary*

A numerical summary* is a number used to describe a specific characteristic of a data set.

Graphical Displays for Categorical Data

Pie chart: type of data is categorical; useful for displaying different parts of a whole.
Bar chart: type of data is categorical; useful for displaying counted frequencies for the categories.

A histogram that is skewed right, or positively skewed indicates

that more values are far greater than the most common value, but not far less. A histogram that is skewed left, or negatively skewed, indicates that more values are far less than the most common value, but not far greater. Nursing/healthcare literature tends to use the terms positively and negatively skewed rather than right and left skewed, so it is important that you are familiar with both ways to describe these distributions.

A histogram that has more than two modes is known as: A. Unimodal B. Uniformly distributed C. Multimodal D. Symmetric

The answer is c. A histogram with more than two modes is known as multimodal.

