CS995 - Descriptive Statistics for a Single Variable


Spread

Describes the dispersion of quantitative data: are the values clustered around one point, or are they spread out?

Histogram

Used for quantitative data. Shows the distribution (shape and spread) of the data.

Box Plot

Used for quantitative data. Shows the center, spread, and outliers of a data set.

Standard Deviation Rule

For normally distributed data: about 68% of the data fall within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations. A greater standard deviation means the data are more spread out.
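The three percentages can be checked numerically. The sketch below assumes the data follow a normal distribution and uses Python's standard `statistics.NormalDist`; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

def share_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    # Assumption: the data are normal. The rule is scale-free, so the
    # standard normal (mean 0, standard deviation 1) is enough.
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The printed shares come out close to 0.68, 0.95, and 0.997, which is where the rounded figures in the rule come from.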

Data Distribution - Skewed Distributions

An asymmetric distribution with one long tail. Skewed left (negatively skewed): the long tail extends to the left of the peak. Skewed right (positively skewed): the long tail extends to the right of the peak.

What would be the best type of graph to use to display the age of all employees in a particular division in a company? a) Bar chart b) Histogram c) Scatterplot d) Pie chart

Feedback: The correct answer is b. This is quantitative data that will be grouped into ranges or bins. Therefore, a histogram is the best choice to display this data.

Anatomy of Box Plot

Four parts: a whisker, two rectangles (the box), and another whisker. Note: regardless of its size, each part represents about 25% of the data. A box plot is a convenient way of showing five important values: minimum, first quartile, median, third quartile, and maximum.

Histogram vs. Bar Chart

Histogram: displays frequencies or relative frequencies for quantitative data (for example, how many people fall into each interval of heights). Bar chart: displays frequencies for categorical data (for example, how many people come from each country).
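The difference shows up in how the counts behind each chart are produced. This is a minimal sketch with made-up sample data; the function names and the 10 cm bin width are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical sample data, invented for this example.
heights_cm = [150, 155, 162, 168, 171, 175, 181]
countries = ["US", "PH", "US", "CA"]

def histogram_counts(values, bin_width):
    """Count how many values fall into each interval [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def bar_chart_counts(labels):
    """Count how many times each category label appears."""
    return dict(Counter(labels))

print(histogram_counts(heights_cm, 10))  # keys are the starting values of the bins
print(bar_chart_counts(countries))       # keys are the categories themselves
```

A histogram needs a bin width because its axis is numeric; a bar chart has no bins, only one bar per category.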

Extreme values in Symmetric Distribution

If a distribution is symmetric, the extreme values on the low end and the high end lie at roughly similar distances from the center.

Five Number summary

Lists the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum of a data set.
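A five-number summary can be sketched in a few lines. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of Python's `statistics.quantiles`, which may not match the hand method taught in a given course. The function name is hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

These five values are the ones a box plot draws: the ends of the whiskers, the edges of the box, and the line inside it.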

Bimodal/multimodal

Bimodal: a distribution with two clear peaks rather than one. Multimodal: a histogram with two or more clear peaks.

Mean

Also known as the average. A single value that represents the center of a set of data values. Best used when the data are roughly symmetric, because the mean is not a resistant measure of center: extreme values pull it toward them.
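A small made-up example shows what "not resistant" means in practice. Replacing one value with an extreme one moves the mean a long way while the median stays put. The data sets below are invented for illustration.

```python
from statistics import mean, median

# Hypothetical data: the second list replaces the largest value with an extreme one.
symmetric = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 100]

print(mean(symmetric), median(symmetric))        # both measures agree
print(mean(with_extreme), median(with_extreme))  # only the mean shifts
```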

1.5 IQR Criterion Rule for Outliers

An outlier is any point that lies more than 1.5 × IQR above Q3 or below Q1, where IQR = Q3 - Q1. Upper cutoff: Q3 + 1.5 × IQR. Lower cutoff: Q1 - 1.5 × IQR.
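The criterion can be applied mechanically. The sketch below assumes quartiles are computed with the "inclusive" method of `statistics.quantiles` (other conventions can give slightly different cutoffs); the function name and sample data are made up for illustration.

```python
from statistics import quantiles

def iqr_outliers(values):
    """Return (lower cutoff, upper cutoff, outliers) using the 1.5 x IQR criterion."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_cutoff = q1 - 1.5 * iqr
    upper_cutoff = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower_cutoff or x > upper_cutoff]
    return lower_cutoff, upper_cutoff, outliers

print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 30]))
```

For this sample, Q1 = 3 and Q3 = 7, so IQR = 4 and the cutoffs are 3 - 6 = -3 and 7 + 6 = 13; only 30 lies outside them.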

Reliable data and valid data

Reliable data are consistent and repeatable. Valid data result from a test that accurately measures what it is intended to measure.

Data Distribution - Symmetric Distributions

A common type of frequency distribution in which the left half of the histogram is roughly a mirror image of the right half. Note: a symmetric histogram is not necessarily normal.

Dot plot

Used for quantitative data. Shows the distribution of the data (clusters, gaps, outliers). Useful for smaller data sets.

Stem Plot (Stem and leaf plot)

Used for quantitative data. Shows the distribution or shape of the data according to place values.
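A text stem plot is simple enough to build by hand. This is a minimal sketch that assumes non-negative whole numbers, with the tens digits as stems and the ones digit as the leaf; the function name and sample data are hypothetical.

```python
def stem_plot(values):
    """Return the lines of a stem-and-leaf plot (stem = tens, leaf = ones)."""
    # Assumption: values are non-negative integers.
    ordered = sorted(values)
    lines = []
    # Include every stem between the smallest and largest so gaps stay visible.
    for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1):
        leaves = [str(v % 10) for v in ordered if v // 10 == stem]
        lines.append(f"{stem} | {' '.join(leaves)}".rstrip())
    return lines

for line in stem_plot([23, 12, 30, 15, 21, 23, 58]):
    print(line)
```

Every original value can be read back from the plot, and the row lengths trace the shape of the distribution.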

Median

The halfway point of a set of values. Use the median when the data are skewed; the median is a resistant measure of center. To find the median:
1. Sort the data from smallest to largest.
2. If the number of values is odd, the middle value is the median.
3. If the number of values is even, find the two center values and divide their sum by 2.
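The procedure for finding the median translates directly into code. This is a minimal sketch; the function name `find_median` is made up, and Python's built-in `statistics.median` does the same job.

```python
def find_median(values):
    """Return the median by sorting and taking the middle value(s)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)            # sort from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # odd count: the middle value
        return ordered[mid]
    # even count: average the two center values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(find_median([7, 1, 3]))
print(find_median([4, 1, 3, 2]))
```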

Standard deviation

tells how far, on average, the data points are from the mean. Used for symmetric data
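One common way to compute it is sketched below. The example assumes the population form (divide by n); many courses and tools use the sample form instead (divide by n - 1), which Python exposes as `statistics.stdev`. The function name and data are made up for illustration.

```python
from math import sqrt
from statistics import pstdev, stdev

def population_sd(values):
    """Square root of the average squared distance from the mean."""
    # Assumption: population form, dividing by n rather than n - 1.
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sqrt(sum(squared_deviations) / len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
print(population_sd(data))  # hand-rolled population form
print(pstdev(data))         # standard-library population form
print(stdev(data))          # sample form, slightly larger
```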

Range

the difference between the smallest and greatest values of a data set

Interquartile range

measures the difference between the third quartile and the first quartile
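Comparing the range with the interquartile range on made-up data shows why the IQR is often preferred when extreme values are present. The sketch assumes the "inclusive" quartile method of `statistics.quantiles`; the function names and data are hypothetical.

```python
from statistics import quantiles

def value_range(values):
    """Greatest value minus smallest value."""
    return max(values) - min(values)

def interquartile_range(values):
    """Third quartile minus first quartile."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

typical = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_extreme = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # one extreme value

print(value_range(typical), interquartile_range(typical))
print(value_range(with_extreme), interquartile_range(with_extreme))
```

The single extreme value stretches the range but leaves the IQR unchanged, because the IQR depends only on the middle half of the data.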

Extreme values in Skewed Distribution

In a skewed distribution, values in a histogram that come after a gap are referred to as extreme values and are possible outliers.

Mode

The value that occurs most often in a data set. A data set can have more than one mode.
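Because a data set can have several modes, a helper that returns all of them is safer than one that returns a single value. This minimal sketch relies on `statistics.multimode` from the Python standard library (available since Python 3.8); the wrapper name `modes` is made up.

```python
from statistics import multimode

def modes(values):
    """Return every value that ties for the highest frequency."""
    return multimode(values)

print(modes([5, 1, 5, 2]))        # a single mode
print(modes([1, 2, 2, 3, 3, 4]))  # two modes
```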

Quartiles

The three values (Q1, Q2, and Q3) that split a data set into four equally sized groups.

A study was conducted on the number of attendees each day at the state fair. You are asked to recommend a method for displaying the data graphically so that the shape of the data can be seen, and each data value is also visible. What would be the best choice among the following? (Enter the letter that corresponds with your choice.) a. Bar chart b. Histogram c. Scatter plot d. Stem plot

The answer is d. A stem plot is the best choice because it shows both the shape of a data set and each individual data value.

Extreme values in Skewed Distribution - Preferred measures of center and the measures of spread for normal and skewed distributions

Distribution | Measure of center | Measure of spread
Skewed | Median | Range or IQR
Normal (symmetric) | Mean | Standard deviation

A marketing researcher was investigating residential water usage in a metropolitan area for a report she was putting together for a client. She polled individual households and asked them to report their average monthly water bill. The lowest average monthly water bill was $35.17 and the highest average monthly water bill was $153.20. When presenting the data, she did not want the decimal values to get lost. What display would you suggest she use? a. Histogram b. Pie chart c. Stem plot d. Box plot

The answer is c. A stem plot is a good choice as you can see the distribution of the data and the values are preserved.

You are designing a study of the number of hours worked by financial analysts working at a particular firm. You are especially interested in knowing if there are any outliers in the data, as well as the median number of hours worked and the approximate distribution of the data. Which graphical display would satisfy your needs? a. Bar chart b. Histogram c. Stem plot d. Box plot

The answer is d. A box plot is a good display to use to show the shape of a data set, as well as outliers (if any).

It is important to start the frequency scale on a bar chart at which value to be certain not to overemphasize a difference in values? a) Zero b) The lowest measured value c) The smallest frequency d) As long as the scale is even, it doesn't matter where you start it.

The correct answer is a. A vertical scale that does not start at zero can exaggerate the differences in a data set.

You are a professional trainer at a local sports academy. You ask your athletes to determine the number of grams of protein they consume for a particular meal. Which of the following would be the best choice to illustrate the shape of the data you collect? a) Bar chart b) Pie chart c) Box plot d) None of the above

The correct answer is c. Because the data you are collecting are quantitative, a box plot is the best of the choices given for illustrating the shape of the data.

Of the following sets of data, which would you assume should have the smallest range? a) Price in dollars ($) of penny stocks currently being traded over-the-counter through the OTC Bulletin Board. b) Ages of stockbrokers currently on the trading floor. c) The number of trades on the NYSE on any given day. d) The ages of interns currently in the college summer internship program.

The correct answer is d. 21 years is the average age of a college student in the summer internship program. The variation from this age is generally plus or minus one or two years. Therefore, we can assume the data set would be 19, 20, 21, 22, 23 years of age. The range would be equal to 23 - 19 = 4. Each of the other options has a far greater probability of having data that is more spread out.

What is the best type of graph to use where it is easiest to estimate outliers? a) Stem plot b) Histogram c) Dot plot d) Box plot

The correct answer is d. Outliers are determined by Q1 and Q3, which are clearly shown on a box plot. The outliers themselves are also displayed on the box plot.

