Statistics 2

Looking at the distribution of data can reveal a lot about the relationship between the mean, the median, and the mode. There are three types of distributions. A left (or negative) skewed distribution has a shape like Figure 2.17. A right (or positive) skewed distribution has a shape like Figure 2.18. A symmetrical distribution looks like Figure 2.16.

Histogram

a graphical representation in x-y form of the distribution of data in a data set; x represents the data and y represents the frequency, or relative frequency. The graph consists of contiguous rectangles.

Mean

a number that measures the central tendency of the data; a common name for mean is 'average.' The term 'mean' is a shortened form of 'arithmetic mean.' By definition, the mean for a sample (denoted by $\bar{x}$) is $\bar{x} = \frac{\text{Sum of all values in the sample}}{\text{Number of values in the sample}}$, and the mean for a population (denoted by $\mu$) is $\mu = \frac{\text{Sum of all values in the population}}{\text{Number of values in the population}}$.

Interval

also called a class interval; an interval represents a range of data and is used when displaying large data sets

Outlier

an observation that does not fit the rest of the data

Frequency Polygon

looks like a line graph but uses intervals to display ranges of large amounts of data

Interquartile Range

or IQR, is the range of the middle 50 percent of the data values; the IQR is found by subtracting the first quartile from the third quartile.

Midpoint

the mean of an interval in a frequency table

Frequency

the number of times a value of the data occurs

Quartiles

the numbers that separate the data into quarters; quartiles may or may not be part of the data. The second quartile is the median of the data.

Relative Frequency

the ratio of the number of times a value of the data occurs in the set of all outcomes to the number of all outcomes

Median

a number that separates ordered data into halves; half the values are the same number or smaller than the median and half the values are the same number or larger than the median. The median may or may not be part of the data.

First Quartile

the value that is the median of the lower half of the ordered data set

Skewed

used to describe data that are not symmetrical; when the right side of a graph looks "chopped off" compared to the left side, we say the data are "skewed to the left." When the left side of the graph looks "chopped off" compared to the right side, we say the data are "skewed to the right." Alternatively: when the lower values of the data are more spread out, we say the data are skewed to the left. When the greater values are more spread out, the data are skewed to the right.

Mode

the value that appears most frequently in a set of data

Paired Data Set

two data sets that have a one-to-one relationship so that: both data sets are the same size, and each data point in one data set is matched with exactly one point from the other set.

A histogram is a graphic version of a frequency distribution. The graph consists of bars of equal width drawn adjacent to each other. The horizontal scale represents classes of quantitative data values and the vertical scale represents frequencies. The heights of the bars correspond to frequency values. Histograms are typically used for large, continuous, quantitative data sets. A frequency polygon can also be used when graphing large data sets with data points that repeat. The data usually go on the x-axis, with the frequency graphed on the y-axis. Time series graphs can be helpful when looking at large amounts of data for one variable over a period of time.

A stem-and-leaf plot is a way to plot data and look at the distribution. In a stem-and-leaf plot, all data values within a class are visible. The advantage of a stem-and-leaf plot is that all values are listed, unlike a histogram, which gives classes of data values. A line graph is often used to represent a set of data values in which a quantity varies with time. These graphs are useful for finding trends, that is, general patterns in data such as temperature, sales, employment, company profit, or cost over a period of time. A bar graph is a chart that uses either horizontal or vertical bars to show comparisons among categories. One axis of the chart shows the specific categories being compared, and the other axis represents a discrete value. Some bar graphs present bars clustered in groups of more than one (grouped bar graphs), and others show the bars divided into subparts to show cumulative effect (stacked bar graphs). Bar graphs are especially useful for categorical data.

Box plots are a type of graph that can help visually organize data. To graph a box plot the following data points must be calculated: the minimum value, the first quartile, the median, the third quartile, and the maximum value. Once the box plot is graphed, you can display and compare distributions of data.

The mean and the median can be calculated to help you find the "center" of a data set. The mean takes every value in the data set into account, but the median is usually the better measure of center when a data set contains several outliers or extreme values. The mode tells you the most frequently occurring datum (or data) in your data set. The mean, median, and mode are extremely helpful when you need to analyze your data, but if your data set consists of ranges that lack specific values, the mean may seem impossible to calculate. However, the mean can be approximated. Add the lower boundary of each interval to its upper boundary and divide by two to find the midpoint of the interval. Multiply each midpoint by the number of values found in the corresponding interval. Divide the sum of these products by the total number of data values in the set.
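The midpoint approximation described above can be sketched in Python. The frequency table here is hypothetical, and the function name `grouped_mean` is an assumption made for illustration; the result is only an estimate, because the individual values inside each interval are unknown.

```python
# Hypothetical frequency table: (lower boundary, upper boundary, frequency).
intervals = [
    (50.0, 60.0, 2),
    (60.0, 70.0, 5),
    (70.0, 80.0, 8),
    (80.0, 90.0, 5),
]

def grouped_mean(table):
    """Approximate the mean of grouped data using interval midpoints."""
    # Total number of data values in the set.
    total_count = sum(freq for _, _, freq in table)
    # Each midpoint, (lower + upper) / 2, is multiplied by its frequency.
    weighted_sum = sum(((low + high) / 2) * freq for low, high, freq in table)
    return weighted_sum / total_count

approx_mean = grouped_mean(intervals)
print(approx_mean)  # 73.0
```

Here the midpoints are 55, 65, 75, and 85, the sum of midpoint times frequency is 1460, and there are 20 values, so the estimate is 1460 / 20 = 73.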

The standard deviation measures the spread of data. There are different equations to use depending on whether you are calculating the standard deviation of a sample or of a population. The standard deviation allows us to compare individual data or classes to the data set mean numerically. The formula for calculating the standard deviation of a sample is $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ or, for data in a frequency table, $s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$. To calculate the standard deviation of a population, we would use the population mean, $\mu$, and the formula $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ or $\sigma = \sqrt{\frac{\sum f(x - \mu)^2}{N}}$.
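As a minimal sketch of the two formulas, the Python below computes the sample and population standard deviations directly and compares them with the standard library's `statistics.stdev` and `statistics.pstdev`. The data values and the function names `sample_sd` and `population_sd` are hypothetical.

```python
import math
import statistics

# Hypothetical data values.
data = [2, 4, 4, 4, 5, 5, 7, 9]

def sample_sd(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def population_sd(values):
    """Population standard deviation: divide the squared deviations by N."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

s = sample_sd(data)
sigma = population_sd(data)

# The standard library gives the same results.
print(s, statistics.stdev(data))
print(sigma, statistics.pstdev(data))
```

For this data the mean is 5 and the squared deviations sum to 32, so the population standard deviation is the square root of 32 / 8, which is 2, while the sample standard deviation is the square root of 32 / 7, which is slightly larger.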

The values that divide a rank-ordered set of data into 100 equal parts are called percentiles. Percentiles are used to compare and interpret data. For example, an observation at the 50th percentile would be greater than 50 percent of the other observations in the set. Quartiles divide data into quarters. The first quartile (Q1) is the 25th percentile, the second quartile (Q2 or median) is the 50th percentile, and the third quartile (Q3) is the 75th percentile. The interquartile range, or IQR, is the range of the middle 50 percent of the data values. The IQR is found by subtracting Q1 from Q3, and it can help identify potential outliers: a value is a potential outlier if it is greater than Q3 + (1.5)(IQR) or less than Q1 - (1.5)(IQR).
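The quartile and outlier rules above can be sketched as follows. This assumes the convention in which Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the middle value left out of both halves when the number of values is odd; other software may use a different quartile convention and give slightly different results. The data and the function names `quartiles` and `outlier_fences` are hypothetical.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, skip the middle value so it is in neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )

def outlier_fences(values, k=1.5):
    """Return (Q1 - k * IQR, Q3 + k * IQR)."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical data values.
data = [1, 3, 4, 5, 5, 6, 7, 8, 9, 30]
q1, q2, q3 = quartiles(data)
low, high = outlier_fences(data)
outliers = [x for x in data if x < low or x > high]

print(q1, q2, q3)   # 4 5.5 8
print(low, high)    # -2.0 14.0
print(outliers)     # [30]
```

With Q1 = 4 and Q3 = 8, the IQR is 4, so any value below 4 - 6 = -2 or above 8 + 6 = 14 is flagged. Only 30 falls outside that range.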

Frequency Table

a data representation in which grouped data is displayed along with the corresponding frequencies

Box plot

a graph that gives a quick picture of the middle 50% of the data

Percentile

a number that divides ordered data into hundredths; percentiles may or may not be part of the data. The median of the data is the second quartile and the 50th percentile. The first and third quartiles are the 25th and the 75th percentiles, respectively.

Standard Deviation

a number that is equal to the square root of the variance and measures how far data values are from their mean; notation: s for sample standard deviation and σ for population standard deviation.

Variance

mean of the squared deviations from the mean, or the square of the standard deviation; for a set of data, a deviation can be represented as $x - \bar{x}$, where $x$ is a value of the data and $\bar{x}$ is the sample mean. The sample variance is equal to the sum of the squares of the deviations divided by the difference of the sample size and one.
