Statistics- chapter 2


empirical rule

for data with a bell-shaped distribution, the standard deviation has the following characteristics: 1. about 68% of the data lie within one standard deviation of the mean 2. about 95% of the data lie within two standard deviations of the mean 3. about 99.7% of the data lie within three standard deviations of the mean
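As a rough illustration of the three characteristics above, the sketch below builds the intervals for a given mean and standard deviation. The function name `empirical_rule_intervals` and the example values (mean 100, standard deviation 15) are assumptions chosen for illustration, not part of the original notes.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals expected to hold about 68%, 95%, and 99.7% of bell-shaped data."""
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }

# Hypothetical example: mean 100, standard deviation 15
intervals = empirical_rule_intervals(100, 15)
for share, (low, high) in intervals.items():
    print(f"about {share} of the data lie between {low} and {high}")
```

The intervals only apply when the distribution is approximately bell shaped.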

determining the midpoint

midpoint of a class = (lower class limit + upper class limit) / 2

trimmed mean (x%)

to find the 10% trimmed mean: order the data, delete the highest 10% and the lowest 10% of the entries, then find the mean of the remaining data
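A minimal sketch of the trimmed mean procedure, assuming the number of entries dropped from each end is rounded down when the percentage does not give a whole number (the notes do not say how to round). The function name and the data are made up for illustration.

```python
def trimmed_mean(data, percent):
    """Mean after dropping the lowest and highest `percent` % of the entries."""
    ordered = sorted(data)
    k = int(len(ordered) * percent / 100)  # entries to drop from each end (rounded down)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical data with one extreme value (40)
data = [2, 4, 5, 6, 7, 8, 9, 10, 12, 40]
plain_mean = sum(data) / len(data)
trimmed = trimmed_mean(data, 10)
print(plain_mean, trimmed)
```

Dropping the extreme entries pulls the result back toward the bulk of the data, which is why the trimmed mean is less sensitive to outliers than the ordinary mean.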

population variance

σ²= (∑(x-µ)²) / N -the population standard deviation is its square root: σ=√σ²=√( (∑(x-µ)²) / N )
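The population variance and its square root, the population standard deviation, can be computed directly from the definition. This is a small sketch; the function names and the data set are assumptions for illustration.

```python
import math

def population_variance(data):
    """Average squared deviation from the population mean (divides by N)."""
    N = len(data)
    mu = sum(data) / N
    return sum((x - mu) ** 2 for x in data) / N

def population_std(data):
    """Square root of the population variance."""
    return math.sqrt(population_variance(data))

# Hypothetical population of 8 entries with mean 5
population = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(population), population_std(population))
```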

time series graph

data set is composed of quantitative entries taken at regular intervals over a period of time

deviation

difference between data entry x and the mean of the data set -population data set: a. deviation of x= x-µ -sample data set: a. deviation of x=x-xbar

deciles

divide an ordered data set into 10 equal parts: D1, D2, D3, ..., D9

freq. polygon graph

- a line graph that emphasizes the continuous change in frequencies

percentiles

divide an ordered data set into 100 equal parts: P1, P2, P3, ..., P99

cumulative freq. graph

- line graph that displays the cumulative freq. of each class at its upper boundary -upper boundaries marked on the horizontal axis -cumulative freq. marked on vertical axis

dot plot

each data entry is plotted using a point above a horizontal axis

interpreting a box plot

- shows median and the measure of variation (range, IQR) - helps to identify shape of distribution - can identify outliers - can be used to compare 2 or more numerical data sets by side by side box plots

stem and leaf plot

each number is separated into a stem and leaf 2 | 3 5 6

comparing Z scores

-John received a 75 on a test whose class mean was 73.2 with a standard deviation of 4.5 -Samantha received a 68.6 on a test whose class mean was 65 with a standard deviation of 3.9 -Which student had the better test score relative to their class? -John: Z= (x-µ)/σ = (75-73.2)/4.5 = 0.4 -Samantha: Z= (68.6-65)/3.9 ≈ 0.92 -John's score was 0.4 standard deviations above the mean while Samantha's score was about 0.92 standard deviations above the mean -Samantha's test score was better than John's relative to her class -neither score is unusual because both Z scores are between -2 and 2
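The comparison above can be checked with a few lines of code. The helper names `z_score` and `is_unusual` are assumptions for illustration; the numbers are the ones from the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def is_unusual(z):
    """A value is treated as unusual when its z-score is below -2 or above 2."""
    return z < -2 or z > 2

john = z_score(75, 73.2, 4.5)
samantha = z_score(68.6, 65, 3.9)

print(round(john, 2), round(samantha, 2))
print("Samantha scored better relative to her class:", samantha > john)
print("Any unusual score:", is_unusual(john) or is_unusual(samantha))
```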

pie chart

-a circle divided into sectors that represent categories -the area of each sector is proportional to the freq. of each category

outliers

-a value that lies very far away from the vast majority of the other values in a data set -a data value is an outlier if a. it is above Q3 by an amount greater than 1.5 × IQR, or b. it is below Q1 by an amount greater than 1.5 × IQR
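A sketch of the 1.5 × IQR check, assuming quartiles are found as the medians of the lower and upper halves of the ordered data (with the middle entry left out of both halves when the number of entries is odd). Other quartile conventions exist and can give slightly different fences. The function names and data are made up for illustration.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + n % 2:]  # skip the middle entry when n is odd
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

def find_outliers(data):
    """Entries more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data with one suspicious entry
data = [1, 3, 4, 5, 5, 6, 7, 8, 9, 30]
print(quartiles(data))
print(find_outliers(data))
```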

pareto chart

-a vertical bar graph in which the height of each bar represents freq. or relative freq. -bars are positioned in order of decreasing height with the tallest bar positioned at the left

quartiles

-approximately divide an ordered data set into four equal parts -first and third quartiles are the medians of the lower and upper halves of a set -second quartile is the overall median

range rule of thumb for estimating standard deviation

-based on the principle that for many data sets the vast majority of sample values lie within two standard deviations of the mean -to roughly estimate the standard deviation from a collection of known sample data use s≈range/4
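To see how rough the estimate is, the sketch below compares range/4 with the sample standard deviation from Python's standard `statistics` module. The function name and the data are assumptions for illustration.

```python
import statistics

def range_rule_estimate(data):
    """Rough estimate of the standard deviation: range / 4."""
    return (max(data) - min(data)) / 4

# Hypothetical sample
sample = [2, 4, 4, 4, 5, 5, 7, 9]
estimate = range_rule_estimate(sample)
actual = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
print(estimate, round(actual, 3))
```

The two values are in the same ballpark but not equal, which is all the rule of thumb promises.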

mode

-data entry that occurs with greatest freq. -if no data entry is repeated the data set has no mode -if two entries occur with the same greatest freq. each entry is a mode (bimodal) -only measure of data center that can be used with nominal data

coefficient of variation

-describes the standard deviation of a data set as a % of the mean -population data set: a. CV= (σ/µ) x 100 -sample data set: a. CV= (s/xbar) x 100
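A small sketch of the sample version of the formula, using `statistics.stdev` and `statistics.mean` from the standard library. The function name and the data are assumptions for illustration.

```python
import statistics

def coefficient_of_variation(sample):
    """Sample standard deviation expressed as a percentage of the sample mean."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100

# Hypothetical sample
sample = [2, 4, 4, 4, 5, 5, 7, 9]
cv = coefficient_of_variation(sample)
print(f"CV = {cv:.1f}%")
```

Because the result is a percentage, it can be used to compare the variation of data sets measured in different units.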

interquartile range

-difference between first and third quartiles -includes middle 50% of data entries -preferred measure of variation when the data distribution is severely skewed -IQR= Q3-Q1

range

-difference between the maximum and minimum data entries in a set -data must be quantitative -range= max. data entry - min. data entry

paired data sets

-each entry in one data set corresponds to one entry in a second data set -graph using a scatter plot -ordered pairs are graphed as points in a coordinate plane -used to show the relationship between 2 quantitative variables

box and whisker plot

-exploratory data analysis tool -highlights important features of the data set -requires: 1. minimum entry 2. Q1 3. median 4. Q3 5. max entry -drawing a box plot 1. construct horizontal scale that spans range of data 2. plot the 5 numbers above the scale 3. draw a box from Q1 to Q3 and draw a vertical line at Q2 4. draw whiskers from box to min and max entries -if there is an outlier, stop whisker at number before the outlier, and put a star where the outlier value is

freq. histogram

-for numerical data -a bar graph that represents the freq. distribution -horizontal scale represents a number line and measures the data values -vertical scale measures the frequencies of the classes -consecutive bars must touch

standard score, Z score

-represents the number of standard deviations a given value x falls from the mean µ - Z= (value- mean)/ standard deviation -Z= (x-µ)/σ -Z= (x-xbar)/s -if x is above mean Z>0 -if x is below mean Z<0 -if x is the mean Z=0

relative freq. histogram

-same shape and same horizontal scale as the corresponding freq. histogram - vertical scale measures relative freq.

interpreting standard deviation

-standard deviation is the measure of a typical amount an entry deviates from the mean -the more the entries are spread out the greater the standard deviation

mean

-sum of all the data entries divided by the number of entries -advantage of using mean: reliable because it takes into account every entry of data set -disadvantage: greatly affected by outliers

skewed to the left distribution

-tail of the graph elongates more to the left - mean is to the left of the median

skewed to the right distribution

-tail of the graph elongates more to the right -the mean is to the right of the median

median

-the data value that separates the bottom 50% from the top 50% -middle value when data values are arranged in increasing order -not affected by extreme values

class boundaries

-the numbers that separate classes without forming gaps between them -if the 1st class is 59-114 and the 2nd class is 115-170, the gap between them is 115-114 = 1; divide this in half (0.5), then subtract it from each lower limit and add it to each upper limit -class boundaries= 58.5-114.5 and 114.5-170.5
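The steps in this example can be sketched in code, assuming every pair of consecutive classes is separated by the same gap. The function name `class_boundaries` is an assumption for illustration; the class limits are the ones from the example.

```python
def class_boundaries(classes):
    """Turn (lower limit, upper limit) pairs into (lower boundary, upper boundary) pairs."""
    # Gap between the upper limit of the 1st class and the lower limit of the 2nd
    gap = classes[1][0] - classes[0][1]
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]

boundaries = class_boundaries([(59, 114), (115, 170)])
print(boundaries)
```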

Chebychev's theorem

-the portion of any data set lying within k standard deviations of the mean (k > 1) is at least 1- (1/k²) -ex. k=3: in any data set at least 8/9 of the data lie within 3 standard deviations of the mean, since 1- (1/3²)= 8/9 ≈ 88.9%
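The bound is easy to tabulate for a few values of k. This is a minimal sketch; the function name is an assumption, and k is required to be greater than 1 because the bound says nothing useful otherwise.

```python
def chebychev_lower_bound(k):
    """Minimum portion of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    print(f"k={k}: at least {chebychev_lower_bound(k):.1%} of the data")
```

Unlike the empirical rule, this bound holds for any distribution, so the percentages are lower.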

important principles for outliers

1. an outlier can have a dramatic effect on the mean and the standard deviation 2. can have dramatic effect on the scale of a histogram so that the nature of the distribution is totally obscured 3. cannot be removed without any justification

frequency distribution

a table that shows classes or intervals of data with a count of the number of entries in each class

bell shaped distribution

a symmetric, single-peaked distribution: a vertical line can be drawn through the middle of a graph of the distribution and the resulting halves are approx. mirror images

second quartile

about one half of the data fall on or below Q2 (median)

first quartile

about one quarter of the data fall on or below Q1

third quartile

about three quarters of the data fall on or below Q3

∑x

the sum of all data entries in a set

uniform distribution

all entries or classes in the distribution have equal or approx. equal freq.

fractiles

numbers that divide an ordered data set into equal parts

identifying unusual data values

ordinary: -2 ≤ Z ≤2 unusual: Z < -2 or Z > 2

relative frequency

portion or % of the data that falls into a particular class relative freq.= class freq./sample size= f/n

sample variance

s²=(∑(x-xbar)²) / (n-1)

sample standard deviation

s=√s²=√((∑(x-xbar)²) / (n-1))
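A sketch of the sample variance and sample standard deviation computed from the definition, cross-checked against `statistics.variance` and `statistics.stdev` from the standard library (both divide by n - 1). The function names and data are assumptions for illustration.

```python
import math
import statistics

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def sample_std(data):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(data))

# Hypothetical sample
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(sample), statistics.variance(sample))
print(sample_std(sample), statistics.stdev(sample))
```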

weighted mean

the mean of a data set whose entries have varying weights xbar= ∑(x × w)/∑w w= weight x= entry
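A short sketch of the weighted mean formula. The function name, the scores, and the weights are made up for illustration.

```python
def weighted_mean(entries, weights):
    """Sum of (entry * weight) divided by the sum of the weights."""
    total = sum(x * w for x, w in zip(entries, weights))
    return total / sum(weights)

# Hypothetical scores where the first counts three times as much as the last
scores = [90, 80, 70]
weights = [3, 2, 1]
result = weighted_mean(scores, weights)
print(round(result, 2))
```

With equal weights the result reduces to the ordinary mean.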

cumulative frequency

the sum of the freq. for that class and all previous classes
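Relative and cumulative frequencies can both be derived from a list of class frequencies. This is a minimal sketch using `itertools.accumulate` for the running total; the class frequencies are made up for illustration.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed in class order
frequencies = [5, 8, 6, 1]
n = sum(frequencies)  # sample size

# Relative frequency of each class: f / n
relative = [f / n for f in frequencies]

# Cumulative frequency: this class plus all previous classes
cumulative = list(accumulate(frequencies))

print(relative)
print(cumulative)
```

The last cumulative frequency always equals the sample size, and the relative frequencies sum to 1.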

sample mean

x bar= ∑x/n

mean of freq. distribution

xbar= ∑(x × f)/n n=∑f where x and f are the midpoints and freq. of a class
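A sketch that combines the class midpoint formula with the mean of a frequency distribution. The function name, class limits, and frequencies are assumptions for illustration.

```python
def frequency_distribution_mean(classes, frequencies):
    """Approximate the mean from class limits and class frequencies."""
    # Midpoint of each class: (lower class limit + upper class limit) / 2
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(frequencies)
    return sum(x * f for x, f in zip(midpoints, frequencies)) / n

# Hypothetical classes 1-5, 6-10, 11-15 with frequencies 2, 5, 3
classes = [(1, 5), (6, 10), (11, 15)]
frequencies = [2, 5, 3]
mean = frequency_distribution_mean(classes, frequencies)
print(mean)
```

The result is an approximation, because each entry is treated as if it sat at its class midpoint.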

population mean

µ=∑x/ N

