Statistics: Chapter 2

What are Quartiles and Fractiles:

Fractiles: Numbers that partition (divide) an ordered data set into equal parts.
- Example of a fractile: the median
- Recall: the median divides an ordered data set into two equal parts
Quartiles: Divide an ordered data set into four equal parts.
- First Quartile, Q1: About one quarter of the data fall on or below Q1. (Q1 = middle of the "bottom" half of the data set.)
- Second Quartile, Q2: About one half of the data fall on or below Q2. (Therefore, Q2 is the same as the median.)
- Third Quartile, Q3: About three quarters of the data fall on or below Q3. (Q3 = middle of the "top" half of the data set.)
WHEN FINDING THE QUARTILES, ALWAYS FIND Q2 (THE MEDIAN) FIRST. Then find the others. ***See attached example***
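
The attached example is not reproduced here, so the following is a minimal sketch of the "find Q2 first" procedure. It assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves, with the middle entry left out of both halves when the number of entries is odd. The function names and the sample data are hypothetical.

```python
def median(sorted_values):
    """Median of a list that is already in numerical order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(data):
    """Return (Q1, Q2, Q3), finding the median Q2 first."""
    values = sorted(data)
    n = len(values)
    q2 = median(values)
    lower_half = values[: n // 2]        # entries below the median position
    upper_half = values[(n + 1) // 2:]   # entries above the median position
    return median(lower_half), q2, median(upper_half)


print(quartiles([7, 1, 13, 5, 9, 3, 11]))  # (3, 7, 11)
```

Other software may use a different quartile convention and give slightly different values for small data sets.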

What is the Midpoint of a class?

SUM (Not difference) of the lower and upper limits of the class divided by 2 - Mid point = (Lower class limit + Upper class limit)/2 -DO NOT ROUND THE MID POINT - Answer must be either a whole number or all end in .5 Any other answer means you have an error in your class limits.

What is the tally system and how can it help me with the Frequency distribution?

Tally up the frequency (f) of the variables to help you stay organized.

Range

The difference between the maximum and minimum data entry. - Range = Max - Min

Interquartile Range (IQR):

The difference between the third and first quartiles. - IQR = Q3 - Q1 -The IQR for the odd set of data would be IQR = 27.5-9 = 18.5 -The IQR for the even set of data would be IQR = 31 - 11 = 20

Standard deviation for grouped data:

You previously learned that large data sets are usually best represented by frequency distributions. The formula for the sample standard deviation for a frequency distribution is: ***SEE ATTACHED PICTURE*** where n = Σf (the number of entries in the data set). When a frequency distribution has classes, estimate the sample mean and the sample standard deviation by using the midpoint of each class. When we are comparing variation in different data sets, we can use the standard deviation when: - The data sets use the same units of measure - The data sets have means that are about the same
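
Since the pictured formula is not available here, the sketch below assumes the standard form s = sqrt( Σ(x - x̄)²·f / (n - 1) ), where x is a class midpoint, f is its frequency, and n = Σf. The function name and the sample numbers are hypothetical.

```python
import math


def grouped_sample_std(midpoints, freqs):
    """Estimate the sample standard deviation from class midpoints and frequencies."""
    n = sum(freqs)                                            # n = sum of f
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n   # estimated sample mean
    sum_sq = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs))
    return math.sqrt(sum_sq / (n - 1))                        # sample rule: divide by n - 1


s = grouped_sample_std([1, 2, 3], [1, 2, 1])
print(round(s, 3))  # 0.816
```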

Steps for constructing frequency distribution (8 Steps)

(Don't overthink it.)
1. Decide the number of classes (categories) you need.
2. Find the range of the data.
   a. Range = max - min (maximum data value - minimum data value)
3. Find the class width.
   a. Divide the answer from STEP 2 by the number of classes in STEP 1 and ROUND UP.
   b. IF YOUR DIVISION GIVES YOU A WHOLE NUMBER, YOU STILL MUST ROUND UP TO THE NEXT WHOLE NUMBER.
4. Select a starting point (usually the lowest data value).
   a. This will give you the first lower class limit.
5. Add the class width from STEP 3 to the starting point from STEP 4 to get the next lower class limit.
6. Repeat STEP 5 for the other lower class limits until you have the desired number of classes.
7. Find the upper class limits.
   a. You can subtract one from the next lower class limit to find each upper class limit.
   b. THE LAST UPPER CLASS LIMIT MUST BE GREATER THAN OR EQUAL TO THE MAXIMUM VALUE IN THE DATA SET.
8. Add a second column to the table for the number of data entries in each class.
   a. Count the number of data entries (Tally!) and put the total in the second column.
   b. THE SUM OF THE FREQUENCIES SHOULD MATCH THE NUMBER OF ENTRIES IN THE SAMPLE DATA SET.
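
The eight steps can be sketched roughly as follows. This assumes whole-number data (so that "subtract one" gives the upper limits), and the function name and sample data are hypothetical.

```python
def frequency_distribution(data, num_classes):
    """Return (class_width, [(lower_limit, upper_limit, frequency), ...])."""
    low, high = min(data), max(data)
    # Steps 2-3: range / number of classes, rounded up; an exact whole
    # number still moves up to the next whole number.
    width = (high - low) // num_classes + 1
    table = []
    for i in range(num_classes):
        lower = low + i * width      # steps 4-6: lower class limits
        upper = lower + width - 1    # step 7: one less than the next lower limit
        f = sum(1 for x in data if lower <= x <= upper)  # step 8: tally
        table.append((lower, upper, f))
    return width, table


width, table = frequency_distribution([1, 3, 4, 7, 8, 8, 10, 12, 15, 20], 4)
print(width)  # 5
print(table)  # [(1, 5, 3), (6, 10, 4), (11, 15, 2), (16, 20, 1)]
```

The frequencies sum to 10, the number of entries, which is the check described in step 8b.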

Sample Mean:

(Ignore the i in the formula) • x̅ (read as x bar) represents the sample mean • n represents the number of entries in a sample (sample size) • x represents the data values Recall: The uppercase Greek letter sigma Σ represents the summation of values

What are general rules for Frequency Distributions?

- The number of classes should be between 5 and 20.
  a. Fewer than 5 is not informative, and more than 20 is information overload.
  b. All class amounts (the number of classes) will be given in HW, quizzes, and tests.
- Classes must be EXHAUSTIVE. (Every data value in your study must fall in some class of the distribution.)
- Classes must be MUTUALLY EXCLUSIVE. (No single data item belongs to more than one class.)
- Lastly, classes must be CONTINUOUS.
  a. Continuous means that all classes are accounted for. Even if a frequency is zero, the class must be represented in the frequency distribution.
  b. No huge gaps.

How to find outliers with the IQR?:

- Values smaller than Q1 - (1.5*IQR) are outliers - Values larger than Q3 + (1.5*IQR) are outliers E.g.: From the last set (even set of data) Q1 = 11, Q3 = 31, IQR = 20 - Lower limit = 11 - (1.5*20) = 11 - 30 = -19 - Upper limit = 31 + (1.5*20) = 31 + 30 = 61 - Therefore any number less than -19 or greater than 61 is an outlier.
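
A small sketch of the same rule, reusing the card's values Q1 = 11 and Q3 = 31. The function names and the list of test values are hypothetical.

```python
def iqr_fences(q1, q3):
    """Return the (lower limit, upper limit) beyond which values are outliers."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Keep only the values that fall outside the two limits."""
    low, high = iqr_fences(q1, q3)
    return [x for x in data if x < low or x > high]


print(iqr_fences(11, 31))                           # (-19.0, 61.0)
print(find_outliers([5, 12, 30, 62, -20], 11, 31))  # [62, -20]
```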

Constructing a Pareto Chart:

1. Arrange the data from the largest to smallest according to frequency 2. Draw and label the x and y axes 3. Represent the categories (qualitative data) on the x-axis 4. Represent the frequency on the y-axis 5. Draw bars according to the frequencies (make bars the same width) - Please note some bars can be as wide as you would like them to be. But they should be the same width for each bar. -There are gaps between the bars.

How to calculate the Mean of a Frequency Distribution:

1. Create a table with four columns a. In the first column place all the class intervals b. In the second column place all the associated frequencies 2. Find the midpoint of each class and place the values in column 3. Recall: to find the midpoint you need the sum of the lower and upper class limits divided by 2. 3. Find the products of the midpoints and the frequencies. Place the values in column 4 4. Find the sum of the frequencies (column 2) 5. Find the sum of the products (column 4) 6. Divide the number found in Step 5 by the number found in Step 4
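
The four-column table can be sketched in code as below. The class limits and frequencies are hypothetical illustration data, as is the function name.

```python
def grouped_mean(classes):
    """classes: list of (lower_limit, upper_limit, frequency) rows."""
    total_f = 0    # step 4: sum of column 2
    total_xf = 0   # step 5: sum of column 4
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2   # column 3
        total_f += f
        total_xf += midpoint * f         # column 4: midpoint * frequency
    return total_xf / total_f            # step 6


print(grouped_mean([(1, 5, 3), (6, 10, 4), (11, 15, 2), (16, 20, 1)]))  # 8.5
```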

Constructing a Scatter Plot:

1. Draw a graph. Label the x and Y axes. Choose a range that includes the maximums and minimums from the given data. 2. Plot the first point on the graph. Recall an ordered pair, the first number is the x coordinate and the second number is the y coordinate. For example: if your point was (3,25) to plot this you would go 3 places to the right and 25 places up. 3. Plot the remaining points on the graph.

Constructing the Time Series Graph:

1. Draw and label the x and Y axes 2. Represent the time units on the x-axis 3. Represent the data described on the y-axis 4. Plot each point 5. Draw a line connecting the points from left to right.

How do you construct a histogram?

1. Draw and label the x and y axes (recall x= horizontal and y = vertical) 2. Represent the class boundaries or the mid-points on the x-axis 3. Represent the frequency (Whole numbers) or relative frequency (decimals) on the y - axis. 4. Draw in the bars.

Constructing an Ogive Graph

1. Find the cumulative frequency of each class in the frequency distribution 2. Draw and label the x and y axes 3. Represent the upper class boundaries on the x-axis 4. Represent the cumulative frequency on the y-axis 5. Plot the cumulative frequency values for each upper class boundary 6. Plot the points. Please note the following: • The graph should start at the lower boundary of the first class. o At the lower boundary of the first class the cumulative frequency is zero. Therefore, the graph will be connected to the x-axis ("ground") at this point only. • The graph should end at the upper boundary of the last class. o At the upper boundary of the last class the cumulative frequency is equal to the sample size. Therefore, the graph will be at its highest point at the end. 7. Connect the points in order from left to right

How to construct a dot plot:

1. Find the largest and smallest data point. 2. Draw a horizontal axis that starts at one less than your smallest data entry, ends at one more than your largest data entry, and has equally spaced steps in between. 3. To represent a data entry, plot a point above the entry's position on the axis. If an entry is repeated, plot another point above the previous point.

Finding the Variance and Standard Deviation:

1. Find the mean of the data set. Recall: mean = sum of all the data entries divided by the number of entries 2. Find the deviation of each entry (this is a good time to start a table - see examples in book) Deviation = Each data entry - Mean Some answers will be negative Sum of deviations should equal zero (or close to it - depending on rounding) 3. Square each deviation Note: the square of a negative number is a positive; so all your answers in this step should be positive For some calculators in order to square a negative you need to put parentheses around the number (-2) and then square it 4. Add the squared deviations (the answers you had in step 3) This answer is called the sum of squares and notated as Note: the sum of the squares does not equal the square of the sum. Please do not try to take a shortcut and add the numbers and square that sum - if you do that the answer will be wrong. 5. Find the variance - be careful here there is one rule for populations and one for samples (make sure you know which one you are dealing with) a. Population variance: divide the answer in step 4 by N, where N is the number of data entries in the population (population size). SEE ATTACHED FORMULA b. Sample variance: divide the answer in step 4 by n - 1, where n is the number of data entries in the sample (sample size). SEE ATTACHED FORMULA 6. Find the standard deviation by taking the square root of the variance (step 5) It does not matter if you are dealing with a population or a sample, you find the standard deviation the same way. Choosing the correct symbol does matter. Remember o Population standard deviation = ATTACHED IMAGE o Sample standard deviation = ATTACHED IMAGE ***SEE ATTACHED IMAGE***
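
Steps 1 through 6 can be sketched as follows. The function name, the `population` flag, and the sample data are hypothetical.

```python
import math


def variance_and_std(data, population=False):
    """Return (variance, standard deviation) following steps 1-6."""
    n = len(data)
    mean = sum(data) / n                              # step 1
    deviations = [x - mean for x in data]             # step 2 (these sum to zero)
    sum_of_squares = sum(d ** 2 for d in deviations)  # steps 3-4
    divisor = n if population else n - 1              # step 5: N or n - 1
    variance = sum_of_squares / divisor
    return variance, math.sqrt(variance)              # step 6


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_and_std(data, population=True))  # (4.0, 2.0)
```

Calling it without `population=True` divides by n - 1 instead, so the same data treated as a sample gives a variance of 32/7.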

How do you construct a frequency polygon?

1. Find the midpoints for each class in the frequency distribution. 2. Draw and label the x and y axes 3. Represent the class midpoints on the x-axis 4. Represent the frequency on the y-axis 5. Plot the frequency values for each class midpoint. 6. The graph should begin and end on the horizontal axis (the graph must be connected to the x-axis or the "ground") a. Extend the left side to one class width before the first class midpoint. Repeat for the right side. This allows the graph to show "zero" on each side. 7. Connect the points in order from left to right

How can you find a percentile using the Z-score?

1. Find the z-score 2. Use the empirical rule to find the percent.

What are the major types of Histograms?

1. Frequency Histogram: The vertical scale measures the frequency. 2. Relative frequency histogram: The vertical scale measures the relative frequencies. Properties: 1. The horizontal scale is quantitative and measures the data values. 2. The vertical scale measures the frequencies (or relative frequency) of the classes 3. Consecutive bars must touch NO GAPS

How to calculate the weighted mean:

1. Identify the x and the w. • Look for what you are trying to find. • The term that follows the word mean is the x. oExample: What is the mean score? Score = x oExample: What is the mean daily balance? Daily balance = x 2. Create a table with three columns a. In the first column place all the values of x. b. In the second column place all the values of w (make sure to pair up columns 1 & 2 correctly) c. In the third column multiply each value with its associated weight. This column will have xw 3. Find the sum of the weights (column 2) 4. Find the sum of the products of the values and the weights (column 3) 5. Divide the number found in Step 4 (sum of xw) by the number found in Step 3 (sum of w). ***SAMPLE***
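
Because the sample problem is not included here, this sketch uses hypothetical scores (x) and weights (w) to show steps 3 through 5.

```python
def weighted_mean(values, weights):
    """Sum of x*w (column 3) divided by the sum of w (column 2)."""
    sum_xw = sum(x * w for x, w in zip(values, weights))
    sum_w = sum(weights)
    return sum_xw / sum_w


# Hypothetical example: three scores weighted 50%, 30%, and 20%.
print(weighted_mean([80, 90, 70], [50, 30, 20]))  # 81.0
```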

Interpreting standard deviation: (EMPIRICAL RULE: 68 - 95 - 99.7 Rule)

Empirical Rule: a statistical rule stating that for a normal (bell-shaped curve) distribution, almost all data will fall within three standard deviations of the mean. In order to use the Empirical Rule you must know the value of the mean and the standard deviation. This rule only applies to data sets having a normal (bell-shaped, symmetric) distribution This rule does not apply to distributions that are not normal

Example of Empirical rule on a bell shaped distribution:

Example: Suppose the scores on a standardized test are normally distributed, with a mean of 100 and a standard deviation of 10. For this distribution: - 68% of the scores will be between 90 and 110 (mean ± 1(standard deviation)) - 95% of the scores will be between 80 and 120 (mean ± 2(standard deviation)) - 99.7% of the scores will be between 70 and 130 (mean ± 3(standard deviation))
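
The three intervals in this example can be reproduced with a short sketch. The function name is hypothetical, and the result is only meaningful for bell-shaped data.

```python
def empirical_rule_intervals(mean, sd):
    """Map each percentage to its (mean - k*sd, mean + k*sd) interval."""
    return {pct: (mean - k * sd, mean + k * sd)
            for k, pct in ((1, 68), (2, 95), (3, 99.7))}


print(empirical_rule_intervals(100, 10))
# {68: (90, 110), 95: (80, 120), 99.7: (70, 130)}
```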

Box and whisker plot:

Exploratory data analysis tool that highlights important features of a data set. -Requires the five-number summary: 1. Minimum Entry 2. First Quartile 3. Median Q2 4. Third Quartile 5. Maximum entry ***See attached picture for HOW to make one.)***

True or false In a frequency​ distribution, the class width is the distance between the lower and upper limits of a class.

False. In a frequency distribution, the class width is the distance between the lower (or upper) limits of CONSECUTIVE classes.

Characteristics of Bell-shaped distribution (Standard deviation)

For data with a (symmetric) bell-shaped distribution, the standard deviation has the following characteristics: (also, see picture below) About 68% of the data lie within ONE standard deviation of the mean in either direction o To find the values within this section subtract the standard deviation from the mean and add the standard deviation to the mean About 95% of the data lie within TWO standard deviations of the mean in either direction o To find the value within this section subtract and add twice the standard deviation to the mean About 99.7% of the data lie within THREE standard deviations of the mean in either direction o To find the value within this section subtract and add three times the standard deviation to the mean

How does the Z-Score relate to the Empirical Rule:

From the Empirical Rule: Remember this applies only to data that is normally distributed About 68% of the data values have z-scores between -1 and 1 About 95% of the data values have z-scores between -2 and 2 About 99.7% of the data values have z-scores between -3 and 3 When a distribution is approximately bell-shaped, then based on the empirical rule about 95% of the data lie within 2 standard deviations of the mean When the distribution's values are transformed to z-scores, about 95% of the z-scores should fall between -2 and 2.

Z- Score (Standard Score):

Indicates how many standard deviations a data entry is from the mean. - To get a z-score you need three things: 1. The observed (actual) data value of the random variable, x 2. The population mean, μ (also known as the expected outcome/value/center) 3. The population standard deviation, σ Then follow the formula: z = (value - mean) / (standard deviation) = (x - μ) / σ. A z-score can be negative, positive, or zero. If z is negative, the corresponding x-value is less than the mean. If z is positive, the corresponding x-value is greater than the mean. If z is zero, the corresponding x-value is equal to the mean. A z-score equal to 1 represents an x-value that is 1 standard deviation greater than the mean. A z-score equal to 2 represents an x-value that is 2 standard deviations greater than the mean. A z-score equal to -1 represents an x-value that is 1 standard deviation less than the mean.
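
As a quick illustration of z = (x - μ) / σ, here is a sketch using a hypothetical population with mean 100 and standard deviation 10. The function name is also hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


print(z_score(120, 100, 10))  # 2.0  (2 standard deviations above the mean)
print(z_score(90, 100, 10))   # -1.0 (1 standard deviation below the mean)
print(z_score(100, 100, 10))  # 0.0  (equal to the mean)
```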

How do I know which measure of central tendency to use? (MEAN)

Mean: use the mean to describe the middle of a set of data that does not have an outlier Advantages: • Takes into account every entry of a data set • Most popular in fields such as business, engineering and computer science • It is unique - there is only one answer • Useful when comparing sets of data Disadvantages: • Affected by extreme values (outliers) ***SAMPLE*** Example: Find the mean, median, and mode of the data, if possible. If any measure cannot be found or does not represent the center of the data explain why. Data: The Law School Admission Test scores for a sample of students accepted into law school. 174 172 169 176 169 170 175

How do I know which measure of central tendency to use? (MEDIAN)

Median: use the median to describe the middle of a set of data that does have an outlier Advantages: • Extreme values (outliers) do not affect the median as strongly as they do the mean • Useful when comparing sets of data • It is unique - there is only one answer Disadvantages • Not as popular as mean ***SAMPLE*** Example: Find the mean, median, and mode of the data, if possible. If any measure cannot be found or does not represent the center of the data explain why. Data: The Law School Admission Test scores for a sample of students accepted into law school. 174 172 169 176 169 170 175

How do I know which measure of central tendency to use? (MODE)

Mode: Use the mode when the data is non-numeric (qualitative) or when asked to choose the most popular item Advantages: • Extreme values (outliers) do not affect the mode as strongly as they do the mean Disadvantages: • Not as popular as mean and median • Not necessarily unique - may be more than one answer • When no values repeat in the data set, the mode does not exist • When there is more than one mode, it is difficult to interpret and/or compare ***SAMPLE*** Example: Find the mean, median, and mode of the data, if possible. If any measure cannot be found or does not represent the center of the data explain why. Data: The Law School Admission Test scores for a sample of students accepted into law school. 174 172 169 176 169 170 175
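
One way to check the sample problem is with Python's standard `statistics` module. This is only a sketch of the arithmetic, not the worked solution from the original sample.

```python
import statistics

scores = [174, 172, 169, 176, 169, 170, 175]

mean = statistics.mean(scores)        # sum of the entries (1205) divided by 7
median = statistics.median(scores)    # middle entry once the scores are ordered
modes = statistics.multimode(scores)  # every entry tied for the greatest frequency

print(round(mean, 1))  # 172.1
print(median)          # 172
print(modes)           # [169]
```

`multimode` returns a list, so a bimodal data set would show both modes.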

Frequency (f)

Number of times an event or item occurs in a data set

Paired Data Sets:

Occurs when each data entry in one data set corresponds to one data entry in a second data set

Scatter Plot:

One way to graph paired data sets - The ordered pairs are graphed as points in a coordinate plane in a scatter plot. -Used to show the relationship between two quantitative variables.

Percentiles and other Fractiles:

Percentiles are often used in education and health-related fields to indicate how one individual compares with others in a group - They can also be used to identify unusually high or low values.

Pie Charts:

Provide a convenient way to present qualitative data graphically as a percent of a whole. - Circle that is divided into sectors that represent categories - The area of each sector is proportional to the frequency of each category.

Skewed Left Distribution (negatively skewed):

The "tail" of the graph elongates more to the left. • The MEAN IS TO THE LEFT OF THE MEDIAN. o The mean is less than the median

Mean of a Frequency Distribution:

Sometimes we are given a frequency distribution instead of the actual values. We can still come up with a good estimate of a typical value for the set of data, provided that we make some assumptions. • We assume that the values in each class are spread evenly throughout the group. If this is the case, then the mean for each class should be approximately equal to the midpoint for each class ***Formula attached***

Skewed Right Distribution (positively skewed):

The "tail" of the graph elongates more to the right. • The MEAN IS TO THE RIGHT OF THE MEDIAN. o The mean is greater than the median

Range:

The Difference between the maximum and minimum data entries in the set. The data must be quantitative. Range = (Max. data entry) - (Min. data entry) Advantage of range: easy to compute Disadvantage of range: uses only two entries from the data set

Median (Middle):

The value that lies in the middle of the data when the data set is ORDERED. Measures the center of an ordered data set by dividing it into two equal parts. Median can only be found if you have quantitative data. It can only be found if your data is a series of numbers. If you have qualitative data (where you have descriptions and the frequency they occur) then you cannot find the median. In cases like this you will state "MEDIAN IS NOT POSSIBLE" - DO NOT ROUND THE MEDIAN

Constructing a Pie Chart:

1. Convert each class frequency into a proportional part of the circle - degrees = (f / n)(360), where f = frequency and n = total number of data entries 2. Change the relative frequency f / n to a percentage (move the decimal two places to the right) 3. Draw and label each section of the circle - In most cases, you will be interpreting a pie chart or constructing one using technology.
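
A sketch of the sector arithmetic, assuming each sector's angle is its relative frequency f / n times 360 degrees. The category names, frequencies, and function name are hypothetical.

```python
def pie_chart_sectors(freqs):
    """Map each category to (percent of the whole, central angle in degrees)."""
    n = sum(freqs.values())   # total number of data entries
    return {category: (f * 100 / n, f * 360 / n)
            for category, f in freqs.items()}


print(pie_chart_sectors({"A": 5, "B": 3, "C": 2}))
# {'A': (50.0, 180.0), 'B': (30.0, 108.0), 'C': (20.0, 72.0)}
```

The percentages add to 100 and the angles add to 360, which is a quick check before drawing.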

Symmetric Distributions:

A vertical line can be drawn through the middle of a graph of the distribution and the resulting halves are approximately mirror images. • In a symmetric distribution with a single peak, the MEAN, MEDIAN, AND MODE ARE EQUAL. • A bell-shaped curve (normal distribution) is the most common example of a symmetric distribution.

Statistical Charts

A way of representing data in a study so that it can be better understood by those who benefit from reading the study. - There are three types: 1. Histogram 2. Frequency polygon 3. Ogive

Uniform Distribution (rectangular):

All entries or classes in the distribution have equal or approximately equal frequencies. • A uniform distribution is also symmetric. • In a uniform distribution the MEAN AND MEDIAN ARE EQUAL • There is NO MODE (because every entry occurs the same number of times)

How can you use the Z-Score to find outliers?

Another way to find outliers: z-scores that lie more than 2 standard deviations from the mean (z-score less than z = -2 or greater than z = 2) occur about 5% of the time and would be considered unusual z-scores that lie more than 3 standard deviations from the mean (z-score less than z = -3 or greater than z = 3) occur about 0.30% of the time and would be considered very unusual

Dot Plot

Another way to graph quantitative data. - Each data entry is plotted using a point, above a horizontal axis - Allows you to see how data is distributed, determine specific data entries and identify unusual data values.

Measure of central tendency:

A value that represents a typical, or central, entry of a data set -Most common measures of central tendency Mean Median Mode

Class

A specific grouping (interval) within the data

Variance:

An average of the squared deviations, where a deviation is the difference between an entry and the mean of the data set (divide the sum of the squared deviations by N for a population or by n - 1 for a sample). The symbol for population variance is σ² (σ = lowercase Greek letter sigma). The symbol for sample variance is s². o Note: the 2 (squared) in the symbol does not mean that you need to square your answer. Instead the 2 is actually part of the "name". Two measures of variation that use all the entries from the data set are the variance and the standard deviation.

Upper class limit

the largest value within the class - All class limits must be the same class width apart.

Class Boundaries

the numbers that separate classes without forming gaps between them - Used because consecutive bars of a histogram must touch, bars must begin and end at class boundaries instead of class limits - If data entries are integers, subtract 0.5 from each lower limit to find the lower class boundaries. - To find the upper class boundaries, add 0.5 to each upper limit a. The upper boundary of a class will equal the lower boundary of the next higher class.

Lower class limit

the smallest value within the class - All class limits must be the same class width apart.

Coefficient of Variation:

used when the data sets have different units of measure or different means Describes the standard deviation as a percent of the mean o CV measures the variation of a data set relative to the mean of the data The notation for coefficient of variation is CV CV = standard deviation divided by the mean ***SEE ATTACHED PIC FOR FORMULAS AND EXAMPLE***

Population Mean:

• The lowercase Greek letter μ (pronounced mu) represents the population mean • N (capital) represents the number of entries in a population (population size) • x represents the data values (series of numbers) Recall: The uppercase Greek letter sigma Σ represents the summation of values

How to construct a Stem and Leaf Plot:

1. Put the scores in numerical order. (Not required, but it is helpful.)
2. Make a vertical list of the stems in order from top to bottom. It is usually best to first try using the most significant digit(s) (i.e., the leftmost digit) as a first approximation.
   - IF ANY STEMS DON'T HAVE A VALUE (have no leaves), PUT THEM DOWN ANYWAY BECAUSE IT WILL ALLOW US TO SEE GAPS IN OUR DATA.
3. Draw a vertical line to the right of the stems.
4. Write out the leaves horizontally next to the correct stem. The next lowest place value is used to form the leaves.
5. Include a key that will explain the plot.
   • Stem = Tens (the numbers in the stem represent the tens place, so 0| represents the ones, 3| represents the 30s, and 10| represents the 100s)
   • Leaves = Ones (the numbers listed in the leaves represent the ones place, so 3|6 represents 36 and 13|1 represents 131)
   • 1 | 3 = 13. This is a straight representation of what you are representing. It requires the user to think about what the stem and leaf represent. The number used in the key may or may not be in the stem-and-leaf plot.
   • If you are dealing with decimals you may express your key as any of the following: 1 | 8 = 1.8, or Stem = ones, or Leaves = tenths
• If the number of leaves in each stem is too large, divide each stem into two groups. Such a stem-and-leaf plot will have two rows for each stem.
   o The first group has leaves 0 through 4.
   o The second group has leaves 5 through 9.
• If you have too many stems, then your data would be too spread out, with perhaps one or two data values on each stem. This problem is subjective, because when you have lots of data, more stems may be advantageous.
   o If this occurs you can compress the data onto fewer stems by using significant digits at a higher level (i.e., 100s rather than 10s) and then splitting as needed.
• Data with more than two digits can be rounded to two digits before plotting or can be truncated to two digits.
   o To truncate means to cut off. For a stem-and-leaf plot, you would truncate everything after the second digit.
   o Example: The number 355 would round to 36
   o Example: The number 355 would truncate to 35
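
A minimal sketch of steps 1 through 4 for non-negative whole numbers, with stem = tens and leaf = ones. The function name and the data are hypothetical, and stem splitting, rounding, and the key are left out.

```python
def stem_and_leaf(data):
    """Return the rows of a stem-and-leaf plot as strings like '3 | 1 6 6'."""
    values = sorted(data)                         # step 1: numerical order
    first_stem, last_stem = values[0] // 10, values[-1] // 10
    # Step 2: list every stem, including stems that will have no leaves.
    leaves = {stem: [] for stem in range(first_stem, last_stem + 1)}
    for v in values:
        leaves[v // 10].append(v % 10)            # step 4: ones digit is the leaf
    # Step 3: the "|" plays the role of the vertical line.
    return [f"{stem} |" + "".join(f" {leaf}" for leaf in row)
            for stem, row in leaves.items()]


for row in stem_and_leaf([13, 36, 31, 22, 58, 36]):
    print(row)
# 1 | 3
# 2 | 2
# 3 | 1 6 6
# 4 |
# 5 | 8
```

The empty row for stem 4 is kept on purpose so the gap in the data stays visible. A key such as 1 | 3 = 13 would still need to be added by hand.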

What are the advantages of a Stem and Leaf plot over a histogram?

1. The graph still contains the original data values. 2. Provides an easy way to sort data.

Why do we construct Frequency Distributions?

1. To organize and simplify the data so that it is possible to get a general overview of the results 2. To determine the nature or shape of the distribution 3. To compute measures of the central tendency, variation (spread) and position. 4. To make comparisons among different data sets 5. To draw charts and graphs for presentation.

Histogram Chart

A bar graph that represents the frequency distribution of a data set

Outliers:

A data entry that is far removed from the other entries in the data set - A data set can have one or more outliers, causing gaps in a distribution. - Conclusions that are drawn from a data set that contains outliers may be flawed - Usually, the presence of an outlier indicates some sort of problem. This can be a case which does not fit the model under study, or an error in measurement

Stem and leaf plot

A diagram that quickly summarizes data while maintaining the individual data points. - Way to display quantitative data - Each number is separated into a stem and leaf - Should have as many leaves as there are entries in the original data set - If a stem and leaf plot is turned on its side, it resembles a histogram. - ALWAYS INCLUDE A KEY. Example in attached picture.

Skewed (Distribution):

A frequency distribution is skewed if the "tail" of the graph elongates more to one side than to the other. (Please note where the mean and median are in relation to each other.) • asymmetric distribution of the data values.

Cumulative frequency graph or ogive

A line graph that displays the cumulative frequency of each class at its upper class boundary. The upper boundaries are marked on the horizontal axis, and the cumulative frequencies are marked on the vertical axis

Frequency Distribution

A table presenting statistical data by putting together the values of a characteristic along with the number of times each value appears in the data set. -Consists of at least two columns

Deviation:

Difference between the entry and the mean of the data set. The sum of the deviations is zero This is calculated the same whether you have a population or a sample Please note: the deviation is not the same as the standard deviation (discussed below) Two measures of variation that use all the entries from the data set are the variance and the standard deviation

Time Series Graph:

Displays data that occur over a specific period of time. - Data set that is composed of quantitative entries taken at regular intervals over a period of time.

Mode (Most):

The data entry (item or number) that occurs with the greatest frequency (occurs the most often) • A data set can have one mode, more than one mode, or no mode. • If no entry is repeated (every entry occurs only once) the data set has no mode. - DO NOT STATE NO MODE AS MODE = 0 (ZERO), SINCE ZERO IS A POSSIBLE ANSWER FOR THE MODE. Instead always make sure to state either no mode or mode = none. • If two entries occur with the same greatest frequency, each entry is a mode (bimodal). • Note: mode can exist for qualitative or quantitative data. - If the data is qualitative you will look for the description with the greatest frequency. The mode will be the description, not the frequency. If the data is quantitative you will look for the number (or numbers) that occur the most often. The mode will be the number(s) that occur the most often. ***SEE SAMPLE PROBLEM*** - Mean = not possible - Median = not possible (Remember these require quantitative data) - Mode = "Money Needed" ⇒ This is the description with the greatest frequency / most responses.

Class width

The distance between lower (or upper) limits of consecutive classes - Class width functions in an up or down vertical direction. When you are adding the class width to a lower class limit you write the sum below the first number. - All class limits must be the same class width apart.

Steps for finding the median:

The first step of this problem is to put the numbers in numerical order. If the data set has an: - odd number of entries: median is the middle data entry. ▪ Example: 1 2 3 4 5 Median = 3 (middle entry) - even number of entries: median is the mean (average) of the two middle data entries. ▪ Example: 1 2 3 4 5 6 Median = (3 + 4) / 2 = 3.5 - DO NOT ROUND THE MEDIAN

Weighted Mean:

The mean of a data set whose entries have varying weights (importance). - This is a very important concept. Your grade in this class and many of your college classes is calculated using a weighted mean. Formula attached

What is the Relative Frequency of a class?

The portion or percentage of the data that falls in that class. - To find the relative frequency of a class, divide the frequency f by the sample size n - The relative frequency can be written as a fraction, decimal, or percent. - The sum of the relative frequencies of all the classes must equal 1, or 100% - When written as decimals, all relative frequencies will be between zero and one, inclusive

Mean (average):

The sum of all data entries divided by the number of entries. - Mean can only be found if you have quantitative data. It can only be found if your data is a series of numbers. - If you have qualitative data (where you have descriptions and the frequency they occur) then you cannot find the mean. IN CASES LIKE THIS YOU WILL STATE "MEAN IS NOT POSSIBLE"

Cumulative Frequency (of a class)

The sum of the frequency for that class and all previous classes. - Determines the number of observations that lie below a particular value - It is calculated by adding each frequency to the sum of its predecessor in a frequency distribution table. - THE CUMULATIVE FREQUENCY OF THE LAST CLASS IS EQUAL TO SAMPLE SIZE n - There must be as many CFs as there are classes
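
Relative and cumulative frequencies can both be computed from the frequency column. The sketch below uses hypothetical class frequencies and a hypothetical function name.

```python
from itertools import accumulate


def relative_and_cumulative(freqs):
    """Return (relative frequencies, cumulative frequencies) for the classes."""
    n = sum(freqs)                       # sample size, n = sum of f
    relative = [f / n for f in freqs]    # f / n for each class
    cumulative = list(accumulate(freqs)) # running total of the frequencies
    return relative, cumulative


relative, cumulative = relative_and_cumulative([3, 4, 2, 1])
print(relative)    # [0.3, 0.4, 0.2, 0.1]
print(cumulative)  # [3, 7, 9, 10]
```

The last cumulative frequency equals the sample size, and the relative frequencies add to 1.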

How is the sum of frequency denoted?

The sum of the frequency is denoted by ∑ f. ▪ Where ∑ is the uppercase Greek letter sigma and means sum of ▪ f is the frequency o Please remember that the sum of the frequencies must equal the sample size. In symbols this is represented by ∑ f = 𝑛. ▪ Where n is the sample size.

Pareto Chart:

Used to represent a frequency distribution for qualitative data. - Vertical bar graph in which the height of each bar represents frequency or relative frequency - The bars are positioned in ORDER OF DECREASING HEIGHT, with the tallest bar positioned at the left (first) a. Such positioning helps highlight important data and is used frequently in business.

Chebyshev's Theorem:

Applies to any distribution, regardless of shape, and places lower limits on the percentages of observations within a given number of standard deviations of the mean. The theorem states that at least 1 - 1/k² of the elements of any distribution lie within k standard deviations of the mean (where k = a number greater than 1). Another way of explaining the theorem: for any k greater than 1, the probability that a value of a given random variable will be within k standard deviations of the mean is at least 1 - 1/k². o At least three quarters (75%) of the observations in a set will lie within two standard deviations of the mean o At least eight-ninths (88.9%) of the observations in a set will lie within three standard deviations of the mean o The table shows the percentages that would occur if you calculate from 2 standard deviations to 6 standard deviations.
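
The table mentioned above is not reproduced here. As a sketch, the minimum proportions for k = 2 through 6 can be computed from 1 - 1/k² (the function name is hypothetical):

```python
def chebyshev_minimum(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


for k in range(2, 7):
    print(k, f"{chebyshev_minimum(k):.1%}")
# 2 75.0%
# 3 88.9%
# 4 93.8%
# 5 96.0%
# 6 97.2%
```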

Standard deviation:

Gives an idea of how close the entire set of data is to the mean. Measures the spread or dispersion around the mean of a data set. Standard deviation has the same units of measure as the data set. Standard deviation is always greater than or equal to zero. o When the standard deviation equals zero, the data set has no variation and all entries have the same value (which is equal to the mean). The more the entries are spread out, the greater the standard deviation. The symbol for population standard deviation is σ. The symbol for sample standard deviation is s. Standard deviation is the square root of the variance. o It is the most widely used measure of spread. A small standard deviation (relative to the mean) indicates that the majority of data values are very close to the mean. o In these cases the data may look clustered around the mean, with only a few values farther away from the mean. A large standard deviation (relative to the mean) indicates that the data values are more widely spread out from the mean. Note: the deviation is not the same as the standard deviation.

Frequency Polygon

Graph of a frequency distribution that shows the number of instances of obtained scores, usually with the data points connected by straight lines - Line graph that emphasizes the continuous change in frequencies - Another way to graph a frequency distribution

The Shape of Distributions:

• Histograms are valuable and useful tools. If the raw data came from a random sample of population values, the histogram constructed from your sample values should have a distribution shape that is reasonably similar to that of the population • A graph reveals several characteristics of a frequency distribution. One such characteristic is the shape of the distribution

