BUS226 - CHAPTERS 1-5

Empirical Rule

- Describes how data fit the standard bell-shaped curve. The rule gives the approximate percentage of observations within 1 standard deviation (68%), 2 standard deviations (95%), and 3 standard deviations (99.7%) of the mean when the histogram is well approximated by a normal curve.

Levels of Measurement (Four)

-Nominal, ordinal, interval, and ratio •The level of measurement determines the type of statistical analysis that can be performed •Nominal is the lowest level of measurement

Example: Population Mean (2 of 2)

1.Why is this information a population? This is a population because we are considering all of the exits in Kentucky. 2.What is the mean number of miles between exits?

Box Plots (1 of 2)

A box plot is a graphical display using quartiles •A box plot is based on five statistics: -Minimum value -1st quartile -Median -3rd quartile -Maximum value Begin by drawing a number line that will accommodate the minimum and maximum values. Then draw a vertical line above the median, Q1, and Q3. Enclose the regions between the first and third quartile creating the box. Next, draw dotted line segments from the third quartile to the largest value and from the first quartile to the smallest value. This shows the largest 25% and smallest 25% respectively. The interquartile range is the middle 50% of the data.

Contingency Tables (2 of 2)

A contingency table is a cross-tabulation that simultaneously summarizes two variables of interest and their relationship. The level of measurement can be nominal.

General Rule of Multiplication Example (1 of 2)

A golfer has 12 golf shirts in his closet. Suppose 9 of these shirts are white and the others are blue. He gets dressed in the dark, so he just grabs a shirt and puts it on. He plays golf two days in a row and does not return the shirts to the closet. What is the probability both shirts are white? P(W1 and W2) = P(W1) × P(W2|W1) = (9/12)(8/11) = .55. So the likelihood of selecting two shirts and finding them both to be white is .55. This can be extended to more than two events.
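
A minimal Python sketch of the golf-shirt calculation, assuming the two shirts are drawn without replacement as the card describes. The variable names are illustrative and do not come from the source.

```python
from fractions import Fraction

white, total = 9, 12

# General rule of multiplication: P(W1 and W2) = P(W1) * P(W2 | W1)
p_first_white = Fraction(white, total)
# After one white shirt is removed, 8 white shirts remain among 11
p_second_white_given_first = Fraction(white - 1, total - 1)

p_both_white = p_first_white * p_second_white_given_first
print(p_both_white, round(float(p_both_white), 2))  # 6/11 0.55
```

Using exact fractions keeps the intermediate values (9/12 and 8/11) visible before rounding.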

General Rule of Addition Example (1 of 2)

A sample of 200 tourists in Florida shows 120 went to Disney, 100 went to Busch Gardens, and 60 visited both. P(Disney)=120/200=.60 P(Busch)=100/200=.50 P(Disney and Busch)=60/200=.30 * So that we do not double count, subtract the probability of those tourists that went to both attractions.

Special Rule of Multiplication (2 of 2)

A survey by the American Automobile Association (AAA) revealed 60% of its members made airline reservations last year. Two members are selected at random. What is the probability both made airline reservations last year?
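
The card leaves the question open. A short Python sketch of the special rule of multiplication, assuming the two members' choices are independent (which the random selection suggests); the variable names are illustrative.

```python
# Probability that one randomly selected member made airline reservations
p_reservation = 0.60

# Special rule of multiplication for independent events: P(A and B) = P(A) * P(B)
p_both = p_reservation * p_reservation
print(round(p_both, 2))  # 0.36
```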

Graphic Presentation of Qualitative Data (Bar Chart)

BAR CHART A graph that shows the qualitative classes on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are proportional to the heights of the bars. *Use a bar chart when you wish to compare the number of observations for each class of a qualitative variable.

Classical Probability (2 of 2)

COLLECTIVELY EXHAUSTIVE At least one of the events must occur when an experiment is conducted. There are 3 definitions of probability. The classical approach to probability is often applied to games of chance like playing cards and rolling dice. It can also be applied to lotteries since the total number of outcomes is known before the experiment. If the set of events is collectively exhaustive and mutually exclusive, the sum of their probabilities is 1.

Contingency Tables (1 of 2)

CONTINGENCY TABLE A table used to classify sample observations according to two or more identifiable categories or classes. One hundred fifty adults were asked their gender and the number of Facebook accounts they used. The following table summarizes the results.

In a frequency polygon the points are plotted at the intersection of the class frequencies and the: class midpoints lower limits of the classes upper limits of the classes

Correct Answer class midpoints

Probability (2 of 3)

EXPERIMENT A process that leads to the occurrence of one and only one of several possible results. OUTCOME A particular result of an experiment. EVENT A collection of one or more outcomes of an experiment. *Three key terms used in the study of probability are experiment, outcome, and event. An experiment is the observation of some activity or the act of taking some measurement, like tossing a coin or asking a question. In this example, the experiment is rolling a die. There are six possible outcomes and numerous possible events.

DESCRIPTIVE STATISTICS

Methods of organizing, summarizing, and presenting data in an informative way.

Measures of Position Example (1 of 2)

Morgan Stanley is an investment company with offices located throughout the United States. Listed below are the commissions earned last month by a sample of 15 brokers.

Bayes' Theorem Example (2 of 3)

P(A1) = .05 Individual has the disease P(A2) = .95 Individual does not have the disease P(B|A1) = .90 Test shows the individual has the disease and is correct P(B|A2) = .15 Test incorrectly shows the individual has the disease
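
A hedged Python sketch showing how these four values combine under Bayes' theorem to give the probability that an individual who tests positive actually has the disease. The variable names are illustrative.

```python
# Prior probabilities
p_a1 = 0.05   # individual has the disease
p_a2 = 0.95   # individual does not have the disease

# Conditional probabilities of a positive test (event B)
p_b_given_a1 = 0.90   # positive test, disease present
p_b_given_a2 = 0.15   # positive test, disease absent (false positive)

# Bayes' theorem: P(A1|B) = P(A1)P(B|A1) / [P(A1)P(B|A1) + P(A2)P(B|A2)]
numerator = p_a1 * p_b_given_a1
denominator = numerator + p_a2 * p_b_given_a2
p_a1_given_b = numerator / denominator
print(round(p_a1_given_b, 2))  # 0.24
```

Even with a positive test, the posterior probability of disease is only about .24 because the disease is rare and false positives are relatively common.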

General Rule of Addition Example (2 of 2)

P(Disney or Busch) = P(Disney) + P(Busch) - P(Disney and Busch) = .60 + .50 - .30 = .80
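
The same result can be checked from the raw tourist counts. This is a small Python sketch with illustrative variable names.

```python
total = 200
disney, busch, both = 120, 100, 60

# General rule of addition: subtract the overlap so it is not counted twice
p_either = (disney + busch - both) / total
print(p_either)  # 0.8
```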

Probability (1 of 3)

PROBABILITY A value between 0 and 1 inclusive that represents the likelihood a particular event happens.

Two basic types of variables

QUALITATIVE VARIABLE An object or individual is observed and recorded as a non-numeric characteristic or attribute. Examples: gender, state of birth, eye color QUANTITATIVE VARIABLE A variable that is reported numerically. Examples: balance in your checking account, the life of a car battery, the number of people employed by a company

Explain the difference between qualitative and quantitative variables. Give an example of qualitative and quantitative variables.

Qualitative data are not numerical, whereas Quantitative data are numerical.

Subjective Probability

SUBJECTIVE CONCEPT OF PROBABILITY The likelihood (probability) of a particular event happening that is assigned by an individual based on whatever information is available. •Examples of subjective probability are •Estimating the likelihood the New England Patriots will be in the Super Bowl next year •Estimating the likelihood the U.S. budget deficit will be reduced by half in the next 10 years

Dot Plots Example (2 of 3)

Sheffield Motors Inc.

Skewness Example (2 of 3) STEP 1

Step 1: Compute the Mean

Skewness Example (2 of 3) STEP 2

Step 2: Compute the Standard Deviation

Common Shapes of Data

Symmetric Positively Skewed Negatively Skewed Bimodal There are four common shapes of data as we see here. Symmetric when mean = median. Positively skewed or skewed to the right when mean > median. Negatively skewed or left skewed when the data values extend further to the left and the mean < median. A bimodal shape has two or more peaks and may indicate two or more populations have been combined.

Tree Diagrams (2 of 2)

Table 5-1 summarizes data from a survey conducted by the National Association of Theater Managers. 500 randomly selected adults were asked their age and the number of times they saw a movie in a theater. The survey results are organized in a contingency table by age and by the number of movies attended per month. See the next slide for an illustration of how to use this data in a tree diagram to calculate probabilities.

Box Plot Example (4 of 4)

The box plot reveals the data is positively skewed since the dashed line to the right of the box (from 22 minutes to 30 minutes) is longer than the dashed line to the left of the box (from 15 minutes to 13 minutes) and since the median is not in the center of the box.

Standard Deviation (1 of 2)

The sample standard deviation is simply the square root of the sample variance, and the symbol s represents the sample standard deviation. Likewise, the population standard deviation (not shown) is the square root of the population variance and uses the Greek symbol sigma. The standard deviation is the most widely reported measure of dispersion.

Scatter Diagrams

To graph a scatter diagram, scale one of the variables, the independent variable, on the horizontal axis and the dependent variable on the vertical axis. In the first graph, there is a positive relationship between the age of the buses and their maintenance cost: as buses increase in age, the maintenance cost increases. The middle graph displays a negative relationship between the odometer reading of a vehicle and its auction price; that is, as the odometer reading rises, the auction price falls. The graph on the right shows there is little relationship between the height of shift supervisors and their annual salary.

Example: Sample Mean

Verizon is studying the number of minutes used by clients in a particular cell phone rate plan. A random sample of 12 clients showed the following number of minutes used last month. 90 77 94 89 119 112 91 110 92 100 113 83 What is the arithmetic mean number of minutes used?
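
A minimal Python sketch that answers the question by applying the sample mean formula (sum of the observations divided by n) to the 12 values listed. The variable names are illustrative.

```python
minutes = [90, 77, 94, 89, 119, 112, 91, 110, 92, 100, 113, 83]

# Sample mean: sum of the observations divided by the number of observations
sample_mean = sum(minutes) / len(minutes)
print(sum(minutes), len(minutes), sample_mean)  # 1170 12 97.5
```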

Relative Class Frequencies

You can convert class frequencies to relative class frequencies to show the fraction of the total number of observations in each class. - A relative frequency captures the relationship between a class frequency and the total number of observations.

Two numerical ways of describing quantitative variables, namely,

measures of location and measures of dispersion.

Sample Variance

s squared

The Sample Mean (X bar)

sum of the observations divided by the total number of observations •A measurable characteristic of a sample is a statistic STATISTIC: A characteristic of a sample.

bell-shaped distribution

the highest frequency occurs in the middle and frequencies tail off to the left and right of the middle

Population Mean

the sum of the values in the population divided by the population size, •A measurable characteristic of a population is a parameter PARAMETER: A characteristic of a population. For raw data—that is, data that have not been grouped in a frequency distribution—the population mean is the sum of all the values in the population divided by the number of values in the population. To find the population mean, we use the following formula. Population Mean = sum of the values in the population /population size

Tree Diagram Example

This tree diagram (Chart 5-2) summarizes all the probabilities based on the data in Table 5-1. This information is useful for making decisions regarding discounts on tickets and concessions.

Contingency Table Example (2 of 2)

•Of the 180 cars sold, 90 had a profit above the median and 90 had a profit below it. This meets the definition of the median. •The percentages of profits above the median are Kane 48%, Olean 50%, Sheffield 42%, and Tionesta 60%.

Contingency Tables

•A contingency table is used to classify nominal scale observations according to two characteristics CONTINGENCY TABLE A table used to classify observations according to two identifiable characteristics. •It is a cross-tabulation that simultaneously summarizes two variables of interest Both variables need only be nominal or ordinal

Describing the Relationship Between Two Variables

•A scatter diagram is a graphical tool to portray the relationship between two variables or bivariate data •Both variables are measured with interval or ratio level scale •If the scatter of points moves from the lower left to the upper right, the variables under consideration are directly or positively related •If the scatter of points moves from the upper left to the lower right, the variables are inversely or negatively related ** When studying the relationship between two variables the data is referred to as bivariate. When studying just one variable, the data is univariate.

Box Plot Example (3 of 4)

•Begin by drawing a number line using an appropriate scale •Next, draw a box that begins at Q1 (15 minutes) and ends at Q3 (22 minutes) •Draw a vertical line at the median (18 minutes) •Extend a horizontal line out from Q3 to the maximum value (30 minutes) and out from Q1 to the minimum value (13 minutes)

3 Reasons for studying statistics

•Data are collected everywhere and require statistical knowledge to make the information useful •Statistical techniques are used to make professional and personal decisions •A knowledge of statistics is needed to understand the world and be conversant in your career

Measures of Position (2 of 2)

•Deciles divide a set of observations into 10 equal parts •Percentiles divide a set of observations into 100 equal parts

QUANTITATIVE VARIABLES (discrete or continuous)

•Discrete variables are typically the result of counting -Values have "gaps" between the values -Examples: the number of bedrooms in a house, the number of students in a statistics course •Continuous variables are usually the result of measuring something -Can assume any value within a specific range -Examples: the air pressure in a tire, duration of flights from Orlando to San Diego

Measures of Position Example (1 of 2) cont.

•First, sort the data from smallest to largest

Measures of Position (1 of 2)

•Measures of location also describe the shape of the distribution and can be expressed as percentiles •Quartiles divide a set of observations into four equal parts -The interquartile range is the difference between the third quartile and the first quartile

Frequency Distributions - STEP 1 (cont.)

•Minimum: 294 •Maximum: 3,292

Dot Plots Example (3 of 3)

•Minitab provides dot plots and summary statistics These dot plots of data from two dealerships owned by the Applewood Auto Group show the difference in location and dispersion of the observations. We can clearly see the number of vehicles serviced at the Sheffield dealership is more widely dispersed and has a larger mean than at the Tionesta dealership. Notice the identities of the individual values have not been lost.

Measures of Position Example (2 of 2)

•Next, find the median •L50 = (15+1)*50/100 = 8 •So the median is $2,038, the value at position 8 in the sorted data: $1,460 $1,471 $1,637 $1,721 $1,758 $1,787 $1,940 $2,038 $2,047 $2,054 $2,097 $2,205 $2,287 $2,311 $2,406 The median is the same as the 50th percentile. The first quartile, Q1, is the 25th percentile: L25 = (15+1)*25/100 = 4, so Q1 is $1,721, the 4th value. The third quartile, Q3, is the 75th percentile: L75 = (15+1)*75/100 = 12, so Q3 is $2,205, the 12th value.
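
A hedged Python sketch of the location formula Lp = (n + 1)P/100 applied to the 15 commissions. The function names are illustrative, and the linear interpolation for fractional locations is an assumption (it is not needed here, since all three locations are whole numbers).

```python
commissions = sorted([1460, 1471, 1637, 1721, 1758, 1787, 1940, 2038,
                      2047, 2054, 2097, 2205, 2287, 2311, 2406])

def percentile_location(n, p):
    """1-based position of the p-th percentile: Lp = (n + 1) * p / 100."""
    return (n + 1) * p / 100

def value_at_location(sorted_data, location):
    """Value at a 1-based location; interpolates if the location is fractional."""
    lower = int(location)            # whole part of the position
    fraction = location - lower      # fractional part, 0 for whole positions
    value = sorted_data[lower - 1]
    if fraction > 0:
        value += fraction * (sorted_data[lower] - sorted_data[lower - 1])
    return value

n = len(commissions)
q1 = value_at_location(commissions, percentile_location(n, 25))
median = value_at_location(commissions, percentile_location(n, 50))
q3 = value_at_location(commissions, percentile_location(n, 75))
print(q1, median, q3, q3 - q1)  # 1721 2038 2205 484
```

The last printed value is the interquartile range, Q3 minus Q1.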

Ethics and Statistics

•Practice statistics with integrity and honesty when collecting, organizing, summarizing, analyzing, and interpreting numerical information •Maintain an independent and principled point of view when analyzing and reporting findings and results. •Question reports that are based on data that -do not fairly represent the population -do not include all relevant statistics -introduce bias in an attempt to mislead or misrepresent

Frequency Distributions - STEP 3

•Step 3 Set the individual class limits -Lower limits should be rounded to an easy to read number when possible. Make sure to use the words "up to" in each class in order to construct mutually exclusive classes; this will ensure each value in the data set fits in one and only one class. Also, check to see that the minimum value will go in the first class and the maximum value will go in the last class when setting class limits. Here the minimum value of $294 goes in the first class and the maximum value of $3,292 will go in the last class. This ensures you have exhaustive classes. Class midpoints can represent a typical value for each class and can be determined by adding the lower limits of consecutive classes and dividing by 2.

Frequency Distributions - STEP 4

•Step 4 Tally the individual data into the classes and determine the number of observations in each class -The number of observations is the class frequency There is some loss of detail when data is grouped but the frequency distribution results in an understandable and organized form. Equal class widths are preferred but might not always be possible.

Classical Probability (1 of 2)

•The classical definition of probability applies when there are n equally likely outcomes to an experiment MUTUALLY EXCLUSIVE The occurrence of one event means that none of the other events can occur at the same time.

Complement Rule P(A^C)

•The complement rule is used to determine the probability of an event happening by subtracting the probability of the event not happening from 1 COMPLEMENT RULE P(A) = 1 - P(~A) •You can also use the complement rule: P(A or C) = P(~B) = 1 - P(B) = 1 - .900 = .10 This rule is useful because sometimes it is easier to calculate the probability of an event happening by determining the probability of it not happening and subtracting the result from 1. The events need to be mutually exclusive and exhaustive.

General Rule of Multiplication (2 of 2)

•The conditional probability is represented as P(B|A) and is read "the probability of B given A"

Why Study Dispersion?

•The dispersion is the variation or spread in a set of data •The range is the difference between the maximum and minimum values in a set of data •The formula for range is RANGE: Range = Maximum value - Minimum value [3-6] •The major characteristics of the range are -Only two values are used in its calculation -It is influenced by extreme values -It is easy to compute and to understand A measure of location only describes the center of the data; it does not tell us anything about the spread of the data, so we may also need to know something about the variation in the data. Measures of dispersion also allow us to compare two or more distributions. Small measures of dispersion indicate the data are closely clustered around the mean and therefore the mean is representative of the data; large measures of dispersion indicate that the mean may not be representative of the data. We'll learn how to compute the range, the variance, and the standard deviation as well as the major characteristics of each.

Geometric Mean (Rate of Increase over time)

•The geometric mean is also used to find the rate of change from one period to another

Multiplication Formula (1 of 3)

•The multiplication formula states that if there are n ways of doing one thing, and m ways of doing another thing, then there are m*n ways of doing both MULTIPLICATION FORMULA Total number of arrangements = (m)(n) [5-8] There are three counting formulas that are useful in determining the number of outcomes in an experiment. This is the multiplication formula that is used to find the total number of arrangements for two or more groups.

ORDINAL LEVEL OF MEASUREMENT

•The rankings are known but not the magnitude of differences between groups •Data recorded at the ordinal level of measurement is based on a relative ranking or rating of items based on a defined attribute or qualitative variable. Variables based on this level of measurement are only ranked and counted. Examples: the list of top ten states for best business climate, student ratings of professors.

Box Plot Example (2 of 4)

•Using a sample of 20 deliveries, Alexander determined the following: -Minimum value = 13 minutes -Q1 = 15 minutes -Median = 18 minutes -Q3 = 22 minutes -Maximum value = 30 minutes •Develop a box plot for delivery times

Rules of Addition Example (2 of 2)

•What is the probability that a particular package will be either underweight or overweight? P(A or C) = P(A) + P(C) = .025 + .075 = .10

Skewness Example (2 of 3) STEP 3

Step 3: Find the Median The middle value in the set of data, arranged from smallest to largest is 3.18

Skewness Example (2 of 3) STEP 4

Step 4: Compute the Skewness •What do you conclude about the shape of the distribution?

The Permutation Formula (2 of 2)

There are three electronic parts to be assembled, so n=3. Because all three are to be inserted into the plug-in component, r=3. Label the parts A, B, and C. The six possible orderings are ABC BAC CAB ACB BCA CBA * Use the permutation formula when the order of the objects is important, to find the number of arrangements of r objects selected from a group of n objects. Remember by definition, 0! = 1. Excel has a formula that will calculate permutations.
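
A short Python sketch of the permutation count for this example, using the standard formula nPr = n!/(n - r)!. It relies on the standard-library functions `math.perm` (Python 3.8+) and `itertools.permutations`; the variable names are illustrative.

```python
import math
from itertools import permutations

n, r = 3, 3

# Permutation formula: nPr = n! / (n - r)!  (with 0! = 1)
by_formula = math.factorial(n) // math.factorial(n - r)
by_library = math.perm(n, r)

# Enumerate the orderings of parts A, B, and C explicitly
orderings = ["".join(p) for p in permutations("ABC", r)]
print(by_formula, by_library, orderings)
# 6 6 ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
```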

Interpretations and Uses of the Standard Deviation (4 of 4)

CHART 3-7 A Symmetrical, Bell-Shaped Curve Showing the Relationships between the Standard Deviation and the Percentage of Observations If we have a symmetrical distribution we can use the Empirical Rule, sometimes called the Normal Rule, which allows us to be more precise than with Chebyshev's Theorem. Here is a symmetrical distribution with a mean of 100 and a standard deviation of 10. Applying the Empirical Rule, we'll find about 68% of the values between 90 and 110, about 95% of the values between 80 and 120, and about 99.7% of the values between 70 and 130.
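
A minimal Python sketch that reproduces the three Empirical Rule intervals for a mean of 100 and a standard deviation of 10. The dictionary layout and names are illustrative.

```python
mean, sd = 100, 10

# Approximate share of observations within k standard deviations of the mean
shares = {1: "68%", 2: "95%", 3: "99.7%"}

intervals = {k: (mean - k * sd, mean + k * sd) for k in shares}
for k, (low, high) in intervals.items():
    print(f"about {shares[k]} of values lie between {low} and {high}")
```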

Which one of the following is not a characteristic of a frequency distribution? It summarizes qualitative data. It uses mutually exclusive classes. It uses collectively exhaustive classes. It displays the number of observations in each class.

Correct Answer It summarizes qualitative data.

Relative Frequency Distributions (Table 2-7)

The relative class frequencies show each class frequency relative to the entire distribution and add up to 1.000.

INFERENTIAL STATISTICS

The methods used to estimate a property of a population on the basis of a sample.

Which of the following operations is true regarding relative frequency distributions? a) The relative frequency is found by dividing the class frequencies by the total number of observations. b) No two classes can have the same relative frequency. c) The sum of the relative frequencies is equal to the number of observations. d) The sum of the relative frequencies must be less than 1.

The relative frequency is found by dividing the class frequencies by the total number of observations.

Statistics

The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.

Cumulative Frequency Distributions

To construct a cumulative frequency distribution, add each frequency to the frequencies before it. This shows how many values have accumulated as you move from one class down to the next class.

Cumulative Frequency Polygon

To plot a cumulative frequency distribution, scale the upper limit of each class along the X-axis and the corresponding cumulative frequencies along the Y-axis. Label the vertical axis on the right in terms of cumulative relative frequencies. CHART 2-7 Cumulative Frequency Polygon for Profit on Vehicles Sold Last Month at Applewood Auto Group

Raw Data Calculation:

X bar is the symbol for the sample mean. The symbol for "the sum of" is the Greek symbol sigma. To calculate the sample mean for raw data, you first take the sum of all the raw scores in the sample and divide it by the total number of raw scores in the sample. The formula reads: X bar equals the sum of X (or the sum of all the raw scores in the sample) divided by n (the sample size or number of raw data scores in the sample). Raw Data: 2 5 8 4 5 9 12 3 5 To calculate the mean of the list of raw data above you start first by summing all of the numbers: The sum of the numbers is: 53 Then we divide by the total number of the raw data points. This is also called "n" for sample size. n = 9, so 53/9 = 5.88888 or 5.89 rounded

Exits along interstate highways were formerly numbered successively from the western or southern border of a state. However, the Department of Transportation changed most of them to agree with the numbers on the mile markers along the highway. a) What level of measurement were data on the consecutive exit numbers? b) What level of measurement are data on the milepost markers? c) The newer system provided information on the distance between exits. (T/F)

a) What level of measurement were data on the consecutive exit numbers? [Ordinal] b) What level of measurement are data on the milepost markers? [Ratio] c) The newer system provided information on the distance between exits. [True]

Characteristics of the Median

•The median is the value in the middle of a set of ordered data •At least the ordinal scale of measurement is required •It is not influenced by extreme values •Fifty percent of the observations are larger than the median •It is unique to a set of data

Which of the following are characteristics of frequency distributions? a) Organize raw data b) It provides the tally for each class. c) Use classes and frequencies to organize data d) It shows all the observations in the data.

Correct Answer Organize raw data It provides the tally for each class. Use classes and frequencies to organize data

Which of the following practices are commonly used in setting class limits for a frequency distribution? a) Deleting data which is too low or too high to fit convenient intervals. b) Placing "excess" interval width equally in the two tails of the distribution. c) Overlapping the upper limit with the lower limit of the next higher class. d) Rounding the class size up.

Correct Answer Placing "excess" interval width equally in the two tails of the distribution. Rounding the class size up.

Which of the following features is not part of a histogram? The frequency of occurrence of data within classes. Quantitative data divided into classes. The frequency of occurrence of a nominal variable. Adjacent bars whose height represents a number or fraction.

Correct Answer The frequency of occurrence of a nominal variable.

A business statistics instructor teaches a class with 83 students. Suppose he would like to create a frequency distribution to summarize their 83 final exam scores. Using the "2 to the k rule," how many classes should be used? 2 7 6 1 83

Correct Answer 7
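
A hedged Python sketch of the "2 to the k rule," assuming the usual textbook statement: choose the smallest number of classes k such that 2^k is greater than the number of observations n. The function name is illustrative.

```python
def classes_by_2_to_k(n):
    """Smallest k such that 2**k > n (the '2 to the k rule')."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

# 2**6 = 64 is not greater than 83, but 2**7 = 128 is
print(classes_by_2_to_k(83))  # 7
```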

Place the following steps used in constructing a frequency distribution into correct order. Decide on the number of classes. Determine class interval. Set individual class limits. Tally the number of observations in each class.

Correct Answer Decide on the number of classes. Determine class interval. Set individual class limits. Tally the number of observations in each class.

In the cumulative frequency polygon shown, Chart 2-7, about how many observations are there between a value of 200 and 250? 100 20 50 250. 5

Correct Answer 50 Reason: You must subtract the observations for 200 from the observations for

Which of the following is not a useful practice in setting individual class limits for a frequency distribution? Round the class interval up to get a convenient class size. Set clear limits so an observation will fit only one class. Excluding outliers that cause the interval to be too wide. Place excess interval space equally in the two tails of the distribution.

Correct Answer Excluding outliers that cause the interval to be too wide.

Relative Positions of Mean, Median, and Mode (1 of 3)

Chart 3-2 A Symmetric Distribution Refer to the first chart on the left. It is a symmetric distribution with zero skewness where the mean=median=mode.

NOMINAL LEVEL OF MEASUREMENT (lowest level)

Data recorded at the nominal level of measurement is represented as labels or names. They have no order. They can only be classified and counted. Examples: classifying M&M candies by color, identifying students at a football game by gender

A frequency distribution table shows the number of observations for each class interval of data. How is this data plotted as a frequency polygon? a) Frequency is plotted on the vertical axis and the class interval is plotted on the horizontal axis. b) Frequency is plotted on the vertical axis and the class midpoint is plotted on the horizontal axis. c) Frequency is plotted on the horizontal axis and the class interval is plotted on the vertical axis. d) The cumulative proportion of observations for each class is plotted on the vertical axis and the class upper limit is plotted on the horizontal axis.

Correct Answer Frequency is plotted on the vertical axis and the class midpoint is plotted on the horizontal axis.

FREQUENCY TABLE

A grouping of qualitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class. •Mutually exclusive means the data fit in just one class •Collectively exhaustive means there is a class for each value

Types of Variables Summary

CHART 1-2 Summary of the Types of Variables •Types of Variables -Qualitative - Brand of PC - Marital status - Hair color -Quantitative - Discrete - Children in a family - Strokes on a golf hole - TV sets owned - Continuous - Amount of income tax paid - Weight of a student - Yearly rainfall in Tampa, FL

Interpretations and Uses of the Standard Deviation (1 of 4)

CHEBYSHEV'S THEOREM For any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least 1 - 1/k^2, where k is any value greater than 1. We can use the standard deviation to describe a distribution. The Russian mathematician P. L. Chebyshev developed a theorem that allows us to determine the minimum proportion of values that lie within a specified number of standard deviations of the mean. For example, for 2 standard deviations we find that 1 - 1/2^2 = .75, so a minimum of 75% of the values lie within 2 standard deviations of the mean. This theorem can be used regardless of the shape of the distribution.

Relative Positions of Mean, Median, and Mode (2 of 3)

Chart 3-3 A Positively Skewed Distribution The middle chart is positively skewed because the mode<median<mean and it has a tail on the right.

Levels of Measurement Summary

Levels of Measurement -Nominal - Data may only be classified - Jersey numbers of football players - Make of car -Ordinal - Data are ranked - Your rank in class - Team standings in the Southeastern Conference -Interval - Meaningful difference between values - Temperature - Dress size -Ratio - Meaningful 0 point and ratio between values - Number of patients seen - Number of sales calls made - Distance to class

Stem-and-Leaf Displays (2 of 2)

STEM-AND-LEAF DISPLAY A statistical technique to present a set of data. Each numerical value is divided into two parts. The leading digit becomes the stem and the trailing digit the leaf. The stems are located along the vertical axis, and the leaf values are stacked against each other along the horizontal axis.

Constructing Frequency Tables

TABLE 2-1 Frequency Table for Vehicles Sold Last Month at Applewood Auto Group by Location •To construct a frequency table -First sort the data into classes -Count the number in each class and report as the class frequency •Convert each frequency to a relative frequency -Each of the class frequencies is divided by the total number of observations -Shows the fraction of the total number observations in each class

Interpretations and Uses of the Standard Deviation (3 of 4)

THE EMPIRICAL RULE For a symmetrical, bell-shaped frequency distribution, approximately 68% of the observations will lie within plus and minus one standard deviation of the mean, about 95% of the observations will lie within plus or minus 2 standard deviations of the mean, and practically all (99.7%) will lie within 3 standard deviations of the mean.

Interpretations and Uses of the Standard Deviation (2 of 4)

The arithmetic mean biweekly amount contributed by the Dupree Paint employees to the company's profit-sharing plan is $51.54, and the standard deviation is $7.51. At least what percent of the contributions lie within plus 3.5 standard deviations and minus 3.5 standard deviations of the mean? About 92%, found by
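
A small Python sketch of the Chebyshev calculation behind the "about 92%" figure, using the theorem's bound 1 - 1/k^2 with k = 3.5. The function name is illustrative.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

# Dupree Paint example: k = 3.5 standard deviations
proportion = chebyshev_min_proportion(3.5)
print(round(proportion, 3))  # 0.918, i.e. about 92%
```

The mean ($51.54) and standard deviation ($7.51) do not enter the bound itself; they only determine which dollar interval the 3.5 standard deviations correspond to.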

Example: Sample Variance (1 of 2)

The hourly wages for a sample of 5 part-time employees at Home Depot are: $12, $20, $16, $18, and $19. The sample mean is $17. What is the sample variance? The sample variance is used to estimate the population variance. Notice the change in the denominator. Rather than use n, we use n - 1 so that we do not underestimate the population variance.
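
A minimal Python sketch of the sample variance for the five wages, dividing the sum of squared deviations by n - 1 as the card describes. The variable names are illustrative.

```python
wages = [12, 20, 16, 18, 19]
n = len(wages)

sample_mean = sum(wages) / n                      # 17.0
squared_deviations = [(x - sample_mean) ** 2 for x in wages]

# Sample variance uses n - 1 in the denominator
sample_variance = sum(squared_deviations) / (n - 1)
print(sample_mean, sum(squared_deviations), sample_variance)  # 17.0 40.0 10.0
```

The sample standard deviation would be the square root of this value.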

Slate is a daily magazine on the Web. Its business activities can be described by a number of variables. What is the level of measurement for each of the following variables?

The number of hits on their website on Saturday between 8:00 a.m. and 9:00 a.m. [Ratio] The departments, such as food and drink, politics, foreign policy, sports, etc. [Nominal] The number of weekly hits on the Sam's Club ad. [Ratio] The number of years each employee has been employed with Slate. [Ratio]

Sample Mean of Grouped Data

This formula is used to find an estimated mean of data in a frequency distribution; once the data has been grouped in classes, individual values are no longer available. Letting the midpoint of each class represent the values in each group, multiply the midpoint by the class frequency for each group, sum these products, and then divide by the total number of frequencies. See table 3-1 in the text for more detail.
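A minimal sketch of the grouped-mean procedure follows. The frequency table here is hypothetical (it is not Table 3-1 from the text); each class is represented by its midpoint, weighted by its frequency.

```python
# Hypothetical frequency distribution: (lower limit, upper limit, frequency).
# These classes are made up for illustration only.
classes = [(5, 15, 2), (15, 25, 5), (25, 35, 3)]

n = sum(freq for _, _, freq in classes)   # total number of observations

# Midpoint of each class times its frequency, summed, then divided by n.
grouped_mean = sum(((low + high) / 2) * freq for low, high, freq in classes) / n

print(n, grouped_mean)
```

Because the individual values are replaced by class midpoints, the result is an estimate of the mean, not the exact mean of the raw data.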

Relative Positions of Mean, Median, and Mode (3 of 3)

Chart 3-4 A Negatively Skewed Distribution In the last chart, the mean<median<mode and is negatively skewed with a tail to the left. If a distribution is highly skewed, the mean is probably not a representative measure of central tendency and the median or mode should be used.

Measures of Location • a measure of location is a value used to describe the central tendency of a set of data

Common Measures of Location: mean, median, mode Measures of location are often referred to as averages. The purpose of a measure of location is to pinpoint the center of a distribution of data. An average is a measure of location that shows the central value of the data. Averages appear daily on TV, on various websites, in the newspaper, and in other journals. Here are some examples: 1) The average U.S. home changes ownership every 11.8 years. 2) An American receives an average of 568 pieces of mail per year. 3) The average American home has more TV sets than people. There are 2.73 TV sets and 2.55 people in the typical home. 4) A marriage ceremony in the U.S. costs an average of $25,764. This does not include the cost of a honeymoon or engagement ring. 5) The average price of a theater ticket in the United States is $9.27, according to the National Association of Theater Owners. •The arithmetic mean is the most widely reported measure of location

In the cumulative frequency polygon shown, Chart 2-7, about how many observations are there between a value of 100 and 150? 5 50 25 10

Correct Answer 25 Reason: To get this amount, you must subtract the cumulative frequency for 100 from that for 150.

Given below are the data for blood types: A B B AB O O O B AB B A B O O O A O A A O A B B O AB Which is the frequency for blood type AB? 9 3 6 7

Correct Answer 3

Which of the following is the best definition of "class midpoint"? Halfway between the lower or upper limits of two consecutive classes. Halfway between the highest and lowest classes. The average value of the observations in a class interval

Correct Answer Halfway between the lower or upper limits of two consecutive classes.

Which of the following is an advantage of a frequency polygon over a histogram? A histogram can compare two or more distributions It allows comparing directly two or more frequency distributions. It depicts each class as a rectangle, with the height representing the number of observations.

Correct Answer It allows comparing directly two or more frequency distributions.

Which of the following is an advantage of a cumulative frequency polygon over a histogram or frequency polygon? It can show the total number of observations less than a particular class' upper limit. It can show the number of observations in a given class. It shows class midpoints as points on a polygon.

Correct Answer It can show the total number of observations less than a particular class' upper limit.

To divide data with a high value of H and a low value of L into k classes, the class interval must be: a) at most (H-L)/k b) one fifth of the range c) at least (H-L)/k d) equal to (H-L)/k

Correct Answer at least (H-L)/k

The value shown on the vertical axis of a cumulative frequency polygon for a particular class is found by: dividing the class frequency by the total number of observations counting the number of observations less than the upper limit of the class counting the number of observations within the class

Correct Answer counting the number of observations less than the upper limit of the class

A useful way to determine the number of classes (k) in a frequency distribution of n items is the "2 to the k rule". Which of the following correctly describes this rule? a) k is the value for which 2^k = n. b) k is the largest number such that 2^k > n. c) k is the smallest number such that 2^k > n.

Correct Answer k is the smallest number such that 2^k > n

A relative frequency distribution shows: a) the fraction or percentage of observations in each class interval b) the number of observations of a particular value in a set of data c) the number of observations in each class interval

Correct Answer the fraction or percentage of observations in each class interval

Two types of statistics

Descriptive and Inferential DESCRIPTIVE: •Descriptive statistics can be used to organize data into a meaningful form •You can summarize data and provide information that is easy to understand INFERENTIAL: •Inferential statistics can be used to estimate properties of a population •You can make decisions based on a limited set of data

Graphic Presentation of a Frequency Distribution

HISTOGRAM A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other. •A histogram shows the shape of a distribution. •Each class is depicted as a rectangle, with the height of the bar representing the number in each class. Histograms allow us to get a quick picture of the main characteristics of the data. They are similar to bar charts, but because the data are continuous there are no gaps between the bars; they are drawn adjacent to one another. CHART 2-4 Histogram of the Profit on 180 Vehicles Sold at the Applewood Auto Group

The Mode

MODE The value of the observation that occurs most frequently. CHART 3-1 Number of Respondents Favoring Various Bath Oils •Major Characteristics of the mode: -The mode can be found for nominal level data -A set of data can have more than one mode The mode is useful for summarizing nominal level data. Here a company has developed five bath oils and has conducted a marketing survey to find which bath oil consumers prefer. We see that most of the survey respondents favored Lamoure since it is the highest bar. So Lamoure is the mode. Mode can be determined for all levels of measurement and it is not affected by extreme values. A disadvantage of using the mode is that a data set may not have a mode or that it has more than one mode.

Population Mean calculation for raw data

Mu is the symbol for the population mean. To calculate the population mean for raw data you first take the sum of all the raw scores in the population and divide them by the total number of raw scores in the population. The formula reads: Mu equals the sum of X (or the sum of all the raw scores in the population) divided by N (the population size or number of raw data scores in the population).

Stem-and-Leaf Display Example

New Table 4-1 on page 98 goes here, along with the final table at the top of page 99. Stem-and-leaf displays provide an alternative to frequency distributions and histograms. Table 4-1 shows the number of people attending 45 performances at the Theater of the Republic. Note that the smallest attendance is 88, so the first stem is 8. The largest attendance is 156, so the largest stem will be 15. The first number in the table is 96, so 9 is the stem and 6 is the leaf. The next number is 93, so a 3 is placed after the 6, and so on. It is customary to sort the leaf values from smallest to largest. It is easy to see the minimum value, the maximum value, how many times the number of people attending the performances was more than 150, and so on. Minitab can produce stem-and-leaf displays too.

Properties of the Arithmetic Mean

•Interval or ratio scale of measurement is required •All the data values are used in the calculation •It is unique •The sum of the deviations from the mean equals zero •A weakness of the mean is that it is affected by extreme values

Standard Deviation (2 of 2)

•Major characteristics of the standard deviation are: -It is in the same units as the original data -It is the square root of the average squared distance from the mean -It cannot be negative -It is the most widely used measure of dispersion These properties are useful for understanding both population standard deviations and sample standard deviations.

Population Variance

•Major characteristics of the variance are: -All observations are used in the calculation -The units are somewhat difficult to work with, they are the original units squared The formula for determining the population variance is shown above and is another measure of dispersion. The variance is the mean of the squared deviations from the arithmetic mean.

Graphic Presentation of a Frequency Distribution (cont.)

•A frequency polygon, similar to a histogram, also shows the shape of a distribution. •These are good to use when comparing two or more distributions. A frequency polygon consists of line segments connecting the points formed by the intersection of the class midpoints and the class frequencies. You may recall that class midpoints are halfway between the lower limits of two consecutive classes and represent the typical value for each class. Since a polygon is by definition a closed plane figure, the frequency polygon is closed by anchoring the connected line segments to the X axis at zero frequency, one interval higher than the highest midpoint and one interval lower than the lowest midpoint.

The Weighted Mean (2 of 2)

•The Carter Construction Company pays its hourly employees $16.50, $19.00, or $25.00 per hour. There are 26 hourly employees: 14 are paid at the $16.50 rate, 10 at the $19.00 rate, and 2 at the $25.00 rate. •What is the mean hourly rate paid for the 26 employees?
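The Carter Construction question can be worked through with a short sketch: each hourly rate is weighted by the number of employees paid at that rate. The variable names are illustrative.

```python
rates = [16.50, 19.00, 25.00]   # hourly rates from the example
counts = [14, 10, 2]            # number of employees at each rate (the weights)

# Weighted mean: sum of (weight * value) divided by the sum of the weights.
weighted_mean = sum(w * x for w, x in zip(counts, rates)) / sum(counts)

print(round(weighted_mean, 2))
```

The total payroll rate is 231 + 190 + 50 = 471, and 471 / 26 rounds to $18.12 per hour.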

Empirical Probability (2 of 2)

Empirical probability is also known as relative frequency. An example of a firm using empirical probability is when life insurance companies use past data to determine the acceptability of an applicant as well as the premium to be charged. In this example, NASA has had 121 successful flights out of a total of 123 flights, therefore, the probability of a successful flight in the future is .98.

Frequency Distributions - STEP 2

•Step 2 Determine the class interval, i -i ≥ (highest value - lowest value)/k -Round up to some convenient number (See Formula) •So, decide to use an interval of $400 •The interval is also referred to as the class width After analyzing the table of raw data, you find the maximum value is $3,292 and the minimum value is $294. Divide the difference by the number of classes. In this example, there are 8 classes, so the result is $374.75. Round the result up to some convenient number, such as a multiple of 10 or 100; here we'll use an interval of $400.
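Step 2 can be sketched in a few lines of Python. The choice of rounding up to the next multiple of 100 is an assumption made here to match the $400 interval in the example; any convenient unit could be used.

```python
import math

highest = 3292   # maximum profit, from the example
lowest = 294     # minimum profit, from the example
k = 8            # number of classes chosen in Step 1

raw_interval = (highest - lowest) / k   # minimum acceptable class width

# Assumption: "convenient number" means the next multiple of 100.
rounding_unit = 100
class_interval = math.ceil(raw_interval / rounding_unit) * rounding_unit

print(raw_interval, class_interval)
```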

RATIO LEVEL OF MEASUREMENT (highest level)

•The highest level of measurement is the ratio level •The data has all the characteristics of the interval scale and ratios between numbers are meaningful •The 0 point represents the absence of the characteristic •Data recorded at the ratio level of measurement are based on a scale with a known unit of measurement and a meaningful interpretation of zero on the scale. Examples: wages, changes in stock prices, and weight

The Weighted Mean (1 of 2)

•The weighted mean is found by multiplying each observation, x, by its corresponding weight, w, summing the products, and dividing by the sum of the weights: weighted mean = (w1x1 + w2x2 + ... + wnxn)/(w1 + w2 + ... + wn). You'll discover that this is really just a shortcut method of computing the arithmetic mean, which we can use when we have recurring values in a data set. The mean hourly rate is $18.12

Example: Population Mean (1 of 2)

•There are 42 exits on I-75 through the state of Kentucky. Listed below are the distances between exits (in miles).

INTERVAL LEVEL OF MEASUREMENT

•This data has all the characteristics of ordinal level data plus the differences between the values are meaningful •There is no natural 0 point • For data recorded at the interval level of measurement, the interval or the distance between values is meaningful. The interval level of measurement is based on a scale with a known unit of measurement. Examples: the Fahrenheit temperature scale, dress sizes

Finding the Median

•To find the median for an even numbered data set •Sort the observations and calculate the average of the two middle values The number of hours a sample of 10 adults used Facebook last month: 3 5 7 5 9 1 3 9 17 10 Arranging the data in ascending order gives: 1 3 3 5 5 7 9 9 10 17 Thus, the median is 6. For an even number of observations, the median may NOT be one of the given values.
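The Facebook example can be reproduced with a short sketch that sorts the data and averages the two middle values, then cross-checks the result with the standard library's `statistics.median`. The variable names are illustrative.

```python
import statistics

hours = [3, 5, 7, 5, 9, 1, 3, 9, 17, 10]   # hours on Facebook, from the example

ordered = sorted(hours)
mid = len(ordered) // 2

# Even number of observations: average the two middle values.
median_by_hand = (ordered[mid - 1] + ordered[mid]) / 2

# Cross-check with the standard library.
median_stdlib = statistics.median(hours)

print(ordered, median_by_hand, median_stdlib)
```

Note that 6 does not appear in the data, which illustrates the point that the median of an even-sized data set need not be one of the given values.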

Relative Frequency Distributions

•To find the relative frequencies, simply take the class frequency and divide by the total number of observations

Dot Plots Example (1 of 3)

•Use dot plots to compare two data sets, like these showing the number of vehicles serviced last month at two different dealerships Tionesta Ford Lincoln Mercury Dot plots are useful for comparing two different data sets. Here are tables with the number of vehicles serviced by two of the dealerships owned by the Applewood Auto Group. We will use dot plots to show the difference in location and dispersion of the observations. To develop a dot plot we display a dot for each observation along a horizontal number line. If there are identical values, the dots are piled on top of each other.

Which of the following can be observed from a histogram? Check all that apply. The shape of the distribution. The approximate number of observations. The spread of the data. The concentration of the data. The relationship between two variables.

Correct Answer The shape of the distribution. The approximate number of observations. The spread of the data. The concentration of the data.

Calculating the Standard Deviation of Grouped Data

•Applewood Auto Group Frequency Distribution Compute the standard deviation of the vehicle profits. To calculate the standard deviation of grouped data, begin by estimating the mean (in this example the mean is $1,851). Then find the deviation of each class midpoint from the mean, square each deviation, and multiply by the class frequency. Sum these products, divide by n - 1, and finally take the square root of the result.
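The grouped standard deviation procedure can be sketched as follows. The frequency table is hypothetical (it is not the Applewood distribution, which is not reproduced on this card), so the numbers will not match the $1,851 mean above.

```python
import math

# Hypothetical frequency distribution: (lower limit, upper limit, frequency).
classes = [(5, 15, 2), (15, 25, 5), (25, 35, 3)]

midpoints = [(low + high) / 2 for low, high, _ in classes]
freqs = [freq for _, _, freq in classes]
n = sum(freqs)

# Step 1: estimate the mean from the class midpoints.
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Step 2: squared deviation of each midpoint from the mean, times frequency.
weighted_squares = sum(f * (m - grouped_mean) ** 2 for f, m in zip(freqs, midpoints))

# Step 3: divide by n - 1 and take the square root.
grouped_std = math.sqrt(weighted_squares / (n - 1))

print(grouped_mean, weighted_squares, grouped_std)
```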

Frequency Distributions - STEP 1

•Step 1 Decide on the number of classes •Use the 2^k > n rule, where n = 180 -k is the number of classes -n is the number of values in the data set -2^k > 180, let k = 8 -So use 8 classes TABLE 2-4 Profit on Vehicles Sold Last Month by the Applewood Auto Group •Minimum 294 •Maximum 3,292 Examining the raw data in the table, we find there are 180 values, so n = 180. First try k = 7: 2 to the 7th power equals only 128, and 128 is not greater than 180, so 7 classes will not meet the criterion. Then try k = 8: 2 to the eighth power equals 256, so decide to use 8 classes since 256 > 180.
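The trial-and-error search in Step 1 can be written as a small loop. The function name `number_of_classes` is hypothetical, chosen here for illustration.

```python
def number_of_classes(n):
    """Return the smallest k such that 2**k > n (the "2 to the k rule")."""
    k = 1
    while 2 ** k <= n:   # keep increasing k until 2**k exceeds n
        k += 1
    return k

k = number_of_classes(180)   # n = 180 vehicles in the example
print(k, 2 ** k)
```

For n = 180 the loop rejects k = 7 (128 is not greater than 180) and stops at k = 8 (256 > 180), matching the reasoning above.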

Ethics and Reporting Results

•Useful to know the advantages and disadvantages of mean, median, and mode as we report statistics and as we use statistics to make decisions •Important to maintain an independent and principled point of view •Statistical reporting requires objective and honest communication of any results

Rules of Addition Example (1 of 2)

A machine fills plastic bags with a mixture of beans, broccoli, and other vegetables. Most of the bags contain the correct weight, but because of the variation in the size of the beans and other vegetables, a package might be underweight or overweight. A check of 4,000 packages filled in the past month revealed: Note, the events A, B, and C are mutually exclusive and exhaustive.

Explain the difference between a sample and a population.

A sample is a subset taken from a population

Multiplication Formula (2 of 3)

An automobile dealer wants to advertise that for $29,999 you can buy a convertible, a 2-door, or a 4-door model with your choice of either wire wheel covers or solid wheel covers. How many different vehicles can the dealer offer? Total possible = (m)(n)=(3)(2) = 6

Summary of Approaches to Probability

Approaches to Probability -Objective - Classical Probability - Based on equally likely outcomes - Empirical Probability - Based on relative frequencies -Subjective - Based on available information

Constructing Frequency Distributions

FREQUENCY DISTRIBUTION A grouping of quantitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class. •This is a four-step process 1.Decide on the number of classes 2.Determine the class interval 3.Set the individual class limits 4.Tally the data into classes and determine the number of the observations in each class

Graphic Presentation of Qualitative Data (Pie Chart)

PIE CHART A chart that shows the proportion or percentage that each class represents of the total number of frequencies. *Use a pie chart when you wish to compare relative differences in the percentage of observations for each class of a qualitative variable.

Bayes' Theorem Example (3 of 3)

Randomly select an individual and perform the test. The test results indicate the disease is present. What is the probability the test is correct? Use Bayes' theorem to solve.

The Combination Formula (2 of 2)

The Grand 16 movie theater uses teams of three employees to work the concession stand each evening. There are seven employees available to work. How many different teams can be scheduled? The combination formula is used when the order of the objects is not important; it counts the number of combinations of r objects from a set of n objects. Because order is ignored, the number of combinations is never more than the number of permutations (and is smaller whenever r is at least 2). In this example, the seven employees taken three at a time allow 35 different teams.
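A quick check of the Grand 16 example, using `math.comb` and `math.perm` from the Python standard library (available in Python 3.8 and later). The permutation count is included only to illustrate the comparison between the two formulas.

```python
import math

n = 7   # employees available, from the example
r = 3   # employees per team, from the example

teams = math.comb(n, r)            # order does not matter: n! / (r! (n - r)!)
ordered_lineups = math.perm(n, r)  # order matters: n! / (n - r)!

print(teams, ordered_lineups)
```

Each team of three can be arranged in 3! = 6 orders, which is why the permutation count is six times the combination count.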

General Rule of Addition (2 of 2)

The rules of addition are used to find the probability of two or more events occurring. Here the two events, A and B, overlap and illustrate the joint probability of A and B.

Geometric Mean

•The geometric mean is the nth root of the product of n positive values •The formula for geometric mean is GM = (x1 × x2 × ... × xn)^(1/n)

Stem-and-Leaf Displays (1 of 2)

•An alternative to a frequency distribution and histogram •The advantages of the stem-and-leaf display -The identity of each observation is not lost -The digits themselves give a picture of the distribution -The cumulative frequencies are also shown Stem-and-leaf displays are easy to use, especially when performing a quick analysis of a small data set.

The Permutation Formula (1 of 2)

•Another counting formula used to determine a total number of outcomes PERMUTATION Any arrangement of r objects selected from a single group of n possible objects. nPr = n!/(n - r)! Where: n is the total number of objects. r is the number of objects selected * Use the permutation formula when the order of the objects is important, to count the arrangements of r objects selected from a group of n objects. Remember, by definition, 0! = 1. Excel has a formula that will calculate permutations.

The Combination Formula (1 of 2)

•Another counting formula useful in determining the total number of outcomes •A combination is a selection where the order of the objects selected is not important nCr = n!/(r!(n - r)!) Where: n is the total number of objects. r is the number of objects selected

Bayes' Theorem Example (1 of 3)

Suppose 5% of the population of Umen have a disease. Let A1 represent the part of the population that has the disease and A2 the part that does not, so P(A1) = .05 and P(A2) = 1 - .05 = .95. These are prior probabilities, since they are assigned before any empirical data are obtained. Let B denote the event "test shows the disease is present". Assume the probability of a positive test result for someone with the disease is .90, and the probability of a positive test result for someone who does not have the disease is .15. The probability P(A1|B) is called a posterior probability since it is a revised probability based on additional information. Using Bayes' theorem, we find the probability that someone who tests positive actually has the disease: P(A1|B) = (.05)(.90) / [(.05)(.90) + (.95)(.15)] = .045/.1875 = .24. Therefore the probability that a person has the disease, given that he or she tested positive, is .24.
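The Umen example can be verified with a short sketch of Bayes' theorem for two mutually exclusive and collectively exhaustive events. The variable names are illustrative; the probabilities are those given in the example.

```python
p_a1 = 0.05           # prior: has the disease
p_a2 = 1 - p_a1       # prior: does not have the disease
p_b_given_a1 = 0.90   # positive test given disease
p_b_given_a2 = 0.15   # positive test given no disease

# Total probability of a positive test result.
p_b = p_a1 * p_b_given_a1 + p_a2 * p_b_given_a2

# Posterior: probability of disease given a positive test.
posterior = (p_a1 * p_b_given_a1) / p_b

print(round(p_b, 4), round(posterior, 2))
```

Even with a fairly accurate test, the posterior is only about .24 because the disease is rare: most positive results come from the much larger group without the disease.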

Tree Diagrams (1 of 2)

•A tree diagram is a visual that is helpful in organizing and calculating probabilities for problems with several stages •Each stage of the problem is represented by a branch of the tree •Label the branches with the probabilities

Skewness (2 of 2)

•A value of 1.63 indicates moderate positive skewness •A value of 0 means the distribution is symmetrical The box plot reveals the data is positively skewed since the dashed line to the right of the box (from 22 minutes to 30 minutes) is longer than the dashed line to the left of the box (from 15 minutes to 13 minutes) and since the median is not in the center of the box.

Box Plot Example (1 of 4)

•Alexander's Pizza offers free delivery of its pizza within 15 miles. How long does a typical delivery take? Within what range will most deliveries be completed?

Contingency Table Example (1 of 2)

•Applewood Auto Group's profit comparison Contingency Table Showing the Relationship between Profit and Dealership Compute the median profit for all sales last month and then classify profit from sales data as being above or below the median.

Bayes' Theorem

•Bayes' Theorem is a method of revising a probability, given that additional information is obtained •For two mutually exclusive and collectively exhaustive events PRIOR PROBABILITY The initial probability based on the present level of information. POSTERIOR PROBABILITY A revised probability based on additional information. * A1 and A2 are mutually exclusive and exhaustive categories. See the next slide for an example.

Skewness Example (1 of 3)

•Following are the earnings per share for a sample of 15 software companies for the year 2016. The earnings per share are arranged from smallest to largest. •Begin by finding the mean, median, and standard deviation. Find the coefficient of skewness. •What do you conclude about the shape of the distribution?

Skewness (1 of 2)

•The coefficient of skewness is a measure of the symmetry of a distribution •Two formulas for coefficient of skewness •The coefficient of skewness can range from -3 to +3 •A value near -3 indicates considerable negative skewness Professor Karl Pearson developed the simplest formula for calculating skewness which is based on the difference between the mean and the median. The textbook also shows the software method of calculating skewness that is based on the cubed deviations from the mean. See Formula 4-3 for more information.
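Pearson's coefficient, the simpler of the two formulas, is 3(mean - median)/s. The sketch below applies it to hypothetical summary values (not the earnings-per-share data from the textbook example); the function name `pearson_skewness` is also made up for illustration.

```python
def pearson_skewness(mean, median, std_dev):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s."""
    return 3 * (mean - median) / std_dev

# Hypothetical summary statistics, for illustration only.
sk = pearson_skewness(mean=10.0, median=9.0, std_dev=2.0)

print(sk)
```

A positive result, as here, means the mean exceeds the median, so the distribution has a tail to the right; a negative result indicates a tail to the left.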

Empirical Probability (1 of 2)

•The empirical definition applies when the number of times an event happens is divided by the total number of observations EMPIRICAL PROBABILITY The probability of an event happening is the fraction of the time similar events happened in the past. LAW OF LARGE NUMBERS Over a large number of trials, the empirical probability of an event will approach its true probability.

General Rule of Addition (1 of 2)

•The general rule of addition is used when the events are not mutually exclusive GENERAL RULE OF ADDITION P(A or B) = P(A) + P(B) - P(A and B) JOINT PROBABILITY A probability that measures the likelihood two or more events will happen concurrently.

General Rule of Multiplication (1 of 2)

•The general rule of multiplication refers to events that are not independent •A conditional probability is the likelihood an event will happen, given that another event has already happened CONDITIONAL PROBABILITY The probability of a particular event occurring, given that another event has occurred. * When events are not independent, they are dependent. This is called a conditional probability because its value is conditional on what occurred with the first event. In other words, the probability of B is conditional on the occurrence and effect of event A.

Box Plots (2 of 2)

•The interquartile range is Q3 - Q1 •Outliers are values that are inconsistent with the rest of the data and are identified with asterisks in box plots

Rules of Addition

•The rules of addition refer to the probability that any two or more events can occur •The special rule of addition is used when the events are mutually exclusive SPECIAL RULE OF ADDITION P(A or B) = P(A) + P(B) If the events A and B are mutually exclusive, the probability of one or the other event occurring is the sum of their probabilities. This can be extended for more than two events. Here is a Venn diagram illustrating that events A, B, and C do not overlap and are therefore mutually exclusive. See the textbook for more information on Venn diagrams.

Special Rule of Multiplication (1 of 2)

•The rules of multiplication are applied when two or more events occur simultaneously •The special rule of multiplication refers to events that are independent INDEPENDENCE The occurrence of one event has no effect on the probability of the occurrence of another event. SPECIAL RULE OF MULTIPLICATION P(A and B) = P(A) P(B) * A joint probability is the likelihood that two or more events will happen at the same time. In the AAA example, the events are independent since whether a member made an airline reservation or not has no effect on whether another member made an airline reservation. Therefore the special rule of multiplication can be used.

Median

MEDIAN The midpoint of the values after they have been ordered from the minimum to the maximum values. Here are prices for condos in Palm Aire with an arithmetic mean price of $110,000. A better measure is the median, $70,000, since it is not affected by the $275,000 unit. When finding the median, it doesn't matter if the values are sorted in ascending order or descending order. Since there is an odd number of values in this data set, it is fairly easy to find the value that divides the set in half, with the same number of observations below $70,000 as above $70,000. Remember, the data must be at least the ordinal level of measurement.

