Statistics chapter 1-7
Pareto charts
are bar charts that show the frequency of the categories that cause quality control problems • Show quality problem categories in decreasing order of frequency • The most problematic categories are shown first
Bar Graphs
are more useful when you want to highlight the actual data values and when the classes combined don't form a whole
The Union of Events
of Events A and B represents the number of instances where either Event A or B occur or both events occur together chp 3 pg 27
Direct Observation or Focus Group Experiment Surveys or Questionnaires
primary data collections
Contingency tables
provide a format to display observations that have more than one value associated with them •Use rows and columns for separate variables to summarize the data efficiently
Scatter plots
provide a picture of the relationship between two variables whose values are paired together
sample correlation coefficient
rxy, measures both the strength and direction of the linear relationship between two variables ex: relationship between the number of hours that students study and their exam scores. CHP 3 PG 77
Standard Variation
real world measurement
class boundaries
represent the minimum and maximum values for each class
Population
represents all possible subjects that are of interest in a particular study
The mean of the binomial distribution
represents the long-term average number of successes to expect based on the number of trials conducted Proposition A: n = 10, p = 0.4, and q = 0.6 μ = np = (10)(0.4) = 4.0 Out of ten randomly selected voters, on average 4 of them (40%) will support Proposition A THIS EXAMPLE IS CHP 5 PG 25-28
The mean of the binomial distribution
represents the long-term average number of successes to expect based on the number of trials conducted Formula for Calculating the Mean of a Binomial Distribution: μ = np Formula for the standard deviation: σ = √(npq) μ = The mean of the binomial distribution σ = The standard deviation of the binomial distribution n = The number of trials p = The probability of a success q = The probability of a failure chp 5 pg 24
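A quick check of the Proposition A numbers, as a minimal sketch. The function name binomial_mean_std is made up for these notes, and the standard deviation uses the standard result σ = √(npq).

```python
import math

def binomial_mean_std(n: int, p: float) -> tuple[float, float]:
    """Return (mean, standard deviation) of a binomial distribution."""
    q = 1.0 - p                      # probability of a failure
    mean = n * p                     # mu = np
    std = math.sqrt(n * p * q)       # sigma = sqrt(npq)
    return mean, std

# Proposition A example from the notes: n = 10, p = 0.4
mean, std = binomial_mean_std(n=10, p=0.4)
print(f"mean = {mean:.1f}, std = {std:.3f}")
```

The mean comes out to 4.0, matching the card; the standard deviation is roughly 1.55 voters.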
sample correlation coefficient,
rxy, indicates both the strength and direction of the linear relationship between the independent and dependent variables • The values of r range from -1.0, a strong negative relationship, to +1.0, a strong positive relationship • When r = 0, there is no linear relationship between variables x and y chp 3 pg 81
The Coefficient of Variation Formula for the sample coefficient of variation:
CV = (s / x̄)(100)% s = the sample standard deviation x̄ = the sample mean Formula for the population coefficient of variation: CV = (σ / μ)(100)% σ = the population standard deviation μ = the population mean chp 3 pg 42
Quantitative Data
Described by numerical values: 1.Counted: Examples: • Number of Children • Defects per hour (Counted items) 2.Measured: Examples: • Weight • Voltage (Measured characteristics)
How many data values can be found in a specific interval?
Discrete Random Variables • a finite number of values within an interval and Continuous Random Variables • an infinite number of outcomes within an interval
Constructing a Box-and-Whiskers Plot
• Draw a horizontal number line that spans the length of the data values • Draw a box above the number line extending from Q1 to Q3, with a center line at the median (Q2) • Whiskers extend from the central box to the highest and lowest values that are not outliers • If outliers exist in the data set, they are plotted with an asterisk above the number line chp 3 pg 71
A discrete probability distribution meets the following conditions:
• Each outcome in the distribution needs to be mutually exclusive with other outcomes in the distribution • The probability of each outcome, P(x), must be between 0 and 1 (inclusive): 0 ≤ P(x) ≤ 1 • The sum of the probabilities for all the outcomes in the distribution must be 1: P(x1) + P(x2) + ... + P(xn) = 1, where n equals the total number of possible outcomes. chp 5 pg 9
Example: The Mean of Grouped Data
Example An online merchant has collected the following grouped data for the number of web pages viewed by a sample of its customers: The merchant would like to calculate the average number of viewed pages.
Mode example Example with numerical data: • Number of children per family in a sample of 24 families: 0,0,0,0,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,4,5
Example with numerical data: • Number of children per family in a sample of 24 families: 0,0,0,0,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,4,5 The mode is 2 because it occurs most often (there are 8 twos)
MEDIAN INDEX When the index point is a whole number, the position of the median is halfway between the index point (i) and the next highest data point (the i + 1 position) When there are an even number of data values, the median is halfway between the two middle values
Example with sample of size n = 6: 145 157 170 182 204 209 The index number is i = 0.5(n) = 0.5(6) = 3 The index number is a whole number so the median value is halfway between the third and fourth values in the sorted data 145 157 170 182 204 209 median = 176 = (170 + 182)/2
The Central Limit Theorem
• The average value of all possible sample means computed from all possible random samples of a given size from the population is equal to the population mean: μx̄ = μ • The standard deviation of the sample means computed from all random samples of size n is equal to the population standard deviation divided by the square root of the sample size: σx̄ = σ/√n
Statistics
the mathematical science that deals with the collection, analysis, and presentation of data, which can then be used as a basis for inference and induction
Mode example Example with categorical data (cars): Toyota: 7 Acura: 3 Ford: 5 BMW: 1 (picture these as a bar graph)
The mode is Toyota because it has the highest frequency (7)
The Range
Simplest measure of variation Difference between the highest value and the lowest value in a data set Range = Highest value - Lowest Value Example: 1, 2, 4, 4, 6, 8, 8, 8, 8, 9, 11, 11, 12, 13 Range = 13 - 1 = 12
Percentiles To find percentiles manually:
• Sort the data from lowest to highest • Calculate the index point, i: i = (P/100)(n) Where: P = the percentile of interest n = the number of data values • If i is not a whole number, round i up to the next whole number. The ith position represents our value of interest • If i is a whole number, the midpoint between the ith and (i + 1) positions is our value of interest • i is not the value of the percentile; it is the position of the percentile value in the ranked data FORMULA IS ON PG 59 CHP 3
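The manual percentile steps can be sketched in a few lines. This is only an illustration of the index-point rule from this card, not Excel's method; the function name percentile_by_index is made up, and P is assumed to be a whole-number percentile from 1 to 99.

```python
def percentile_by_index(values, p):
    """Percentile via the index-point rule; p is a whole-number percentile (1-99)."""
    data = sorted(values)                    # step 1: sort lowest to highest
    n = len(data)
    # step 2: i = (p/100) * n, kept in integer arithmetic to avoid rounding issues
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is a whole number: midpoint of the i-th and (i+1)-th values (1-based)
        return (data[whole - 1] + data[whole]) / 2
    # i is not a whole number: round up to the next position (1-based)
    return data[whole]

# The two median (50th percentile) examples from these notes
print(percentile_by_index([145, 157, 170, 182, 204, 209], 50))
print(percentile_by_index([21, 27, 27, 28, 34, 45, 50], 50))
```

With P = 50 this reproduces the median cards: 176 for the n = 6 sample and 28 for the n = 7 sample.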
Business Statistics
Statistics applied to the business world in an effort to improve people's decision making in fields such as marketing, operations, finance and human resources to name a few
Sampling and Nonsampling Errors
Statistics will vary from sample to sample • A sample statistic is not likely to be exactly equal to the population parameter, since only a portion of the population is in the sample Sampling error : is defined as the difference between the sample statistic and the population parameter. Because a statistic is based on just a portion of the population, it would be unreasonable to expect the sample mean and population mean to be the same.
Stratified vs. Cluster Sampling
Strata are defined with a common characteristic •Values have something in common, such as each student being a freshman • Strata tend to be homogeneous collections, each with a certain characteristic of interest Clusters are "mini-subsets" of the larger population • Tend to be a combination of various characteristics • Each cluster should be representative of the entire population
Surveys or Questionnaires
Subjects are asked to respond to questions or discuss attitudes Example: E-mail surveys to customers to assess service quality
Working with Grouped Data
Suppose data has already been summarized by a frequency distribution •The individual data values are no longer shown •Only grouped data is available To estimate the average for the frequency distribution: •Find the midpoint for each group (The midpoint is the halfway point in each group) •Use the midpoint as a representative value for that group
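A minimal sketch of the midpoint method. The class limits and frequencies below are HYPOTHETICAL (the merchant's table is not reproduced in these notes), and grouped_mean is a made-up helper name.

```python
def grouped_mean(classes):
    """Estimate the mean from (lower, upper, frequency) classes using midpoints."""
    total = sum(freq for _, _, freq in classes)
    # each class contributes midpoint * frequency
    weighted = sum(((lower + upper) / 2) * freq for lower, upper, freq in classes)
    return weighted / total

# HYPOTHETICAL grouped data, for illustration only
hypothetical_classes = [(1, 5, 4), (6, 10, 10), (11, 15, 6)]
print(grouped_mean(hypothetical_classes))
```

Here the midpoints are 3, 8, and 13, so the estimate is (3·4 + 8·10 + 13·6)/20 = 8.5. It is only an approximation because the individual values are no longer available.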
Example: Probability between two values
Suppose income is normally distributed for a group of workers, with μ = $45,000 and σ = $5,000 Find the probability that a randomly selected worker from this group has an income between $38,000 and $48,000
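One way to check this example without the tables, sketched with Python's standard-library statistics.NormalDist. The variable names are made up for these notes.

```python
from statistics import NormalDist

# Income distribution from the example: mu = $45,000, sigma = $5,000
income = NormalDist(mu=45_000, sigma=5_000)

# z-scores for the two endpoints: z = (x - mu) / sigma
z_low = (38_000 - 45_000) / 5_000    # -1.4
z_high = (48_000 - 45_000) / 5_000   # 0.6

# P(38,000 <= x <= 48,000) = P(x <= 48,000) - P(x <= 38,000)
prob_between = income.cdf(48_000) - income.cdf(38_000)
print(f"z_low = {z_low}, z_high = {z_high}, probability = {prob_between:.4f}")
```

The probability is about 0.645, which should agree with subtracting the table areas for z = -1.4 and z = 0.6.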
Advertising
Household surveys, TV viewing habits • Viewing habits
Bar charts
are a good tool for displaying qualitative data that have been organized in categories
The interquartile range, IQR,
describes the middle 50% of the data • Find the IQR by subtracting the first quartile from the third quartile: IQR = Q3 - Q1
Excel calculates percentiles using the PERCENTILE.EXC function:
=PERCENTILE.EXC(array, k) array = The data range of interest k = The percentile of interest, between 0 and 1 exclusive
The Variance and Standard Deviation of Grouped Data
Formula for the Sample Variance: Grouped Data Formula for the Population Variance: Grouped Data pg 55
information
Analyzing the data can provide information for decision making Table 1.1 | Golf-Score Data (Did a new driver after 7/1 change the average golf score?) The scores after 7/1 appear lower than the scores in June
Biased Sample
- a sample that does not represent the intended population • can lead to distorted findings • biased sampling can occur intentionally or unintentionally •results can be manipulated by how we ask questions and who is responding to them
Ideally, the number of classes in a frequency distribution should be between___ and _____
4 and 20
Rules for Classes for Grouped Data
1. Equal-size classes. All classes in the frequency distribution must be of equal width 2. Mutually exclusive classes. Class boundaries cannot overlap 3. Include all data values. Make sure all data values are accounted for in the total row of the frequency distribution 4. Avoid empty classes. It is undesirable for a histogram to display a class so narrow that there are no observations in it 5. Avoid open-ended classes (if possible). These violate the first rule of equal class sizes
A Poisson process has the following characteristics:
1. The experiment consists of counting the number (x) of occurrences of an event over a period of time, area, distance, or other type of measurement 2. the mean of the Poisson distribution (λ) has to be the same for each equal interval of measurement 3. the number of occurrences during one interval has to be independent of the number of occurrences in any other interval 4. the intervals defined in the Poisson process cannot overlap
The Characteristics of a Binomial Experiment
1. The experiment consists of a fixed number of trials, denoted by n 2. Each trial has only two possible outcomes, a success or a failure 3. The probability of a success p and the probability of a failure q are constant throughout the experiment 4. Each trial is independent of the other trials in the experiment A binomial probability distribution allows us to calculate the probability of a specific number of successes for a certain number of trials Examples of binomial settings Counting number of successes in a fixed number of trials • A survey response to a question is "yes I will buy" or "no I will not" • An electronic component is either defective or acceptable • New job applicants either accept an offer or reject it
frequency distribution
A _____________________ ____________________ shows the number of data observations that fall into specific intervals • Graphically summarizes information not readily observable by merely looking at data in a table ex: the number of iPods sold is quantitative data; the frequency is a count of how often each value occurs: how many 0s = 5, how many 1s = 8, how many 2s = 14, and so on
Using a Poisson Distribution to Calculate the Probability of Arrivals
A common use of the Poisson distribution is to determine the probability of customer arrivals Example: On average, 12 customers per hour arrive at the bank drive-through window • assume these arrivals follow the Poisson distribution What is the probability that exactly 4 customers will arrive during the next 30 minutes? Answer: First adjust the average number of arrivals per hour to a 30-minute interval • If the bank averages 12 customers arriving per hour, it would average 6 customers every 30 minutes, so λ = 6.0 To find the probability that exactly 4 customers will arrive during the next 30 minutes, we use the formula below with x = 4: FORMULA SOLUTION IS CHP 5 PG 40
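The drive-through example worked out as a short sketch, using the Poisson formula P(x) = (λ^x · e^(-λ)) / x!. The helper name poisson_pmf is made up for these notes.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """P(exactly x occurrences) for a Poisson distribution with mean lam."""
    return (lam ** x) * math.exp(-lam) / math.factorial(x)

# 12 customers per hour -> adjust the mean to a 30-minute interval
lam_30_min = 12 * (30 / 60)          # lambda = 6.0
p_exactly_4 = poisson_pmf(4, lam_30_min)
print(f"lambda = {lam_30_min}, P(4) = {p_exactly_4:.4f}")
```

The result is about 0.134, so there is roughly a 13% chance that exactly 4 customers arrive in the next 30 minutes.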
posterior probability
A conditional probability is also known as a _____________ ______________, which is a revision of the prior probability using additional information
Specific Discrete Probability Distributions
Binomial Poisson Hypergeometric
Covariance Calculations
• A positive covariance indicates a positive linear relationship (as one variable increases, the second variable also tends to increase) • A negative covariance indicates a negative linear relationship (as one variable increases, the second variable tends to decrease) • A covariance close to zero indicates no linear relationship between the two variables
The Empirical Rule
According to the empirical rule, if a distribution follows a bell-shaped, symmetrical curve centered around the mean, we would expect: Approximately 68% of the values to fall within ± 1 standard deviations from the mean Approximately 95% of the values to fall within ± 2 standard deviations from the mean Approximately 99.7% of the values to fall within ± 3 standard deviations from the mean
Systematic Sampling
Advantages of systematic sampling: • Easy to do manually • Can avoid bias by not allowing judgment or convenience to affect the sample EX: bias toward selecting some students rather than others Disadvantages: • One concern about systematic sampling is periodicity, which is a pattern in the population that is consistent with the value of k • Example: Sampling every 8 hours might obtain values only from the beginning or end of a shift, which might not be representative of all values during the day
The Range Advantages: Disadvantages:
Advantages: • Easy to calculate and understand Disadvantages: •Only based on two numbers in the data set (Ignores the way in which data are distributed) • Sensitive to outliers: example: 1, 2, 4, 4, 6, 8, 8, 8, 8, 9, 11, 11, 12, 1000 1000-1=999
sample space
All the possible outcomes, or results, of an experiment • The sample space for our single-die experiment is {1, 2, 3, 4, 5, 6} chp 4 pg 6
simple event
An event with a single outcome in its most basic form that cannot be simplified • An example of a simple event is rolling a five with a single die
Calculating Probabilities for Normal Distributions Using Normal Probability Tables
Any normal distribution (with any mean and standard deviation combination) can be transformed into the standard normal distribution (z) • Need to transform x units into z units: z = (x - μ)/σ • The resulting z value is called a z-score chp 6 pg 13 and 14
The Effect of the Sample Size on the Sampling Distribution
As the sample size increases •the standard error of the mean becomes smaller •which in turn reduces the sampling error CHP 7 PG 44
The Effect of the Sample Size on the Sampling Distribution
As the sample size n increases • The interval in which the sample means tend to fall becomes narrower • The sampling error decreases • So x̄ gets closer in value to μ If n = N, then the entire population is known, so the mean of that sample is the population mean, μ. This is known as a census
Using Poisson Distributions to Approximate Binomial Distributions
Binomial probabilities can be approximated using the Poisson distribution when the following conditions are present: • When the number of trials, n, is greater than or equal to 20 and • When the probability of a success, p, is less than or equal to 0.05 • This approximation to the binomial is very close when the number of trials is large and the probability of success per trial is low, and can be easier to calculate by hand than the binomial formula FORMULA CHP 5 PG 47
Formula for the Poisson Probability Distribution
P(x) = (λ^x · e^(-λ)) / x! CHP 5 PG 36 x = The number of occurrences of interest over the interval λ = The mean number of occurrences over the interval e = 2.71828 P(x) = The probability of exactly x occurrences over the interval
Calculating Probabilities for a Hypergeometric Distribution Formula for the Mean of the Hypergeometric Distribution Formula for the Standard Deviation of the Hypergeometric Distribution
CHP 5 PG 57-62
Calculating Normal Probabilities Using Excel
CHP 6 PG 34-39
Formula for the exponential probability density function:
CHP 6 PG 46 A discrete random variable that follows the Poisson distribution with a mean equal to λ has a counterpart continuous random variable that follows the exponential distribution with a mean equal to μ = 1/ λ
Formula for the standard deviation of the Exponential Distribution:
CHP 6 PG 49
Calculating Exponential Probabilities Using Excel
CHP 6 PG 52-53
Formula for the Continuous Uniform Probability Density Function:
CHP 6 PG 56 where: a = Smallest allowable continuous random variable b = Largest allowable continuous random variable
Formula for the Uniform Cumulative Distribution Function:
CHP 6 PG 57 x1 = Lower endpoint of the interval of interest x2 = Upper endpoint of the interval of interest
continuous data
Can potentially take on any value, depending only on the ability to measure accurately • Often measured, fractional values are possible • thickness of an item • time required to complete a task • temperature of a solution • height, in inches In the Whole Foods (continuous) example in the book, there are an infinite number of wait times within the interval 0-5 minutes: 3, 3.2, 4.5 minutes, etc. Because we are measuring time on a continuous scale, the only limitation on the number of values within this 0-5 minute interval is our measuring instrument's level of precision
Methods of Assigning Probability
Classical Empirical subjective
Classical Probability Example
Classical probability assumes that each event in the sample space has the same likelihood of occurring (the chance of rolling a one is the same as rolling a two, and so on). The set of events is collectively exhaustive if the sample space includes every possible simple event that can occur (e.g., gender)
Qualitative Data
Classified by descriptive terms to measure or classify something of interest Examples: • Marital Status • Political Party • Eye Color (Defined categories)
Nonprobability Sampling
Convenience
Contingency Tables with Probabilities
Convert table frequencies into probabilities by dividing each number in the table by the total number of observations
Finance and Economics
Data on income, credit risk, unemployment • Bank lending
Example: Using the Poisson Distribution to Approximate the Binomial Distribution
Example: 3% of the workers in a large factory are absent each day. From a random sample of 60 workers, what is the probability that exactly 1 worker is absent? Binomial distribution answer, with n = 60, p = .03, and x = 1: P(1) = [60!/(59! 1!)](0.03)^1(0.97)^59 ≈ 0.2984 The Poisson distribution approximation can be used since the conditions for the approximation are met: • The number of trials is n = 60, which is greater than or equal to 20 • The probability of a success is p = 0.03, which is less than or equal to 0.05 Poisson distribution approximation, with λ = np = (60)(.03) = 1.8: P(1) = (1.8^1 · e^(-1.8)) / 1! ≈ 0.2975 CHP 5 PG 48-50
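To see how close the approximation is for the absent-worker example, here is a small sketch that computes both answers side by side. The helper names binomial_pmf and poisson_pmf are made up for these notes.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(exactly x successes in n trials) for a binomial distribution."""
    return math.comb(n, x) * (p ** x) * ((1 - p) ** (n - x))

def poisson_pmf(x: int, lam: float) -> float:
    """P(exactly x occurrences) for a Poisson distribution with mean lam."""
    return (lam ** x) * math.exp(-lam) / math.factorial(x)

n, p, x = 60, 0.03, 1
exact = binomial_pmf(x, n, p)          # binomial answer
approx = poisson_pmf(x, n * p)         # Poisson approximation with lambda = np = 1.8
print(f"binomial = {exact:.4f}, Poisson = {approx:.4f}")
```

The two values differ by less than 0.001, which is why the shortcut is acceptable when n ≥ 20 and p ≤ 0.05.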
Empirical Probability example
Example: A survey of 400 new graduates asked how much they owed in student loans. The results are shown in the following table: a) What is the probability that a randomly selected graduate has between $5,000-$9,999 in student loans? b) What is the probability that a randomly selected graduate has $20,000 or more in student loans? EXAMPLE IS ON CHP 4 PG 15
Using Normal Probability Tables EXAMPLE
Example: Finding the z or x value Suppose that μ = 12 and σ = 3 for a normal distribution Find the x value so that P(X ≤ x) = 0.95 1. Find the necessary z-score What z value is needed to include 95% of the area under the curve? Look in the body of the table for 0.9500 The value 0.9500 would be found in the 1.6 row and between the 0.04 and 0.05 columns. This means our point of interest is halfway between these columns at 1.6 + 0.045, or z = 1.645 2. Convert the z-score to an x value: x = μ + zσ = 12 + (1.645)(3) = 16.935 CHP 6 PG 22-25
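The table lookup can be double-checked with the standard library, as a rough sketch. NormalDist.inv_cdf returns the value with a given area to its left; the variable names are made up for these notes.

```python
from statistics import NormalDist

mu, sigma = 12, 3

# z value with 95% of the area under the standard normal curve to its left
z = NormalDist().inv_cdf(0.95)

# convert back to x units: x = mu + z * sigma
x = mu + z * sigma
print(f"z = {z:.3f}, x = {x:.3f}")
```

This gives z ≈ 1.645 and x ≈ 16.93, in line with reading halfway between the 0.04 and 0.05 columns of the 1.6 row.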
Uniform Probability Distributions example
Example: Suppose the temperature of a solution varies with a uniform distribution between 55 and 155 degrees What is the probability that the next measured temperature is between 70 and 90 degrees? The total area under the distribution must be 1.0, so if the width is 100 (155 degrees - 55 degrees), the height must be 0.01:
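Finishing the temperature example as a sketch: for a uniform distribution the probability of an interval is its width times the height 1/(b - a). The function name uniform_prob is made up, and it assumes a ≤ x1 ≤ x2 ≤ b.

```python
def uniform_prob(x1: float, x2: float, a: float, b: float) -> float:
    """P(x1 <= X <= x2) for X uniform on [a, b], assuming a <= x1 <= x2 <= b."""
    height = 1 / (b - a)             # total area must be 1.0
    return (x2 - x1) * height        # area of the rectangle over [x1, x2]

# Temperature between 70 and 90 degrees, uniform between 55 and 155
p_70_to_90 = uniform_prob(70, 90, a=55, b=155)
print(p_70_to_90)
```

The interval is 20 degrees wide and the height is 0.01, so the probability is 0.20.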
Calculating Exponential Probabilities Exponential Probability Distributions EXAMPLE
Example: The mean time between arrivals is 2 minutes What is the probability that the next arrival is within the next 3 minutes? Time between arrivals is exponentially distributed with mean time between arrivals of 2 minutes (30 per 60 minutes, on average)
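The arrival example worked as a sketch, using the exponential cumulative probability P(x ≤ a) = 1 - e^(-λa) with λ = 1/μ. The function name exponential_cdf is made up for these notes.

```python
import math

def exponential_cdf(a: float, lam: float) -> float:
    """P(X <= a) for an exponential distribution with rate lam (mean = 1/lam)."""
    return 1 - math.exp(-lam * a)

mean_minutes = 2
lam = 1 / mean_minutes               # 0.5 arrivals per minute (30 per 60 minutes)
p_within_3 = exponential_cdf(3, lam)
print(f"P(next arrival within 3 minutes) = {p_within_3:.4f}")
```

The answer is about 0.777, so the next arrival comes within 3 minutes roughly 78% of the time.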
Cluster Sampling example of cluster
Examples of clusters: •Individually boxed packages of bulk parts in a large delivery from a supplier •Individual cities where a new product is introduced • Customer account balances arranged in clusters by first letter of last name
Stratified Sampling
Examples of strata: • For an undergraduate population, strata could be class standing: Freshman, Sophomore, Junior, and Senior • For factory production, strata could be 1st shift, 2nd shift, and 3rd shift • For a population of workers, strata might be different age categories of workers Using stratified sampling helps ensure that all classes, shifts, or ages are represented in the sample
Simple Random Sample EXCEL SHEET
Excel can be used to select a simple random sample: 1.Select Data > Data Analysis 2.In the Data Analysis dialog box, select Sampling and click OK 3.In the Sampling dialog box, click on the text box for Input Range: and select the desired range of cells 4.Select Random under Sampling Method 5.In the Number of Samples: text box, type the number desired for your sample size 6.Click on the text box for Output Range: select a cell from an empty area in your spreadsheet, and click OK Excel's random sampling tool uses sampling with replacement • This means that after a value from the population has been selected for the sample, the value is placed back into the population and can be chosen again for the same sample Sampling without replacement means that once a value from the population is selected for the sample, it is not returned to the population so that value cannot be chosen again
Systematic Sampling EXCEL
Excel can help with systematic sampling 1.Select Data > Data Analysis 2.Select Periodic in the Sampling dialog box 3.In the Period: text box, type in the value for k (which you must have already determined) 4.Click on the text box for Output Range:, and select a cell in an empty column, and click OK
Classical Probability Example
Experiment: Roll a die once Sample space = {1, 2, 3, 4, 5, 6} Define Event A as rolling a five • There are six possible outcomes in the sample space • Event A (rolling a five) can happen one way P(A) = 1/6 = 0.167, or a 16.7% probability • This is a Simple Probability: it represents the likelihood of a single (simple) event occurring by itself
Marketing Research
Focus group data, customer surveys • hotels
The Addition Rule
For mutually exclusive events, the addition rule states that the probability of two events occurring is simply the sum of their individual probabilities: P(A or B) = P(A) + P(B) If Events A and B are not mutually exclusive: P(A or B) = P(A) + P(B) - P(A and B) chp 4 pg 33
Independent and Dependent Events
Formula for Determining if Events A and B are independent P(A|B) =P(A) If P(A|B) ≠ P(A) then events A and B are not independent chp 4 pg
The Sampling Distribution of the Proportion with a Finite Population
If the ratio n/N is greater than 5% and sampling is without replacement, a finite population correction is needed • When the population is small, the proportion of the sample size to the population size, n/N, is large • Small populations require an adjustment to the standard error of the proportion calculation if n/N is greater than 5% and sampling is without replacement
Inferential Statistics
Making statements about a population by examining sample results
example with mean median and mode Prices for 5 homes have been collected House Prices: $2,000,000 500,000 300,000 100,000 100,000 Sum 3,000,000
Mean: ($3,000,000/5) = $600,000 Median: middle value of ranked data = $300,000 Mode: appears most often = $100,000
Measures of Association Between Two Variables
Measures of Association Between Two Variables=Sample Covariance AND Sample Correlation Coefficient
Specific Continuous Probability Distributions
Normal Exponential Uniform
Direct Observation or Focus Group
Observing subjects in their natural environment Example: Watching to see if drivers stop at a stop sign
2^k ≥ n where k = Number of classes n = Number of data points • Find the lowest value of k that satisfies the rule Suppose n = 50 2^5 = 32 < 50 (k = 5 is too small) 2^6 = 64 > 50 (k = 6 is a good choice)
One method to determine the number of classes in a frequency distribution is the 2^k ≥ n rule
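The 2^k ≥ n search is easy to automate. A minimal sketch, with min_classes as a made-up helper name:

```python
def min_classes(n: int) -> int:
    """Smallest k such that 2**k >= n (the 2^k >= n rule for number of classes)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

print(min_classes(50))   # 2**5 = 32 is too small, 2**6 = 64 is enough
```

For n = 50 this returns 6, the same choice as on the card. The rule is only a starting point; the final number of classes can still be adjusted to get sensible class widths.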
event
One or more outcomes of an experiment • The outcome, or outcomes, is a subset of the sample space • An example of an event is rolling a pair with two dice
the Poisson distribution table
Organized by values of λ, the average number of occurrences • The sum of the probabilities in a column for a particular value of λ is equal to 1 •One limitation of using Poisson tables is that you are restricted to using only the values of λ that are shown in the table CHP 5 PG 41-45
Sampling and Nonsampling Errors
Parameters : are values that describe some characteristic of a population, such as its mean or median Statistics: are values calculated from a sample, such as the sample's mean or median
Advantages: • collected by the person or organization who uses the data Disadvantages: • Can be expensive and time consuming to gather
Primary Data advantages and disadvantages
Basic Properties of a Probability
Probability Rule 1 • If P(A) = 1, then with certainty, Event A must occur • Ex: rolling a single six-sided die and observing 1, 2, 3, 4, 5, or 6 Probability Rule 2 • If P(A) = 0, then with certainty, Event A will not occur Probability Rule 3 • The probability of any event must range from 0 to 1 • Probabilities can never be negative or greater than 1. The probability that I will buy a pair of shoes next month could be 0 (0%) or 1 (100%), but not -1 (-100%) or 2 (200%). Probability Rule 4 • The sum of all the probabilities for the simple events in the sample space must be equal to 1 • refer to pg. 153 table 4.5 Probability Rule 5 • The complement to Event A is defined as all of the outcomes in the sample space that are not part of Event A. The complement is denoted as A' • The probability of the complement of an event occurring is 100% minus the probability of the event itself occurring. • Page 154 cookie example P(A) + P(A') = 1 or P(A) = 1 - P(A')
The Two Main Types of Data
Qualitative Data Quantitative Data
Two Main Types of Data and their Corresponding Levels
Qualitative-Nominal and Ordinal Quantitative-Interval and Ratio
Operations
Quality control, reliability, operate better • Cheez-it
Expressing z-Scores in Terms of x chp 3 pg 49
Question: For a symmetric bell-shaped population with a mean of 20 and a standard deviation of 3, what interval will contain about 95% of all the values? Answer: About 95% of the values are within ± 2 standard deviations: μ ± 2σ = 20 ± 2(3) = 20 ± 6 About 95% of the values will fall between 14 and 26
Data
Raw facts or measurements of interest Table 1.1 | Golf-Score Data (Each individual value is considered a data point)
Sampling from a Population
Sampling from a Population=Probability Sampling AND Nonprobability Sampling
Advantages: • Readily available • Less expensive to collect Disadvantages: • No control over how the data was collected • Less reliable unless collected and recorded accurately
Secondary Data advantages and disadvantages
Systematic Sampling EXAMPLE
Select a systematic sample of size n = 30 from a population of N = 270 • From a list of all population values, choose every 9th value for the sample
Probability Sampling
Simple Random Systematic Stratified Cluster Resampling
Putting the Central Limit Theorem to Work
Suppose people drive an average of 12,000 miles per year with a standard deviation of 2,580 miles per year • What is the probability that a randomly selected driver will drive more than 12,500 miles? • What is the probability that a randomly selected sample of 36 drivers will drive, on average, more than 12,500 miles? • Since n = 36, we can apply the Central Limit Theorem, so x̄ will be approximately normally distributed with mean μx̄ = μ = 12,000 and standard deviation σx̄ = σ/√n = 2,580/√36 = 430
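Both questions can be compared in a short sketch. The single-driver answer ASSUMES miles driven are normally distributed (the card does not say so); the sample-mean answer relies on the Central Limit Theorem. Variable names are made up for these notes.

```python
import math
from statistics import NormalDist

mu, sigma, n = 12_000, 2_580, 36

# One driver (assumes the population itself is normal)
p_single = 1 - NormalDist(mu, sigma).cdf(12_500)

# Mean of 36 drivers: standard error = sigma / sqrt(n)
std_error = sigma / math.sqrt(n)                 # 430 miles
p_sample_mean = 1 - NormalDist(mu, std_error).cdf(12_500)

print(f"single driver: {p_single:.4f}")
print(f"standard error: {std_error:.0f}, sample mean: {p_sample_mean:.4f}")
```

A single driver exceeds 12,500 miles about 42% of the time, but a sample average of 36 drivers does so only about 12% of the time, because the sample means are much less spread out.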
Using the Central Limit Theorem to Test Claims
The CLT can be used to check the validity of claims made about a population parameter • Idea: Use a sampling distribution to see how unusual a sample result is, if the claim is true • If the sample result is very unusual, we conclude that the claim is not valid
The Multiplication Rule Formula for the multiplication rule for two independent events:
P(A and B) = P(A)P(B) When multiple events are all independent, the probability of them all occurring is simply the product of their individual probabilities: P(A1 and A2 and ... and An) = P(A1)P(A2)...P(An)
Primary Data Secondary Data
The Sources of Data
The Variance and Standard Deviation of a Discrete Probability Distribution
The variance is a measure of the spread of the individual values around the mean of a data set σ² = Σ (xi - μ)² P(xi), summed over i = 1 to n σ² = The variance of the discrete probability distribution xi = The value of the random variable for the ith outcome μ = The mean of the discrete probability distribution P(xi) = The probability that the ith outcome will occur n = The number of outcomes in the distribution The standard deviation is σ = √σ² chp 5 pg 12 and 13
dependent variable/independent variable
The ______________ ______________, which is placed on the vertical axis of the scatter plot, is influenced by changes in the ______________ ________________, which is placed on the horizontal axis
Characteristics of the Normal Probability Distribution
• The distribution is bell-shaped and symmetrical around the mean • Because the shape of the distribution is symmetrical, the mean and median are the same value • Values near the mean, where the curve is the tallest, have a higher likelihood of occurring than values far from the mean, where the curve is shorter • The total area under the curve is always equal to 1.0 • Because the distribution is symmetrical around the mean, the area to the left of the mean equals 0.5, as does the area to the right of the mean • The left and right ends of the normal probability distribution extend indefinitely • A distribution's mean (μ) and standard deviation (σ) completely describe its shape • Changing μ shifts the distribution left or right • Changing σ increases or decreases the spread CHP 6 PG 10 11
Measures of Association Between Two Variables
The goal of this section is to examine two descriptive statistics that measure the linear relationship between two variables
Example with sample of size n = 7: 21 27 27 28 34 45 50
The index number is i = 0.5(n) = 0.5(7) = 3.5 The index number is not a whole number so round up to i = 4 The median value is therefore in the fourth position of our sorted data which is the number 28 ps:The median is not sensitive to outliers 21 27 27 28 34 45 5000 • The median is still 28
Bias
The manner in which survey questions are asked can affect responses
The mean and standard deviation are useful when comparing two different distributions Example: Number of rings before a call is answered • Atlanta vs. Boston call centers
The mean and standard deviation are useful when comparing two different distributions Example: Number of rings before a call is answered • Atlanta vs. Boston call centers chp 5 pg 14
STRATA VS CLUSTERS
The members of a specific stratum all have something in common, such as being a freshman. As a result, strata tend to be homogeneous collections, each with a certain characteristic of interest. Clusters, on the other hand, are "mini-subsets" of the larger population and therefore tend to be a melting pot of various characteristics. For example, a particular classroom (cluster) could have a mixture of freshmen, sophomores, juniors or seniors.
Using the Normal Distribution to Approximate the Binomial Distribution
The normal distribution can be used as an approximation to the binomial distribution •Normal probabilities are easy to look up in Appendix A, Tables 3 and 4 •Binomial probabilities are more difficult to calculate The normal distribution approximation can be used when the sample size is large enough so that np ≥ 5 and nq ≥ 5
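The rule of thumb above (np ≥ 5 and nq ≥ 5) is easy to check in code. This is only a sketch; the function name `normal_approximation_ok` and the two test cases are made up for illustration.

```python
def normal_approximation_ok(n, p):
    """Return True when both np >= 5 and nq >= 5, with q = 1 - p."""
    q = 1 - p
    return n * p >= 5 and n * q >= 5

print(normal_approximation_ok(10, 0.4))   # False, because np = 4
print(normal_approximation_ok(100, 0.4))  # True, because np = 40 and nq = 60
```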
joint probability
The probability of the intersection of two events is known as a
experiment
The process of measuring or observing an activity for the purpose of collecting data • An example is rolling a single six-sided die
The shape of the exponential distribution depends on the value λ
The shape of the exponential distribution depends on the value λ Compared to normal distributions: 1.The exponential distribution is right-skewed, not symmetrical 2.The shape is completely described by only one parameter, λ 3.The values for an exponential random variable cannot be negative CHP 6 PG 47
The Effect of the Sample Size on the Sampling Distribution
The shape of the population distribution will affect the shape of the sampling distribution, as will the size of the sample As the sample size gets large enough, the sampling distribution becomes almost normal regardless of shape of population CHP 7 PG 46
Use a standard normal probability table (Table 3 or Table 4 in Appendix A) to calculate normal probabilities
The table provides the cumulative area under a standard normal distribution curve that lies to the left of the z-score CHP 6 PG 21-23
Z-SCORE EXAMPLE FOR NORMAL DISTRIBUTION
The time customers spend on the phone for service follows the normal distribution with a mean of 12 minutes and a standard deviation of 3 minutes. What is the probability that the next customer who calls will spend 14 minutes or less on the phone? Known: μ = 12 and σ = 3 Find the z-score for x = 14: z = (x - μ)/σ = (14 - 12)/3 = 0.67 ANSWER: 0.67 This says that x = 14 is 0.67 standard deviations (0.67 increments of 3 units) above the mean of 12 chp 6 PG 17-19
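As a rough check of this example, the z-score and the area to its left can be computed directly. The helper names below are hypothetical, and the cumulative area is computed with the error function from Python's `math` module instead of being looked up in the Appendix A tables, so the last digit may differ slightly from a table value.

```python
import math

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def standard_normal_cdf(z):
    """Area to the left of z under the standard normal curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = round(z_score(14, mu=12, sigma=3), 2)  # rounded to two places, as a table would require
p = standard_normal_cdf(z)
print(z)            # 0.67
print(round(p, 4))  # roughly 0.75
```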
The Z-score
The z-score identifies the number of standard deviations a particular value is from the mean of its distribution • A z-score has no units The z-score is - zero for values equal to the mean - positive for values above the mean - negative for values below the mean pg 45 chp 3
Continuous data examples
Time required to read chapter 2 • Thickness of paint applied to a car body • Voltage of batteries produced in August
Experiment
Treatments are applied in controlled conditions Example: Crop growth from different plots using different fertilizers
Pie Charts
are another excellent tool for comparing proportions for categorical data Each segment of the pie represents the relative frequency of one category
Independent and Dependent Events
Two events are considered independent of one another if the occurrence of one event has no impact on the occurrence of the other event If the occurrence of one event affects the occurrence of another event, the events are considered dependent
Formula for the Population Mean
μ = (Σ xi)/N where: μ = the population mean (the Greek letter "mu") xi = the i-th data value N = the number of data values in the population
Outliers
Upper Limit = Q3 + 1.5 (IQR) Lower Limit = Q1 - 1.5 (IQR), where IQR = Q3 - Q1 Values beyond these limits are considered outliers
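A small sketch of the outlier limits follows. The quartile values and data points are made up for illustration, and `outlier_limits` is a hypothetical helper name.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical quartiles: Q1 = 10, Q3 = 20, so IQR = 10.
lower, upper = outlier_limits(10, 20)
print(lower, upper)  # -5.0 35.0

values = [3, 12, 18, 40]
outliers = [v for v in values if v < lower or v > upper]
print(outliers)      # [40]
```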
Pie Chart
Use a __________ __________ to compare the relative sizes of all possible categories
Subjective Probability
Used when classical and empirical probabilities are not available •Instead use experience or intuition to estimate the probabilities • Example: The probability that inflation will be greater than 4% next year
The Variance and Standard Deviation for a Population
Used when the data set represents an entire population rather than a sample from a population. Population variance: σ² = Σ (xi - μ)² / N; the population standard deviation σ is its square root, where: μ = the population mean N = the population size (xi - μ) = the difference between each data value and the population mean chp 3 powerpoint pg 34
Discrete
Values are whole numbers (integers) • Usually counted, not measured • number of complaints per day • number of TVs in a household • number of rings before the phone is answered In the Marriot Hotel (discrete) example in the book, there are only five possible outcomes within the interval 1-5 for a customer to choose from when rating his or her satisfaction.
Calculating Probabilities for a Hypergeometric Distribution
When sampling is without replacement, the probability of success changes during the sampling process • This violates the requirements for a binomial probability distribution • Use the hypergeometric distribution instead Formula for the hypergeometric distribution (pg 54 chp 5): P(x) = [C(N - R, n - x) × C(R, x)] / C(N, n), where C(a, b) is the number of combinations of a objects selected b at a time and: N = The population size R = The number of successes in the population n = The sample size x = The number of successes in the sample Example: 5 of 50 accounts are delinquent. If an auditor randomly selects 10 accounts without replacement, what is the probability that at least one is found to be delinquent? • Need to find P(x ≥ 1) = 1 - P(x = 0) Use: N = 50 = The population size R = 5 = The number of successes in the population n = 10 = The sample size x = 0 = The number of successes in the sample
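The auditor example can be worked numerically as a sketch. It assumes the standard hypergeometric formula P(x) = C(R, x) C(N - R, n - x) / C(N, n); the function name `hypergeometric_pmf` is hypothetical.

```python
from math import comb

def hypergeometric_pmf(x, N, R, n):
    """P(exactly x successes in a sample of n drawn without replacement)."""
    return comb(R, x) * comb(N - R, n - x) / comb(N, n)

# 5 of 50 accounts are delinquent; 10 accounts are sampled.
p_none = hypergeometric_pmf(0, N=50, R=5, n=10)
p_at_least_one = 1 - p_none
print(round(p_none, 4))
print(round(p_at_least_one, 4))  # roughly 0.69
```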
The Standard Normal Distribution
When the original random variable, x, follows the normal distribution, z-scores also follow a normal distribution with μ = 0 and σ = 1 This is known as the standard normal distribution
Empirical Probability
Classical probability example: "There are 4 aces in a deck of 52 cards, so the probability of drawing an ace is 4/52." Empirical probability, by contrast, involves conducting an experiment to observe the frequency with which an event occurs. It requires that you count how often an event occurs in an experiment and calculate the probability from the experiment's relative frequency distribution. P(A) = Frequency in which Event A occurs / Total number of observations chp 4 pg 14 and 15
Uniform Probability Distributions
With the continuous uniform probability distribution, the probability of any interval in the distribution is equal to any other interval with the same width https://www.youtube.com/watch?v=m1vXj- 6Asik
Parameter
a described characteristic about a population
Statistic
a described characteristic about a sample
The uniform distribution
describes data where all the values have the same chance of occurring
Advantages and Disadvantages of Using the Mean to Summarize Data
Advantages: • Simple to calculate • Summarizes the data with a single value Disadvantages: • With only a summary value you lose information about the original data • Sample 1 with n = 3: 999, 1000, 1001 has mean 1000 • Sample 2 with n = 3: 0, 1000, 2000 has mean 1000 • Just knowing the mean does not tell you what the underlying data look like
weighted mean
allows you to assign more weight to certain values and less weight to others • Formula for the Weighted Mean: weighted mean = Σ (wi × xi) / Σ wi, where wi = the weight assigned to the i-th data value xi
Pareto charts
also plot the cumulative relative frequency as a line on the chart known as an ogive
Continuous random variables
are outcomes that take on any numerical value in an interval, as determined by conducting an experiment •Usually measured rather than counted • Examples of continuous data include time, distance, and weight The purpose of this chapter is to identify the probability that a specified range of values will occur for continuous random variables, using continuous probability distributions
Combinations
are the number of different ways in which objects can be selected without regard to order Formula for the Combinations of n Objects Selected x at a Time: nCx = n! / ((n - x)! x!) chp 4 pg 52 and 53
Permutations
are the number of different ways in which objects can be arranged in order: 123, 132, 213, 231, 312, 321 The number of permutations of n distinct objects is n! n! = n(n - 1)(n - 2)...(2)(1) By definition, 0! = 1 chp 4 pg 51
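A short sketch of both counting rules, using Python's standard library. The permutation listing reproduces the six orderings of 1, 2, 3 given above; the combination example (2 objects chosen from 5) is made up for illustration and uses the standard formula n! / ((n - x)! x!).

```python
from itertools import permutations
from math import comb, factorial

# Permutations: every ordering of the three distinct objects 1, 2, 3.
orderings = ["".join(p) for p in permutations("123")]
print(orderings)       # ['123', '132', '213', '231', '312', '321']
print(factorial(3))    # 6, matching n! = 3 * 2 * 1
print(factorial(0))    # 1, by definition

# Combinations: order does not matter (hypothetical n = 5, x = 2).
n, x = 5, 2
by_formula = factorial(n) // (factorial(n - x) * factorial(x))
print(by_formula, comb(n, x))  # 10 10
```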
Contingency Tables with Probabilities Decision trees :
are used to display marginal and joint probabilities from a contingency table
Discrete data
are values based on observations that can be counted and are typically represented by whole numbers • represent something that has been counted • take on whole numbers such as 0, 1, 2, 3 • Because discrete data can be counted, they have a finite number of values within an interval.
Displaying Qualitative data
are values that are categorical • Can be nominal or ordinal measurement level •Describe a characteristic, such as gender or level of education
Continuous data
are values that can take on any real number, including numbers that contain decimal points • usually measured rather than counted • Examples are weight, time, and distance • Continuous data have an infinite number of values available within an interval
standard deviation vs variance
Both are derived from the mean of a given data set, and both are statistical measures of the dispersion of data. They represent how much variation there is from the average, or to what extent the values typically "deviate" from the mean. Example: a data set contains the heights of six dandelions. The variance is 7.25 and the SD is 2.69 (the square root of 7.25). This means that any dandelion within 2.69 inches of the mean (5.5 inches) is normal
contingency table
can be used to show the number of occurrences of events that are classified according to two categorical variables
Continuous probability distributions
can have a variety of shapes CHP 6 PG 7
Bias
can occur when a question is stated in a way that encourages or leads a respondent to a particular answer
Continuous random variables
can take on any value within a specified interval Because there are an infinite number of possible values, the probability of one specific value occurring is theoretically equal to zero Probabilities are based on intervals, not individual values • Probability is represented by an area under the probability distribution
The formula for the Sample Mean from Grouped
chp 3 pg 52
Find the midpoint for each class
chp 3 pg 54 chp 3 pg 53 54 55
Classical Probability
chp 4 pg 8 Used when the number of possible outcomes of the event of interest is known • Requires that you know the number of outcomes that pertain to a particular event. You also need to know the total number of possible outcomes in the sample space • Formula for classical probability P(A) = Number of possible outcomes that constitute Event A/ Total number of possible outcomes in the sample space where: P(A) = The probability that Event A will occur
The Mean of a Discrete Probability Distribution
chp 5 pg 10,11 The mean, μ, of a discrete probability distribution is the weighted average of the outcomes of the random variable that it describes • Also known as the expected value, E(x) Formula: μ = E(x) = Σ xi P(xi), summed over i = 1 to n, where: μ = The mean of the discrete probability distribution xi = The value of the random variable for the i-th outcome P(xi) = The probability that the i-th outcome will occur n = The number of outcomes in the distribution
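The weighted-average idea can be sketched in a few lines, assuming the standard formulas μ = Σ xi P(xi) and σ² = Σ (xi - μ)² P(xi). The distribution below is made up for illustration and is not the textbook's example.

```python
# Hypothetical discrete distribution: outcome -> probability.
distribution = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

# The probabilities of a discrete distribution must sum to 1.
total_probability = sum(distribution.values())

# Mean (expected value): each outcome weighted by its probability.
mean = sum(x * p for x, p in distribution.items())

# Variance: squared distance from the mean, weighted by probability.
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())

print(round(mean, 2))      # 1.7
print(round(variance, 2))  # 0.81
```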
Formula for the Variance of a Poisson Distribution
chp 5 pg 37 The variance of the distribution is the same as the mean: σ² = λ EXAMPLE: If a bank receives an average of λ = 4 bad checks per week, what is the probability that it will receive 3 bad checks next week? Solution: λ = 4 and x = 3, so P(3) = (4³ × e^(-4)) / 3! ≈ 0.195. There is about a 19.5% chance that the bank will receive 3 bad checks next week. THE FORMULA SOLUTION FOR THIS EXAMPLE IS ON CHP 5 PG 38
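The bad-check example can be reproduced with the standard Poisson formula P(x) = λ^x e^(-λ) / x!. This is a sketch; the function name `poisson_pmf` is hypothetical.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x events when the mean count is lam."""
    return lam ** x * exp(-lam) / factorial(x)

p_three = poisson_pmf(3, lam=4)
print(round(p_three, 3))  # 0.195, the "about 19.5%" on the card
```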
The goal is to create a histogram to ___________ and __________ show the pattern in the data
clearly and usefully
Measures of relative position
compare the position of one value in relation to other values in the data set Measures of Relative Position= Percentiles and Quartiles
Secondary data
data collected by someone else
Nominal Data:
data described as a category or labels Examples: gender (female/male) marital status (married, single, divorced, widowed) NO RANKING ALLOWED EX: ZIPCODES
Information
data that are transformed into useful facts that can be used for a specific purpose, such as making a decision
Primary Data
data that you have collected for your own use
sampling distribution of the proportion
describes the pattern that sample proportions tend to follow when randomly drawn from a population. Example: Suppose that CBS, the network that broadcast the 2013 Super Bowl, estimated that 45% of U.S. households would tune in to the game when it established the cost of a 30-second commercial before the big event. Also suppose that Coca-Cola, one of the game's advertisers, wanted to verify this claim independently. After the game, Coca-Cola randomly selected 200 households and found that 84 of them watched the Super Bowl. Based on this sample, can Coca-Cola validate the claim made by CBS? CHP 7 PG 47-48 x = The number of observations of interest in the sample (successes) n = Sample size (trials) p = Population proportion CHP 7 48-53
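One way to set up the Super Bowl example numerically is sketched below. It assumes the usual formulas for the sample proportion (x/n) and the standard error of the proportion, √(p(1 - p)/n), and uses the normal approximation. The card does not record the textbook's answer, so the result here is only an illustration and its interpretation is left open.

```python
import math

def standard_normal_cdf(z):
    """Area to the left of z under the standard normal curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p = 0.45   # proportion claimed by CBS
n = 200    # households sampled
x = 84     # households that watched

p_hat = x / n                            # sample proportion
std_error = math.sqrt(p * (1 - p) / n)   # standard error of the proportion
z = (p_hat - p) / std_error
prob = standard_normal_cdf(z)            # chance of a sample proportion this low or lower

print(round(p_hat, 2))      # 0.42
print(round(std_error, 4))
print(round(z, 2))
print(round(prob, 4))
```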
Relative frequency distributions
display the proportion of observations of each class relative to the total number of observations • shows the fraction of observations in each class • found by dividing each frequency by the total number of observations • the fractions in a relative frequency distribution add up to 1.00 • Example: if the data cover the past 50 days and the class frequencies are 5, 4, 6, and so on, divide each frequency by 50: 5/50, 4/50, 6/50, etc.
Stratified sampling
divides the population into mutually exclusive groups, or strata (recall from chapter 4 that two events are considered to be mutually exclusive if they cannot occur at the same time during an experiment) • A random sample from each stratum is selected • Strata are based on important variables that can have an impact on the data collected and the results that are achieved • Example: We used stratified sampling because we felt that the class the students belonged to, whether they were freshmen, sophomores, juniors or seniors, was an important factor in how they would respond to our survey. This helps ensure that the sample is representative of the overall population.
In systematic sampling,
every kth member of the population is chosen for the sample. The value of k is determined by dividing the size of the population (N) by the size of the sample (n). This means every second or every third member and so on can be chosen
percentage polygon
graphs the midpoint of each class as a line rather than a column • The height of each midpoint represents the relative frequency of the corresponding class • Used to compare the shape of two or more distributions on one graph
Microsoft Excel
has built-in options for data presentation and statistical analysis
Ratio level of measurement:
have all the features of interval data with the added benefit of having a true zero point • DIFFERENCES ARE MEANINGFUL • TRUE ZERO POINT • Example: salary (a salary of $0 means no money; $20 is twice as much as $10) EX: INCOME ($48,000, $0)
Ordinal Data
have all the properties of nominal data, with the added feature that we can rank-order the values from highest to lowest • Example: educational level • RANKING ALLOWED • NO MEANINGFUL MEASURE OF THE DIFFERENCES BETWEEN VALUES EX: EDUCATION LEVEL (MASTER'S, BACHELOR'S, AA)
Continuous random variables
have outcomes that take on any numerical value as a result of conducting an experiment ex: length of time a customer waits in the checkout line at whole foods; ounces of soda consumed by an adult in 1 month.
Discrete random variables
have outcomes that typically take on whole numbers as a result of conducting an experiment
Revisiting the Empirical Rule
https://www.youtube.com/watch?v=ykmT12Ipigc CHP 6 PG 29AND 30
the percentile rank
identifies the percentile of a particular value within a set of data Formula to find the approximate percentile rank for a value x: formula is chp 3 pg 62
standard deviation
is the square root of the variance • Has the same units as the original data Sample standard deviation formula (chp 3 powerpoint pg 32 and 33): s = √( Σ (xi - x̄)² / (n - 1) ) Example: 4 6 8 9 11 12 12 18, n = 8, mean = 80/8 = 10 • The sum of the squared deviations from the mean is 130 • Sample variance = 130/7 = 18.571 • Because the standard deviation is the square root of the variance, the answer is √18.571 = 4.309
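The eight-value example can be verified step by step. This sketch assumes the usual sample formula with n - 1 in the denominator and cross-checks the result against `statistics.stdev` from the standard library.

```python
import math
import statistics

data = [4, 6, 8, 9, 11, 12, 12, 18]
n = len(data)
mean = sum(data) / n                             # 80 / 8 = 10
sum_sq_dev = sum((x - mean) ** 2 for x in data)  # 130
sample_variance = sum_sq_dev / (n - 1)           # 130 / 7
sample_sd = math.sqrt(sample_variance)

print(mean, sum_sq_dev)                   # 10.0 130.0
print(round(sample_variance, 3))          # 18.571
print(round(sample_sd, 3))                # 4.309
print(round(statistics.stdev(data), 3))   # same value from the library
```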
median formula:i = 0.5(n)
is the value in the data set for which half the observations are higher and half the observations are lower • First arrange the data in ascending order • Use an Index Point to determine the position of the median in the data set (the middle data value) Formula for the Index Point for the Median: i = 0.5(n), where n = the number of data values
• Cluster sampling
involves dividing the population into mutually exclusive groups, or clusters, that are each representative of the population • Then randomly select clusters to form the final sample • These clusters are often selected based on geography to help simplify the sampling process
A simple random sample
is a sample in which every member of the population has an equal chance of being chosen
A nonprobability sample
is a sample in which the probability of a population member being selected for the sample is not known
The exponential probability distribution
is another common continuous distribution • Commonly used to measure the time between events of interest • Examples: • the time between customer arrivals • the time between failures in a business process
A convenience sample
is used when sample values are selected simply because they are easily accessible Convenience Nonprobability Sampling • Advantages: • Quick and easy to get sample data • Provides general information about the population • Disadvantages: • May not be representative of the population. Ex: choosing current stat class as sample to provide feedback on this text book. This may not represent all of the students in the nation who read stats books
hypergeometric distribution
is used when samples are taken from a finite population without being replaced. Samples are no longer independent in this case. Under these conditions the probabilities of success change repeatedly because the sample space becomes smaller after each selection.
A discrete probability distribution
is a listing of all the possible outcomes of an experiment for a discrete random variable along with the relative frequency of each outcome
A probability sample
is a sample in which each member of the population has a known, nonzero, chance of being selected for the sample Advantage: can perform inferential statistical tests to draw reliable conclusions about the population
The standard deviation
is a common measure of consistency in business applications, such as quality control • The standard deviation measures the amount of variability around the mean The standard deviation is affected by the scale of the data •When sample means are different, comparing standard deviations can be misleading
A histogram
is a graph showing the number of observations in each class of a frequency distribution • Excel uses the term "bins" for the classes in the distribution
box-and-whisker plot
is a graphical display showing the relative position of the three quartiles as a box on a number line It also shows the minimum and maximum values in the data set and any outliers
The cumulative percentage polygon, or ogive
is a line graph that plots the cumulative relative frequency distribution
probability
is a numerical value ranging from 0 to 1 Probability indicates the chance, or likelihood, of a specific event occurring •If there is no chance of the event occurring, the probability is 0 •If the event is absolutely going to occur, the probability of it occurring is 1
Line Chart
is a scatter plot in which the data points in the scatter plot are connected with line segments • Often used with time series data When graphing a time series the convention is to place the time data on the horizontal axis
Central tendency
is a single value used to describe the center point of a data set
Resampling
is a statistical technique where many samples are repeatedly drawn from a population • One type of resampling method is the bootstrap method • Involves using computer software to extract many samples with replacement in order to estimate a parameter of the population, such as a mean or proportion
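A minimal sketch of the bootstrap idea described above: resample the observed data with replacement many times and look at the resulting means. The data values, the number of resamples, the seed, and the function name `bootstrap_means` are all assumptions chosen for illustration.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Means of many resamples drawn with replacement from the sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

sample = [4, 6, 8, 9, 11, 12, 12, 18]  # hypothetical observed data
boot = bootstrap_means(sample)
estimate = statistics.mean(boot)        # bootstrap estimate of the population mean
print(len(boot))
print(round(estimate, 2))
```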
PHStat
is an Excel Add-in developed by Prentice Hall to provide students with additional features for statistical analysis
The expected monetary value (EMV)
is the mean of a discrete probability distribution when the discrete random variable is expressed in terms of dollars • The EMV represents a long-term average, as if outcomes from the distribution occurred many times chp 5 pg 17
The mean, or average,
is the most common measure of central tendency • Calculate the mean by adding all the values in a data set and then dividing the result by the number of observations
Conditional probability
is the probability of Event A occurring, given the condition that Event B has occurred
Mode
is the value that appears most often in a data set • If no data value or category repeats more than once, then we say that the mode does not exist • more than one mode can exist if two or more values tie for most frequent The mode is a particularly useful way to describe categorical data
The binomial probability distribution
is used to calculate the probability of a specific number of successes (x) for a certain number of trials (n), given a specified probability of success (p) and probability of failure (q) Formula for the probability of exactly x successes from n trials (CHP 5 PG 21): P(x, n) = [n! / ((n - x)! x!)] p^x q^(n - x) where: P(x, n) = The probability of observing x successes in n trials n = Number of trials x = Number of successes p = Probability of a success q = Probability of a failure Example: 40% of all voters support Proposition A. If a random sample of 10 voters is polled, what is the probability that exactly five of them support the proposition? Find P(x = 5) if n = 10, p = 0.4, and q = 0.6 pg 22 chp 5
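The Proposition A example can be worked as a sketch using the standard binomial formula, C(n, x) p^x q^(n - x). The function name `binomial_pmf` is hypothetical; the mean np is included because it appears earlier in the set.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

n, p = 10, 0.4
p_five = binomial_pmf(5, n, p)
mean = n * p
print(round(p_five, 4))  # about 0.2007
print(mean)              # 4.0 supporters expected out of 10 voters
```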
The addition rule for probabilities
is used to calculate the probability of the union of events • the probability that Event A, or Event B, or both events will occur Two events are considered to be mutually exclusive if they cannot occur at the same time during the experiment
The exponential distribution
is used to describe data where lower values tend to dominate and higher values don't occur very often
The multiplication rule
is used to determine the probability of the intersection (joint probability) of two events occurring, or P(A and B) Formula for the multiplication rule for dependent events: P(A and B) = P(A) × P(B | A) chp 4 pg 43 and 45
Poisson distribution
is useful for calculating the probability that a certain number of events will occur over a specific interval of time or space; Counting number of success in a given time interval Examples: •Number of customers per hour •Number of flaws per meter of cloth •Number of accidents per month
The normal probability distribution
is useful when the data tend to fall into the center of the distribution and when very high and very low values are fairly rare
Inferential statistics
making claims or conclusions about the data based on a sample • Population: represents all possible subjects that are of interest to us in a particular study • Sample: refers to a portion of the population that is representative of the population from which it was selected.
Variance
While the mean is simply the average of all data points, the variance measures the average degree to which each point differs from the mean (the greater the variance, the larger the overall data range). Source: http://www.investopedia.com/terms/v/variance.asp
Percentiles
measure the approximate percentage of values in the data set that are below the value of interest The pth percentile of a data set (where p is any number between 1 and 100) is the value that at least p percent of the observations will fall below Examples: • 20% of the data values are below the 20th percentile • 73% of the data values are below the 73rd percentile
The coefficient of variation, CV,
measures the standard deviation in terms of its percentage of the mean: CV = (standard deviation / mean) × 100% • A high CV indicates high variability relative to the size of the mean • A low CV indicates low variability relative to the size of the mean • A smaller coefficient of variation indicates more consistency within a set of data values Pg 104 Nike vs Google stock price: Nike 7.4% and Google 6.7%. Even though Google's stock price has a higher standard deviation than Nike's does, it is more consistent because its coefficient of variation is lower. In the investing world, the coefficient of variation allows you to determine how much volatility (risk) you are assuming in comparison to the amount of return you can expect from your investment. In simple language, the lower the ratio of standard deviation to mean return, the better your risk-return tradeoff.
Nonsampling errors
occur as a result of issues such as • ambiguous survey questions • questions that lead respondents to a certain "correct" answer • data collection errors These are errors not related to sampling variability
The intersection
of Events A and B represents the number of instances in which Events A and B occur at the same time
According to the Central Limit Theorem
sample means from samples of sufficient size, drawn from any population, will be normally distributed •In most cases, sample sizes of 30 or larger will result in sample means being normally distributed, regardless of the shape of the population distribution •If the population follows the normal probability distribution, the sample means will also be normally distributed, regardless of the size of the samples
Measures of variability Range Variance-for a sample and for a population Standard Deviation- for a sample and for a population
show how much spread is present in the data.
Quartiles
split the ranked data into 4 equal groups: •The first quartile (Q1) is the value that constitutes the 25th percentile •The second quartile (Q2) is the value that constitutes the 50th percentile •Note that the second quartile (the 50th percentile) is the median •The third quartile (Q3) is the value that constitutes the 75th percentile CHP 3 PG 65
stem and leaf display
splits the data values into stems (the larger place values) and leaves (the smaller place value) By listing all of the leaves to the right of each stem, we can graphically describe how the data are distributed •All the original data points are visible on the display • Easy to construct by hand • Provides a histogram-like view of the distribution
Chebyshev's Theorem
states that for any number z greater than 1, the percent of the values that fall within z standard deviations above and below the mean will be at least (1 - 1/z²) × 100% pg 50 chp 3 • Applies regardless of the shape of the distribution • At least 75% of the data values will fall within ±2 standard deviations around the mean • At least 89% of the data values will fall within ±3 standard deviations around the mean • At least 94% of the data values must fall within ±4 standard deviations around the mean
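The three percentages on this card follow from the bound (1 - 1/z²) × 100%, which can be checked with a short sketch. The function name `chebyshev_min_percent` is hypothetical.

```python
def chebyshev_min_percent(z):
    """Minimum percent of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's Theorem requires z > 1")
    return (1 - 1 / z ** 2) * 100

for z in (2, 3, 4):
    print(z, round(chebyshev_min_percent(z), 2))
# 2 75.0
# 3 88.89
# 4 93.75
```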
The Central Limit Theorem
states that the sample means of large-sized samples will be normally distributed regardless of the shape of their population distributions • A key concept to be used repeatedly throughout the rest of the book
law of large numbers
states that when an experiment is conducted a large number of times, the empirical probabilities of the process will converge to the classical probabilities Example: Flip a coin a large number of times • The observed number of heads would be very close to 50%
Interval measurement level:
strictly quantitative, allow us to measure the differences between the categories with actual numbers in a meaningful way Examples: temperature measurements, GPA; do not have a true zero point. The term true zero point means that a zero data value indicates the absence of the object being measured MEANINGFUL DIFFERENCES NO TRUE ZERO POINT EX:CALENDAR YEAR (2014,2015)
Short-Cut Formula Example pg 97
Sum of the data values = 186 • Sum of the squared data values = 5,952 • (186)² = 34,596 • Variance = (5,952 - 34,596/6)/6 = (5,952 - 5,766)/6 = 186/6 = 31 chp3 powerpoint pg 37
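The arithmetic on this card can be checked in a few lines. The sketch reads the card as a population of N = 6 values (inferred from the two divisions by 6, not stated outright) and uses the short-cut form variance = (Σx² - (Σx)²/N) / N.

```python
sum_x = 186      # sum of the data values
sum_x_sq = 5952  # sum of the squared data values
N = 6            # assumption: number of values, inferred from the card

square_of_sum = sum_x ** 2                      # 34,596
variance = (sum_x_sq - square_of_sum / N) / N   # (5,952 - 5,766) / 6

print(square_of_sum)  # 34596
print(variance)       # 31.0
```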
Sample Covariance
sxy, measures the direction of the linear relationship between two variables • A relationship is linear if the scatter plot of the independent and dependent variables has a straight-line pattern • If the linear relationship between x and y is positive, then as the value of x increases, the value of y tends to increase. chp 3 pg 75
Distribution Shape symmetric LeftSkewed RightSkewed
Symmetric: the peak is in the middle and the curve falls away equally to the left and to the right • Left-skewed: the longer tail extends to the left, with most values clustered on the right • Right-skewed: the longer tail extends to the right, with most values clustered on the left • If the mean is greater than the median, the shape of the distribution is said to be right-skewed. If the median is greater than the mean, the distribution is left-skewed. If the mean and median are close (or equal), the distribution is said to be symmetric
cumulative relative frequency distribution
totals the proportion of observations that are less than or equal to the class at which you are looking • Shows the accumulated proportion as values vary from low to high • Example: the manager of the Apple store wants to determine the percentage of days on which three or fewer iPads were sold. Start with the first relative frequency, then add each following relative frequency to the running total: .10 + .16 = .26, .26 + .28 = .54, .54 + .26 = .80, and so on
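The running total in this example can be sketched with `itertools.accumulate`. Only the four relative frequencies quoted on the card are used; which class each one belongs to is not stated there, so no class labels are attached.

```python
from itertools import accumulate

# Relative frequencies quoted on the card, in class order.
relative_freq = [0.10, 0.16, 0.28, 0.26]

# Each entry is the sum of all relative frequencies up to that class.
cumulative = [round(total, 2) for total in accumulate(relative_freq)]
print(cumulative)  # [0.1, 0.26, 0.54, 0.8]
```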
Data
values assigned to observations or measurements and are the foundation of statistics
Cross Section Data
values collected from a number of subjects during a single time period ex: an unemployment graph which shows on the left the amount # but on the bottom it shows what state (US, Canada, etc)
Time Series Data
values that correspond to specific measurements taken over a range of time periods EX: an unemployment graph that shows the number unemployed on the vertical axis and the years 2008, 2009, 2010 and 2011 on the horizontal axis
Features of z-scores for Normal Distributions Using Normal Probability Tables
z-scores are negative for values of x that are less than the distribution mean • z-scores are positive for values of x that are more than the distribution mean • The z-score at the mean of the distribution equals zero
Descriptive statistics
• Collecting, summarizing, and displaying data. Allows us to get an overview of the information. Can be useful, but has limitations. By summarizing large quantities of data, you lose information
why sample
• Examining the entire population would be expensive and time consuming • Can't examine everything if the test is destructive If a sample is selected properly and the analysis performed correctly, sample information can be used to make an accurate assessment of the entire population
Bias
• Example: "Do you agree that the current overly complex tax code should be simplified and made more fair?"
Discrete data examples
• Number of children per family • Number of cars listed per insurance policy • Vacation days per month
Once k is known, the width of each class can be found
• The width is the range of numbers to put into each class • Estimated class width = (maximum value - minimum value)/k • Round this estimate to a useful whole number that makes the frequency distribution more readable • Example: (17.4 - 0.6)/6 = 2.8, round to 3 (info from table 2.6 pg 31)
Sample
•refers to a portion of the population that is representative of the population from which it was selected