OMIS 380 - TEST 2 (Chapters 5-7)
A sampling plan states:
1. The objectives of the sampling activity; 2. The target population; 3. The population frame (the list from which the sample is selected); 4. The method of sampling; 5. The operational procedures for collecting the data; and 6. The statistical tools that will be used to analyze the data
Chi-Square approach
breaks down the theoretical distribution into areas of equal probability and compares the number of data points within each area to the number that would be expected for that distribution. If you use the chi-square approach, you should have at least 50 data points.
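A minimal Python sketch of the chi-square idea, assuming the theoretical distribution is Uniform(0, 1) so that equal-width bins are also equal-probability areas. The function name `chi_square_statistic`, the seed, and the sample are illustrative assumptions, not course material.

```python
import random

def chi_square_statistic(data, num_bins):
    """Chi-square statistic for data against a Uniform(0, 1) model.

    Equal-width bins have equal probability under Uniform(0, 1),
    so every bin has the same expected count.
    """
    expected = len(data) / num_bins
    observed = [0] * num_bins
    for x in data:
        # min() keeps x == 1.0 inside the last bin
        index = min(int(x * num_bins), num_bins - 1)
        observed[index] += 1
    return sum((o - expected) ** 2 / expected for o in observed)

random.seed(1)  # fixed seed so the illustration is repeatable
sample = [random.random() for _ in range(100)]  # at least 50 data points
stat = chi_square_statistic(sample, num_bins=5)
print(round(stat, 3))
```

A small statistic means the observed counts are close to the expected counts; judging significance would still require a chi-square critical value.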
Binomial Distribution Models (shapes/skewness)
can assume different shapes and amounts of skewness, depending on the parameters
Uniform Distribution (def)
characterizes a continuous random variable for which all outcomes between a minimum (a) and a maximum (b) are equally likely.
Bernoulli Distribution (def)
characterizes a random variable having two possible outcomes (success and failure), each with a constant probability of occurrence.
Probability Density Function
characterizes outcomes of a continuous random variable and is used to calculate the probability of the random variable lying within a certain interval. f(x) >= 0 for all values of x, which means that the graph of the density function must lie at or above the x-axis.
The Kolmogorov-Smirnov procedure
compares the cumulative distribution of the data with the theoretical distribution and bases its conclusion on the largest vertical distance between them. For small samples, the Kolmogorov-Smirnov test generally works better.
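The "largest vertical distance" can be sketched in Python as follows. This is an illustration only: the helper names `ks_statistic` and `uniform01_cdf` and the three data points are assumptions, and Uniform(0, 1) stands in for whatever theoretical distribution is being fitted.

```python
def ks_statistic(data, cdf):
    """Largest vertical distance between the empirical CDF and `cdf`."""
    xs = sorted(data)
    n = len(xs)
    largest = 0.0
    for i, x in enumerate(xs, start=1):
        theoretical = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x,
        # so check the gap on both sides of the jump.
        largest = max(largest, i / n - theoretical, theoretical - (i - 1) / n)
    return largest

def uniform01_cdf(x):
    """CDF of Uniform(0, 1), used here as the theoretical distribution."""
    return min(max(x, 0.0), 1.0)

d = ks_statistic([0.1, 0.4, 0.7], uniform01_cdf)
print(round(d, 4))
```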
Triangular Distribution
defined by 3 parameters: the minimum, a; the maximum, b; and the most likely value, c. Often used when no data are available to characterize an uncertain variable and the distribution must be estimated judgmentally.
Cumulative Distribution Function (CDF)
defines or specifies the probability that a random variable, X, takes on a value equal to or less than a specified value, x. Represents the sum, or cumulative value, of the probabilities of the outcomes up to and including a specific outcome. F(x) = P(X ≤ x)
Level of Confidence
denoted by 1 - alpha, where alpha is a number between 0 and 1. Usually expressed as a percent; common values are 90%, 95%, or 99%.
Judgment sampling
expert judgment is used to select the sample (survey the "best" customers)
Cumulative distribution function (uniform distribution)
F(x) = 0, if x < a; (x - a)/(b - a), if a <= x <= b; 1, if x > b. Expected Value = (a + b)/2; Variance = (b - a)^2/12
Density function (uniform distribution)
f(x) = 1/(b - a), for a <= x <= b
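The uniform-distribution formulas on these cards can be sketched in Python as below. The function names and the values a = 2, b = 10 are illustrative assumptions.

```python
def uniform_pdf(x, a, b):
    """Density: 1/(b - a) on [a, b], 0 elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """F(x) = P(X <= x) for a uniform random variable on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def uniform_mean(a, b):
    return (a + b) / 2

def uniform_variance(a, b):
    return (b - a) ** 2 / 12

a, b = 2, 10  # hypothetical minimum and maximum
print(uniform_pdf(4, a, b), uniform_cdf(4, a, b))
print(uniform_mean(a, b), uniform_variance(a, b))
```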
Normal Distribution
f(x) is a bell-shaped curve, characterized by 2 parameters: MU (mean) and SD (standard dev.). ppt 67 - whole slide
continuous random variable
has outcomes over one or more continuous intervals of real numbers
[having children example]
having children: whether the first pregnancy produces a boy or a girl has no impact on P(B) or P(G) for the next pregnancy, so the events are INDEPENDENT
Estimating Population Parameters
involves assessing the value of an unknown population parameter - such as a population mean, population proportion, or population variance - using sample data.
Simple random sampling
involves selecting items from a population so that every subset of a given size has an equal chance of being selected. If the population data are stored in a database, simple random samples can generally be easily obtained.
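As a rough illustration of drawing a simple random sample from stored data, the Python sketch below uses `random.sample`, which selects without replacement so that every subset of the requested size is equally likely. The population of customer IDs and the seed are hypothetical.

```python
import random

population = list(range(1, 101))  # hypothetical frame: customer IDs 1..100
rng = random.Random(42)           # seeded only to make the example repeatable

# Draw 10 distinct items; each subset of size 10 has the same chance
sample = rng.sample(population, k=10)
print(sorted(sample))
```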
Probabilistic sampling
involves selecting the items in the sample using some random procedure. Is necessary to draw valid statistical conclusions.
Outcome
is a result that we observe
The Anderson-Darling method
is similar to the Kolmogorov-Smirnov procedure but puts more weight on the differences between the tails of the distributions. This approach is useful when you need a better fit at the extreme tails of the distribution.
Probability
likelihood that a particular event/outcome will happen/occur. Probabilities are expressed as values between 0 and 1.
Binomial Distribution (def)
models n independent replications of a Bernoulli experiment, each with a probability p of success. X represents the number of successes in the n experiments
Sampling (statistical) Error
occurs because samples are only a subset of the total population. It is inherent in any sampling process, and although it can be minimized, it cannot be totally avoided.
Expected value of a discrete random variable
the expected value of a random variable corresponds to the notion of the mean, or average, for a sample
Discrete Random Variable (X)
the expected value, denoted E[X], is the weighted average of all possible outcomes, where the weights are the probabilities: see ppt 45 for equation
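The weighted average can be written as E[X] = sum of x * f(x) over all outcomes x. A small Python sketch, assuming the sum of two fair dice as the random variable (the helper name `expected_value` is illustrative):

```python
from fractions import Fraction
from itertools import product

def expected_value(pmf):
    """Weighted average of outcomes, weights being the probabilities."""
    return sum(x * p for x, p in pmf.items())

# Probability mass function for the sum of two fair dice
pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):
    total = d1 + d2
    pmf[total] = pmf.get(total, 0) + Fraction(1, 36)

print(expected_value(pmf))  # 7
```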
Estimators
the measures used to estimate population parameters. for example...we use the sample mean x-bar to estimate a population mean mu.
Lognormal Distribution
if the natural logarithm of a random variable X is normal, then X has a lognormal distribution. Often used for "spiked" service times, that is, when the probability of zero is very low, but the most likely value is just greater than zero.
Standard Normal Distribution (def)
the normal distribution with a mean of 0 and std dev = 1. This distribution is important in performing many probability calculations. A std normal random variable is usually denoted by Z, and its density function by f(z). The scale along the Z-axis represents the number of std dev from the mean of zero. Pg. 156 Textbook
Marginal Probability
the probability of a single event, irrespective of the outcome of any other joint event
Conditional Probability
the probability of occurrence of one event ( A ), given that another event ( B ) is known to be true or has already occurred.
joint probabilities
the probability of the intersection of two events
Poisson Distribution
there is no limit on the number of occurrences. The average number of occurrences per unit is a constant, denoted lambda. See ppt 59 for the probability mass function.
Union of Events
the union of two events contains all outcomes that belong to EITHER of the two events.
nonsampling error
when the sample does not represent the target population adequately. Generally a result of poor sample design, such as using a convenience sample when a simple random sample would have been more appropriate or choosing the wrong population frame. It may also result from inadequate data reliability.
Standard Normal Distribution (excel function)
=NORM.S.DIST(z, cumulative) finds probabilities for the standard normal distribution; with cumulative set to TRUE it returns P(Z <= z)
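For comparison, standard normal probabilities can be sketched in Python with the standard library's `statistics.NormalDist`. The z-values below are illustrative choices, not values from the course.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

p_below = standard_normal.cdf(1.96)   # P(Z <= 1.96)
p_above = 1 - p_below                 # P(Z > 1.96)
p_between = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)  # P(-1 <= Z <= 1)

print(round(p_below, 4), round(p_above, 4), round(p_between, 4))
```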
Fitting approach
A better approach is to identify the underlying probability distribution from which sample data come by "fitting" a theoretical distribution to the data and verifying the goodness of fit statistically
Goodness of Fit
A better approach than simply visually examining a histogram and summary statistics is to analytically fit the data to the best type of probability distribution.
Random Sampling
A random number is one that is uniformly distributed between 0 and 1. Excel Function =RAND( )
Random Variate
A value randomly generated from a specific probability distribution
Sampling from Common Probability Distributions
A value randomly generated from a specified probability distribution is called a random variate
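A hedged Python sketch of generating random variates from a uniform random number. The parameter values and the seed are arbitrary assumptions; the exponential case uses the standard inverse-transform formula X = -ln(1 - U)/lambda, which is not stated on these cards.

```python
import math
import random

rng = random.Random(7)  # seeded only so the example is repeatable

u = rng.random()                       # like =RAND(): uniform on [0, 1)
uniform_variate = 2 + (10 - 2) * u     # Uniform(a=2, b=10): a + (b - a) * U

rate = 0.5                             # hypothetical lambda
exponential_variate = -math.log(1 - rng.random()) / rate  # inverse transform

triangular_variate = rng.triangular(1, 9, 4)  # low, high, mode (most likely)
normal_variate = rng.gauss(100, 15)           # mean, standard deviation

print(uniform_variate, exponential_variate, triangular_variate, normal_variate)
```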
Non-mutually exclusive events
Adding the two events' probabilities would result in double-counting some outcomes, so an adjustment is necessary.
Binomial Distribution (Probability Mass Function)
BINOM.DIST(number_s, trials, probability_s, cumulative), where number_s plays the role of x and probability_s is the same as p. If cumulative is set to TRUE, then this function provides cumulative probabilities; otherwise the default is FALSE, and it provides values of the probability mass function, f(x). For the formula for f(x), see ppt 55 and pg 147 in the textbook.
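A Python sketch that mirrors the arguments of BINOM.DIST, assuming the standard binomial mass function f(x) = C(n, x) p^x (1 - p)^(n - x). The function `binom_dist` is a hypothetical stand-in written for illustration, not Excel's implementation.

```python
from math import comb

def binom_dist(number_s, trials, probability_s, cumulative):
    """Binomial probability for number_s successes in `trials` trials."""
    def pmf(x):
        return (comb(trials, x)
                * probability_s ** x
                * (1 - probability_s) ** (trials - x))

    if cumulative:
        # P(X <= number_s): add the mass function from 0 up to number_s
        return sum(pmf(x) for x in range(number_s + 1))
    return pmf(number_s)

# Hypothetical case: 10 trials, success probability 0.2
print(binom_dist(3, 10, 0.2, False))  # P(X = 3)
print(binom_dist(3, 10, 0.2, True))   # P(X <= 3)
```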
Using sample data
Can limit the ability to predict uncertain events because potential values outside the range of the sample data are not included.
Determining how well sample data fits a distribution is typically measured using what?
Chi-Squared, Kolmogorov-Smirnov, and Anderson-Darling statistics. Pg. 170
Definitions of Probability
Classical Definition, Relative Frequency Definition, Subjective Definition
Example 5.5: Computing the probability of Mutually Exclusive Events
Dice example: A = {7, 11}, P(A) = 8/36; B = {2, 3, 12}, P(B) = 4/36. P(A or B), the union of events A and B, = P(A) + P(B) = 8/36 + 4/36 = 12/36
Ex 5.4: Computing the Probability of the Complement of an Event
Dice example: A = {7, 11}, P(A) = 8/36; Ac = {2, 3, 4, 5, 6, 8, 9, 10, 12}. Using Rule 2: P(Ac) = 1 - 8/36 = 28/36
Bernoulli Distribution (Probability Mass Function)
f(x) = p if x = 1; f(x) = 1 - p if x = 0. E[X] = p; Var[X] = p(1 - p)
See Example 5.7: Applying Probability Rules to Joint Events
Energy Drink Survey (ppt 17 - 18)
Basic Concepts of Probability
Experiment Outcome Sample Space
Union of Two Events
If A and B are two events, the probability that some outcome in EITHER A or B (that is, the union of A and B) occurs is denoted as P(A or B).
Complement of an Event
If A is any event, the complement of A, denoted Ac, consists of all outcomes in the sample space not in A.
Exponential Distribution Examples
It is often used in such applications as modeling the time between customer arrivals to a service system or the time to or between failures of machines, lightbulbs, hard drives and other mechanical or electrical components. Pg 158 textbook
Beta Distribution
One of the most flexible distributions for modeling variation over a fixed interval from 0 to a positive value.
Variations of the Conditional Probability Formula
P(A|B) = P(A and B) / P(B); P(A and B) = P(A | B) P(B); P(B and A) = P(B | A) P(A). Note: P(A and B) = P(B and A)
Conditional Probability Formula
P(A|B) = P(A and B) / P(B) We read the notation P(A|B) as "the probability of A given B"
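To make the formula concrete, here is a small Python sketch using two fair dice. The events chosen (A: the sum is 8, B: the first die shows at least 4) are illustrative assumptions, not an example from the course.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely rolls

def prob(event):
    """Probability of an event given as a true/false test on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def a(o):
    return o[0] + o[1] == 8   # A: the sum is 8

def b(o):
    return o[0] >= 4          # B: the first die shows at least 4

p_b = prob(b)
p_a_and_b = prob(lambda o: a(o) and b(o))
p_a_given_b = p_a_and_b / p_b  # P(A|B) = P(A and B) / P(B)

print(p_a_given_b, prob(a))  # 1/6 versus the unconditional 5/36
```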
Understand Example 5.1
Roll 2 dice simultaneously. What is the frequency (number of ways) of each possible sum of the dice?
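One way to check the frequencies is to enumerate all 36 rolls, as in this short Python sketch (the variable name `ways` is illustrative):

```python
from collections import Counter
from itertools import product

# Count how many of the 36 equally likely rolls produce each sum
ways = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))

for total in sorted(ways):
    print(total, ways[total])
```

The counts rise from 1 way for a sum of 2 to 6 ways for a sum of 7, then fall back to 1 way for a sum of 12.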
Example 5.3 -Computing the Probability of an Event
Rolling 7 or 11 on two dice: there are 6 ways to obtain a 7 in a roll, so P(7) = 6/36; there are 2 ways to obtain an 11 in a roll, so P(11) = 2/36. Probability = 6/36 + 2/36 = 8/36
Probabilities Associated with Events
Rule 1. The probability of any event is the sum of the probabilities of the outcomes that comprise that event.
Complement of an Event
Rule 2. The probability of the complement of any event A is P(Ac) = 1-P(A).
Mutually Exclusive
Rule 3: If events A and B are mutually exclusive, then P(A or B) = P(A) + P(B)
Non-Mutually exclusive events rule
Rule 4: If two events A and B are NOT mutually exclusive, then P(A or B) = P(A) + P(B) - P(A and B). Here (A and B) represents the INTERSECTION of events A and B -- that is, all outcomes belonging to both A and B.
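Rules 1 through 4 can be checked numerically with the two-dice sums. In this Python sketch, events A = {7, 11} and B = {2, 3, 12} come from the examples in these notes, while D = {10, 11, 12} is a hypothetical event added to show an overlap with A.

```python
from fractions import Fraction
from itertools import product

# Probability of each sum of two fair dice
pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):
    pmf[d1 + d2] = pmf.get(d1 + d2, 0) + Fraction(1, 36)

def prob(event):
    """Rule 1: add the probabilities of the outcomes in the event."""
    return sum(pmf[s] for s in event)

a = {7, 11}
b = {2, 3, 12}       # no sums in common with a
d = {10, 11, 12}     # shares the sum 11 with a

p_not_a = 1 - prob(a)                        # Rule 2: complement
p_a_or_b = prob(a) + prob(b)                 # Rule 3: mutually exclusive
p_a_or_d = prob(a) + prob(d) - prob(a & d)   # Rule 4: subtract the overlap

print(p_not_a, p_a_or_b, p_a_or_d)
```

Without the subtraction in Rule 4, the probability of rolling an 11 would be counted twice.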
Normal Distribution Properties
Symmetric, so its measure of skewness is 0. Mean, median, and mode are all equal; thus half the area falls above the mean and half falls below it. Range of X is unbounded, meaning the tails of the distribution extend to negative and positive infinity. Empirical rules apply exactly for the normal distribution: the area under the density function within +-1 std dev. is 68.3%, within +-2 std dev. is 95.4%, and within +-3 std dev. is 99.7%. Page 154 textbook
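The empirical-rule areas can be reproduced with a short Python sketch using `statistics.NormalDist`; the dictionary name `areas` is an illustrative choice.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, std dev 1

# Area under the density within k standard deviations of the mean
areas = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within +-{k} std dev: {area:.1%}")
```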
Sample Space
The collection of all possible outcomes in an experiment.
Independent Events
The outcome of one event (A) does not affect the outcome of the second event (B). Two events A and B are independent if P(A | B) = P(A)
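A minimal Python sketch of the independence test P(A | B) = P(A), using two fair dice. The helper `is_independent` and the three events are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely rolls

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def is_independent(a, b):
    """True when P(A | B) equals P(A); assumes P(B) > 0."""
    p_a_given_b = prob(lambda o: a(o) and b(o)) / prob(b)
    return p_a_given_b == prob(a)

def first_even(o):
    return o[0] % 2 == 0

def second_is_six(o):
    return o[1] == 6

def sum_is_eight(o):
    return o[0] + o[1] == 8

print(is_independent(first_even, second_is_six))    # True
print(is_independent(sum_is_eight, second_is_six))  # False
```

The first die does not affect the second, so the first pair is independent; knowing the second die is a 6 changes the chance of a sum of 8, so the second pair is not.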
Multiplication Law of Probability
The probability that two events A and B both occur is the product of the probability of A given B and the probability of B, or equivalently the product of the probability of B given A and the probability of A. P(A and B) = P(A | B) P(B) = P(B | A) P(A)
[energy drink survey]: the probability of preferring a brand DEPENDS on gender.
Thus we may say that brand preference and gender are not independent
Types of Continuous Distributions
Triangular, Lognormal, Beta. Pg 160 textbook
Mutually exclusive events
Two events that cannot occur at the same time (i.e., they have no outcomes in common).
Read Page 186
Understand the difference in the formulas for population and sample variance
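As a quick illustration of the difference, the population variance divides the sum of squared deviations by N, while the sample variance divides by n - 1. The Python sketch below uses the standard library's `statistics` module; the data values are made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

population_variance = statistics.pvariance(data)  # sum of squares / N
sample_variance = statistics.variance(data)       # sum of squares / (n - 1)

print(population_variance, sample_variance)
```

Here the squared deviations sum to 32, so the two results are 32/8 and 32/7.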
Data Modeling and Distribution Fitting
Using sample data may limit our ability to predict uncertain events that may occur because potential values outside the range of the sample data are not included.
Variance of a Discrete Random Variable
The variance, Var[X], of a discrete random variable X is a weighted average of the squared deviations from the expected value: see ppt 51 for equation
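In symbols, Var[X] = sum of (x - E[X])^2 * f(x) over all outcomes x. A short Python sketch, assuming a single fair die as the random variable:

```python
from fractions import Fraction

# Probability mass function of one fair die
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())
# Weighted average of squared deviations from the expected value
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(mean, variance)  # 7/2 35/12
```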
Probability distribution
We may develop a probability distribution using any one of the three perspectives of probability: Classical, relative frequency, and subjective.
Probability Distributions
a characterization of the possible values that a random variable may assume along with the probability of assuming these values.
Event
a collection of one or more outcomes from a sample space
Normal Distribution (def)
a continuous distribution that is described by the familiar bell-shaped curve and is perhaps the most important distribution used in statistics.
Exponential Distribution (def)
a continuous distribution that models the time between randomly occurring events.
Sampling Plan
a description of the approach that is used to obtain samples from a population prior to any data collection activity
Poisson Distribution (def)
a discrete distribution used to model the number of occurrences in some unit of measure (often time or distance)-- ex., the number of customers arriving at Subway (see ppt 60)
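A hedged Python sketch of Poisson probabilities, assuming the standard mass function f(x) = e^(-lambda) * lambda^x / x!. The function names and the rate of 12 arrivals per hour are illustrative assumptions, not figures from the slides.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) when the average number of occurrences per unit is lam."""
    return exp(-lam) * lam ** x / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x): add the mass function from 0 up to x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

lam = 12  # hypothetical average of 12 customer arrivals per hour
p_exactly_10 = poisson_pmf(10, lam)
p_at_most_10 = poisson_cdf(10, lam)

print(round(p_exactly_10, 4), round(p_at_most_10, 4))
```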
Random Variables
a numerical description of the outcome of an experiment
confidence interval
a range of values between which the value of the population parameter is believed to be, along with a probability that the interval correctly estimates the true (unknown) population parameter.
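A minimal Python sketch of a confidence interval for a population mean, assuming the population standard deviation is known so the standard formula x-bar +- z * sigma / sqrt(n) applies. The function name and the input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval for the population mean, assuming sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_(alpha/2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = mean_confidence_interval(sample_mean=50, sigma=10, n=100)
print(round(low, 2), round(high, 2))
```

Raising the level of confidence (for example from 95% to 99%) widens the interval, because the critical value z grows.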
Systematic (Periodic) Sampling
a sampling plan that selects every nth item from the population
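A small Python sketch of selecting every nth item. The random starting offset is an assumption added here (a common practice, not something stated on the card), and the frame of 100 item IDs is hypothetical.

```python
import random

def systematic_sample(frame, n, rng):
    """Select every nth item of `frame`, starting at a random offset below n."""
    start = rng.randrange(n)
    return frame[start::n]

frame = list(range(1, 101))  # hypothetical population frame of 100 item IDs
rng = random.Random(0)       # seeded only so the example is repeatable
sample = systematic_sample(frame, 10, rng)
print(sample)
```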
point estimate
a single number derived from sample data that is used to estimate the value of a population parameter
T-distribution
actually a family of probability distributions with a shape similar to the standard normal distribution.
Stratified Sampling
applies to populations that are divided into natural subsets (called strata) and allocates the appropriate proportion of samples to each stratum
Cluster Sampling
based on dividing a population into subgroups (clusters), sampling a set of clusters, and (usually) conducting a complete census within the clusters sampled.
discrete random variable
one for which the number of possible outcomes can be counted.
Central Limit Theorem
one of the most important practical results in statistics that makes systematic inference possible. (read all page 190)
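A rough simulation sketch in Python of what the theorem implies: means of repeated samples cluster around the population mean with a standard error of about sigma / sqrt(n). The choice of a Uniform(0, 1) population, a sample size of 30, 2000 repetitions, and the seed are all illustrative assumptions.

```python
import random
import statistics

rng = random.Random(2024)  # seeded only so the example is repeatable
sample_size = 30
repetitions = 2000

# Mean of each of 2000 samples drawn from Uniform(0, 1)
sample_means = [
    statistics.fmean([rng.random() for _ in range(sample_size)])
    for _ in range(repetitions)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Theory for Uniform(0, 1): mean 0.5, sigma = sqrt(1/12)
theoretical_se = (1 / 12) ** 0.5 / sample_size ** 0.5

print(round(mean_of_means, 3), round(sd_of_means, 4), round(theoretical_se, 4))
```

Even though the population is flat rather than bell-shaped, a histogram of `sample_means` would look approximately normal.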
Bernoulli Distribution (p and 1-p)
p is the probability of a success and 1-p is the probability of a failure. Typically, x = 1 represents "success" and x = 0 represents "failure"
Relative Frequency Definition
probabilities are based on empirical data
Subjective Definition
probabilities are based on judgment and experience
Classical Definition
probabilities can be deduced from theoretical arguments
Experiment
a process that results in an outcome; in the general scientific sense, a process that tests a hypothesis by collecting information under controlled conditions.
Interval estimate
provides a range for a population characteristic based on a sample.
Chi-Square def
relating to or denoting a statistical method assessing the goodness of fit between observed values and those expected theoretically
Convenience sampling
samples are selected based on the ease with which the data can be collected (survey all customers who happen to visit this month).