OMIS 380 - TEST 2 (Chapters 5-7)

A sampling plan states:

1. The objectives of the sampling activity
2. The target population
3. The population frame (the list from which the sample is selected)
4. The method of sampling
5. The operational procedures for collecting the data
6. The statistical tools that will be used to analyze the data

Chi-Square approach

breaks down the theoretical distribution into areas of equal probability and compares the number of data points within each area to the number that would be expected for that distribution. If you use the chi-square approach, you should have at least 50 data points.
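
As a minimal Python sketch of the chi-square approach, assuming the theoretical distribution is Uniform(0, 1) (so the areas of equal probability are simply equal-width bins), the function below counts the data points in each area and computes the usual statistic, the sum of (observed - expected)^2 / expected. The function name chi_square_uniform, the seed, and the simulated data are illustrative only.

```python
import random

def chi_square_uniform(data, k):
    """Chi-square goodness-of-fit statistic against Uniform(0, 1) using k equal-probability bins."""
    n = len(data)
    expected = n / k  # each bin has probability 1/k, so n/k points are expected per bin
    observed = [0] * k
    for x in data:
        # For Uniform(0, 1), bin i covers [i/k, (i+1)/k); x = 1.0 is placed in the last bin
        i = min(int(x * k), k - 1)
        observed[i] += 1
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, observed

random.seed(1)  # fixed seed only so the example is reproducible
sample = [random.random() for _ in range(100)]  # 100 points, above the 50-point guideline
stat, observed = chi_square_uniform(sample, k=10)
print(observed, round(stat, 2))
```

The statistic would then be compared with a chi-square critical value with k - 1 degrees of freedom (fewer if parameters were estimated from the data); for k = 10 and alpha = 0.05 that value is about 16.92.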

Binomial Distribution Models (shapes/skewness)

can assume different shapes and amounts of skewness, depending on the parameters

Uniform Distribution (def)

characterizes a continuous random variable for which all outcomes between a minimum (a) and a maximum (b) are equally likely.

Bernoulli Distribution (def)

characterizes a random variable having two possible outcomes (success and failure), each with a constant probability of occurrence.

Probability Density Function

characterizes outcomes of a continuous random variable. The probability that the random variable lies within a certain interval is the area under f(x) over that interval, and f(x) ≥ 0 for all values of x. This means that the graph of the density function must lie at or above the x-axis.

The Kolmogorov-Smirnov procedure

compares the cumulative distribution of the data with the theoretical distribution and bases its conclusion on the largest vertical distance between them. For small samples, the Kolmogorov-Smirnov test generally works better than the chi-square approach.
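
A hedged sketch of the Kolmogorov-Smirnov statistic: sort the data, build the empirical cumulative distribution (a step function that rises by 1/n at each data point), and take the largest vertical gap to the theoretical CDF. The Uniform(0, 1) CDF, the helper names, and the four data points are assumptions for illustration; in practice the statistic is compared with a tabulated critical value.

```python
def ks_statistic(data, cdf):
    """Largest vertical distance between the empirical CDF of data and a theoretical CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF steps from (i - 1)/n up to i/n at x, so check both sides of the step
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def std_uniform_cdf(x):
    """CDF of Uniform(0, 1): F(x) = x, clipped to [0, 1]."""
    return min(max(x, 0.0), 1.0)

print(ks_statistic([0.1, 0.4, 0.7, 0.9], std_uniform_cdf))  # approximately 0.2
```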

Triangular Distribution

defined by 3 parameters: the minimum (a), the maximum (b), and the most likely value (c). Often used when no data are available to characterize an uncertain variable and the distribution must be estimated judgmentally.

Cumulative Distribution Function (CDF)

defines or specifies the probability that a random variable, X, takes on a value equal to or less than a specified value, x. Represents the sum, or cumulative value, of the probabilities of the outcomes up to and including a specific outcome. F(x) = P(X ≤ x)

Level of Confidence

denoted by 1 - alpha, where alpha is a number between 0 and 1. Usually expressed as a percent; common values are 90%, 95%, or 99%.

Judgment sampling

expert judgment is used to select the sample (e.g., survey the "best" customers)

Cumulative distribution function (uniform distribution)

F(x) = 0, if x < a
F(x) = (x − a)/(b − a), if a ≤ x ≤ b
F(x) = 1, if x > b
Expected value = (a + b)/2; Variance = (b − a)²/12

Density function (uniform distribution)

f(x) = 1/(b − a), for a ≤ x ≤ b
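
To make the uniform-distribution formulas concrete, here is a small Python sketch of F(x), f(x), the expected value, and the variance. The function names and the parameters a = 10 and b = 20 are hypothetical.

```python
def uniform_cdf(x, a, b):
    """F(x) for a Uniform(a, b) random variable."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def uniform_pdf(x, a, b):
    """f(x) = 1/(b - a) on [a, b], and 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_mean_var(a, b):
    """Expected value (a + b)/2 and variance (b - a)^2 / 12."""
    return (a + b) / 2, (b - a) ** 2 / 12

# Hypothetical example: a = 10, b = 20
print(uniform_cdf(15, 10, 20))   # 0.5
print(uniform_mean_var(10, 20))  # (15.0, 8.333333333333334)
```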

Normal Distribution

f(x) is a bell-shaped curve characterized by 2 parameters: mu (the mean) and sigma (the standard deviation). See ppt 67 (whole slide).

continuous random variable

has outcomes over one or more continuous intervals of real numbers

[having children example]

having children: whether the first child is a boy or a girl has no impact on P(B) or P(G) for the next pregnancy, so the events are INDEPENDENT.

Estimating Population Parameters

involves assessing the value of an unknown population parameter - such as a population mean, population proportion, or population variance - using sample data.

Simple random sampling

involves selecting items from a population so that every subset of a given size has an equal chance of being selected. If the population data are stored in a database, simple random samples can generally be easily obtained.
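
In Python, random.sample draws items without replacement so that every subset of the requested size is equally likely, which matches the definition above. The population frame of 500 customer IDs and the sample size of 25 are hypothetical.

```python
import random

# Hypothetical population frame: customer IDs 1..500
population = list(range(1, 501))

random.seed(42)  # fixed seed only so the example is reproducible
sample = random.sample(population, k=25)  # every subset of size 25 is equally likely
print(sorted(sample))
```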

Probabilistic sampling

involves selecting the items in the sample using some random procedure. Probabilistic sampling is necessary to draw valid statistical conclusions.

Outcome

is a result that we observe

The Anderson-Darling method

is similar to the Kolmogorov-Smirnov procedure but puts more weight on the differences in the tails of the distributions. This approach is useful when you need a better fit at the extreme tails of the distribution.

Probability

likelihood that a particular event/outcome will happen/occur. Probabilities are expressed as values between 0 and 1.

Binomial Distribution (def)

models n independent replications of a Bernoulli experiment, each with a probability p of success. X represents the number of successes in the n experiments

Sampling (statistical) Error

occurs because samples are only a subset of the total population. Sampling error is inherent in any sampling process; although it can be minimized, it cannot be totally avoided.

Expected value of a discrete random variable

the expected value of a random variable corresponds to the notion of the mean, or average, for a sample

Discrete Random Variable (X)

the expected value, denoted E[X], is the weighted average of all possible outcomes, where the weights are the probabilities: E[X] = Σ x·f(x), summed over all possible outcomes x (see ppt 45).
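
As a worked example, assuming X is the sum of two fair dice (as in the dice examples in these notes), the sketch below builds f(x) with exact fractions and computes E[X] as the probability-weighted average of the outcomes.

```python
from fractions import Fraction
from itertools import product

# Probability distribution of X = sum of two fair dice
pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):
    s = d1 + d2
    pmf[s] = pmf.get(s, 0) + Fraction(1, 36)

# E[X] = sum over all outcomes x of x * f(x)
expected = sum(x * p for x, p in pmf.items())
print(expected)  # 7
```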

Estimators

the measures used to estimate population parameters. For example, we use the sample mean x-bar to estimate the population mean mu.

Lognormal Distribution

if the natural logarithm of a random variable X is normal, then X has a lognormal distribution. Often used for "spiked" service times, that is, when the probability of zero is very low but the most likely value is just greater than zero.

Standard Normal Distribution (def)

the normal distribution with a mean of 0 and a standard deviation of 1. This distribution is important in performing many probability calculations. A standard normal random variable is usually denoted by Z, and its density function by f(z). The scale along the Z-axis represents the number of standard deviations from the mean of zero (textbook p. 156).

Marginal Probability

the probability of a single event, irrespective of the outcomes of other events, without consideration of any joint event

Conditional Probability

the probability of occurrence of one event ( A ), given that another event ( B ) is known to be true or has already occurred.

joint probabilities

the probability of the intersection of two events

Poisson Distribution

there is no limit on the number of occurrences, and the average number of occurrences per unit is a constant denoted lambda. Probability mass function: f(x) = e^(−lambda) lambda^x / x!, for x = 0, 1, 2, ... (see ppt 59).
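
A minimal Python sketch of the Poisson probability mass function; the function name poisson_pmf and the rate of 3 arrivals per unit are hypothetical. In Excel, POISSON.DIST(x, mean, FALSE) returns the same quantity.

```python
import math

def poisson_pmf(x, lam):
    """f(x) = e^(-lam) * lam^x / x!  for x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Hypothetical rate: an average of 3 arrivals per minute
probs = [poisson_pmf(x, 3) for x in range(6)]
for x, p in enumerate(probs):
    print(x, round(p, 4))
```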

Union of Events

The union of two events contains all outcomes that belong to EITHER of the two events.

nonsampling error

when the sample does not represent the target population adequately. Generally a result of poor sample design, such as using a convenience sample when a simple random sample would have been more appropriate or choosing the wrong population frame. It may also result from inadequate data reliability.

Standard Normal Distribution (excel function)

=NORM.S.DIST(z, TRUE) finds cumulative probabilities for the standard normal distribution
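
Outside Excel, Python's standard-library statistics.NormalDist (Python 3.8+) offers cdf and pdf methods. The wrapper norm_s_dist below is a hypothetical helper that loosely mirrors the cumulative flag of NORM.S.DIST; for a normal variable with mean mu and standard deviation sigma, standardize first with z = (x − mu)/sigma.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal distribution

def norm_s_dist(z, cumulative=True):
    """Rough Python analogue of Excel's NORM.S.DIST(z, cumulative)."""
    return Z.cdf(z) if cumulative else Z.pdf(z)

print(round(norm_s_dist(1.96), 4))  # 0.975
```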

Fitting approach

A better approach is to identify the underlying probability distribution from which sample data come by "fitting" a theoretical distribution to the data and verifying the goodness of fit statistically

Goodness of Fit

A better approach than simply visually examining a histogram and summary statistics is to analytically fit the data to the best type of probability distribution.

Random Sampling

A random number is one that is uniformly distributed between 0 and 1. Excel function: =RAND()

Random Variate

A value randomly generated from a specific probability distribution

Sampling from Common Probability Distributions

A value randomly generated from a specified probability distribution is called a random variate
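
One standard way to turn a random number such as the output of RAND() into a random variate is the inverse-transform method: plug a uniform value U into the inverse of the target CDF. The sketch below assumes an exponential target with rate lambda, for which the inverse is X = −ln(1 − U)/lambda; the function name exponential_variate and the rate 0.5 are illustrative (Python also provides random.expovariate for this distribution).

```python
import math
import random

def exponential_variate(lam, u=None):
    """Exponential random variate with rate lam via the inverse-transform method.

    If U is uniform on [0, 1), then X = -ln(1 - U) / lam is exponential with mean 1/lam.
    """
    if u is None:
        u = random.random()  # Python counterpart of Excel's RAND()
    return -math.log(1.0 - u) / lam

random.seed(7)  # fixed seed only so the example is reproducible
draws = [exponential_variate(0.5) for _ in range(10_000)]
print(sum(draws) / len(draws))  # should be close to the mean 1/lam = 2
```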

Non-mutually exclusive events

Adding the probabilities of two such events would double-count the outcomes they have in common, so an adjustment is necessary.

Binomial Distribution (Probability Mass Function)

BINOM.DIST(number_s, trials, probability_s, cumulative), where number_s plays the role of x and probability_s is the same as p. If cumulative is set to TRUE, this function provides cumulative probabilities; if it is set to FALSE, it provides values of the probability mass function, f(x) = C(n, x) p^x (1 − p)^(n − x) for x = 0, 1, ..., n, where C(n, x) = n!/(x!(n − x)!) (see ppt 55 and textbook p. 147).
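
The sketch below reproduces the two modes of BINOM.DIST with Python's math.comb (Python 3.8+). binom_dist is a hypothetical helper whose parameter names simply follow the Excel arguments, and n = 10, p = 0.5 are example values.

```python
from math import comb

def binom_dist(number_s, trials, probability_s, cumulative):
    """Sketch of Excel's BINOM.DIST(number_s, trials, probability_s, cumulative)."""
    def pmf(x):
        # f(x) = C(n, x) * p^x * (1 - p)^(n - x)
        return comb(trials, x) * probability_s ** x * (1 - probability_s) ** (trials - x)
    if cumulative:
        # P(X <= number_s): sum the pmf over 0, 1, ..., number_s
        return sum(pmf(x) for x in range(number_s + 1))
    return pmf(number_s)

# Hypothetical example: n = 10 trials, p = 0.5
print(binom_dist(5, 10, 0.5, False))  # 0.24609375
```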

Using sample data

Can limit the ability to predict uncertain events because potential values outside the range of the sample data are not included.

Determining how well sample data fits a distribution is typically measured using what?

Chi-square, Kolmogorov-Smirnov, and Anderson-Darling statistics (textbook p. 170)

Definitions of Probability

Classical definition, relative frequency definition, subjective definition

Example 5.5: Computing the probability of Mutually Exclusive Events

Dice example: A = {7, 11}, P(A) = 8/36; B = {2, 3, 12}, P(B) = 4/36. Because A and B are mutually exclusive, P(A or B) = P(A) + P(B) = 8/36 + 4/36 = 12/36.

Ex 5.4: Computing the Probability of the Complement of an Event

Dice example: A = {7, 11}, P(A) = 8/36; Ac = {2, 3, 4, 5, 6, 8, 9, 10, 12}. Using Rule 2: P(Ac) = 1 − 8/36 = 28/36.

Bernoulli Distribution (Probability Mass Function)

E[X] = p; Var[X] = p(1 − p); f(x) = p if x = 1, and f(x) = 1 − p if x = 0

See Example 5.7: Applying Probability Rules to Joint Events

Energy Drink Survey (ppt 17 - 18)

Basic Concepts of Probability

Experiment, outcome, sample space

Union of Two Events

If A and B are two events, the probability that some outcome in EITHER A or B (that is, the union of A and B) occurs is denoted P(A or B).

Complement of an Event

If A is any event, the complement of A, denoted Ac, consists of all outcomes in the sample space not in A.

Exponential Distribution Examples

It is often used in such applications as modeling the time between customer arrivals to a service system or the time to or between failures of machines, lightbulbs, hard drives and other mechanical or electrical components. Pg 158 textbook

Beta Distribution

One of the most flexible distributions for modeling variation over a fixed interval from 0 to a positive value.

Variations of the Conditional Probability Formula

P(A|B) = P(A and B) / P(B); P(A and B) = P(A|B) P(B); P(B and A) = P(B|A) P(A). Note: P(A and B) = P(B and A).

Conditional Probability Formula

P(A|B) = P(A and B) / P(B). We read the notation P(A|B) as "the probability of A given B."
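
A small Python check of the conditional probability formula and its variations, using two fair dice as a stand-in example (the energy drink survey counts are not reproduced here). The events A ("the sum is 8") and B ("the first die shows 3") and all function names are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely (die1, die2) pairs

def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def event_a(o):        # A: the sum is 8
    return o[0] + o[1] == 8

def event_b(o):        # B: the first die shows 3
    return o[0] == 3

def event_a_and_b(o):  # intersection of A and B
    return event_a(o) and event_b(o)

p_a_given_b = prob(event_a_and_b) / prob(event_b)  # P(A|B) = P(A and B) / P(B)
p_b_given_a = prob(event_a_and_b) / prob(event_a)  # P(B|A) = P(A and B) / P(A)
print(p_a_given_b, p_b_given_a)  # 1/6 1/5

# Multiplication law: both products give back P(A and B) = 1/36
assert p_a_given_b * prob(event_b) == p_b_given_a * prob(event_a) == prob(event_a_and_b)
```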

Understand Example 5.1

Roll 2 dice simultaneously. What is the frequency (number of ways) of obtaining each possible sum of the dice?

Example 5.3 -Computing the Probability of an Event

Rolling a 7 or 11 on two dice: there are 6 ways to obtain a 7 (probability 6/36) and 2 ways to obtain an 11 (probability 2/36). P(7 or 11) = 6/36 + 2/36 = 8/36.
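
Examples 5.1 and 5.3 can be reproduced by enumerating all 36 equally likely outcomes. This sketch counts the ways to obtain each sum and then adds the probabilities of the outcomes 7 and 11.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Example 5.1: number of ways to obtain each sum when rolling two dice
freq = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))
for s in sorted(freq):
    print(s, freq[s])

# Example 5.3: P(7 or 11) is the sum of the probabilities of the outcomes in the event
p_7_or_11 = Fraction(freq[7], 36) + Fraction(freq[11], 36)
print(p_7_or_11)  # 2/9, which equals 8/36
```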

Probabilities Associated with Events

Rule 1. The probability of any event is the sum of the probabilities of the outcomes that comprise that event.

Complement of an Event

Rule 2. The probability of the complement of any event A is P(Ac) = 1-P(A).

Mutually Exclusive

Rule 3: If events A and B are mutually exclusive, then P(A or B) = P(A) + P(B)

Non-Mutually exclusive events rule

Rule 4: If two events A and B are NOT mutually exclusive, then P(A or B) = P(A) + P(B) - P(A and B). Here (A and B) represents the INTERSECTION of events A and B -- that is, all outcomes belonging to both A and B.
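
To see Rule 4 at work, the sketch below uses two dice with A = {sum is 7 or 11} and a hypothetical second event B = {first die shows 6}. These events overlap at (6, 1) and (6, 5), so adding P(A) and P(B) alone would double-count those two outcomes.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely (die1, die2) pairs
A = {o for o in outcomes if sum(o) in (7, 11)}   # sum is 7 or 11
B = {o for o in outcomes if o[0] == 6}           # hypothetical event: first die shows 6

def p(event):
    """Classical probability of a set of outcomes."""
    return Fraction(len(event), len(outcomes))

rule4 = p(A) + p(B) - p(A & B)     # P(A) + P(B) - P(A and B)
direct = p(A | B)                  # probability of the union, counted directly
print(p(A), p(B), p(A & B), rule4)  # 2/9 1/6 1/18 1/3
```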

Normal Distribution Properties

Symmetric, so its measure of skewness is 0. Mean = median = mode; thus half the area falls above the mean and half falls below it. The range of X is unbounded, meaning the tails of the distribution extend to negative and positive infinity. The empirical rules apply exactly for the normal distribution: the area under the density function within ±1 standard deviation of the mean is 68.3%, within ±2 standard deviations is 95.4%, and within ±3 standard deviations is 99.7% (textbook p. 154).
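
The symmetry and empirical-rule percentages can be checked numerically with Python's statistics.NormalDist (Python 3.8+); this is a quick numerical sketch, not a derivation.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1
print(Z.cdf(0))   # 0.5: half the area lies below the mean

# Area within +/- k standard deviations of the mean
areas = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(k, round(area * 100, 1))  # 68.3, 95.4, 99.7
```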

Sample Space

The collection of all possible outcomes in an experiment.

Independent Events

The outcome of one event (A) does not affect the outcome of the second event (B). Two events A and B are independent if P(A | B) = P(A).

Multiplication Law of Probability

The probability of two events A and B both occurring is the product of the probability of A given B and the probability of B, or equivalently the product of the probability of B given A and the probability of A: P(A and B) = P(A | B) P(B) = P(B | A) P(A)

[energy drink survey]: the probability of preferring a brand DEPENDS on gender.

Thus we may say that brand preference and gender are not independent

Types of Continuous Distributions

Triangular, lognormal, beta (textbook p. 160)

Mutually exclusive events

Two events that cannot occur at the same time (i.e., they have no outcomes in common).

Read Page 186

Understand the difference in the formulas for population and sample variance

Data Modeling and Distribution Fitting

Using sample data may limit our ability to predict uncertain events that may occur because potential values outside the range of the sample data are not included.

Variance of a Discrete Random Variable

The variance, Var[X], of a discrete random variable X is a weighted average of the squared deviations from the expected value: Var[X] = Σ (x − E[X])² f(x), summed over all possible outcomes x (see ppt 51).
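
Assuming again that X is the sum of two fair dice (a hypothetical example), this sketch computes the variance as the probability-weighted average of squared deviations from E[X]; exact fractions give Var[X] = 35/6, about 5.83.

```python
from fractions import Fraction
from itertools import product

# Probability distribution of X = sum of two fair dice
pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):
    pmf[d1 + d2] = pmf.get(d1 + d2, 0) + Fraction(1, 36)

mean = sum(x * p for x, p in pmf.items())                    # E[X]
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # Var[X] = sum of (x - E[X])^2 f(x)
print(mean, variance)  # 7 35/6
```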

Probability distribution

We may develop a probability distribution using any one of the three perspectives of probability: Classical, relative frequency, and subjective.

Probability Distributions

a characterization of the possible values that a random variable may assume along with the probability of assuming these values.

Event

a collection of one or more outcomes from a sample space

Normal Distribution (def)

a continuous distribution that is described by the familiar bell-shaped curve and is perhaps the most important distribution used in statistics.

Exponential Distribution (def)

a continuous distribution that models the time between randomly occurring events.

Sampling Plan

a description of the approach that is used to obtain samples from a population prior to any data collection activity

Poisson Distribution (def)

a discrete distribution used to model the number of occurrences in some unit of measure (often time or distance), e.g., the number of customers arriving at Subway in a given period (see ppt 60)

Random Variables

a numerical description of the outcome of an experiment

confidence interval

a range of values between which the value of the population parameter is believed to be, along with a probability that the interval correctly estimates the true (unknown) population parameter.

Systematic (Periodic) Sampling

a sampling plan that selects every nth item from the population
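
A minimal sketch of systematic sampling using Python list slicing. Starting at a random position within the first n items is a common convention assumed here rather than something stated in the definition; the invoice list, the step of 10, and the helper name systematic_sample are hypothetical.

```python
import random

def systematic_sample(population, n_step, start=None):
    """Select every n_step-th item, beginning at a random start within the first n_step items."""
    if start is None:
        start = random.randrange(n_step)  # random start in 0 .. n_step - 1
    return population[start::n_step]

# Hypothetical frame: 100 invoices, select every 10th
invoices = list(range(1, 101))
print(systematic_sample(invoices, 10, start=2))  # [3, 13, 23, 33, 43, 53, 63, 73, 83, 93]
```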

point estimate

a single number derived from sample data that is used to estimate the value of a population parameter

T-distribution

actually a family of probability distributions with a shape similar to the standard normal distribution.

Stratified Sampling

applies to populations that are divided into natural subsets (called strata) and allocates the appropriate proportion of samples to each stratum
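
A hedged sketch of stratified sampling with proportional allocation: each stratum receives a share of the total sample equal to its share of the population, and items are then drawn at random within each stratum. The strata names and sizes and the helper stratified_sample are hypothetical, and simple rounding may make the allocated total differ slightly from the target.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Proportional allocation: each stratum's sample size matches its share of the population."""
    rng = random.Random(seed)
    pop_size = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        k = round(total_n * len(items) / pop_size)  # simple rounding; the total may drift by 1
        sample[name] = rng.sample(items, k)         # random draw within the stratum
    return sample

# Hypothetical strata (customer segments)
strata = {
    "retail": list(range(600)),
    "wholesale": list(range(600, 900)),
    "government": list(range(900, 1000)),
}
result = stratified_sample(strata, total_n=50, seed=3)
print({name: len(s) for name, s in result.items()})  # {'retail': 30, 'wholesale': 15, 'government': 5}
```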

Cluster Sampling

based on dividing a population into subgroups (clusters), sampling a set of clusters, and (usually) conducting a complete census within the clusters sampled.

discrete random variable

one for which the number of possible outcomes can be counted.

Central Limit Theorem

one of the most important practical results in statistics; it makes systematic inference possible (read all of page 190).

Bernoulli Distribution (p and 1-p)

p is the probability of a success and 1-p is the probability of a failure. Typically, x = 1 represents "success" and x = 0 represents "failure"

Relative Frequency Definition

probabilities are based on empirical data

Subjective Definition

probabilities are based on judgment and experience

Classical Definition

probabilities can be deduced from theoretical arguments

Experiment

a process that results in an outcome; in research terms, a process that tests a hypothesis by collecting information under controlled conditions.

Interval estimate

provides a range for a population characteristic based on a sample.

Chi-Square def

relating to or denoting a statistical method assessing the goodness of fit between observed values and those expected theoretically

Convenience sampling

samples are selected based on the ease with which the data can be collected (e.g., survey all customers who happen to visit this month).

