Business Analytics Final SG
If A and B are independent events, P(A) = .2, and P(B) = .7, determine P(A∪B).
0.76 (independence gives P(A ∩ B) = .2 × .7 = .14, so P(A ∪ B) = .2 + .7 - .14 = .76)
0!
1
The probability of failure, q, is 1 - ____.
p (the probability of success), so q = 1 - p
How Large Must the Sample Be?
1. If the sample size is at least 30, then for most sampled populations the sampling distribution of the sample mean x̄ is approximately normal. 2. If the population is normal, then the sampling distribution of x̄ is normal regardless of the sample size.
How to Find Q1, Q2, and Q3:
1. Order your data set from lowest to highest values 2. Find the median (Q2) 3. At Q2, split the ordered data set into two halves 4. Q1 is the median of the lower half of the data 5. Q3 is the median of the upper half of the data
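A minimal Python sketch of the steps above. It assumes that when the number of values is odd, the median (Q2) is left out of both halves, which matches the worked answer for 2, 4, 6, 8, 12, 13, 16, 18, 20 (Q1 = 5, Q2 = 12, Q3 = 17). Software such as Excel may use a different quartile rule and give slightly different values. The function name `quartiles` is just illustrative.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) by the median-of-halves method (needs at least 2 values)."""
    xs = sorted(data)                 # 1. order the data from lowest to highest
    n = len(xs)
    q2 = median(xs)                   # 2. the median is Q2
    half = n // 2
    lower = xs[:half]                 # 3. split at Q2 (odd n: Q2 itself is excluded)
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), q2, median(upper)  # 4./5. medians of the two halves


q1, q2, q3 = quartiles([2, 4, 6, 8, 12, 13, 16, 18, 20])
print(q1, q2, q3, "IQR =", q3 - q1)
```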
Properties of f(x): f(x) is a continuous function such that...
1. f(x) ≥ 0 for all x 2. The total area under the curve of f(x) is equal to 1. 3. The area under the curve over an interval is the probability that x falls in that interval!
1st Quartile (Q1) is the ______ percentile 2nd Quartile (median) is the _____ percentile 3rd Quartile (Q3) is the _____ percentile
25th, 50th, 75th
If n = 15 and p = .4, then the standard deviation of the binomial distribution is _______.
≈ 1.90 (the variance is np(1 - p) = 15 × .4 × .6 = 3.6, and the standard deviation is √3.6 ≈ 1.897)
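The binomial mean and standard deviation can be checked in a few lines of Python. This sketch uses the standard formulas μ = np and σ = √(np(1 - p)); `binomial_pmf` is included only to confirm that the probabilities sum to 1, and both function names are illustrative.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    # P(x successes in n trials) = C(n, x) * p^x * (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_mean_sd(n, p):
    mean = n * p                  # mu = np
    variance = n * p * (1 - p)    # sigma^2 = np(1 - p), which is 3.6 for n = 15, p = 0.4
    return mean, sqrt(variance)   # sigma is the square root of the variance


mean, sd = binomial_mean_sd(15, 0.4)
total = sum(binomial_pmf(x, 15, 0.4) for x in range(16))
print(mean, round(sd, 3), round(total, 6))
```

Note that 3.6 is the variance; taking the square root is the step that gives the standard deviation.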
What are the 3 important percentages in normal distributions?
68.26% (within ±1σ of the mean), 95.44% (within ±2σ), 99.73% (within ±3σ)
standard normal distribution
A normal distribution with a mean of 0 and a standard deviation of 1.
Population vs. sample
A population is the set of all elements about which we wish to draw conclusions, while a sample is a subset of the elements of a population.
Let A, B and C be events and assume the following probabilities: P(A) = 0.2, P(B) = 0.3, P(C) = 0.5, P(A ∩ B) =0.06, P(A ∩ C) = 0, P(B ∩ C) = 0.5. Which two of the three events are mutually exclusive?
A, C
A newly married couple plans to have two children & would like to know all possible outcomes: What is the probability of at least one girl?
All possible outcomes: BB, BG, GB, GG. Assuming all four are equally likely, P(BB) = P(BG) = P(GB) = P(GG) = 1/4. "At least one girl" = {BG, GB, GG}, so P = 3/4 (equivalently, 1 - P(BB) = 1 - 1/4 = 3/4).
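The counting argument above can be sketched in Python by listing the sample space and counting favorable outcomes. It assumes, as the card does, that the four outcomes are equally likely; `Fraction` keeps the answer exact.

```python
from fractions import Fraction
from itertools import product

# Two births, each B or G; assumes the 4 outcomes are equally likely.
outcomes = ["".join(pair) for pair in product("BG", repeat=2)]  # BB, BG, GB, GG
favorable = [o for o in outcomes if "G" in o]                   # at least one girl

p_at_least_one_girl = Fraction(len(favorable), len(outcomes))
p_by_complement = 1 - Fraction(1, len(outcomes))                # 1 - P(BB)
print(outcomes, p_at_least_one_girl, p_by_complement)
```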
________ says that if the sample size is sufficiently large, then the sample means are approximately normally distributed.
Central Limit Theorem
Car mileage and temperature are examples of ________ random variables.
Continuous
Cross-sectional data vs. Time series data
Cross-sectional data are collected at the same (or approximately the same) point in time, while time series data are collected over different time periods.
Range
Largest measurement - smallest measurement
Chebyshev's Theorem
Let μ and σ be a population's mean and standard deviation. Then for any value k > 1, at least 100(1 - 1/k²)% of the population measurements lie in the interval [μ - kσ, μ + kσ].
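A small Python sketch of the bound. It also counts how many values in a data set actually fall within k standard deviations, to show that the guarantee holds; the data list is made up for illustration, and the population standard deviation (`pstdev`) is used because the theorem is stated for a population.

```python
from statistics import mean, pstdev


def chebyshev_bound(k):
    """Minimum fraction of measurements within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / k**2


def fraction_within(data, k):
    # Actual share of the data inside [mu - k*sigma, mu + k*sigma]
    mu, sigma = mean(data), pstdev(data)
    inside = [x for x in data if mu - k * sigma <= x <= mu + k * sigma]
    return len(inside) / len(data)


data = [2, 4, 4, 4, 5, 5, 7, 9, 30]   # hypothetical, includes an outlier
for k in (2, 3):
    print(k, chebyshev_bound(k), fraction_within(data, k))
```

Because the theorem holds for any distribution, its guarantee (75% for k = 2) is weaker than the Empirical Rule's figures for normal populations.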
Coefficient of Variation
Measures the size of the standard deviation relative to the size of the mean (Standard deviation/ mean) x 100%
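The formula translates directly into Python. The data below are hypothetical, and since the card does not say whether the population or the sample standard deviation is meant, that choice is left as a parameter.

```python
from statistics import mean, pstdev, stdev


def coefficient_of_variation(data, population=True):
    # CV = (standard deviation / mean) x 100%
    sd = pstdev(data) if population else stdev(data)
    return sd / mean(data) * 100


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements: mean 5, population SD 2
print(coefficient_of_variation(data))          # using the population SD
print(coefficient_of_variation(data, False))   # using the sample SD
```

A CV of 40% here says the typical deviation is 40% of the mean, which makes variability comparable across data sets measured on different scales.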
Outliers
Outliers are measurements that are very different from other measurements
Addition Rule
P(A ∪ B) = P(A) + P(B) - P(A ∩ B); if A and B are mutually exclusive, this reduces to P(A ∪ B) = P(A) + P(B)
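Applied to independent events A and B with P(A) = .2 and P(B) = .7, the general addition rule can be sketched in Python as follows; the function name is illustrative.

```python
def union_probability(p_a, p_b, p_a_and_b=0.0):
    """General addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

    Leaving p_a_and_b at 0 gives the mutually exclusive special case.
    """
    return p_a + p_b - p_a_and_b


# Independent events: P(A ∩ B) = P(A) * P(B) = 0.2 * 0.7 = 0.14
p_a, p_b = 0.2, 0.7
print(union_probability(p_a, p_b, p_a * p_b))   # about 0.76
```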
Probability of an event
P(E) = number of favorable outcomes / total number of possible outcomes (when all outcomes are equally likely)
If x = the number of occurrences in a specified interval, then x is a _____ random variable
Poisson
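The card doesn't give the Poisson formula; the sketch below uses the standard one, P(x) = e^(-μ) μ^x / x!, where μ is the mean number of occurrences per interval. The value μ = 2 is hypothetical.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    # Probability of exactly x occurrences when the mean per interval is mu
    return exp(-mu) * mu**x / factorial(x)


mu = 2  # hypothetical: 2 arrivals per minute on average
probs = [poisson_pmf(x, mu) for x in range(4)]
print([round(p, 4) for p in probs])
```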
Find Q1, Q2, and Q3: 2, 4, 6, 8, 12, 13, 16, 18, 20
Q1: 5 Q2: 12 Q3: 17
The interquartile range (IQR) is
Q3-Q1
Which two of the following variables are discrete? A. Shoe size B. Waist measurement C. Foot length D. Dress size
Shoe size & dress size
Variance
The probability-weighted average of the squared deviations of the values of the random variable from its expected value: σ² = Σ (x - μ)² p(x).
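A Python sketch of the expected value and variance of a discrete random variable, with the distribution stored as a dictionary from value to probability (an illustrative choice). The example distribution is the number of girls in two births, assuming BB, BG, GB, GG are equally likely.

```python
def expected_value(dist):
    # mu = sum of x * p(x) over all possible values x
    return sum(x * p for x, p in dist.items())


def variance(dist):
    # sigma^2 = sum of (x - mu)^2 * p(x): probability-weighted squared deviations
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


# Number of girls in two births: 0 (BB), 1 (BG or GB), 2 (GG)
dist = {0: 0.25, 1: 0.50, 2: 0.25}
assert abs(sum(dist.values()) - 1) < 1e-9      # probabilities must sum to 1
print(expected_value(dist), variance(dist), variance(dist) ** 0.5)
```

The standard deviation is then the square root of the variance.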
Complement
The complement (A') of an event A is the set of all sample space outcomes not in A.
Conditional Probability
The probability of an event A, given that the event B has occurred, is called the conditional probability of A given B: P(A|B) = P(A∩B) / P(B), provided P(B) > 0
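A short Python sketch of the formula, reusing P(A) = 0.2, P(B) = 0.3 and P(A ∩ B) = 0.06 from the mutually exclusive question. Here P(A | B) equals P(A), which is the sign that A and B are independent. The function name is illustrative.

```python
def conditional_probability(p_a_and_b, p_b):
    # P(A | B) = P(A ∩ B) / P(B), defined only when P(B) > 0
    if p_b <= 0:
        raise ValueError("P(B) must be greater than 0")
    return p_a_and_b / p_b


p_a = 0.2
p_a_given_b = conditional_probability(0.06, 0.3)
print(round(p_a_given_b, 4), abs(p_a_given_b - p_a) < 1e-9)
```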
Central Limit Theorem
The theorem stating that, as the sample size n increases, the sampling distribution of the means of random samples of size n approaches a normal distribution (a sample size of at least 30 is the usual rule of thumb).
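A quick simulation can make the theorem concrete. This sketch assumes a right-skewed exponential population with mean 1 and standard deviation 1 (not from the card), draws many samples of size 36, and records each sample mean; a histogram of `means` would look close to bell-shaped. The spread uses the standard result σ/√n for the standard deviation of x̄.

```python
import random
from statistics import mean, stdev


def sample_means(n, reps, seed=0):
    """Means of `reps` random samples of size n from a right-skewed population.

    Assumption: the population is exponential with mean 1 and standard deviation 1.
    """
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]


means = sample_means(n=36, reps=2000)
# Theory: the means center on mu = 1 with standard deviation sigma / sqrt(n) = 1/6
print(round(mean(means), 3), round(stdev(means), 3))
```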
T or F: In binomial distributions, Probability of success, p, is constant from trial to trial
True
T or F: Point Estimation is a form of statistical inference.
True
T or F: The highest point of the normal curve is over the mean
True
T or F: an area under a continuous probability distribution is a probability.
True
Skewed to the left
the left tail of the histogram is longer than the right tail.
Mode
the most frequently occurring score(s) in a distribution
Which of the following is not a discrete random variable? A) the number of times a light changes red in a 10-minute cycle B) the number of minutes required to run 1 mile C) the number of defects in a sample selected from a population of 100 products D) the number of criminals found in a five-mile radius of a neighborhood
the number of minutes required to run 1 mile
Frequency
the number of times a value, value range or event occurs in gathered data.
Skewed to the right
the right tail of the histogram is longer than the left tail
Sample space
the set of all possible outcomes of a probability experiment
Standard Deviation
the square root of the variance
Median
the value of the MIDDLE point of the ordered measurements (the average of the two middle values when the number of measurements is even)
Bar chart
a vertical or horizontal bar represents the frequency of each category. Only good for categorical/qualitative variables!!
A standard normal distribution has a mean of ________ and standard deviation of ________.
zero ; one
Which of the following is a categorical variable?
Whether a person has a traffic violation
Is a sample mean quantitative?
YES
Normal distribution
a bell-shaped curve, describing the spread of a characteristic throughout a population
Percent Frequency
a display of data that indicates the percentage of observations for each data point or grouping of data points.
Non-Probability Sampling
a method of selecting units from a population using a subjective (i.e. non-random) method.
Population parameter
a number calculated from all the population measurements that describes some aspect of the population
Sample statistic
a number calculated using the sample measurements that describes some aspect of the sample
Histogram
a picture of the frequency distribution (QUANTITATIVE data). Step 1: Find the number of classes K, the smallest whole number with 2^K > n (n = number of measurements). Step 2: Find the class length: (largest value - smallest value) / K. Step 3: Form non-overlapping classes of equal width. Step 4: Graph the histogram.
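The histogram steps can be sketched in Python as below. Assumptions not on the card: the data are whole numbers, the class length is rounded up to a whole number so the classes cover the full range, and each class includes its lower boundary but not its upper one. The text-bar output stands in for step 4.

```python
import math


def histogram_classes(data):
    """Return equal-width classes [start, end) and their frequencies."""
    n = len(data)
    k = 1
    while 2**k <= n:                       # step 1: smallest K with 2^K > n
        k += 1
    lo, hi = min(data), max(data)
    length = max(1, math.ceil((hi - lo) / k))   # step 2 (assumption: round up)
    # step 3: non-overlapping classes of equal width starting at the smallest value
    classes = [(lo + i * length, lo + (i + 1) * length) for i in range(k)]
    counts = [0] * k
    for x in data:
        # the largest value may sit on the last boundary; keep it in the last class
        i = min(int((x - lo) // length), k - 1)
        counts[i] += 1
    return classes, counts


classes, counts = histogram_classes([2, 4, 6, 8, 12, 13, 16, 18, 20])
for (start, end), f in zip(classes, counts):
    print(f"{start} to <{end}: {'*' * f}")   # step 4: a text "histogram"
```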
sampling distribution of the sample mean
a probability distribution of all possible sample means of a given sample size
Probability Distribution
a table, graph, or formula that gives the probability associated with each possible value that the variable can assume
Ordinal
a variable where there is a meaningful ordering, or ranking, of the categories EX: teaching effectiveness, class (freshman/junior/etc)
Nominal
a variable where there is no meaningful ordering, or ranking, of the categories. EX: gender, car color
Interval
all of the characteristics of ordinal, plus measurements are on a numerical scale with an arbitrary zero point, so values can be meaningfully compared only by the interval between them EX: temperature
Ratio
all the characteristics of interval, plus measurements are on a numerical scale with a meaningful zero point, so values can be compared by both their intervals and their ratios (most quantitative variables) EX: earnings, profit, loss, age, height, distance
Multiple choice questions
allow more than two responses, usually analyzed with averages
All probability distributions are characterized by an _________ and a ________.
an expected value (mean) and a variance (standard deviation squared)
Variable
any characteristic of an element EX: model design, lot type, list price, etc
subjective method
assessment based on experience, expertise or intuition
Relative frequency method
assigning probabilities based on experimentation or historical data (ex: estimate the probability that a randomly selected consumer prefers Coca-Cola to all other soft drinks.)
Mean
average: the sum of the measurements divided by the number of measurements
Intersection
belong to both A and B.
Union
belong to either A or B or both.
If x is the total number of successes in n trials of a binomial experiment, then x is a ______ random variable.
binomial
In the uniform distribution, _____ is the smallest value and ____ is the largest.
c ; d
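A Python sketch of the uniform distribution on [c, d]. The formulas f(x) = 1/(d - c), mean (c + d)/2 and standard deviation (d - c)/√12 are the standard ones rather than something stated on the card, and the values c = 2, d = 10 are hypothetical. Probabilities are areas of rectangles under the flat curve.

```python
from math import sqrt


def uniform_pdf(x, c, d):
    # f(x) = 1 / (d - c) between c and d, and 0 elsewhere
    return 1 / (d - c) if c <= x <= d else 0.0


def uniform_probability(a, b, c, d):
    # P(a <= x <= b) is the area of the rectangle above [a, b] (clipped to [c, d])
    width = max(0.0, min(b, d) - max(a, c))
    return width / (d - c)


def uniform_mean_sd(c, d):
    return (c + d) / 2, (d - c) / sqrt(12)


c, d = 2, 10   # hypothetical: a waiting time between 2 and 10 minutes
print(uniform_pdf(5, c, d), uniform_probability(4, 6, c, d), uniform_mean_sd(c, d))
```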
Convenience sampling
choosing individuals who are easiest to reach
Pie chart
circle divided into slices where the size of each slice represents its relative frequency or percent frequency.
Contingency table
classify data on two dimensions
Dichotomous questions
clearly stated, easy to answer/analyze, limited info, "yes or no"
What are the two types of random variable?
discrete (countable values) and continuous (any value in an interval, e.g., decimals)
Multistage cluster sampling
divide population into clusters and then randomly select clusters to sample.
Stratified random sampling
divide the population into non-overlapping groups (strata) and then select a random sample from each stratum.
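A Python sketch of stratified random sampling. It assumes equal allocation (the same number drawn from every stratum); proportional allocation is a common alternative. The customer list and the `stratum_of` helper are hypothetical.

```python
import random


def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Group `population` into non-overlapping strata with `stratum_of`, then
    draw a simple random sample of `per_stratum` elements from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for element in population:
        strata.setdefault(stratum_of(element), []).append(element)
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}


# Hypothetical customer list: (region, customer id)
customers = [("north", i) for i in range(50)] + [("south", i) for i in range(30)]
sample = stratified_sample(customers, stratum_of=lambda c: c[0], per_stratum=5, seed=1)
print(sample)
```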
Data
facts and figures from which conclusions can be drawn
Empirical Rule
If a population has mean μ and standard deviation σ and is described by a normal curve, then 68.26% of the population measurements lie within one standard deviation of the mean, 95.44% lie within two standard deviations of the mean, and 99.73% lie within three standard deviations of the mean.
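The Empirical Rule percentages can be reproduced in Python with the error function, since the area under a normal curve within k standard deviations of the mean equals erf(k/√2). This gives about 68.27% and 95.45% for k = 1 and 2; the 68.26% and 95.44% figures come from doubling the rounded normal-table areas .3413 and .4772.

```python
from math import erf, sqrt


def normal_within(k):
    # Area under a normal curve between mu - k*sigma and mu + k*sigma
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * normal_within(k), 2))
```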
Voluntary sampling
individuals are self-selected by responding to an incentive
The ________ of two events A and B is the event that consists of the sample space outcomes belonging to both event A and event B.
intersection
Qualitative (categorical)
labels or names used to identify an attribute of each element EX: name of customer, email address, address of house
Even if the population of individual items is not normal, the sampling distribution is approximately normal if the sample is ______ enough (Central Limit Theorem)
large
Systematic sampling
list population, select random starting point, sample each nth element.
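A Python sketch of systematic sampling, with k standing for the "n" in "each nth element". It assumes the random starting point is chosen among the first k listed elements; the numbered list of 100 elements is hypothetical.

```python
import random


def systematic_sample(population, k, seed=None):
    """Random start among the first k listed elements, then every k-th element."""
    rng = random.Random(seed)
    start = rng.randrange(k)         # random starting position 0..k-1
    return population[start::k]


frame = list(range(1, 101))          # hypothetical numbered list of 100 elements
chosen = systematic_sample(frame, k=10, seed=3)
print(chosen)
```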
Relationship among mean, median, and mode
Skewed right: mean > median > mode. Skewed left: mean < median < mode. Symmetric: mean = median = mode.
The Central Limit Theorem states that as the sample size increases, the distribution of the sample ________ approaches the normal distribution.
means
Probability
measure of the chance that an experimental outcome will occur when an experiment is carried out.
If a population distribution is skewed to the right, then, given a random sample from that population, one would expect that the
median would be less than the mean/average
Open ended questions
most honest and complete information, cannot be readily summarized
If two events are independent, we can ________ their probabilities to determine the intersection probability.
multiply
If the population of individual items is normal, then the population of all sample means is also ___________.
normal
Sample Size
number of elements in the sample
Quantitative
numbers represent quantities (measurements or counts) EX: weight of package, shipping costs, package size
What is a random variable?
a variable that assigns a numerical value to each outcome of an experiment.
Judgment sampling
samples in which a person who is extremely knowledgeable about the population selects population elements he or she feels are most representative.
Probability Sampling
sampling where we know the chance that each element in the population will be included in the sample (cluster, systematic, stratified)
Dot Plots
display each measurement as a dot above a horizontal number line to show the overall pattern of the data. Best for small to moderately sized data sets.
Event
set of sample space outcomes
Relative Frequency
summarizes the proportion of items in each class. For each class, divide the frequency of the class by the total # of observations. Multiply by 100 to obtain the percent frequency
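A Python sketch that builds frequency, relative frequency and percent frequency for each class in one pass. The survey answers are hypothetical and `frequency_table` is an illustrative name.

```python
from collections import Counter


def frequency_table(values):
    """Return {class: (frequency, relative frequency, percent frequency)}."""
    counts = Counter(values)
    n = len(values)
    return {cls: (freq, freq / n, 100 * freq / n) for cls, freq in counts.items()}


# Hypothetical survey answers
answers = ["yes", "no", "yes", "yes", "no"]
table = frequency_table(answers)
for cls, (f, rel, pct) in table.items():
    print(cls, f, rel, pct)
```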
Data Set
the data that is collected for a particular study
Outcomes
the experimental outcomes in the sample space