MegaSTATS
if p̄ (p-bar) is the value that a normal random variable assumes, then we can transform it into its standard normal value as
z = (p̄ - p) / sqrt(p(1 - p)/n)
a population has a mean of 100 and a SD of 12. A random sample of 36 is selected. The SD of x̄ is equal to
2
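The standard-error cards in this deck (SD 12 with n = 36, and SD 10 with n = 256) can be checked with a short Python sketch. The function name standard_error is illustrative, not from the deck, and the sketch assumes simple random sampling so that the SD of the sample mean is sigma / sqrt(n).

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """SD of the sample mean under simple random sampling: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

se_36 = standard_error(12, 36)    # population SD 12, sample size 36
se_256 = standard_error(10, 256)  # population SD 10, sample size 256
print(se_36, se_256)
```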
new dining plan. residents were asked to give a grade of A, B, C, D, F to the new plan. Of 482 residents, 234-A, 148-B, 87-C, 9-D. How many F's
4
which graphical depictions are useful to observe the shape of a data set for a single variable
histogram, stem-and-leaf diagram, polygon
the sample size required to approximate the normal distribution depends on
how much the population varies from normality
samples are primarily used to
make inferences about population parameters
The set of all possible outcomes from a random experiment is called a _________ __________.
sample space
Multiplying data by a fraction (where the fractions add to 1) and summing results in a _____ mean.
weighted
when a mean is calculated and some observations are given greater importance, we refer to this measure of central location as a
weighted mean
cluster sampling works best
when most of the variation of a population is within groups and not between groups
which is not an example of an experiment
winner of last weeks lottery
the probability distribution of the sample mean is commonly referred to as the
sampling distribution of x̄
A numerical value that measures the likelihood of an uncertain event occurring is a ________________.
probability
Marginal probabilities
the probability of a single event, found by dividing a row or column total by the total sample size
conditional probability
probability of event A given that event B has occurred
cumulative relative frequency distribution for quantitative data identifies the
proportion of observations that fall below the upper limit of each class
coefficient of variation
a relative measure of dispersion
when a sample statistic is used to make inferences about a population parameter, it is referred to as an
estimator
a subset of the sample space is a/an
event
Independent events
event A is independent of event B if and only if P(A|B) = P(A)
stratified sampling is preferred to cluster sampling when the objective is to
increase precision
if two events do not influence each other, the events are
independent
an example of cross-sectional data
results of market research testing consumer preferences for soda
when testing μ and σ is known, H0 can never be rejected if z ≤ 0 for a
right tailed test
a ___ is a subset of a population
sample
all possible outcomes of an experiment is the ___ space
sample
True or false: The arithmetic mean is the average of a data set.
true
True or false: The trimmed mean can mitigate the effect of outliers.
true
in order to transform a value x into its standardized value z, we use the following formula
z = (x - μ)/σ
The mean, or average, has the property that distances from the mean to the individual points always sum to:
zero
probability that a continuous random variable X assumes a particular value x is
zero
we do not reject the null hypothesis when the p-value is
≥ α
If Fund A has a coefficient of variation of 1.1 and Fund B has a coefficient of variation of 0.9, Fund ____ has the greater relative dispersion.
A
Binomial shape
A binomial distribution is skewed right if π < .50, skewed left if π > .50, and symmetric only if π = .50. Excel: =BINOM.DIST(x, n, pi, cumulative)
For two events A and B, the multiplication rule is
P(A∩B)= P(A|B)xP(B)
it is known that the length of a certain product X is normally distributed with μ = 20 inches. How is the probability P(X>16) related to the probability P(X<16)
P(X>16) is greater than P(X<16)
due to symmetry, the probability that the normal random variable Z is greater than 1.5 is equal to
P(Z<-1.5)
Mean absolute deviation (MAD)
This statistic reveals the average distance from the center. Absolute values must be used since otherwise the deviations around the mean would sum to zero. =AVEDEV(Data)
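A minimal Python sketch of the MAD calculation. The data values are hypothetical, chosen only for illustration. It also shows why absolute values are needed: the signed deviations around the mean cancel.

```python
from statistics import fmean

data = [2, 4, 6, 8]  # hypothetical data, not from the deck
mean = fmean(data)

# Signed deviations around the mean cancel each other out.
signed_sum = sum(x - mean for x in data)

# MAD: the average of the absolute deviations from the mean.
mad = fmean([abs(x - mean) for x in data])

print(signed_sum, mad)
```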
the addition rule for 2 events A and B is
P(A∪B) = P(A) + P(B) - P(A∩B)
Trimmed mean
Same as the mean except omit highest and lowest k% of data values (e.g., %) =TRIMMEAN(Data, Percent)
A _________ of an experiment contains all possible outcomes of the experiment
Sample space
True or false: The geometric mean does not mitigate the effect of outliers.
false
average score on a stats exam was 75 with a standard dev of 15. if you scored a 60, your z-score is
-1
Calculate the standardized score for the following data value. Assume the mean = 100 and the standard deviation = 25: x=60, z=_____.
-1.6
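The two z-score cards above can be reproduced with a short Python sketch; the function name z_score is illustrative.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

z_exam = z_score(60, 75, 15)    # exam card: mean 75, SD 15
z_data = z_score(60, 100, 25)   # data card: mean 100, SD 25
print(z_exam, z_data)
```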
When rolling a pair of dice and summing the two values rolled, which of the following are exhaustive events?
-a value of 6 or more and a value of 8 or less -a value of 7 or more and a value of 6 or less -an even number and an odd number. Not: a value of 9 or more and a value of 7 or less
suppose you are performing a hypothesis test on μ and the value of σ is known. At the 5% significance level, the critical value(s) for a two tailed test is (are)
-z0.025 and z0.025
prob that a customer orders popcorn is 0.4. The prob that the order a drink is 0.65. The prob that they order popcorn and a drink is 0.3. If they have already ordered a drink, what is the prob that they will order popcorn?
.46
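The popcorn card is a direct use of the conditional probability formula P(A|B) = P(A∩B)/P(B). A small Python sketch, with illustrative variable names:

```python
p_drink = 0.65               # P(drink)
p_popcorn_and_drink = 0.30   # P(popcorn and drink)

# Conditional probability: P(popcorn | drink) = P(popcorn and drink) / P(drink)
p_popcorn_given_drink = p_popcorn_and_drink / p_drink
print(round(p_popcorn_given_drink, 2))
```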
What is the probability that a randomly selected worker slept on the job?
.587
The manufacturer of liquid laundry detergent has a 0.02 probability that the detergent bottles will be improperly filled. There is a 0.03 probability that the label on the bottle will not be affixed properly. If the events of bottle fill and affixing the label are independent, then the probability of a bottle being filled improperly and having an improperly affixed label is
0.0006 Since the events are independent, we calculate 0.02 x 0.03 = 0.0006
If 4 passenger cars are randomly selected, what is the probability that all of the passenger cars get more than 35 mpg?
0.0138 ± 0.01
A machine that is programmed to package 1.20 pounds of cereal is being tested for its accuracy. In a sample of 36 cereal boxes, the mean and standard deviation are calculated as 1.22 pounds and 0.06 pound, respectively -Approximate the p-value.
0.05 < p-value < 0.10
The probability of an employee getting a promotion is 0.20. The probability of an employee having an MBA is 0.30. The probability of an employee getting a promotion given that the employee has an MBA is 0.25. The probability that an employee has an MBA and gets a promotion is
0.075 P(A∩B)= P(A|B) x P(B)= 0.25 x 0.30 = 0.075
the joint probability table examines whether a residents location, east or west side, affects whether the resident gets season passes to the pool on the west side. Prob that a resident lives on the west side and does not have a pass
0.10
Center for Studying Health System Change None will delay or go without medical care?
0.1678 +- 0.001
the joint probability table examines whether a residents location, east or west side, affects whether the resident gets season passes to the pool on the west side. Given that a resident has a season pass, what is the prob the resident lives on the east side
0.40
Let P(A)=0.30 and P(B)=0.40. Suppose A and B are independent events. Calculate P(B | A)
0.40 Since A and B are independent events, P(B)=P(B|A)
The probability of Margaret receiving a promotion at XYZ corp. is 0.70. The prob of Katia receiving a promotion at ABC inc. is 0.60. If the two promotions are independent, then the probability of both Margaret and Katia receiving a promotion is _______________
0.42 Since the events are independent, 0.70 x 0.60 = 0.42
area under a normal curve below its expected value is
0.5
due to symmetry, the probability that the standard normal random variable Z is less than 0 is equal to
0.5
the probability that a normal random variable X is less than its mean is equal to
0.50
a population has a mean of 50 and a SD of 10. A random sample of 256 is selected. The SD of x- is equal to
0.625
The odds against a horse winning a race were set at 7 to 1. The probability of that horse not winning the race is ___________ . Answer should be in decimal form, using 3 decimal places.
0.875 7/(1+7) = 0.875
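Converting odds against an event into a probability can be sketched in Python as follows. The helper name prob_from_odds_against is hypothetical; it assumes odds of "a to b against" mean a unfavorable outcomes for every b favorable ones.

```python
def prob_from_odds_against(a: float, b: float) -> float:
    """Probability the event does NOT occur, given odds of a to b against it."""
    return a / (a + b)

p_not_win = prob_from_odds_against(7, 1)  # odds against winning: 7 to 1
p_win = 1 - p_not_win
print(p_not_win, p_win)
```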
zα/2 for 88%
1.56
A machine that is programmed to package 1.20 pounds of cereal is being tested for its accuracy. In a sample of 36 cereal boxes, the mean and standard deviation are calculated as 1.22 pounds and 0.06 pound, respectively -test statistic
2 ± 0.50
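The test statistic on this card can be checked with a brief Python sketch. It assumes the usual one-sample statistic (sample mean minus hypothesized mean, divided by s / sqrt(n)); the variable names are illustrative.

```python
import math

mu_0 = 1.20   # hypothesized mean weight (pounds)
x_bar = 1.22  # sample mean
s = 0.06      # sample standard deviation
n = 36        # sample size

# Standardized distance between the sample mean and the hypothesized mean.
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
print(round(t_stat, 2))
```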
zα/2 for 98%
2.33
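The zα/2 cards can be checked without a z table using Python's standard library. This is a sketch; the helper name z_alpha_half is illustrative. Note that the exact value for 88% is about 1.555, which table lookups round to 1.55 or 1.56.

```python
from statistics import NormalDist

def z_alpha_half(confidence: float) -> float:
    """Two-sided critical value: the z with area alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

z_88 = z_alpha_half(0.88)
z_98 = z_alpha_half(0.98)
print(round(z_88, 3), round(z_98, 2))
```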
an investment strategy has an expected return of 12% and a SD of 10%. If investment returns are normal, the probability of earning a return of more than 32% is closest to
2.5%
Quartiles divide the data into _______ (choose a number) of equal parts
4
Recent home sales in a suburb of Washington, D.C., are shown in the accompanying ogive. Approximate the percentage of houses that sold for more than $500,000.
60%
If an experiment is selecting a card from a deck of cards, then the sample space is
All the cards in the deck (-not just face cards, red cards, aces, etc.)
A researcher wants to determine if Americans are sleeping less than the recommended 7 hours of sleep on weekdays. -Make an inference
Americans do not sleep less than the recommended 7 hours of sleep
The relative frequency of an event is used to calculate what type of probability?
An empirical probability
random experiment
An observational process whose results cannot be known in advance
For any given event, the probability of that event and the probability of the _______ of the event must sum to one.
Complement
A _________ probability is the probability of an event given that another event has already occurred.
Conditional
for a discrete random variable, the variance of X is calculated as
Σ (xi - μ)^2 P(X = xi)
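A minimal Python sketch of the discrete variance formula. The distribution below is hypothetical (it is not from the deck) and was chosen so the arithmetic is easy to follow.

```python
# Hypothetical discrete distribution: values and their probabilities.
values = [0, 1, 2]
probs = [0.25, 0.50, 0.25]

# Expected value: probability-weighted average of the values.
mu = sum(x * p for x, p in zip(values, probs))

# Variance: probability-weighted average of squared deviations from mu.
variance = sum((x - mu) ** 2 * p for x, p in zip(values, probs))

print(mu, variance)
```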
A trial, or process, that produces several possible outcomes is referred to as an_______
Experiment
The _______ formula is used to determine the number of possible ways to arrange (n) items when there are no groups
Factorial
Venn diagram
Illustration of A' and A
The probability of State College winning a football game is 0.60 The probability of University of State winning a football game is 0.65. Given that State College has won its football game, the probability that Univ of State wins its game is 0.65. The teams are not playing each other. The events State College and U of State winning are
Independent
The probability that a customer will purchase a product is 0.15. The probability that a customer is a male is 0.5. The probability that a customer is a male and will purchase a product is 0.075. The events purchasing a product and being a male are
Independent
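One way to check the independence answer is to compare the joint probability with the product of the marginals. A small Python sketch with illustrative variable names:

```python
import math

p_purchase = 0.15
p_male = 0.5
p_male_and_purchase = 0.075

# Events are independent when P(A and B) equals P(A) * P(B).
independent = math.isclose(p_male_and_purchase, p_purchase * p_male)
print(independent)
```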
A machine that is programmed to package 1.20 pounds of cereal is being tested for its accuracy. In a sample of 36 cereal boxes, the mean and standard deviation are calculated as 1.22 pounds and 0.06 pound, respectively -Can you conclude that the machine is working improperly
No
If A and B are independent events, then
P(A)=P(A|B)
The ________ formula is used to determine the number of ways to arrange (x) objects from a group of (n) objects where the order of the objects matters.
Permutation
The five nines rule
Prime business customers expect public carrier-class telecommunications data links to be available 99.999% of the time. This equates to about 5 minutes of downtime per year.
A numerical value that measures the likelihood of an uncertain event is a ________?
Probability
z-score
Standardized values that can tell us how far away from the mean each observation lies
Poisson distribution (model of rare events)
The number of occurrences within a randomly chosen unit of time or space. All characteristics are determined by its mean λ. The standard deviation is the square root of its mean.
Joint Probability
The probability that A and B both occur, P(A∩B)
True False: When constructing a joint probability table, the cell in the lower right corner must always equal 1.0
True (the lower right cell represents all outcomes in the sample space, which is the probability 1.)
The symbol π in the binomial PDF a. is the probability of success b. is in the interval [0,1] c. is greater than 1 d. can be either negative or positive values
a. is the probability of success and b. is in the interval [0,1]
How many outcomes of an experiment constitute a simple event? a. more than one b. one c. two d. none
b. one
which statement is not correct concerning the p-value and critical value approaches to hyp testing
both approaches use the same decision rule concerning when to reject Ho
a theorem that allows us to use the normal probability distribution to approximate the sampling distribution of the sample mean whenever the sample size is large is known as the
central limit theorem
An economist reports that 506 out of a sample of 1,200 middle-income American households actively participate in the stock market. -Construct a 90% confidence interval for the proportion of middle-income Americans who actively participate in the stock market
confidence interval 0.397 ± .01 to 0.446 ± .01
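The interval on this card can be reproduced with a short Python sketch. It assumes the usual large-sample formula, p-hat ± z × sqrt(p-hat(1 - p-hat)/n); the variable names are illustrative.

```python
import math
from statistics import NormalDist

successes, n = 506, 1200
confidence = 0.90

p_hat = successes / n
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print(round(lower, 3), round(upper, 3))
```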
suppose the competing hypotheses for a test are H0: μ = 100 vs Ha: μ ≠ 100. If the p-value for the hypothesis test is 0.07 and the chosen level of significance is 0.05, then the correct conclusion is
do not reject H0 and conclude that the population mean does not differ from 100 at the 5% significance level
a type 2 error occurs when we
do not reject the null hypothesis when it is actually false
5!=? a. 25 b. 24 c. 100 d. 5 e. 120
e. 120 A factorial is the product of all integers from 1 to n. Therefore 5! = 5*4*3*2*1 = 120
relative frequency of an event is used to calculate what type of probability
empirical probability
Expected value E(X)
a weighted average of the X-values (weighted by their probabilities) that measures the center of the distribution; the variance measures the spread around it
the normal distribution is completely described by these two parameters
mean and variance
4, 5,6,9 mode
none
Leptokurtic
peaked more sharply than normal
σ
population SD
when constructing a histogram what values/labels go on the horizontal (x) axis and the vertical (y) axis
quantitative class limits on the horizontal axis; frequency or relative frequency on the vertical axis.
At a small firm in Boston, seven employees were asked to report their one-way commute time (in minutes) into the city. Their responses were.
shortest: 20 longest: 90 mean: 45 median:40 mode:35
A coach believes laurie has a 0.5 prob of getting a hit against a pitcher that she has never batted against before. This type of probability is best characterized as
subjective probability
probability
a number that measures the relative likelihood that an event will occur, written P(A)
which characteristic of interest is a variable
the number of pizzas ordered from pizza hut per day
Mutually exclusive
their intersection is the empty set (contains no elements). One event precludes the other from occurring.
The mode for the data set: 4, 5, 6, 9 is
there is no mode
statistics is used
to make informed decisions based on data
Union
the union of two events consists of all outcomes in the sample space that are contained in either A or B or both
Nadia purchased 400 shares of XYZ stock at $20 per share. When the stock decreased in value to $16 a share, Nadia purchased 600 more shares of XYZ stock. The weighted average price per share that Nadia paid for XYZ stock is $_____ (use 2 decimal places).
$17.60
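The weighted average on this card weights each purchase price by the number of shares bought at that price. A minimal Python sketch, with illustrative names:

```python
# (shares, price per share) for each purchase
purchases = [(400, 20.0), (600, 16.0)]

total_cost = sum(shares * price for shares, price in purchases)
total_shares = sum(shares for shares, _ in purchases)

# Weighted mean: each price counts in proportion to the shares bought at it.
weighted_avg_price = total_cost / total_shares
print(f"{weighted_avg_price:.2f}")
```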
a continuous random variable X follows the uniform distribution with a lower limit of a and an upper limit of b. The expected value of X is
(a+b)/2
Quartiles
(denoted Q1, Q2, Q3) are scale points that divided the sorted data into four groups of approximately equal size, that is, the 25th, 50th, and 75th percentile respectively
in order to approximate class width for a frequency distribution of quantitative data, we calculate:
(largest value - smallest value) / number of classes
5!=?
120 5x4x3x2x1
a pop has a mean of 50 and a SD of 10. A random sample of 144 is selected. The expected value of x̄ is equal to
50
A machine that is programmed to package 1.20 pounds of cereal is being tested for its accuracy. In a sample of 36 cereal boxes, the mean and standard deviation are calculated as 1.22 pounds and 0.06 pound, respectively -Calculate the critical value(s) at a 5% level of significance
+- 2.03 ± 0.001
In October 2010, Massachusetts enacted a law that forbids cell phone use by drivers under the age of 18. A policy analyst would like to determine whether the law has decreased the proportion of drivers under the age of 18 who use a cell phone -Test using the critical value approach with α = 0.05 Critical Value:
-1.64 ± .01
probability that Z is greater than 1.32
.0934
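The upper-tail probability on this card can be checked with Python's standard library instead of a z table. A brief sketch:

```python
from statistics import NormalDist

z = 1.32
# Upper-tail area: P(Z > z) = 1 - P(Z <= z) for a standard normal Z.
p_upper = 1 - NormalDist().cdf(z)
print(round(p_upper, 4))
```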
An apartment complex rents an average of 2.3 new units per week. If the number of apartments rented each week has a Poisson distribution, then the probability of renting exactly three apartments in a week is __________ [round your answer to 3 decimal places and do not enter a percentage]
.203 (e^-2.3)(2.3^3)/3! = 0.203
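The Poisson cards in this deck (λ = 2.3 with x = 3, and λ = 1.7 with x = 1) can be verified with a short Python sketch of the Poisson formula. The function name poisson_pmf is illustrative.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

p_apartments = poisson_pmf(3, 2.3)  # exactly three rentals in a week
p_customers = poisson_pmf(1, 1.7)   # exactly one arrival in a minute
print(round(p_apartments, 3), round(p_customers, 3))
```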
Harris Interactive for job site table
.2214 .3657 .587 | .2071 .2057 .413 | .4286 .5714 1.00
The average number of customers arriving at Jimmy's Burgers is 1.7 per minute. What is the probability that only 1 customer arrives in the next minute? ______________ (Round to three decimals and do not enter a percentage.)
.311 (1.7^1)(e^-1.7)/1! = .311
What is the probability that a randomly selected adult is neither overweight nor obese?
.312
Professor Stats has 40 students in her statistics class. 24 of the students are male. If she randomly selects 6 of her students, without replacement, the probability of selecting 4 men in the sample is _________ [round your answer to three decimal places]
.332
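Because the students are drawn without replacement from a finite class, this card is a hypergeometric calculation. A Python sketch using combinations; the function name hypergeom_pmf is illustrative.

```python
from math import comb

def hypergeom_pmf(x: int, N: int, s: int, n: int) -> float:
    """P(X = x): x successes in a sample of n, drawn without replacement
    from a population of N that contains s successes."""
    return comb(s, x) * comb(N - s, n - x) / comb(N, n)

# 40 students, 24 men, sample of 6, exactly 4 men
p_four_men = hypergeom_pmf(4, N=40, s=24, n=6)
print(round(p_four_men, 3))
```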
What is the probability that a randomly selected passenger car gets more than 35 mpg?
.3429
A researcher wants to determine if Americans are sleeping less than the recommended 7 hours of sleep on weekdays. -Calculate the p-value to test the researcher's claim at α = 0.01
0.0401 ± 0.004
What is the probability that the average mpg of 4 randomly selected passenger cars is more than 35 mpg?
0.1714 ± 0.01
random samples of 400 are taken from a population whose pop proportion is 0.25. The expected value of the sample proportion is
0.25
The probability of randomly selecting a "spade" from a deck of cards is
0.25 (because 4 suits in a deck)
new dining plan. 480 residents. 234: B 85: C 13: D 4: F proportion of grades designated as A were
0.30
A recent study by Allstate Insurance Co. finds that 82% of teenagers have used cell phones while driving -probability that the sample proportion is less than 0.80?
0.3015 ± 0.01
U.S. consumers are increasingly viewing debit cards as a convenient substitute for cash and checks. -A consumer advocate comments that the majority of consumers spend over $8,000 on a debit card. Find a flaw in this statement.
0.3372 ± .005
at a local diner, 30% order the special. What is the probability that exactly 1 of the next 5 order the special.
0.36
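The diner card is a binomial probability with n = 5 and π = 0.30. A minimal Python sketch; the function name binom_pmf is illustrative.

```python
from math import comb

def binom_pmf(x: int, n: int, pi: float) -> float:
    """P(X = x) for a binomial random variable with n trials and success probability pi."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

p_one_special = binom_pmf(1, n=5, pi=0.30)
print(round(p_one_special, 2))
```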
A recent study by Allstate Insurance Co. finds that 82% of teenagers have used cell phones while driving -probability that the sample proportion is within ± 0.02 of the population proportion
0.3970 ± 0.01
A list of the top twenty restaurants in Chicago was released. Four of the restaurants specialize in seafood. If five restaurants are selected randomly from the list, the expected value for the number of restaurants specializing in seafood is _______________.
1
An analyst has developed the following probability distribution of the rate of return for a common stock. Rate of return is
1%
an investment strategy has an expected return of 12% and a SD of 10%. If investment returns are normal, the probability of earning a return of less than 2%
16%
The Chartered Financial Analyst -confidence interval?
145,820 to 170,180
Mike is placing a bet on an upcoming horse race in which 7 horses are running. Mike places a trifecta bet that wins only if he correctly picks the first, second, and third place horses in order. There are _________ possible outcomes for the first three horses in the correct order.
210 [permutation] nPr = n!/(n-r)!
Mike is placing a bet on an upcoming horse race in which seven horses are running. Mike places a trifecta bet that wins only if he correctly picks the first, second, and third place horses in order. In how many different ways can Mike select three horses when order matters?
210 n!/(n-x)! = 7!/(7-3)! = 5040/24 = 210
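The trifecta count can be confirmed in Python, both from the factorial formula and with the built-in permutation function (math.perm, available in Python 3.8 and later).

```python
import math

n, x = 7, 3  # 7 horses, ordered picks for first, second, and third place

# Permutation formula: n! / (n - x)!
by_formula = math.factorial(n) // math.factorial(n - x)

# Same count using the standard library helper.
by_perm = math.perm(n, x)
print(by_formula, by_perm)
```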
In general, a data point is considered an outlier if it falls more than ________ standard deviations away from the average.
3
An article in the National Geographic News argues that Americans are increasingly skimping on their sleep. A researcher in a small Midwestern town wants to estimate the mean weekday sleep time of its adult residents. He takes a random sample of 80 adult residents and records their weekday mean sleep time as 6.4 hours. Assume that the population standard deviation is fairly stable at 1.8 hours. -Calculate a 95% confidence interval for the population mean weekday sleep time of all adult residents of this Midwestern town
6.01 ± 0.02 to 6.79 ± 0.02
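The interval on this card follows from x̄ ± z × σ/sqrt(n) with a known population standard deviation. A Python sketch with illustrative variable names:

```python
import math
from statistics import NormalDist

x_bar = 6.4   # sample mean sleep time (hours)
sigma = 1.8   # population standard deviation, assumed known
n = 80
confidence = 0.95

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
margin = z * sigma / math.sqrt(n)

lower, upper = x_bar - margin, x_bar + margin
print(round(lower, 2), round(upper, 2))
```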
20% of a restaurants customers order the chef's special. 230 customers are anticipated to dine at the restaurant tonight. The STANDARD DEVIATION for this binomial distribution is _____________ (round your answer to 3 decimal places)
6.066 Variance = 230 x 0.2 x 0.8 = 36.8; SD = square root of 36.8 = 6.066
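The binomial standard deviation, sqrt(n × π × (1 - π)), can be checked with a short Python sketch; the variable names are illustrative.

```python
import math

n = 230    # expected customers
pi = 0.20  # probability a customer orders the chef's special

mean = n * pi                  # expected number of specials ordered
variance = n * pi * (1 - pi)
sd = math.sqrt(variance)
print(round(mean, 1), round(variance, 1), round(sd, 3))
```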
local restaurant, 20% of customers order chefs special. Binomial random variable X is the number that order the special. If they expect 230 customers, then the SD for this is
6.1
for which data set would the arithmetic mean NOT be a good measure of central location
7,8,8,9,25
Platykurtic
A population that is flatter than normal
Bernoulli experiment
A random experiment that has only two outcomes.
when rolling a pair of dice and summing the two values rolled, which of the following are exhaustive events?
A value of 6 or more and a value of 8 or less A value of 7 or more and a value of 6 or less An even and an odd value
Sample correlation coefficient
A well-known statistic that describes the degree of linearity between paired observations on two quantitative variables X and Y
How does an ogive differ from a polygon?
An ogive is a graphical depiction of a cumulative frequency or cumulative relative frequency distribution, while a polygon is a graphical depiction of a frequency or relative frequency distribution
which are examples of a binomial experiment
Ask customers at a movie if they spent 20$ or more on concessions Asking randomly selected people whether they are a FB member
The conditional probability of A given B is calculated by dividing the probability of the intersection of A and B by the probability of
B
Subjective
Based on informed opinion or judgement
Which of the following BEST represents an empirical probability?
Based on past data, a manager believes there is a 70% chance of retaining an employee for at least one year. (not: the probability of tossing a head on a coin is 0.5) - this is an a priori probability
which are mutually exclusive
Being on time and being late for an appointment Receiving an A and receiving a B as a final grade
For hotels in NYC, a travel web site wants to provide information comparing hotel costs (high,average,low) versus the quality ranking of the hotel (excellent, good,fair,bad). A useful way to summarize this data is to construct a
Contingency table
Empirical approach / relative frequency approach
Counting the frequency of observed outcomes (f) defined in our experimental sample space and dividing by the number of observations (n). The estimated probability is f/n.
PDF - probability distribution function
Defined by either a list of X-values and their probabilities or by mathematical equations. A discrete PDF shows the probability of each X-value
stem and leaf diagrams can be used to
Determine how dispersed the data is. Analyze the shape of the data Observe individual data points
histograms can be used to
Determine the shape of the data. Observe the spread or the variability of the data
In October 2010, Massachusetts enacted a law that forbids cell phone use by drivers under the age of 18. A policy analyst would like to determine whether the law has decreased the proportion of drivers under the age of 18 who use a cell phone -Select the null and the alternative hypotheses to test the policy analyst's objective.
H0: p ≥ 0.82; HA: p < 0.82
A machine that is programmed to package 1.20 pounds of cereal is being tested for its accuracy. In a sample of 36 cereal boxes, the mean and standard deviation are calculated as 1.22 pounds and 0.06 pound, respectively. -Select the null and the alternative hypotheses to determine if the machine is working improperly, that is, it is either underfilling or overfilling the cereal boxes.
H0: µ = 1.20; HA: µ ≠ 1.20
the alternative hypothesis for a two sided test for a population mean would be denoted as
Ha: μ ≠ μ0
Classical
Known a priori by the nature of the experiment
Recognizing hyper-geometric application
Look for a finite population (N) containing a known number of successes (s) and sampling without replacement (n items in the sample) where the probability of success is not constant for each sample item drawn.
which about the MAD is most accurate
MAD is denominated in the same units as the original data
avg of the absolute differences between the values of the data set and the mean
MAD mean absolute deviation
Covariance
Measures the degree to which the values of X and Y change together.
Median
Middle value in sorted array =MEDIAN(data)
Mode
Most frequently occurring data value. =MODE.SNGL(data)
Are the events "IT Professional" and "Slept on the Job" independent?
No because P("IT" | "Yes") ≠ P("IT").
Assume the sample space S={win,loss}. Select which fulfills the requirements of the definition of probability.
P({win})=0.8, P({loss})=0.2 (equal to 1)
Excel quartiles
Q1: =QUARTILE.EXC(Data, 1) (25% below) Q2: =QUARTILE.EXC(Data, 2) (50% below) Q3: =QUARTILE.EXC(Data, 3) (75% below)
A probability based on personal judgment rather than on observation or logical analysis is best referred to as a
Subjective probability (review correlated, empirical, and a priori, prob)
Population variance (sigma squared)
Sum of the squared deviations from the mean divided by the population size.
CDF - cumulative distribution function
The CDF shows the cumulative sum of probabilities, adding from the smallest to the largest X-value.
Intersection
The event consisting of all outcomes in the sample space S that are contained in both event A and B.
Interquartile range
The first and third quartiles Q1 and Q3 indicate the center because they define the boundaries for the middle 50% of the data, but Q1 and Q3 also indicate variability because the interquartile range Q3-Q1 (IQR) measures the degree of spread in the data (middle 50%)
In October 2010, Massachusetts enacted a law that forbids cell phone use by drivers under the age of 18. A policy analyst would like to determine whether the law has decreased the proportion of drivers under the age of 18 who use a cell phone -What is the conclusion
The law has been effective since the value of the test statistic falls in the rejection region.
Symmetric data
The mean and median are about the same. Tails of the histogram are balanced (low/high values offset) mean [almost equals] median
Skewed right (positively skewed)
The mean exceeds the median. Long tail of histogram points right (most data on left but a few high values) Mean>Median
Skewed left (negatively skewed)
The mean is below the median. Long tail of histogram points left (a few low values but most data on right) Mean<median
Kurtosis
The relative length of the tails and the degree of concentration in the center
Exercise 1-5 Recent research suggests that depression significantly increases the risk of developing dementia later in life (BBC News, July 6, 2010). In a study involving 949 elderly persons, it was reported that 22% of those who had depression went on to develop dementia, compared to only 17% of those who did not have depression.
The sample consists of 949 elderly people The population is all elderly people The numbers 22% and 17% represent the sample statistics
sample space
The set of all possible outcomes (S) for the experiment
Coefficient of variation
The standard deviation expressed as a percent of the mean.
Mean
The statistical measure of center. Sum of the data values divided by the number of data items. =AVERAGE(data)
True or false: E(X) = μ
True
classical approach
Using deduction to determine probability
a type 1 error is commonly denoted as
α (alpha)
bar chart, which is most accurate
a bar chart is a useful graphical tool for qualitative data
Binomial distribution
a binomial random variable X is the sum of n independent Bernoulli random variables
Tree diagrams
a diagram used to display events and probabilities to help visualize all possible outcomes.
which is most accurate
a parameter is a constant although its value may be unknown
Which of the following are characteristics of a binomial distribution? a. Each trial is independent of the previous trial b. Each trial has only two possible outcomes c. It is a continuous distribution d. For each trial the probability of success remains the same
a. Each trial is independent of the previous trial and b. Each trial has only two possible outcomes and d. For each trial the probability of success remains the same
Consider rolling two dice. Which of the following describe two events that are collectively exhaustive? a. Event 1 A value of 7 or more. Event 2: A value of 6 or less b. Event 1: A value of 9 or more. Event 2: A value of 7 or less c. Event 1: A value of 6 or more. Event 2: A value of 8 or less d. Event 1: rolling an even number. Event 2: Rolling an odd number
a. Event 1: A value of 7 or more. Event 2: A value of 6 or less [Their union included all possible outcomes of rolling two dice] and c. Event 1: A value of 6 or more. Event 2: A value of 8 or less [The union includes all possible outcomes of rolling two dice] and d. Event 1: rolling an even number. Event 2: Rolling an odd number [Their union includes all possible outcomes of rolling two dice]
Which expression(s) below would be correct for calculating the probability that no more than 2 of 4 patients have health insurance? [check all that apply] a. P(X ≤ 2) b. P(X < 2) c. P(X = 0)+P(X = 1)+P(X = 2) d. P(X = 2)+P(X = 3)+P(X = 4)
a. P(X ≤ 2) and c. P(X = 0)+P(X = 1)+P(X = 2)
In which of the following ways are binomial distributions and hyper-geometric distributions similar? [check all that apply] a. They both have two possible outcomes: success or failure b. They are both discrete distributions c. They both assume each trial is independent of each other. d. The probability of success is constant in each trial.
a. They both have two possible outcomes: success or failure and b. They are both discrete distributions
The binomial distribution can be approximated using the Poisson [check all that apply] a. When n is large and π is less than or equal to .05 b. if n < 10 and π > .5 c. by letting λ = nπ d. when n is small and π is large
a. When n is large and π is less than or equal to .05 and c. by letting λ = nπ
Generally, a negative covariance between market demands indicates [check all that apply] a. a centralized distribution will decrease the aggregate demand variance b. a centralized distribution center can reduce the need for safety stock c. a centralized distribution center will decrease the aggregate average demand d. a centralized distribution center will increase the need for safety stock.
a. a centralized distribution will decrease the aggregate demand variance and b. a centralized distribution center can reduce the need for safety stock
Which of the following are characteristics of the geometric distribution? [check all that apply] a. counts the number of trials until the first success b. the mean is 2/π c. the probability of success is not constant d. a series of Bernoulli trials
a. counts the number of trials until the first success and d. a series of Bernoulli trials
A binomial distribution is skewed left when the probability of success is a. greater than .5 b. less than .5 c. equal to .5
a. greater than .5
Discrete random variables: [check all that apply] a. have a set of distinct values b. are countable c. can have a finite or infinite number of values d. must have a clear upper limit
a. have a set of distinct values and b. are countable and c. can have a finite or infinite number of values
The probability of State College winning a football game is 0.6. The probability of University of State winning a football game is 0.65. Given that State College has won its football game, the probability of University of State winning its game is 0.65. The teams are not playing each other. The events State College winning and University of State winning are: a. independent b. mutually exclusive c. dependent
a. independent
The probability that a customer will purchase a product is 0.15. The probability that a customer is male is 0.5. The probability that a customer is a male and will purchase a product is 0.075. The events purchasing a product and being male are. a. independent b. dependent c. mutually exclusive
a. independent
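The independence check used in this card is the product rule: A and B are independent when P(A and B) = P(A) × P(B). A minimal sketch, with the hypothetical helper name `are_independent` and a small tolerance for floating-point rounding:

```python
from math import isclose

def are_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Return True when P(A and B) equals P(A) * P(B) within a tolerance."""
    return isclose(p_a_and_b, p_a * p_b, abs_tol=tol)

# Values from the card: P(purchase) = 0.15, P(male) = 0.5, P(both) = 0.075
print(are_independent(0.15, 0.5, 0.075))
```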
The interquartile range of a data set: (check all that apply) a. is calculated by subtracting the first quartile from the third quartile b. represents the middle 75% of the data c. is the range between the first and fourth quartiles d. represents the middle 50% of the data
a. is calculated by subtracting the first quartile from the third quartile and d. represents the middle 50% of the data
Which of the following criteria indicate it would be acceptable to approximate the hyper-geometric with a binomial? a. n/N < .05 b. n/N > .05 c. n/N < .10
a. n/N < .05
When estimating sigma using the following formula: (Xmax - Xmin)/6, one is assuming the distribution is: a. normal b. highly skewed c. Bimodal d. No assumption on the distribution
a. normal
The box plot is constructed using several different values. Which of the following values from a data set are included in a box plot? a. the largest value b. the mean c. the mode d. the first quartile e. the fifth quartile
a. the largest value and d. the first quartile
A box plot is constructed using several different values. Which of the following values are included in a box plot? (select all that apply) a. the smallest value b. the standard deviation c. the 90th percentile d. the second quartile e. the third quartile
a. the smallest value and d. the second quartile and e. the third quartile
Define event A = {1, 2, 3, 4} and event B = {2, 3, 6, 7}. A or B = a. {1, 2, 3, 4, 6, 7} b. 5 c. {1, 4, 6, 7} d. {2, 3}
a. {1, 2, 3, 4, 6, 7}
if the population from which the sample is drawn is normally distributed, then the sampling distribution of the sample mean is
always normally distributed
the conclusions of a hypothesis test that are drawn from the p-value approach versus the critical value approach are
always the same
weakness of the ordinal scale
an inability to measure differences between the ranked values
a continuous random variable X can assume
an infinite number of values over some interval
discrete probability distributions
assign a probability to each value of a discrete random variable X.
The following relative frequencies help determine if residents' location affects whether the residents get season passes to the pool on the west side. E W Total Pass 0.3 0.45 0.75 None 0.15 0.10 0.25 Total 0.45 0.55 1.0 a. 0.55 b. 0.1 c. 0.15 d. 0.45
b. 0.1
Calculate the joint probability that a country club member plays tennis and golf. Ten. No Total G. 180 0 0 no 0 50 100 Totals 230 400 a. 0.25 b. 0.45 c. 0.125 d. 0.575
b. 0.45
A company receives an average of 0.8 purchase orders per minute. The company wants to determine the probability of receiving 6 purchase orders in 5 minutes. What value should the company use for the mean to help calculate the probability? a. 6 b. 4 c. 5 d. 0.8
b. 4
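The mean is the per-minute rate scaled to the 5-minute window: 0.8 × 5 = 4. A short sketch of the follow-on Poisson calculation, assuming the standard Poisson formula P(X = x) = λ^x e^(-λ) / x! and the hypothetical helper name `poisson_pmf`:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Poisson probability of exactly x events when the mean is lam."""
    return lam**x * exp(-lam) / factorial(x)

rate_per_minute = 0.8
minutes = 5
lam = rate_per_minute * minutes  # mean number of orders in 5 minutes

print(round(lam, 2), round(poisson_pmf(6, lam), 4))
```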
Consider rolling two dice. Which of the following describe two events that are collectively exhaustive? [select all that apply] a. Event 1: A value of 9 or more. Event 2: A value of 7 or less. b. Event 1: rolling an even number. Event 2: Rolling an odd number c. Event 1: A value of 7 or more. Event 2: A value of 6 or less. d. Event 1: A value of 6 or more. Event 2: A value of 8 or less
b. Event 1: rolling an even number. Event 2: Rolling an odd number and c. Event 1: A value of 7 or more. Event 2: A value of 6 or less. and d. Event 1: A value of 6 or more. Event 2: A value of 8 or less
The following contingency table can help determine if residents' location is independent of whether or not the resident purchases a season pass to the pool. The pool is located on the West side. East West Total Pass 35 50 85 None 25 10 35 60 60 120 a. P(pass and East) and P(none) b. P(Pass and East)P(pass) c. P(pass and West)P(pass) d. P(pass and West) and P(East)
b. P(Pass and East)P(pass) and c. P(pass and West)P(pass)
Which of the following experiments is likely to produce a uniform discrete distribution? a. Test scores on college entrance exam, such as the ACT or SAT b. The answer selected on a multiple choice question that has four choices by a student who did not study c. The values that occur from repeated spins of a roulette wheel at a casino d. The number of patrons arriving every 3 minutes at a sandwich shop.
b. The answer selected on a multiple choice question that has four choices by a student who did not study [the student would be guessing so each answer is equally likely.] and c. The values that occur from repeated spins of a roulette wheel at a casino
In which of the following situations is it appropriate to use a Poisson process? [check all that apply] a. The number of people out of 20 who prefer Pepsi-Cola to Coca-Cola in a blind taste test b. The number of customers who purchase concessions every 5 minutes, while a movie is playing at a theater c. Twenty firms are randomly selected from the S&P 500 and asked whether they will increase hiring over the next year. d. The number of landfills per county in the state of Texas.
b. The number of customers who purchase concessions every 5 minutes, while a movie is playing at a theater and d. The number of landfills per county in the state of Texas.
Which of the following should one look for when identifying a hyper-geometric application? [check all that apply] a. sampling with replacement b. a known number of successes, s c. a finite population, N d. probability of success is constant
b. a known number of successes, s and c. a finite population, N
When calculating a percentile, the first step is to arrange the data set in: a. groups of ten units b. ascending order (from least to greatest) c. descending order (from greatest to least) d. classes of equal width
b. ascending order (from least to greatest)
The first step to determine the median is to a. find the average of the data set b. place the data in numerical order c. find the range of the data set
b. place the data in numerical order
Contingency tables are useful to analyze [check all that apply] a. one quantitative variable b. relative frequencies c. the results of a survey d. one qualitative variable
b. relative frequencies and c. the results of a survey
The range is the difference between a. the first and third quartiles b. the largest and smallest values c. the mean and the median
b. the largest and smallest values
A box plot is constructed using several different values. Which of the following values from a data set are included in a box plot? a. the fifth quartile b. the largest value c. the first quartile d. the mean e. the mode
b. the largest value and c. the first quartile
One common way to describe a Poisson process is a. the model of departure times. b. the model of arrivals c. the model of dependent events
b. the model of arrivals
Which of the items describes the usefulness of a standard deviation? a. to compare variables with different units of measure b. to gauge the relative position of data values within the data set c. to help estimate the mean d. to determine the median of the data set
b. to gauge the relative position of data values within the data set
one method of graphical presentation for qualitative data is
bar chart
which best represents an empirical probability
based on past data, a manager believes there is a 70% chance of retaining an employee for at least one year
a left tailed test of the population mean is conducted at α = 0.10. The calculated test statistic is z = -1.55 and P(Z < -1.55) = 0.0606. The null should
be rejected since the p-value=0.0606<0.10
the graph depicting the normal probability density function is
bell shaped
probability that a discrete random variable X assumes a particular value x is
between 0 and 1
if a sample statistic consistently over or under estimates a population parameter, then there is ____
bias
if 230 out of 400 country club members play tennis, the value 230 should be entered in the contingency table in the cell with the letter
c
The probability of randomly selecting a "spade" from a deck of cards is a. 0.33 b. 0.019 c. 0.25 d. 0.077
c. 0.25
Which of the following can be used to determine the proportion of data points that fall within a specified number of standard deviations from the mean? a. The mode b. percentiles c. Chebyshev's Theorem d. The empirical rule - assuming a normal distribution
c. Chebyshev's Theorem and d. The empirical rule - assuming a normal distribution
Which of the following events are mutually exclusive? a. Being of German descent and being of Mexican descent. b. Passing a statistics test and passing an English test c. Rolling an odd number and an even number on the same roll of the die. d. Being on time and being late for an appointment.
c. Rolling an odd number and an even number on the same roll of the die. and d. Being on time and being late for an appointment.
the alternative hypothesis
contests the status quo for which a corrective action may be required
Choose the logical binomial random variable a. The number of late shipments for two different companies b. The number of late shipments and missing shipments out of the next 12 sent c. The number of late shipments out of the next 12 sent by one company d. The number of late shipments over the next month
c. The number of late shipments out of the next 12 sent by one company
A travel web site wants to provide information comparing hotel costs versus the quality ranking of the hotel for hotels in New York City. One way to summarize this data would be a. a histogram b. a frequency distribution c. a contingency table
c. a contingency table
Which of the following should one look for when identifying a hyper-geometric application? [check all that apply] a. probability of success is constant b. sampling with replacement c. a known number of successes s d. a finite population N
c. a known number of successes s and d. a finite population N
When comparing two data sets with different units of measurement, what is the relative measure of dispersion? a. the standard deviation b. the range c. the coefficient of variation
c. the coefficient of variation
The addition rule is used to calculate a. the conditional probability of two events b. The intersection of two events c. the union of two events d. the independence of two events
c. the union of two events
which is not a step we use when formulating the null and alternative hypotheses
calculate the value of the sample statistic
Redundancy
can increase system reliability even when individual component reliability is low
for hotels in NYC, a site wants to provide info comparing hotel costs versus the quality ranking of the hotel. A useful way to summarize this data is to construct a(n)
contingency table
a random variable X with an equally likely chance of assuming any value within a specified range is said to have which distribution
continuous uniform distribution
suppose you were told that the delivery time of your new washing machine is equally likely over the time period 9am-12. If we define the random variable X as delivery time, then X follows the
continuous uniform distribution
the two equivalent methods to solve a hypothesis test are the
critical value approach and p-value approach
data collected about many subjects at the same point in time or without regards to differences in time is known as ___ data
cross-sectional
contingency table
cross-tabulations of frequencies
for a continuous random variable X, the function used to find the area under f(x) up to any value x is called the
cumulative distribution function
when a researcher examines quantitative data and wants to know the number of observations that fall below the upper limit of a particular class, the researcher is best served by creating a ____
cumulative frequency distribution
Compound events
cumulative probabilities can be evaluated by summing individual X probabilities.
Calculate the marginal probability that a country club member plays tennis. T No T Totals G 180 No G 50 100 Totals 230 400 a. 0.25 b. 0.125 c. 0.45 d. 0.575
d. 0.575 230/400 = 0.575
Calculate the conditional probability that a country club member plays golf given that they play tennis. T No T Total G 180 No G 50 100 Total 230 400 a. 0.5 b. 0.45 c. 0.125 d. 0.783
d. 0.783
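The three country club cards (joint, marginal, conditional) all use the same table counts. A minimal sketch that derives each probability from the counts that survive in the flattened table (180 members play both, 230 play tennis, 400 members in total); the variable names are illustrative:

```python
# Counts recovered from the country club table
golf_and_tennis = 180   # plays golf and tennis
tennis_total = 230      # plays tennis (column total)
members_total = 400     # all members (grand total)

# Joint: a cell count divided by the grand total
joint = golf_and_tennis / members_total

# Marginal: a row or column total divided by the grand total
marginal_tennis = tennis_total / members_total

# Conditional: a cell count divided by the total of the given event
golf_given_tennis = golf_and_tennis / tennis_total

print(round(joint, 3), round(marginal_tennis, 3), round(golf_given_tennis, 3))
```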
suppose the competing hypotheses for a test are Ho: u <= 10 vs Ha: u > 10. If the value of the test statistic is 1.90 and the CV at the 1% sig level is z0.01 = 2.33, then the correct conclusion is
do not reject Ho and conclude that the pop mean does not appear to be greater than 10 at the 1% sig level
total area under the normal curve is
equal to 1
a particular value of an estimator is called an
estimate
all are conditions of the binomial experiment (bernoulli process) except
for each trial, the probability of success equals the probability of failure
Method of medians
for small data sets, you can find quartiles this way: 1. sort the observations 2. find the median Q2 3. find the median of the data values that lie below Q2 (this is Q1) 4. find the median of the data values that lie above Q2 (this is Q3)
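The method of medians can be sketched in a few lines. The function name `quartiles_by_medians` is hypothetical, and the sketch assumes that when the number of observations is odd, the median itself is left out of both halves (one common reading of "values that lie below/above Q2").

```python
from statistics import median

def quartiles_by_medians(data):
    """Return (Q1, Q2, Q3) using the method of medians.

    Assumes at least two observations. For an odd count the middle
    value is excluded from both the lower and upper halves.
    """
    xs = sorted(data)                 # step 1: sort the observations
    n = len(xs)
    half = n // 2
    q2 = median(xs)                   # step 2: overall median
    lower = xs[:half]                 # values below Q2
    upper = xs[half + 1:] if n % 2 else xs[half:]  # values above Q2
    return median(lower), q2, median(upper)

print(quartiles_by_medians([10, 6, 4, 9, 5]))
```

Software packages (including Excel's quartile functions) use interpolation rules that can give slightly different quartiles for the same data.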
a ____ is a way to organize qualitative data into categories and record the number of observations in each category
frequency distribution
to summarize qualitative data, a useful tool is a
frequency distribution
random variable
function or rule that assigns a numerical value to each outcome in the sample space of a random experiment
The _______ mean is the multiplicative average of the data set.
geometric
The ____________ mean is the appropriate measure to use when evaluating growth rates.
geometric
in descriptive stats, a polygon is a
graph that plots the midpoints of each class of a frequency distribution
there are several guidelines to follow when constructing graphs that summarizes statistical data. Which statement is LEAST accurate
graphs should have a lot of adornments
Unusual (z-score classification)
if |zi| > 2 (beyond μ ± 2σ)
actuarially fair
insurance program must collect as much overall revenue as it pays out in claims. Premiums must be set to reflect empirical experience with the insured group.
The _____________ of two events A and B contains only those outcomes that are in both A and B.
intersection
which scales of data measurement are associated with quantitative data
interval and ratio
event
is any subset of outcomes in the sample space
a continuous random variable has the uniform distribution on the interval [a,b] if its probability density function f(x)
is constant for all x between a and b, and 0 otherwise
a compound event
is expressed using an inequality.
Variance Var(X) of a discrete random variable
is the sum of the squared deviations about its expected value, weighted by the probability of each X-value.
Weighted mean
is a sum that assigns each data value xj a weight wj that represents a fraction of the total (i.e., the k weights must sum to 1)
which best describes a frequency distribution for qualitative data
it groups data into categories, and records the number of observations in each category
The measure of center where half the values of the data set lie above this measure and half the values of the data set lie below this measure is known as the
median
measure of central location where half the values of the data set lie above this measure and half the values of the data set lie below this measure is known as the
median
the ___ is the best measure of central location when outliers are present
median
Bayes Theorem
method of revising probabilities to reflect new information. Prior (unconditional) probability of an event B is revised after event A has occurred to yield a posterior (conditional) probability.
The _________ is the measure of the center that identifies the most frequently occurring value in the data set.
mode
hyper-geometric distribution
similar to the binomial except that sampling is without replacement from a finite population of N items. Therefore the trials are not independent and the probability of success is not constant from trial to trial.
the variance of x-, which is equal to σ^2/n, is
smaller than the variance of the individual observation σ^2
put the following steps in the p-value approach to hypothesis testing in the correct order.
1) specify the null and alternative hypotheses 2) specify the significance level 3) calculate the value of the test statistic and its p-value 4) state the conclusion and interpret results
standard deviation of p- equals
√(p(1-p)/n)
a continuous random variable X follows the uniform distribution with a lower limit of a and an upper limit of b. The __ of X is calculated using the formula √((b-a)^2 / 12)
standard deviation
General law of addition
the probability of the union of two events A and B is the sum of their probabilities less the probability of their intersection
which of the following is an example of a conditional probability
the probability that Lisa purchases groceries, given that Neil has already purchased groceries
the z table provides the cumulative probabilities for a given z. What does "cumulative probabilities" mean
the probability that Z is less than or equal to a given z value
When a data set is symmetrical the mean and the median are approximately
the same
for an alternative hypothesis of Ha: u > uo, we might possibly reject the null hypothesis if
the sample mean is greater than uo
the central limit theorem states that the distribution of the sample mean will be approximately normal if
the sample size is sufficiently large; as a general guideline n>= 30
T/F: we choose a value for α before conducting a hypothesis test
true
Box plots
useful for EDA (exploratory data analysis). A box plot shows center (position of the median Q2), variability (width of the box defined by Q1 and Q3, and the range between xmin and xmax), and shape.
which is likely to produce a discrete uniform distribution
values from repeated wheel spins at a casino
The dean of the business school at a local university categorizes students by major (i.e., accounting, finance, marketing, etc.) to help in determining class offerings in the future.
nominal
Mesokurtic
normal bell shaped population
which is an example of a continuous random variable?
normal random variable
most accurate
normally distributed, 95% of data will fall within 2 SDs of the mean
a two tailed test of the population mean is conducted at α = 0.10. The calculated test statistic is z = 1.55 and P(Z ≥ 1.55) = 0.0606. The null should
not be rejected since the p-value = 0.1212 > 0.10
when testing u, the p-value is the probability of obtaining a sample mean at least as large or at least as small as the one derived from a given sample, assuming the ___ hypothesis is true
null
the p-value is calculated assuming the
null hypothesis is true
when performing a hypothesis test on u, the p-value is defined as the
observed probability of making a type 1 error
we use sample data because
obtaining data from the population is often expensive
which of the following graphical depictions displays cumulative data
ogive
how does an ogive differ from a polygon?
ogive is a graph of a cumulative (relative) frequency distribution, while a polygon is a graph of a (relative) frequency distribution
A researcher wants to determine if Americans are sleeping less than the recommended 7 hours of sleep on weekdays. -Calculate the value of the test statistic
-1.75 ± 0.04
A researcher wants to determine if Americans are sleeping less than the recommended 7 hours of sleep on weekdays. -Calculate the critical value at α = 0.01.
-2.33 ± 0.01
Which of the following events are mutually exclusive? [either this happens or that happens, never both]
-Receiving an 'A' and receiving a 'B' as a final grade in an Accounting class. -Being on time and being late for an appointment Not: -Being of German descent and being of Mexican descent. -Passing a stats test and passing an English test.
Recognizing a Poisson application
-events occur randomly over time or space -average arrival rate remains constant -arrivals are independent of each other -The random variable X is the number of events within an observed time interval
suppose you are performing a hypothesis test on u and the value of σ is known. At the 10% sig level, the critical value(s) for a left tailed test is (are)
-z0.10
The following table represents the number of cartons of milk that children at Hoover-Wood Elementary school purchase for lunch. (Milk: Probability) 0: .2, 1: .7, 2: ___, 3+: .02. The probability that a student purchases 2 milk cartons is ____________.
.08
cartons of milk that the children at Hoover-wood school purchase for lunch The probability that a student purchases 2 milk cartons is
.08
If a randomly selected worker slept on the job, what is the probability that he/she is an IT professional?
.3772
What is the probability that a randomly selected worker is an IT professional?
.4286
If a randomly selected worker is a government professional, what is the probability that he/she slept on the job?
.64
What is the probability that a randomly selected adult is either overweight or obese?
.688
A company received an average of .64 purchase orders per minute. Assuming a Poisson distribution for the number of purchase order per minute, what is the standard deviation for this distribution? a. 1.28 b. 2 c. 0.8 d. 0.41
.8 [The standard deviation is the square root of the mean in a Poisson distribution: √0.64 = 0.8.]
in a particular industry, it is known that 82% of companies ship their products by truck and 47% of companies ship their product by rail. 40% of companies ship by truck and rail. The probability that a company ships by truck or rail is
.89
Loans that are 60 days or more past due are considered seriously delinquent. The Mortgage Bankers Association reported that the rate of seriously delinquent loans has an average of 9.1% (The Wall Street Journal, August 26, 2010). Let the rate of seriously delinquent loans follow a normal distribution with a standard deviation of 0.80%. What is the probability that the rate is above 8%? Between 9.5% and 10.5%?
0.9162 ± 0.005; 0.2684 ± 0.005
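Both answers can be reproduced without a z table. The sketch below uses Python's standard-library `statistics.NormalDist` with the card's mean of 9.1 and standard deviation of 0.80; small differences from the table answer come from rounding z to two decimals.

```python
from statistics import NormalDist

# Rate of seriously delinquent loans, in percent (values from the card)
rate = NormalDist(mu=9.1, sigma=0.80)

# P(X > 8): complement of the cumulative probability at 8
p_above_8 = 1 - rate.cdf(8.0)

# P(9.5 < X < 10.5): difference of two cumulative probabilities
p_between = rate.cdf(10.5) - rate.cdf(9.5)

print(round(p_above_8, 4), round(p_between, 4))
```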
find probability Z is greater than -2.22
.9868
for a discrete probability distribution, the probability of each value x is
0 ≤ P(X = x) ≤ 1
Center for Studying Health System Change At least seven will delay or go without medical care
0.0001 ± 0.001
XYZ Corp. has filled 100,000 purchase orders during its existence. 1,100 of the purchase orders have had errors. Using empirical probability, the probability of the next purchase order having an error is __________ (round your answer to three decimal places and enter as a probability not a percentage).
0.011 1100/100,000 = 0.011
The screening process for detecting a rare disease is not perfect. Researchers have developed a blood test that is considered fairly reliable. It gives a positive reaction in 98% of the people who have that disease. However, it erroneously gives a positive reaction in 3% of the people who do not have the disease. Answer the following questions using the null hypothesis as "the individual does not have the disease." Type 2 error prob
0.02
The screening process for detecting a rare disease is not perfect. Researchers have developed a blood test that is considered fairly reliable. It gives a positive reaction in 98% of the people who have that disease. However, it erroneously gives a positive reaction in 3% of the people who do not have the disease. Answer the following questions using the null hypothesis as "the individual does not have the disease." -Type 1 error
0.03
The High Roller Casino puts the odds of a certain baseball team winning the World Series at 1 to 30 (1:30). Based on those odds, what is the probability that this baseball team will win the WS?
0.032 (odds of 1 to 30 for the event mean P = 1/(1 + 30) = 1/31 ≈ 0.032)
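Converting between odds and probability is a one-line formula in each direction. A minimal sketch with hypothetical helper names: odds of a to b in favor of an event give P = a / (a + b), and the odds against an event with probability p are (1 - p) / p to 1.

```python
def odds_for_to_probability(a, b):
    """Odds of a to b in favor of an event -> probability of the event."""
    return a / (a + b)

def probability_to_odds_against(p):
    """Probability p of an event -> odds against it, expressed as 'x to 1'."""
    return (1 - p) / p

# Casino card: odds of 1 to 30 that the team wins
print(round(odds_for_to_probability(1, 30), 3))

# Festival ticket card: 21% of requests are filled
print(round(probability_to_odds_against(0.21), 2))
```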
A recent study by Allstate Insurance Co. finds that 82% of teenagers have used cell phones while driving -standard error
0.0384 ± 0.0013
data set has a pop SD of 4 units and a pop mean of 10 units, the coeff of var is
0.4
randall racer runs 100 m dash in an average 10.4 sec with SD of 0.1 sec. Bell shaped, what proportion of his time will fall between 10.3 and 10.5 secs
0.68
The probability of a customer purchasing popcorn at the movie theater is 0.3. What is the probability that a customer does not purchase popcorn?
0.7
prob that a customer orders popcorn is 0.4. The prob that the order a drink is 0.65. The prob that they order popcorn and a drink is 0.3. If they have already ordered popcorn, what is the prob that they will order a drink?
0.75
The probability that Anthony is on time for work is 0.90. The probability that Anthony takes the train to work is 0.80. Given that Anthony takes the train to work, the probability that he is on time is 0.95. The probability that Anthony is on time for work and takes the train is
0.76 P(on time ∩ train) = P(on time | train) x P(train) = 0.95 x 0.80 = 0.76
Center for Studying Health System Change No more than two will delay or go without medical care?
0.7969 ± 0.001
Kareem is trying to decide which college to attend full time next year. Kareem believes there is a 55% chance that he will attend State College and a 33% chance that he will attend Northern University. The probability that Kareem will attend either State or Northern is _________ [state your answer as a decimal and round your answer to two decimal places.]
0.88
In a particular industry, it is known that 82% of companies ship their products by truck and 47% of companies ship their product by rail. Forty percent of companies ship by truck and rail. The probability that a company ships by truck or rail is
0.89 0.82 + 0.47 - 0.40 = 0.89
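The last several cards apply three basic probability rules. The sketch below collects them in one place using the numbers from those cards; the function names are illustrative.

```python
def union(p_a, p_b, p_a_and_b):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

def conditional(p_a_and_b, p_b):
    """Conditional probability: P(A | B) = P(A and B) / P(B)."""
    return p_a_and_b / p_b

def joint(p_a_given_b, p_b):
    """Multiplication rule: P(A and B) = P(A | B) * P(B)."""
    return p_a_given_b * p_b

print(round(union(0.82, 0.47, 0.40), 2))   # truck or rail
print(round(conditional(0.3, 0.4), 2))     # drink given popcorn
print(round(joint(0.95, 0.80), 2))         # on time and takes the train
```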
in order: steps to calculate the mean absolute deviation
1) calc the arithmetic mean 2) find the absolute diff between each value and the mean 3) sum the absolute differences 4) divide by the sample (or the population) size
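The four steps translate directly into code. A minimal sketch, using the hypothetical function name `mean_absolute_deviation` and the data set 2, 5, 5, 7, 10 that appears on another card:

```python
def mean_absolute_deviation(data):
    """Mean absolute deviation, following the four steps in order."""
    n = len(data)
    mean = sum(data) / n                        # 1) arithmetic mean
    abs_diffs = [abs(x - mean) for x in data]   # 2) absolute differences
    total = sum(abs_diffs)                      # 3) sum them
    return total / n                            # 4) divide by the size

print(round(mean_absolute_deviation([2, 5, 5, 7, 10]), 2))
```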
Match the following terms with their meaning: 1. Mesokurtic 2. Platykurtic 3. Leptokurtic ___ A flatter distribution than normal with heavier tails ___ Normal bell-shaped distribution ___ A sharply peaked distribution with thinner tails
1. Mesokurtic is a normal bell-shaped distribution 2. Platykurtic is a flatter distribution than normal with heavier tails 3. Leptokurtic is a sharply peaked distribution with thinner tails
find z value that satisfies P(Z>z) = 0.0951
1.31
z value that satisfies P(Z ≤ z) = 0.9207
1.41
Center for Studying Health System Change Expected number of individuals who will delay or go without medical care?
1.6 ± 1
zα/2 for 90%
1.65
exam to 50 students. High was 98, low was 48. Frequency distribution is divided into 5 classes. Class width for the data is
10 points
a population has a mean of 100 and a SD of 10. A random sample of 25 is selected. The expected value of x- is equal to
100
XYZ. 1% of 100,000 widgets are defective. Expected value is
1000
A random variable X has μ = 25 and σ = 5. A new random variable Y = 5X. The mean of Y is ____________.
125 5(25) = 125
The maximum value of a data set is 200 and the minimum value is 80. The midrange is equal to ____________.
140
A company sold 1000 units in its first year of operation, 1400 units in its second year of operation, and 1680 units in the third year of operation. The average growth rate of the company's sale for years one to three is ______%. (round your final answer to a decimal answer with four places and then convert to % with 2 decimals)
29.61% square root of (1680/1000) - 1 = 0.2961
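The average growth rate is a geometric mean: with n periods of growth between the first and last value, the rate is (last / first)^(1/n) - 1. Three years of sales contain two growth periods, which is why the card takes a square root. A short sketch with the hypothetical helper `average_growth_rate`:

```python
def average_growth_rate(first, last, periods):
    """Geometric-mean growth rate over the given number of growth periods."""
    return (last / first) ** (1 / periods) - 1

# 1000 units in year one, 1680 units in year three: two growth periods
rate = average_growth_rate(1000, 1680, 2)
print(f"{rate:.4f} -> {rate:.2%}")
```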
A festival has become so popular that it must limit the number of tickets it issues. People who hope to attend the festival send in a request for tickets, and requests are filled by random selection. Only 21% of the ticket requests are fulfilled. The odds that a random applicant does not receive a ticket are
3.76 to 1. The odds against A occurring equal (1 - P(A))/P(A) = 0.79/0.21 ≈ 3.76
Suppose a data set has 80 data points. A 5% trimmed mean would be calculated by removing the ___________ highest values and the __________ lowest values.
4 4
20% of a restaurant's customers order the chef's special. 230 customers are anticipated to dine tonight at the restaurant. The expected number of chef's specials that will be ordered tonight is ______________.
46 230 x 0.2 = 46
Suppose 20% of a business's employees commute by bus. How many employees will have to be sampled in order to find the first employee who commutes by bus?
5 1/.2 = 5
If the median price for a home is $200,000, then _______% of homes cost less than $200,000.
50
For the data set 4,5,6, and 9 the arithmetic mean is
6
The median for the data set 10, 6, 4, 9, 5 is
6
median for data set 10,6,4,9,5
6
U.S. consumers are increasingly viewing debit cards as a convenient substitute for cash and checks. -What is the interquartile range of this distribution
670 ± 5%
U.S. consumers are increasingly viewing debit cards as a convenient substitute for cash and checks. -25th percentile of the amount spent on a debit card
7,455 ± 2.5%
A data set has a mean of 1500 and a standard deviation of 100. Using Chebyshev's theorem, what percentage of the observations fall between: 1300 and 1700? 1100 and 1900?
75% 94%
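Chebyshev's theorem guarantees that at least 1 - 1/k² of the observations lie within k standard deviations of the mean, for any k > 1 and any distribution shape. A minimal sketch for this card's data set (mean 1500, standard deviation 100); the function name is illustrative:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2

mean, sd = 1500, 100
for low, high in [(1300, 1700), (1100, 1900)]:
    k = (high - mean) / sd  # both intervals are symmetric about the mean
    print(low, high, f"at least {chebyshev_lower_bound(k):.2%}")
```

The second interval gives 93.75%, which the card rounds to 94%.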
a 95% confidence interval for the mean value of a store's customer accounts is computed as $850 ± 70, then the null hypothesis of a two tailed hypothesis test would be rejected if the value of uo is less than $____ or greater than $____
780, 920
The range for the data set: 2, 5, 5, 7, and 10 is _____________.
8
data set, 2,5,5,7,10. range is
8
U.S. consumers are increasingly viewing debit cards as a convenient substitute for cash and checks. -75th percentile of the amount spent on a debit card
8,125 ± 2.5%
contingency table was created by a middle school to determine the relationship between a student taking a foreign language and whether the student plays in the school band. The number of students that do not play in the band is
90
The empirical rule states that approximately ________% of observations will fall within two standard deviations of the mean.
95.44%
XYZ. 1% of 100,000 widgets are defective. Variance is
990
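The widget answers (1000 and 990) follow from the binomial formulas mean = np and variance = np(1 - p). The sketch assumes each of the 100,000 widgets is defective independently with probability 0.01, which is what treating the count as binomial implies; the helper name is hypothetical.

```python
def binomial_mean_variance(n, p):
    """Expected value and variance of a binomial count."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance

mean, variance = binomial_mean_variance(100_000, 0.01)
print(round(mean, 2), round(variance, 2))
```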
for a binomial random variable X, the probability of x successes in n Bernoulli trials is calculated as
= (n!/(x!(n-x)!)) * p^x * (1-p)^(n-x)
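The binomial formula can be evaluated directly, and a cumulative probability such as "no more than 2 of 4" is the sum of the individual terms. In the sketch below the helper names and the value p = 0.5 are assumptions for illustration only.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x): n! / (x! (n - x)!) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """P(X <= x): sum of P(X = 0) through P(X = x)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Assumed example: n = 4 trials, p = 0.5 (illustrative value)
print(binom_pmf(2, 4, 0.5), binom_cdf(2, 4, 0.5))
```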
Midrange
=0.5*(MIN(Data)+MAX(Data))
Geometric mean (G)
=GEOMEAN(Data)
Sample standard deviation (s)
=STDEV.S(Data)
Standard deviation
=STDEV.S(Data)
Sample variance (s^2)
=VAR.S(Data)
Variance
=VAR.S(Data)
A researcher wants to determine if Americans are sleeping less than the recommended 7 hours of sleep on weekdays. -What is the conclusion?
Do not reject H0 since the p-value is greater than α
A machine that is programmed to package 1.20 pounds of cereal is being tested for its accuracy. In a sample of 36 cereal boxes, the mean and standard deviation are calculated as 1.22 pounds and 0.06 pound, respectively -What is the conclusion at the 5% significance level?
Do not reject H0 since the p-value is greater than α.
Independent
Each occurrence has no effect on the probability of the other events occurring
A subset of the sample space is an __________
Event
True False: 0! = 0
False (0! = 1)
A researcher wants to determine if Americans are sleeping less than the recommended 7 hours of sleep on weekdays. He takes a random sample of 150 Americans and computes the average sleep time of 6.7 hours on weekdays. Assume that the population is normally distributed with a known standard deviation of 2.1 hours -Select the relevant null and the alternative hypotheses
H0: μ ≥ 7; HA: μ < 7
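A short Python sketch of the left-tailed z test for the sleep card, using the numbers stated in the question (sample mean 6.7, known standard deviation 2.1, n = 150). The variable names are illustrative, and the significance level is not given on the card, so no conclusion is hard-coded.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 7.0    # hypothesized mean under H0
x_bar = 6.7   # sample mean
sigma = 2.1   # known population standard deviation
n = 150       # sample size

# Test statistic: distance of the sample mean from mu_0 in standard errors.
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Left-tailed test (HA: mu < 7), so the p-value is P(Z <= z).
p_value = NormalDist().cdf(z)
```

The p-value comes out near 0.04, which is above 0.01 and below 0.05, so whether H0 is rejected depends on the chosen significance level.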
a quality control officer believes that the average time of use for AAA batteries differs from the claimed 8.5 hours. The officer takes a random sample of 30 AAA batteries and finds that the sample mean is 8.7 hours. State the null and alternative hypotheses for testing the claim
H0: μ = 8.5; HA: μ ≠ 8.5
specify the competing hypotheses that would be used to determine whether the population mean is less than 150
H0: μ ≥ 150; HA: μ < 150
an auditor for a small company suspects that the mean customer account balances have fallen below $550 per month, the average amount for all customer accounts over the past 5 years. She takes a random sample of 40 accounts and computes the sample mean as $543. State the hypotheses for testing the auditor's claim
H0: μ ≥ 550; HA: μ < 550
specify the competing hypotheses that would be used in order to determine whether the population mean differs from 15
H0: μ = 15; HA: μ ≠ 15
Special law of addition
If A and B are mutually exclusive events, then the general addition law simplifies to the sum of the individual probabilities: P(A∪B) = P(A) + P(B)
when constructing classes for frequency distribution of quantitative data, which of the following principles should generally be followed
In general, the classes should be the same width; the classes should be mutually exclusive (each data point should fit in only one class); the classes should be exhaustive
Are the events "overweight" and "obese" exhaustive?
No because you may not be either overweight or obese
in hypothesis testing, two incorrect decisions are possible
Not rejecting the null hypothesis when it is false; rejecting the null hypothesis when it is true
which scenarios use the nominal scale
Noting the racial composition of an undergraduate classroom; designating males as 1 and females as 2
Uniform distribution
One of the simplest discrete models. It describes a random variable with a finite number of consecutive integer values from a to b. The entire distribution depends on only two parameters.
if X has a normal distribution with μ=100 and σ=5, then the prob P(90<X<95) can be expressed in terms of the standard normal random variable Z as
P(-2<Z<-1)
if X has a normal distribution with μ=100 and σ=5, then the prob P(100<X<110) can be expressed in terms of the standard normal random variable Z as
P(0<Z<2)
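The two standardization cards above can be checked numerically. This is a small sketch using Python's statistics.NormalDist; the helper name z_score is illustrative.

```python
from statistics import NormalDist

mu, sigma = 100, 5
X = NormalDist(mu, sigma)  # the original normal variable
Z = NormalDist()           # the standard normal variable (mean 0, SD 1)


def z_score(x: float) -> float:
    """Subtract the mean, then divide by the standard deviation."""
    return (x - mu) / sigma


# P(90 < X < 95) computed directly and via the transformed bounds.
p_x = X.cdf(95) - X.cdf(90)
p_z = Z.cdf(z_score(95)) - Z.cdf(z_score(90))
```

Both routes give the same probability because the transformation only relabels the axis: 90 and 95 become -2 and -1, and 100 and 110 become 0 and 2.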
The addition rule for two events A and B is
P(A∪B)= P(A)+P(B)-P(A∩B)
sample space S= (win,loss).
P(win)=.8 ; P(loss)=.2
Find the first and third quartiles from the following data set using the method of medians: 2, 3, 3, 5, 6, 8, 12.
Q1=3, Q3=8
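A possible Python sketch of the method of medians used in the card above. It assumes the convention that the overall median is left out of both halves when the number of observations is odd, which reproduces the card's answer; the function name quartiles_by_medians is illustrative.

```python
from statistics import median


def quartiles_by_medians(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two observations. With an odd count, the middle
    value is excluded from both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return median(lower), median(upper)


q1, q3 = quartiles_by_medians([2, 3, 3, 5, 6, 8, 12])
```

Here the lower half is 2, 3, 3 and the upper half is 6, 8, 12, so the quartiles are the middle values of those halves.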
Standardized data
Redefine each observation in terms of its distance from the mean in standard deviations
Law of Large Numbers
As the number of trials increases, any empirical probability approaches its theoretical limit.
Range (R)
The difference between the largest and smallest observations. =MAX(Data)-MIN(Data)
Which of the following is NOT an example of an experiment?
The winner of last week's lottery drawing (it has only one outcome). Examples of experiments (multiple outcomes): asking someone who they think will win the World Series; the number of computers that will be sold next month at a computer store; selecting a card from a deck of cards
T/F: when constructing a joint probability table, the cell in the lower right corner must always equal 1.0
True
True or false: σ^2 represents the variance
True
Center for Studying Health System Change Variance and the standard deviation
Variance: 1.2800 ± 0.0005; SD: 1.1314 ± 0.0005
Are the events "overweight" and "obese" mutually exclusive
Yes because you cannot be both overweight and obese
An economist reports that 506 out of a sample of 1,200 middle-income American households actively participate in the stock market. -Can we conclude that the proportion of middle-income Americans who actively participate in the stock market is not 50%
Yes, since the confidence interval does not contain the value 0.50
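A hedged Python sketch of the confidence interval behind the stock market card. The card does not state the confidence level, so 95% is an assumption here; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

successes, n = 506, 1200
confidence = 0.95  # ASSUMPTION: the card does not give the level

p_hat = successes / n                                    # sample proportion
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)          # margin of error

lower, upper = p_hat - margin, p_hat + margin
contains_half = lower <= 0.50 <= upper
```

Under that assumption the interval runs from roughly 0.39 to 0.45, so 0.50 falls outside it, which matches the card's conclusion.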
An article in the National Geographic News argues that Americans are increasingly skimping on their sleep -Can we conclude with 95% confidence that the mean sleep time of all adult residents in this Midwestern town is not 7 hours?
Yes, since the confidence interval does not contain the value 7.
In October 2010, Massachusetts enacted a law that forbids cell phone use by drivers under the age of 18. A policy analyst would like to determine whether the law has decreased the proportion of drivers under the age of 18 who use a cell phone -At α = 0.05, do you reject the null hypothesis?
Yes, since the p-value is smaller than α
which is true about a sample statistic such as the sample mean or sample proportion
a sample statistic is a random variable
the significance level is the probability of making
a type 1 error
The following relative frequency histogram summarizes the salaries (in $1,000,000s) for the 30 highest-paid players in the National Basketball Association (NBA) for the 2012 season (www.nba.com, data retrieved March 2012).
a) positively skewed b) 3 earned between 20 and 24 million c) 26 earned between 12 and 20 million
The average number of customers arriving at Jimmy's Burgers in a minute is 1.7. Which expression would one use to calculate the probability that at least 4 customers arrive in a randomly chosen minute? a. 1-P(X≤3) b. P(X≤3) c. 1-P(X≤4) d. P(X>4)
a. 1-P(X≤3)
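The complement expression in the answer can be evaluated with a short Python sketch. It assumes arrivals follow a Poisson distribution with a mean of 1.7 per minute, which the card implies but does not state; poisson_pmf is an illustrative helper name.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)


lam = 1.7  # average arrivals per minute

# "At least 4" is the complement of "at most 3".
p_at_most_3 = sum(poisson_pmf(k, lam) for k in range(4))  # k = 0, 1, 2, 3
p_at_least_4 = 1 - p_at_most_3
```

Using the complement avoids summing the infinitely many terms for 4, 5, 6, and so on.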
When monitoring a process distribution, both the ________ and the ____________ must be tracked.
center, variability
which of the following is true
α = the prob of committing a type 1 error; β = the prob of committing a type 2 error
The intersection of events A and B, denoted A∩B, contains
all outcomes that are in A and B
if an experiment is selecting a card from a deck of cards, then the sample space is
all the cards in the deck
The following contingency table can help determine if residents' location is independent of whether or not the resident purchases a season pass to the pool. [The pool is located on the West side.] Counts (East, West, Total): Season pass 35, 50, 85; No pass 25, 10, 35; Total 60, 60, 120. Which comparison of probabilities would you choose to show dependence between a resident's location and whether or not they purchased a season pass? a. P(Season pass and Lives West) and P(Lives East) b. P(Season pass | Lives West) and P(Season pass) c. P(Season pass | Lives East) and P(Season pass) d. P(Season pass and Lives East) and P(No season pass)
b. P(Season pass | Lives West) and P(Season pass); c. P(Season pass | Lives East) and P(Season pass)
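A small Python sketch of the comparison in the card above, using the counts from the pool table. The dictionary layout and variable names are illustrative choices.

```python
# Cell counts from the contingency table: (pass status, side of town).
counts = {
    ("pass", "east"): 35, ("pass", "west"): 50,
    ("no_pass", "east"): 25, ("no_pass", "west"): 10,
}
total = sum(counts.values())  # 120 residents

# Joint probability table: each cell divided by the grand total.
joint = {cell: c / total for cell, c in counts.items()}

# Marginal probability of holding a season pass (row total / grand total).
p_pass = (counts[("pass", "east")] + counts[("pass", "west")]) / total

# Conditional probabilities: restrict to one column, then take the share with a pass.
west_total = counts[("pass", "west")] + counts[("no_pass", "west")]
east_total = counts[("pass", "east")] + counts[("no_pass", "east")]
p_pass_given_west = counts[("pass", "west")] / west_total
p_pass_given_east = counts[("pass", "east")] / east_total

# Independence would require the conditional and marginal values to match.
dependent = p_pass_given_west != p_pass or p_pass_given_east != p_pass
```

The conditional probabilities (about 0.83 for West and 0.58 for East) both differ from the marginal probability (about 0.71), which is the evidence of dependence.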
which can be used to determine the proportion of data points that fall within a specified number of standard deviations from the mean
chebyshev's theorem
game show contestant chooses between $1500 in cash or a hidden prize. Prize is worth thousands or nothing. The expected value of the prize is 2500. If contestant is risk neutral, they will
choose the prize because the expected value is higher than the cash value
Which of the following are examples of conditional probabilities? [check all that apply] a. The probability of Amir purchasing a video game or the probability of Natasha purchasing a video game. b. The probability of Marilyn going to the football game and Tom going to the football game. c. If Neil has already purchased groceries, then the probability of Colleen purchasing groceries. d. The probability of Angel going to the movie, given that Derrick is going to the movie.
c. If Neil has already purchased groceries, then the probability of Colleen purchasing groceries. and d. The probability of Angel going to the movie, given that Derrick is going to the movie.
Which of the following random variables meets the criteria for a hypergeometric distribution? a. The average number of adults who have a graduate degree is 0.7/household. Let X be the number of adults in a household who have a graduate degree. b. Suppose 30% of the population have a graduate degree. Define X to be the number of adults in a sample of 20 who have earned a graduate degree. c. Out of 50 adults, 10 have a graduate degree. A sample of 20 is taken. Define X to be the number of adults in the sample with a graduate degree.
c. Out of 50 adults, 10 have a graduate degree. A sample of 20 is taken. Define X to be the number of adults in the sample with a graduate degree.
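To make option c concrete, here is a minimal Python sketch of the hypergeometric probabilities for that scenario (population of 50, 10 with a degree, sample of 20 drawn without replacement). The function name and the symbols N, S, n are illustrative, not from the card.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, S: int, n: int) -> float:
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N that contains S successes."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)


N, S, n = 50, 10, 20  # population size, successes in population, sample size

# X can range from 0 up to min(S, n) in this scenario.
probs = [hypergeom_pmf(x, N, S, n) for x in range(min(S, n) + 1)]
mean = sum(x * p for x, p in enumerate(probs))  # equals n * S / N
```

Sampling without replacement from a small, finite population is what separates this case from the binomial setup in option b.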
for a continuous random variable X, the number of possible values
cannot be counted
Actuarial science
estimating empirical probabilities
the central limit theorem states that, for any distribution, as n gets larger, the sampling distribution of the sample mean becomes
closer to a normal distribution
The ________ formula is used to determine the number of different ways to arrange a group of (x) objects from a total of (n) objects and the order of the objects is irrelevant
combination
relative frequency distributions are generally more useful than frequency distributions when
comparing data sets of different sizes
For any given event, the sum of the probability of that event and the probability of its ____________ must equal one.
complement
the inverse transformation, x = μ + zσ, is used to
compute x values for given probabilities
A tree diagram has ___________ probabilities at the terminal end of each branch.
conditional
a ____ probability is the prob of an event given that another event has already occurred
conditional
parameter
constant
The sum of the probabilities of all the outcomes in the sample space is: a. Impossible to determine without more information b. 0 (zero) c. 0.5 (one half) d. 1 (one)
d. 1 (one)
A festival has become so popular that it must limit the number of tickets it issues. People who hope to attend the festival send in a request for tickets, and requests are filled by random selection. Only 21% of the ticket requests are fulfilled. what are the odds of not receiving a ticket for a random applicant? a. 3.34 to 1 b. 3.88 to 1 c. 2.98 to 1 d. 3.76 to 1
d. 3.76 to 1, since (1-0.21)/0.21 = 3.76
Which of the following is not an example of an experiment? a. Picking a team that will win the World Series. b. Buying a computer based on repair history. c. Selecting a card from a deck of cards. d. Pick the team that won last year's World Series.
d. Pick the team that won last year's World Series.
Using the multiplication rule, the joint probability of event A and event B is computed by multiplying the conditional probability of event A given event B by the probability of: a. event B given event A b. event A c. the union of A and B d. event B
d. event B
The correlation coefficient describes the degree of ______ between two ________ variables. a. nonlinearity; qualitative b. nonlinearity; quantitative c. linearity; qualitative d. linearity; quantitative
d. linearity; quantitative
branch of stats that summarizes impt aspects of a data set is
descriptive statistics
"the number of ppl in a household." This variable is best categorized as a
discrete variable
Bimodal or Multimodal
distribution occurs when dissimilar populations are combined into one sample.
Empirical
estimated from observed outcome frequency
Using the multiplication rule, the probability that event A and event B both occur is computed by multiplying the conditional probability of event A given event B by the probability of
event B
Complement (A')
everything in the sample space S except event A
Percentile
ex. 83rd percentile means 83% of the test takers are below you
Events that cannot occur at the same time are mutually ________ events.
exclusive
Events that include all outcomes in the sample space are known as _________ events
exhaustive
if X is normally distributed with expected value μ and SD σ, then x̄ is normally distributed with
expected value μ and SD σ/√n
trial, or process, that produces several possible outcomes is referred to as a(n)
experiment
for a continuous random variable, one characteristic of its probability density function f(x) is that
f(x) ≥ 0 for all values x of X
T/F. the expected value and the variance of the standard normal random variable Z are both zero
false
T/F; a discrete random variable can assume an uncountable number of values
false
True or false: Chebyshev's Theorem should only be applied to data sets that are normally distributed.
false
Chebyshev's theorem should only be applied to normally distributed data sets. T/F
false; it applies to all data sets
chebyshevs theorem provides the proportion of observations that lie within k SDs of the mean. the value k must be
greater than 1
discrete random variable
has a countable number of distinct values. Some random variables have a clear upper limit and others do not
graphical tool best used to display the relative frequency of grouped, quantitative data
histogram
which of the following graphical depictions is used for observing the spread of the data for a single variable
histogram
Special law of multiplication
if events A and B are independent then P(A and B) = P(A)P(B)
Outlier (z-score classification)
if the absolute value of the z-score exceeds 3, |z| > 3 (beyond μ ± 3σ)
Collectively exhaustive
if their union is the entire sample space S (all the events that could possibly occur)
one of the primary goals of constructing a frequency distribution for quantitative data is to summarize the data
in a manner that accurately depicts the data as a whole
the probability that a customer will purchase a product is 0.15. The probability that a customer is a male is 0.5. The prob that a customer is a male and will purchase a product is 0.075. The events purchasing a product and being a male are
independent
branch of stats that draws conclusions about a large set of data based on a smaller set of data is
inferential statistics
branch of stats that uses statistics to estimate a population parameter or test a hypothesis about such a parameter is best referred to as
inferential statistics
for a continuous random variable X, how many distinct values can it assume over an interval
infinite
the probability distribution of a discrete random variable is called its probability
mass function
all are characteristics of the normal distribution except
it is a discrete distribution
Empirical Rule says that for data from a normal distribution, we expect the interval μ ± kσ to contain a known percentage of the data:
k=1: 68.26% will lie within μ ± 1σ; k=2: 95.44% will lie within μ ± 2σ; k=3: 99.73% will lie within μ ± 3σ
Chebyshev's Theorem says that for any population with mean μ and standard deviation σ:
k=2: at least 75% will lie within μ ± 2σ; k=3: at least 88.9% will lie within μ ± 3σ; k=4: at least 93.8% will lie within μ ± 4σ
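The percentages on the two cards above can be reproduced with a short Python sketch. Chebyshev's bound is 1 - 1/k^2 for k > 1; the normal percentages come from the standard normal CDF. The function names are illustrative.

```python
from statistics import NormalDist


def chebyshev_bound(k: float) -> float:
    """Minimum proportion within k standard deviations of the mean, any distribution."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2


def normal_within(k: float) -> float:
    """Proportion within k standard deviations of the mean for a normal distribution."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)


# Side-by-side comparison for the k values on the cards.
comparison = {k: (chebyshev_bound(k), normal_within(k)) for k in (2, 3, 4)}
```

The exact normal values round to 68.27% and 95.45%; the 68.26% and 95.44% on the card are what you get from a z table rounded to four decimals. Chebyshev's bound is lower because it must hold for every distribution, not just the normal.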
the further a population deviates from p=0.50, the ___ sample size required in order to satisfy a normal approximation
larger
stem and leaf. Stem consists of the ___ and the leaf consists of the ___
leftmost digits; last digit
the interval scale of measurement is
less sophisticated than the ratio scale
in general, the variability between sample means is ___ the variability between observations
less than
if the value of the test statistic falls in the rejection region, the p-value must be
less than α
for a continuous random variable X, the cumulative distribution function F(x) provides the probability that X is
less than or equal to any value x
Independence
look at conditional probabilities to determine independence
most widely used measure of central location
mean
owner of a grocery store wanted to determine the brands of soda that customers purchase at the store. When summarizing the data, the meaningful measure of central location is the
mode
when summarizing a qualitative data set, the __ is the best measure of central location
mode
the ordinal scale of data measurement is
more sophisticated than the nominal scale
mode
most frequently occurring value of a data set
in general, the null and alternative hypothesis are
mutually exclusive
Binary events
mutually exclusive and collectively exhaustive
events that cannot both occur on the same trial of an experiment are
mutually exclusive events
the sum of the probabilities of a list of mutually exclusive and exhaustive events is
one
simple event
or elementary event is a single outcome
An analyst assigns a sample of bond issues to one of the following credit ratings, given in descending order of credit quality (increasing probability of default): AAA, AA, BBB, BB, CC, D.
ordinal
qualitative data that can be categorized and ranked are measured on the
ordinal scale
we can reject the null when the
p-value < α
one method of graphical presentation for qualitative data is a
pie chart
expected value of a distribution is also referred to as the
population mean
μ
population mean
σ^2
population variance
selection bias occurs when
portions of the population are excluded from the consideration for the sample
based on the pictured histogram (going down from left to right), the distribution is
positively skewed, skewed right
numerical value that measures the likelihood of an uncertain event is a
probability
the probability distribution of a continuous random variable is called its
probability density function
relative frequency distribution for quantitative data identifies the
proportion of observations that occur in each class
the expected value of p̄ (the sample proportion) is the
proportion of successes in the population
a variable that is described verbally rather than numerically is called a
qualitative variable
all are examples of cross-sectional data except
quarterly sales for a computer company for the last 5 years
statistic
random variable
An investor collects data on the weekly closing price of gold throughout a year.
ratio
least accurate
ratio scale is used for a qualitative variable
if an event is getting a letter grade of A in your stats class, what is the complement of receiving an A
receiving any grade except an A
Subjective approach
reflects informed judgement about the likelihood of an event. This method is needed when there is no repeatable random experiment.
suppose the competing hypotheses for a test are H0: μ ≤ 33 vs HA: μ > 33. If the p-value for the hypothesis test is 0.027 and the chosen level of significance is 0.05, then the correct conclusion is
reject Ho and conclude that the population mean is greater than 33 at the 5% sig level
in hyp testing, if the sample data provides significant evidence that the null hyp is incorrect, then we
reject the null hyp
the type 1 error occurs when we
reject the null hypothesis when it is actually true
in hypothesis testing, two correct decisions are possible
rejecting the null hypothesis when it is false; not rejecting the null when it is true
the critical value approach specifies a region of values, called the ___. If a test statistic falls into this region, we reject the ___
rejection region, null hypothesis
when calculating the probability of x successes in n trials of a binomial experiment, the probability of success and the probability of failure
remain the same, even when a probability is calculated for a different value of x
a simple random sample is a sample of observations that is
representative of the population from which it was chosen
hypothesis testing enables us to determine if the collected ___ data is inconsistent with what is stated in the null hypothesis
sample
population consists of all items of interest in a statistical problem, whereas a ___ is a subset of a pop
sample
s^2
sample variance
in inferential stats, we use ____ information to make inferences about an unknown ____ parameter
sample, population
the square root of the average of the sum of squared deviations from the mean is the
standard deviation
we calculate a z-score by dividing the deviation of the sample value from the mean by the
standard deviation
in _____, the population is divided up into strata and then randomly selected observations are taken proportionately from each stratum
stratified random sampling
A probability assigned by a person that is based on that person's judgement or experience is a ___________ probability.
subjective
probability based on personal judgment rather than on observation or logical analysis is best referred to as
subjective probability
a normal random variable X is transformed into Z by
subtracting the mean, and then dividing by the SD
The mean of a Bernoulli distribution is π, called the probability of ___________.
success
The expected value of the sum of two or more random variables is equal to the _________ of their individual values.
sum
In October 2010, Massachusetts enacted a law that forbids cell phone use by drivers under the age of 18. A policy analyst would like to determine whether the law has decreased the proportion of drivers under the age of 18 who use a cell phone -Suppose a sample of 200 drivers under the age of 18 results in 150 who still use a cell phone while driving. What is the value of the test statistic? What is the p-value?
test statistic: -2.58 ± 0.20; p-value: 0.0049 ± 0.005
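A hedged Python sketch of the left-tailed z test for a proportion. The card does not state the hypothesized proportion; the value 0.82 below is an assumption, borrowed from the Allstate card in this deck because it reproduces the listed test statistic. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p_0 = 0.82                # ASSUMPTION: hypothesized proportion, not given on this card
n, successes = 200, 150   # sample size and drivers who still use a cell phone

p_hat = successes / n                          # sample proportion, 0.75
std_error = sqrt(p_0 * (1 - p_0) / n)          # standard error under H0
z = (p_hat - p_0) / std_error                  # test statistic

# HA: p < p_0 (the law decreased the proportion), so the p-value is P(Z <= z).
p_value = NormalDist().cdf(z)
```

With that assumed p_0, the statistic and p-value land inside the tolerances on the card, and the p-value is below α = 0.05.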
example of inferential statistics
testing the longevity of all light bulbs based on a sample of 100 light bulbs
relationship between the variance and the SD
the SD is the positive square root of the variance
researcher wants to compare the variability of two data sets that have different units of measurement. Which measures is most useful as a relative measure of dispersion
the coefficient of variation
the null hypothesis in a hypothesis test refers to
the default state of nature
least accurate
the height of each rectangle represents cumulative frequency or cumulative relative frequency
all are examples of random variables that likely follow a normal distribution except
the number of states in the USA
which can be represented by a continuous random variable
the temp in Tampa, FL during july
the critical value of a hypothesis test is
the value that separates the rejection region from the non-rejection region
data that are collected by recording a characteristic of a subject over several time periods are referred to as
time series data
In order to convert a contingency table into a joint probability table, the frequency of each cell is divided by the
total number of outcomes in the sample space
T/F: for a given sample size n, a type 1 error can only be reduced at the expense of a higher type 2 error
true
T/F: if we had access to data that included the entire population, then the values of the parameters would be known and no statistical inference would be required
true
T/F: the SD of a discrete probability distribution measures how dispersed the values are from the mean
true
T/F: the optimal values of type 1 and type 2 errors requires a compromise in balancing costs of each type of error
true
empirical rule should be applied to normally distributed data sets
true
compound event
two or more simple events
A prior probability for event A is the ____________ probability whereas the posterior probability of event A is the ___________ probability.
unconditional; conditional
manager of a women's clothing store is projecting next month's sales. Her low-end estimate of sales is $25,000 and her high-end estimate is $50,000. She decides to treat all outcomes of sales between these 2 values as equally likely. If we define the random variable X as sales, then X follows the
uniform distribution
the addition rule is used to calculate
union of two events
an ogive is a graph that plots the cumulative frequency, or cumulative relative freq, against the
upper limit of the corresponding class
Annual growth rates for individual firms in the toy industry tend to fluctuate dramatically, depending on consumers' tastes and current fads. Consider the following growth rates (in percent) for two companies in this industry, Hasbro and Mattel.
variability for: Hasbro: 8.61% Mattel: 6.56% greater variability? Hasbro
a characteristic of interest that differs among various observations is referred to as a ___
variable
Calculate the variance and the standard deviation of this probability distribution.
variance 31.5 SD 5.6
two widely used measures of dispersion are
variance and SD
most accurate statement about variance
variance is the avg of the squared deviations from the mean
To calculate the union for two mutually exclusive events A and B,
we add the probability of A to the probability of B
the expected value of the discrete random variable X is
weighted avg of all possible values of X
A recent study by Allstate Insurance Co. finds that 82% of teenagers have used cell phones while driving -Is the sampling distribution of the sample proportion approximately normal?
yes
for a hypothesis test of μ when σ is known, the value of the test statistic is calculated as
z = (x̄ - μ0)/(σ/√n)
when performing a hyp test on μ when σ is known, H0 can never be rejected if
z ≥ 0 for a left-tailed test
we calculate the __ to find the relative position of a sample value within a data set
z-score