MegaStat New
Independence
look at conditional probabilities to determine independence
Chapter 2: A __________________ is a subset of a population
sample
Venn diagram
Illustration of A' and A
Chapter 1: Some experts prefer to call statistics...
data science
statistic
random variable
when rolling a pair of dice and summing the two values rolled, which of the following are exhaustive events?
A value of 6 or more and a value of 8 or less; A value of 7 or more and a value of 6 or less; An even and an odd value
The conditional probability of A given B is calculated by dividing the probability of the intersection of A and B by the probability of
B
For any given event, the probability of that event and the probability of the _______ of the event must sum to one.
Complement
Covariance
Measures the degree to which the values of X and Y change together.
Bimodal or Multimodal
distribution occurs when dissimilar populations are combined into one sample.
General law of addition
the probability of the union of two events A and B is the sum of their probabilities less the probability of their intersection
Chapter 5: Sample space
the set of all possible outcomes
statistics is used
to make informed decisions based on data
Chapter 5: 5!=?
120
Chapter 4: The arithmetic mean is the "__________" with which most of us are familiar.
average
simple event
or elementary event is a single outcome
The mean, or average, has the property that distances from the mean to the individual points always sum to:
zero
if an experiment is selecting a card from a deck of cards, then the sample space is
all the cards in the deck
which scales of data measurement are associated with quantitative data
interval and ratio
the expected value of p̄ is the
proportion successes in the population
Center for Studying Health System Change None will delay or go without medical care?
0.1678 +- 0.001
Randall Racer runs the 100 m dash in an average of 10.4 sec with an SD of 0.1 sec. If his times are bell shaped, what proportion of his times will fall between 10.3 and 10.5 sec?
0.68
Center for Studying Health System Change No more than two will delay or go without medical care?
0.7969 ± 0.001
an investment strategy has an expected return of 12% and a SD of 10%. If investment returns are normal, the probability of earning a return of more than 32% is closest to
2.5%
Chapter 4: Calculate the standardized score for the following data value. Assume the mean = 100 and the standard deviation = 25: x = 60, z = ?
-1.6 ((60 - 100) / 25 = -1.6)
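The standardized score is the deviation from the mean divided by the standard deviation, z = (x − μ)/σ. A minimal Python sketch of that arithmetic; the function name `z_score` is only an illustrative choice, not something from the course materials:

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd

# Values from the card: x = 60, mean = 100, standard deviation = 25
print(z_score(60, 100, 25))  # -1.6
```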
probability that Z is greater than 1.32
.0934
What is the probability that a randomly selected adult is either overweight or obese?
.688
20% of a restaurants customers order the chef's special. 230 customers are anticipated to dine at the restaurant tonight. The STANDARD DEVIATION for this binomial distribution is _____________ (round your answer to 3 decimal places)
6.066 (230 x 0.2 x 0.8 = 36.8; square root of 36.8 = 6.066)
If Fund A has a coefficient of variation of 1.1 and Fund B has a coefficient of variation of 0.9, Fund ____ has the greater relative dispersion.
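The binomial standard deviation is the square root of n·p·(1 − p). A short Python check of the restaurant numbers; `binomial_sd` is a hypothetical helper name used only for this sketch:

```python
import math

def binomial_sd(n, p):
    """Standard deviation of a binomial distribution with n trials and success probability p."""
    return math.sqrt(n * p * (1 - p))

# 230 customers, 20% order the chef's special
print(round(binomial_sd(230, 0.2), 3))  # 6.066
```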
A
Chapter 1: True or False: Business managers want to see detailed numerical explanations in technical reports.
False
True or False: 0! = 0
False (0! = 1)
Calculate the conditional probability that a country club member plays golf given that they play tennis. Contingency table (G = golf, T = tennis): G and T = 180; No G and T = 50; No G total = 100; T total = 230; grand total = 400. a. 0.5 b. 0.45 c. 0.125 d. 0.783
d. 0.783
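A conditional probability from a contingency table is the joint count divided by the total of the conditioning column. A small Python sketch using the counts given in the question (180 members play both, 230 play tennis); the variable names are illustrative:

```python
# Counts read from the country club contingency table
golf_and_tennis = 180
tennis_total = 230

# P(G | T) = count(G and T) / count(T)
p_golf_given_tennis = golf_and_tennis / tennis_total
print(round(p_golf_given_tennis, 3))  # 0.783
```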
The sum of the probabilities of all the outcomes in the sample space is: a. Impossible to determine without more information b. 0 (zero) c. 0.5 (one half) d. 1 (one)
d. 1 (one)
if X is normally distributed with expected value μ and SD σ, then x̄ is normally distributed with
expected value μ and SD σ/square root of n
chebyshevs theorem should only be applied to normal distributed data sets. T/F
false; it applies to all data sets
The ____________ mean is the appropriate measure to use when evaluating growth rates.
geometric
graphical tool best used to display the relative frequency of grouped, quantitative data
histogram
The probability of State College winning a football game is 0.60 The probability of University of State winning a football game is 0.65. Given that State College has won its football game, the probability that Univ of State wins its game is 0.65. The teams are not playing each other. The events State College and U of State winning are
Independent
The probability that a customer will purchase a product is 0.15. The probability that a customer is a male is 0.5. The probability that a customer is a male and will purchase a product is 0.075. The events purchasing a product and being a male are
Independent
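Two events are independent when P(A and B) equals P(A) × P(B). A minimal Python sketch of that check, applied to the purchase/male numbers above; `is_independent` is a hypothetical helper, and the tolerance only guards against floating-point rounding:

```python
import math

def is_independent(p_a, p_b, p_a_and_b):
    """Check the multiplication rule P(A and B) = P(A) * P(B)."""
    return math.isclose(p_a * p_b, p_a_and_b, rel_tol=1e-9)

# P(purchase) = 0.15, P(male) = 0.5, P(male and purchase) = 0.075
print(is_independent(0.15, 0.5, 0.075))  # True
```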
Chapter 5: The __________ (one word) Of two events A and B contains only those outcomes that are in both A and B
Intersection
Median
Middle value in sorted array =MEDIAN(data)
if X has a normal distribution with μ = 100 and σ = 5, then the prob P(100<X<110) can be expressed in terms of the standard normal random variable Z as
P(0<Z<2)
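Standardizing subtracts the mean and divides by the standard deviation, so the interval 100 to 110 becomes 0 to 2. The following Python sketch, using the standard library's `statistics.NormalDist`, suggests that both forms give the same probability:

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=5)  # the original variable
Z = NormalDist()                 # standard normal: mean 0, SD 1

p_x = X.cdf(110) - X.cdf(100)    # P(100 < X < 110)
p_z = Z.cdf(2) - Z.cdf(0)        # P(0 < Z < 2)
print(round(p_x, 4), round(p_z, 4))  # 0.4772 0.4772
```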
Chapter 5: Probability of the complement of a is found by subtracting the probability of A from 1
P(A') = 1-P(A)
If A and B are independent events, then
P(A)=P(A|B)
the addition rule for 2 events A and B is
P(A∪B) = P(A) + P(B) - P(A∩B)
The addition rule for two events A and B is (formula)
P(A∪B)= P(A)+P(B)-P(A∩B)
Chapter 5: Marginal
P(S3) = 17/67 = .2537
Center for Studying Health System Change Variance and the standard deviation
V 1.2800 +- .0005 SD 1.1314 +- .0005
Special law of multiplication
if events A and B are independent then P(A and B) = P(A)P(B)
Outlier (z-score classification)
if the absolute value of zᵢ > 3 (beyond μ ± 3σ)
Chapter 4: Which of the following situations are valid reasons for removing an outlier from a data set?
if the data point was typed incorrectly into the spreadsheet, if the observed value was taken from a population different from the one under study
one of the primary goals of constructing a frequency distribution for quantitative data is to summarize the data
in a manner that accurately depicts the data as a whole
Chapter 4: Midhinge
is the average of the first and third quartiles. The midhinge is always exactly halfway between Q1 and Q3, while the median Q2 can be anywhere within the "box," which suggests a new way to describe skewness:
Chapter 6: The symbol π (pi) in the binomial PDF
is the probability of success, is in the interval [0,1]
Expected value E(X)
is the weighted average that measures center, the variance
stem and leaf. Stem consists of the ___ and the leaf consists of the ___
leftmost digits; last digit
the interval scale of measurement is
less sophisticated than the ratio scale
Chapter 5: Probability
likelihood that a particular event will occur
the probability distribution of a discrete random variable is called its probability
mass function
If the continuous uniform random variable X assumes the values a and b as its lower and upper limits, respectively, then (a + b)/2 represents the ______ of X.
mean
Chapter 4: The best measure of central location when using qualitative data set is the
mode
owner of a grocery store wanted to determine the brands of soda that customers purchase at the store. When summarizing the data, the meaningful measure of central location is the
mode
mode
most frequently occurring value of a data set
which is an example of a continuous random variable?
normal random variable
most accurate
normally distributed, 95% of data will fall within 2 SDs of the mean
a two tailed test of the population mean is conducted at α = 0.10. The calculated test statistic is z=1.55 and P(Z>=1.55)=0.0606. The null should
not be rejected since the p-value = 0.1212 > 0.10
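For a two-tailed test the p-value is twice the one-tail area beyond the test statistic. A hedged Python sketch of the decision using `statistics.NormalDist`; it computes the tail area directly, so the result (about 0.121) differs slightly from the card's 0.1212, which doubles the table-rounded 0.0606:

```python
from statistics import NormalDist

z = 1.55      # calculated test statistic from the card
alpha = 0.10  # significance level

# Two-tailed p-value: double the upper-tail area beyond |z|
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(p_value, 3))
print("reject H0" if p_value < alpha else "do not reject H0")
```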
the p-value is calculated assuming the
null hypothesis is true
we use sample data because
obtaining data from the population is often expensive
when testing μ and σ is known, Ho can never be rejected if z <= 0 for a
right tailed test
s^2
sample variance
in inferential stats, we use ____ information to make inferences about an unknown ____ parameter
sample, population
Chapter 2: Which type of error is unavoidable when sampling from a population?
sampling error
Chapter 3: Which of the following graphical depictions allows you to examine the relationship between two variables?
scatter plot
the variance of x̄, which is equal to σ^2/n, is
smaller than the variance of the individual observation σ^2
Chapter 4: A box plot is constructed using several different values. Which of the following values are included in a box plot?
the second quartile, the third quartile, the smallest value
the square root of the average of the sum of squared deviations from the mean is the
standard deviation
we calculate a z-score by dividing the deviation of the sample value from the mean by the
standard deviation
in _____, the population is divided up into strata and then randomly selected observations are taken proportionately from each stratum
stratified random sampling
A probability assigned by a person that is based on that person's judgement or experience is a ___________ probability.
subjective
Mutually exclusive
their intersection is the empty set (contains no elements). One event precludes the other from occurring.
Chapter 5: Bayes' _________ is used to revise probabilities when one has new ______
theorem, information
T/F: for a given sample size n, a type 1 error can only be reduced at the expense of a higher type 2 error
true
Chapter 3: Which of the following is NOT a common error one should beware
unembellished charts that do not contain sound or animation
the addition rule is used to calculate
union of two events
Chapter 2: Data
usually are entered into a spreadsheet or database as an n × m matrix. Specifically, each column is a variable (m columns) and each row is an observation (n rows).
Chapter 3: Column
vertical display of data
Chapter 4: Multiplying data values by a fraction (where the fractions add to 1) and summing results in a ______________ mean
weighted
cluster sampling works best
when most of the variation of a population is within groups and not between groups
When monitoring a process distribution, both the ________ and the ____________ must be tracked.
center, variability
True or false: Chebyshev's Theorem should only be applied to data sets that are normally distributed.
false
XYZ. 1% of 100,000 widgets are defective. Value is
1000
Intersection
The event consisting of all outcomes in the sample space S that are contained in both event A and B.
In October 2010, Massachusetts enacted a law that forbids cell phone use by drivers under the age of 18. A policy analyst would like to determine whether the law has decreased the proportion of drivers under the age of 18 who use a cell phone -What is the conclusion
The law has been effective since the value of the test statistic falls in the rejection region.
Chapter 6: choose the logical binomial random variable
The number of late shipments out of the next 12 sent by one company
the formula to calculate probability for the hypergeometric distribution, what does the denominator represent?
The number of ways a sample size n can be selected from the population of size N
Chapter 4: Which of the items below describes the usefulness of a standard deviation?
To gauge the relative position of data values within the data set
Chapter 1: True or False: Statistics support critical thinking by helping one identify illogical conclusions or to see holes in another's argument.
True
a continuous random variable X can assume
an infinite number of values over some interval
probability that a discrete random variable X assumes a particular value x is
between 0 and 1
Chapter 5: If 230 out of 400 country club members play tennis, we would place 230 in the following contingency table in the cell with the letter
c
Chapter 3: Stacked Dot Plot
can be used to compare two or more groups.
Events that include all outcomes in the sample space are known as _________ events
exhaustive
The measure of center where half the values of the data set lie above this measure and half the values of the data set lie below this measure is known as the
median
Leptokurtic
peaked more sharply than normal
the type 1 error occurs when we
reject the null hypothesis when it is actually true
a continuous random variable X follows the uniform distribution with a lower limit of a and an upper limit of b. The __ of X is calculated using the formula square root of ((b-a)^2 / 12)
standard deviation
probability based on personal judgment rather than on observation or logical analysis is best referred to as
subjective probability
which of the following is an example of a conditional probability
the probability that Lisa purchases groceries, given that Neil has already purchased groceries
The mode for the data set: 4, 5, 6, 9 is
there is no mode
Chapter 6: In which of the following ways are binomial distributions and hypergeometric distributions similar?
they are both discrete distributions, they have two possible outcomes success or failure
histograms can be used to
Determine the shape of the data. Observe the spread or the variability of the data
How many outcomes of an experiment constitute a simple event? a. more than one b. one c. two d. none
b. one
4, 5,6,9 mode
none
Chapter 4: The ___________ measures the difference between the smallest and largest values in a data set.
range
standard deviation of p̄ equals
square root of (p(1-p)/n)
Chapter 4: Center
typical or middle values, where the data values are concentrated
average score on a stats exam was 75 with a standard dev of 15. if you scored a 60, your z-score is
-1
Calculate the standardized score for the following data value. Assume the mean = 100 and the standard deviation = 25: x=60, z=_____.
-1.6
zα/2 for 88%
1.56
Chapter 4: Median
(denoted M) is the 50th percentile or midpoint of the sorted sample data set x1, x2, ..., xn. It separates the upper and lower halves of the sorted observations
Chapter 4: Population variance
(denoted σ², where σ is the lowercase Greek letter "sigma") is defined as the sum of squared deviations from the mean divided by the population size: σ² = Σ(xᵢ - μ)² / N
Chapter 6: An apartment complex rents an average of 2.3 new units per week. If the number of apartments rented each week has a Poisson distribution, then the probability of renting exactly three apartments in a week is
(e^-2.3)(2.3^3) / 3! = 0.203
U.S. consumers are increasingly viewing debit cards as a convenient substitute for cash and checks. -A consumer advocate comments that the majority of consumers spend over $8,000 on a debit card. Find a flaw in this statement.
0.3372 ± .005
5!=?
120 5x4x3x2x1
A random variable X has... 25....5.....5x... Y = ?
125
Chapter 5: P(A)=0
Event cannot occur
Chapter 5: Complement
P(A) + P(A') = 1
Symmetric data
The mean and median are about the same. Tails of the histogram are balanced (low/high values offset) mean [almost equals] median
Skewed right (positively skewed)
The mean exceeds the median. Long tail of histogram points right (most data on left but a few high values) Mean>Median
one method of graphical presentation for qualitative data is
bar chart
One characteristic of a well-defined probability density function of a continuous random variable X is that the area under the curve, f(x) over all values of x is
equal to one
Platykurtic
A population that is flatter than normal
Sample correlation coefficient
A well-known statistic that describes the degree of linearity between paired observations on two quantitative variables X and Y
For hotels in NYC, a travel web site wants to provide information comparing hotel costs (high,average,low) versus the quality ranking of the hotel (excellent, good,fair,bad). A useful way to summarize this data is to construct a
Contingency table
parameter
constant
Nadia purchased 400 shares of XYZ stock at $20 per share. When the stock decreased in value to $16 a share, Nadia purchased 600 more shares of XYZ stock. The weighted average price per share that Nadia paid for XYZ stock is $_____ (use 2 decimal places).
$17.60
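The weighted average price weights each purchase price by the number of shares bought at that price. A minimal Python sketch with Nadia's two purchases; `weighted_average` is an illustrative helper name:

```python
def weighted_average(values, weights):
    """Sum of value * weight divided by the total weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# 400 shares at $20 and 600 shares at $16
print(weighted_average([20, 16], [400, 600]))  # 17.6
```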
In October 2010, Massachusetts enacted a law that forbids cell phone use by drivers under the age of 18. A policy analyst would like to determine whether the law has decreased the proportion of drivers under the age of 18 who use a cell phone -Test using the critical value approach with α = 0.05 Critical Value:
-1.64 ± .01
A researcher wants to determine if Americans are sleeping less than the recommended 7 hours of sleep on weekdays. -Calculate the value of the test statistic
-1.75 ± 0.04
suppose you are performing a hypothesis test on μ and the value of σ is known. At the 10% sig level, the critical value(s) for a left tailed test is (are)
-z0.10
Which of the following events are mutually exclusive? [either this will happen, or that]
Mutually exclusive: receiving an 'A' and receiving a 'B' as a final grade in an Accounting class; being on time and being late for an appointment. Not mutually exclusive: being of German descent and being of Mexican descent; passing a stats test and passing an English test.
The following table represents the number of cartons of milk that children at Hoover-Wood Elementary school purchase for lunch. Milk: 0, 1, 2, 3+; Probability: .2, .7, (blank), .02. The probability that a student purchases 2 milk cartons is ____________.
.08
cartons of milk that the children at Hoover-wood school purchase for lunch The probability that a student purchases 2 milk cartons is
.08
Chapter 6: The following table represents the number of cartons of milk that the children at Hoover-Wood Elementary school purchase for lunch.
.08 (1.0 - 0.2 - 0.7 - 0.02)
An apartment complex rents an average of 2.3 new units per week. If the number of apartments rented each week has a Poisson distribution, then the probability of renting exactly three apartments in a week is __________ [round your answer to 3 decimal places and do not enter a percentage]
.203 (e^-2.3)(2.3^3)/3! = 0.203
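The Poisson probability of exactly x arrivals with mean λ is e^(−λ)·λ^x / x!. A short Python sketch of that formula, checked against the apartment question (λ = 2.3, x = 3) and the Jimmy's Burgers question (λ = 1.7, x = 1); `poisson_pmf` is a hypothetical helper name:

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

print(round(poisson_pmf(3, 2.3), 3))  # 0.203
print(round(poisson_pmf(1, 1.7), 3))  # 0.311
```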
Chapter 5: Calculate the marginal probability that a country club member does not play golf
.25 100/400 = .25
The average number of customers arriving at Jimmy's Burgers is 1.7 per minute. What is the probability that only 1 customer arrives in the next minute? ______________ (Round to three decimals and do not enter a percentage.)
.311 1.7^1e^-1.7/1! = .311
What is the probability that a randomly selected adult is neither overweight nor obese?
.312
Professor Stats has 40 students in her statistics class. 24 of the students are male. If she randomly selects 6 of her students without replacement, the probability of selecting 4 men in the sample is _________ (round your answer to three decimal places).
.332
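Sampling without replacement from a finite population calls for the hypergeometric formula: the ways to pick the successes times the ways to pick the failures, divided by the ways to pick the whole sample. A Python sketch using `math.comb`; the function and parameter names (N population size, S successes in the population, n sample size) are illustrative:

```python
import math

def hypergeom_pmf(x, N, S, n):
    """P(exactly x successes) when drawing n items without replacement
    from a population of N items that contains S successes."""
    return math.comb(S, x) * math.comb(N - S, n - x) / math.comb(N, n)

# 40 students, 24 men, sample of 6, exactly 4 men
print(round(hypergeom_pmf(4, 40, 24, 6), 3))  # 0.332
```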
What is the probability that a randomly selected passenger car gets more than 35 mpg?
.3429
What is the probability that a randomly selected worker is an IT professional?
.4286
Chapter 5: calculate the marginal probability that a country club member plays tennis
.575 230/400=.575
If a randomly selected worker is a government professional, what is the probability that he/she slept on the job?
.64
find probability Z is greater than -2.22
.9868
Chapter 5: XYZ Corp. has filled 100,000 purchase orders during its existence. 1,100 of the purchase orders have had errors. Using an empirical probability, the probability of the next purchase order having an error is __________ (round your answer to three decimal places and enter as a probability not a percentage)
0.011
A recent study by Allstate Insurance Co. finds that 82% of teenagers have used cell phones while driving -standard error
0.0384 ± 0.0013
What is the probability that the average mpg of 4 randomly selected passenger cars is more than 35 mpg?
0.1714 ± 0.01
random samples of 400 are taken from a population whose pop proportion is 0.25. The expected value of the sample proportion is
0.25
The probability of randomly selecting a "spade" from a deck of cards is
0.25 (because 4 suits in a deck)
Chapter 6: The average number of customers arriving at Jimmy's Burgers in a minute is 1.7. Which expression would one use to calculate the probability that at least 4 customers arrive in a randomly chosen minute?
1 - P(X ≤ 3)
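"At least 4" is the complement of "3 or fewer", so the probability is one minus the sum of the Poisson probabilities for 0 through 3. A hedged Python sketch for λ = 1.7; the variable names are illustrative, and the result comes out at roughly 0.093:

```python
import math

lam = 1.7  # average arrivals per minute at Jimmy's Burgers

# P(X <= 3): add the Poisson probabilities for 0, 1, 2 and 3 arrivals
p_at_most_3 = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(4))

# Complement rule: P(X >= 4) = 1 - P(X <= 3)
p_at_least_4 = 1 - p_at_most_3
print(round(p_at_least_4, 3))
```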
in order: steps to calculate the mean absolute deviation
1) calc the arithmetic mean 2) find the absolute diff between each value and the mean 3) sum the absolute differences 4) divide by the sample (or the population) size
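The four steps translate directly into code. A minimal Python sketch; the sample data (4, 5, 6, 9) is borrowed from the mode card elsewhere in this set purely for illustration:

```python
def mean_absolute_deviation(data):
    # 1) calculate the arithmetic mean
    mean = sum(data) / len(data)
    # 2) find the absolute difference between each value and the mean
    abs_diffs = [abs(x - mean) for x in data]
    # 3) sum the absolute differences, 4) divide by the number of values
    return sum(abs_diffs) / len(data)

print(mean_absolute_deviation([4, 5, 6, 9]))  # 1.5
```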
The maximum value of a data set is 200 and the minimum value is 80. The midrange is equal to ____________.
140
Chapter 4: The maximum value of a data set is 200 and the minimum value is 80. The midrange is equal to...
140 ((80 + 200) / 2 = 140)
Chapter 4: Pat's time in the 1600 meter run placed Pat in the 85th percentile in the school. What percentage of students are faster than Pat?
15
Chapter 2: Discrete (example)
A variable with a countable number of distinct values (e.g., the number of dots face up on a roll of a pair of dice)
PDF - probability distribution function
Defined by either a list of X-values and their probabilities or by mathematical equations. A discrete PDF shows the probability of each X-value
stem and leaf diagrams can be used to
Determine how dispersed the data is. Analyze the shape of the data Observe individual data points
Chapter 5: Union
The union of two events consists of all outcomes in the sample space S that are contained either in event A or in event B or in both
Chapter 3: In general, the ___________ limit is included in the bin, while the __________ limit is excluded.
lower, upper
Binary events
mutually exclusive and collectively exhaustive
Binomial probabilities can be difficult to calculate, and therefore appropriate to approximate, when
n is large
Chapter 6: Which of the following criteria indicate it would be acceptable to approximate the hypergeometric with the binomial?
n/N < .05
Which of the following is an example of a continuous random variable?
normal
Which of the following can be used to determine the proportion of data points that fall within a specified number of standard deviations from the mean? a. The mode b. percentiles c. Chebyshev's Theorem d. The empirical rule - assuming a normal distribution
c. Chebyshev's Theorem and d. The empirical rule - assuming a normal distribution
Chapter 4: The summary measures for grouped data are
only approximate values
Chapter 1: Distribution of inventory items in a big box store
operations management
An analyst assigns a sample of bond issues to one of the following credit ratings, given in descending order of credit quality (increasing probability of default): AAA, AA, BBB, BB, CC, D.
ordinal
qualitative data that can be categorized and ranked are measured on the
ordinal scale
based on the pictured histogram (going down from left to right), the distribution is
positively skewed, skewed right
A numerical value that measures the likelihood of an uncertain event occurring is a ________________.
probability
game show contestant chooses between $1500 in cash or a hidden prize. Prize is worth thousands or nothing. The expected value of the prize is 2500. If contestant is risk neutral, they will
choose the prize because the expected value is higher than the cash value
suppose you were told that the delivery time of your new washing machine is equally likely over the time period 9am-12. If we define the random variable X as delivery time, then X follows the
continuous uniform distribution
Chapter 1: Data science
a trilogy of tasks involving data modeling, analysis, and decision making.
the significance level is the probability of making
a type 1 error
suppose the competing hypotheses for a test are Ho: μ <= 33 vs Ha: μ > 33. If the p-value for the hypothesis test is 0.027 and the chosen level of significance is 0.05, then the correct conclusion is
reject Ho and conclude that the population mean is greater than 33 at the 5% sig level
in hyp testing, if the sample data provides significant evidence that the null hyp is incorrect, then we
reject the null hyp
in hypothesis testing, two correct decisions are possible
rejecting the null hypothesis when it is false not rejecting the null when it is true
the critical value approach specifies a region of values, called the ___. If a test statistic falls into this region, we reject the ___
rejection region, null hypothesis
Chapter 3: The vertical (y axis) for an ogive can be labeled as...
relative cumulative frequency, cumulative frequency
a simple random sample is a sample of observations that is
representative of the population from which it was chosen
Chapter 4: The interquartile range of a data set
represents the middle 50% of the data is calculated by subtracting the first quartile from the third quartile
an example of cross-sectional data
results of market research testing consumer preferences for soda
a ___ is a subset of a population
sample
hyper-geometric distribution
similar to the binomial except that sampling is without replacement from a finite population of N items. Therefore the trials are not independent and the probability of success is not constant from trial to trial.
Chapter 5: to calculate a union for two mutually exclusive events,
simply add the individual probabilities of the two events
Chapter 2: Which of the following are examples of the nominal scale?
social security numbers, specialty sandwich names at a fast food restaurant, designating males as 1 and females as 2 to compare gender performance on an aptitude test.
Chapter 4: Place the steps for using the method of medians in finding quartiles in the proper order
sort the observations, find the median of the entire data set Q2, find the median of the data values that lie below Q2
Chapter 1: A company code of ethics addresses things such as (check all that apply)
sources of data inaccuracy, conflicts of interest, policies on confidentiality
Chapter 3: Pareto chart
special type of column chart used in business. displays categorical data, with categories displayed in descending order of frequency, so that the most common categories appear first.
put the following steps in the p-value approach to hypothesis testing in the correct order.
specify the null and alternative hypothesis specify the significance level calculate the value of the test statistic and its p-value state the conclusion and interpret results
Chapter 4: Variability
spread of data values or dispersion
A coach believes laurie has a 0.5 prob of getting a hit against a pitcher that she has never batted against before. This type of probability is best characterized as
subjective probability
The mean of a Bernoulli distribution is π, called the probability of ___________.
success
The expected value of the sum of two or more random equal to the _______ their individual expected values
sum
Chapter 4: The numerical measure, σxy, is used frequently by financial portfolio managers. This measure is called the
covariance
the two equivalent methods to solve a hypothesis test are the
critical value approach p-value approach
contingency table
cross-tabulations of frequencies
branch of stats that summarizes impt aspects of a data set is
descriptive statistics
The following relative frequency histogram summarizes the salaries (in $1,000,000s) for the 30 highest-paid players in the National Basketball Association (NBA) for the 2012 season (www.nba.com, data retrieved March 2012).
a) positively skewed b) 3 earned bet 20-24 mill c) 26 earned bet 12-20 mill
Define event A = {1, 2, 3, 4} and event B = {2, 3, 6, 7}. A or B = a. {1, 2, 3, 4, 6, 7} b. 5 c. {1, 4, 6, 7} d. {2, 3}
a. {1, 2, 3, 4, 6, 7}
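"A or B" is the union of the two events, while "A and B" is their intersection. Python's built-in sets express both directly, which gives a quick way to check the answer choices:

```python
A = {1, 2, 3, 4}
B = {2, 3, 6, 7}

# Union: outcomes in A, in B, or in both
print(sorted(A | B))  # [1, 2, 3, 4, 6, 7]

# Intersection: outcomes in both A and B
print(sorted(A & B))  # [2, 3]
```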
the conclusions of a hypothesis test that are drawn from the p-value approach versus the critical value approach are
always the same
Chapter 3: An outlier is defined as...
an extreme value that is located at the tail of the histogram, an extreme value that might have arisen from a different cause, an extreme value that might have arisen from measurement error.
Chapter 3: Relative frequencies
are calculated as the absolute frequency for a bin divided by the total number of data values.
Chapter 6: Discrete random variables
are countable, can have finite or infinite number of values, have a set of distinct values
Chapter 4: Quartiles
are scale points that divide the sorted data into four groups of approximately equal size, that is, the 25th, 50th, and 75th percentiles, respectively.
Chapter 2: Numerical data (also called quantitative data)... example
arise from counting, measuring something, or some kind of mathematical operation. For example, we could count the number of auto insurance claims filed in March (e.g., 114 claims) or sales for last quarter (e.g., $4,920), or we could measure the amount of snowfall over the last 24 hours (e.g., 3.4 inches).
Chapter 6: Which of the following is an example of a binomial experiment
asking randomly selected people if they are a member of Facebook, ask customers at a movie theater if they spent $20 or more on concessions
a type 2 error occurs when we
do not reject the null hypothesis when it is actually false
relationship between the variance and the SD
the SD is the positive square root of the variance
if two events do not influence each other, the events are
independent
branch of stats that draws conclusions about a large set of data based on a smaller set of data is
inferential statistics
actuarially fair
insurance program must collect as much overall revenue as it pays out in claims. Premiums must be set to reflect empirical experience with the insured group.
The _____________ of two events A and B contains only those outcomes that are in both A and B.
intersection
Mode
Most frequently occurring data value. =MODE.SNGL(data)
Which Excel expression would be used to find the X value associated with the highest 10% if X has a mean = 100 and a standard deviation = 20
NORM.INV(.90,100,20)
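The X value that cuts off the highest 10% is the 90th percentile, so the cumulative probability passed to an inverse-normal function is 0.90. As a rough cross-check outside Excel, Python's `statistics.NormalDist.inv_cdf` plays the same role; the variable name `x_cutoff` is illustrative:

```python
from statistics import NormalDist

# Highest 10% starts at the 90th percentile of a normal with mean 100, SD 20
x_cutoff = NormalDist(mu=100, sigma=20).inv_cdf(0.90)
print(round(x_cutoff, 2))  # 125.63
```

This matches the z-score of about 1.28 for the highest 10%: 100 + 1.28 × 20 ≈ 125.6.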
A machine that is programmed to package 1.20 pounds of cereal is being tested for its accuracy. In a sample of 36 cereal boxes, the mean and standard deviation are calculated as 1.22 pounds and 0.06 pound, respectively -Can you conclude that the machine is working improperly
No
Are the events "overweight" and "obese" exhaustive?
No because you may not be either overweight or obese
in hypothesis testing, two incorrect decisions are possible
Not rejecting the null hypothesis when it is false Rejecting the null hypothesis when it is true
The five nines rule
Prime business customers expect public carrier-class telecommunications data links to be available 99.999% of the time. This equates to about 5 minutes of downtime per year.
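The downtime figure follows from multiplying the unavailable fraction by the minutes in a year. A quick Python check, assuming a 365-day year:

```python
availability = 0.99999            # "five nines"
minutes_per_year = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

downtime_minutes = (1 - availability) * minutes_per_year
print(round(downtime_minutes, 2))  # 5.26
```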
A numerical value that measures the likelihood of an uncertain event is a ________?
Probability
Chapter 2: Observation
is a single member of a collection of items that we want to study, such as a person, firm, or region. An example of an observation is an employee or an invoice mailed last month.
Chapter 4: Standard deviation
is a single number that helps us understand how individual values in a data set vary from the mean. Because the square root has been taken, its units of measurement are the same as X (e.g., dollars, kilograms, miles).
event
is any subset of outcomes in the sample space
Interquartile range
The first and third quartiles Q1 and Q3 indicate the center because they define the boundaries for the middle 50% of the data, but Q1 and Q3 also indicate variability because the interquartile range Q3-Q1 (IQR) measure the degree of spread in the data (middle 50%)
It is meaningful to compute the probability that a continuous random variable
is between two numbers, is less than or equal to the number, is greater than or equal to a number
Chapter 4: Trimmed mean
is calculated like any other mean, except that the highest and lowest k percent of the observations in the sorted data array are removed. The trimmed mean mitigates the effects of extreme high values on either end. For a 5 percent trimmed mean, the Excel function is =TRIMMEAN(Data, 0.10) because .05 + .05 = .10. For the J.D.
a continuous random variable has the uniform distribution on the interval [a,b] if its probability density function f(x)
is constant for all x between a and b, and 0 otherwise
Chapter 3: If the data were collected from a random sample we must allow for __________ error.
Sampling
All are characteristics of the normal distribution except
it is a discrete distribution
The Empirical Rule says that for data from a normal distribution, we expect the interval μ ± kσ to contain a known percentage of the data:
k=1: 68.26% will lie within μ ± 1σ; k=2: 95.44% will lie within μ ± 2σ; k=3: 99.73% will lie within μ ± 3σ
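As a rough check on these percentages, here is a minimal Python sketch using the standard normal CDF. The helper name `share_within` is illustrative, not from the cards. Exact values differ slightly in the second decimal from the table-based figures above (for example 68.27% rather than 68.26%).

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Share of a normal population lying within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"k={k}: {share_within(k):.2%}")
```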
Chebyshev's Theorem says that for any population with mean μ and standard deviation σ:
k=2: at least 75% will lie within μ ± 2σ; k=3: at least 88.9% will lie within μ ± 3σ; k=4: at least 93.8% will lie within μ ± 4σ
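The Chebyshev percentages come from the bound 1 - 1/k². A minimal sketch (the function name `chebyshev_lower_bound` is illustrative):

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum share of any population within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


for k in (2, 3, 4):
    print(f"k={k}: at least {chebyshev_lower_bound(k):.1%}")
```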
CDF - cumulative distribution function
The CDF shows the cumulative sum of probabilities, adding from the smallest to the largest X-value.
Chapter 4: When the data are skewed right (or positively skewed)...
the mean exceeds the median.
Chapter 4: When calculating a mean for grouped data
the midpoint of each bin is used to approximate the individual values in that bin
Chapter 4: Mode
the most frequently occurring data value.
Chapter 3: The length of a bar or height of a column in a bar or column chart represents the ______________ of a category.
value
Chapter 2: The sample size is determined by the _____________ in the population of interest and the desired ______________ of the parameter being estimated.
variability, precision
an example of a random variable that closely follows the normal distribution is
weight of a box of cookies
Chapter 4: Coefficient of variation
a unit-free measure of dispersion: the standard deviation divided by the mean, often expressed as a percent.
the probability distribution of the sample mean is commonly referred to as the
sampling distribution of x̄
The z score associated with the highest 10% is closest to
1.28
Variance
=VAR.S(Data)
Range (R)
The difference between the largest and smallest observations. =MAX(Data)-MIN(Data)
the probability of a Type I error is commonly denoted as
α (alpha)
Chapter 5: Simple event
an elementary event, a single outcome
we can reject the null when the
p-value < α
example of inferential statistics
testing the longevity of all light bulbs based on a sample of 100 light bulbs
Trimmed mean
Same as the mean except omit the highest and lowest k% of data values (e.g., 5%) =TRIMMEAN(Data, Percent)
Suppose a data set has 80 data points. A 5% trimmed mean would be calculated by removing the ___________ highest values and the __________ lowest values.
4 4
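A minimal sketch of the trimming arithmetic, assuming k percent is removed from each end and any fractional count is rounded down. The names `trim_count` and `trimmed_mean` and the sample data are illustrative, not from the cards.

```python
def trim_count(n: int, k_percent: int) -> int:
    """Number of values removed from EACH end for a k-percent trimmed mean."""
    return n * k_percent // 100


def trimmed_mean(data, k_percent: int) -> float:
    """Mean of the sorted data after dropping k percent from each end."""
    values = sorted(data)
    cut = trim_count(len(values), k_percent)
    kept = values[cut:len(values) - cut]
    return sum(kept) / len(kept)


print(trim_count(80, 5))                   # values removed from each end
print(trimmed_mean(range(1, 81), 5))       # illustrative data: 1, 2, ..., 80
```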
Chapter 4: The mode(s) for the data set: 4, 4, 5, 6, 9, 9 is
4 and 9
Chapter 4: If a company sold 1000 units in its first year of operation, and 1400 units in its second year of operation, then the growth rate of the company's sales is _______ %
40
Chapter 6: 20% of a restaurant's customers order the chef's special. 230 customers are anticipated to dine tonight at the restaurant. The expected number of chef's specials that will be ordered tonight is
46 (230 x 0.2 = 46)
20% of a restaurant's customers order the chef's special. 230 customers are anticipated to dine tonight at the restaurant. The expected number of chef's specials that will be ordered tonight is ______________.
46 230 x 0.2 = 46
Chapter 5: which statement given below is correct
A and B are dependent events because P(B | A) ≠ P(B) or 35/60 ≠ 85/120 A and B are dependent events because P(A | B) ≠ P(A) or 35/85 ≠ 60/120
Chapter 2: Continuous (example)
A numerical variable that can have any value within an interval (e.g., the finishing times for running the 100 meter dash)
A machine that is programmed to package 1.20 pounds of cereal is being tested for its accuracy. In a sample of 36 cereal boxes, the mean and standard deviation are calculated as 1.22 pounds and 0.06 pound, respectively -Calculate the critical value(s) at a 5% level of significance
+- 2.03 ± 0.001
Chapter 6: In the binomial distribution which expression represents the probability of failure?
1 - π
Chapter 6: A company receives an average of 0.8 purchase orders per minute. The company wants to determine the probability of receiving 6 purchase orders in 5 minutes. What value should the company use for the mean to help calculate the probability?
4
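The mean of 4 comes from rescaling the rate to the 5-minute window (0.8 x 5). A minimal sketch of that step and the resulting Poisson probability, where `poisson_pmf` is an illustrative helper name:

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)


rate_per_minute = 0.8
minutes = 5
lam = rate_per_minute * minutes   # mean number of orders in 5 minutes
print(lam, round(poisson_pmf(6, lam), 4))
```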
For a continuous uniform random variable X with a = 10 and b = 20, P(X <= 14) =
0.4 (14 - 10) / (20 - 10) = 0.4
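For a continuous uniform variable on [a, b], P(X <= x) is the fraction of the interval lying at or below x. A minimal sketch, assuming the card refers to the uniform distribution (`uniform_cdf` is an illustrative name):

```python
def uniform_cdf(x: float, a: float, b: float) -> float:
    """P(X <= x) for a continuous uniform random variable on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


print(uniform_cdf(14, 10, 20))   # 0.4
print((10 + 20) / 2)             # expected value (a + b) / 2
```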
new dining plan. residents were asked to give a grade of A, B, C, D, F to the new plan. Of 482 residents, 234-A, 148-B, 87-C, 9-D. How many F's
4
find z value that satisfies P(Z>z) = 0.0951
1.31
Chapter 4: The population standard deviation of the data set 3, 4, 5, 6, and 7 is ________________. (round your final answer to 1 decimal place)
1.4
z value that satisfies P(Z ≤ z) = 0.9207
1.41
Chapter 4: The sample standard deviation of the data set 3, 4, 5, 6, and 7 is ____________.
1.6
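The population and sample standard deviations differ only in the divisor (N versus n - 1). A quick check of the two cards for the data set 3, 4, 5, 6, 7 using Python's standard library:

```python
from statistics import pstdev, stdev

data = [3, 4, 5, 6, 7]

population_sd = pstdev(data)   # divides the sum of squared deviations by N
sample_sd = stdev(data)        # divides by n - 1

print(round(population_sd, 1), round(sample_sd, 1))
```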
Center for Studying Health System Change Expected number of individuals who will delay or go without medical care?
1.6 ± 1
zα/2 for 90%
1.65
Chapter 4: The range for the data set: 2, 5, 5, 7, and 10 is:
8
Midrange
=0.5*(MIN(Data)+MAX(Data))
Geometric mean (G)
=GEOMEAN(Data)
Sample standard deviation (s)
=STDEV.S(Data)
Standard deviation
=STDEV.S(Data)
Sample variance (s^2)
=VAR.S(Data)
we do not reject the null hypothesis when the p-value is
≥ α
Binomial shape
A binomial distribution is skewed right if π < .50, skewed left if π > .50, and symmetric only if π = .50. Use =BINOM.DIST(x, n, π, cumulative)
Chapter 5: A travel website wants to provide information comparing hotel costs versus the quality ranking of the hotel for hotels in New York City. One way to summarize this data would be...
A contingency table
Consider a random variable X that denotes a random delivery time anywhere between 9 am and 10 am . X would reasonably be
A continuous uniform random variable
Bernoulli experiment
A random experiment that has only two outcomes.
random experiment
An observational process whose results cannot be known in advance
How does an ogive differ from a polygon?
An ogive is a graphical depiction of a cumulative frequency or cumulative relative frequency distribution, while a polygon is a graphical depiction of a frequency or relative frequency distribution
Chapter 5: Which theorem provides a method for revising probabilities to reflect new information ?
BAYES THEOREM
Chapter 4: Place the following steps in order, from beginning to end, to create a box plot
Calculate the five-number summary values; plot the five-number summary values in numerical order on a horizontal or vertical axis; draw a box from Q1 to Q3; then add lines from Q1 to the minimum value and from Q3 to the maximum value.
Chapter 2: Census or Sample A) Budget constraints can make this necessary. B) Legal requirements sometimes mandate this. C) An examination of all items in a population D) Looking at only selected items in a population
Census: B and C Sample: A and D
Empirical approach / relative frequency approach
Counting the frequency of observed outcomes (f) defined in our experimental sample space and dividing by the number of observations (n). The estimated probability is f/n.
for a discrete random variable, the variance of X is calculated as
Σ (xi - μ)^2 P(X = xi)
Independent
Each occurrence has no effect on the probability of the other events occurring
A subset of the sample space is an __________
Event
Chapter 5: A subset of the sample space is a/an
Event
Chapter 5: Events that cannot both occur at the same time are mutually ________ events
Exclusive
A trial, or process, that produces several possible outcomes is referred to as an_______
Experiment
The _______ formula is used to determine the number of possible ways to arrange (n) items when there are no groups
Factorial
In October 2010, Massachusetts enacted a law that forbids cell phone use by drivers under the age of 18. A policy analyst would like to determine whether the law has decreased the proportion of drivers under the age of 18 who use a cell phone -Select the null and the alternative hypotheses to test the policy analyst's objective.
H0: p ≥ 0.82; HA: p < 0.82
when constructing classes for frequency distribution of quantitative data, which of the following principles should generally be followed
In general, the classes should be the same width. The classes should be mutually exclusive (each data value should fit in only one class). The classes should be exhaustive.
Chapter 6: Generally, a hypergeometric distribution will be used instead of a binomial distribution when
Independence of the trials is uncertain, there is a finite population, sampling is done without replacement
avg of the absolute differences between the values of the data set and the mean
MAD (mean absolute deviation)
Chapter 4: Median < Midhinge ⇒ Skewed right (longer right tail) Median ≅ Midhinge ⇒ Symmetric (tails roughly equal) Median > Midhinge ⇒ Skewed left (longer left tail)
Median < Midhinge⇒ Skewed right (longer right tail) Median ≅ Midhinge⇒ Symmetric (tails roughly equal) Median > Midhinge⇒ Skewed left (longer left tail)
Chapter 5: Random experiment
Observational process whose results cannot be known in advance
Chapter 2: Coding (Example)
On occasion the values of the categorical variable might be represented using numbers (a database might code payment methods using numbers: 1 = cash 2 = check 3 = credit/debit card 4 = gift card).
For two events A and B, the multiplication rule is
P(A∩B)= P(A|B)xP(B)
Chapter 1: Which of the following are examples of inferential statistics?
Prof. Stats randomly selects 50 female students at State University to estimate the average height of all female students at State. A manufacturer of light bulbs randomly selects 100 light bulbs to test the longevity of all light bulbs that the company produces.
Excel quartiles
Q1: =QUARTILE.EXC(Data, 1) (25% below) Q2: =QUARTILE.EXC(Data, 2) (50% below) Q3: =QUARTILE.EXC(Data, 3) (75% below)
Find the first and third quartiles from the following data set using the method of medians: 2, 3, 3, 5, 6, 8, 12.
Q1=3, Q3=8
A variable that assigns numerical values to the outcomes of a random experiment
Random variable
Chapter 5: A trial, or process, that produces several possible outcomes that cannot be known in advance is called an/a ......
Random experiment
Standardized data
Redefine each observation in terms of its distance from the mean in standard deviations
Chapter 6: 20% of a restaurant's customers order the chef's special. 230 customers are anticipated to dine at the restaurant tonight. The standard deviation for this binomial distribution is (round your answer to 3 decimal places).
√(230 x 0.2 x 0.8) = 6.066
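The restaurant cards use the binomial mean nπ and standard deviation √(nπ(1 - π)). A minimal sketch reproducing both figures (`binomial_mean_sd` is an illustrative name):

```python
import math


def binomial_mean_sd(n: int, pi: float) -> tuple[float, float]:
    """Mean and standard deviation of a binomial distribution with n trials."""
    mean = n * pi
    sd = math.sqrt(n * pi * (1 - pi))
    return mean, sd


mean, sd = binomial_mean_sd(230, 0.2)
print(round(mean, 3), round(sd, 3))
```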
Chapter 2: Systematic Sample
Select every kth item from a list or sequence (restaurant customers).
Chapter 2: Cluster Sample
Select random geographical regions (e.g., zip codes) that represent the population.
Chapter 2: Stratified Sample
Select randomly within defined strata (by age, occupation, gender).
Chapter 2: Binary Variable (example)
Some categorical variables have only two values (employment status: employed or unemployed; mutual fund type: load or no-load; marital status: currently married or not currently married). Binary variables are often coded using a 1 or 0 (a variable such as gender could be coded as 1 = female, 0 = male).
z-score
Standardized values that can tell us how far away from the mean each observation lies
Poisson distribution (model of rare events)
The number of occurrences within a randomly chosen unit of time or space. All characteristics are determined by its mean λ. Its standard deviation is the square root of its mean.
Chapter 2: Which of the following characteristics of interest is a variable?
The number of pizzas ordered from Pizza Hut per day.
The probability that a continuous random variable takes on a particular value is zero because of which of the following reasons?
The area under a curve at a single point is zero
Law of Large Numbers
As the number of trials increases, any empirical probability approaches its theoretical limit.
Chapter 3: Which characteristic below is NOT a rule of thumb for displaying categorical data on a column chart?
The height of each column should be the same
To calculate the variance of the sum of two independent random variables
The individual variances are summed
Skewed left (negatively skewed)
The mean is below the median. Long tail of histogram points left (a few low values but most data on right) Mean<median
Chapter 6: In which of the following situations is it appropriate to use a Poisson process?
The number of customers who purchase concessions every five minutes, while a movie is playing at a theater The number of landfills per county in the state of Texas
Joint Probability
The probability that A and B both occur, P(A ∩ B)
Chapter 5: Which of the following are examples of conditional probabilities?
The probability of Aunt Joe going to the movie, given that Derek is going to the movie. If Neil has already purchased groceries, the probability of Cam purchasing groceries.
Chapter 3: A relative frequency distribution for quantitative data identifies...
The proportion of observations that occur in each bin
Exercise 1-5 Recent research suggests that depression significantly increases the risk of developing dementia later in life (BBC News, July 6, 2010). In a study involving 949 elderly persons, it was reported that 22% of those who had depression went on to develop dementia, compared to only 17% of those who did not have depression.
The sample consists of 949 elderly people. The population is all elderly people. The numbers 22% and 17% represent the sample statistics.
Mean absolute deviation (MAD)
This statistic reveals the average distance from the center. Absolute values must be used since otherwise the deviations around the mean would sum to zero. =AVEDEV(Data)
Chapter 4: True or False: The empirical rule should be applied to data sets that are normally distributed or nearly normally distributed.
True
Chapter 5: True or false: probability is a number that describes uncertainty
True
T/F: when constructing a joint probability table, the cell in the lower right corner must always equal 1.0
True
Chapter 2: Simple Random Sample (and example)
Use random numbers to select items from a list (Visa cardholders).
classical approach
Using deduction to determine probability
Chapter 2: XYZ Corporation made a profit of $3 million last year. ABC Corporation made a profit of $6 million last year. Based on the ratio scale, which of the following is an accurate statement about the relationship between ABC's profit and XYZ's profit?
XYZ was half as profitable as ABC.
It is appropriate to approximate the Poisson distribution with the normal distribution when
λ is large
Are the events "overweight" and "obese" mutually exclusive
Yes because you cannot be both overweight and obese
An economist reports that 506 out of a sample of 1,200 middle-income American households actively participate in the stock market. -Can we conclude that the proportion of middle-income Americans who actively participate in the stock market is not 50%
Yes, since the confidence interval does not contain the value 0.50
An article in the National Geographic News argues that Americans are increasingly skimping on their sleep -Can we conclude with 95% confidence that the mean sleep time of all adult residents in this Midwestern town is not 7 hours?
Yes, since the confidence interval does not contain the value 7.
In October 2010, Massachusetts enacted a law that forbids cell phone use by drivers under the age of 18. A policy analyst would like to determine whether the law has decreased the proportion of drivers under the age of 18 who use a cell phone -At α = 0.05, do you reject the null hypothesis?
Yes, since the p-value is smaller than α
coefficient of variation
a relative measure of dispersion
One common way to describe a Poisson process is a. the model of departure times. b. the model of arrivals c. the model of dependent events
b. the model of arrivals
Chapter 4: For small data sets, you can find the quartiles using the method of medians:
Step 1: Sort the observations. Step 2: Find the median Q2. Step 3: Find the median of the data values that lie below Q2. Step 4: Find the median of the data values that lie above Q2.
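The four steps above can be sketched in Python as follows. This sketch assumes the median itself is excluded from both halves when the number of observations is odd, which matches "values that lie below/above Q2"; the function name is illustrative. The sample data is the set 2, 3, 3, 5, 6, 8, 12 from the quartile card.

```python
from statistics import median


def quartiles_by_medians(data):
    """Return (Q1, Q2, Q3) using the method of medians."""
    values = sorted(data)                 # Step 1: sort the observations
    n = len(values)
    q2 = median(values)                   # Step 2: find the median Q2
    half = n // 2
    lower = values[:half]                 # values below Q2
    upper = values[half + 1:] if n % 2 else values[half:]   # values above Q2
    return median(lower), q2, median(upper)   # Steps 3 and 4


print(quartiles_by_medians([2, 3, 3, 5, 6, 8, 12]))   # (3, 5, 8)
```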
Chapter 3: Cumulative relative
frequencies accumulate relative frequency values as the bin limits increase.
a ____ is a way to organize qualitative data into categories and record the number of observations in each category
frequency distribution
random variable
function or rule that assigns a numerical value to each outcome in the sample space of a random experiment
Chapter 3: Bar
horizontal display of data
the sample size required to approximate the normal distribution depends on
how much the population varies from normality
Chapter 4: If covariance is positive, then as one variable increases the other variable will generally
increase
An investor collects data on the weekly closing price of gold throughout a year.
ratio
Chapter 1: Descriptive statistics
refers to the collection, organization, presentation, and summary of data (either using charts and graphs or using a numerical summary).
The screening process for detecting a rare disease is not perfect. Researchers have developed a blood test that is considered fairly reliable. It gives a positive reaction in 98% of the people who have that disease. However, it erroneously gives a positive reaction in 3% of the people who do not have the disease. Answer the following questions using the null hypothesis as "the individual does not have the disease." Type 2 error prob
0.02
The screening process for detecting a rare disease is not perfect. Researchers have developed a blood test that is considered fairly reliable. It gives a positive reaction in 98% of the people who have that disease. However, it erroneously gives a positive reaction in 3% of the people who do not have the disease. Answer the following questions using the null hypothesis as "the individual does not have the disease." -Type 1 error
0.03
The High Roller Casino puts the odds of a certain baseball team winning the World Series at 1 to 30 (1:30). Based on those odds, what is the probability that this baseball team will win the WS?
0.032 1 / (1 + 30) = 0.0323
Chapter 5: The sports book at the High Roller Casino put the odds of a certain baseball team winning the World Series at 1:25 (1 to 25). Based on those odds, what is the probability that this baseball team will win the World Series?
0.0385 1 / (1 + 25) = 0.0385
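Odds of a to b in favor of an event convert to a probability of a / (a + b). A minimal sketch covering the odds cards in this set (`prob_from_odds` is an illustrative name):

```python
def prob_from_odds(a: float, b: float) -> float:
    """Probability of an event whose odds in favor are a to b."""
    return a / (a + b)


print(round(prob_from_odds(1, 25), 4))   # odds of 1 to 25 of winning
print(round(prob_from_odds(1, 30), 4))   # odds of 1 to 30 of winning
print(round(prob_from_odds(7, 1), 3))    # odds of 7 to 1 against: P(not winning)
```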
A researcher wants to determine if Americans are sleeping less than the recommended 7 hours of sleep on weekdays. -Calculate the p-value to test the researcher's claim at α = 0.01
0.0401 ± 0.004
A machine that is programmed to package 1.20 pounds of cereal is being tested for its accuracy. In a sample of 36 cereal boxes, the mean and standard deviation are calculated as 1.22 pounds and 0.06 pound, respectively -Approximate the p-value.
0.05 < p-value < 0.10
at a local diner, 30% order the special. What is the probability that exactly 1 of the next 5 order the special.
0.36
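The 0.36 comes from the binomial formula with n = 5, π = 0.30, and x = 1. A minimal sketch (`binomial_pmf` is an illustrative name):

```python
import math


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """P(X = x) successes in n independent trials with success probability pi."""
    return math.comb(n, x) * pi**x * (1 - pi) ** (n - x)


print(round(binomial_pmf(1, 5, 0.30), 2))
```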
data set has a pop SD of 4 units and a pop mean of 10 units, the coeff of var is
0.4
Chapter 5: The highest possible probability, of the choices below, for an event is
1.0
True or false: A Bernoulli experiment has three possible outcomes.
false
The odds against a horse winning a race were set at 7 to 1. The probability of that horse not winning the race is ___________ . Answer should be in decimal form, using 3 decimal places.
0.875 7 / (7 + 1) = 0.875
Kareem is trying to decide which college to attend full time next year. Kareem believes there is a 55% chance that he will attend State College and a 33% chance that he will attend Northern University. The probability that Kareem will attend either State or Northern is _________ [state your answer as a decimal and round your answer to two decimal places.]
0.88
Chapter 5: 82% of companies ship their products by truck. 47% of companies ship their product by rail. 40% of companies shipped by truck and rail. What is the probability that a company ships by truck or rail?
0.89
In a particular industry, it is known that 82% of companies ship their products by truck and 47% of companies ship their product by rail. Forty percent of companies ship by truck and rail. The probability that a company ships by truck or rail is
0.89 0.82 + 0.47 - 0.40 = 0.89
A list of the top twenty restaurants in Chicago was released. Four of the restaurants specialize in seafood. If five restaurants are selected randomly from the list, the expected value for the number of restaurants specializing in seafood is _______________.
1
A list of the top twenty restaurants in Chicago was released. Four of the restaurants specialize in seafood. Five of the restaurants are selected randomly from the list. The expected value for the number of restaurants specializing in seafood is
1
Contingency tables are useful to analyze [check all that apply] a. one quantitative variable b. relative frequencies c. the results of a survey d. one qualitative variable
b. relative frequencies and c. the results of a survey
The range is the difference between a. the first and third quartiles b. the largest and smallest values c. the mean and the median
b. the largest and smallest values
A box plot is constructed using several different values. Which of the following values from a data set are included in a box plot? a. the fifth quartile b. the largest value c. the first quartile d. the mean e. the mode
b. the largest value and c. the first quartile
Which of the items describes the usefulness of a standard deviation? a. to compare variables with different units of measure b. to gauge the relative position of data values within the data set c. to help estimate the mean d. to determine the median of the data set
b. to gauge the relative position of data values within the data set
which best represents an empirical probability
based on past data, a manager believes there is a 70% chance of retaining an employee for at least one year
Chapter 3: Pie chart
because of their visual appeal, pie charts appear daily in company annual reports and the popular press
The probability that a discrete random variable equals any of its values is
between zero and one, inclusive
Chapter 2: In sampling, ____________ refers to the tendency to over- or underestimate a population parameter of interest.
bias
The probability of randomly selecting a "spade" from a deck of cards is a. 0.33 b. 0.019 c. 0.25 d. 0.077
c. 0.25
Chapter 2: Data Set
consists of all the values of all of the variables for all of the observations we have chosen to observe.
the alternative hypothesis
contests the status quo for which a corrective action may be required
for hotels in NYC, a site wants to provide info comparing hotel costs versus the quality ranking of the hotel. A useful way to summarize this data is to construct a(n)
contingency table
a random variable X with an equally likely chance of assuming any value within a specified range is said to have which distribution
continuous uniform distribution
Calculate the marginal probability that a country club member plays tennis. T No T Totals G 180 No G 50 100 Totals 230 400 a. 0.25 b. 0.125 c. 0.45 d. 0.575
d. 0.575 230/400 = 0.575
Chapter 4: Quartiles divide the data into __________ equal parts.
four
Chapter 1: Statistic
is a single measure, reported as a number, used to summarize a sample data set
The _________ is the measure of the center that identifies the most frequently occurring value in the data set.
mode
when summarizing a qualitative data set, the __ is the best measure of central location
mode
the ordinal scale of data measurement is
more sophisticated than the nominal scale
in general, the null and alternative hypothesis are
mutually exclusive
Chapter 4: For a population (N items or infinite) they are
parameters
conditional probability
probability of event A given that B has occurred
least accurate
ratio scale is used for a qualitative variable
if an event is getting a letter grade of A in your stats class, what is the complement of receiving an A
receiving any grade except an A
Chapter 1: Inferential statistics
refers to generalizing from a sample to a population, estimating unknown population parameters, drawing conclusions, and making decisions.
Chapter 3: The rectangles of a histogram...
represent grouped data, represent the class width and frequency/relative frequency of the respective class, are drawn with no space gaps between them except when there is no data in a particular bin
a continuous random variable X follows the uniform distribution with a lower limit of a and an upper limit of b. The expected value of X is
(a+b)/2
Chapter 4: Geometric mean
(denoted G) is a multiplicative average, obtained by multiplying the data values and then taking the nth root of the product. This is a measure of central tendency used when all the data values are positive (greater than zero).
Quartiles
(denoted Q1, Q2, Q3) are scale points that divide the sorted data into four groups of approximately equal size, that is, the 25th, 50th, and 75th percentiles respectively
Chapter 6: An apartment complex rents an average of 2.3 new units per week. If the number of apartments rented each week is Poisson distributed, then the probability of renting no more than 1 apartment in a week is
(e^-2.3)(2.3^1) / 1! + (e^-2.3)(2.3^0) / 0!
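"No more than 1" means P(X = 0) + P(X = 1). A minimal sketch that evaluates the expression above for λ = 2.3 (`poisson_pmf` and `poisson_cdf` are illustrative helper names):

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x): sum the PMF from 0 through x."""
    return sum(poisson_pmf(i, lam) for i in range(x + 1))


print(round(poisson_cdf(1, 2.3), 4))
```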
in order to approximate class width for a frequency distribution of quantitative data, we calculate:
(largest value - smallest value) / number of classes
Chapter 5: The phone relative frequencies help determine if a residence location affects whether the residents get season passes to the pool located on the west side. What is the probability that a resident lives on the west side and does not have a season pass?
0.1
Chapter 5: The probability of a customer purchasing popcorn at the movie theater is 0.3. What is the probability that a customer will not purchase popcorn?
0.7
prob that a customer orders popcorn is 0.4. The prob that they order a drink is 0.65. The prob that they order popcorn and a drink is 0.3. If they have already ordered popcorn, what is the prob that they will order a drink?
0.75
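The 0.75 is the conditional probability P(drink | popcorn) = P(popcorn and drink) / P(popcorn). A minimal sketch with the card's numbers (`conditional_probability` is an illustrative name), which also shows the two events are not independent because 0.75 differs from P(drink) = 0.65:

```python
def conditional_probability(p_a_and_b: float, p_b: float) -> float:
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b


p_popcorn, p_drink, p_both = 0.4, 0.65, 0.3
p_drink_given_popcorn = conditional_probability(p_both, p_popcorn)
print(round(p_drink_given_popcorn, 2))
print(abs(p_drink_given_popcorn - p_drink) < 1e-9)   # independence check
```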
An analyst has developed the following probability distribution of the rate of return for a common stock. Rate of return is
1%
Chapter 4: A certain value has a standardized score = 1.75. How many standard deviations from the mean does this value fall? Is this value greater than or less than the mean?
1.75 standard deviations, greater than the mean
A machine that is programmed to package 1.20 pounds of cereal is being tested for its accuracy. In a sample of 36 cereal boxes, the mean and standard deviation are calculated as 1.22 pounds and 0.06 pound, respectively -test statistic
2 ± 0.50
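A minimal sketch of the test statistic for the cereal-machine cards, t = (x̄ - μ0) / (s / √n). The critical value 2.03 is taken from the related card rather than computed, and `one_sample_t` is an illustrative name.

```python
import math


def one_sample_t(sample_mean: float, mu0: float, sample_sd: float, n: int) -> float:
    """t statistic for testing H0: mu = mu0 when sigma is unknown."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error


t_stat = one_sample_t(1.22, 1.20, 0.06, 36)
critical_value = 2.03                    # two-tailed, 5% level, from the card
reject_null = abs(t_stat) > critical_value
print(round(t_stat, 2), reject_null)
```

Because the statistic falls just inside the critical values, the null hypothesis is not rejected at the 5% level under these assumptions.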
The median of an exponential distribution with lambda = 0.3 arrivals per minute is
2.3
zα/2 for 98%
2.33
Chapter 4: A company sold 1000 units in its first year of operation, 1400 units in its second year of operation, and 1680 units in the third year of operation. The average growth rate of the company's sales from years 1 to 3 is _______ %
29.61 √(1680 / 1000) - 1 = 0.2961
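The average growth rate over n periods is the nth root of (last / first), minus 1. A minimal sketch with the card's sales figures (`average_growth_rate` is an illustrative name); years 1 to 3 span two growth periods:

```python
def average_growth_rate(first: float, last: float, periods: int) -> float:
    """Geometric mean growth rate per period between two values."""
    return (last / first) ** (1 / periods) - 1


print(f"{average_growth_rate(1000, 1680, 2):.2%}")   # years 1 to 3
print(f"{average_growth_rate(1000, 1400, 1):.2%}")   # years 1 to 2
```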
median for data set 10,6,4,9,5
6
An article in the National Geographic News argues that Americans are increasingly skimping on their sleep. A researcher in a small Midwestern town wants to estimate the mean weekday sleep time of its adult residents. He takes a random sample of 80 adult residents and records their weekday mean sleep time as 6.4 hours. Assume that the population standard deviation is fairly stable at 1.8 hours. -Calculate a 95% confidence interval for the population mean weekday sleep time of all adult residents of this Midwestern town
6.01 ± 0.02 to 6.79 ± 0.02
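With a known population standard deviation, the interval is x̄ ± z(α/2) · σ / √n. A minimal sketch reproducing the sleep-time interval (`z_interval` is an illustrative name):

```python
import math
from statistics import NormalDist


def z_interval(sample_mean: float, sigma: float, n: int, confidence: float):
    """Confidence interval for a population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


low, high = z_interval(6.4, 1.8, 80, 0.95)
print(f"{low:.2f} to {high:.2f}")
```

Since 7 lies outside this interval, the sketch agrees with the related card's conclusion that the mean sleep time is not 7 hours at 95% confidence.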
Chapter 6: Generally, a positive covariance between assets' returns in a portfolio indicates
A greater risk in investing in the portfolio
Chapter 2: Match the survey type to an issue for consideration. A) Mail B) Telephone C) Interviews D) Web 1) Expect low response rates due to a poorly targeted list of people. 2) Growing in popularity but need to be well-targeted. 3) Can be expensive but results are often high-quality. 4) Requires a well targeted, current list of people.
A) 4 B) 1 C) 3 D) 2
A researcher wants to determine if Americans are sleeping less than the recommended 7 hours of sleep on weekdays. -Make an inference
Americans do not sleep less than the recommended 7 hours of sleep
The relative frequency of an event is used to calculate what type of probability?
An empirical probability
Chapter 5: Three views of probability and their meaning
Empirical: estimated from observed outcome frequency (example: there is a 3.2% chance of twins in a randomly chosen birth). Classical: known a priori by the nature of the experiment (example: there is a 50% chance of heads on a coin flip). Subjective: based on informed opinion or judgment (example: there is a 60% chance that Toronto will bid for the 2024 winter Olympics).
Chapter 4: True or False: the standard deviation can be a negative value
False
Chapter 3: Which variables could be displayed on a log scale?
GDP, real estate values in a fast moving market
A machine that is programmed to package 1.20 pounds of cereal is being tested for its accuracy. In a sample of 36 cereal boxes, the mean and standard deviation are calculated as 1.22 pounds and 0.06 pound, respectively. -Select the null and the alternative hypotheses to determine if the machine is working improperly, that is, it is either underfilling or overfilling the cereal boxes.
H0: µ = 1.20; HA: µ ≠ 1.20
the alternative hypothesis for a two sided test for a population mean would be denoted as
HA: μ ≠ μ0
Chapter 4: mean
It is the sum of the data values divided by the number of data items. For a population we denote it μ, while for a sample we call it x̄.
Chapter 3: Set bin limits
Just as choosing the number of bins requires judgment, setting the bin limits also requires judgment. For guidance, find the approximate width of each bin by dividing the data range by the number of bins: Bin width ≈ (xmax - xmin) / k
Are the events "IT Professional" and "Slept on the Job" independent?
No because P("IT" | "Yes") ≠ P("IT").
Uniform distribution
One of the simplest discrete models. It describes a random variable with a finite number of consecutive integer values from a to b. The entire distribution depends on only two parameters.
sample space S= (win,loss).
P(win)=.8 ; P(loss)=.2
Chapter 5: The following contingency table can help determine if residence location is independent of whether or not the resident purchases a season pass to the pool. The pool is located on the west side. Which comparison of probabilities would you choose to show dependence between residence location and whether or not a resident purchased a season pass?
P, season pass and lives west. P, seasons pass P, seasons pass and lives east. P, seasons pass
Chapter 5: Event
Subset of outcomes in the sample space
Chapter 5: A conditional probability is calculated by dividing the probability of the intersection of two events by the probability of...
The given event
Kurtosis
The relative length of the tails and the degree of concentration in the center
sample space
The set of all possible outcomes (S) for the experiment
Which of the following is NOT an example of an experiment?
The winner of last week's lottery drawing (has only one outcome). Examples of experiments (multiple outcomes): asking someone who they think will win the World Series; the number of computers that will be sold next month at a computer store; selecting a card from a deck of cards.
Chapter 1: True or False: Successful businesses expect their employees to have some knowledge of statistics.
True
Chapter 1: True or False: There are generally accepted statistical methods for dealing with missing data and unusual data.
True
True or false: E(X)=u
True
True or false: σ^2 represents the variance
True
Chapter 3: Dot plots can show which features of a data set?
center, variability, shape
total area under the normal curve is
equal to 1
The _______ mean is the multiplicative average of the data set.
geometric
Chapter 1: The specialized vocabulary of statistics crosses __________ barriers to improve problem solving for multinational businesses.
language
events that cannot both occur on the same trial of an experiment are
mutually exclusive events
The z score associated with the lowest 2.5% of a normal distribution will be
negative
A Poisson process is used to observe the number of occurrences
over time or in space
one method of graphical presentation for qualitative data is a
pie chart
Chapter 4: The first step to determine the median is to
place the data in numerical order
all possible outcomes of an experiment is the ___ space
sample
hypothesis testing enables us to determine if the collected ___ data is inconsistent with what is stated in the null hypothesis
sample
The set of all possible outcomes from a random experiment is called a _________ __________.
sample space
Chapter 3: A __________ ____________ is a type of data that allows researchers to investigate the relationship between two variables.
scatter plot
At a small firm in Boston, seven employees were asked to report their one-way commute time (in minutes) into the city. Their responses were.
shortest: 20 longest: 90 mean: 45 median:40 mode:35
Chapter 2: The __________ population contains all the individuals in which one is interested.
target
the null hypothesis in a hypothesis test refers to
the default state of nature
which can be represented by a continuous random variable
the temp in Tampa, FL during july
Which of the following is an example of a discrete random variable?
binomial
Chapter 4: The first quartile and third quartile are also the
25th and 75th percentiles
σ
population SD
a compound event
is expressed using an inequality.
Annual growth rates for individual firms in the toy industry tend to fluctuate dramatically, depending on consumers' tastes and current fads. Consider the following growth rates (in percent) for two companies in this industry, Hasbro and Mattel.
variability for: Hasbro: 8.61% Mattel: 6.56% greater variability? Hasbro
a characteristic of interest that differs among various observations is referred to as a ___
variable
most accurate statement about variance
variance is the avg of the squared deviations from the mean
to calculate the union for two mutually exclusive events A and B
we add the probability of A to the probability of B
the expected value of the discrete random variable X is
weighted avg of all possible values of X
An example of random variable that closely follows the normal is
weights of newborn babies
for a hypothesis test of μ when σ is known, the value of the test statistic is calculated as
z = (x̄ − μ0) / (σ / √n)
when performing a hypothesis test on μ when σ is known, H0 can never be rejected if
z ≥ 0 for a left-tailed test
if p̄ is the value that a normal random variable assumes, then we can transform it into its standard normal value as
z = (p̄ − p) / √(p(1 − p)/n)
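A minimal Python sketch of the test statistic on this card. The function name `z_statistic` and the sample numbers are hypothetical, chosen only to illustrate the arithmetic.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """z statistic for a test of the mean when the population SD is known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

# Hypothetical inputs (not from the cards): sample mean 52, hypothesized mean 50,
# population SD 10, sample size 25.
z = z_statistic(sample_mean=52.0, mu0=50.0, sigma=10.0, n=25)
print(z)  # standard error is 10 / 5 = 2, so z = 2 / 2 = 1.0
```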
Chapter 4: Which of the following correlation coefficients indicate the strongest inverse relationship between two variables?
-0.87
When rolling a pair of dice and summing the two values rolled, which of the following are exhaustive events?
-a value of 6 or more and a value of 8 or less -a value of 7 or more and a value of 6 or less -an even number and an odd number. Not: a value of 9 or more and a value of 7 or less
Recognizing a Poisson application
-occurs randomly over time or space -average arrival rate remains constant -arrivals are independent of each other -The random variable X is the number of event within an observed time interval
suppose you are performing a hypothesis test on μ and the value of σ is known. At the 5% significance level, the critical value(s) for a two-tailed test is (are)
−z0.025 and z0.025
Harris Interactive for job site table
.2214 .3657 .587 .2071 .2057 .413 .4286 .5714 1.00
If a randomly selected worker slept on the job, what is the probability that he/she is an IT professional?
.3772
prob that a customer orders popcorn is 0.4. The prob that the order a drink is 0.65. The prob that they order popcorn and a drink is 0.3. If they have already ordered a drink, what is the prob that they will order popcorn?
.46
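The popcorn card can be checked with a short Python sketch of the conditional probability rule, P(A | B) = P(A and B) / P(B). The helper name `conditional` is an illustrative assumption.

```python
def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B); requires P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b

# From the card: P(popcorn and drink) = 0.3, P(drink) = 0.65.
p_popcorn_given_drink = conditional(0.3, 0.65)
print(round(p_popcorn_given_drink, 2))  # 0.46
```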
What is the probability that a randomly selected worker slept on the job?
.587
The probability of an employee getting a promotion is 0.20. The probability of an employee having an MBA is 0.30. The probability of an employee getting a promotion given that the employee has an MBA is 0.25. The probability that an employee has an MBA and gets a promotion is
0.075 P(A∩B)= P(A|B) x P(B)= 0.25 x 0.30 = 0.075
Chapter 4: For a sample of numerical data, we are interested in three key characteristics:
center, variability, and shape.
the joint probability table examines whether a residents location, east or west side, affects whether the resident gets season passes to the pool on the west side. Given that a resident has a season pass, what is the prob the resident lives on the east side
0.40
The probability of Margaret receiving a promotion at XYZ corp. is 0.70. The prob of Katia receiving a promotion at ABC inc. is 0.60. If the two promotions are independent, then the probability of both Margaret and Katia receiving a promotion is _______________
0.42 Since the events are independent, 0.70 x 0.60 = 0.42
due to symmetry, the probability that the standard normal random variable Z is less than 0 is equal to
0.5
the probability that a normal random variable X is less than its mean is equal to
0.50
Chapter 4: Which of the following characteristics can be seen on a boxplot?
center, variability, shape
a population has a mean of 50 and a SD of 10. A random sample of 256 is selected. The SD of x̄ is equal to
0.625
The probability that Anthony is on time for work is 0.90. The probability that Anthony takes the train to work is 0.80. Given that Anthony takes the train to work, the probability that he is on time is 0.95. The probability that Anthony is on time for work and takes the train is
0.76 P(on time ∩ train) = P(on time | train) x P(train) = 0.95 x 0.80 = 0.76
to find the P(Z>0.93) find the row containing ______ in the far left column. Then find the column containing _____ in the top row. P(Z>0.93) = 1 - _________ (round to 4 decimals) = ______ (round to four decimal)...(Please use Appendix C-2.)
0.9, 0.03, 0.8238, 0.1762
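As a cross-check on the table lookup, here is a sketch that computes the standard normal CDF from `math.erf` instead of reading Appendix C-2. The function name `standard_normal_cdf` is an assumption for illustration.

```python
import math

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

lower = standard_normal_cdf(0.93)   # area to the left of 0.93
upper = 1.0 - lower                 # P(Z > 0.93)
print(round(lower, 4), round(upper, 4))  # 0.8238 0.1762
```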
Chapter 4: The mean absolute deviation for the sample data set: 3, 4, 5, and 8 is
1.5
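A small Python sketch of the mean absolute deviation for this data set: average the absolute distances from the mean.

```python
from statistics import mean

data = [3, 4, 5, 8]
center = mean(data)                                 # arithmetic mean = 5
mad = mean([abs(x - center) for x in data])         # (2 + 1 + 0 + 3) / 4
print(mad)  # 1.5
```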
an investment strategy has an expected return of 12% and a SD of 10%. If investment returns are normal, the probability of earning a return of less than 2%
16%
The Chartered Financial Analyst -confidence interval?
170,180 to 145,820
a population has a mean of 100 and a SD of 12. A random sample of 36 is selected. The SD of x̄ is equal to
2
Mike is placing a bet on an upcoming horse race in which seven horses are running. Mike places a trifecta bet that wins only if he correctly picks the first, second, and third place horses in order. In how many different ways can Mike select three horses when order matters?
210 n!/(n-x)! = 7!/(7-3)! = 5040/24 = 210
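The permutation count can be confirmed in Python. This sketch assumes Python 3.8 or later, where `math.perm` is available, and also evaluates the factorial formula directly.

```python
import math

# Ordered selections of 3 horses out of 7.
trifecta_orders = math.perm(7, 3)

# Same count from the formula n! / (n - x)!.
by_formula = math.factorial(7) // math.factorial(7 - 3)

print(trifecta_orders, by_formula)  # 210 210
```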
Chapter 6: 20% of a restaurant's customers order the chef's special. 230 customers are anticipated to dine tonight at the restaurant. The expected number of chef's specials that will be ordered tonight is 46.
230x0.2 = 46
If the median price for a home is $200,000, then _______% of homes cost less than $200,000.
50
Chapter 4: For the data set 4, 5 , 6, and 9 the arithmetic mean is
6...... add values together and divide by 4
local restaurant, 20% of customers order chefs special. Binomial random variable X is the number that order the special. If they expect 230 customers, then the SD for this is
6.1
U.S. consumers are increasingly viewing debit cards as a convenient substitute for cash and checks. -What is the interquartile range of this distribution
670 ± 5%
U.S. consumers are increasingly viewing debit cards as a convenient substitute for cash and checks. -25th percentile of the amount spent on a debit card
7,455 ± 2.5%
A data set has a mean of 1500 and a standard deviation of 100. Using Chebyshev's theorem, what percentage of the observations fall between: 1300 and 1700? 1100 and 1900?
75% 94%
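Chebyshev's theorem says at least 1 − 1/k² of observations lie within k standard deviations of the mean (for k > 1). A sketch applying it to this card; the function name is an illustrative assumption.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k SDs of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1.0 - 1.0 / k**2

mean, sd = 1500, 100
k_narrow = (1700 - mean) / sd   # 1300 to 1700 is 2 SDs either side
k_wide = (1900 - mean) / sd     # 1100 to 1900 is 4 SDs either side

bound_narrow = chebyshev_lower_bound(k_narrow)
bound_wide = chebyshev_lower_bound(k_wide)
print(bound_narrow, bound_wide)  # 0.75 0.9375 (about 94%)
```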
a 95% confidence interval for the mean value of a store's customer accounts is computed as $850 ± 70, then the null hypothesis of a two-tailed hypothesis test would be rejected if the value of μ0 is less than $____ or greater than $____
780, 920
The range for the data set: 2, 5, 5, 7, and 10 is _____________.
8
data set, 2,5,5,7,10. range is
8
U.S. consumers are increasingly viewing debit cards as a convenient substitute for cash and checks. -75th percentile of the amount spent on a debit card
8,125 ± 2.5%
Chapter 4: Suppose a data set has 80 data points. A 5% trimmed mean would be calculated by removing the ______________ highest values and the ______________ lowest values.
80 x 5% = 4....... answer is 4
trial, or process, that produces several possible outcomes is referred to as a(n)
experiment
Chapter 4: The empirical rule states that approximately _______% of observations will fall within 3 standard deviations of the mean
99.7% (68% within 1, 95% within 2)
The empirical rule states that approximately ________% of observations will fall within two standard deviations of the mean.
95.44%
for a binomial random variable X, the probability of x successes in n Bernoulli trials is calculated as
P(X = x) = [n! / (x!(n − x)!)] × p^x × (1 − p)^(n − x)
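A minimal Python sketch of the binomial probability formula, using `math.comb` for n!/(x!(n − x)!). The values n = 5, p = 0.2, x = 1 are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Hypothetical example: 5 trials, success probability 0.2, exactly 1 success.
prob_one = binomial_pmf(1, 5, 0.2)   # 5 * 0.2 * 0.8**4 = 0.4096

# Sanity check: probabilities over x = 0..n should sum to 1.
total = sum(binomial_pmf(x, 5, 0.2) for x in range(6))
print(round(prob_one, 4), round(total, 4))
```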
If an experiment is selecting a card from a deck of cards, then the sample space is
All the cards in the deck (-not just face cards, red cards, aces, etc.)
Population variance (sigma squared)
Sum of the squared deviations from the mean divided by the population size.
Chapter 4: Symmetric
Tails of histogram are balanced (low/high values offset)
Chapter 3: Stem-and-Leaf Plot
The stem-and-leaf plot is a tool of exploratory data analysis (EDA) that seeks to reveal essential data features in an intuitive way. A stem-and-leaf plot is basically a frequency tally, except that we use digits instead of tally marks
which is most accurate
a parameter is a constant although its value may be unknown
which is true about a sample statistic such as the sample mean or sample proportion
a sample statistic is a random variable
A triangular distribution that has a mode = 0, a lower limit = −2.45, and an upper limit = 2.45 is similar to
a standard normal
The average number of customers arriving at Jimmy's Burgers in a minute is 1.7. Which expression would one use to calculate the probability that at least 4 customers arrive in a randomly chosen minute? a. 1 − P(X ≤ 3) b. P(X ≤ 3) c. 1 − P(X ≤ 4) d. P(X > 4)
a. 1 − P(X ≤ 3)
Which of the following are characteristics of a binomial distribution? a. Each trial is independent of the previous trial b. Each trial has only two possible outcomes c. It is a continuous distribution d. For each trial the probability of success remains the same
a. Each trial is independent of the previous trial and b. Each trial has only two possible outcomes and d. For each trial the probability of success remains the same
When calculating a percentile, the first step is to arrange the data set in: a. groups of ten units b. ascending order (from least to greatest) c. descending order (from greatest to least) d. classes of equal width
b. ascending order (from least to greatest)
Chapter 3: There are several guidelines one should follow when creating graphs. Which of the following describe these guidelines?
axes should be clearly labeled, axes that are numerical should be to the appropriate scale, novelty graphs such as pyramid charts introduce ambiguity
The following relative frequencies help determine if residents' location affects whether the residents get season passes to the pool on the west side. Columns: East, West, Total. Pass: 0.30, 0.45, 0.75. None: 0.15, 0.10, 0.25. Total: 0.45, 0.55, 1.00. a. 0.55 b. 0.1 c. 0.15 d. 0.45
b. 0.1
Calculate the joint probability that a country club member plays tennis and golf. Columns: Tennis, No tennis, Total. Golf: 180, (blank), (blank). No golf: (blank), 50, 100. Totals: 230, (blank), 400. a. 0.25 b. 0.45 c. 0.125 d. 0.575
b. 0.45
A company receives an average of 0.8 purchase orders per minute. The company wants to determine the probability of receiving 6 purchase orders in 5 minutes. What value should the company use for the mean to help calculate the probability? a. 6 b. 4 c. 5 d. 0.8
b. 4
for a continuous random variable X, the number of possible values
cannot be counted
The following contingency table can help determine if residents' location is independent of whether or not the resident purchases a season pass to the pool. [The pool is located on the West side.] Columns: East, West, Total. SPass: 35, 50, 85. NPass: 25, 10, 35. Total: 60, 60, 120. Which comparison of probabilities would you choose to show dependence between a resident's location and whether or not they purchased a season pass? a. P(Season Pass and Lives West) and P(Lives East) b. P(Season Pass | Lives West) and P(Season Pass) c. P(Season Pass | Lives East) and P(Season Pass) d. P(Season Pass and Lives East) and P(No Season Pass)
b. P(Season Pass | Lives West) and P(Season Pass), and c. P(Season Pass | Lives East) and P(Season Pass)
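The comparison in this card can be sketched in Python with the counts from the table (35, 50, 25, 10). The dictionary layout and helper name are illustrative assumptions; the idea is that independence would require each conditional probability to equal the marginal P(Season Pass).

```python
import math

# Counts from the season-pass contingency table on this card.
counts = {
    ("pass", "east"): 35, ("pass", "west"): 50,
    ("none", "east"): 25, ("none", "west"): 10,
}
total = sum(counts.values())  # 120 residents

# Marginal probability of holding a season pass.
p_pass = (counts[("pass", "east")] + counts[("pass", "west")]) / total

def p_pass_given(side):
    """P(Season Pass | lives on the given side)."""
    side_total = counts[("pass", side)] + counts[("none", side)]
    return counts[("pass", side)] / side_total

p_pass_west = p_pass_given("west")
p_pass_east = p_pass_given("east")

# Independent only if both conditionals match the marginal.
independent = (math.isclose(p_pass_west, p_pass)
               and math.isclose(p_pass_east, p_pass))
print(round(p_pass, 4), round(p_pass_west, 4), round(p_pass_east, 4), independent)
```

Here the conditionals (about 0.83 for West, 0.58 for East) differ from the marginal (about 0.71), which is what signals dependence.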
Which of the following experiments is likely to produce a uniform discrete distribution? a. Test scores on college entrance exam, such as the ACT or SAT b. The answer selected on a multiple choice question that has four choices by a student who did not study c. The values that occur from repeated spins of a roulette wheel at a casino d. The number of patrons arriving every 3 minutes at a sandwich shop.
b. The answer selected on a multiple choice question that has four choices by a student who did not study [the student would be guessing so each answer is equally likely.] and c. The values that occur from repeated spins of a roulette wheel at a casino
In which of the following situations is it appropriate to use a Poisson process? [check all that apply] a. The number of people out of 20 who prefer Pepsi-Cola to Coca-Cola in a blind taste test b. The number of customers who purchase concessions every 5 minutes, while a movie is playing at a theater c. Twenty firms are randomly selected from the S&P 500 and asked whether they will increase hiring over the next year. d. The number of landfills per county in the state of Texas.
b. The number of customers who purchase concessions every 5 minutes, while a movie is playing at a theater and d. The number of landfills per county in the state of Texas.
Which of the following should one look for when identifying a hypergeometric application? [check all that apply] a. sampling with replacement b. a known number of successes, s c. a finite population, N d. probability of success is constant
b. a known number of successes, s and c. a finite population, N
Chapter 2: The nominal scale of measurement is used to...
categorize unranked data
Which of the following are examples of conditional probabilities? [check all that apply] a. The probability of Amir purchasing a video game or the probability of Natasha purchasing a video game. b. The probability of Marilyn going to the football game and Tom going to the football game. c. If Neil has already purchased groceries, then the probability of Colleen purchasing groceries. d. The probability of Angel going to the movie, given that Derrick is going to the movie.
c. If Neil has already purchased groceries, then the probability of Colleen purchasing groceries. and d. The probability of Angel going to the movie, given that Derrick is going to the movie.
Which of the following random variables meets the criteria for a hypergeometric distribution? a. The average number of adults who have a graduate degree is 0.7/household. Let X be the number of adults in a household who have a graduate degree. b. Suppose 30% of the population have a graduate degree. Define X to be the number of adults in a sample of 20 who have earned a graduate degree. c. Out of 50 adults, 10 have a graduate degree. A sample of 20 is taken. Define X to be the number of adults in the sample with a graduate degree.
c. Out of 50 adults, 10 have a graduate degree. A sample of 20 is taken. Define X to be the number of adults in the sample with a graduate degree.
Which of the following events are mutually exclusive? a. Being of German descent and being of Mexican descent. b. Passing a statistics test and passing an English test c. Rolling an odd number and an even number on the same roll of the die. d. Being on time and being late for an appointment.
c. Rolling an odd number and an even number on the same roll of the die. and d. Being on time and being late for an appointment.
Choose the logical binomial random variable a. The number of late shipments for two different companies b. The number of late shipments and missing shipments out of the next 12 sent c. The number of late shipments out of the next 12 sent by one company d. The number of late shipments over the next month
c. The number of late shipments out of the next 12 sent by one company
A travel web site wants to provide information comparing hotel costs versus the quality ranking of the hotel for hotels in New York City. One way to summarize this data would be a. a histogram b. a frequency distribution c. a contingency table
c. a contingency table
Which of the following should one look for when identifying a hypergeometric application? [check all that apply] a. probability of success is constant b. sampling with replacement c. a known number of successes s d. a finite population N
c. a known number of successes s and d. a finite population N
When comparing two data sets with different units of measurement, what is the relative measure of dispersion? a. the standard deviation b. the range c. the coefficient of variation
c. the coefficient of variation
which is not a step we use when formulating the null and alternative hypotheses
calculate the value of the sample statistic
A festival has become so popular that it must limit the number of tickets it issues. People who hope to attend the festival send in a request for tickets, and requests are filled by random selection. Only 21% of the ticket requests are fulfilled. what are the odds of not receiving a ticket for a random applicant? a. 3.34 to 1 b. 3.88 to 1 c. 2.98 to 1 d. 3.76 to 1
d. 3.76 to 1 (1 − 0.21)/0.21 = 3.76
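The odds calculation in this card, sketched in Python. The helper name `odds_against` is an assumption; it divides the probability of the event not occurring by the probability that it occurs.

```python
def odds_against(p):
    """Odds against an event with probability p: (1 - p) / p."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return (1 - p) / p

# 21% of ticket requests are fulfilled.
odds = odds_against(0.21)
print(round(odds, 2))  # 3.76, i.e. 3.76 to 1
```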
for a continuous random variable, one characteristic of its probability density function f(x) is that
f(x) ≥ 0 for all values x of X
Which of the following is not an example of an experiment? a. Picking a team that will win the World Series b. Buying a computer based on repair history. c. Selecting a card from a deck of cards. d. Pick the team that won last year's World Series.
d. Pick the team that won last year's World Series.
Using the multiplication rule, the joint probability of event A and event B is computed by multiplying the conditional probability of event A given event B by the probability of: a. event B given event A b. event A c. the union of A and B d. event B
d. event B
The correlation coefficient describes the degree of ______ between two ________ variables. a. nonlinearity; qualitative b. nonlinearity; quantitative c. linearity; qualitative d. linearity; quantitative
d. linearity; quantitative
Chapter 5: Playing tennis and playing golf are ____________ (choose between dependent and independent)
dependent
Chapter 4: The sample correlation coefficient
describes the degree of linearity between paired observations on two quantitative variables X and Y
Chapter 1: Collecting, organizing and summarizing a particular data set are known as __________ statistics.
descriptive
Chapter 3: Stem-and-leaf displays can be used to
determine central tendency and dispersion, analyze the small samples of integer data
Chapter 4: Standardized data
Created by transforming each value of the observed data into a z-score
A researcher wants to determine if Americans are sleeping less than the recommended 7 hours of sleep on weekdays. -Calculate the critical value at α = 0.01.
-2.33 ± 0.01
Chapter 5: Calculate the joint probability that a country club member plays tennis and golf
.45
Chapter 5: Calculate the conditional probability that a country club member does not play tennis given that they do not play golf
.5
A normal random variable X has a mean μ = 100 and a standard deviation σ = 20. P(X <= 110) = ? Round your answer to 4 decimals.
.6915
Chapter 5: Corey and Laurie process purchase orders for Acme Inc. Laurie processes 60% of the purchase orders; Corey processes 40% of the purchase orders. 5% of Laurie's purchase orders have errors and 2.5% of Corey's purchase orders have errors. A purchase order has an error; the probability that Laurie processed the erroneous purchase order is... Round your answer to two decimal places
.75
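The purchase-order card is an application of Bayes' theorem. A short Python sketch of the arithmetic, with variable names chosen here for illustration:

```python
# Shares of purchase orders processed by each person.
p_laurie, p_corey = 0.60, 0.40

# Error rates for each person's purchase orders.
p_err_given_laurie, p_err_given_corey = 0.05, 0.025

# Total probability of an error: 0.6*0.05 + 0.4*0.025 = 0.04.
p_error = p_laurie * p_err_given_laurie + p_corey * p_err_given_corey

# Bayes' theorem: P(Laurie | error) = P(Laurie and error) / P(error).
p_laurie_given_error = p_laurie * p_err_given_laurie / p_error
print(round(p_laurie_given_error, 2))  # 0.75
```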
Chapter 5: Calculate the conditional probability that a country club member plays golf given that they play tennis
.783
A list of the top twenty restaurants in Chicago was released. Four of the restaurants specialize in seafood. If five of the restaurants are selected randomly from the list, the standard deviation for the number of restaurants specializing in seafood is ________ (round your answer to 4 decimal places).
.7947
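A sketch of the hypergeometric standard deviation behind this answer, using the standard formula sqrt(n · (s/N) · (1 − s/N) · (N − n)/(N − 1)). The function name is an illustrative assumption.

```python
import math

def hypergeometric_sd(N, s, n):
    """SD of the number of successes when sampling n items without
    replacement from a population of N that contains s successes."""
    p = s / N
    finite_correction = (N - n) / (N - 1)
    return math.sqrt(n * p * (1 - p) * finite_correction)

# 20 restaurants, 4 seafood, sample of 5.
sd = hypergeometric_sd(N=20, s=4, n=5)
print(round(sd, 4))  # 0.7947
```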
Chapter 6: A company receives an average of .64 purchase orders per minute. Assuming a Poisson distribution for the number of purchase orders per minute, what is the standard deviation for this distribution?
.8
A company received an average of .64 purchase orders per minute. Assuming a Poisson distribution for the number of purchase orders per minute, what is the standard deviation for this distribution? a. 1.28 b. 2 c. 0.8 d. 0.41
.8 [The standard deviation is the square root of the mean in a Poisson distribution.]
in a particular industry, it is known that 82% of companies ship their products by truck and 47% of companies ship their product by rail. 40% of companies ship by truck and rail. The probability that a company ships by truck or rail is
.89
Loans that are 60 days or more past due are considered seriously delinquent. The Mortgage Bankers Association reported that the rate of seriously delinquent loans has an average of 9.1% (The Wall Street Journal, August 26, 2010). Let the rate of seriously delinquent loans follow a normal distribution with a standard deviation of 0.80%. Above 8%? Between 9.5% and 10.5%?
.9162 ± 0.005 0.2684 ± 0.005
for a discrete probability distribution, the probability of each value x is
0 ≤ P(X = x) ≤ 1
Chapter 4: In which of the following data sets would the arithmetic mean NOT be a good measure of central location
0, 8, 8, 9, 10.... (0 is considered an outlier for this set of data)
Chapter 5: In order to make sure Sam gets to his final exam on time he set three alarm clocks that work independently of each other. Assume the probability of any one of the alarm clocks working is equal to .98. What is the probability that Sam is late to his final exam?
0.000008 P(late) = P(all three alarms fail) = .02x.02x.02
Center for Studying Health System Change At least seven will delay or go without medical care
0.0001 ± 0.001
The manufacturer of liquid laundry detergent has a 0.02 probability that the detergent bottles will be improperly filled. There is a 0.03 probability that the label on the bottle will not be affixed properly. If the events of bottle fill and affixing the label are independent, then the probability of a bottle being filled improperly and having an improperly affixed label is
0.0006 Since the events are independent, we calculate 0.02 x 0.03 = 0.0006
XYZ Corp. has filled 100,000 purchase orders during its existence. 1,100 of the purchase orders have had errors. Using empirical probability, the probability of the next purchase order having an error is __________ (round your answer to three decimal places and enter as a probability not a percentage).
0.011 1100/100,000 = 0.011
If 4 passenger cars are randomly selected, what is the probability that all of the passenger cars get more than 35 mpg?
0.0138 ± 0.01
the joint probability table examines whether a residents location, east or west side, affects whether the resident gets season passes to the pool on the west side. Prob that a resident lives on the west side and does not have a pass
0.10
new dining plan. 480 residents. 234: B 85: C 13: D 4: F proportion of grades designated as A were
0.30
A recent study by Allstate Insurance Co. finds that 82% of teenagers have used cell phones while driving -probability that the sample proportion is less than 0.80?
0.3015 ± 0.01
Chapter 6: Professor Stats has 40 students in her statistics class. 24 of the students are male. If she selects 6 of her students at random, without replacement, the probability of selecting 4 men in the sample is
0.332
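This answer comes from the hypergeometric probability C(s, x) · C(N − s, n − x) / C(N, n). A Python sketch using `math.comb`; the function name is an assumption for illustration.

```python
import math

def hypergeometric_pmf(x, N, s, n):
    """P(exactly x successes) when drawing n items without replacement
    from N items of which s are successes."""
    return math.comb(s, x) * math.comb(N - s, n - x) / math.comb(N, n)

# 40 students, 24 men, sample of 6, exactly 4 men.
prob = hypergeometric_pmf(x=4, N=40, s=24, n=6)
print(round(prob, 3))  # 0.332
```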
A recent study by Allstate Insurance Co. finds that 82% of teenagers have used cell phones while driving -probability that the sample proportion is within ± 0.02 of the population proportion
0.3970 ± 0.01
Let P(A)=0.30 and P(B)=0.40. Suppose A and B are independent events. Calculate P(B | A)
0.40 Since A and B are independent events, P(B)=P(B|A)
area under a normal curve below its expected value is
0.5
The probability of a customer purchasing popcorn at the movie theater is 0.3. What is the probability that a customer does not purchase popcorn?
0.7
Chapter 5: The probability of Anthony being on time for work is 0.9. The probability that Anthony will take the train to work is 0.8. The probability that Anthony will be on time for work if he took the train is 0.95. The probability that Anthony is on time for work and took the train is... round 2 decimal places
0.76 0.95 x 0.8 = 0.76
Chapter 3: Place the following steps in order to explain how to construct a polygon.
1. Construct a frequency distribution 2. Find the midpoint for each class of the frequency distribution 3. The midpoints are plotted based on the frequency for the respective class 4. Neighboring midpoints are connected together by a straight line
Match the following terms with their meaning: 1. Mesokurtic 2. Platykurtic 3. Leptokurtic ___ A flatter distribution than normal with heavier tails ___ Normal bell-shaped distribution ___ A sharply peaked distribution with thinner tails
1. Mesokurtic is a normal bell-shaped distribution 2. Platykurtic is a flatter distribution than normal with heavier tails 3. Leptokurtic is a sharply peaked distribution with thinner tails
Chapter 6: The average number of customers arriving at Jimmy's Burgers is 1.7 per minute. What is the probability only 1 customer arrives in the next minute? (Round to three decimals and do not enter a percentage)
(1.7^1 × e^−1.7) / 1! = .311
Chapter 6: The average number of customers arriving at Jimmy's Burgers is 1.7 per minute. What is the probability that only 1 customer arrives in the next minute? (Round to three decimals and do not enter a percentage.)
(1.7^1 × e^−1.7) / 1! = .311
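A sketch of the Poisson calculation, using the rate of 1.7 customers per minute that appears in the answer's formula. The function name `poisson_pmf` is an illustrative assumption. The second quantity shows the "at least 4" case as 1 − P(X ≤ 3).

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return lam**x * math.exp(-lam) / math.factorial(x)

rate = 1.7  # average arrivals per minute

p_exactly_one = poisson_pmf(1, rate)
print(round(p_exactly_one, 3))  # 0.311

# At least 4 arrivals: complement of 0, 1, 2, or 3 arrivals.
p_at_least_four = 1 - sum(poisson_pmf(x, rate) for x in range(4))
print(round(p_at_least_four, 4))
```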
Chapter 4: For a given distribution, the range is 60. Assuming the distribution is bell-shaped, the estimated standard deviation =
10 (since the range is 60, xmax - xmin, you divide that number, which is 60, by 6)
exam to 50 students. High was 98, low was 48. Frequency distribution is divided into 5 classes. Class width for the data is
10 points
a population has a mean of 100 and a SD of 10. A random sample of 25 is selected. The expected value of x̄ is equal to
100
XYZ corporation makes widgets. 1% of the widgets are defective. XYZ manufactures 100,000 widgets, the number of defective widgets is expected to be
1000 100,000 x 0.01
Chapter 4: A data set has 60 data points sorted from lowest to highest value. The 20th percentile value will be the ___________th data point, starting from the lowest value.
12
A random variable X has μ = 25 and σ = 5. A new random variable Y = 5X. The mean of Y is ____________.
125 5(25) = 125
Chapter 5: Mike is placing a bet on an upcoming horse race in which seven horses are running. Mike places a trifecta bet that wins only if he correctly picks the first, second, and third place horses in order. There are how many possible outcomes for the first three horses in the correct order.?
210
Mike is placing a bet on an upcoming horse race in which 7 horses are running. Mike places a trifecta bet that wins only if he correctly picks the first, second, and third place horses in order. There are _________ possible outcomes for the first three horses in the correct order.
210 [permutation] nPr = n!/(n − r)!
A company sold 1000 units in its first year of operation, 1400 units in its second year of operation, and 1680 units in the third year of operation. The average growth rate of the company's sales for years one to three is ______%. (round your final answer to a decimal answer with four places and then convert to % with 2 decimals)
29.61% square root of (1680/1000) − 1 = 0.2961
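The average growth rate here is a geometric mean: with two growth periods between year one and year three, take the square root of the overall ratio and subtract 1. A Python sketch with an assumed helper name:

```python
def average_growth_rate(first, last, periods):
    """Geometric mean growth rate over the given number of periods."""
    return (last / first) ** (1 / periods) - 1

# Year 1 to year 3 spans two growth periods.
rate = average_growth_rate(first=1000, last=1680, periods=2)
print(round(rate, 4))          # 0.2961
print(f"{rate:.2%}")           # 29.61%
```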
In general, a data point is considered an outlier if it falls more than ________ standard deviations away from the average.
3
A festival has become so popular that it must limit the number of tickets it issues. People who hope to attend the festival send in a request for tickets, and requests are filled by random selection. Only 21% of the ticket requests are fulfilled. The odds that a random applicant does not receive a ticket are
3.76 to 1. The odds against A occurring equal (1 − P(A))/P(A)
Quartiles divide the data into _______ (choose a number) equal parts
4
Suppose 20% of a business's employees commute by bus. How many employees will have to be sampled in order to find the first employee who commutes by bus?
5 1/.2 = 5
Chapter 6: Suppose 20% of a business's employees commute by bus. How many employees will have to be sampled in order to find the first employee who commutes by bus?
5 1/.2=5
a population has a mean of 50 and a SD of 10. A random sample of 144 is selected. The expected value of x̄ is equal to
50
Chapter 5: Dimitri is the coach of the high school mathletes team. There are 8 mathletes but only five may represent the school at the upcoming math tournament. How many ways can Dimitri randomly choose 5 mathletes from the 8 eligible mathletes?
56
For the data set 4,5,6, and 9 the arithmetic mean is
6
The median for the data set 10, 6, 4, 9, 5 is
6
If the mean time between arrivals is 10 minutes, then the average arrival rate is
6 per hour
Chapter 6: 20% of a restaurant's customers order the chef's special. 230 customers are anticipated to dine at the restaurant tonight. The standard deviation for this binomial distribution is ________ (round your answer to 3 decimal places)
6.066
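For a binomial random variable the mean is n·p and the standard deviation is sqrt(n·p·(1 − p)). A quick Python check of the chef's special numbers:

```python
import math

n, p = 230, 0.20  # customers expected, probability of ordering the special

expected_specials = n * p                 # binomial mean
sd_specials = math.sqrt(n * p * (1 - p))  # binomial standard deviation

print(round(expected_specials, 3), round(sd_specials, 3))  # 46.0 6.066
```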
Recent home sales in a suburb of Washington, D.C., are shown in the accompanying ogive. Approximate the percentage of houses that sold for more than $500,000.
60%
which would the arithmetic mean NOT be a good measure of central location
7,8,8,9,25
Chapter 5: Graduates with the top three highest GPAs will be honored at the school's graduation ceremony. The highest GPA will receive the Gold Award, the second highest will receive the Silver Award, and the third highest GPA will receive the Bronze Award. If there are 10 graduating seniors, how many different arrangements of honored students are possible?
720 = 10!/(10 − 3)!
Chapter 4: Using Chebyshev's theorem at least ________________ % of observations should fall within 2.5 standard deviations of the mean.
84
Chapter 5: The following contingency table was created by a marketing firm to show which customers were considered younger and which were considered older as well as showing if they were monthly, weekly shoppers
90
Chapter 5: The following contingency table was created by a middle school to determine the relationship between a student taking a foreign language (either Spanish or French) and whether the student plays in the school band
90
contingency table was created by a middle school to determine the relationship between a student taking a foreign language and whether the student plays in the school band. The number of students that do not play in the band is
90
XYZ. 1% of 100,000 widgets are defective. Variance is
990
Chapter 4: If Fund A has a coefficient of variation of 1.1, and Fund B has a coefficient of variation of 0.9, Fund ______ has a greater
A
Chapter 5: If 180 of the 400 country club members play tennis and golf, 180 would be placed in the contingency table in the cell with the letter
A
Chapter 6: Which of the following are characteristics of the geometric distribution ?
A Series of Bernoulli trials, counts the number of trials until the first success
which are examples of a binomial experiment
Asking customers at a movie whether they spent $20 or more on concessions; asking randomly selected people whether they are a FB member
Chapter 1: Which statistical pitfall does the following statement match? People who belong to health clubs tend to have college degrees therefore exercise increases your IQ
Assuming A Causal Link
Subjective
Based on informed opinion or judgement
Which of the following BEST represents an empirical probability?
Based on past data, a manager believes there is a 70% chance of retaining an employee for at least one year. (not: the probability of tossing a head on a coin is 0.5) - this is an a priori probability
which are mutually exclusive
Being on time and being late for an appointment Receiving an A and receiving a B as a final grade
Chapter 5: A country club has 50 members out of a total of 400 members who play neither tennis nor golf. The value 50 should be inserted into the contingency table in the cell with the letter
C. This cell includes non-tennis players, but it includes golfers and non-golfers.
Chapter 5: For any given event, the sum of the probability of that event and the probability of its _________ must equal one
Complement
A _________ probability is the probability of an event given that another event has already occurred.
Conditional
Chapter 2: If each observation represents a different individual unit (like a person, firm, geographic area) at the same point in time, we have
Cross sectional data
A researcher wants to determine if Americans are sleeping less than the recommended 7 hours of sleep on weekdays. -What is the conclusion?
Do not reject H0 since the p-value is greater than α
A machine that is programmed to package 1.20 pounds of cereal is being tested for its accuracy. In a sample of 36 cereal boxes, the mean and standard deviation are calculated as 1.22 pounds and 0.06 pound, respectively -What is the conclusion at the 5% significance level?
Do not reject H0 since the p-value is greater than α.
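A hedged sketch of the test statistic for the cereal example. It assumes a two-tailed test of H0: μ = 1.20 with the population standard deviation unknown, so a t statistic with n - 1 = 35 degrees of freedom applies. The critical value 2.030 is an assumption read from a standard t table (α = 0.05, df = 35); it is not computed here.

```python
import math

mu0 = 1.20   # hypothesized mean weight (pounds)
xbar = 1.22  # sample mean
s = 0.06     # sample standard deviation
n = 36       # sample size

# t statistic: (xbar - mu0) / (s / sqrt(n))
t_stat = (xbar - mu0) / (s / math.sqrt(n))

# Assumption: two-tailed critical value t(0.025, df=35) taken from a t table
t_crit = 2.030
reject = abs(t_stat) > t_crit
print(round(t_stat, 3), reject)
```

Because |t| is just below the critical value, the sketch agrees with the card: do not reject H0 at the 5% level.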
Chapter 5: True or false: The probability of winning a lottery is .0000000012. The Law of Large Numbers says that because this probability is so small, no one should ever win a lottery.
False
Chapter 5: The people who are involved in which of the following areas talk more commonly about odds rather than speaking of probability
Games of chance, sports
A researcher wants to determine if Americans are sleeping less than the recommended 7 hours of sleep on weekdays. He takes a random sample of 150 Americans and computes the average sleep time of 6.7 hours on weekdays. Assume that the population is normally distributed with a known standard deviation of 2.1 hours -Select the relevant null and the alternative hypotheses
H0: μ ≥ 7; HA: μ < 7
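For the sleep example, the left-tailed z statistic and its p-value can be sketched as follows, assuming the known-σ z test implied by the problem statement. The resulting p-value is then compared with whatever α the problem specifies.

```python
import math
from statistics import NormalDist

mu0 = 7.0    # hypothesized mean hours of sleep
xbar = 6.7   # sample mean
sigma = 2.1  # known population standard deviation
n = 150      # sample size

# z statistic: (xbar - mu0) / (sigma / sqrt(n))
z = (xbar - mu0) / (sigma / math.sqrt(n))

# Left-tailed test: p-value is the area to the left of z
p_value = NormalDist().cdf(z)
print(round(z, 2), round(p_value, 4))
```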
a quality control officer believes that the average time of use for AAA batteries differs from the claimed 8.5 hours. The officer takes a random sample of 30 AAA batteries and finds that the sample mean is 8.7 hours. State the null and alternative hypotheses for testing the claim
H0: μ = 8.5; HA: μ ≠ 8.5
specify the competing hypotheses that would be used to determine whether the population mean is less than 150
H0: μ ≥ 150; HA: μ < 150
an auditor for a small company suspects that the mean customer account balance has fallen below $550 per month, the average amount for all customer accounts over the past 5 years. She takes a random sample of 40 accounts and computes the sample mean as $543. State the hypotheses for testing the auditor's claim
H0: μ ≥ 550; HA: μ < 550
specify the competing hypotheses that would be used in order to determine whether the population mean differs from 15
H0: μ = 15; HA: μ ≠ 15
Special law of addition
If A and B are exclusive events then the general addition law can be simplified to the sum of the individual probabilities for A and B
Chapter 3: Excel's pivot table features
It allows interactive analysis, summarize categorical data, categorizes discrete numerical data
Classical
Known a priori by the nature of the experiment
Chapter 4: Skewed left (negative skewness)
Long tail of histogram points left (a few low values but most data on right)
Chapter 4: Skewed right (positive skewness)
Long tail of histogram points right (most data on left but a few high values)
Recognizing hyper-geometric application
Look for a finite population (N) containing a known number of successes (s) and sampling without replacement (n items in the sample) where the probability of success is not constant for each sample item drawn.
which about the MAD is most accurate
MAD is denominated in the same units as the original data
which scenarios use the nominal scale
Noting the racial composition of an undergraduate classroom Designating males as 1 and females as 2
if X has a normal distribution with μ=100 and σ=5, then the prob P(90<X<95) can be expressed in terms of the standard normal random variable Z as
P(-2<Z<-1)
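The standardization z = (x - μ)/σ behind this answer can be sketched with `statistics.NormalDist` from the Python standard library. The check at the end confirms that the probability is the same whether computed on the X scale or the Z scale.

```python
from statistics import NormalDist

mu, sigma = 100, 5

# Standardize each endpoint: z = (x - mu) / sigma
z_low = (90 - mu) / sigma
z_high = (95 - mu) / sigma

# Probability on the Z scale and on the original X scale
std = NormalDist()
prob_z = std.cdf(z_high) - std.cdf(z_low)
x_dist = NormalDist(mu, sigma)
prob_x = x_dist.cdf(95) - x_dist.cdf(90)
print(z_low, z_high, round(prob_z, 4))
```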
Chapter 5: If events A and B are independent, then
P(A∩B)=P(A)P(B)
Chapter 5: If A and B are mutually exclusive events, then P(A ∩ B) = 0 and the general addition law can be simplified to the sum of the individual probabilities for A and B, the special law of addition.
P(A∪B)=P(A)+P(B)(addition law for mutually exclusive events) For example, if we look at a person's age, then P(under 21) = .28 and P(over 65) = .12, so P(under 21 or over 65) = .28 + .12 = .40 because these events do not overlap.
Chapter 5: Conditional
P(S3 | T1) = 1/16 = .0625
it is known that the length of a certain product X is normally distributed with u=20 inches. How is the probability P(X>16) related to the probability P(X<16)
P(X>16) is greater than P(X<16)
due to symmetry, the probability that the normal random variable Z is greater than 1.5 is equal to
P(Z<-1.5)
Which probability statement below represents a cumulative probability
P(Z ≤ -2.34)
Assume the sample space S={win,loss}. Select which fulfills the requirements of the definition of probability.
P({win})=0.8, P({loss})=0.2 (equal to 1)
The ________ formula is used to determine the number of ways to arrange (x) objects from a group of (n) objects where the order of the objects matters.
Permutation
A _________ of an experiment contains all possible outcomes of the experiment
Sample space
Chapter 5: A probability assigned by a person that is based on that person's judgment or experience is a _______ probability
Subjective
A probability based on personal judgment rather than on observation or logical analysis is best referred to as a
Subjective probability (review correlated, empirical, and a priori, prob)
Coefficient of variation
The standard deviation expressed as a percent of the mean.
Which of the following statements about the variance of a continuous random variable are true?
The standard deviation is the square root of the variance, the variance is the weighted average of the squared deviations from the mean
Mean
The statistical measure of center. Sum of the data values divided by the number of data items. =AVERAGE(data)
True False: When constructing a joint probability table, the cell in the lower right corner must always equal 1.0
True (the lower right cell represents all outcomes in the sample space, which is the probability 1.)
bar chart, which is most accurate
a bar chart is a useful graphical tool for qualitative data
Binomial distribution
a binomial random variable X is the sum of n independent Bernoulli random variables
Tree diagrams
a diagram used to display events and probabilities to help visualize all possible outcomes.
Chapter 6: Which of the following should one look for when identifying a hypergeometric application?
a known number of successes, s; a finite population, N
Consider rolling two dice. Which of the following describe two events that are collectively exhaustive? a. Event 1 A value of 7 or more. Event 2: A value of 6 or less b. Event 1: A value of 9 or more. Event 2: A value of 7 or less c. Event 1: A value of 6 or more. Event 2: A value of 8 or less d. Event 1: rolling an even number. Event 2: Rolling an odd number
a. Event 1: A value of 7 or more. Event 2: A value of 6 or less [Their union included all possible outcomes of rolling two dice] and c. Event 1: A value of 6 or more. Event 2: A value of 8 or less [The union includes all possible outcomes of rolling two dice] and d. Event 1: rolling an even number. Event 2: Rolling an odd number [Their union includes all possible outcomes of rolling two dice]
Which expression(s) below would be correct for calculating the probability that no more than 2 of 4 patients has health insurance? [check that apply] a. P(X </= 2) b. P(X < 2) c. P(X = 0)+P(X = 1)+P(X = 2) d. P(X = 2)+P(X = 3)+P(X = 4)
a. P(X </= 2) and c. P(X = 0)+P(X = 1)+P(X = 2)
In which of the following ways are binomial distributions and hyper-geometric distributions similar? [check all that apply] a. They both have two possible outcomes: success or failure b. They are both discrete distributions c. They both assume each trial is independent of each other. d. The probability of success is constant in each trial.
a. They both have two possible outcomes: success or failure and b. They are both discrete distributions
The binomial distribution can be approximated using the Poisson [check all that apply] a. When n is large and π is less than or equal to .05 b. if n < 10 and π > .5 c. by letting λ = nπ d. when n is small and π is large
a. When n is large and π is less than or equal to .05 and c. by letting λ = nπ
Generally, a negative covariance between market demands indicated [check all the apply] a. a centralized distribution will decrease the aggregate demand variance b. a centralized distribution center can reduce the need for safety stock c. a centralized distribution center will decrease the aggregate average demand d. a centralized distribution center will increase the need for safety stock.
a. a centralized distribution will decrease the aggregate demand variance and b. a centralized distribution center can reduce the need for safety stock
Which of the following are characteristics of the geometric distribution? [check all that apply] a. counts the number of trials until the first success b. the mean is 2/π c. the probability of success is not constant d. a series of Bernoulli trials
a. counts the number of trials until the first success and d. a series of Bernoulli trials
A binomial distribution is skewed left when the probability of success is a. greater than .5 b. less than .5 c. equal to .5
a. greater than .5
Discrete random variables: [check all that apply] a. have a set of distinct values b. are countable c. can have a finite or infinite number of values d. must have a clear upper limit
a. have a set of distinct values and b. are countable and c. can have a finite or infinite number of values
The probability of State College winning a football game is 0.6. The probability of University of State winning a football game is 0.65. Given that State College has won its football game, the probability of University of State winning its game is 0.65. The teams are not playing each other. The events State College winning and University of State winning are: a. independent b. mutually exclusive c. dependent
a. independent
The probability that a customer will purchase a product is 0.15. The probability that a customer is male is 0.5. The probability that a customer is a male and will purchase a product is 0.075. The events purchasing a product and being male are. a. independent b. dependent c. mutually exclusive
a. independent
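The independence check used here, P(A∩B) = P(A)P(B), can be sketched in a few lines. `math.isclose` is used instead of `==` to avoid floating-point surprises; the variable names are illustrative.

```python
import math

p_purchase = 0.15  # P(customer purchases)
p_male = 0.5       # P(customer is male)
p_both = 0.075     # P(male and purchases)

# Events are independent when the joint probability equals the product
independent = math.isclose(p_purchase * p_male, p_both)
print(independent)
```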
The interquartile range of a data set: (check all that apply) a. is calculated by subtracting the first quartile from the third quartile b. represents the middle 75% of the data c. is the range between the first and fourth quartiles d. represents the middle 50% of the data
a. is calculated by subtracting the first quartile from the third quartile and d. represents the middle 50% of the data
The symbol π in the binomial PDF a. is the probability of success b. is in the interval [0,1] c. is greater than 1 d. can be either negative or positive values
a. is the probability of success and b. is in the interval [0,1]
Which of the following criteria indicate it would be acceptable to approximate the hyper-geometric with a binomial? a. n/N < .05 b. n/N > .05 c. n/N < .10
a. n/N < .05
When estimating sigma using the following formula: (Xmax - Xmin)/6, one is assuming the distribution is: a. normal b. highly skewed c. Bimodal d. No assumption on the distribution
a. normal
The box plot is constructed using several different values. Which of the following values from a data set are included in a box plot? a. the largest value b. the mean c. the mode d. the first quartile e. the fifth quartile
a. the largest value and d. the first quartile
A box plot is constructed using several different values. Which of the following values are included in a box plot? (select all that apply) a. the smallest value b. the standard deviation c. the 90th percentile d. the second quartile e. the third quartile
a. the smallest value d. the second quartile e. the third quartile
which of the following is true
α = the prob of committing a Type I error; β = the prob of committing a Type II error
Chapter 3: Cumulative frequency distributions show...
accumulated counts up to and including the current bin as the bin limits increase
Chapter 1: Which of the following are responsibilities of a data analyst?
accurately reporting information, identifying degrees of uncertainty
The intersection of events A and B, denoted A∩B, contains
all outcomes that are in A and B
Chapter 4: To calculate the arithmetic mean
all the data points must be added together, then divided by the number of data points.
if the population from which the sample is drawn is normally distributed, then the sampling distribution of the sample mean is
always normally distributed
weakness of the ordinal scale
an inability to measure differences between the ranked values
Chapter 4: The owner of BevaMart wants to study the relationship between the temperature and hot chocolate sales. The owner computed the covariance between temperature and hot chocolate sale to be -81.46. Based on the covariance, which option best describes the linear relationship between temperature and hot chocolate?
as the temperature increases, hot chocolate sales decrease
Chapter 4: When calculating a percentile, the first step is to arrange the data set in
ascending order (from least to greatest)
discrete probability distributions
assigns probability to each value of a discrete random variable X.
Chapter 1: A sample of errors from invoice statements
auditing
Consider rolling two dice. Which of the following describe two events that are collectively exhaustive? [select all that apply] a. Event 1: A value of 9 or more. Event 2: A value of 7 or less. b. Event 1: rolling an even number. Event 2: Rolling an odd number c. Event 1: A value of 7 or more. Event 2: A value of 6 or less. d. Event 1: A value of 6 or more. Event 2: A value of 8 or less
b. Event 1: rolling an even number. Event 2: Rolling an odd number and c. Event 1: A value of 7 or more. Event 2: A value of 6 or less. and d. Event 1: A value of 6 or more. Event 2: A value of 8 or less
The following contingency table can help determine if residents' location is independent of whether or not the resident purchases a season pass to the pool. The pool is located on the West side. Counts (East, West, Total): Pass 35, 50, 85; None 25, 10, 35; Total 60, 60, 120. a. P(pass and East) and P(none) b. P(Pass and East)P(pass) c. P(pass and West)P(pass) d. P(pass and West) and P(East)
b. P(Pass and East)P(pass) and c. P(pass and West)P(pass)
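One common way to test independence from a contingency table is to compare a conditional probability with the matching marginal one. The sketch below assumes that is the intended comparison here, P(pass | East) against P(pass), using the counts from the table; the dictionary layout is illustrative.

```python
# Counts from the contingency table: (pass status, side) -> residents
counts = {
    ("pass", "east"): 35, ("pass", "west"): 50,
    ("none", "east"): 25, ("none", "west"): 10,
}

total = sum(counts.values())                                      # all residents
east_total = counts[("pass", "east")] + counts[("none", "east")]  # East column
pass_total = counts[("pass", "east")] + counts[("pass", "west")]  # Pass row

p_pass = pass_total / total                                # marginal P(pass)
p_pass_given_east = counts[("pass", "east")] / east_total  # conditional P(pass | East)

# Independence would require the two probabilities to match
independent = abs(p_pass_given_east - p_pass) < 1e-9
print(round(p_pass, 4), round(p_pass_given_east, 4), independent)
```

Since the conditional and marginal probabilities differ, location and pass purchase look dependent in this table.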
The first step to determine the median is to a. find the average of the data set b. place the data in numerical order c. find the range of the data set
b. place the data in numerical order
a left tailed test of the population mean is conducted at α=0.10. The calculated test statistic is z = -1.55 and P(Z < -1.55) = 0.0606. The null hypothesis should
be rejected since the p-value=0.0606<0.10
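The p-value decision rule for this left-tailed test can be sketched with the standard normal CDF from `statistics.NormalDist`:

```python
from statistics import NormalDist

alpha = 0.10  # significance level
z = -1.55     # calculated test statistic

# Left-tailed test: p-value = P(Z < z)
p_value = NormalDist().cdf(z)

# Reject H0 when the p-value is below alpha
reject_h0 = p_value < alpha
print(round(p_value, 4), reject_h0)
```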
the graph depicting the normal probability density function is
bell shaped
if a sample statistic consistently over or under estimates a population parameter, then there is ____
bias
A ____ random variable is the sum of repeated Bernoulli trials
binomial
which statement is not correct concerning the p-value and critical value approaches to hyp testing
both approaches use the same decision rule concerning when to reject Ho
Chapter 4: A useful tool of exploratory data analysis (EDA) is the ______ ________ (also called a box-and-whisker plot) based on the five-number summary:
box plot
Chapter 6: The binomial distribution can be approximated using the Poisson
by letting λ = nπ when n is large and π is less than or equal to .05
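A rough illustration of how close the Poisson approximation gets when n is large and π is small. The values n = 200 and π = 0.02 are made up for the example, and the helper function names are hypothetical.

```python
import math

def binomial_pmf(x: int, n: int, pi: float) -> float:
    """P(X = x) for a binomial with n trials and success probability pi."""
    return math.comb(n, x) * pi**x * (1 - pi) ** (n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

n, pi = 200, 0.02  # illustrative values: large n, pi <= .05
lam = n * pi       # Poisson mean set to n * pi

for x in range(7):
    print(x, round(binomial_pmf(x, n, pi), 4), round(poisson_pmf(x, lam), 4))

# Largest pointwise gap between the two distributions over x = 0..20
max_gap = max(abs(binomial_pmf(x, n, pi) - poisson_pmf(x, lam)) for x in range(21))
```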
Chapter 3: One of the primary goals of constructing a frequency distribution of quantitative data is to summarize the data...
by showing frequency of values that lie within a class or bin
if 230 out of 400 country club members play tennis, the value 230 should be inserted in the contingency table in the cell with the letter
c
The addition rule is used to calculate a. the conditional probability of two events b. the intersection of two events c. the union of two events d. the independence of two events
c. the union of two events
Chapter 4: Place in order, from beginning to end, the steps to calculate the mean absolute deviation
calculate the arithmetic mean for the data set, find the absolute difference between each value and the mean, sum the absolute differences, divide by the sample (or the population) size
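The four MAD steps above can be sketched directly in Python. The data set 4, 5, 6, 9 is borrowed from the arithmetic-mean card earlier in this set.

```python
data = [4, 5, 6, 9]

# Step 1: arithmetic mean
mean = sum(data) / len(data)

# Step 2: absolute difference between each value and the mean
abs_devs = [abs(x - mean) for x in data]

# Steps 3 and 4: sum the absolute differences, then divide by the number of values
mad = sum(abs_devs) / len(data)
print(mean, mad)
```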
Redundancy
can increase system reliability even when individual component reliability is low
a theorem that allows us to use the normal probability distribution to approximate the sampling distribution of the sample mean whenever the sample size is large is known as the
central limit theorem
which can be used to determine the proportion of data points that fall within a specified number of standard deviations from the mean
chebyshev's theorem
Chapter 5: A probability that can be deduced through logical reasoning before an experiment is performed is what type of probability?
classical
the central limit theorem states that, for any distribution, as n gets larger, the sampling distribution of the sample mean becomes
closer to a normal distribution
Chapter 4: When comparing two data sets with different units of measurement, what is the relative measure of dispersion?
coefficient of variation (CV)
Chapter 5: Events are considered ___________ ___________ if the union of these events is the entire sample space
collectively exhaustive
The ________ formula is used to determine the number of different ways to arrange a group of (x) objects from a total of (n) objects and the order of the objects is irrelevant
combination
The ________ formula is used to determine the number of different ways to arrange a group of x objects from a total of n objects and the order of the objects is irrelevant.
combination
Chapter 4: The skewness coefficient can be used to
compare two samples with different measurement units, compare one sample to a known reference distribution
relative frequency distributions are generally more useful than frequency distributions when
comparing data sets of different sizes
For any given event, the sum of the probability of that event and the probability of its ____________ must equal one.
complement
the inverse transformation, x = μ + zσ, is used to
compute x values for given probabilities
A tree diagram has ___________ probabilities at the terminal end of each branch.
conditional
Chapter 5: The tree diagram has what type of probabilities at the terminal end of each branch
conditional
a ____ probability is the prob of an event given that another event has already occurred
conditional
An economist reports that 506 out of a sample of 1,200 middle-income American households actively participate in the stock market. -Construct a 90% confidence interval for the proportion of middle-income Americans who actively participate in the stock market
confidence interval 0.397 ± .01 to 0.446 ± .01
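The interval can be reproduced with the usual large-sample formula p̄ ± z·sqrt(p̄(1 - p̄)/n). This is a sketch under that assumption; the z value comes from `NormalDist().inv_cdf` rather than a table.

```python
import math
from statistics import NormalDist

x, n = 506, 1200  # households participating, sample size
conf = 0.90       # confidence level

p_hat = x / n  # sample proportion

# Two-sided critical value for the chosen confidence level (about 1.645 for 90%)
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)

margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin
print(round(lower, 3), round(upper, 3))
```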
Chapter 2: Identify which of the sampling techniques listed are non-random.
convenience, focus group (and judgment)
Chapter 3: Because the intent of the analysis is to study the S&P 500 companies at a point in time, these are ________________ data.
cross-sectional
data collected about many subjects at the same point in time or without regards to differences in time is known as ___ data
cross-sectional
for a continuous random variable X, the function used to find the area under f(x) up to any value x is called the
cumulative distribution function
when a researcher examines quantitative data and wants to know the number of observations that fall below the upper limit of a particular class, the researcher is best served by creating a ____
cumulative frequency distribution
Compound events
cumulative probabilities can be evaluated by summing individual X probabilities.
"the number of ppl in a household." This variable is best categorized as a
discrete variable
Chapter 3: Characteristics of a bar chart include...
display horizontal bars when the axis labels are long or there are many categories, length or height of bar reflects frequency of a category
Chapter 3: A line chart can be used to...
display time series data, spot trends
Chapter 3: A pie chart is never used to
display time series data.
Chapter 6: Which of the following are characteristics of a discrete uniform distribution?
distribution is symmetric, the probability of each value of the random variable is the same, the random variable has a finite number of outcomes
suppose the competing hypotheses for a test are Ho: u <= 10 vs Ha: u >10. If the value of the test statistic is 1.90 and the CV at the 1% sig level is z0.01 = 2.33, then the correct conclusion is
do not reject Ho and conclude that the pop mean does not appear to be greater than 10 at the 1% sig level
suppose the competing hypothesis for a test are Ho: u=100 vs Ha: u =/ 100. If the p-val for the hyp test is 0.07 and the chosen level of sig is 0.05, then the correct conclusion is
do not reject Ho and conclude that the population mean does not differ from 100 at the 5% sig level
5!=? a. 25 b. 24 c. 100 d. 5 e. 120
e. 120. A factorial is the product of all whole numbers from 1 to n. Therefore 5! = 5*4*3*2*1 = 120
For each trial the probability of success remains the same
each trial has only two possible outcomes, each trial is independent of the previous trial, for each trial the probability of success remains the same
relative frequency of an event is used to calculate what type of probability
empirical probability
a particular value of an estimator is called an
estimate
Empirical
estimated from observed outcome frequency
Actuarial science
estimating empirical probabilities
when a sample statistic is used to make inferences about a population parameter, it is referred to as an
estimator
Chapter 1: Surveys of corporate recruiters show that ______ and _________________ rank high on their list of hiring criteria.
ethics, personal integrity
a subset of the sample space is a/an
event
Independent events
event A is independent of event B if and only if P(A|B)=P(A)
Chapter 5: Using the multiplication rule, the joint probability of event A and event B is computed by multiplying the conditional probability of event A given event B by the probability of
event B
Using the multiplication rule, the probability that event A and event B both occur is computed by multiplying the conditional probability of event A given event B by the probability of
event B
Chapter 5: P(A)=1
event is certain to occur
Complement (A')
everything in the sample space S except event A
Percentile
ex. 83rd percentile means 83% of the test takers are below you
Events that cannot occur at the same time are mutually ________ events.
exclusive
events that include all outcomes in the sample space are known as ____ events
exhaustive
Chapter 4: The correlation coefficient values
fall between -1 and +1, inclusive
True or false: a continuous random variable can have a finite set of integer values
false
Chapter 4: True or False: The geometric mean does not mitigate the effect of outliers
false
Chapter 5: True or false: The General Law of Multiplication is used to calculate the probability of the union of two events
false
T/F. the expected value and the variance of the standard normal random variable Z are both zero
false
T/F; a discrete random variable can assume an uncountable number of values
false
True or false: The geometric mean does not mitigate the effect of outliers.
false
Chapter 1: Risk assessment of an investment
finance
Chapter 3: A log scale is useful for
financial data that is expected to grow rapidly
Chapter 4: Standard deviations can be compared
for data sets with the same measurement units and similar magnitude, for data sets with the same measurement units
all are conditions of the binomial experiment (bernoulli process) except
for each trial, the probability of success equals the probability of failure
Method of medians
for small data sets, you can find quartiles this way: 1. sort the observations 2. find the median Q2 3. find the median of the data values that lie below Q2 (this is Q1) 4. find the median of the data values that lie above Q2 (this is Q3)
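The method of medians can be sketched as below. Textbooks differ on whether the median itself belongs to each half when n is odd; this sketch excludes it, matching "values that lie below Q2" and "above Q2". The function name and the sample data are illustrative, and the function needs at least two observations.

```python
from statistics import median

def quartiles_by_medians(values):
    """Return (Q1, Q2, Q3) using the method of medians."""
    s = sorted(values)         # step 1: sort the observations
    n = len(s)
    q2 = median(s)             # step 2: the median is Q2
    lower = s[: n // 2]        # values positioned below the median
    upper = s[(n + 1) // 2 :]  # values positioned above the median
    return median(lower), q2, median(upper)  # steps 3 and 4

q1, q2, q3 = quartiles_by_medians([10, 6, 4, 9, 5, 12, 7])
print(q1, q2, q3)
```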
to summarize qualitative data, a useful tool is a
frequency distribution
in descriptive stats, a polygon is a
graph that plots the midpoints of each class of a frequency distribution
there are several guidelines to follow when constructing graphs that summarizes statistical data. Which statement is LEAST accurate
graphs should have a lot of adornments
Chapter 6: A binomial distribution is skewed left when the probability of success is
greater than .5
chebyshevs theorem provides the proportion of observations that lie within k SDs of the mean. the value k must be
greater than 1
discrete random variable
has a countable number of distinct values. Some random variable have a clear upper limit and others do not
Chapter 2: Categorical data (also called qualitative data)
have values that are described by words rather than numbers. For example... Diners at a restaurant are asked to rate the food based on the following scale, excellent, good, average, below average, poor.
which of the following graphical depictions is used for observing the spread of the data for a single variable
histogram
which graphical depictions are useful to observe the shape of a data set for a single variable
histogram stem and leaf polygon
Unusual (z-score classification)
if |z| > 2 (beyond μ ± 2σ)
Collectively exhaustive
if their union is the entire sample space S (all the events that could possibly occur)
stratified sampling is preferred to cluster sampling when the objective is to
increase precision
Chapter 5: The probability of State College winning a football game is .6. The probability of University of State winning a football game is .65. Given that State College has won its football game, the probability of University of State winning its game is .65. The teams are not playing each other. The events State College winning and University of State winning are
independent
the probability that a customer will purchase a product is 0.15. The probability that a customer is a male is 0.5. The prob that a customer is a male and will purchase a product is 0.075. The events purchasing a product and being a male are
independent
branch of stats that uses statistics to estimate a population parameter or test a hypothesis about such a parameter is best referred to as
inferential statistics
for a continuous random variable X, how many distinct values can it assume over an interval
infinite
Chapter 2: Variable
is a characteristic of the subject or individual, such as an employee's income or an invoice amount.
Chapter 3: histogram
is a graphical representation of a frequency distribution... appearance in identical
Chapter 3: Ogive
is a line graph of the cumulative frequencies. It is useful for finding percentiles or in comparing the shape of the sample with a known benchmark such as the normal distribution (that you will be seeing in the next chapter).
Chapter 3: Frequency polygon
is a line graph that connects the midpoints of the histogram bin intervals, plus extra intervals at the beginning and end so that the line will touch the X-axis.
Chapter 1: Statistics
is a set of tools that helps with organizing and presenting information and extracting meaning from raw data.
Chapter 4: Weighted means
is a sum that assigns each data value a weight wj that represents a fraction of the total (i.e., the k weights must sum to 1)
Chapter 3: Frequency distribution
is a table formed by classifying n data values into k classes called bins (we adopt this terminology from Excel). The bin limits define the values to be included in each bin.
Chapter 3: Dot Plot
is another simple graphical display of n individual values of numerical data. The basic steps in making a dot plot are to (1) make a scale that covers the data range, (2) mark axis demarcations and label them, and (3) plot each data value as a dot above the scale at its approximate location.
The triangular distribution
is bounded by the interval [a,c], is useful for what-if analysis
A random variable is said to be continuous if it
is measured over an interval, can have decimal values
Chapter 4: Range
is the difference between the largest and smallest observations
Chapter 4: Midrange
is the point halfway between the lowest and highest values of X. It is easy to calculate but is not a robust measure of central tendency because it is sensitive to extreme data values.
Variance Var(X) of a discrete random variable
is the sum of the squared deviations about its expected value, weighted by the probability of each X-value.
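As a worked illustration of this definition, the sketch below computes E(X) and Var(X) for one roll of a fair six-sided die. The die is an assumed example, not taken from the card.

```python
values = [1, 2, 3, 4, 5, 6]  # outcomes of one fair die roll
probs = [1 / 6] * 6          # each outcome equally likely

# Expected value: sum of x * P(x)
mean = sum(x * p for x, p in zip(values, probs))

# Variance: sum of squared deviations from the mean, weighted by P(x)
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
std_dev = variance ** 0.5
print(round(mean, 4), round(variance, 4), round(std_dev, 4))
```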
Weighted mean
is a sum that assigns each data value a weight wj that represents a fraction of the total (i.e., the k weights must sum to 1)
Chapter 3: Line chart
is used to display a time series, to spot trends, or to compare time periods. Line charts can be used to display several variables at once. If two variables are displayed, the right and left scales can differ, using the right scale for one variable and the left scale for the other.
which best describes a frequency distribution for qualitative data
it groups data into categories, and records the number of observations in each category
The normal distribution is the most extensively used distribution in statistical studies because
it has important features used in sampling and estimation, economic and financial data often display bell-shaped distributions, many physical measurements have a bell-shaped distribution
Chapter 5: The multiplication rule is used to calculate what type of probability
joint
the further a population deviates from p=0.50, the ___ sample size required in order to satisfy a normal approximation
larger
Chapter 5: An important theorem in probability states that as the number of trials increases, an empirical probability approaches the theoretical probability. This theorem is called the ___________ of _______ _______
law of large numbers
in general, the variability between sample means is ___ the variability between observations
less than
if the value of the test statistic falls in the rejection region, the p-value must be
less than α
for a continuous random variable X, the cumulative distribution function F(x) provides the probability that X is
less than or equal to any value x
Chapter 1: Which of the following are rules for a data analyst?
maintain data integrity, know and follow accepted procedures, protect confidential information
samples are primarily used to
make inferences about population parameters
Chapter 2: Which of the following are examples of the interval scale of measurement?
many Likert scales, a golfer's score relative to par
Chapter 1: Identifying repeat customers
marketing
most widely used measure of central location
mean
Chapter 4: An additional measure of dispersion is the ____________ ___________ deviation (MAD). This statistic reveals the average distance from the center. Absolute values must be used; otherwise the deviations around the mean would sum to zero
mean absolute
Chapter 4: The average of absolute differences between the values of the data set and the mean is the
mean absolute deviation
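A minimal sketch of the mean absolute deviation, using a hypothetical data set; it also shows why absolute values are needed:

```python
# Hypothetical data set.
data = [2, 4, 6, 8, 10]

mean = sum(data) / len(data)

# Raw deviations around the mean cancel out and sum to zero.
deviations = [x - mean for x in data]

# MAD: the average of the absolute deviations from the mean.
mad = sum(abs(d) for d in deviations) / len(data)

print(sum(deviations), mad)
```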
the normal distribution is completely described by these two parameters
mean and variance
Chapter 4: When monitoring a process distribution, both the __________ and the __________ must be tracked.
mean, variation
Chapter 4: covariance
measures the degree to which the values of X and Y change together.
Chapter 4: Generally, the ______________ is the best measure of center when outliers are present.
median
Chapter 4: In a neighborhood there are five houses listed for sale for the following amounts: $250,000, $275,000, $280,000, $295,000, and $515,000. What is the best measure of center for the price of a house in that neighborhood?
median
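A quick check with the five listing prices from the card above shows how the one high price pulls the mean upward while the median stays near the typical house:

```python
import statistics

# Listing prices from the card above.
prices = [250_000, 275_000, 280_000, 295_000, 515_000]

mean_price = statistics.mean(prices)      # pulled up by the $515,000 listing
median_price = statistics.median(prices)  # middle value, unaffected by the outlier

print(mean_price, median_price)
```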
Chapter 4: The measure of center where half the values of the data set lie above this measure and half the values of the data set lie below this measure is known as the
median
measure of central location where half the values of the data set lie above this measure and half the values of the data set lie below this measure is known as the
median
the ___ is the best measure of central location when outliers are present
median
Chapter 4: The second quartile is also the
median, 50th percentile
Bayes' Theorem
method of revising probabilities to reflect new information. Prior (unconditional) probability of an event B is revised after event A has occurred to yield a posterior (conditional) probability.
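A sketch of the revision step, assuming hypothetical probabilities (a 1% prior for B and made-up likelihoods of observing A); none of these numbers come from the course material:

```python
# Hypothetical inputs.
p_b = 0.01              # prior (unconditional) probability of B
p_a_given_b = 0.90      # probability of observing A when B is true
p_a_given_not_b = 0.05  # probability of observing A when B is false

# Total probability of A across both cases.
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# Bayes' theorem: posterior (conditional) probability of B given A.
p_b_given_a = p_a_given_b * p_b / p_a

print(round(p_b_given_a, 4))
```

With these assumed inputs, observing A raises the probability of B from 1% to roughly 15%.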
Chapter 4: The __________ is the measure of center that identifies the most frequently occurring value in the data set
mode
Chapter 4: The owner of a grocery store wanted to determine the brands of soda that customers purchase at the store. When summarizing the data about soda brand purchase the meaningful measure of center is the
mode
Chapter 3: Identify the characteristic below that does NOT describe a Pareto chart.
most common categories appear to the far right of the graph
Chapter 5: if two events are independent, then the joint probability of the two events is calculated by...
multiplying the probability of the two events
Chapter 2: A significant weakness of the ordinal scale is...
no clear meaning to differences between the ranked values
The dean of the business school at a local university categorizes students by major (i.e., accounting, finance, marketing, etc.) to help in determining class offerings in the future.
nominal
Mesokurtic
normal bell shaped population
when testing μ, the p-value is the probability of obtaining a sample mean at least as large or at least as small as the one derived from a given sample, assuming the ___ hypothesis is true
null
Chapter 3: Histograms can be used to...
observe the spread or variability of the data, determine the shape of the data
when performing a hypothesis test on μ, the p-value is defined as the
observed probability of making a type 1 error
which of the following graphical depictions displays cumulative data
ogive
how does an ogive differ from a polygon?
ogive is a graph of a cumulative (relative) frequency distribution, while a polygon is a graph of a (relative) frequency distribution
Chapter 4: Accuracy of grouped estimates depends on
the number of bins, the distribution of data within bins, and bin frequencies.
The standard deviation of the standard normal distribution is
one
the sum of the probabilities of a list of mutually exclusive and exhaustive events is
one
Chapter 2: Parameter vs. statistic examples: A) The average age of all students currently enrolled at the Leeds School of Business (population of students is only current students). B) The average starting salary for 25 students from this year's Leeds MBA graduating class of 110 students. C) The average GPA for a sample of 40 students from this year's graduating class at the Leeds School of Business.
parameter: A; statistic: B and C
Chapter 2: A __________________ includes all members of the group of interest.
population
expected value of a distribution is also referred to as the
population mean
μ
population mean
σ^2
population variance
selection bias occurs when
portions of the population are excluded from the consideration for the sample
numerical value that measures the likelihood of an uncertain event is a
probability
the probability distribution of a continuous random variable is called its
probability density function
Chapter 5: Subjective approach
probability is needed when there is no repeatable random experiment. For example: What is the probability that Ford's new supplier of plastic fasteners will be able to meet the September 23 shipment deadline? What is the probability that a new truck product program will show a return on investment of at least 10%? What is the probability that the price of Ford's stock will rise within the next 30 days?
Marginal probabilities
relative frequency of an event, found by dividing a row or column total by the total sample size
Chapter 5: Subjective
probability reflects someone's informed judgment about the likelihood of an event
cumulative relative frequency distribution for quantitative data identifies the
proportion of observations that fall below the upper limit of each class
relative frequency distribution for quantitative data identifies the
proportion of observations that occur in each class
a variable that is described verbally rather than numerically is called a
qualitative variable
when constructing a histogram what values/labels go on the horizontal (x) axis and the vertical (y) axis
quantitative class limits on the horizontal axis; frequency or relative frequency on the vertical axis.
all are examples of cross-sectional data except
quarterly sales for a computer company for the last 5 years
Subjective approach
reflects informed judgement about the likelihood of an event. This method is needed when there is no repeatable random experiment.
when calculating the probability of x successes in n trials of a binomial experiment, the probability of success and the probability of failure
remain the same, even when a probability is calculated for a different value of x
a population consists of all items of interest in a statistical problem, whereas a ___ is a subset of a population
sample
Chapter 5: The set of all possible outcomes from a random experiment is called a
sample space
Chapter 2: A ______________ is a numerical summary of a sample whereas ______________ is a numerical summary that describes a population.
statistic, parameter
Chapter 4: Descriptive measures derived from a sample (n items) are
statistics
Chapter 2: Identify which of the sampling techniques listed are random
stratified, cluster, simple, systematic
a normal random variable X is transformed into Z by
subtracting the mean, and then dividing by the SD
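A one-line sketch of the transformation, assuming hypothetical values for x, the mean, and the standard deviation:

```python
def z_score(x, mean, sd):
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    return (x - mean) / sd

# Hypothetical example: a value of 75 from a distribution with mean 70 and SD 2.5.
z = z_score(75, 70, 2.5)
print(z)  # number of standard deviations that x lies above the mean
```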
The expected value of the sum of two or more random variables is equal to the _________ of their individual values.
sum
Chapter 4: Shape
symmetrical, skewed, sharply peaked, flat, bimodal
In October 2010, Massachusetts enacted a law that forbids cell phone use by drivers under the age of 18. A policy analyst would like to determine whether the law has decreased the proportion of drivers under the age of 18 who use a cell phone. Suppose a sample of 200 drivers under the age of 18 results in 150 who still use a cell phone while driving. What is the value of the test statistic? What is the p-value?
test statistic: -2.58 ± 0.20; p-value: 0.0049 ± 0.005
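The card does not state the hypothesized proportion. The sketch below assumes p0 = 0.82, borrowed from the Allstate card elsewhere in this set, and runs a left-tailed z test for a proportion; under that assumption the result is close to the answer given.

```python
from math import sqrt
from statistics import NormalDist

n = 200    # sample size from the card
x = 150    # drivers in the sample who still use a cell phone
p0 = 0.82  # ASSUMED hypothesized proportion (not stated on this card)

p_hat = x / n                    # sample proportion
se = sqrt(p0 * (1 - p0) / n)     # standard error under the null hypothesis
z = (p_hat - p0) / se            # test statistic

# Left-tailed test (Ha: p < p0), so the p-value is P(Z <= z).
p_value = NormalDist().cdf(z)

print(round(z, 2), round(p_value, 4))
```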
Chapter 5: Suppose in a population of adults there are 10% that are avid fly fishermen/women. If we were to choose a random sample of 10 adults from this population, the law of large numbers says
that the percent of fishermen/women in this sample might not equal 10%, but as the sample gets larger the percent will get closer to 10%.
Chapter 3: Stacked column chart
the bar height is the sum of several subtotals. Areas may be compared by color to show patterns in the subgroups, as well as showing the total. Stacked column charts can be effective for any number of groups but work best when you have only a few. Use numerical labels if exact data values are of importance.
Chapter 3: When constructing bins for a frequency distribution of quantitative data, which of the following principles should generally be followed?
the bins should be exhaustive, bins should be mutually exclusive, bins should have the same width
a researcher wants to compare the variability of two data sets that have different units of measurement. Which measure is most useful as a relative measure of dispersion?
the coefficient of variation
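A sketch of the comparison, assuming two hypothetical samples in different units (centimeters and kilograms). Both have the same standard deviation, yet their relative dispersion differs:

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean (unit-free)."""
    return statistics.stdev(data) / statistics.mean(data)

# Hypothetical samples measured in different units.
heights_cm = [160, 170, 180]
weights_kg = [50, 60, 70]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

print(round(cv_height, 4), round(cv_weight, 4))
```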
Chapter 4: Which of the following can be used to determine the proportion of data points that fall within a specified number of standard deviations from the mean?
the empirical rule (assuming a normal distribution), Chebyshev's Theorem
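The two approaches can be compared side by side. This sketch computes Chebyshev's lower bound, 1 - 1/k^2, which holds for any distribution, and the exact normal proportion within k standard deviations, which is what the empirical rule approximates; the function names are made up for this example:

```python
from statistics import NormalDist

def chebyshev_lower_bound(k):
    """Minimum proportion within k SDs of the mean, for any distribution (k > 1)."""
    return 1 - 1 / k ** 2

def normal_proportion_within(k):
    """Proportion within k SDs of the mean when the data are normal."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 4), round(normal_proportion_within(k), 4))
```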
least accurate
the height of each rectangle represents cumulative frequency or cumulative relative frequency
Chapter 3: The pictured graph from the New York Times op-ed piece has some misleading components to it: which of the following elements are misleading?
the higher the position of the category icon, the lower the rating; all the rating arrows end at the same point regardless of the actual rating
Chapter 4: When the data are skewed left (or negatively skewed)...
the mean is below the median.
Chapter 2: Which of the following are examples of time series data?
the monthly Consumer Confidence Index for the past three years, average annual credit card debt over the past decade
Chapter 6: In which of the following situations is it appropriate to use a Poisson process?
the number of calls arriving to a customer help line in a one hour period
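A sketch of the Poisson probability formula, P(X = k) = e^(-λ) λ^k / k!, assuming a hypothetical average of 3 calls per hour:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k arrivals when the mean arrival rate is lam."""
    return exp(-lam) * lam ** k / factorial(k)

lam = 3.0  # ASSUMED mean number of calls per hour (hypothetical)

p_zero = poisson_pmf(0, lam)                                 # no calls in the hour
p_at_most_two = sum(poisson_pmf(k, lam) for k in range(3))   # 0, 1, or 2 calls

print(round(p_zero, 4), round(p_at_most_two, 4))
```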
probability
a number that measures the relative likelihood that the event will occur, written P(A)
which characteristic of interest is a variable
the number of pizzas ordered from Pizza Hut per day
all are examples of random variables that likely follow a normal distribution except
the number of states in the USA
Chapter 5: General law of addition
the probability of the union of two events A and B is the sum of their probabilities less the probability of the intersection P(A∪B)=P(A)+P(B)−P(A∩B)
the z table provides the cumulative probabilities for a given z. What does "cumulative probabilities" mean
the probability that Z is less than or equal to a given z value
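The same cumulative probabilities can be computed instead of looked up. A small sketch using Python's standard library (the z values are arbitrary examples):

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# cdf(z) returns P(Z <= z), the value a z table lists for that z.
p_at_zero = standard_normal.cdf(0.0)
p_at_196 = standard_normal.cdf(1.96)

print(p_at_zero, round(p_at_196, 4))
```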
Chapter 3: A relative frequency distribution for quantitative data identifies...
the proportion of observations that occur in each bin
When a data set is symmetrical the mean and the median are approximately
the same
for an alternative hypothesis of Ha: μ > μ0, we might possibly reject the null hypothesis if
the sample mean is greater than μ0
the central limit theorem states that the distribution of the sample mean will be approximately normal if
the sample size is sufficiently large; as a general guideline, n ≥ 30
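A simulation sketch of the theorem, assuming a uniform population on [0, 1] (not bell-shaped), samples of size 30, and an arbitrary fixed seed. The sample means cluster around the population mean with a spread close to σ/√n, which is smaller than the spread of individual observations:

```python
import random
import statistics

rng = random.Random(42)  # fixed seed (arbitrary) so the run is repeatable
n = 30                   # sample size, matching the n >= 30 guideline
num_samples = 2000       # hypothetical number of repeated samples

# Draw many samples from a uniform [0, 1] population and record each mean.
sample_means = [
    statistics.mean([rng.random() for _ in range(n)])
    for _ in range(num_samples)
]

population_sd = (1 / 12) ** 0.5           # SD of a uniform [0, 1] variable
theoretical_se = population_sd / n ** 0.5  # sigma / sqrt(n)
observed_se = statistics.stdev(sample_means)

print(round(statistics.mean(sample_means), 3))
print(round(theoretical_se, 4), round(observed_se, 4))
```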
the critical value of a hypothesis test is
the value that separates the rejection region from the non-rejection region
Chapter 6: Which of the following experiments is likely to produce a uniform discrete distribution?
the values that occur from repeated spins of a roulette wheel at a casino; the answer selected on a multiple choice question that has four choices by a student who did not study
Chapter 3: Identify the problem with the pictured graph
the vertical axis limit is too high, there is no 0 value on the vertical axis, the time range needs to be specified
Chapter 3: Tables are frequently used to display data because
they are the simplest form of data display, a well-designed table can communicate the meaning of data at a glance
When a random process counts the number of arrivals over a random time interval we are also interested in the _______ between two arrivals.
time
Chapter 2: ______________ _______________ data are quantities that represent or track the values taken by a variable over equally spaced periods
time series
data that are collected by recording a characteristic of a subject over several time periods are referred to as
time series data
Chapter 3: Column or bar charts can be used to display ________ __________ data using the time periods as the _________ __________.
time series, category labels
What is the use of the cumulative distribution function, F(x), of a continuous random variable?
to find the probability that X is less than or equal to any value x
Chapter 5: To calculate a joint probability from a contingency table, the frequency of each cell is divided by the...
total number of outcomes in the sample space
In order to convert a contingency table into a joint probability table, the frequency of each cell is divided by the
total number of outcomes in the sample space
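A sketch of the conversion, assuming a hypothetical 2x2 contingency table of counts (the categories and frequencies are made up for illustration):

```python
# Hypothetical contingency table: (row category, column category) -> count.
counts = {
    ("male", "yes"): 30,
    ("male", "no"): 20,
    ("female", "yes"): 40,
    ("female", "no"): 10,
}

total = sum(counts.values())  # total number of outcomes in the sample space

# Joint probability table: each cell frequency divided by the total.
joint = {cell: freq / total for cell, freq in counts.items()}

# A marginal probability adds up one column (or row) of joint probabilities.
p_yes = sum(p for (row, col), p in joint.items() if col == "yes")

print(joint[("male", "yes")], p_yes)
```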
T/F: if we had access to data that included the entire population, then the values of the parameters would be known and no statistical inference would be required
true
T/F: the SD of a discrete probability distribution measures how dispersed the values are from the mean
true
T/F: the optimal values of type 1 and type 2 errors requires a compromise in balancing costs of each type of error
true
T/F: we choose a value for α before conducting a hypothesis test
true
True or false: The arithmetic mean is the average of a data set.
true
True or false: The trimmed mean can mitigate the effect of outliers.
true
True or false: Under appropriate circumstances, many discrete random variables can be described by the normal distribution.
true
empirical rule should be applied to normally distributed data sets
true
population variance is represented by σ^2
true
Chapter 5: Intersection
two events A and B is the event consisting of all outcomes in the sample space S that are contained in both event A and event B
Union
two events A and B is the event consisting of all outcomes in the sample space that are contained in either A or B or both
compound event
two or more simple events
A prior probability for event A is the ____________ probability whereas the posterior probability of event A is the ___________ probability.
unconditional, conditional
Chapter 5: A prior probability for event A is the _______ probability whereas the posterior probability of event A is the __________ probability
unconditional, conditional
the manager of a women's clothing store is projecting next month's sales. Her low-end estimate of sales is $25,000 and her high-end estimate is $50,000. She decides to treat all outcomes of sales between these 2 values as equally likely. If we define the random variable X as sales, then X follows the
uniform distribution
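A sketch using the sales range from the card above ($25,000 to $50,000), assuming a continuous uniform distribution. The query interval of $30,000 to $40,000 is a hypothetical example:

```python
a, b = 25_000, 50_000  # low-end and high-end sales estimates from the card

# Mean and standard deviation of a continuous uniform distribution on [a, b].
mean_sales = (a + b) / 2
sd_sales = (b - a) / 12 ** 0.5

def prob_between(lo, hi):
    """P(lo <= X <= hi): length of the overlap with [a, b] over the total width."""
    overlap = max(0, min(hi, b) - max(lo, a))
    return overlap / (b - a)

# Hypothetical question: probability that sales fall between $30,000 and $40,000.
p = prob_between(30_000, 40_000)

print(mean_sales, round(sd_sales, 2), p)
```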
an ogive is a graph that plots the cumulative frequency, or cumulative relative frequency, against the
upper limit of the corresponding class
Random variables are represented by ___case letters while particular values of random variables are represented by ___case letters
upper, lower
Chapter 5: Classical approach
use deduction to determine P(A)
Box plots
useful for EDA (exploratory data analysis). The plot shows center (position of the median, Q2), variability (width of the box, defined by Q1 and Q3, and the range between xmin and xmax), and shape.
which is likely to produce a discrete uniform distribution
values from repeated wheel spins at a casino
Calculate the variance and the standard deviation of this probability distribution.
variance 31.5 SD 5.6
two widely used measures of dispersion are
variance and SD
To calculate the union for two mutually exclusive events A and B,
we add the probability of A to the probability of B
Chapter 3: Sorting data is helpful because
we can see the range of values, we can see the frequency of each data value
Multiplying data by a fraction (where the fractions add to 1) and summing results in a _____ mean.
weighted
when a mean is calculated and some observations are given greater importance, we refer to this measure of central location as a
weighted mean
which is not an example of an experiment
winner of last week's lottery
A recent study by Allstate Insurance Co. finds that 82% of teenagers have used cell phones while driving. Is the sampling distribution of the sample proportion approximately normal?
yes
we calculate the __ to find the relative position of a sample value within a data set
z-score
in order to transform a value x into its standardized value z, we use the following formula
z = (x - μ)/σ
probability that a continuous random variable X assumes a particular value x is
zero
Chapter 5: define event A = {1, 2, 3, 4} and event B = {2, 3, 6, 7}. A∪B =
{1,2,3,4,6,7}
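The union can be checked with Python sets. The sketch also verifies the general law of addition, assuming a hypothetical sample space S = {1, ..., 8} with equally likely outcomes (the card does not define S):

```python
A = {1, 2, 3, 4}
B = {2, 3, 6, 7}

union = A | B         # outcomes in A or B or both
intersection = A & B  # outcomes in both A and B

# ASSUMED sample space with equally likely outcomes (not given on the card).
S = set(range(1, 9))

def prob(event):
    """Classical probability: favorable outcomes over total outcomes."""
    return len(event) / len(S)

# General law of addition: P(A or B) = P(A) + P(B) - P(A and B).
lhs = prob(union)
rhs = prob(A) + prob(B) - prob(intersection)

print(sorted(union), lhs, rhs)
```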