Statistics Study Guide: Final Exam
If you know that the probability of committing a Type II error (β) is 5%, you can tell that the power of the test is A) 2.5%. B) 95%. C) 97.5%. D) unknown.
95%.
QUARTILES Q2
= observation at the 50th percentile (median of entire data set) (Will give you a ranked value)
Ethical Issues regarding confidence Intervals
A confidence interval estimate (reflecting sampling error) should always be included when reporting a point estimate. The level of confidence should always be reported. The sample size should be reported. An interpretation of the confidence interval estimate should also be provided.
Categorical Nominal variable
A set of labels or names applied to groups composed of individuals with similar characteristics (They cannot be ordered) ex) Cellular provider -- responses: AT&T, Sprint, Verizon ex) Type of Investment: Growth, Value, Other
Determines what tail test used
The sign in the alternative hypothesis: less than -- left-tailed; greater than -- right-tailed; not equal -- two-tailed
Null hypothesis
Always about parameter, never sample statistic
Central Limit Theorem
As the sample size (n) gets large enough, the sampling distribution of the sample mean becomes almost normal regardless of the shape of the population. Properties of the Central Limit Theorem: For most distributions, regardless of the shape of the population, the sampling distribution of the mean is approximately normally distributed if 𝑛≥30. If the distribution of the population is fairly symmetrical, the sampling distribution of the mean is approximately normal for samples as small as 5. If the population is normally distributed, the sampling distribution of the mean is normally distributed, regardless of the sample size.
Critical Value
Value determined by the level of confidence associated with the confidence interval (the level of confidence is the proportion of times the parameter (true value) will be covered by the calculated CI using the correct methods)
Margin of error
Measure of how accurate the point estimate is
Variable Type: How long did the mobile app update take to download?
Numerical continuous
Variable Type: How many text messages have you sent in the past three days?
Numerical discrete
Collectively Exhaustive Events
One of the events must occur The set of events covers the entire sample space
General Multiplication Rule
P(A and B) = P(B) * P(A|B)
General Addition Rule
P(A or B) = P(A) + P(B) - P(A and B). If A and B are mutually exclusive, then P(A and B) = 0, so the rule simplifies to P(A or B) = P(A) + P(B)
If the variances are EQUAL in two independent populations,
POOLED df = n1 + n2 - 2
Confidence Level
Probability that this method produces an interval that contains (covers) the parameter and associated critical value (e.g. z-score or t-score) The level of confidence is denoted (1−α)100%
For a Z-distribution, if the p-value is less than or equal to α
Reject Ho
P-value in terms of α
Reject Ho: if the P-value is less than or equal to α. Fail to reject Ho: if the P-value is greater than α
Rules for Confidence intervals of dependent/Independent two sample tests
Rule #1: If the LL and UL are both greater than 0, this suggests that group A has a greater mean. Rule #2: If the LL and UL are both less than 0, this suggests that group B has a greater mean. Rule #3: If the LL is less than 0 and the UL is greater than 0, then neither group has a clearly greater mean.
EXAMPLE: Cereal plant Operations Manager (OM) must ensure that the mean weight of filled boxes is 368 grams to be consistent with the labeling on those boxes. To determine whether the mean weight is consistent with the expected amount of 368 grams, the OM selects a random sample of 100 filled boxes that had a sample mean of 369.27 grams. Past experience states the standard deviation of the fill amount is 15 grams. Based on the 95% confidence interval, is there evidence to suggest that anything is wrong with the cereal filling process?
369.27 ± 1.96(15/√100) = 369.27 ± 2.94 = (366.33, 372.21). Because the interval includes 368, there is no evidence to suggest that anything is wrong with the cereal filling process.
𝛼
(1 - confidence level)
Power
(1-β) is the probability of rejecting H0 when it is false: P(rejecting a false H0)
Dependent Sample Tests
(paired-sample test) compare scores on two different variables but for the same group of cases
What is the probability that at least two (2) new cars need a warranty repair in the first 90 days? (Just write the equation)
𝑃(𝑋 ≥ 2) = 1 − (𝑃(𝑋 = 0) + 𝑃(𝑋 = 1))
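A minimal sketch of the complement rule above, assuming the number of repairs is binomial. The guide does not give the number of cars or the repair probability, so the values of n_cars and p_repair below are hypothetical and only illustrate the arithmetic.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical inputs (NOT from the guide): 3 cars, each with a 10% repair chance.
n_cars, p_repair = 3, 0.10

# Complement rule: P(X >= 2) = 1 - (P(X = 0) + P(X = 1))
p_at_least_two = 1 - (binomial_pmf(0, n_cars, p_repair) + binomial_pmf(1, n_cars, p_repair))
print(round(p_at_least_two, 3))  # 0.028
```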
Example Using Chebyshev Rule A population of 2-liter bottles of cola is known to have a mean fill-weight of 2.06 liter and a standard deviation of 0.02 liter. However, the shape of the population is unknown, and you cannot assume that it is bell-shaped. Describe the distribution of fill-weights.
(𝜇 − 𝜎 , 𝜇 + 𝜎) = 2.06 ± 0.02 = (2.04 , 2.08); (𝜇 − 2𝜎 , 𝜇 + 2𝜎) = 2.06 ± 2(0.02) = (2.02 , 2.10); (𝜇 − 3𝜎 , 𝜇 + 3𝜎) = 2.06 ± 3(0.02) = (2.00 , 2.12). Is it very likely that a bottle will contain less than 2 liters of cola? No: at least 88.89% of bottles fall within (2.00 , 2.12), so between 0% and 11.11% of the bottles will contain less than 2 liters
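The cola example can be reproduced with a short sketch. It assumes the usual statement of the Chebyshev bound, at least 1 − 1/k² of values within k standard deviations of the mean; the function name is illustrative only.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean (k >= 1)."""
    return 1 - 1 / k**2

mu, sigma = 2.06, 0.02  # mean and standard deviation from the cola example

for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    print(f"k={k}: ({low:.2f}, {high:.2f}) holds at least {chebyshev_lower_bound(k):.2%}")

# At most this fraction can lie outside 3 standard deviations (both tails combined),
# which is the upper limit on bottles holding less than 2.00 liters.
max_outside_3_sd = 1 - chebyshev_lower_bound(3)
```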
Example: The Health and Nutrition Examination Study of 1976-1980 (HANES) studied the heights of adults (aged 18-24) and found that the heights follow a normal distribution with the following: Women: mean (μ) = 65.0 inches, standard deviation (σ) = 2.5 inches. Men: mean (μ) = 70.0 inches, standard deviation (σ) = 2.8 inches. Using the Empirical Rule, find the proportion of men with heights between 67.2 inches and 72.8 inches.
(𝜇−𝜎)=67.2 (𝜇+𝜎)=72.8 The proportion of men with heights between 67.2 (µ - σ) inches and 72.8 (µ + σ) inches is 0.68 (68%) per the Empirical Rule.
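As a quick check of the Empirical Rule figure, the exact normal proportion can be computed with Python's standard-library NormalDist. This is only a sketch using the men's mean and standard deviation from the example.

```python
from statistics import NormalDist

# Men's heights from the HANES example: mean 70.0 inches, standard deviation 2.8 inches.
men = NormalDist(mu=70.0, sigma=2.8)

# Proportion between mu - sigma (67.2) and mu + sigma (72.8).
proportion = men.cdf(72.8) - men.cdf(67.2)
print(round(proportion, 4))  # about 0.6827, which the Empirical Rule rounds to 68%
```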
Understanding Confidence Intervals
A 95% confidence interval is formed under the knowledge that 95% of all the possible intervals, based on every possible sample from the population, would cover the parameter and the other 5% would miss. (Figure: twenty-five samples from the same population, each giving a 95% confidence interval.) In the long run, 95% of all such intervals cover the true population proportion
Sample Size Determination for Proportion
n = (𝑍_(𝛼⁄2))² · 𝑝(1 − 𝑝) / MOE² **When you have no prior knowledge of 𝑝, set 𝑝 = .50. 𝑝 = population proportion; 𝑍_(𝛼⁄2) = the critical value from the standardized normal distribution; MOE = the margin of error (sampling error)
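A minimal sketch of the sample-size calculation, assuming the standard formula n = Z²·p(1 − p)/MOE² rounded up to the next whole observation. The 95% confidence level and 0.03 margin of error in the usage line are hypothetical inputs, not values from the guide.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(confidence, moe, p=0.5):
    """Smallest n that gives the requested margin of error for a proportion.

    p defaults to 0.5, the conservative choice when nothing is known about p.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return ceil(z**2 * p * (1 - p) / moe**2)            # always round up

# Hypothetical request: 95% confidence, margin of error of 3 percentage points.
print(sample_size_for_proportion(0.95, 0.03))  # 1068
```

Rounding up rather than to the nearest integer keeps the achieved margin of error at or below the target.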
If two events are mutually exclusive, what is the probability that both occur at the same time? a. 0. b. 0.50. c. 1.00. d. Cannot be determined from the information given
0
The symbol for the confidence coefficient of a statistical test is A) α. B) 1 - α. C) β. D) 1 - β.
1 - α.
The symbol for the power of a statistical test is A) α. B) 1 - α. C) β. D) 1 - β.
1 - β.
Steps in Hypothesis Testing
1. Determine the null and alternative hypotheses 2. Select a level of significance, α 3. Compute the test statistic 4. Decision based on the critical value approach, p-value approach, or confidence interval 5. State the conclusion
Sale prices of cards have a mean of $5.25 and a standard deviation of $2.80. Suppose a random sample of 100 cards is selected. 1. Describe the sampling distribution of the sample mean sale price of the selected cards. 2. What is the probability that the mean sale price of the selected cards is greater than $6.00?
1. μ(x-bar) = $5.25, since the mean of the sampling distribution of the sample mean equals the population mean. σ(x-bar) = σ/(the square root of the sample size) = 2.80/10 = .28. Bell shaped, since the sample size is greater than 30. Thus, the distribution is approximately normal with a mean of $5.25 and a standard error of $.28. 2. Z = (6.00 − 5.25)/.28 = 2.68, so P(X-bar is greater than $6.00) = 1 − .9963 = .0037
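The card-price answer can be sketched in a few lines. The variable names are illustrative, and the normal approximation is justified by the sample size of 100.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 5.25, 2.80, 100        # population mean, population sd, sample size

standard_error = sigma / sqrt(n)      # sd of the sampling distribution of the mean
sampling_dist = NormalDist(mu, standard_error)

# P(sample mean > 6.00) is the upper-tail area beyond 6.00.
p_above_6 = 1 - sampling_dist.cdf(6.00)
print(round(standard_error, 2), round(p_above_6, 4))  # 0.28 0.0037
```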
Assumptions for independent hypothesis testing
Both samples must be randomly selected Observations within each sample must be independent Distributions of the sample mean must be normal
Controlling the probability of a Type I Error
By the choice of the significance level
Chebyshev Rule
Can't use the Empirical Rule for heavily skewed data sets. States that for any data set, regardless of shape, the percentage of values found within k standard deviations of the mean must be at least (1 − 1/k²) × 100%: (𝜇 − 𝜎 , 𝜇 + 𝜎) at least 0%; (𝜇 − 2𝜎 , 𝜇 + 2𝜎) at least 75%; (𝜇 − 3𝜎 , 𝜇 + 3𝜎) at least 88.89%
Variable Type: Do you have a facebook profile?
Categorical
Critical Value Approach
Compares the critical value with the test statistic Reject Ho: If the test statistic falls in the critical region Fail to reject Ho: If the test statistic does not fall in the critical region
Population Standard deviation
Computed by taking the square root of the population variance
How can we achieve a narrower confidence interval?
Decrease the level of confidence OR increase the sample size
Interval Scale Numerical Variable
Defined by distinct classes, magnitude, and equal intervals but no true zero points. Assumes that differences between scores of equal magnitude really represent equal differences in the variable measured Ex) Standardized exam scores Ex) Temperatures
Level of Significance (α)
Determines how much evidence against Ho we require in order to reject Ho and find in favor of the alternative hypothesis, H1. It is the probability of rejecting the null hypothesis when the null hypothesis is true
Histogram
Displays a quantitative variable across different grouping of values Groups must cover the same range so have equal width Height used to compare the frequency of each range of values
right-tailed test
Equal versus greater than 𝐻𝑜 : parameter ≤ null value 𝐻1: parameter > null value
left-tailed test
Equal versus less than 𝐻𝑜 : parameter ≥ null value 𝐻1: parameter < null value
Two-tailed test
Equal versus not equal hypothesis 𝐻𝑜 : parameter = null value 𝐻1: parameter ≠ null value
Measurement Error
Errors not related to the act of selecting a sample (processing errors, poorly worded questions, deliberate inaccuracies in responses)
Non-response Error
Failure to collect data on all items in the sample
Type II Error
Failure to reject a false null hypothesis Probability of this error is β Can only occur if Ho is false
A new drug is advertised as being 80% effective. A consumer advocacy group thinks that it isn't that effective and is looking for evidence that it doesn't work well. What is the null hypothesis and the alternative hypothesis?
H0: p = 0.80 H1: p < 0.80
Qualities that increase or decrease the width of confidence intervals
Larger sample size -- narrower interval Lower level of confidence -- narrower interval Smaller sample size -- wider interval Higher level of confidence -- wider interval
Example 1: A doctor is researching side effects with a new pain medication. A clinical trial including random sample of 340 people who took a new pain relief medication reveals that 23 suffered some side effects. At the α=.05 level of significance, is there evidence that less than 10% of all patients who take the medication will experience side effects? Use the p-value approach.
H0: p ≥ .10 H1: p < .10 α = 0.05, n = 340, p̂ = 23/340 = .0676. Calculate Test Statistic: Zstat = (.0676 − .10)/√(.10(.90)/340) = −1.99. Determine the p-value: P(Z < −1.99) = .0233. (Critical value approach: −Zα = −Z.05 = −1.645.) Decision and Interpretation: .0233 < .05 (equivalently, −1.99 < −1.645), so reject H0. There is sufficient evidence to conclude that fewer than 10% of all patients taking this medication experience side effects.
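A sketch of the left-tailed one-proportion Z test from Example 1, using only the standard library. The function name is illustrative; the counts and hypothesized proportion come from the example.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test_left(successes, n, p0):
    """Left-tailed Z test of H0: p >= p0 against H1: p < p0."""
    p_hat = successes / n
    std_error = sqrt(p0 * (1 - p0) / n)   # uses the hypothesized p0, not p_hat
    z = (p_hat - p0) / std_error
    p_value = NormalDist().cdf(z)         # area to the left of z
    return z, p_value

z_stat, p_value = one_proportion_z_test_left(successes=23, n=340, p0=0.10)
print(round(z_stat, 2), round(p_value, 3))  # -1.99 0.023
print("reject H0" if p_value <= 0.05 else "fail to reject H0")
```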
Example 2: Gasoline pumped from a supplier's pipeline is supposed to have an octane rating of 87.5. A random sample of 13 days had the following octane readings. Is there evidence, at the .05 level of significance, that the mean octane reading differs from 87.5? (X̄ = 87.08, S = 0.649)
H0: µ = 87.5 H1: µ ≠ 87.5 α = 0.05, n = 13. Calculate Test Statistic: Tstat = -2.307. Determine Critical Value: df = 13 − 1 = 12, t(α/2) = t.025 = 2.1788. Decision and Interpretation: −2.307 < −2.1788, so reject H0. There is sufficient evidence to conclude that the long-run mean octane reading differs from 87.5.
EXAMPLE: An environmentalist takes samples at a nearby river to study the average concentration level of a contaminant. He wants to find out, using a .10 level of significance, if the average concentration level exceeds the acceptable level for safely consuming fish from the river. Describe a Type I & Type II Error and potential consequences
H0: µ is at or lower than acceptable level HA: µ exceeds acceptable level Describe a Type I Error for this problem: Researcher determines that the concentration levels are too high when, in fact, they are safe. Potential consequence: People are not allowed to fish in the river when it is really safe to fish. Describe a Type II Error for this problem: Researcher determines that the concentration levels are NOT too high, and it is safe to fish when it is really NOT safe to fish. Potential consequence: People consume contaminated fish.
Example: A planning committee needs to estimate the percentage of students at a large university who will attend an upcoming event so that they can determine an appropriate location for the event. Data is collected to see if there is evidence, at the .05 level of significance, that less than 30% of students will attend this year? Describe Type I & II Errors
H0: p ≥ 0.30 H1: p < 0.30. What parameter am I interested in? Proportion. Describe a Type I Error for this problem: Committee determines that less than 30% of students will attend the event but, in fact, at least 30% WILL attend the event. Potential consequence: The committee reserves a space that is too small. Describe a Type II Error for this problem: Committee determines there is no evidence that less than 30% of students will attend but, in fact, less than 30% attend the event. Potential consequence: The committee reserves a space that is too large.
Locating Extreme Outliers from Z-Score
If Z-score is positive, ABOVE the mean If Z-score is negative, BELOW the mean A data value is considered an extreme outlier if its Z-score is less than -3.0 or greater than +3.0. The larger the absolute value of the Z-score the farther the data value is from the mean
Sampling Distribution of the Mean
If a population is normal with a mean (μ) and standard deviation (σ), the sampling distribution of x̅ is also normally distributed, with mean μx̅ = μ and standard deviation (standard error) σx̅ = σ/√n
Summarizing Independent Events
If one of the following is true, all are true: P(A|B) = P(A); P(B|A) = P(B); P(A and B) = P(A) x P(B)
Rule of Thumb
If the larger sample standard deviation is more than twice the smaller sample standard deviation, then perform the T-test using the UNPOOLED method
Fail to Reject Ho (the null hypothesis)
If the test statistic does not fall in the critical region
Reject Ho (the null hypothesis)
If the test statistic falls in the critical region
Regression Line
Independent variable = X Dependent Variable = Y Changes in Y are ASSUMED to be related to changes in X ** The Simple linear regression equation predicts an estimate for the population regression line
Confidence interval
Interval containing the "most believable" values for a parameter (provides additional information about the variability of the estimate) Takes into account MOE (margin of error) or sampling error Constructed by using a point estimate and adding and subtracting the margin of error (that is, the critical z-score times the standard error) Point estimate ± margin of error. ESTIMATES OF THE POPULATION
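As an illustration of "point estimate ± critical z-score × standard error", here is a minimal sketch applied to the cereal-box figures from this guide (sample mean 369.27, σ = 15, n = 100, 95% confidence). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Return (lower, upper) for a mean when the population sigma is known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z_crit * sigma / sqrt(n)              # margin of error
    return sample_mean - margin, sample_mean + margin

lower, upper = z_confidence_interval(369.27, 15, 100, 0.95)
print(round(lower, 2), round(upper, 2))  # 366.33 372.21
print(lower <= 368 <= upper)             # True: 368 is a plausible mean
```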
Ratio Scale Numerical Variable
It is defined by distinct classes, magnitude, and equal intervals, and has a true zero point. Assumes that differences between scores of equal magnitude really represent equal differences in the variable measured. Ex) age, cost of a computer
Non-probability Samples
Items are chosen without regard to their probability of occurrence. Either through Judgement (collect a sample that an expert thinks is representative of the population) or Convenience (collect the sample that is easiest to access)
Rules of Quartiles
Rule 1: If the ranked value is a whole number, the quartile is equal to the measurement that corresponds to the ranked value. Rule 2: If the ranked value is a fractional half (2.5, 3.5, 5.5, etc.), the quartile is equal to the average of the measurements that correspond to the two ranked values on either side. Rule 3: If the ranked value is neither a whole number nor a fractional half, the quartile is equal to the measurement that corresponds to the ranked value rounded to the nearest integer.
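The three rules can be sketched as a small function. The guide does not state how the ranked value itself is found, so the formula q(n + 1)/4 used below is an assumption (a common textbook convention), and the data set is hypothetical.

```python
def quartile(sorted_values, q):
    """Quartile q (1, 2, or 3) of already-sorted data, using the three ranked-value rules."""
    n = len(sorted_values)
    rank = q * (n + 1) / 4           # ASSUMED ranked-value formula (1-based position)
    whole = int(rank)
    fraction = rank - whole          # always 0, 0.25, 0.5, or 0.75
    if fraction == 0:                # Rule 1: whole number
        return sorted_values[whole - 1]
    if fraction == 0.5:              # Rule 2: fractional half, average the two neighbors
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    return sorted_values[round(rank) - 1]  # Rule 3: round to the nearest rank

# Hypothetical data, n = 10, already sorted.
data = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]
print(quartile(data, 1), quartile(data, 2), quartile(data, 3))  # 35 39.5 44
```

Other software often interpolates between neighbors instead of rounding, so results from spreadsheets or statistical packages may differ slightly.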
Sample standard deviation symbol
S
Contingency Table
Shows the values of the data categories for more than one variable and the frequencies or proportions/percentages for each of the Joint Responses
Point Estimate
Single value that serves as an estimate of a population parameter
Hypothesis
Statement regarding a characteristic of one or more populations
The effect of the sample size (n) on σx
Taking a larger sample results in less variability in the sample means from sample to sample. As n increases, σx decreases, resulting in a taller and narrower graph
What affects the margin of error?
The level of confidence which determines the value of Z Standard error which is a function of sample size
The probability that a new advertising campaign will increase sales is assessed as being 0.80. The probability that the cost of developing the new ad campaign can be kept within the original budget allocation is 0.40. If the two events are independent, the probability that the cost is kept within budget and the campaign will increase sales is: a. 0.20 b. 0.32 c. 0.40 d. 0.88
Using the multiplication rule for independent events 𝑃(𝐴 𝑎𝑛𝑑 𝐵)=𝑃(𝐴)𝑃(𝐵) .80 x .40 = .32
How large is "large enough" for the sampling distribution of p?
The shape of the sampling distribution of 𝑝 is approximately normal provided 𝑛𝑝≥5 and 𝑛(1−𝑝)≥5
Which of the following statements is not true about the level of significance in a hypothesis test? A) The larger the level of significance, the more likely you are to reject the null hypothesis. B) The level of significance is the maximum risk we are willing to accept in making a Type I error. C) The significance level is also called the α level. D) The significance level is another name for Type II error.
The significance level is another name for Type II error.
The Standard error of the Proportion
The standard deviation of the sampling distribution of the sample proportion: σp̂ = √(p(1 − p)/n), where p = population proportion and n = sample size
Fail to reject (Do not reject) the null hypothesis
There is insufficient evidence to support the alternative hypothesis
Reject the null hypothesis:
There is sufficient evidence to support the alternative hypothesis
Measures of Variation
Total Sum of Squares = regression sum of squares + error sum of squares
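The identity can be verified numerically. The sketch below fits a least-squares line by hand to a small hypothetical data set (the xs and ys are made up for illustration) and checks that SST = SSR + SSE and that r² = SSR/SST.

```python
# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sxy / sxx
intercept = y_bar - slope * x_bar
predicted = [intercept + slope * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)                  # total variation
ssr = sum((p - y_bar) ** 2 for p in predicted)           # explained variation
sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))   # unexplained variation
r_squared = ssr / sst

print(round(sst, 4), round(ssr, 4), round(sse, 4), round(r_squared, 4))  # 6.0 3.6 2.4 0.6
```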
True or False: Suppose, in testing a hypothesis about a mean, the p-value is computed to be 0.043. The null hypothesis should be rejected if the chosen level of significance is 0.05.
True
True/False: If two events are mutually exclusive and collectively exhaustive, the probability that one or the other occurs is 1
True
True/False the larger value of S, the more spread out the variable or data are
True
True/false The mean is strongly affected by extreme value The median is less sensitive than the mean to extreme values
True, True
If a researcher rejects a true null hypothesis, she has made a(n) ________ error.
Type I
If a researcher does not reject a false null hypothesis, she has made a(n) ________ error.
Type II
When the difference between the hypothesized parameter and its true value increases, the probability of Type II Error (β) _______________
Type II Error (β) decreases
When alpha (α) (probability of Type I Error) decreases
Type II Error (β) increases
When population standard deviation (σ) increases, the probability of Type II Error (β) _______________
Type II Error (β) increases
When the sample size decreases, the probability of Type II Error (β) _______________
Type II Error (β) increases
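How β responds to each factor can be checked numerically. The sketch below assumes a right-tailed Z test of H0: μ ≤ μ0 with known σ, for which β = Φ(z_α − (μ_true − μ0)/(σ/√n)); all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu_true, sigma, n, alpha):
    """beta for a right-tailed Z test of H0: mu <= mu0 when the true mean is mu_true > mu0."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)                  # rejection cutoff on the Z scale
    shift = (mu_true - mu0) / (sigma / sqrt(n))    # true mean's distance in standard errors
    return z.cdf(z_crit - shift)                   # chance the statistic misses the cutoff

# Hypothetical baseline scenario.
base = type_ii_error(mu0=100, mu_true=103, sigma=10, n=25, alpha=0.05)

bigger_gap = type_ii_error(100, 106, 10, 25, 0.05)     # true value farther from mu0
smaller_alpha = type_ii_error(100, 103, 10, 25, 0.01)  # stricter significance level
bigger_sigma = type_ii_error(100, 103, 20, 25, 0.05)   # noisier population
smaller_n = type_ii_error(100, 103, 10, 9, 0.05)       # fewer observations

print(f"baseline beta      = {base:.3f}")
print(f"larger difference  = {bigger_gap:.3f}  (beta goes down)")
print(f"smaller alpha      = {smaller_alpha:.3f}  (beta goes up)")
print(f"larger sigma       = {bigger_sigma:.3f}  (beta goes up)")
print(f"smaller n          = {smaller_n:.3f}  (beta goes up)")
```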
If the variances are UNEQUAL in two independent populations,
UNPOOLED
Percentage Polygon
Uses midpoints of each class and can combine data from two groups to allow easier comparison
Cumulative Percentage Polygon
Uses the cumulative percentage distribution (lower limits) to plot the cumulative percentages along the Y axis
Suppose we wish to test H0 : μ ≤ 47 versus H1 : μ > 47. What will result if we conclude that the mean is greater than 47 when its true value is really 52? A) We have made a Type I error. B) We have made a Type II error. C) We have made a correct decision. D) None of the above are correct.
We have made a correct decision.
Confidence Interval Approach for Hypothesis testing
When testing a null hypothesis for a two tailed test...... If the confidence interval CONTAINS the null value, we DO NOT REJECT the null hypothesis IF the confidence interval DOES NOT contain the null value, we REJECT the null hypothesis
Empirical Rule for Normal Distributions
Within 1 std dev of the mean ~ 68% Within 2 std dev of the mean ~ 95% Within 3 std dev of the mean ~ 99.7%
Confidence Interval Conclusion
You can be ______% confident that the population proportion of all ______________ who _________________ lies within the interval ___________ and ________________.
Confidence level 50% What is Z-critical value?
Z-Critical Value 0.67
Confidence level 70% What is Z-critical value?
Z-Critical Value 1.04
Confidence level 80% What is Z-critical value?
Z-Critical Value 1.28
Confidence level 90% What is Z-critical value?
Z-Critical Value 1.645
Confidence level 95% What is Z-critical value?
Z-Critical Value 1.96
Confidence level 99% What is Z-critical value?
Z-Critical Value 2.58
Confidence level 60% What is Z-critical value?
Z-Critical Value 0.84
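The table of critical values above can be regenerated from the confidence level alone. This is a sketch using the standard-library NormalDist; the function name is illustrative.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided Z critical value: leaves (1 - confidence)/2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_critical(level):.3f}")
```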
Suppose the mean math SAT score is 490, with a standard deviation of 100. Compute the Z-score for a test score of 120. Find Z-score and state above or below the mean and whether it is an outlier or not
Z-score = (120 − 490)/100 = −3.7, i.e. 3.7 standard deviations below the mean. Because the Z-score is less than −3.0, the score would be considered an extreme outlier.
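The SAT calculation as a short sketch. The ±3.0 cutoff for an extreme outlier is the one stated elsewhere in this guide; the function name is illustrative.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

z = z_score(120, 490, 100)
position = "above" if z > 0 else "below"
is_extreme_outlier = abs(z) > 3.0

print(z, position, is_extreme_outlier)  # -3.7 below True
```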
If an economist wishes to determine whether there is evidence that mean family income in a community exceeds $50,000, A) either a one-tail or two-tail test could be used with equivalent results. B) a one-tail test should be utilized. C) a two-tail test should be utilized. D) None of the above.
a one-tail test should be utilized.
Hypothesis testing
a procedure that checks sample data against a claim or assumption about the population
Variable
a property of an object or event that can take on different values
Categorical Ordinal Variable
a set of labels applied to groups composed of individuals with similar characteristics where the labels indicated more or less of a quality or CAN BE RANK ORDERED ex) Student class designation: freshman, sophomore, Junior, Senior
Critical Value
a table value based on the sampling distribution of the point estimate and the desired level of confidence
It is possible to directly compare the results of a confidence interval estimate to the results obtained by testing a null hypothesis if A) a two-tail test for μ is used. B) a one-tail test for μ is used. C) Both of the previous statements are true. D) None of the previous statements is true.
a two-tail test for μ is used.
If an economist wishes to determine whether there is evidence that mean family income in a community equals $50,000, A) either a one-tail or two-tail test could be used with equivalent results. B) a one-tail test should be utilized. C) a two-tail test should be utilized. D) None of the above.
a two-tail test should be utilized.
Which of the following about the binomial distribution is not a true statement? a) The probability of the event of interest must be constant from trial to trial. b) Each outcome is independent of the other. c) Each outcome may be classified as either "event of interest" or "not event of interest." d) The variable of interest is continuous.
d) The variable of interest is continuous.
Which of the following statements is true for the normal distribution? a. The highest point occurs at 𝜇. b. It has a mean of 𝜇 and a standard deviation of 𝜎. c. It has inflection points at 𝜇 − 𝜎 and 𝜇 + 𝜎. d. All the above are true.
d. All the above are true.
If two events are mutually exclusive, what is the probability that one or the other occurs? a. 0. b. 0.50. c. 1.00. d. Cannot be determined from the information given
d. Cannot be determined from the information given
Independent Variable (aka predictor or explanatory variable)
a variable that, according to theory, has a causal influence on the dependent variable
In its standardized form, the normal distribution.... a) has a mean of 0 and a standard deviation of 1. b) has a mean of 1 and a variance of 0. c) has an area equal to 0.5. d) cannot be used to approximate discrete probability distributions.
a) has a mean of 0 and a standard deviation of 1.
Collectively exhaustive
all data values must be recorded in the categories created
Event
any collection of outcomes from the experiment
Which of the following about the binomial distribution is a true statement? a. the variable X is continuous b. the probability of event of interest 𝑝 is stable from trial to trial. c. the number of trials n must be at least 30. d. the results of one trial are dependent on the results of the other trials
b. the probability of event of interest 𝑝 is stable from trial to trial.
r
Called the sample coefficient of correlation; ranges from -1 to 1. The closer r is to 0, the weaker the relationship; the closer r is to -1 or 1, the stronger the relationship. Strength of association (absolute value of r): Small (.1 to .3), Medium (.3 to .5), Large (.5 to 1) (Side note: ρ is the population coefficient of correlation)
Coverage Error
certain groups are excluded from the sampling frame Results in selection bias
r^2
coefficient of determination It is the proportion of the total variation in the dependent variable (Y-axis) that is explained by the variation in the independent variable (X-axis)
Independent Sample test
compare the scores on the same variable but for two different groups
Marginal probability
the probability of a single event, found by summing a set of joint probabilities
If a researcher does not reject a true null hypothesis, she has made a(n) ________ decision.
correct
If a researcher rejects a false null hypothesis, she has made a(n) ________ decision.
correct
A ________ is a numerical quantity computed from the data of a sample and is used in reaching a decision on whether or not to reject the null hypothesis. A) significance level B) critical value C) test statistic D) parameter
test statistic
The value that separates a rejection region from a non-rejection region is called the ________.
critical value
As the sample size increases, the standard error of the mean (the standard deviation of the sampling distribution) ______________
decreases
As the standard error ______________, the values become more concentrated around the mean
decreases
Measure of Variation
describe the spread or variability or dispersion of the data for a particular variable Range Interquartile Range Variance Standard deviation
Mutually exclusive
each data value is placed in one and only one category
mutually exclusive events
events that cannot happen at the same time
For Z-distribution, if the p-value is greater than the α
fail to reject Ho
True or False: "What conclusions and interpretations can you reach from the results of the hypothesis test?" is not an important question to ask when performing a hypothesis test.
false
True or False: In a hypothesis test, it is irrelevant whether the test is a one-tail or two-tail test.
false
True or False: In instances in which there is insufficient evidence to reject the null hypothesis, you must make it clear that this has proven that the null hypothesis is true.
false
True or False: Suppose, in testing a hypothesis about a mean, the Z test statistic is computed to be 2.04. The null hypothesis should be rejected if the chosen level of significance is 0.01 and a two-tail test is used.
false
True or False: Suppose, in testing a hypothesis about a mean, the p-value is computed to be 0.034. The null hypothesis should be rejected if the chosen level of significance is 0.01.
false
True or False: The larger the p-value, the more likely you are to reject the null hypothesis.
false
True or False: You should report only the results of hypothesis tests that show statistical significance and omit those for which there is insufficient evidence in the findings.
false
Sampling Distribution of Proportion
follows the binomial distribution
Skewed to the left
if the left "tail" extends much farther out than the right tail *The mean is less than the median
If a test of hypothesis has a Type I error probability (α) of 0.01, it means that A) if the null hypothesis is true, you don't reject it 1% of the time. B) if the null hypothesis is true, you reject it 1% of the time. C) if the null hypothesis is false, you don't reject it 1% of the time. D) if the null hypothesis is false, you reject it 1% of the time.
if the null hypothesis is true, you reject it 1% of the time.
Skewed to the right
if the right "tail" extends much farther out than the left tail *The mean is greater than the median
The probability that the sample mean will fall close to the population mean will always ____________ when the sample size increases
increase
The probability distribution of proportion becomes more peaked when the sample size
increases
Probability Sample
Items in the sample are chosen on the basis of known probabilities. 4 Types: Simple Random: sample is chosen in such a way that every subject is equally likely to be selected for the study. Systematic: uses a systematic method, k = N/n (i.e. n groups of k items, such as every 10th person), to select the sample. Stratified: divide the frame into groups (strata) and take a simple random sample from each stratum. Cluster: divide the N items in the frame into clusters and take a random sample of the clusters; study all items in each selected cluster
Categorical Data
Labels or names used to identify categories of like items. MUST BE USED AS PROPORTIONS. p̂ = sample proportion, p = population proportion. Assumptions: population with a fixed proportion; random sample from the population; np has to be greater than or equal to 5 and n(1-p) has to be greater than or equal to 5. THE MEAN OF THE SAMPLE PROPORTIONS WILL BE EQUAL TO THE POPULATION PROPORTION: μp̂ = p
Range
largest value - smallest value
Standard Deviation
measures the average distance of an observation from the mean computed by taking the square root of the sample variance (Controls the spread of the graph)
Variance
measures the average of the squared deviations of each observation from the mean sample variance S^2
SSR
measures the explained variation between X and Y
SSE
measures the unexplained variation between X and Y
If, as a result of a hypothesis test, you reject the null hypothesis when it is false, then you have committed A) a Type II error. B) a Type I error. C) no error. D) an acceptance error.
no error.
Z-score
number of standard deviations a data value is from the mean
QUARTILES Q1
observation at the 25th percentile (Will give you a ranked value)
QUARTILES Q3
observation at the 75th percentile (Will give you a ranked value)
Alternative Hypothesis
opposite of null hypothesis Challenges the status quo • Never contains the "=", or "≤", or "≥" sign - If H0 contains "=" --> H1 must contain "≠" - If H0 contains "≤" --> H1 must contain ">" - If H0 contains "≥" --> H1 must contain "<" • Is generally the hypothesis that the researcher is trying to prove
Z- value for Sampling Distribution of the Proportion
Z = (p̂ − p) / √(p(1 − p)/n), where p̂ = sample proportion, p = population proportion, n = sample size
Z- value for Sampling Distribution of the Proportion (test statistic)
Zstat = (p̂ − p) / √(p(1 − p)/n), where p̂ = sample proportion, p = population proportion (the value stated in H0), n = sample size
Conditional Probability
refers to the probability of event A, given information about the occurrence of another event, B
Sampling Error
Reflects the "chance differences" caused by the act of taking a sample, which make the results from a sample different from those of a census (margin of error)
The power of a test is measured by its capability of
rejecting a null hypothesis that is false
Type I Error
rejecting a true null hypothesis The probability of this error is equal to α (Significance Level) Can only occur if Ho is true
point estimate
sample statistic
Pareto Chart
series of vertical bars showing tallies/frequencies/percentages in descending order *Helps identify the important "few" from the less important "many"
Scatter plot
shows the relationship between two quantitative variables measured on the same individuals X-axis: independent (doing the explaining) Y-axis: dependent (one being explained)
Summary Table
shows the values of the data categories for ONE variable and the frequency (counts) or proportions/percentages for each category
Standard error of the mean
standard deviation of all possible sample means Is the standard deviation of the point estimate
null hypothesis
states the claim of the assertion to be tested ALWAYS about a population parameter, not a sample statistic • Always contains "=", or "≤", or "≥" sign The null hypothesis is assumed true until evidence indicates otherwise ASSUMPTION of true H0 may or may not be REJECTED • BUT the ASSUMPTION is NEVER ACCEPTED
Parameter
summarizes the value of a specific variable for a population
Statistic
summarizes the value of a specific variable for sample data
If the Type I error (α) for a given test is to be decreased, then for a fixed sample size n, A) the Type II error (β) will also decrease. B) the Type II error (β) will increase. C) the power of the test will increase. D) a one-tail test must be utilized.
the Type II error (β) will increase.
Population Variance
the average of the squared deviations of each observation from the POPULATION mean
Sample Space
the collection of all possible outcomes
sampling distribution
the distribution of all the possible values of a sample statistic for a given sample size selected from a population
Skewness
the extent to which the data values are not symmetrical around the mean
Which of the following would be an appropriate null hypothesis?
the mean of the population is equal to 55; the population proportion is not less than .65
Which of the following would be an appropriate alternative hypothesis?
the mean of the population is greater than 55; the population proportion is less than .65
Mode
the most frequent observation of the variable that occurs in the data set
If the p-value is less than α in a two-tail test,
the null hypothesis should be rejected.
p-value
the probability of getting a test statistic equal to or more extreme than the sample result, given that the null hypothesis, H0, is true. It summarizes the amount of evidence we have against the null hypothesis *** The smaller the p-value, the more evidence against the null hypothesis
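A small sketch of how a p-value could be computed for a Z test, assuming the test statistic follows the standard normal distribution when H0 is true. The function name and the `tail` argument are illustrative choices, not notation from the guide.

```python
from statistics import NormalDist


def p_value_from_z(z, tail="two"):
    """p-value for a Z test statistic; tail is 'left', 'right', or 'two'."""
    cdf = NormalDist().cdf  # standard normal cumulative distribution
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")


# A two-tail test with Z = 1.96 gives a p-value of about 0.05.
print(round(p_value_from_z(1.96), 4))
```

Compare the result with α: if the p-value is less than α, reject H0.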
Joint probability
the probability of occurrence involving two or more events
Simple probability
the probability of occurrence of a simple event in which each outcome is equally likely to occur
The power of a statistical test is A) the probability of not rejecting H0 when it is false. B) the probability of rejecting H0 when it is true. C) the probability of not rejecting H0 when it is true. D) the probability of rejecting H0 when it is false.
the probability of rejecting H0 when it is false.
Point Estimate
the sample statistic estimating the population parameter of interest For example the sample mean, x̅ is a point estimate of the population mean 𝜇. the sample proportion,p̂, is a point estimate of the population proportion 𝑝. (Doesn't show "how close" the estimate is to the parameter)
Standard Error
the standard deviation of the sample statistic
Median
the value that lies in the middle of the data when arranged in ascending order Rule 1: If the number of values is odd, the median is the measurement associated with the middle ranked value Rule 2: If the number of values is even, the median is the measurement associated with the average of the two middle-ranked values.
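The two median rules can be sketched as follows. This is a hand-written illustration with made-up data; Python's built-in `statistics.median` applies the same rules.

```python
def median(values):
    """Median by the odd/even rules: middle value, or mean of the two middle values."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)        # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # Rule 1: odd count, take the middle value
        return ordered[mid]
    # Rule 2: even count, average the two middle-ranked values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```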
Dependent Variable (aka outcome, response, predicted variable)
the variable that is of greatest substantive interest to the researcher -- the variable with real world implications
independent variable
the variable used to predict or explain the dependent variable
Dependent Variable
the variable we wish to predict or explain Always on the y-axis
𝑍_𝛼 (pronounced "z sub alpha")
the z-score such that the area under the standard normal curve to the right of 𝑍_𝛼 is 𝛼.
True or False: "Is the intended sample size large enough to achieve the desired power of the test for the level of significance chosen?" should be among the questions asked when performing a hypothesis test.
true
True or False: A proper methodology in performing hypothesis tests is to ask whether a random sample can be selected from the population of interest.
true
True or False: In conducting research, you should document both good and bad results.
true
True or False: In instances in which there is insufficient evidence to reject the null hypothesis, you must make it clear that this does not prove that the null hypothesis is true.
true
True or False: In testing a hypothesis, you should always raise the question concerning the purpose of the study, survey or experiment.
true
True or False: The smaller the p-value, the stronger the evidence is against the null hypothesis.
true
True or False: The statement of the null hypothesis always contains an equality.
true
True or False: The test statistic measures how close the computed sample statistic has come to the hypothesized population parameter.
true
A priori probability
type of probability based on prior knowledge of the process (theoretical) Ex) coin toss, roll a die, draw a card
Empirical probability
type of probability based on the observed data
Subjective probability
type of probability that differs from person to person
If you know that the level of significance (α) of a test is 5%, you can tell that the probability of committing a Type II error (β) is A) 2.5%. B) 95%. C) 97.5%. D) unknown.
unknown.
Regression Analysis
used to predict the value of a dependent variable based on the value of at least one independent variable. Explains the impact of changes in an independent variable on the dependent variable
Quantitative data
uses numbers; MUST BE SUMMARIZED WITH A MEAN ( x̅ ) Conditions 1. If the population is bell shaped (normal, symmetrical), a random sample of any size works 2. If the population is not bell shaped, a large random sample is needed (n ≥ 30) THE MEAN OF THE SAMPLE MEANS WILL BE THE POPULATION MEAN: μx̅ = μ The standard deviation of the sample means equals the population standard deviation divided by the square root of the sample size: σx̅ = σ/√n
For a given level of significance (α), if the sample size n is increased, the probability of a Type II error (β) A) will decrease. B) will increase. C) will remain the same. D) cannot be determined.
will decrease.
For a given sample size n, if the level of significance (α) is decreased, the power of the test A) will increase. B) will decrease. C) will remain the same. D) cannot be determined.
will decrease.
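The two answers above (a larger n lowers β; a smaller α lowers power) can be checked numerically. The sketch below assumes an upper-tail Z test of a mean with known σ, where power = 1 − Φ(𝑍_𝛼 − (μ1 − μ0)/(σ/√n)). The values μ0 = 368, μ1 = 372, and σ = 15 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tail_z(mu0, mu1, sigma, n, alpha):
    """Power (1 - beta) of an upper-tail Z test when the true mean is mu1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)   # critical value for H0
    shift = (mu1 - mu0) / (sigma / sqrt(n))   # true mean in standard-error units
    return 1 - std_normal.cdf(z_alpha - shift)


# Hypothetical scenario: H0 mean 368, true mean 372, sigma 15.
base = power_upper_tail_z(368, 372, 15, n=100, alpha=0.05)
more_data = power_upper_tail_z(368, 372, 15, n=400, alpha=0.05)
stricter = power_upper_tail_z(368, 372, 15, n=100, alpha=0.01)

print(f"n=100, alpha=0.05: power={base:.3f}, beta={1 - base:.3f}")
print(f"n=400, alpha=0.05: power={more_data:.3f}, beta={1 - more_data:.3f}")
print(f"n=100, alpha=0.01: power={stricter:.3f}, beta={1 - stricter:.3f}")
```

With these inputs, quadrupling n raises the power, while tightening α from 0.05 to 0.01 lowers it.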
Sample proportion
p̂ = X/n X = number of items having the characteristic of interest n = sample size
A Type II error is committed when
you don't reject a null hypothesis that is false
A Type I error is committed when
you reject a null hypothesis that is true
The symbol for the level of significance of a statistical test is A) α. B) 1 - α. C) β. D) 1 - β.
α.
The symbol for the probability of committing a Type I error of a statistical test is A) α. B) 1 - α. C) β. D) 1 - β.
α.
The symbol for the probability of committing a Type II error of a statistical test is A) α. B) 1 - α. C) β. D) 1 - β.
β.
Population Mean
μ
Z-Value for Sampling Distribution of the Mean
Z = (𝑋̅ − μ) / (σ/√n) μ = population mean σ = population standard deviation 𝑋̅ = sample mean n = sample size
t distribution
t = (𝑋̅ − μ) / (𝑠/√n), with n − 1 degrees of freedom μ = the population mean 𝑠 = the sample standard deviation 𝑋̅ = sample mean n = sample size
Population Standard Deviation Formula
σ "sigma"
EXAMPLE: In a random sample of 100 sales invoices, the sample mean is $110.27 and the sample standard deviation is $28.95. Determine a 95% confidence interval for the mean amount of all the sales invoices.
𝑋̅ = 110.27, S = 28.95, 𝑛 = 100, 𝑑𝑓 = 99, 𝑡_(𝛼/2) = 𝑡_(.05⁄2) = 𝑡_.025 = 1.9842 110.27 ± 1.9842(28.95/√100) = 110.27 ± 5.74 104.53 ≤ 𝜇 ≤ 116.01 Conclude with 95% confidence that the mean amount of all the sales invoices is between $104.53 and $116.01
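The invoice interval can be reproduced with a few lines of Python. This sketch takes the critical t-value (1.9842 for 99 degrees of freedom) as an input read from a t table, since the standard library has no t distribution; the function name is illustrative.

```python
from math import sqrt


def t_interval(sample_mean, sample_sd, n, t_critical):
    """Confidence interval for the mean when sigma is unknown."""
    margin = t_critical * sample_sd / sqrt(n)  # margin of error
    return sample_mean - margin, sample_mean + margin


# Invoice example: mean 110.27, S = 28.95, n = 100, t_.025 (df = 99) = 1.9842
low, high = t_interval(110.27, 28.95, 100, 1.9842)
print(f"{low:.2f} to {high:.2f}")  # 104.53 to 116.01
```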
EXAMPLE: Suppose the auditing procedures require you to have 95 % confidence in estimating population proportion of sales invoices with errors to within ± 0.07. The results from the past months indicate that the largest proportion has been no more than 0.15. Determine the sample size needed.
𝑍_(𝛼/2) = 𝑍_(.05⁄2) = 𝑍_.025 = 1.96, MOE = .07, p = .15 n = 𝑍²p(1 − p)/MOE² = (1.96)²(0.15)(0.85)/(0.07)² = 99.96 Therefore, you should select a sample size of 100. ALWAYS ROUND UP
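A quick check of this sample-size calculation, using the usual formula n = 𝑍²p(1 − p)/MOE² and rounding up. The function name is made up for the sketch, and the table value 1.96 is passed in directly.

```python
from math import ceil


def sample_size_for_proportion(z, p, margin_of_error):
    """Smallest n that estimates a proportion to within the margin of error."""
    raw_n = z ** 2 * p * (1 - p) / margin_of_error ** 2
    return ceil(raw_n)  # always round up to the next whole observation


# Audit example: 95% confidence (Z = 1.96), p = 0.15, MOE = 0.07
print(sample_size_for_proportion(1.96, 0.15, 0.07))  # 100
```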
EXAMPLE: An insurance company has the business objective of reducing the amount of time it takes to approve applications for life insurance. Suppose you want to estimate, with 95% confidence, the population mean processing time to within ± 4 days. On the basis of a study conducted the previous year, you believe that the standard deviation is 25 days. Determine the sample size needed.
𝑍_(𝛼/2) = 𝑍_(.05⁄2) = 𝑍_.025 = 1.96, MOE = 4, 𝜎 = 25 n = 𝑍²𝜎²/MOE² = (1.96)²(25)²/(4)² = 150.06 Therefore, you should select a sample of 151 applications. **Always round up to the next integer.
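The same calculation as a short sketch, using n = 𝑍²𝜎²/MOE² and rounding up to the next integer. The function name is an illustrative choice.

```python
from math import ceil


def sample_size_for_mean(z, sigma, margin_of_error):
    """Smallest n that estimates a mean to within the margin of error."""
    raw_n = (z * sigma / margin_of_error) ** 2
    return ceil(raw_n)  # 150.06 becomes 151, never 150


# Insurance example: 95% confidence (Z = 1.96), sigma = 25 days, MOE = 4 days
print(sample_size_for_mean(1.96, 25, 4))  # 151
```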
EXAMPLE: A planning committee needs to estimate the percentage of students at a large university who will attend an upcoming event so that they can determine an appropriate location for the event. 80 students are randomly selected, and 15 say that they will come to the event. What is a 95% confidence interval for the proportion of all the university's students who will attend the event?
𝑍_(𝛼/2) = 𝑍_(.05⁄2) = 𝑍_.025 = 1.96 𝑝̂ = 𝑋/𝑛 = 15/80 = 0.1875 𝐿 = 0.1875 − 1.96√((0.1875(1 − 0.1875))/80) = 0.1875 − 1.96(0.0436) = 0.102044 𝑈 = 0.1875 + 1.96√((0.1875(1 − 0.1875))/80) = 0.1875 + 1.96(0.0436) = 0.272956 Conclude with 95% confidence that the population proportion of all the university's students who will attend the event is between 0.1020 and 0.2730
EXAMPLE: A planning committee needs to estimate the percentage of students at a large university who will attend an upcoming event so that they can determine an appropriate location for the event. 80 students are randomly selected, and 15 say that they will come to the event. What is a 90% confidence interval for the proportion of all the university's students who will attend the event?
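𝑍_(𝛼/2) = 𝑍_(.10⁄2) = 𝑍_.05 = 1.645 𝑝̂ = 𝑋/𝑛 = 15/80 = 0.1875 𝐿 = 0.1875 − 1.645√((0.1875(1 − 0.1875))/80) = 0.1875 − 1.645(0.0436) = 0.115778 𝑈 = 0.1875 + 1.645√((0.1875(1 − 0.1875))/80) = 0.1875 + 1.645(0.0436) = 0.259222 Conclude with 90% confidence that the population proportion of all the university's students who will attend the event is between 0.1158 and 0.2592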
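Both event-attendance intervals can be sketched with one function. This version looks up the critical Z value with `statistics.NormalDist` instead of a table, so it uses 1.95996 and 1.64485 rather than 1.96 and 1.645, and it does not round the standard error to 0.0436. The results can therefore differ from the hand calculation in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence):
    """Confidence interval for a population proportion (normal approximation)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low95, high95 = proportion_interval(15, 80, 0.95)
low90, high90 = proportion_interval(15, 80, 0.90)
print(f"95%: {low95:.4f} to {high95:.4f}")
print(f"90%: {low90:.4f} to {high90:.4f}")
```

The 90% interval is narrower than the 95% interval: lower confidence means a smaller margin of error for the same sample.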
Confidence Interval Estimate for the proportion
p̂ ± 𝑍_(𝛼⁄2)√(p̂(1 − p̂)/n) 𝑍_(𝛼⁄2) = critical value from the standardized normal distribution p̂ = sample proportion n = sample size
C.I. estimate for the Mean (𝜎 unknown)
𝑋̅ ± 𝑡_(𝛼⁄2)(𝑠/√n) 𝑋̅ = sample mean 𝑠 = sample standard deviation 𝑡_(𝛼⁄2) = the critical t-value with n − 1 degrees of freedom n = sample size
Sample Size Determination
n = 𝑍²_(𝛼⁄2)𝜎²/MOE² 𝜎 = population standard deviation 𝑍_(𝛼⁄2) = the critical value from the standardized normal distribution MOE (or E) = the margin of error (sampling error)
Confidence Interval Estimate for the Mean (𝜎 known)
𝑋̅ ± 𝑍_(𝛼⁄2)(𝜎/√n) 𝑋̅ = sample mean 𝜎 = population standard deviation 𝑍_(𝛼⁄2) = the critical value from the standardized normal distribution n = sample size
EXAMPLE: A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms. Determine a 95% confidence interval for the true mean resistance of the population.
𝜎 = population standard deviation is known! 𝑋̅ = 2.20, 𝜎 = 0.35, 𝑛 = 11 𝑍_(𝛼/2) = 𝑍_(.05⁄2) = 𝑍_.025 = 1.96 𝐿 = 2.20 − 1.96(0.35/√11) = 2.20 − 1.96(0.1055) = 1.9932 𝑈 = 2.20 + 1.96(0.35/√11) = 2.20 + 1.96(0.1055) = 2.4068 Conclude with 95% confidence that the population mean of resistance is between 1.9932 and 2.4068 ohms
EXAMPLE: Cereal plant Operations Manager (OM) must ensure that the mean weight of filled boxes is 368 grams to be consistent with the labeling on those boxes. To determine whether the mean weight is consistent with the expected amount of 368 grams, the OM selects a random sample of size 100 filled boxes that had a sample mean of 369.27 grams. Past experience states the standard deviation of the fill amount is 15 grams. Construct a 99% confidence interval estimate of the mean fill amount.
𝜎 = population standard deviation is known! 𝑋̅ = 369.27, 𝜎 = 15, 𝑛 = 100 𝑍_(𝛼/2) = 𝑍_(.01⁄2) = 𝑍_.005 = 2.58 369.27 ± 2.58(15/√100) = 369.27 ± 3.87 = (365.40, 373.14) Conclude with 99% confidence that the population mean is between 365.40 and 373.14 grams
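The two σ-known examples (circuits and cereal boxes) can be checked with the sketch below. It derives the critical value from `statistics.NormalDist`, so the 99% case uses about 2.5758 rather than the table value 2.58, and the cereal interval comes out a hundredth of a gram narrower at each end than the hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    margin = z * sigma / sqrt(n)                        # margin of error
    return sample_mean - margin, sample_mean + margin


ohm_low, ohm_high = z_interval(2.20, 0.35, 11, 0.95)
gram_low, gram_high = z_interval(369.27, 15, 100, 0.99)
print(f"Resistance, 95%: {ohm_low:.4f} to {ohm_high:.4f} ohms")
print(f"Fill weight, 99%: {gram_low:.2f} to {gram_high:.2f} grams")
```

Because 368 grams falls inside the 99% interval, the sample gives no reason to doubt the labeled weight.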