Statistics Study Guide: Final Exam

If you know that the probability of committing a Type II error (β) is 5%, you can tell that the power of the test is A) 2.5%. B) 95%. C) 97.5%. D) unknown.

95%.

QUARTILES Q2

= observation at the 50th percentile (median of the entire data set) (Will give you a ranked value)

Ethical Issues regarding confidence Intervals

A confidence interval estimate (reflecting sampling error) should always be included when reporting a point estimate. The level of confidence should always be reported. The sample size should be reported. An interpretation of the confidence interval estimate should also be provided.

Categorical Nominal variable

A set of labels or names applied to groups composed of individuals with similar characteristics (they cannot be ordered) ex) Cellular provider -- responses: AT&T, Sprint, Verizon ex) Type of investment: Growth, Value, Other

Determines what tail test used

The alternative hypothesis: less than (<) -- left-tailed test; greater than (>) -- right-tailed test; not equal (≠) -- two-tailed test

Null hypothesis

Always about parameter, never sample statistic

Central Limit Theorem

As the sample size (n) gets large enough, the sampling distribution of the sample mean becomes almost normal regardless of the shape of the population. Properties of the Central Limit Theorem: For most distributions, regardless of the shape of the population, the sampling distribution of the mean is approximately normally distributed if n ≥ 30. If the distribution of the population is fairly symmetrical, the sampling distribution of the mean is approximately normal for samples as small as 5. If the population is normally distributed, the sampling distribution of the mean is normally distributed regardless of the sample size.

Critical Value

Level of confidence associated with confidence interval (proportion of times the parameter (true value) will be covered by the calculated CI using the correct methods)

Margin of error

Measure of how accurate the point estimate is

Variable Type: How long did the mobile app update take to download?

Numerical continuous

Variable Type: How many text messages have you sent in the past three days?

Numerical discrete

Collectively Exhaustive Events

One of the events must occur The set of events covers the entire sample space

General Multiplication Rule

P(A and B) = P(B) * P(A|B)

General Addition Rule

P(A or B) = P(A) + P(B) - P(A and B) If A and B are mutually exclusive, then P(A and B) = 0 So rule can be simplified, P(A or B) = P(A) + P(B)

If the variances are EQUAL in two independent populations,

POOLED df = n1 + n2 - 2

Confidence Level

Probability that this method produces an interval that contains (covers) the parameter and associated critical value (e.g. z-score or t-score) The level of confidence is denoted (1−α)100%

For a Z-distribution, if the p-value is less than or equal to α

Reject Ho

P-value in terms of α

Reject Ho: If the P-value is less than or equal α Fail to reject Ho: if the P-value is greater than the α

Rules for Confidence intervals of dependent/Independent two sample tests

Rule #1: If the LL (lower limit) and UL (upper limit) are both greater than 0, this suggests that group A has a greater mean. Rule #2: If the LL and UL are both less than 0, this suggests that group B has a greater mean. Rule #3: If the LL is less than 0 and the UL is greater than 0, then neither group has a clearly greater mean.

EXAMPLE: Cereal plant Operations Manager (OM) must ensure that the mean weight of filled boxes is 368 grams to be consistent with the labeling on those boxes. To determine whether the mean weight is consistent with the expected amount of 368 grams, the OM selects a random sample of size 100 filled boxes that had a sample mean of 369.27 grams. Past experience states the standard deviation of the fill amount is 15 grams. Based on the 95% confidence interval is there evidence to suggest that anything is wrong with the cereal filling process?

"(366.33, 372.21)", from 369.27 ± 1.96(15/√100) = 369.27 ± 2.94. Because the interval includes 368, there is no evidence to suggest that anything is wrong with the cereal filling process.

𝛼

(1 - confidence level)

Power

(1 − β) is the probability of rejecting H0 when it is false: P(rejecting a false H0)

Dependent Sample Tests

(paired-sample test) compare scores on two different variables but for the same group of cases

What is the probability that at least two (2) new cars need a warranty repair in the first 90 days? (Just write the equation)

P(X ≥ 2) = 1 − (P(X = 0) + P(X = 1))
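
The complement rule in this card can be checked numerically. The sketch below assumes the number of cars needing a repair follows a binomial distribution; the card does not give the number of cars or the repair probability, so `n = 3` and `p = 0.10` are hypothetical values chosen only for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least_two(n: int, p: float) -> float:
    """P(X >= 2) via the complement rule: 1 - (P(X = 0) + P(X = 1))."""
    return 1 - (binomial_pmf(0, n, p) + binomial_pmf(1, n, p))


# Hypothetical inputs (not from the card): 3 cars, 10% repair probability each.
print(round(prob_at_least_two(n=3, p=0.10), 4))
```

Subtracting the two smallest outcomes from 1 avoids summing every outcome from 2 up to n.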

Example Using Chebyshev Rule A population of 2-liter bottles of cola is known to have a mean fill-weight of 2.06 liter and a standard deviation of 0.02 liter. However, the shape of the population is unknown, and you cannot assume that it is bell-shaped. Describe the distribution of fill-weights.

(𝜇 − 𝜎 , 𝜇+ 𝜎) = 2.06 ±0.02 = (2.04 , 2.08) (𝜇 −2𝜎 , 𝜇+2𝜎) = 2.06 ±2(0.02) = (2.02 , 2.10) (𝜇 −3𝜎 , 𝜇+3𝜎) = 2.06 ±3(0.02) = (2.00 , 2.12) Is it very likely that a bottle will contain less than 2 liters of cola? Between 0% and 11.11% of the bottles will contain less than 2 liters

Example: The Health and Nutrition Examination Study of 1976-1980 (HANES) studied the heights of adults (aged 18-24) and found that the heights follow a normal distribution with the following: Women: mean (μ) 65.0 inches, standard deviation (σ) 2.5 inches. Men: mean (μ) 70.0 inches, standard deviation (σ) 2.8 inches. Find the proportion of men with heights between 67.2 inches and 72.8 inches using the Empirical Rule.

(𝜇−𝜎)=67.2 (𝜇+𝜎)=72.8 Proportion of men with heights are between 67.2 (µ - σ) inches and 72.8 (µ + σ) inches is 0.68 (68%) per the Empirical Rule.

Understanding Confidence Intervals

A 95% confidence interval is formed with this understanding: 95% of all the possible intervals, based on every possible sample from the population, would cover the parameter and the other 5% would miss it. (Figure: twenty-five samples from the same population and their 95% confidence intervals.) In the long run, 95% of all such intervals cover the true population proportion.

Sample Size Determination for Proportion

n = (Z_(α/2))² · p(1 − p) / MOE², rounded up to the next whole number. When you have no prior knowledge of p, set p = .50. p = population proportion; Z_(α/2) = the critical value from the standardized normal distribution; MOE = the margin of error (sampling error)
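
A minimal sketch of the sample size calculation, assuming the standard formula n = Z²·p(1 − p)/MOE² with the result rounded up. The function name and the example margin of error (3 percentage points) are illustrative choices, not values from the guide.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(moe: float, confidence: float = 0.95, p: float = 0.50) -> int:
    """Smallest n that keeps the margin of error at or below `moe`.

    p defaults to 0.50, the most conservative choice when nothing is known about p.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value Z_(alpha/2)
    return ceil(z**2 * p * (1 - p) / moe**2)


# Hypothetical request: 95% confidence, margin of error of 0.03, no prior estimate of p.
print(sample_size_for_proportion(moe=0.03))
```

Setting p = 0.50 maximizes p(1 − p), so the resulting n is large enough for any true proportion.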

If two events are mutually exclusive, what is the probability that both occur at the same time? a. 0. b. 0.50. c. 1.00. d. Cannot be determined from the information given

0

The symbol for the confidence coefficient of a statistical test is A) α. B) 1 - α. C) β. D) 1 - β.

1 - α.

The symbol for the power of a statistical test is A) α. B) 1 - α. C) β. D) 1 - β.

1 - β.

Steps in Hypothesis Testing

1. Determine the null and alternative hypotheses 2. Select a level of significance, α 3. Compute the test statistic 4. Decision based on the critical value approach, p-value approach, or confidence interval 5. State the conclusion
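
The five steps can be sketched for a two-tailed Z test of a mean with known σ. The function name is hypothetical; the example call reuses the cereal-filling numbers from this guide (sample mean 369.27, hypothesized mean 368, σ = 15, n = 100) and uses the p-value approach in step 4.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(x_bar: float, mu_0: float, sigma: float, n: int, alpha: float = 0.05):
    """Two-tailed Z test for a mean when the population sigma is known."""
    # Step 1: H0: mu = mu_0 versus H1: mu != mu_0.
    # Step 2: the level of significance alpha is passed in by the caller.
    # Step 3: compute the test statistic.
    z_stat = (x_bar - mu_0) / (sigma / sqrt(n))
    # Step 4: p-value approach, doubling the upper-tail area beyond |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    # Step 5: state the conclusion.
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return z_stat, p_value, decision


z_stat, p_value, decision = two_tailed_z_test(x_bar=369.27, mu_0=368, sigma=15, n=100)
print(f"Z = {z_stat:.2f}, p-value = {p_value:.3f}, decision: {decision}")
```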

Sale prices of cards have a mean of $5.25 and a standard deviation of $2.80. Suppose a random sample of 100 cards is selected. 1. Describe the sampling distribution of the sample mean sale price of the selected cards. 2. What is the probability that the sample mean sale price is greater than $6.00?

1. μ(x-bar) = $5.25, because the mean of the sampling distribution of the sample mean equals the population mean. σ(x-bar) = σ / (the square root of the sample size) = 2.80/√100 = .28. The shape is approximately bell-shaped because the sample size is greater than 30. Thus, the distribution is approximately normal with a mean of $5.25 and a standard error of $.28. 2. Z = (6.00 − 5.25)/.28 = 2.68, so P(X-bar is greater than $6.00) = 1 − .9963 = .0037
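
The same calculation can be reproduced with the standard library. This is a sketch that treats the sampling distribution as exactly normal, which the Central Limit Theorem only guarantees approximately.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 5.25, 2.80, 100

# Standard error of the mean: sigma divided by the square root of n.
standard_error = sigma / sqrt(n)

# Sampling distribution of the sample mean (approximately normal for n = 100).
sampling_dist = NormalDist(mu=mu, sigma=standard_error)

# Upper-tail probability that the sample mean exceeds $6.00.
p_greater = 1 - sampling_dist.cdf(6.00)
print(f"standard error = {standard_error:.2f}, P(x-bar > 6.00) = {p_greater:.4f}")
```

The result differs from the table-based answer only in the last decimal places, because the table rounds Z to two decimals.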

Assumptions for independent hypothesis testing

Both samples must be randomly selected Observations within each sample must be independent Distributions of the sample mean must be normal

Controlling the probability of a Type I Error

By the choice of the significance level

Chebyshev Rule

Can't use the Empirical Rule for heavily skewed data sets States that for any data set, regardless of shape, the percentage of values found within k standard deviations of the mean must be at least: (𝜇 − 𝜎 , 𝜇+ 𝜎) at least 0% (𝜇 − 2𝜎 , 𝜇 + 2𝜎)at least 75% (𝜇 −3𝜎 , 𝜇+ 3𝜎)at least 88.89%
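
The three percentages in this card come from the Chebyshev bound 1 − 1/k². A short sketch (the function name is illustrative):

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k below 1 the formula goes negative, so the bound is floored at 0.
    return max(0.0, 1 - 1 / k**2)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): at least {chebyshev_lower_bound(k):.2%}")
```

Because the bound holds for any shape, it is weaker than the Empirical Rule, which applies only to bell-shaped data.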

Variable Type: Do you have a facebook profile?

Categorical

Critical Value Approach

Compares the critical value with the test statistic Reject Ho: If the test statistic falls in the critical region Fail to reject Ho: If the test statistic does not fall in the critical region

Population Standard deviation

Computed by taking the square root of the population variance

How can we achieve a narrower confidence interval?

Decrease the level of confidence OR increase the sample size

Interval Scale Numerical Variable

Defined by distinct classes, magnitude, and equal intervals but no true zero points. Assumes that differences between scores of equal magnitude really represent equal differences in the variable measured Ex) Standardized exam scores Ex) Temperatures

Level of Significance (α)

Determines how much evidence against Ho we require in order to reject Ho and find in favor of the alternative hypothesis, H1. It is the probability of rejecting the null hypothesis when the null hypothesis is true.

Histogram

Displays a quantitative variable across different grouping of values Groups must cover the same range so have equal width Height used to compare the frequency of each range of values

right-tailed test

Equal versus greater than 𝐻𝑜 : parameter ≤ null value 𝐻1: parameter > null value

left-tailed test

Equal versus less than 𝐻𝑜 : parameter ≥ null value 𝐻1: parameter < null value

Two-tailed test

Equal versus not equal hypothesis 𝐻𝑜 : parameter = null value 𝐻1: parameter ≠ null value

Measurement Error

Errors not related to the act of selecting a sample (processing errors, poorly worded questions, deliberate inaccuracies in responses)

Non-response Error

Failure to collect data on all items in the sample

Type II Error

Failure to reject a false null hypothesis Probability of this error is β Can only occur if Ho is false

A new drug is advertised as being 80% effective. A consumer advocacy group thinks that it isn't that effective and is looking for evidence that it doesn't work well. What is the null hypothesis and the alternative hypothesis?

H0: p = 0.80 H1: p < 0.80

Qualities that increases or decreases the width of confidence intervals

Larger sample size -- narrower interval Lower level of confidence -- narrower interval Smaller sample size -- wider interval Higher level of confidence -- wider interval

Example 1: A doctor is researching side effects with a new pain medication. A clinical trial including random sample of 340 people who took a new pain relief medication reveals that 23 suffered some side effects. At the α=.05 level of significance, is there evidence that less than 10% of all patients who take the medication will experience side effects? Use the p-value approach.

H0: p ≥ .10 H1: p < .10 α = 0.05, n = 340. Calculate the test statistic: p̂ = 23/340 ≈ .0676, Zstat = −1.99. Determine the critical value (left-tailed test): −Zα = −Z.05 = −1.645. Decision and interpretation: −1.99 < −1.645, so reject H0. Equivalently, the p-value P(Z ≤ −1.99) ≈ .0233 is less than α = .05. There is sufficient evidence to conclude that fewer than 10% of all patients taking this medication experience side effects.
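
Since the question asks for the p-value approach, here is a minimal sketch of a left-tailed Z test for a proportion using the example's numbers (23 of 340, hypothesized proportion 0.10). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def left_tailed_proportion_test(successes: int, n: int, p_0: float, alpha: float = 0.05):
    """Z test of H0: p >= p_0 versus H1: p < p_0."""
    p_hat = successes / n
    # The standard error uses the hypothesized proportion p_0, not p_hat.
    standard_error = sqrt(p_0 * (1 - p_0) / n)
    z_stat = (p_hat - p_0) / standard_error
    # Left-tailed test: the p-value is the area to the left of the test statistic.
    p_value = NormalDist().cdf(z_stat)
    return z_stat, p_value, p_value <= alpha


z_stat, p_value, reject = left_tailed_proportion_test(successes=23, n=340, p_0=0.10)
print(f"Z = {z_stat:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

The p-value and critical value approaches always lead to the same decision for the same α.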

Example 2: Gasoline pumped from a supplier's pipeline is supposed to have an octane rating of 87.5. A random sample of 13 days had the following octane readings. Is there evidence, at the .05 level of significance, that the mean octane reading differs from 87.5? (X̄ = 87.08, S = 0.649)

H0: µ = 87.5 H1: µ ≠ 87.5 α = 0.05, n = 13. Calculate the test statistic: Tstat = −2.307. Determine the critical value: df = 13 − 1 = 12, t(α/2) = t.025 = 2.1788. Decision and interpretation: −2.307 < −2.1788, so reject H0. There is sufficient evidence to reject the null hypothesis and to conclude that the long-run mean octane reading differs from 87.5.

EXAMPLE: An environmentalist takes samples at a nearby river to study the average concentration level of a contaminant. He wants to find out, using a .10 level of significance, if the average concentration level exceeds the acceptable level for safely consuming fish from the river. Describe a Type I & Type II Error and potential consequences

H0: µ is at or lower than acceptable level HA: µ exceeds acceptable level Describe a Type I Error for this problem: Researcher determines that the concentration levels are too high when, in fact, they are safe. Potential consequence: People are not allowed to fish in the river when it is really safe to fish. Describe a Type II Error for this problem: Researcher determines that the concentration levels are NOT too high, and it is safe to fish when it is really NOT safe to fish. Potential consequence: People consume contaminated fish.

Example: A planning committee needs to estimate the percentage of students at a large university who will attend an upcoming event so that they can determine an appropriate location for the event. Data is collected to see if there is evidence, at the .05 level of significance, that less than 30% of students will attend this year? Describe Type I & II Errors

H0: p ≥ 0.30 H1: p < 0.30. What parameter am I interested in? Proportion. Describe a Type I Error for this problem: The committee determines that less than 30% of students will attend the event but, in fact, at least 30% WILL attend. Potential consequence: The committee reserves a space that is too small. Describe a Type II Error for this problem: The committee determines there is no evidence that less than 30% of students will attend but, in fact, less than 30% attend the event. Potential consequence: The committee reserves a space that is too large.

Locating Extreme Outliers from Z-Score

If Z-score is positive, ABOVE the mean If Z-score is negative, BELOW the mean A data value is considered an extreme outlier if its Z-score is less than -3.0 or greater than +3.0. The larger the absolute value of the Z-score the farther the data value is from the mean

Sampling Distribution of the Mean

If a population is normal with mean (μ) and standard deviation (σ), the sampling distribution of x̅ is also normally distributed, with mean μx̅ = μ

Summarizing Independent Events

If one of the following is true, all are true: P(A|B) = P(A); P(B|A) = P(B); P(A and B) = P(A) x P(B)

Rule of Thumb

If the larger sample standard deviation is more than twice the smaller sample standard deviation, then perform the T-test using the UNPOOLED method

Fail to Reject Ho (the null hypothesis)

If the test statistic does not fall in the critical region

Reject Ho (the null hypothesis)

If the test statistic falls in the critical region

Regression Line

Independent variable = X. Dependent variable = Y. Changes in Y are ASSUMED to be related to changes in X. ** The simple linear regression equation provides an estimate of the population regression line

Confidence interval

Interval containing the "most believable" values for a parameter (provides additional information about the variability of the estimate) Takes into account MOE (margin of error) or sampling error Constructed by using a point estimate and adding and subtracting the margin of error (that is, the critical z-score times the standard error) Point estimate ± margin of error. ESTIMATES OF THE POPULATION
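
A minimal sketch of "point estimate ± margin of error" for a mean with known σ. The function name is illustrative; the example call reuses the cereal-filling numbers from this guide (sample mean 369.27, σ = 15, n = 100).

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar: float, sigma: float, n: int, confidence: float = 0.95):
    """Confidence interval for a mean: x_bar +/- Z_(alpha/2) * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical z-score
    margin_of_error = z * sigma / sqrt(n)  # critical value times standard error
    return x_bar - margin_of_error, x_bar + margin_of_error


for level in (0.95, 0.99):
    lower, upper = z_confidence_interval(x_bar=369.27, sigma=15, n=100, confidence=level)
    print(f"{level:.0%} interval: ({lower:.2f}, {upper:.2f})")
```

Running both levels shows the trade-off described elsewhere in this guide: a higher level of confidence gives a wider interval.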

Ratio Scale Numerical Variable

It is defined by distinct classes, magnitude and, equal intervals, but has a true zero point. Assumes that differences between scores of equal magnitude really represent equal differences in the variable measured Ex) age, cost of a computer

Non-probability Samples

Items are chosen without regard to their probability of occurrence. Either through Judgement (collect a sample that an expert thinks is representative of the population) or Convenience (collect the sample that is easiest to access)

Rules of Quartiles

Rule 1: If the ranked value is a whole number, the quartile is equal to the measurement that corresponds to the ranked value. Rule 2: If the ranked value is a fractional half (2.5, 3.5, 5.5, etc.), the quartile is equal to the average of the measurements that correspond to the two adjacent ranked values. Rule 3: If the ranked value is neither a whole number nor a fractional half, the quartile is equal to the measurement that corresponds to the ranked value rounded to the nearest integer.
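
The three rules can be sketched in code. This assumes the ranked value for quartile q is q(n + 1)/4, a common textbook convention that this card does not state explicitly; the data set is hypothetical.

```python
def quartile(data, q: int) -> float:
    """Quartile q (1, 2 or 3) using the three ranked-value rules.

    Assumes the ranked value is q * (n + 1) / 4, counting positions from 1.
    """
    ordered = sorted(data)
    n = len(ordered)
    if n < 2 or q not in (1, 2, 3):
        raise ValueError("need at least 2 values and q in {1, 2, 3}")
    rank = q * (n + 1) / 4
    whole = int(rank)
    fraction = rank - whole
    if fraction == 0:  # Rule 1: whole number
        return ordered[whole - 1]
    if fraction == 0.5:  # Rule 2: fractional half, average the two neighbors
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Rule 3: round the ranked value to the nearest integer
    nearest = whole + 1 if fraction > 0.5 else whole
    return ordered[nearest - 1]


# Hypothetical data set of 10 observations.
values = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]
print(quartile(values, 1), quartile(values, 2), quartile(values, 3))
```

Statistical software often interpolates instead of rounding, so its quartiles may differ slightly from these rules.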

Sample standard deviation symbol

S

Contingency Table

Shows the values of the data categories for more than one variable and the frequencies or proportions/percentages for each of the Joint Responses

Point Estimate

Single value that serves as an estimate of a population parameter

Hypothesis

Statement regarding a characteristic of one or more populations

The effect of the sample size (n) on σx

Taking a larger sample results in less variability in the sample means from sample to sample. As n increases, σx̅ decreases, resulting in a taller and narrower graph.

What affects the margin of error?

The level of confidence which determines the value of Z Standard error which is a function of sample size

The probability that a new advertising campaign will increase sales is assessed as being 0.80. The probability that the cost of developing the new ad campaign can be kept within the original budget allocation is 0.40. If the two events are independent, the probability that the cost is kept within budget and the campaign will increase sales is: a. 0.20 b. 0.32 c. 0.40 d. 0.88

Using the multiplication rule for independent events 𝑃(𝐴 𝑎𝑛𝑑 𝐵)=𝑃(𝐴)𝑃(𝐵) .80 x .40 = .32

How large is "large enough" for the sampling distribution of p?

The shape of the sampling distribution of 𝑝 is approximately normal provided 𝑛𝑝≥5 and 𝑛(1−𝑝)≥5

Which of the following statements is not true about the level of significance in a hypothesis test? A) The larger the level of significance, the more likely you are to reject the null hypothesis. B) The level of significance is the maximum risk we are willing to accept in making a Type I error. C) The significance level is also called the α level. D) The significance level is another name for Type II error.

The significance level is another name for Type II error.

The Standard error of the Proportion

The standard deviation of the sampling distribution of the sample proportion: σ_p̂ = √(p(1 − p)/n), where p = population proportion and n = sample size

Fail to reject (Do not reject) the null hypothesis

There is insufficient evidence to support the alternative hypothesis

Reject the null hypothesis:

There is sufficient evidence to support the alternative hypothesis

Measures of Variation

Total Sum of Squares = regression sum of squares + error sum of squares
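
The identity can be verified on a small least-squares fit. The sketch below fits a simple linear regression by hand and computes the three sums of squares; the data points are hypothetical.

```python
def variation_decomposition(xs, ys):
    """Return (SST, SSR, SSE) for the least-squares line fitted to xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Least-squares slope (b1) and intercept (b0).
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    b0 = y_bar - b1 * x_bar
    predicted = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)  # total variation
    ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)  # explained variation
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))  # unexplained variation
    return sst, ssr, sse


# Hypothetical data.
sst, ssr, sse = variation_decomposition([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, r^2 = {ssr / sst:.2f}")
```

The ratio SSR/SST is the coefficient of determination, the share of the variation in Y explained by X.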

True or False: Suppose, in testing a hypothesis about a mean, the p-value is computed to be 0.043. The null hypothesis should be rejected if the chosen level of significance is 0.05.

True

True/False If two events are mutually exclusive and collectively exhaustive, the probability that one or the other occurs is 1

True

True/False the larger value of S, the more spread out the variable or data are

True

True/false The mean is strongly affected by extreme value The median is less sensitive than the mean to extreme values

True, True

If a researcher rejects a true null hypothesis, she has made a(n) ________ error.

Type I

If a researcher does not reject a false null hypothesis, she has made a(n) ________ error.

Type II

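As the difference between the hypothesized parameter value and its true value increases, the probability of Type II Error (β) _______________

Type II Error (β) decreases (a larger difference is easier to detect, so the power of the test increases)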

When alpha (α) (probability of Type I Error) decreases

Type II Error (β) increases

When population standard deviation (σ) increases, the probability of Type II Error (β) _______________

Type II Error (β) increases

When the sample size decreases, the probability of Type II Error (β) _______________

Type II Error (β) increases
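
How β responds to each factor can be explored numerically. This sketch assumes a right-tailed Z test with known σ, where `delta` is how far the true mean lies above the hypothesized mean; the function name and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Beta for a right-tailed Z test when the true mean is mu_0 + delta (delta > 0)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value of the test
    # Probability that the test statistic stays below the critical value
    # even though H0 is false.
    return NormalDist().cdf(z_alpha - delta * sqrt(n) / sigma)


baseline = type_ii_error(delta=3, sigma=15, n=100, alpha=0.05)
print(f"baseline beta:           {baseline:.3f}")
print(f"larger true difference:  {type_ii_error(5, 15, 100, 0.05):.3f}")
print(f"smaller alpha:           {type_ii_error(3, 15, 100, 0.01):.3f}")
print(f"larger sigma:            {type_ii_error(3, 25, 100, 0.05):.3f}")
print(f"smaller sample size:     {type_ii_error(3, 15, 25, 0.05):.3f}")
```

Power is 1 − β, so every change that raises β lowers the power of the test.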

If the variances are UNEQUAL in two independent populations,

UNPOOLED

Percentage Polygon

Uses midpoints of each class and can combine data from two groups to allow easier comparison

Cumulative Percentage Polygon

Uses the cumulative percentage distribution (lower limits) to plot the cumulative percentages along the Y axis

Suppose we wish to test H0 : μ ≤ 47 versus H1 : μ > 47. What will result if we conclude that the mean is greater than 47 when its true value is really 52? A) We have made a Type I error. B) We have made a Type II error. C) We have made a correct decision. D) None of the above are correct.

We have made a correct decision.

Confidence Interval Approach for Hypothesis testing

When testing a null hypothesis for a two tailed test...... If the confidence interval CONTAINS the null value, we DO NOT REJECT the null hypothesis IF the confidence interval DOES NOT contain the null value, we REJECT the null hypothesis

Empirical Rule for Normal Distributions

Within 1 std dev of the mean ~ 68% Within 2 std dev of the mean ~ 95% Within 3 std dev of the mean ~ 99.7%

Confidence Interval Conclusion

You can be ______% confident that the population proportion of all ______________ who _________________ lies within the interval ___________ and ________________.

Confidence level 50% What is Z-critical value?

Z- Critical Value .67

Confidence level 70% What is Z-critical value?

Z-Critical Value 1.04

Confidence level 80% What is Z-critical value?

Z-Critical Value 1.28

Confidence level 90% What is Z-critical value?

Z-Critical Value 1.645

Confidence level 95% What is Z-critical value?

Z-Critical Value 1.96

Confidence level 99% What is Z-critical value?

Z-Critical Value 2.58

Confidence level 60% What is Z-critical value?

Z-Critical Values 0.84
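
The critical values in the cards above can be reproduced from the standard normal distribution. A sketch using the standard library (the function name is illustrative):

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided critical value Z_(alpha/2) for a given confidence level."""
    alpha = 1 - confidence
    # The central area equals the confidence level, leaving alpha/2 in each tail.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z = {z_critical(level):.3f}")
```

Tables usually round these to two or three decimals, which is where values such as 1.645 and 2.58 come from.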

Suppose the mean math SAT score is 490, with a standard deviation of 100. Compute the Z-score for a test score of 120. Find Z-score and state above or below the mean and whether it is an outlier or not

Z-score = (120 − 490)/100 = −3.7, so the score is 3.7 standard deviations below the mean. Because the Z-score is less than −3.0, it would be considered an outlier.
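
A short sketch of the Z-score calculation and the ±3.0 outlier rule from this guide, applied to the SAT example. The function names are illustrative.

```python
def z_score(value: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / std_dev


def describe(z: float) -> str:
    """Say whether the value is above or below the mean and whether it is an extreme outlier."""
    side = "above" if z > 0 else "below" if z < 0 else "at"
    outlier = "an extreme outlier" if abs(z) > 3.0 else "not an extreme outlier"
    return f"{side} the mean, {outlier}"


z = z_score(value=120, mean=490, std_dev=100)
print(z, "->", describe(z))
```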

If an economist wishes to determine whether there is evidence that mean family income in a community exceeds $50,000, A) either a one-tail or two-tail test could be used with equivalent results. B) a one-tail test should be utilized. C) a two-tail test should be utilized. D) None of the above.

a one-tail test should be utilized.

Hypothesis testing

a procedure that checks sample data against a claim or assumption about the population

Variable

a property of an object or event that can take on different values

Categorical Ordinal Variable

a set of labels applied to groups composed of individuals with similar characteristics where the labels indicated more or less of a quality or CAN BE RANK ORDERED ex) Student class designation: freshman, sophomore, Junior, Senior

Critical Value

a table value based on the sampling distribution of the point estimate and the desired level of confidence

It is possible to directly compare the results of a confidence interval estimate to the results obtained by testing a null hypothesis if A) a two-tail test for μ is used. B) a one-tail test for μ is used. C) Both of the previous statements are true. D) None of the previous statements is true.

a two-tail test for μ is used.

If an economist wishes to determine whether there is evidence that mean family income in a community equals $50,000, A) either a one-tail or two-tail test could be used with equivalent results. B) a one-tail test should be utilized. C) a two-tail test should be utilized. D) None of the above.

a two-tail test should be utilized.

Which of the following about the binomial distribution is not a true statement? a) The probability of the event of interest must be constant from trial to trial. b) Each outcome is independent of the other. c) Each outcome may be classified as either "event of interest" or "not event of interest." d) The variable of interest is continuous.

d) The variable of interest is continuous.

Which of the following statements is true for the normal distribution? a. The highest point occurs at 𝜇. b. It has a mean of 𝜇 and a standard deviation of 𝜎. c. It has inflection points at 𝜇 − 𝜎 and 𝜇 + 𝜎. d. All the above are true.

d. All the above are true.

If two events are mutually exclusive, what is the probability that one or the other occurs? a. 0. b. 0.50. c. 1.00. d. Cannot be determined from the information given

d. Cannot be determined from the information given

Independent Variable (aka predictor or explanatory variable)

a variable that, according to theory, has a causal influence on the dependent variable

In its standardized form, the normal distribution.... a) has a mean of 0 and a standard deviation of 1. b) has a mean of 1 and a variance of 0. c) has an area equal to 0.5. d) cannot be used to approximate discrete probability distributions.

a) has a mean of 0 and a standard deviation of 1.

Collectively exhaustive

all data values must be recorded in the categories created

Event

any collection of outcomes from the experiment

Which of the following about the binomial distribution is a true statement? a. the variable X is continuous b. the probability of event of interest 𝑝 is stable from trial to trial. c. the number of trials n must be at least 30. d. the results of one trial are dependent on the results of the other trials

b. the probability of event of interest 𝑝 is stable from trial to trial.

r

called the sample coefficient of correlation; ranges from -1 to 1. The closer r is to 0, the weaker the relationship; the closer r is to -1 or 1, the stronger the relationship. Strength of association (absolute value of r): Small (.1 to .3), Medium (.3 to .5), Large (.5 to 1). (Side note: ρ is the population coefficient of correlation)

Coverage Error

certain groups are excluded from the sampling frame Results in selection bias

r^2

coefficient of determination It is the proportion of the total variation in the dependent variable (Y-axis) that is explained by the variation in the independent variable (X-axis)

Independent Sample test

compare the scores on the same variable but for two different groups

Marginal probability

consists of a set of joint probabilities

If a researcher does not reject a true null hypothesis, she has made a(n) ________ decision.

correct

If a researcher rejects a false null hypothesis, she has made a(n) ________ decision.

correct

A ________ is a numerical quantity computed from the data of a sample and is used in reaching a decision on whether or not to reject the null hypothesis. A) significance level B) critical value C) test statistic D) parameter

test statistic

The value that separates a rejection region from a non-rejection region is called the ________.

critical value

As the sample size increases, the standard error of the mean (the standard deviation of the sampling distribution) ______________

decreases

As the standard error ______________, the values become more concentrated around the mean

decreases

Measure of Variation

describe the spread or variability or dispersion of the data for a particular variable Range Interquartile Range Variance Standard deviation

Mutually exclusive

each data value is placed in one and only one category

mutually exclusive events

events that cannot happen at the same time

For Z-distribution, if the p-value is greater than the α

fail to reject Ho

True or False: "What conclusions and interpretations can you reach from the results of the hypothesis test?" is not an important question to ask when performing a hypothesis test.

false

True or False: In a hypothesis test, it is irrelevant whether the test is a one-tail or two-tail test.

false

True or False: In instances in which there is insufficient evidence to reject the null hypothesis, you must make it clear that this has proven that the null hypothesis is true.

false

True or False: Suppose, in testing a hypothesis about a mean, the Z test statistic is computed to be 2.04. The null hypothesis should be rejected if the chosen level of significance is 0.01 and a two-tail test is used.

false

True or False: Suppose, in testing a hypothesis about a mean, the p-value is computed to be 0.034. The null hypothesis should be rejected if the chosen level of significance is 0.01.

false

True or False: The larger the p-value, the more likely you are to reject the null hypothesis.

false

True or False: You should report only the results of hypothesis tests that show statistical significance and omit those for which there is insufficient evidence in the findings.

false

Sampling Distribution of Proportion

follows the binomial distribution

Skewed to the left

if the left "tail" extends much farther out than the right tail *The mean is less than the median

If a test of hypothesis has a Type I error probability (α) of 0.01, it means that A) if the null hypothesis is true, you don't reject it 1% of the time. B) if the null hypothesis is true, you reject it 1% of the time. C) if the null hypothesis is false, you don't reject it 1% of the time. D) if the null hypothesis is false, you reject it 1% of the time.

if the null hypothesis is true, you reject it 1% of the time.

Skewed to the right

if the right "tail" extends much farther out than the left tail *The mean is greater than the median

The probability that the sample mean will fall close to the population mean will always ____________ when the sample size increases

increase

The probability distribution of proportion becomes more peaked when the sample size

increases

Probability Sample

items in the sample are chosen on the basis of known probabilities. 4 Types: Simple Random: the sample is chosen in such a way that every subject is equally likely to be selected for the study. Systematic: uses a systematic method with k = N/n (i.e. n groups of k items, such as every 10th person) to select the sample. Stratified: divide the frame into groups (strata) and take a simple random sample from each stratum. Cluster: divide the N items in the frame into clusters and take a random sample of the clusters; study all items in each selected cluster.

Categorical Data

labels or names used to identify categories of like items. MUST BE SUMMARIZED AS PROPORTIONS. p̂ = sample proportion, p = population proportion. Assumptions: a population with a fixed proportion; a random sample from the population; np ≥ 5 and n(1 − p) ≥ 5. THE MEAN OF THE SAMPLE PROPORTIONS WILL BE EQUAL TO THE POPULATION PROPORTION: μ_p̂ = p

Range

largest value - smallest value

Standard Deviation

measures the average distance of an observation from the mean computed by taking the square root of the sample variance (Controls the spread of the graph)

Variance

measures the average of the squared deviations of each observation from the mean sample variance S^2

SSR

measures the explained variation between X and Y

SSE

measures the unexplained variation between X and Y

If, as a result of a hypothesis test, you reject the null hypothesis when it is false, then you have committed A) a Type II error. B) a Type I error. C) no error. D) an acceptance error.

no error.

Z-score

number of standard deviations a data value is from the mean

QUARTILES Q1

observation at the 25th percentile (Will give you a ranked value)

QUARTILES Q3

observation at the 75th percentile (Will give you a ranked value)

Alternative Hypothesis

opposite of null hypothesis Challenges the status quo • Never contains the "=", or "≤", or "≥" sign - If H0 contains "=" --> H1 must contain "≠" - If H0 contains "≤" --> H1 must contain ">" - If H0 contains "≥" --> H1 must contain "<" • Is generally the hypothesis that the researcher is trying to prove

Z- value for Sampling Distribution of the Proportion

Z = (p̂ − p) / √(p(1 − p)/n), where p̂ = sample proportion, p = population proportion, n = sample size

Z- value for Sampling Distribution of the Proportion (test statistic)

Zstat = (p̂ − p) / √(p(1 − p)/n), where p̂ = sample proportion, p = hypothesized population proportion, n = sample size

Conditional Probability

refers to the probability of event A, given information about the occurrence of another event, B

Sampling Error

reflects the "chance differences" caused by the act of taking a sample, which make the results from a sample different from those of a census (margin of error)

The power of a test is measured by its capability of

rejecting a null hypothesis that is false

Type I Error

rejecting a true null hypothesis. The probability of this error is equal to α (the significance level). Can only occur if H0 is true

point estimate

sample statistic

Pareto Chart

series of vertical bars showing tallies/frequencies/percentages in descending order *Helps identify the important "few" from the less important "many"

Scatter plot

shows the relationship between two quantitative variables measured on the same individuals. X-axis: independent variable (doing the explaining). Y-axis: dependent variable (the one being explained)

Summary Table

shows the values of the data categories for ONE variable and the frequency (counts) or proportions/percentages for each category

Standard error of the mean

standard deviation of all possible sample means; it is the standard deviation of the point estimate

null hypothesis

states the claim or assertion to be tested. ALWAYS about a population parameter, not a sample statistic • Always contains the "=", "≤", or "≥" sign • The null hypothesis is assumed true until evidence indicates otherwise • The assumption that H0 is true may or may not be REJECTED, BUT it is NEVER ACCEPTED

Parameter

summarizes the value of a specific variable for a population

Statistic

summarizes the value of a specific variable for sample data

If the Type I error (α) for a given test is to be decreased, then for a fixed sample size n, A) the Type II error (β) will also decrease. B) the Type II error (β) will increase. C) the power of the test will increase. D) a one-tail test must be utilized.

the Type II error (β) will increase.

Population Variance

the average of the squared deviations of each observation from the POPULATION mean

Sample Space

the collection of all possible outcomes

sampling distribution

the distribution of all the possible values of a sample statistic for a given sample size selected from a population

Skewness

the extent to which the data values are not symmetrical around the mean

Which of the following would be an appropriate null hypothesis?

the mean of the population is equal to 55; the population proportion is not less than .65

Which of the following would be an appropriate alternative hypothesis?

the mean of the population is greater than 55; the population proportion is less than .65

Mode

the most frequent observation of the variable that occurs in the data set

If the p-value is less than α in a two-tail test,

the null hypothesis should be rejected.

p-value

the probability of getting a test statistic equal to or more extreme than the sample result, given that the null hypothesis, H0, is true. It summarizes the amount of evidence we have against the null hypothesis: the smaller the p-value, the more evidence against the null hypothesis
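
A minimal sketch of this idea for a two-tail Z test, assuming the test statistic follows the standard normal distribution when H0 is true. The function names and the test statistic of 2.5 are hypothetical.

```python
from statistics import NormalDist

def two_tail_p_value(z_stat):
    """P(|Z| >= |z_stat|) for a standard normal test statistic, assuming H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))

def reject_null(p_value, alpha):
    """Decision rule: reject H0 when the p-value is less than alpha."""
    return p_value < alpha

# Hypothetical test statistic, chosen only for illustration
p = two_tail_p_value(2.5)
print(round(p, 4), reject_null(p, alpha=0.05))
```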

Joint probability

the probability of occurrence involving two or more events

Simple probability

the probability of occurrence of a simple event in which each outcome is equally likely to occur

The power of a statistical test is A) the probability of not rejecting H0 when it is false. B) the probability of rejecting H0 when it is true. C) the probability of not rejecting H0 when it is true. D) the probability of rejecting H0 when it is false.

the probability of rejecting H0 when it is false.

Point Estimate

the sample statistic estimating the population parameter of interest. For example, the sample mean, x̅, is a point estimate of the population mean, μ; the sample proportion, p̂, is a point estimate of the population proportion, p. (Doesn't show "how close" the estimate is to the parameter)

Standard Error

the standard deviation of the sample statistic

Median

the value that lies in the middle of the data when arranged in ascending order. Rule 1: If the number of values is odd, the median is the middle-ranked value. Rule 2: If the number of values is even, the median is the average of the two middle-ranked values.
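
The two rules translate directly into code. This is a small sketch with hypothetical data, not a replacement for a library routine.

```python
def median(values):
    """Median via the two ranking rules on this card."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Rule 1: odd count -> the middle-ranked value
        return ordered[middle]
    # Rule 2: even count -> average of the two middle-ranked values
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([7, 1, 3]))       # 3
print(median([7, 1, 3, 5]))    # 4.0
```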

Dependent Variable (aka outcome, response, predicted variable)

the variable that is of greatest substantive interest to the researcher -- the variable with real world implications

independent variable

the variable used to predict or explain the dependent variable

Dependent Variable

the variable we wish to predict or explain. Always on the y-axis

𝑍_𝛼 (pronounced "z sub alpha")

the z-score such that the area under the standard normal curve to the right of 𝑍_𝛼 is 𝛼.
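
One way to look up these values without a table is Python's standard-library `statistics.NormalDist`. The helper name `z_alpha` is hypothetical; the tail areas shown are the α/2 values used for 90%, 95%, and 99% confidence intervals.

```python
from statistics import NormalDist

def z_alpha(alpha):
    """z-score with area `alpha` to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)

# Critical values for 90%, 95%, and 99% two-sided confidence levels (alpha / 2 in each tail)
for tail_area in (0.05, 0.025, 0.005):
    print(tail_area, round(z_alpha(tail_area), 3))
```

Tables usually round these to 1.645, 1.96, and 2.58.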

True or False: "Is the intended sample size large enough to achieve the desired power of the test for the level of significance chosen?" should be among the questions asked when performing a hypothesis test.

true

True or False: A proper methodology in performing hypothesis tests is to ask whether a random sample can be selected from the population of interest.

true

True or False: In conducting research, you should document both good and bad results.

true

True or False: In instances in which there is insufficient evidence to reject the null hypothesis, you must make it clear that this does not prove that the null hypothesis is true.

true

True or False: In testing a hypothesis, you should always raise the question concerning the purpose of the study, survey or experiment.

true

True or False: The smaller the p-value, the stronger the evidence is against the null hypothesis.

true

True or False: The statement of the null hypothesis always contains an equality.

true

True or False: The test statistic measures how close the computed sample statistic has come to the hypothesized population parameter.

true

A priori probability

type of probability based on prior knowledge of the process (theoretical) Ex) coin toss, roll a die, draw a card

Empirical probability

type of probability based on the observed data

Subjective probability

type of probability that differs from person to person

If you know that the level of significance (α) of a test is 5%, you can tell that the probability of committing a Type II error (β) is A) 2.5%. B) 95%. C) 97.5%. D) unknown.

unknown.

Regression Analysis

used to predict the value of a dependent variable based on the value of at least one independent variable. Explains the impact of changes in an independent variable on the dependent variable

Quantitative data

uses numbers. Must be summarized as a mean (x̅). Conditions: 1. If the population is bell shaped (normal, symmetrical), a random sample of any size works. 2. If the population is not bell shaped, a large random sample (n ≥ 30) is required. The mean of the sample means equals the population mean: μ_x̅ = μ. The standard deviation of the sample means equals the population standard deviation divided by the square root of the sample size: σ_x̅ = σ/√n

For a given level of significance (α), if the sample size n is increased, the probability of a Type II error (β) A) will decrease. B) will increase. C) will remain the same. D) cannot be determined.

will decrease.

For a given sample size n, if the level of significance (α) is decreased, the power of the test A) will increase. B) will decrease. C) will remain the same. D) cannot be determined.

will decrease.
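
The relationships among α, β, n, and power in these cards can be checked numerically. The sketch below assumes a right-tail Z test of H0: μ = μ0 with known σ and a specific true mean μ1 > μ0; the numbers (368, 372, 15) and the function names are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu1, sigma, n, alpha):
    """beta for a right-tail Z test of H0: mu = mu0 when the true mean is mu1 > mu0."""
    z = NormalDist()
    critical_z = z.inv_cdf(1 - alpha)              # reject H0 when Z > critical_z
    shift = (mu1 - mu0) / (sigma / sqrt(n))        # distance of the true mean from mu0, in standard errors
    return z.cdf(critical_z - shift)               # P(do not reject H0 | H0 is false)

def power(mu0, mu1, sigma, n, alpha):
    """Power = 1 - beta = P(reject H0 | H0 is false)."""
    return 1 - type_ii_error(mu0, mu1, sigma, n, alpha)

# Hypothetical scenario: mu0 = 368, true mean 372, sigma = 15
for n, alpha in [(25, 0.05), (100, 0.05), (100, 0.01)]:
    beta = type_ii_error(368, 372, 15, n, alpha)
    print(f"n={n}, alpha={alpha}: beta={beta:.3f}, power={1 - beta:.3f}")
```

Under these assumptions, raising n from 25 to 100 lowers β, and lowering α from 0.05 to 0.01 at n = 100 lowers the power.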

Sample proportion

p̂ = x / n, where x = number of items having the characteristic of interest and n = sample size

A Type II error is committed when

you don't reject a null hypothesis that is false

A Type I error is committed when

you reject a null hypothesis that is true

The symbol for the level of significance of a statistical test is A) α. B) 1 - α. C) β. D) 1 - β.

α.

The symbol for the probability of committing a Type I error of a statistical test is A) α. B) 1 - α. C) β. D) 1 - β.

α.

The symbol for the probability of committing a Type II error of a statistical test is A) α. B) 1 - α. C) β. D) 1 - β.

β.

Population Mean

μ

Z- Value Sampling Distribution of the Mean

Z = (X̄ − μ) / (σ/√n), where μ = population mean, σ = population standard deviation, X̄ = sample mean, n = sample size

t distribution

t = (X̄ − μ) / (s/√n) with n − 1 degrees of freedom, where μ = the population mean, s = the sample standard deviation, X̄ = sample mean, n = sample size

Population Standard Deviation Formula

σ "sigma"

EXAMPLE: In a random sample of 100 sales invoices, the sample mean is $110.27 and the sample standard deviation is $28.95. Determine a 95% confidence interval for the mean amount of all the sales invoices.

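X̄ = 110.27, S = 28.95, n = 100, df = n − 1 = 99, t_(α/2) = t_(.05/2) = t_.025 = 1.9842. X̄ ± t_(α/2) · S/√n = 110.27 ± 1.9842(28.95/√100) = 110.27 ± 5.74, so 104.53 ≤ μ ≤ 116.01. Conclude with 95% confidence that the mean amount of all the sales invoices is between $104.53 and $116.01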
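
The arithmetic in this example can be reproduced in a few lines. The function name `ci_mean` is hypothetical, and the t critical value 1.9842 is taken from the card rather than computed.

```python
from math import sqrt

def ci_mean(sample_mean, std_dev, n, critical_value):
    """Interval: sample_mean +/- critical_value * std_dev / sqrt(n)."""
    margin_of_error = critical_value * std_dev / sqrt(n)
    return sample_mean - margin_of_error, sample_mean + margin_of_error

# Values from the invoice example; 1.9842 is the t critical value for df = 99 quoted on the card
lower, upper = ci_mean(110.27, 28.95, 100, 1.9842)
print(round(lower, 2), round(upper, 2))  # 104.53 116.01
```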

EXAMPLE: Suppose the auditing procedures require you to have 95% confidence in estimating the population proportion of sales invoices with errors to within ± 0.07. The results from past months indicate that the largest proportion has been no more than 0.15. Determine the sample size needed.

Z_(α/2) = Z_(.05/2) = Z_.025 = 1.96, MOE = 0.07, p = 0.15. n = Z²_(α/2) · p(1 − p) / MOE² = (1.96)²(0.15)(0.85) / (0.07)² = 99.96. Therefore, you should select a sample size of 100. ALWAYS ROUND UP

EXAMPLE: An insurance company has the business objective of reducing the amount of time it takes to approve applications for life insurance. Suppose you want to estimate, with 95% confidence, the population mean processing time to within ± 4 days. On the basis of a study conducted the previous year, you believe that the standard deviation is 25 days. Determine the sample size needed.

Z_(α/2) = Z_(.05/2) = Z_.025 = 1.96, MOE = 4, σ = 25. n = Z²_(α/2) · σ² / MOE² = (1.96)²(25)² / (4)² = 150.06. Therefore, you should select a sample of 151 applications. **Always round up to the next integer.
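
The two sample-size examples above follow the same pattern: compute n from the formula, then round up. A minimal sketch, with hypothetical function names:

```python
from math import ceil

def sample_size_for_mean(z, sigma, margin_of_error):
    """n = z^2 * sigma^2 / E^2, rounded up to the next integer."""
    return ceil((z * sigma / margin_of_error) ** 2)

def sample_size_for_proportion(z, p, margin_of_error):
    """n = z^2 * p * (1 - p) / E^2, rounded up to the next integer."""
    return ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

print(sample_size_for_proportion(1.96, 0.15, 0.07))  # 100
print(sample_size_for_mean(1.96, 25, 4))             # 151
```

`ceil` is used instead of `round` because rounding down would leave the margin of error slightly larger than required.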

EXAMPLE: A planning committee needs to estimate the percentage of students at a large university who will attend an upcoming event so that they can determine an appropriate location for the event. 80 students are randomly selected, and 15 say that they will come to the event. What is a 95% confidence interval for the proportion of all the university's students who will attend the event?

Z_(α/2) = Z_(.05/2) = Z_.025 = 1.96, p̂ = X/n = 15/80 = 0.1875. L = 0.1875 − 1.96√(0.1875(1 − 0.1875)/80) ≈ 0.1020, U = 0.1875 + 1.96√(0.1875(1 − 0.1875)/80) ≈ 0.2730. Conclude with 95% confidence that the population proportion of all the university's students who will attend the event is between 0.1020 and 0.2730

EXAMPLE: A planning committee needs to estimate the percentage of students at a large university who will attend an upcoming event so that they can determine an appropriate location for the event. 80 students are randomly selected, and 15 say that they will come to the event. What is a 90% confidence interval for the proportion of all the university's students who will attend the event?

Z_(α/2) = Z_(.10/2) = Z_.05 = 1.645, p̂ = X/n = 15/80 = 0.1875. L = 0.1875 − 1.645√(0.1875(1 − 0.1875)/80) ≈ 0.1157, U = 0.1875 + 1.645√(0.1875(1 − 0.1875)/80) ≈ 0.2593
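
Both event-attendance intervals can be checked with one small function. The name `ci_proportion` is hypothetical. Because the code keeps the standard error unrounded, its bounds can differ in the fourth decimal place from a hand calculation that first rounds the standard error to 0.0436.

```python
from math import sqrt

def ci_proportion(successes, n, z):
    """Interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin_of_error = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin_of_error, p_hat + margin_of_error

ci_95 = ci_proportion(15, 80, 1.96)
ci_90 = ci_proportion(15, 80, 1.645)
print([round(bound, 4) for bound in ci_95])  # [0.102, 0.273]
print([round(bound, 4) for bound in ci_90])  # [0.1157, 0.2593]
```

The 90% interval is narrower than the 95% interval because a lower confidence level uses a smaller critical value.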

Confidence Interval Estimate for the proportion

p̂ ± Z_(α/2) √(p̂(1 − p̂)/n), where Z_(α/2) = critical value from the standardized normal distribution, p̂ = sample proportion, n = sample size

C.I. estimate for the Mean (𝜎 unknown)

X̄ ± t_(α/2) · s/√n, where s = sample standard deviation, t_(α/2) = the critical t-value with n − 1 degrees of freedom, n = sample size

Sample Size Determination

n = Z²_(α/2) · σ² / E², where σ = population standard deviation, Z_(α/2) = the critical value from the standardized normal distribution, MOE (or E) = the margin of error (sampling error)

Confidence Interval Estimate for the Mean (𝜎 known)

X̄ ± Z_(α/2) · σ/√n, where σ = population standard deviation, Z_(α/2) = the critical value from the standardized normal distribution, n = sample size

EXAMPLE: A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms. Determine a 95% confidence interval for the true mean resistance of the population.

σ = population standard deviation is known! X̄ = 2.20, σ = 0.35, n = 11, Z_(α/2) = Z_(.05/2) = Z_.025 = 1.96. L = 2.20 − 1.96(0.35/√11) = 2.20 − 1.96(0.1055) = 1.9932, U = 2.20 + 1.96(0.35/√11) = 2.20 + 1.96(0.1055) = 2.4068. Conclude with 95% confidence that the population mean resistance is between 1.9932 and 2.4068 ohms

EXAMPLE: A cereal plant's Operations Manager (OM) must ensure that the mean weight of filled boxes is 368 grams to be consistent with the labeling on those boxes. To determine whether the mean weight is consistent with the expected amount of 368 grams, the OM selects a random sample of 100 filled boxes, which has a sample mean of 369.27 grams. Past experience indicates that the standard deviation of the fill amount is 15 grams. Construct a 99% confidence interval estimate of the mean fill amount.

σ = population standard deviation is known! X̄ = 369.27, σ = 15, n = 100, Z_(α/2) = Z_(.01/2) = Z_.005 = 2.58. 369.27 ± 2.58(15/√100) = 369.27 ± 3.87 = (365.40, 373.14). Conclude with 99% confidence that the population mean fill amount is between 365.40 and 373.14 grams
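
The two σ-known examples (circuit resistance and cereal fill) can be sketched with the critical value computed from the standard normal distribution instead of read from a table. The function name `z_interval` is hypothetical. Since the computed critical value for 99% is about 2.576 rather than the table value 2.58, the cereal bounds come out about 0.01 away from the hand-calculated ones.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value Z_(alpha/2)
    margin_of_error = z * sigma / sqrt(n)
    return sample_mean - margin_of_error, sample_mean + margin_of_error

print(z_interval(2.20, 0.35, 11, 0.95))      # circuit resistance example
print(z_interval(369.27, 15, 100, 0.99))     # cereal fill example
```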

