BUAL 2650 Final

the sampling distribution of t depends on the sample size n, the t-stat has

(n-1) degrees of freedom(df)

Chapter 5: Sampling Distribution

..

Chapter 9: Design of Experiments and Analysis of Variance

..

Chapter 11: Simple Linear Regression

...

Chapter 12: Multiple Regression

...

Chapter 6: Inferences Based on a Single Sample

...

Chapter 7: Inferences Based on a Single Sample

...

Conditions Required for a Valid Large-Sample Hypothesis Test for p

1. A random sample is selected from a binomial population. 2. The sample size n is large. (This condition will be satisfied if both np0 ≥ 15 and nq0 ≥ 15.)
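The sample-size condition above can be checked mechanically. The following is a minimal sketch; the function name `large_sample_ok_for_p` and the example values of n and p0 are illustrative assumptions, not part of the course material.

```python
def large_sample_ok_for_p(n: int, p0: float, threshold: float = 15.0) -> bool:
    """Return True when both n*p0 and n*q0 meet the threshold (q0 = 1 - p0)."""
    q0 = 1.0 - p0
    return n * p0 >= threshold and n * q0 >= threshold


# Hypothetical examples: n = 200 passes (n*p0 = 20, n*q0 = 180),
# n = 100 fails because n*p0 = 10 is below 15.
print(large_sample_ok_for_p(200, 0.10))
print(large_sample_ok_for_p(100, 0.10))
```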

Conditions Required for a Valid Small-Sample Hypothesis Test for µ

1. A random sample is selected from the target population. 2. The population from which the sample is selected has a distribution that is approximately normal.

Conditions Required for a Valid Hypothesis Test for σ^2

1. A random sample is selected from the target population. 2. The population from which the sample is selected has a distribution that is approximately normal. (Unlike small-sample tests for µ based on the t-statistic, even slight to moderate departures from normality will render the chi-square test invalid.)

Conditions Required for a Valid Small-Sample Confidence Interval of μ

1. A random sample is selected from the target population. 2. The population has a relative frequency distribution that is approximately normal.

Conditions Required for a Valid Confidence Interval for σ^2

1. A random sample is selected from the target population. 2. The population of interest has a relative frequency distribution that is approximately normal.

Conditions Required for a Valid Large-Sample Hypothesis Test for µ

1. A random sample is selected from the target population. 2. The sample size n is large (i.e., n ≥ 30). (Due to the Central Limit Theorem, this condition guarantees that the test statistic will be approximately normal regardless of the shape of the underlying probability distribution of the population.)

Conditions Required for a Valid Large-Sample Confidence Interval for μ

1. A random sample is selected from the target population. 2. The sample size n is large (i.e., n ≥ 30). Due to the Central Limit Theorem, this condition guarantees that the sampling distribution of x-bar is approximately normal. Also, for large n, s will be a good estimator of σ.

Conditions Required for a Valid Large-Sample Confidence Interval for p

1. A random sample is selected from the target population. 2. The sample size n is large. (This condition will be satisfied if both np-hat and nq-hat ≥15. Note that np-hat and nq-hat are simply the number of successes and number of failures, respectively, in the sample.).
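As a rough illustration of this card, the sketch below checks the 15-successes / 15-failures condition and then builds the usual large-sample interval p-hat ± z(α/2)·√(p-hat·q-hat/n). The function name and the sample counts are assumptions made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci_proportion(successes: int, n: int, confidence: float = 0.95):
    """Large-sample confidence interval for a binomial proportion p."""
    failures = n - successes
    # n*p-hat and n*q-hat are just the counts of successes and failures.
    if successes < 15 or failures < 15:
        raise ValueError("need at least 15 successes and 15 failures")
    p_hat = successes / n
    q_hat = 1.0 - p_hat
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical sample: 60 successes in 200 trials, 95% confidence.
low, high = large_sample_ci_proportion(60, 200)
print(round(low, 4), round(high, 4))
```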

Conditions Required for Valid Small-Sample Inferences about µd

1. A random sample of differences is selected from the target population of differences. 2. The population of differences has a distribution that is approximately normal.

One-tailed test

A one-tailed test of hypothesis is one in which the alternative hypothesis is directional and includes the symbol " < " or " >." Upper-tailed (>): "greater than," "larger," "above" Lower-tailed (<): "less than," "smaller," "below"

Two-Tailed Test

A two-tailed test of hypothesis is one in which the alternative hypothesis does not specify departure from H0 in a particular direction and is written with the symbol " ≠." Some key words that help you identify this nondirectional nature are: Two-tailed (≠): "not equal to," "differs from"

Type 1 Error

occurs if the researcher rejects the null hypothesis in favor of the alternative hypothesis when, in fact, H0 is true (rejecting a true null). The probability of committing a Type I error is denoted by α.

One way vs Two way

one-way: 1 independent variable (factor) with two or more levels; two-way: 2 independent variables (factors), each with two or more levels

Confidence Level

the confidence coefficient expressed as a percentage. Example: if our confidence level is 95%, then in the long run, 95% of our confidence intervals will contain μ and 5% will not

When σ is unknown and 'n' is large (n≥ 30),

the confidence interval is approximately x-bar ± za/2 (s/√n), where s is the sample standard deviation
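A minimal sketch of the large-sample interval x-bar ± z(α/2)·(s/√n), using only the Python standard library. The function name and the sample summary values are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci_mean(xbar: float, s: float, n: int, confidence: float = 0.95):
    """Large-sample (n >= 30) confidence interval for a population mean."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    half_width = z * s / sqrt(n)                 # s stands in for sigma
    return xbar - half_width, xbar + half_width


# Hypothetical sample: x-bar = 50, s = 10, n = 100, 90% confidence.
low, high = large_sample_ci_mean(xbar=50.0, s=10.0, n=100, confidence=0.90)
print(round(low, 3), round(high, 3))
```

With a confidence coefficient of .90 the multiplier is z.05 ≈ 1.645, so the half-width here is about 1.645.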

Sampling Distribution

the distribution of a sample statistic calculated from a sample of n measurements is the probability distribution of the statistic

Null Hypothesis

the hypothesis that will be accepted unless the data provides convincing evidence that it is false.

Alternative Hypothesis

the hypothesis which will be accepted only if the data provides convincing evidence of its truth. This usually represents the values of a population parameter for which the researcher wants to gather evidence to support.

Experimental unit

the object on which the response and factors are observed or measured

T-stat vs Z-stat

the primary difference between the sampling distributions of t and z is that the t-stat is more variable than the z, which follows intuitively when you realize that t contains two random quantities (x-bar and s) whereas z contains only one (x-bar)

If a random sample of n observations is selected from a population with a normal distribution then what?

the sampling distribution of x-bar will be a normal distribution

Single-factor experiments

experiments in which the treatments are levels of a single factor, with the sampling of experimental units performed using either a completely randomized or a randomized block design.

Target Parameter

the unknown population parameter that we are interested in estimating

Dependent variable / response variable

the variable of interest to be measured in the experiment

Values of F well in excess of 1 indicate that

the variation among treatment means well exceeds that within treatments and therefore support the alternative hypothesis that the population treatment means differ

Pooled Variance

a weighted average of the two sample variances, used when two independent samples may have different means but the populations are assumed to share the same true variance
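The weighting can be made concrete with the standard pooled estimator, s_p^2 = [(n1 - 1)s1^2 + (n2 - 1)s2^2] / (n1 + n2 - 2). This is a sketch; the function name and the sample values are assumptions for illustration.

```python
def pooled_variance(s1_sq: float, n1: int, s2_sq: float, n2: int) -> float:
    """Pooled estimate of a common variance, weighted by degrees of freedom."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


# Hypothetical samples: s1^2 = 4 with n1 = 10, s2^2 = 9 with n2 = 15.
sp_sq = pooled_variance(4.0, 10, 9.0, 15)
print(round(sp_sq, 4))  # falls between 4 and 9, closer to the larger sample's variance
```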

Qualitative factors

those that are not (naturally) measured on a numerical scale

All chi-squared curves are skewed

to the right, with a mean equal to the degrees of freedom

If the sampling distribution of a sample statistic has a mean equal to the population parameter the statistic is intended to estimate, the statistic is said to be an _____ _____ of the parameter

unbiased estimate

Randomized Block Design

uses experimental units that are matched sets, assigning one from each set to each treatment. (the matched sets of experimental units are called blocks)

Independent variable/factors

variables whose effect on the response is of interest to the experimenter.

y= ß0 + ß1x

y: dependent variable x: independent variable ß0: least squares estimate of intercept ß1: least squares estimate of slope
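As a sketch of how the least squares estimates of the slope and intercept are obtained, the code below uses the usual formulas slope = SSxy / SSxx and intercept = y-bar - slope · x-bar. The function name and the five data points are made up for illustration.

```python
from statistics import mean


def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line through (x, y)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x_bar, y_bar = mean(x), mean(y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data set.
b0, b1 = least_squares_line([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
print(round(b0, 3), round(b1, 3))
```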

x-bar is the minimum-variance unbiased estimator (MVUE) of

µ; likewise, p-hat is the MVUE of p

what can you conclude if a confidence interval for (μ1 - μ2) contains 0?

If a confidence interval for (μ1 - μ2) contains 0, there is no evidence of a difference between the two population means, so a claim that the means differ is not supported.

Theorem 5.1

If a random sample of n observations is selected from a population with a normal distribution, the sampling distribution of x-bar will be a normal distribution.

what can you conclude if a confidence interval for µd (paired difference test) includes 0

If the confidence interval for µd includes 0, you cannot conclude that the means differ; the data do not provide sufficient evidence of a difference.

p-value =

P (z > 2.12) ( when z=2.12 )

p-value

The probability of obtaining a test statistic at least as extreme (≤ or ≥) as the actual sample value, given that H0 is true. Also called the observed level of significance: the smallest value of α for which H0 can be rejected. Used to make the rejection decision: if p-value ≥ α, do not reject H0; if p-value < α, reject H0.

Power of the test

The probability of rejecting a false H0 (a correct decision). Equal to 1 - ß. Used in determining test adequacy. Affected by the true value of the population parameter, the significance level α, the standard deviation, and the sample size n.

p-value < alpha:

Reject the null

If a value uses a sample standard deviation in the calculation then what is it

Standard Error

ANOVA F-test

Tests the equality of two or more (k) population means. Variables: one nominal-scaled independent variable with two or more (k) treatment levels or classifications, and one interval- or ratio-scaled dependent variable. Used to analyze completely randomized experimental designs. Test statistic: F = MST/MSE.

Observational vs Designed study example

The analyst cannot control the assignment of the brand to each golf ball (observational) BUT he or she can control the assignment of each ball to the position in the striking sequence (designed)

If the mean of the sampling distribution is not equal to the parameter, the statistic is said to be a ____ _____ of the parameter

biased estimate

Sum of Squares for Treatments (SST)

calculated by squaring the distance between each treatment mean and the overall mean of all sample measurements, then multiplying each squared difference by the number of sample measurements for the treatment, and finally adding the results over all treatments
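The SST recipe above, together with SSE and the ratio F = MST/MSE, can be sketched as follows for a completely randomized design. It assumes the standard definitions MST = SST/(k - 1) and MSE = SSE/(n - k); the function name and the three small treatment samples are hypothetical.

```python
from statistics import mean


def anova_f(groups):
    """Return (SST, SSE, F) for a one-way (completely randomized) ANOVA."""
    k = len(groups)                       # number of treatments
    n = sum(len(g) for g in groups)       # total number of measurements
    overall = sum(sum(g) for g in groups) / n
    group_means = [mean(g) for g in groups]
    # SST: squared distance of each treatment mean from the overall mean,
    # weighted by that treatment's sample size, summed over treatments.
    sst = sum(len(g) * (m - overall) ** 2 for g, m in zip(groups, group_means))
    # SSE: variability of the measurements around their own treatment mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    mst = sst / (k - 1)
    mse = sse / (n - k)
    return sst, sse, mst / mse


# Hypothetical data: three treatments, three measurements each.
sst, sse, f_stat = anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(round(sst, 3), round(sse, 3), round(f_stat, 3))
```

An F value this far above 1 says the variation among treatment means is much larger than the variation within treatments; whether it is significant still depends on the F critical value for (k - 1, n - k) degrees of freedom.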

Chapter 10: Categorical Data Analysis

chapter 9 end notes: pg. 560

Paired t-test (dependent t-test)

compares the means of two related groups to determine whether there is a statistically significant difference between the means
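A minimal sketch of the paired-difference test statistic, t = (d-bar - D0) / (s_d / √n_d) with n_d - 1 degrees of freedom. The function name and the before/after measurements are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second, d0: float = 0.0):
    """Return (t, df) for a paired-difference test of H0: mu_d = d0."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n_d = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)                    # sample standard deviation of differences
    t = (d_bar - d0) / (s_d / sqrt(n_d))
    return t, n_d - 1


# Hypothetical paired measurements on four experimental units.
t, df = paired_t_statistic([10.0, 12.0, 9.0, 11.0], [8.0, 11.0, 9.0, 8.0])
print(round(t, 4), df)
```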

p-value > alpha:

fail to reject the null

The value zα is defined as the value of the standard normal random variable z such that the area α will lie to its right

for a confidence coefficient of .90, we have (1 - α) = .90, α = .10, and α/2 = .05; z.05 = 1.645

P-value (or observed significance level)

for a specific statistical test is the probability (assuming H0 is true) of observing a value of the test statistic that is at least as contradictory to the null hypothesis, and supportive of the alternative hypothesis, as the actual one computed from the sample data.

One-way table

having only one variable classified. typically we want to make inferences about the true proportions that occur in the k categories based on the sample information in the one way table

Biased estimate

if the mean of the sampling distribution of a sample statistic is not equal to the population parameter, it's biased. (mean ≠ population parameter = biased)

Unbiased estimate

if the sampling distribution of a sample statistic has a mean equal to the population parameter the statistic is intended to estimate, the statistic is said to be unbiased (mean = population parameter = unbiased)

The standard deviation of the sampling distribution decreases as the sample size

increases

Complete factorial experiment (two-way classification)

is one in which every factor-level combination is employed-that is, the number of treatments in the experiment equals the total number of factor-level combinations. (ex: factor A is to be investigated at a levels, and factor B at b levels. ab treatments should be included in the experiment)

The power of a test

is the probability that the test will correctly lead to the rejection of the null hypothesis for a particular value of µ or p in the alternative hypothesis. The power is equal to (1 - ß) for the particular alternative considered.

Rejection Region

is the set of possible values of the test statistic for which the researcher will reject H0 in favor of Ha.

Blocking

making comparisons within groups of similar experimental units

Goodness of fit

means how well a statistical model fits a set of observations

Quantitative factors

measured on a numerical scale

chi-squared test

measures the degree of disagreement between the data and the null hypothesis

Sum of Squares for Error (SSE)

measures the variability around the treatment means that is attributed to sampling error

What's the general guideline for the normal distribution to be justified?

n is greater than or equal to 30

ANOVA measures

Completely randomized experimental designs; analysis of variance; ANOVA test finds out if experiment results are significant

Test stat for chi squared

X^2 = SUM [ (Observed - Expected)^2 / Expected ]
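A short sketch of this statistic for a one-way table. The observed counts and the null-hypothesis category probabilities below are invented for the example, and the function name is an assumption.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical one-way table with k = 3 categories.
observed = [30, 50, 20]
null_probs = [0.25, 0.50, 0.25]          # category probabilities under H0
n = sum(observed)
expected = [n * p for p in null_probs]   # expected counts: 25, 50, 25 (all >= 5)
chi2 = chi_square_statistic(observed, expected)
print(chi2)
```

For a one-way table the statistic is compared with a chi-square critical value on k - 1 degrees of freedom; large values of the statistic count against H0.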

Factor Levels

are the values of the factor that are used in the experiment Ex: the levels of gender are Male & Female

Conditions Required for Valid Large-Sample Inferences about µd

1. A random sample of differences is selected from the target population of differences. 2. The sample size nd is large (i.e., nd ≥ 30); due to the Central Limit Theorem, this condition guarantees that the test statistic will be approximately normal regardless of the shape of the underlying probability distribution of the population.

Conditions Required for a Valid F-Test for Equal Variances

1. Both sampled populations are normally distributed. 2. The samples are random and independent.

Steps for Calculating the p-value for a test of hypothesis

1. Determine the value of the test statistic z corresponding to the result of the sampling experiment. 2a. If the test is one-tailed, the p-value is equal to the tail area beyond z in the same direction as the alternative hypothesis. Thus, if the alternative hypothesis is of the form > , the p-value is the area to the right of, or above, the observed z-value. Conversely, if the alternative is of the form < , the p-value is the area to the left of, or below, the observed z-value. 2b. If the test is two-tailed, the p-value is equal to twice the tail area beyond the observed z-value in the direction of the sign of z - that is, if z is positive, the p-value is twice the area to the right of, or above, the observed z-value. Conversely, if z is negative, the p-value is twice the area to the left of, or below, the observed z-value.
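The steps above can be sketched with the standard normal distribution from Python's standard library. The function name and the labels "greater", "less", and "two-sided" are naming assumptions for this example.

```python
from statistics import NormalDist


def z_p_value(z: float, alternative: str) -> float:
    """p-value for an observed z statistic under the given alternative."""
    std_normal = NormalDist()
    lower = std_normal.cdf(z)        # area to the left of z
    upper = 1.0 - lower              # area to the right of z
    if alternative == "greater":     # Ha of the form >
        return upper
    if alternative == "less":        # Ha of the form <
        return lower
    if alternative == "two-sided":   # Ha of the form ≠: twice the tail beyond z
        return 2.0 * min(lower, upper)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# For z = 2.12, the upper-tailed p-value is P(z > 2.12).
print(round(z_p_value(2.12, "greater"), 4))
print(round(z_p_value(2.12, "two-sided"), 4))
```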

Properties of ß and Power

1. For fixed n and α, the value of ß decreases, and the power increases, as the distance between the specified null value µ0 and the specified alternative value µa increases. 2. For fixed n and values of µ0 and µa, the value of ß increases, and the power decreases, as the value of α is decreased. 3. For fixed α and values of µ0 and µa, the value of ß decreases, and the power increases, as the sample size n is increased.
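These three properties can be seen numerically. The sketch below computes the power of an upper-tailed z-test (H0: µ = µ0 against Ha: µ > µ0) under the simplifying assumption that σ is known; the function name and all numeric inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tailed_z(mu0, mua, sigma, n, alpha=0.05):
    """Power = 1 - beta for an upper-tailed z-test at a specific alternative mua."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)       # rejection region: z > z_alpha
    shift = (mua - mu0) / (sigma / sqrt(n))         # distance in standard errors
    beta = std_normal.cdf(z_alpha - shift)          # P(fail to reject | mu = mua)
    return 1.0 - beta


base = power_upper_tailed_z(100.0, 103.0, 10.0, 36)
farther = power_upper_tailed_z(100.0, 105.0, 10.0, 36)               # larger distance
smaller_alpha = power_upper_tailed_z(100.0, 103.0, 10.0, 36, 0.01)   # smaller alpha
larger_n = power_upper_tailed_z(100.0, 103.0, 10.0, 100)             # larger n
print(round(base, 4), round(farther, 4), round(smaller_alpha, 4), round(larger_n, 4))
```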

Possible Conclusions for a Test of Hypothesis

1. If the calculated test statistic falls in the rejection region, reject H0 and conclude that the alternative hypothesis Ha is true. State that you are rejecting H0 at the α level of significance. Remember that the confidence is in the testing process, not the particular result of a single test. 2. If the test statistic does not fall in the rejection region, conclude that the sampling experiment does not provide sufficient evidence to reject H0 at the α level of significance. [Generally, we will not "accept" the null hypothesis unless the probability ß of a Type II error has been calculated.]

Randomized Block Design procedures

1. Matched sets of experimental units, called blocks, are formed, each block consisting of k experimental units (where k is the number of treatments). The b blocks should consist of experimental units that are as similar as possible. 2. One experimental unit from each block is randomly assigned to each treatment, resulting in a total of n = bk responses.

Properties of the Sampling Distribution of x-bar

1. Mean of the sampling distribution equals mean of sampled population: μx-bar = E(x-bar) = μ. 2. Standard deviation of sampling distribution equals standard deviation of sampled population divided by the square root of the sample size: σx-bar = σ/√n.
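Both properties can be verified exactly on a tiny example by listing every possible sample. This sketch assumes a made-up four-value population whose values are equally likely and samples of size 2 drawn with replacement.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8]    # hypothetical population, each value equally likely
n = 2                        # sample size

# Every ordered sample of size n drawn with replacement, and its sample mean.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)                  # population mean
sigma = pstdev(population)             # population standard deviation
mean_of_xbar = mean(sample_means)      # mean of the sampling distribution
sd_of_xbar = pstdev(sample_means)      # standard deviation of the sampling distribution

print(mu, mean_of_xbar)
print(round(sigma / sqrt(n), 4), round(sd_of_xbar, 4))
```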

Sampling Distribution of p-hat

1. Mean of the sampling distribution is equal to the true binomial proportion, p; that is, E(p-hat) = p. Consequently, p-hat is an unbiased estimator of p. 2. Standard deviation of the sampling distribution is equal to √(p(1-p)/n). 3. For large samples, the sampling distribution is approximately normal. (A sample is considered large if np-hat ≥ 15 and n(1 - p-hat) ≥ 15.)

Elements of a Test of Hypothesis

1. Null hypothesis (H0): A theory about the specific values of one or more population parameters. The theory generally represents the status quo, which we adopt until it is proven false. 2. Alternative (research) hypothesis (Ha): A theory that contradicts the null hypothesis. The theory generally represents that which we will adopt only when sufficient evidence exists to establish its truth. 3. Test statistic: A sample statistic used to decide whether to reject the null hypothesis. 4. Rejection region: The numerical values of the test statistic for which the null hypothesis will be rejected. The rejection region is chosen so that the probability is α that it will contain the test statistic when the null hypothesis is true, thereby leading to a Type I error. The value of α is usually chosen to be small (e.g., .01, .05, or .10) and is referred to as the level of significance of the test. 5. Assumptions: Clear statement(s) of any assumptions made about the population(s) being sampled. 6. Experiment and calculation of test statistic: Performance of the sampling experiment and determination of the numerical value of the test statistic. 7. Conclusion: a. If the numerical value of the test statistic falls in the rejection region, we reject the null hypothesis and conclude that the alternative hypothesis is true. We know that the hypothesis-testing process will lead to this conclusion incorrectly (Type I error) only 100α% of the time when H0 is true. b. If the test statistic does not fall in the rejection region, we do not reject H0. Thus, we reserve judgment about which hypothesis is true. We do not conclude that the null hypothesis is true because we do not (in general) know the probability ß that our test procedure will lead to an incorrect acceptance of H0 (Type II error).

Properties of the Multinomial Experiment

1. The experiment consists of n identical trials. 2. There are k possible outcomes to each trial. These outcomes are called classes, categories, or cells. 3. The probabilities of the k outcomes, denoted by p1,p2,... pk, remain the same from trial to trial, where p1 + p2 + ... + pk = 1. 4. The trials are independent. 5. The random variables of interest are the cell counts, n1, n2, ..., nk, of the number of observations that fall in each of the k classes.

Conditions Required for Valid F-tests in Factorial Experiments

1. The response distribution for each factor-level combination (treatment) is normal. 2. The response variance is constant for all treatments. 3. Random and independent samples of experimental units are associated with each treatment.

Conditions Required for a Valid ANOVA F-test: Completely Randomized Design

1. The samples are randomly selected in an independent manner from the k treatment populations. (This can be accomplished by randomly assigning the experimental units to the treatments.) 2. All k sampled populations have distributions that are approximately normal. 3. The k population variances are equal (i.e. σ1^2 = σ2^2 = σ3^2 = ... σk^2)

Conditions Required for Valid Small-Sample Inferences about (μ1 - μ2)

1. The two samples are randomly selected in an independent manner from the two target populations. 2. Both sampled populations have distributions that are approximately normal. 3. The population variances are equal (i.e., σ1^2 = σ2^2).

Conditions Required for Valid Large-Sample Inferences about (p1 - p2)

1. The two samples are randomly selected in an independent manner from the two target populations. 2. The sample sizes, n1 and n2, are both large so that the sampling distribution of (p-hat 1 - p-hat 2) will be approximately normal. (This condition will be satisfied if n1p1 ≥ 15, n1q1 ≥ 15, n2p2 ≥ 15, and n2q2 ≥ 15.)

Conditions Required for a Valid X^2 Test: One-Way Table

1. A multinomial experiment has been conducted. This is generally satisfied by taking a random sample from the population of interest. 2. The sample size n is large. This is satisfied if, for every cell, the expected cell count Ei is equal to 5 or more.

chi-squared basic ideas

1. Compares the observed count to the expected count, assuming the null hypothesis is true. 2. The closer the observed count is to the expected count, the more likely it is that H0 is true. Closeness is measured by the squared difference relative to the expected count; reject H0 for large values of the statistic.

ANOVA Basic Idea

1. Compares two types of variation to test equality of means. 2. Comparison basis is the ratio of variances. 3. If treatment variation is significantly greater than random variation, then the means are not equal. 4. Variation measures are obtained by "partitioning" the total variation.

Conditions Required for a Valid ANOVA F-test: Randomized Block Design

1. The b blocks are randomly selected, and all k treatments are applied (in random order) to each block. 2. The distributions of observations corresponding to all bk block-treatment combinations are approximately normal. 3. All bk block-treatment distributions have equal variances.

Sampling Distribution of p-hat

1. The mean of the sampling distribution of p-hat is p; that is, p-hat is an unbiased estimator of p. 2. The standard deviation of the sampling distribution of p-hat is √(pq/n), where q = 1 - p. 3. For large samples, the sampling distribution of p-hat is approximately normal. A sample size is considered large if both np-hat ≥ 15 and nq-hat ≥ 15.

Conditions Required for a Valid X^2 Test: Contingency Table

1. The n observed counts are a random sample from the population of interest. We may then consider this to be a multinomial experiment with r x c possible outcomes. 2. The sample size, n, will be large enough so that, for every cell, the estimated expected count Êij will be equal to 5 or more.

Conditions Required for Valid Large-Sample Inferences about (μ1 - μ2)

1. The two samples are randomly selected in an independent manner from the two target populations. 2. The sample sizes, n1 and n2, are both large (i.e., n1 ≥ 30 and n2 ≥ 30). [Due to the Central Limit Theorem, this condition guarantees that the sampling distribution of (x-bar 1 - x-bar 2) will be approximately normal regardless of the shapes of the underlying probability distributions of the populations. Also, s1^2 and s2^2 will provide good approximations to σ1^2 and σ2^2 when the samples are both large.]

Tukey, Bonferroni, Scheffé Methods

? pg. 524

Chapter 8: Inferences Based on Two Samples

Chapter 7 : pg 416

Theorem 5.2 (Central Limit Theorem)

If a random sample of n observations is selected from a population with mean µ and standard deviation σ, then, when n is sufficiently large, the sampling distribution of x-bar will be approximately normal with mean µ and standard deviation σ/√n. The larger n is, the better the normal approximation. In practice: the sample must be randomly selected and sufficiently large; when sampling without replacement, the sample should not be more than 10% of the population from which it is taken; and for a proportion, the observations must be independent of each other. These conditions justify treating the sampling distribution as approximately normal, and large-sample inferences based on z (and t) statistics rely on this result.

Finite Population Correction Factor

In some sampling situations, the sample size n may represent 5% or perhaps 10% of the total number N of sampling units in the population. When the sample size is large relative to the number of measurements in the population, the standard errors of the estimators of µ and p should be multiplied by a finite population correction factor.

F-stats

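Near 1: the variation between treatment means and the variation within treatments are approximately equal. Well in excess of 1: supports the alternative hypothesis that the population treatment means differ. In the rejection region (beyond the F critical value): reject the null.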

Will the sampling distribution of x-bar always be approximately normally​ distributed? Explain.

No, because the Central Limit Theorem states that the sampling distribution of x-bar is approximately normally distributed only if the sample size is large enough.

Parameter

Numerical descriptive measure of a population (almost always unknown)

Confidence coefficient

The probability that a randomly selected confidence interval encloses the population parameter

Sampling Error

We express the reliability associated with a confidence interval for the population mean µ by specifying the sampling error within which we want to estimate µ with 100(1 - α)% confidence. The sampling error (denoted SE), then, is equal to the half-width of the confidence interval. CI for μ: SE = za/2 (σ/√n). CI for p: SE = za/2 √(pq/n).
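A sketch of both half-width formulas, plus the sample size obtained by solving SE = z(α/2)·(σ/√n) for n and rounding up. The function names and the numeric inputs are illustrative assumptions; in practice σ is usually replaced by an estimate.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_half_alpha(confidence: float) -> float:
    """z_{alpha/2} for a given confidence coefficient."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)


def sampling_error_mean(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of the confidence interval for mu."""
    return z_half_alpha(confidence) * sigma / sqrt(n)


def sampling_error_proportion(p: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of the confidence interval for p."""
    return z_half_alpha(confidence) * sqrt(p * (1.0 - p) / n)


def sample_size_for_mean(sigma: float, se: float, confidence: float = 0.95) -> int:
    """Smallest n whose sampling error for mu is at most se."""
    return ceil((z_half_alpha(confidence) * sigma / se) ** 2)


# Hypothetical inputs: sigma = 10, desired SE = 2, 95% confidence.
n_needed = sample_size_for_mean(sigma=10.0, se=2.0)
print(n_needed, round(sampling_error_mean(10.0, n_needed), 3))
```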

Multinomial experiment example

You toss two dice three times, and record the outcome on each toss. This is a multinomial experiment because: The experiment consists of repeated trials. We toss the dice three times. Each trial can result in a discrete number of outcomes - 2 through 12. The probability of any outcome is constant; it does not change from one toss to the next. The trials are independent; that is, getting a particular outcome on one trial does not affect the outcome on other trials.

Completely randomized design

a design in which the experimental units are randomly assigned to the k treatments or in which independent random samples of experimental units are selected from each treatment. Key features: experimental units (subjects) are assigned randomly to treatments; subjects are assumed homogeneous; there is one factor (independent variable) with two or more treatment levels or classifications; the design is analyzed by one-way Analysis of Variance (ANOVA).

Sample statistic

a numerical descriptive measure of a sample. It is calculated from the observations in the sample

An interval estimator (or confidence interval)

a range of numbers that contains the target parameter with a high degree of confidence.

Test Statistic

a sample statistic, computed from information provided in the sample, that the researcher uses to decide between the null and alternative hypotheses

A point estimator

a single number calculated from the sample that estimates a target population parameter. Example: we'll use the sample mean, x-bar, to estimate the population mean μ

Hypothesis

a statement about the numerical value of a population parameter

Simple Linear Regression

a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between the two variables

Type 2 Error

occurs if the researcher accepts the null hypothesis when, in fact, H0 is false. The probability of committing a Type II error is denoted by ß

Central Limit Theorem offers

an explanation for the fact that many relative frequency distributions of data possess mound-shaped distributions

Values of the F-statistic near 1 indicate that the two sources of variation, between treatment means and within treatments, are

approximately equal

Treatments

are the levels of the factor (the factor level combinations used) Ex: (Female, high SES), (Male, high SES), (Female, low SES)

Multinomial experiment

qualitative data that fall into more than two categories

what is a paired difference test of hypothesis?

a randomized block design experiment with paired observations, where μd = (μ1 - μ2) and the null hypothesis is H0: μd = D0

Point Estimator

rule or formula that tells us how to use the sample data to calculate a single number that can be used as an estimate of the population parameter

Large sample

sample being of sufficiently large size that we can apply the CLT and the normal (z) statistic to determine the form of the sampling distribution of x-bar. note: the large-sample interval estimator requires knowing the value of the population standard deviation, σ. (in most business cases however, σ is unknown)

pooled sample estimator of σ^2

small samples?

If a value uses a population standard deviation in the calculation then what is it

standard deviation

To avoid a potential type 2 error, we state

that the sample evidence is insufficient to reject H0 at a=0.05.

Degrees of Freedom

the actual amount of variability in the sampling distribution of 't' depends on the sample size 'n'.

Designed study

the analyst controls the specification of the treatments and the method of assigning the experimental units to each treatment

Observational study

the analyst simply observes the treatments and the response on a sample of experimental units

