Definitions
Define Relationships Between Confidence Interval and Prediction Interval With Regression Equation
(page 422) The following Minitab graph shows the relationship between the regression line (in the center), the confidence interval (shown in crimson), and the prediction interval (shown in green). The bands for the prediction interval are always farther from the regression line than those for the confidence interval. Also, as the values of X move away from the mean number of calls (22) in either the positive or the negative direction, the confidence interval and prediction interval bands widen. This is caused by the numerator of the right-hand term under the radical in formulas (13-10) and (13-11). That is, as the term (X - X bar)^2 increases, the widths of the confidence interval and the prediction interval also increase. To put it another way, there is less precision in our estimates as we move away, in either direction, from the mean of the independent variable. The prediction interval will always be wider because of the addition of 1 under the radical in the second equation.
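A minimal Python sketch of the two intervals, matching the card's description (1/n and (X - X bar)^2 under the radical, with an extra 1 for the prediction interval). The data values, the 95% level, and X = 25 are made-up assumptions for illustration:

import numpy as np
from scipy import stats

# Made-up data: x = sales calls, y = copiers sold
x = np.array([20, 40, 20, 30, 10, 10, 20, 20, 20, 30])
y = np.array([30, 60, 40, 60, 30, 40, 40, 50, 30, 70])

n = len(x)
b, a = np.polyfit(x, y, 1)                            # slope b, intercept a
y_hat = a + b * x
s_yx = np.sqrt(np.sum((y - y_hat) ** 2) / (n - 2))    # standard error of estimate
t = stats.t.ppf(0.975, df=n - 2)                      # t value for a 95% interval

x_new = 25                                            # X value at which to estimate Y
y_new = a + b * x_new
core = 1 / n + (x_new - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)

ci_half = t * s_yx * np.sqrt(core)        # confidence interval: mean of Y for this X
pi_half = t * s_yx * np.sqrt(1 + core)    # prediction interval: one particular Y (always wider)
print(y_new - ci_half, y_new + ci_half)
print(y_new - pi_half, y_new + pi_half)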
Define Differences Between Descriptive Statistics and Inferential Statistics
1) Descriptive statistics organizes and summarizes the data actually collected (which may be a sample or an entire population), while inferential statistics uses data from a sample to draw conclusions about the population the sample came from. 2) Descriptive statistics summarizes past data, whereas inferential statistics makes estimates and predictions about a population (or future outcomes) based on sample data.
Define Steps in ANOVA Test
1) Find the total variation. 2) Find the treatment variation (the treatment component; the other component is the random/error component). 3) Find the random variation (also referred to as the error component). 4) Find the variance due to treatments. Since it is the variation of sample means, the denominator is k - 1, where k refers to the number of treatments: MST = treatment variation/(k - 1). 5) Divide the random variation by (total number of observations across all treatments minus the number of treatments): MSE = random variation/(n - k). So if there are 3 treatments with 4 observations each, the denominator would be (12 - 3). 6) Find F by dividing step 4 by step 5, meaning F = [treatment variation/(k - 1)] / [random variation/(n - k)] = MST/MSE. 7) If the F ratio is much larger than 1 (larger than the critical value from the F table), we conclude the population means are not all the same. 8) The ANOVA table includes Source of Variation (Treatments and Error), Sum of Squares (SST and SSE), Degrees of Freedom (for both SST and SSE), Mean Square (MST and MSE), and F (MST/MSE).
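A small Python sketch of these steps with made-up data (3 treatments, 4 observations each); the scipy call at the end is only a cross-check, not part of the hand calculation:

import numpy as np
from scipy import stats

# Made-up observations for three treatments
t1 = np.array([9, 7, 11, 9])
t2 = np.array([12, 10, 14, 12])
t3 = np.array([10, 13, 11, 10])
data = [t1, t2, t3]

all_obs = np.concatenate(data)
grand_mean = all_obs.mean()
k = len(data)                     # number of treatments
n = len(all_obs)                  # total number of observations

ss_total = np.sum((all_obs - grand_mean) ** 2)                     # step 1: total variation
sst = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in data)     # step 2: treatment variation
sse = ss_total - sst                                               # step 3: random/error variation
mst = sst / (k - 1)                                                # step 4
mse = sse / (n - k)                                                # step 5: here (12 - 3)
f = mst / mse                                                      # step 6
print(f)
print(stats.f_oneway(t1, t2, t3))  # the F statistic here should match f above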
Define Hurdles/Assumptions in Hypothesis Testing Two Independent Population Means, SD Known
1) If the two distributions of sample means follow the normal distribution, then we can reason that the distribution of their differences will also follow the normal distribution. This is the first hurdle. 2) The second hurdle refers to the mean of this distribution of differences. If we find the mean of this distribution is zero, that implies that there is no difference in the two populations. On the other hand, if the mean of the distribution of differences is equal to some value other than zero, either positive or negative, then we conclude that the two populations do not have the same mean. 3) Our final hurdle is that we need to know something about the variability of the distribution of differences. Statistical theory shows that when we have independent populations, as in this case, the distribution of the differences has a variance equal to the sum of the two individual variances.
Define Steps in Stepwise Regression
1) In the stepwise method, we develop a sequence of equations. The first equation contains only one independent variable. However, this independent variable is the one from the set of proposed independent variables that explains the most variation in the dependent variable. Stated differently, if we compute all the simple correlations between each independent variable and the dependent variable, the stepwise method first selects the independent variable with the strongest correlation with the dependent variable. 2) Next, the stepwise method looks at the remaining independent variables and then selects the one that will explain the largest percentage of the variation yet unexplained. We continue this process until all the independent variables with significant regression coefficients are included in the regression equation.
Define Relationships Between Correlation Coefficient, Coefficient of Determination, and Standard Error of Estimate (r, r^2, and Sy.x) using ANOVA
1) The correlation coefficient and the standard error of the estimate are inversely related. As the strength of a linear relationship between two variables increases, the correlation coefficient increases and the standard error of the estimate decreases. 2) The correlation coefficient squared becomes the coefficient of determination. (pretty straightforward) 3) The more of the variation of the dependent variable (SS total) that is explained by the independent variable (SSR), the higher the coefficient of determination. 4) The coefficient of determination and the residual or error sum of squares are inversely related. The larger the unexplained or error variation as a percentage of the total variation, the lower is the coefficient of determination. 5) The link to the standard error of estimate is that SSE can be substituted in place of sigma(Y - Y hat)^2 in the standard error of estimate formula: Sy.x = sqrt(SSE / (n - 2))
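A brief Python sketch of these relationships using made-up (X, Y) values; the variable names are illustrative:

import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([3, 5, 4, 7, 8, 7, 10, 11])
n = len(x)

b, a = np.polyfit(x, y, 1)
y_hat = a + b * x

ss_total = np.sum((y - y.mean()) ** 2)    # total variation in Y
sse = np.sum((y - y_hat) ** 2)            # unexplained (error) variation
ssr = ss_total - sse                      # variation explained by X

r_squared = ssr / ss_total                # coefficient of determination
r = np.sign(b) * np.sqrt(r_squared)       # correlation coefficient (sign follows the slope)
s_yx = np.sqrt(sse / (n - 2))             # standard error of estimate = sqrt(SSE/(n - 2))
print(r, r_squared, s_yx)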
Define Characteristics/Assumptions of General Form of Linear Regression Equation
1) The general form of the linear regression equation is exactly the same form as the equation of any line. a is the Y intercept and b is the slope. The purpose of regression analysis is to calculate the values of a and b to develop a linear equation that best fits the data. 2) The least squares regression line has some interesting and unique features. First, it will always pass through the point (Sample Mean X, Sample Mean Y). 3) There is no other line through the data where the sum of the squared deviations is smaller. To put it another way, the term sigma(Y - Y hat)^2 is smaller for the least squares regression equation than for any other equation.
Define Characteristics & Assumptions of F Distribution
1) There is a family of F distributions. A particular member of the family is determined by two parameters: the degrees of freedom in the numerator and the degrees of freedom in the denominator. 2) The F distribution is continuous. This means that it can assume an infinite number of values between zero and positive infinity. 3) The F distribution cannot be negative. The smallest value F can assume is 0. 4) It is positively skewed. The long tail of the distribution is to the right-hand side. As the number of degrees of freedom increases in both the numerator and denominator, the distribution approaches a normal distribution. 5) It is asymptotic. As the values of F increase, the curve approaches the X-axis but never touches it. This is similar to the behavior of the normal probability distribution.
Define Characteristics/Assumptions of Empirical Rule
1. About 68% of the area under the normal curve is within one standard deviation of the mean. 2. About 95% of the area under the normal curve is within two standard deviations of the mean. 3. Practically all of the area under the normal curve is within three standard deviations of the mean.
Assumptions Before Determining Confidence Interval Using t Distribution
1. Assume the sampled population is either normal or approximately normal. This assumption may be questionable for small sample sizes, and becomes more valid with larger sample sizes. 2. Estimate the population standard deviation (SD) with the sample standard deviation (s). 3. Use the t distribution rather than the z distribution. We should be clear at this point. We base the decision on whether to use the t or the z on whether or not we know SD, the population standard deviation. If we know the population standard deviation, then we use z. If we do not know the population standard deviation, then we must use t.
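A short Python sketch of a t-based confidence interval (sigma unknown, so s is used); the sample values and the 95% level are made-up assumptions:

import numpy as np
from scipy import stats

sample = np.array([24.1, 26.3, 22.8, 25.0, 27.2, 23.5, 24.8, 26.0])   # made-up sample
n = len(sample)
x_bar = sample.mean()
s = sample.std(ddof=1)                     # sample standard deviation estimates sigma

t_crit = stats.t.ppf(0.975, df=n - 1)      # 95% confidence, n - 1 degrees of freedom
half_width = t_crit * s / np.sqrt(n)
print(x_bar - half_width, x_bar + half_width)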
Define Characteristics of Chi-Square
1. Chi-square values are never negative. This is because the difference between fo (the observed frequency) and fe (the expected frequency) is squared, that is, (fo - fe)^2. 2. There is a family of chi-square distributions. There is a chi-square distribution for 1 degree of freedom, another for 2 degrees of freedom, another for 3 degrees of freedom, and so on. In this type of problem, the number of degrees of freedom is determined by k - 1, where k is the number of categories. Therefore, the shape of the chi-square distribution does not depend on the size of the sample, but on the number of categories used. 3. The chi-square distribution is positively skewed. However, as the number of degrees of freedom increases, the distribution begins to approximate the normal probability distribution.
Define Assumptions/Characteristics of Linear Regression
1. For each value of X, there are corresponding Y values. These Y values follow the normal distribution. 2. The means of these normal distributions lie on the regression line. 3. The standard deviations of these normal distributions are all the same. The best estimate we have of this common standard deviation is the standard error of estimate (Sy.x). 4. The Y values are statistically independent. This means that in selecting a sample, a particular Y value does not depend on any other Y value. This assumption is particularly important when data are collected over a period of time. In such situations, the errors for a particular time period are often correlated with those of other time periods.
Define Characteristics of Coefficient of Multiple Determination
1. It is symbolized by a capital R squared. In other words, it is written as R^2 because it behaves like the square of a correlation coefficient. 2. It can range from 0 to 1. A value near 0 indicates little association between the set of independent variables and the dependent variable. A value near 1 means a strong association. 3. It cannot assume negative values. Any number that is squared or raised to the second power cannot be negative. 4. It is easy to interpret. Because R^2 is a value between 0 and 1, it is easy to interpret, compare, and understand.
Define Characteristics/Assumptions of Probability Distribution
1. The probability of a particular outcome is between 0 and 1 inclusive. 2. The outcomes are mutually exclusive events. 3. The list is exhaustive. So the sum of the probabilities of the various events is equal to 1.
Define Characteristics/Interpretations of the Correlation Coefficient
1. The sample correlation coefficient is identified by the lowercase letter r. 2. It shows the direction and strength of the linear relationship between two interval-or ratio-scale variables. 3. It ranges from -1 up to and including +1. 4. A value near 0 indicates there is little relationship between the variables. 5. A value near 1 indicates a strong direct or positive relationship between the variables. 6. A value near -1 indicates a strong inverse or negative relationship between the variables. 7. What we can conclude when we find two variables with a strong correlation is that there is a relationship or association between the two variables, not that a change in one causes a change in the other
Define Assumptions of Multiple Regression
1. There is a linear relationship. That is, there is a straight-line relationship between the dependent variable and the set of independent variables. 2. The variation in the residuals is the same for both large and small values of Y hat. (Y - Y hat) is unrelated to whether Y hat is large or small. 3. The residuals follow the normal probability distribution. Recall the residual is the difference between the actual value of Y and the estimated value Y hat. So the term (Y - Y hat) is computed for every observation in the data set. These residuals should approximately follow a normal probability distribution. In addition, the mean of the residuals is 0. 4. The independent variables should not be correlated; we would like to select a set of independent variables that are not themselves correlated. 5. The residuals are independent; successive observations of the dependent variable are not correlated. This assumption is often violated (see autocorrelation) when time is involved with the sampled observations.
Define Reasons to Sample rather than Study Population
1. Contacting the whole population would be too time consuming. 2. The cost of studying all the items in a population may be prohibitive. 3. It may be physically impossible to check all items in the population. 4. Some tests are destructive, so testing every item would destroy the product. 5. The sample results are adequate for the decision at hand.
Define Steps and Interpretation of Anderson-Darling Test of Normality
1. We create two cumulative distributions: a cumulative distribution of the raw data and a cumulative normal distribution. 2. We compare the two cumulative distributions with a statistical test; if the discrepancy between them is large, we reject the null hypothesis that the data are normally distributed. (Comparing them by the single largest absolute difference is the Kolmogorov-Smirnov statistic; the Anderson-Darling test measures the discrepancy over the whole distribution and gives extra weight to the tails.) (See the flashcard "Define Interpretation of Cumulative Distribution on a Graph"; also page 512.) The Anderson-Darling test is an alternative to the chi-square and Kolmogorov-Smirnov goodness-of-fit tests, where: H0: The data follow the normal distribution. H1: The data do not follow the normal distribution. Judge validity by the p-value and significance level; reject normality (H0) if p < significance level.
Define Bimodal Distribution
A bimodal distribution will have two or more peaks. This is often the case when the values are from two or more populations.
Define Box Plot
A box plot is a graphical display, based on quartiles, that helps us picture a set of data. To construct a box plot, we need only five statistics: the minimum value, Q1 (the first quartile), the median, Q3 (the third quartile), and the maximum value.
Define Pie Chart
A chart that shows the proportion or percentage that each class represents of the total number of frequencies.
Define Statistics
A collection of numerical information is called statistics. It is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.
Define Decision Rule
A decision rule is a statement of the specific conditions under which the null hypothesis is rejected and the conditions under which it is not rejected. The region or area of rejection defines the location of all those values that are so large or so small that the probability of their occurrence under a true null hypothesis is rather remote.
Define Disadvantage of Goodness of Fit Test
A disadvantage of the goodness-of-fit test for normality is that a frequency distribution of grouped data is compared to an expected set of normally distributed frequencies. When we organize data into frequency distributions, we know that we lose information about the data. That is, we do not have the raw data.
Define Frequency Polygon
A frequency polygon also shows the shape of a distribution and is similar to a histogram. It consists of line segments connecting the points formed by the intersections of the class midpoints and the class frequencies.
Define Histogram
A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.
Define Bar Chart
A graph that shows qualitative classes on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are proportional to the heights of the bars.
Define Frequency Table/ Frequency Distribution
A frequency table is a grouping of qualitative data into mutually exclusive classes showing the number of observations in each class. A frequency distribution is the corresponding grouping of quantitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class.
Define Homogeneous and Heterogeneous Populations
A homogeneous population has little variation. You could refer to a specific trait, such as hair color, or you could refer to genetic diversity. For example, a population of humans that has inhabited an island for thousands of years with little migration to or from the island is likely relatively homogeneous, or alike in their traits. A heterogeneous population is one where individuals are not similar to one another. For example, you could have a heterogeneous population of humans who have migrated from different regions of the world and currently live together. That population would likely be heterogeneous in regard to height, hair texture, disease immunity, and other traits because of the varied backgrounds and genetics.
Define Hypothesis
A hypothesis is a statement about a population. Data are then used to check the reasonableness of the statement. We always conduct tests of hypothesis that refer to population parameters, never sample statistics. This means you will never see the sample mean, X bar, in the null or the alternate hypothesis.
Define Table of Random Numbers
A more convenient method of selecting a random sample is to use the identification number of each employee and a table of random numbers. As the name implies, these numbers have been generated by a random process (in this case, by a computer). For each digit of a number, the probability of 0, 1, 2, . . . , 9 is the same.
Define Point Estimate
A point estimate is a single value (point) derived from a sample and used to estimate a population value. A point estimate is a single statistic used to estimate a population parameter.
Define Cluster Random Sampling
A population is divided into clusters using naturally occurring geographic or other boundaries. Then, clusters are randomly selected and a sample is collected by randomly selecting from each cluster. It is often employed to reduce the cost of sampling a population scattered over a large geographic area.
Define Sample
A portion, or part, of the population of interest. To infer something about a population, we usually take a sample from the population.
Define Probability Distribution
A probability distribution gives the entire range of values that can occur based on an experiment. It is a listing of all the outcomes of an experiment and the probability associated with each outcome. A probability distribution is similar to a relative frequency distribution. However, instead of describing the past, it describes a likely future event.
Define Sampling Distribution (of the Sample Mean)
A probability distribution of all possible sample means of a given sample size. The mean of this distribution (the mean of all the sample means) equals the population mean, because listing every possible sample of that size uses every value in the population.
Define Joint Probability
A probability that measures the likelihood two or more events will happen concurrently.
Define Experiment (Probability)
A process that leads to the occurrence of one and only one of several possible results.
Define Random Variable
A quantity resulting from an experiment that, by chance, can assume different values. In any experiment of chance, the outcomes occur randomly, so it is often called a random variable. A random variable may be either discrete or continuous.
Define Systematic Random Sampling
A random starting point is selected, and then every kth member of the population is selected. Example: a baker tests every 25th pie coming out of his bakery, so his sample amounts to 1/25 of his pie population.
Define Discrete Random Variable
A random variable that can assume only certain clearly separated values. In some cases, a discrete random variable can assume fractional or decimal values (such as 13.2 or 8.8), as long as the possible values are separated by gaps.
Definition and Interpretation of Confidence Interval
A range of values constructed from sample data so that the population parameter is likely to occur within that range at a specified probability. The specified probability is called the level of confidence. The higher the level of confidence, the wider the confidence interval. For example, for a 95% confidence interval, about 95% of the similarly constructed intervals would include the population parameter.
Define Simple Random Sample
A sample selected so that each item or person in the population has the same chance of being included. This, in combination with Table of Random Numbers, eliminates bias from the sampling process.
Define Positively Skewed Distribution
A set of values is skewed to the right or positively skewed if there is a single peak and the values extend much further to the right of the peak than to the left of the peak. In this case, the mean is larger than the median, and the mode is the smallest of the three measures.
Define Paired Difference
A situation in which paired or related observations are taken from two dependent samples (often the same subjects measured under two conditions), and the difference between each pair of observations is measured.
Define Stepwise Regression
A step-by-step method to determine a regression equation that begins with a single independent variable and adds or deletes independent variables one by one. Only independent variables with nonzero regression coefficients are included in the regression equation. Other methods of variable selection are available. The stepwise method is also called the forward selection method because we begin with no independent variables and add one independent variable to the regression equation at each iteration. There is also the backward elimination method, which begins with the entire set of variables and eliminates one independent variable at each iteration.
Define 2 to the K rule
A useful recipe to determine the number of classes (k) is the "2 to the k rule." This guide suggests you select the smallest number (k) for the number of classes such that 2^k (in words, 2 raised to the power of k) is greater than the number of observations (n). For example, suppose there were 180 vehicles sold, so n = 180. If we try k = 7, which means we would use 7 classes, 2^7 = 128, which is less than 180. Hence, 7 is too few classes. If we let k = 8, then 2^8 = 256, which is greater than 180. So the recommended number of classes is 8.
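A tiny Python sketch of the rule; n = 180 mirrors the vehicle example, and any other n works the same way:

n = 180
k = 1
while 2 ** k <= n:      # keep increasing k until 2^k exceeds n (2^7 = 128 is too small, 2^8 = 256 works)
    k += 1
print(k)                # 8 recommended classes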
Define Probability
A value between zero and one, inclusive, describing the relative possibility (chance or likelihood) an event will occur.
Define Test Statistic
A value, determined from sample information, used to determine whether to reject the null hypothesis.
Define p-value
ALWAYS TRUST P-VALUE CALCULATED BY TI-84. Determining the p-value not only results in a decision regarding H0, but it gives us additional insight into the strength of the decision. This approach reports the probability (assuming that the null hypothesis is true) of getting a value of the test statistic at least as extreme as the value actually obtained. This process compares the probability, called the p-value, with the significance level. If the p-value is smaller than the significance level, H0 is rejected. If it is larger than the significance level, H0 is not rejected.
Define Treatment (ANOVA)
ANOVA was first developed for applications in agriculture, and many of the terms related to that context remain. In particular, the term treatment is used to identify the different populations being examined. For example, treatment refers to how a plot of ground was treated with a particular type of fertilizer.
Define Mean
An average is a measure of location that shows the central value of the data. The most widely used average is the arithmetic mean: the sum of all the values divided by the number of values.
Define Regression Equation
An equation that expresses the linear relationship between two variables. Proper Definition: The equation for the line used to estimate Y on the basis of X is referred to as the regression equation.
Define Best-Subset Regression
Another approach is the best-subset regression. With this method, we look at the best model using one independent variable, the best model using two independent variables, the best model with three, and so on. The criterion is to find the model with the largest R-square value (or a similar fit criterion), regardless of the number of independent variables. Also, each independent variable does not necessarily have a nonzero regression coefficient. Since each independent variable could either be included or not included, there are 2^k - 1 possible models, where k refers to the number of independent variables.
Define Kruskal-Wallis Test
Another nonparametric test is the Kruskal-Wallis H test. Use this test only when you have independent random samples from three or more populations. It uses the H statistic as its test statistic. The H statistic is always a positive number. If it is greater than the chi-square critical value, the null hypothesis should be rejected. The Kruskal-Wallis H test works with three or more populations. H0: the population distributions are identical. H1: at least one population distribution differs from the others. df = (k - 1), using the chi-square table. T1, T2, T3 are the totals of the ranks for each population sample.
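A short Python sketch of the test with three made-up independent samples; scipy reports H and a chi-square-based p-value with df = k - 1:

from scipy import stats

g1 = [68, 72, 77, 70, 65]      # made-up sample from population 1
g2 = [74, 80, 79, 83, 76]      # made-up sample from population 2
g3 = [71, 69, 73, 68, 75]      # made-up sample from population 3

h, p = stats.kruskal(g1, g2, g3)
print(h, p)                    # reject H0 if p is below the chosen significance level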
Define Wilcoxon Signed Ranks Test and Wilcoxon Ranked Sum Test
Another nonparametric test that you can perform is the Wilcoxon signed ranks test. It requires paired observations from two populations, computes the paired difference for each pair, and ranks the absolute values of those differences. It is the nonparametric test used in paired difference experiments. The Wilcoxon rank sum test, in contrast, compares the probability distributions of two independent samples; it requires the data to be ranked and is the nonparametric test used to compare two independent populations.
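A brief Python sketch of both tests; the data values are made-up assumptions:

from scipy import stats

# Paired observations (e.g., the same subjects measured before and after)
before = [125, 130, 118, 140, 135, 128, 132, 126]
after  = [120, 128, 119, 133, 130, 125, 129, 121]
stat, p = stats.wilcoxon(before, after)   # signed ranks test on the paired differences
print(stat, p)

# Two independent samples
a = [14, 18, 21, 17, 19, 22]
b = [12, 15, 16, 13, 18, 14]
z, p = stats.ranksums(a, b)               # rank sum test comparing the two distributions
print(z, p)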
Define Permutation
Any arrangement of r objects selected from a single group of n possible objects.
Define Parameter
Any measurable characteristic of a population is called a parameter. The mean of a population is an example of a parameter.
Define Characteristics/Assumptions of Binomial Probability Distribution
BINOMIAL PROBABILITY EXPERIMENT 1. There are only 2 possible outcomes. An outcome on each trial of an experiment is classified into one of two mutually exclusive categories—a success or a failure. 2. The random variable counts the number of successes in a fixed and known number of trials. 3. The probability of success and failure stay the same for each trial. We need to know this probability beforehand. 4. The trials are independent, meaning that the outcome of one trial does not affect the outcome of any other trial.
Define Interpretation and Formula for Least Squares Value
Basically the sum of the squared residuals. This is the sum of the squared differences, or the least squares value. There is no other line through these 10 data points where the sum of the squared differences is smaller.
Define Chi-Square Test and its Purposes
Basically tests if two variables are related. A chi-square test is a statistical test used to compare observed results with expected results. The purpose of this test is to determine if a difference between observed data and expected data is due to chance, or if it is due to a relationship between the variables you are studying. We can also use the chi-square statistic to formally test for a relationship between two nominal-scaled variables. To put it another way, is one variable independent of the other? Example: H0: parking lot preference is not related to the gender of the student. H1: parking lot preference is related to the gender of the student.
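A minimal Python sketch of a chi-square test of independence on a made-up gender-by-parking-lot table (the counts are illustrative only):

import numpy as np
from scipy import stats

observed = np.array([[20, 30, 10],    # row 1: one gender's lot preferences (fo)
                     [15, 25, 20]])   # row 2: the other gender's lot preferences (fo)

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(chi2, p, dof)    # small p-value: reject H0 that preference is not related to gender
print(expected)        # the fe values that the fo values are compared against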
Define Rules and Interpretation of Graphical Portrayal of Multiple Linear Regression, Depending on Number of Independent Variables
Because there are two independent variables, this relationship is graphically portrayed as a plane. The chart shows the residuals as the difference between the actual Y and the fitted value on the plane. If a multiple regression analysis includes more than two independent variables, we cannot use a graph to illustrate the analysis since graphs are limited to three dimensions.
Define Probability Theory (Science of Uncertainty)
Because there is uncertainty in decision making, it is important that all the known risks involved be scientifically evaluated. Helpful in this evaluation is probability theory, which has often been referred to as the science of uncertainty. Probability theory allows a manager to assess the risks and benefits associated with a set of decision alternatives.
Define Interpretation of Cumulative Distribution on a Graph
Because we are looking at cumulative distributions, the graphs will increase from left to right. (Page 512) We can graph the cumulative distribution of the raw data and the cumulative normal distribution. The graph of the cumulative normal distribution is a straight line. The graph of the raw data will be scattered around the straight line representing the cumulative normal. Using the graph, we can observe that the data are normally distributed if the scatter is relatively close to the straight line that represents the normal cumulative distribution.
Define Univariate Data
Because we are studying a single variable, we refer to this as univariate data.
Define Sign Test Example for Finding P-Value
Calculator: [2nd] [vars] [B: binomcdf] Trials: 9, p: .5, x: 5. Result: .7461. Then 1.000 - .7461 = .2539, the same result as the method below. For example: A car dealer wants to test whether median car sales exceed 20 sales per day. The hypotheses are: H0: m = 20, H1: m > 20. Because we're testing the median, which is the dividing point of the distribution, we can use the binomial distribution. Here are the car sales for 10 days arranged in ascending order: 10, 12, 15, 20, 23, 24, 26, 27, 27, 28. So 6 scores are above 20, one score equals 20 (it is dropped, leaving n = 9), and 3 scores are below 20. Now compute the probability of getting a result this extreme. Because we're working with the binomial distribution (only 2 outcomes), the probability of each outcome is .5. p-value: P(x >= 6) = P(x=6) + P(x=7) + P(x=8) + P(x=9). Using Appendix B9 with n = 9, under the .50 column: P(x >= 6) = .164 + .070 + .018 + .002 = .254 (matching the calculator, allowing for rounding). Since .254 > the .05 level of significance, fail to reject H0.
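The same p-value can be reproduced in Python; n = 9 and x = 6 come from the example above (the observation equal to the median is dropped):

from scipy import stats

n, x = 9, 6
p_value = 1 - stats.binom.cdf(x - 1, n, 0.5)    # P(X >= 6) = 1 - P(X <= 5) = .2539
print(p_value)                                  # .2539 > .05, so fail to reject H0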
Define Classical Probability
Classical probability is based on the assumption that the outcomes of an experiment are equally likely. Using the classical viewpoint, the probability of an event happening is computed by dividing the number of favorable outcomes by the number of possible outcomes: Probability of an event = number of favorable outcomes / total number of possible outcomes.
Define Class Frequency
Count the number of items in each class. The number of observations in each class is called the class frequency. In the $200 up to $600 class there are 8 observations, and in the $600 up to $1,000 class there are 11 observations. Therefore, the class frequency in the first class is 8 and the class frequency in the second class is 11.
Define Regression Sum of Squares (SSR) in ANOVA Table
The regression sum of squares, SSR, is the portion of the total variation in the dependent variable that is explained by the regression equation, that is, by the independent variable(s): SSR = sigma(Y hat - Y bar)^2. It appears on the Regression row of the ANOVA table, and SS total = SSR + SSE.
Define Statistical Inference/Inferential Statistics (Probability)
Descriptive statistics is concerned with summarizing data collected from past events. We now turn to the second facet of statistics, namely, computing the chance that something will occur in the future. This facet of statistics is called statistical inference or inferential statistics. Statistical inference deals with conclusions about a population based on a sample taken from that population.
Define Class Interval/Class Width
Determine the class interval or class width. Generally the class interval or class width is the same for all classes. The classes all taken together must cover at least the distance from the minimum value in the data up to the maximum value. i >= (Maximum value - Minimum value) / k, where i is the class interval and k is the number of classes.
Define Discrete Variable
Discrete variables can assume only certain values, and there are "gaps" between the values. Examples of discrete variables are the number of bedrooms in a house (1, 2, 3, 4, etc.), the number of cars arriving at Exit 25 on I-4 in Florida near Walt Disney World in an hour (326, 421, etc.), and the number of students in each section of a statistics course (25 in section A, 42 in section B, and 18 in section C). We count, for example, the number of cars arriving at Exit 25 on I-4, and we count the number of statistics students in each section. Notice that a home can have 3 or 4 bedrooms, but it cannot have 3.56 bedrooms. Thus, there is a "gap" between possible values. Typically, discrete variables result from counting.
Define Dot Plot
FOR SINGLE VARIABLE. A dot plot groups the data as little as possible, and we do not lose the identity of an individual observation. To develop a dot plot, we display a dot for each observation along a horizontal number line indicating the possible values of the data. If there are identical observations or the observations are too close to be shown individually, the dots are "piled" on top of each other. This allows us to see the shape of the distribution, the value about which the data tend to cluster, and the largest and smallest observations. Dot plots are most useful for smaller data sets, whereas histograms tend to be most useful for large data sets.
Definition and Interpretation of Residuals/Error Values (Regression Analysis)
Fact: the mean of the residuals is 0. In column E, we calculate the residuals, or the error values. This is the difference between the actual values and the predicted values. This value reflects the amount the predicted value of sales is "off" from the actual sales value. Formula: Y - Y hat, where Y hat is the value given by the linear regression equation and Y is the observed value of the dependent variable.
Define Problems with Multicollinearity in Multiple Regression
First, we should point out that multicollinearity does not affect a multiple regression equation's ability to predict the dependent variable. However, when we are interested in evaluating the relationship between each independent variable and the dependent variable, multicollinearity may show unexpected results. A second reason for avoiding correlated independent variables is they may lead to erroneous results in the hypothesis tests for the individual independent variables. This is due to the instability of the standard error of estimate. Several clues that indicate problems with multicollinearity include the following: 1. An independent variable known to be an important predictor ends up having a regression coefficient that is not significant. 2. A regression coefficient that should have a positive sign turns out to be negative, or vice versa. 3. When an independent variable is added or removed, there is a drastic change in the values of the remaining regression coefficients.
Define Inclusive
For the expression P(A or B), the word or suggests that A may occur or B may occur. This also includes the possibility that A and B may occur. This use of or is called the inclusive or.
Define Nominal Level of Measurement
For the nominal level of measurement, observations of a qualitative variable can only be classified and counted. There is no particular order to the labels. To summarize, the nominal level has the following properties: 1. The variable of interest is divided into categories or outcomes. 2. Data categories have no logical sequence. 3. Data categories are mutually exclusive and exhaustive.
Define Standard Normal Probability Distribution
Fortunately, one member of the family can be used to determine the probabilities for all normal probability distributions. It is called the standard normal probability distribution, and it is unique because it has a mean of 0 and a standard deviation of 1.
Define Degrees of Freedom
From Straighterline: Degrees of freedom are associated with a person's ability to choose options. Door prizes can be an example of this. If there are five door prizes, and five people's names are drawn from a hat, each person has the chance to choose a prize. Sort of. Actually, the first person gets to choose from five prizes, the second from four prizes, and so on until the fifth person gets no choice at all. In other words, the person gets whatever prize is left. To illustrate the meaning of degrees of freedom: Assume that the mean of four numbers is known to be 5. The four numbers are 7, 4, 1, and 8. The deviations of these numbers from the mean must total 0. The deviations of 2, 1, 4, and 3 do total 0. If the deviations of 2, 1, and 4 are known, then the value of 3 is fixed (restricted) in order to satisfy the condition that the sum of the deviations must equal 0. Thus, 1 degree of freedom is lost in a sampling problem involving the standard deviation of the sample because one number (the mean) is known.
Define Characteristics/Assumptions of t Distribution
From Straighterline: used when working with a small sample size (n less than 30). The t distribution is a continuous probability distribution, with many characteristics similar to the z distribution. • It is, like the z distribution, a continuous distribution. • It is, like the z distribution, bell-shaped and symmetrical. • There is not one t distribution, but rather a family of t distributions. All t distributions have a mean of 0, but their standard deviations differ according to the sample size, n. There is a t distribution for a sample size of 20, another for a sample size of 22, and so on. The standard deviation for a t distribution with 5 observations is larger than for a t distribution with 20 observations. • The t distribution is more spread out and flatter at the center than the standard normal distribution. As the sample size increases, however, the t distribution approaches the standard normal distribution, because the errors in using s to estimate sigma decrease with larger samples. (page 267)
Define Interpretation of Multiple Standard Error of Estimate
How do we interpret the standard error of estimate of 51.05? It is the typical "error" when we use this equation to predict the cost. First, the units are the same as the dependent variable, so the standard error is in dollars, $51.05. Second, we expect the residuals to be approximately normally distributed, so about 68% of the residuals will be within +/- 1 standard deviation, and 95% will be within 2 standard deviations, and almost all (99.7%) will be within 3 standard deviations. As before with similar measures of dispersion, such as the standard error of estimate in Chapter 13, a smaller multiple standard error indicates a better or more effective predictive equation.
Definition and Characteristics of Least Squares Principle (Regression Analysis)
However, we would prefer a method that results in a single, best regression line. This method is called the least squares principle. It gives what is commonly referred to as the "best-fitting" line. The least squares regression line has some interesting and unique features. First, it will always pass through the point (X bar, Y bar). Second, as we discussed earlier in this section, there is no other line through the data where the sum of the squared deviations is smaller. To put it another way, the term sigma(Y - Y hat)^2 is smaller for the least squares regression than for any other equation. Proper Definition: A mathematical procedure that uses the data to position a line with the objective of minimizing the sum of the squares of the vertical distances between the actual Y values and the predicted values of Y.
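A compact Python sketch of the least squares calculation with made-up (X, Y) pairs, confirming the two properties named above:

import numpy as np

x = np.array([2, 4, 5, 7, 8, 10])
y = np.array([5, 9, 10, 14, 15, 20])

x_bar, y_bar = x.mean(), y.mean()
b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)   # slope
a = y_bar - b * x_bar                                              # Y intercept
y_hat = a + b * x

print(a, b)
print(np.isclose(a + b * x_bar, y_bar))   # True: the line passes through (X bar, Y bar)
print(np.sum((y - y_hat) ** 2))           # sigma(Y - Y hat)^2, the minimized sum of squared deviations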
Define Interpretation of Formula for Confidence Interval For Difference in Treatment Means
If the confidence interval includes zero, there is not a difference between the treatment means. For example, if the left endpoint of the confidence interval has a negative sign and the right endpoint has a positive sign, the interval includes zero and the two means do not differ. e.g.: 5 +/- 12 = -7 up to 17 Note that zero is included in this interval. Therefore, we conclude that there is not a significant difference in the selected treatment means. On the other hand, if the endpoints of the confidence interval have the same sign, this indicates that the treatment means differ. E.g.: the confidence interval would range from -0.60 up to -0.10. Because -0.60 and -0.10 have the same sign, both negative, zero is not in the interval and we conclude that these treatment means differ.
Define Combination Formula (Principles of Counting)
If the order of the selected objects is not important, any selection is called a combination.
Define Interpretation/Assumptions of p-value
If the p-value is less than (a) .10, we have some evidence that H0 is not true. (b) .05, we have strong evidence that H0 is not true. (c) .01, we have very strong evidence that H0 is not true. (d) .001, we have extremely strong evidence that H0 is not true.
Define Interpretation of p-value
If the p-value is less than our selected significance level, then reject the null hypothesis. The decision is the same as when we used the critical value approach. The advantage to using the p-value approach is that the p-value gives us the probability of making a Type I error. So when we reject the null hypothesis in this instance, there is a very small likelihood of committing a Type I error!
Define Multiplication Formula (Principles of Counting)
If there are m ways of doing one thing and n ways of doing another thing, there are (m)(n) ways of doing both.
Define Rules/Limitations of Chi-Square
If there is an unusually small expected frequency in a cell, chi-square (if applied) might result in an erroneous conclusion. This can happen because fe appears in the denominator, and dividing by a very small number makes the quotient quite large! Two generally accepted policies regarding small cell frequencies are: 1. If there are only two cells, the expected frequency in each cell should be at least 5. 2. For more than two cells, chi-square should not be used if more than 20% of the fe cells have expected frequencies less than 5. The dilemma can be resolved by combining categories if it is logical to do so. In the above example, we combine the three vice presidential categories, which satisfies the 20% policy (Page 507).
Define Subjective Probability
If there is little or no experience or information on which to base a probability, it may be estimated subjectively. Essentially, this means an individual evaluates the available opinions and information and then estimates or assigns the probability. This probability is aptly called a subjective probability.
Define Special Rule of Addition (Probability)
If two events A and B are mutually exclusive, the special rule of addition states that the probability of one or the other event's occurring equals the sum of their probabilities. This rule is expressed in the following formula: P(A or B) = P(A) + P(B)
Define Dependence (Probability)
If two events are not independent, they are referred to as dependent.
Define Continuous Random Variable
If we measure something such as the width of a room, the height of a person, or the pressure in an auto-mobile tire, the variable is a continuous random variable. It can assume one of an infinitely large number of values, within a particular range.
Define Outlier
In a box plot, an asterisk identifies an outlier. An outlier is a value that is inconsistent with the rest of the data. It is defined as a value that is more than 1.5 times the interquartile range smaller than Q1 or larger than Q3.
Define Negatively Skewed Distribution
In a negatively skewed distribution there is a single peak, but the observations extend further to the left in the negative direction than to the right. In a negatively skewed distribution, the mean is smaller than the median, and the mode is the largest of the three measures.
Define Symmetric Distribution
In a symmetric set of observations the mean, median, and mode are equal and the data values are evenly spread around these values. The data values below the mean and median are a mirror image of those above.
Define Residual Analysis (Multiple Regression Analysis)
In multiple regression analysis, residual analysis is used to test the requirement that the variation in the residuals is the same for all predicted values of Y. Residuals Y - Y hat are also used to evaluate homoscedasticity.
Define Model of Relationship (Multiple Regression)
In multiple regression we assume there is an unknown population regression equation that relates the dependent variable to the k independent variables. This is sometimes called a model of the relationship. In symbols, we write: Y hat = alpha + Beta1*X1 + Beta2*X2 + . . . + Betak*Xk. This equation is analogous to formula (14-1) except the coefficients are now reported as Greek letters. We use Greek letters to denote population parameters. The computed values of a and bj are sample statistics. These sample statistics are point estimates of the corresponding population parameters alpha and Betaj. For example, the sample regression coefficient b2 is a point estimate of the population parameter Beta2. The sampling distribution of these point estimates follows the normal probability distribution. The means of the sampling distributions are equal to the parameter values to be estimated. Thus, by using the properties of the sampling distributions of these statistics, inferences about the population parameters are possible.
Define Steps in Statistics
In order to make an informed decision, you will need to: 1. Determine whether the existing information is adequate or additional information is required. 2. Gather additional information, if it is needed, in such a way that it does not provide misleading results. 3. Summarize the information in a useful and informative manner. 4. Analyze the available information. 5. Draw conclusions and make inferences while assessing the risk of an incorrect conclusion.
Definition and Interpretation of Variance Inflation Factor (VIF)
In our evaluation of a multiple regression equation, an approach to reducing the effects of multicollinearity is to carefully select the independent variables that are included in the regression equation. A general rule is if the correlation between two independent variables is between -0.70 and 0.70, there likely is not a problem using both of the independent variables. A more precise test is to use the variance inflation factor, usually written VIF: VIF = 1 / (1 - Rj^2), where Rj^2 is the coefficient of determination obtained by regressing the jth independent variable on the other independent variables. A VIF greater than 10 is generally taken as a sign of a multicollinearity problem. If multicollinearity exists, removing one of the variables and rerunning the regression analysis is a common solution. Another solution might be to expand the size of the sample and rerun the regression analysis, hoping that the larger sample solves the problem.
Define Formulas/Rules to Mitigate the Effects of Multicollinearity in Multiple Regression
In our evaluation of a multiple regression equation, an approach to reducing the effects of multicollinearity is to carefully select the independent variables that are included in the regression equation. A general rule is if the correlation between two independent variables is between -0.70 and 0.70, there likely is not a problem using both of the independent variables. A more precise test is to use the variance inflation factor. It is usually written VIF. (See VIF formula in Formulas flashcard) If multicollinearity exists, removing one of the variables and rerunning the regression analysis is a common solution. Another solution might be to expand the size of the sample and rerun the regression analysis hoping that the larger sample solves the problem.
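A sketch of the VIF calculation in Python using plain least squares; the data matrix is made-up and the vif helper is a hypothetical name, not a library function:

import numpy as np

def vif(X):
    # VIF_j = 1 / (1 - Rj^2), where Rj^2 comes from regressing column j on the remaining columns
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])   # add an intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        y_hat = A @ coef
        r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
        out.append(1 / (1 - r2))
    return out

# Made-up independent variables; values above about 10 flag multicollinearity
X = np.array([[65, 3, 1], [70, 4, 2], [72, 4, 2], [80, 6, 3], [85, 7, 3], [90, 8, 4]])
print(vif(X))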
Define Collectively Exhaustive
In probability, a set of events is collectively exhaustive if they cover all of the probability space: i.e., the probability of any one of them happening is 100%. If a set of statements is collectively exhaustive we know at least one of them is true. At least one of the events must occur when an experiment is conducted.
Define Regression Analysis
In regression analysis, our objective is to use the data to position a line that best represents the relationship between the two variables. Our first approach is to use a scatter diagram to visually position the line. If the correlation coefficient is significantly different from zero, then the next step is to develop an equation to express the linear relationship between the two variables. Using this equation, we will be able to estimate the value of the dependent variable Y based on a selected value of the independent variable X. The technique used to develop the equation and provide the estimates is called regression analysis.
Define Confidence Interval and Prediction Interval Using Regression Equation (Y hat)
In short: the confidence interval is for the mean of all observations; the prediction interval is for the value of one particular observation. The first interval estimate is called a confidence interval. It is used when the regression equation (Y hat) is used to predict the mean value of Y for a given value of X. For example, we would use a confidence interval to estimate the mean salary of all executives in the retail industry based on their years of experience. The second interval estimate is called a prediction interval. It is used when the regression equation is used to predict an individual Y (n = 1) for a given value of X. For example, we would estimate the salary of a particular retail executive who has 20 years of experience. The prediction interval will always be wider because of the addition of 1 under the radical in the second equation.
Define Spurious Correlations
In statistics, a spurious relationship or spurious correlation is a mathematical relationship in which two or more events or variables are associated but not causally related, due to either coincidence or the presence of a certain third, unseen factor. The incomes of professors and the number of inmates in mental institutions have increased proportionately. Further, as the population of donkeys has decreased, there has been an increase in the number of doctoral degrees granted. Relationships such as these are called spurious correlations.
Define Ratio Level of Measurement
In summary, the properties of the ratio-level data are: 1. Data classifications are mutually exclusive and exhaustive. 2. Data classifications can be ranked or ordered. 3. The difference between classifications is a consistent unit of measurement. 4. Zero represents that there is no characteristic present. 5. The ratio between two classifications is meaningful. For example: Examples of ratio data are gross pay, hours worked, bonuses, and so on. A business example is company profitability, which can be negative, zero, or positive. In this example, Mary's bonus of $0 is a true zero. She receives no bonus! Also, the ratio of Bob's bonus to Tom's is two to one. Bob gets twice the amount of bonus that Tom receives. This ratio is useful because it describes the relationship between the bonus amounts. In ordinal, since there is ranking we can figure which is better, but not how much better. Ratio level solves this problem.
Define Standard Error in Regression ANOVA Table
In the column to the right of the regression coefficient is a column labeled "Standard Error." This is a value similar to the standard error of the mean. Recall that the standard error of the mean reports the variation in the sample means. In a similar fashion, these standard errors report the possible variation in slope and intercept values.
Definition and Advantage of Goodness-of-Fit Test (Using Chi-Square as Test Statistic)
In the goodness-of-fit test, the chi-square distribution is used to determine how well an observed distribution of observations "fits" an expected distribution of observations. The goodness-of-fit test is one of the most commonly used statistical tests. It is particularly useful because it requires only the nominal level of measurement. So we are able to conduct a test of hypothesis on data that has been classified into groups. Our first illustration of this test involves the case when the expected cell frequencies are equal. As the full name implies, the purpose of the goodness-of-fit test is to compare an observed distribution to an expected distribution.
Define Reason for Pooling Variance
In two-sample pooled t test, the two sample standard deviations are pooled to form a single estimate of the unknown population standard deviation. In essence, we compute a weighted mean of the two sample standard deviations and use this value as an estimate of the unknown population standard deviation. The weights are the degrees of freedom that each sample provides. Why do we need to pool the sample standard deviations? Because we assume that the two populations have equal standard deviations, the best estimate we can make of that value is to combine or pool all the sample information we have about the value of the population standard deviation.
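A short Python sketch of pooling, using made-up samples and assuming normal populations with equal standard deviations; scipy's pooled t test is shown only as a check:

import numpy as np
from scipy import stats

x1 = np.array([42, 45, 38, 44, 40, 43])   # made-up sample 1
x2 = np.array([36, 39, 35, 41, 37])       # made-up sample 2
n1, n2 = len(x1), len(x2)
s1, s2 = x1.std(ddof=1), x2.std(ddof=1)

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)   # pooled variance (weights are the df)
t = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1/n1 + 1/n2))    # pooled two-sample t statistic
print(t)
print(stats.ttest_ind(x1, x2, equal_var=True))                # the t statistic here should match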
Define Transformation (of Data)
Instead of using the actual values of the dependent variable, Y, we would create a new dependent variable by computing the log to the base 10 of Y, Log(Y). This calculation is called a transformation. Other common transformations include taking the square root, the reciprocal, or squaring one or both of the variables.
Definition and Interpretation of Coefficient of Determination (R-square, r^2)
It can never be a negative value. The proportion of the total variation in the dependent variable Y that is explained, or accounted for, by the variation in the independent variable X. The coefficient of determination is easy to compute. It is the correlation coefficient squared. Therefore, the term R-square is also used. To better interpret the coefficient of determination, convert it to a percentage. Hence, we say that 57.6% of the variation in the number of copiers sold is explained, or accounted for, by the variation in the number of sales calls. If it were possible to make perfect predictions, the coefficient of determination would be 100%. That would mean that the independent variable, number of sales calls, explains or accounts for all the variation in the number of copiers sold. A coefficient of determination of 100% is associated with a correlation coefficient of +1.0 or -1.0.
Define Interval Level of Measurement
It includes all the characteristics of the ordinal level, but in addition, the difference between values is a constant size. If the distances between the numbers make sense, but the ratios don't, then you have an interval scale of measurement. Properties of the interval level data are: 1. Data classifications are mutually exclusive and exhaustive. 2. Data classifications are ranked or ordered according to the amount of the characteristic they possess. 3. The difference between classifications is a consistent unit of measurement. 4. Equal differences in the characteristic are represented by equal differences in the measurements. 5. Zero does not mean that there is no characteristic present. It is just another possible result. For example: A person can wear a zero dress size. That doesn't mean the dress has no size. It is simply one of the possible dress sizes. Using basic math (or measurements), you can determine the difference between a zero dress size and a four, six, or nine dress size.
Define One-Tailed Test
It is called a one-tailed test because the rejection region is only in one tail of the curve. In this case, it is in the right, or upper, tail of the curve. One-tailed test measures increase/decrease in the problem, whereas two-tailed test is concerned only with whether or not there is a difference (regardless of increase/decrease). Null Hypothesis: mu <= 200 Alternate Hypothesis: mu > 200
Define Interpretation of ANOVA table in Multiple Regression (Regression Variance, Error Variance)
Let's discuss a few other points about the ANOVA table. The ANOVA table separates the variance into two components: the regression variance and the error variance. The error variance is the portion of the variance that is not explained by the variables. The adjusted R-square value (often referred to as adjusted R2) represents the R-square value adjusted for the number of independent variables, so that the importance of the independent variables is not overstated.
Define Assumptions Before Two-Sample Test of Means, SD Known
Let's review the assumptions necessary for using the formula: • The two populations follow normal distributions. • The two samples must be unrelated, that is, independent. • The standard deviations for both populations must be known.
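A minimal Python sketch of the resulting two-sample z test; the summary numbers are made-up and the population standard deviations are assumed known:

import numpy as np
from scipy import stats

x_bar1, sigma1, n1 = 7.2, 0.40, 35    # made-up mean, known sigma, sample size for population 1
x_bar2, sigma2, n2 = 6.9, 0.30, 40    # made-up mean, known sigma, sample size for population 2

# Variance of the distribution of differences = sum of the two individual variances (each divided by its n)
z = (x_bar1 - x_bar2) / np.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
p = 2 * stats.norm.sf(abs(z))         # two-tailed p-value
print(z, p)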
Define Measures of Location
Measures of location are often referred to as averages. The purpose of a measure of location is to pinpoint the center of a distribution of data, using tools like the arithmetic mean, weighted mean, geometric mean, median, and mode.
Define Measures of Dispersion
Measures that assess the variation or the spread of the data, also known as the dispersion of values across a data set. To describe the dispersion, we use measures such as range, mean deviation, variance, and standard deviation.
Define Measures of Position
Measures that help determine the location of values that divide a set of observations into equal parts, such as quartiles, deciles, and percentiles.
Define Correlation Matrix
MegaStat: Correlation/Regression --> Correlation Matrix. Are you familiar with the concept of a family tree? It is a diagram that shows who is related and how closely they are related. This is similar to a correlation matrix. It shows the interrelationships between each variable and those around it. Your first step is to compare the correlation of each independent variable to the dependent variable (test score). The closer a value is to positive or negative 1, the stronger the correlation. Now check for multicollinearity, which refers to correlation between the independent variables. The normal standard is that any value above 0.70 or below -0.70 indicates a multicollinearity problem. For example, the hours studying and number of classes missed have a strong negative correlation. The normal solution to this problem is to drop one of the correlated variables and run the regression analysis again. Another solution is to expand the size of the sample and rerun the analysis, hoping that the larger sample solves the problem.
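A minimal sketch, assuming pandas is available; the study-habit data (hours studying, classes missed, test score) are invented for illustration:

import pandas as pd

df = pd.DataFrame({
    "hours_studying": [5, 8, 2, 6, 9, 3],
    "classes_missed": [4, 1, 6, 2, 0, 5],
    "test_score": [70, 88, 55, 80, 92, 60],
})

corr = df.corr()        # Pearson correlation matrix
print(corr.round(2))
# check the two independent variables for multicollinearity (|r| > 0.70)
print(abs(corr.loc["hours_studying", "classes_missed"]) > 0.70)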
Define Descriptive Statistics
Methods of organizing, summarizing, and presenting data in an informative way
Define Type II Error
Not rejecting the null hypothesis when it is false. The probability of committing a Type II error is designated by the Greek letter beta.
Define Interpretation of Results of a Scatter Chart
Note that, if the correlation is weak, there is considerable scatter about a line drawn through the center of the data. For the scatter diagram representing a strong relationship, there is very little scatter about the line.
Define Correlation Analysis
Numerical measures that precisely describe the relationship between the two variables. This group of statistical techniques is called correlation analysis. The basic idea of correlation analysis is to report the relationship between two variables. The usual first step is to plot the data in a scatter diagram.
Define Continuous Variable
Observations of a continuous variable can assume any value within a specific range. Examples of continuous variables are the air pressure in a tire and the weight of a shipment of tomatoes. Other examples are the amount of raisin bran in a box and the duration of flights from Orlando to San Diego. Grade point average (GPA) is a continuous variable. We could report the GPA of a particular student as 3.2576952. The usual practice is to round to 3 places—3.258. Typically, continuous variables result from measuring.
Define Scatter Diagram
One graphical technique we use to show the relationship between variables is called a scatter diagram. To draw a scatter diagram, we need two variables. We scale one variable along the horizontal axis (X-axis) of a graph and the other variable along the vertical axis (Y-axis). Usually one variable depends to some degree on the other.
Define Ordinal Level of Measurement
Ordinal level data is also used to rank items in a list. In summary, the properties of an ordinal level of measurement are: 1. Data are represented by an attribute. A student's rating of a professor and a state's "business climate" are examples. 2. The data can only be ranked or ordered, because the ordinal level of measurement assigns relative values such as high, medium, or low; or good, better, or best. 3. The data classifications are mutually exclusive and exhaustive.
Define Correlation Coefficient of Two Variables
Originated by Karl Pearson about 1900, the correlation coefficient describes the strength of the linear relationship between two sets of interval-scaled or ratio-scaled variables. Designated r, it is often referred to as Pearson's r and as the Pearson product-moment correlation coefficient. It can assume any value from -1.00 to +1.00 inclusive. A correlation coefficient of -1.00 or +1.00 indicates perfect correlation. +1.00 means the two variables are perfectly related in a positive linear sense. -1.00 means the two variables are perfectly related in an inverse linear sense. 0 means there is absolutely no relationship between the two variables. A correlation coefficient r close to 0 (say, +.08) shows that the linear relationship is quite weak. The same conclusion is drawn if r = -.08. Coefficients of -.91 and +.91 have equal strength; both show very strong correlation between the two variables. Thus, the strength of the correlation does not depend on the direction (either - or +).
Define 3 Variables Needed for Appropriate Sample Size (for Population Mean)
Our decision is based on three variables: 1. The margin of error the researcher will tolerate. 2. The level of confidence desired, for example, 95%. 3. The variation or dispersion of the population being studied.
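A minimal sketch of the usual sample-size formula for estimating a mean, n = (z*sigma/E)^2; the z value, standard deviation, and margin of error below are assumed for illustration:

import math

z = 1.96      # 95% level of confidence
sigma = 20    # estimate of the population standard deviation (e.g., from a pilot study)
E = 5         # margin of error the researcher will tolerate

n = (z * sigma / E) ** 2
print(math.ceil(n))   # always round up; here n = 62 observations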
Define Law of Large Numbers (Empirical Probability)
Over a large number of trials, the empirical probability of an event will approach its true probability.
Define Characteristics/Assumptions of Poisson Probability Distribution
POISSON PROBABILITY EXPERIMENT 1. The random variable is the number of times some event occurs during a defined interval. 2. The probability of the event is proportional to the size of the interval. 3. The intervals do not overlap and are independent.
Define Perfect Prediction
Perfect prediction, which is finding the exact outcome, in economics and business is practically impossible.
Definition and Interpretation of Standard Error of Estimate
Proper Definition: A measure of the dispersion, or scatter, of the observed values around the line of regression for a given value of X. If the standard error of estimate is small, this indicates that the data are relatively close to the regression line and the regression equation can be used to predict Y with little error. If the standard error of estimate is large, this indicates that the data are widely scattered around the regression line, and the regression equation will not provide a precise estimate of Y. A measure that describes how precise the prediction of Y is based on X or, conversely, how inaccurate the estimate may be. This measure is the standard error of estimate, symbolized by Sy.x. The subscript y.x is interpreted as the standard error of y for a given value of x. It is the same concept as the standard deviation. The SD measures the dispersion around the mean. The standard error of estimate measures the dispersion about the regression line for a given value of X.
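A minimal sketch, with invented data, of computing the standard error of estimate as the square root of SSE divided by (n - 2) around a fitted least-squares line (numpy assumed):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

slope, intercept = np.polyfit(x, y, 1)     # least-squares line
y_hat = intercept + slope * x              # predicted values
sse = np.sum((y - y_hat) ** 2)             # sum of squared residuals
s_yx = np.sqrt(sse / (len(x) - 2))         # standard error of estimate
print(round(s_yx, 3))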
Define Dummy Variable
Qualitative variables describe a particular quality, such as male or female. To use a qualitative variable in regression analysis, we use a scheme of dummy variables in which one of the two possible conditions is coded 0 and the other 1. Alternative definition: A variable in which there are only two possible outcomes. For analysis, one of the outcomes is coded a 1 and the other a 0.
Define Quartile, Decile, Percentile
Quartiles divide a set of observations into four equal parts. The first quartile, usually labeled Q1, is the value below which 25% of the observations occur, and the third quartile, usually labeled Q3, is the value below which 75% of the observations occur. Similarly, deciles divide a set of observations into 10 equal parts and percentiles into 100 equal parts.
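A minimal sketch using numpy's percentile function on invented delivery-time data (note that software packages may use slightly different interpolation rules for quartiles):

import numpy as np

times = np.array([13, 15, 18, 20, 22, 24, 25, 28, 30, 35, 41, 46])
q1, q2, q3 = np.percentile(times, [25, 50, 75])   # quartiles (q2 is the median)
d9 = np.percentile(times, 90)                     # ninth decile
print(q1, q3, q3 - q1, d9)                        # Q1, Q3, interquartile range, D9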
Define Empirical Rule for Standard Error of Estimate
Recall from Chapter 7 that if the values follow a normal distribution, then the mean plus or minus one standard deviation will encompass 68% of the observations, the mean plus or minus two standard deviations will encompass 95% of the observations, and the mean plus or minus three standard deviations will encompass virtually all of the observations. The same relationship exists between the predicted values Y hat and the standard error of estimate: 1. Y hat +/- sy.x will include the middle 68% of the observations. 2. Y hat +/- 2*sy.x will include the middle 95% of the observations. 3. Y hat +/- 3*sy.x will include virtually all the observations.
Define Type I Error (Level of Significance)
Rejecting the null hypothesis, H0, when it is true. The probability of committing a Type I error is denoted by Greek letter alpha.
Definition and Interpretation of F Statistic in Global Test
Remember that the F-statistic tests the basic null hypothesis that two variances or, in this case, two mean squares are equal. In our global multiple regression hypothesis test, we will reject the null hypothesis, H0, that all regression coefficients are zero when the regression mean square is larger in comparison to the residual mean square. If this is true, the F-statistic will be relatively large and in the far right tail of the F-distribution, and the p-value will be small, that is, less than our choice of our significance level of 0.05. Thus, we will reject the null hypothesis.
Define Reasons to Study Statistics
So why is statistics required in so many majors? The first reason is that numerical information is everywhere. A second reason for taking a statistics course is that statistical techniques are used to make decisions that affect our daily lives. A third reason for taking a statistics course is that the knowledge of statistical methods will help you understand how decisions are made and give you a better understanding of how they affect you. In summary, there are at least three reasons for studying statistics: (1) data are everywhere, (2) statistical techniques are used to make many decisions that affect our lives, and (3) no matter what your career, you will make professional decisions that involve data. An understanding of statistical methods will help you make these decisions more effectively.
Define 5-Step Procedure/Assumptions for Hypothesis Testing
Step 1: State null (H0) and alternate (H1) hypotheses. Step 2: Select a level of significance, that is alpha. Step 3: Identify an appropriate test statistic. Step 4: Formulate a decision rule based on steps 1, 2, and 3. Step 5: Take a sample, arrive at a decision -----> Do not reject H0, OR, Reject H0 and accept H1
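A minimal sketch of the five steps for a one-tailed, one-sample z test with a known population standard deviation; the numbers (sigma = 16, n = 36, sample mean = 204.5) are assumed for illustration, and scipy is used only for the critical value:

import math
from scipy import stats

# Step 1: H0: mu <= 200   H1: mu > 200
# Step 2: level of significance alpha = 0.05
# Step 3: test statistic is z because sigma is known
mu0, sigma, n, x_bar = 200, 16, 36, 204.5
z = (x_bar - mu0) / (sigma / math.sqrt(n))
# Step 4: decision rule -- reject H0 if z exceeds the critical value
z_crit = stats.norm.ppf(0.95)                 # about 1.645
# Step 5: take the sample and decide
print(round(z, 3), round(z_crit, 3), "reject H0" if z > z_crit else "do not reject H0")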
Define Primary Units
Subdivisions created through Cluster Sampling are often called primary units.
Define Poisson Probability Distribution
The Poisson probability distribution describes the number of times some event occurs during a specified interval. The interval may be time, distance, area, or volume. The Poisson probability distribution is always positively skewed and the random variable has no specific upper limit. As mu becomes larger, the Poisson distribution becomes more symmetrical.
Define Assumptions of Poisson Probability Distribution
The Poisson probability distribution is always positively skewed and the random variable has no specific upper limit. As mu becomes larger, the Poisson distribution becomes more symmetrical. The distribution is based on two assumptions. The first assumption is that the probability is proportional to the length of the interval. The second assumption is that the intervals are independent. To put it another way, the longer the interval, the larger the probability, and the number of occurrences in one interval does not affect the other intervals. This distribution is also a limiting form of the binomial distribution when the probability of a success is very small and n is large. It is often referred to as the "law of improbable events," meaning that the probability, pi, of a particular event's happening is quite small. The Poisson distribution is a discrete probability distribution because it is formed by counting.
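A minimal sketch using scipy.stats.poisson; the mean of 0.3 occurrences per interval is an assumed value for illustration:

from scipy.stats import poisson

mu = 0.3                        # assumed mean number of occurrences per interval
print(poisson.pmf(0, mu))       # P(exactly 0 occurrences)
print(poisson.pmf(1, mu))       # P(exactly 1 occurrence)
print(1 - poisson.cdf(1, mu))   # P(2 or more occurrences)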
Define Chebyshev's Theorem
The Russian mathematician P. L. Chebyshev (1821-1894) developed a theorem that allows us to determine the minimum proportion of the values that lie within a specified number of standard deviations of the mean. For example, according to Chebyshev's theorem, at least three out of every four values, or 75%, must lie between the mean plus two standard deviations and the mean minus two standard deviations. This relationship applies regardless of the shape of the distribution. Further, at least eight of nine values, or 88.9%, will lie between plus three standard deviations and minus three standard deviations of the mean. At least 24 of 25 values, or 96%, will lie between plus and minus five standard deviations of the mean.
Define SS, SST, and SSE (ANOVA)
The SS total term is the total variation attributable to each source, SST is the variation due to the treatments, and SSE is the variation within the treatments or the random error. There are three values, or sum of squares, used to compute the test statistic F. You can determine these values by obtaining SS total and SSE, then finding SST by subtraction.
Define Advantages of Best-Subset Method
The advantage of the best-subset method is that it may examine combinations of independent variables not considered in the stepwise method.
Define Alternate Hypothesis (H1)
The alternate hypothesis describes what you will conclude if you reject the null hypothesis. It is written H1 and is read "H sub one." It is also referred to as the research hypothesis. The alternate hypothesis is accepted if the sample data provide us with enough statistical evidence that the null hypothesis is false. We always conduct tests of hypothesis that refer to population parameters, never sample statistics. This means you will never see the sample mean, X bar, in the null or the alternate hypothesis.
Define Arithmetic Mean
The arithmetic mean is a widely used measure of location. It has several important properties: 1. Every set of interval- or ratio-level data has a mean. Recall from Chapter 1 that ratio-level data include such data as ages, incomes, and weights, with the distance between numbers being constant. 2. All the values are included in computing the mean. 3. The mean is unique. That is, there is only one mean in a set of data. 4. The sum of the deviations of each value from the mean is zero.
Define Properties of Mean
The arithmetic mean is a widely used measure of location. It has several important properties: 1. Every set of interval- or ratio-level data has a mean. Recall from Chapter 1 that ratio-level data include such data as ages, incomes, and weights, with the distance between numbers being constant. 2. All the values are included in computing the mean. 3. The mean is unique. That is, there is only one mean in a set of data. Later in the chapter, we will discover a measure of location that might appear twice, or more than twice, in a set of data. 4. The sum of the deviations of each value from the mean is zero. 5. The mean does have a weakness. Recall that the mean uses the value of every item in a sample, or population, in its computation. If one or two of these values are either extremely large or extremely small compared to the majority of data, the mean might not be an appropriate average to represent the data.
Define Mean Deviation
The arithmetic mean of the absolute values of the deviations from the arithmetic mean. It is a measure of the average distance between an observation and the mean of the observations. Advantage of the Mean Deviation: Uses all values in the computation, whereas range only uses maximum and minimum values. Why do we ignore the signs of the deviations from the mean? If we didn't, the positive and negative deviations from the mean would exactly offset each other, and the mean deviation would always be zero. Such a measure (zero) would be a useless statistic.
Definition and Interpretation of Residual Plot
The best regression line passes through the center of the data in a scatter plot. In this case, you would find a good number of the observations above the regression line (these residuals would have a positive sign), and a good number of the observations below the line (these residuals would have a negative sign). Further, the observations would be scattered above and below the line over the entire range of the independent variable. • The residuals are plotted on the vertical axis and are centered around zero. There are both positive and negative residuals. • The residual plots show a random distribution of positive and negative values across the entire range of the variable plotted on the horizontal axis. • If the points are scattered and there is no obvious pattern, there is no reason to doubt the linearity assumption.
Define Binomial Probability Distribution
The binomial probability distribution is a widely occurring discrete probability distribution. One characteristic of a binomial distribution is that there are only two possible outcomes on a particular trial of an experiment. Outcomes are mutually exclusive.
Define Adjusted Coefficient of Determination
The coefficient of determination tends to increase as more independent variables are added to a multiple regression model. Each new independent variable causes the predictions to be more accurate. That, in turn, makes SSE smaller and SSR larger. Hence, R^2 increases only because the total number of independent variables increases and not because the added independent variable is necessarily a good predictor of the dependent variable. In fact, if the number of variables, k, and the sample size, n, are equal, the coefficient of determination is 1.0. In practice, this situation is rare and would also be ethically questionable. To balance the effect that the number of independent variables has on the coefficient of multiple determination, statistical software packages use an adjusted coefficient of multiple determination.
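A minimal sketch of the adjustment most packages report, adjusted R2 = 1 - [SSE/(n - (k + 1))] / [SS total/(n - 1)]; the sums of squares below are invented so the arithmetic is easy to follow:

def adjusted_r2(sse, ss_total, n, k):
    # n = sample size, k = number of independent variables
    return 1 - (sse / (n - (k + 1))) / (ss_total / (n - 1))

# illustrative values: R-square = 1 - 200/800 = .75, but adjusted R-square = .70
print(adjusted_r2(sse=200, ss_total=800, n=25, k=4))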
Define Interpretation of Chi-Square Value
The decision rule indicates that if there are large differences between the observed and expected frequencies, resulting in a computed X^2 of more than 7.815, the null hypothesis should be rejected. However, if the differences between fo and fe are small, the computed X^2 value will be 7.815 or less, and the null hypothesis should not be rejected. The reasoning is that such small differences between the observed and expected frequencies are probably due to chance.
Define Dependent Variable (Correlation Analysis)
The dependent variable is the variable that is being predicted or estimated. It can also be described as the result or outcome for a known value of the independent variable. The dependent variable is random. That is, for a given value of the independent variable, there are many possible outcomes for the dependent variable. In this example, notice that five different sales representatives made 20 sales calls. The result or outcome of making 20 sales calls is three different values of the dependent variable.
Definition and Interpretation of Sampling Error
The difference between a sample statistic and its corresponding population parameter. Sometimes these errors are positive values, indicating that the sample mean overestimated the population mean; other times they are negative values, indicating the sample mean was less than the population mean. If you were to determine the sum of these sampling errors over a large number of samples, the result would be very close to zero. This is true because the sample mean is an unbiased estimator of the population mean.
Define Critical Value
The dividing point between the region where the null hypothesis is rejected and the region where it is not rejected.
Define Population
The entire set of individuals or objects of interest.
Define Spearman's Rank Correlation Coefficient Test
The final nonparametric method to look at is Spearman's rank correlation coefficient test. Use this method when you want to determine whether there is a relationship between two variables. Spearman's rank correlation is computed from the ranks of the observations on each variable rather than from the raw values, so it measures the strength of the association between the two sets of ranks. A Spearman's correlation coefficient value of .90 indicates a strong positive correlation.
Define Null Hypothesis (H0)
The first step is to state the hypothesis being tested. It is called the null hypothesis, designated H0, and read "H sub zero." The capital letter H stands for hypothesis, and the subscript zero implies "no difference." There is usually a "not" or a "no" term in the null hypothesis, meaning that there is "no change." Generally speaking, the null hypothesis is developed for the purpose of testing. We either reject or fail to reject the null hypothesis. The null hypothesis is a statement that is not rejected unless our sample data provide convincing evidence that it is false. We always conduct tests of hypothesis that refer to population parameters, never sample statistics. This means you will never see the sample mean, X bar, in the null or the alternate hypothesis.
Define Proportion
The fraction, ratio, or percent indicating the part of the sample or the population having a particular trait of interest.
Define General Rule of Addition
The general rule of addition is used to compute the probability that at least one of two events, A and B, occurs when the events are not mutually exclusive: P(A or B) = P(A) + P(B) - P(A and B).
Define Rules/Assumptions of Confidence Interval using Central Limit Theorem
The higher the level of confidence, the wider the confidence interval. The results of the central limit theorem allow us to make the following general confidence interval statements using z-statistics: 1. Ninety-five percent of all confidence intervals computed from random samples selected from a population will contain the population mean. These intervals are computed using a z-statistic equal to 1.96. 2. Ninety percent of all confidence intervals computed from random samples selected from a population will contain the population mean. These confidence intervals are computed using a z-statistic equal to 1.65.
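A minimal sketch of a 95% confidence interval for a mean using x-bar +/- z * sigma / sqrt(n); the sample mean, sigma, and n are assumed values:

import math

x_bar, sigma, n = 32.0, 6.0, 49
z = 1.96                                   # z-statistic for 95% confidence
margin = z * sigma / math.sqrt(n)          # margin of error
print(x_bar - margin, x_bar + margin)      # roughly (30.32, 33.68)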
Definition and Interpretation of Levels of Confidence
The higher the level of confidence, the wider the confidence interval. These confidence interval statements provide examples of levels of confidence and are called a 95% confidence interval and a 90% confidence interval. The 95% and 90% are the levels of confidence and refer to the percentage of similarly constructed intervals that would include the parameter being estimated—in this case, the population mean. For example, for a 95% confidence interval, about 95% of the similarly constructed intervals would include the population parameter.
Define Independent Variable (Correlation Analysis)
The independent variable provides the basis for estimation. It is the predictor variable. For example, we would like to predict the expected number of copiers sold if a salesperson makes 20 sales calls. Notice that we choose this value. The independent variable is not a random number.
Define Interquartile Range
The interquartile range is the distance between the first and the third quartile. It shows the spread or dispersion of the middle 50% of the observations.
Define Margin of Error
The margin of error is actually the amount that is added and subtracted from the point estimate to find the endpoints of a confidence interval.
Define Variance (Probability Distribution)
The mean is a typical value used to summarize a discrete probability distribution. However, it does not describe the amount of spread (variation) in a distribution. The variance does this.
Define Mean (Expected Value in Probability Distribution)
The mean of a probability distribution is also referred to as its expected value. It is a weighted average where the possible values of a random variable are weighted by their corresponding probabilities of occurrence.
Define Statistic
The mean of a sample, or any other measure based on sample data, is called a statistic. A statistic is a characteristic of a sample.
Define Inferential Statistics
The methods used to estimate a property of a population on the basis of a sample. You might think of inferential statistics as a "best guess" of a population value based on sample information.
Define Median
The midpoint of the values after they have been ordered from the minimum to the maximum, or the maximum to the minimum values. For the median, the data must be at least an ordinal level of measurement.
Define Characteristics/Assumptions of Null and Alternate Statements
The null hypothesis represents the current or reported condition. It is written H0: mu = 15. The alternate hypothesis is that the statement is not true, that is, H1: mu /= 15. It is important to remember that no matter how the problem is stated, the null hypothesis will always contain the equal sign. The equal sign (=) will never appear in the alternate hypothesis. Why? Because the null hypothesis is the statement being tested, and we need a specific value to include in our calculations. We turn to the alternate hypothesis only if the data suggests the null hypothesis is untrue. We always conduct tests of hypothesis that refer to population parameters, never sample statistics. This means you will never see the sample mean in the null or the alternate hypothesis.
Define Independence (Probability)
The occurrence of one event has no effect on the probability of the occurrence of another event.
Define Mutually Exclusive
The occurrence of one event means that none of the other events can occur at the same time.
Define Coefficient of Multiple Determination
The percent of variation in the dependent variable explained by the variation in the set of independent variables, X1, X2, X3, . . . Xk. It is computed as SSR, the regression sum of squares, divided by SS total; SSR is the portion of the total variation that is explained by the independent variables, while SSE is the portion left unexplained.
Define Permutation Formula (Principles of Counting)
The permutation formula is applied to find the possible number of outcomes when there is only one group of objects.
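A minimal sketch of the permutation formula nPr = n!/(n - r)! in Python; arranging 3 of 8 objects in order is an invented example:

import math

n, r = 8, 3
print(math.factorial(n) // math.factorial(n - r))   # 336 ordered arrangements
# on Python 3.8+ the same result is available as math.perm(n, r)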
Define Conditional Probability
The probability of a particular event occurring, given that another event has occurred.
Define Empirical Probability
The probability of an event happening is the fraction of the time similar events happened in the past.
Define Level of Significance
The probability of rejecting the null hypothesis when it is true. The level of significance is designated the Greek letter alpha. It is also sometimes called the level of risk. This may be a more appropriate term because it is the risk you take of rejecting the null hypothesis when it is really true.
Define Residual/Error Sum of Squares (SSE) in ANOVA Table
The residual sum of squares is the sum of the squared differences between the observed values of the dependent variable, Y, and their corresponding estimated or predicted values, Y hat. Notice that this difference is the error of estimating or predicting the dependent variable with the multiple regression equation.
Define Sign Test
The sign test is a nonparametric method that is particularly suited to distributions that are highly skewed, have small sample sizes, or are not normally shaped. However, it can be used on any population. It's a nonparametric test used to hypothesize about a population median.
Define Z Value / Z Score
The signed distance between a selected value, designated X, and the mean divided by the standard deviation. So, a z value is the distance from the mean, measured in units of the standard deviation.
Define Range
The simplest measure of dispersion is the range. It is the difference between the maximum and the minimum values in a data set.
Definition and Assumptions of ANOVA (Analysis of Variance)
The simultaneous comparison of several population means is called analysis of variance (ANOVA). In other words, another use of the F distribution is the analysis of variance (ANOVA) technique in which we compare three or more population means to determine whether they could be equal. To use ANOVA, we assume the following: 1. The populations follow the normal distribution. 2. The populations have equal standard deviations. 3. The populations are independent. 4. The data must be at least interval-scale. When these conditions are met, F is used as the distribution of the test statistic.
Define Slope, b
The slope or gradient of a line is a number that describes both the direction and the steepness of the line.
Define Special Rule of Multiplication
The special rule of multiplication requires that two events A and B are independent. Two events are independent if the occurrence of one event does not alter the probability of the occurrence of the other event.
Definition and Interpretation of Standard Deviation
The square root of the population variance is the population standard deviation. How to Interpret Standard Deviation: The smaller the standard deviation, the more closely the values cluster about the mean, and the more reliable the mean is as a representative measure. A larger standard deviation means more dispersion in the values.
Definition and Interpretation of Standard Error
The standard error of the mean reports the variation in the distribution of sample means. It is really the standard deviation of the distribution of sample means. (How the size of the standard error is affected by SD and n: large SD = large SE, but large n = small SE.) The size of the standard error is affected by two values. The first is the standard deviation of the population. The larger the population standard deviation, the larger the standard error, sigma divided by the square root of n. If the population is homogeneous, resulting in a small population standard deviation, the standard error will also be small. However, the standard error is also affected by the number of observations in the sample. A large number of observations in the sample will result in a small standard error of the mean, indicating that there is less variability in the sample means.
Define Advantages of Stepwise Regression
The stepwise method develops the same regression equation, selects the same independent variables, and finds the same coefficient of determination as the global and individual tests described earlier in the chapter. It is also more direct than using a combination of the global and individual procedures. The advantages of the stepwise method are: 1. Only independent variables with significant regression coefficients are entered into the equation. 2. The steps involved in building the regression equation are clear. 3. It is efficient in finding the regression equation with only significant regression coefficients. 4. The changes in the multiple standard error of estimate and the coefficient of determination are shown.
Definition and Interpretation of Treatment Variation
The sum of the squared differences between each treatment mean and the grand or overall mean. If there is considerable variation among the treatment means, it is logical that this term will be large. If the treatment means are similar, this term will be a small value. The smallest possible value would be zero. This would occur when all the treatment means are the same.
Define Sum of Squares (SS, Total Variation)
The term "SS" located in the middle of the ANOVA table refers to the sum of squares. Notice that there is a sum of squares for each source of variation (treatments/regression, residual/error). The sum of squares column shows the amount of variation attributable to each source. The total variation of the dependent variable, Y, is summarized in SS total.
Define Mean Square, MST, and MSE (ANOVA)
The term mean square is another expression for an estimate of the variance. The mean square for treatments is SST divided by its degrees of freedom. The result is the mean square for treatments and is written MST. Compute the mean square error in a similar fashion. To be precise, divide SSE by its degrees of freedom. To complete the process and find F, divide MST by MSE.
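A minimal sketch that builds these quantities for three hypothetical treatments of four observations each (numpy assumed); scipy.stats.f_oneway would return the same F:

import numpy as np

t1 = np.array([9.0, 7.0, 11.0, 9.0])
t2 = np.array([12.0, 10.0, 14.0, 12.0])
t3 = np.array([6.0, 5.0, 9.0, 8.0])

data = np.concatenate([t1, t2, t3])
grand_mean = data.mean()
k, n = 3, data.size                                                      # treatments, total observations

sst = sum(g.size * (g.mean() - grand_mean) ** 2 for g in (t1, t2, t3))   # treatment variation
sse = sum(((g - g.mean()) ** 2).sum() for g in (t1, t2, t3))             # random (error) variation
mst = sst / (k - 1)                 # mean square for treatments
mse = sse / (n - k)                 # mean square error
print(round(mst, 2), round(mse, 2), round(mst / mse, 2))                 # MST, MSE, F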
Define Meaning & Assumptions of F Distribution
The test statistic for several situations follows this probability distribution. It is used to test whether two samples are from populations having equal variances, and it is also applied when we want to compare several population means simultaneously. The simultaneous comparison of several population means is called analysis of variance (ANOVA). In both of these situations, the populations must follow a normal distribution, and the data must be at least interval-scale.
Define Tree Diagram
The tree diagram is a graph that is helpful in organizing calculations that involve several stages. Each segment in the tree is one stage of the problem. The branches of a tree diagram are weighted by probabilities.
Define Uniform Probability Distribution
The uniform probability distribution is perhaps the simplest distribution for a continuous random variable. This distribution is rectangular in shape and is defined by minimum and maximum values (a and b). The height of the distribution is constant or uniform for all values between a and b.
Define Mode
The value of the observation that appears most frequently. In summary, we can determine the mode for all levels of data—nominal, ordinal, interval, and ratio. The mode also has the advantage of not being affected by extremely high or low values. The mode does have disadvantages, however, that cause it to be used less frequently than the mean or median. For many sets of data, there is no mode because no value appears more than once.
Define Homoscedasticity
The variation between the actual and predicted values of the dependent variable must be approximately the same for all estimated values. The variation around the regression equation is the same for all of the values of the independent variables. To check for homoscedasticity the residuals are plotted against the fitted values of Y. This is the same residual plot graph that we used to evaluate the assumption of linearity. (See the graph on the left on page 464).
Definition of Y-Intercept, a
The y-intercept of a graph is the point where the graph crosses the y-axis. We know that the x-coordinate of any point on the y-axis is 0. So the x-coordinate of a y-intercept is 0. In multiple regression: the y-intercept is the estimated value of Y when every independent variable equals zero; an intercept of, say, 21 means the regression equation (plane) intersects the Y-axis at 21.
Define Levels of Measurement
There are four levels of measurement: nominal, ordinal, interval, and ratio. Nominal and Ordinal are for Qualitative Variables, whereas Interval and Ratio are always for Quantitative Variables.
Define Bivariate Data
There are situations where we wish to study and visually portray the relationship between two variables. When we study the relationship between two variables, we refer to the data as bivariate.
Define NonParametric Methods/Tests (Distribution-Free Methods)
There are tests available in which no assumption regarding the shape of the population is necessary. These tests are referred to as nonparametric. In this situation, the assumption of a normal population is not necessary. Nonparametric methods are also referred to as distribution-free methods.
Define Relationships between Population Distribution and Sampling Distribution (of the Sample Mean)
This example illustrates important relationships between the population distribution and the sampling distribution of the sample mean: 1. The mean of the sample means is exactly equal to the population mean, given that all sample combinations are covered. Even if all are not covered, it is close to the population mean. 2. The dispersion of the sampling distribution of sample means is narrower than the population distribution. 3. The sampling distribution of sample means tends to become bell-shaped and to approximate the normal probability distribution as we take more and more samples.
Define Complement Rule (Probability)
This is the complement rule. It is used to determine the probability of an event occurring by subtracting the probability of the event not occurring from 1. This rule is useful because sometimes it is easier to calculate the probability of an event happening by determining the probability of it not happening and subtracting the result from 1. Notice that the events A and -A are mutually exclusive and collectively exhaustive. Therefore, the probabilities of A and -A sum to 1.
Definition and Interpretation of p-value in ANOVA
This is the probability of finding a value of the test statistic this large or larger when the null hypothesis is true. To put it another way, it is the likelihood of calculating an F value larger than 8.99 with 3 degrees of freedom in the numerator and 18 degrees of freedom in the denominator. So when we reject the null hypothesis in this instance, there is a very small likelihood of committing a Type I error! (page 374)
Define Assumptions Before Two-Sample Test of Proportions
To conduct the test, we assume each sample is large enough that the normal distribution will serve as a good approximation of the binomial distribution. The test statistic follows the standard normal distribution.
Define 3 Variables Needed for Appropriate Sample Size (for Population Proportion)
To determine the sample size for a proportion, the same three variables need to be specified (same as finding sample size for population mean): 1. The margin of error. 2. The desired level of confidence. 3. The variation or dispersion of the population being studied.
Define Assumptions Before Developing Confidence Interval for a Proportion
To develop a confidence interval for a proportion, we need to meet the following assumptions. 1. The binomial conditions, discussed in Chapter 6, have been met. Briefly, these conditions are: a. The sample data is the number of successes in n trials. b. There are only two possible outcomes. (We usually label one of the outcomes a "success" and the other a "failure.") c. The probability of a success remains the same from one trial to the next. d. The trials are independent. This means the outcome on one trial does not affect the outcome on another. 2. The values nPi and n(1 - Pi) should both be greater than or equal to 5. This condition allows us to invoke the central limit theorem and employ the standard normal distribution, that is, z, to complete a confidence interval.
Define Interpretation of Y Intercept/Constant (a) and Regression Coefficients (b, b1, b2, etc.) with Example
To illustrate the interpretation of the intercept and 2 regression coefficients, suppose the selling price of a home is directly related to the number of rooms and inversely related to its age. We let X1 refer to the number of rooms, X2 to the age of the home in years, and Y to the selling price of the home in $000. Suppose the regression equation is: Y hat = 21.2 + 18.7X1 - 0.25X2. The intercept, 21.2, indicates that the regression equation (plane) intersects the Y-axis at 21.2. This happens when both the number of rooms and the age of the home are zero. We could say that $21,200 is the average value of a property without a house. The first regression coefficient, 18.7, indicates that for each increase of 1 room in the size of a home, the selling price will increase by 18.7 ($18,700), regardless of the age of the home. The second regression coefficient, -0.25, indicates that for each increase of one year in age, the selling price will decrease by .25 ($250), regardless of the number of rooms.
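To make the interpretation concrete, here is a small Python sketch that simply evaluates the equation above; the 7-room, 30-year-old home is an invented illustration:

def predicted_price(rooms, age):
    # selling price in $000, from Y hat = 21.2 + 18.7*X1 - 0.25*X2
    return 21.2 + 18.7 * rooms - 0.25 * age

print(predicted_price(7, 30))                            # 144.6, i.e. $144,600
print(predicted_price(8, 30) - predicted_price(7, 30))   # about 18.7: one extra room adds $18,700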
Define Assumptions Before Hypothesis Testing/Creating Confidence Interval for a Proportion
To test a hypothesis about a population proportion, a random sample is chosen from the population. The test is appropriate when both n(pi) and n(1 - pi) are greater than or equal to 5, where n is the sample size and pi is the population proportion. This condition allows us to invoke the central limit theorem and employ the standard normal distribution, that is, z. It is also assumed that the binomial assumptions are met: (1) the sample data collected are the result of counts; (2) the outcome of an experiment is classified into one of two mutually exclusive categories—a "success" or a "failure"; (3) the probability of a success is the same for each trial; and (4) the trials are independent, meaning the outcome of one trial does not affect the outcome of any other trial.
Define Paired Sample and Assumptions of Paired t Test
Values from 2 different samples are dependent on each other. We assume the distribution of the population of differences follows the normal distribution. The test statistic follows the t distribution. Since this is computed as if it were 1 sample, degrees of freedom are n - 1.
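A minimal sketch of a paired t test using scipy.stats.ttest_rel; the six pairs of appraisals (in $000) are invented for illustration:

from scipy import stats

firm_a = [235, 210, 231, 242, 205, 230]   # first appraisal of each property
firm_b = [228, 205, 219, 240, 198, 223]   # second appraisal of the same property

result = stats.ttest_rel(firm_a, firm_b)  # dependent (paired) samples, df = n - 1 = 5
print(round(result.statistic, 2), round(result.pvalue, 4))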
Define Two-Tailed Test
We also wish to use a two-tailed test. That is, we are not concerned whether the sample results are larger or smaller than the proposed population mean. Rather, we are interested in whether it is different from the proposed value for the population mean. Null Hypothesis: mu = 200 Alternate Hypothesis: mu /= 200
Define Global Test
We can test the ability of the independent variables X1, X2, . . . , Xk to explain the behavior of the dependent variable Y. To put this in question form: Can the dependent variable be estimated without relying on the independent variables? The test used is referred to as the global test. Basically, it investigates whether it is possible all the independent variables have zero regression coefficients. Uses Hypothesis Test, with H0: Beta1 = Beta2 = Beta3 = 0 H1: Not all the Beta's are 0. If the null hypothesis is true, it implies the regression coefficients are all zero and, logically, none of the independent variables can be used to estimate the dependent variable. We employ the F distribution and use the .05 level of significance.
Define Skewness (of Data)
We define skewness as the lack of symmetry in a set of data. A distribution of data can be symmetrical, positively skewed (the longer tail extends to the right), or negatively skewed (the longer tail extends to the left).
Define Cumulative Binomial Probability Distribution
We may wish to know the probability of correctly guessing the answers to 6 or more true/false questions out of 10. Or we may be interested in the probability of selecting less than two defectives at random from production during the previous hour. In these cases, we need cumulative frequency distributions similar to the ones developed in Chapter 2.
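A minimal sketch with scipy.stats.binom for the two cumulative questions above; guessing on a true/false question is taken as pi = .5, and the defective rate of .05 is an assumed value:

from scipy.stats import binom

n = 10
print(1 - binom.cdf(5, n, 0.5))   # P(6 or more correct guesses), about .377
print(binom.cdf(1, n, 0.05))      # P(fewer than 2 defectives), assuming pi = .05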
Define Class Midpoint
We will use two other terms frequently: class midpoint and class interval. The midpoint is halfway between the lower limits of two consecutive classes. It is computed by adding the lower limits of consecutive classes and dividing the result by 2. The midpoint of $400 best represents, or is typical of, the profits of the vehicles in that class.
Define Contingency Table
What if we wish to study the relationship between two variables when one or both are nominal or ordinal scale? In this case, we tally the results in a contingency table. A contingency table is a cross-tabulation that simultaneously summarizes two variables of interest and their relationship. Both can be nominal, or both can be ordinal, or one could be nominal and the other ordinal.
Define Stratified Random Sampling & Strata
When a population can be clearly divided into groups based on some characteristic, we may use stratified random sampling. It guarantees each group is represented in the sample. The groups are also called strata. Samples are randomly selected from all of the groups - this is how it differs from cluster random sampling. In some cases, stratified sampling also has the benefit of more accurately reflecting the characteristics of the population than does simple random or systematic random sampling.
Define Autocorrelation
When successive residuals are correlated, we refer to this condition as autocorrelation. Autocorrelation frequently occurs when the data are collected over a period of time. The fifth assumption about multiple regression and correlation analysis is that successive residuals should be independent. This means that there is not a pattern to the residuals, the residuals are not highly correlated, and there are not long runs of positive or negative residuals. Autocorrelation is a violation of this assumption.
Define Qualitative Variable
When the characteristic being studied is non-numeric, it is called a qualitative variable or an attribute. Examples of qualitative variables are gender, religious affiliation, type of automobile owned, state of birth, and eye color. When the data are qualitative, we are usually interested in how many or what percent fall in each category. For example, what percent of the population has blue eyes?
Define Quantitative Variable
When the variable studied can be reported numerically, the variable is called a quantitative variable. Examples of quantitative variables are the balance in your checking account, the ages of company presidents, the life of an automobile battery (such as 42 months), and the number of children in a family.
Define Rule for Degrees of Freedom When Estimating Population Parameters Using Sample Data
When we estimate population parameters from sample data, we lose a degree of freedom for each estimate. So we lose two more degrees of freedom for estimating the population mean and the population standard deviation. Thus the number of degrees of freedom in this problem is 5, found by k - 2 - 1 = 8 - 2 - 1 = 5. (page 511) To expand, if we know the mean and the standard deviation of a population and wish to find whether some sample data conform to a normal, the DF is k-1. On the other hand, suppose we have sample data grouped into a frequency distribution, but we do not know the value of the population mean and the population standard deviation. In this case, the DF is k - 2 -1. In general, when we use sample statistics to estimate population parameters, we lose a degree of freedom for each parameter we estimate. This is parallel to the situation in Section 14.4 of the chapter on multiple regression where we lost a DF in the denominator of the F statistic for each independent variable considered.
Define Sample Size
When working with confidence intervals, one important variable is sample size. However, in practice, sample size is not a variable. It is a decision we make so that our estimate of a population parameter is a good one. A larger sample size means less sampling error and more accurate results, and the dispersion (standard error) of the sampling distribution of the mean decreases as the sample size grows. With small samples, the sampling distribution of the sample mean may also not be well approximated by the normal distribution.
Define Relationship Between Binomial and Poisson
Which answer is correct? Why should we look at the problem both ways? The binomial is the more "technically correct" solution. The Poisson can be thought of as an approximation for the binomial when n, the number of trials, is large and pi, the probability of a success, is small. We look at the problem using both distributions to emphasize the convergence of the two discrete distributions. In some instances, using the Poisson may be the quicker solution, and as you see there is little practical difference in the answers. In fact, as n gets larger and pi smaller, the differences between the two distributions get smaller.
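A minimal sketch comparing the two distributions for an assumed large n and small pi, so that the Poisson mean is mu = n*pi (scipy assumed):

from scipy.stats import binom, poisson

n, pi = 500, 0.004          # assumed: many trials, small probability of success
mu = n * pi                 # Poisson mean = 2.0
for x in range(4):
    print(x, round(binom.pmf(x, n, pi), 4), round(poisson.pmf(x, mu), 4))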
Define Reason Why Dependent Samples are Better Than Independent Samples
Why do we prefer dependent samples to independent samples? By using dependent samples, we are able to reduce the variation in the sampling distribution. In sum, when we can pair or match observations that measure differences for a common variable, a hypothesis test based on dependent samples is more sensitive to detecting a significant difference than a hypothesis test based on independent samples. In the case of comparing the property valuations by Schadek Appraisals and Bowyer Real Estate, the hypothesis test based on dependent samples eliminates the variation between the values of the properties and focuses only on the comparisons in the two appraisals for each property. There is a bit of bad news here. In the dependent samples test, the degrees of freedom are half of what they are if the samples are not paired. For the real estate example, the degrees of freedom drop from 18 to 9 when the observations are paired. However, in most cases, this is a small price to pay for a better test. (page 347)
Define Assumptions Before Two-Sample Pooled Test of Means, SD Unknown
With the following assumptions: 1. The sampled populations follow the normal distribution. 2. The sampled populations are independent. 3. The standard deviations of the two populations are equal but unknown. Because of this assumption, we combine or "pool" the sample standard deviations. 4. We use the t distribution as the test statistic.
Define Coordinates (in a Frequency Polygon)
X and the Y values of this point are called the coordinates.
Define Relative Class Frequencies
You can convert class frequencies to relative class frequencies to show the fraction of the total number of observations in each class. A relative frequency captures the relationship between a class total and the total number of observations. (basically used for finding percentage)
Define Central Limit Theorem
(Basically just means the larger the sample, the more symmetrical the distribution) The central limit theorem states that, for large random samples, the shape of the sampling distribution of the sample mean is close to the normal probability distribution. The approximation is more accurate for large samples than for small samples. This is one of the most useful conclusions in statistics. We can reason about the distribution of the sample mean with absolutely no information about the shape of the population distribution from which the sample is taken. In other words, the central limit theorem is true for all distributions.
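A minimal simulation sketch (numpy assumed): the population below is deliberately skewed, yet the means of repeated samples of 40 cluster symmetrically around the population mean, with spread close to sigma divided by the square root of 40:

import numpy as np

rng = np.random.default_rng(seed=1)
population = rng.exponential(scale=2.0, size=100_000)   # strongly skewed population

sample_means = [rng.choice(population, size=40).mean() for _ in range(2000)]
print(round(float(np.mean(sample_means)), 2))    # close to the population mean (about 2.0)
print(round(float(np.std(sample_means)), 2))     # close to population sigma / sqrt(40)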
Definition and Interpretation of Testing Individual Regression Coefficients (Global Test)
(Following the global test) Why is it important to know whether any of the Betas equal 0? If a Beta could equal 0, it implies that this particular independent variable is of no value in explaining any variation in the dependent variable. If there are coefficients for which H0 cannot be rejected, we may want to eliminate them from the regression equation. Our strategy is to establish three sets of hypotheses — one for X1, one for X2, and one for X3: H0: Beta1 = 0, H1: Beta1 /= 0; H0: Beta2 = 0, H1: Beta2 /= 0; H0: Beta3 = 0, H1: Beta3 /= 0.
Define Event (Probability)
A collection of one or more outcomes of an experiment.
Define Outcome (Probability)
A particular result of an experiment.
Define Variance
The arithmetic mean of the squared deviations from the mean.
Define Characteristics/Assumptions of Normal Probability Distribution
• It is bell-shaped and has a single peak at the center of the distribution. The arithmetic mean, median, and mode are equal and located in the center of the distribution. The total area under the curve is 1.00. Half the area under the normal curve is to the right of this center point and the other half to the left of it. • It is symmetrical about the mean. If we cut the normal curve vertically at the center value, the two halves will be mirror images. • It falls off smoothly in either direction from the central value. That is, the distribution is asymptotic: The curve gets closer and closer to the X-axis but never actually touches it. To put it another way, the tails of the curve extend indefinitely in both directions. • The location of a normal distribution is determined by the mean. The dispersion or spread of the distribution is determined by the standard deviation.