Stats Chapter 7: Estimation and Sampling Distributions
Unbiased Estimator
* An unbiased estimator of a population parameter is a statistic whose average (mean) across all possible random samples of a given size equals the value of that parameter.
Variance estimate
* An unbiased estimator of the population variance can be obtained from sample data by dividing the sum of squares by N - 1 instead of N.
* The resulting value, ŝ² ("s hat squared"), is the sample estimate of the population variance.
* Reducing the denominator by 1 makes the variance estimate larger than the sample variance and so corrects for the tendency of the sample variance to underestimate the population variance.
* On average (across all possible random samples of a given size), the variance estimate equals the population variance, whereas the sample variance falls short of it.
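A minimal Python sketch of the two divisors, using a small hypothetical sample (the scores and function names are illustrative, not from the text). The standard library's `statistics.pvariance` divides by N and `statistics.variance` divides by N - 1, so they serve as a cross-check.

```python
import statistics

def sum_of_squares(scores):
    """SS: sum of squared deviations of each score from the sample mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

def sample_variance(scores):
    """s^2: SS divided by N (describes the sample itself)."""
    return sum_of_squares(scores) / len(scores)

def variance_estimate(scores):
    """s-hat^2: SS divided by N - 1 (estimates the population variance)."""
    return sum_of_squares(scores) / (len(scores) - 1)

scores = [2, 4, 4, 5, 10]  # hypothetical sample, N = 5, mean = 5

print(sum_of_squares(scores))     # 36.0
print(sample_variance(scores))    # 7.2
print(variance_estimate(scores))  # 9.0

# Cross-check: pvariance divides SS by N, variance divides SS by N - 1
print(statistics.pvariance(scores), statistics.variance(scores))
```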
Mean Square
* Any sum of squares divided by its associated degrees of freedom.
* A variance estimate is formally known as a mean square.
Degrees of Freedom
* The number of pieces of information that are "free of each other" in the sense that they cannot be deduced from one another.
* Because deviation scores around the sample mean always sum to zero, the last deviation score to be computed is not free to vary; it is determined by the other deviation scores.
* A sum of squares around a sample mean therefore always has N - 1 degrees of freedom associated with it.
* As the degrees of freedom associated with an estimate increase, the accuracy of the estimate also tends to increase.
* Technically, the accuracy of a variance estimate (and thus of a standard deviation estimate) is a function not of the sample size (N) but of the degrees of freedom (N - 1), that is, the number of independent pieces of information used in calculating the estimate.
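The "last deviation is not free to vary" point can be checked directly. This is a small sketch with a hypothetical sample of four scores: once any N - 1 deviations are known, the zero-sum constraint fixes the remaining one.

```python
# Hypothetical sample of N = 4 scores
scores = [3, 7, 8, 10]
mean = sum(scores) / len(scores)          # 7.0

deviations = [x - mean for x in scores]   # [-4.0, 0.0, 1.0, 3.0]
print(sum(deviations))                    # 0.0: deviations around the mean sum to zero

# Knowing any N - 1 deviations determines the last one
free = deviations[:-1]
last = -sum(free)
print(last == deviations[-1])             # True

df = len(scores) - 1                      # degrees of freedom for the sum of squares
print(df)                                 # 3
```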
The mean of the Sampling Distribution of the mean
* As stated in the central limit theorem, the mean of a sampling distribution of the mean is always equal to the population mean (of the raw scores).
* When we select samples of a given size, some sample means will overestimate the true population mean and others will underestimate it. When we average all of them, however, the overestimations cancel the underestimations, and the result is the true population mean.
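With a tiny hypothetical population it is possible to list every sample of a given size and verify the cancellation exactly. This sketch samples with replacement (so every ordered sample is equally likely) and uses `Fraction` to avoid rounding; the population values are made up for illustration.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 9]  # hypothetical tiny population
N = 2                      # sample size

mu = Fraction(sum(population), len(population))

# Every ordered sample of size N drawn with replacement: 4**2 = 16 samples
sample_means = [Fraction(sum(s), N) for s in product(population, repeat=N)]

print(mu)                                     # 21/4
print(sum(sample_means) / len(sample_means))  # 21/4
# Overestimates vs. underestimates of mu
print(sum(m > mu for m in sample_means), sum(m < mu for m in sample_means))  # 8 8
```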
Summary of Notation and Formulas for the mean and measures of variability
* Different notation is used to represent sample and population values of the mean, the variance, and the standard deviation.
* If we had scores for ALL members of a population, we would use the formulas for population values; however, this situation is extremely rare.
* If we had scores for a subset of a population and wanted to describe only that subset, WITHOUT making inferences to the population, we would use the formulas for sample values; however, this situation is also extremely rare.
* The MOST common focus in behavioral science research is estimating population parameters from sample data. To do this, we use the formulas for sample estimates of population values.
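The three cases can be summarized in one block of formulas. This assumes the notation used elsewhere in these notes (μ and σ for population values, X̄ and s for sample values, ŝ for estimates); the exact symbols may differ from the textbook's summary table.

```latex
\begin{aligned}
&\text{Population (all scores):} && \mu = \frac{\sum X}{N}, \quad \sigma^2 = \frac{\sum (X - \mu)^2}{N}, \quad \sigma = \sqrt{\sigma^2} \\
&\text{Sample (describe only):} && \bar{X} = \frac{\sum X}{N}, \quad s^2 = \frac{\sum (X - \bar{X})^2}{N}, \quad s = \sqrt{s^2} \\
&\text{Estimate of population:} && \bar{X} = \frac{\sum X}{N}, \quad \hat{s}^2 = \frac{\sum (X - \bar{X})^2}{N - 1}, \quad \hat{s} = \sqrt{\hat{s}^2}
\end{aligned}
```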
The standard deviation of the sampling distribution of the mean
* If the standard error of the mean is small, sample means based on samples of a given size (N) will tend to be similar to one another, and all will tend to be close to the population mean.
* If the standard error of the mean is large, sample means based on samples of a given size will tend to differ from one another, and only some will be close to the population mean.
Central limit theorem
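* Assuming the sample means come from random samples of size N with independent observations, this important theorem describes the mean, the standard deviation, and the shape of the sampling distribution of the mean: its mean equals the population mean (μ), its standard deviation (the standard error of the mean) equals σ/√N, and its shape approaches a normal distribution as N increases, regardless of the shape of the population distribution (provided the population variance is finite).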
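A simulation sketch of all three parts of the theorem, assuming a strongly skewed exponential population with μ = σ = 1 (a choice made here for illustration). For each sample size n we would expect the simulated means to average about 1, have a standard deviation near 1/√n, and have a skewness that shrinks toward 0 (the normal value) as n grows.

```python
import random
import statistics

def skewness(values):
    """Population-style skewness: the mean of cubed z-scores."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)

def simulate_means(n, reps=20_000, seed=1):
    """Means of `reps` random samples of size n from an exponential population (mu = sigma = 1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

for n in (1, 5, 30):
    means = simulate_means(n)
    # n, mean of the means, SD of the means (vs. 1/sqrt(n)), skewness of the means
    print(n, round(statistics.fmean(means), 3),
          round(statistics.pstdev(means), 3), round(1 / n ** 0.5, 3),
          round(skewness(means), 2))
```

Exact printed values depend on the seed; only the trends (mean near 1, SD near 1/√n, falling skewness) are the point.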
Types of sampling distributions
* It is also possible to conceptualize sampling distributions for other statistics, such as the mode, the median, and the variance.
* For the same population and sample size, the sampling distribution of the mean typically shows less variability (that is, it has a smaller standard error) than either the sampling distribution of the mode or the sampling distribution of the median.
* This is why statisticians usually prefer the mean as a measure of central tendency.
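A simulation sketch comparing the standard error of the mean with that of the median, assuming a normal population with μ = 100 and σ = 15 and samples of size 15 (all hypothetical choices). The mode is left out because it is not well defined for continuous scores. Reusing the same seed means both statistics are computed on the identical set of samples.

```python
import random
import statistics

def sampling_sd(statistic, n=15, reps=5_000, seed=7):
    """SD across samples (standard error) of `statistic` for samples of size n from N(100, 15)."""
    rng = random.Random(seed)
    values = [statistic([rng.gauss(100, 15) for _ in range(n)]) for _ in range(reps)]
    return statistics.pstdev(values)

se_mean = sampling_sd(statistics.fmean)
se_median = sampling_sd(statistics.median)

# For a normal population, the median's standard error is noticeably larger
print(round(se_mean, 2), round(se_median, 2), round(15 / 15 ** 0.5, 2))
```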
Sampling error
* Reflects the fact that sample values are likely to differ from population values because they are based on only a portion of the overall population.
* It does NOT imply that mistakes were made in collecting or analyzing the data.
* Can be represented as the difference between the value of a sample statistic (in the present context, X̄) and the value of the corresponding population parameter (in the present context, μ): sampling error = X̄ - μ.
* In practice, an investigator does not know the value of the population parameter, so it is impossible to compute the exact amount of sampling error that occurs.
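A short sketch that computes sampling error in the one situation where it is possible: a simulated (hypothetical) population whose mean is known. In real research, `mu` would be unknown, so the last subtraction could not be done.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical finite population whose mean we (unusually) know
population = [rng.randint(0, 100) for _ in range(10_000)]
mu = statistics.fmean(population)

sample = rng.sample(population, 25)  # one random sample, N = 25, without replacement
x_bar = statistics.fmean(sample)

sampling_error = x_bar - mu          # computable only because mu is known here
print(round(x_bar, 2), round(mu, 2), round(sampling_error, 2))
```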
Sample mean
* Some sample means overestimate the true population mean, whereas others underestimate it. Across all possible random samples of a given size, the overestimations cancel the underestimations, and the AVERAGE of the sample means equals the true population mean.
* The sample mean is therefore said to be an UNBIASED ESTIMATOR of the population mean.
Sampling Distribution of the Mean
* The distribution of the means of all possible random samples of a given size.
* The key to understanding the sampling distribution of the mean is to realize that sampling distributions are theoretical: given the very large populations that behavioral science research is concerned with, it is virtually never possible to obtain scores for an entire population.
Standard deviation estimate
* The sample estimate of the population standard deviation.
* Written ŝ, it is the positive square root of ŝ² ("s hat squared"), the variance estimate.
* Paralleling the case for the variance estimate, the standard deviation estimate will, on average, be closer to the population standard deviation than the sample standard deviation will be. (Strictly speaking, ŝ still slightly underestimates σ on average; only ŝ² is exactly unbiased.)
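An exact-enumeration sketch, using a hypothetical four-score population and all 64 samples of size 3 drawn with replacement. Averaged over every sample, ŝ (`statistics.stdev`, divisor N - 1) lands closer to σ than s (`statistics.pstdev`, divisor N) does, while still falling slightly short of σ.

```python
from itertools import product
import statistics

population = [2, 4, 6, 9]  # hypothetical tiny population
N = 3                      # sample size
sigma = statistics.pstdev(population)

samples = list(product(population, repeat=N))  # all 4**3 = 64 samples, with replacement

avg_s = statistics.fmean(statistics.pstdev(s) for s in samples)     # s: SS divided by N
avg_s_hat = statistics.fmean(statistics.stdev(s) for s in samples)  # s-hat: SS divided by N - 1

# Expect avg_s < avg_s_hat < sigma
print(round(avg_s, 3), round(avg_s_hat, 3), round(sigma, 3))
```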
Standard error of the mean
* The standard deviation of a sampling distribution of the mean.
* Reflects the accuracy with which sample means estimate a population mean.
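For a small hypothetical population, the sampling distribution of the mean can be built exactly by listing every sample (with replacement). This sketch shows that its standard deviation, the standard error of the mean, matches σ/√N.

```python
from itertools import product
import math
import statistics

population = [2, 4, 6, 9]  # hypothetical tiny population
sigma = statistics.pstdev(population)

for N in (1, 2, 4):
    # Sampling distribution of the mean: the means of all len(population)**N samples
    means = [statistics.fmean(s) for s in product(population, repeat=N)]
    se_exact = statistics.pstdev(means)
    print(N, round(se_exact, 4), round(sigma / math.sqrt(N), 4))  # the two columns agree
```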
Finite versus infinite populations
* Using sample data to make inferences about populations is fundamental to the statistical techniques used in the behavioral sciences.
* Population parameters can be estimated with reference either to small, finite populations or to populations so large that, for all practical purposes, they can be considered infinite.
* Behavioral scientists typically select a set of individuals to study and then assume that these individuals were randomly drawn from some population.
* Sampling from a small, finite population with replacement (each sampled member is returned to the population before the next one is selected) is analogous to sampling without replacement from a very large or infinite population, which is how behavioral science research is conceptualized.
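A sketch of why replacement matters for a small finite population (hypothetical values, M = 4 members, samples of N = 2). With replacement, the standard error is exactly σ/√N, as for an infinite population. Without replacement it is smaller; the `fpc` factor below is the standard finite population correction, √((M - N)/(M - 1)), which these notes do not cover and is added here only for comparison.

```python
from itertools import permutations, product
import math
import statistics

population = [2, 4, 6, 9]  # hypothetical small finite population
M, N = len(population), 2
sigma = statistics.pstdev(population)

with_repl = [statistics.fmean(s) for s in product(population, repeat=N)]
without_repl = [statistics.fmean(s) for s in permutations(population, N)]

# With replacement: behaves like an infinite population
print(round(statistics.pstdev(with_repl), 4), round(sigma / math.sqrt(N), 4))

# Without replacement: shrunk by the finite population correction
fpc = math.sqrt((M - N) / (M - 1))
print(round(statistics.pstdev(without_repl), 4), round(sigma / math.sqrt(N) * fpc, 4))
```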
Estimation of the population variance
* Unlike the sample mean, the sample variance is not our best estimate of the true population variance.
* Statisticians have determined that the sample variance is a biased estimator of the population variance: across all possible samples of a given size, it underestimates (is smaller than) the population variance on average.
* HOWEVER, an unbiased estimator can be obtained from sample data by dividing the sum of squares by N - 1 rather than N.
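An exact check of the bias, assuming a hypothetical four-score population and all 64 samples of size 3 drawn with replacement. Using `Fraction` keeps the arithmetic exact: the average sample variance comes out to (N - 1)/N times σ², while the average variance estimate equals σ² exactly.

```python
from fractions import Fraction
from itertools import product

def ss(scores):
    """Exact sum of squared deviations from the mean of `scores`."""
    m = Fraction(sum(scores), len(scores))
    return sum((x - m) ** 2 for x in scores)

population = [2, 4, 6, 9]  # hypothetical tiny population
N = 3
sigma2 = ss(population) / len(population)  # population variance

samples = list(product(population, repeat=N))                      # all 64 samples, with replacement
avg_s2 = sum(ss(s) / N for s in samples) / len(samples)            # sample variance, SS / N
avg_s_hat2 = sum(ss(s) / (N - 1) for s in samples) / len(samples)  # variance estimate, SS / (N - 1)

print(sigma2, avg_s2, avg_s_hat2)  # 107/16 107/24 107/16
```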
Summary of Basic Estimation Concepts
1) A sample statistic may differ from the value of its corresponding population parameter because of sampling error.
2) The sample mean is an unbiased estimator of the population mean, and the variance estimate is an unbiased estimator of the population variance. The standard deviation estimate is the corresponding estimator of the population standard deviation (strictly, it is slightly biased, though less so than the sample standard deviation).
3) An unbiased estimator is a statistic whose average (mean) across all possible random samples of a given size equals the value of the population parameter.
4) As the degrees of freedom associated with a statistic increase, the statistic tends to estimate the corresponding population parameter more accurately.
2 factors that influence the standard error of the mean:
1) The sample size: as the sample size increases, the standard error becomes smaller, other things being equal.
* With smaller samples, the sample means are likely to vary more from one another, because some will be much closer to the population mean than others.
2) The variability of scores in the population: the larger the population standard deviation (σ), the larger the standard error of the mean.
* As σ becomes smaller, so does the standard error, other things being equal.
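Both factors appear in the standard error formula. A worked example with hypothetical values (writing the standard error as σ_X̄): quadrupling N halves the standard error, and halving σ also halves it.

```latex
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}
\qquad
\sigma = 10:\;
N = 25 \Rightarrow \sigma_{\bar{X}} = \frac{10}{5} = 2,
\quad
N = 100 \Rightarrow \sigma_{\bar{X}} = \frac{10}{10} = 1;
\qquad
\sigma = 5,\; N = 100 \Rightarrow \sigma_{\bar{X}} = \frac{5}{10} = 0.5
```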