CFA. Level I. R11 - Sampling and Estimation

Data mining

_____ occurs when analysts repeatedly use the same database to search for patterns or trading rules until one that "works" is discovered.

Sample selection bias

_____ occurs when some data is systematically excluded from the analysis, usually because it is unavailable. This renders the observed sample nonrandom, and conclusions drawn from it cannot be applied to the population, because the observed sample and the unobserved portion of the population are different.

Data-mining bias

_____ refers to results where the statistical significance of the pattern is overestimated because the results were found through data mining. Warning signs of data mining include:
- Evidence that many different variables were tested, most of which are unreported, until significant ones were found.
- The lack of any economic theory that is consistent with the empirical results.
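A minimal simulation of why mined results overstate significance, assuming 200 hypothetical "trading rules" whose returns are pure noise (true mean zero, 60 periods each) and using 1.96 as a normal approximation to the two-tailed 5% cutoff. The rule count, sample length, and cutoff are illustrative assumptions, not values from the reading.

```python
import random
import statistics

def t_stat(returns):
    """t-statistic for the hypothesis that the mean return is zero."""
    n = len(returns)
    return statistics.fmean(returns) / (statistics.stdev(returns) / n ** 0.5)

def mine_noise(num_rules=200, n_periods=60, seed=42):
    """Test many noise-only rules and count how many look 'significant'."""
    rng = random.Random(seed)
    significant = 0
    best = 0.0
    for _ in range(num_rules):
        # Every rule's true mean return is 0, so any "discovery" is spurious.
        returns = [rng.gauss(0.0, 0.02) for _ in range(n_periods)]
        t = abs(t_stat(returns))
        best = max(best, t)
        if t > 1.96:  # normal approximation to the 5% two-tailed cutoff
            significant += 1
    return significant, best

hits, best_t = mine_noise()
print(f"{hits} of 200 noise rules look significant; best |t| = {best_t:.2f}")
```

Roughly 5% of the rules should clear the cutoff by chance alone; reporting only those, without the unreported failures, is exactly the first warning sign listed above.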

Central limit theorem

_____ states that for simple random samples of size n from a population with mean μ and finite variance σ², the sampling distribution of the sample mean x̄ approaches a normal probability distribution with mean μ and variance σ²/n as the sample size becomes large. Important properties of the central limit theorem include the following:
- If the sample size n is sufficiently large (n ≥ 30), the sampling distribution of the sample means will be approximately normal.
- The mean of the population, μ, and the mean of the distribution of all possible sample means are equal.
- The variance of the distribution of sample means is σ²/n, the population variance divided by the sample size.
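A sketch that checks the central limit theorem numerically, assuming a deliberately skewed, hypothetical population (exponential draws) and samples of size n = 36 drawn with replacement. It compares the mean of the sample means with μ and their variance with σ²/n.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical skewed population: exponential draws (mean and variance both about 1).
population = [rng.expovariate(1.0) for _ in range(50_000)]
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population)

n = 36
# 5,000 simple random samples of size n (with replacement), keeping each sample mean.
means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(5_000)]

print(f"population mean      {mu:.4f}")
print(f"mean of sample means {statistics.fmean(means):.4f}")
print(f"sigma^2 / n          {sigma2 / n:.5f}")
print(f"variance of means    {statistics.pvariance(means):.5f}")
```

Even though the population is far from normal, a histogram of `means` would look approximately bell-shaped, as the first property above states.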

Stratified random sampling

_____ uses a classification system to separate the population into smaller groups based on one or more distinguishing characteristics. From each subgroup, or stratum, a random sample is taken and the results are pooled. The size of the samples from each stratum is based on the size of the stratum relative to the population.
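A minimal sketch of proportional stratified sampling, assuming each stratum is given as a list of members and that each stratum's share of the total sample is rounded to the nearest integer (so the shares may not always add up exactly to the requested total). The bond strata below are hypothetical.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Proportional stratified random sampling.

    strata: dict mapping stratum name -> list of population members.
    Each stratum gets round(total_n * stratum_size / population_size) picks,
    drawn as a simple random sample within that stratum.
    """
    rng = random.Random(seed)
    pop_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(total_n * len(members) / pop_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical bond universe split by credit rating.
bonds = {
    "AAA": [f"AAA-{i}" for i in range(100)],
    "BBB": [f"BBB-{i}" for i in range(300)],
    "HY": [f"HY-{i}" for i in range(600)],
}
picked = stratified_sample(bonds, total_n=50, seed=1)
print({name: len(items) for name, items in picked.items()})  # {'AAA': 5, 'BBB': 15, 'HY': 30}
```

Because each stratum is 10%, 30%, and 60% of the population, it receives the same share of the 50 picks, so the pooled sample mirrors the population's composition.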

Desirable properties of an estimator

- An unbiased estimator is one for which the expected value of the estimator is equal to the parameter you are trying to estimate.
- An unbiased estimator is also efficient if the variance of its sampling distribution is smaller than that of all the other unbiased estimators of the parameter you are trying to estimate.
- A consistent estimator is one for which the accuracy of the parameter estimate increases as the sample size increases.
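To illustrate unbiasedness, the sketch below compares two estimators of a population variance over many small samples, assuming a hypothetical normal population with σ = 2 (σ² = 4) and samples of size 5. Dividing the sum of squared deviations by n − 1 is unbiased; dividing by n underestimates σ² on average by the factor (n − 1)/n.

```python
import random

def var_divide_by_n(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def var_divide_by_n_minus_1(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(7)
true_var = 4.0          # hypothetical population: normal with sigma = 2
n, trials = 5, 20_000
biased, unbiased = [], []
for _ in range(trials):
    xs = [rng.gauss(0.0, 2.0) for _ in range(n)]
    biased.append(var_divide_by_n(xs))
    unbiased.append(var_divide_by_n_minus_1(xs))

print(f"true variance            {true_var}")
print(f"average, divide by n     {sum(biased) / trials:.3f}")   # expectation: 4 * 4/5 = 3.2
print(f"average, divide by n - 1 {sum(unbiased) / trials:.3f}")  # expectation: 4.0
```

This only demonstrates the first property; efficiency and consistency would need separate comparisons of variances and of different sample sizes.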

Standard error of the sample mean

_____ is the standard deviation of the distribution of the sample means. When the standard deviation of the population, σ, is known, the standard error of the sample mean is σ_x̄ = σ/√n, where n is the sample size.
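The formula σ/√n translates directly into code. The σ and n values in the usage lines are hypothetical; the second call shows that quadrupling the sample size halves the standard error.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean when the population standard
    deviation sigma is known: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

print(standard_error(sigma=20.0, n=100))  # 2.0
print(standard_error(sigma=20.0, n=400))  # 1.0
```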

Look-ahead bias

_____ occurs when a study tests a relationship using sample data that was not available on the test date.
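One common guard against look-ahead bias is to key every observation on the date it became public rather than the period it describes. A minimal sketch, assuming hypothetical records that carry both a fiscal `period_end` and a `reported` (publication) date:

```python
from datetime import date

# Hypothetical records: the fiscal period an item describes and the date
# it was actually published. Values are illustrative only.
records = [
    {"period_end": date(2023, 9, 30), "reported": date(2023, 11, 5), "book_value": 49.5},
    {"period_end": date(2023, 12, 31), "reported": date(2024, 2, 20), "book_value": 51.0},
]

def available_as_of(records, test_date):
    """Keep only observations that had been published on or before test_date."""
    return [r for r in records if r["reported"] <= test_date]

test_date = date(2024, 1, 15)
usable = available_as_of(records, test_date)
print([r["period_end"].isoformat() for r in usable])  # ['2023-09-30']
```

Filtering on `period_end` instead would admit the December figure on 15 January 2024, even though it was not published until February.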

T-distribution to construct a confidence interval

If the distribution of the population is normal with unknown variance, we can use the t-distribution to construct a confidence interval.
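A sketch of the t-based interval x̄ ± t_α/2 × s/√n, where s is the sample standard deviation and t_α/2 has n − 1 degrees of freedom. Python's standard library has no t quantile, so this sketch assumes the reliability factor is looked up from a t-table and passed in (2.262 is the table value for 95% confidence with df = 9). The ten returns below are hypothetical.

```python
import math
import statistics

def t_confidence_interval(sample, t_crit):
    """x_bar ± t_crit * s / sqrt(n) for a normal population with unknown variance.

    t_crit: t reliability factor with n - 1 degrees of freedom, taken from a t-table.
    """
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical monthly returns (%), n = 10, so df = 9.
sample = [1.2, -0.4, 2.5, 0.8, 1.9, -1.1, 0.3, 1.6, 2.2, 0.7]
low, high = t_confidence_interval(sample, t_crit=2.262)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```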

Confidence interval for the population mean

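If the population has a normal distribution with a known variance, a confidence interval for the population mean is x̄ ± z_α/2 × σ/√n, where z_α/2 is the standard normal reliability factor and σ/√n is the standard error of the sample mean.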
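A minimal sketch of the known-variance interval using `statistics.NormalDist` to obtain z_α/2. The inputs (x̄ = 10, σ = 4, n = 64) are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """x_bar ± z_{alpha/2} * sigma / sqrt(n) for a normal population with known sigma."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for 95% confidence
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

low, high = z_confidence_interval(x_bar=10.0, sigma=4.0, n=64)
print(f"({low:.2f}, {high:.2f})")  # (9.02, 10.98)
```

Here the standard error is 4/√64 = 0.5, so the interval extends about 1.96 × 0.5 ≈ 0.98 on either side of x̄.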

Sampling distribution

The distribution of values taken by the statistic in all possible samples of the same size from the same population.

Cross-sectional data

_____ are a sample of observations taken at a single point in time.

Longitudinal data

_____ are observations over time of multiple characteristics of the same entity, such as unemployment, inflation, and GDP growth rates for a country over 10 years.

Point estimates

_____ are single (sample) values used to estimate population parameters. The formula used to compute the point estimate is called the estimator. For example, the sample mean, x̄, is an estimator of the population mean μ.

Time-period bias

_____ can result if the time period over which the data is gathered is either too short or too long. If the time period is too short, research results may reflect phenomena specific to that time period, or perhaps even data mining. If the time period is too long, the fundamental economic relationships that underlie the results may have changed.

Time-series data

_____ consist of observations taken over a period of time at specific and equally spaced time intervals.

Panel data

_____ contain observations over time of the same characteristic for multiple entities, such as debt/equity ratios for 20 companies over the most recent 24 quarters.
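The relationship between panel, cross-sectional, and time-series data can be sketched with a nested dictionary, assuming hypothetical debt/equity ratios for three companies over four quarters. Slicing the panel at one quarter yields cross-sectional data; following one company across quarters yields a time series.

```python
# Hypothetical panel: debt/equity ratios, outer key = company, inner key = quarter.
panel = {
    "CoA": {"Q1": 0.42, "Q2": 0.45, "Q3": 0.44, "Q4": 0.47},
    "CoB": {"Q1": 1.10, "Q2": 1.05, "Q3": 1.02, "Q4": 0.98},
    "CoC": {"Q1": 0.75, "Q2": 0.80, "Q3": 0.78, "Q4": 0.81},
}

def cross_section(panel, period):
    """Same characteristic for every entity at a single point in time."""
    return {entity: series[period] for entity, series in panel.items()}

def time_series(panel, entity):
    """Same characteristic for one entity over equally spaced periods."""
    return list(panel[entity].values())

print(cross_section(panel, "Q2"))  # {'CoA': 0.45, 'CoB': 1.05, 'CoC': 0.8}
print(time_series(panel, "CoB"))   # [1.1, 1.05, 1.02, 0.98]
```

Longitudinal data would instead hold several different characteristics (for example unemployment, inflation, and GDP growth) for a single entity over time.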

Confidence interval

_____ estimates result in a range of values within which the actual value of a parameter will lie with probability 1 − α. Here, alpha, α, is called the level of significance for the confidence interval, and the probability 1 − α is referred to as the degree of confidence.
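One way to see what the degree of confidence means is to build many 95% intervals from repeated samples and count how often they contain the true mean. A sketch assuming a hypothetical normal population (μ = 5, σ = 2, known) and samples of size 25:

```python
import math
import random
from statistics import NormalDist

def coverage_rate(mu=5.0, sigma=2.0, n=25, confidence=0.95, trials=4_000, seed=3):
    """Fraction of intervals x_bar ± z * sigma / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

print(f"{coverage_rate():.3f}")  # should be close to 0.95
```

Each individual interval either contains μ or it does not; the 1 − α figure describes the long-run fraction of intervals built this way that do.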

Student's t-distribution

_____ is a bell-shaped probability distribution that is symmetrical about its mean. It is the appropriate distribution to use when constructing confidence intervals based on small samples (n < 30) from populations with unknown variance and a normal, or approximately normal, distribution. Student's t-distribution has the following properties:
- It is symmetrical.
- It is defined by a single parameter, the degrees of freedom (df), where the degrees of freedom are equal to the number of sample observations minus 1, n − 1, for sample means.
- It has more probability in the tails ("fatter tails") than the normal distribution.
- As the degrees of freedom (the sample size) get larger, the shape of the t-distribution more closely approaches a standard normal distribution.
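The last property can be checked by computing two-tailed 5% critical values for increasing degrees of freedom and comparing them with z. In practice a library routine such as `scipy.stats.t.ppf` would be used; this standard-library sketch instead integrates the t density numerically (Simpson's rule) and inverts the CDF by bisection, an approximation chosen only to keep the example self-contained.

```python
import math
from statistics import NormalDist

def t_cdf(x, df, steps=1_000):
    """P(T <= x) for Student's t with df degrees of freedom.

    Integrates the density over [0, |x|] with Simpson's rule (steps must be
    even) and uses the symmetry of the distribution about 0.
    """
    # Normalising constant via lgamma so large df does not overflow math.gamma.
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(t):
        return c * (1 + t * t / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(p, df, lo=0.0, hi=100.0, iterations=50):
    """Value t with P(T <= t) = p for p in (0.5, 1), found by bisection."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for df in (2, 9, 29, 120):
    print(f"df = {df:>3}: t = {t_critical(0.975, df):.3f}   z = {z:.3f}")
```

Printed t-tables list 4.303, 2.262, 2.045, and 1.980 for these degrees of freedom; the shrinking gap to z ≈ 1.960 reflects the fatter tails fading as df grows.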

Simple random sampling

_____ is a method of selecting a sample in such a way that each item or person in the population being studied has the same likelihood of being included in the sample.
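In Python, `random.sample` draws a simple random sample without replacement, giving every member the same chance of selection. The population of 500 numbered bonds is hypothetical.

```python
import random

population = list(range(1, 501))        # hypothetical: 500 numbered bonds
rng = random.Random(11)
sample = rng.sample(population, k=30)   # 30 distinct members, each equally likely
print(sorted(sample))
```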

Systematic sampling

_____ is a type of probability sampling method in which sample members from a larger population are selected according to a random starting point but with a fixed, periodic interval. This interval, called the sampling interval, is calculated by dividing the population size by the desired sample size.
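A minimal sketch of systematic sampling, assuming the sampling interval k is the population size divided by the sample size using integer division (an assumption for when the division is not exact) and a random start within the first interval.

```python
import random

def systematic_sample(population, n, seed=None):
    """Take every k-th member after a random start, with k = N // n."""
    k = len(population) // n                  # sampling interval
    start = random.Random(seed).randrange(k)  # random starting point in [0, k)
    return population[start::k][:n]

picked = systematic_sample(list(range(1000)), n=50, seed=4)
print(picked[:5])
```

With N = 1,000 and n = 50, the interval is 20, so once the start is chosen every later pick is exactly 20 positions after the previous one.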

Sampling error

_____ is the difference between a sample statistic (the mean, variance, or standard deviation of the sample) and its corresponding population parameter (the true mean, variance, or standard deviation of the population).
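The sampling error of the mean can only be computed directly when the population parameter is known, so this sketch assumes a simulated, hypothetical population of 10,000 values (normal, mean 100, standard deviation 15) and one sample of 40.

```python
import random
import statistics

rng = random.Random(5)
population = [rng.gauss(100.0, 15.0) for _ in range(10_000)]  # simulated population
mu = statistics.fmean(population)   # population parameter (known only because it is simulated)

sample = rng.sample(population, k=40)
x_bar = statistics.fmean(sample)    # sample statistic
sampling_error = x_bar - mu
print(f"x_bar = {x_bar:.2f}, mu = {mu:.2f}, sampling error = {sampling_error:.2f}")
```

Drawing a different sample would give a different x̄ and therefore a different sampling error; the spread of those errors is what the standard error measures.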

Survivorship bias

_____ is the most common form of sample selection bias. A good example of the existence of survivorship bias in investments is the study of mutual fund performance. Most mutual fund databases, like Morningstar®'s, only include funds currently in existence—the "survivors." They do not include funds that have ceased to exist due to closure or merger.
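A toy illustration of the mutual fund example, using made-up annual returns for ten hypothetical funds. In this invented data the closed or merged funds are the weak performers, so averaging only the survivors overstates the group's performance.

```python
# Hypothetical annual returns (%) for ten funds; survived=False marks funds that
# closed or merged and would be missing from a survivors-only database.
funds = [
    (8.1, True), (6.4, True), (9.3, True), (7.0, True), (5.8, True),
    (-4.2, False), (-1.5, False), (7.7, True), (-6.0, False), (2.1, False),
]

def average(xs):
    return sum(xs) / len(xs)

all_funds = average([r for r, _ in funds])
survivors = average([r for r, alive in funds if alive])
print(f"all funds: {all_funds:.2f}%  survivors only: {survivors:.2f}%")
```

With these numbers the survivors average about 7.38% while all ten funds average 3.47%.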
