5. Special Distributions, the Sample Mean, the Central Limit Theorem


$\mu$, $\sigma^2$

A parameter is a constant indexing a family of distributions. Indexing a family of distributions means that the parameters allow you to distinguish between the distributions in the given family. Thus, giving you the family restricts the distribution to that set and then the parameter allows you to uniquely identify a distribution in that set.

C; Similar to the binomial, the hypergeometric distribution models a sequence of success/failure trials. The key difference is that the binomial models sampling with replacement (where the probability of success remains constant across draws), while the hypergeometric models sampling without replacement (where the probability of success changes from draw to draw, depending on the number of successes and failures already drawn). A standard deck of cards contains 26 red and 26 black cards. A hypergeometric distribution models the number of red cards in 10 draws from the deck when cards are not replaced after each draw. If cards were replaced after each draw, the sequence of 10 draws could be modeled using the binomial.

According to the description given in class, which of the following are likely to be characterized by the hypergeometric distribution? A. The number of red cards drawn from a regular deck of 52 cards, where cards are replaced in successive draws B. At an ice cream shop, the number of customers out of the next 100 that choose chocolate C. The number of red cards drawn from a regular deck of 52 cards, where cards are not replaced in successive draws D. The number of shots a basketball player makes out of the next 50 shot attempts

E(θ̂) = θ for all θ

An estimator is unbiased if:

$\sigma^2 / n$

For an i.i.d. sample of size n from a distribution with variance σ², which expression is the variance of the sample mean?
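The σ²/n result can be checked exactly on a small example. The sketch below is an illustration only, assuming a fair six-sided die as the underlying distribution (the die and the helper names `variance` and `sample_mean_variance` are not from the original cards). It enumerates every possible sample of size n and compares the variance of the sample mean with σ²/n using exact fractions.

```python
from fractions import Fraction
from itertools import product


def variance(values_with_probs):
    """Variance of a discrete distribution given as (value, probability) pairs."""
    mean = sum(p * v for v, p in values_with_probs)
    return sum(p * (v - mean) ** 2 for v, p in values_with_probs)


# Assumed example distribution: one roll of a fair die
faces = [Fraction(k) for k in range(1, 7)]
sigma2 = variance([(f, Fraction(1, 6)) for f in faces])


def sample_mean_variance(n):
    """Exact variance of the mean of n i.i.d. die rolls, by full enumeration."""
    prob = Fraction(1, 6 ** n)  # every ordered sample is equally likely
    means = [(sum(draw) / n, prob) for draw in product(faces, repeat=n)]
    return variance(means)


for n in (1, 2, 3, 4):
    print(n, sample_mean_variance(n), sigma2 / n)
```

Each printed row should show the same fraction twice, since the enumeration and the formula agree for every n tried.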

Parameter

For all families of distributions, estimation is trying to determine the specific ______ of a distribution.

No effect on the expectation of the sample mean

For an i.i.d. sample, how does the expectation of the sample mean of n random variables drawn from the distribution vary with n?

A, D; If we repeatedly sample n data points from U[0,θ] and compute 2*(sample mean) or 2*(sample median), the resulting values trace out a histogram that, in the limit of infinitely many samples of n points each, is symmetric and centered around θ. This is the intuition for why they are unbiased; more formally, the expectation of 2*(sample mean) or 2*(sample median) is θ. The maximum (the nth order statistic) is "consistent", i.e., it gets closer and closer to θ as n increases. However, it is not unbiased: any fixed sample of n points gives an nth order statistic that is never greater than θ and is almost surely less than θ, so its expectation over all samples of n points is less than θ. More intuitively, the nth order statistic can underestimate θ but has no chance of overestimating it; this is the sense in which it is biased downwards. Having R generate a single random value from the underlying distribution only reproduces that distribution under repeated sampling, so the expected value of this statistic is the expected value of U[0,θ], i.e., θ/2. (We could multiply the random value by 2 to get an unbiased estimator of θ, but its convergence properties are far worse than those of first taking the sample mean or median of several points and multiplying by 2.)

For the example distribution in lecture, which of the following methods would result in an unbiased estimate of θ? (Select all that apply.) A. Compute the median of the sample and multiply by 2 B. Compute the maximum (nth order statistic) of the sample C. Have R generate a random value from the underlying distribution. D. Compute the sample mean and multiply by 2
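A small simulation can make the bias comparison concrete. This is a sketch under assumed values (θ = 10, n = 5, 20000 repetitions, a fixed seed), none of which come from the lecture. It averages each estimator over many samples from U[0,θ]. For the maximum, the known expectation is nθ/(n+1), which is below θ.

```python
import random
import statistics

random.seed(0)   # fixed seed so the run is repeatable
theta = 10.0     # assumed true parameter, for illustration only
n = 5            # assumed sample size
reps = 20000     # number of repeated samples

est_mean, est_median, est_max = [], [], []
for _ in range(reps):
    sample = [random.uniform(0, theta) for _ in range(n)]
    est_mean.append(2 * statistics.fmean(sample))     # option D
    est_median.append(2 * statistics.median(sample))  # option A
    est_max.append(max(sample))                       # option B

avg_2mean = statistics.fmean(est_mean)
avg_2median = statistics.fmean(est_median)
avg_max = statistics.fmean(est_max)

print("2 * mean  :", round(avg_2mean, 2))
print("2 * median:", round(avg_2median, 2))
print("maximum   :", round(avg_max, 2), "theory:", round(n * theta / (n + 1), 2))
```

The first two averages should land near θ, while the average maximum should sit visibly below it, close to nθ/(n+1).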

C; With a higher lambda, the mass of the distribution moves right (in a more positive direction). In the diagram shown above, C shows the probability distribution with the largest lambda (λ). Note that as lambda (λ) increases, the Poisson distribution is increasingly well approximated by the normal distribution.

For the next question, take a look at the following three Poisson variables. Based on what you know about the visual representation of the probability distribution of a Poisson distribution, which of the three distributions has the highest lambda?

0.5; the variance is p * q = p(1 − p), which is largest at p = 0.5

For which value(s) p∈[0,1] does a Bernoulli variable with probability of success p have maximum variance?

0, 1

For which value(s) p∈[0,1] does a Bernoulli variable with probability of success p have minimum variance?
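Both Bernoulli variance answers can be confirmed by evaluating p(1 − p) on a grid. The sketch below is only an illustration; the grid spacing of 0.01 and the helper name `bernoulli_variance` are assumptions.

```python
def bernoulli_variance(p):
    """Variance of a Bernoulli(p) variable: p * q with q = 1 - p."""
    return p * (1 - p)


# Assumed grid of candidate probabilities 0.00, 0.01, ..., 1.00
grid = [i / 100 for i in range(101)]

p_max = max(grid, key=bernoulli_variance)
min_var = min(bernoulli_variance(p) for p in grid)
p_min = [p for p in grid if bernoulli_variance(p) == min_var]

print("maximum variance at p =", p_max, "value", bernoulli_variance(p_max))
print("minimum variance at p =", p_min, "value", min_var)
```

The maximum occurs at p = 0.5 and the minimum (zero variance) at the two endpoints, where the outcome is certain.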

If X is the waiting time for some event, then the probability distribution of the remaining waiting time is the same at t=0 as it is at t=1 or t=100, given that the event has not yet occurred.

In the context of the exponential distribution, what is meant by memorylessness?
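Memorylessness can be written as P(X > s + t | X > s) = P(X > t). The sketch below checks this numerically using the exponential survival function P(X > t) = exp(−rate · t). The rate of 0.5 and the time values are assumed for illustration.

```python
import math


def survival(t, rate):
    """P(X > t) for an exponential waiting time with the given rate."""
    return math.exp(-rate * t)


rate = 0.5  # assumed rate, for illustration only
t = 2.0     # additional waiting time of interest

unconditional = survival(t, rate)
conditionals = {}
for s in (0.0, 1.0, 100.0):
    # P(X > s + t | X > s) = P(X > s + t) / P(X > s)
    conditionals[s] = survival(s + t, rate) / survival(s, rate)
    print(s, conditionals[s], unconditional)
```

Every conditional probability should match the unconditional one up to floating-point rounding, however long we have already waited.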

.75

Let x=0.75. Without typing into your R console, what should you get for an output of qnorm(pnorm(0.75,lower.tail=TRUE),lower.tail=TRUE)?
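The reason the answer is 0.75 is that qnorm (the quantile function) is the inverse of pnorm (the CDF), so applying one after the other returns the starting value. As a rough Python analogue of the R call, assuming `statistics.NormalDist` from the standard library stands in for R's standard normal functions:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

x = 0.75
p = std_normal.cdf(x)           # analogue of pnorm(0.75, lower.tail=TRUE)
x_back = std_normal.inv_cdf(p)  # analogue of qnorm(p, lower.tail=TRUE)

print(round(p, 4), round(x_back, 6))
```

The intermediate value p is a probability between 0 and 1, and the round trip recovers x up to floating-point precision.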

.047; dhyper(5, 13, 39, 10)

Let's look more closely at the example of a deck of 52 cards, where 13 are clubs, 13 are diamonds, 13 are spades, and 13 are hearts. Suppose that you sample 10 cards from the deck without replacing the cards. What is the probability that exactly five of the cards are hearts?
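The value returned by dhyper(5, 13, 39, 10) can be reproduced from the counting formula C(13,5) · C(39,5) / C(52,10). The sketch below does this in Python with `math.comb`; the helper name `hypergeom_pmf` is hypothetical, and the binomial line is added only as a contrast with the with-replacement case.

```python
from math import comb


def hypergeom_pmf(k, successes, failures, draws):
    """P(exactly k successes) when drawing without replacement."""
    return (comb(successes, k) * comb(failures, draws - k)
            / comb(successes + failures, draws))


# 13 hearts, 39 other cards, 10 cards drawn without replacement
p_five_hearts = hypergeom_pmf(5, 13, 39, 10)

# Contrast: the same question with replacement is binomial with p = 13/52
p_binom = comb(10, 5) * (13 / 52) ** 5 * (39 / 52) ** 5

print(round(p_five_hearts, 4), round(p_binom, 4))
```

The two probabilities differ because, without replacement, each heart drawn makes the next heart less likely.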

A; While gamma (γ) is fixed, lambda (λ) is a function of the window of time that you are interested in. Recall that λ=γ∗t. λ will be 90*γ for the 90 minute window, 45*γ for the 45 minute window, and 30*γ for the 30 minute window. λ is smallest for the 30 minute window, which is represented by the probability mass function labeled A. This makes sense intuitively: you expect to see fewer shots on goal in 30 minutes than in 90 minutes!

Now suppose that we want to plot the number of shots on goal during a soccer match. Suppose that there is a fixed propensity for a shot on goal in any given minute of the match. Suppose that the total match is 90 minutes, and you want to plot three probability distributions that represent the total number of shots on goal in 30 minutes, 45 minutes, and 90 minutes. Suppose that the 3 curves represent the 3 distributions. Which one would represent the probability distribution for the 30 minute window?
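The relationship λ = γ · t can be illustrated numerically. The sketch below assumes a propensity of γ = 0.1 shots per minute (a made-up value, not from the course) and computes, for each window, the probability of zero shots and the mean of the Poisson distribution.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


gamma = 0.1             # assumed propensity of a shot on goal per minute
windows = (30, 45, 90)  # window lengths in minutes

# lambda grows linearly with the window length: lambda = gamma * t
lams = {t: gamma * t for t in windows}

# Probability of zero shots and the mean count for each window
p_zero = {t: poisson_pmf(0, lams[t]) for t in windows}
means = {t: sum(k * poisson_pmf(k, lams[t]) for k in range(60)) for t in windows}

for t in windows:
    print(t, round(lams[t], 2), round(p_zero[t], 4), round(means[t], 3))
```

The 30 minute window has the smallest λ, the largest probability of zero shots, and the probability mass sitting furthest to the left.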

θ usually refers to a parameter of the underlying distribution, while θ̂ refers to its estimate from a finite sample

Per the notational convention, what is the usual relationship between θ and θ̂?

D

Suppose a soccer team's goal-scoring X in each of their games follows a (fixed) Poisson distribution. What does P(X=2λ) capture? A. The probability that the soccer team will have to play at least twice the expected number of games needed to score their first goal B. The probability that in a game, the soccer team scores at least twice their expected number of goals C. The probability that the soccer team will have to play exactly twice the expected number of games needed to score their first goal D. The probability that in a game, the soccer team scores twice their expected number of goals

6; λ = γ * t = 0.2 * 30

Suppose that there is a one lane road where only one bicycle can pass through at any given point. Suppose that you know that the propensity to arrive in any given minute is 0.2. What is the expectation of the number of bicycles that will pass on the road in a 30 minute period?

p; The expectation of a Bernoulli variable X is given by p.

Suppose that you have a Bernoulli variable X with some probability of success given by p and some probability of failure given by q. The mean of X is given by:

True; A function of random variables must be a random variable. Being the arithmetic average of n random variables makes the sample mean a function of random variables, so the sample mean must be a random variable.

T/F; When the sample mean is defined as the arithmetic average of n random variables from a random sample of size n, the sample mean will also be a random variable.

Each $X_i$ need not be approximately normally distributed, but the sample mean $\bar{X} = \sum_i X_i / n$ will be approximately normally distributed.

The Central Limit Theorem (CLT) implies that were one to draw n observations X1, ..., Xn independently from the same distribution, then for reasonably large n...
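One way to see the CLT at work is to check that sample means from a clearly non-normal distribution behave like a normal variable. The sketch below is a simulation under assumed settings (uniform draws on [0, 1], n = 30, 20000 repetitions, a fixed seed). If the sample mean is approximately normal, about 95% of the sample means should fall within 1.96 standard errors of the true mean.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable
n = 30          # assumed sample size
reps = 20000    # number of repeated samples

mu = 0.5                  # mean of U[0, 1]
sigma = (1 / 12) ** 0.5   # standard deviation of U[0, 1]
se = sigma / n ** 0.5     # standard deviation of the sample mean

inside = 0
for _ in range(reps):
    xbar = statistics.fmean([random.random() for _ in range(n)])
    if abs(xbar - mu) <= 1.96 * se:
        inside += 1

coverage = inside / reps
print("share within 1.96 standard errors:", round(coverage, 3))
```

The individual draws are uniform, not bell-shaped, yet the share of sample means inside the band should be close to the normal benchmark of 0.95.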

False; The function of a random sample is the estimator. The realization of the function of the random sample is the estimate, so the estimates are the realizations of applying the estimators to random samples.

True or False: Estimators are the realizations of applying estimates to random samples.

False; any linear combination of i.i.d. normal random variables is itself normally distributed.

True or False: If you have a set of i.i.d. normal random variables, then any linear combination of these variables will follow a uniform distribution.

True; The expected value of a number drawn at random from a normal distribution is the mean of that distribution, so E(μ̂)=μ.

True or False: Suppose you are able to generate random numbers from a normal distribution N(μ,σ²) of unknown mean μ. If μ̂ is the random variable whose realizations are the individual numbers generated, μ̂ is an unbiased estimator for μ.

True

True or False: Taking a linear transformation of a normally-distributed random variable generates a normally-distributed random variable. In other words, if X1 is normally-distributed and X2=a+b∗X1 for b≠0, then X2 is also normally-distributed.

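False; unbiasedness can be shown by proving that E(θ̂)=θ holds for every possible value of θ, so the true value does not need to be known.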

True or False: To prove an estimator is unbiased, you need to know the value of the parameter it is trying to estimate.

True; A parameter is a constant indexing a family of distributions. Indexing a family of distributions means that the parameters allow you to distinguish between the distributions in the given family. Thus, giving you the family restricts the distribution to that set and then the parameter allows you to uniquely identify a distribution in that set.

True or False: You can uniquely identify a given distribution if you know the family of distributions it is from (e.g., normal, uniform, etc.) and the value of the relevant parameters for that family.

Standardization

What is the process of subtracting the mean of a distribution and dividing by the square root of its variance?
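Standardization is the transformation z = (x − μ) / σ, where σ is the square root of the variance. A minimal sketch, assuming a normal distribution with mean 100 and variance 225 (illustrative numbers only; the helper name `standardize` is hypothetical):

```python
from statistics import NormalDist


def standardize(x, mean, variance):
    """Subtract the mean and divide by the square root of the variance."""
    return (x - mean) / variance ** 0.5


mean, variance = 100.0, 225.0  # assumed example values
x = 130.0

z = standardize(x, mean, variance)

# Probabilities agree before and after standardizing
p_original = NormalDist(mean, variance ** 0.5).cdf(x)
p_standard = NormalDist().cdf(z)

print(z, round(p_original, 4), round(p_standard, 4))
```

After standardizing, questions about any normal variable can be answered with the standard normal distribution alone.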

B, C, D; The Poisson distribution characterizes a series of events where occurrences can be counted in whole numbers, the occurrences are independent, and the average frequency of occurrences for a given time period is known.

Which of the following are requirements for a series of events to be effectively modeled according to the Poisson distribution? (Select all that apply) A. The probability of the occurrence versus not happening is 50/50 B. Occurrences of the event must be countable and measurable C. Each of the events is independent D. The average frequency of occurrences is known for a certain time period

A; A binomial distribution refers to the number of successes in a sequence of n success/failure trials (meaning there are only two outcomes) where the probability of success is constant across trials and given by p.

Which of the following is the appropriate interpretation of the Binomial distribution? A. The number of successes in a sequence of n success/failure trials, each of which has the same probability of success, p B. The number of successes in a sequence of n success/failure trials, each of which may have a different associated probability, p1,...,pn C. The distribution of the different probabilities of success, p1,...,pn for a series of n success/failure trials D. The distribution of a series of trials, each with any number of outcomes 1,...,k and associated probabilities p1,...,pk

A, B

Which of the following is true about a sample mean? (Select all that apply) A. It can be described as the arithmetic average of n random variables from a random sample of size n. B. It can be described as the arithmetic average of the realizations of n random variables. C. It only applies to random variables from normal distributions D. It only applies to random variables from uniform distributions

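NaN; qnorm is the quantile function (the inverse of the CDF), so its argument must be a probability in the range 0 to 1. qnorm(2.1) returns NaN, and pnorm of NaN is NaN.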

Without typing into your R console, what should you get for an output of the R code pnorm(qnorm(2.1, lower.tail=TRUE))?

