Biometry midterm
How to write a concise statement of a test conclusion
At a significance level of 0.05, we do/do not reject the null hypothesis that ... (Z = , n = , p = ).
Which estimate is likely to be closer to the true mean? 89.0 lbs (Sample 1, n = 5); 79.5 lbs (Sample 2, n = 20); impossible to say.
79.5 lbs (Sample 2, n = 20). An estimate based on a larger sample is more likely to be closer to the true value of the parameter.
continuous random variable
A conceptual numerical quantity that, in some experiment involving randomness, will take one value from a continuous range. L is the lowest possible value (can be −∞). H is the highest possible value (can be ∞). Denoted by a Roman letter (X, Y, Z, ...). Examples: height, blood pressure, tail length, temperature, ...
induction
An induction is any statement about nature that is inferred from observations (often via statistical reasoning). Direction: observation to theory.
deduction
A deduction is any statement about nature that derives from a theory (often via mathematical reasoning). Direction: theory to observation.
Discrete PDF vs. Continuous PDF
A discrete variable has a probability distribution function (PDF), which uses probability for its y-axis. A continuous variable has a probability density function (PDF), which uses probability density for its y-axis.
The standard normal distribution
A normal distribution with mean 0 and standard deviation 1. A variable having this distribution is represented as Z.
Parameter:
A number that describes something about a population.
What is a quantile?
A quantile is a value at or below which a given proportion of a probability distribution lies. Example: The 0.10 quantile of X is the value q for which this statement is true: The probability that X ≤ q is 0.10.
Bernoulli trial
A random event with two possible outcomes, one arbitrarily defined as "success", the other as "failure". The probability of success is denoted p. The probability of failure is 1-p (or q).
How to carry out a two-sample t-test: you need two samples, one from each population
A sample of size nA from population A and a sample of size nB from population B.
Population:
All the individuals about which a statistical inference is to be made.
2) Using data from the sample, measure the value of an estimator. What is an estimator?
An estimator is a random variable whose value approximates the parameter's value.
What is the probability distribution of the average? (two parts)
Answer has two parts: What kind of distribution is it? (normal, binomial, uniform, etc.) What are its parameter values?
Examine sex of next 10,000 babies born in the US. Measure X, the number of females. X is a binomial random variable with n = 10,000 and p = 0.5 (we assume). Use the 2 standard deviation rule to fill in this sentence: "We predict there is a 95% chance that the number of females will lie between ______ and ______."
"We predict there is a 95% chance that the number of females will lie between 4900 and 5100."
Null hypothesis
(represented H0) Nothing interesting is happening. Conservative. Specific: Can be used to make probabilistic predictions.
4) Confidence interval of the difference in means for a two-sample t-test
$(\bar{X}_A - \bar{X}_B) \pm t_{\alpha(2),\nu}\, s_{\bar{X}_A - \bar{X}_B}$, where $\nu = n_A + n_B - 2$ and $s_{\bar{X}_A - \bar{X}_B} = \sqrt{s_p^2/n_A + s_p^2/n_B}$.
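A sketch of this confidence interval in Python, assuming equal population variances (the pooled-variance assumption) and using invented data. Python's standard library has no t distribution, so the sketch integrates the t density numerically with Simpson's rule and finds $t_{\alpha(2),\nu}$ by bisection; in practice you would use R's `qt()` or `scipy.stats.t.ppf`. All function names here (`t_pdf`, `t_cdf`, `t_critical`, `pooled_ci`) are hypothetical helpers, not part of any library.

```python
import math
import statistics


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry of the t PDF about 0."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # integral of the PDF from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def t_critical(alpha: float, df: float) -> float:
    """Two-sided critical value t_{alpha(2), df}, found by bisection on t_cdf."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def pooled_ci(a: list[float], b: list[float], alpha: float = 0.05):
    """(1 - alpha) CI of mu_A - mu_B, assuming equal population variances."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)) / df
    se_diff = math.sqrt(sp2 / n_a + sp2 / n_b)
    diff = statistics.mean(a) - statistics.mean(b)
    margin = t_critical(alpha, df) * se_diff
    return diff - margin, diff + margin, df


# Invented lengths (cm), for illustration only.
island = [10.2, 11.5, 9.8, 12.1, 10.9]
mainland = [9.1, 10.0, 8.7, 9.9, 9.5]
low, high, df = pooled_ci(island, mainland)
print(f"df = {df}, 95% CI of the difference: [{low:.2f}, {high:.2f}]")
```

With 5 observations per group, ν = 5 + 5 − 2 = 8, and the bisection recovers the tabulated critical value of about 2.306.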
What is the probability of 10 or more males, if sex ratio is truly equal?
$1 - F_X(9)$ gives the probability of 10 or more males.
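To make the notation concrete, here is a small Python sketch that computes $1 - F_X(9)$ exactly with fractions. It assumes n = 12 offspring and p = 1/2, as in the red deer example elsewhere in these cards; `binom_cdf` is our own helper, not a library function.

```python
from fractions import Fraction
from math import comb


def binom_cdf(x: int, n: int, p: Fraction) -> Fraction:
    """F_X(x) = P(X <= x) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


# Assumed setting: 12 offspring, equal sex ratio (p = 1/2).
p_10_or_more = 1 - binom_cdf(9, 12, Fraction(1, 2))
print(p_10_or_more, float(p_10_or_more))   # 79/4096 0.019287109375
```

The result is the same as adding P(10) + P(11) + P(12) directly: (66 + 12 + 1)/4096.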
If X is a normally distributed random variable with mean 69 and standard deviation 2.9, what R command gives the probability that X ≥ 72?
1-pnorm(72, 69, 2.9)
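The same probability in Python, as a sketch using the standard library's `statistics.NormalDist` (an equivalent of the R command, not part of the course material):

```python
from statistics import NormalDist

X = NormalDist(mu=69, sigma=2.9)
p_at_least_72 = 1 - X.cdf(72)   # same quantity as 1-pnorm(72, 69, 2.9) in R
print(round(p_at_least_72, 3))
```

Standardizing first gives the same answer: $P(X \ge 72) = 1 - \Phi\big((72 - 69)/2.9\big)$.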
Testing is focused on the null hypothesis
Basic question of a test: should the null hypothesis be rejected? Only two outcomes are possible: reject the null hypothesis, or do not reject the null hypothesis.
one-sample t-test
Calculate $\bar{X}$ and $s_{\bar{X}} = s/\sqrt{n}$ from a random sample of size n, then the test statistic $t = (\bar{X} - \mu_0)/s_{\bar{X}}$. Under H0, t follows a t distribution with n − 1 degrees of freedom. Calculate the P-value for the observed value of t, compare it to α, and reject H0 if P-value < α.
What kind of variable is this? The weight (in milligrams) of a seed collected from a palo verde tree.
Continuous numeric
Why do we pick 95% for our confidence interval?
Convention says 95% CI is a good balance between confidence and precision.
2-sample vs paired
Data are paired when each observation in one sample is uniquely associated with one observation in the other sample. Example: blood pressure measurements (mmHg) from brother/sister pairs. In an unpaired analysis, the gender effect may be obscured by variation among families.
Deductions are statements about
Deductions are statements about probability: i.e., quantitative measures of our certainty that an event will occur.
Estimation strategy: 1) Take a random sample from the population
Features of a random sample: All members of the population must have an equal chance of being sampled. Sampling must be independent: selection of one item does not influence another's selection.
The random variable X is the number of females in a random sample of 10 deer offspring. This species of deer produces female offspring with a probability of 0.8. What units are the variance of X?
females² (females squared)
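As sample size gets bigger, what happens to the effect size?
It does not systematically get bigger. The effect size $\bar{X} - \mu_0$ estimates the true difference $\mu - \mu_0$, so with a larger sample it becomes a more precise estimate of that difference. What shrinks is the standard error, which makes an effect of a given size easier to detect.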
Can you measure an entire population?
It is generally not possible to measure an entire population.
$\sigma_{\bar{X}}$
$\sigma_{\bar{X}}$ measures the precision of our estimate of the mean and is commonly called the standard error of the mean. It is really just the standard deviation of $\bar{X}$, the sample average.
t-test formula
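One-sample: $t = (\bar{X} - \mu_0)/s_{\bar{X}}$, where $s_{\bar{X}} = s/\sqrt{n}$. Two-sample: $t = (\bar{X}_A - \bar{X}_B)/s_{\bar{X}_A - \bar{X}_B}$ (the difference in means divided by the standard error of that difference, not by a difference of standard errors), where $s_{\bar{X}_A - \bar{X}_B} = \sqrt{s_p^2/n_A + s_p^2/n_B}$.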
Properties of the normal distribution
Mean = median = mode. Bell-shaped. Symmetrical. Extends to infinity in both directions.
Examples of parameters to estimate
Mean length of all monitor lizards in Australia. Mean length of all male, one-year-old monitor lizards on Kangaroo Island. Variance of the length of all monitor lizards in Australia. Mean survival times of Drosophila melanogaster raised in elevated oxygen. Probability that a crow egg yields a male offspring.
Question: Is the mean weight of Venezuelan capybaras 89 lbs? (Given that the sample average $\bar{X}$ of the capybaras is 89 lbs.)
No. The mean is the parameter for all Venezuelan capybaras, while 89 lbs is only the average of one sample. The sample average is not the same thing as the mean; it is an estimate of it.
Discrete Uniform Distribution formula
$P_X(x) = 1/N$, where N is the number of possible values of X. Example: X = outcome of one roll of a six-sided die, so $P_X(x) = 1/6$ for x = 1, 2, ..., 6.
Probability distribution of X displayed as a function
$P_X(x) = g(x)$, where $P_X(x)$ is the probability that the random variable X takes on the value x, and g(x) is some mathematical function of x.
Are these data paired or not? The alcohol intake of 9 men measured before and after liver disease diagnosis.
Paired
Paired t-test example
Question: Is there a difference in blood pressure between males and females?
Type I:
Reject H0 when it is actually true.
Two different scientists are interested in the weights of male bumblebees. Each one takes a random sample from the same population and uses it to estimate the mean and the 95% confidence interval of the mean. These are their results: Scientist A: estimated mean $\bar{X}$ = 0.60 g, 95% CI: 0.4 to 0.8 g. Scientist B: estimated mean $\bar{X}$ = 0.55 g, 95% CI: 0.5 to 0.6 g. Who probably had a larger sample size?
Scientist B, because B has a narrower confidence interval. The larger the sample size, the narrower the confidence interval, because a larger sample gives a more precise estimate. More data (higher n) means more certainty about the estimate of μ.
A researcher is studying the genetic basis of eye color in fruit flies. She wants to test a null hypothesis that predicts one-fourth of the offspring of a cross will have a mutant phenotype (brown eyes). She collects a random sample and uses it to calculate the 95% confidence interval of the proportion of brown-eyed flies. It is [0.11, 0.27]. If she uses a significance level of 5%, what will be the conclusion of her hypothesis test? (i.e., will she reject or not reject the null hypothesis?)
She should not reject the null hypothesis. The null hypothesis value, p = 0.25, lies inside the 95% CI [0.11, 0.27]. If the 95% CI of a parameter includes the null hypothesis value of the parameter, this is the same thing as saying that the null hypothesis is not rejected at significance level 0.05.
Review of steps in hypothesis testing
1) Specify null and alternative hypotheses. 2) Select a significance level α (acceptable Type I error). 3) Choose a test statistic: a single value to be calculated from data. 4) Collect data and calculate the test statistic. 5) Calculate the P-value: if it is less than α, reject the null hypothesis.
Define statistics and its two main areas
Statistics: the method of saying something about the real world based on observations influenced by randomness. Two main areas: using data to estimate unknown parameters, and using data to test hypotheses about unknown parameters.
Cumulative distribution function (CDF)
Symbolized $F_X(x)$. Gives the probability of outcomes less than or equal to x: $F_X(x) = P(X \le x)$.
Probability distribution function (PDF)
Symbolized $P_X(x)$. Gives the probability of the specific outcome x.
Estimation strategy
Take a random sample from the population. Using data from the sample, measure the value of an estimator. Calculate the precision of the parameter estimate.
Quick guide to estimating $\mu_X$, the mean of the random variable X
1) Take a random sample of n measurements of X. 2) Calculate the average $\bar{X}$ to estimate $\mu_X$. 3) Calculate $s_{\bar{X}}$, the standard deviation of $\bar{X}$: first calculate s, the sample standard deviation, then $s_{\bar{X}} = s/\sqrt{n}$. 4) Calculate the 95% confidence interval of $\mu_X$ using the 2 standard deviation rule: $[\bar{X} - 2s_{\bar{X}},\ \bar{X} + 2s_{\bar{X}}]$.
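The quick guide translated into a short Python sketch. The sample values are invented for illustration, and `estimate_mean` is a hypothetical helper; the sample standard deviation uses the n − 1 denominator.

```python
import math


def estimate_mean(sample: list[float]) -> dict:
    """Follow the quick guide: average, s, standard error, and 2-SD 95% CI."""
    n = len(sample)
    avg = sum(sample) / n                                         # X-bar estimates mu_X
    s = math.sqrt(sum((x - avg) ** 2 for x in sample) / (n - 1))  # sample SD (n - 1 denominator)
    se = s / math.sqrt(n)                                         # s_Xbar = s / sqrt(n)
    return {"n": n, "mean": avg, "s": s, "se": se,
            "ci95": (avg - 2 * se, avg + 2 * se)}                 # 2 standard deviation rule


# Invented weights (lbs), for illustration only.
weights = [82.0, 95.5, 78.0, 88.5, 91.0, 84.5, 79.0, 90.5]
result = estimate_mean(weights)
print(result)
```

Reporting `mean` together with `se` or `ci95` follows the rule of always including a measure of precision.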
Identify the binomial random variable, and give the values of n and p
The anemonefish lives in small groups that defend territories on coral reefs. This species has a male-biased sex ratio, with only 25% of fish being female. You plan to examine 20 territories, each of which has exactly four fish. In each territory, you will determine how many of the fish are female. Answer: X = the number of females in a territory; n = 4, p = 0.25.
How many degrees of freedom does a 2-sample t test have?
The answer is nA + nB − 2. Start with the total number of independent observations (nA + nB) and subtract the number of parameters you needed to estimate (two: $\mu_A$ and $\mu_B$).
Induction: Determine sex of 12 offspring of red deer mothers in good condition. 2 are female 10 are male Based on these data, is sex ratio male-biased when mothers are in good condition?
We cannot yet clearly answer this question. The sample is male-biased, but this could just be random variation around a truly equal sex ratio. We need quantitative methods to decide the best answer, hence this course!
The pooled variance gives the best estimate for the variance of each group
The pooled variance is based on all of the data, so it is more precise. Use the pooled variance to calculate standard errors for each mean.
The number of males in a random sample of 12 deer offspring is a binomial random variable What is the Bernoulli trial? What is "success"? What is the value of n? What is the value of p?
What is the Bernoulli trial? Each offspring's sex What is "success"? Offspring is male What is the value of n? 12 What is the value of p? ½ (assuming equal sex ratio)
Do we know the value of parameters?
The value of parameters is typically unknown. When carrying out deductive reasoning, we assume that we know the values of parameters. This enables us to calculate probabilities. In reality, we rarely know parameter values.
Definition of the variance of a discrete random variable:
The variance is based on the squared deviation of each possible value from the mean. The variance is the mean squared deviation of all possible values of the random variable, with each value weighted by its probability.
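A Python sketch applying the definitions of the mean and variance of a discrete random variable to one roll of a fair six-sided die (the discrete uniform example from these cards). Exact fractions avoid rounding.

```python
from fractions import Fraction

# One roll of a fair six-sided die: discrete uniform, P_X(x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# A valid discrete distribution: probabilities are non-negative and sum to 1.
assert all(px >= 0 for px in pmf.values()) and sum(pmf.values()) == 1

mean = sum(x * px for x, px in pmf.items())                    # sum of x * P_X(x)
variance = sum((x - mean) ** 2 * px for x, px in pmf.items())  # probability-weighted squared deviations
print(mean, variance)   # 7/2 35/12
```

The mean, 7/2, is not itself a possible roll, which illustrates that a mean need not be a possible value of the random variable.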
Variance of a continuous random variable
The variance is the mean squared deviation of all possible values of the random variable, with each value weighted by its probability density.
An experiment was carried out to test whether the mean body temperature of postoperative patients is normal. The investigators chose a significance level of 5% (α = 0.05). The null hypothesis was "𝜇 = 98.6 ºF". It was rejected with a P-value of 0.04. Is the following statement true or false? The null hypothesis would not have been rejected if the significance level had been set to α = 0.01.
True. With a smaller α, you are less tolerant of Type I error and therefore less likely to reject; because P = 0.04 is greater than 0.01, H0 would not have been rejected. The choice of α can affect whether the null hypothesis is rejected (rejection is more likely with a larger α).
When reporting a parameter estimate, always include a measure of precision
Two acceptable measures: the standard error of the mean ($s_{\bar{X}}$), or the 95% confidence interval of the mean.
Independence of events
Two events are independent if knowing whether one has occurred tells you nothing about whether the other will occur.
Hypothesis tests may be one-tailed or two-tailed: examples
Two-tailed: H0: p = 0.5 vs. HA: p ≠ 0.5 (p > 0.5 or p < 0.5). Are red deer offspring ratios equal, or are they biased toward one sex or the other? One-tailed: H0: p = 0.5 vs. HA: p > 0.5. Are red deer offspring ratios equal, or are they male-biased?
Question: Is mean body temperature of postoperative patients (X) different from normal temperature? Method: Measure the average temperature of a sample of 25 patients. Assumptions: Distribution of body temperature is normal. Standard deviation of body temperature is known (σ = 1.1°F).
Use a Z-test. First step: state the null and alternative hypotheses, H0: μ = 98.6 and HA: μ ≠ 98.6 (two-sided). Second step: choose the significance level of the test (α = 0.05). Third step: choose a test statistic, $Z = (\bar{X} - \mu_0)/\sigma_{\bar{X}}$, where $\bar{X}$ is the average temperature of a sample of n patients, $\mu_0$ is the mean temperature under the null hypothesis (98.6), and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ is the standard error of the mean. If H0 is true, Z has a normal distribution with mean 0 and standard deviation 1. Fourth step: collect data and calculate the test statistic. Measure the body temperature of 25 patients: $\bar{X} = 99.0$, σ = 1.1, $\mu_0 = 98.6$, so $\sigma_{\bar{X}} = 1.1/\sqrt{25} = 0.22$ and $Z = (99.0 - 98.6)/0.22 = 1.82$. Fifth step: calculate the P-value. If P-value < α, reject the null hypothesis.
Estimating the standard deviation (σ)
The sample standard deviation (s) is the best estimator of σ: $s = \sqrt{\sum_i (x_i - \bar{x})^2/(n - 1)}$.
One-sample t-test
Used to test whether the mean of a population, estimated from a single sample, differs from a specified value $\mu_0$ (for example, a known or hypothesized population mean). For hypotheses about the mean of a population.
Paired-sample t-test
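Used to compare means when the two samples are paired rather than independent (each observation in one sample is uniquely associated with one observation in the other). The test is a one-sample t-test on the within-pair differences, with H0: mean difference = 0.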
When is the Z-Test useful?
Useful only when the variance is known (usually not the case).
When is the t-Test useful?
Useful when variance is estimated from data. Much more realistic. Very widely used test.
Categorical:
Value describes membership in a group (qualitative). Ordinal: categories can be ordered. Example: a rating (low, medium, high). Nominal: categories have no inherent order. Examples: color, gender, sex.
Numerical:
Value is a numeric measurement (quantitative). Continuous: the variable can take any real-number value within some range. Examples: height, weight, length, time. Discrete: the variable can take only particular values within some range (often whole numbers). Examples: number of offspring, result of rolling 2 dice, the number of students in a class.
As ν increases, t gets
closer and closer to Z
Two-sample t-test
example: Is the mean size of island lizards different from that of mainland ones? Null and alternative hypotheses H0: μA − μB = 0 The mean lengths of island and mainland lizards are the same. HA: μA − μB ≠ 0 The mean lengths of island and mainland lizards are different.
How does a higher n affect the standard error of the mean and the 95% confidence interval?
higher n=Lower standard error of the mean=Smaller 95% confidence interval
mean
indicates the central tendency or location of the distribution.
variance
indicates the dispersion or width of the distribution.
quantile
is a value below which a given proportion of a probability distribution lies. The 0.10 quantile is the value q for which this statement is true: Prob(X < q) = 0.10
random process
is not perfectly predictable (typical in nature).
How does randomness affect induction?
makes induction difficult
Two key parameters that describe any probability distribution
mean and variance
The random variable X is the number of females in a random sample of 10 deer offspring. This species of deer produces female offspring with a probability of 0.8. What are the mean and variance of X?
Mean = np = 10 × 0.8 = 8 females. Variance = np(1 − p) = 10 × 0.8 × 0.2 = 1.6 females² (squared).
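A short Python check of these numbers, computing the mean and variance both from the formulas np and np(1 − p) and from the definitions (probability-weighted sums over the binomial PDF):

```python
import math

n, p = 10, 0.8   # 10 offspring, P(female) = 0.8
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

mean_formula = n * p                 # np
var_formula = n * p * (1 - p)        # np(1 - p), in females squared
sd = math.sqrt(var_formula)          # back in the original units (females)

# The same quantities from the definitions: probability-weighted sums.
mean_def = sum(k * pk for k, pk in enumerate(pmf))
var_def = sum((k - mean_def) ** 2 * pk for k, pk in enumerate(pmf))

print(mean_formula, round(var_formula, 2), round(sd, 2))   # 8.0 1.6 1.26
```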
Calculating the mean of a binomial random variable, n=3, p = 1/3
Mean = np = 3 × (1/3) = 1
Identify the binomial random variable, and give the values of n and p. Flower color of Four-o-clocks is determined by a single gene as follows: Genotype Color aa white Aa pink AA red If two pink flowers are crossed, white, pink and red offspring are expected in the proportions 1:2:1. You plan to inspect 80 offspring and count how many are red.
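X = the number of red offspring among the 80 inspected; n = 80, p = 1/4 (the expected proportion of AA offspring).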
Paired or not? The CO2 level at the entrance of 10 randomly chosen harvester ant colonies compared with the CO2 level near the bottom of the same colonies.
paired
sample variance
$s^2 = \sum_i (x_i - \bar{x})^2/(n - 1)$
The random variable X is the number of females in a random sample of 10 deer offspring. This species of deer produces female offspring with a probability of 0.8. What is the standard deviation?
The square root of the variance: $\sqrt{1.6} \approx 1.26$ females.
Induction relies on
statistics: the method of saying something about the real world on the basis of observations influenced by randomness.
What are the degrees of freedom and the standard error for the 2-sample confidence intervals?
$\nu = n_A + n_B - 2$; $s_{\bar{X}_A} = \sqrt{s_p^2/n_A}$; $s_{\bar{X}_B} = \sqrt{s_p^2/n_B}$
Examples of questions about the world can be expressed as questions about parameters
1. The parameter p gives the probability that a deer mother has male offspring. Does p = 0.5? 2. The parameter μdiet gives the mean weight gain for a child receiving the diet. The parameter μno_diet gives the mean weight gain for a child not receiving the diet. Does μdiet = μno_diet?
A fair coin is to be flipped three times. What is the probability that it lands heads all three times?
1/8
The Binomial Distribution
A binomial random variable X gives the number of successes in a fixed number n of independent Bernoulli trials, each with the same probability of success p.
Examples of null and alternative hypotheses for these questions: 1. The parameter p gives the probability that a deer mother has male offspring. Does p = 0.5? 2. The parameter μdiet gives the mean weight gain for a child receiving the diet. The parameter μno_diet gives the mean weight gain for a child not receiving the diet. Does μdiet = μno_diet?
Deer: null: p = 0.5; alternative: p ≠ 0.5. Diet: null: μdiet = μno_diet; alternative: μdiet > μno_diet.
What kind of variable is this? The number of Facebook friends of a randomly chosen person
Discrete numeric
A researcher wanted to know if astronauts increase in height during space voyages, due to the reduced gravitational force acting on them. She calculated the following 95% confidence interval for the change in astronaut height before and after a stay on the International Space Station: 95% CI: -3 to 37 mm Based on this confidence interval, should you reject the null hypothesis that there is no change in height, at a significance level of 0.05?
Do not reject. The null hypothesis value (no change, 0 mm) lies inside the 95% CI of −3 to 37 mm, so no further information is needed: the null hypothesis is not rejected at α = 0.05. In general, if the 1 − α CI of μ does not include μ0, the t-test will reject the null hypothesis at significance level α; if the CI does include μ0, the test will not reject it.
two basic tasks of statistical inference:
Estimate the true value of parameters that describe nature. Test hypotheses about the value of these parameters.
Which expression gives the probability that X lies between 67 and 70?
F(70) - F(67)
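A Python sketch of this expression. The card does not state the distribution of X, so the code assumes, for illustration only, that X is normal with mean 69 and standard deviation 2.9 (the values from the `pnorm` card).

```python
from statistics import NormalDist

# Assumption for illustration: X ~ Normal(mean 69, SD 2.9).
F = NormalDist(mu=69, sigma=2.9).cdf
p_between = F(70) - F(67)   # P(67 < X <= 70) = F(70) - F(67)
print(round(p_between, 3))
```

For a continuous variable the probability of exactly 67 is zero, so it does not matter whether the endpoints are included.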
Type II:
Fail to reject H0 when it is actually false.
An experiment was carried out to test whether the mean body temperature of postoperative patients is normal. The investigators chose a significance level of 5% (α = 0.05). The null hypothesis was "𝜇 = 98.6 ºF". It was rejected with a P-value of 0.04. Is the following statement true or false? The probability that the investigators committed a Type I error is 0.04.
False The P-value does not measure Type I error. The chance of Type I error is determined by α, which is chosen before the P-value is even calculated.
An experiment was carried out to test whether the mean body temperature of postoperative patients is normal. The investigators chose a significance level of 5% (α = 0.05). The null hypothesis was "𝜇 = 98.6 ºF". It was rejected with a P-value of 0.04. Is the following statement true or false? The probability that the investigators committed a Type II error is 0.04.
False Type I: Reject H0 when it is actually true. Type II: Fail to reject H0 when it is actually false. The P-value does not measure Type II error. It is used to make sure that Type I error is below an acceptable level.
An experiment was carried out to test whether the mean body temperature of postoperative patients is normal. The investigators chose a significance level of 5% (α = 0.05). The null hypothesis was "𝜇 = 98.6 ºF". It was rejected with a P-value of 0.04. Is the following statement true or false? The P-value is close to 𝛼. This means that mean body temperature is probably close to 98.6 ºF.
False. A P-value close to α says nothing by itself about how close the mean is to 98.6 ºF; for that we need the effect size, and to interpret it we also need the sample size. The effect size is the difference between the null hypothesis value ($\mu_0$) and the best estimate of the true value ($\bar{X}$), best calculated as $\bar{X} - \mu_0$. The same P-value may go with a small effect in a large sample or a large effect in a small sample.
t-test steps
First and second steps are the same as for the Z-test. First step: state hypotheses. Second step: choose the significance level. Third step: choose a test statistic, $t = (\bar{X} - \mu_0)/s_{\bar{X}}$, where $s_{\bar{X}} = s/\sqrt{n}$ is the estimate of $\sigma_{\bar{X}}$. Fourth step: collect data and calculate the test statistic. Fifth step: calculate the P-value and compare it to α. Conclusion example: At a significance level of 0.05, we do not reject the null hypothesis that mean body temperature of postoperative patients equals 98.6°F (t24 = −1.88, p = 0.07).
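The P-value step can be sketched in Python. Because the standard library has no t distribution, the sketch integrates the t density numerically (Simpson's rule); R's `pt()` or `scipy.stats.t.cdf` would normally be used instead. `t_pdf`, `t_cdf` and `one_sample_t` are hypothetical helpers; the last lines reproduce the P-value in the conclusion example from t = −1.88 with 24 degrees of freedom.

```python
import math
import statistics


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry of the t PDF about 0."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # integral of the PDF from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def one_sample_t(sample: list[float], mu0: float) -> tuple[float, int, float]:
    """Return (t, degrees of freedom, two-sided P-value) for H0: mu = mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # s_Xbar = s / sqrt(n)
    t = (statistics.mean(sample) - mu0) / se
    df = n - 1
    return t, df, 2 * (1 - t_cdf(abs(t), df))


# Reproduce the P-value of the conclusion example: t = -1.88 with 24 df (two-sided).
p_example = 2 * t_cdf(-1.88, 24)
print(round(p_example, 2))   # 0.07
```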
Steps of the Z-test
First step: state null and alternative hypotheses. Second step: choose the significance level of the test (α = 0.05). Third step: choose a test statistic, $Z = (\bar{X} - \mu_0)/\sigma_{\bar{X}}$, where $\bar{X}$ is the average, $\mu_0$ is the mean under the null hypothesis, and $\sigma_{\bar{X}}$ is the standard error of the mean. Fourth step: collect data and calculate the test statistic. Fifth step: calculate the P-value; if P-value < α, reject the null hypothesis. With Z = 1.82, the probability in each tail is 0.034, so for this two-sided test the P-value = 0.034 + 0.034 = 0.068. The hypothesis that mean body temperature is 98.6 ºF is not rejected. Final conclusion: At a significance level of 0.05, we do not reject the null hypothesis that mean body temperature of postoperative patients equals 98.6°F (Z = 1.82, n = 25, p = 0.068).
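The Z-test numbers from the body temperature example, reproduced in a Python sketch with `statistics.NormalDist`. Without intermediate rounding the two-sided P-value comes out at about 0.069; the 0.068 above comes from doubling the rounded tail area 0.034. The conclusion is the same either way. `z_test` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def z_test(xbar: float, mu0: float, sigma: float, n: int) -> tuple[float, float]:
    """Return (Z, two-sided P-value) when the population SD sigma is known."""
    se = sigma / math.sqrt(n)                   # sigma_Xbar = sigma / sqrt(n)
    z = (xbar - mu0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))      # area in both tails
    return z, p


z, p = z_test(xbar=99.0, mu0=98.6, sigma=1.1, n=25)
alpha = 0.05
print(round(z, 2), round(p, 3))
print("reject H0" if p < alpha else "do not reject H0")   # do not reject H0
```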
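Calculating $s_{\bar{X}_A - \bar{X}_B}$ for a 2-sample t-test
First, make this assumption: the two populations have the same variance; represent both with the same symbol, σ². Next, estimate σ². The best estimate of σ² is the pooled variance, $s_p^2 = \frac{(n_A - 1)s_A^2 + (n_B - 1)s_B^2}{n_A + n_B - 2}$. Finally, calculate $s_{\bar{X}_A - \bar{X}_B} = \sqrt{s_p^2/n_A + s_p^2/n_B}$.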
Ideal plan for taking a random sample from a population of size N
Give each member of the population a unique integer ID (1, 2, 3, 4, ..., N). Pick a sample size, n. Use a random number generator to get n random integers between 1 and N. Sample the units indicated by the numbers.
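The ideal sampling plan as a Python sketch. `random.sample` draws without replacement, so no member is picked twice (an assumption: the card does not say how repeated numbers are handled). The population size, sample size and seed below are arbitrary, and `random_sample_ids` is a hypothetical helper.

```python
import random


def random_sample_ids(N: int, n: int, seed=None) -> list[int]:
    """Pick n distinct IDs from 1..N; every member has the same chance of selection."""
    rng = random.Random(seed)                       # seeded for reproducibility (optional)
    return sorted(rng.sample(range(1, N + 1), n))   # sampling without replacement


ids = random_sample_ids(N=500, n=10, seed=1)
print(ids)   # sample the population units with these IDs
```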
Testing strategy places priority on minimizing Type I error
Goal of strategy: Set an acceptable level of committing a Type I error. Standard level is 5%. However, other values are possible: the choice is up to you.
What are histograms used for?
Histograms are a useful way to display a random sample. A histogram can also be scaled in terms of probability density.
Inspection of the graph suggests no difference (although a statistical test should be done to find out more certainly). Now the data are identified as brother/sister pairs:
Inspection of the graph suggests a small effect of sex and a larger effect of family. A difference is more likely here than in the previous plot, because almost all of the lines have a negative slope. Siblings share genes and much of their environment, whereas unrelated people may differ genetically or in habits such as exercise. Comparing within families removes much of that variation, so the data are less noisy and the effect of sex is easier to see.
Is there a difference in blood pressure between males and females?
Inspection of the graph suggests no difference (although a statistical test should be done to find out more certainly).
Central question for 2-sample test:
Is the true difference between $\mu_A$ and $\mu_B$ the same as the difference proposed by H0?
A researcher used a random sample to estimate the mean flash duration of a species of firefly. She calculated the following 95% confidence interval for the mean: [87.9 msec, 90.1 msec] She then wrote the following about this interval: "There is a 95% chance that the actual mean is between 87.9 and 90.1, a 2.5% chance it is less than 87.9, and a 2.5% chance that it is greater than 90.1." Is this a correct interpretation of a confidence interval?
No. A correct statement would be: "There is a 95% chance that the interval from 87.9 to 90.1 includes the true value of the mean." Probability statements apply to random variables. The mean is a parameter, so it does not vary; what varies from sample to sample is the average, and with it the interval. The interval may include the mean, or it may not.
In the vermilion flycatcher, males are brightly colored and they sing frequently and prominently. Females are dull-colored and generally quiet. A researcher wanted to estimate the proportion of each sex in a forest population. To do so, she walked through the forest and noted how many individuals of each sex she detected. Is this sample likely to be a random sample?
No. Males draw more attention because they are more brightly colored and sing more frequently than females, so females are more difficult to detect and would be under-represented in the sample.
Which estimate is correct? 89.0 lbs (Sample 1, n=5) 79.5 lbs (Sample 2, n=20) Neither.
Neither. Both are just estimates. Neither one is likely to be actually equal to the true value of the mean.
You plan to take a random sample and carry out a hypothesis test. What effect does increasing the sample size n have on the probability that you will falsely reject the null hypothesis (Type I error)? A. Reduce B. Increase C. No effect
No effect. Key point: the Type I error rate is set by the researcher's choice of α. It does not depend on sample size or anything else about the data.
Does the following random variable follow a binomial distribution? The number of red balls out of 10 drawn one by one from a vat of 50 red and 50 green balls.
No, because the trials are not independent. The trial outcomes are not independent, because removing each ball changes the probability that the next ball is red.
Does the following random variable follow a binomial distribution? The number of red balls drawn from a vat of 50 red and 50 green balls, before the first green ball is drawn (the balls are replaced and mixed after each draw).
No. The value of n is not fixed, but instead depends on the outcome of each trial.
What kind of variable is this? The species of a beetle collected at a light trap.
Nominal categorical
The CO2 level of 10 randomly chosen harvester ant colonies compared with that of 10 randomly chosen honeypot ant colonies.
Not paired
The alcohol intake of 9 women who have had liver disease compared with the intake of 9 women who have not had liver disease.
Not paired
Is the following better as a null hypothesis or an alternative hypothesis? King cheetahs on average run the same speed as standard spotted cheetahs.
Null, because it states that there is no difference: the speeds are equal. The alternative would be that they are not equal.
What kind of variable is this? A subjective rating given to a restaurant (poor, satisfactory, good, very good, etc.).
Ordinal categorical
Example of 2-sample t-test conclusion
P = 2*(1-pt(t, ν)) = 2*(1-pt(2.11, 18)) = 0.049 At a significance level of 0.05, we reject the null hypothesis that means are equal, in favor of the alternative that mainland lizards are larger (t-test: t18 = 2.11, P = 0.049).
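A Python sketch of this P-value calculation, mirroring `2*(1-pt(t, ν))`. The t CDF is computed by numerical integration (Simpson's rule) because the standard library lacks one; `t_pdf`, `t_cdf` and `two_sample_t` are hypothetical helpers, the last showing how t itself would be obtained from two samples using the pooled variance.

```python
import math
import statistics


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry of the t PDF about 0."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # integral of the PDF from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def two_sample_t(a: list[float], b: list[float]) -> tuple[float, int, float]:
    """Pooled-variance two-sample t-test of H0: mu_A - mu_B = 0 (two-sided)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    sp2 = ((n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)) / df
    se_diff = math.sqrt(sp2 / n_a + sp2 / n_b)
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, df, 2 * (1 - t_cdf(abs(t), df))


p = 2 * (1 - t_cdf(2.11, 18))   # mirrors 2*(1-pt(2.11, 18))
print(round(p, 3))   # 0.049
```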
How is the sample average different from the mean?
The average is a random variable. It varies from case to case, according to some probability distribution. It is directly observable, represented by a Roman letter ($\bar{X}$), and calculated from data (a random sample). The mean is a parameter. It has a constant value that does not vary probabilistically and is usually unknown. It is represented by a Greek letter (μ) and is calculated from the probability distribution, not from data.
Effect size
The difference between the null hypothesis value ($\mu_0$) and the best estimate of the true value ($\bar{X}$). Effect size is best calculated as $\bar{X} - \mu_0$.
In a study of heart rate in ocean-diving birds, researchers harnessed ten randomly sampled wild-caught cormorants to a laboratory device that monitored heart rate. Each cormorant was subjected to six artificial "dives" over the following week. The researchers claimed that this method gave them a random sample of 60 measurements of heart rate. Why are they wrong? That is, which criterion of random sampling does their sample violate?
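The measurements are not independent. The six dives from each cormorant are repeated measurements on the same bird, so the sample is really 10 individuals, not 60 independent observations. For a random sample, each measurement must be statistically independent, and every member of the population must have the same chance of being sampled.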
What does the exact shape of the t PDF depend on?
The exact shape of the t PDF depends on ν, the degrees of freedom of s.
The results of the test show that the P-value equals 0.068. Earlier, we chose a significance level α of 0.05. What conclusion should we draw?
The hypothesis that mean body temperature is 98.6 ºF is not rejected.
Alternative hypothesis
The hypothesis that states there is a difference between two or more sets of data. (represented HA) Something interesting is happening. Requires rejection of conservative assumptions. Often non-specific: Hard to make probabilistic predictions.
Mean of a continuous random variable
The mean is the integral of all possible values of the random variable, with each value weighted by its probability density.
Definition of the mean of a discrete random variable:
The mean is the sum of all possible values of X, with each value weighted by its probability: $\mu_X = \sum_x x\,P_X(x)$.
random facts about mean
The mean of X is generally written as μX, or plain μ. The mean does not have to be a possible value of the random variable (e.g., a mean of 1.9 children per family).
When do the mean and median differ?
The median differs from the mean when a distribution is skewed, or asymmetrical.
Median:
The middle value of the distribution.
A study set out to estimate the mean age of piñon pine trees in the coast ranges of California. Researchers used a computer to randomly select a location from a distribution map of the species in California. They went to the location and marked out a ten-hectare square plot. They then proceeded to measure the age of every piñon pine tree within the plot. They used the average age within the plot to estimate the mean age of the whole California population. What is the population of interest in this study? Were the trees sampled randomly from this population? Why or why not?
The population of interest is the piñon pine trees in the coast ranges of California. The trees were not sampled randomly or independently, because all of them came from a single location. Trees within one plot are likely to have similar ages, and trees do not move, so the population is not well mixed.
Two requirements for a discrete probability distribution
The probabilities must add up to 1. Each probability must be non-negative.
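The two requirements can be checked mechanically. This is a small sketch with a hypothetical helper `is_valid_pmf`; it compares the sum with a tolerance, assuming the probabilities are floating-point numbers that may not add to exactly 1.

```python
import math


def is_valid_pmf(probs):
    """Return True if the probabilities meet both requirements of a discrete distribution."""
    probs = list(probs)
    all_non_negative = all(p >= 0 for p in probs)
    # Compare with a tolerance because floating-point sums are rarely exact
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=1e-9)
    return all_non_negative and sums_to_one


print(is_valid_pmf([0.25, 0.5, 0.25]))  # True
print(is_valid_pmf([0.5, 0.6, -0.1]))   # False (a negative probability)
print(is_valid_pmf([0.3, 0.3]))         # False (sums to 0.6)
```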
We gather data on 12 offspring and observe 10 males and 2 females. If the sex ratio is equal, what is the probability of seeing a result this male-biased? (Add the probabilities of 10, 11, and 12 males.)
The probability of this many (or more) males is rather low if sex ratios are equal.
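A sketch of the calculation, assuming the number of males follows a binomial distribution with n = 12 and p = 0.5 under an equal sex ratio. It adds the binomial probabilities of 10, 11, and 12 males; `binom_pmf` is a hypothetical helper name built on `math.comb`.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 12, 0.5  # 12 offspring, equal sex ratio
# "This male-biased" means 10, 11, or 12 males
p_male_biased = sum(binom_pmf(k, n, p) for k in range(10, n + 1))
print(round(p_male_biased, 4))  # 0.0193
```

The exact value is (66 + 12 + 1)/4096 = 79/4096, which is indeed low.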
Multiplicative rule of probability
The probability that several independent events all occur is simply the product of the probabilities of each event. Example: a fair coin is to be flipped three times. The probability that it lands heads all three times is 1/2 × 1/2 × 1/2 = 1/8.
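The multiplicative rule can be written as a product and checked against a quick simulation of the three-flip example. This is an illustrative sketch: `prob_all_occur` is a hypothetical name, and the simulated proportion will only be close to 1/8, not equal to it.

```python
import random
from fractions import Fraction
from math import prod


def prob_all_occur(probs):
    """Multiplicative rule: probability that several independent events all occur."""
    return prod(probs)


print(prob_all_occur([Fraction(1, 2)] * 3))  # 1/8

# Rough simulation check: flip a fair coin three times, many times over
rng = random.Random(1)
trials = 100_000
all_heads = sum(all(rng.random() < 0.5 for _ in range(3)) for _ in range(trials))
print(all_heads / trials)  # close to 0.125
```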
Estimating $\mu_X$, the mean of X
The sample average ($\bar{X}$) is the best estimator of $\mu_X$: add up the data and divide by the sample size.
How would you estimate the variance with a random sample of size n?
The sample variance ($s^2$) is the best estimator of $\sigma^2$: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$.
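A minimal sketch of the sample-variance formula, dividing by n − 1 rather than n. The data values and the helper name `sample_variance` are hypothetical; Python's built-in `statistics.variance` uses the same n − 1 divisor, so it serves as a cross-check.

```python
import statistics


def sample_variance(data):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements; mean is 5
print(sample_variance(data))      # 32/7, about 4.571
print(statistics.variance(data))  # also divides by n - 1
```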
Is the standard deviation or the variance more useful?
The standard deviation is often more useful than the variance, because it is measured in the original units.
Mode:
The value of X that has the highest probability density (continuous variables).
Example: In the US, 1/3 of people have O-positive blood. You plan to determine the blood type of a random sample of 3 people. The binomial random variable X is the number of people in the sample who have O-positive blood. What is the Bernoulli trial? What is "success"? What is the value of n? What is the value of p?
What is the Bernoulli trial? Determining each person's blood type. What is "success"? The blood type is O-positive. What is the value of n? 3. What is the value of p? 1/3.
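With n = 3 and p = 1/3, the full binomial distribution of X can be tabulated. This sketch uses exact fractions so the probabilities print as simple ratios; the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

n, p = 3, Fraction(1, 3)  # 3 people sampled; P(O-positive) = 1/3

# Binomial probability of k O-positive people out of n
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
for k, prob in pmf.items():
    print(k, prob)
# 0 8/27
# 1 4/9
# 2 2/9
# 3 1/27
```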
If the probability of being 66" tall is zero, does that mean people who say they are 66" tall are wrong?
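For a continuous variable, the probability of any single exact value is zero, but the probability of an interval is not. When someone says they are 66 inches tall, they do not mean exactly 66", but somewhere between 65.5" and 66.5", and $\mathrm{Prob}(65.5 \le \text{Height} < 66.5) > 0$ (height in inches).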
two standard deviation rule
If X has mean μ and standard deviation σ, and the distribution of X is unimodal and roughly symmetrical, then Prob(μ − 2σ ≤ X ≤ μ + 2σ) ≈ 95%.
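The normal distribution is one unimodal, symmetrical case, so it gives a quick check of the rule. This sketch uses `statistics.NormalDist` from the Python standard library; the mean of 50 and standard deviation of 10 are arbitrary, and the result does not depend on them.

```python
from statistics import NormalDist

# Arbitrary normal distribution; any mu and sigma give the same answer
X = NormalDist(mu=50, sigma=10)
mu, sigma = X.mean, X.stdev
within_2sd = X.cdf(mu + 2 * sigma) - X.cdf(mu - 2 * sigma)
print(round(within_2sd, 4))  # 0.9545
```

For distributions that are unimodal and symmetrical but not normal, the proportion is only approximately 95%.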
Exact confidence interval of the mean with t-value
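$\bar{X} \pm t_{\alpha(2),\nu}\, s_{\bar{X}}$, where $\bar{X}$ is the estimator, $t_{\alpha(2),\nu}$ is the two-tailed critical value of t for significance level $\alpha$ with $\nu$ degrees of freedom, and $s_{\bar{X}}$ is the standard deviation of the estimator (its standard error).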
Does the following random variable follow a binomial distribution? The number of red balls out of 10 drawn one by one from a vat of 50 red and 50 green balls, if the balls are replaced and mixed after each draw.
Yes. Replacement of the balls ensures that the trials are independent.
Does the following random variable follow a binomial distribution? The number of red-eyed flies among 200 fruit flies drawn at random from a large population having both red- and black-eyed flies.
Yes. Strictly speaking, the trials are not independent, because the flies are not replaced. However, for a large population the effect is likely trivial and can be ignored.
parameter
a fixed constant that determines the form of the distribution. The parameters of the binomial distribution are n and p.
Two-sample t-test
a hypothesis test for answering questions about the mean where the data are collected from two random samples of independent observations, each from an underlying normal distribution. For hypotheses about the difference between the means of two populations.
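A sketch of the two-sample t statistic, assuming the equal-variance (pooled) version of the test; the card does not say which version the course uses. The helper `two_sample_t` and both samples are hypothetical. The statistic is compared with a t distribution on $n_A + n_B - 2$ degrees of freedom.

```python
import statistics
from math import sqrt


def two_sample_t(a, b):
    """Two-sample t statistic for H0: mu_A = mu_B, assuming equal variances (pooled)."""
    n_a, n_b = len(a), len(b)
    df_a, df_b = n_a - 1, n_b - 1
    # Pooled sample variance: df-weighted average of the two sample variances
    s2_pooled = (df_a * statistics.variance(a) + df_b * statistics.variance(b)) / (df_a + df_b)
    # Standard error of the difference between the two sample means
    se_diff = sqrt(s2_pooled * (1 / n_a + 1 / n_b))
    t = (statistics.fmean(a) - statistics.fmean(b)) / se_diff
    return t, df_a + df_b


# Hypothetical samples from populations A and B
sample_a = [5.1, 4.9, 5.6, 5.8, 6.0]
sample_b = [4.2, 4.8, 4.5, 5.0, 4.4]
t, df = two_sample_t(sample_a, sample_b)
print(f"t = {t:.3f}, df = {df}")
```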
Random variables
are conceptual numerical quantities that, in some experiment involving randomness, take on a particular value from some probability distribution. Varies from case to case, according to some probability distribution. Represented by Roman letter. Directly observable. Examples: numbers of female offspring, average length of ten lizards, growth rates of plants, n, etc., etc.
Parameters
are constants that characterize the probability distribution followed by a random variable. Constant value: does not vary probabilistically. Represented by Greek letter (usually). Usually unknown. Examples: binomial parameter (p); mean (μ) and variance (σ²) of a probability distribution.
deterministic process
always produces the same outcome (rare to non-existent in nature)
Does variance or range give more information?
Variance. It uses every observation (through each value's squared deviation from the mean), whereas the range depends only on the two most extreme values.
mean shortcut formula
μ = np (binomial distribution)
variance shortcut formula
σ² = np(1 − p) (binomial distribution)
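The shortcut formulas can be checked by brute force: compute the mean and variance directly from the full binomial distribution and compare. This sketch reuses the O-positive example (n = 3, p = 1/3) and assumes the usual definition of variance, $\sigma^2 = \sum_x (x - \mu)^2 P(x)$.

```python
from fractions import Fraction
from math import comb

n, p = 3, Fraction(1, 3)
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# Direct definitions: probability-weighted sums over every possible value
mean = sum(k * pk for k, pk in pmf.items())
variance = sum((k - mean) ** 2 * pk for k, pk in pmf.items())

print(mean, n * p)                # 1 1
print(variance, n * p * (1 - p))  # 2/3 2/3
```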
What is the probability that X lies between a and b?
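$\mathrm{Prob}(a < X < b) = F(b) - F(a)$, where $F$ is the cumulative distribution function of X. The difference between $F(b)$ and $F(a)$ gives the area under the probability density curve between a and b, i.e., the probability that X lies between a and b.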
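As an illustration, take the standard normal variable Z with a = −1 and b = 1 (the interval is an arbitrary choice). This sketch uses the `cdf` method of Python's `statistics.NormalDist` as F.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal distribution
a, b = -1.0, 1.0
prob = Z.cdf(b) - Z.cdf(a)     # F(b) - F(a)
print(round(prob, 4))  # 0.6827
```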
If X is a random variable and $\bar{X}$ is an average based on n measurements of X ...
$\bar{X}$ has the same mean as X and a lower variance than X.
How well does a given average estimate the mean?
$\bar{X}$ is the average of the random variable X for a sample of size n, and $\bar{X}$ is an estimator of $\mu_X$. On average, how close is $\bar{X}$ to $\mu_X$? The answer requires knowledge of the probability distribution of $\bar{X}$.
Estimate the mean with this data: Data (lbs): 90, 103, 38, 75, 72, 68, 115, 97, 67, 65, 103, 67, 98, 91, 47, 86, 80, 74, 85, 69
$\bar{X} = (90 + 103 + 38 + 75 + \dots + 85 + 69)/20 = 79.5$
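The same estimate in Python, using the 20 weights listed in the question: add up the data and divide by the sample size.

```python
data = [90, 103, 38, 75, 72, 68, 115, 97, 67, 65,
        103, 67, 98, 91, 47, 86, 80, 74, 85, 69]  # lbs
x_bar = sum(data) / len(data)  # 1590 / 20
print(x_bar)  # 79.5
```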
3) Confidence intervals of the means for a two-sample t-test
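$\bar{X}_A \pm t_{\alpha(2),\nu}\, s_{\bar{X}_A}$, where $s_{\bar{X}_A} = \sqrt{s_A^2/n_A}$ and $n_A$ is the sample size for population A. $\bar{X}_B \pm t_{\alpha(2),\nu}\, s_{\bar{X}_B}$, where $s_{\bar{X}_B} = \sqrt{s_B^2/n_B}$ and $n_B$ is the sample size for population B.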
1) Confidence interval of the mean for a one-sample t-test
$\bar{X} \pm t_{\alpha(2),\nu}\, s_{\bar{X}}$, where $\nu = n - 1$ and $s_{\bar{X}} = \sqrt{s^2/n}$
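A sketch of a 95% confidence interval (α = 0.05) for the mean, assuming the 20 weights from the estimation question are a random sample from a normal population. The standard library has no t quantile function, so the critical value $t_{0.05(2),19} = 2.093$ is hard-coded from a t table; with SciPy installed, `scipy.stats.t.ppf(0.975, 19)` gives the same value.

```python
import statistics
from math import sqrt

data = [90, 103, 38, 75, 72, 68, 115, 97, 67, 65,
        103, 67, 98, 91, 47, 86, 80, 74, 85, 69]  # lbs
n = len(data)
x_bar = statistics.fmean(data)
s2 = statistics.variance(data)  # sample variance, divides by n - 1
se = sqrt(s2 / n)               # standard error: sqrt(s^2 / n)
t_crit = 2.093                  # t_{0.05(2),19} from a t table (nu = n - 1 = 19)
lower, upper = x_bar - t_crit * se, x_bar + t_crit * se
print(f"95% CI: {lower:.2f} to {upper:.2f}")  # 95% CI: 70.53 to 88.47
```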
2) Confidence interval of the mean difference in a paired t-test
$\bar{d} \pm t_{\alpha(2),\nu}\, s_{\bar{d}}$, where $\nu = n - 1$ and $s_{\bar{d}} = \sqrt{s_d^2/n}$ (n is the number of pairs)
Of 12 deer offspring, 10 are males. What is the probability of 10 or fewer males if the sex ratio is truly equal? Use the cumulative distribution function to get the answer:
$F_X(10)$ gives the probability of 10 or fewer males.
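A sketch of the binomial cumulative distribution function, assuming n = 12 and p = 0.5. $F_X(x)$ sums the probabilities of 0 through x males. The hypothetical helper `binom_cdf` also links back to the earlier male-bias question, since $P(X \ge 10) = 1 - F_X(9)$.

```python
from math import comb


def binom_cdf(x, n, p):
    """F_X(x): probability of x or fewer successes in n trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


F10 = binom_cdf(10, 12, 0.5)
print(round(F10, 4))  # 0.9968

# Probability of 10 or more males, via the complement of F_X(9)
print(round(1 - binom_cdf(9, 12, 0.5), 4))  # 0.0193
```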
If X is normally distributed, then $\bar{X}$ is normally distributed with:
$\mu_{\bar{X}} = \mu$, $\sigma_{\bar{X}}^2 = \sigma^2/n$, $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
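A simulation sketch of these formulas: draw many samples of size n from a normal population, average each sample, and look at the mean and standard deviation of those averages. The population values μ = 10, σ = 2, and n = 25 are hypothetical, so $\sigma_{\bar{X}}$ should come out near 2/√25 = 0.4. Simulated results are close to, not exactly, the theoretical values.

```python
import random
import statistics

rng = random.Random(42)
mu, sigma, n = 10.0, 2.0, 25  # hypothetical population and sample size
reps = 20_000                 # number of simulated samples

# One sample average per replicate
means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

print(statistics.fmean(means))  # close to mu = 10
print(statistics.stdev(means))  # close to sigma / sqrt(n) = 0.4
```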