322 Final Exam


what do the interval notations (a,b] and [a,b) mean?

(a,b] is left-open and right-closed: a is excluded and b is included. [a,b) is left-closed and right-open: a is included and b is excluded.

F-test for variance ratios

(also referred to as the test for homogeneity of variance) The F-distribution: the ratio between two sample variances follows the F distribution.

Continuous

(can take any real-number value) Core body temperature (e.g., degrees Celsius, °C), territory size of a bird (e.g., hectares), size of fish (e.g., cm).

Discrete

(can only take indivisible units) Age at death (e.g., years), number of amino acids in a protein, number of eggs in a bird nest.

one-sided test

(or one-tailed) test: the alternative hypothesis includes parameter values (populations) on only one side of the value specified by the null hypothesis. H0 is rejected only if the data depart from it in the direction stated by HA.

coefficient of variation (CV)

(standard deviation / mean) x 100%. The coefficient of variation (CV) is used to describe the level of variability within a population independently of the absolute values of the observations. If the absolute values are similar, populations can be compared using their standard deviations. But if they differ a lot (like the weights of mice and elephants), or if different variables are compared (weight and height), a standardized measure is needed: the coefficient of variation.
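A minimal sketch of the CV calculation (the measurements are hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical body-mass measurements (grams); illustrative values only
masses = np.array([21.0, 23.5, 19.8, 22.1, 20.7])

# CV = (standard deviation / mean) x 100%
cv = masses.std(ddof=1) / masses.mean() * 100
print(f"CV = {cv:.1f}%")
```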

Assumptions of a paired t-test:

- The sampling units are randomly sampled from the population. - The paired differences have a normal distribution in the population.

If we do 18 tests, what is the chance of finding at least one significant test when we should not? Alpha = 0.05 (acceptance region of 95%)

1 − 0.95^18 = 0.603, i.e., a 60.3% chance of finding at least 1 significant difference between group 1 and group 2 when H0 is true.
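The same arithmetic as a one-liner (assuming 18 independent tests at alpha = 0.05):

```python
alpha, n_tests = 0.05, 18
# P(at least one significant result | H0 true in every test)
print(round(1 - (1 - alpha) ** n_tests, 3))  # 0.603
```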

How was the F distribution built?

1. Assume that H0 is true.
2. Sample from the same normally distributed population the appropriate number of groups (samples), respecting the sample size of each group.
3. Repeat step 2 a large (or infinite) number of times and, each time, calculate the F statistic.
4. Calculate the probability distribution based on the F values generated in step 3.
5. The probability of rejecting H0 (P-value) is estimated as the proportion of F values equal to or greater than the observed value (i.e., a one-tailed test).
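A rough simulation of these steps (a sketch only: the group count, sample size, and the "observed" F of 4.2 are hypothetical, and scipy.stats.f_oneway is used just to compute each F statistic):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
k, n, n_sim = 3, 10, 10_000   # 3 groups of 10 observations, 10,000 repetitions

# Steps 1-3: under H0, draw every group from the same normal population
# and compute the F statistic each time
f_null = np.array([f_oneway(*rng.normal(0, 1, size=(k, n)))[0] for _ in range(n_sim)])

# Step 5: one-tailed P-value for a hypothetical observed F of 4.2
print(np.mean(f_null >= 4.2))
```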

The Process of a One Sample Mean Test

1. Establish the theoretical population mean value of interest under H0. This is the parameter used to standardize the sample mean and generate the t value.
2. Take one sample from the population and assume that the variable is normally distributed in the population.
3. Standardize the sample mean in relation to the population mean value of interest using the t standardization (calculate the t-score).
4. Determine the probability of finding a t-score in the t distribution that is as extreme as or more extreme than the observed one.
5. Based on that probability and the established significance level (alpha), reject or do not reject the null hypothesis.
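A minimal sketch of these steps (the sample values and the H0 mean of 98.6 are hypothetical, echoing the body-temperature example used elsewhere in this set):

```python
import numpy as np
from scipy import stats

sample = np.array([98.4, 98.9, 97.8, 98.2, 98.7, 98.0, 98.3, 98.1])  # hypothetical data
mu0 = 98.6                                             # step 1: mean under H0

# Step 3: t standardization of the sample mean
t = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(len(sample)))

# Step 4: two-sided P-value from the t distribution with n - 1 df
p = 2 * stats.t.sf(abs(t), df=len(sample) - 1)
print(t, p)                                            # step 5: compare p with alpha
```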

Assumptions of Linear Regression

1. The linear model correctly describes the functional relationship between X and Y.
2. The X variable is measured without error (vertical offsets as residuals).
3. The sampled (X, Y) observations are independent of one another.
4. The variance of Y is constant along the regression line.
5. At each value of X, the distribution of possible Y values is normal.

steps of standardization

1. Centre each value in the sampling distribution (subtract the mean) so that the new distribution has a mean of 0.
2. Divide the centred value by the standard deviation of the sampling distribution (SE = 1).
Given any normal distribution with any mean, SD, or n, there is one single standardized t distribution.
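For a sample mean, the two steps amount to (using the notation of the surrounding cards, where Ȳ is the sample mean, 𝜇 the population mean under H0, s the sample standard deviation, and n the sample size): t = (Ȳ − 𝜇) / (s / √n).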

steps involved in hypothesis testing...

1. Transform the scientific question into a statistical question.
2. State the null (theoretical population) and alternative hypotheses based on population values (parameters).
3. Compute the observed value of interest.
4. Determine the P-value by contrasting the sample (observed) value against a sampling distribution that assumes the null hypothesis to be true (theoretical population): the probability of finding the observed value, or a more extreme one, in the sampling distribution of the theoretical population.
5. Draw a conclusion by comparing the P-value against the significance level (𝛼). If the P-value is greater than 𝛼, do not reject H0; if the P-value is smaller than 𝛼, reject H0.

What do P-Values Represent?

A P-value is the probability of obtaining a value equal to the observed sample value, or one more unusual (smaller or greater), in the sampling distribution built by assuming that the theoretical population is true.

t distribution

A distribution, specified by its degrees of freedom, used to model test statistics for the sample mean. It is the sampling distribution of the number of sample standard errors away from the mean needed to produce a confidence interval of the desired coverage (e.g., 95%). In its standardized form, it always has a mean of 0 and a standard deviation of 1.

probability density function (pdf)

A function indicating the relative frequency with which any measurement may be expected to occur; empirically it is represented by a histogram. For a normal distribution: 68.2% of values fall within 1 SD of the mean, 95.4% within 2 SD, and 99.7% within 3 SD.
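These percentages can be checked against the standard normal cdf (a short sketch using scipy):

```python
from scipy.stats import norm

for k in (1, 2, 3):
    prob = norm.cdf(k) - norm.cdf(-k)     # probability within +/- k SD of the mean
    print(f"within {k} SD: {prob:.3f}")   # ~0.683, 0.954, 0.997
```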

How to fit the model?

The aim of linear regression is to fit a straight line to the data that generates (on average) the best prediction of Y for any value of X. Predicted values of Y lie on the regression line, i.e., given an X value we can predict the Y value. The line minimizes the average distance between the data and the fitted line, i.e., the residuals.
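A minimal least-squares fit (the x and y values are hypothetical; numpy.polyfit is used only to obtain the slope and intercept):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # hypothetical predictor values
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])     # hypothetical response values

b1, b0 = np.polyfit(x, y, deg=1)             # slope (b1) and intercept (b0)
y_hat = b0 + b1 * x                          # predicted values on the regression line
residuals = y - y_hat                        # data minus fitted line
print(b0, b1, residuals)
```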

The analysis of variance (ANOVA)

Comparing the means of three or more groups (often called treatments in experiments)

ANOVA & the Tukey-test Assumptions:

Each of the samples is a random sample from its population. The variable is normally distributed in each (treatment) population. The variances are equal among all statistical populations from which the treatments were sampled.

ANOVA assumptions

Each of the samples is a random sample from its population. The variable is normally distributed in each (treatment) population. The variances are equal among all populations from which the treatments were sampled (otherwise the F values change in ways that may not measure difference among means).

Assumptions of Two-sample comparison of means

Each of the two samples is a random sample from its population. The variable is normally distributed in each population. The standard deviation (and variance) of the variable is the same in both populations.

Two-sample comparison of variances assumptions

Each of the two samples is a random sample from its population. The variable is normally distributed in each population

Welch's t-test - Assumptions

Each of the two samples is a random sample from its population. The variable is normally distributed in each population. The variances differ between the two populations.

why is the distribution a problem for regressions?

Ensure that the distribution of predictor values is approximately uniform within the sampled range.

why is range an issue for regressions?

Ensure that the range of values sampled for the predictor variable is large enough to capture the full range of responses by the response variable.

Population

The entire collection of individual units that share a property or set of properties, about which you want to generalize knowledge of unknown quantities based on a subset of individual units (the sample).

why is extrapolation of a regression an issue?

Extrapolation (predicting outside the limits of the sampled X values) in regression is risky: predictions hold well within the range of X values used to fit the model, but not necessarily outside it.

what does an anova use as a test statistic?

F ratios

How can we measure the difference among multiple means?

The F statistic does this by considering the ratio of two variances (variance components): F = (variance among group means due to treatment) / (variance within groups).

interquartile range

First quartile (Q1): j = 0.25n = (0.25)(16) = 4th ordered value. Third quartile (Q3): j = 0.75n = (0.75)(16) = 12th ordered value. The interquartile range (IQR) for the speed data before amputation is then IQR = (12th value) − (4th value).
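A sketch of the positional rule with 16 hypothetical speed values (the numbers are illustrative only):

```python
import numpy as np

speeds = np.sort(np.array([1.2, 1.5, 1.7, 1.9, 2.0, 2.1, 2.3, 2.4,
                           2.6, 2.7, 2.9, 3.0, 3.2, 3.4, 3.7, 4.0]))
n = len(speeds)                       # 16

q1 = speeds[int(0.25 * n) - 1]        # 4th ordered value
q3 = speeds[int(0.75 * n) - 1]        # 12th ordered value
print(q3 - q1)                        # IQR
```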

make a null and alternative hypothesis for this study: Normal human body temperature, as kids are taught in North America, is 98.6°F. But how well is this supported by data?

H0 (null hypothesis): the mean human body temperature is 98.6°F. HA (alternative hypothesis): the mean human body temperature is different from 98.6°F.

The analysis of variance (ANOVA) hypotheses?

Three equivalent statements: H0: the expected F = df2/(df2 − 2); HA: F > df2/(df2 − 2). H0: differences in means among groups are due to sampling error from the same population; HA: differences in means among groups are NOT due to sampling error from the same population. H0: the samples come from statistical populations with the same mean (𝜇control = 𝜇knee = 𝜇eyes); HA: at least two samples come from statistical populations with different means.

two sided test null and alternative hypothesis

H0: The numbers of right-handed and left-handed toads in the population are equal. HA: The numbers of right-handed and left-handed toads in the population are different. H0: the mean human body temperature is 98.6°F. HA: the mean human body temperature is different from 98.6°F.

one sided test Null and alternative hypothesis

H0: The number of right-handed toads in the population is equal to or greater than the number of left-handed toads. HA: The number of right-handed toads is smaller than the number of left-handed toads. H0: the mean human body temperature is less than or equal to 98.6°F. HA: the mean human body temperature is greater than 98.6°F.

Zero correlation hypotheses?

H0: There is no relationship between the inbreeding coefficient and the number of pups in the population (𝜌 = 0). HA: Inbreeding coefficient and the number of pups in the population are correlated (𝜌 ≠ 0).

simple linear regression hypotheses??

H0: the population slope = 0 (i.e., Y can't be predicted by X) HA: the population slope ≠ 0 (i.e., Y can be predicted by X)

The Levene's test hypotheses??

H0: 𝜎control = 𝜎knee = 𝜎eye. HA: At least one population variance is different from another population variance (or from the other population variances).

if it is likely we collected the evidence we did (high p-value) then ...

If it is likely, then we "do not reject" our initial assumption (theoretical population). There is not enough evidence to do otherwise. In other words, any observed difference between the sample and the theoretical population value (50%/50%) is due to chance alone.

if it is unlikely we collected the evidence we did (low p-value) then ...

If it is unlikely, then either: Our initial assumption (proportion is equal) is incorrect and we should "reject" the initial assumption. We could say "we have strong evidence against the initial assumption". OR our initial assumption is correct and we experienced a truly unusual event

if the assumption of equal variances is NOT met through the F-test for variance ratios then it means...

If the assumption is not met, the variances differ (𝜎1 ≠ 𝜎2); use Welch's t-test.

how to calculate the number of intervals and the interval width for speed data with 8 observations

Intervals: Sturges' rule = 1 + ln(n)/ln(2). Here: 1 + ln(8)/ln(2) = 4 classes. Speed interval (bin) width = (max value − min value) / number of classes = (9 − 2)/4 = 1.75.
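A sketch of both calculations (the data values are hypothetical, chosen so that min = 2 and max = 9 as in the card):

```python
import numpy as np

values = np.array([2.0, 3.1, 4.4, 5.0, 6.2, 7.5, 8.3, 9.0])   # hypothetical speeds, n = 8
n = len(values)

n_classes = round(1 + np.log(n) / np.log(2))                  # Sturges' rule: 4 classes
width = (values.max() - values.min()) / n_classes             # (9 - 2) / 4 = 1.75
print(n_classes, width)
```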

Ordinal

Life stage (e.g., egg, larva, juvenile, adult) Snake bite severity score (e.g., minimal, moderate, severe), Size class (e.g., small, medium, large).

F-test for variance ratios (also referred to as the test for homogeneity of variance): null and alternative hypotheses

NULL: 𝜎1 = 𝜎2. ALT: 𝜎1 ≠ 𝜎2. H0: Lizards killed by shrikes and living lizards do not differ in their horn length variances. HA: Lizards killed by shrikes and living lizards differ in their horn length variances.

what is the null and alternative hypothesis in a paired t test?

NULL: 𝜇d = 0. ALT: 𝜇d ≠ 0. The population mean of the paired differences equals 0 / the population mean of the paired differences does not equal 0.

Two sample comparison of means null and alternative hypothesis

NULL: 𝜇1 = 𝜇2. ALT: 𝜇1 ≠ 𝜇2. H0: Lizards killed by shrikes and living lizards do not differ in mean horn length (i.e., 𝜇1 = 𝜇2). HA: Lizards killed by shrikes and living lizards differ in mean horn length (i.e., 𝜇1 ≠ 𝜇2).

normally distributed data makes what unbiased?

Normally distributed data make standard deviations and standard errors unbiased; the mean is already unbiased regardless of the distribution (as we saw previously).

Relevant issues with the Welch's t-test

Perhaps the sample size is too small to detect real differences. The difference in variances reduces the degrees of freedom and reduces statistical power. Even if the means are truly not different (i.e., H0 is true; something we don't know), it is important to note the variation, which has important conservation implications!

Observational Study

Researcher has no control over which observational units fall into which groups.

Experimental Study

Researcher randomly assigns observational units (fish individuals) to different groups (often called treatments; e.g., high/low protein diet)

Simple Linear Regression

Simple linear regression describes the linear relationship between a predictor variable (predator body size), plotted on the x-axis, and a response variable (# "dees"/call), plotted on the y-axis. We regress Y on X.

Small P-values are ...

Small P-values are strong evidence against the initial assumption based on the theoretical population. Small P-values provide evidence that the initial assumption is not true and should be rejected!

Standardization

Standardization will not result in a normally distributed variable; rather, it keeps the original shape of the distribution of the data. It guarantees that the mean of the standardized distribution is always 0 and the standard deviation is always 1. If the original distribution is normal, the standardized variable will have a standard normal distribution; if the original distribution is uniform, the standardized variable will have a uniform distribution.

Nominal

Survival (alive or dead) Method of disease transmission (e.g., water, air, animal vector) Eye colours (amber, blue, brown, gray, green, hazel, or red), Breed of a dog (e.g., collie, shepherd, terrier)

The Levene's test

Testing differences in variances among populations

what is the ANOVA of a simple regression really testing?

The (standardized) slope! The t-test for the slope is equivalent to the ANOVA (F-test) for the overall simple regression model. The t-test for the intercept tests only the intercept, not the quality of the model.

How many intervals (classes of abundance) should be used?

The Sturges' rule: n.intervals=1+ln(n)/ln(2)

when an anova is significant, which pairs are truly considered significant and how do we know?

Tukey's honestly significant difference (HSD) test

SSregression represents

The amount of variation explained by the regression

Why do we use n-1 in a standard deviation calculation instead of n?

Dividing by n − 1 provides an "honest" (unbiased) estimator: the average of all possible sample variances (s²) based on n − 1 equals the population variance 𝜎².

Correlation between continuous variable

The correlation coefficient measures the strength and direction of the association between two continuous variables (often referred to as co-variables); it measures the tendency of two variables to co-vary.
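A minimal sketch of computing r for two hypothetical co-variables (the values loosely echo the inbreeding/pups example used in the correlation card above):

```python
import numpy as np

x = np.array([0.10, 0.15, 0.20, 0.25, 0.30, 0.35])   # hypothetical inbreeding coefficients
y = np.array([6.0, 5.5, 5.1, 4.4, 4.0, 3.2])         # hypothetical numbers of pups

r = np.corrcoef(x, y)[0, 1]   # strength and direction of the linear association
print(r)                      # near -1: strong negative co-variation
```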

Assumptions of the one sample t test?

The data are a random sample from the population (either the theoretical population or any of the other possible populations from which the sample may have been drawn). The variable is normally distributed in the population.

standard t-test (equal variances) versus the Welch's t-test (different variances)

The difference in variances reduces the degrees of freedom and affects statistical power: greater degrees of freedom (v) = greater power (smaller P-value). Differences in variances between samples are known to increase the Type I error rate, hence the need for Welch's t-test, which keeps the correct Type I error rate when the variances differ between samples.

location

The location tells us something about the average or typical individual (i.e., where the observations are centered).

is the standard deviation biased or unbiased?

The mean of the sample standard deviations DOES NOT EQUAL the population standard deviation, so the standard deviation is biased; therefore, we may not be able to trust a given sample standard deviation from a population to make inferences.

Statistics

The most important goal of statistics is to infer an unknown quantity (e.g., height of a species of plant) of an entire population of plants based on sample data (a subset of observations from the population)

Pearson Correlation r Assumptions:

The relationship between X and Y is linear. The distributions of X and Y are each normal.

is the sample standard error biased or unbiased?

The sample standard error is an unbiased estimator of the true standard error of the mean ONLY FOR NORMALLY DISTRIBUTED POPULATIONS.

The sampling distribution makes plain that although the __________ is a constant (2622.0), _________ is a variable

The sampling distribution makes plain that although the population mean 𝜇 is a constant (2622.0), its estimate Ȳ (the sample mean) is a variable.

spread

The spread tells us how variable the measurements are from individual to individual - how widely scattered the observations are around the center.

Post-hoc Tukey's test - Hypotheses??

There is a pair of hypotheses for each pair of means, as follows: H0: 𝜇i = 𝜇j for each pair i ≠ j; HA: 𝜇i ≠ 𝜇j for each pair (control - knee, control - eye, knee - eye). We have to run as many hypothesis tests as there are pairs to detect which pairs differ significantly.

Sampling distributions

They represent the probability distribution of all values for an estimate that we might obtain when we sample a population

Ordinary Least Squares (OLS)

To find the best line, we must minimize the sum of the squares of the residuals; as such we need to find the model coefficients (β0 & β1) that minimize the sum of squares of the residuals:
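In symbols (completing the expression the card points to, using the notation of the intercept-slope card): SSresiduals = Σ (Yi − (β0 + β1Xi))², minimized over β0 and β1.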

Differences in variances between samples are known to increase Type ??? error

Type I. A false positive is rejecting a true null hypothesis (i.e., rejecting the null hypothesis when you should not have).

False positive

Type I error: rejecting a true null hypothesis.

Linear regression vs Correlation between continuous variable

Unlike linear regression, correlation fits no line to the data, and there is no expectation about which variable is the response and which is the predictor.

Bar graph

Vertical or horizontal columns (bars) representing the distribution of a numerical variable against one or more categorical variables

A study in which the variances of the two samples differ creates the need to apply the

Welch's t-test

intercept slope equation

Y = β0 + β1X

Parameter

a quantity describing a statistical population

estimate or statistic

a related quantity calculated from a sample.

correlation shows

a relationship

The 2SE rule of thumb

a rough approximation to the 95% confidence interval for a mean can be calculated as the sample mean plus and minus two standard errors!
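A minimal sketch of the rule (the sample values are hypothetical):

```python
import numpy as np

sample = np.array([12.1, 13.4, 11.8, 12.9, 13.1, 12.5, 12.0, 13.6])  # hypothetical data
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))        # standard error of the mean
print(mean - 2 * se, mean + 2 * se)                   # rough 95% CI for the mean
```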

a residual is computed as

a y-coordinate from the data minus the y-coordinate predicted by the line

the null hypothesis in analysis of variance states that

all groups have the same statistical population mean

to test for equality of means of more than 2 populations which of the following techniques is used

analysis of variance (ANOVA)

Variable

any characteristic, number, or quantity that can be measured or counted. Height, weight, age, gender, business income and expenses, country of birth, capital expenditure, class grades and eye color are examples of variables.

null hypothesis

any observed difference between the sample and the theoretical population value is due to chance alone; i.e., one is likely to find a sample from a theoretical population as extreme or more extreme than the observed sample. The null hypothesis is a specific statement about a theoretical population parameter made for the purposes of argument.

HISTOGRAM

are important because they describe the shape of the distribution of a numerical variable

theoretical population

assumes an infinite population in which the proportion is truly evenly distributed (0.5)

why do we estimate the probability of finding the observed or a more extreme sample value when the null hypothesis is true?

because if the probability is very small, the sample value gives reasonable evidence to support the alternative hypothesis

in a paired design

both treatments are applied to every sampled unit (such as within the same river, or on the same tree...)

The normal distribution

can be described as a probability density function (pdf) for continuous variables. Instead of probabilities of a particular value, which is zero for a pdf, we calculate the probability that a particular value is within a range defined by any two values in the pdf.

Numerical Variables (Quantitative)

characteristics of observations that have magnitude on a numerical scale; either continuous or discrete

what are the variables in a correlation called?

co-variables

R² in linear regression is referred to as

coefficient of determination and measures the goodness of fit of the model, i.e., how well the regression approximates the observed data points.
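In symbols (consistent with the sum-of-squares cards below): R² = SSregression / SStotal.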

Welch's t-test

comparing two sample means when their variances are different

Observation

contains all the values for the variables of interest such as the fork length and individual weight of an individual brook trout

Statistical hypothesis are about populations, but are tested with

data from samples

Categorical variables (qualitative variables)

describe membership in a category or group; characteristics of observations that do not have magnitude on a numerical scale; either nominal or ordinal

residuals

difference between each value in relation to its group mean

Within a two-sample design

each treatment group is composed of independent, randomly sampled units.

the __ sum of squares measures the variability of the observed values around their respective treatment means

error

we build the sampling distribution under the null hypothesis in order to

estimate the probability of finding the observed or a more extreme sample value when the null hypothesis is true

confidence intervals

estimate the range of possible values for the population parameter of interest (like the population mean). They are a range of values surrounding the sample estimate that is likely to contain the population parameter

two-sided test

estimates the P-value by considering results that are at least as extreme as our observed result in either direction; a hypothesis test with a two-sided alternative.

explain the components of a box plot and what left vs right skewed looks like.

Graphically displays a variable's location and spread at a glance and provides some indication of the data's symmetry and skewness. Unlike many other methods of data display, box plots show outliers. If the distribution is left-skewed, the median line sits near the top of the box; if right-skewed, it sits near the bottom.

the ___ sum of squares measures the variability of each group mean around the total mean (across all groups)

groups

Statistical hypothesis testing asks what?

how unusual it is to get the observed value for the sample data within the distribution built under the null hypothesis.

deciding between the standard t-test and the Welch's t-test depends on

if the variances of the two samples are equal or different

as variability due to chance decrease, the value of the F statistic will

increase

null distribution

An infinite population with an even proportion of observations is called a theoretical population, and its sampling distribution is referred to as the "null distribution".

standard deviation (s)

is a commonly used measure of the spread of a distribution. It measures how far from the mean the observations typically are. The standard deviation is large if most observations are far from the mean, and it is small if most measurements lie close to the mean.

frequency distribution

is a representation, either in graphical or tabular format, that displays the number of observations within a given interval of a quantitative variable (continuous or discrete).

Type II error

is failing to reject a false null hypothesis (i.e., not rejecting the null hypothesis when you should have). A false negative.

Type I error (alpha)

is rejecting a true null hypothesis (i.e., rejecting the null hypothesis when you should not have). The significance level 𝛼 sets the probability of committing a Type I error: stating that there is an effect when none exists. A false positive.

the mode

is the interval corresponding to the highest peak in the frequency distribution. A distribution is said to be bimodal when it has two dominant peaks.

median

is the middle measurement of a set of observations (distribution). If the number of observations is odd, the median is the middle observation. If the number of observations is even, the median is the average of the two middle numbers (their sum divided by two).

The significance level (𝜶 level)

is the probability of rejecting the null hypothesis when it is true. For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.

the power of a test

is the probability that a random sample will lead to the rejection of a false null hypothesis.

what is alpha

is the risk of rejecting H0 even though it is true; e.g., to say that toads exhibit handedness when in reality (which you don't know) they do not. The risk is small though (0.05). Alpha determines the rejection region versus the non-rejection region for H0.

the t-distribution in a one sample mean test is...

is the sampling distribution of all possible t-values from a theoretical population assumed to be true under the null hypothesis.

what is spurious correlation

it is the correlation between two variables having no causal relation

Only _____ values of F are interesting in an ANOVA context for rejecting H0

large

large sample sizes should lead to small or large p-values?

Large sample sizes lead to small P-values (when H0 is false), smaller Type II errors, and more statistical power.

analysis of variance is a statistical method of comparing the ___ of several populations

means

the error sum of squares

measures the variability of the observed values around their respective treatment means

the group sum of squares

measures the variability of each group mean around the total mean (across all groups)

intervals must be

mutually exclusive (each observation belongs to only one interval) and exhaustive (all observations must be included); the interval size depends on the data being analyzed and the goals of the analysis.

what if the p-value is smaller than 0.05?

one should favour the alternative hypothesis

Hypothesis testing

operationalizes decision making by asking whether the sample differs from a specific "null" expectation (also called theoretical population or initial assumption).

𝜇

population mean

the null hypothesis and alternative hypothesis assume particular values for the...

population parameter

𝜎

population standard deviation

𝜎²

population variance

Explanatory variable

predicts or affects the other variable, called the response variable. When conducting an experiment, the treatment variable (the one manipulated by the researcher) is the explanatory variable

the F-distribution depends on

the ratio between the two sample variances and the sample sizes of each sample

what does a difference in variances cause?

It reduces the degrees of freedom and affects statistical power, and it is known to increase the Type I error rate. Greater degrees of freedom (v) = greater power = smaller P-value.

the skew

refers to asymmetry in the shape of a frequency distribution for a numerical variable. (can be skewed left or right)

why is using regressions to make predictions a problem?

regression of Y on X does not always imply dependency

alternative hypothesis

represents all other possible parameter values, i.e., all possible populations except the one stated under the null hypothesis. In other words, our initial assumption is incorrect

Slope

represents the difference in the predicted value of Y (number of "dees" per call) for each one-unit difference in predator body mass. The slope of a regression line represents the rate of change in Y as X changes.

the coefficients of the least-squares regression line are determined by minimizing the sum of the squares of the ...

residuals

X̄

sample mean

what does a larger sample size within a one sample mean test do??

A larger sample size usually decreases the standard error, which makes t increase, which leads to smaller P-values. Smaller P-values allow rejecting the null hypothesis. As such, larger sample sizes lead to greater statistical power (smaller Type II errors) to reject the null hypothesis when it is not true!

S

sample standard deviation

the maximum probability of a Type I error that the decision maker will tolerate is called the

significance level (alpha)

mean (Ȳ)

sum of all observations in a sample divided by n, the number of observations

the advantage of the paired sample design is

that it reduces the effects of variation among sampling units that has nothing to do with the treatment itself. Although pairs of sampling units are assumed independent of one another, within a pair there is dependency. The greater the dependency within a pair, the greater the reduction in among-unit variation.

And any sampling distribution for values different than the one assumed under the null hypothesis is called:

the alternative hypothesis

if the assumption of equal variances is met through the F-test for variance ratios then it means...

the assumption of equality of variances is met! The variances are equal (𝜎1 = 𝜎2); use the two-sample t-test for these data.

pooled sample variance (s²p)

the average of the variances of the samples weighted by their degrees of freedom
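In symbols (the standard formula): s²p = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2), where n1 and n2 are the sample sizes and s1² and s2² the sample variances.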

significance level

the decision threshold is called significance level and its symbol is 𝛼 (alpha). In biology the mostly used 𝛼 = 0.05 (and often 𝛼 = 0.01).

sampling error (SE)

the difference between the sample and population values. The estimate of this error is the standard deviation of the sampling distribution, i.e., the average difference between all sample means and the true mean:
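For the mean, the formula the colon points to is the standard one: SEȲ = s / √n, where s is the sample standard deviation and n the sample size.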

Observation units or Statistical units

the entity on which information is logged (e.g., one individual lake trout).

If p-value is small (smaller than alpha)

the evidence against H0 is strong (reject H0)

If p-value is large (greater than alpha)

the evidence against H0 is weak (do not reject H0)

The regression R2 measures

the explained sum of squares as a proportion of the total sum of squares

response variable

The explanatory variable predicts or affects the other variable, called the response variable. The measured effect of the treatment is the response variable.

the two sample comparison of variances uses what test?

the f-test for variance ratios

Is the mean biased or unbiased

The law of large numbers says that as sample size increases, there is a greater probability that the sample mean is closer to the true population mean. The mean is an unbiased sample statistic.

is the mean or median influenced by extreme values?

the mean is influenced by extreme values and the median is NOT

Our initial assumption to build the sampling distribution for the theoretical population is called:

the null hypothesis

the p-value is calculated assuming that

the null hypothesis is true

A study in which the variances of the two samples do not differ creates the need to apply the

the standard two-sample t-test (with pooled variance)

The mean of all sample estimates of the mean equals

the population mean, and it is centered exactly on the true (population) mean! This means that the statistic Ȳ is an unbiased estimate of 𝜇.

a p-value indicates

the probability of obtaining the sample results (or one more extreme) assuming the null hypothesis as true

is the standard deviation or interquartile range influenced by extreme values?

the standard deviation is influenced by extreme values and the IQR is NOT sensitive to extreme values.

the degrees of freedom of the mean square error is calculated on the basis of

the total number of observations and numbers of groups

Residuals

the unexplained variation in Y (number of "dees" per call) by the regression model

intercept

the value of Y when X is zero (unit is the same as in Y)

the advantages of one-sided tests over two-tailed tests are

they have more statistical power, but a greater risk of Type I error (rejecting the null hypothesis when the null hypothesis is true).

why does the standard deviation of the population not seem to change the t-distribution as expected?

this is because of standardization

ANOVAs are typically used to test

three or more means

a theoretical value under the null hypothesis needs to be assumed in order to

to find the appropriate sampling distribution for the statistic of interest under the null hypothesis

Random Sampling

Two criteria: every observational unit in the population has an equal chance of being included in the sample, and the selection of observational units must be independent. EQUAL CHANCE AND INDEPENDENT. Random sampling minimizes bias of estimates in relation to a parameter.

false negative

Type II error: failing to reject a false null hypothesis.

Statistical hypothesis

uses sample data to make inferences about the population from which the sample was taken. Estimation puts bounds (confidence intervals) on the value of a population parameter.

statistical variables

Variables are distinguished not by their measuring units but by their types: arm length and leg length can both be measured in centimetres, but they are TWO different variables.

variance (s^2)

Variance is the expectation of the squared deviation of a random variable from its mean. Informally, it measures how far a set of (random) numbers is spread out from the average value. The square root of the variance = the standard deviation.
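For a sample, the usual formula is s² = Σ(Yi − Ȳ)² / (n − 1), using n − 1 as in the card on unbiased estimation above.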

a geometric interpretation of a residual is the...

vertical distance from a y-coordinate to the regression line

what do you assume to generate the t-distribution

We assumed a normally distributed population. If the original population is not normal, then the standardized sampling distribution of means may not be normal, and the standard error may not be unbiased.

If P is smaller than 𝛼

we have enough evidence to reject the null hypothesis (H0) in favour of the alternative (HA).

sampling distribution of the estimate

The probability distribution of all the values for an estimate that we might have obtained when we sampled the population.

