R101 Confidence Intervals and Hypothesis Testing
margin of error for proportion
MOE <- z * SE
margin of error (MOE)
z* * SEM (critical value times the standard error of the mean)
print results for proportion
cat("Biden support:", p_hat, "\n")
cat("Standard Error:", SE, "\n")
cat("Margin of Error:", MOE, "\n")
cat("95% Confidence Interval: [", lower_bound, ",", upper_bound, "]\n")
how to print results for means
cat("Sample Mean:", sample_mean, "\n")
cat("Sample Standard Deviation:", sample_sd, "\n")
cat("Sample Size:", n, "\n")
cat("Standard Error of the Mean:", sem, "\n")
cat("Critical Value:", critical_value, "\n")
cat("Margin of Error:", margin_of_error, "\n")
cat("95% Confidence Interval: [", lower_bound, ",", upper_bound, "]\n")
how to find the critical value for a __% confidence level
confidence_level <- ___
alpha <- 1 - confidence_level
critical_value <- qnorm(1 - alpha / 2)
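A minimal sketch that wraps the steps above in a function so several confidence levels can be compared at once; `critical_value_for` is a hypothetical helper name, not a built-in R function.

```r
# Hypothetical helper: two-sided z critical value for a given confidence level
critical_value_for <- function(confidence_level) {
  alpha <- 1 - confidence_level   # total tail area outside the interval
  qnorm(1 - alpha / 2)            # upper quantile leaving alpha / 2 in each tail
}

critical_value_for(0.90)  # about 1.645
critical_value_for(0.95)  # about 1.960
critical_value_for(0.99)  # about 2.576
```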
set.seed()
makes code that uses random numbers reproducible: after set.seed() is called with the same number, functions such as sample() and rnorm() return the same "random" values every time the code is run
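A short check of the reproducibility claim: resetting the seed to the same value (123 here, chosen arbitrarily) makes sample() return the same draw again.

```r
set.seed(123)
first_draw <- sample(1:100, 5)    # five random integers from 1 to 100

set.seed(123)                     # reset to the same seed
second_draw <- sample(1:100, 5)   # repeats the same five integers

identical(first_draw, second_draw)  # TRUE
```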
conclusion if p value is greater than alpha
Fail to reject the null hypothesis. The data do not provide sufficient evidence, at significance level alpha, to conclude that there is a significant effect or relationship. This does not prove that the null hypothesis is true.
how to find confidence interval for proportions with given confidence level
n <- ___  # sample size (nrow(df) if df holds the whole sample)
df <- read.csv("___")  # path to the data file
p_hat <- sum(df$target_variable > 100) / n  # use >, <, or != as the question requires
SE <- sqrt((p_hat * (1 - p_hat)) / n)
z <- qnorm(1 - 0.005)  # critical value for 99% confidence (0.5% in each tail)
MOE <- z * SE
lower_bound <- p_hat - MOE
upper_bound <- p_hat + MOE
cat("Sample Proportion:", p_hat, "\n")
cat("Standard Error:", SE, "\n")
cat("Margin of Error:", MOE, "\n")
cat("99% Confidence Interval: [", lower_bound, ",", upper_bound, "]\n")
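The same calculation can be tried without a CSV file. This sketch simulates a hypothetical `spending` column (the data are made up for illustration) and builds a 99% interval for the proportion of values above 100.

```r
set.seed(42)
df <- data.frame(spending = rnorm(200, mean = 110, sd = 30))  # hypothetical data

n <- nrow(df)
p_hat <- sum(df$spending > 100) / n     # sample proportion above 100
SE <- sqrt(p_hat * (1 - p_hat) / n)     # standard error of p_hat
z <- qnorm(1 - 0.01 / 2)                # 99% confidence: alpha = 0.01
MOE <- z * SE
lower_bound <- p_hat - MOE
upper_bound <- p_hat + MOE
cat("99% Confidence Interval: [", lower_bound, ",", upper_bound, "]\n")
```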
how to find the confidence level that gives a desired margin of error (proportion)
n <- _____
df <- read.csv("___")
p_hat <- sum(df$Spending > 150) / n  # use >, <, or != as the question requires
desired_moe <- _____
SE <- sqrt((p_hat * (1 - p_hat)) / n)
z <- desired_moe / SE  # number of standard errors the margin of error spans
confidence_level <- 2 * pnorm(z) - 1
cat("Sample Proportion of spendings > $150:", p_hat, "\n")
cat("Standard Error:", SE, "\n")
cat("Desired Margin of Error:", desired_moe, "\n")
cat("Z-score required:", z, "\n")
cat("Confidence Level:", confidence_level * 100, "%\n")
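A round-trip check of the formula above: if the margin of error is exactly qnorm(0.975) standard errors, the implied confidence level should come back as 95%. The helper name `confidence_from_moe` and the standard error of 0.02 are hypothetical.

```r
# Hypothetical helper: confidence level implied by a margin of error
confidence_from_moe <- function(moe, se) {
  z <- moe / se        # how many standard errors the margin of error spans
  2 * pnorm(z) - 1     # central area between -z and z
}

se <- 0.02
confidence_from_moe(qnorm(0.975) * se, se)  # 0.95
```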
how to find sample proportion
n <- _____
data <- read.csv("___")
p_hat <- sum(data$variable_of_interest == "category_of_interest") / n
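A small self-contained example of the sample-proportion formula, using a hypothetical vector of survey answers. It also shows that `mean()` of a logical comparison gives the same proportion.

```r
survey <- data.frame(answer = c("Yes", "No", "Yes", "Yes", "No",
                                "Yes", "Yes", "No", "Yes", "Yes"))  # hypothetical responses
n <- nrow(survey)
p_hat <- sum(survey$answer == "Yes") / n     # count of "Yes" divided by n: 0.7
p_hat_alt <- mean(survey$answer == "Yes")    # same value: mean of a TRUE/FALSE vector
```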
how to find sample size
n <- nrow(sample_df)
finding critical value Z*
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE); for a two-sided interval use p = 1 - alpha / 2, e.g. qnorm(0.975) for 95% confidence
conclusion if p value is smaller than alpha
Reject the null hypothesis. The data provide evidence, at significance level alpha, that the effect or relationship being studied is unlikely to be due to random chance alone.
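The two decision rules can be written as one small function. `decide` is a hypothetical helper; it treats a p-value exactly equal to alpha as a rejection, which is one common convention.

```r
# Hypothetical helper encoding the p-value decision rule
decide <- function(p_value, alpha = 0.05) {
  if (p_value <= alpha) "reject the null hypothesis" else "fail to reject the null hypothesis"
}

decide(0.003)  # returns "reject the null hypothesis"
decide(0.20)   # returns "fail to reject the null hypothesis"
```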
standard error of the mean (SEM)
s/sqrt(n)
how to find standard error of the mean
sem <- sample_sd / sqrt(n)
critical value for __% confidence for proportion
z <- qnorm(1 - (1 - confidence_level) / 2)  # e.g. qnorm(0.975) ≈ 1.96 for 95%
z test from aggregates
z_test_from_agg(mean1, mean2, sd1, sd2, N1, N2) (means, standard deviations, and sample sizes of the two groups)
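A sketch of the standard two-sample z statistic that a test from aggregates is based on, z = (mean1 - mean2) / sqrt(sd1^2/N1 + sd2^2/N2). `z_from_aggregates` is a hypothetical re-implementation; the course's `z_test_from_agg` may report its output differently.

```r
# Hypothetical re-implementation; the course helper z_test_from_agg may differ
z_from_aggregates <- function(mean1, mean2, sd1, sd2, n1, n2) {
  se_diff <- sqrt(sd1^2 / n1 + sd2^2 / n2)  # standard error of mean1 - mean2
  z <- (mean1 - mean2) / se_diff
  p_value <- 2 * pnorm(-abs(z))             # two-sided p-value
  c(z = z, p_value = p_value)
}

z_from_aggregates(mean1 = 10, mean2 = 12, sd1 = 2, sd2 = 2, n1 = 100, n2 = 100)
```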
permutation test
- Permutation(df, 'CAT', 'NUM', N, 'v1', 'v2')
- df is the name of the data frame
- CAT is a categorical variable
- NUM is the numerical variable
- N is the number of permutations
- v1 and v2 are two values of the CAT variable
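A minimal sketch of what a permutation test does: shuffle the group labels many times and count how often the shuffled difference in means is at least as large as the observed one. `permutation_sketch` and the toy data are hypothetical; the one-sided direction (mean of v2 greater than mean of v1) is an assumption, and the course's `Permutation()` helper may differ.

```r
# Hypothetical sketch of a one-sided permutation test (H_a: mean of v2 > mean of v1)
permutation_sketch <- function(df, cat_var, num_var, n_perm, v1, v2) {
  sub <- df[df[[cat_var]] %in% c(v1, v2), ]   # keep only the two groups
  labels <- sub[[cat_var]]
  values <- sub[[num_var]]
  diff_means <- function(lab) mean(values[lab == v2]) - mean(values[lab == v1])
  observed <- diff_means(labels)
  permuted <- replicate(n_perm, diff_means(sample(labels)))  # shuffle group labels
  mean(permuted >= observed)                  # one-sided p-value
}

set.seed(1)
toy <- data.frame(group = rep(c("A", "B"), each = 10), score = c(1:10, 11:20))
permutation_sketch(toy, "group", "score", 1000, "A", "B")  # very small p-value
```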
interpretation of confidence interval
- we are __% confident that the true ______ lies between ___ and ____
- if we repeated the sampling process many times, about __% of the resulting intervals would contain the true ________
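The "repeated many times" interpretation can be checked by simulation. This sketch assumes a hypothetical population N(100, 15^2), draws 1000 samples of size 50, builds a 95% z-interval from each, and reports the fraction of intervals that contain the true mean.

```r
set.seed(2024)
true_mean <- 100
n <- 50
z <- qnorm(0.975)                                  # 95% critical value
covered <- replicate(1000, {
  x <- rnorm(n, mean = true_mean, sd = 15)         # one hypothetical sample
  moe <- z * sd(x) / sqrt(n)                       # margin of error
  (mean(x) - moe <= true_mean) && (true_mean <= mean(x) + moe)
})
mean(covered)  # fraction of intervals containing the true mean, close to 0.95
```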
rnorm()
- generates n random values from a normal distribution
- rnorm(n, mean, sd)
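A quick example of rnorm() with arbitrary illustrative parameters: 1000 draws from a normal distribution with mean 50 and standard deviation 5, whose sample mean and standard deviation land near those values.

```r
set.seed(7)
x <- rnorm(1000, mean = 50, sd = 5)  # 1000 draws from N(50, 5^2)
length(x)  # 1000
mean(x)    # close to 50
sd(x)      # close to 5
```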
relationship between confidence level and width of confidence interval
- higher confidence leads to a wider interval
- lower confidence leads to a narrower interval
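Holding the standard error fixed (a hypothetical value of 2), the interval width 2 * z* * SE grows as the confidence level rises, because z* grows.

```r
se <- 2                                            # hypothetical standard error, held fixed
levels <- c(0.90, 0.95, 0.99)
widths <- 2 * qnorm(1 - (1 - levels) / 2) * se     # width = 2 * z* * SE
names(widths) <- levels
widths                                             # increasing with confidence level
```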
pnorm()
- returns the probability p = P(X ≤ q), the CDF value over (-infinity, q]
- pnorm(q, mean, sd)
qnorm()
- returns the quantile q whose CDF value is p (the inverse of pnorm)
- qnorm(p, mean, sd)
dnorm()
- returns the probability density at x
- dnorm(x, mean, sd)
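A few checks of how pnorm(), qnorm(), and dnorm() relate for the normal distribution: qnorm undoes pnorm, dnorm(0) is the peak height of the standard normal curve, and pnorm with a mean and sd is the same as pnorm on the standardized value.

```r
p <- pnorm(1.96)                 # P(Z <= 1.96), about 0.975
q <- qnorm(p)                    # back to 1.96: qnorm inverts pnorm
d <- dnorm(0)                    # density at the peak of N(0, 1), equal to 1 / sqrt(2 * pi)
pnorm(110, mean = 100, sd = 10)  # same as pnorm(1), about 0.841
```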
z test
- z_test(df, "CAT", "NUM", "v1", "v2")
- df is the name of the data frame
- CAT is a categorical variable
- NUM is the numerical variable
- v1 and v2 are two values of the CAT variable
standard error for proportion
SE <- sqrt((p_hat * (1 - p_hat)) / n)
how to find a two-sided confidence interval for a proportion (Bread among Milk transactions)
df <- read.csv("___")
milk_transactions <- df[df$Milk == "Yes", ]  # keep only transactions that include Milk
p_hat <- mean(milk_transactions$Bread == "Yes")  # proportion of those that also include Bread
n <- nrow(milk_transactions)
se <- sqrt(p_hat * (1 - p_hat) / n)
z <- qnorm(0.99)  # 98% confidence: two-sided, so 1% in each tail
ci_lower <- p_hat - z * se
ci_upper <- p_hat + z * se
cat("The 98% confidence interval for the proportion of transactions that buy Bread when they buy Milk is:", ci_lower, "to", ci_upper, "\n")
how to find confidence level/interval given margin of error
airbnb_df <- read.csv("___")
set.seed(123)
sample_indices <- sample(nrow(airbnb_df), ___)  # ___ = number of rows to sample
sample_df <- airbnb_df[sample_indices, ]
sample_mean <- mean(sample_df$price, na.rm = TRUE)
sample_sd <- sd(sample_df$price, na.rm = TRUE)
n <- nrow(sample_df)
sem <- sample_sd / sqrt(n)
desired_margin_of_error <- _____
critical_value <- desired_margin_of_error / sem
confidence_level <- 2 * pnorm(critical_value) - 1
lower_bound <- sample_mean - desired_margin_of_error
upper_bound <- sample_mean + desired_margin_of_error
cat("Sample Mean:", sample_mean, "\n")
cat("Sample Standard Deviation:", sample_sd, "\n")
cat("Sample Size:", n, "\n")
cat("Standard Error of the Mean:", sem, "\n")
cat("Critical Value:", critical_value, "\n")
cat("Desired Margin of Error:", desired_margin_of_error, "\n")
cat("Corresponding Confidence Level:", confidence_level * 100, "%\n")
cat("Confidence Interval with Margin of Error", desired_margin_of_error, ": [", lower_bound, ",", upper_bound, "]\n")
how to find sample size necessary to achieve confidence level for given margin of error
airbnb_df <- read.csv("___")
set.seed(123)  # for reproducibility
sample_indices <- sample(nrow(airbnb_df), ___)  # ___ = number of rows to sample
sample_df <- airbnb_df[sample_indices, ]
sample_sd <- sd(sample_df$price, na.rm = TRUE)
desired_margin_of_error <- __
confidence_level <- _____
alpha <- 1 - confidence_level
z_score <- qnorm(1 - alpha / 2)
required_sample_size <- (z_score * sample_sd / desired_margin_of_error)^2
cat("Sample Standard Deviation:", sample_sd, "\n")
cat("Critical Value for", confidence_level * 100, "% Confidence Level:", z_score, "\n")
cat("Desired Margin of Error:", desired_margin_of_error, "\n")
cat("Required Sample Size:", ceiling(required_sample_size), "\n")
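The sample-size formula n = (z* * s / MOE)^2, rounded up, as a small function. `required_n` is a hypothetical helper, and the standard deviation of 10 and margin of error of 1 are illustrative values.

```r
# Hypothetical helper: smallest n with z* * sd / sqrt(n) <= desired_moe
required_n <- function(sd, desired_moe, confidence_level) {
  z <- qnorm(1 - (1 - confidence_level) / 2)
  ceiling((z * sd / desired_moe)^2)   # round up to a whole observation
}

required_n(sd = 10, desired_moe = 1, confidence_level = 0.95)  # 385
```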
how to find sample mean and standard deviation
df <- read.csv("___")
set.seed(123)
sample_df <- df[sample(nrow(df), ___), ]  # ___ = number of rows to sample
sample_mean <- mean(sample_df$target_variable, na.rm = TRUE)
sample_sd <- sd(sample_df$target_variable, na.rm = TRUE)
confidence interval for proportion
lower_bound <- p_hat - MOE
upper_bound <- p_hat + MOE
how to find confidence interval
lower_bound <- sample_mean - margin_of_error
upper_bound <- sample_mean + margin_of_error
how to find margin of error
margin_of_error <- critical_value * sem
confidence interval
mean ± MOE
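Putting the mean-interval cards together on a small hypothetical sample: SEM = s / sqrt(n), MOE = z* * SEM, and the interval is mean ± MOE. The sketch follows the z-based formulas used in these cards; for very small samples a t critical value from qt() is commonly used instead.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)          # hypothetical sample
sample_mean <- mean(x)                   # 5
sem <- sd(x) / sqrt(length(x))           # s / sqrt(n)
critical_value <- qnorm(0.975)           # 95% confidence
margin_of_error <- critical_value * sem
lower_bound <- sample_mean - margin_of_error
upper_bound <- sample_mean + margin_of_error
cat("95% Confidence Interval: [", lower_bound, ",", upper_bound, "]\n")
```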
alternative hypothesis
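the mean of NUM in group v2 is greater than the mean of NUM in group v1 (one-sided; the direction depends on the question), e.g. mean(df[df$CAT == "v2", ]$NUM) > mean(df[df$CAT == "v1", ]$NUM)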