Data
We ran a permutation test 10,000 times and got 500 runs with Dnull larger than 1. What observed value of D would result in a p-value less than 5%? A. D>1 B. D<1 C. D>5
A
We observe that texting in class matters for your final score. Which z-test should be applied to evaluate the hypothesis that our observation is true in general? A. Two-sided test B. One-sided test
A
IQR is an abbreviation for which of the following quantities? (Select the answer that makes the most sense for this class.) (a) Interval quantity ratio. (b) Internal query rate. (c) International query register. (d) Interquartile range.
D
The larger the z-value is... A. The smaller the p-value B. The larger the p-value C. Impossible to say anything about p-value
A
Add a column to previous dataset Test[,3] = "" Test[3] = "" Test[2] = "" Test[,2] = ""
A
How would you create a data frame with these two columns? firstName <- c('abc','def','efg') lastName <- c('xyz','wxy','uvw') test<- data.frame(firstName, lastName) test<- data.frame(firstname lastname)
A
If p-value = 0 A. We reject null hypothesis B. We fail to reject null hypothesis
A
In our data set of 2018 Data 101 class results, we clearly saw that students with GPA > 3.0 get higher scores in Data 101 than students with GPA <= 3.0. We want to back this observation up by calculating a p-value. What is the null hypothesis? A. There is no difference in score between high-GPA and low-GPA students B. Students with GPA > 3.0 get higher scores in Data 101 than students with GPA <= 3.0
A
Neeles decided to run a z-test instead. For D=5, he found the z-score to be equal to 1.7sd. We know that for a normal distribution, ~68% of the data is within 1sd from the mean. ~86% of the data within 1.5sd from the mean and ~95% of the data is within 2sd from the mean. What can we say about the p-value? P < 0.14 P >= 0.14 0.003 <= p <= 0.01
A
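The reasoning behind this answer can be checked numerically. The sketch below assumes a two-sided reading of the p-value and uses only base R's pnorm(); the variable names are made up for illustration.

```r
# z-score reported in the question
z <- 1.7

# Two-sided p-value: probability of landing at least 1.7 sd from the mean
p_two_sided <- 2 * (1 - pnorm(z))

# Share of a normal distribution lying more than 1.5 sd from the mean
# (the card rounds this to 1 - 0.86 = 0.14)
outside_1p5 <- 2 * (1 - pnorm(1.5))

# Since 1.7 > 1.5, the p-value must be below the 1.5 sd tail mass
print(p_two_sided < outside_1p5)
```

A one-sided p-value would be half of p_two_sided, so the bound p < 0.14 holds under either reading.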
Permutation test function requires one variable to be numerical and another categorical A. True B. False
A
The goal of a permutation test is to: (a) Show how often the observed results could happen by random chance. (b) Prove that the order in which elements are added to the dataset is irrelevant. (c) Show that every permutation of the dataset gives the same results. (d) Prove that the null hypothesis is true.
A
We observe that average traffic is higher on weekdays than on weekends. We would like to back this up by calculating a p-value. What is the null hypothesis? A. There is no difference between weekend and weekday traffic B. Traffic is higher on weekdays than on weekends
A
We run 1000 permutation tests and 950 of these tests show mean(Score, AskQuestions='Often') > mean(Score, AskQuestions='Never') + 5. Alternative hypothesis = "If you ask questions often, your score is higher". What can we say about the p-value? A. p<0.05 B. p<0.95 C. p>0.95 D. None of the above
A
What would R say, if c("a",1,T) is entered to the console? (a) [1] "a" "1" "TRUE" (b) [1] "b" 1 T (c) [1] "a" "1" "T" (d) [1] "a" 1 T (e) Something else.
A
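A short check of the coercion rule behind this answer: c() builds an atomic vector, so mixed inputs are converted to the most general type present, which is character here.

```r
# Mixing a string, a number, and a logical in c() coerces all to character
v <- c("a", 1, T)

print(v)         # three strings; T is spelled out as "TRUE"
print(class(v))  # "character"
```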
What would R say? x<- 1:4 y<- 2:9 x+y 3 5 7 9 7 9 11 13 1 2 3 4 2 3 4 5 6 7 8 9 Error
A
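The answer follows from vector recycling: the shorter vector is repeated until it matches the longer one. Because 8 is a multiple of 4, R does this without a warning.

```r
x <- 1:4
y <- 2:9

# x is recycled to 1 2 3 4 1 2 3 4 before the element-wise addition
z <- x + y
print(z)  # 3 5 7 9 7 9 11 13
```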
What would the following code print? M <- 1:10 M 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 Error
A
Which plot would you use to check if Professor Moody assigns grades solely on the basis of score in class? A. Boxplot B. Histogram C. Piechart D. Mosaic plot E. Bargraph
A
Assume that the following data frame has been entered into R: t <- data.frame(x=c(1,2,3),y=c(3,2,1),z=c("a","b","c")) In each problem, what would R say if you entered the given command in the console? Choose from the following possible answers. t[,"x"] (a) [1] 1 2 3 (b) x y z / 1 1 3 a (c) y x / 1 3 1 / 2 2 2 / 3 1 3 (d) [1] 3 2 1 (e) Something else
A
Assume that the following data frame has been entered into R: t <- data.frame(x=c(1,2,3),y=c(3,2,1),z=c("a","b","c")) In each problem, what would R say if you entered the given command in the console? Choose from the following possible answers. t$x (a) [1] 1 2 3 (b) x y z / 1 1 3 a (c) y x / 1 3 1 / 2 2 2 / 3 1 3 (d) [1] 3 2 1 (e) Something else
A
data.frame(Day=c('weekday', 'weekend'), Conditions =c('sunny','rainy','cloudy')) What would R say? a. error b. show a data frame with dim = 2,3 c. show a data frame with dim = 3,2
A
Determine which type of plot would be best to use for visualizing/plotting the described data. Choose from the following answers: (a) Box plot (b) Bar plot (c) Scatter plot (d) Mosaic plot (e) Histogram The distribution of annual income for 500 individuals who live in NJ and have a post-high school education
A
m<-data.frame(first<-c('A', 'B'), second <-c(1:10) ) A. Correct R code B. Not correct R code
A
name<- c('abc','def','efg') sex<- c('M','F','M') score<- c(100,99,98) test<-data.frame(name, sex, score) How would you add a new column to test? test$age<- c(20,21,22) test.age<- c(20,21,22) You cannot add a new column
A
weather =data.frame(Day=c('weekday', 'weekend'), Conditions =c('sunny','rainy','cloudy')) weather What would R say? a. error b. show a data frame with dim = 2,3 c. show a data frame with dim = 3,2
A
To check whether Holland Tunnel traffic is indeed higher than Lincoln Tunnel traffic (H>L), you can use a z-test instead of a permutation test. What data is sufficient to calculate the p-value for the hypothesis that H>L? a. Means of traffic for the Lincoln and Holland tunnels respectively b. Standard deviations of traffic for the Lincoln and Holland tunnels respectively c. Standard deviations of traffic for the Lincoln and Holland tunnels along with the number of records for both tunnels respectively d. Overall standard deviation of all recorded traffic volumes for both tunnels as well as total number of records for both tunnels
A C
For Problems 15-19 determine which type of plot would be best to use for visualizing/plotting the described data. Choose from the following answers (a) Box plot (b) Bar plot (c) Scatter plot (d) Mosaic plot (e) Histogram (f) piechart (g) stacked bar plot (h) grouped by plot The distribution of scores for latecomers to the class in professor Moody data set
A E
Let traffic in Holland and Lincoln tunnels in our sample be distributed according to Holland --- rnorm(50, mean=60, sd=10) Lincoln - rnorm(50, mean=65, sd = 20) Alternative Hypothesis = Traffic in Lincoln tunnel is higher than traffic in Holland tunnel If sd of Lincoln tunnel sample goes up, say from 20 -> 40, the p-value would A. Go up B. Go down C. Hard to say
A.
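A rough numerical illustration of why the p-value goes up. This sketch plugs the parameters from the question directly into a one-sided two-sample z-test, treating them as if they were the observed sample means and standard deviations (an assumption, since real rnorm() draws would vary). The helper z_test_p is a hypothetical name, not a function from the course.

```r
# One-sided z-test for "mean_a is higher than mean_b"
z_test_p <- function(mean_a, mean_b, sd_a, sd_b, n_a, n_b) {
  se <- sqrt(sd_a^2 / n_a + sd_b^2 / n_b)  # standard error of the difference
  z <- (mean_a - mean_b) / se
  1 - pnorm(z)                             # upper-tail probability
}

# Lincoln (mean 65) vs Holland (mean 60, sd 10), 50 records each
p_sd20 <- z_test_p(65, 60, sd_a = 20, sd_b = 10, n_a = 50, n_b = 50)
p_sd40 <- z_test_p(65, 60, sd_a = 40, sd_b = 10, n_a = 50, n_b = 50)

print(c(sd20 = p_sd20, sd40 = p_sd40))
```

A larger standard deviation inflates the standard error, which shrinks z and so raises the p-value.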
Is this correct R code: moody$PF<-"" moody[moody$GRADE!='F',]$PF <- 'pass' moody[moody$GRADE==F,]$PF <- 'fail' A. True B. False
B
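The snippet in the question is wrong because an unquoted F is R's shorthand for FALSE, so the last line does not compare against the letter grade. A corrected sketch follows; the small moody data frame is made up for illustration and is not the course data set.

```r
# Hypothetical stand-in for the course's moody data frame
moody <- data.frame(SCORE = c(91, 45, 72), GRADE = c("A", "F", "C"))

moody$PF <- ""
moody[moody$GRADE != "F", ]$PF <- "pass"
moody[moody$GRADE == "F", ]$PF <- "fail"   # "F" quoted: a string, not FALSE

print(moody)
```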
Can the p-value be very low (say 0.0000001) and we still cannot accept the alternative hypothesis? A. No, then we always need to reject the null hypothesis B. Can happen if the observation sample is biased C. Can happen when the sample is too small
B
Central limit theorem is about A. Distribution of samples B. Sample means distribution C. Distribution of standard deviations of means D. Limit of the center of the distribution
B
Every distribution converges to normal (bell curve) when the number of observations grows to infinity A. True B. False
B
For Problems 15-19 determine which type of plot would be best to use for visualizing/plotting the described data. Choose from the following answers (a) Box plot (b) Bar plot (c) Scatter plot (d) Mosaic plot (e) Histogram (f) piechart (g) stacked bar plot (h) grouped by plot The average score of "frequent" smartphone users vs average score of "frequent" question askers in professor Moody data set
B
If p value is equal to 1, what does it mean? A. We reject null hypothesis B. We fail to reject the null hypothesis
B
In the plot from the previous questions, the solid line in the middle of the boxes indicates: (a) The mean. (b) The median. (c) The mode. (d) The variance
B
Let the observed difference D= 2.5, We have run 10000 permutation test with the following distribution. 1000 tests showed Dnull larger than 3 and 2000 tests showed Dnull>2. Estimate the p value. A. p-value is less than 0.1 B. p-value is between 0.1 and 0.2 C. p-value is between 0.2 and 0.3 D. p-value is between 0.25 and 0.3
B
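The bounds in this answer come from plain counting: the observed D = 2.5 sits between the two reported cutoffs, so the share of permutations exceeding it sits between the two reported shares. A minimal sketch with illustrative variable names:

```r
n_perm     <- 10000
count_gt_3 <- 1000   # permutations with Dnull > 3
count_gt_2 <- 2000   # permutations with Dnull > 2

# Every Dnull > 3 also exceeds 2.5, and every Dnull > 2.5 also exceeds 2
lower <- count_gt_3 / n_perm
upper <- count_gt_2 / n_perm

print(c(lower = lower, upper = upper))  # 0.1 and 0.2
```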
Problem 2. (what would R say?) u<-c(1:10) w <-c(1,-1,3) u[w>0] what would R say? a. error b. 1 3 4 6 7 9 10 c. 1 3 4 5 6 7 8 9 10
B
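The answer relies on recycling of the logical index: w > 0 has only three elements, so R repeats the pattern along the ten elements of u.

```r
u <- c(1:10)
w <- c(1, -1, 3)

mask <- w > 0      # TRUE FALSE TRUE
picked <- u[mask]  # the mask repeats: keep, drop, keep, keep, drop, keep, ...

print(picked)      # 1 3 4 6 7 9 10
```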
The goal of a permutation test is to: A. Prove that the null hypothesis is true. B. Show how often the observed results could happen by random chance C. Prove that order in which elements are added to the dataset is irrelevant D. Show that every permutation of the dataset gives the same result
B
The goal of a permutation test is to: (a) Prove that the null hypothesis is true. (b) Show how often the observed results could happen by random chance. (c) Prove that the order in which elements are added to the dataset is irrelevant. (d) Show that every permutation of the dataset gives the same results
B
To use a z-test for difference of means of two populations: A. We need a standard deviation of union of two populations B. We need standard deviation of each of the two populations
B
We are testing the hypothesis that Rutgers graduates make more than Princeton graduates 10 years after graduation. Our data shows Rutgers graduates make D=$6500 more annually than Princeton graduates. In disbelief (what a deal, so much less in tuition and debt, and more in earnings!) we decide to run a permutation test. We run the permutation test 10000 times, and in 250 cases we get D > 7000 and in 500 cases D > 6000. What can you say about the estimated p-value of the hypothesis that Rutgers graduates make more than Princeton graduates 10 years after graduation? A. p<0.05 B. 0.025<p<0.05 C. p>0.05
B
We got negative z, z=-4 What does it mean? A. Fail to reject null hypothesis B. Alternative hypothesis needs to be reversed (i.e instead A<B, B<A) C. We will get very large p value, almost 1
B
We got p value of 0.0001 and showed that French people are on average more happy than British. There are 200 nations represented in HAPPINESS TABLE A. We should accept the finding, p -value is much smaller than 5% B. We should count the number of all possible hypotheses (pairs of countries) and apply Bonferroni coefficient C. We should ask the speaker how many hypotheses did s/he try prior to this one?
B
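To see why p = 0.0001 may not be enough here, count the possible country pairs and divide the usual 5% threshold by that count. This sketch assumes every pair of the 200 nations counts as one candidate hypothesis.

```r
n_countries <- 200
n_pairs <- choose(n_countries, 2)   # 19900 possible pairwise comparisons

# Bonferroni-corrected threshold for a 5% overall level
threshold <- 0.05 / n_pairs

p_value <- 0.0001
print(n_pairs)
print(threshold)
print(p_value < threshold)          # FALSE: not significant after correction
```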
We observe that the average income of immigrants is higher than the average income of non-immigrants. We want to validate this observation by calculating a p-value. What is the null hypothesis? A. Average income of immigrants is higher than average income of non-immigrants B. There is no difference in average income between immigrants and non-immigrants C. Average income of immigrants is lower than average income of non-immigrants
B
What is the Bonferroni coefficient? A. If p < Bonferroni coefficient then we fail to reject the null hypothesis B. It is used to correct for multiple hypotheses. It is no longer enough for the p-value to be less than 5%; it has to be smaller than 5% divided by the Bonferroni coefficient C. It is the probability that the alternative hypothesis is true
B
What is the law of small numbers? A. If samples are too small we cannot conclude anything B. Extreme results occur more often for small samples
B
When z-value goes up, p value goes A. up B. down
B
Which of the following is a correct way to select 2 rows from the data frame traffic? (a) traffic[1,2] (b) traffic[1:2,] (c) traffic[,c(1,2)] (d) traffic[c(1,2)] (e) traffic[2]
B
Which plot is used to display the frequency distribution for a pair of categorical variables? A. Boxplot B. Mosaic Plot C. Piechart D. Histogram
B
ask_question_grade <- tapply(moody$SCORE, moody$ASKS_QUESTIONS,max) A. maximum score a student got in the class B. maximum score a student got for each of the values of ASK QUESTIONS attributes C. Error
B
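A small demonstration of what tapply() returns here. The moody data frame below is a made-up stand-in with the two columns named in the question; the real course data set is not reproduced.

```r
# Hypothetical stand-in for the course's moody data frame
moody <- data.frame(
  SCORE = c(88, 92, 61, 75, 70, 95),
  ASKS_QUESTIONS = c("often", "never", "never", "rarely", "often", "rarely")
)

# Apply max() to SCORE separately within each ASKS_QUESTIONS value
ask_question_grade <- tapply(moody$SCORE, moody$ASKS_QUESTIONS, max)

print(ask_question_grade)  # one maximum per category, named by category
```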
Assume that the following data frame has been entered into R: t <- data.frame(x=c(1,2,3),y=c(3,2,1),z=c("a","b","c")) In each problem, what would R say if you entered the given command in the console? Choose from the following possible answers. t[,2:1] (a) [1] 1 2 3 (b) x y z / 1 1 3 a (c) y x / 1 3 1 / 2 2 2 / 3 1 3 (d) [1] 3 2 1 (e) Something else
C
Assume that the following data frame has been entered into R: t <- data.frame(x=c(1,2,3),y=c(3,2,1),z=c("a","b","c")) In each problem, what would R say if you entered the given command in the console? Choose from the following possible answers. t[1,] (a) [1] 1 2 3 (b) x y z / 1 1 3 a (c) y x / 1 3 1 / 2 2 2 / 3 1 3 (d) [1] 3 2 1 (e) Something else
B
data.frame(Day=c('weekday', 'weekend'), Conditions =c('sunny','rainy', 'cloudy','snow')) what would R say? a. show a data frame with dim = 2,4 b. show a data frame with dim = 4,2 c. error
B
Determine which type of plot would be best to use for visualizing/plotting the described data. Choose from the following answers: (a) Box plot (b) Bar plot (c) Scatter plot (d) Mosaic plot (e) Histogram Annual income (in dollars) and education level (high school or post-high school) for 1000 individuals from the census
B
Determine which type of plot would be best to use for visualizing/plotting the described data. Choose from the following answers: (a) Box plot (b) Bar plot (c) Scatter plot (d) Mosaic plot (e) Histogram The distribution of midterm grades for this class
B
The p-value is A. Probability that the null is true B. Probability that the observed result can be obtained under the condition that the null hypothesis is true C. Probability that the hypothesis is false D. Standard deviation divided by the number of observations
B
weather =data.frame(Day=c('weekday', 'weekend'), Conditions =c('sunny','rainy', 'cloudy','snow')) what would R say? a. show a data frame with dim = 2,4 b. show a data frame with dim = 4,2 c. error
B
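The two weather cards differ only in column lengths. A sketch of both cases: data.frame() recycles a shorter column when the longer length is a whole multiple of it (2 into 4), and raises an error otherwise (2 against 3). The tryCatch wrapper is only there so the script keeps running.

```r
# Lengths 2 and 4: Day is recycled twice, giving 4 rows and 2 columns
weather <- data.frame(Day = c('weekday', 'weekend'),
                      Conditions = c('sunny', 'rainy', 'cloudy', 'snow'))
print(dim(weather))  # 4 2

# Lengths 2 and 3: 3 is not a multiple of 2, so data.frame() fails
result <- tryCatch(
  data.frame(Day = c('weekday', 'weekend'),
             Conditions = c('sunny', 'rainy', 'cloudy')),
  error = function(e) "error"
)
print(result)
```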
weather =data.frame(Day=c('weekday', 'weekend', 'weekday', 'weekend'), Temperature =c(55,61,62,47)) To select days where temperature is less than 60 you will write a)weather[weather$Temperature<60] b)weather[weather$Temperature<60,] c) weather(weather$Temperature<60)
B
weather =data.frame(Day=c('weekday', 'weekend', 'weekday', 'weekend'), Temperature =c(55,61,62,47)) u<-rep('warm',4) u[weather$Temperature<60]<-'cold' u what would R say? a) '1','0','0','1' b) "cold" "warm" "warm" "cold" c) error d) show data frame of dim =2,2
B
weather =data.frame(Day=c('weekday', 'weekend', 'weekday', 'weekend'), Temprature =c(55,61,62,47)) u<-rep('warm',4) u[weather$Temprature<60]<-'cold' u a) '1','0','0','1' b) "cold" "warm" "warm" "cold" c) error d) show data frame of dim =2,2
B
For N hypotheses, the acceptable p-value threshold would be 0.05/N
Bonferroni Correction
For one hypothesis the acceptable p-value is widely viewed as p=0.05
Bonferroni Correction
If an exam has enough questions, then you're bound to get at least one question right by guessing, even if you don't know anything about the subject.
Bonferroni Correction
A p-value of 0.05 means: (a) There is a 5% chance that our claim is correct. (b) There is a 95% chance that our claim is correct. (c) If our claim was incorrect, then we'd expect to encounter a dataset like the one we observed no more than 5% of the time. (d) The results are random.
C
A p-value of 0.05 means: (a) There is a 95% chance that our claim is correct. (b) There is a 5% chance that our claim is correct. (c) If the null hypothesis was true, then we'd expect to encounter a result equal to or more extreme than the one we observed no more than 5% of the time. (d) The results are random.
C
Central limit theorem assumes a) Normal distribution of data b) Uniform distribution of data c) Does not make any assumptions about data distribution
C
For Problems 15-19 determine which type of plot would be best to use for visualizing/plotting the described data. Choose from the following answers (a) Box plot (b) Bar plot (c) Scatter plot (d) Mosaic plot (e) Histogram (f) piechart (g) stacked bar plot (h) grouped by plot The price of wine vs its rating
C
Let traffic in Holland and Lincoln tunnels in our sample be distributed according to Holland --- rnorm(50, mean=60, sd=10) Lincoln - rnorm(50, mean=65, sd = 20) Alternative Hypothesis = Traffic in Lincoln tunnel is higher than traffic in Holland tunnel If mean of Holland tunnel sample goes up, say from 60 -> 63, the p-value would A. Go down B. Stay the same C. Go up
C
Our hypothesis that average life expectancy is higher for rich countries than for poor countries will be rejected if A. We accept null hypothesis that " average life expectancy is same for rich countries and for poor countries" B. We fail to reject that "average life expectancy is higher for rich countries than for poor countries" C. We fail to reject null hypothesis that " average life expectancy is same for rich countries and for poor countries"
C
We run permutation test 10,000 times. Observed difference of means is 8.5. We see that there are 1000 results with observed difference D >5 and 200 permutation results with observed difference D > 10. What can we conclude about p-value? A. p>0.2 B. p>1000 C. p>0.02 and p<0.1
C
We run a permutation test 10,000 times. What is the smallest p-value we can obtain? A. p=0.01 B. p=0 C. p=0.00001
C
Which of the following R functions are you most likely to use in a permutation test, as discussed in class? (a) merge() (b) quantile() (c) sample() (d) par() (e) sum()
C
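For context, here is one way sample() can drive a permutation test. This is a minimal base-R sketch on simulated data, not the course's PermutationTestSecond package; the group labels, sample sizes, and means are invented for illustration.

```r
set.seed(1)

# Simulated scores for two hypothetical groups of 30 students each
scores <- c(rnorm(30, mean = 75, sd = 10), rnorm(30, mean = 70, sd = 10))
group  <- rep(c("often", "never"), each = 30)

# Observed difference of group means
observed_d <- mean(scores[group == "often"]) - mean(scores[group == "never"])

# Shuffle the labels many times to build the null distribution of D
n_perm <- 10000
d_null <- replicate(n_perm, {
  shuffled <- sample(group)  # random relabeling under the null hypothesis
  mean(scores[shuffled == "often"]) - mean(scores[shuffled == "never"])
})

# One-sided p-value: share of shuffles at least as extreme as the observation
p_value <- mean(d_null >= observed_d)
print(p_value)
```

Shuffling the labels breaks any real link between group and score, so d_null shows what differences arise by chance alone.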
Which of the following is a correct way to select 2 columns from the data frame traffic? (a) traffic[1,2] (b) traffic[1:2,] (c) traffic[,c(1,2)] (d) traffic[c(1,2)] (e) traffic[2]
C
Which of these functions is not related to plotting? (a) abline() (b) histogram() (c) par() (d) legend() (e) sample()
E
Which plot is used for frequency distribution of categorical variable? A. Histogram B. Scatterplot C. Bargraph D. Boxplot
C
Which plot shows grade distribution for students who text frequently in class? A. Scatter Plot B. Boxplot C. Bargraph D. Mosaic Plot
C
Which plot shows score distribution of students depending on how often they ask questions (often, rarely, never) A. Bargraph B. Histogram C. Boxplot D. Mosaic Plot E. Scatter plot
C
Determine which type of plot would be best to use for visualizing/plotting the described data. Choose from the following answers: (a) Box plot (b) Bar plot (c) Scatter plot (d) Mosaic plot (e) Histogram The height and average points-per-game for all basketball players in the NBA during the 2011-2012 season.
C
Determine which type of plot would be best to use for visualizing/plotting the described data. Choose from the following answers: (a) Box plot (b) Bar plot (c) Scatter plot (d) Mosaic plot (e) Histogram Cancer patients undergoing a new experimental treatment and those whose cancer has recurred.
D
A statistical theory stating that, given a sufficiently large sample size from a population with a finite level of variance, the mean of all samples from the same population will be approximately equal to the mean of the population
Central Limit Theorem
Means of data samples tend to follow a normal curve
Central Limit theorem
For any data distribution, at least 1-1/k^2 of the data lies within k standard deviations of the mean (k>1)
Chebyshev's Theorem
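A quick empirical check of the bound on a clearly non-normal sample. The exponential distribution and k = 2 are arbitrary choices for illustration.

```r
set.seed(42)

# A skewed, non-normal sample
x <- rexp(10000, rate = 1)

k <- 2
bound <- 1 - 1 / k^2   # the theorem guarantees at least 75% for k = 2

# Fraction of observations within k standard deviations of the mean
within_k <- mean(abs(x - mean(x)) <= k * sd(x))

print(c(bound = bound, observed = within_k))
```

The observed fraction is typically well above the bound, since the theorem is a worst-case guarantee over all distributions.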
A hypothesis test is done in which the alternative hypothesis is that more than 10% of a population is left-handed. The p-value for the test is calculated to be 0.25. Which statement is correct? A. We can conclude that more than 10% of the population is left-handed. B. We can conclude that more than 25% of the population is left-handed. C. We can conclude that exactly 25% of the population is left-handed. D. We cannot conclude that more than 10% of the population is left-handed.
D
Problem 1. Which of the following R functions would you use to prepare a plot of average score for each grade in Professor Moody puzzle? a. merge() b. histogram() c. sample() d. tapply()
D
Rohit travels to Livingston campus by bus. He finds that the average time taken from College Ave->Livingston is 10 mins whereas average time taken from Livingston -> College Ave is 15 mins i.e. difference of 5 mins (D=5). We ask Devansh to run a permutation test 100,000 times. He reports that 4000 permutations show D>6 and 6000 permutations show D>4. What can we say about the p-value? p > 0.05 p <= 0.05 p <= 0.04 0.04 <= p <= 0.06
D
Suppose v <- c(-1,0,3,2,-10) is entered into the R console. What would R say if you enter v[v>0]? (a) [1] FALSE FALSE TRUE TRUE FALSE (b) [1] -1 0 3 2 -10 (c) [1] 3 4 (d) [1] 3 2 (e) Something else
D
Suppose v <- c(-2,0,2,-5) is entered into the R console. What would R say if you enter v[v>0]? (a) [1] TRUE FALSE FALSE TRUE (b) [1] 1 0 0 1 (c) [1] 1 4 (d) [1] 2 //Partial credit given for some answers
D
Suppose you run a permutation test to determine if the mean difference between two groups is zero or nonzero. A boxplot representing the available data from the two groups is given to the right. What can you conclude about the results of the permutation test? (a) The p-value is greater than 0.1. (b) The p-value is less than 0.05. (c) A permutation test is not appropriate for this setting. (d) Nothing. I need more information.
D
Which R function would you use to plot average score for each grade in professor Moody class? A. table B. read.csv C. data.frame D. tapply
D
Which of the following extracts the first four elements from the following vector? X<- c(0,1,2,3,4,5,6,7) X[0:4] X[1:4] X[c(1,2,3,4)] All the above
D
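All three forms agree because R vectors are 1-indexed and an index of 0 is silently dropped, so 0:4 selects the same elements as 1:4.

```r
X <- c(0, 1, 2, 3, 4, 5, 6, 7)

a <- X[0:4]            # the 0 index selects nothing, leaving positions 1 to 4
b <- X[1:4]
d <- X[c(1, 2, 3, 4)]

print(a)  # 0 1 2 3
print(identical(a, b) && identical(b, d))
```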
Assume that the following data frame has been entered into R: t <- data.frame(x=c(1,2,3),y=c(3,2,1),z=c("a","b","c")) In each problem, what would R say if you entered the given command in the console? Choose from the following possible answers. t["x",] (a) [1] 1 2 3 (b) x y z / 1 1 3 a (c) y x / 1 3 1 / 2 2 2 / 3 1 3 (d) [1] 3 2 1 (e) Something else
E
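The indexing forms used across these data frame cards can be compared side by side. The data frame is the one from the question; the variable names holding the results are illustrative.

```r
t <- data.frame(x = c(1, 2, 3), y = c(3, 2, 1), z = c("a", "b", "c"))

col_by_name   <- t[, "x"]   # a plain vector: 1 2 3
col_by_dollar <- t$x        # the same vector
first_row     <- t[1, ]     # a one-row data frame with columns x, y, z
swapped_cols  <- t[, 2:1]   # a data frame with columns y then x
row_named_x   <- t["x", ]   # no row is named "x", so a single row of NAs

print(first_row)
print(swapped_cols)
print(row_named_x)
```

The last case is why the answer is "Something else": the string is looked up among row names, not column names.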
For Problems 15-19 determine which type of plot would be best to use for visualizing/plotting the described data. Choose from the following answers (a) Box plot (b) Bar plot (c) Scatter plot (d) Mosaic plot (e) Histogram (f) piechart (g) stacked bar plot (h) grouped by plot The distribution of Presidential candidates support among women, men, college educated, in NY and NJ according to a hypothetical poll
G H
Counties in which incidence of kidney cancer is lowest are mostly rural, sparsely populated and located in traditionally Republican states of Midwest, South
Law of Small Numbers
Alternative to the z-test
Permutation Test
The _______- ________ can be determined when the population mean and standard deviation are both known. The z-score is how many standard deviations a value lies from the center of the curve.
bell curve
PermutationTestSecond::Permutation(__,"___","____",______,"_____","______")
d cat val 10000 GroupA GroupB
Write a data frame to produce the following output:
Day      Weather
Weekday  Sunny
Weekend  Rainy
Weekday  Rainy
day <- c('Weekday','Weekend','Weekday')
weather <- c('Sunny','Rainy','Rainy')
Test <- data.frame(Day = day, Weather = weather)
According to the central limit theorem, the distribution of D_null is a ________ ___________ when n goes to infinity.
normal distribution
Is approached very quickly as n increases. Note that n is the sample size for each mean and not the number of samples
normal distribution
The _______ is defined as the probability of obtaining a result equal to or "more extreme" than what was actually observed, when the null hypothesis is true
p value
The histograms of __________ _________ match the shape of the normal distribution
sample means
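A small simulation can make this visible. The sketch draws many samples from a skewed exponential distribution and plots the histogram of their means; the sample size of 40 and the 5000 repetitions are arbitrary choices for illustration.

```r
set.seed(7)

n_per_sample <- 40    # size of each sample (the n in the central limit theorem)
n_samples    <- 5000  # how many sample means to collect

# Each sample comes from a skewed exponential distribution with mean 1
sample_means <- replicate(n_samples, mean(rexp(n_per_sample, rate = 1)))

# The histogram looks roughly bell-shaped even though the raw data do not
hist(sample_means, breaks = 40,
     main = "Means of exponential samples", xlab = "sample mean")

print(mean(sample_means))  # close to the population mean of 1
print(sd(sample_means))    # close to 1 / sqrt(n_per_sample)
```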
