STAT 107 (Almost) All CBTF Exam 1 Practice Questions for Self-Study


The Washington Post recently reported on a study that links restricting screen time for kids to higher mental performance. The study assessed the behavior of 4,500 children, ages 8 to 11, by looking at their sleep schedules, how much time they spent on screens, their amount of exercise, and analyzed how those factors affected the children's mental abilities. They found that kids who spent less than 2 hours on screens had superior mental performance than those who spent more than 2 hours on screens. Say you think that "Amount of Exercise" could be a confounder because children who exercise are more likely to spend less time on screens and exercise stimulates brain development. What's the best way to handle exercise being a confounder in this particular study?

At the end of the study, stratify based on exercise. Compare those who exercise the same amounts in the treatment and control groups, so you are comparing groups that are as alike as possible. (Remember, you always stratify at the END of a study).

Consider this list of numbers: 90, 80, 50, 80, 120 If you need to round, make sure to provide at least three decimal places. What is the average? What is the median? What is the SD?

Average: 84 Median: 80 SD: 25.0998 To find the SD, subtract the average from each data point and square the difference, add up all those squares, divide by (5 - 1), the number of data points minus 1, and take the square root of the result.
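The steps above can be sketched in plain Python. This is only an illustration of the sample SD formula (dividing by n - 1); the variable names are made up for this example.

```python
import math

data = [90, 80, 50, 80, 120]

# Step 1: the average
mean = sum(data) / len(data)

# Step 2: squared difference of every data point from the average
squared_deviations = [(x - mean) ** 2 for x in data]

# Step 3: divide the total by (number of data points - 1)
sample_variance = sum(squared_deviations) / (len(data) - 1)

# Step 4: square root of the whole thing
sample_sd = math.sqrt(sample_variance)

print(mean, round(sample_sd, 4))
```

statistics.stdev(data) from the standard library uses the same n - 1 formula, so it should agree with this value.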

A study published in the journal Mayo Clinic found that children who have more than one surgery with general anesthesia before their second birthday have a higher risk of developing ADHD than those who never had general anesthesia. The researchers examined the medical records of 341 children diagnosed with ADHD to find out who had undergone a surgical procedure with anesthesia before they were 2. Results: They found that 18 percent of children who had 2 or more surgeries with general anesthesia when they were babies eventually developed ADHD compared to only 7 percent of children who had no surgeries with general anesthesia as babies. Based only on the info above, state whether the following is a confounder, a causal link, or neither: Anesthesia-Induced Neurological Damage: Exposure to anesthesia may damage brain development, leading to ADHD.

Causal Link

Prof. Karle has included one particular multiple-choice question on the final in large lectures for the past 2 years (approximately 1000 students in the class). She would like to determine whether or not a new phrasing of this question would improve student responses. Prof. Karle decides to include the question with the new phrasing on this semester's final and then compare the percentage of students who answered the question correctly this semester to the percentage of students who answered the question correctly last semester with the old phrasing. Which of the following represents the best improvement on the design of the experiment?

Give half of this semester's students a version of the final with the old phrasing and half of this semester's students a version with the new phrasing. Use a random procedure (like flipping a fair coin) to decide who gets which version. Compare the results from these two groups. Key phrase to look for: "Use a **RANDOM** procedure (like flipping a fair coin)"

A study tried to see whether men's mental abilities diminished when they think they are observed by a woman. Subjects were first given a test, then performed a lip-reading task, then given another test. Some subjects were told that a male was observing their lip-reading, and others were told a female did the observing. (Which sex was decided randomly.) The observers sent instant messages, but otherwise the subject never saw or heard from their observer, so didn't know for sure the observer's sex. Neither the subjects, nor their evaluators knew who was in the treatment group or the control group. Men who thought they were being observed by a female had worse scores on the second test than on the first. Men who thought they were being observed by a male had about the same scores on the first and second tests. (Women scored equally well on both tests no matter what the sex of their observer.) How would you characterize this study? Is it well-designed or poorly designed?

It is quite well-designed. This type of experiment is ideal.

A recent study compared two different treatments for repairing a torn knee ligament. The subjects were 32 active, young adult volunteers who had acute knee ligament injuries. They were randomly divided into two groups: Group A received physical therapy and surgery, while Group B received only physical therapy. No group received a fake surgery. Evaluators who were aware of which patients were in which group rated the subjects on knee strength, stability, flexibility, etc., over a two-year period and found that both groups said that they felt better, but there were no significant differences on any measure between the two groups. Out of the following, which represents the best improvement for this study?

Make sure that the evaluators of the study are not aware of which group the subjects are in.

A study reported in Time Magazine claimed that people who abstain from drinking alcohol die sooner than those who drink moderately to heavily. The study tracked 1,824 subjects aged 55-65 for 20 years and found that those who didn't drink any alcohol at all had the highest death rate (69%), compared to only 41% for moderate drinkers and 60% for heavy drinkers. Based ONLY on the information above, identify whether the following factor is a confounder that mixes up the study, a causal link that explains the conclusion, or neither. Age: Older people are more likely to die sooner than younger people.

Neither (Remember to draw the diagram if you get stuck.)

A study reported in Time Magazine claimed that people who abstain from drinking alcohol die sooner than those who drink moderately to heavily. The study tracked 1,824 subjects aged 55-65 for 20 years and found that those who didn't drink any alcohol at all had the highest death rate (69%), compared to only 41% for moderate drinkers and 60% for heavy drinkers. Based ONLY on the information above, identify whether the following factor is a confounder that mixes up the study, a causal link that explains the conclusion, or neither. Religion: The non-drinking group may have included more people whose faith prevents them from drinking.

Neither. Religion has no connection to a higher death rate.

Researchers tracked 119,000 men and women over a 30-year period and found that those who ate nuts every day lived longer than those who didn't eat nuts every day. The people who ate nuts were less likely to develop heart disease and cancer as well. Was there a placebo?

No

Researchers tracked 119,000 men and women over a 30-year period and found that those who ate nuts every day lived longer than those who didn't eat nuts every day. The people who ate nuts were less likely to develop heart disease and cancer as well. Were there randomized controls?

No

The Washington Post recently reported on a study that links restricting screen time for kids to higher mental performance. The study assessed the behavior of 4,500 children, ages 8 to 11, by looking at their sleep schedules, how much time they spent on screens, their amount of exercise, and analyzed how those factors affected the children's mental abilities. They found that kids who spent less than 2 hours on screens had superior mental performance than those who spent more than 2 hours on screens. Did this study have a placebo?

No

A study published in the journal Mayo Clinic found that children who have more than one surgery with general anesthesia before their second birthday have a higher risk of developing ADHD than those who never had general anesthesia. The researchers examined the medical records of 341 children diagnosed with ADHD to find out who had undergone a surgical procedure with anesthesia before they were 2. Results: They found that 18 percent of children who had 2 or more surgeries with general anesthesia when they were babies eventually developed ADHD compared to only 7 percent of children who had no surgeries with general anesthesia as babies. This study is an example of a...

Observational Study

A group of students were surveyed on how many ounces of milk they drink in a week. Below is a histogram of the resulting data. Class intervals include the left endpoint but not the right (For example, someone who drinks exactly 10 oz. of milk a week would fall in the 10-20 block, not the 5-10 block). What percentage of students drink ten or more oz. per week? What is the code to create a histogram in python if the data is loaded into a DataFrame called df? [PUT IMAGE HERE LATER]

Percentage: 50 Just do the math: add up the areas of the blocks at 10 oz. and above. Code: df.hist(column = 'colname')

A recent study compared two different treatments for repairing a torn knee ligament. The subjects were 32 active, young adult volunteers who had acute knee ligament injuries. They were randomly divided into two groups: Group A received physical therapy and surgery, while Group B received only physical therapy. No group received a fake surgery. Evaluators who were aware of which patients were in which group rated the subjects on knee strength, stability, flexibility, etc., over a two-year period and found that both groups said that they felt better, but there were no significant differences on any measure between the two groups. Since there were only 32 subjects in this study, after the random division, out of the 16 people in group A, only 5 of them were female. What's the best method that the researcher could use to prevent this?

The researcher could "block" the subjects based on gender first, then randomly assign half of the males to Group A and half to Group B. They would then do the same thing with the females.

Harry's letter begins with a series of numbers: 1, 2, 4, 6, 8, 11, 16, 16, 17, 17, 18, 20 To find The Magic Quantile, given a sequence of numbers, the platform where we can get on the train to Hogwarts is the Q3 value of the data. (Make sure to calculate the best possible approximation of Q3 if it is not an exact number.) (In other words, just find Q3)

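The worked example here uses a different list from the one in the question: 1, 4, 5, 6, 7, 9, 10, 12, 12, 13, 16, 19 (already sorted). The data set has 12 data points, so the location of Q3 is (# of data points - 1) * (percentile we want) = 11 * 0.75 = 8.25. So we want the 8.25th number. When counting to find the 8th number, start at 0. Q3 = low + (high - low) * fraction = 12 + (13 - 12) * 0.25 = 12.25. Applying the same steps to the list in the question (also 12 data points, so the location is again 8.25): the 8th and 9th numbers, counting from 0, are both 17, so Q3 = 17 + (17 - 17) * 0.25 = 17.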
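The location rule in this answer can be sketched as a small function. The name quantile_by_position is made up for this example, and the function only assumes the rule stated here: location = (# of data points - 1) * percentile, counting from 0, then interpolate between the two neighboring values.

```python
def quantile_by_position(values, fraction):
    """Quantile using location = (n - 1) * fraction, counting from 0."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * fraction
    low_index = int(position)                       # whole part of the location
    high_index = min(low_index + 1, len(ordered) - 1)
    remainder = position - low_index                # fractional part of the location
    low, high = ordered[low_index], ordered[high_index]
    return low + (high - low) * remainder


# The list used in the worked answer
worked_example = [1, 4, 5, 6, 7, 9, 10, 12, 12, 13, 16, 19]
# The list printed in the question
harrys_letter = [1, 2, 4, 6, 8, 11, 16, 16, 17, 17, 18, 20]

q3_worked = quantile_by_position(worked_example, 0.75)
q3_letter = quantile_by_position(harrys_letter, 0.75)
print(q3_worked, q3_letter)
```

The same function gives Q1 with fraction 0.25 and the median with fraction 0.5.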

You and a group of 14 other friends started a new business and you're the CEO! The average salary of the 15 employees is $69,000.00 and the median salary is $78,000.00. As the CEO, you are the top earner. If you give yourself a $93,000.00 raise, what happens to the standard deviation of the salaries?

The standard deviation gets larger

Researchers tracked 119,000 men and women over a 30-year period and found that those who ate nuts every day lived longer than those who didn't eat nuts every day. The people who ate nuts were less likely to develop heart disease and cancer as well. Is it appropriate to conclude that eating nuts causes people to live longer?

We aren't sure. The study shows there is an association between eating nuts and living longer, but there could be a confounder.

Prof. Karle has included one particular multiple-choice question on the final in large lectures for the past 2 years (approximately 1000 students in the class). She would like to determine whether or not a new phrasing of this question would improve student responses. Prof. Karle decides to include the question with the new phrasing on this semester's final and then compare the percentage of students who answered the question correctly this semester to the percentage of students who answered the question correctly last semester with the old phrasing. We know historical controls aren't the best because past and present conditions are different! Say we want to use the 40 students in the course this semester and give half of them the new phrasing and half of them the old phrasing. We know that 20 students are "A students" and 20 are "B students". What's the best way to do this?

We could first divide the students into "A", "B", etc. groups and then randomly assign half of each group to treatment and half to the control. (Remember! Block at the beginning, stratify at the end).
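As a rough sketch of "block first, then randomize within each block": the student labels, the function name block_and_assign, and the seed are all made up for illustration, and this is not meant as the course's official procedure.

```python
import random


def block_and_assign(students_by_block, seed=None):
    """Within each block, randomly send half to each version of the question."""
    rng = random.Random(seed)
    assignment = {}
    for block, students in students_by_block.items():
        shuffled = students[:]          # copy so the input is left untouched
        rng.shuffle(shuffled)           # the random procedure
        half = len(shuffled) // 2
        for student in shuffled[:half]:
            assignment[student] = "new phrasing"   # treatment
        for student in shuffled[half:]:
            assignment[student] = "old phrasing"   # control
    return assignment


# Hypothetical roster: 20 "A students" and 20 "B students"
students_by_block = {
    "A": [f"A{i}" for i in range(1, 21)],
    "B": [f"B{i}" for i in range(1, 21)],
}
assignment = block_and_assign(students_by_block, seed=107)
```

Because the shuffle happens separately inside each block, both versions of the question end up with exactly 10 A students and 10 B students.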

Write the Python code to find the average GPA of each subject area at The University of Illinois. Your solution must include an Average GPA variable (column).

a = df.groupby("Subject").agg("sum").reset_index()
a["Average GPA"] = (a["A+"] * 4 + a["A"] * 4 + a["A-"] * 3.67 + a["B+"] * 3.33 + a["B"] * 3 + a["B-"] * 2.67 + a["C+"] * 2.33 + a["C"] * 2 + a["C-"] * 1.67 + a["D+"] * 1.33 + a["D"] * 1 + a["D-"] * 0.67 + a["F"] * 0) / (a["Count"])
df_average_gpa = a
(When you see "Average GPA", think of the big, long, tedious answer. This is one of the two answers where you are going to use ...agg("sum")...) Also! Memorize the GPA scale for this just in case.
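To see the weighted-average idea on something small, here is a sketch with made-up data. The three rows are hypothetical, and the "Count" column is built here by adding up the grade columns, which is an assumption about what "Count" holds in the real dataset.

```python
import pandas as pd

# GPA scale from the answer above
GRADE_POINTS = {
    "A+": 4.0, "A": 4.0, "A-": 3.67, "B+": 3.33, "B": 3.0, "B-": 2.67,
    "C+": 2.33, "C": 2.0, "C-": 1.67, "D+": 1.33, "D": 1.0, "D-": 0.67, "F": 0.0,
}

# Hypothetical course sections: (subject, grades given out)
sections = [
    ("STAT", {"A": 10, "B": 10}),
    ("STAT", {"A": 5, "C": 5}),
    ("CS", {"A+": 2, "F": 2}),
]
records = [
    {"Subject": subject, **{grade: counts.get(grade, 0) for grade in GRADE_POINTS}}
    for subject, counts in sections
]
df = pd.DataFrame(records)

# Add up the grade counts for each subject
a = df.groupby("Subject")[list(GRADE_POINTS)].sum().reset_index()
grade_counts = a[list(GRADE_POINTS)]

# Assumption: Count = total number of letter grades given in the subject
a["Count"] = grade_counts.sum(axis=1)

# Weighted average: (grade points * number of students) / number of students
total_points = grade_counts.mul(pd.Series(GRADE_POINTS), axis=1).sum(axis=1)
a["Average GPA"] = total_points / a["Count"]

df_average_gpa = a
gpa_by_subject = dict(zip(a["Subject"], a["Average GPA"]))
```

Multiplying by a Series of grade points is just a shorter way to write the long sum; the long version is the one to memorize for the exam.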

Write the Python code to calculate the admission rate *for each Major* and store it in the column Admission Rate.

accepted = df[df.Admission == "Accepted"]
accepted = accepted.groupby("Major").agg("count").reset_index()
df_applicants = df.groupby("Major").agg("count").reset_index()
df_applicants["Admission Rate"] = accepted["Admission"] / df_applicants["Admission"]
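One thing to watch with the answer above: the division lines rows up by position after reset_index(), so it relies on every major having at least one accepted applicant. A possible variation (not the course's answer) counts applicants and accepted applications in a single groupby so each major stays on its own row. The toy data is made up.

```python
import pandas as pd

# Hypothetical applications; major C accepts nobody
df = pd.DataFrame({
    "Major": ["A", "A", "A", "B", "B", "C"],
    "Admission": ["Accepted", "Rejected", "Accepted", "Rejected", "Accepted", "Rejected"],
})

# One row per major with both counts side by side
df_applicants = df.groupby("Major").agg(
    Applicants=("Admission", "count"),
    Accepted=("Admission", lambda s: (s == "Accepted").sum()),
).reset_index()

df_applicants["Admission Rate"] = df_applicants["Accepted"] / df_applicants["Applicants"]

rate_by_major = dict(zip(df_applicants["Major"], df_applicants["Admission Rate"]))
```

Here major C gets a rate of 0 instead of a missing value.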

Write the Python code to calculate the major that admitted the fewest students.

accepted = df[df.Admission == "Accepted"]
df_applicants = accepted.groupby("Major").agg("count").reset_index()
df_applicants = df_applicants.nsmallest(1, "Admission")
(As seen below, this also works without the creation of an "accepted" variable; just keep in mind that both versions work.)

Write the Python code to calculate the total number of ACCEPTED applications for each Major and store it in the column Accepted.

accepted = df[df.Admission == "Accepted"]
df_applicants = accepted.groupby("Major").agg("count").reset_index()
df_applicants["Accepted"] = df_applicants["Admission"]

Write the Python code to store all students in Data Science DISCOVERY who are currently freshman into the DataFrame df_answer.

df_answer = df[df["School Year"] == "Freshman"]

Write the Python code to calculate the total number of applications for each Major and store it in the column Applicants.

df_applicants = df.groupby("Major").agg("count").reset_index()
df_applicants["Applicants"] = df_applicants["Admission"]

Write the Python code to calculate the major that admitted the most students.

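df_applicants = df[df.Admission == "Accepted"].groupby("Major").agg("count").reset_index()
df_applicants = df_applicants.nlargest(1, "Admission")
(Filter to accepted applications before counting; counting every row gives the number of applicants, not the number admitted.)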

Write the Python code to store all students who were accepted into major C at UC-Berkeley.

df_applicants = df[(df.Admission == "Accepted") & (df.Major == "C")]

Write the Python code to store all students who were accepted into UC-Berkeley.

df_applicants = df[df.Admission == "Accepted"]

Write the Python code to store ALL professional-only courses into df_courses. A professional-only course is a course with a number including or between 600 and 699.

df_courses = df[(df.Number >= 600) & (df.Number <= 699)]

Write the Python code to store all upper-level ADV courses into df_courses. An upper-level ADV course is an ADV course with a course number including or between 300 and 499.

df_courses = df[(df.Subject == "ADV") & (df.Number >= 300) & (df.Number <= 499)]

Write the Python code to store ALL courses that have a course subject of either CS or STAT (ex: CS 100, STAT 100, CS 101, STAT 101, etc...).

df_courses = df[(df.Subject == "CS") | (df.Subject == "STAT")]

Write the Python code to store ALL of the courses with the course number of exactly 200 (ex: CS 200, STAT 200, etc...) into the Python variable df_courses.

df_courses = df[df.Number == 200]

Write the Python code to store ALL courses with the course subject ANTH (ex: ANTH 100, ANTH 101, etc...) into the Python variable df_courses.

df_courses = df[df.Subject == "ANTH"]

Write the Python code to find the 7 subjects that give the most A-s at Illinois.

df_grade_count = df.groupby("Subject").agg("sum").reset_index()
df_grade_count = df_grade_count.nlargest(7, "A-")
(Remember, this is one of the two answers where you are going to use ...agg("sum")...)

Write the Python code to find the standard deviation in the number of pairs of shoes that Freshman students in Data Science DISCOVERY have and store that number in standard_deviation:

freshmen = df[df["School Year"] == "Freshman"]
standard_deviation = freshmen["Shoes Owned"].std()
***Question for self: Do all std() questions only require two lines of code?*** So far, the answer is yes.

Write the Python code to find the standard deviation in the number of hours studying each week that Senior students in Data Science DISCOVERY have and store that number in standard_deviation:

senior = df[df["School Year"] == "Senior"]
standard_deviation = senior["Hours Studying"].std()

Write the Python code to find the average number of siblings that Senior students in Data Science DISCOVERY have and store that number in average:

senior = df[df["School Year"] == "Senior"]
average = senior["Siblings"].mean()
To find the average, aside from the average GPA question, remember to use .mean()!!

Write the Python code to find the standard deviation in the number of siblings that Sophomore students in Data Science DISCOVERY have and store that number in standard_deviation:

sophomore = df[df["School Year"] == "Sophomore"]
standard_deviation = sophomore.Siblings.std()

----- CODING QUESTIONS -----


Exam 1 scores of several students are shown below: 72, 79, 79, 79, 80, 80, 83, 88, 93, 94, 96, 97, 97 How many outliers exist in the data above?

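0 1) Find Q1 and Q3. The location of a quartile is (# of data points - 1) * percentile, counting from 0: with 13 scores that is 12 * 0.25 = 3 and 12 * 0.75 = 9, so Q1 = 79 and Q3 = 94. 2) Find the IQR (Q3 - Q1 = 15). 3) Do 1.5 * IQR = 22.5. 4) Do Q1 - that product and Q3 + that product to determine the boundaries for outliers: 56.5 and 116.5. Numbers equal to those values are not outliers. No score falls outside 56.5 to 116.5, so there are 0 outliers.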

Consider a large amount of numeric data that was generated in your Math class. You found the following statistics about the data: The mean is 76 The median is 68 The standard deviation is 2 The variance is 4 If we *divide* every number by 2, what is the new variance of the data?

1.0 Take the number you are multiplying or dividing by and square it, then multiply or divide the variance by that square: 4 / 2^2 = 4 / 4 = 1.
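A quick check of the scaling rule with Python's statistics module. The data list is made up (chosen so its variance is 4); statistics.pvariance divides by n, but the same squaring rule holds for the n - 1 version.

```python
import statistics

# Hypothetical data with mean 5 and (population) variance 4
data = [2, 4, 4, 4, 5, 5, 7, 9]
halved = [x / 2 for x in data]

variance_before = statistics.pvariance(data)
variance_after = statistics.pvariance(halved)

# Dividing every number by 2 divides the SD by 2 and the variance by 2 squared
sd_before = statistics.pstdev(data)
sd_after = statistics.pstdev(halved)

print(variance_before, variance_after, sd_before, sd_after)
```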

Consider a large amount of numeric data that was generated in your Gender Studies class. You found the following statistics about the data: The mean is 79 The median is 86 The standard deviation is 2.4 The variance is 5.76 If we divide every number by 5, what is the new mean of the data?

15.8 Simply divide the mean by the same number: 79 / 5 = 15.8.

Consider this list of numbers: 2, 11, 16, 18, 33. The mean is 16 and the median value is 16. If we add 1 to every number, what is the new median?

17

Consider this list of numbers: 5, 5, 8, 31, 36. What is the mean?

17 Use Python (or a calculator if you have access to one):
import statistics
statistics.mean([5, 5, 8, 31, 36])

Consider this list of samples taken from a larger set: 5, 8, 9, 11, 12. What is the standard deviation of the samples?

2.7386 Use Python to find the standard deviation!!
import statistics
statistics.stdev([5, 8, 9, 11, 12])

Suppose you have a histogram with four bins. The first three bins have a total area of 78%. What is the area of the fourth bin? Express your answer as a decimal, fraction, or simple mathematical expression. Your fraction does not need to be reduced.

22% Simply subtract 78 from 100, since all four bins need to add up to 100%

Exam 1 scores of several students are shown below: 67, 71, 72, 75, 75, 76, 76, 77, 77, 78, 79, 96, 100 How many outliers exist in the data above?

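3 Outliers are data points less than Q1 - 1.5 * IQR or greater than Q3 + 1.5 * IQR. First, we find that Q1 = 75 and Q3 = 78. Then, the IQR is 78 - 75 = 3 and 1.5 * IQR = 4.5. Then the outliers are any points greater than 82.5 or less than 70.5 (here 67, 96, and 100), so the answer is 3. My question: How do you find Q1 and Q3? Use location = (# of data points - 1) * percentile, counting from 0. With 13 scores, Q1 is at 12 * 0.25 = 3 (the score 75) and Q3 is at 12 * 0.75 = 9 (the score 78).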
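The outlier rule can be sketched in Python as follows. The function names are made up, and the quartiles are assumed to come from the location rule (# of data points - 1) * percentile, counting from 0; other quartile conventions can give a different count.

```python
def quantile_by_position(values, fraction):
    """Quantile using location = (n - 1) * fraction, counting from 0."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * fraction
    low_index = int(position)
    high_index = min(low_index + 1, len(ordered) - 1)
    remainder = position - low_index
    return ordered[low_index] + (ordered[high_index] - ordered[low_index]) * remainder


def find_outliers(values):
    """Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR."""
    q1 = quantile_by_position(values, 0.25)
    q3 = quantile_by_position(values, 0.75)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Strict comparisons: a value equal to a fence is not an outlier
    return [x for x in values if x < lower_fence or x > upper_fence]


scores = [67, 71, 72, 75, 75, 76, 76, 77, 77, 78, 79, 96, 100]
outliers = find_outliers(scores)
print(outliers, len(outliers))
```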

Consider a large amount of numeric data that was generated in your Chemistry class. You found the following statistics about the data: The mean is 60 The median is 64 The standard deviation is 2 The variance is 4 If we subtract 3 from every number, what is the new variance of the data?

4 The variance does not change with additions or subtractions from every number.

Consider a large amount of numeric data that was generated in your Computer Science class. You found the following statistics about the data: The mean is 55 The median is 57 The standard deviation is 2.7 The variance is 7.29 If we add 5 to every number, what is the new median value of the data?

62 When adding or subtracting, you just add or subtract the number from the median
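A small check of the shifting rule. The data list is made up and only matches the question's mean (55) and median (57), not its SD or variance.

```python
import statistics

# Hypothetical data with mean 55 and median 57
data = [50, 52, 57, 58, 58]
shifted = [x + 5 for x in data]

# Center measures move by the amount added
median_after = statistics.median(shifted)
mean_after = statistics.mean(shifted)

# Spread measures do not change when the same number is added to everything
variance_before = statistics.pvariance(data)
variance_after = statistics.pvariance(shifted)

print(median_after, mean_after, variance_before, variance_after)
```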

According to William Butler Yeats, "She is the Gaelic muse, for she gives inspiration to those she persecutes. The Gaelic poets die young, for she is restless, and will not let them remain long on earth." A study designed to investigate this issue examined the age at death for poets across different cultures and genders. Here is part of that data: 20, 80, 20, 50, 60, 70, 30, 90, 70, 80 The average of this data is 57 and the median is 65. If, for every age, we add 18, what will be the new average?

75 Again, with addition, simply add to the average what every number is added by.

At the beginning of a party, your friends get different numbers of candies as follows: 2, 3, 3, 6, 6, 6, 8, 9, 9 You have only 5 candies remaining. You distribute your remaining candies to your friends to create the largest possible median value. What is the new, maximal median value after you give away your 5 candies?

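8 1) Find the median: with 9 friends it is the 5th value, 6. 2) Only the median and the values above it matter (6, 6, 8, 9, 9), so spend candies on the smallest of those. 3) Raising both 6s to 8 costs 2 + 2 = 4 candies and makes the list 2, 3, 3, 6, 8, 8, 8, 9, 9, so the median is 8. 4) Reaching 9 would cost 3 + 3 + 1 = 7 candies, more than the 5 you have, so 8 is the maximum (the leftover candy can go to anyone).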
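The candy strategy can be sketched as a small function. The name max_median_after_gifts is made up, and the sketch assumes an odd number of friends and whole candies.

```python
def max_median_after_gifts(candies, extra):
    """Largest median reachable by handing out `extra` whole candies (odd-length list)."""
    ordered = sorted(candies)
    # The median and everything above it: these must all reach the new median
    upper_half = ordered[len(ordered) // 2:]
    target = upper_half[0]  # current median
    while True:
        # Candies needed to lift every value in the upper half to target + 1
        cost = sum(max(0, target + 1 - x) for x in upper_half)
        if cost > extra:
            return target
        target += 1


friends = [2, 3, 3, 6, 6, 6, 8, 9, 9]
best_median = max_median_after_gifts(friends, 5)
print(best_median)
```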

You and a group of 4 other friends started a new business and you're the CEO! The average salary of the 5 employees is $82,000.00 and the median salary is $91,000.00. As the CEO, you are the top earner. If you give yourself a $53,000.00 raise, what is the new average salary?

92600 1) Multiply the average times # of people 2) Add the raise 3) Divide by # of people
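The three steps can be written as a one-line formula. The function name is made up for illustration.

```python
def average_after_raise(average, headcount, raise_amount):
    """Total payroll plus the raise, spread back over the same number of people."""
    total = average * headcount          # 1) multiply the average by the # of people
    total += raise_amount                # 2) add the raise
    return total / headcount             # 3) divide by the # of people


new_average = average_after_raise(82_000, 5, 53_000)
print(new_average)
```

The median is not needed for this calculation: only the total payroll changes.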

==============================
