Stats Exam 1
Which one of these random variables is discrete? (a) Height of an adult male (b) GPA (c) Number of phone calls received in a day
# of phone calls
Choose the probability that best matches the following statement: "This event will occur slightly more often than not."
.6
Spread?
Find the minimum and maximum (the range)
Does the Z score have units?
NO!
Sample space
Set of all possible outcomes. Ex: tossing a coin three times gives (HHH, HHT, HTH, HTT, TTT, TTH, THT, THH)
The distribution of a random variable shows all possible values the random variable could take and how often they occur.
TRUE
What is the measure of center in the 5 number summary?
The median
What is the notation for the mean (center)
μ (mu): density curve (population); x̄ (x-bar): histogram (sample) - the point of symmetry in a normal distribution
What is the notation for standard deviation?
σ (sigma): density curve (population); s: histogram (sample) - on a normal curve, the distance from the mean to the point where the curve begins to fall less steeply (the inflection point)
How can we remember discrete and continuous variables?
- Continuous: can have decimals or take any value within a range - Discrete: takes only separate, countable values (zip code; shoe size is 8 or 9, not 8.456; # of coins in my pocket, I can't have 3.4 pennies)
3 tools to represent quantitative variables?
- Histogram - Stem plot - dot plot
Open questions
- More difficult to summarize - less restrictive
Why do we need RBD?
- This helps equalize the effects of lurking variables (we classify subjects into blocks based on lurking variables, then assign subjects to treatments separately within each block). Ex: Male rats, Explanatory V = 5 drugs, Response: Time to complete maze
Why do we plot data?
- We are visual beings - Helps us see outliers (did we mis-mark? A person 7000 inches tall when everyone else is 700) - ALWAYS PLOT YOUR DATA
Histogram
- bins need to be evenly spaced out - Each class has a range and the height will tell us how many - The area of the bar will also tell us **** (how ask TA) - 7-15 classes is reasonable
Non-compliance
- failure to submit to the assigned treatment, refusal to follow the protocol of the experiment Consequence: invalid results (using a volunteer might solve this problem because we are not forcing them)
Non-response
- occurs when an individual chosen for the sample can't be contacted or refuses to participate - EX: Hangups, refusal to mail census forms, vacation
Undercoverage
- occurs when some groups in the population are left out of the process of choosing the sample - EX: Homeless people, those without a phone will not be covered
Question wording
- the way in which survey questions are phrased, which influences how respondents answer them - loaded questions
Blocking should be used whenever
- you want to remove variation associated with the blocking variable from the experimental variation. - individuals are grouped before the experiment begins according to some characteristic that is expected to affect the response variable. - individuals are similar within the blocks but very different from block to block.
How to solve "Find area problems"
1. Area to the left = Find the Z value, then look up the area in the chart 2. Area to the right = Find the area to the left using the Z chart and subtract it from 1 3. Area between C and D = Subtract the area to the left of C from the area to the left of D (D being the value further to the right)
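The three "find area" cases can be sketched in Python. This is only an illustration: it assumes the standard library's `statistics.NormalDist` stands in for the printed Z chart, and the function names are made up for this example.

```python
from statistics import NormalDist

# Standard normal curve (mean 0, standard deviation 1); its cdf plays the
# role of the Z chart, which reports the area to the LEFT of z.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def area_left(z):
    """Area to the left of z (what the chart gives directly)."""
    return STANDARD_NORMAL.cdf(z)

def area_right(z):
    """Area to the right of z: 1 minus the area to the left."""
    return 1.0 - area_left(z)

def area_between(c, d):
    """Area between c and d, where d is the value further to the right."""
    return area_left(d) - area_left(c)

print(round(area_left(1.5), 4))           # 0.9332
print(round(area_right(1.5), 4))          # 0.0668
print(round(area_between(-1.0, 1.0), 4))  # 0.6827
```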
Principles of Valid Experiments
1. Control/Comparison (do we have two different treatments/or a control?) 2. Randomization (are we randomly assigning treatments?) 3. Replication (multiple subjects in each group?) 4. Double-Blinding (Does the subject know what the treatment is? Do the evaluators know) **this is optional
Principles of Data ethics
1. Safety of subjects 2. Informed consent of side-effects 3. Patient data must be kept confidential
"Given x-value find area to the right"
1. Standardize x to get z 2. Look up the z score in the table of values (the cumulative area) 3. This gives the area to the left, so subtract it from 1 to get the desired area to the right
Blocking
Division of individuals into homogeneous groups (males and females each tested) - Acts as a control for variables
Was the salk vaccine ethical?
It was unknown whether the vaccine was effective, so giving it to some 2nd graders and not to others was acceptable: researchers didn't know if it would help them. - Conclusion: it was ethical because once the vaccine was shown to be effective beyond doubt, the study was stopped and the unvaccinated were then given the vaccine
Probability of an outcome
A measure of the proportion of times an outcome occurs in a very long series of repetitions that gives us an indication of the likelihood of the outcome.
Which of the following can be considered a population? Select all that apply. (a) all students at BYU (b) all pigeons in Utah (c) some students from BYU's STAT121 class (d) None of these
All students at BYU and all pigeons in Utah ("some students" is not an entire population; the population would be the entire class or all students)
Randomized block design (RBD)
An experimental design where treatments are randomly allocated within each block. - Each block is divided into groups of similar characteristic (tend to be equal in number and treatment) - These help control the effects of variables that define the blocks
Replication
Assign more than one subject to each treatment group
When data are arranged in ascending numerical order, the mean is the value which __________________
Balances the data
response variable (dependent variable)
Characteristic measured on each subject
Closed questions
Closed questions can be biased by the options provided - make sure to include "other" or "unsure"
Suppose a researcher is interested in the average ACT score for high school students in Illinois. She randomly selects 150 high schools and then asks each student in the selected high schools what their ACT score was. What kind of sample is this?
Cluster
What is the difference between cluster sampling and stratified sampling?
Clustering: Take a random sample of clusters, then sample ALL INDIVIDUALS in the chosen clusters Stratified: Classify a population into groups (gender/race) and then pull a simple random sample from EVERY GROUP
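A rough sketch of the two sampling schemes, to make the difference concrete. The group names, member labels, and function names are all hypothetical.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Simple random sample of `per_stratum` individuals from EVERY group."""
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then keep ALL individuals in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# Hypothetical population, grouped two different ways.
strata = {"female": [f"F{i}" for i in range(50)],
          "male": [f"M{i}" for i in range(50)]}
clusters = {f"school_{k}": [f"s{k}_{i}" for i in range(20)] for k in range(10)}

by_stratum = stratified_sample(strata, per_stratum=5, rng=rng)
by_cluster = cluster_sample(clusters, n_clusters=3, rng=rng)
print({name: len(people) for name, people in by_stratum.items()})
print({name: len(people) for name, people in by_cluster.items()})
```

Every stratum shows up in the stratified sample, while the cluster sample contains only the chosen clusters, each one in full.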
Event
Collection of possible outcomes (subset of the sample space) ex: Getting two heads when a coin is tossed three times (HHT, HTH, THH)
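The sample space and the "two heads" event can be listed by brute force. A minimal sketch, assuming a fair coin so that every outcome is equally likely (the cards do not state this):

```python
from itertools import product

# Sample space for tossing a coin three times: every H/T sequence of length 3.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=3)]

# Event: exactly two heads (a subset of the sample space).
two_heads = [outcome for outcome in sample_space if outcome.count("H") == 2]

# Assumption: fair coin, so P(event) = outcomes in event / outcomes in sample space.
probability = len(two_heads) / len(sample_space)

print(sample_space)  # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']
print(two_heads)     # ['HHT', 'HTH', 'THH']
print(probability)   # 0.375
```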
Control/comparison
Control lurking variables by including comparison treatments, using homogeneous subjects; used to measure placebo effect
Terms continued
Control: effort to reduce the effects of lurking variables Confounding: effects of lurking variables cannot be distinguished from effects of factors
Where do we cut a distribution?
Cut it so the area on each side is equal (splitting the cake)
Diagnostic bias
Diagnosis of subjects biased by preconceived notions about effectiveness of treatment - Preconception is confounded by lurking variable - Bias on the side of the evaluator (I believe vitamin C is really good so I might discount some minor effects of a sickness to prove what I want to be the outcome so we need a double blind to avoid this)
Probability samples are samples selected in such a way that
Each member of the population has a chance of being selected and that chance can be computed.
How do we calculate z-score?
z = (x - μ) / σ Ex: μ = 3485 g and σ = 425 g; a baby who weighs 4122.5 g is 1.5 standard deviations above the mean, since (4122.5 - 3485) / 425 = 1.5 1. The given value from the sample is x 2. The given μ is the mean 3. The standard deviation σ is on the bottom, so z tells us how many standard deviations we are away from the mean
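The same calculation as a short sketch. It assumes the baby's weight is 4122.5 g, the value that sits exactly 1.5 standard deviations above a mean of 3485 g when the standard deviation is 425 g; the function name is made up for this example.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Assumed values: x = 4122.5 g, mean = 3485 g, standard deviation = 425 g.
z = z_score(4122.5, 3485.0, 425.0)
print(z)  # 1.5

# A value below the mean gives a negative z score.
print(z_score(3060.0, 3485.0, 425.0))  # -1.0
```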
Treatment
Experimental value applied to subject = value of factor
Experiment terms
Factor: the variable controlled by the experimenters, used to determine its effect on the response variable
What is the standard deviation rule? (68-95-99.7% rule)
For normally distributed data: - 68% of observations fall within 1 standard deviation of the mean - 95% of observations fall within 2 standard deviations of the mean - 99.7% of observations fall within 3 standard deviations of the mean
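The three percentages can be checked numerically. A quick sketch, assuming Python's `statistics.NormalDist` as the normal curve; the helper name is made up:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def percent_within(k):
    """Percent of a normal distribution within k standard deviations of the mean."""
    return 100 * (standard_normal.cdf(k) - standard_normal.cdf(-k))

for k in (1, 2, 3):
    print(k, round(percent_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

The rule's 68 and 95 are rounded versions of these values.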
Randomization
Neutralize effects of lurking variables by assigning subjects to treatments randomly (ex: subjects do not choose what diet they will have etc.) ***Random assignment is the key not just random selection of subjects, we often use volunteers for experiments
Why wouldn't we want a ton of classes for our histograms?
If it is too specific it might add a lot of noise (a choppy histogram) - 10 class picture for example would te
How do we compare mean and median?
If there is no skew they are the same; if there is a skew, the mean slides toward the side of the skew (the mean is more affected by outliers)
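A tiny illustration with made-up numbers: replacing the largest value with an extreme one drags the mean toward the tail while the median stays put.

```python
from statistics import mean, median

# Hypothetical data sets, chosen only to show the effect.
symmetric = [2, 4, 6, 8, 10]
right_skewed = [2, 4, 6, 8, 100]  # same data with one large outlier

print(mean(symmetric), median(symmetric))        # 6 6
print(mean(right_skewed), median(right_skewed))  # 24 6
```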
What advantage do histograms have over stem plots?
If we have thousands of inputs a stem plot would be hideous
Subject
Individual to which treatment is applied
Which of the following is NOT a measure of center of the data?
Interquartile range (MEAN IS A MEASURE OF CENTER)
Interviewer
Interviewer influences responses - Rude, intimidating, clues, hints - Ex: Black respondents were asked "Can you trust white people?": 35% said yes to a white reporter, 7% said yes to a Black reporter
Difference between matched pairs and block experiment?
Matched pairs is a block but divided into two (2 rats) EX: An experiment was conducted with 6 pairs of rats. Each rat in a pair came from the same litter. One rat from each pair was randomly chosen and assigned to live alone in a cage with no toys. The other rat in each pair was assigned to live with 11 other rats in a cage supplied with toys. After a month, the rats were sacrificed and their brain cortexes were weighed. The researchers were trying to show that a favorable psychological environment stimulates the growth of cortex material (grey matter) in the brain. What type of study is this? - this is Matched pairs
Where is the mean median mode on a skew?
Median will be between the mean and mode! (mean moves to the side with the skew, median middle, and mode the most frequent/highest bar)
Can standard deviation be negative?
NO! (it is always 0 or greater)
Do experiments have to have a placebo to be valid?
NO! (they just need a control/comparison)
What do we call facts about a population?
Parameter
Hawthorne effect
Phenomenon where people in an experiment behave differently because they know they are being watched (attention/observation bias) Ex: Nielsen TV ratings and TV watching behavior (If I know my TV is monitored, I'm probably not going to want them knowing I watch The Bachelor) Ex: Diet study: the act of writing down your diet may change how people snack and eat - This causes inaccurate reporting because people's behavior changes! (like lack of realism, but caused more by people's choice to act differently than by the situational controls being different)
What is the factor?
Planned explanatory variable - Polio example: children were inoculated with a placebo and with the real vaccine. The type of inoculation is the factor
How do we calculate outliers?
A value is an outlier if it is below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR (IQR = Q3 - Q1)
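The 1.5 x IQR rule as a sketch. Assumptions: the data are made up, and the quartiles come from `statistics.quantiles` with `method="inclusive"`, which can differ slightly from the by-hand quartile method taught in class.

```python
from statistics import quantiles

def outlier_fences(data):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    # Assumption: "inclusive" quartiles; other quartile conventions exist.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data):
    """Values that fall outside the fences."""
    low, high = outlier_fences(data)
    return [x for x in data if x < low or x > high]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
print(outlier_fences(data))
print(find_outliers(data))  # [30]
```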
Which of the following intervals corresponds to the largest area under a Normal curve? (a) Q1 to Q3 (b) μ to (μ + 3σ) (c) Q1 to (μ + 2σ) (d) (μ - σ) to Q3
Q1 to (μ + 2σ)
ASK TA?? Which of the following intervals corresponds to the LARGEST area on a Normal curve? Q1 to µ + 3σ or Q1 to Q3?
Q1 to µ + 3σ
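Both interval questions above can be checked numerically. A sketch on the standard normal curve (μ = 0, σ = 1), assuming `statistics.NormalDist`; the dictionary labels are just for this example.

```python
from statistics import NormalDist

nd = NormalDist()        # standard normal: mu = 0, sigma = 1
q1 = nd.inv_cdf(0.25)    # first quartile, about -0.674
q3 = nd.inv_cdf(0.75)    # third quartile, about +0.674

intervals = {
    "Q1 to Q3": (q1, q3),
    "mu to mu + 3 sigma": (0.0, 3.0),
    "Q1 to mu + 2 sigma": (q1, 2.0),
    "mu - sigma to Q3": (-1.0, q3),
    "Q1 to mu + 3 sigma": (q1, 3.0),
}

# Area of each interval = area left of the upper end minus area left of the lower end.
areas = {name: nd.cdf(upper) - nd.cdf(lower)
         for name, (lower, upper) in intervals.items()}

for name, area in areas.items():
    print(f"{name}: {area:.4f}")
# Q1 to Q3: 0.5000
# mu to mu + 3 sigma: 0.4987
# Q1 to mu + 2 sigma: 0.7272
# mu - sigma to Q3: 0.5913
# Q1 to mu + 3 sigma: 0.7487
```

The area from Q1 up to μ is already 0.25, so any interval starting at Q1 and reaching well past the mean beats Q1 to Q3, which is exactly 0.5.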
Researchers want to compare the effectiveness of exercise and dieting compared to dieting alone for weight loss. They have 60 volunteers, 30 men and 30 women. They randomly assign half of the men to Group 1, exercise and diet, and the other half to Group 2, diet alone. They follow the same procedure for the women. Half of the women are assigned to Group 1 and the other half are assigned to Group 2. After 16 weeks, their weight loss was measured and compared. What type of study is this?
Randomized Block Experiment
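The assignment in this example can be sketched as follows: randomize separately inside each block (men, women) so both treatments get half of each block. The subject labels and the function name are hypothetical.

```python
import random

def randomized_block_assignment(blocks, treatments, seed=None):
    """Shuffle the subjects inside each block, then deal them to treatments in turn."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, subjects in blocks.items():
        shuffled = list(subjects)
        rng.shuffle(shuffled)  # randomization happens WITHIN the block
        for i, subject in enumerate(shuffled):
            assignment[subject] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical labels for the 30 men and 30 women.
blocks = {
    "men": [f"M{i}" for i in range(1, 31)],
    "women": [f"W{i}" for i in range(1, 31)],
}
treatments = ["exercise and diet", "diet alone"]

assignment = randomized_block_assignment(blocks, treatments, seed=0)
print(assignment["M1"], assignment["W1"])
```

Each of the four block/treatment combinations ends up with 15 subjects.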
Lack of realism
Realism is often compromised by controlled study conditions, choice of homogeneous subjects, application of treatments (scenario we create is causing this) - Ex: Patients in a clinical trial are watched and given pills under the guidance of a physician while most people will be home and not very active in treating themselves.
What is the difference between response and explanatory variable?
Response variable is what we want to know, explanatory variable is the cause in the cause effect relationship (can we determine age from height, age is the response, height is the explanatory)
What is multistage sampling?
Samples are taken from each level of a hierarchical structure ex: Educators in California are concerned about a recent newspaper article reporting that students in the United States are falling behind students in other nations in their math skills. They decide to sample 10th grade students throughout the state and test their mathematics skills. They first randomly select 10 school districts. From each of these 10 school districts they randomly select three high schools. From these 3 high schools they randomly select 10 students and test them. What type of sample is this?
Right skew vs. Left skew
Skew = the side where the tail is stretched (right skew: long tail on the right; left skew: long tail on the left)
The nonprofit group Public Agenda conducted telephone interviews with parents of high school children. Interviewers chose equal numbers of black, white, and Hispanic parents by randomly selecting from within each race using student records. One question asked was "Are the high schools in your state doing an excellent, good, fair, or poor job, or do you not know enough to say?" What type of sample is this?
Stratified
A popular magazine is interested in the average amount of time that their readers spend on the internet each day. They randomly survey 100 of their female readers and 100 of their male readers and ask them about their average internet use. What type of sample is this?
Stratified sample
Randomized Controlled Experiment (RCE)
Subjects assigned to treatments such that each subject has an equal chance of being assigned to any possible treatment (typically with the same number of subjects per treatment) - Ex: put names in a hat, shuffle, draw out desired number of names for treatment A - Ex: scientists can label rats with a number and randomize this way *A randomized controlled double-blind experiment is optimal for establishing causation
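The "names in a hat" idea as a sketch, for two treatments. The subject names and the function name are made up for illustration.

```python
import random

def draw_from_hat(names, group_size, seed=None):
    """Shuffle all names, draw `group_size` for treatment A; the rest get treatment B."""
    rng = random.Random(seed)
    hat = list(names)
    rng.shuffle(hat)  # every subject is equally likely to land in either group
    return hat[:group_size], hat[group_size:]

subjects = [f"rat_{i}" for i in range(1, 21)]  # hypothetical numbered rats
treatment_a, treatment_b = draw_from_hat(subjects, group_size=10, seed=1)
print(len(treatment_a), len(treatment_b))  # 10 10
```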
Bimodal distribution?
Two groups Ex: men and women's heights // Ages of students
What does the table give us?
The cumulative value TO THE LEFT! (less than probabilities)
Matched pairs design
The design of a study where experimental units are naturally paired by a common characteristic, or with themselves in a before-after type of study. - Twins EX: a car is tested with a clean filter and with a dirty filter - ONLY COMPARES TWO TREATMENTS
What is the z-score?
The number of standard deviations above or below the mean of a normal distribution.
Can animals be part of a double-blind experiment?
The rats don't know! double blinding refers to
When do we know to use RCE vs. RBD?
Use RCE when: individuals are all similar Use RBD when: individuals are similar within a block but very different block to block Is this drug as effective for older or younger people? Is this drug as effective for men as it is women? (ONLY THE RBD CAN ANSWER THESE QUESTIONS)
Explanatory variable (independent variable)
Used to explain or predict changes in the response variable ex: Rats are first tested without any drugs then treated with 5 mg of ephedrine. Is the explanatory variable the dosage? NO, it is whether they were treated or not
Lurking variable
Variables that affect the response variable but are not measured or included in the planned factors
What does the stem plot help us see?
We can see individual values (turn your head to the right to see a histogram :) )
What happens if a value has a z-score that is not a whole number, such as 1.5?
We cannot use the 68-95-99.7% rule! So we must use: - Standard normal table (table of standard normal probabilities) - Normal distribution function on statistical software
In what circumstances would we use a bar graph?
When we are plotting categorical data (comparing majors, it's not necessarily one is better or higher than the other)
Can Z scores be negative?
YES! A negative z score means the value is that many standard deviations below the mean
An experiment was designed using school children to determine whether drinking milk prevented their catching colds. The researcher randomly assigned 100 school children to two groups—one group of 50 to receive a cup of milk at school each day and the other group of 50 to receive no milk at school. Does the study incorporate replication?
Yes, since there were 50 children in each treatment group. (Replication is not based on the total sample but on at least two subjects in each group)
Experiment
a study in which treatments are imposed upon participants (THIS IS NOT JUST AN OBSERVATION)
Categorical variable
a variable that names categories (whether with words or numerals) - Meaning they can't clearly be ordered but they are different and can be categorized (not numerical)
Question order
earlier questions can change the way respondents understand and answer later questions - Ex: happiness question precedes debt question (Do you consider yourself happy / How much debt do you have?)
Random phenomenon
individual outcome unpredictable, but outcomes from large number of repetitions follow regular pattern Ex: Tossing a coin 3 times
Double blinding
neither the subjects nor the people who evaluate them know which treatment each subject is receiving; used to prevent experimenter effect ***An experiment can still be valid without double blinding
Outliers
Note: if the distribution is skewed toward a perceived outlier and that value is close to the rest of the tail, it might not be an outlier
Types of questions
open ended and closed ended Open: too many responses! (wide variety) Closed: limited options (do you like pop or hip-hop)
Placebo effect
response by human subjects due to the psychological effect of being treated - Psychological effect is a confounding lurking variable Consequence: ineffective treatment appears effective relative to untreated subjects - Control group is essential to help us determine an accurate cause-effect relationship
Misleading response
selected individuals lie or give inaccurate answers - things about sexuality, cheating on a spouse, income, religion, political preference (Try to ensure a safe environment so they can answer honestly!)
Observational studies
studies that indicate relationships between nutrition habits, disease trends, and other health phenomena of large populations of humans - "are English majors healthier than math majors" - media often improperly attributes cause-effect conclusions to these - ask yourself was this an experiment or an observational study
The probability of an event can be defined as
the fraction of times the event will occur if the random phenomenon is repeated many times.
Law of large numbers
the larger the number of individuals that are randomly drawn from a population, the more representative the resulting group will be of the entire population (the closer it gets to the theoretical probability)
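A small simulation can make this visible. A sketch only, assuming a fair coin (theoretical probability of heads = 0.5); the function name and seed are arbitrary choices.

```python
import random

def proportion_of_heads(n_tosses, seed=0):
    """Toss a simulated fair coin n_tosses times; return the fraction of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The observed proportion tends to settle near 0.5 as the number of tosses grows.
for n in (10, 100, 10_000, 100_000):
    print(n, proportion_of_heads(n))
```

Small runs can land far from 0.5; long runs rarely do.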