Stats Exam 1

Which one of these random variables is discrete? Options: height of an adult male; GPA; number of phone calls received in a day

Number of phone calls received in a day

Choose the probability that best matches the following statement: "This event will occur slightly more often than not."

0.6

Spread?

Find the minimum and maximum (the distance between them shows how spread out the data are)

Does the Z score have units?

NO!

Sample space

Set of all possible outcomes. Ex: a coin tossed three times (HHH, HHT, HTH, HTT, TTT, TTH, THT, THH)

The distribution of a random variable shows all possible values the random variable could take and how often they occur.

TRUE

What is the measure of center in the 5 number summary?

The median

What is the notation for the mean (center)

μ (mu): density curve (population); x̄ (x-bar): histogram (sample) - the point of symmetry in a normal distribution

What is the notation for standard deviation?

σ (sigma): density curve (population); s: histogram (sample) - on a normal curve, the distance from the mean to the point where the curve begins to fall less steeply (the inflection point)

How can we remember discrete and continuous variables?

- Continuous: could have decimals, any number in a range - Discrete: takes only separate, countable values (shoe size is 8 or 9, not 8.456; number of coins in my pocket, I can't have 3.4 pennies)

3 tools to represent quantitative variables?

- Histogram - Stem plot - dot plot

Open questions

- More difficult to summarize - less restrictive

Why do we need RBD?

- This helps equalize the effects of lurking variables (we classify subjects into blocks based on lurking variables, then assign subjects to treatments separately within each block). Ex: Male rats, Explanatory variable = 5 drugs, Response = time to complete maze

Why do we plot data?

- We are visual beings - Helps us see outliers (did we mismark? A person recorded as 700 inches tall when everyone else is around 70) - ALWAYS PLOT YOUR DATA

Histogram

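- Bins need to be evenly spaced (equal width) - Each class covers a range of values and the height of its bar tells us how many observations fall in it - With equal-width bins, the area of each bar is proportional to the number of observations in that class - 7-15 classes is reasonable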

Non-compliance

- Failure to submit to the assigned treatment, or refusal to follow the protocol of the experiment. Consequence: invalid results (using volunteers might reduce this problem because we are not forcing them)

Non-response

- occurs when an individual chosen for the sample can't be contacted or refuses to participate - EX: Hangups, refusal to mail census forms, vacation

Undercoverage

- occurs when some groups in the population are left out of the process of choosing the sample - EX: Homeless people, those without a phone will not be covered

Question wording

- the way in which survey questions are phrased, which influences how respondents answer them - loaded questions

Blocking should be used whenever

- you want to remove variation associated with the blocking variable from the experimental variation - individuals are grouped before the experiment begins according to some characteristic that is expected to affect the response variable - individuals are similar within the blocks but very different from block to block

How to solve "Find area problems"

1. Area to the left: find the Z value and look it up in the Z chart; the chart value is the answer. 2. Area to the right: find the value in the Z chart and subtract it from 1. 3. Area between C and D: subtract the area to the left of C from the area to the left of D (D being the value further to the right)
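
The three cases above can be sketched in code. This is only an illustration: it uses Python's `statistics.NormalDist` in place of the printed Z chart, and the function names are made up for this example.

```python
from statistics import NormalDist

# Standard normal curve (mean 0, standard deviation 1): the curve the Z chart tabulates.
std_normal = NormalDist(mu=0, sigma=1)

def area_left(z):
    """Case 1: cumulative area to the left of z (what the chart gives directly)."""
    return std_normal.cdf(z)

def area_right(z):
    """Case 2: area to the right = 1 minus the area to the left."""
    return 1 - std_normal.cdf(z)

def area_between(c, d):
    """Case 3: area left of d minus area left of c (d is further right)."""
    return std_normal.cdf(d) - std_normal.cdf(c)

print(round(area_left(1.0), 4))           # 0.8413
print(round(area_right(1.0), 4))          # 0.1587
print(round(area_between(-1.0, 1.0), 4))  # 0.6827
```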

Principles of Valid Experiments

1. Control/Comparison (do we have two different treatments/or a control?) 2. Randomization (are we randomly assigning treatments?) 3. Replication (multiple subjects in each group?) 4. Double-Blinding (Does the subject know what the treatment is? Do the evaluators know) **this is optional

Principles of Data ethics

1. Safety of subjects 2. Informed consent, including possible side effects 3. Patient data must be kept confidential

"Given x-value find area to the right"

1. Standardize x to get z 2. Look up the z score in the table of values (this gives the cumulative area to the left) 3. Subtract that area from 1 to get the desired area to the right

Blocking

Division of individuals into homogeneous groups (males and females each tested) - Acts as a control for variables

Was the salk vaccine ethical?

Second graders who were given the vaccine: it was unknown whether it was effective. Second graders who were not given the vaccine: this was acceptable because researchers did not yet know whether it would help them. - Conclusion: it was ethical because once the vaccine was shown to be effective beyond doubt, the study was stopped and the unvaccinated with polio were then given the vaccine

Probability of an outcome

A measure of the proportion of times an outcome occurs in a very long series of repetitions that gives us an indication of the likelihood of the outcome.

Which of the following can be considered a population? Select all that apply. Options: all students at BYU; all pigeons in Utah; some students from BYU's STAT121 class; none of these

All students at BYU and all pigeons in Utah ("some students" is not an entire population; the population would be the entire class or all students)

Randomized block design (RBD)

An experimental design where treatments are randomly allocated within each block. - Each block is divided into groups of similar characteristic (tend to be equal in number and treatment) - These help control the effects of variables that define the blocks

Replication

Assign more than one subject to each treatment group

When data are arranged in ascending numerical order, the mean is the value which __________________

Balances the data

response variable (dependent variable)

Characteristic measured on each subject

Closed questions

Closed questions can be biased by the options provided - make sure to include "other" or "unsure"

Suppose a researcher is interested in the average ACT score for high school students in Illinois. She randomly selects 150 high schools and then asks each student in the selected high schools what their ACT score was. What kind of sample is this?

Cluster

What is the difference between cluster sampling and stratified sampling?

Clustering: take a random sample of clusters, then sample ALL INDIVIDUALS in the chosen clusters. Stratified: classify the population into groups (gender/race), then pull a simple random sample from EVERY GROUP
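
A rough sketch of the difference, using a made-up population of 6 schools with 5 students each. The names, sizes, and seed are assumptions for illustration only.

```python
import random

# Hypothetical population: each school is one group of student IDs.
population = {f"school_{i}": [f"s{i}_{j}" for j in range(5)] for i in range(6)}

def cluster_sample(groups, n_clusters, rng):
    """Randomly pick whole groups, then keep EVERY individual in them."""
    chosen = rng.sample(sorted(groups), n_clusters)
    return [member for g in chosen for member in groups[g]]

def stratified_sample(groups, n_per_group, rng):
    """Take a simple random sample from EVERY group."""
    return [member
            for g in sorted(groups)
            for member in rng.sample(groups[g], n_per_group)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
clustered = cluster_sample(population, 2, rng)      # 2 schools, all 5 students each
stratified = stratified_sample(population, 2, rng)  # 2 students from each of 6 schools
print(len(clustered), len(stratified))
```

In the cluster sample only two schools appear; in the stratified sample every school appears.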

Event

Collection of possible outcomes (subset of the sample space). Ex: getting exactly two heads when a coin is tossed three times (HHT, HTH, THH)
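
A small sketch of this sample space and event. The probability at the end assumes a fair coin, so that all eight outcomes are equally likely; the card itself does not state that assumption.

```python
from itertools import product
from fractions import Fraction

# Sample space: every sequence of H/T over three tosses.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=3)]

# Event: the subset of outcomes with exactly two heads.
event = [outcome for outcome in sample_space if outcome.count("H") == 2]

# Assumption: fair coin, so each outcome is equally likely.
probability = Fraction(len(event), len(sample_space))

print(sample_space)  # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']
print(event)         # ['HHT', 'HTH', 'THH']
print(probability)   # 3/8
```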

Control/comparison

Control lurking variables by including comparison treatments, using homogeneous subjects; used to measure placebo effect

Terms continued

Control: effort to reduce the effects of lurking variables. Confounding: effects of lurking variables cannot be distinguished from effects of the factors

Where do we cut a distribution?

Cut it so the area on each side is equal (splitting the cake)

Diagnostic bias

Diagnosis of subjects is biased by preconceived notions about the effectiveness of the treatment - The preconception is a confounded lurking variable - Bias on the side of the evaluator (if I believe vitamin C is really good, I might discount some minor effects of a sickness to get the outcome I want, so we need double blinding to avoid this)

Probability samples are samples selected in such a way that

Each member of the population has a chance of being selected and that chance can be computed.

How do we calculate z-score?

z = (x - μ) / σ. Ex: μ = 3485 g and σ = 425 g; a baby who weighs 4122.5 g is 1.5 standard deviations above the mean, since (4122.5 - 3485) / 425 = 1.5. 1. The given value from the sample is x 2. The given μ is the mean 3. The standard deviation σ goes on the bottom, so z tells us how many standard deviations we are away from the mean
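
A quick sketch of the calculation with the birth-weight numbers, taking x = 4122.5 g as the weight that sits 1.5 standard deviations above the mean. The function name is just for this example.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

# mean 3485 g, standard deviation 425 g
print(z_score(4122.5, mu=3485, sigma=425))  # 1.5
print(z_score(3060, mu=3485, sigma=425))    # -1.0 (one standard deviation below)
```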

Treatment

Experimental condition applied to a subject = a specific value of the factor

Experiment terms

Factor: the variable controlled by the experimenters, used to determine its effect on the response variable

What is the standard deviation rule? (68-95-99.7% rule)

For normally distributed data: - 68% of observations fall within 1 standard deviation of the mean - 95% of observations fall within 2 standard deviations of the mean - 99.7% of observations fall within 3 standard deviations of the mean
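
The rule's percentages are rounded versions of areas under the normal curve. A short check, using Python's `statistics.NormalDist` as a stand-in for the table:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(k, round(area, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```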

Randomization

Neutralize effects of lurking variables by assigning subjects to treatments randomly (ex: subjects do not choose what diet they will have, etc.) ***Random assignment is the key, not just random selection of subjects; we often use volunteers for experiments

Why wouldn't we want a ton of classes for our histograms?

If the classes are too specific (too narrow), they add a lot of noise and make the histogram choppy - 10 class picture for example would te

How do we compare mean and median?

If there is no skew (symmetric distribution) they are about the same; if there is a skew, the mean is pulled toward the side of the skew (the long tail) because the mean is more affected by outliers than the median
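
A tiny illustration with made-up numbers: one large value on the right drags the mean but leaves the median alone.

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5]      # no skew
right_skewed = [1, 2, 3, 4, 50]  # same data with one large value in the right tail

print(mean(symmetric), median(symmetric))        # 3 3
print(mean(right_skewed), median(right_skewed))  # 12 3
```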

What advantage do histograms have over stem plots?

If we have thousands of inputs a stem plot would be hideous

Subject

Individual to which treatment is applied

Which of the following is NOT a measure of center of the data?

Interquartile range (MEAN IS A MEASURE OF CENTER)

Interviewer

Interviewer influences responses - Rude, intimidating, clues, hints - Ex: Interviewer asks Black respondents, "Can you trust white people?": 35% said yes to a white reporter, 7% said yes to a Black reporter

Difference between matched pairs and block experiment?

Matched pairs is a block design where each block has only two units (2 rats). EX: An experiment was conducted with 6 pairs of rats. Each rat in a pair came from the same litter. One rat from each pair was randomly chosen and assigned to live alone in a cage with no toys. The other rat in each pair was assigned to live with 11 other rats in a cage supplied with toys. After a month, the rats were sacrificed and their brain cortexes were weighed. The researchers were trying to show that a favorable psychological environment stimulates the growth of cortex material (grey matter) in the brain. What type of study is this? - This is matched pairs

Where is the mean median mode on a skew?

Median will be between the mean and mode! (mean moves to the side with the skew, median middle, and mode the most frequent/highest bar)

Can standard deviation be negative?

NO! (it is always 0 or greater)

Do experiments have to have a placebo to be valid?

NO! (they just need a control/comparison)

What do we call facts about a population?

Parameter

Hawthorne effect

Phenomenon where people in an experiment behave differently because they know they are being watched (attention/observation bias). Ex: Nielsen TV ratings and TV watching behavior (if I know my TV is monitored, I'm probably not going to want them knowing I watch The Bachelor). Ex: Diet study: the act of writing down your diet may change how people snack and eat - This causes inaccurate reporting because people's behavior changes! (like lack of realism, but caused more by people's choice to act differently than by the situational controls being different)

What is the factor?

Planned explanatory variable - Polio example: children were inoculated with a placebo and with the real vaccine. The type of inoculation is the factor

How do we calculate outliers?

A value is an outlier if it is below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR (where IQR = Q3 - Q1)
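
A minimal sketch of the 1.5 x IQR rule on made-up data. One caution: software and textbooks compute quartiles in slightly different ways; this example assumes the "inclusive" method of Python's `statistics.quantiles`, which may not match the method used in class.

```python
from statistics import quantiles

def outlier_fences(data):
    """Return (lower fence, upper fence) from the 1.5 x IQR rule."""
    # Assumption: quartiles via the 'inclusive' interpolation method.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 5, 6, 7, 8, 30]  # hypothetical sample
low, high = outlier_fences(data)
outliers = [x for x in data if x < low or x > high]

print(low, high)  # 0.0 12.0
print(outliers)   # [30]
```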

Which of the following intervals corresponds to the largest area under a Normal curve? Options: Q1 to Q3; μ to (μ + 3σ); Q1 to (μ + 2σ); (μ - σ) to Q3

Q1 to (μ + 2σ)
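
One way to check this answer is to compute all four areas on the standard normal curve (μ = 0, σ = 1), where Q1 and Q3 are the 25th and 75th percentiles. The dictionary labels are just for this sketch.

```python
from statistics import NormalDist

std_normal = NormalDist()      # mu = 0, sigma = 1
q1 = std_normal.inv_cdf(0.25)  # first quartile, about -0.674
q3 = std_normal.inv_cdf(0.75)  # third quartile, about +0.674

areas = {
    "Q1 to Q3": std_normal.cdf(q3) - std_normal.cdf(q1),
    "mu to mu+3sigma": std_normal.cdf(3) - std_normal.cdf(0),
    "Q1 to mu+2sigma": std_normal.cdf(2) - std_normal.cdf(q1),
    "mu-sigma to Q3": std_normal.cdf(q3) - std_normal.cdf(-1),
}

for label, area in areas.items():
    print(label, round(area, 4))

largest = max(areas, key=areas.get)
print(largest)  # Q1 to mu+2sigma
```

The areas come out near 0.50, 0.50, 0.73, and 0.59, so Q1 to (μ + 2σ) is the largest.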

Which of the following intervals corresponds to the LARGEST area on a Normal curve? Q1 to µ + 3σ or Q1 to Q3?

Q1 to µ + 3σ

Researchers want to compare the effectiveness of exercise and dieting compared to dieting alone for weight loss. They have 60 volunteers, 30 men and 30 women. They randomly assign half of the men to Group 1, exercise and diet, and the other half to Group 2, diet alone. They follow the same procedure for the women. Half of the women are assigned to Group 1 and the other half are assigned to Group 2. After 16 weeks, their weight loss was measured and compared. What type of study is this?

Randomized Block Experiment
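
A sketch of how the assignment in this study could be done: shuffle within each block (men, women) separately, then split each block evenly between the two treatments. The subject labels, function name, and seed are made up for illustration.

```python
import random

def randomized_block_assignment(blocks, treatments, rng):
    """Randomly assign treatments separately within each block."""
    assignment = {}
    for block_name, subjects in blocks.items():
        shuffled = list(subjects)
        rng.shuffle(shuffled)  # the randomization happens inside the block
        for i, subject in enumerate(shuffled):
            # Deal shuffled subjects out to treatments in turn (even split).
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

blocks = {
    "men": [f"M{i}" for i in range(30)],
    "women": [f"W{i}" for i in range(30)],
}
treatments = ["exercise + diet", "diet alone"]
assignment = randomized_block_assignment(blocks, treatments, random.Random(1))

men_in_group1 = sum(1 for s in blocks["men"] if assignment[s] == "exercise + diet")
women_in_group1 = sum(1 for s in blocks["women"] if assignment[s] == "exercise + diet")
print(men_in_group1, women_in_group1)  # 15 15
```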

Lack of realism

Realism is often compromised by controlled study conditions, choice of homogeneous subjects, application of treatments (scenario we create is causing this) - Ex: Patients in a clinical trial are watched and given pills under the guidance of a physician while most people will be home and not very active in treating themselves.

What is the difference between response and explanatory variable?

Response variable is what we want to know, explanatory variable is the cause in the cause effect relationship (can we determine age from height, age is the response, height is the explanatory)

What is multistage sampling?

Samples are taken from each level of a hierarchical structure. Ex: Educators in California are concerned about a recent newspaper article reporting that students in the United States are falling behind students in other nations in their math skills. They decide to sample 10th grade students throughout the state and test their mathematics skills. They first randomly select 10 school districts. From each of these 10 school districts they randomly select three high schools. From each of these 3 high schools they randomly select 10 students and test them. What type of sample is this? - This is multistage sampling

Right skew vs. Left skew

Skew = the side where the tail is stretched (right skew: long tail on the right; left skew: long tail on the left)

The nonprofit group Public Agenda conducted telephone interviews with parents of high school children. Interviewers chose equal numbers of black, white, and Hispanic parents by randomly selecting from within each race using student records. One question asked was "Are the high schools in your state doing an excellent, good, fair, or poor job, or do you not know enough to say?" What type of sample is this?

Stratified

A popular magazine is interested in the average amount of time that their readers spend on the internet each day. They randomly survey 100 of their female readers and 100 of their male readers and ask them about their average internet use. What type of sample is this?

Stratified sample

Randomized Controlled Experiment (RCE)

Subjects assigned to treatments such that each subject has an equal chance of being assigned to any possible treatment (typically with the same number of subjects per treatment) - Ex: put names in a hat, shuffle, draw out desired number of names for treatment A - Ex: scientists can label rats with a number and randomize this way *A randomized controlled double-blind experiment is optimal for establishing causation

Bimodal distribution?

Two groups Ex: men and women's heights // Ages of students

What does the table give us?

The cumulative value TO THE LEFT! (less than probabilities)

Matched pairs design

The design of a study where experimental units are naturally paired by a common characteristic, or with themselves in a before-after type of study. - Twins EX: a car is tested with a clean filter and with a dirty filter - ONLY COMPARES TWO TREATMENTS

What is the z-score?

The number of standard deviations above or below the mean of a normal distribution.

Can animals be part of a double-blind experiment?

The rats don't know! double blinding refers to

When do we know to use RCE vs. RBD?

Use RCE when: individuals are all similar. Use RBD when: individuals are similar within a block but very different from block to block. Ex: Is this drug as effective for older people as for younger people? Is this drug as effective for men as it is for women? (ONLY THE RBD CAN ANSWER THESE QUESTIONS)

Explanatory variable (independent variable)

Used to explain or predict changes in the response variable. Ex: Rats are first tested without any drugs, then treated with 5 mg of ephedrine. Is the explanatory variable the dosage? NO, it is whether they were treated or not

Lurking variable

Variables that affect the response variable but are not measured or included in the planned factors

What does the stem plot help us see?

We can see individual values (turn your head to the right to see a histogram :) )

What happens if a value has a z-score that is not exactly 1, 2, or 3 (for example, 1.5)?

We cannot use the 68-95-99.7% rule! So we must use: - Standard normal table (table of standard normal probabilities) - Normal distribution function on statistical software

In what circumstances would we use a bar graph?

When we are plotting categorical data (comparing majors, it's not necessarily one is better or higher than the other)

Can Z scores be negative?

YES! A negative z score means the value is that many standard deviations below the mean

An experiment was designed using school children to determine whether drinking milk prevented their catching colds. The researcher randomly assigned 100 school children to two groups—one group of 50 to receive a cup of milk at school each day and the other group of 50 to receive no milk at school. Does the study incorporate replication?

Yes, since there were 50 children in each treatment group. (Replication is not based on the total sample but on at least two subjects in each group)

Experiment

a study in which treatments are imposed upon participants (THIS IS NOT JUST AN OBSERVATION)

Categorical variable

a variable that names categories (whether with words or numerals) - Meaning they can't clearly be ordered but they are different and can be categorized (not numerical)

Question order

earlier questions can change the way respondents understand and answer later questions - Ex: happiness question precedes debt question (Do you consider yourself happy / How much debt do you have?)

Random phenomenon

individual outcome unpredictable, but outcomes from large number of repetitions follow regular pattern Ex: Tossing a coin 3 times

Double blinding

neither the subjects nor the people who evaluate them know which treatment each subject is receiving; used to prevent experimenter effect ***An experiment can still be valid without double blinding

Outliers

Note: if the distribution is skewed toward a perceived outlier and that point is close to the rest of the tail, it might not be an outlier

Types of questions

Open-ended and closed-ended. Open: too many responses! (wide variety). Closed: limits the answers (do you like pop or hip-hop?)

Placebo effect

Response by human subjects due to the psychological effect of being treated - The psychological effect is a confounded lurking variable. Consequence: an ineffective treatment appears effective relative to untreated subjects - A control group is essential to help us determine an accurate cause-effect relationship

Misleading response

selected individuals lie or give inaccurate answer - things about sexuality, cheating on a spouse, income, religion, political preference (Try to ensure a safe environment so they can answer honestly!)

Observational studies

studies that indicate relationships between nutrition habits, disease trends, and other health phenomena of large populations of humans - "are English majors healthier than math majors" - media often improperly attributes cause-effect conclusions to these - ask yourself was this an experiment or an observational study

The probability of an event can be defined as

the fraction of times the event will occur if the random phenomenon is repeated many times.

Law of large numbers

the larger the number of individuals that are randomly drawn from a population, the more representative the resulting group will be of the entire population (the closer it gets to the theoretical probability)
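
A small simulation of the idea, assuming a fair coin (theoretical probability of heads = 0.5). The seed and sample sizes are arbitrary choices for illustration; the proportions you see depend on the seed, but they tend to settle near 0.5 as the number of tosses grows.

```python
import random

def proportion_heads(n_tosses, rng):
    """Fraction of heads in n_tosses simulated tosses of a fair coin."""
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (10, 100, 10_000, 100_000):
    print(n, proportion_heads(n, rng))
```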

