Stats Exam 1

(Split) Stem Plot

Stems split in two:
0 | 3
0 | 8
1 | 24
1 | 779
2 | 01123
2 | 567899
3 | 3
3 | 59
Stems split in five:
2 | 0
2 | 223
2 | 455
2 | 66677
2 | 8888999
3 | 00011
3 | 222333
3 | 445
3 | 67
3 | 9
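
A minimal sketch of how a split stem plot could be built from raw data. The function name `stem_plot` and the data values are assumptions for illustration: the values are one possible reading of the first plot on this card, taking the stem as the tens digit and the leaf as the ones digit.

```python
from collections import defaultdict

def stem_plot(values, split=False):
    """Return stem plot rows for non-negative integers (stem = tens, leaf = ones).

    With split=True each stem appears twice: leaves 0-4 first, then leaves 5-9.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = leaf // 5 if split else 0   # 0 = lower half, 1 = upper half
        rows[(stem, half)].append(str(leaf))

    lines = []
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        for half in ((0, 1) if split else (0,)):
            leaves = "".join(rows.get((stem, half), []))
            lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

# Hypothetical data, read off the first plot under the assumption above
data = [3, 8, 12, 14, 17, 17, 19, 20, 21, 21, 22, 23,
        25, 26, 27, 28, 29, 29, 33, 35, 39]

for line in stem_plot(data, split=True):
    print(line)
```

Empty stems are still printed, so gaps in the distribution stay visible.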

Principles of Valid Experiments

1. Control/Comparison 2. Randomization 3. Replication 4. Double-Blinding

Study Designs

1. Observational Study 2. Sample Survey 3. Experiment

Computing the Median (n Even)

1. Order the data 2. If the number of observations is even, median equals the mean of two middle observations

Computing the Median (n Odd)

1. Order the data 2. If the number of observations is odd, median equals the middle observation
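
The two median rules (n even and n odd) can be sketched as one small function. The function name and the sample values are illustrative assumptions, not part of the course material.

```python
def median(values):
    """Median: the middle observation (n odd) or the mean of the two middle ones (n even)."""
    ordered = sorted(values)              # step 1: order the data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                        # n odd: single middle observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: mean of two middle observations

print(median([22, 5, 12, 13, 59]))   # n = 5, ordered: 5 12 13 22 59 -> 13
print(median([22, 5, 12, 13]))       # n = 4, ordered: 5 12 13 22    -> 12.5
```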

Three Principles of Experiments (Matched Pairs)

1. Randomization: randomly assign the two treatments to the two individuals within each pair (block), OR randomize the order of applying the treatments to each individual
2. Replication: the number of replicates equals the number of pairs
3. Comparison: compare the two treatments; each pair serves as its own control
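
The randomization step within each pair might look like the sketch below. The function name, the treatment labels "A" and "B", and the twin names are hypothetical.

```python
import random

def assign_within_pairs(pairs, treatments=("A", "B"), seed=None):
    """For each pair, randomly decide which member receives which treatment.

    For a design with two treatments on the same individual, the same shuffle
    can instead be read as the order in which that individual gets them.
    """
    rng = random.Random(seed)             # seed only makes the example reproducible
    assignment = []
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)                # coin flip within the pair
        assignment.append({first: order[0], second: order[1]})
    return assignment

# Hypothetical pairs of twins
pairs = [("Ann", "Amy"), ("Bob", "Ben"), ("Cal", "Cam")]
for pair in assign_within_pairs(pairs, seed=1):
    print(pair)
```

Every pair receives both treatments exactly once, so the number of replicates equals the number of pairs.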

Skills of Metacognition

1. assess task of learning statistics 2. evaluate your strengths and weaknesses 3. plan an approach to your learning 4. monitor your performance 5. reflect and adjust plan

Reasons to avoid bad sampling

1. bias
   • sample favors certain outcomes
   • not representative
2. impossible to assess uncertainty
   • more on this later

Correct Sampling Requirements

1. explicitly describe population 2. explicitly describe variable 3. select representative sample (but how?)

For Experiments to Work

1. explicitly describe response variable
2. if possible, choose homogeneous subjects
3. choose treatments to control effects of lurking variables (but how?)
4. assign subjects to treatments such that groups are nearly identical other than treatments, so there is no confounding (but how?)

Mean vs. Median General advice:

1. first construct histogram or stem plot, evaluate skewness and outliers 2. use median if markedly skewed or outliers are present 3. use mean if roughly symmetric
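
A small sketch of why the advice favors the median when outliers are present. The heights are made-up values; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

symmetric = [68, 69, 70, 71, 72]      # made-up, roughly symmetric data
with_outlier = symmetric + [95]       # same data plus one extreme value

print(mean(symmetric), median(symmetric))        # 70 70
print(mean(with_outlier), median(with_outlier))  # mean jumps above 74, median is 70.5
```

One extreme value moves the mean by several units but barely shifts the median.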

Reasons for Possible Outliers

1. if distribution is long-tailed and value is legitimate: keep outlier
2. if values produced under different conditions than rest of data set: remove outlier
3. if value is a mistake or typo: correct if possible; otherwise remove outlier

What is Statistics?

1. science of extracting meaning from data 2. art of persuading the universe to divulge information about itself 3. methodology for using data to answer questions in the presence of variation

Cluster Sampling

1. used when population is naturally divided into groups called clusters (e.g., households are divided into city blocks)
2. each cluster is essentially representative of the population as a whole
3. a random sample of clusters is taken
4. all individuals in the selected clusters are included in the sample
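
Steps 3 and 4 can be sketched as follows. The function name and the city-block data are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Take an SRS of clusters, then include every individual in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # random sample of cluster names
    return [individual for name in chosen for individual in clusters[name]]

# Hypothetical city blocks, each holding three households
clusters = {
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2", "B3"],
    "C": ["C1", "C2", "C3"],
    "D": ["D1", "D2", "D3"],
}

print(cluster_sample(clusters, 2, seed=0))
```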

Stem Plot

 7 | 5
 8 | 05555
 9 | 0000000000055555555555555
10 | 000000000000000000555555555555555
11 | 000000000000000000000055555
12 | 0000000
13 | 000
14 | 0
15 | 0

IQR Calculation

IQR = 3rd quartile − 1st quartile = range occupied by middle 50% of data (more on this later)
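
A sketch of the IQR calculation. Textbooks and software use several slightly different quartile conventions; this one assumes Q1 and Q3 are the medians of the lower and upper halves of the ordered data, leaving out the overall median when n is odd. The function names and the data are made up.

```python
def _median(ordered):
    """Median of an already ordered, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]   # skip the median when n is odd
    return _median(lower), _median(ordered), _median(upper)

def iqr(values):
    q1, _, q3 = quartiles(values)
    return q3 - q1

data = [1, 3, 4, 7, 8, 10, 12, 15, 18]   # made-up data, n = 9
print(quartiles(data))   # (3.5, 8, 13.5)
print(iqr(data))         # 10.0
```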

Self-check In a famous study, 5200 patients were categorized into 2 groups according to their soda habit. After 4-years of follow-up, the rate of heart disease was higher in the "regularly drank" group than the "sparingly drank" group. What kind of study is this? (a) historical comparison experiment (b) unreplicated experiment (c) confounded experiment (d) observational study

D (patients were grouped by their existing soda habit; no treatment was assigned)

Self-check Five men in a room have a mean height of 70 inches. A tall man, 80 inches, enters the room. Now the mean height is: (a) impossible to say (b) 70.4 inches (c) 71.7 inches (d) 75.1 inches

C (new mean = (5 × 70 + 80) / 6 = 430 / 6 ≈ 71.7 inches)

Self-check 22 5 12 13 59 What is M for the data set above? (a) 12 (b) 12.5 (c) 13 (d) 27

C (ordered data: 5 12 13 22 59; the middle observation is 13)

Self-check 240 subjects are available for an experiment testing the effects of different diets. Software randomly assigns 60 subjects to Diet 1, 60 subjects to Diet 2, 60 subjects to Diet 3, and 60 subjects to Diet 4. What type of study is this? (a) a randomized controlled experiment (b) a randomized block design, with four blocks (c) a matched pairs design (d) an observational study (e) none of the above

A

Self-check In a famous randomized vitamin C study, most patients could tell from taste whether they were receiving vitamin C pills or placebo pills. The rate of cold/flu was lower in the vitamin C group. What do you conclude? (a) vitamin C reduces the cold/flu rate (b) nothing - the difference could be due to vitamin C or a placebo effect

B (subjects could tell which pill they received, so the study was not effectively blinded)

Self-check Which is better? (a) random sample of size 400 (b) a nonrandom sample of size 5000

A

Sample

A subgroup of the population which we can examine or observe and collect data from.

Outliers

Ask: • Is data point miscoded? • Were conditions for outlier unusual? • Should data point be excluded?

Self-check A given data set has Q1 = 25 Median = 37 Q3 = 45 Use the IQR rule to determine if the following statement is true or false: "73 is an outlier in this data set." (a) True (b) False

B

Self-check Gallup has never been wrong in predicting the winner of a presidential election. (a) True (b) False

B

Self-check Gallup's predictions have always been more accurate when the sample has been larger. (a) True (b) False

B

Self-check Subjects in an experiment knew that they were being observed, so they behaved better than they usually did. This is an example of: (a) diagnostic bias (b) Hawthorne effect (c) placebo effect (d) Simpson's paradox

B

Self-check To participate in the Time.com poll, go to the Time website and click. What kind of sample is Time.com poll? (a) convenience (b) volunteer response (c) quota

B

Self-check What are the treatments in the Salk vaccine experiment? (a) syringe, school nurse (b) polio, vaccine (c) polio status (d) vaccine, placebo

D

Self-check What is an advantage of histograms over stem plots? (a) they can be created by hand (b) the data set can be any size (c) the actual data can be extracted from them (d) they can be horizontal or vertical

B

Self-check What treatments are being compared in the Visual Cliff experiment? (a) Whether infants crawled to their mothers when called (b) Whether mothers stood on the checkered side of the table (c) Whether the table had a checkered pattern (d) Whether the observer was in the room

B

Self-check Who are the subjects in the Salk vaccine experiment? (a) 400,000 children who participated in the study (b) 200,000 children who received the vaccine (c) second-grade American children (d) all American children

A (subjects are all individuals who received a treatment, including the placebo)

Self-check What is the dogma of statistics? (a) numbers are fun (b) variation has to be dealt with (c) uncertainty must be avoided (d) statistics don't lie

B

Self-check A stemplot of the 29 measurements made by Henry Cavendish in 1798 when he measured the density of the earth (in g/cm³) is shown here. What is the median value of his measurements? (Leaf unit = 0.01) (a) 5.42 (b) 5.44 (c) 5.46 (d) 5.47
Variable: Cavendish
48 : 8
49 :
50 : 7
51 : 0
52 : 6799
53 : 04469
54 : 2467
55 : 03578
56 : 12358
57 : 59
58 : 5

C
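
As a check, the stemplot can be expanded back into the 29 values and the median read off as the 15th ordered observation. This is only a sketch; the dictionary layout is an assumption, while the stems and leaves are copied from the question.

```python
from statistics import median

# stem -> leaves, copied from the Cavendish stemplot (leaf unit = 0.01)
stemplot = {
    48: "8", 49: "", 50: "7", 51: "0", 52: "6799", 53: "04469",
    54: "2467", 55: "03578", 56: "12358", 57: "59", 58: "5",
}

# stem 54 with leaf 6 stands for 5.46
values = [(stem * 10 + int(leaf)) / 100
          for stem, leaves in stemplot.items()
          for leaf in leaves]

print(len(values))      # 29
print(median(values))   # 5.46
```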

Self-check An automobile salesman tells you that he gets a bonus if you report on a post-sale survey that he was effective and courteous. What kind of bias might be present in this survey? (a) nonresponse (b) undercoverage (c) misleading response (d) no bias

C

Self-check In a study of religious practices among U.S. college students, 127 students were interviewed. Of those interviewed, 107 said that they pray at least once in a while. What is the population? (a) All Americans (b) All U.S. students (c) All U.S. college students (d) The 127 students who were interviewed (e) The 107 students who said they prayed at least once in a while

C

Self-check The IRS obtains a sample of Utah tax returns by taking a random sample of Utah counties and then taking a random sample of returns filed in those counties. What kind of sample is this? (a) simple random sample (b) stratified random sample (c) multistage sample (d) cluster sample

C

Self-check What is the population in the potato example? (a) all potatoes in the world (b) all potatoes in the U.S. (c) all potatoes in the truckload (d) all potatoes in the buckets

C

Self-check What is the response variable in the Salk vaccine experiment? (a) type of inoculation (b) polio, vaccine (c) polio status (d) vaccine, placebo

C

Producing Data

Choosing a sample, and collecting data from it.

Incorrect Sampling Methods

Convenience Sampling Volunteer Response Sampling Quota Sampling

Self-check In a study of religious practices among U.S. college students, 127 students were interviewed. Of those interviewed, 107 said that they pray at least once in a while. What is the sample in this study? (a) All Americans (b) All U.S. students (c) All U.S. college students (d) The 127 students who were interviewed (e) The 107 students who said they prayed at least once in a while

D

Self-check What does the distribution of textbook costs tell you? (a) who the students were (b) how the students were selected (c) that the costs were measured in dollars (d) how frequently the various costs occurred in the data set

D

Self-check What is the factor in the Salk vaccine experiment? (a) type of inoculation (b) vaccine (c) placebo (d) polio status

A (the factor is the planned explanatory variable; polio status is the response)

Self-check What type of discipline is statistics? (a) an art (b) a science (c) a methodology (d) all of the above (e) none of the above

D

Self-check 200 students were assigned to teaching method A or B. The method A group was taught by Dr. R. while the method B group was taught by Dr. T. What kind of study is this? (a) historical comparison experiment (b) unreplicated experiment (c) confounded experiment (d) observational study

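C (teaching method is confounded with instructor: any difference could be due to the method or to the teacher)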

First Quartile of Distribution

Value (Q1) with about 25% of the data at or below it

Second Quartile of Distribution

Value (the median) with about 50% of the data at or below it

Third Quartile of Distribution

Value (Q3) with about 75% of the data at or below it

Data Set

Data identified with contextual information. In a table: rows = individuals, columns = variables

Diagnostic bias

Diagnosis of subjects biased by preconceived notions about effectiveness of treatment

Self-check Do cars get better gas mileage with clean air filters? Gas mileage for 10 cars with dirty air filters and clean air filters was studied. Each car was tested once with a clean air filter and once with a dirty air filter (with the order of the testing randomized). What type of study is this? (a) an observational study based on a simple random sample (b) an observational study based on a stratified random sample (c) an observational study based on a multistage random sample (d) a randomized controlled experiment (e) a matched pairs experiment

E

Quota Sampling

Force the sample to meet specified quotas

Types of Bad Experiments

Historical Comparison Experiments Unreplicated Experiments Confounded Experiments

Volunteer Response Sampling

Individuals select themselves

IQR

Interquartile Range

Interviewer

Interviewer influences responses Examples: • rude • intimidating to some people • subtle clues or gestures

Metacognition

Literally "thinking about thinking"

Center

Look for a value with • roughly half of the data to the left and • half to the right

Calculate Range

Max - Min

Replication

Multiple subjects for a given treatment

Types of Questions in Sample Surveys

Open Questions Closed Questions

Question Order

Order of questions promotes certain responses

Experimental Design Principles

Principle #1: Comparison Principle #2: Randomization Principle #3: Replication Principle #4: Double-blinding

Correct Sampling Methods

Probability Sampling

Non-Sampling Bias

Probability samples may still have bias due to: • undercoverage • non-response • misleading response • interviewer effect • question order • question wording

Stratified Random Sample

Quota sampling done right!
1. classify population into groups (strata) that are different from each other (e.g., classify according to age or gender)
2. individuals within a group (stratum) share a similar characteristic (e.g., all are males or all are children)
3. select SRS from every group
4. combine SRS's
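
The four steps can be sketched as below. The function name, the `stratum_of` callback, and the child/adult population are hypothetical; taking the same number from each stratum is also just one possible choice.

```python
import random
from collections import defaultdict

def stratified_sample(individuals, stratum_of, n_per_stratum, seed=None):
    """Classify individuals into strata, take an SRS from every stratum, combine."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in individuals:                 # steps 1-2: group by shared characteristic
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():            # step 3: SRS from every stratum
        # rng.sample raises ValueError if a stratum has fewer than n_per_stratum members
        sample.extend(rng.sample(members, n_per_stratum))
    return sample                              # step 4: combined SRS's

# Hypothetical population: 30 children and 70 adults
population = [("child", i) for i in range(30)] + [("adult", i) for i in range(70)]
sample = stratified_sample(population, lambda p: p[0], 5, seed=0)
print(sample)
```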

Placebo Effect

Response by human subjects due to the psychological effect of being treated

Simple Random Sample (SRS)

Sample of specified size chosen such that every possible set of that size has equal chance of being the sample
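
A minimal sketch of drawing an SRS with Python's standard library. `random.sample` selects without replacement, so every set of the requested size is equally likely. The wrapper name and the numbered population are assumptions.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals so that every set of size n is equally likely."""
    rng = random.Random(seed)        # seed only makes the example reproducible
    return rng.sample(list(population), n)

# Hypothetical population of 100 numbered individuals
srs = simple_random_sample(range(100), 10, seed=1)
print(srs)
```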

Convenience Sampling

Select individuals in easiest possible way

Misleading Response

Selected individuals lie or give inaccurate answer (sensitive issues)

Non-Response

Selected individuals refuse to answer or can't be contacted

Undercoverage

Some individuals have no possibility of being selected

Population

The entire group of individuals that is the target of our interest

Spread

The full 'Range' Look for • minimum and • maximum

Uncertainty

The unknown regarding relationships of linked subjects

Flat or Uniform

Where there is no 'hill'

Question Wording

Wording of question leads, misleads, or confuses

Experiment Definition

a study design where treatments are imposed on individuals before observing response

Define: Sample Survey (Poll)

a type of observational study in which individuals report variables' values themselves, frequently by giving their opinions

Terminology Control

an effort to reduce effects of lurking variables

Individual

an entity that is observed e.g., student, person, rat, classroom, plot of ground

Replication

assign more than one subject to each treatment group

Response variable

characteristic measured on each subject

Variable

characteristic that is measured

Variable

characteristic that is measured on each individual e.g., cost, height, yield, opinion

Process of Statistics

collect data summarize data interpret data

Why Sample?

compared to census: • practical • cheap • often more accurate!

Control/Comparison

control lurking variables by including comparison treatments, using homogeneous subjects; used to measure placebo effect

Statistic

corresponding numerical fact in the sample

Experiment Purpose

determine if treatments cause change in response

Population

entire group of individuals of interest

Treatment

experimental condition applied to subject = value of factor

Subject

individual to which treatment applied

Sample

individuals that are selected from the population and measured

Data

measurements for a set of individuals e.g., textbook costs for the sample of students

Flagging possible Outliers

Lower fence = Q1 − 1.5(IQR); upper fence = Q3 + 1.5(IQR). Values below the lower fence or above the upper fence are flagged as possible outliers.
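
The 1.5 × IQR rule can be sketched in a few lines. The function names are assumptions; the quartiles Q1 = 25 and Q3 = 45 and the value 73 come from the IQR self-check in this set.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_possible_outlier(x, q1, q3):
    low, high = outlier_fences(q1, q3)
    return x < low or x > high

print(outlier_fences(25, 45))            # (-5.0, 75.0)
print(is_possible_outlier(73, 25, 45))   # False: 73 is inside the fences
print(is_possible_outlier(80, 25, 45))   # True
```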

Double Blinding

neither the subjects nor the people who evaluate them know which treatment each subject is receiving; used to prevent experimenter effect

Randomization

neutralize effects of lurking variables by assigning subjects to treatments randomly
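
One way random assignment could be carried out in software: shuffle the subjects, then deal them out to the treatments in turn. The function name and subject numbering are hypothetical; the group sizes mirror the four-diet self-check (240 subjects, 60 per diet).

```python
import random

def randomize(subjects, treatments, seed=None):
    """Randomly assign subjects to treatments in (nearly) equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)                       # chance alone decides the order
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):      # deal subjects out like cards
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

diets = ["Diet 1", "Diet 2", "Diet 3", "Diet 4"]
groups = randomize(range(240), diets, seed=42)
print({diet: len(members) for diet, members in groups.items()})
```

Because chance decides the assignment, lurking variables tend to balance out across the groups.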

Parameter

numerical fact about the variable in the population

Hawthorne effect

phenomenon where people in an experiment behave differently from how they would normally behave; attention/observation bias

Factor

planned explanatory variable

Bias Due to Question Wording Occurs when...

questions have leading phrases, loaded words, or ambiguities that influence the response.

Lack of realism

realism is often compromised by controlled study conditions, choice of homogeneous subjects, application of treatments

Confounding

situation in which effects of lurking variables cannot be distinguished from effects of factors

Multistage Sample

take sample at each level, e.g.:
1. SRS of states
2. for selected states, SRS's of counties
3. for selected counties, SRS's of people
4. combine SRS's of people
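
The states, counties, people example might be sketched as follows. The nested dictionary, the labels such as "S0-C1-P2", and the function name are all hypothetical.

```python
import random

def multistage_sample(states, n_states, n_counties, n_people, seed=None):
    """SRS of states, then SRS's of counties within them, then SRS's of people."""
    rng = random.Random(seed)
    sample = []
    for state in rng.sample(sorted(states), n_states):            # stage 1
        counties = states[state]
        for county in rng.sample(sorted(counties), n_counties):   # stage 2
            sample.extend(rng.sample(counties[county], n_people)) # stage 3, combined
    return sample

# Hypothetical population: 5 states x 4 counties x 10 people
states = {
    f"S{s}": {
        f"S{s}-C{c}": [f"S{s}-C{c}-P{p}" for p in range(10)]
        for c in range(4)
    }
    for s in range(5)
}

picked = multistage_sample(states, n_states=2, n_counties=2, n_people=3, seed=0)
print(picked)
```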

Purpose: Sample Survey (Poll)

use sample fact in place of population fact • e.g., use sample mean as (uncertain) estimate of population mean

Explanatory variable

used to predict or explain changes in the response variable

Measurement:

value of a variable for an individual e.g., textbook cost for Nathan

Quantitative Variable

variable whose possible values are meaningful numbers e.g., cost, height, yield

Categorical Variable

variable whose possible values are non-quantitative categories e.g., gender, opinion

Lurking variables

variables that affect the response variable but are not measured or included in the planned factors

Matched Pairs

Examples: • Twins: each receiving a treatment • Two treatments on each individual • Measurements before and after treatment on each individual

Valid Experimental Designs

• Randomized Controlled Experiment: subjects randomly assigned to treatments • Randomized Block Design (RBD) • matched pairs, a special case of RBD

Dogma of Statistics

• always variation • variation leads to uncertainty • converting data into useful information requires understanding and dealing with variation/uncertainty

• collect data • summarize data • exploratory data analysis • inference • distribution of a variable

• collect data: get data from a sample of the population • summarize data: turn data into useful information • exploratory data analysis: see the five steps on the Exploratory Data Analysis card • inference: conclusions drawn about the general population based on the sample • distribution of a variable: the values of the variable and how often they occurred

Noncompliance

• failure to submit to the assigned treatment • refusal to follow the protocol of the experiment

Why Experiment? Compared to observational study:

• no confounded lurking variables • can validly draw cause-effect conclusions

Exploratory Data Analysis

• organize and summarize data • discover: features, patterns, striking deviations from patterns • interpret patterns in context • single variable patterns (distribution) and two variable patterns (relationship) • visual displays and numerical summaries

Pitfalls in Experimentation: randomized comparative experiments may still have problems. What are they?

• placebo effect • diagnostic bias • lack of realism • Hawthorne effect • noncompliance

Principles of Data Ethics

• safety and well-being of the subjects must be protected • all individuals must give their informed consent before data are collected • individual data must be kept confidential

Observational Studies

• subjects choose which treatment to receive or naturally belong to one of the treatment groups • lurking variables that influence choice confounded with treatments • passive data collection: observing, measuring, counting, subjects are undisturbed • media often improperly attribute cause-effect conclusions to these

