Stats- Summer Notes


Population of interest:

-the entire collection of individuals or objects about which information is desired

Blocked experiment

-used when researchers know or suspect that a variable other than the treatment influences the response -group individuals into blocks based on the identified variable and THEN randomize within each block eg: drug on heart attacks: split patients into low-risk and high-risk blocks, then randomly assign half the patients from each block to the control group and half to the treatment group
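
A minimal sketch of the blocking idea, assuming patients are plain id numbers and each one already has a risk label. The function name `blocked_assignment` and the ten-patient data are hypothetical, made up for illustration.

```python
import random

def blocked_assignment(patients, block_of, seed=None):
    """Randomly split each block in half: treatment vs control.

    patients: list of patient ids
    block_of: dict mapping patient id -> block label (e.g. "low-risk")
    """
    rng = random.Random(seed)
    # Group patients by their block label.
    blocks = {}
    for p in patients:
        blocks.setdefault(block_of[p], []).append(p)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)  # randomize WITHIN the block
        half = len(shuffled) // 2
        for p in shuffled[:half]:
            assignment[p] = "treatment"
        for p in shuffled[half:]:
            assignment[p] = "control"
    return assignment

# Hypothetical data: 6 low-risk and 4 high-risk patients.
patients = list(range(10))
block_of = {p: ("low-risk" if p < 6 else "high-risk") for p in patients}
groups = blocked_assignment(patients, block_of, seed=1)
```

Because the shuffle happens inside each block, both risk levels end up evenly split between treatment and control.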

Stratified sampling

-used when we know that something is associated with the quantity we want to estimate. 1. divide the population into strata of similar cases 2. then use a second sampling method (usually simple random sampling) to select a certain number of cases from each stratum eg: good for estimating MLB player salaries by treating each team as a stratum, which ensures adequate representation for each team.
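
A rough sketch of the two steps, assuming each case can be mapped to a stratum label. The name `stratified_sample` and the made-up player list are hypothetical.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Take a simple random sample of n_per_stratum cases from every stratum."""
    rng = random.Random(seed)
    # Step 1: divide the population into strata.
    strata = {}
    for case in population:
        strata.setdefault(stratum_of(case), []).append(case)
    # Step 2: simple random sample within each stratum.
    sample = []
    for label in sorted(strata):
        sample.extend(rng.sample(strata[label], n_per_stratum))
    return sample

# Hypothetical players as (name, team) pairs, 10 players on each of 3 teams.
players = [(f"player{i}", f"team{i % 3}") for i in range(30)]
picked = stratified_sample(players, lambda p: p[1], n_per_stratum=2, seed=0)
```

Every team contributes exactly two players, which is the "adequate representation" the card describes.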

Numerical value

-wide range of numerical values, but intervals in between are equally spaced

General process of investigation:

1. Identify a question or problem 2. Collect relevant data on the topic 3. Analyze the data 4. Form a conclusion

Well conducted experiments are built on three principles

1. direct control 2. randomization 3. replication

Multistage sampling

1. divide the population into clusters 2. randomly select some of the clusters 3. take a simple random sample WITHIN each selected cluster
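
The steps above can be sketched as follows, assuming clusters are stored as a dict from cluster name to its cases. The name `multistage_sample` and the neighborhood data are hypothetical.

```python
import random

def multistage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """clusters: dict mapping cluster name -> list of cases."""
    rng = random.Random(seed)
    # Stage 1: randomly select some clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: simple random sample WITHIN each selected cluster.
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical data: 5 neighborhoods with 8 households each.
clusters = {
    f"neighborhood{i}": [f"house{i}-{j}" for j in range(8)] for i in range(5)
}
sample = multistage_sample(clusters, n_clusters=2, n_per_cluster=3, seed=0)
```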

Response Bias

-broad range of factors that influence how a person responds -question wording, question order, influence of the interviewer -can even be present in a census.

Ordinal

-categorical variable but levels have natural ordering -eg: economic status: low, medium, high, rating something from 1-10

Variable

-characteristic we measure for each individual or case

Observational study

-collecting data without interfering with how the data arise eg: collecting data via surveys, reviewing records, or following a cohort -only sufficient to show associations, not to support causal conclusions

What happens when there is more than one explanatory variable at a time

-treatments are the combinations of levels eg: volume has two levels (soft, loud) and type has three levels (dance, classical, punk). -Experiment would be carried out with 2 x 3 = 6 combinations. -Each combination is a treatment.
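
To check the count, the combinations can be listed with `itertools.product`. The variable names here are only illustrative.

```python
from itertools import product

volume = ["soft", "loud"]
music_type = ["dance", "classical", "punk"]

# Each (volume, type) pair is one treatment.
treatments = list(product(volume, music_type))
```

Two levels times three levels gives the six treatments named on the card.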

Experiment

-conducted when researchers want to investigate the possibility of a causal connection -researchers must impose a treatment -if treatments are randomly assigned, the results can support a causal conclusion

Discrete value

-countable in a finite amount of time (eg: population in 2010)

Categorical

-eg: states -two or more categories, but no intrinsic ordering to the categories

Non-response bias

-occurs when a large share of the people sampled do not respond eg: if only 30% of people respond, they might not be representative of the entire population (can skew results)

Randomization

-evens out differences & prevents accidental bias

Selection bias

-happens when some individuals in a population are inherently more likely to be included in the sample than others.

Systematic sampling

-used when you have a convenient list of all individuals in the population -number each person, then select every 10th (or every nth) person.
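
A small sketch of picking every nth person from a numbered list. The random starting point is an added assumption (the card does not mention it), and `systematic_sample` is a hypothetical name.

```python
import random

def systematic_sample(cases, step, seed=None):
    """Select every `step`-th case from an ordered list."""
    rng = random.Random(seed)
    # Assumption: start at a random position within the first `step` cases.
    start = rng.randrange(step)
    return cases[start::step]

# Hypothetical numbered list of 100 people; take every 10th person.
roster = list(range(100))
every_tenth = systematic_sample(roster, step=10, seed=3)
```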

Replication

-observing enough cases, or making sure the experiment can be repeated to study more cases, so the effect is estimated accurately and is not a fluke.

Double-blind

-more common today: neither the patients nor the researchers who interact with them know who is receiving the treatment

Simple random sampling

-most intuitive form -assign a number to each person, then randomly generate numbers to choose the sample -The following are true: 1. Each case in the population has an equal chance of being included in the sample 2. Each group of n cases has an equal chance of making up the sample.
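
A minimal sketch using Python's standard `random.sample`, which draws n distinct cases so that every group of n cases is equally likely. The wrapper name `simple_random_sample` and the 500-person population are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct cases; every group of n cases is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: people numbered 1 to 500.
population = list(range(1, 501))
chosen = simple_random_sample(population, n=25, seed=7)
```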

Continuous value

-not countable in a finite amount of time. (may take forever) -time, age

Statistic

-numerical summary based on sample.

Parameters

-overall quantity of interest, eg: mean, median, proportion, or some other summary of a population. -we estimate the value of a parameter by taking a sample and computing a numerical summary (a statistic)
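
A toy illustration of parameter versus statistic, assuming a made-up population where we happen to know every case (in practice we would not). The numbers are invented for the example.

```python
import random

rng = random.Random(42)

# Hypothetical population of 10,000 cases; True marks a "yes" case.
population = [i < 200 for i in range(10_000)]

# Parameter: the proportion p over the ENTIRE population.
p = sum(population) / len(population)

# Statistic: the proportion p-hat computed from a sample of 1,000 cases.
sample = rng.sample(population, 1000)
p_hat = sum(sample) / len(sample)
```

Here p is fixed, while p-hat changes from sample to sample and is used to estimate p.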

Volunteer sample

-people's responses are solicited and those who choose to participate respond. -Problem b/c those who choose to participate may have different opinions than rest of the population.

Types of observational studies

-prospective study: identifies individuals and collects information as events unfold -retrospective study: collects data after events have taken place (eg: retail sales, country populations)

Direct control

-researchers do their best to control any other differences between the groups -eg: how much water is taken with the pill, what time of day it is taken

Single-blind

-researchers don't let patients know what group they're in, and give some groups a placebo

Cluster sampling

-sampling technique that randomly selects GROUPS of people. -most helpful when there is a lot of case-to-case variability WITHIN a cluster, BUT clusters don't look very different from one another. -Eg: if neighborhood represented clusters, cluster sampling works best if each neighborhood is very diverse.
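
A short sketch of cluster sampling, assuming clusters are stored as a dict from cluster name to its cases. Unlike multistage sampling, every case in a chosen cluster is kept. The name `cluster_sample` and the neighborhood data are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every case inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(clusters[name])  # all cases in the cluster
    return chosen, sample

# Hypothetical data: 5 neighborhoods with 8 households each.
neighborhoods = {
    f"neighborhood{i}": [f"house{i}-{j}" for j in range(8)] for i in range(5)
}
chosen, households = cluster_sample(neighborhoods, n_clusters=2, seed=0)
```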

Types of sampling

-simple random sampling -systematic sampling -stratified sampling -cluster sampling -multistage sampling

Completely randomized experiment

-subjects are randomly assigned to each group -eg: randomly assign subjects unique numbers; 1-100 get treatment 1, 101-200 get treatment 2
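
A sketch of the numbering idea: shuffling plays the role of handing out random numbers, then the first 100 positions get treatment 1 and the next 100 get treatment 2. It assumes the number of subjects divides evenly among the groups; `completely_randomized` is a hypothetical name.

```python
import random

def completely_randomized(subjects, n_groups, seed=None):
    """Shuffle subjects, then cut the shuffled list into equal groups."""
    rng = random.Random(seed)
    shuffled = subjects[:]
    rng.shuffle(shuffled)  # random order = randomly assigned numbers
    size = len(shuffled) // n_groups  # assumes an even split
    return {g + 1: shuffled[g * size:(g + 1) * size] for g in range(n_groups)}

# Hypothetical 200 subjects split between 2 treatments.
subjects = [f"subject{i}" for i in range(200)]
groups = completely_randomized(subjects, n_groups=2, seed=5)
```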

Convenience sample

individuals who are easily accessible are more likely to be included in sample. (location, etc.)

p

proportion in context of a population

Case

an individual about whom or which we have data

On a large college campus first-year students and sophomores live in dorms located on the eastern part of the campus and juniors and seniors live in dorms located on the western part of the campus. Suppose you want to collect student opinions on a new housing structure the college administration is proposing and you want to make sure your survey equally represents opinions from students from all years. What type of study is this? Suggest a sampling strategy for carrying out this study.

Observational study. Sampling strategy: stratified sampling. Divide students into two groups by campus side (or four groups by class year) and randomly select an equal number of students from each stratum.

Matched pairs

-pairs of people are matched on as many variables as possible -special kind of blocked experiment in which the blocks have size two -comparison happens between similar cases

Identify population of interest and the sample in the studies described. The Buteyko method is a shallow breathing technique developed by Konstantin Buteyko, a Russian doctor, in 1952. Anecdotal evidence suggests that the Buteyko method can reduce asthma symptoms and improve quality of life. In a scientific study to determine the effectiveness of this method, researchers recruited 600 asthma patients aged 18-69 who relied on medication for asthma treatment. These patients were split into two research groups: one practiced the Buteyko method and the other did not. Patients were scored on quality of life, activity, asthma symptoms, and medication reduction on a scale from 0 to 10. On average, the participants in the Buteyko group experienced a significant reduction in asthma symptoms and an improvement in quality of life.

Population: all 18-69 year olds diagnosed and currently treated for asthma w/ medication. Sample: 600 patients 18-69 diagnosed and currently treated for asthma.

Identify population of interest and the sample in the studies described. (a) Researchers collected data to examine the relationship between pollutants and preterm births in Southern California. During the study air pollution levels were measured by air quality monitoring stations. Specifically, levels of carbon monoxide were recorded in parts per million, nitrogen dioxide and ozone in parts per hundred million, and coarse particulate matter (PM10) in ug/m^3. Length of gestation data were collected on 143,196 births between the years 1989 and 1993, and air pollution exposure during gestation was calculated for each birth. The analysis suggested that increased ambient PM 10 and, to a lesser degree, CO concentrations may be associated with the occurrence of preterm births.

Population: all births in Southern California Sample: 143,196 births in Southern California b/w 1989-1993

State the scope of conclusions of the study: Researchers randomly sampled 100 high school students and randomly assigned them into two study groups. Throughout the school year, one group was told to study in a room with a TV on while the other was told to study in silence. At the end of the year the researchers compared the grade point averages of the two groups and found that the mean grade point average of students who did not watch TV while studying was significantly higher than the grade point average of students who did watch TV while studying.

Since this is an experiment with random assignment, causal relationships can be concluded. -since the high school students were randomly sampled, the conclusion can be generalized to the population of high school students they were sampled from, not to the public at large.

25 Stressed out, Part II In a study evaluating the relationship between stress and muscle cramps, half the subjects are randomly assigned to be exposed to increased stress by being placed into an elevator that falls rapidly and stops abruptly and the other half are left at no or baseline stress. a. What type of study is this? b. Can this study be used to conclude a causal relationship between increased stress and muscle cramps?

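a. experiment b. yes, because it is an experiment and subjects are randomly assigned to the stress conditions. Random assignment supports the causal conclusion; whether the result generalizes depends on how the subjects were sampled, which the description does not state.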

Relaxing after work The 2010 General Social Survey asked the question, "After an average work day, about how many hours do you have to relax or pursue activities that you enjoy?" to a random sample of 1,155 Americans. The average relaxing time was found to be 1.65 hours. Determine which of the following is an observation, a variable, a sample statistic, or a population parameter. a. An american in the sample. b. Number of hours spent relaxing after an average work day. c. 1.65 d. Average number of hours all Americans spend relaxing after an average work day

a. observation b. variable c. sample statistic d. population parameter

Suppose you want to estimate the percentage of videos on YouTube that are cat videos. It is impossible for you to watch all videos on YouTube so you use a random video picker to select 1000 videos for you. You find that 2% of these videos are cat videos. Determine which of the following is an observation, a variable, a sample statistic, or a population parameter. a. Percentage of all videos on YouTube that are cat videos b. 2 c. A video in your sample d. Whether or not a video is a cat video

a. population parameter b. sample statistic c. observation d. variable

22 Sampling strategies A statistics student who is curious about the relationship between the amount of time students spend on social networking sites and their performance at school decides to conduct a survey. Various research strategies for collecting data are described below. In each, name the sampling method proposed and any bias you might expect. a. He randomly samples 40 students from the study's population, gives them the survey, asks them to fill it out and bring it back the next day. b. He gives out the survey only to his friends, making sure each one of them fills out the survey. c. He posts a link to an online survey on Facebook and asks his friends to fill out the survey. d. He randomly samples 5 classes and asks a random sample of students from those classes to fill out the survey. e. He stands outside the student center and asks every third person that walks out the door to fill out the survey.

a. simple random sampling b. convenience sampling c. convenience sampling d. multi-stage sampling (not stratified because he randomly samples 5 classes THEN samples students in each) e. systematic sampling

21. City council survey A city council has requested a household survey be conducted in a suburban area of their city. The area is broken into many distinct and unique neighborhoods, some including large homes, some with only apartments, and others a diverse mixture of housing structures. Identify the sampling methods described below, and comment on whether or not you think they would be effective in this setting. a. Randomly sample 50 households from the city. b. Divide the city into neighborhoods, and sample 20 households from each neighborhood. c. Divide the city into neighborhoods, randomly sample 10 neighborhoods, and sample all households from those neighborhoods. d. Divide the city into neighborhoods, randomly sample 10 neighborhoods, and then randomly sample 20 households from those neighborhoods. e. Sample the 200 households closest to the city council offices.

a. simple random sampling; This is usually an effective method since it gives each household an equal chance of being picked. b. stratified sampling; This is an effective method in this setting since neighborhoods are unique and this method allows us to sample from each neighborhood. c. cluster sampling; This is not an effective method in this setting since the resulting sample will not contain households from certain neighborhoods, and we are told that some neighborhoods are very different from others. d. multistage sampling; This method will suffer from the same issue discussed in part (c). e. convenience sampling; This is not an effective method since it will produce a sample biased toward households that are similar to each other (in the same neighborhood), and the sample will not contain any houses from neighborhoods far from the city council offices.

Simple random sample

equivalent of using a raffle -each case in population has equal chance of being included -no implied connection b/w cases in sample

Identify (i) the cases, (ii) the variables and their types, and (iii) the main research question in the studies described below. (a) Researchers collected data to examine the relationship between pollutants and preterm births in Southern California. During the study air pollution levels were measured by air quality monitoring stations. Specifically, levels of carbon monoxide were recorded in parts per million, nitrogen dioxide and ozone in parts per hundred million, and coarse particulate matter (PM10) in ug/m^3. Length of gestation data were collected on 143,196 births between the years 1989 and 1993, and air pollution exposure during gestation was calculated for each birth. The analysis suggested that increased ambient PM 10 and, to a lesser degree, CO concentrations may be associated with the occurrence of preterm births.

i. Cases: the 143,196 births in Southern California, 1989-1993 (each birth is one case) ii. Variables: length of gestation (continuous, response), and exposure levels of carbon monoxide, nitrogen dioxide, ozone, and coarse particulate matter PM10 (each continuous, explanatory). All are numerical variables. iii. Research question: Is there an association between air pollution exposure and preterm births?

Identify (i) the cases, (ii) the variables and their types, and (iii) the main research question in the studies described below. The Buteyko method is a shallow breathing technique developed by Konstantin Buteyko, a Russian doctor, in 1952. Anecdotal evidence suggests that the Buteyko method can reduce asthma symptoms and improve quality of life. In a scientific study to determine the effectiveness of this method, researchers recruited 600 asthma patients aged 18-69 who relied on medication for asthma treatment. These patients were split into two research groups: one practiced the Buteyko method and the other did not. Patients were scored on quality of life, activity, asthma symptoms, and medication reduction on a scale from 0 to 10. On average, the participants in the Buteyko group experienced a significant reduction in asthma symptoms and an improvement in quality of life.

i. Cases: the 600 asthma patients aged 18-69 who are currently taking medication for asthma treatment (each patient is one case) ii. Variables: the practice of the Buteyko method (categorical, explanatory), quality of life (ordinal, response), activity (ordinal, response), asthma symptoms (ordinal, response), medication reduction (ordinal, response). iii. Research question: Is there an association between practice of the Buteyko method and quality of life, asthma symptoms, and/or medication reduction?

μ stands for...

mean of a population

Is μ a parameter or statistic? What about p-hat?

mu is population average, so it is a parameter. p-hat is sample proportion, so it is a statistic

p-hat

sample proportion

x-bar

the sample mean

