Week 4- Quiz 4 - Causation
Nonbiased sample
Cluster, stratified and simple random sample
Recruit participants for a study. While they are presumably waiting to be interviewed, half of the individuals sit in a waiting room with snacks available and a TV on. The other half sit in a waiting room with snacks available and no TV, just magazines. Researchers determine whether people consume more snacks in the TV setting.
Experiment: researchers assign each individual to the TV or no-TV waiting room.
Noncompliance (Pitfalls of experimentation)
Failure to submit to the assigned treatment. It can occur on such a large scale that it renders the results invalid.
Poll a sample of individuals with the following question: While watching TV, do you tend to snack: (a) less than usual; (b) more than usual; or (c) the same amount as usual?
Sample survey
Recruit participants for a study. Give them journals to record hour by hour their activities the following day, including when they watch TV and when they consume snacks. Determine if snack consumption is higher during TV times.
This is an observational study because participants determine whether or not to watch TV. There is no attempt on the researchers' part to interfere.
Blind
Eliminates bias
Case QQ
In the SAT states, increasing the percentage doesn't seem to decrease the scores. In the ACT states, the negative trend does seem to become more pronounced. The combination forms the decreasing trend we see in the graph. In this case the observed association is not reversed when we consider the lurking variable, but considering the lurking variable helps us better understand the overall trend. Case Q→Q: We examine the relationship using: Display: scatterplot. When describing the relationship as displayed by the scatterplot, be sure to consider: Overall pattern → direction, form, strength. Deviations from the pattern → outliers. Labeling the scatterplot (including a relevant third categorical variable in our analysis) might add some insight into the nature of the relationship. Case C→C: Exploring the relationship amounts to comparing the distributions of the categorical response variable, for each category of the explanatory variable. To do this, we use: Display: two-way table. Numerical measures: conditional percentages (of the response variable for each value (category) of the explanatory variable separately).
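As a rough illustration of the C→C numerical measure, the sketch below computes conditional percentages of the response within each category of the explanatory variable. The table, its category names, and all counts are invented for illustration and do not come from the course data.

```python
# Hypothetical two-way table: outer keys are explanatory categories,
# inner keys are response categories. All counts are made up.
two_way = {
    "watches TV": {"snacks more": 30, "snacks same or less": 20},
    "no TV": {"snacks more": 15, "snacks same or less": 35},
}


def conditional_percentages(table):
    """Percent of each response category within each explanatory category."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {resp: 100 * n / total for resp, n in counts.items()}
    return result


percentages = conditional_percentages(two_way)
for group, row in percentages.items():
    print(group, row)
```

Comparing the rows of percentages (rather than the raw counts) is what lets groups of different sizes be compared fairly.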
How to control for lurking variables
Randomizing subjects into treatments in experiments. In observational studies, we can control for lurking variables by measuring them and stratifying our analysis, or by adjusting for them with more advanced methods.
5: Obtain a student directory with email addresses of all the university's students, and send your music poll to a simple random sample of students.
Simple random sample. All individuals are equally likely to be selected. As long as the students respond, the sample is not subject to any bias and should succeed in being representative of the population of interest.
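A minimal sketch of drawing a simple random sample from a sampling frame, assuming the directory is available as a list. The addresses, the frame size of 1,000, and the sample size of 50 are all hypothetical.

```python
import random

# Hypothetical sampling frame: a directory of 1,000 student email addresses.
directory = [f"student{i}@university.example" for i in range(1000)]

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
# Sampling without replacement: every subset of 50 students is equally likely.
srs = rng.sample(directory, k=50)

print(srs[:3])
```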
Obtain a student directory with email addresses of all the university's students, and send the music poll to every 50th name on the list.
Systematic sampling. Starting from a randomly chosen individual in the ordered sampling frame, select every i-th individual to be included in the sample.
The key to establishing causation is
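The every-50th-name scheme can be sketched as below. The function name `systematic_sample` and the 1,000-name directory are assumptions made for illustration.

```python
import random


def systematic_sample(frame, step, rng):
    """Pick a random start within the first `step` positions, then take every step-th item."""
    start = rng.randrange(step)
    return frame[start::step]


# Hypothetical ordered sampling frame of 1,000 names.
directory = [f"student{i}" for i in range(1000)]
sample = systematic_sample(directory, step=50, rng=random.Random(0))

print(len(sample), sample[:3])
```

The random starting point is what keeps the first name on the list from always being included.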
The key to establishing causation is to rule out the possibility of any lurking variable, or in other words, to ensure that individuals differ only with respect to the values of the explanatory variable.
Case QQ
The lurking variable helps us understand the overall trend.
Designed experiments have
subjects and factors (explanatory variables)
Hawthorne Effect (Pitfalls of experimentation)
A change in a subject's behavior caused simply by the awareness of being studied
Lurking Variable
A lurking variable is a variable with an important effect on the outcome which is not included among explanatory variables under consideration. It may be known or unknown! It is the unknown lurking variables that are of the most concern in any statistical analysis. It can separate the data points into two groups.
Biased
A sample that produces data that is not representative because of the systematic under- or over-estimation of the values of the variable of interest is called biased. Bias may result from either a poor sampling plan or from a poor design for evaluating the variable of interest.
Designing Studies
Carry out an observational study, in which values of the variable or variables of interest are recorded as they naturally occur. There is no interference by the researchers who conduct the study. Take a sample survey, which is a particular type of observational study in which individuals report variables' values themselves, frequently by giving their opinions. Perform an experiment. Instead of assessing the values of the variables as they naturally occur, the researchers interfere, and they are the ones who assign the values of the explanatory variable to the individuals. The researchers "take control" of the values of the explanatory variable because they want to see how changes in the value of the explanatory variable affect the response variable. (Note: By nature, any experiment involves at least two variables.)
Multistage sampling
Complex form of cluster sampling. Another commonly used sampling technique is multistage sampling, which is essentially a "complex form" of cluster sampling. When conducting cluster sampling, it might be unrealistic, or too expensive to sample all the individuals in the chosen clusters. In cases like this, it would make sense to have another stage of sampling, in which you choose a sample from each of the randomly selected clusters, hence the term multistage sampling.
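A two-stage version of this idea might look like the sketch below: stage 1 randomly selects clusters, stage 2 draws a random sample inside each selected cluster. The function name, the course/student labels, and the sizes are hypothetical.

```python
import random


def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: randomly choose clusters. Stage 2: random sample within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical frame: 20 courses with 30 students each.
courses = {f"course{c}": [f"c{c}_s{s}" for s in range(30)] for c in range(20)}
picked = multistage_sample(courses, n_clusters=4, n_per_cluster=5, rng=random.Random(1))

for course, students in picked.items():
    print(course, students)
```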
Simpson's Paradox
Conclusions drawn from two or more separate crosstabulations can be reversed when the data are aggregated into a single crosstabulation. In the scatterplot example, while the overall relationship between X and Y is negative, the relationship within each of the two groups created by the lurking variable (the blue and red groups) is positive. Including the lurking variable in our exploration therefore changes the apparent direction of the relationship between X and Y. An instance of Simpson's paradox occurs whenever including a lurking variable causes you to rethink the direction of an association.
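A small numeric sketch of the reversal: the six points below are invented so that the correlation is positive inside each group but negative once the groups are pooled. The helper `pearson_r` is written out by hand here and is not taken from the course materials.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: (x, y) pairs for the two groups created by the lurking variable.
blue = [(1, 10), (2, 11), (3, 12)]
red = [(6, 3), (7, 4), (8, 5)]

r_blue = pearson_r(*zip(*blue))
r_red = pearson_r(*zip(*red))
r_all = pearson_r(*zip(*(blue + red)))

print("within blue:", r_blue)
print("within red:", r_red)
print("pooled:", r_all)
```

The pooled correlation is negative only because the red group sits at higher X and lower Y than the blue group, which is the effect of the lurking variable.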
How is control used in different senses?
Control for a confounding variable. A controlled experiment stresses that the values of the experiment's explanatory variables (factors) have been assigned by researchers, as opposed to having occurred naturally.
What is this an example of? Stand outside the Student Union, across from the Fine Arts Building, and ask the students passing by to respond to your question about musical preference.
Convenience sample. This is an example of a convenience sample, where individuals happen to be at the right time and place to suit the schedule of the researcher. Depending on what variable is being studied, it may be that a convenience sample provides a fairly representative group. However, there are often subtle reasons why the sample's results are biased. In this case, the proximity to the Fine Arts Building might result in a disproportionate number of students favoring classical music. A convenience sample may also be susceptible to bias because certain types of individuals are more likely to be selected than others. In the extreme, some convenience samples are designed in such a way that certain individuals have no chance at all of being selected.
Randomized Response
Effective techniques for collecting accurate data on sensitive questions are a main area of inquiry in statistics. One simple method is randomized response, which allows individuals in the sample to answer anonymously, while the researcher still gains information about the population. This technique is best illustrated by an example.
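The course's own example is not reproduced in these notes. As a stand-in, here is a simulation of one common variant of randomized response, assumed for illustration: each person privately flips a fair coin, answers "yes" on heads regardless of the truth, and answers truthfully on tails. Since P(yes) = 0.5 + 0.5p, the population proportion p can be estimated as 2·P(yes) − 1 without knowing any individual's true answer. The true rate of 0.2 and the sample size are hypothetical.

```python
import random


def simulate_randomized_response(true_rate, n, rng):
    """Return the fraction of 'yes' answers under the forced-yes coin design."""
    yes = 0
    for _ in range(n):
        has_trait = rng.random() < true_rate
        heads = rng.random() < 0.5
        # Heads: say "yes" regardless. Tails: answer truthfully.
        if heads or has_trait:
            yes += 1
    return yes / n


def estimate_rate(yes_fraction):
    """Invert P(yes) = 0.5 + 0.5 * p to recover p."""
    return 2 * yes_fraction - 1


rng = random.Random(7)
observed = simulate_randomized_response(true_rate=0.2, n=100_000, rng=rng)
estimate = estimate_rate(observed)

print("fraction answering yes:", observed)
print("estimated true rate:", estimate)
```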
Producing Data
In the first step of the statistics "Big Picture," we produce data. The production of data has two stages. First we need to choose the individuals from the population that will be included in the sample. Then, once we have chosen the individuals, we need to collect data from them.
Nonresponse
Individuals selected to participate do not respond or refuse to participate.
Retrospective observational studies
The investigator identifies cases that already have certain outcomes (the response variable) and looks back into their past to see what may be explaining these outcomes (the explanatory variable). The values of the variables of interest are recorded backwards in time.
Case CQ
The lurking variable is confounded with the explanatory variable, nationality. An observed association between two variables is not enough evidence that there is a causal relationship between them. Case C→Q: Exploring the relationship amounts to comparing the distributions of the quantitative response variable for each category of the explanatory variable. To do this, we use: Display: side-by-side boxplots. Numerical measures: descriptive statistics of the response variable, for each value (category) of the explanatory variable separately.
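For the C→Q numerical measures, a sketch like the following computes descriptive statistics of the response separately for each category. The group labels and values are invented, and quartile conventions vary between textbooks and software, so Q1 and Q3 from `statistics.quantiles` may differ slightly from hand calculations.

```python
from statistics import mean, median, quantiles

# Hypothetical quantitative responses, grouped by a categorical explanatory variable.
scores = {
    "group A": [2, 4, 4, 5, 7, 9],
    "group B": [5, 6, 8, 8, 10, 11],
}


def summarize(values):
    """Five-number summary plus the mean for one group."""
    q1, _, q3 = quantiles(values, n=4)  # library default method; other conventions exist
    return {
        "min": min(values),
        "Q1": q1,
        "median": median(values),
        "Q3": q3,
        "max": max(values),
        "mean": mean(values),
    }


summary = {group: summarize(vals) for group, vals in scores.items()}
for group, stats in summary.items():
    print(group, stats)
```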
How can randomization enter an experiment
Prevents bias. First, we select a random sample to determine who will participate. Once we have our sample, subjects are randomly assigned to treatments in order to minimize the impact of lurking variables and to have a greater chance to establish causal relationships between our treatment and the response of interest to the researcher. Random assignment minimizes the effect of any important lurking variables by distributing subjects evenly across treatments.
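The second use of randomization, assigning subjects to treatments, can be sketched as a shuffle followed by dealing subjects out in turn. The function name, subject labels, and the two treatments are hypothetical.

```python
import random


def randomly_assign(subjects, treatments, rng):
    """Shuffle the subjects, then deal them out to treatments round-robin."""
    shuffled = subjects[:]  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


subjects = [f"subject{i}" for i in range(20)]
groups = randomly_assign(subjects, ["TV", "no TV"], random.Random(3))

for treatment, members in groups.items():
    print(treatment, members)
```

Because chance alone decides who lands in which group, neither the researchers nor the subjects can steer particular kinds of people toward a treatment.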
randomized controlled double-blind experiment
The most reliable way to determine whether the explanatory variable is actually causing changes in the response variable. Depending on the variables of interest, such a design may not be entirely feasible, but the closer researchers get to achieving this ideal design, the more convincing their claims of causation (or lack thereof) are.
Sample Surveys
Sample surveys are occasionally used to examine relationships, but often they assess values of many separate variables, such as respondents' opinions on various matters. Survey questions should be designed carefully, in order to ensure unbiased assessment of the variables' values. A sample survey is an observational type of study in which respondents assess variables' values (often by giving an opinion). Open questions are less restrictive, but responses are more difficult to summarize. Closed questions may be biased by the options provided. Closed questions should permit options such as "other:______" and/or "not sure" if those options may apply. Questions should be worded neutrally. Earlier questions should not deliberately influence responses to later questions. Questions shouldn't be confusing or complicated. Survey method and questions should be carefully designed to elicit honest responses if there are sensitive issues involved.
Designed experiments are the best way to prove causality: true or false?
Showing an association or relationship between two variables is easy; determining the causal nature of the observed association is much more difficult.
Causation QQ
We are shown the number of firefighters and the amount of damage, but we did not observe the cause. Case Q→Q: We examine the relationship using: Display: scatterplot. When describing the relationship as displayed by the scatterplot, be sure to consider: Overall pattern → direction, form, strength. Deviations from the pattern → outliers. Labeling the scatterplot (including a relevant third categorical variable in our analysis) might add some insight into the nature of the relationship. In the special case that the scatterplot displays a linear relationship (and only then), we supplement the scatterplot with: Numerical measures: Pearson's correlation coefficient (r) measures the direction and, more importantly, the strength of the linear relationship. The closer r is to 1 (or -1), the stronger the positive (or negative) linear relationship. r is unitless, influenced by outliers, and should be used only as a supplement to the scatterplot. When the relationship is linear (as displayed by the scatterplot, and supported by the correlation r), we can summarize the linear pattern using the least squares regression line. Remember that: The slope of the regression line tells us the average change in the response variable that results from a 1-unit increase in the explanatory variable. When using the regression line for predictions, you should beware of extrapolation. When examining the relationship between two variables (regardless of the case), any observed
Which type of study?
Study the effect on bone mass of a calcium supplement given to young girls (observational study). Study the effect of a new medicine for heart disease (experiment). Study the effect of lung capacity (observational study). Study the amount of time college students taking a full course load spend watching TV (observational study). A study randomly assigned volunteers to one of two groups: (1) directed to use social media sites as they usually do, or (2) blocked from social media sites. The researchers looked at which group tended to be happier (experiment). A study took a random sample of people and examined their social media habits. Each person was classified as either a light, moderate, or heavy social media user. The researchers looked at which groups tended to be happier (observational study).
Observational Studies
The explanatory variable's values are allowed to occur naturally. Because of the possibility of lurking variables, it is difficult to establish causation. If possible, control for suspected lurking variables by studying groups of similar individuals separately. Some lurking variables are difficult to control for; others may not be identified.
Experiments
The explanatory variable's values are controlled by researchers (treatment is imposed). Randomized assignment to treatments automatically controls for all lurking variables. Making subjects blind avoids the placebo effect. Making researchers blind avoids conscious or subconscious influences on their subjective assessment of responses. A randomized controlled double-blind experiment is generally optimal for establishing causation. A lack of realism may prevent researchers from generalizing experimental results to real-life situations. Noncompliance may undermine an experiment. A volunteer sample might solve (at least partially) this problem. It is impossible, impractical, or unethical to impose some treatments.
Sampling
The sample needs to be representative of the population of interest. Example: when studying whether Zoloft is effective for teenagers, it would not be good to sample only teens who have been admitted to the hospital, because their depression may be more severe.
Prospective observational studies
The values of the variables of interest are recorded forward in time.
How is control used in different contexts?
We can control for confounding variables by stratifying our analysis in an observational study. We can perform a controlled experiment, where researchers assign the treatments as opposed to having them occur naturally. We can have a control group in a designed experiment for the purpose of comparison to the treatment under study.
Cluster Sampling
Used when "natural" groupings are evident in a statistical population and each group is generally representative of the population. In this technique, the total population is divided into these groups (or clusters) and a sample of these groups is selected. For example, randomly select courses from all courses and survey ALL students in the selected courses.
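The course example can be sketched as follows: randomly select a few courses, then survey every student in them. The function name and the 12 courses of 25 students each are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and include every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]


# Hypothetical frame: 12 courses with 25 students each.
courses = {f"course{c}": [f"c{c}_s{s}" for s in range(25)] for c in range(12)}
surveyed = cluster_sample(courses, n_clusters=3, rng=random.Random(5))

print(len(surveyed), surveyed[:3])
```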
Volunteer Response
Volunteer response is not as problematic as a volunteer sample (presented in example 1 above), but there is still a danger that those who do respond are different from those who don't, with respect to the variable of interest.
What is this an example of? 1: Post a music-lovers' survey on a university Internet bulletin board, asking students to vote for their favorite type of music.
Volunteer sample, where individuals have selected themselves to be included. Such a sample is almost guaranteed to be biased: it tends to consist of people who have an opinion about the issue and are looking to voice it. We cannot generalize to any larger group at all.
Stratified Sampling
When subpopulations within an overall population vary, it can be advantageous to take samples from each subpopulation (stratum) independently. For example, take a random sample of males and a separate random sample of females.
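A minimal sketch of stratified sampling, drawing an independent random sample from every stratum. The stratum sizes (400 and 600) and the per-stratum sample size of 25 are assumptions for illustration.

```python
import random


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a separate simple random sample from each stratum."""
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }


# Hypothetical population split into two strata.
strata = {
    "male": [f"m{i}" for i in range(400)],
    "female": [f"f{i}" for i in range(600)],
}
sample = stratified_sample(strata, n_per_stratum=25, rng=random.Random(11))

for name, members in sample.items():
    print(name, members[:3])
```

Unlike cluster sampling, every stratum contributes to the sample; only the individuals within each stratum are chosen at random.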
Lack of Realism (Pitfalls of experimentation)
When the treatments, the subjects, or the environment of an experiment are not realistic. Lack of realism can limit researchers' ability to apply the conclusions of an experiment to the settings of greatest interest. One of the greatest advantages of an experiment — that researchers take control of the explanatory variable — can also be a disadvantage in that it may result in a rather unrealistic setting. Lack of realism (also called lack of ecological validity) is a possible drawback to the use of an experiment rather than an observational study to explore a relationship. Depending on the explanatory variable of interest, it may be quite easy or it may be virtually impossible to take control of the variable's values and still maintain a fairly natural setting.
Double Blind
An observation whose true purpose is hidden from both the observer and the person being observed. If neither the subjects nor the researchers know who was assigned what treatment, then the experiment is called double-blind.
Design
The design for producing data must be considered carefully. Studies should be designed to discover what we want to know about the variables of interest for the individuals in the sample.
placebo effect
Experimental results caused by expectations alone; any effect on behavior caused by the administration of an inert substance or condition, which the recipient assumes is an active agent. When patients improve because they are told they are receiving treatment, even though they are not actually receiving treatment, this is known as the placebo effect.
Lurking variables explain
Why association does not imply causation. They help us gain a deeper understanding of the observed relationship. In the hospital example, the lurking variable is the severity of the patients' condition.
Cluster vs. Stratified Sampling
•For strata we almost always sample from every stratum •For clusters we almost always randomly select only a few clusters to sample