STA2014 - Chapter 1: Data Collection

closed question

A closed question requires the respondent to choose from a list of predetermined responses.

continuous variable

A continuous variable is a quantitative variable that has an infinite number of possible values that are not countable.

parameter

A parameter is a numerical summary of a population.

presurvey

A presurvey could give the researcher an idea of what the most common responses are from a population. The researcher could then use these responses as the answers to closed questions in the actual survey.

prospective study

A prospective study collects the data over time.

qualitative variable

A qualitative variable allows for classification of individuals based on some attribute or characteristic.

sample

A sample is a subset of the population that is being studied.

inferential statistics

Inferential statistics uses methods that generalize results obtained from a sample to the population and measure the reliability of the results.

statistics

Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw a conclusion and answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.

countable

The term "countable" means that the values result from counting, such as 0, 1, 2, 3, and so on.

undercoverage error

Undercoverage bias occurs when the proportion of one segment of the population is lower in a sample than it is in the population.

types of nonsampling errors

Undercoverage, nonresponse bias, response bias, and data-entry errors are all types of nonsampling errors.

sampling without replacement

When sampling without replacement, once an individual is selected, the individual is removed from the possible choices for that sample and cannot be chosen again.
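A minimal Python sketch of the difference, assuming a hypothetical population of ten individuals labeled 1 through 10 and an arbitrary random seed: the standard library's `sample` method draws without replacement, so no individual can appear twice, while `choices` draws with replacement, so repeats are possible.

```python
import random

# Hypothetical population: individuals labeled 1 through 10.
population = list(range(1, 11))

# Fixed seed (an arbitrary choice) so the draws are reproducible.
rng = random.Random(2014)

# Without replacement: each selected individual is removed from the
# remaining choices, so the 5 labels are guaranteed to be distinct.
without_replacement = rng.sample(population, k=5)

# With replacement: every draw is made from the full population,
# so the same individual may be chosen more than once.
with_replacement = rng.choices(population, k=5)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

Asking `rng.sample` for more individuals than the population holds (for example `k=11` here) raises a `ValueError`, which mirrors the fact that a sample drawn without replacement can never be larger than the population.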

sampling bias

When the technique used to obtain the individuals to be in the sample tends to favor one part of the population over another, this is known as sampling bias.

cohort study

A cohort study first identifies a group of individuals to participate in the study (the cohort). The cohort is then observed over a long period of time. During this time period, characteristics about the individuals are recorded, and some individuals will be exposed to certain factors (not intentionally) while others will not. At the end of the study, the value of the response variable is recorded for the individuals.

confounding variable

A confounding variable is an explanatory variable that was considered in a study but whose effect on the response variable cannot be distinguished from that of a second explanatory variable in the study. The main difference between lurking variables and confounding variables is that lurking variables are not considered in the study, whereas confounding variables are measured in the study.

designed experiment

In a designed experiment, a researcher assigns individuals to certain groups, intentionally changes the value of an explanatory variable, and then records the value of the response variable for each group. A designed experiment allows the researcher to claim causation between an explanatory variable and a response variable.

discrete variable

A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number of possible values.

lurking variable

A lurking variable is an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. In addition, lurking variables are typically related to explanatory variables in the study. A relation that appears to exist between a certain explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study. These variables are called lurking variables.

nonresponse bias error

Nonresponse means that an individual selected for the sample does not respond to the survey. Nonresponse bias exists when the individuals selected for the sample who do not respond to the survey have different opinions from those who do. Nonresponse bias can be controlled using callbacks. For example, if nonresponse occurs because a mailed questionnaire was not returned, a callback might mean phoning the individual to conduct the survey. If nonresponse occurs because an individual was not at home, a callback might mean returning to the home at other times of the day or on other days of the week. Offering rewards and incentives is another way to reduce nonresponse. Rewards may include cash payments. Incentives may include a cover letter stating that the responses to the questionnaire will determine future policy.
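The sketch below illustrates nonresponse bias with entirely hypothetical numbers: a population of 10,000 in which exactly 40% favor a policy, and assumed response probabilities of 60% for supporters and 20% for opponents. Everyone is asked, but because the people who do not respond hold different opinions from those who do, the respondents alone overstate support.

```python
import random

rng = random.Random(7)

# Hypothetical population of 10,000 people: exactly 40% favor a policy.
population = [True] * 4_000 + [False] * 6_000

# Assumed response probabilities: supporters return the survey more
# often than opponents, so nonresponders differ from responders.
RESPONSE_PROB = {True: 0.60, False: 0.20}

responses = [opinion for opinion in population
             if rng.random() < RESPONSE_PROB[opinion]]

true_share = sum(population) / len(population)
estimated_share = sum(responses) / len(responses)
print(f"true share in favor:     {true_share:.2f}")
print(f"share among respondents: {estimated_share:.2f}")
```

Under these assumptions the respondents show roughly two-thirds in favor even though the true share is 40%. A callback effort that raised the opponents' response probability toward the supporters' would shrink that gap.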

response rates

A possible advantage of offering rewards or incentives to increase response rates is that respondents may put more effort into answering the survey questions completely and accurately because they feel obligated. A possible disadvantage is that the people interested in the rewards or incentives may differ from the population in some way that is important to the study, causing biased results.

response bias error

Response bias exists when the answers on a survey do not reflect the true feelings of the respondent. One common cause is a question that is not balanced; that is, it is worded in such a way as to influence the responses of those being surveyed.

retrospective study

A retrospective study requires individuals to look back in time or requires the researcher to look at existing records.

statistic

A statistic is a numerical summary of a sample.
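A minimal Python sketch of the parameter/statistic distinction, assuming a hypothetical club of 500 members with randomly generated ages and an arbitrary seed: the mean of every member's age is a parameter, while the mean of a sample of 25 members is a statistic.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: the ages of all 500 members of a club.
population_ages = [rng.randint(18, 80) for _ in range(500)]

# Parameter: a numerical summary of the whole population.
parameter = statistics.mean(population_ages)

# Statistic: the same summary computed from a sample of 25 members.
sample_ages = rng.sample(population_ages, k=25)
statistic = statistics.mean(sample_ages)

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The two numbers are computed the same way; what makes one a parameter and the other a statistic is whether the data describe the entire population or only a sample from it.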

individual

An individual is a person or object that is a member of the population being studied.

observational study

An observational study measures the value of the response variable without attempting to influence the value of either the response or explanatory variables.

open question

An open question allows the respondent to give his or her own response, rather than choosing from a list of predetermined responses.

confounding

Confounding in a study occurs when the effects of two or more explanatory variables are not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study. Confounding is potentially a major problem with observational studies. Often, the cause of confounding is a lurking variable.

descriptive statistics

Descriptive statistics consists of organizing and summarizing information collected.
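As a small illustration of organizing and summarizing, the sketch below uses hypothetical survey data from 10 students: a frequency table organizes the qualitative variable (major), and the mean and median summarize the quantitative variable (hours studied).

```python
from collections import Counter
import statistics

# Hypothetical survey responses collected from 10 students.
majors = ["Biology", "Business", "Biology", "Nursing", "Business",
          "Biology", "Psychology", "Nursing", "Biology", "Business"]
hours_studied = [5, 3, 6, 8, 2, 4, 7, 9, 5, 3]

# Organize the qualitative data into a frequency table.
frequency = Counter(majors)
for major, count in frequency.most_common():
    print(f"{major:<11}{count}")

# Summarize the quantitative data with numerical summaries.
print("mean hours:", statistics.mean(hours_studied))
print("median hours:", statistics.median(hours_studied))
```

This prints Biology 4, Business 3, Nursing 2, Psychology 1, followed by a mean of 5.2 hours and a median of 5.0 hours. Nothing here generalizes beyond these 10 students; doing that would be inferential statistics.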

cross-sectional vs case-control

Neither type of study is always superior to the other. Both have advantages and disadvantages that depend on the situation. Both are inexpensive and can be done relatively quickly. A case-control study is limited in that it requires individuals to recall information correctly and to answer questions truthfully. A cross-sectional study is limited in that it gives information only at a specific point in time or over a very short period of time, and it may miss valuable information about what happens outside that period.

nonsampling error

Nonsampling error is the error that results from the process of obtaining the data.

PRACTICE: Researchers wanted to determine if there was an association between the level of happiness of an individual and their risk of high blood pressure. The researchers studied 1546 people over the course of 88 years. During this 88-year period, they interviewed the individuals and asked questions about their daily lives and the hassles they face. In addition, hypothetical scenarios were presented to determine how each individual would handle the situation. These interviews were videotaped and studied to assess the emotions of the individuals. The researchers also determined which individuals in the study experienced any type of high blood pressure over the 88-year period. After their analysis, the researchers concluded that the happy individuals were less likely to experience high blood pressure.

Q1: What type of observational study was this?
A1: This was a cohort study, because information was collected about a group of individuals by observing them over a long period of time.
Q2: What is the response variable?
A2: The response variable is whether or not the individual developed high blood pressure, because it is the variable of interest.
Q3: What is the explanatory variable?
A3: The explanatory variable is level of happiness, because it is thought to affect the response variable.
Q4: In the report, the researchers stated that "the research team also hasn't ruled out that a common factor like genetics could be causing both the emotions and the high blood pressure." Explain what this sentence means.
A4: The researchers may be concerned about confounding, which occurs when the effects of two or more explanatory variables are not separated, or about a lurking variable (such as genetics) that was not considered in the study but that affects the value of the response variable.

PRACTICE: Researchers wanted to determine if there was an association between daily pomegranate consumption and the occurrence of high blood pressure. The researchers looked at 93,166 women and asked them to report their pomegranate-eating habits. The researchers also determined which of the women had high blood pressure. After their analysis, the researchers concluded that consumption of two or more servings of pomegranate per day was associated with a reduction in high blood pressure.

Q1: What type of observational study was this?
A1: This was a cross-sectional study, because all information about the individuals was collected at a specific point in time.
Q2: What is the response variable in the study? Is the response variable qualitative or quantitative?
A2: The response variable is whether or not the woman has high blood pressure. The response variable is qualitative.
Q3: What is the explanatory variable?
A3: The explanatory variable is consumption of pomegranate.
Q4: In their report, the researchers stated that "After adjusting for various demographic and lifestyle variables, daily consumption of two or more servings was associated with a 30% reduced prevalence of high blood pressure." Why was it important to adjust for these variables?
A4: The researchers may be concerned about confounding, which occurs when the effects of two or more explanatory variables are not separated, or about variables that were not considered in the study but that affect the value of the response variable. Adjusting for demographic and lifestyle variables helps separate their effects from the effect of pomegranate consumption.

sampling error

Sampling error is the error that results because a sample is being used to estimate information about a population. This type of error occurs because a sample gives incomplete information about a population.
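The sketch below shows sampling error under purely hypothetical assumptions: a population of 1,000 measurements generated from a normal distribution with mean 100 and standard deviation 15, an arbitrary seed, and five samples of 30. Every value is recorded perfectly, so there is no nonsampling error, yet each sample mean still misses the population mean.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical population of 1,000 measurements.
population = [rng.gauss(100, 15) for _ in range(1_000)]
parameter = statistics.mean(population)

# Each sample gives incomplete information about the population, so each
# sample mean misses the population mean by a different amount, even
# though every value is recorded without any mistake.
errors = []
for trial in range(1, 6):
    sample = rng.sample(population, k=30)
    error = statistics.mean(sample) - parameter
    errors.append(error)
    print(f"sample {trial}: sampling error = {error:+.2f}")
```

The errors differ from sample to sample and are both positive and negative in general, which is why inferential statistics pairs an estimate with a measure of its reliability.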

PRACTICE: Suppose that a magazine predicted that Candidate A would defeat Candidate B in a certain election. It conducted a poll of individuals listed in telephone directories, and the poll had a response rate of 23%. On the basis of the results, the magazine predicted that Candidate A would win with 57% of the popular vote. However, Candidate B won the election with about 62% of the popular vote. At the time of this poll, most households with telephones belonged to the party of Candidate A. Name two biases that led to this incorrect prediction.

Sampling bias: Using an incorrect frame (telephone directories, which favored households from Candidate A's party) led to undercoverage.
Nonresponse bias: The low response rate (23%) caused bias.

yields

Similar individuals will not necessarily yield the same data. People can be different heights, different weights, different ages, and so on. In addition, a person's age will change over time, and their height and weight can change over time as well. Measuring two people's heights at the same time, or measuring one person's height at different times, can yield different results.

closed vs open question

Closed questions limit the possible responses, which makes them easier to analyze. Open questions allow respondents to state exactly how they feel, but they are harder to analyze because of the variety of answers and the chance of misinterpreting an answer.

population

The entire group of individuals to be studied is called the population.

frame

The frame is a list of the individuals in the population being studied. If the population of interest is all the students at a school, the frame would be a list of all the students currently attending that school. It is rare for frames to be accurate because frames are obtained periodically, whereas populations are constantly changing. For example, a frame that consists of all of the students in a school would be inaccurate as soon as any student leaves the school or any new student joins the school.

simple random sampling

The most basic sample survey design is simple random sampling. A sample of size n from a population of size N is obtained through simple random sampling if every possible sample of size n is equally likely to occur. The sample is then called a simple random sample. To obtain a simple random sample, each individual in the population is typically assigned a unique number between 1 and N, where N is the size of the population. Then n distinct random numbers from this list are selected, where n represents the size of the sample. To number the individuals in the population, one needs a frame, that is, a list of all the individuals within the population.
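The numbering procedure described above can be sketched in Python as follows. The function name `simple_random_sample` and the eight-student frame are hypothetical; the random draw relies on the standard library's `Random.sample`, which makes every subset of size n equally likely, matching the definition of a simple random sample.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Hypothetical helper: return a simple random sample of size n
    from a frame (a list of every individual in the population)."""
    N = len(frame)
    if not 0 <= n <= N:
        raise ValueError("sample size must be between 0 and N")
    rng = random.Random(seed)
    # Step 1: assign each individual a unique number from 1 to N.
    numbered = {number: individual
                for number, individual in enumerate(frame, start=1)}
    # Step 2: select n distinct random numbers between 1 and N.
    chosen = rng.sample(range(1, N + 1), k=n)
    # Step 3: the sample consists of the individuals with those numbers.
    return [numbered[number] for number in chosen]

# Hypothetical frame of 8 students.
frame = ["Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus", "Hana"]
print(simple_random_sample(frame, n=3, seed=1))
```

Calling `rng.sample(frame, k=n)` directly would give the same kind of sample in one step; the explicit numbering is kept only so the code mirrors the steps in the definition.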

response variable

The response variable is the variable of interest to be measured in the study. The value of the response variable is affected by the explanatory variable.

categories of observational studies

There are three major categories of observational studies: cross-sectional studies, case-control studies, and cohort studies.

variables

Variables are the characteristics of the individuals within the population. If variables did not vary, they would be constants, and statistical inference would not be necessary.

under-represented

When a part of the population is proportionally smaller in a sample than in the population, this part of the population has been under-represented. This could be caused by many different types of bias, or even by random chance.
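Checking for under-representation comes down to comparing two proportions. The sketch below assumes hypothetical data in which 30% of a population is rural but only 2 of 20 sampled individuals are; the helper name `representation_gap` is illustrative, not a standard function.

```python
def representation_gap(population, sample, segment):
    """Hypothetical helper: return (proportion in population,
    proportion in sample) for one segment of the population."""
    pop_share = population.count(segment) / len(population)
    sample_share = sample.count(segment) / len(sample)
    return pop_share, sample_share

# Hypothetical data: 30% of the population is rural, but only 2 of 20 sampled are.
population = ["rural"] * 300 + ["urban"] * 700
sample = ["rural"] * 2 + ["urban"] * 18

pop_share, sample_share = representation_gap(population, sample, "rural")
print(f"population: {pop_share:.0%}, sample: {sample_share:.0%}")
if sample_share < pop_share:
    print("rural residents are under-represented in this sample")
```

The comparison only detects the under-representation; it cannot say whether the cause was a biased sampling technique or random chance.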

case-control studies

Case-control studies are observational studies that are retrospective, meaning that they require individuals to look back in time or require the researcher to look at existing records. In case-control studies, individuals who have a certain characteristic are matched with those who do not. A disadvantage of this type of study is that it requires individuals to recall information from the past. It also requires the individuals to be truthful in their responses. An advantage of case-control studies is that they are relatively inexpensive to conduct and can be done relatively quickly.

cross-sectional studies

Cross-sectional studies are observational studies that collect information about individuals at a specific point in time or over a very short period of time. For example, a researcher might want to assess the risk associated with smoking by looking at a group of people, determining how many are smokers, and comparing the incidence rate of lung cancer among the smokers to that among the nonsmokers. A clear advantage of cross-sectional studies is that they are cheap and quick to do. However, cross-sectional studies have limitations. For the lung cancer study, it could be that individuals develop cancer after the data are collected, so the study will not give the full picture.
