MITx: Data Collection & Management


Which of the following performance strategies are NOT advisable Be approachable Ensure prompt payments End-of-survey round bonuses "Piece-rate" compensation (paying per survey completed) Providing surveyors with the potential for professional development (if possible)

"Piece-rate" compensation (paying per survey completed) Explanation As Asman mentions, it is not advisable to pay by survey as it may evoke negative motivation. For example, a surveyor may falsify data to show a higher number of 'completed' surveys to secure additional compensation

CAPI vs PAPI selection:

- Complexity of survey - Time to launch - How quickly do I need incoming data - Skill / training of enumerators - Hardware / infrastructure availability - Connectivity / security - Budget

Surveyor instruction guidelines:

- Include precise definitions of all key concepts - Clear instructions (when answers should be prompted) - Clearly distinguish between instructions and what should be read out - Select one / multiple choices

Advantages of CAPI resulting in fewer errors:

- Fewer errors due to logic checks and automated skip patterns - Quicker access to data for monitoring and quality control - Advanced survey auditing tools (GPS, audio recording, timer etc.) - Dynamic data loading across devices / surveys / questionnaires - Dynamic choice options

Survey settings details:

1. Determine target respondent (most knowledgeable person) 2. Consent / Informed consent in case of children 3. Location and comfort 4. External presences / social pressure

Match the survey team member to his/her role in the team 1. Enumerator 2. Editor 3. Tracker 4. Supervisor 5. Field Manager

1. Enumerator: Asks questions to respondents and records responses 2. Editor: Checks if questionnaire is complete and if there are any errors before sending it to the back office 3. Tracker: Identifies and executes strategies to interview respondents who are unavailable in the initial visit 4. Supervisor: Manages people who collect data and checks for the quality of data collected 5. Field Manager: Responsible for recruiting, training and daily management of the team

Questionnaire design phases

1. Indicator of interest 2. Draft 3. Pilot 4. Revise draft (and pilot again / go to 3) 5. Finalize

Modes of data collection in survey:

1. Paper based 2. Digital

Surveyor management details:

1. Recruitment 2. Training 3. Incentives 4. Management

Steps in drafting a questionnaire:

1. Start with previous surveys 2. Question and response wording 3. Organization / flow of questions and responses 4. Formatting

Modes of survey:

1. Surveyor administered: Higher response rate 2. Self administered

Suppose you need a team of 50 surveyors. How many individuals would you put through a surveyor training, given the recommendations from the lecture? 35 50 60 65 100 200 Don't know

65. As Asman mentions, we recommend training 30% more individuals than needed for the survey team. So for a team of 50, it is recommended to train around 65 individuals.

The main reason we break a question down into as many components as possible is: To identify if any word or words may have alternate meanings To identify leading questions To identify questions that are potentially sensitive

A Explanation As Marc mentions, it is useful to break the sentence into individual words - its subject, its object, the verb, and so on- to ensure each question we ask (and each response given) is unambiguous. Breaking the sentence down, and almost trying to break the intended meaning of the sentence, will help us find many of the pitfalls that could lead to ambiguity.

When conducting data entry operations in-house, we should always invest in a power backup solution (e.g. Uninterrupted Power Supply or UPS) so that, at a minimum, We have enough time to save whatever data has been entered and we can safely close the program Data entry can occur uninterrupted during work hours Data entry can continue at least one hour after a power cut Data entry can occur around the clock

A Explanation If we need electricity for 24 hours/day or 10 hours/day, we want to select a location that has a reliable power supply, and should not rely on self-generated power, or a power backup. The power back up is primarily to ensure we do not lose data we've already entered.

What are the benefits of outsourcing data entry operations? (Select all that apply) We would not need to invest in hardware (computer, power backups, etc) We would not need to invest time in learning how to program software (assuming we have no prior knowledge) We can spend less time or effort on testing data entry software We can be assured that data entry will be of high quality

A, B Explanation While we can save investments in time and money on space and learning the software, mistakes may be less transparent to us, so we need to invest more time to test the software ourselves. Also, data entry firms are not always accustomed to clients who require such a low error rate. So we need to be even more diligent about tracking quality.

When collecting data with paper surveys, what data entry options might be available? (select all that apply) Manual data entry of data from paper surveys Scanning paper surveys and having data entered remotely Optical character recognition None of the above

A, B, C Explanation Prathap listed all three of these as methods that can be used for entering information from paper surveys.

When publishing data, the code book should be created: Before data are collected Before each round of data collection begins After each round of data collection After the dataset is final, before data publication After data are published

After the dataset is final, before data publication

Which of the following are context effects (Select all that apply)? Anchoring Bias Framing Effect Recency Effect Primacy Effect Telescoping Bias

Anchoring Bias Framing Effect Recency Effect Primacy Effect Explanation Telescoping bias is a measurement error. The others listed above are affected by contextual factors such as question or response order, survey setting, etc.

As recommended by Asman: if we need a minimum of 50 surveyors for our data collection, how many (prospective) surveyors should there be at each stage? (Match the stage with the appropriate number) Position hired (assuming there may be attrition) Applications received Trained Tested

Applications received = 10x = 500 Tested = 5x = 250 Trained = requirement + 30% = 65 Positions hired = requirement + 20% = 60
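The ratios above can be turned into a small calculation. This is a minimal sketch: the function name `recruitment_funnel` is hypothetical, and rounding up to whole people is an assumption that the source does not state.

```python
def recruitment_funnel(required: int) -> dict[str, int]:
    """Headcount targets at each recruitment stage for `required` surveyors.

    Integer arithmetic is used so that the +30% and +20% margins round up
    to whole people (rounding up is an assumption, not from the lecture).
    """
    return {
        "applications_received": 10 * required,   # 10x the requirement
        "tested": 5 * required,                   # 5x the requirement
        "trained": (required * 13 + 9) // 10,     # requirement + 30%, rounded up
        "hired": (required * 12 + 9) // 10,       # requirement + 20%, rounded up
    }


print(recruitment_funnel(50))
```

For a requirement of 50 surveyors this reproduces the figures in the answer: 500 applications, 250 tested, 65 trained, 60 hired.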

What is the main reason it is unadvisable to pay data entry operators (DEOs) per survey entered? It increases the likelihood that respondents' identities are compromised It encourages speed over quality Many are motivated by career advancement rather than their pay Many are intrinsically motivated, and take pride in their work It encourages DEOs to enter data slowly

B Explanation Piece rate compensation encourages DEOs to go fast, not slow. It should not affect PII. In fact, in an ideal setup, DEOs do not have access to most PII. While there are other factors that motivate DEOs in their work, paying per survey explicitly incentivizes volume of surveys over accuracy.

Match each form of software check to the appropriate activity and purpose. Entering mock values to ensure responses can be entered Entering mock values to ensure skips work properly Checking that batteries have a long-enough life for field work Checking that data entered are stored accurately

Bench Bench Device Data flow Explanation Bench testing ensures the skips and validations are working. Device testing is meant to check whether the hardware and software function. Data flow testing is to ensure the integrity of data is maintained.

If a survey contains sensitive questions, it's often best (select all that apply) : To ensure the respondent has the support of family members by encouraging them to be present during the survey That the respondent is aware of the surveyor's position of authority (education, socio-economic status) relative to the respondent That gender dynamics between the surveyor and respondent are considered when the surveyors are recruited That the survey be conducted in a neutral location such as a community center or public market

C Explanation When engaging in the social interaction of a survey, we're usually more likely to get honest, accurate responses if the interviewee is comfortable. We also need to be aware of the power dynamics between the respondent and his or her family members, or other members of the community. Having other people around could put our respondent in danger of harm if, for example, she answers questions in a way that could anger the men in her household. In the middle, there's the risk that her responses may not be accurate if others pressure her to answer a certain way, either consciously or subconsciously. In such cases, it's usually wise to have a strategy on how to deal with family, other company, onlookers or crowds. It is also useful to consider gender dynamics when interviewing. For example, when asking about sexual behavior of a female respondent, it is best if she is interviewed by a female surveyor.

Which of the following (set of) questions is most leading i.e. likely to bias answers in a certain direction? (Choose one) Do any of your children currently attend school regularly? Would you be willing to pay for an insecticide-treated bed net? [If yes] How much would you be willing to pay? How strongly do you agree with the administration's plan to improve the nation's health insurance?

C. Explanation The question "How strongly do you agree with the administration's plan to improve the nation's health insurance?" is leading. The question assumes that the respondent agrees with the administration's plan to improve health insurance.

In terms of time investment, CAPI is front-loaded and PAPI is more back-loaded PAPI is front-loaded (i.e., more work up front) and CAPI is more backloaded (i.e., more work at the end) Both take the same amount of time

CAPI is front-loaded and PAPI is more back-loaded Explanation It takes time to build the questionnaire (including skips, validations, audio audits etc.) on CAPI, but since there is no data entry phase, quality checks and data analysis are almost instantaneous. In that sense, CAPI is more front-loaded and less back-loaded.

CAPI vs PAPI quality control, data cleaning and availability:

CAPI: - Automated flow of control - Automated logic checks, skip patterns, prefilling info - Secret recording for later audit and review - Data cleaning can begin at the end of the first day - Revisits make faults easy to rectify (within 24 hours) - Data available right away (within 1-2 days) PAPI: - Needs extra step for a rigorous review process after data collection - Typically needs "Scrutinizers" to review completed questionnaires and catch enumerator errors - Correction can take long and be expensive - Data cleaning begins after data entry is completed (typically a delay of months) - Data is available only after entry and cleaning (takes months)

CAPI vs PAPI cost, technology, software and initial setup logistics:

CAPI: - High cost of starting as computer and tablets needed initially - Requires coding the survey template - Data entry cost is avoided here - Different software options available (with range of pricing) - No printing required (considerable cost saving) - Corrections are easy PAPI: - Low initial cost - Does not require coding the survey template - Computer needed for data entry later (high cost later) - For data entry software, some options available - Printing required (expensive) - Corrections are cumbersome, requiring more printing

CAPI vs PAPI enumerator training:

CAPI: - Skip patterns and field checks automated, enumerators can be trained quickly. - If enumerators have no experience with digital devices, training may take longer PAPI: - Complex and numerous skip patterns difficult to master, training takes longer.

CAPI vs PAPI survey logistics:

CAPI: - Typically transported by USB or over internet and stored at a server - Needs encryption and other precautions to ensure data is secure in transit and at rest PAPI: - Secure transport system needed to transport surveys from field to office, where they are then securely stored. - Secure digital storage post data entry

CAPI vs PAPI data entry:

CAPI: - Data is digitized in the field and available instantly. PAPI: - Leads to delay - Data entry template needed. - Trained data entry operator needed - Requires double entry and correction checks

CAPI vs PAPI template:

CAPI: - Needs to be designed before data collection - Longer time to field - Template loaded on device in the field - Data recorded directly on device PAPI: - Needs to be designed before data collection - Shorter time to field - Data entry template needed before data entry can start

CAPI vs PAPI question types:

CAPI: - Supports all types of responses. - More options for rich media - Can capture additional data (GPS, audio recording, barcode, drawing, signature) PAPI: - Supports single / multiple choice and write in types of responses - Requires separate device for additional data like GPS, camera - Can capture signature, drawing

With respect to cost, hiring a survey firm is: More expensive than conducting survey in-house Less expensive than conducting survey in-house Can be either more or less expensive

Can be either more or less expensive Explanation While survey firms often charge more (on overhead) to make a profit, they are also able to reduce costs along other dimensions. For example, they can afford to pay surveyors lower salaries because, in return, they can offer surveyors more long-term job security.

Validation:

Check for consistencies and bounds. In paper surveys: as instructions or stops / surveyor checks. In digital surveys: as "constraints" (logical / cross logic conflict).

If a skip pattern is ignored, and questions that should have been skipped are filled out, this is an error of: Assimilation Omission Attrition Commission

Commission. Skip patterns can result in two types of errors: errors of commission, where a skip should have been followed but wasn't, and errors of omission, where a question that should not be skipped is not asked.
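The two error types can be illustrated with a single skip rule. This is a sketch under stated assumptions: the questions (`has_children`, `children_in_school`) and the rule linking them are hypothetical, and `None` stands for a field left blank.

```python
def skip_error(has_children: str, children_in_school):
    """Classify a skip-pattern error for one record.

    Hypothetical rule: `children_in_school` is asked only when
    `has_children == "yes"`; None means the field was left blank.
    """
    should_be_asked = has_children == "yes"
    answered = children_in_school is not None
    if answered and not should_be_asked:
        return "commission"  # filled in although it should have been skipped
    if should_be_asked and not answered:
        return "omission"    # left blank although it should have been asked
    return None              # skip pattern followed correctly


print(skip_error("no", 2))      # skip ignored
print(skip_error("yes", None))  # question wrongly skipped
```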

Why is it important to include special codes, and why in the form of: -999, -888, etc? (Select all that apply) We need to be able to distinguish between different respondents who happen to have the same name It allows our data entry operators to enter data more quickly If some questions are updated mid-survey round, it allows us to know which version was given to the respondent Values may be missing for different reasons with different interpretations We do not want missing values to be within the normal range of valid responses None of the above

D, E Explanation Special codes such as -999, etc., are for missing values. We want to know why a value is missing because a respondent not knowing the answer suggests a very different interpretation from refusing to answer. Using negative values lets us avoid confusing a missing response with real data (e.g., age). The other options relate to unique IDs and questionnaire design.
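A short sketch of why the codes matter at analysis time. The mapping of -999 to "don't know" and -888 to "refused" is an assumption for illustration (the source only says the codes distinguish reasons for missingness), and `split_missing` is a hypothetical helper.

```python
# Assumed mapping; a real survey defines its own codes in the code book.
MISSING_CODES = {-999: "don't know", -888: "refused"}


def split_missing(values):
    """Separate valid responses from special codes and count each reason."""
    valid = [v for v in values if v not in MISSING_CODES]
    reasons = {}
    for v in values:
        if v in MISSING_CODES:
            label = MISSING_CODES[v]
            reasons[label] = reasons.get(label, 0) + 1
    return valid, reasons


ages = [34, -999, 27, -888, -999, 51]
valid_ages, missing_reasons = split_missing(ages)
mean_age = sum(valid_ages) / len(valid_ages)  # computed on valid ages only
print(valid_ages, missing_reasons, round(mean_age, 1))
```

Because the codes are negative, they can never be mistaken for a real age, and dropping them before averaging keeps them from distorting the mean.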

After entering our "audit" sample, to which dataset should we compare the resulting audited data? The dataset from the first entry The dataset from the second entry The list of errors we found after comparing the first and second entry The reconciled dataset from the first two entries after correcting for errors

D. The reconciled dataset from the first two entries after correcting for errors Explanation Double data entry is part of the data entry process. When we want to know the data entry error rate of our "dataset," we care about the latest, cleanest dataset we have.
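The comparison steps can be sketched as follows. The record layout (a dict keyed by unique ID), the field names, and both helper functions are hypothetical; the sketch assumes the audit sample and the reconciled dataset share the same fields.

```python
def discrepancies(entry_a, entry_b):
    """Return (unique_id, field) pairs where two entries of the same survey differ."""
    diffs = []
    for uid in sorted(entry_a.keys() & entry_b.keys()):
        for field in sorted(entry_a[uid]):
            if entry_a[uid][field] != entry_b[uid].get(field):
                diffs.append((uid, field))
    return diffs


def error_rate(audit, reconciled):
    """Share of audited fields that disagree with the reconciled dataset."""
    checked = sum(len(audit[uid]) for uid in audit if uid in reconciled)
    return len(discrepancies(audit, reconciled)) / checked


first = {"H001": {"age": 34, "hh_size": 5}, "H002": {"age": 27, "hh_size": 3}}
second = {"H001": {"age": 43, "hh_size": 5}, "H002": {"age": 27, "hh_size": 3}}
print(discrepancies(first, second))  # to be resolved against the paper survey

# Reconciled dataset after correcting the flagged field, then an audit subsample.
reconciled = {"H001": {"age": 34, "hh_size": 5}, "H002": {"age": 27, "hh_size": 3}}
audit = {"H002": {"age": 27, "hh_size": 4}}
print(error_rate(audit, reconciled))
```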

One benefit of PAPI (as compared to CAPI) is: No need to invest in software development There is less opportunity for measurement error It is easier for surveyors to follow instructions The paper trail makes it easier to catch surveyor error Changes or corrections to the survey can be deployed more quickly Data collection can potentially begin sooner because there is no need to invest in learning CAPI software

Data collection can potentially begin sooner because there is no need to invest in learning CAPI software Explanation Identifying errors, making changes or corrections is harder in PAPI surveys. However, as Chris mentions, PAPI allows us to get to the field quickly when compared to CAPI which requires more time investment upfront.

Which process is used for creating digital data collection interface but not for creating manual data entry interface (Select all that apply) Creation of a variable list (survey specification) Extensive testing of the software (e.g. with real or mock data) Device testing (e.g. battery life, extreme conditions) Check resulting data to ensure no data are missing after entry None of the above

Device testing (e.g. battery life, extreme conditions) Explanation Digital data collection is actually a form of data entry (where data are collected and entered simultaneously). The only difference is that the devices used are different. In digital data collection, devices are mobile. In manual data entry, the computers typically have a continual power source.

What problem(s) can you identify with the following question: "In the coming election, do you support Party A and not Party B?" (Select all that apply) Double barreled Leading question Use of double negatives Jargon-filled

Double barreled Leading question Explanation The question is double barreled: it refers to both Party A and Party B, and assumes that if one supports Party A, they are not likely to support Party B, which may not be true. It also creates an impression that the surveyor wants the respondent to support Party A.

If the research team never comes into contact with individuals in the study, they do not need to get IRB approval to use the individuals' administrative data. True False

False.

Which data collected is typically used in analysis? (select all that apply): Final and intermediate outcomes Personally identifiable information Unique IDs Covariates to improve precision Baseline characteristics for sub-group analysis Compliance, and predictors of compliance Context and local conditions Process, implementation Behavioral responses

Final and intermediate outcomes Covariates to improve precision Baseline characteristics for sub-group analysis Compliance, and predictors of compliance Context and local conditions Process, implementation Behavioral responses Explanation Data collected on personally identifiable information and unique IDs should not be used for analysis. All other data collected can be used for analysis.

What is the correct sequence of events when recruiting the survey team?

1. First level screening 2. Application review 3. Testing 4. Letters of reference 5. Training 6. Final hiring and contracting

Validations, Conditions, or Constraints, are meant to prevent or correct (Select all that apply) Minor typos (such as recording someone's age as 25 rather than 24) Illogical responses (such as answering "Red" when a number response is expected) Quantities that are unrealistic (such as visiting the school 100 times in the past week) Response options that are overlapping or non-exhaustive (for example daily, weekly, or monthly) Selecting the wrong response option (such as very satisfied rather than satisfied)

Illogical responses (such as answering "Red" when a number response is expected) Quantities that are unrealistic (such as visiting the school 100 times in the past week) Explanation Validations are used to ensure that responses are not illogical or unrealistic. Marc uses the age of the respondent and a month entered as "13" as examples. Validation checks cannot be used to catch enumerator errors such as typos or entering an incorrect response.
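A minimal sketch of such constraints, using the examples from the question. The field names, the bounds (age 0 to 120, at most 14 school visits per week), and the `validate` function are assumptions for illustration, not part of any particular CAPI tool.

```python
def validate(record):
    """Return a list of constraint violations for one response record."""
    problems = []

    age = record.get("age")
    if not isinstance(age, int):
        problems.append("age: a number is expected")   # e.g. "Red"
    elif not 0 <= age <= 120:                          # assumed plausible bounds
        problems.append("age: out of range")

    month = record.get("birth_month")
    if not isinstance(month, int) or not 1 <= month <= 12:
        problems.append("birth_month: must be between 1 and 12")

    visits = record.get("school_visits_last_week")
    if not isinstance(visits, int) or not 0 <= visits <= 14:  # assumed cap
        problems.append("school_visits_last_week: unrealistic quantity")

    return problems


print(validate({"age": "Red", "birth_month": 13, "school_visits_last_week": 100}))
print(validate({"age": 25, "birth_month": 6, "school_visits_last_week": 2}))
```

The second record passes even if the respondent is really 24: a value that is wrong but plausible is exactly the kind of typo a validation cannot catch.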

Advantages of in-person (PAPI/CAPI) versus automated surveys (CAWI/CATI/SMS/IVR):

In-person surveys have the lowest attrition rate of all modes. CAWI: Computer-Assisted Web Interviewing (requires literate and extremely motivated respondents) CATI: Computer-Assisted Telephone Interviewing (enumerator interviews over telephone and enters data into computer) IVR: Interactive Voice Response SMS: Short Message Service

What problem(s) can you identify with the following question: "Do you not agree that it is not ok to smoke cigarettes?" (Select all that apply) Double barreled Leading question Use of negatives Jargon-filled

Leading question Use of negatives Explanation This question has too many negatives and can be leading.

If the surveyor records a respondent's response incorrectly on the questionnaire, at what point will this likely be caught in the data entry process First entry Second entry Comparison of first and second entry Comparison of list of discrepancies to original survey None of the above

None of the above The double data entry process is designed to catch data entry error, not surveyor error.

Which of the following is NOT typically part of the questionnaire piloting stage (Select all that apply) Focus groups Surveyor debriefing Surveyor training Data analysis False launch None of the above

None of the above. All options listed are part of the questionnaire piloting phase.

The tendency to over-report responses at the beginning of a list of response options is known as: (Choose one) Recall bias Primacy effect Contrast effect Telescoping bias Assimilation effect Recency Effect

Primacy Effect Explanation As Marc mentions, research has shown that, for questions with a large set of response options, responses at either the beginning or the end are disproportionately more likely to be chosen. This is known as a primacy or recency effect. Primacy is when the response options at the beginning of a list are disproportionately chosen. Recency is when options at the end of a list are disproportionately chosen.

Which documents are included in the "manual" for the published data? (Select all that apply) The master do file The folder structure ReadMe file Code book Analysis code None of the above

ReadMe file Code Book

In digital data collection case management, assigning a "case" to a "user" means the same thing as assigning: A questionnaire to a surveyor A questionnaire to a response unit Response unit to a surveyor None of the above

Response unit to a surveyor Explanation Each case refers to a respondent (response unit) with a unique ID. A user (surveyor) is the person who uses the digital survey tool to elicit responses. So, assigning case to a user means assigning a respondent to a surveyor.

Which of the following can be true about the head-of-household? (Select all that apply) Not necessarily knowledgeable about the topics we care about in a survey Household members may not be familiar with the term May be the eldest member of the household May not be the appropriate respondent for questions about the household

Select All Explanation The head of the household is typically the eldest member of the household. As Marc mentions, the household head isn't always the best person to provide accurate answers and may not have the knowledge about the topics covered in a survey.

Which of the following should be recorded in tracking sheets in the field? (Select all that apply) Date of visit Whether respondent was found Reason respondent was not found Whether survey was completed Whether supervisor was present Surveyor (ID) who conducted survey

Select All. Please see slide 20 of lecture "Collecting High quality data - complete data" (week 8) for a typical example of a tracking sheet showing the necessary information

Reporting bias in administrative data is analogous to _______ in survey data Sampling bias Self-selection bias Social desirability bias Recall bias

Social Desirability Bias

For which of the following procedures is the surveyor likely to know not only whether but also precisely when a check is happening during the survey? Select all that apply Spot checks Accompaniments Back-checks Audio audits None of the above

Spot checks Accompaniments Explanation Since both spot checks and accompaniments involve observing surveys as they are being implemented, the surveyor is aware whether and precisely when the check is happening. When it comes to audio audits, the surveyor is aware that s/he is being recorded (based on the consent form), but s/he is not aware of precisely when the audio audit is initiated. The surveyor is not aware of when back-checks take place.

If we decide to hire a survey firm, which components of data collection can we delegate to the firm? (Select all that apply) Questionnaire development Surveyor Training Survey Logistics Recruitment Human resource management Quality control

Survey Logistics Recruitment Human resource management Explanation When hiring survey firms, it is ok to delegate logistics, recruitment and management of survey staff. It is important that the researcher continues to manage training and data quality.

What is the purpose of data?

To measure: - Outcome - Covariates (reduces standard error) - Takeup / Treatment compliance - Heterogeneous effects - Context for external validity

Which of the following should NOT be included in a tracking sheet? Respondent address Unique ID Date scheduled (first visit) Treatment status Surveyor (team) assigned Respondent name

Treatment status Explanation As Asman mentions, it is ideal if the surveyor is not aware of the treatment status to avoid any possible surveyor biases or effects

In the context of randomized evaluations, researchers can obtain administrative data from both public (i.e., governmental) and private institutions. True False

True

Survey specifications include (select all that apply): The digital hardware used to collect data The operating system of your hardware Unique IDs Variables Possible values Logical checks Skip patterns

Unique IDs Variables Possible values Logical checks Skip patterns Survey specification is a digital blueprint of the survey template and data to be collected. Specifically, it includes which information needs to be coded into the data collection software to reflect the survey.

Match the high-frequency check to its purpose: Validation of responses Percentage of "don't know" and "refusal" responses Completion by treatment status

Validation of responses: Logic check. Percentage of "don't know" and "refusal" responses: Enumerator check. Completion by treatment status: Project check.
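As an illustration of an enumerator check, the share of "don't know" and "refusal" responses can be tabulated per enumerator. This is a sketch: the codes -999 and -888, the `(enumerator_id, answer)` row layout, and the function name are all assumptions.

```python
from collections import defaultdict

NON_RESPONSE = {-999, -888}  # assumed codes for "don't know" and "refusal"


def non_response_share(rows):
    """rows: iterable of (enumerator_id, answer). Returns the share per enumerator."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for enumerator, answer in rows:
        totals[enumerator] += 1
        if answer in NON_RESPONSE:
            flagged[enumerator] += 1
    return {e: flagged[e] / totals[e] for e in totals}


rows = [("E1", 3), ("E1", -999), ("E2", 5), ("E2", 2), ("E1", -888), ("E1", 4)]
print(non_response_share(rows))
```

An enumerator whose share is far above the team's would be a candidate for follow-up, though what counts as "far above" is a judgment the source does not specify.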

Which method(s) can be used to transport CAPI data from the field to a database? (Select all that apply) Cellular network Wireless network USB A large envelope

a, b, c Explanation Several technologies such as cellular, wireless, USB devices or SD cards can be used to transport CAPI data from the field to a server.

Which of the following should be provided to prospective surveyors during the training? (Select all that apply) A survey manual Food or money for food Compensation for their time Stationery and other survey materials Training on human subjects and informed consent A detailed description of the intervention(s) and research design A contract for the full survey period

a, b, c, d, e. The first five options are examples of the material provided during training. It is not advisable to provide a detailed description of the intervention or the full contract (before the training is complete).

Which of the following can be done with CAPI? (Select all that apply) Random audio recordings Alerts when enumerators performing interviews too quickly Immediate rejection of out-of-range responses None of the above

a, b, c. Explanation As Chris mentions, a number of quality control tools can be applied when using CAPI methods.

The following are questions that you might ask the data provider to learn more about the data universe and the data content. Which of the following questions should you ask to learn more about the data content? (select all that apply) a. Do you have a data dictionary that documents the variables you collect, the context in which they were collected, and a range of possible values? b. Is your data individual-level data or aggregate-level data? c. Which births are captured in the data? d. Why would some births be included and some excluded? e. Are birth certificate data actively reported by an individual or are they collected passively? f. Does birth certificate data contain variables that indicate low birthweight?

a, b, e, f. Explanation With primary data collection, researchers determine the data content by selecting a list of questions and deciding how to ask each question. On the other hand, with administrative data, the data content is predetermined by the needs of the program and data agency. The answers to the questions above would help you gain better insight into the data content by understanding which indicators are available, how they were collected, and ranges of possible values.

Respondents' Unique IDs can potentially be used to: (Select all that apply) Link respondents' personally identifiable information to their responses Link different tables with a different structure in a relational database. Link raw data to analysis code to analysis output None of the above

a, b.

When planning for back-checks, we take only a subsample of... (Select all that apply) Respondents to be back-checked Questions to be asked in the back-check questionnaire Surveyors who will be back-checked None of the above

a, b. Explanation Back-checking is only done on a subset of respondents. The back-check questionnaire does not include all questions in the original survey. It only includes a subset of questions that fall under specific categories as mentioned in the previous video. However all surveyors should be backchecked.
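One way to draw such a subsample is sketched below: a few respondents per surveyor, so that every surveyor is covered. The data layout, the `per_surveyor` parameter, and the fixed seed are assumptions for illustration.

```python
import random


def backcheck_sample(assignments, per_surveyor, seed=0):
    """Pick respondents to back-check, covering every surveyor.

    assignments: dict mapping surveyor ID -> list of respondent unique IDs.
    Returns a dict with at most `per_surveyor` respondents for each surveyor.
    """
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    return {
        surveyor: sorted(rng.sample(ids, min(per_surveyor, len(ids))))
        for surveyor, ids in assignments.items()
    }


assignments = {
    "S1": ["H001", "H002", "H003", "H004"],
    "S2": ["H005", "H006"],
    "S3": ["H007"],
}
print(backcheck_sample(assignments, per_surveyor=2, seed=42))
```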

Which of the following strategies are used to protect personally identifiable information (PII) (Select all that apply) Include a unique ID on every sheet of the survey Include PII on every page of the survey Keep PII clustered together on a few pages of the survey Separate the record information bundle from the rest of the survey as soon as possible Store the survey together with the PII in the same locked cabinet None of the above

a, c, d. Explanation As Marc mentions, we do three things to protect PII at the questionnaire design stage: (a) We try to put all of the identifying information, or the record information bundle, up front at the beginning, so the page (or pages) can be detached (b) We ensure unique IDs (different from PII) are filled in on each page of the survey and (c) We separate the record information bundle from the rest of the survey as soon as possible
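The digital analogue of detaching the record information bundle can be sketched as splitting each record into two parts that share only the unique ID. The field names in `PII_FIELDS` and the `split_record` helper are hypothetical.

```python
PII_FIELDS = {"name", "address", "phone"}  # hypothetical list of identifying fields


def split_record(record):
    """Split one survey record into a PII part and a de-identified part.

    Both parts keep the unique ID, so they can be re-linked later by
    someone authorized to hold both.
    """
    uid = record["unique_id"]
    pii = {k: v for k, v in record.items() if k in PII_FIELDS}
    pii["unique_id"] = uid
    deidentified = {k: v for k, v in record.items() if k not in PII_FIELDS}
    return pii, deidentified


record = {
    "unique_id": "H001",
    "name": "Respondent A",
    "address": "Village 1",
    "phone": "000",
    "age": 34,
    "hh_size": 5,
}
pii_part, survey_part = split_record(record)
print(survey_part)  # contains no identifying fields
```

The two parts would then be stored separately, with the PII part under tighter access control.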

Which characteristics do back-checks and audio audits share? (Select all that apply) Done on all enumerators Require revisits of households Usually conducted with a subset of questions Enumerators are unaware of which questions are audited None of the above

a, c, d. Explanation Household revisits do not apply to audio audits. In fact, audio audits make household visits for the most part unnecessary. All other options (except for none of the above) are true for both back-checks and audio audits.

For large-scale data collection, which of the following reasons would likely require we use paper surveys over digital data collection (Select all that apply)? If we do not have enough time to work on digital data collection software prior to survey launch If we have a limited budget If we are administering exams to children If we are collecting data from within a factory that does not allow digital devices for fear of losing intellectual property

a, c, d. Explanation Paper surveys can be deployed faster than digital surveys because they do not necessarily require up-front work on data collection software. Exams for children, especially in settings with low digital literacy, require paper because we do not wish to confound testing of educational concepts and comfort with digital devices. Some companies would not allow digital devices inside their premises out of fear of visual or audio recording of trade secrets. Paper surveys are not necessarily (and usually not) cheaper than digital data collection, although this may not be true for smaller, shorter surveys.

Compared to survey data, administrative data are less susceptible to _______ bias because _______. recall, data are collected at the time of occurrence social desirability, administrative data are always self-reported attrition, both treatment and control individuals are guaranteed to never drop out of administrative databases. All of the above

a.

What is the largest potential risk of conducting spot checks and accompaniments? They can lead to or exacerbate social desirability bias They must be planned in advance They cannot catch fraud, since the surveyor is aware of the supervisor's presence They can only catch errors on a subsample of questions

a. Explanation As Chris mentions, it is important to be mindful about how observers are introduced to the respondent during spot checks and accompaniments. It is best if they are introduced in a general, non-threatening way, so that the respondent is not nervous about the extra scrutiny. For example, the enumerator might just say that they have a colleague joining for the interview. The ultimate concern, of course, is that an extra observer might bias the data in some way, maybe by increasing the social desirability effect whereby respondents are uncomfortable giving unpopular or embarrassing responses

Which of the following is the best reason to start your survey with a module on demographics? Because demographics are considered less intrusive in some contexts and allow the surveyor to build a rapport with the respondent Because collecting demographic information does not require informed consent Because it contains sensitive questions that we want answered earlier in case the respondent chooses to end the survey prematurely Because it contains identifiers which are critical for later analysis

a. Explanation In most developing countries, demographic questions are considered less intrusive. Opening the survey with more generic questions allows the surveyor to build a rapport with the respondent and ensures the latter's engagement through the survey

The primary purpose of survey tracking is to ensure that: All (willing) respondents are surveyed Respondents' private information is not compromised The control group is not contaminated Respondents are aware of what time surveyors will arrive The questionnaire does not take too long to administer

a. Explanation Respondent tracking is a process undertaken to ensure that every individual in our sample who is willing to participate is surveyed. It involves a set of activities that defines the direct process of data collection that takes place both in the field and in the office

Most evidence shows that CAPI (Select all that apply): Results in less measurement error than PAPI Results in higher attrition than PAPI surveys Increases the duration of a research project Increases the cost of research projects None of the above

a. Based on the study discussed by Chris in the previous video, using CAPI resulted in markedly fewer missing responses, markedly fewer responses that shouldn't have even been given, and markedly fewer responses that were clearly or likely in error, when compared to PAPI.

Which of the following can be checked reliably only with digital devices? Speed limits for questions and modules Entering data within valid ranges Following skip patterns correctly None of the above

a. For paper surveys, editors or scrutinizers can be used to check skip patterns and validity of responses, or most of the output for logical constraints. Speed is a process, and checking for speed limits can be best done only with digital devices

Which of the following are discussed in the safety component of training? (select all that apply) a. How to use GPS devices b. Where to store equipment at the end of each day c. How surveyors should dress d. Local traffic safety laws e. Which protective gear they should wear f. Where the survey should take place (e.g. inside or outside of the house) g. Sexual harassment h. How to deal with Standardized versus conversational styles i. Using a separate sample of respondents (not the study sample) for training and the false launch

b, c, d, e, f, g.

Which of the following should go into a survey plan? (Select all that apply) Research question Paper vs electronic data collection In-house vs outsourced data collection Local Permissions Pre-analysis plan

b, c, d. As Asman mentions, survey planning involves several stages including thinking about decisions related to modes of data collection, field team sourcing decisions and securing permission and authorizations to conduct the research

Which of the following are true about PAPI surveys (Select all that apply) Do not require any form of software interface for inputting data Can allow us to begin collecting data sooner Require separate device(s) for collecting GPS, photos, etc. Typically get us data quicker to analyze

b, c. Explanation PAPI requires investments in data entry software and data analysis can only begin after data entry is complete. It involves carrying additional devices to the field such as a GPS device, camera or an audio recorder, if our study requires them. That said, we can get to the field sooner when using PAPI surveys because we do not necessarily need to develop the software for digital devices before data collection

What are the disadvantages of IVR and SMS techniques when compared to PAPI and CAPI? (Select all that apply) It is difficult to analyze data from IVR and SMS IVR and SMS may be difficult to implement for long surveys Respondent cannot easily seek any clarifications on the question/survey None of the above

b, c. Explanation Since there is no direct contact with the surveyor, IVR and SMS techniques may not work for long surveys or if the respondent requires any clarification

What goals can be accomplished by reviewing audio audits in the office? (Select all that apply) We can verify which respondents gave consent to be audio recorded, and which did not We can review the way questions were asked to see if protocols are followed Review respondents' answers to check for accuracy None of the above

b, c. Explanation Verification of whether respondents provided consent to audio audit or not must be done before any audio is recorded. Audio audits can be used to review how questions were asked and also to verify the accuracy of responses.

When choosing identifiers for matching study data to administrative data, which of the following identifiers would be preferable to using an individual's street address (Select all that apply)? An email address A government-issued, unique identification number Date of birth None of the above

b, c. A government-issued, unique ID number and date of birth are correct because these are both numerical identifiers as opposed to identifiers comprised of letters and numbers. An email address is incorrect because people often have multiple email addresses, and they are prone to be misspelled because they often have letters and numbers.

We may want to review surveys used in prior research on the specific topic in our discipline because: (Select all that apply) It has already been field tested in our local context We can leverage the thought that went into producing the prior survey If we use the same questions, we can potentially merge the resulting data sets for meta analysis We want our questions, and therefore responses to be impartial None of the above

b, c. It is always useful to look at existing surveys, as we can use what their authors learned to develop our own instruments. It is also useful to reuse the same questions so that the resulting data sets can potentially be merged for meta-analysis. This was also discussed in Measuring Gender last week, where Rachel uses both DHS questions and questions created specifically for her survey in Bangladesh to understand if and how responses differ.

The metadata we track for the data collection process includes (Select all that apply) Questionnaire questions and response options Surveyor assignments All data from the field (i.e. responses from respondents) Completion rate Surveyor attrition Folder structure

b, d, e. Metadata for data collection are data that relate to details of the data collection process and progress (excluding the data themselves).

Which of these is an example of administrative data (Select all that apply)? Text from a tweet Information from a birth certificate Demographic information collected during a baseline survey Income information from tax records None of the above

b, d.

Sometimes, survey companies may ask for additional payments outside those mentioned in the contract. What are some of the reasons it is usually appropriate to make additional payment? (select all that apply) The company was not able to find as many enumerators as hoped, so the survey was in the field longer than expected The survey roll-out was delayed as the researcher requested changes to the survey and the enumerators had to be trained on the additional section It took longer to find respondents than anticipated The survey tool was delayed because IRB permission was not in place at the anticipated start date Some households had to be re-surveyed as the enumerators were found to be making up data None of the above. It is never appropriate to make additional payments not covered in the original contract

b, d. Explanation The main takeaway here is that as long as the delay is due to the research team, it is ok to make additional payments. However, the survey company usually need not be compensated for any delays due to problems with team management or their own ability to estimate costs. That said, the research team may choose to compensate the survey company if the request is reasonable (e.g. unforeseen events that could not have been expected by anyone) and it rewards an otherwise fruitful business relationship.

Why is it important to minimize attrition? (Select all that apply) a. Because attrition can introduce untruthful reporting by respondents b. Because attrition can reduce statistical power c. Because attrition expands the size of the sample d. Because attrition can effectively turn a random sample into a non-random sample

b, d. Explanation If you have a fixed sample, then the more attrition you have the lower your statistical power will be. Similarly, there is always the danger that respondents who leave the sample in the treatment group are different, and leave for different reasons, from those who leave in the control, which means that a random sample can, after attrition, no longer be random. This can bias results in unpredictable ways.
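
The link between attrition and power can be illustrated with a rough calculation. This is only a sketch: it assumes a two-arm comparison of means with equal arm sizes, a normal approximation, and an effect size expressed in standard deviations. The function name and the default values are hypothetical and do not come from the course.

```python
from math import sqrt
from statistics import NormalDist


def power_two_arm(n_per_arm, effect_sd=0.2, alpha=0.05):
    """Approximate power of a two-sided test comparing two equal-sized arms.

    effect_sd is the assumed true effect in standard-deviation units.
    The (tiny) probability of rejecting in the wrong direction is ignored.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sqrt(2 / n_per_arm)  # SE of the difference in means, in SD units
    return NormalDist().cdf(effect_sd / standard_error - z_crit)


# Hypothetical study: 400 respondents per arm at baseline, 25% attrition in each arm.
power_at_baseline = power_two_arm(400)
power_after_attrition = power_two_arm(300)
```

Under these assumptions, power_after_attrition is lower than power_at_baseline. The calculation says nothing about the second problem, differential attrition between arms, which biases estimates rather than just widening them.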

Chris Robert argues that "you can create separate digital instruments for different tracking purposes -- but that's not strictly necessary". Why might we not need separate tracking instruments with digital surveys? Because if supervisors are out of range (of, for example, 2G or 3G internet connections), data will not be immediately available Because tracking information can be built into the questionnaire directly Because we typically do not use supervisors to track progress when collecting data digitally Because we only track respondents with paper surveys

b. Explanation As Chris mentions, creating separate digital instruments for different tracking purposes is not strictly necessary because your primary digital instrument can adapt to the circumstances. If nobody is available to be surveyed, for example, it can ask why not and when to try again, and then end the survey. The point is, digital instruments are adaptive - and filling them in doesn't waste paper, even if most of the survey is skipped.

In addition to the data provider, who should sign the Data Use Agreement (DUA)? The study subjects An official institutional representative The research assistant(s) Anyone who will come in contact with the data The principal investigator(s) All of the above

b. An official institutional representative. DUAs should be signed by an official institutional representative rather than an individual PI or staff member.

In the paper, "Code and Data for the Social Sciences: A Practitioner's Guide", Gentzkow and Shapiro refer to a file, rundirectory.bat. An equivalent we've been discussing is: The code header The master do file Analysis do files Cleaning do files Merging do files The global macro to set a particular path

b. The Master do file.

The exact/deterministic matching strategy may lead to more ___________, while the fuzzy/probabilistic matching strategy may lead to more ___________. False positives, false negatives False negatives, false positives None of the above

b. False negatives, false positives.
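
A small illustration of the trade-off, using made-up names. The similarity measure (Python's difflib) and the 0.8 threshold are assumptions chosen for the example, not a matching method described in the course.

```python
from difflib import SequenceMatcher


def exact_match(a: str, b: str) -> bool:
    """Deterministic rule: the two strings must be identical."""
    return a == b


def fuzzy_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Probabilistic-style rule: accept if string similarity clears a threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


# Same person, with a spelling error in one source (hypothetical records).
same_person = ("Maria Gonzalez", "Maria Gonzales")
# Two different people with similar names (hypothetical records).
different_people = ("Maria Gonzalez", "Mario Gonzalez")

missed_by_exact = not exact_match(*same_person)            # a false negative
wrongly_linked_by_fuzzy = fuzzy_match(*different_people)   # a false positive
```

Raising the threshold makes the fuzzy rule behave more like the exact rule: fewer wrong links, more missed ones.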

During the data matching process, the _______ file and the _______ file are combined to create the _______ file. identified finder, de-identified analysis, administrative data identified finder, administrative data, de-identified analysis administrative data, de-identified analysis, identified finder

b. identified finder, administrative data, de-identified analysis
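
The flow of the three files can be sketched as follows. All field names, IDs, and values are hypothetical, and the sketch assumes an exact match on a single identifier; real linkages are usually carried out under the terms of the DUA and often by the data provider.

```python
# Identified finder file: study IDs plus the identifiers used for matching.
finder = [
    {"study_id": "S001", "national_id": "111", "name": "A. Example"},
    {"study_id": "S002", "national_id": "222", "name": "B. Example"},
]

# Administrative data held by the provider, keyed on the same identifier.
admin = [
    {"national_id": "111", "birthweight_g": 3200},
    {"national_id": "333", "birthweight_g": 2900},
]


def build_analysis_file(finder, admin, key="national_id", pii=("national_id", "name")):
    """Link finder rows to admin rows on `key`, then strip identifying fields."""
    admin_by_key = {row[key]: row for row in admin}
    analysis = []
    for person in finder:
        match = admin_by_key.get(person[key])
        if match is None:
            continue  # no administrative record found for this study member
        merged = {**person, **match}
        analysis.append({k: v for k, v in merged.items() if k not in pii})
    return analysis


analysis_file = build_analysis_file(finder, admin)
```

Only the study ID and the outcome variables survive in analysis_file, so analysts never need to see names or government IDs.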

Certain information is collected in the field during respondent tracking implementation. What do we typically do with that information? (Select all that apply) Use the information to pre-load (pre-print) names, addresses, etc into tracking sheets Update question wording and/or surveyor instructions for difficult questions Schedule revisits for missing households Estimate response rates Update productivity estimates and timeline

c, d, e. Respondent tracking information is primarily used to schedule revisits. Apart from this, the tracking data can also be used to update productivity estimates, estimate response rates, update timeline and refine efficiency strategy, if needed

Which of the following are true about High-Frequency Checks? (Select all that apply) Are easier to conduct with paper surveys Are usually conducted on a subsample of respondents surveyed to date Typically involve basic statistical analysis Can be conducted every day

c, d. Explanation High frequency checks are mostly office based and typically involve statistical analysis that can be conducted on a daily basis. Once programmed, they are usually easy and fast to conduct, and are therefore conducted on the full data set as of any given point in time.
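
A minimal sketch of what a programmed daily check might look like, run on the full data set received so far. The three checks, the field names, and the thresholds are illustrative assumptions, not the course's list of checks.

```python
from collections import Counter


def high_frequency_checks(records, min_minutes=20, age_range=(0, 110)):
    """Return (respondent_id, problem) flags for every record received to date."""
    flags = []
    id_counts = Counter(r["respondent_id"] for r in records)
    for r in records:
        rid = r["respondent_id"]
        if id_counts[rid] > 1:
            flags.append((rid, "duplicate ID"))
        if not age_range[0] <= r["age"] <= age_range[1]:
            flags.append((rid, "age out of range"))
        if r["duration_min"] < min_minutes:
            flags.append((rid, "interview too short"))
    return flags


# Hypothetical submissions as of today.
records = [
    {"respondent_id": "R1", "age": 34, "duration_min": 45},
    {"respondent_id": "R2", "age": 130, "duration_min": 50},
    {"respondent_id": "R3", "age": 28, "duration_min": 5},
    {"respondent_id": "R1", "age": 34, "duration_min": 44},
]
flags = high_frequency_checks(records)
```

Because the function takes the whole list, rerunning it each day on the cumulative data costs nothing extra once it is written.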

We use relative references in our code so that (Select all that apply) Old files are hidden from the working directory to avoid confusion and clutter Raw data are kept in their purest form We do not need to repeat the full file path of the working directory for each file used or created Different analysts who have different locations for their project folder do not need to change the file path for each file We maintain version control as well as changes made by each user.

c, d. Relative references allow us to quickly get to the working directory once, which allows us to skip writing in the full file path each time a file is called or created, which is useful for collaborators. Archives are used to store old files. The advisable coding practice of never overwriting existing raw files protects raw data. Version control can be maintained using tools like Git or GitHub.
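
The same idea, sketched in Python rather than in a do file. The folder names are hypothetical; the point is that the project root is set once and every other path is built relative to it.

```python
from pathlib import Path

# Set once per analyst (here: wherever the script is launched from).
PROJECT_ROOT = Path.cwd()

# Everything else is relative to the root, so collaborators never edit these lines.
RAW_DIR = PROJECT_ROOT / "data" / "raw"
CLEAN_DIR = PROJECT_ROOT / "data" / "clean"


def clean_path(filename: str) -> Path:
    """Build the path of a cleaned data file without repeating the full location."""
    return CLEAN_DIR / filename


baseline_file = clean_path("baseline.csv")
```

This plays the role of the global macro that sets the project path: an analyst with the project in a different location changes one line, not one line per file.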

The following are questions that you might ask the data provider to learn more about the data universe and the data content. Which of the following questions should you ask to learn more about the data universe? (select all that apply) a. Do you have a data dictionary that documents the variables you collect, the context in which they were collected, and a range of possible values? b. Is your data individual-level data or aggregate-level data? c. Which births are captured in the data? d. Why would some births be included and some excluded? e. Are birth certificate data actively reported by an individual or are they collected passively? f. Does birth certificate data contain variables that indicate low birthweight?

c, d. To understand the data universe, you must determine which individuals are captured in the data set and identify why they are included and why others are excluded from the data set.

The benefits of choosing a "conversational interview" style include: (select all that apply) Questions can be answered more quickly Reduces surveyor effects or bias Increases the likelihood that respondent understands the questions (and response options) Allows respondents to give responses that are not necessarily listed as response options None of the above

c. Explanation The main advantage of a conversational style is that the question can be clearly conveyed to the respondent. It allows the interviewer to say whatever it takes to be sure that the respondent understands the question as intended. However, it is time consuming and can lead to surveyor effects or bias.

When might we consider providing special compensation to respondents for participating in the survey? When respondents become upset when asked a sensitive question When respondents are unwilling to participate in the survey When respondents claim they have no time to answer questions When our tracking sheet suggests that we have data, but we have no questionnaire filled out

c. As Asman mentions, we may consider providing some compensation during revisits, but only to those respondents who have problems committing time.

Which of the following can be a challenge for using CAPI in the field? (Select all that apply) Requires respondents have access to telephones Compared to PAPI, CAPI is more sensitive to different dialects, accents, or audio quality Most devices need a power source for recharging Requires regular cellular connectivity

c. The need for a power source is a challenge with CAPI, especially given the length of surveys and the number of surveys completed in a day. Cellular connectivity is only required to transfer the data, so lack of 'regular' connectivity is not a huge issue. Also, there are several alternatives for data transfer such as wireless connectivity, SD cards etc.

At what point in the lifecycle of the evaluation should you approach the Vital Records Office to begin negotiating a Data Use Agreement? After the intervention has finished. This way, you will have anecdotal evidence of the impact of the nurse home visiting program and can request the right variables to measure the program's effect. While you are entering and cleaning the baseline enrollment survey data. As soon as you determine that you will need access to administrative data for your research project. After you have conducted the baseline enrollment survey and completed random assignment, so that you know the list of individuals to which you need to link data and can specify this information in the DUA.

c. As soon as you determine that you will need access to administrative data for your research project.

Master code files are meant specifically to: Describe the purpose of the code generally including output Mark key events (e.g. merging, appending, expected events, etc) Run (call) all other coding files in the project Label discrete tasks (e.g. renaming, working with unique IDs, etc) Credit the authors and list software version compatibility

c. Run (call) all other coding files in the project
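
A Python analogue of a master file, as a sketch only. The course discusses master do files; the script names below are hypothetical, and runpy is just one way to call other scripts in order.

```python
import runpy
from pathlib import Path

# Hypothetical project scripts, listed in the order they must run.
PIPELINE = ["01_clean.py", "02_merge.py", "03_analysis.py"]


def run_all(scripts, root):
    """Run each script in order from the project root; stop at the first error."""
    completed = []
    for name in scripts:
        runpy.run_path(str(Path(root) / name), run_name="__main__")
        completed.append(name)
    return completed


# Typical use from the project folder:
# run_all(PIPELINE, Path.cwd())
```

The master file contains no cleaning or analysis logic of its own. Its only job is to make the whole project reproducible with a single call.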

When conducting double data entry, what is the main reason we want to use two separate data entry operators (DEOs)? The second DEO can act as a supervisor or monitor of the first DEO One DEO can enter respondents' personally identifiable information and the other DEO can enter responses, maintaining the respondents' privacy There is a lower likelihood of different DEOs making the same mistake twice It allows us to complete data entry in roughly half the time

c. There is a lower likelihood of different DEOs making the same mistake twice
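
The comparison step of double data entry can be sketched like this. The survey IDs, field names, and values are made up, and the sketch assumes both operators keyed in the same set of paper forms.

```python
def compare_entries(first, second):
    """List every (survey_id, field, value_1, value_2) where the two entries disagree."""
    discrepancies = []
    for survey_id in sorted(first.keys() & second.keys()):
        a, b = first[survey_id], second[survey_id]
        for field in sorted(a.keys() | b.keys()):
            if a.get(field) != b.get(field):
                discrepancies.append((survey_id, field, a.get(field), b.get(field)))
    return discrepancies


# Hypothetical entries from two different data entry operators.
entry_deo_1 = {"H01": {"age": 34, "hh_size": 5}, "H02": {"age": 41, "hh_size": 3}}
entry_deo_2 = {"H01": {"age": 34, "hh_size": 5}, "H02": {"age": 14, "hh_size": 3}}

to_resolve = compare_entries(entry_deo_1, entry_deo_2)
```

Each flagged field is then checked against the paper questionnaire. The method only works if the two operators are unlikely to make the same error, which is why the entries come from different people.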

The primary reason researchers should assign households to a specific survey team is that: Some survey teams are given specific training on how to collect data from treatment households The workload for each surveyor should be balanced and fair We need to minimize transportation time for our survey teams The proportion of respondents from each treatment arm should be balanced across survey teams

d. Explanation Assigning households to specific survey teams ensures that the proportion of respondents from each treatment arm is balanced across survey teams.
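
One way such an assignment could be produced is sketched below. The round-robin rule within each treatment arm, the team names, and the household IDs are all assumptions made for the example; the sketch also ignores geography and travel time, which a real field plan would have to weigh.

```python
import random
from collections import defaultdict


def assign_teams(households, teams, seed=0):
    """Assign households to teams so that each arm is spread evenly across teams.

    households: list of (household_id, treatment_arm) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    by_arm = defaultdict(list)
    for hh_id, arm in households:
        by_arm[arm].append(hh_id)

    assignment = {}
    for arm in sorted(by_arm):
        ids = by_arm[arm]
        rng.shuffle(ids)
        for i, hh_id in enumerate(ids):
            assignment[hh_id] = teams[i % len(teams)]  # deal out in turn
    return assignment


# Hypothetical sample: four treatment and four control households, two teams.
households = [(f"T{i}", "treatment") for i in range(4)] + [(f"C{i}", "control") for i in range(4)]
assignment = assign_teams(households, ["Team A", "Team B"])
```

With this rule, no team ends up surveying mostly treatment or mostly control households, so team-level differences in interviewing cannot be mistaken for a treatment effect.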

Which of the following is true about the piloting phase of questionnaire development (Select all that apply)? It happens only after the questionnaire is complete. It should be complete before the survey team is trained It is usually not necessary if the questionnaire has been tried and tested in another (peer-reviewed, high-quality) study It should not use the same households that are part of the study sample None of the above

d. Explanation Piloting happens both during questionnaire development and after. Similarly, it can be a part of the training process. However, we should not pilot the questionnaire on the households that are part of the study sample. Also, it is important to pilot the questionnaire even if it has been tried and tested in another study.

According to Gentzkow and Shapiro, rather than naming the latest version of a file: regressions_022713_mg.do, one should instead: name it: regressions_2013.02.27_mg name it: regressions_2013-02-27_mg name it: regressions_2013.02.27.b name it: regressions_20130227b Use version control software, and not use dates

e. Use version control software, and not use dates

In regards to administrative data, the _______ identified and sensitive the data that you are asking to be released, the _______ challenging it will be to get those data outside of the agency for research. more, more less, more more, less

more, more.

What type of questions cannot be asked in PAPI? (Select all that apply) Open ended questions Multiple response questions Questions that require response filters Questions where responses are based on rankings or ratings Field coded questions None of the above

None of the above. Explanation There are no restrictions on the type of questions within PAPI and CAPI surveys.

Research has shown that relative to CAPI, PAPI (Select all that apply) Is less expensive Leads to higher attrition Results in less measurement error Allows us faster access to digital data None of the above

None of the above. Based on research and evidence presented by Chris, PAPI is not necessarily less expensive and does not allow us faster access to data. The research also shows that CAPI can result in less measurement error when compared to PAPI. Attrition is not associated with the mode of data collection.

