MITx: Research Integrity, Transparency, and Reproducibility

What does a Gantt chart have that is not typically included in a WBS? Assignment of responsibilities to people working on the project List of potential stakeholders associated with the project Expected duration or deadline of tasks 'Dependencies' that link tasks together Listing of all tasks and subtasks All of the above

'Dependencies' that link tasks together Explanation What makes a Gantt slightly different is the setting of dependencies. And this is where more sophisticated project management software can help, like Microsoft Project or Google's Gantter. For example, some tasks simply cannot start until others are complete. That linking of two tasks is called a "Dependency".
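A minimal sketch of how dependencies drive a schedule, assuming a hypothetical set of tasks and durations (none of these task names come from the course). It uses Python's standard `graphlib` module to order tasks so that no task starts before its prerequisites finish, which is roughly what Gantt software does when tasks are linked.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: name -> (duration in days, prerequisite tasks)
TASKS = {
    "design_survey": (5, []),
    "irb_approval": (10, ["design_survey"]),
    "train_enumerators": (3, ["design_survey"]),
    "collect_data": (15, ["irb_approval", "train_enumerators"]),
}


def earliest_schedule(tasks):
    """Return {task: (start_day, end_day)} so every dependency is respected."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(deps) for name, (_, deps) in tasks.items()}
    schedule = {}
    for name in TopologicalSorter(graph).static_order():
        duration, deps = tasks[name]
        # A task can start only when its latest prerequisite has ended.
        start = max((schedule[d][1] for d in deps), default=0)
        schedule[name] = (start, start + duration)
    return schedule


if __name__ == "__main__":
    for task, (start, end) in earliest_schedule(TASKS).items():
        print(f"{task}: day {start} to day {end}")
```

A WBS would list the same four tasks; the dependency lists are the extra information that lets the start days be computed.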

Research Transparency issues ( 2/3 Specification Searching)

- Also known as data mining, fishing, p-hacking, etc. - Researchers might choose specifications that make their results look better Solution: Pre-analysis Plans (also known as Statistical Analysis Plans)

Research Transparency issues ( 3/3 Ex-post Replicability or Reproducibility)

- If I take my data and I run the regressions again, do I get the same answer - All analysis should be reproducible Solution: Data access and data publication

Research Transparency issues ( 1/3 File Drawer Problem and Publication Bias)

- Researchers don't write up the results of the study if the study shows no effect - Publication bias: journals are more excited to publish results if the study shows something new and novel - Scientific community does not see all the results Reasons: 1. Editors/referees only accept significant results. 2. Authors only submit significant results. 3. Authors manipulate / data mine until they achieve just barely significant results. Solution: - Registration of studies - Pre-analysis plans - "Two-stage review" - "Registered reports"

Where is an ideal location to publish your research data? My organization/institution's website A trusted data repository Your personal website Any of the above None of the above

A trusted data repository Trusted data repositories have strong commitments to long-term storage and archival of data. Many data repositories even have agreements with other data repositories to back up and store their data in the event of an accident. In addition, most repositories will provide unique identifiers to individual datasets

IRB approval may be required (Select all that apply) From a principal investigator's host institution From the country where research is taking place If the activities are part of practice, but not research Even when no information on or from individuals are being collected or used None of the above

A, B Explanation IRB approval may be required from the host institution if research is being conducted by its students, faculty, or staff, and from the country where the research is taking place. IRB approval is not required if there is no research (i.e. if the activity is part of practice), or no research on human subjects.

In which of the following scenarios would IRBs likely waive the requirement of obtaining informed consent? (Select all that apply) When risk to respondents is low, and the cost of collecting consent is high When risk is low, but obtaining informed consent itself may change respondents' behavior and therefore reduces the value of the experiment When the respondents are children who cannot properly weigh the costs and benefits, and therefore "informed consent" may not be considered particularly "informed" When the respondents are prisoners and may be under duress when giving informed consent When the program is a cluster randomized trial, and we are only collecting data from a sample of potential beneficiaries None of the above

A, B Explanation If the risk of research is low, the IRB may deem informed consent unnecessary if requiring it would prevent the research from happening. Vulnerable populations (such as children and prisoners) are given extra protections, not fewer. In a cluster randomized trial a whole cluster of people is treated even if we collect data from only a subsample; however, we will still likely require informed consent from those from whom we collect data.

What should one consider when preparing data for publishing? (Select all that apply) Could anonymizing the data bias the results Does suppressing or redacting data make the data less useful to other researchers Is restricted access to sensitive data a better option than de-identifying data How much of the data needs to be shared in order for researchers to replicate the results Does the informed consent clause allow for publishing the data None of the above

A, B, C, D, E All of these factors are important considerations when preparing data for publication.

Merton's 4 norms:

Four core values: 1) Universalism: Research findings as fundamentally "impersonal" 2) Communality: Open sharing of scientific knowledge 3) Disinterestedness: Researchers should be motivated by identifying the truth rather than self-interested professional or monetary motivations 4) Organized skepticism: The ability to verify data and scrutinize claims is thus critical for research to live up to its own standards

Which of the following is NOT true about IRBs? (Select all that apply) IRB approval is sufficient in protecting researchers from legal obligations IRBs can provide advice on whether the data publication process protects study participants Data management and data security plans are often subject to approval by IRBs IRBs can determine the sensitivity of the data being collected None of the above

IRB approval is sufficient in protecting researchers from legal obligations Explanation IRB approval is not sufficient in protecting researchers from legal obligations. It's the researchers' responsibility to understand the legal framework that dictates their research.

The Role of IRB

IRBs review consent procedures and documentation IRBs may review data management plans - May require procedures to minimize risk of disclosure - May require procedures to minimize harm resulting from disclosure IRBs make determination of sensitivity of information IRBs make determination regarding whether data is de-identified for "public use"

True or False: If files are encrypted and the encryption key is lost, the files can be easily retrieved. True False

False Explanation Encryption keys cannot be easily retrieved. Encryption is one of the best methods for securing data.

True or False: Only those individuals that participate in the study can be harmed by a data release. True False

False Explanation Some study data can provide information about vulnerable populations. For example, data that is shared about one participant may expose information about people within that participant's population.

Informational harm

An informational harm occurs when others use research results or data and then violate the rights of an individual or organization, or negatively impact their interests. Direct harm: Subject recognized directly from the data Indirect harm: Subject inferred by a company indirectly and then treated differentially. Groups can be harmed in this way (for example, an increased insurance premium inferred from zip code).

Research Transparency issues:

1) File Drawer Problem and Publication Bias 2) Specification Searching 3) Ex-post Replicability or Reproducibility

What's in a pre-analysis plan (PAP)?

1. Designate and define primary outcome variable, co-primary outcome variables, or index variable. 2. Designate and define secondary outcome variables, if necessary 3. Specify rules for including/excluding data or observations 4. Identify specific regression specification and statistical tests to be used 5. State any planned sub-group analysis

How does Hamermesh define a "pure" replication? A replication where researchers use the same data, perform the same analysis, and employ the same methods as the original authors A replication where researchers use different data, but perform the same analysis, and employ the same methods as the original authors A replication where researchers test the same hypothesis, but choose different methods than the original authors A replication where researchers test the same hypothesis, but use a different dataset, a different timeframe, or a different setting than the original authors None of the above.

A Explanation A "pure" replication is one of the most basic types of replications. In a pure replication, the researchers performing the replication try to exactly replicate the work of the original researchers.

What could be an example of p-hacking? (Select all that apply) Running different specifications to get more favorable results Dropping unforeseen variables or observations ex-post to increase the impact of your intervention Publishing only the results that are significant Running an experiment multiple times until you get the results you wanted Publishing only segments of your data online

A, B, D Explanation Any time a researcher or set of researchers uses a technique that is intended to positively affect the p-value of a study, this is considered p-hacking. These are just a few examples of what could be considered p-hacking.
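A short worked calculation of why running many specifications inflates false positives. This is a sketch under strong assumptions that are not stated in the course: the true effect is zero and the tests are independent. The function name is hypothetical.

```python
def prob_false_positive(num_tests, alpha=0.05):
    """Chance that at least one of `num_tests` independent tests is
    'significant' at level `alpha` when there is truly no effect."""
    return 1 - (1 - alpha) ** num_tests


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(f"{k} specifications: {prob_false_positive(k):.2f}")
```

Under these assumptions, trying 20 specifications gives roughly a 64% chance of at least one "significant" result, compared with 5% for a single pre-specified test.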

Which of the following criteria might suggest to an IRB that they should not only review the ethics of the research itself, but also the ethics of the program being evaluated? Select all that apply The researcher designed and implemented the program The program would have been implemented with or without the evaluation The program design has been altered to facilitate the evaluation The program is considered "common practice"—i.e. widely used by practitioners None of the above

A, C Explanation As Rachel mentions, there are several criteria for judging if the program is covered by IRB rules. These include (a) whether the program is designed and implemented by the researcher or a third party, (b) whether the program would have gone ahead without the evaluation, (c) to what extent the design of the program is influenced by the evaluation and/or the evaluator, and (d) whether the program is novel or of a type in common usage.

What are some strategies for mitigating risks when making measurement choices? (Select all that apply) Determine if the sensitive data is necessary for the study Categorize responses (i.e. income or age) into groups or brackets Randomized responses Collecting group responses None of the above

A-D Explanation All of the choices are possible ways for reducing and mitigating risks to your study participant when collecting data.
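To illustrate the "randomized responses" option, here is a minimal sketch of one common variant. The design is an assumption, not taken from the course: each respondent answers truthfully with probability `p_truth` and otherwise gives a random yes/no. No single answer reveals the truth, but the overall prevalence can still be estimated. All names are hypothetical.

```python
import random


def randomized_answer(truth, p_truth=0.5, rng=random):
    """Report the true answer with probability p_truth, else a coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_prevalence(answers, p_truth=0.5):
    """Invert yes_rate = p_truth * prevalence + (1 - p_truth) * 0.5."""
    yes_rate = sum(answers) / len(answers)
    return (yes_rate - (1 - p_truth) * 0.5) / p_truth


if __name__ == "__main__":
    rng = random.Random(0)
    truths = [i < 3000 for i in range(10000)]  # 30% truly "yes"
    answers = [randomized_answer(t, rng=rng) for t in truths]
    print(round(estimate_prevalence(answers), 3))
```

The price of the added privacy is a noisier estimate, so a larger sample is needed for the same precision.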

Which of the following is a reason you could have publication bias? (Select all that apply) Editors/referees only accept significant results Editors/referees know the authors that are submitting the publication Authors only submit significant results Authors manipulate / data mine until they achieve just barely significant results. None of the above

All Explanation Publication bias can occur for a number of reasons. When an action affects the type of results that get published (whether it's by journal editors or authors), it is publication bias. In this question, all four responses are examples of publication bias.

Why was it important for the researchers in this example to register a pre-analysis plan? (Select all that apply) The researchers were only testing one outcome The partners had a strong vested interest in the outcomes of the program Since the researchers were collecting a lot of data, it would have been easy for them to cherry-pick the results The researchers were conducting an additional qualitative component to the study None of the above

B, C Explanation In the case study we looked at, the second and third options were the reasons that led the researchers to decide that writing a pre-analysis plan would be important for their study.

Which of the following can be considered a risk of potential harm from the research (select all that apply) We do not know whether a commonly practiced intervention is helpful or harmful The program's effect is unknown and implemented as part of the research project The program is known to be effective, but withheld to maintain a valid comparison group The research leads to different targeting of the program The research will not be able to distinguish correlation from causation None of the above

B, C, D Explanation If the intervention is commonly practiced, it is unlikely the research per se would be considered as causing harm. However, if the intervention is being implemented because of the research, the unknown benefit or harm can be considered a risk. Similarly, if the program's impact on a separate target population is unknown, and tested as part of research, this could be a risk of harm. And if a program is effective and is withheld because of research, that would be considered a harm. Research may not pass our ethical principles because it fails to have sufficient benefits, but that does not mean it causes harm.

Which are the three key principles from the Belmont Report? (choose 3) Equipoise Justice Beneficence Respect for persons IRB approval

B, C, D Explanation The three principles are Justice, Beneficence, and Respect for persons.

Which of the following is a violation of the justice principle? (Select all that apply) Designing an experiment poorly, and failing to obtain valid lessons Testing medicine only on high-school boys, because the drug's safety during pregnancy has not been established Testing a program's effectiveness in a poor country even though only people in rich countries would be able to afford the program None of the above

C Failing to learn lessons is judged under the beneficence principle. Choosing to test medicine on boys only would only violate the justice principle if the drug was developed primarily for women or older or younger men. Not testing high-school women could be the appropriate conclusion of the beneficence principle. Taking a program only accessible for the rich, and testing on the poor is a clear violation of the justice principle.

In the India education program, which activities required IRB approval? Select all that apply: The NGO Pratham's activities in these villages Sharing the educational rights of villagers, if it impacts the village leader Measuring teacher attendance at school Surveys of knowledge of rights

C, D Explanation Activities that are considered research require IRB approval. In this case, approval is required for measuring teacher attendance and conducting surveys to measure knowledge.

Information life cycle stages / Data Management Plan:

Collection: - Threats from direct observation - Threats from recording - Threats from transmission Transformation: - Data partitioning - DeID / Redaction: Anonymization and de-identification are legal concepts. Rigorous statistical definitions are not clear - Encryption Retention and Storage: - Violations of integrity: Data modified illegally - Violations of availability: Data not available when required - Violations of confidentiality: Data accessed without proper authorization Access: - Identifiability: What is shared / deid level - Disclosure limitation: How much is shared? - Data sharing and access: How is it shared? Post release: - Auditing and Monitoring: -- Provide subjects with rights to action over their data -- Look for changes in auxiliary data affecting identifiability and sensitivity -- Plan to discover and notify re-identified subjects -- Plan to detect and respond to improper use

Information privacy

Control and protection over the extent and circumstances of information collection, sharing, and use. Focuses on the entire information lifecycle. Privacy protection balances usefulness and privacy.

Information security

Control and protection against unauthorized access, use, disclosure, disruption, modification, or destruction of information. Pertains to access control and use case.

What is not something that would typically go into a pre-analysis plan? Primary (and potentially secondary) variables Criteria for data cleaning Regression specifications and statistical tests Final results None of the above

Final Results Explanation Final results will not be known until the data is collected. They therefore cannot be included into a pre-analysis plan. All other answers are examples of things that should go into pre-analysis plans.

What is one drawback to using Dropbox or Google Drive for hosting data? Data are not encrypted during transmission Cloud service allows for sharing across multiple users Data can be decrypted by these companies if they are legally required Difficult to use They encrypt files in transmission and on the server None of the above

Data can be decrypted by these companies if they are legally required Explanation Dropbox and Google Drive do not have client-side encryption or zero-knowledge encryption. Therefore, they hold the keys to your data and can decrypt your information.

Which of the following is a primary challenge in writing a pre-analysis plan? Prevents one from conducting exploratory analysis Decreases confidence in your results It can strain relationships with partners Difficult to write for complex studies None of the above

Difficult to write for complex studies Explanation As we discussed in this section, there are some costs to writing pre-analysis plans. One of those costs is that it can be extremely difficult to write a pre-analysis plan for a complex study because sets of results may rely on previous results found in the study.

What is one benefit of writing a pre-analysis plan? Prevents one from conducting exploratory analysis that's not pre-specified Allows for interim looks at the data Improving relationships with partners Limits learning ex-post about your data None of the above

Improving relationships with partners Explanation One strong case for why researchers should write pre-analysis plans is to strengthen their relationships with their partners. Pre-analysis plans can act as an agreement between the research and the partner prior to the beginning of the intervention.

Key documentation:

Information privacy and security plan: Should be prepared during research Consent document SLA Data Use Agreement Implementation Checklist

The "use of research results or data to learn about an individual as a result of their participation in the research, and then violate their rights or negatively impact their interests" relates to which concept? Information Utility Information Privacy Informational Harm Information Security None of the above

Informational Harm Explanation Informational harm occurs when data is used against a participant. Identification isn't by itself an example of informational harm unless we know that the research subjects suffer embarrassment or other harm (e.g. they could've consented to publication of identity).

When developing a password, what is a good practice? (Select all that apply) Make the password longer Use common names Set up multi-factor authentication Make them hard to remember Build higher randomness into the password Share passwords via email to ensure they are not forgotten None of the above

Make the password longer Set up multi-factor authentication Build higher randomness into the password Explanation Passwords tend to be some of the weakest set of controls within a data security system because they are subject to human error. A password manager, such as LastPass, can help people create strong passwords, manage a large number of passwords, and build a multi-factor access system.
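A rough sketch of why length and randomness matter, using Python's standard `secrets` module. The alphabet and the function names are illustrative assumptions. The entropy figure applies only to passwords drawn uniformly at random, not to ones people invent themselves.

```python
import math
import secrets
import string

# Assumed character set: letters, digits, and ASCII punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length=16):
    """Draw each character independently from a cryptographic RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length, alphabet_size=len(ALPHABET)):
    """Bits of entropy for a uniformly random password of this length."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print(random_password())
    for n in (8, 12, 16):
        print(n, "characters:", round(entropy_bits(n), 1), "bits")
```

Each extra character adds the same number of bits, so doubling the length doubles the entropy, which is why "make the password longer" is listed first.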

Which of the following fields could potentially be used to identify someone and must be considered closely when publicly releasing the data? (Select all that apply) Movie preferences Friends Zip codes and birthdates Ice cream preferences Neighbors None of the above

Movie preferences Friends Zip codes and birthdates Ice cream preferences Neighbors Explanation All of the options above could potentially be used to identify someone. Zip codes and birthdates are direct identifiers; friends and neighbors are indirect identifiers; movie and ice cream preferences could potentially identify someone given other available data.

Information Utility

No universal definition, depends on the intended use and other available info.

In the Belmont report, the line between research and practice (select all that apply) Is clearly articulated for both social science and medicine Is the same in both social science and medicine Is not a factor when determining whether something requires IRB approval None of the above

None of the above Explanation The Belmont report says the line between practice and research is different between medicine and social science, but does not define it in social science. One example of whether the line matters is when a researcher designs a new intervention.

According to Franco et al., which of the following are least likely to be published? Strong results Mixed results Null results

Null Results Explanation Results that fail to reject the null hypothesis are less likely to be published because authors are less likely to write them up. Additionally, missing results cannot be published (as they are missing).

A hardening system built for high risk information might include (Select all that apply): Single-sign on Password complexity enforcement Default password changes Dropbox storage folder All of the above

Password complexity enforcement Default password changes A hardening system for high risk information must be carefully thought through and apply tight controls to ensure that data leakages do not occur.

What does PII stand for? Persons Information and Identification Proof of Individual Identification Personally Identifiable Information None of the above

Personally Identifiable Information Explanation PII refers to Personally Identifiable Information. It refers to any Information that can be used to identify an individual or households with a reasonable level of confidence

Identifiability

Potential for learning about individuals from computations based on data in which they are included. Measures (weakest to strongest): - Record linkage: Match a real person to precise record in a database. - Indistinguishability: Individuals can be linked only to a cluster of records (of known size) - Limits on adversarial learning (Differential privacy): Formally bounds the total learning about any individual that occurs from a data release.
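As an illustration of the strongest measure listed, here is a minimal sketch of the Laplace mechanism, a standard way to release a count with differential privacy. It is not described in the source and is included only as an example. The function name and the `epsilon` value are assumptions, and real releases should use a vetted library.

```python
import random


def noisy_count(values, epsilon, rng=random):
    """Release a count with Laplace noise of scale 1/epsilon.

    Adding or removing one person changes a count by at most 1,
    so the sensitivity is 1 and the noise scale is 1/epsilon.
    """
    true_count = sum(1 for v in values if v)
    scale = 1.0 / epsilon
    # The difference of two exponential draws is Laplace-distributed.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


if __name__ == "__main__":
    responses = [True] * 100 + [False] * 400
    print(noisy_count(responses, epsilon=1.0))
```

Smaller `epsilon` means more noise and a tighter bound on what can be learned about any one individual, at the cost of accuracy.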

Information sensitivity

Potential harm resulting from disclosure

Which of the following solutions is most relevant for solving the problem of publication bias? Pre-registration of studies Writing a pre-analysis plan Performing robustness checks Peer review Making research data accessible None of the above

Pre-registration of studies Explanation If we are able to know all of the potential studies testing a particular outcome, then we would be able to know the full extent of publication bias towards testing that outcome. Pre-registration of studies would ensure that researchers at least report to the greater scientific community that they are planning to test a particular outcome.

What are the activities involved in the executing and monitoring phase of project management? (Select all that apply) Defining the Scope: Deliverables, Activities Identifying Stakeholders Identifying Resources (Constraints) Establishing acceptable level of Risk/Quality Progress and Changes Stakeholder Communication Managing information

Progress and Changes Stakeholder Communication Managing information Explanation As Marc mentions, the executing and monitoring phase involves tracking the progress and changes, managing stakeholder communications, and managing the information we gather along the way.

The place to document all major decisions is: The Stakeholder Map Email to PIs RACI chart The Project Log The WBS (Gantt)

Project Log Explanation The project log should document all major decisions taken during a research project including detailed notes on the reasons and nature of the changes.

What are the key deliverables at the end of a research project? (Select all that apply) Publication/Report Data Questionnaires Description of process, including key decisions Policy influence None of the above

Publication/Report Data Questionnaires Description of process, including key decisions The key deliverables at the end of a research project are typically the lessons, documentation of the tools and procedures that got us there, both of which are often discussed in the published paper, along with the questionnaires, and underlying data.

Research lifecycle phase actions:

Research Design Phase: - Identify major information threats & sensitive information - Identify privacy and security framework - Identify key control families - Create data management plan Collection, Storage, Retention, Access: - Select individual controls - Select tools - Implementation - Monitoring and auditing Post-Dissemination: - Monitoring & auditing

At which point during the research lifecycle is it important to think about data security? (Select all that apply) Research design Research implementation Research analysis None of the above

Research design Research implementation Research analysis Explanation There are different considerations to be made during each stage of the research lifecycle. At each stage you should be considering how to implement your data security plan.

Which of the following principles states that participants should be informed of risks and given a choice about participation Equipoise Justice Beneficence Respect for persons IRB approval None of the above

Respect for persons Explanation Respect for persons is the principle that leads to the practice of receiving informed consent. Research subjects should be able to volunteer to participate, and to be knowledgeable about what they are getting themselves into.

Information controls:

Safeguards to avoid, detect, counteract, or reduce information risks

What is not one of Merton's norms? Communality Secrecy Universalism Disinterestedness Organised skepticism

Secrecy Explanation Secrecy is not a norm of the scientific community, as defined by Merton. It actually goes against one of Merton's key norms, communality.

What should a Data Management plan include? (Choose all that apply) Method of recording consent on all surveys Where paper surveys will be stored When and how paper surveys will be disposed of How PII will be encrypted

Select All. Explanation A data management plan for paper surveys should include the method of recording consent, questionnaire design specifications (such that PII data can be easily removed, etc.), and information on how the paper surveys will be stored and disposed of. For data in digital form, the data management plan also needs to provide details regarding encryption plans for all data with PII, including data collection software and data entry software.

What must you do if the intervention changes significantly in both content and duration? (Select all that apply) Update your Gantt chart Update your finances Get IRB approval Inform your stakeholders of how other deadlines are affected Document in your Project Log

Select All. Any changes in the content and duration of the project must first be documented clearly in the project log. Then, update your WBS/Gantt, timelines, and financials accordingly. Any major changes in the intervention require IRB approval, and finally stakeholders need to be updated as per the RACI chart.

What is one problem we will discuss in research transparency? Specification searching (data mining) Non-compliance with treatment assignment Spillover Attrition None of the above

Specification searching (data mining) Attrition, non-compliance with treatment assignment, and spillover are all issues related to challenges in research that we've already covered in this course. One problem with scientific research which we will discuss in this section is the prevalence of specification searching and how to prevent it.

Who is primarily responsible for following privacy law in a research study? (Select all that apply) The IRB Lawyers Study participants The researchers Research assistants None of the above

The researchers IRBs and lawyers are responsible for knowing the law, as are research assistants, but primary responsibility for complying with the law falls on the researchers undertaking the study.

Which scenario represents the best process for transferring sensitive data among research partners? A researcher encrypts her files and then sends the files and passwords to the files via email A researcher stores the files in a shared Dropbox folder and shares with research partners A research manager stores unencrypted data on their computer and then sends an encrypted file to the research team The research team encrypts all files, stores them on Dropbox, and shares passwords over the phone All scenarios are sufficient

The research team encrypts all files, stores them on Dropbox, and shares passwords over the phone Explanation The scenario where the research team stores encrypted files on Dropbox and then shares passwords over the phone is the best method for transferring sensitive data. It is important to not write passwords in emails or other forms of communications.

What is included in an informed consent document? (Select all that apply) Communicates how the researchers will protect the confidentiality of the participants Explains the purpose of the study Assures complete confidentiality of the study participants Discusses how data will be destroyed at the end of the research Allows the participant to opt out of the study at any time Discusses the benefits of the research None of the above

a, b, e, f The informed consent document should be seen as an agreement between the researchers and the survey participants. At no point during data collection or thereafter should the researchers do something that goes against the signed informed consent clause.

A dataset is considered K-anonymous when: For each record, at least k-1 records contain the same identifying characteristics to make them indistinguishable There are k variables that, when removed, make the dataset fully de-identified K records contain identifying characteristics After some records are removed, k records remain in the dataset that are indistinguishable

a. For each record, at least k-1 records contain the same identifying characteristics to make them indistinguishable K-anonymous is an attribute designated to a dataset that contains identifying characteristics, but because enough records contain the same characteristics, the records cannot be identified.
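The definition above can be sketched as a simple check: group records by their identifying characteristics (often called quasi-identifiers) and confirm every group has at least k members. The field names and sample records below are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values that appears
    in `records` is shared by at least k records."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return all(count >= k for count in groups.values())


if __name__ == "__main__":
    # Hypothetical data: zip code and age bracket are quasi-identifiers.
    data = [
        {"zip": "02139", "age": "30-39", "income": 41000},
        {"zip": "02139", "age": "30-39", "income": 58000},
        {"zip": "02140", "age": "40-49", "income": 39000},
        {"zip": "02140", "age": "40-49", "income": 72000},
    ]
    print(is_k_anonymous(data, ["zip", "age"], k=2))
    print(is_k_anonymous(data, ["zip", "age"], k=3))
```

With this sample, each zip and age combination appears twice, so the data is 2-anonymous but not 3-anonymous.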

What are some ways in which data can be de-identified? (Select all that apply) a Rewriting b Redaction/Removal c Hiding d Partitioning e Encryption f None of the above

b, d, e Explanation Redaction, partitioning, and encryption are three ways data can be de-identified.

For what activities is IRB review and approval or exemption usually required? (Select all that apply) Exploratory research Piloting of the questionnaire, when no PII is collected Administering questionnaire for research, when no PII is collected Administering questionnaire when PII is collected Use of administrative data on study subjects

c, d, e. IRB is not required for exploratory research or when piloting questionnaires where PII data is not collected. All other activities require IRB review and approval.

The state of California government has the resources to provide low-cost homes for roughly 500,000 families who live below the poverty line in the San Francisco region, and wants to evaluate the impact of this program using a randomized evaluation. Which approach might optimize the ethical cost-benefit trade-off? Expand program eligibility to the poorest 1 million, and randomize half to receive the program Provide houses to the poorest 500,000 households, and assign the next 500,000 into the control group From the 500,000, randomly assign only 250,000 to receive housing, keeping the other half in the control group Expand program eligibility to the next poorest 100,000 households, and out of the 600,000 poorest households, randomize 500,000 to receive the program.

d. The method of expanding the eligibility by 100,000 and randomizing 500,000 to receive the program would lead to an expected 5/6ths of the original target population receiving the program. The other methods either result in a far lower proportion of the target population reached, or produce results that are far less reliable. Because the study has such a large sample size, it is likely adequately powered regardless of the allocation to treatment and control
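The 5/6 figure can be checked with a short calculation. This sketch assumes simple randomization in which every eligible household has the same chance of selection and the original target population is part of the eligible pool. The function name is hypothetical.

```python
from fractions import Fraction


def expected_share_treated(eligible, slots):
    """Expected fraction of any eligible subgroup that receives the
    program when `slots` places are randomized among `eligible`."""
    return Fraction(slots, eligible)


if __name__ == "__main__":
    # Randomize 500,000 places among the poorest 1,000,000.
    print(expected_share_treated(1_000_000, 500_000))
    # Randomize 250,000 places among the original 500,000.
    print(expected_share_treated(500_000, 250_000))
    # Randomize 500,000 places among the poorest 600,000.
    print(expected_share_treated(600_000, 500_000))
```

The two half-and-half designs reach only 1/2 of the original target population in expectation, while the fourth option reaches 5/6. The second option reaches everyone in the target group but is not randomized, so its comparison group is systematically less poor.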

Which of the following is NOT a basic requirement of IRBs? (Choose one) All personnel on the project are human-subjects certified Precise wording of consent forms must be approved Any breach in IRB compliance or any adverse events are reported to IRB committee De-identified data are made public at completion of research Annual updates of research status and subjects reached Research design including all surveys reviewed and approved before research starts

d. De-identified data are made public at completion of research While it may be a common best practice as a part of research transparency, IRBs do not require that the data are made public after research. All other options listed are mandated by IRBs.

Transforming data is one control used when making data publicly available. What type of transformation is done when a birthdate field is transformed to age (in years)? a. Local Suppression b. Partitioning c. Aggregation d. Perturbation e. Generalization None of the Above

e. generalization When you change all the values of a field such as birthdate to age, you are making the data less accurate. Thus, you are generalizing the data. While this helps protect your study participants, it could potentially affect your final results if someone were trying to replicate them.
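A minimal sketch of this transformation, assuming a hypothetical reference date on which ages are computed. The exact birthdate is replaced by a coarser value (whole years), so many different birthdates map to the same age.

```python
from datetime import date


def age_in_years(birthdate, on_date):
    """Generalize an exact birthdate to completed years of age."""
    years = on_date.year - birthdate.year
    # Subtract one if this year's birthday has not happened yet.
    if (on_date.month, on_date.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


if __name__ == "__main__":
    survey_date = date(2020, 6, 14)  # hypothetical reference date
    print(age_in_years(date(1990, 6, 15), survey_date))
    print(age_in_years(date(1990, 1, 2), survey_date))
```

Anyone replicating an analysis that originally used exact birthdates would need to know the reference date and accept the loss of precision.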

What website should you visit if you want to register a randomized controlled trial in the social sciences? www.registerstudies.org www.povertyactionlab.org www.socialscienceregistry.org www.aeaweb.org None of the above

www.socialscienceregistry.org Explanation The AEA Social Science Registry (www.socialscienceregistry.org) was created in 2013 for the purpose of registering randomized controlled trials in the social sciences.

