Week 2 Ensuring data integrity
Bias
A preference in favor of or against a person, group of people, or thing
Open data
Data that can be accessed, updated, reused, and exchanged by anyone and everyone.
Openness
Free access, usage, and sharing of data
GDPR (General Data Protection Regulation)
A European Union regulation governing how the personal data of individuals in the EU is collected, processed, and shared
Which of the following are usually good data sources? Select all that apply.
- Governmental agency data - Academic papers - Vetted public datasets
Ownership
Individuals own the raw data they provide and they have primary control over its usage, how it's processed, and how it's shared
What are the main benefits of open data? Select all that apply.
- Open data makes good data more widely available - Open data combines data from different fields of knowledge
What aspect of data ethics promotes the free access, usage, and sharing of data?
Openness
Privacy
Preserving a data subject's information and activity any time a data transaction occurs
ROCCC
Reliable Original Comprehensive Current Cited
Kaggle's datasets and Data Explorer allow you to do which tasks?
- Search for datasets - Upload your own datasets - Access datasets
What types of data are most often anonymized?
- Telephone numbers - Names - License plates and license numbers - Social security numbers - IP addresses - Medical records - Email addresses - Photographs - Account numbers
Data Interoperability
The ability of data systems and services to openly connect and share data
A data analyst is analyzing sales data for the newest version of a product. They use third-party data about an older version of the product. For what reasons is this inappropriate for their analysis? Select all that apply.
- The data is not current - The data is not original
Data anonymization
The process of protecting people's private or sensitive data by eliminating identifying information
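As a rough illustration of that definition, the sketch below drops identifying fields from a single record. The field names and the sample record are hypothetical, chosen only for this example; real datasets will differ, and this is one simple approach rather than a complete anonymization method.

```python
# Hypothetical set of identifying field names (an assumption for illustration).
IDENTIFYING_FIELDS = {"name", "email", "phone", "ssn", "ip_address"}


def anonymize(record: dict) -> dict:
    """Return a copy of the record with the identifying fields removed."""
    return {key: value for key, value in record.items()
            if key not in IDENTIFYING_FIELDS}


# Made-up record used only to demonstrate the function.
record = {
    "name": "Ada Example",
    "email": "ada@example.com",
    "age_group": "30-39",
    "purchase_total": 42.5,
}

anonymized = anonymize(record)
print(anonymized)
```

The original record is left unchanged; only the returned copy has the identifying fields stripped.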
Observer bias (experimenter bias / research bias)
The tendency for different people to observe things differently
Interpretation bias
The tendency to always interpret ambiguous situations in a positive or negative way
Confirmation bias
The tendency to search for or interpret information in a way that confirms pre-existing beliefs
An unbiased sample is representative of the population being measured. Which of the following helps ensure unbiased sampling?
Using random sampling during data collection
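A minimal sketch of random sampling, assuming the whole population is available as a list. The function name, the population of 100 numbered IDs, and the seed are all assumptions made for illustration.

```python
import random


def draw_random_sample(population, sample_size, seed=None):
    """Pick sample_size distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, sample_size)


# Hypothetical population: 100 respondent IDs.
population = list(range(1, 101))

sample = draw_random_sample(population, 10, seed=0)
print(sample)
```

Because every member has the same chance of selection, no subgroup is favored by the selection step itself.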
Data Ethics
Well-founded standards of right and wrong that dictate how data is collected, shared, and used
Ethics
Well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues
Unbiased sampling
When a sample is representative of the population being measured
Aspects of data ethics
- ownership - transaction transparency - consent - currency - privacy - openness
Which of the following are examples of sampling bias? Select all that apply.
- A clinical study includes three times more men than women. - A survey of high-school-age students does not include homeschooled students. - A national election poll only interviews people with college degrees.
Data bias
A type of error that systematically skews results in a certain direction
Transaction transparency
All data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data.
Consent
An individual's right to know explicit details about how and why their data will be used before agreeing to provide it
There are 50 students in a class. A data analyst wants to know if a majority of students like the instructor. They decide to survey the 15 students who earned an A in the class because these students were clearly paying attention to the instructor. Which of the following statements best describes this sample?
Biased
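One way to see why the sample is biased is to compare the share of A students in the sample with their share in the class. The sketch below uses only the numbers given in the question (50 students, 15 of whom earned an A); the "other" label for every non-A grade and the variable names are assumptions for illustration.

```python
CLASS_SIZE = 50
A_STUDENTS = 15

# One grade label per student; "other" stands for any non-A grade,
# since the question gives no further breakdown.
roster = ["A"] * A_STUDENTS + ["other"] * (CLASS_SIZE - A_STUDENTS)


def share_of_a(group):
    """Fraction of the group that earned an A."""
    return group.count("A") / len(group)


# The analyst's sample: only the students who earned an A.
biased_sample = [grade for grade in roster if grade == "A"]

population_share = share_of_a(roster)
sample_share = share_of_a(biased_sample)

print(f"A students in class:  {population_share:.0%}")
print(f"A students in sample: {sample_share:.0%}")
```

A students make up 30% of the class but 100% of the sample, so the sample is not representative of the population being measured.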
Before completing a survey, an individual acknowledges reading information about how and why the data they provide will be used. What is this concept called?
Consent
A data analyst removes personally identifying information from a dataset. What task are they performing?
Data anonymization
Universal participation is a standard of open data. What are the key aspects of universal participation? Select all that apply.
- Everyone must be able to use, re-use, and redistribute open data. - No one can place restrictions on data to discriminate against a person or group.
Currency
Individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions
To determine if a data source is cited, you should ask which of the following questions? Select all that apply.
- Is this dataset from a credible source? - Who created this dataset?
Which of the following terms are also ways of describing observer bias? Select all that apply.
- Research bias - Experimenter bias
Fill in the blank: _____ states that all data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data.
Transaction transparency
Sampling bias
When a sample isn't representative of the population as a whole
Fill in the blank: The tendency to search for or interpret information in a way that validates pre-existing beliefs is _____ bias.
confirmation
Personally identifiable information (PII)
Information about an individual that identifies, links, relates, or describes them