Data and Society Combined
ideal user assumption
-data generated by people who express themselves honestly through their personal accounts -fails to hold under many circumstances
deontology (salganik)
-focuses on ethical duties independent of their consequences -respect for persons rooted in this -focus on means
develop a code of conduct (10 rules)
-internalizing debates over ethics is key for successful research -public attention to ethical use of data shouldn't be avoided -provides guidance in peer review -visible case of unethical research can bring problems to an entire field
law enforcement (brayne)
becomes involved once criminal incident occurs
big data (Herschel)
capturing, storing, sharing, evaluating, acting upon info from humans and devices
dragnet surveillance (brayne)
collect data on everyone, increased monitoring of groups previously exempt
predictive, reactive or explanatory
data are increasingly used for ________, rather than _________ purposes
stratified surveillance (brayne)
differentially surveilling individuals according to their risk score
human subject (def 1)
living individual who participates in research investigation as a recipient of an item regulated by the FDA, as a control, or on whose specimen an investigational device is used
merged
previously separate data systems are ___________
utilitarianism and big data (Herschel)
-acts and rules assessed to see where good and bad weighed on scale -would have to quantify plusses and minuses of big data consequences -ambiguity inherent in trying to identify pros and cons
justice
-addresses distribution of burdens and benefits of research -should not be the case that one group in society bears the costs of research while another reaps its benefits
justice (Salganik)
-addresses distribution of burdens and benefits of research -one group shouldn't bear costs while another reaps benefits -researchers shouldn't be allowed to intentionally prey on the powerless -views of this around 1990 went from protection to access -often interpreted to raise questions about appropriate compensation
LAPD case study (brayne)
-at forefront of data analytics - invests heavily in data collection and analysis -involved in multiple high-profile scandals in 90s; in response, the Department of Justice and the LAPD entered into a consent decree that mandates creation and oversight of new data-driven employee risk management system -2011 began using platform for compiling and analyzing massive and disparate data -shift from traditional to big data surveillance associated w migration of law enforcement operations toward intelligence activities -use risk scores and field interview cards -shift from reactive to proactive, problem-oriented policing strategies
minimal risk standard
-attempts to benchmark risk of particular study against risks participants undertake in their daily lives -can make decision if something meets this standard even if don't know the absolute level of risk
Richard Herschel, Virginia M. Miori
-big data enables collection and use of massive amounts of data from man and machine -data characterized in terms of volume, variety, velocity, veracity, variability, complexity -four ethical theories talked about: kantianism, utilitarianism, social contract theory, virtue theory -digital media increasingly more data intensive and media rich -big data requires examination of those that have control over it bc can be used to target and manipulate people
more data are coming
-big data grow into more domains -also reach into past as libraries digitize their collections -linkages between different big data will become more common
models will become more generic
-creating generic models and making them available to public -let researchers use pretrained machine learning models on their own data -use big data to make most effective models, then make those models standard for processing unstructured data -generic not always better than specified
big data (brayne)
-data environment characterized by it being vast, fast, disparate, and digital -analysis of large amounts of info -high frequency observations, fast data processing -comes from wide range of institutional sensors, merging of previously separate data
beneficence (Salganik)
-do no harm - shouldn't injure person regardless of possible benefits to others -maximize possible benefits and minimize possible harms -researchers should do risk/ benefit analysis and make decisions about whether risks and benefits strike appropriate ethical balance
virtue ethics
-emphasizes moral character rather than duties, rules, or the consequences of actions -character and actions of the people who deploy and use big data -also considering the intended and unintended effects of their actions on others
kantianism (Herschel)
-ethical theory concerned not w what we do but what we should do -dutifulness reflects good will -dutiful person acts way they do bc of moral rule -rules are paramount -everyone held to same standard and there are clear guidelines for appropriate behavior -not outcome that matters but rule behind the action
respect for law and public (Salganik)
-explicitly encourages researchers to take wider view, include law in considerations -compliance: researchers should attempt to identify and obey relevant laws, contracts, terms of service -transparency-based accountability: researchers should be clear about goals, methods, results at all stages of research, take responsibility -the IRB is a floor, not a ceiling - just filling out forms and following rules isn't enough, ethical responsibility still lies w researcher
consequentialism (utilitarianism) (salganik)
-focuses on taking actions that lead to better states in the world -beneficence rooted in this -focus on ends
consider the strengths and limitations of your data (big does not automatically mean better) (10 rules)
-ground datasets in proper context including conflicts of interest -during data acquisition important to understand source and rules and regulations of data gathered -being mindful of data's context lets you clarify when data and analysis are working or not -be sensitive to potential multiple meanings of data
engage with the broader consequences of data and analysis practices (10 rules)
-how might big data research lessen environmental impact of data analytics work -should researchers take lead in asking cloud storage to shift to sustainable energy -big data research has society-wide effects
different data are coming
-image processing can analyze pictures now -tools to analyze these data increasingly being made available
practice ethical data sharing (10 rules)
-in some cases sharing data is expectation and key part of research -asking participants for broad, not narrowly structured, consent makes it easier to share data -even if broad consent gained should still consider best interest of participant -people followed by data clouds collected under mandatory terms of service -burden of ethical use and sharing placed on researcher
respect for persons (Salganik)
-individuals should be treated as autonomous -individuals w diminished autonomy should be entitled to additional protections -researchers shouldn't do things to people w/o consent -get informed consent when possible
qualitative approaches to big data
-innovative approaches weaving together qualitative methods and computational approaches -searching and sorting archives
design your data and systems for auditability (10 rules)
-internal auditing processes flowing easily into audit systems keep track of factors that could contribute to problems -automated testing processes for assessing outcomes can strengthen research -clearly document when decisions are made and backtrack to earlier dataset if needed
Sarah Brayne
-intersection of two structural developments: growth of surveillance and rise of big data -adoption of big data analytics facilitates amplification of prior surveillance practices -data used for predictive purposes, rather than reactive or explanatory -automatic alert systems make it possible to surveil very large number of people -social consequences of big data surveillance for law and social inequality -some individuals, groups, areas more surveilled than others, different populations surveilled for different purposes -two theories for why many institutions adopted big data surveillance: technical/ rational perspective and institutional perspective -US criminal justice surveillance increased dramatically - incarceration, parole and probation -data-driven decision-making has become systematically incorporated into law enforcement practices in recent decades
guard against the reidentification of your data (10 rules)
-when datasets thought to be anonymized are combined w other variables, unexpected reidentification can result -metadata associated w digital activity useful in identifying individuals -difficult to recognize these vulnerable points
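One way to look for the vulnerable points described above is to count how many records share each combination of indirect identifiers (a k-anonymity style check). The sketch below is only an illustration: the records, the field names (`zip`, `birth_year`, `sex`), and both helper functions are hypothetical and do not come from the 10 rules paper.

```python
from collections import Counter

def _group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def smallest_group_size(records, quasi_identifiers):
    """Return k, the size of the smallest group of records that look alike."""
    return min(_group_sizes(records, quasi_identifiers).values())

def unique_records(records, quasi_identifiers):
    """Return records whose quasi-identifier combination appears exactly once."""
    sizes = _group_sizes(records, quasi_identifiers)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] == 1]

# Hypothetical "anonymized" records: names removed, indirect identifiers kept.
records = [
    {"zip": "90012", "birth_year": 1985, "sex": "F"},
    {"zip": "90012", "birth_year": 1985, "sex": "F"},
    {"zip": "90012", "birth_year": 1990, "sex": "M"},
    {"zip": "90013", "birth_year": 1985, "sex": "F"},
]
QUASI = ["zip", "birth_year", "sex"]

k = smallest_group_size(records, QUASI)
at_risk = unique_records(records, QUASI)
print(f"k = {k}; {len(at_risk)} of {len(records)} records are unique on {QUASI}")
```

A record that is unique on these fields could be matched against an outside dataset holding the same fields plus a name, which is the combining-with-other-variables risk the card describes.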
automated alerts
________ make it possible to surveil many people
human subject (def 2)
a living individual about whom a researcher obtains data through intervention or interaction w the individual, or identifiable private information
respect for a person
all persons have moral worth and should be treated w dignity
power analysis
allows researchers to calculate sample size they need to reliably detect effect of given size
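As a rough illustration of the calculation this card refers to, the sketch below uses the standard normal-approximation formula for comparing two group means, n per group = 2 * ((z_(1-alpha/2) + z_power) / d)^2, where d is the standardized effect size. The function name and its default values (alpha = 0.05, power = 0.80) are assumptions chosen for the example, not something the notes specify.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference (Cohen's d); this uses the
    normal approximation, so exact t-based answers are slightly larger.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

for d in (0.8, 0.5, 0.2):
    print(f"effect size {d}: about {sample_size_per_group(d)} participants per group")
```

Smaller effects need much larger samples, which is why a power analysis helps a researcher avoid running a study larger than the question requires.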
stakeholder
any group of individuals who can affect or are affected by the achievement of the organization's objective
big data hubris
belief that volume can solve all problems
technical/ rational perspective (brayne)
big data is means to improve efficiency through improving prediction, filling analytic gaps, effectively allocating scarce resources
virtue (Herschel)
character trait that is well entrenched in possessor, makes that person good
dragnet surveillance practices
collect data on everyone, rather than merely individuals under suspicion
surveillance (brayne)
collection, recording, classification of info about people, processes, institutions
convenience census
complete record of certain set of individuals or behaviors matching certain criteria
variability (Herschel)
data flows can be inconsistent w periodic peaks
predictive purposes
data increasingly used for ____________
complexity (Herschel)
data is structured and unstructured from multiple sources
walled garden approach (Salganik)
data shared w people who meet certain criteria and who agree to be bound by certain rules
query-based system (brayne)
databases to which users submit requests for info in form of a search
direct police contact
datasets now have info on individuals who haven't had __________
direct police contact
datasets now include info on individuals who have not had any __________
moral virtues (Herschel)
deep-seated habits or dispositions formed through repetition of virtuous actions over time
intellectual virtue (Herschel)
derived from reasoning and truth
unintended use paradox (Herschel)
different sets of data that wouldn't previously have been considered as having privacy concerns being combined in ways that threaten privacy
risk score
discretionary assessments of risk are supplemented and quantified using __________
risk scores
discretionary assessments of risk supplemented and quantified with __________
virtue ethics (Herschel)
emphasizes moral character rather than duties, rules, or consequences
deontological argument for informed consent (salganik)
focus on researcher's duty to respect autonomy of participants
intelligence (brayne)
fundamentally predictive: gathering data, identifying suspicious patterns, locations, etc.
alert based system (brayne)
get real-time notifications when certain variables are present in the data
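The difference from a query-based system is that nobody has to run a search: standing rules are checked against each new record as it arrives. The toy sketch below illustrates only that mechanism; the `AlertRule` class, the rule names, and the record fields are hypothetical and are not drawn from Brayne's description of any real platform.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AlertRule:
    """A standing rule: a name plus a test applied to every incoming record."""
    name: str
    matches: Callable[[dict], bool]

def check_alerts(record, rules):
    """Return the names of all rules that fire for one incoming record."""
    return [rule.name for rule in rules if rule.matches(record)]

# Hypothetical rules, registered once and then applied automatically.
rules = [
    AlertRule("plate_watchlist", lambda r: r.get("plate") in {"ABC123"}),
    AlertRule("area_of_interest", lambda r: r.get("area") == "zone_7"),
]

# Hypothetical stream of incoming records.
incoming = [
    {"plate": "XYZ789", "area": "zone_2"},
    {"plate": "ABC123", "area": "zone_7"},
]

for record in incoming:
    fired = check_alerts(record, rules)
    if fired:
        print(f"alert for {record}: {fired}")
```

Because every record is checked against every rule with no human initiating a search, the same small set of rules can cover an arbitrarily large population.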
context-relative informational norms
govern flow of info in specific settings -actors (subject, sender, recipient) -attributes (types of info) -transmission principles (constraints under which info flows) -a difference in any of these three yields a different set of norms for the situation
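The three parameters can be written down as a small record type so that two flows of the same information can be compared parameter by parameter. This is a minimal sketch under stated assumptions: the `InformationFlow` class, its field names, and the medical example are hypothetical illustrations, not a formal model from the readings.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class InformationFlow:
    """One flow of information, described by the three kinds of parameters."""
    subject: str                 # actor: whom the info is about
    sender: str                  # actor: who passes it on
    recipient: str               # actor: who receives it
    attribute: str               # type of info
    transmission_principle: str  # constraint under which the info flows

def differing_parameters(a, b):
    """List the parameters on which two flows differ, in declaration order."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]

# Hypothetical example: the same attribute moving in two different contexts.
to_doctor = InformationFlow("patient", "patient", "doctor",
                            "health condition", "confidentiality")
to_employer = InformationFlow("patient", "doctor", "employer",
                              "health condition", "sold w/o consent")

print(differing_parameters(to_doctor, to_employer))
```

The attribute is identical in both flows, yet the sender, recipient, and transmission principle differ, so a different set of norms applies to the second flow.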
consequentialist argument for informed consent (salganik)
helps prevent harm to participants
mass surveillance in the US
law enforcement databases now include facial recognition of 117 million people (about 1 in 2 adults)
staged trials
move up step by step (ex. testing effectiveness of new drug)
palantir
one of premier platforms for compiling and analyzing massive and disparate data by law enforcement and intelligence agencies
data from multiple platforms will become standard
possible and easier for researchers to perform studies on different platforms
informational risk (Salganik)
potential for harm from disclosure of info
anonymization (Salganik)
process of removing obvious personal identifiers, much less effective than people realize
systematically surveil
proliferation of automated alerts makes it possible to ________ an unprecedentedly large number of people
beneficence
researchers should undertake two separate processes: a risk/ benefit analysis and then a decision about whether the risks and benefits strike an appropriate ethical balance
ethical-response surveys
researchers present brief description of proposed research project then ask: -if someone you cared about was candidate participant would you want them to be included -do you believe researchers should be allowed to proceed w this experiment
common rule (Salganik)
set of regulations governing human subjects research
ethics
study of what it means to do the right thing
surveillance
systematic investigation or monitoring of the actions or communications of one or more persons
dataveillance
systematic use of personal data systems in the investigation or monitoring of the actions or communications of one or more persons
function creep (brayne)
tendency of data collected for one purpose to be used for another, unintended one
research ethics
this suggests that researchers should make their studies as small as possible
respect for law and public
transparency-based accountability - researchers should be clear about their goals, methods, and results at all stages of their research and take responsibility for their actions
risk/ benefit analysis (Salganik)
understanding and improving risks and benefits of study
confirmation bias (Herschel)
when data selectively used to confirm preexisting viewpoint while disregarding data that refutes it
mass surveillance in the UK
-one CCTV (closed circuit television) per 12 people
David Lazer and Jason Radford
-issue w data is who and what get represented -certain big data can be vulnerable to changes in data generation process -scale of big data creates illusion they contain all relevant info on all relevant people -twitter has become 'model organism' for social media scholars -generalizability is a question of reference -results from one pop. of users don't necessarily apply to others -fix this by using data from multiple sources to validate findings -big data systems susceptible to various kinds of error and misappropriation -more and different data are coming, models will become more generic, data from multiple platforms will become standard -qualitative approaches to big data -methodological integration - big data increasingly integrated w existing research methods in sociology
debate the tough, ethical choices (10 rules)
-many big data ethical issues outside of governance mandate of IRBs -debate issues w groups of peers -precondition of formal ethics rules is capacity to have such debates -if debate done well provides means to understand ethical issues from range of perspectives
IRB and data science
-may involve human subject as individual or may affect much wider group of people -moves ethical inquiry away from traditional harms like physical pain to less tangible concepts such as info privacy impact and data discrimination -fundamentally changes our understanding of research data to be infinitely connectable, indefinitely re-purposable, continuously updatable and easily removed from the context of collection
kantianism and big data (Herschel)
-organizations w big data not respecting autonomy of people, using personal data to further self-interest -by default, people's privacy compromised for gain of another -no one truly has ability to determine how their data is actually shared and used -should everyone assent to rule that states everyone's info can be shared w/o their permission -challenge of rights and fair treatment of individual
IRB origin
-originally developed in direct response to research abuse: -post-WWII doctors' trial -Tuskegee syphilis study -participants had the disease but weren't told, and weren't treated even after a cure became available
social contract theory (Herschel)
-person's moral and/ or political obligations dependent upon contract or agreement people have made to form the society in which they live -people understand that must cooperate and agree to follow certain guidelines to gain benefits of social living -chose rationality over natural selfish instincts
acknowledge that data are people and can do harm (10 rules)
-places difficulty of disassociating data from specific individuals front and center -seemingly benign data can have sensitive and private info -data seemingly having nothing to do w people could impact their lives in unexpected ways -harm also when datasets about pop-wide effects used to shape lives or stigmatize groups
recognize that privacy is more than a binary value (10 rules)
-privacy is contextual and situational -depends on nature of data, context in which created and obtained, expectations and norms of those affected -social media utilizing locations to push info or tracking it for intelligence has been seen as breaches of privacy -privacy extends to groups
american statistical association's ethical guidelines
-professional integrity and accountability -integrity of data and methods -responsibilities to the science/ public -instituted for protection and support of statistician
operation laser
-program to identify and deter people likely to commit crimes -premised on idea that small percentage of high-impact players are disproportionately responsible for most violent crime -list distributed to patrolmen w orders to monitor and stop the pre-crime suspects as often as possible -at each contact officers fill out field interview card
institutional perspective (brayne)
-questions assumption that organizational structures stem from rational processes -role of culture - organizations operate in technically ambiguous fields, adopt big data due to wider beliefs of what organizations should be doing -big data may confer legitimacy
know when to break these rules (10 rules)
-recognize when it is appropriate to stray (ex. in times of natural disaster may be important to temporarily put aside questions to serve larger public good) -review regulatory expectations and legal demands associated w protection of privacy -ethics often about finding good or better, but not perfect
utilitarianism (Herschel)
-right or wrong based on consequences of act or rule -right act is one that produces greatest happiness for community or society -wrong act decreases total happiness of affected parties -right moral rule of conduct is one where if adopted by everyone will have greatest net increase in happiness
mass surveillance
-surveillance of large groups of people -reason for investigation or monitoring is to identify individuals who belong to some particular class of interest to the surveillance organization
mass dataveillance
-systematic use of personal data systems in investigation or monitoring of actions or communications of groups of people -reason for investigation or monitoring is to identify individuals who belong to some particular class of interest to surveillance organization
utilitarianism theory
-theory of the good is fundamental -look at greater good/ greater benefits -examines right or wrong based on the consequences of an act or rule -the right one is one that produces the greatest happiness for a community or society -a wrong act decreases total happiness of the affected parties -focus on ends
deontological theory
-theory of the right is fundamental -choices should be made based on the rules -everyone is treated w dignity -obligation is independent of value -obligation based on reason alone -treat people as ends in themselves, never only as means to an end
Matt Salganik
-uncertainty about appropriate conduct of digital-age social research -ethical uncertainty had chilling effect preventing ethical research from happening -if can develop ethical norms and standards shared by researchers and public can harness capabilities of digital age in responsible and beneficial ways -norms around abstract concepts like privacy still actively debated, no uniform consensus -four principles: respect for persons, beneficence, justice, respect for law and public interest