Boyd & Crawford, "Critical questions for Big Data"


2. Claims to objectivity and accuracy are misleading

- All researchers are interpreters of data. Data need to be imagined as data in the first instance, and this process of the imagination of data entails an interpretative base: 'every discipline and disciplinary institution has its own norms and standards for the imagination of data'.
- As computational scientists have started engaging in acts of social science, there is a tendency to claim their work as the business of facts and not interpretation. A model may be mathematically sound, an experiment may seem valid, but as soon as a researcher seeks to understand what it means, the process of interpretation has begun. This is not to say that all interpretations are created equal, but rather that not all numbers are neutral.
- Even if data is intended to be "objective", it can be subjective due to the interpreters of the data and the creators of the database.
- Large data sets from Internet sources are often unreliable, prone to outages and losses, and these errors and gaps are magnified when multiple data sets are used together.
- To make statistical claims about a data set, we need to know where the data is coming from; it is similarly important to know and account for the weaknesses in that data. Researchers must be able to account for the biases in their interpretation of the data. To do so requires recognizing that one's identity and perspective inform one's analysis.
- Interpretation is at the center of data analysis.
- Subjectivity of data: with social media data, there is a 'data cleaning' process: making decisions about which attributes and variables will be counted, and which will be ignored. This process is inherently subjective.
- Data error: data mining techniques could show a strong but spurious correlation between the changes in the S&P 500 stock index and butter production in Bangladesh.
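
The point about spurious correlation (the S&P 500 and butter production in Bangladesh) can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses no real market or production data, and the names `random_walk` and `strongest_spurious_match`, the series length, and the number of candidate series are all hypothetical choices made for illustration. It generates one random "target" series and many unrelated random series, then reports how strongly the best-matching unrelated series correlates with the target.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def random_walk(rng, length):
    """A series whose steps are independent random noise (no real data)."""
    level = 0.0
    walk = []
    for _ in range(length):
        level += rng.gauss(0, 1)
        walk.append(level)
    return walk


def strongest_spurious_match(n_candidates=1000, length=30, seed=0):
    """Search many unrelated series for the one that best 'explains' the target.

    Returns (best, median) absolute correlations across the candidates.
    All parameters are illustrative assumptions, not values from the source.
    """
    rng = random.Random(seed)
    target = random_walk(rng, length)
    scores = sorted(
        abs(pearson(target, random_walk(rng, length)))
        for _ in range(n_candidates)
    )
    return scores[-1], scores[len(scores) // 2]


best, median = strongest_spurious_match()
print(f"strongest |r| among unrelated series: {best:.2f}")
print(f"median    |r| among unrelated series: {median:.2f}")
```

Every candidate series here is unrelated to the target by construction, so any strong correlation the search turns up is an artifact of looking through many series, not evidence of a relationship.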

1. Big Data changes the definition of knowledge

- Big Data not only refers to very large data sets and the tools and procedures used to manipulate and analyze them, but also to a computational turn in thought and research.
- Like Fordism in the early 20th century, Big Data has emerged as a system of knowledge that is already changing the objects of knowledge, while also having the power to inform how we understand human networks and community.
- Big Data provides 'destabilising amounts of knowledge and information that lack the regulating force of philosophy'. Instead of philosophy, which Kant saw as the rational basis for all institutions, 'computationality might then be understood as an ontotheology, creating a new ontological "epoch" as a new historical constellation of intelligibility'.

4. Taken out of context, Big Data loses its meaning

- Data are not generic. There is value to analyzing data abstractions, yet retaining context remains critical, particularly for certain lines of inquiry. Context is hard to interpret at scale and even harder to maintain when data are reduced to fit into a model.
- Articulated networks are those that result from people specifying their contacts through technical mechanisms like email or cell phone address books, instant messaging buddy lists, 'Friends' lists on social network sites, and 'Follower' lists on other social media genres. The motivations that people have for adding someone to each of these lists vary widely, but the result is that these lists can include friends, colleagues, acquaintances, celebrities, friends-of-friends, public figures, and interesting strangers.
- Behavioral networks are derived from communication patterns, cell coordinates, and social media interactions. These might include people who text message one another, those who are tagged in photos together on Facebook, people who email one another, and people who are physically in the same space, at least according to their cell phone.
- Both behavioral and articulated networks have great value to researchers, but they are not equivalent to personal networks.
- Example: although contested, the concept of 'tie strength' is understood to indicate the importance of individual relationships. When mobile phone data suggest that workers spend more time with colleagues than their spouse, this does not necessarily imply that colleagues are more important than spouses. Measuring tie strength through frequency or public articulation is a common mistake: tie strength, and many of the theories built around it, is a subtle reckoning in how people understand and value their relationships with other people. Not every connection is equivalent to every other connection, and neither does frequency of contact indicate strength of relationship. Further, the absence of a connection does not necessarily indicate that a relationship should be made.

5. Just because it is accessible does not make it ethical

- Institutional Review Boards (IRBs): research ethics committees that oversee research on human subjects (emerged in the 1970s).
- The goal of IRBs is to provide a framework for evaluating the ethics of a particular line of research inquiry and to make certain that checks and balances are put into place to protect subjects. Practices like 'informed consent' and protecting the privacy of informants are intended to empower participants in light of earlier abuses in the medical and social sciences.
- In order to act ethically, it is important that researchers reflect on the importance of accountability: both to the field of research and to the research subjects.
- Accountability is a multi-directional relationship: there may be accountability to superiors, to colleagues, to participants, and to the public.
- Many ethics boards do not understand the processes of mining and anonymizing Big Data, let alone the errors that can cause data to become personally identifiable. Accountability requires rigorous thinking about the ramifications of Big Data, rather than assuming that ethics boards will necessarily do the work of ensuring that people are protected.
- There are also significant questions of truth, control, and power in Big Data studies: researchers have the tools and the access, while social media users as a whole do not. Their data were created in highly context-sensitive spaces, and it is entirely possible that some users would not give permission for their data to be used elsewhere. Many are not aware of the multiplicity of agents and algorithms currently gathering and storing their data for future use. Researchers are rarely in a user's imagined audience. Users are not necessarily aware of all the multiple uses, profits, and other gains that come from information they have posted.
- There is a considerable difference between being in public (i.e. sitting in a park) and being public (i.e. actively courting attention).
- Example: in 2006, a Harvard-based research group started gathering the profiles of 1,700 college-based Facebook users to study how their interests and friendships changed over time (Lewis et al. 2008). These supposedly anonymous data were released to the world, allowing other researchers to explore and analyze them. What other researchers quickly discovered was that it was possible to de-anonymize parts of the data set, compromising the privacy of students, none of whom were aware their data were being collected.

3. Bigger data are not always better data

- Just because Big Data presents us with large quantities of data does not mean that methodological issues are no longer relevant. Understanding sample, for example, is more important now than ever.
- Twitter does not represent 'all people', and it is an error to assume 'people' and 'Twitter users' are synonymous: they are a very particular sub-set. Neither is the population using Twitter representative of the global population.
- Nor can we assume that accounts and users are equivalent. Some users have multiple accounts, while some accounts are used by multiple people. Some people never establish an account, and simply access Twitter via the web. Some accounts are 'bots' that produce automated content without directly involving a person.
- Furthermore, the notion of an 'active' account is problematic. Twitter Inc. has revealed that 40 percent of active users sign in just to listen (Twitter 2011). The very meanings of 'user' and 'participation' and 'active' need to be critically examined.
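
The gap between accounts and users can be made concrete with a toy example. The records below are entirely hypothetical (invented handles and names, not real Twitter data) and only sketch why a count of accounts is not a count of people: one person holds two accounts, one account is shared by two people, and one account is a bot with no person behind it.

```python
# Hypothetical (account, person) ownership records, invented for illustration.
# A person of None marks a bot account with no human directly behind it.
ownership = [
    ("@alice_work", "alice"),
    ("@alice_home", "alice"),   # one person, two accounts
    ("@team_feed", "bob"),
    ("@team_feed", "carol"),    # one account, two people
    ("@weather_bot", None),     # automated account
]

accounts = {account for account, _ in ownership}
people = {person for _, person in ownership if person is not None}

print(f"accounts observed: {len(accounts)}")
print(f"people behind them: {len(people)}")
```

People who read without ever creating an account do not appear in such records at all, so even the person-level count understates the audience.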

6. Limited access to Big Data creates new digital divides

- 'Only social media companies have access to really large social data - especially transactional data. An anthropologist working for Facebook or a sociologist working for Google will have access to data that the rest of the scholarly community will not'. Some companies restrict access to their data entirely; others sell the privilege of access for a fee; and others offer small data sets to university-based researchers.
- Money dictates access to data, causing a wealth-based knowledge gap.
- Those without access can neither reproduce nor evaluate the methodological claims of those who have privileged access.
- Top-tier, well-resourced universities will be able to buy access to data, and students from the top universities are the ones most likely to be invited to work within large social media companies. Those from the periphery are less likely to get those invitations and develop their skills. The result is that the divisions between scholars will widen significantly.
- When computational skills are positioned as the most valuable, questions emerge over who is advantaged and who is disadvantaged in such a context. This, in its own way, sets up new hierarchies around 'who can read the numbers', rather than recognizing that computer scientists and social scientists both have valuable perspectives to offer.
- Significantly, this is also a gendered division. Most researchers who have computational skills at the present moment are male and, as feminist historians and philosophers of science have demonstrated, who is asking the questions determines which questions are asked.
- The difficulty and expense of gaining access to Big Data produce a restricted culture of research findings. Large data companies have no responsibility to make their data available, and they have total control over who gets to see them. Big Data researchers with access to proprietary data sets are less likely to choose questions that are contentious to a social media company if they think it may result in their access being cut.
- New kind of digital divide: the Big Data rich and the Big Data poor.
- Three classes of people in the realm of Big Data: 'those who create data (both consciously and by leaving digital footprints), those who have the means to collect it, and those who have expertise to analyze it'. We know that the last group is the smallest, and the most privileged: they are also the ones who get to determine the rules about how Big Data will be used, and who gets to participate.

- Big Data is less about data that is big than it is about a capacity to search, aggregate, and cross-reference large data sets.

- Social systems are regulated by four forces: market, law, social norms, and architecture, or, in the case of technology, code. When it comes to Big Data, these four forces are frequently at odds. The market sees Big Data as pure opportunity: marketers use it to target advertising, insurance providers use it to optimize their offerings, and Wall Street bankers use it to read the market.

