Big Data pt 3.
The dichotomy of big data: utopian or dystopian
+Big data is seen as a powerful tool to address various societal ills, offering the potential of new insights into areas as diverse as cancer research, terrorism, and climate change. -Big data is seen as a troubling manifestation of Big Brother, enabling invasions of privacy, decreased civil freedoms, and increased state and corporate control.
Big Data Definition
1. Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large data sets. 2. Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims. 3. Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy.
1. Big Data changes the definition of knowledge
Big Data refers not only to very large data sets and the tools and procedures used to manipulate and analyze them, but also to a computational turn in thought and research. Big Data has emerged as a system of knowledge that is already changing the objects of knowledge, while also having the power to inform how we understand human networks and community. It offers "the capacity to collect and analyze data with an unprecedented breadth and depth and scale." -Numbers don't speak for themselves. Other methods for ascertaining why people do things, write things, or make things are lost in the sheer volume of numbers. But the specialized tools of Big Data have their own inbuilt limitations and restrictions. •Facebook and Twitter archiving and search functions
The advance of knowledge via a leap in scale and scope in relation to a given object or phenomenon
•Data belongs to the object •Taking comes before interpreting •The most atomizable useful unit of analysis
the "crisis in empirical sociology"
Data sets that were once obscure and difficult to manage (and thus only of interest to social scientists) are now being aggregated and made easily accessible to anyone who is curious, regardless of their training.
3. Bigger data are not always better data
Having large quantities of data does not mean that methodological issues are no longer relevant. Twitter example: •User demographics •Filtering by Twitter •Privacy constraints. Understanding the sample is more important than ever. Example: gathering Twitter statistics. Work on Twitter data, which scholars have used to examine a wide variety of patterns such as media event engagement and conversational interactions, tends to focus on "how many" people were studied. Twitter does not represent all people: "all people" and "Twitter users" are not synonymous. **Big Data and whole data are not the same** It is not clear which tweets are included in these different data streams or what sampling them represents. Small data is also important and can give insight as well.
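The point that a bigger sample is not automatically a better one can be checked with a small simulation. The sketch below is only illustrative: the population, the age split, the opinion rates, and the platform-usage rates are all invented assumptions, and `compare_samples` is a hypothetical helper, not anything from the readings. It compares a large self-selected "platform" sample against a much smaller random sample of the same population.

```python
import random


def compare_samples(seed=0, population_size=100_000, small_n=1_000):
    """Compare a large self-selected sample with a small random sample.

    Every rate used here is made up for illustration; none of them is a
    real statistic about Twitter or any other platform.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(population_size):
        young = rng.random() < 0.30                      # assumed age split
        supports = rng.random() < (0.70 if young else 0.40)    # assumed opinion rates
        on_platform = rng.random() < (0.60 if young else 0.10)  # assumed usage rates
        population.append((supports, on_platform))

    def rate(people):
        # Share of people in the group who hold the opinion.
        return sum(supports for supports, _ in people) / len(people)

    # "Big" sample: everyone who happens to use the platform.
    platform_users = [person for person in population if person[1]]
    # "Small" sample: drawn uniformly at random from the whole population.
    random_sample = rng.sample(population, small_n)

    return {
        "true_rate": rate(population),
        "big_platform_rate": rate(platform_users),
        "big_platform_n": len(platform_users),
        "small_random_rate": rate(random_sample),
        "small_random_n": small_n,
    }


if __name__ == "__main__":
    for name, value in compare_samples().items():
        print(f"{name}: {value}")
```

Under these assumptions the platform sample is many times larger than the random one, yet it overstates the opinion rate because platform users skew toward one group. The small random sample lands much closer to the population value.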
4. Taken out of context, Big Data loses its meaning.
Researchers look to Twitter and other social media to analyze connections between messages and accounts, making claims about social networks. Yet the relations displayed through social media are not necessarily equivalent to the sociograms and kinship networks that sociologists and anthropologists have been investigating since the 1930s. Big Data introduces two new types of network. Articulated networks: those that result from people specifying their contacts through technical mechanisms like email or cell phone address books. The motivations that people have for adding someone to each of these lists vary widely, but the result is that these lists can include friends, colleagues, acquaintances, celebrities, friends of friends, etc. Behavioral networks: derived from communication patterns, cell coordinates, and social media interactions. These might include people who text message each other, people tagged together in pictures on Facebook, people who email each other, and people who are physically in the same space according to their cell phones. Frequency of contact does not indicate strength of relationship.
Articulated Networks
Result from people specifying their contacts through technical mechanisms, e.g. email and cell phone contacts.
6. Limited access to Big Data creates new digital divides
The current ecosystem around Big Data creates a new kind of digital divide: the Big Data rich and the Big Data poor. •Those without access to Big Data can neither reproduce nor evaluate the methodological claims of those who have privileged access. Collecting and analyzing big swathes of data is a skill set generally restricted to those with a computational background. Manovich (2011) writes that there are three classes of people in the realm of Big Data: "those who create data (both consciously and by leaving digital footprints), those who have the means to collect it, and those who have expertise to analyze it." The last group is the smallest and the most privileged. Those with money, or those inside the company, can produce a different type of research than those outside. The division between scholars (or whoever has access) and everyone else will widen significantly. It is also a question of skills. It also produces a restricted culture of research findings. Large data companies have no responsibility to make their data available, and they have total control over who gets to see them. The result is a divide between the Big Data rich and the Big Data poor.
2. Claims to objectivity and accuracy are misleading
The scientific method attempts to remove itself from the subjective domain through the application of a dispassionate process whereby hypotheses are proposed and tested, eventually resulting in improvements in knowledge. •Nonetheless, claims to objectivity are necessarily made by subjects and are based on subjective observations and choices. We interpret our data. This is not to say that all interpretations are created equal, but rather that not all numbers are neutral. •Design decisions and data cleaning •Data errors •Interpreting results. Researchers must recognize their own identity and perspective. Big Data offers the humanistic disciplines a new way to claim the status of quantitative science and objective methods. But Big Data is still subjective, and what it quantifies does not necessarily have a closer claim on objective truth. There remains a mistaken belief that qualitative researchers are in the business of interpreting stories and quantitative researchers are in the business of producing facts. In this way, Big Data risks re-inscribing established divisions in the long-running debates about scientific method and the legitimacy of social science and humanistic inquiry. All researchers are interpreters of data, so they cannot remove their subjective view from their observations. The design decisions that determine what will be measured also stem from interpretation. For example, social media data goes through a 'data cleaning' process: deciding which attributes and variables will be counted and which will be ignored. This process is inherently subjective. There is also the issue of data errors. Large data sets are often unreliable and prone to outages and losses, and these errors and gaps are magnified when multiple data sets are used together. This often results in seeing patterns where none actually exist.
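One way patterns can appear where none exist is sheer quantity: the more variables are compared, the more likely some pair looks related by chance. The sketch below is a hedged illustration of that idea, not a method from the readings. The series are pure random noise, and `pearson` and `strongest_chance_correlation` are hypothetical helper names chosen for this example.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def strongest_chance_correlation(n_series, n_points=20, seed=0):
    """Largest |correlation| between any two of n_series unrelated series.

    Each series is independent random noise, so any correlation found
    here is a coincidence by construction.
    """
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_points)]
              for _ in range(n_series)]
    return max(abs(pearson(a, b)) for a, b in combinations(series, 2))


if __name__ == "__main__":
    for count in (3, 10, 40):
        print(count, "series ->", round(strongest_chance_correlation(count), 2))
```

Because the same seed is reused, each larger run contains the smaller run's series plus more, so the strongest coincidental correlation can only stay the same or grow as series are added. With dozens of unrelated series, a pair that looks strongly related is close to guaranteed.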
Big Data has been described by the following properties
Volume, Variety, Velocity
5. Just because it is accessible does not mean it's ethical
What is the status of "public" data on social media sites? "Any data on human subjects inevitably raise privacy issues, and the real risks of abuse of such data are difficult to quantify." It may be unreasonable to ask researchers to obtain consent from every person who posts a tweet, but it is problematic for researchers to justify their actions as ethical simply because the data are accessible. In order to act ethically, it is important that researchers reflect on the importance of accountability. Accountability is a multi-directional relationship: there may be accountability to superiors, to colleagues, to participants, and to the public. IRB: Institutional Review Board. Informed consent: a legal procedure to ensure that a research participant is aware of all the potential risks and costs of being involved in a study.
Behavioral Networks
Derived from communication patterns, cell coordinates, and social media interactions, e.g. texting, tagging, emailing.
The four forces that regulate social systems
Market, law, social norms, and architecture (or, in the case of technology, code).
The market sees big data as pure opportunity:
Marketers use it to target advertising, insurance providers use it to optimize their offerings, and Wall Street bankers use it to read the market. Legislation has already been proposed to curb the collection and retention of data, usually over concerns about privacy.
Cons of Big Data
•Limited access to object •Little knowledge of how data were gathered •Datasets capture limited dimensions
Pros of Big Data
•Replicability •Potential "whole universe" datasets •Ready-made manipulability