Big Data
Biased samples, messy (noisy, unstructured) data
2 cons of big data
Intercoder reliability test
Two human coders independently analyze the same news articles; their codes are compared to measure the level of agreement, which reduces subjectivity
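The card does not name an agreement statistic. The sketch below shows two common choices: simple percent agreement, and Cohen's kappa, which corrects for agreement expected by chance. Kappa is an assumption here (Krippendorff's alpha is another widely used option), and the codes are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items that both coders labeled identically."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement from each coder's own label frequencies
    labels = set(freq_a) | set(freq_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in labels) / (n * n)
    if p_e == 1:  # both coders used one identical label throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical tone codes for ten news articles
a = ["pos", "neg", "neu", "pos", "pos", "neg", "neu", "pos", "neg", "pos"]
b = ["pos", "neg", "pos", "pos", "neu", "neg", "neu", "pos", "neg", "pos"]
print(percent_agreement(a, b))       # 0.8
print(round(cohens_kappa(a, b), 3))  # 0.677
```

Kappa comes out lower than raw agreement because some matches would occur by chance given how often each coder uses each label.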
Analyze user profiles, analyze tweets posted by the same user, analyze network statistics
3 methods of sorting out users
Unsolicited opinions, access to past public opinion, greater flexibility
3 pros of big data
Collect big social data, sort out social media users, analyze big social data
3 steps of big data analysis
Data limits, biased samples, the need to plan ahead, hardware requirements
4 limitations of Twitter's streaming API
Data cleaning, term frequency, word association, word cloud, sentiment analysis
5 components of basic text mining
LDA (Latent Dirichlet Allocation) analysis
A computer reports the topics it detects in tweets (LDA is a topic-modeling method)
Topic modeling
Treats each document as a mixture of latent (hidden) topics
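To make "each document is a mixture of latent topics" concrete, here is a minimal LDA sketch fitted with collapsed Gibbs sampling, one standard estimation method. The hyperparameters (`alpha`, `beta`, `iters`) and the tokenized tweets are assumptions for illustration; real studies would typically use a library implementation rather than this toy.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (vocab, theta, phi): theta[d][t] is document d's estimated share
    of topic t, and phi[t][w] is the probability of word vocab[w] in topic t.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)
    n_dt = [[0] * k for _ in docs]      # tokens per (document, topic)
    n_tw = [[0] * v for _ in range(k)]  # tokens per (topic, word)
    n_t = [0] * k                       # tokens per topic
    z = []                              # current topic of every token

    def move(d, t, wi, delta):
        n_dt[d][t] += delta
        n_tw[t][wi] += delta
        n_t[t] += delta

    for d, doc in enumerate(docs):      # random initial topic assignments
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            move(d, t, index[w], 1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = index[w]
                move(d, z[d][i], wi, -1)  # take this token out of the counts
                # P(topic j | all other assignments), up to a constant
                weights = [(n_dt[d][j] + alpha) * (n_tw[j][wi] + beta) / (n_t[j] + v * beta)
                           for j in range(k)]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                move(d, z[d][i], wi, 1)

    theta = [[(n_dt[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    phi = [[(n_tw[j][w] + beta) / (n_t[j] + v * beta) for w in range(v)]
           for j in range(k)]
    return vocab, theta, phi


tweets = [  # hypothetical, already-cleaned tweets
    ["vote", "election", "ballot", "vote"],
    ["election", "candidate", "vote"],
    ["game", "score", "team", "win"],
    ["team", "game", "coach"],
]
vocab, theta, phi = lda_gibbs(tweets, k=2)
for t, dist in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda w: dist[w], reverse=True)[:3]
    print(f"topic {t}:", [vocab[w] for w in top])
```

Each row of `theta` is one tweet's topic mixture, which is the "mixture of latent topics" idea on the card; the researcher still has to read each topic's top words and decide what it means.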
Representative sample/census
A smaller subset drawn from a larger population for manual content analysis (a census instead analyzes every unit in the population)
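A minimal sketch of drawing a simple random sample for hand coding, assuming a hypothetical list of tweet IDs. Simple random sampling is only one way to aim for representativeness; stratified sampling is a common alternative. Coding all 10,000 IDs would be a census.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every item has an equal chance."""
    if n > len(population):
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    return rng.sample(population, n)


# Hypothetical corpus of 10,000 tweet IDs; hand-code a 400-tweet subset
tweet_ids = list(range(10_000))
to_code = simple_random_sample(tweet_ids, 400, seed=42)
print(len(to_code), len(set(to_code)))  # 400 400
```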
API (application programming interface)
Allows programs to communicate with one another (e.g., registering for one service with an account from another)
Computational social science
An emerging research field that brings together social scientists and computer scientists
Big Data
Used to analyze news media and public opinion; domain-dependent and ever-evolving; exceeds the capabilities of traditional tools; used to test and advance social science theories
Dictionary-based analysis
Big social data analysis based on predetermined categories and keywords
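A minimal dictionary-based sketch: each predetermined category has a keyword list, and a document is scored by counting keyword hits per category. The two categories and their keywords are hypothetical, not taken from any published dictionary.

```python
import re
from collections import Counter

# Hypothetical dictionary: category -> keywords
DICTIONARY = {
    "economy": {"jobs", "tax", "inflation", "economy"},
    "health": {"vaccine", "hospital", "covid", "health"},
}


def categorize(text, dictionary=DICTIONARY):
    """Count keyword hits per category in one document."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = Counter()
    for category, keywords in dictionary.items():
        hits[category] = sum(tok in keywords for tok in tokens)
    return hits


tweet = "New tax plan promises jobs, but hospital costs keep rising"
print(categorize(tweet))  # Counter({'economy': 2, 'health': 1})
```

The results are only as good as the keyword lists, which is why dictionaries tend to be domain-dependent.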
Supervised machine learning
Classifying documents into known categories by training a model on hand-coded examples (sometimes using topic modeling)
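A minimal supervised-learning sketch: a classifier learns from hand-coded (labeled) tweets and then assigns a known category to a new one. Multinomial naive Bayes with add-one smoothing is chosen here only for brevity and is an assumption. The training data is hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(labeled_docs):
    """Train multinomial naive Bayes on (tokens, label) pairs."""
    label_counts = Counter(label for _, label in labeled_docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [  # hypothetical hand-coded tweets
    (["tax", "jobs", "economy"], "economy"),
    (["inflation", "prices", "jobs"], "economy"),
    (["vaccine", "hospital", "covid"], "health"),
    (["hospital", "doctors", "health"], "health"),
]
model = train_nb(training)
print(predict_nb(model, ["jobs", "prices", "rising"]))  # economy
```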
Basic text mining
Data cleaning, term frequency, word association, word cloud, sentiment analysis
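Of the components listed, word association is the least self-explanatory. One common way to operationalize it, assumed here, is the correlation between two terms' counts across documents: terms that tend to appear in the same documents score close to 1. The function names, threshold, and documents are hypothetical.

```python
import math


def term_vector(term, docs):
    """Count how often a term appears in each tokenized document."""
    return [doc.count(term) for doc in docs]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def associations(target, docs, min_corr=0.5):
    """Terms whose per-document counts correlate with the target's."""
    vocab = sorted({w for doc in docs for w in doc} - {target})
    tv = term_vector(target, docs)
    scores = {w: pearson(tv, term_vector(w, docs)) for w in vocab}
    return {w: round(s, 2) for w, s in scores.items() if s >= min_corr}


docs = [  # hypothetical, already-cleaned tweets
    ["climate", "change", "policy"],
    ["climate", "change", "protest"],
    ["election", "policy", "vote"],
    ["vote", "election", "ballot"],
]
print(associations("climate", docs))  # {'change': 1.0, 'protest': 0.58}
```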
Sentiment analysis
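Measuring the positive or negative tone of text, either through manual content analysis or automatically with a tool such as SentiStrength (the two approaches can be compared)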
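SentiStrength reports separate positive and negative strength scores for each text. The sketch below imitates only that dual-score idea with a tiny made-up lexicon; it is not SentiStrength's actual lexicon or rules. Its scores could then be checked against manual codes of the same texts.

```python
import re

# Hypothetical lexicon: word -> strength (positive 2..5, negative -2..-5)
LEXICON = {"love": 4, "great": 3, "good": 2, "bad": -2, "awful": -4, "hate": -4}


def dual_sentiment(text):
    """Return (positive, negative): the strongest word of each polarity.

    1 and -1 mean no positive or negative sentiment was found.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    positive = max([s for s in scores if s > 0], default=1)
    negative = min([s for s in scores if s < 0], default=-1)
    return positive, negative


print(dual_sentiment("I love the new policy but the rollout was awful"))  # (4, -4)
print(dual_sentiment("Meeting at noon"))  # (1, -1)
```

Keeping two scores means a mixed tweet is not averaged into a misleading "neutral".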
Bigram
A pair of adjacent words; bigram analysis lists the most frequent pairs
Unigram
A single word; unigram analysis lists the most frequent words
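A minimal term-frequency sketch covering both cards: with `n=1` it counts unigrams, and with `n=2` it counts bigrams (adjacent word pairs). The function name and tweets are hypothetical.

```python
import re
from collections import Counter


def top_ngrams(texts, n=1, k=3):
    """Return the k most frequent n-grams (n=1 unigrams, n=2 bigrams)."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        # Slide an n-word window over each text separately, so no n-gram
        # spans the boundary between two texts
        grams = zip(*(tokens[i:] for i in range(n)))
        counts.update(" ".join(g) for g in grams)
    return counts.most_common(k)


texts = [  # hypothetical tweets
    "Climate change is real",
    "Act on climate change now",
    "Climate policy now",
]
print(top_ngrams(texts, n=1))
print(top_ngrams(texts, n=2, k=1))  # [('climate change', 2)]
```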
Codebook
Predefined categories used for manual content analysis
Hand coding
Reading content (e.g., tweets) and manually deciding its topic
Stemming
Reducing a word to its stem by stripping suffixes (e.g., "connected" → "connect"); the stem need not be a dictionary word
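A deliberately crude suffix-stripping sketch, to show that stemming chops endings by rule without checking a dictionary. The suffix list and minimum stem length are assumptions; real stemmers such as the Porter stemmer (available in NLTK as `PorterStemmer`) apply far more careful rules.

```python
# Hypothetical suffix list, checked longest first
SUFFIXES = ("ness", "ing", "ed", "ly", "es", "s")


def crude_stem(word, min_stem=3):
    """Strip the first matching suffix if at least min_stem letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


print([crude_stem(w) for w in ["connected", "running", "studies", "cats", "is"]])
# ['connect', 'runn', 'studi', 'cat', 'is']
```

Stems like "runn" and "studi" are not words, which is the key difference from lemmatization.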
Lemmatization
Reducing inflected forms of a word, including comparative/superlative degrees that don't resemble the root (e.g., "better" → "good"), to its dictionary base form (lemma)
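Lemmatization needs lexical knowledge, since no suffix rule turns "better" into "good". This sketch uses a small hypothetical lookup table in place of a full lexicon; real lemmatizers (e.g., NLTK's `WordNetLemmatizer`, which draws on WordNet) cover the whole vocabulary and may need part-of-speech information.

```python
# Hypothetical lemma table; a real lemmatizer consults a full lexicon
LEMMAS = {
    "better": "good", "best": "good",
    "worse": "bad", "worst": "bad",
    "went": "go", "was": "be", "were": "be",
    "mice": "mouse", "faster": "fast", "fastest": "fast",
}


def lemmatize(tokens):
    """Map each token to its lemma; unknown words are returned lowercased."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]


print(lemmatize(["Better", "best", "went", "mice", "tweets"]))
# ['good', 'good', 'go', 'mouse', 'tweets']
```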
Data cleaning
Stemming; lemmatization; removing stop words, extra white space, and punctuation; converting text to lowercase
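A minimal sketch of the non-stemming cleaning steps: lowercasing, removing punctuation and extra white space, and dropping stop words. The short stop-word list is an assumption; projects usually use a fuller published list. Stemming or lemmatization would then be applied to the resulting tokens.

```python
import re

# Small illustrative stop-word list (assumption)
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in", "on", "at"}


def clean(text):
    """Lowercase, strip punctuation, split on white space, drop stop words."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # replace punctuation with spaces
    tokens = text.split()                 # splitting also discards extra spaces
    return [t for t in tokens if t not in STOP_WORDS]


print(clean("The vote is IN!!  Turnout   rose, and the polls closed."))
# ['vote', 'turnout', 'rose', 'polls', 'closed']
```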
The core of communication research
To answer our questions