Final Exam Content
Why is a sentiment analysis predictive?
-Because we can't observe people's actual feelings, we can only predict them. The analysis assigns each statement a predicted value (positive or negative), which makes it predictive.
What is analytics?
-Extracting information from data and discovering meaningful patterns in it
-Provides a means of analyzing data sets and drawing conclusions about them to help organizations make informed business decisions
-Big data is a way to process large amounts of data and IS NOT DATA ANALYTICS
Structured vs Unstructured Data
-Structured: data follows a specified format
--e.g. stock prices, names, addresses, credit card numbers, geolocation, timestamps, weather
--or: zip codes, SSNs, phone numbers
-Unstructured: data does not follow a specified format (MOST DATA IS UNSTRUCTURED)
--e.g. photographs and videos, social media data, mobile data, satellite images, Facebook posts
--or: tweets, texts, Amazon reviews
Regular Databases
-a collection of files, organized as tables
-tables are made up of records (rows)
-rows are made up of fields (columns)
-fields are made up of characters
-TABLES > ROWS (RECORDS) > FIELDS (COLUMNS) > CHARACTERS
-Examples we work with: Gmail, banking information, Facebook, Spotify, Instagram
Readings (2)
-emotion contagion = people come to experience the same emotions as others without being aware of it
-unstructured data can come from machines (satellite images, scientific data, photos, radar) or from humans (web content, social media, internal text in a company)
-sentiment analysis is really opinion mining
--two types: subjective (opinion) and objective (fact)
Other in-class assignments
-factors influencing the confidence interval of a trend line?
--sample size, confidence level, variability in the sample
-hierarchies in Tableau: allow you to drill up or down to get less or more detail
--they take more effort to maintain in the workbook (with the basketball data, for example)
-word frequency analysis: counts the occurrences of each word (one point for each use), so it only really shows how often a word is used
--used to judge whether text is more positive or more negative
--issues? mainly sarcasm
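The word frequency idea above can be sketched in a few lines of Python. This is only an illustration: the sample sentence and the `word_frequencies` helper are made up, not taken from the in-class assignment.

```python
from collections import Counter
import re

def word_frequencies(text):
    """Count how often each word occurs (one point for each use)."""
    # Lowercase the text and keep only runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Hypothetical sample text.
sample = "Great game, great team. Not a great ending though."
freq = word_frequencies(sample)

# Show the most frequently used word and its count.
print(freq.most_common(1))
```

Note that the count says "great" is used often, but not whether the writer meant it sincerely, which is the sarcasm issue.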
Readings (1)
-getting the data in the right layout is critical to pivot tables being effective
-fields are columns and data records are rows (one record = one row of data)
-wrong way to get data? = already in summarized form, which makes it tough to manipulate
Association Mining
-looks for things that occur at the same time
-measures relations between variables in large databases
-ex: promotional pricing or in-store product placement
-in Tableau, we created a heat map with products on the x and y axes; the larger the box, the more often the two products were bought together
--can be used for product placement and such
--or could look at traffic stops for a broken light and how many also led to a seat-belt violation
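A minimal sketch of the counting behind such a heat map, assuming a handful of made-up shopping baskets. The `baskets` data and the `support` helper are hypothetical; each pair count corresponds to the size of one box in the heat map.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "milk"},
]

# Count how many baskets contain each pair of products.
pair_counts = Counter()
for basket in baskets:
    # Sort so that (bread, butter) and (butter, bread) are the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

def support(pair):
    """Share of all baskets that contain both products in the pair."""
    return pair_counts[tuple(sorted(pair))] / len(baskets)

for pair, count in pair_counts.most_common():
    print(pair, count, support(pair))
```

The same counting would work for the traffic-stop example by treating each stop as a "basket" of violations.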
More on Sentiment Analysis
-most common type? consumer reviews/reports
-why Facebook cares: exposure to fewer positive posts leads to fewer positive posts (EMOTION CONTAGION)
--and it allows them to create more targeted advertising for companies
-we use a Google Sheets add-in to do this with Tweets, then copy the results to Excel to classify them as positive vs negative
Pivot Tables and Databases
-pivot tables use a flat file structure
-with all related data in the same column, you can aggregate data by column value
-Tableau uses the same data format (and is essentially a pivot table analysis)
-ALL VALUES OF THE SAME TYPE NEED TO BE IN THE SAME COLUMN
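A rough sketch of what "aggregate data by column value" means, using a made-up flat file. The `rows` data and the `pivot_sum` helper are hypothetical; this is not how Excel or Tableau implement pivots internally.

```python
from collections import defaultdict

# Hypothetical flat file: one record per row, and all values of the
# same type (region, product, sales) live in the same column.
rows = [
    {"region": "East", "product": "Cola", "sales": 120},
    {"region": "East", "product": "Water", "sales": 80},
    {"region": "West", "product": "Cola", "sales": 200},
    {"region": "West", "product": "Cola", "sales": 50},
]

def pivot_sum(records, group_by, value):
    """Sum the `value` column for each distinct entry in the `group_by` column."""
    totals = defaultdict(int)
    for record in records:
        totals[record[group_by]] += record[value]
    return dict(totals)

by_region = pivot_sum(rows, "region", "sales")
by_product = pivot_sum(rows, "product", "sales")
print(by_region)
print(by_product)
```

Because the data is unsummarized, the same rows can be regrouped by region or by product. Data that arrives already summarized cannot be regrouped this way.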
Readings (3)
-probability of scoring increases when?
--team members string together more successful passes
-soccer data has been tracked for over 60 years
-descriptive analytics is about condensing big data
-prescriptive analytics requires ACTIONABLE DATA and A FEEDBACK SYSTEM to check decisions made
-datafication of everything is the trend now!
-Knack is the company that created the video games to test employees, a far cry from relying on intuition or boring tests that even the highest executives couldn't pass
Sentiment Analysis
-the task of finding the opinions of authors about specific entities
-offers organizations the ability to monitor various social media sites in real time and act accordingly (like with stock prices)
-a form of predictive analytics because you take information, make an inference, and come up with a predicted value
-categorizes whether a statement is positive or negative and assigns it a score based on a word library
-Issues:
--word libraries don't understand word combinations (like "terrifically bad")
--sarcasm
--noisy text (slang, mistakes)
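A minimal sketch of scoring with a word library, assuming a tiny made-up lexicon. The `LEXICON` values and the `score`/`classify` helpers are hypothetical and far smaller than any real sentiment lexicon. It also shows the two issues listed above.

```python
import re

# Hypothetical word library: +1 for positive words, -1 for negative words.
LEXICON = {
    "good": 1, "great": 1, "love": 1, "terrifically": 1,
    "bad": -1, "awful": -1, "hate": -1,
}

def score(text):
    """Add up the lexicon value of every word; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

def classify(text):
    """Turn the numeric score into a positive/negative/neutral label."""
    total = score(text)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

print(classify("I love this great phone"))    # plain positive statement
print(classify("terrifically bad service"))   # word combination the lexicon misreads
print(classify("Oh great, it broke again"))   # sarcasm the lexicon misreads
```

Each word is scored on its own, so "terrifically" cancels out "bad" and the sarcastic "great" still counts as praise.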
People Analytics
-trying to spot promising prospective employees
-application of analytics to people's careers
-predicting potential is most often used for hourly work, because success and failure are more clearly measured
-how? candidates play games and complete puzzles; data on their performance is collected and used to decide what type of employee he or she will be (human interaction is analyzed too)
Types/Levels of Sentiment Analysis
1. Document-level (looking at the entire document or article)
2. Sentence-level (looking at the entire sentence)
3. Aspect-level (looking at just a phrase; MOST DETAILED TYPE)
-Comparative Sentiment Analysis: Coke vs Pepsi
-Sentiment lexicon: the dictionary of words along with their associated sentiments
Understanding Big Data
Big data is a SET OF TECHNOLOGIES
-it is not data analytics
-it is also not information or knowledge
-Hadoop and MapReduce are popular tools to:
--deal with larger databases and constant changes
--break big data down into manageable pieces, which is critical because big data and its storage can be massive
Types of Data Analytics
Descriptive Analytics: 80% of business analytics are this
-simplest type of analytics
-summarizes what happened/past data
-e.g. # of posts, mentions, followers, page views
-use a scatter plot to describe data
Predictive Analytics:
-forecast of what may happen in the future
-using sentiment analysis, for example
-utilizes a wide variety of statistical/modeling/mining/machine learning techniques to study recent and historical data
Prescriptive Analytics:
-helps facilitate decision making
-predictive model with ACTIONABLE data and a feedback system that tracks the outcomes of actions taken because of the data provided
-decision trees or forecasting, for example
Examples:
-tomorrow's weather will be 80 degrees and rainy = PREDICTIVE
-tomorrow's weather will be rainy so take an umbrella = PRESCRIPTIVE
Types of Databases
Flat Database = all data in one table
Relational Database = different tables that are joined together by common fields
-follows certain rules of normalization
-uses multiple separate tables (Tableau seems to use one big set)
--used to minimize repetition, so an entry only needs to be changed in one place, not multiple
Predictive Analytics in Tableau
Forecasting may be used to predict future sales numbers or player performance
-in Tableau, the forecast appears as the shaded band after the line; the band reflects the confidence interval, or uncertainty, in the prediction
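A rough sketch of where such a band comes from. This is not how Tableau computes its forecast. It assumes a simple straight-line fit and a normal approximation for the interval, and the `trend_with_interval` helper and the sales figures are made up.

```python
import math
from statistics import NormalDist, mean

def trend_with_interval(xs, ys, x_new, confidence=0.95):
    """Fit a least-squares line and return (prediction, lower, upper) at x_new."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar

    # Variability in the sample: spread of the points around the line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    spread = math.sqrt(sum(r * r for r in residuals) / (n - 2))

    # Confidence level: a higher level gives a larger multiplier.
    # (Normal approximation; an assumption made to stay in the standard library.)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)

    # Sample size: the 1/n term shrinks as more points are added.
    half_width = z * spread * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    prediction = intercept + slope * x_new
    return prediction, prediction - half_width, prediction + half_width

# Hypothetical monthly sales for months 1-6; predict month 7.
months = [1, 2, 3, 4, 5, 6]
sales = [10, 12, 13, 15, 18, 19]
print(trend_with_interval(months, sales, 7))
```

The three factors from the in-class assignment all appear in `half_width`: sample size (`n`), confidence level (`z`), and variability in the sample (`spread`).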
Hadoop and MapReduce
Hadoop: stores all data in smaller pieces across a network (sends pieces to various computers)
-makes data easier to manage
-but some managers and companies still do not understand what it does
-can handle unstructured data
MapReduce: processes the smaller pieces of data (gives each connected computer a task)
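A toy sketch of the map and reduce steps, run on a single machine for illustration. A real cluster would send each chunk to a different computer. The `chunks` data and the function names are hypothetical and do not use any Hadoop API.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: turn one small piece of data into (word, 1) pairs."""
    return [(word, 1) for word in chunk.lower().split()]

def shuffle(mapped_pairs):
    """Group all the pairs that share the same word."""
    grouped = defaultdict(list)
    for word, count in mapped_pairs:
        grouped[word].append(count)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine each word's list of counts into one total."""
    return {word: sum(counts) for word, counts in grouped.items()}

# Hypothetical data already split into smaller pieces.
chunks = ["big data is big", "data is everywhere"]

# Each chunk is mapped independently, which is the part that could run in parallel.
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
totals = reduce_phase(shuffle(mapped))
print(totals)
```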
In-class Assignments: Databases
Relational: similar info in multiple separate tables
--minimizes repetition
Flat: one table only, like Tableau
--pivot tables with values aggregated by column
How to associate tables of data in relational databases?
--join multiple tables using their common columns
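A small sketch of joining two tables on a common column, using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their contents are made up for illustration.

```python
import sqlite3

# In-memory database with two hypothetical tables that share customer_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Join the tables on the common column, then total the orders per customer.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)
```

Each customer's name is stored once in `customers`, so a name change happens in one place, while a flat table would repeat the name on every order row.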