data science exam 3

Structured vs. unstructured data

Structured data are recorded as well-defined fields that correspond to distinct variables, whereas unstructured data, such as natural-language writing, consist of a mix of semantic entities that can differ from one observation to another. In unstructured data it may not even be clear what constitutes a separate observation.

Relational database

1. A collection of tables that store data for different types of entities
2. Tables are made of rows (records) and columns (variables)
3. Fields are made of characters that can represent different types of data (data types)

MapReduce

1. Processes the smaller pieces of data
2. Breaks up the analytical task and gives each connected computer a small piece to work on
3. Example: analyzes which programs people are most likely to pause and then skip commercials
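The split/map/reduce idea can be sketched in a few lines of plain Python. This is only an illustration running on one machine, not how a real cluster is programmed; the `events` data and the function names are made up for the example.

```python
from collections import defaultdict

# Hypothetical viewing events: (program, action) pairs.
events = [
    ("news", "pause"), ("drama", "skip"), ("news", "skip"),
    ("drama", "skip"), ("news", "pause"), ("sports", "skip"),
]

def split_into_chunks(items, n_chunks):
    """Break the task into roughly equal pieces, one per worker."""
    return [items[i::n_chunks] for i in range(n_chunks)]

def map_chunk(chunk):
    """Map step: emit (program, 1) for every commercial skip in one chunk."""
    return [(program, 1) for program, action in chunk if action == "skip"]

def reduce_pairs(pairs):
    """Reduce step: add up the counts emitted for each program."""
    totals = defaultdict(int)
    for program, count in pairs:
        totals[program] += count
    return dict(totals)

chunks = split_into_chunks(events, 3)  # pretend there are 3 workers
mapped = [pair for chunk in chunks for pair in map_chunk(chunk)]
skip_counts = reduce_pairs(mapped)
print(skip_counts)
```

Each chunk is mapped independently, which is what lets the work be spread across connected computers; only the reduce step needs to see the combined results.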

Hadoop

1. Stores all of the data in smaller pieces across a network
2. Stores big databases in smaller pieces across a network of connected computers
3. Example: stores real-time cable box activity for 5,000,000 customers, by region

big data technologies

Big Data technologies are a way of processing large amounts of data, not extracting insights from them

Types of database: relational database

Data from different tables are joined by common fields in real-time

Sentiment analysis

Sentiment analysis categorizes whether a statement is positive or negative and assigns it a score accordingly. The sentiment score is structured data. The simplest way to do this is to use a word library that has scores for popular words and phrases. Sentiment analysis is a form of predictive analytics: it predicts the sentiment score that a human evaluator would likely give to the statement. It can be used to monitor performance and improve customer service.

What is a word frequency analysis? What insight can it give you about a block of text?

It counts how often each word appears in a text and shows the most commonly used ones. This gives a quick sense of what the text is about, for example whether it is mainly positive or negative.
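A minimal sketch of a word frequency count using Python's standard library. The sample `review` text and the `word_frequencies` helper are invented for illustration.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=3):
    """Return the top_n most common words with their counts."""
    # Lowercase the text and pull out runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)

# Hypothetical block of text.
review = "Great food, great service, and a great view. The wait was long."
top_words = word_frequencies(review)
print(top_words)
```

Here "great" dominates the count, which hints that the text is mainly positive. A real analysis would usually also drop filler words such as "the" and "a".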

What factors influence the confidence interval of a trend line

Sample size, variability in the data, and the confidence level. A larger sample or less variability narrows the interval; a higher confidence level widens it.

Descriptive analytics = Seeing what is there in the data

summarize and understand past data, e.g. detect fraudulent behavior

How do hierarchies of dimensions work in Tableau? What are they used for?

A hierarchy nests related dimensions in a set order, e.g. school above players, so you can view the data at the higher level or drill down to the lower one. The order of the hierarchy determines which field comes first.

How do the tables of data become associated in a relational database?

Tables become associated through a common field (column) that appears in both. For example, if two tables each have a vehicle-type column, records with the same value in that column, such as trucks, can be matched across the tables.

Association mining

Looks for things that occur at the same time, e.g. two products that frequently appear in the same order.
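One simple way to look for products that appear together is to count how many orders contain each pair. The sketch below does only that; the `orders` data is hypothetical, and real association mining tools add further measures on top of these counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical orders, each a set of products bought together.
orders = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

# Count how many orders contain each pair of products.
pair_counts = Counter()
for order in orders:
    # Sort so that each pair is always stored in the same order.
    for pair in combinations(sorted(order), 2):
        pair_counts[pair] += 1

# Support: the share of all orders that contain both products.
support = {pair: count / len(orders) for pair, count in pair_counts.items()}

most_common_pair, n_orders = pair_counts.most_common(1)[0]
print(most_common_pair, n_orders, support[most_common_pair])
```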

pivot table

A pivot table is a table of statistics that summarizes the data of a more extensive table (such as from a database, spreadsheet, or business intelligence program). This summary might include sums, averages, or other statistics, which the pivot table groups together in a meaningful way, e.g. individual expense items aggregated at the level of employees, months, or employee/month combinations. All related data must be in the same column, and the pivot table is arranged by assigning fields to columns, rows, values, and filters.

analytics

Extracting information from data and discovering meaningful patterns

How does sentiment analysis software determine a positive/negative score for a block of text?

It finds key words such as "fantastic", "worst", "horrible", and "terrific" and scores the text based on them.
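A minimal sketch of the word-library approach. The scores in `LEXICON` are made-up values for illustration; real sentiment libraries are far larger and also handle things like negation.

```python
import re

# Hypothetical word library: positive words score above 0, negative below 0.
LEXICON = {"fantastic": 2, "terrific": 2, "horrible": -2, "worst": -3}

def sentiment_score(text):
    """Sum the library scores of every known word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

def sentiment_label(score):
    """Turn a numeric score into a positive/negative/neutral label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for text in ["The food was fantastic and the staff were terrific!",
             "Worst service ever, horrible."]:
    score = sentiment_score(text)
    print(score, sentiment_label(score), text)
```

Words that are not in the library contribute nothing, so a text with no known key words comes out as neutral.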

What is the difference between relational databases and data formatted for Pivot Table analysis?

A pivot table uses sums, averages, and other statistics to group data from one table in a meaningful way. A relational database keeps data in separate tables that are linked through common fields, e.g. "The columns (or fields) for the customer table might be Customer ID, Company Name, Company Address, etc.; the columns for a transaction table might be Transaction Date, Customer ID, Transaction Amount, Payment Method, etc. The tables can be related based on the common Customer ID field." - IBM.com
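The customer/transaction example can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table names, the reduced set of columns, and all of the rows are invented for illustration.

```python
import sqlite3

# In-memory database with two tables that share a customer_id field.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        company_name TEXT
    );
    CREATE TABLE transactions (
        transaction_date TEXT,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO transactions VALUES
        ('2024-01-05', 1, 250.0),
        ('2024-01-09', 2, 100.0),
        ('2024-02-11', 1, 50.0);
""")

# Join the tables on the common field, then total the amounts per company.
rows = conn.execute("""
    SELECT c.company_name, SUM(t.amount)
    FROM customers AS c
    JOIN transactions AS t ON t.customer_id = c.customer_id
    GROUP BY c.company_name
    ORDER BY c.company_name
""").fetchall()
conn.close()
print(rows)
```

The company name is stored once in `customers`; the join attaches it to each transaction at query time through the shared `customer_id`.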

Types of database: flat file database

all data in one table

Prescriptive analytics = Recommending what action to take based on the data

facilitate decision-making directly, e.g. whether or not to grant a car loan to a consumer

Predictive analytics = Predicting what the values of data fields will be

forecast what may happen in the future, e.g. weather forecasting

A pivot table lets you answer questions such as

How many people work in marketing? How much did Abigail expense in January through March? Who expensed the most in March? Which month had the greatest amount expensed?
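Questions like these come down to grouping rows and summing a value. The sketch below builds a small employee-by-month pivot in plain Python; the expense figures and the second employee are hypothetical.

```python
from collections import defaultdict

# Hypothetical expense records: (employee, month, amount).
expenses = [
    ("Abigail", "Jan", 120.0),
    ("Abigail", "Feb", 80.0),
    ("Abigail", "Mar", 50.0),
    ("Ben", "Jan", 40.0),
    ("Ben", "Mar", 200.0),
    ("Abigail", "Mar", 25.0),
]

# Pivot: rows are employees, columns are months, values are summed amounts.
pivot = defaultdict(lambda: defaultdict(float))
for employee, month, amount in expenses:
    pivot[employee][month] += amount

# How much did Abigail expense in January through March?
abigail_q1 = sum(pivot["Abigail"][m] for m in ("Jan", "Feb", "Mar"))

# Who expensed the most in March?
top_in_march = max(pivot, key=lambda employee: pivot[employee]["Mar"])

print(abigail_q1, top_in_march)
```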

People analytics

Predicting the performance of job candidates

forecasting

Predicting future outcomes, e.g. the weather, where a storm will hit, stocks, or the number of attendees

