Chapter 6: Business Intelligence Big Data and Analytics

Benefits achieved from BI and Analytics

- detect fraud
- improve forecasting
- increase sales
- optimize operations
- reduce costs

Variety (Big Data)

- Structured data: format is known in advance.
- Unstructured data: most of the data an organization deals with; not organized in any predefined manner.

Four categories of NoSQL databases

1. Key-value NoSQL databases: two columns ("key" and "value")
2. Document NoSQL databases: store, retrieve, and manage document-oriented information
3. Graph NoSQL databases: well suited for analyzing interconnections
4. Column NoSQL databases: store data in columns
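The four data models can be pictured with plain Python structures. This is only a toy, in-memory sketch: the records, keys, and the `friends_of_friends` helper are hypothetical, and real NoSQL products add persistence, indexing, and distribution on top of these shapes.

```python
# Toy, in-memory illustrations of the four NoSQL data models.
# Every name and record below is a made-up example.

# 1. Key-value: each key maps to a single opaque value.
key_value = {"session:42": "user=ana;cart=3"}

# 2. Document: each record is a self-describing, possibly nested document.
documents = [
    {"_id": 1, "name": "Ana", "orders": [{"sku": "A1", "qty": 2}]},
    {"_id": 2, "name": "Ben", "orders": []},
]

# 3. Graph: nodes plus edges, stored here as an adjacency list.
graph = {"Ana": ["Ben"], "Ben": ["Cy"], "Cy": []}

# 4. Column: values are stored per column rather than per row.
columns = {"name": ["Ana", "Ben"], "city": ["Oslo", "Lima"]}


def friends_of_friends(g, person):
    """Return people exactly two hops away, excluding direct friends."""
    direct = set(g[person])
    two_hops = {fof for friend in direct for fof in g[friend]}
    return two_hops - direct - {person}
```

The graph lookup shows why graph stores suit interconnection questions: the query follows edges instead of joining tables.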

Conversion funnel

A graphical representation that summarizes the steps a consumer takes in making the decision to buy your product and become a customer.
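The numbers behind a conversion funnel are step-to-step conversion rates. A minimal sketch, assuming hypothetical step names and counts:

```python
def funnel_rates(steps):
    """Given (name, count) pairs in funnel order, return for each step
    after the first the share of the previous step that reached it."""
    rates = []
    for (_, prev_count), (name, count) in zip(steps, steps[1:]):
        rates.append((name, count / prev_count if prev_count else 0.0))
    return rates


# Hypothetical funnel: counts shrink as consumers move toward purchase.
funnel = [
    ("visited site", 1000),
    ("viewed product", 400),
    ("added to cart", 100),
    ("purchased", 25),
]
```

Here `funnel_rates(funnel)` reports that 40% of visitors viewed a product, 25% of those added it to the cart, and 25% of those purchased.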

Data Warehouse

A logical collection of information - gathered from many different operational databases - that supports business analysis activities and decision-making tasks

Scenario analysis

A process for predicting future values based on certain potential events.

Cross-Industry Process for Data Mining (CRISP-DM)

A six-phase structured approach for the planning and execution of a data mining project.
- Phases: business understanding, data understanding, data preparation, modeling, evaluation, deployment

Linear programming

A technique for finding the optimum value (largest or smallest, depending on the problem) of a linear expression (called the objective function) that is calculated based on the value of a set of decision variables that are subject to a set of constraints.
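As an illustration, a two-variable problem can be solved by brute force, because the optimum of a bounded, feasible linear program lies at a corner of the feasible region. The sketch below is not how production solvers work (they use methods such as simplex), and the function name and the sample problem are assumptions chosen for the example.

```python
from itertools import combinations


def solve_lp_2d(objective, constraints, eps=1e-9):
    """Maximize objective[0]*x + objective[1]*y subject to constraints,
    each given as (a, b, c) meaning a*x + b*y <= c.

    Tests every intersection of two constraint boundaries and keeps the
    best feasible one. Returns (value, x, y), or None if nothing is feasible.
    """
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:  # parallel boundaries: no single intersection
            continue
        x = (c1 * b2 - c2 * b1) / det  # Cramer's rule
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + eps for a, b, c in constraints):
            value = objective[0] * x + objective[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best


# Sample problem: maximize 3x + 5y
# subject to x <= 4, 2y <= 12, 3x + 2y <= 18, x >= 0, y >= 0.
constraints = [(1, 0, 4), (0, 2, 12), (3, 2, 18), (-1, 0, 0), (0, -1, 0)]
result = solve_lp_2d((3, 5), constraints)
```

For this sample problem the objective function is 3x + 5y, the decision variables are x and y, and the best corner is x = 2, y = 6 with a value of 36.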

Time series analysis

A type of forecast in which data relating to past demand are used to predict future demand.
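One of the simplest time series forecasts is a moving average: the next period's demand is predicted as the mean of the most recent periods. A minimal sketch, with hypothetical demand figures and a window size picked only for illustration:

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or len(history) < window:
        raise ValueError("need at least `window` past observations")
    return sum(history[-window:]) / window


# Hypothetical monthly demand, oldest first.
demand = [120, 130, 125, 140, 135, 150]
next_month = moving_average_forecast(demand, window=3)  # mean of 140, 135, 150
```

A larger window smooths out noise but reacts more slowly to a real change in demand.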

Word cloud

A visual depiction of a set of words that have been grouped together because of the frequency of their occurrence.
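The data behind a word cloud is a word-frequency count; a drawing tool then sizes each word by its count. A sketch of the counting step only, assuming a small hypothetical stop-word list and sample text:

```python
import re
from collections import Counter

# Hypothetical stop-word list; real tools ship much longer ones.
STOP_WORDS = frozenset({"the", "a", "and", "is", "of", "to"})


def word_frequencies(text, top_n=5):
    """Return the top_n most frequent words as (word, count) pairs,
    ignoring case and common stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


reviews = "Great service and great price. The service is fast."
top_words = word_frequencies(reviews, top_n=2)
```

In this sample, "great" and "service" each occur twice, so they would be drawn largest.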

NoSQL database

A way to store and retrieve data that is modeled using some means other than the simple two-dimensional tabular relations used in relational databases.
- Has the capability to spread data over multiple servers so that each server contains only a subset of the total data

An organization's collection of useful data

Archives, Documents, Data from business apps, social media, sensor data, media, machine log data and public data

Online Transaction Processing (OLTP)

Captures transaction and event information, using technology to process, store, and update it.
- OLTP systems do not support the data analysis required today.
- Data warehouses and data marts allow organizations to access OLTP data and support decision making more effectively.

Technologies used to manage and process big data

Data warehouses, extract transform load process, data marts, data lakes, NoSQL databases, Hadoop, In-Memory databases

Extract Transform Load (ETL) process

Extracts data from a variety of sources, edits and transforms data into a data warehouse format, loads data into the warehouse
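The three ETL steps can be sketched with the Python standard library. Everything specific here is an assumption for illustration: the CSV text stands in for a source system, an in-memory SQLite database stands in for the warehouse, and the `orders` table and its columns are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for an operational source; note the messy names and a bad amount.
RAW_CSV = """order_id,customer,amount
1, ana ,19.99
2,BEN,5.00
3,cy,not_a_number
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: reject invalid rows and put fields into warehouse format."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # row fails validation, so it is not loaded
        clean.append((int(row["order_id"]), row["customer"].strip().title(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
```

After the run, the table holds two cleaned orders; the row with the unparseable amount was rejected during the transform step.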

MapReduce program

a composite program that consists of a Map procedure that performs filtering and sorting and a Reduce method that performs a summary operation.
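The classic teaching example is a word count. The sketch below runs in a single process, so it only imitates the Map, shuffle, and Reduce stages; it does not use Hadoop's actual API, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: summarize the values for one key, here by summing them."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

For example, `word_count(["big data", "Big ideas"])` yields a count of 2 for "big" and 1 each for "data" and "ideas". In a real cluster the Map and Reduce calls run in parallel on different servers.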

Hadoop Distributed File System (HDFS)

a system used for data storage that divides the data into subsets and distributes the subsets onto different servers for processing.
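The divide-and-distribute idea can be sketched as follows. This is a deliberate simplification: the block size, server names, and round-robin placement are assumptions for illustration, and the sketch leaves out details of the real system such as how blocks are copied to more than one server.

```python
def split_into_blocks(data, block_size):
    """Divide the data into fixed-size subsets (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_round_robin(blocks, servers):
    """Distribute blocks across servers in turn. Hypothetical placement rule."""
    placement = {server: [] for server in servers}
    for index, block in enumerate(blocks):
        placement[servers[index % len(servers)]].append(block)
    return placement


blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = assign_round_robin(blocks, ["server-1", "server-2"])
```

Each server ends up holding only part of the file, so processing can run on the pieces in parallel.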

Genetic algorithm

a technique that employs a natural selection-like process to find approximate solutions to optimization and search problems.
- Typically implemented as a computer simulation
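A minimal sketch of the selection, crossover, and mutation loop, applied to a toy problem (maximize the number of 1s in a bit string). The population size, mutation rate, and other parameters are arbitrary assumptions, not recommended settings.

```python
import random


def fitness(bits):
    """Toy objective: the more 1s, the fitter the individual."""
    return sum(bits)


def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)  # seeded so runs are repeatable
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the fitter half survives and becomes the parent pool.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)  # single-point crossover
            child = mom[:cut] + dad[cut:]
            # Mutation: flip each bit with a small probability.
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


best = evolve()
```

Because the fitter half always survives, the best fitness never decreases from one generation to the next; the result is still only an approximate solution, not a guaranteed optimum.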

Hadoop

an open-source software framework that includes several software modules that provide a means for storing and processing extremely large data sets.
- Includes HDFS and the MapReduce program
- Limitation: can only perform batch processing

Big Data

Describes data collections that are so enormous and complex that traditional data management software, hardware, and analytics processes are incapable of dealing with them.

Data scientist

extracts knowledge from data by performing statistical analysis, data mining, and advanced analytics on big data to identify trends, market changes, and other relevant information

Self-Service Analytics

includes training, techniques, and processes that empower end users to perform their own analyses using an endorsed set of tools.

Regression Analysis

involves determining the relationship between a dependent variable and one or more independent variables. Example: a pharmaceutical company uses regression analysis to predict a drug's shelf life, both to meet FDA regulations and to identify a suitable expiration date for the drug.
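With one independent variable, the relationship can be estimated by ordinary least squares. The sketch below follows the shelf-life example, but the potency measurements and the 90% threshold are invented for illustration and do not come from any real drug or regulation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: months in storage (independent) vs. potency in % (dependent).
months = [0, 6, 12, 18]
potency = [100, 97, 94, 91]

slope, intercept = fit_line(months, potency)
# Month at which the fitted line reaches the assumed 90% potency threshold.
shelf_life_months = (90 - intercept) / slope
```

For this made-up data, potency falls by 0.5 percentage points per month, so the fitted line reaches 90% at month 20.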

Descriptive analysis

preliminary data processing stage used to identify patterns in the data and answer questions about who, what, where, when, and to what extent.
- Identifies data patterns
- Includes visual analytics and regression analysis

Visual analytics

presentation of data pictorially or graphically.
- Examples: word cloud and conversion funnel

Text analysis

process for extracting value from large quantities of unstructured text data such as consumer comments, social media postings, and customer reviews.

Video analysis

process of obtaining information or insights from video footage.

Monte Carlo Simulation

simulation that enables you to see a spectrum of thousands of possible outcomes, considering not only the many variables involved, but also the range of potential values for each of those variables.
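A minimal sketch of the idea: draw each uncertain variable at random from its range, combine the draws into an outcome, and repeat thousands of times. The two-task project, the duration ranges, and the uniform distributions are assumptions chosen for the example.

```python
import random

# Hypothetical project: two sequential tasks, each with a range of
# possible durations in weeks (low, high).
TASK_RANGES = [(2, 4), (3, 7)]


def simulate_totals(trials=10_000, seed=1):
    """Return one simulated total duration per trial."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    return [
        sum(rng.uniform(low, high) for low, high in TASK_RANGES)
        for _ in range(trials)
    ]


totals = simulate_totals()
mean_total = sum(totals) / len(totals)
# Share of simulated outcomes that finish within 9 weeks.
share_within_9 = sum(t <= 9 for t in totals) / len(totals)
```

The list of totals is the spectrum of outcomes; summaries such as the mean or the share of runs under a deadline are read from it rather than from a single best guess.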

In-memory database (IMDB)

stores the entire database in random access memory (RAM).
- Faster access to data: access rates are much faster than when data is stored on secondary storage

Data Mart

subset of a data warehouse in which only a focused portion of the data warehouse information is kept.
- Used by small and medium-sized businesses and by departments within large companies

Data Lake

takes a "store everything" approach to big data.
- Saves all data in its raw and unaltered form

Predictive analytics

techniques used to analyze current data, identify future probabilities and trends, and make predictions about the future.

Data mining

the process of analyzing data to extract information not offered by the raw data alone.
- Explores large amounts of data for hidden patterns
- Techniques include association analysis, neural computing, and case-based reasoning

Veracity (Big Data)

the uncertainty of data, including biases, noise, and abnormalities

Volume (Big Data)

the volume of data that exists in the digital universe is about 16.1 zettabytes.
- One zettabyte = one trillion gigabytes

Key characteristics of Big Data

volume, velocity, value, variety, and veracity

Business intelligence (BI)

wide range of applications, practices, and technologies.
- Extracts, transforms, integrates, visualizes, analyzes, interprets, and presents data

