Module 06 - Business Intelligence

Who are some of the providers of in-memory databases (IMDBs)?

- Altibase - Oracle - SAP - Software AG

What is Hadoop?

- Open-source software framework
- Written in Java
- Used for distributed storage and processing of very large data sets on computer clusters built from commodity hardware
- Well suited to very large companies
- Typically requires well-qualified data scientists to use

Portals that provide free sources of big data

1. Amazon Web Services(AWS) public data sets 2. Bureau of Labor Statistics(BLS) 3. CIA World Factbook 4. Google Finance 5. Healthdata.gov

three most common data mining techniques

1. association analysis 2. neural computing 3. case-based reasoning

What are the goals for the phases of CRISP-DM?

1. business understanding - clarify the goals for the data mining project, convert the goals into a predictive analysis problem, and design a project plan.
2. data understanding - collect the data to be used, get familiar with the data, and identify any data quality problems that must be addressed.
3. data preparation - select a subset of the data to use, clean the data, and transform the data for analysis.
4. modeling - apply the selected modeling techniques.
5. evaluation - determine whether the model achieves the goals.
6. deployment - deploy the model into the organization's decision-making process.

Five categories for BI and analytics tools

1. descriptive analysis 2. predictive analysis 3. optimization 4. simulation 5. text and video analysis

Examples of benefits from BI and analytics?

1. detect fraud 2. improve forecasting 3. increase sales 4. optimize operations 5. reduce costs

What are some sources of useful data for an organization?

1. documents 2. archives 3. social media 4. sensor data 5. machine log data 6. public data 7. data from business apps

Key components of BI and analytics

1. existence of a solid data management program, including data governance 2. the organization needs creative data scientists 3. the management team must have a strong commitment to data-driven decision making

Characteristics of a data warehouse

1. large 2. multiple sources 3. historical 4. cross organizational access and analysis 5. supports various types of analyses and reporting

What two techniques are involved in predictive analytics?

1. time series analysis 2. data mining

Five Key Characteristics of Big Data

1. volume 2. velocity 3. value 4. variety 5. veracity

Conversion Funnel

A graphical representation that summarizes the steps a consumer takes in making the decision to buy your product and become a customer.
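A conversion funnel can be summarized numerically before it is drawn. The sketch below is only an illustration: the stage names and counts are invented, and `step_rates` is a hypothetical helper, not part of any BI tool.

```python
# Hypothetical funnel stages and visitor counts, invented for illustration.
funnel = [
    ("visited site", 10000),
    ("viewed product", 4000),
    ("added to cart", 1000),
    ("purchased", 250),
]


def step_rates(stages):
    """Return (stage name, share of the previous stage that continued)."""
    rates = []
    for (_, prev_count), (name, count) in zip(stages, stages[1:]):
        rates.append((name, count / prev_count))
    return rates


rates = step_rates(funnel)
overall = funnel[-1][1] / funnel[0][1]  # customers divided by initial visitors

for name, rate in rates:
    print(f"{name}: {rate:.1%} of previous step")
print(f"overall conversion: {overall:.1%}")
```

The step with the lowest rate is usually the first place to look for improvements.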

Cross-Industry Standard Process for Data Mining (CRISP-DM)

A six-phase structured approach for the planning and execution of a data mining project

What is a NoSQL database?

A term used to describe high-performance, non-relational databases. NoSQL databases use a variety of data models, including document, graph, key/value, and columnar.

Basel II Accord

Creates international standards that strengthen global capital and liquidity rules, with the goal of promoting a more resilient banking sector

How did the KDDI Corporation use an IMDB, and how effective was it?

KDDI consolidated 40 existing servers into a single Oracle SuperCluster running the Oracle TimesTen in-memory database. This reduced its data center footprint by 83% and its power consumption by 70%.

Categories of NoSQL Databases

Key-value store
• Relies on a hash table to locate and represent data
Column family store
• Large data stores can reference multiple columns with a single key
Document database
• Similar to key-value stores
• Contains documents that are in collections of other key-value collections
Graph database
• Instead of a spreadsheet, uses nodes, node properties, and the relationships between the nodes

A __________________ is composed of a Map procedure that performs filtering and sorting and a Reduce method that performs a summary operation.

MapReduce program
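The Map and Reduce steps can be imitated on a single machine. The following is a minimal sketch of a word count in plain Python, not Hadoop's actual API; the function names `map_phase`, `shuffle`, and `reduce_phase` are assumptions chosen for illustration, and a real cluster would run these steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key and sort the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: summarize all values for one key (here, by summing them)."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big value", "data lake"])
print(counts)
```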

What does regression analysis do?

Regression analysis can be used to predict the value of one variable when the value of one or more other variables is already known. Example: can we predict the length of a hospital stay of a patient with a certain diagnosis?
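As a rough illustration of the hospital-stay example, the sketch below fits a straight line by ordinary least squares. The ages and stay lengths are invented, using age as the predictor is an assumption, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: patient age (known variable) and length of stay in days.
ages = [30, 40, 50, 60, 70]
stay_days = [2.0, 3.0, 4.0, 5.0, 6.0]

slope, intercept = fit_line(ages, stay_days)
predicted = slope * 55 + intercept  # predicted stay for a 55-year-old patient
print(f"stay = {slope:.2f} * age + {intercept:.2f}; age 55 -> {predicted:.1f} days")
```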

self-service analytics

Training, techniques, and processes that empower end users to work independently to access data from approved sources to perform their own analyses using an endorsed set of tools.

What is big data?

Data sets so large and complex that traditional data management software, hardware, and analysis processes are incapable of dealing with them

What is a Monte Carlo simulation?

A simulation that randomly generates the behavior of various asset classes to obtain the range of possible outcomes for a portfolio. It assumes you need a given amount at a certain time (e.g., retirement) and determines the best portfolio for that goal. Similar to exhausting the principal.
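The idea of generating many random outcomes can be sketched as follows. This is a simplified illustration under loud assumptions: a single asset class, yearly returns drawn from a normal distribution, and invented figures for the starting balance, mean return, and volatility. The function names are hypothetical.

```python
import random


def simulate_portfolio(start, years, mean_return, volatility, rng):
    """Grow a balance for `years`, drawing each year's return at random."""
    balance = start
    for _ in range(years):
        # Assumption: yearly returns are normally distributed.
        balance *= 1 + rng.gauss(mean_return, volatility)
    return balance


def monte_carlo(trials, seed=0, **kwargs):
    """Run many independent trials and return the sorted final balances."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sorted(simulate_portfolio(rng=rng, **kwargs) for _ in range(trials))


outcomes = monte_carlo(
    trials=5000, start=100_000, years=20, mean_return=0.06, volatility=0.15
)
# 5th percentile, median, and 95th percentile of the simulated outcomes.
low, median, high = outcomes[250], outcomes[2500], outcomes[4750]
print(f"5%: {low:,.0f}  median: {median:,.0f}  95%: {high:,.0f}")
```

The spread between the low and high percentiles is the "range of possible outcomes" the definition refers to.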

Business Intelligence (BI)

a broad category of applications, technologies, and processes for gathering, storing, accessing, and analyzing data to help business users make better decisions

What is a data mart?

a data collection, smaller than the data warehouse, that addresses the needs of a particular department or functional area of the business

Hadoop has two components: ___________________, and _______________________.

a Java-based data processing system called MapReduce; a distributed file system called the Hadoop Distributed File System (HDFS)

What is a data warehouse?

a large store of data accumulated from a wide range of sources within a company and used to guide management decisions.

What is a word cloud?

a visual representation of the frequency with which certain words are used in a qualitative assessment; larger words indicate higher frequency of use
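The sizing rule behind a word cloud (more frequent words are drawn larger) can be sketched without any drawing library. The helper `word_sizes`, the font-size range, and the sample text are all assumptions for illustration.

```python
import re
from collections import Counter


def word_sizes(text, min_size=10, max_size=40):
    """Map each word to a font size that grows with how often it appears."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_size; others scale down linearly.
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }


# Hypothetical survey comments.
sizes = word_sizes("great service great price great staff slow service")
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word}: {size:.0f}pt")
```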

What is a genetic algorithm?

an optimization technique that employs a natural selection-like process to find approximate solutions to optimization and search problems.
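A toy version of the selection process can be sketched as below. The problem (maximize the number of 1s in a bit string), the population size, and the mutation rate are all assumptions chosen to keep the example small; real applications use a problem-specific fitness function.

```python
import random


def fitness(bits):
    """Toy objective: the more 1s in the string, the fitter it is."""
    return sum(bits)


def evolve(length=20, pop_size=30, generations=40, mutation_rate=0.02, seed=1):
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)
    ]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: pop_size // 2]  # selection: keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            mom, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)  # crossover point
            child = mom[:cut] + dad[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = survivors + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print(f"best fitness: {fitness(best)} of {len(best)}")
```

Because the fitter half survives unchanged, the best fitness never decreases from one generation to the next, but the result is still only an approximate solution.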

___________ can be defined as the extensive use of data and quantitative analysis to support fact-based decision making within organizations.

analytics

A ___________ takes a 'store everything' approach to big data, saving all the data in its raw and unaltered form.

data lake

General Data Protection Regulation (GDPR)

data privacy requirements that apply to E.U. members as well as non-E.U. organizations that market to or process information of individuals in the E.U.

What is an in-memory database?

database management system that stores the entire database in RAM

Bank Secrecy Act

establishes the U.S. Treasury as the lead agency for developing anti-money laundering programs

The process of obtaining quality data for a data warehouse

extract --> transform --> load
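The three steps can be sketched with the Python standard library. Everything here is hypothetical: the CSV contents, the `orders` table, and the cleaning rules (trim names, drop rows with no amount) are invented to show the shape of the process, with an in-memory SQLite database standing in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system.
RAW_CSV = """order_id,customer,amount
1, Alice ,19.99
2,bob,
3,Carol,5.00
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows and normalize types and names."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip rows that are missing an amount
        clean.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(row["amount"]))
        )
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```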

What does a data scientist do?

extracts knowledge from data by performing statistical analysis, data mining, and advanced analytics on big data to identify trends, market changes, and other relevant information

NoSQL Databases have ______________, which allow for spreading data over multiple servers so each server contains only a subset of the total data.

horizontal scaling capability
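One common way to spread data across servers is to hash each key and use the result to pick a server. The sketch below imitates that with three in-process dictionaries standing in for servers; it is an assumption-laden illustration, not how any particular NoSQL product is implemented, and `shard_for`, `put`, and `get` are hypothetical names.

```python
import hashlib


def shard_for(key, num_servers):
    """Pick a server index from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


# Three dictionaries stand in for three separate servers.
servers = [dict() for _ in range(3)]


def put(key, value):
    servers[shard_for(key, len(servers))][key] = value


def get(key):
    return servers[shard_for(key, len(servers))].get(key)


for user_id in range(100):
    put(f"user:{user_id}", {"id": user_id})

print([len(s) for s in servers])  # each server holds only a subset of the data
print(get("user:42"))
```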

What is the advantage of using an in-memory database (IMDB)?

it has the advantage of providing access to data at rates much faster than the traditional means of storing it in secondary forms of storage, such as hard drives.

What does each characteristic of a data warehouse represent?

large - holds billions of records
multiple sources - data comes from many sources, internal and external; an extract, transform, load process is required to ensure quality data
historical - stores data covering 5 years or more
cross-organizational access and analysis - data is used by users across the organization

what is text analysis?

process for extracting value from large quantities of unstructured data

what is video analysis?

process of obtaining information from video footage

__________________ is a simulation technique, and a process, that is used for predicting future values based on certain potential events.

scenario analysis

What is data mining?

the process of analyzing data to extract information not offered by the raw data alone

linear programming

the process of finding the maximum or minimum values of a function for a region defined by inequalities
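For a small two-variable problem, the definition can be applied directly: the best value of a linear function over a bounded region defined by inequalities occurs at a corner of the region. The sketch below checks every corner. The constraints and objective are a made-up example, and this brute-force approach is only an illustration, not the method production solvers use.

```python
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x + b*y <= c. Values are hypothetical.
constraints = [
    (1, 0, 4),    # x <= 4
    (0, 2, 12),   # 2y <= 12
    (3, 2, 18),   # 3x + 2y <= 18
    (-1, 0, 0),   # x >= 0
    (0, -1, 0),   # y >= 0
]


def intersect(c1, c2):
    """Solve the two boundary lines as equations (Cramer's rule)."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None  # parallel lines never meet
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)


def feasible(point, tol=1e-9):
    x, y = point
    return all(a * x + b * y <= c + tol for a, b, c in constraints)


def maximize(cx, cy):
    """Return the feasible corner that maximizes cx*x + cy*y."""
    corners = []
    for c1, c2 in combinations(constraints, 2):
        point = intersect(c1, c2)
        if point is not None and feasible(point):
            corners.append(point)
    return max(corners, key=lambda p: cx * p[0] + cy * p[1])


best_point = maximize(3, 5)  # maximize 3x + 5y
best_value = 3 * best_point[0] + 5 * best_point[1]
print(best_point, best_value)
```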

What is the goal of Business Intelligence(BI)?

to get the most value out of information and present the results of analysis in terms that are easy to understand

What does descriptive analysis cover?

identifying patterns in the data and answering questions about who, what, when, where, and to what extent.

What is time series analysis?

use of statistical methods to analyze time series data and extract statistics and characteristics about the data.
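One simple statistic drawn from time series data is a moving average, which smooths short-term swings to expose the trend. The sketch below uses invented monthly sales figures; `moving_average` is a hypothetical helper.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i : i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical monthly sales, in time order.
monthly_sales = [120, 130, 125, 140, 150, 145]
trend = moving_average(monthly_sales, 3)
print([round(value, 1) for value in trend])
```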

