CIS 330 Chapter 9, 10, 11 (Final Exam)


In general, certain trends in organizations encourage the need for data warehousing; these trends include the following:

- No single system of record
- Multiple systems are not synchronized
- Organizations want to analyze the activities in a balanced way
- Customer relationship management
- Supplier relationship management

What are the characteristics of quality data?

- Uniqueness
- Accuracy
- Consistency
- Completeness
- Timeliness
- Currency (the degree to which data are recent enough to be useful)
- Conformance (whether data are stored, exchanged, or presented in a format specified by their metadata)
- Referential Integrity (data that refer to other data must be unique and satisfy requirements to exist)

What are the characteristics of Data After ETL?

1) Detailed
2) Historical
3) Normalized (data are fully normalized)
4) Comprehensive (data reflect an enterprise-wide perspective)
5) Timely
6) Quality controlled

What are the two stages in which data reconciliation occurs?

1) During an initial load, when the EDW is first created 2) During subsequent updates (normally performed on a periodic basis) to keep the EDW current and/or to expand it

What are the goals of data mining?

1) Explanatory 2) Confirmatory 3) Exploratory

What are the four main types of NoSQL?

1) Key-value stores 2) Document stores 3) Wide-column stores 4) Graph databases
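
A minimal sketch of how the same record might be shaped under each of the four models, using hypothetical field names and plain Python dicts as stand-ins for real stores:

```python
# Hypothetical customer record shaped for each of the four NoSQL models.

# 1) Key-value store: an opaque value looked up by a single key.
kv_store = {"customer:42": '{"name": "Ada", "city": "Eugene"}'}

# 2) Document store: the value is a structured, queryable document.
doc_store = {"customers": [{"_id": 42, "name": "Ada", "city": "Eugene",
                            "orders": [{"sku": "A-1", "qty": 2}]}]}

# 3) Wide-column store: rows hold sparse column families.
wide_column = {"customers": {42: {"profile": {"name": "Ada"},
                                  "address": {"city": "Eugene"}}}}

# 4) Graph database: nodes plus explicitly stored relationships.
graph = {"nodes": {42: {"label": "Customer", "name": "Ada"},
                   7:  {"label": "Product", "sku": "A-1"}},
         "edges": [(42, "PURCHASED", 7)]}

print(kv_store["customer:42"])
```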

The need to separate operational and informational systems is based on three primary factors:

1. A data warehouse centralizes data that are scattered throughout disparate operational systems and makes them readily available for decision support applications.
2. A properly designed data warehouse adds value to data by improving their quality and consistency.
3. A separate data warehouse eliminates much of the contention for resources that results when informational applications are confounded with operational processing.

A subject-oriented, integrated, time-variant, nonupdateable collection of data used in support of management decision-making processes

Data Warehouse

knowledge discovery using a sophisticated blend of techniques from traditional statistics, artificial intelligence, and computer graphics

Data mining

a process of using pattern recognition and other artificial intelligence techniques to upgrade the quality of raw data before transforming and moving the data to the warehouse. Also called data cleansing

Data scrubbing
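
A minimal rule-based scrubbing sketch, assuming hypothetical fields and US-style phone numbers; real scrubbing tools apply far richer pattern-recognition rules:

```python
import re

def scrub_phone(raw: str) -> str | None:
    digits = re.sub(r"\D", "", raw)          # keep digits only
    if len(digits) == 10:                     # assume US-style numbers
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return None                               # flag as unrecoverable

def scrub_record(rec: dict) -> dict:
    # Collapse stray whitespace and normalize casing before the data
    # move on to transformation and loading.
    return {
        "name": " ".join(rec["name"].split()).title(),
        "phone": scrub_phone(rec["phone"]),
    }

print(scrub_record({"name": "  ada   LOVELACE ", "phone": "541.555.0100"}))
# {'name': 'Ada Lovelace', 'phone': '(541) 555-0100'}
```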

the component of data reconciliation that converts data from the format of the source operational systems to the format of the enterprise data warehouse

Data transformation

The process whereby organizations create and maintain data warehouses and extract meaning from the data in them to inform decision making

Data warehousing

The process of extracting data from existing operational systems, cleansing and transforming the data for decision making, and loading them into a data warehouse

Extract-Transform-Load (ETL)
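
A minimal end-to-end ETL sketch (also illustrating the data transformation card above), using sqlite3 in-memory databases as hypothetical stand-ins for the operational source and the warehouse; all table and column names are made up:

```python
import sqlite3

# Hypothetical operational source.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount TEXT, placed TEXT)")
src.execute("INSERT INTO orders VALUES (1, '19.99', '2024-01-05')")

# Hypothetical warehouse target.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE fact_orders (id INTEGER, amount_usd REAL, placed TEXT)")

# Extract from the source, transform (cast the text amount to a number),
# and load into the warehouse.
for oid, amount, placed in src.execute("SELECT id, amount, placed FROM orders"):
    dw.execute("INSERT INTO fact_orders VALUES (?, ?, ?)",
               (oid, float(amount), placed))
dw.commit()

print(dw.execute("SELECT * FROM fact_orders").fetchall())
# [(1, 19.99, '2024-01-05')]
```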

is a file system designed for managing a large number of potentially very large files in a highly distributed environment

HDFS, or Hadoop Distributed File System

An open source implementation framework of MapReduce

Hadoop

a method of capturing only the changes that have occurred in the source data since the last capture

Incremental extract
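
A sketch of an incremental extract keyed on a stored last-capture timestamp (contrast with a static extract, which would simply select everything as a point-in-time snapshot); the table and column names are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER, name TEXT, updated_at TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Ada", "2024-01-01"), (2, "Grace", "2024-02-15")])

last_capture = "2024-02-01"   # persisted from the previous ETL run

# Pull only rows touched since the last capture.
changed = con.execute(
    "SELECT id, name FROM customers WHERE updated_at > ?", (last_capture,)
).fetchall()
print(changed)  # [(2, 'Grace')]
```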

The data housed in the data warehouse are defined using consistent naming conventions, formats, encoding structures, and related characteristics gathered from several internal systems of record and also often from sources external to the organization. This means that the data warehouse holds the one version of "the truth"

Integrated

The question "What will happen?" refers to what?

Predictive analytics

The question "How can we make it happen?" refers to what?

Prescriptive Analytics

Data in the data warehouse contain a time dimension so that they may be used to study trends and changes

Time-variant

the process of partitioning data according to predefined criteria

selection
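
A one-line sketch of selection over hypothetical rows; in SQL this is simply a WHERE clause:

```python
# Partition rows by a predefined criterion (region = "West").
rows = [{"region": "West", "sales": 120}, {"region": "East", "sales": 80}]
west = [r for r in rows if r["region"] == "West"]
print(west)  # [{'region': 'West', 'sales': 120}]
```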

What are some ways to consolidate data?

-Application Integration -Business Process Integration -User Interaction Integration

Hortonworks specifies three characteristics of a data lake, what are they?

-Collect everything -Dive in anywhere -Flexible access

Data quality is important to:

-Minimize IT project risk -Make timely business decisions -Ensure regulatory compliance -Expand the customer base

system that allows managers to measure, monitor, and manage key activities and processes to achieve organizational goals

Business Performance Management and Dashboards

a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information

Business intelligence

technique that indicates which data have changed since the last data integration activity

Changed data capture (CDC)

an executive-level position accountable for all data-related activities in the enterprise

Chief data officer (CDO)

a technique for data integration that provides a virtual view of integrated data without actually creating one centralized database

Data federation
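
A sketch of the "virtual view" idea: each query goes to the underlying live systems at request time instead of reading a materialized central copy. Both sources here are hypothetical in-memory stand-ins:

```python
crm = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
billing = {1: {"balance": 40.0}, 2: {"balance": 0.0}}

def customer_view(cid: int) -> dict:
    # Each lookup pulls from the underlying systems on demand;
    # nothing is copied into a central database ahead of time.
    return {"id": cid, **crm[cid], **billing[cid]}

print(customer_view(2))  # {'id': 2, 'name': 'Grace', 'balance': 0.0}
```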

a large integrated repository for internal and external data that does not follow a predefined schema

Data lake

The question "What happened?" refers to what?

Descriptive analytics

a centralized, integrated data warehouse that is the control point and single source of all data made available to end users for decision support applications

Enterprise data warehouse (EDW)

an algorithm for massive parallel processing of various types of computing tasks

MapReduce
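
The classic word-count example expressed as map, shuffle, and reduce phases, run in a single Python process; Hadoop would shard the map and reduce phases across many nodes:

```python
from collections import defaultdict

docs = ["big data big ideas", "data lakes hold big data"]

# Map phase: emit (key, 1) pairs.
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle phase: group pairs by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: fold each group to a single value.
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts)  # {'big': 3, 'data': 3, 'ideas': 1, 'lakes': 1, 'hold': 1}
```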

disciplines, technologies, and methods used to ensure the currency, meaning, and quality of reference data within and across various subject areas

Master data management (MDM)

OLAP tools that load data into an intermediate structure, usually a three- or higher-dimensional array

Multidimensional OLAP (MOLAP)

a category of recently introduced data storage and retrieval technologies that are not based on the relational model

NoSQL

Data in the data warehouse are loaded and refreshed from operational systems but cannot be updated by end users

Nonupdateable

the use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques

Online analytical processing (OLAP)
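
A tiny cube keyed by (product, region, quarter), the kind of structure MOLAP tools hold as a multidimensional array, with a slice and a roll-up performed on it; the data are hypothetical:

```python
cube = {
    ("widget", "West", "Q1"): 10, ("widget", "East", "Q1"): 7,
    ("widget", "West", "Q2"): 12, ("gadget", "West", "Q1"): 5,
}

# Slice: fix one dimension (region = "West").
west = {k: v for k, v in cube.items() if k[1] == "West"}

# Roll up: aggregate away the quarter dimension.
by_product = {}
for (product, region, quarter), units in west.items():
    by_product[product] = by_product.get(product, 0) + units

print(by_product)  # {'widget': 22, 'gadget': 5}
```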

an approach to filling a data warehouse that involves bulk rewriting of the target data at periodic intervals

Refresh mode

OLAP tools that view the database as a traditional relational database in either a star schema or other normalized or denormalized set of tables

Relational OLAP (ROLAP)

Schoenborn identified four specific infrastructure capabilities that are required for big data and advanced analytics. What are they?

- Scalability (ability to add capacity)
- Parallelism (being able to do multiple things at the same time)
- Low latency (high speed in various processing)
- Data optimization (skills needed to design optimal storage and processing structures)

a method of capturing a snapshot of the required source data at a point in time

Static extract

A data warehouse is organized around the key subjects (or high-level entities) of the enterprise

Subject-oriented

the process of discovering meaningful information algorithmically based on computational analysis of unstructured textual information

Text mining
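
A sketch of the simplest computational analysis of unstructured text, a term-frequency count after stopword removal; the stopword list here is deliberately tiny and hypothetical:

```python
import re
from collections import Counter

text = "The warehouse stores data. Data in the warehouse supports decisions."
stopwords = {"the", "in", "a", "of"}

# Tokenize, drop stopwords, and count the remaining terms.
tokens = re.findall(r"[a-z]+", text.lower())
terms = Counter(t for t in tokens if t not in stopwords)
print(terms.most_common(3))  # [('warehouse', 2), ('data', 2), ...]
```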

1) A business requires an integrated, company-wide view of high-quality information.
2) The information systems department must separate informational from operational systems to improve performance dramatically in managing company data.

Two major factors drive the need for data warehousing in most organizations today:

an approach to filling a data warehouse in which only changes in the source data are written to the data warehouse

Update mode
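
A minimal sketch contrasting update mode with the refresh mode defined earlier, using hypothetical in-memory tables as stand-ins for the source and warehouse:

```python
source = {1: "Ada", 2: "Grace", 3: "Edsger"}
warehouse = {1: "Ada", 2: "Grace"}

# Refresh mode: periodically rewrite the target in bulk.
warehouse_refreshed = dict(source)

# Update mode: write only the changes captured since the last load.
changes = {k: v for k, v in source.items() if warehouse.get(k) != v}
warehouse.update(changes)

assert warehouse == warehouse_refreshed == source
```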

the process of transforming data from a detailed level to a summary level

aggregation
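
A short roll-up sketch, assuming hypothetical order lines summarized to sales per region:

```python
from collections import defaultdict

detail = [("West", 100), ("West", 50), ("East", 75)]

# Aggregate detailed rows up to a summary level.
summary = defaultdict(int)
for region, amount in detail:
    summary[region] += amount
print(dict(summary))  # {'West': 150, 'East': 75}
```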

high-level organizational groups and processes that oversee data stewardship across the organization. It usually guides data quality initiatives, data architecture, data integration and master data management, data warehousing and business intelligence, and other data-related matters

data governance

a data warehouse that is limited in scope, whose data are obtained by selecting and summarizing data from a data warehouse or from separate extract, transform, and load processes from source data systems

data mart

a person assigned the responsibility of ensuring that organizational applications properly support the organization's enterprise goals for data quality

data steward

a data mart filled exclusively from an enterprise data warehouse and its reconciled data

dependent data mart

What is the oldest form of analytics?

descriptive analytics

describes the past status of the domain of interest using a variety of tools through techniques such as reporting, data visualization, dashboards, and scorecards

descriptive analytics

capturing the relevant data from the source files and databases used to fill the EDW

extracting

a data mart filled with data extracted from the operational environment, without the benefit of a data warehouse

independent data mart

a system designed to support decision making based on historical point-in-time and prediction data for complex queries or data-mining applications

informational system

the process of combining data from various sources into a single table or view

joining
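
A sketch of joining rows from two hypothetical sources on a shared key:

```python
customers = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
orders = [{"customer_id": 1, "total": 20.0}, {"customer_id": 2, "total": 35.5}]

# Combine each order with its matching customer record.
joined = [{**customers[o["customer_id"]], **o} for o in orders]
print(joined[0])  # {'name': 'Ada', 'customer_id': 1, 'total': 20.0}
```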

Converts data from one or more source fields to one or more target fields

multifield

a system that is used to run a business in real time, based on current data. Also called a system of record.

operational system

applies statistical and computational methods and models to data regarding past and current events to predict what might happen in the future

predictive analytics

uses results of predictive analytics together with optimization and simulation tools to recommend actions that will lead to a desired outcome

prescriptive analytics

Converts data from a single source field to a single target field

single-field
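
A sketch covering both transformation cards above, with hypothetical field names: single-field maps one source field to one target field, while multifield maps one-to-many or many-to-one:

```python
def single_field(rec: dict) -> dict:
    # Single-field: decode one source code into one standardized target value.
    return {"gender": {"M": "Male", "F": "Female"}.get(rec["sex_code"], "Unknown")}

def multifield(rec: dict) -> dict:
    # Multifield (one-to-many): split one source field into two target fields.
    first, last = rec["full_name"].split(" ", 1)
    return {"first_name": first, "last_name": last}

print(single_field({"sex_code": "F"}))            # {'gender': 'Female'}
print(multifield({"full_name": "Ada Lovelace"}))  # {'first_name': 'Ada', 'last_name': 'Lovelace'}
```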

