Business Data Warehousing - Exam 1

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

Enterprise Resource Planning (ERP) system

A cross-functional information system that provides and organization wide coordination and integration of the key business processes and helps in planning the resources of an organization.

Data warehouse lite

A relatively straightforward implementation of a modest scope (often, for a small user group or team) in which you don't go out on any technological limbs; almost a low-tech implementation.

Entity Integrity Rule

A rule that states that no primary key attribute (or component of a primary key attribute) may be null.

SAP NetWeaver

A software stack for many of SAP ES applications. Used for customer development and integration with other application and systems.

Business Rule

A statement that defines or constrains some aspect of the business. It is intended to assert business structure or to control or influence the behavior of the business.

Model

Abstraction of a real-world object or event

Snowflake Schema

An expanded version of a star schema in which dimension tables are normalized into several related tables. •In a snowflake schema, dimensions are not clearly defined. They are normalized to help reduce data redundancy and protect data integrity. It takes less space to store dimension tables, but it is a more complicated structure (multiple tables to populate and synchronize) that can be difficult to maintain.

Business Process

Collection of activities that takes one or more input and creates a desired output that is of value to organizations.

Four Basic Types of SQL Commands

Create, Read, Update, and Delete CRUD

Physical Subset

Data Extract that is a physically separate database from the data warehouse

Transaction Data

Data from executing process steps, or transactions. Numeric data, such as dates, quantities, prices, payment amounts. AKA row, fact, measure

Operational Data

Data generated by your day to day operations such as customer, inventory, and purchase data.

Time-Variant

Data in The Data Warehouse are loaded and refreshed with a specific period of time so they they may be used to study trends and changes.

Non-Updateable

Data in the data warehouse are loaded and refreshed from operational systems. but cannot be updated by the end users.

Three Types of Data Marts

Dependent Data Marts independent Data Marts Hybrid Data Marts

Master Data

Descriptive data that describes an object and/or entities that provides context for business transaction. AKA Column, Field, Attribute Ex. Customer Data, Product Data

Major Integrity Constraints

Domain Constraints Entity Constraints Entity Integrity Rule Referential Integrity Constraints

Data

Facts related to any object in consideration.

Data Management

Generation, storage, and retrieval of data

SAP HANA

High-Performance Analytic Appliance Is an application that uses IMDB technology that allows the processing of massive amounts of real-time data in a. short time. The IMCE allows HANA to process data stored in RSAM as opposed to reading from a disk.

IMCE

In-Memory Computing Engine

HANA In-Memory Computing Engine

It is a component residing within the Index Server in SAP HANA. It has got several sub-components such as Session Management, Request Processing, and Execution Control, Relational Engine, Planning Engine, Disk Storage, etc.

Referential Integrity Constraints

It is a rule that maintains consistency among the rows of two relations. The rule states that if there is a foreign key in one relation, either each foreign key value must match a primary key value in another relation or the foreign key value must be null.

Counter

It is a type of column within an analytical or calculation view, which shows the count or number of attribute columns.

Main Storage

Main Storage contains the main part of data. In Main Storage, suitable data compression Method (Dictionary Encoding, Cluster Encoding, Sparse Encoding, Run Length encoding, etc.) is applied to compress data with the purpose to save memory and speed up searches

Index Server

Main server of SAP HANA architecture which has the data storage (having the actual data) and processing engine. Queries in different languages like SQL and MDX receives in the index server and process by different components and servers within it. The transactions and authentications also manage with it.

OLTP vs OLAP

OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it.

OLAP

On-Line Analytic Processing

Three-Layer Client Server Architecture

People/Users Applications Information

Four-Layer Architecture

People/users Applications In Memory Computing Engine S4/HANA Information

Three Layers of Enterprise Systems

Presentation Layer Application Layer Database Layer

Classifying Data as an Asset (Groupings)

Run the business data Integrate the business data Monitor the business data

Five Major Components of SAP HANA In-Memory Computing Environment

SAP HANA In-Memory Computing Engine ERP Database SAP Business Objects BI 4 In-Memory Computing Studio/SAP HANA Studio

Integrated

The data in the data warehouse are defined using consistent naming conventions, formats, encoding structures, and related characteristics gathered from several internal systems of record and also often from sources external to the organization.

Entity Constraints

The entity integrity rule is designed to ensure that every relation has a primary key and that the data values for that primary key are all valid. In particular, itguarantees that every primary key attribute is non-null.

Local Attributes

The local or private attributes are used in modeling data in analytical or calculation views. Such attributes can be taken as a private attribute in an analytical view and can only use used in that view or model.

Data Mining

The practice of analyzing the big data present in data warehouse is data mining. It is used to find the hidden patterns that are present in the database or in data warehouse with the help of algorithm of data mining.

Integrity Constraints

The relational data model includes several types of constraints, or rules limiting acceptable values and actions, whose purpose is to facilitate maintaining the accuracy and integrity of data in the database.

SAP HANA Studio Repository

The repository in SAP HANA Studio stores information related to the newly released updates. You can update the old version to the latest ones with the help of this.

Row Storage

The row storage method of storing data is similar to how data stores traditionally in disk databases. The only difference between the SAP HANA row storage and traditional row storage is that in SAP HANA, data stores in rows in the main memory and in traditional databases, data stores in rows in the disk storage.

Restricted Measure

These are the filtered measure values based on a condition put on an attribute. For instance, showing the measure values for only the gross revenue on a particular car in USA (attribute).

Statistical Server

This server checks the performance and health of the overall components of SAP HANA architecture. It collects, stores and analyses the data related to allocation, consumption, and state of system resources.

Organization Data

Used to represent the structure of an enterprise. Any data that describes an organization Describes central characteristics of organizations, internal structures, etc. Ex. Companies, Factories, Warehouses, etc

Five-Layer Architecture

Users Cloud Application In Memory Computing Engine S4/HANA Information

Simple Attributes

We can derive it from the data source

Simple Measure

We can take these measures as it is from the source table.

XS Engine

XS Engine in SAP HANA architecture facilitates communication between the external applications (Java and HTML based) and SAP HANA system via HTTP/HTTPS in a web browser.

Pre-processor Server

a text analyzing server which processes textual data. The service provided by this component is used during text search

SAP HANA

an in-memory database in SAP that is exceptionally fast

Modification Anomaly

an unexpected side effect from trying to insert, update, or delete a row. Essentially more data must be provided to accomplish an operation than would be expected.

OLTP (On-Line Transaction Processing)

characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). •The main emphasis for OLTP systems is put on very fast query processing, maintaining data integrity in multi-access environments and an effectiveness measured by number of transactions per second. •In OLTP database there is detailed and current data, and schema used to store transactional databases is the entity model (usually normalized ).

Calculated Measure

created from a combination of two measures taken from OLAP cubes, arithmetic operators, constants, functions, etc. For instance, a calculated measure would be, Profit obtained from subtracting two measures Sales price - Cost price

Dependent Data Mart

created from an existing enterprise data warehouse. It is the top-down approach that begins with storing all business data in one central location, then extracts a clearly defined portion of the data when needed for analysis. (either logical or physical subset)

Business Meta Data

detail that gives end-users a way easy to understand information stored in the data warehouse.

Technical Meta Data

information about warehouse which is used by Data warehouse designers and administrators.

Well-Structured Relation

relation that contains minimal redundancy and allows users to insert, modify, and delete the rows in a table without errors or inconsistencies. Redundancies in a table may result in errors or inconsistencies (called anomalies ) when a user attempts to update the data in the table.

Entity Relationship Model

A detail, logical representation of the data for an organization or a business area.

Logical View

A virtual table view that is logically but not physically separated from the data warehouse.

IMDB

In-Memory Database

Components of SAP HANA Architecture

Index Server Name Server Pre-processor Server Statistical Server SAP NANA Studio Repository XS Engine

Three Types of Anomalies

Insertion Anomaly Modification Anomaly Deletion Anomaly

Two Major Factors Driving Data Warehousing

Integrated View of Information Information vs Operational Systems

Calculated Attributes

It is created from existing attributes at the source. Such a full name created from first and last name

Data modeling

Iterative and progressive process of creating a specific data model for a determined problem domain

Three Different Engines in SAP HANA

Join Engine OLAP Engine Calculation Engine

Integrated Information Systems for BP

Systems in which functional areas share data. Better integration leads to improvements in communication, workflow, and success of the company. Thus integrated data leads to more efficient business processes.

SAP XI/PI

The Exchange Infrastructure: the SAP platform for process integration based on the exchange of XML messages. Enables you to connect systems from different vendors, languages, and versions.

Subject-Oriented

A data warehouse is organized around the key subjects of the enterprise, Ex. Customer, Product, Time, Organizational Unit Location, etc.

Data warehouse supreme

A data warehouse that has large-scale data distribution and advanced technologies that can integrate various " run the business" systems, improving the overall quality of the data assets across business information analytical needs and transactional needs.

Relational Data Structure

A relation is a named, two-dimensional table of data. Each relation or table consists of a set of named columns and an arbitrary number of unnamed rows

Functional Dependency

A relationship between attributes such that the values in the first attribute always determine the values in the second attribute. I.e. ProductID -> Supplier

Star Schema

A simple database design in which dimensional data are separated from fact or event data. A dimensional model is another name for a star schema. •The benefit of a star schema is that fewer joins are needed when writing queries, as there is no dependency between dimensions. This simplifies the ETL request process making it easier for analysts to access and navigate.

Independent Data Mart

A stand-alone system created without the use of a data warehouse. Focuses on one subject area or business function. Data is extracted from internal or external sources (or both), processed, and then loaded to the data mart where it is stored until needed for business analytics Benefits: not difficult to design/develop Disadv: may become cumbersome to manage, each with its own ETL and logic

Data warehouse deluxe

A standard data warehouse implementation that uses advanced technologies to solve complex business information and analytical needs across a broader user population.

Data Mart

A structure or access pattern specific to data warehouse environments, used to retrieve client specific data. The data mart is a subset of the data warehouse and is usually oriented to a specific business line or functional area.

Data Warehouse

A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes and Business Intelligence (BI)

Domain Constraints

All of the values that appear in a column of a relation must be from the same domain. A domain is the set of values that may be assigned to an attribute. A domain definition usually consists of the following components: domain name, meaning, data type, size (or length), and allowable values or allowable range, if applicable.

Internal Schema

An internal, Physical Level, schema today really consists of two separate schemas: a logical schema and a physical schema. The logical schema is the representation of data for a type of data management technology (e.g., relational). The physical schema describes how data are to be represented and stored in secondary storage using a particular DBMS (e.g., Oracle or MS SQL).

Importance of Data Models

Are a communication tool, Give an overall view of the database, Organize data for various users, Are an abstraction for the creation of good database

Four Types of Modeling Objects

Attribute View Analytic View Calculation View Decision Table

Benefits of ERP

Better Insights Higher Productivity Reduce Risk Consistent Infrastructure Improves Efficiency Sychronizes Reporting and Automation

Advantages of DBMS

Better data integration and less data inconsistency. Data Quality Data sharing Data security

Column Storage

Column storage is one of the many reasons which make SAP HANA unique. A column storage method stores data in a columnar fashion (linear). It improves SAP HANA's performance by optimizing both read and write operations on data. Data stores in the column storage area which divides into two sections; Main Storage and Delta Storage.

Hybrid Data Mart

Combines data from an existing data warehouse and other operational source systems. Unites the speed and end-user focus of a top-down approach with the benefits of the enterprise-level integration of the Botton-up method. Useful for ad hoc integration

Concepts to Describe Relational Model

Data Structure (tables) Data Manipulation (SQL) Data Integrity

Five Components of a Data Warehouse

Data Warehouse database Sourcing Acquisition Clean-up Transformational Tools (ETL)

Metadata

Data about data ex. names of fields, data types, etc

DBMS

Database Management System A system software component that manages or controls one or more databases.

Advantages of a Data Mart

Efficient Access Inexpensive Data Warehouse Alternative Improve Data Warehouse Performance Data Maintenance Simple Setup Analytics Easy Entry

EDW

Enterprise Data Warehouse

Analytic View

In the SAP HANA analytic view, you can create star schemas, having a central transaction or fact table and multiple dimension tables linked to it via primary keys. We can join or group the measures, which contain in only the central transaction table with multiple dimension tables as per the logic's requirement. An analytic view is the next layer or level of a model after the attribute view.

Two Forms of Database Specifications

Logical Specification Physical Specification

Name Server

Maintains the information on the topology or landscape of the SAP HANA system environment. It contains information on the name and location of the components.

Different Data Types

Master data Transactional Data Operational Data Organizational Data Metadata

OLTP

On-Line Transactional Processing

ODS

Operational Data store - central database that provides a snapshot of the latest data from multiple transactional systems for operational reporting.

Calculation View

SAP HANA Calculation view is a more complex model using both attribute and analytic view elements in it to make a complete business logic. Thus, it is the next level of SAP HANA modeling for more advanced transformations and logic. In order to perform complex operations such as combining two data sets into one, we need to use the calculation view. As we inform about analytic view can measure only a transaction table. So, if we have transaction data from two tables, say, Sales and Finance, and we wish to combine both in one data set, we can only do it in the calculation view

Three Types of Attributes in SAP HANA Modeling

Simple Attributes Calculated Attributes Local Attributes

Types of Measures

Simple Measure Calculated Measure Restricted Measure Counter

Data models

Simple representations of complex real-world data structures Useful for supporting a specific problem domain

Granular Data

Sometime refers as 'Mart'. the lowest level of data in the target set, in the data warehouse serves as the single point of reference for all dependent data marts that are created.

Decision Table

We can use Decision table for business planning and decision-making purposes. With the help of these tables, we can analyze business scenarios and situations. It answers a lot of if and else conditional queries, which helps in decision making by the enterprise. We can use to carry out such an if-else situational analysis using a graphical interface.

OLAP (On-Line Analytical Processing)

characterized by relatively low volume of transactions. •Queries are often very complex and involve aggregations. •For OLAP systems a response time is an effectiveness measure. •OLAP applications are widely used by Data Mining techniques. •In OLAP database there is aggregated, historical data, stored in multi-dimensional schemas (usually star schema which we will discuss later).

Logical Specifications map the conceptual requirements into the data model associated with a specific database management system.

map the conceptual requirements into the data model associated with a specific database management system.

Insert Anomaly

occurs when certain attributes cannot be inserted into the database without the presence of other attributes.

Deletion Anomaly

occurs when you delete a record that may contain attributes that shouldn't be deleted.

Conceptual Schema

• conceptual schema describes the database structure of the whole database for the community of users. This schema hides information about the physical storage structures and focuses on describing data types, entities, relationships, etc. This Logical Level comes between the user level and physical storage view. However, there is only single conceptual view of a single database.

Data Marts

•A data mart is an access layer which is used to get data out to the users. •It is presented as an option for large size data warehouse as it takes less time and money to build.

Components of Enterprise Resource Planning (ERP):

•FI/CO: Financial Accounting/Controlling - all about Tracking. (Also, known as the heart of ERP) •MM: Material Management - all about Buying. •SD: Sales and Distribution - all about Selling. •PP: Production Planning - all about Making. •HR/HCM: Human Resource/Human Capital Management - all about People.

Role of DBMS

•Intermediary between the user and the database. •Enables data to be shared. •Presents the end user with an integrated view of the data. •Receives and translates application requests into operations required to fulfill the requests. •Hides database's internal complexity from the application programs and users.

Attribute View

•SAP HANA Attribute view uses attributes, or the non-measurable, non-quantifiable data to design views and models according to the business logic. The data in attribute view also call as 'Master data' and we can use it as a piece of reference information in other views (analytic and calculation). The tables in attribute view do not contain any facts or measure. •That is entries like sales amount, transaction amount, quantity sold, profit incurred etc. We can analyze all these measures by different characteristics such as year, quarter, month, vendors, sales departments, product groups etc. These characteristics also call as dimensions.

Components of Enterprise Systems (ES of Business Suite)

•SRM: Supplier Relationship Management - all about Sourcing. •SCM: Supply Chain Management - all about Planning. •CRM: Customer Relationship Management - all about creating Demand. •PLM: Product Lifecycle Management - all about Idea. •ERP: Is the heart of Enterprise Systems

Characteristics of a DBMS

•Simultaneous access by many users and many applications. •Direct access to data with a data interface. •Uniform and consistent access. •Integration and distribution of data across multiple servers.

OLAP Engine

•The OLAP engine is for the creation of analytical views. It is used only when attributes are used in the analytical view and no calculation is done. If operations like calculations, counters, restricted measures are used then other engines like a calculation engine is used along with the OLAP engine.

Calculation Engine

•The calculation engine is for calculation along with using in other views like analytical and attribute view whenever they apply calculation. The calculation engine might also need other engines to work with it such as OLAP and join engine.

Join Engine

•The join engine is used every time when an attribute view is created or used, or a join condition is applied in a native SQL command. If a calculation applies in an attribute view, the join engine along with calculation engine works to create that attribute view.

External Schema

•describes the part of the database which specific user is interested in. It hides the unrelated details of the database from the user. There may be "n" number of external views for each database. Also, known as View Level i.e. employeesaccess to the database.

Physical Specifications

•indicate all the parameters for data storage that are then used as input for database implementation. During this phase, a database is actually defined using a data definition language.

Delta Storage

•used for a write operation and uses basic compression. All uncommitted modification in Column table data stored in delta storage. •The purpose of delta merge operation is to move changes, which is collected in delta storage to main storage. After performing Delta Merge operation on sap column table, the content of main storage is saved to disk and compression recalculated


Kaugnay na mga set ng pag-aaral

4 Scrum Events for Inspection and Adaptation

View Set

Chapter 37: Immunodeficiency, HIV, & AIDS

View Set

Chapter 41: Fluid, Electrolyte, and Acid-Base Balance

View Set

possible multiple choice questions

View Set