ITP 487 Exam 1
What is a key?
A combination of one or more columns that is used to identify rows in a relation
What are the shortcomings of Star Schmas?
- Duplication of dimensional data - No support for multiple languages - No historization (changing dimensional data) - Hierarchies can cause anomalies
What are the advantages of the Snowflake Schema?
- Reduces data redundancy - Supports multiple languages, currencies, and units of measure - Supports historization - Supports cube administration
How do you Snowflake?
- Extracting attributes from dimension tables and placing them in to new tables (master data)
What is data staging and what does it ensure?
- It is the process whereby data are organized and prepared for analysis.
What are the five main InfoObject Types?
- Key figures (revenue, sales amount, net value, etc.) - Characteristics (Customer, region, zip code, etc.) - Units (Unit of measure, currency) - Time (Calender year, fiscal year, month, day, etc.) - Some technical stuff (DataPackageID, etc.)
In a Star Schema, what are the two measures of data, and what type of data is it?
- Measures and Key Figures - Transactional data
What are the major improvements of the Snowflake over the Star Schema (4 main ways)?
- Reduces data redundancy - Supports multiple languages, currencies & units of measure - Supports historization - Supports cube administration
Why is SAP HANA so efficient?
- SAP's is an in-memory database - Columnar data storage - Data compression (avoids redundancy through dictionary compression, prefix encoding, etc.)
Advantages of Star Schema
- The star-like layout is easy to understand and implement - The one-to-many relationship between dimension tables and the fact table is straightforward. - Has only a single level of JOIN for any query. The fact table is joined to one or more dimension tables, so the JOINs are easy to build.
What are some reasons for the recent explosion of data?
- There are more users of technology - There is far more usere-generated content - Mobility (i.e. how the culture at golf tournaments is changing because of mobile phones)
What are the primary dimension tables in a Star Schema?
- Time dimension - Customer dimension - Product dimension - Sales Org dimension - Region dimension
What are the primary dimension tables in a Star Schema?
- Time dimesnion - Customer dimension - Product dimension - Sales Org dimension - Region dimension
What is data harmonizing?
- When data from multiple sources need to be made consistent
In a star schema, what is the fact table?
A fact table contains facts about the business; it contains a list of historical transactions with data bout key figures and measures
What is a primary key?
A candidate key selected as the primary means of identifying rows in a relation. It never changes.
Define Data Warehouse
A data warehouse is a process and architecture that requires robust planning to implement a platform, which consists of transformation, consolidation, cleansing and mapping of data from multiple operational data sources to a target DBMS that supports decision-making.
What is a Dimension inside of SAP BW?
A group of characteristics which belong to the same business object (semantically)
What is a composite key?
A key that consists of two or more columns
What is a foreign key?
A primary key of one relation that is placed in another relation to form a link between the relations
What is a surrogate key?
A primary key which the database creates
In SAP BW what does Process Chain mean?
A sequence of processes that get triggered in a specific order
What is a star schema and what is its purpose?
A simple relational model that is easy to understand and that represents business data in a lucid manner
What is the Referential Integrity Constraint (RIC)?
A statement that limits the values of the foreign key to those already existing as primary key values in the corresponding relation
What is a relation?
A two-dimensional table
Data mapping
Allows for connections between two systems. This connection allows for data initially captured for one purpose to be translated and used for another purpose. One system in a map is identified as the source while the other is the target.
What is a characteristic?
Each attribute in a dimension table is called a characteristic (i.e. In the Customer table, the customer's name, address, etc. are characteristics)
What is a canidate key?
Any key that could possibly be a primary key
What are the common Data Load Errors?
Common cause is mismatched data (i.e. text field in the source and numeric field in the target)
Define decisions
Course of action, monitoring, and correction
What does DBMS stand for?
Data Base Management System
What is the flow from data to decision?
Data -> Information -> Knowledge -> Wisdom -> Decision
What does DTP stand for? What does DTP do?
Data Transfer Process. It triggers data transfer between a source and its target.
What data do rows contain?
Data about entities
What data do columns contain?
Data about the attributes of the entry
Define data
Data can be defined as raw figures
What type of data do OLTP data systems use and who are the primary users of this data?
Data which is a current representation of the state of the business. The primary users of this data are Clerks, salespersons, and administrators.
Data which is most used in a DB can be considered what? (and vice versa)
Data which is used constantly is considered "Hot Data". Data which is rarely used in considered "Cold Data"
What does ETL stand for?
Extract, Transform, Load
What are two examples of InfoProviders?
I. aDSO's II. Composite Providers
What can a Composite Provider do?
I can merge data from different BW InfoProviders and / or HANA views.
What are the transformation rules when data cleansing?
I. Algebraic rules (i.e. converging weights or currencies) II. String rules (i.e. combine two names into one) III. Data and time rules (i.e. change date and time formats) IV. Programmatic or algorithm-based rules (i.e. use transformation programming languages)
What are the "four rules" of relations?
I. All entries in a column have to be of the same kind II. Each column has to have a unique name III. No two rows ma be identical IV. Cells of a table must hold a single value
What are the steps to building a SAP BW/4HANA Snowflake Schema?
I. Build fact table by creating key figures II. Build "dimensions" (folders in BW/4HANA) III. Build characteristics (attributes of dimensions) IV. Further the snowflake characteristics by establishing their text, attributes, hierarchies, and geo information
What are the five essential components of data warehosuing?
I. Data source II. Data conversion and extraction III. Data warehouse DBMS IV. Data warehouse administration V. BI tools (BI - Business Intelligence)
When do you Snowflake (2 Conditions)?
I. Dimension table consists of two or more sets of attributes which define information at different grains (detail) II. Sets of attributes of the same dimension table are being populated by different source systems (very common in data warehouses)
What are the five drivers of Business Intelligence?
I. Globalization II. Decentralization III. Smaller Margins IV. Faster reaction to the market V. Increasing importance of service
Why is data warehousing needed?
I. Integrated, company-wide view of high-quality information (from heterogeneous databases) II. Separation of operational and informational systems and data (for improved performance)
What are the two types of data that are stored in the sales fact table?
I. Measures II. Key figures
If a data load error occurs, what are the three ways that data transfer system can address these errors?
I. No update, no reporting II. Update valid records, no reporting III. Update valid records, reporting possible
When dealing with InfoProviders in BW on AnyDB, what are the two types of data, and some examples.
I. Physical data - Infocubes - Data Store II. Virtual data - MultiProviders: to create views - Virtual Providers
What two things do InfoProviders do in BW/4HANA
I. Physical data: aDSOs; can optimize data for fast multi-dimensional analysis or fast update II. Virtual data: Composite Providers
What are the Seven data mining techniques which were talked about in class?
I. Regression II. Decision Trees III. Neural Networks IV. Association Analysis V. Clustering VI. Classification VII. Genetic Algorithms
What are two more ways to fix data load errors?
I. Roll back to data state prior to load II. Delete data load, repair rogue data, and reload
What are the six objecties of data warehouses?
I. Single source of truth II. Standardized structures and representation for all enterprise information III. Easy access (single point of access to enterprise information) IV. Self service, high quality business reporting and analysis on all levels V. High performance environment fed from heterogeneous sources VI. Unburden systems and IT resources in the transactional environment
What are the three main types of master data?
I. Text II. Attributes III. Hierarchies
Given a data set, what do you do (4 steps)?
I. Understand the data (data dictionary, create a star schema) II. Design the Snowflake (Master tables, data types, lengths) III. Model in SAP BW/4HANA (Implement design, key figures, characteristics) IV. Load data set (master data, transactional data)
What is an entity?
Identifiable things that users want to track (customers, products, sales orders, vendors, etc.)
What are InfoObjects? What do they help us build?
InfoObjects are the "bricks" to build data structures inside of SAP BW, they represent business objects from an end user's view.
In SAP BW what does data source mean?
Interface object to source system representing the source data structure
In a Snowflake Schema, what is the purpose of the dimension table
It acts as a link between master data and fact table records
What does an aDSO consist of?
It can consists of InfoObjects and / or ordinary database fiels. Can be used in different layers depending on property sets
What is the ALPHA conversion routine?
It converts from internal to external length
What is SAP BW/4HANA?
It is SAP's version of Enterprise Data Warehouse, but it only runs on SAP HANA
What is the purpose of data mining?
It is important that the results provide enterprises with insights not available through traditional techniques or predefined relationships (i.e. relational tables)
How is Business Intelligence efficient?
It is user-centered to improve overall decision making
What does OLTP stand for and what is its purpose?
OLTP stands for Online Transaction Processing System and its primary purpose is to run the business on a current basis (operational systems).
Define wisdom
Reveals ideas principles, biases, and insights. Usually where policies are made
Define information
Reveals relationships between entities (the WHAT behind analytics)
Define knowledge
Reveals the trends and patterns (the WHY behind analytics)
How does data flow in SAP BW?
Source system --> data source --> aDSO
What is a composite entity
The database will not allow you to create a grade without having a student to assign the grade to
What type of data do OLAP data systems use and who are the primary users of this data?
The primary type of data used is historical point-in-time and prediction data. This is primarily used by managers, business analysts, and customers.
What does loading mean in ETL?
The movement of data from source systems to their intermediate or final destinations
Define data mining
The process of discovering meaningful correlations, patterns, and trends by sifting through large amounts of data stored in repositories
What does transformation mean in ETL?
The process of mapping and harmonizing data from their sources to the targets
What does extract mean in ETL?
The process of sourcing the data for analysis
What are InfoProviders?
They are the data containers inside of SAP BW. They may store data records physically (persistent) or virtually.
In SAP BW what does transformation mean
Transforms the data according to defined rules
In SAP BW what does Data Transfer Process (DTP) do?
Triggers the data transfer between a source and target
What is a transitive dependency?
When a column depends on another column to be defined (i.e.If A --> B and B --> C; therefore A -->C)