Data Warehousing Questions


What is Data Warehousing?

A data warehouse is a storage area that holds subject-specific, relevant data regardless of its source. Data warehousing merges data from multiple sources into a single, complete, easy-to-use form.

Difference between OLAP and Data Warehouse?

A data warehouse is a repository of historical data kept for analysis, with data coming from varied sources. OLAP (Online Analytical Processing) is used to analyze and evaluate the data in the warehouse. OLAP tools help organize that data using multidimensional models.

What is a degenerate dimension table?

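A degenerate dimension is a dimension attribute stored as a column in the fact table, with no dimension table of its own. The column is part of the fact table and does not map to any dimension table. Ex: Employee_ID kept only in the fact table; an order or invoice number is another common example.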
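
The idea can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a reference design: the table names (`dim_product`, `fact_order_line`), the `order_number` column, and the sample rows are all assumptions made up for the example. The order number lives only in the fact table, yet it can still be used to group fact rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_order_line (
        order_number TEXT,   -- degenerate dimension: there is no dim_order table
        product_key  INTEGER REFERENCES dim_product(product_key),
        amount       REAL
    );
    INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Desk');
    INSERT INTO fact_order_line VALUES
        ('SO-1001', 1, 3.0),
        ('SO-1001', 2, 120.0),
        ('SO-1002', 1, 1.5);
""")

# Group fact rows by the degenerate dimension without joining to anything.
order_totals = conn.execute("""
    SELECT order_number, COUNT(*), SUM(amount)
    FROM fact_order_line
    GROUP BY order_number
    ORDER BY order_number
""").fetchall()
```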

What are dimension tables?

A dimension table in a data warehouse contains the fields used to describe the data in fact tables. It provides additional, descriptive information about the fields of a fact table.

What are fact tables?

A fact table in a data warehouse consists of facts and/or measures. The nature of data in a fact table is usually numerical.

What is Data Dictionary?

A file which consists of the basic definitions of a database. Contains the list of files that are available in the database, number of records in each file, and the information about the fields.

What is Data Mining?

A method for analyzing large amounts of data for the purpose of finding patterns. Normally used for building models and forecasting. It is the process of discovering correlations and patterns by sifting through large data repositories using pattern recognition techniques.

What is junk dimension?

A single dimension formed by lumping together a number of small dimensions. Miscellaneous flags and text attributes are grouped by moving them out of their original dimensions into one separate sub-dimension.
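
One common way to build such a dimension is to enumerate every combination of the small flags and give each combination one key. The sketch below assumes three hypothetical flags (`is_gift`, `payment_type`, `is_rush`); the names and values are illustrative only.

```python
from itertools import product

# Hypothetical low-cardinality flags that would otherwise each need a dimension.
flags = {
    "is_gift": ["N", "Y"],
    "payment_type": ["card", "cash"],
    "is_rush": ["N", "Y"],
}
names = list(flags)

# One junk-dimension row per combination of flag values, keyed 1..N.
junk_dimension = {
    combo: key for key, combo in enumerate(product(*flags.values()), start=1)
}

def junk_key(**attrs):
    """Return the junk-dimension key for one combination of flag values."""
    return junk_dimension[tuple(attrs[name] for name in names)]
```

A fact row would then carry a single `junk_key(...)` value instead of three separate flag columns.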

What is the snowflake schema design in a database?

A snowflake schema is a variation of the star schema, which is the simplest arrangement of fact tables and dimension tables. The fact table is at the center, surrounded by the dimension tables. When a dimension table is split into several related tables, so that the schema leans towards normalization, the snowflake design is used.

Difference between star and snowflake schema?

A star schema is a highly denormalized design that has one fact table associated with numerous dimension tables, so the layout resembles a star. A snowflake schema applies normalization principles, so each dimension table may be associated with one or more sub-dimension tables.
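
The difference shows up in how many joins a query needs. The following sketch uses Python's built-in `sqlite3` module; every table name, column name, and row is an assumption invented for illustration. The same fact table is queried once through a denormalized product dimension (star) and once through a product dimension whose category has been split into its own table (snowflake).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star: category is stored directly in the product dimension.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    -- Snowflake: category is normalized into a sub-dimension table.
    CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_product_sf (
        product_key  INTEGER PRIMARY KEY,
        name         TEXT,
        category_key INTEGER REFERENCES dim_category(category_key)
    );
    CREATE TABLE fact_sales (product_key INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Desk', 'Furniture');
    INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Furniture');
    INSERT INTO dim_product_sf VALUES (1, 'Pen', 1), (2, 'Desk', 2);
    INSERT INTO fact_sales VALUES (1, 15.0), (1, 6.0), (2, 120.0);
""")

# Star schema: one join from the fact table to the dimension.
star_totals = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category ORDER BY p.category
""").fetchall()

# Snowflake schema: an extra join to reach the sub-dimension.
snowflake_totals = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product_sf AS p ON p.product_key = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category ORDER BY c.category
""").fetchall()
```

Both queries return the same totals; the snowflake version trades the extra join for less repeated category text in the dimension.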

What is cube grouping?

A Transformer-built set of similar cubes. Generally used to create smaller cubes based on the data in a level of a dimension.

What is a surrogate key?

A unique identifier in a database, either for an entity in the modeled world or for an object in the database. A surrogate key is generated internally by the current system and is invisible to the user. Ex: Sequential number.
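
A sequential surrogate key can be sketched in a few lines of Python. The class name `SurrogateKeyGenerator` and the natural keys such as `CUST-0042` are hypothetical; a real warehouse would usually rely on a database sequence or identity column instead.

```python
from itertools import count

class SurrogateKeyGenerator:
    """Hands out sequential integer keys, one per distinct natural key."""

    def __init__(self, start=1):
        self._next = count(start)
        self._by_natural_key = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate key if this natural key was seen before.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._next)
        return self._by_natural_key[natural_key]

keys = SurrogateKeyGenerator()
first = keys.key_for("CUST-0042")
second = keys.key_for("CUST-0099")
repeat = keys.key_for("CUST-0042")  # same natural key, same surrogate key
```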

Difference between view and materialized view?

A view is a stored query that combines data from one or more tables, so a view has no data of its own: nothing is stored in the database when the view is created, and the result is produced each time a query runs against the view. A materialized view, commonly used in data warehousing, stores its result data in the database, and that stored data helps in decision making, performing calculations, etc.
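
The contrast can be demonstrated with Python's built-in `sqlite3` module. SQLite has no native materialized views, so this sketch emulates one with a plain table filled by `CREATE TABLE ... AS SELECT`; the names `sales`, `v_total`, `mv_total`, and `refresh_mv` are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('north', 100.0), ('south', 50.0);
    -- A view stores only the query.
    CREATE VIEW v_total AS SELECT SUM(amount) AS total FROM sales;
    -- Emulated materialized view: the query result is stored as rows.
    CREATE TABLE mv_total AS SELECT SUM(amount) AS total FROM sales;
""")

conn.execute("INSERT INTO sales VALUES ('north', 25.0)")

# The view is recomputed on every query; the stored copy is now stale.
view_total = conn.execute("SELECT total FROM v_total").fetchone()[0]
stored_total = conn.execute("SELECT total FROM mv_total").fetchone()[0]

def refresh_mv(connection):
    """Rebuild the stored result from the base table."""
    connection.executescript("""
        DELETE FROM mv_total;
        INSERT INTO mv_total SELECT SUM(amount) FROM sales;
    """)

refresh_mv(conn)
refreshed_total = conn.execute("SELECT total FROM mv_total").fetchone()[0]
```

Databases with real materialized views provide their own refresh mechanism; the manual refresh here only stands in for that step.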

What is Active Data Warehousing?

Aims to capture data continuously and deliver it in real time. Provides a single integrated view of a customer across multiple business lines. Associated with Business Intelligence. Used to find trends and patterns that can inform future decision making.

What is Data Modeling?

Aims to identify all entities that have data, then defines a relationship between these entities. Data models can be conceptual, logical or physical. Conceptual models are used to explore high level business concepts. Logical models are used to explore domain concepts. Physical models are used to explore database design.

What is Virtual Data Warehousing?

An aggregate view of the complete data inventory. It contains metadata and uses middleware to build connections to different data sources. It can be fast, as it allows users to filter the most important pieces of data from different legacy applications.

What is analysis service?

An integrated view of business data provided by combining OLAP and data mining functionality. This service lets the user apply a wide variety of data mining algorithms to create and design data mining models.

What is time series algorithm in data mining?

Can be used to predict continuous values of data. Once the algorithm is trained to predict one series of data, it can predict the outcome of other series. Ex: The performance of one employee can influence or forecast the profit.

What is snapshot with reference to data warehouse?

Can be used to track activities. A snapshot has three components:
- Time when the event occurred
- A key to identify the snapshot
- Data that relates to the key

What is sequence clustering algorithm?

Collects similar or related paths, that is, sequences of data containing events. Ex: This algorithm may help find the path to store products of a "similar" nature in a retail warehouse.

What is continuous data in data mining?

Considered to be data which changes continuously and in an ordered fashion. Ex: age, salary, years of experience.

What is discrete data in data mining?

Considered to be defined or finite data. Ex: Employee ID, phone number, gender, address etc.

What is the purpose of Factless Fact Table?

Contains only key values that reference dimension tables. It holds no facts or measures, but is commonly used for tracking the occurrence of an event.
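
A class-attendance table is a typical illustration. The sketch below uses Python's built-in `sqlite3` module; the schema (`dim_student`, `dim_class`, `fact_attendance`) and its rows are assumptions invented for the example. The fact table has no measure column, so analysis is done by counting rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_class (class_key INTEGER PRIMARY KEY, title TEXT);
    -- Factless fact table: only keys, no numeric measures.
    CREATE TABLE fact_attendance (
        student_key INTEGER REFERENCES dim_student(student_key),
        class_key   INTEGER REFERENCES dim_class(class_key),
        date_key    INTEGER
    );
    INSERT INTO dim_student VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO dim_class VALUES (1, 'Databases'), (2, 'Statistics');
    INSERT INTO fact_attendance VALUES
        (1, 1, 20240101),
        (2, 1, 20240101),
        (1, 2, 20240102);
""")

# Each row records that an event happened, so the "measure" is a row count.
attendance = conn.execute("""
    SELECT c.title, COUNT(*)
    FROM fact_attendance AS f
    JOIN dim_class AS c ON c.class_key = f.class_key
    GROUP BY c.title
    ORDER BY c.title
""").fetchall()
```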

What are the methods of loading Dimension tables?

Conventional Load: In this method all the table constraints are checked against the data before the data is loaded.
Direct Load or Faster Load: The data is loaded directly without checking the constraints. The data is checked against the constraints later, and bad data is not indexed.

Difference between data warehousing and data mining?

Data warehousing is the process of extracting data from different sources, cleaning the data, and storing it in the warehouse, whereas data mining aims to examine or explore the data using queries. Exploring the data using data mining helps in reporting, planning strategies, finding meaningful patterns, etc.

Difference between dependent and independent data warehouse?

Dependent data warehouse stores the data in a central data warehouse. An independent data warehouse does not make use of a central data warehouse.

What is ETL process in data warehousing?

ETL stands for Extraction, Transformation and Loading. It means extracting data from different sources such as flat files, databases or XML data, transforming this data according to the application's needs, and loading it into the data warehouse.
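
The three steps can be sketched with the Python standard library. This is a toy pipeline under stated assumptions: the source is an in-memory CSV string (`RAW_CSV`), the target is an in-memory SQLite table named `orders`, and the cleaning rules (trim and title-case names, skip rows whose amount is not numeric) are invented for the example.

```python
import csv
import io
import sqlite3

# Stand-in for a flat-file source.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,not-a-number
3,Carol,7
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names, convert types, skip unparseable amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows that fail type conversion
        clean.append((int(row["order_id"]), row["customer"].strip().title(), amount))
    return clean

def load(connection, rows):
    """Load: write the transformed rows into the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
```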

What are linked cubes?

Linked cubes are cubes that are linked so that the data remains constant. This linkage reduces the possibility of sparse data.

What are cubes?

Multi-dimensional data in a summarized version where the dimension and the data are represented by the edge and the body of the cube respectively. A cube typically includes the aggregations that are needed for business intelligence queries.
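
A small cube can be precomputed in plain Python by aggregating a measure over every combination of dimensions. In this sketch the rows, the dimension names (`region`, `product`), and the `"ALL"` marker for a rolled-up dimension are assumptions made for illustration; OLAP servers store and compute cubes far more efficiently.

```python
from collections import defaultdict
from itertools import combinations

rows = [
    {"region": "north", "product": "pen", "amount": 10},
    {"region": "north", "product": "desk", "amount": 100},
    {"region": "south", "product": "pen", "amount": 5},
]
dimensions = ("region", "product")

def build_cube(rows, dimensions, measure):
    """Sum the measure for every subset of dimensions."""
    cube = defaultdict(int)
    for size in range(len(dimensions) + 1):
        for kept in combinations(dimensions, size):
            for row in rows:
                # "ALL" marks a dimension that has been aggregated away.
                cell = tuple(row[d] if d in kept else "ALL" for d in dimensions)
                cube[cell] += row[measure]
    return dict(cube)

cube = build_cube(rows, dimensions, "amount")
grand_total = cube[("ALL", "ALL")]
north_total = cube[("north", "ALL")]
pen_total = cube[("ALL", "pen")]
```

Because the aggregations are stored in advance, a query such as "total for the north region" becomes a single lookup.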

What are the Fundamental Stages of Data Warehousing?

Offline Operational Databases: Initial stage of data warehousing, where the databases of an operational system are simply copied to an off-line server.
Offline Data Warehouse: In this stage the data warehouse is updated regularly from the source data.
Real Time Data Warehouse: The data warehouse is updated for every transaction performed on the source data.
Integrated Data Warehouse: The data warehouse is updated when a transaction is performed, and it also generates transactions that are passed back to the online source systems.

What is an OLAP system?

Online Analytical Processing: performs analysis of business data and provides the ability to perform complex calculations, usually with a low volume of transactions over large amounts of data. OLAP helps the user gain insight into data coming from different sources (multi-dimensional).

What is an OLTP system?

Online Transaction Processing: helps manage transaction-based applications involving a high volume of data. OLTP is based on client-server architecture and supports transactions across networks.

What is Dimensional Modeling?

A rational, consistent design technique used to build a data warehouse. Dimensional modeling (DM) uses the facts and dimensions of a warehouse for its design. It differs from the entity-relationship model.

Difference between SAS tool and other tools?

SAS is a reporting tool and an ETL tool, and it also contains a forecasting tool. For this reason, SAS is widely used in clinical trials and the healthcare industry. Other tools are reporting tools (for example, Business Objects, Cognos), ETL tools (for example, Informatica), or both (for example, Business Objects).

What is a Data Mart?

Stores particular data that is gathered from different sources. This data may belong to some specific group of people. Data marts can be used to focus on specific business needs.

What is Metadata?

The description of data. It contains information about how, when, and by whom certain data was collected, and about the data format. Essential for understanding information stored in data warehouses and XML-based web applications.

What is the level of granularity of a fact table?

The granularity is the lowest level of information stored in the fact table; the depth of the data level is known as granularity. Ex: In a date dimension, the level of granularity could be year, quarter, month, period, week, or day.
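
Grain matters because data can be rolled up to a coarser level but not split back to a finer one. The sketch below assumes facts stored at day grain (the dates and amounts are made up) and rolls them up to month grain; the function name `roll_up_to_month` is hypothetical.

```python
from collections import defaultdict
from datetime import date

# Facts stored at day grain: one amount per calendar day.
daily_facts = [
    (date(2024, 1, 5), 10.0),
    (date(2024, 1, 20), 5.0),
    (date(2024, 2, 1), 7.5),
]

def roll_up_to_month(facts):
    """Aggregate day-grain facts to month grain."""
    totals = defaultdict(float)
    for day, amount in facts:
        totals[(day.year, day.month)] += amount
    return dict(totals)

monthly = roll_up_to_month(daily_facts)
```

Had the fact table been stored at month grain in the first place, the individual daily amounts could not be recovered from it.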

What is Data Cleaning?

The process of identifying erroneous data. Data is checked for accuracy, consistency, typos, etc. Methods:
Parsing - used to detect syntax errors
Data Transformation - confirms that the input data matches the expected data in format
Duplicate elimination - gets rid of duplicate entries
Statistical methods - use values of mean, standard deviation, range, or clustering algorithms to find erroneous data
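
Two of these methods, duplicate elimination and a mean/standard-deviation check, can be sketched with the Python standard library. The records, the field names, and the z-score threshold of 2.0 are assumptions chosen for this small sample; a suitable threshold depends on the data.

```python
import statistics

records = [
    {"id": 1, "age": 34}, {"id": 2, "age": 29}, {"id": 2, "age": 29},
    {"id": 3, "age": 31}, {"id": 4, "age": 27}, {"id": 5, "age": 33},
    {"id": 6, "age": 30}, {"id": 7, "age": 32}, {"id": 8, "age": 310},
]

def drop_duplicates(rows):
    """Duplicate elimination: keep the first copy of each identical record."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def flag_outliers(rows, field, threshold=2.0):
    """Statistical check: flag values far from the mean in standard deviations."""
    values = [row[field] for row in rows]
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []
    return [row for row in rows if abs(row[field] - mean) / stdev > threshold]

unique = drop_duplicates(records)
outliers = flag_outliers(unique, "age")
```

Flagged records still need review: the check only says a value is unusual, not that it is wrong.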

What is Business Intelligence?

The analysis by an organization of measurements of aspects of its business, such as sales, marketing, efficiency of operations, profitability, and market penetration within customer groups. Typically encompasses OLAP, data visualization, data mining and reporting tools.

What is XMLA?

XML for Analysis, which can be considered a standard for accessing data in OLAP, data mining, or other data sources over the internet. It is based on the Simple Object Access Protocol (SOAP). XMLA defines two methods: Discover, which fetches information and metadata about a data source, and Execute, which lets applications run commands against the data source.

