20-768 [03] - Working with Cubes and Dimensions

Lakukan tugas rumah & ujian kamu dengan baik sekarang menggunakan Quizwiz!

MOLAP

MOLAP This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats. Advantages: Excellent performance: MOLAP cubes are built for fast data retrieval, and are optimal for slicing and dicing operations. Can perform complex calculations: All calculations have been pre-generated when the cube is created. Hence, complex calculations are not only doable, but they return quickly. Disadvantages: Limited in the amount of data it can handle: Because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself. This is not to say that the data in the cube cannot be derived from a large amount of data. Indeed, this is possible. But in this case, only summary-level information will be included in the cube itself. Requires additional investment: Cube technology are often proprietary and do not already exist in the organization. Therefore, to adopt MOLAP technology, chances are additional investments in human and capital resources are needed.

ATTRIBUTE COLUMN BINDINGS

ATTRIBUTE COLUMN BINDINGS. To control output from attributes, (1) define the column that uniquely identifies attribute values, (2) the column that users see, and (3) an optional value column you can use for MDX calculations 1. KEY COLUMN - Uniquely identifies members. Often the PK of Dim Table. May be combined key. A key column value occurs in every row of the fact table, so reducing its size makes the fact table smaller and makes joins between the fact table and the time dimension faster. 2. NAME COLUMN - The value displayed to the user. Can be a Calculated Column from source view. 3. VALUE COLUMN = The value returned by an MDX "MemberValue" function. Thus can create calculations based on a value other than the name or key.

FACTS / MEASURES

FACTS / MEASURES. Facts, or measures, have the values that interest you, but dimensions provide the business context by which you aggregate these measures.

RAGGED HIERARCHIES

ragged hierarchies

ATTRIBUTE HIERARCHIES

ATTRIBUTE HIERARCHIES. SSAS dimensions contain attributes that correspond to columns in dimension tables. These attributes appear as ATTRIBUTE HIERARCHIES and can be organized into user-defined hierarchies, or defined as parent-child hierarchies, based on columns in the underlying dimension table. You use hierarchies to organize measures contained in a cube. Hierarchies define the DRILL-DOWN PATHS through aggregations. Each layer of a Hierarchy is call a LEVEL.

ATTRIBUTES

ATTRIBUTES: The Name attribute provides a friendly name for the dimension, enabling you to display an alternative, more relevant, name than the source table name-to-end users. SSAS dimensions contain attributes that correspond to columns in dimension tables. These attributes appear as attribute hierarchies and can be organized into user-defined hierarchies, or defined as parent-child hierarchies, based on columns in the underlying dimension table. You use hierarchies to organize measures contained in a cube.

DEFAULT DIMENSION HIERARCHY

By DEFAULT: Every attribute defined within a dimension is part of a two-level hierarchy known as an attribute hierarchy. The TOP LEVEL is named "ALL"; the SECOND LEVEL is the "LEAF" LEVEL. The Leaf Level contains values for each individual member of the attribute. While they can help in browsing, the data within an attribute hierarchy can typically be too large and difficult to navigate. Attribute hierarchies are often replaced with user defined hierarchies where you can define more levels and make data more manageable. EX: A Date attribute hierarchy might have an "All level" and a second level containing all the dates within the dimension. To make this more manageable, you could replace the Date attribute hierarchy with a user-defined hierarchy named Calendar Year that contains Year, Quarter and Month levels, making the data easier to navigate.

DIMENSION MODES.

CONFIGURING DIMENSION STORAGE -- 2 DIMENSION MODES. MOLAP & ROLAP define where, and in what structure type, the dimension data is stored: 1. MOLAP --- Data stored in a MultiDimensional Structure created & populated when the dimension is processed. OPTIMIZED for FASTEST QUERY performance. Precomputes every possible combination between Measures and the Dimension. MOLAP = Multidimensional Online Analytical Processing. You can incrementally or fully update objects in MOLAP storage without taking the partition or cube offline. However, there are situations that may require you to take a cube offline to process certain structural changes to OLAP objects. http://msdn.microsoft.com/en-us/library/ms174915.aspx 2. ROLAP --- Data stored in Relational Format and Aggregated when queried. OPTIMIZED for minimal PROCESSING OVERHEAD & REAL-TIME QUERYING of a Relational source. Uses Aggregate Tables. Choose which table at query time. Denormoralization. Query response is generally slower with ROLAP storage than with the MOLAP or HOLAP storage modes. ROLAP = Relational Online Analytical Processing.

DIMENSION CONCEPTS

DIMENSION CONCEPTS. Dimensions... __ Provide context for FACTS. __ Provide aspects by which Facts are AGGREGATED. __ Are COLLECTION of ATTRIBUTES from a table or View. __ Are typically arranged as a HIERARCHY.

DIMENSION DESIGNER.

DIMENSION DESIGNER. Is used to edit the Attributes, Levels, Hierarchies, and Translations of a dimension, and to browse the dimension. Has 4 Tabs... 1. DIMENSION STRUCTURE TAB. Used to View & Edit attributes, Levels & Hierarchies of a dimension. 2. ATTRIBUTE RELATIONSHIPS TAB. Used to Create, Modify, or Delete attribute relationship of a dimension. 3. TRANSLATIONS TAB. Used to View & Edit the Multilanguage translation for Attributes, Levels, & Hierarchies. 4. BROWSER TAB. Used to browse members of each Hierarchy in a dimension. Members can only be browsed AFTER solution is DEPLOYED.

DIMENSIONS

DIMENSIONS. Dimensions are a fundamental component of cubes. Facts, or measures, have the values that interest you, but dimensions provide the business context by which you aggregate these measures. Dimensions organize your data by category and show results grouped by these categories, such as product, customer, or month. SSAS dimensions contain attributes that correspond to columns in dimension tables. These attributes appear as ATTRIBUTE HIERARCHIES and can be organized into user-defined hierarchies, or defined as parent-child hierarchies, based on columns in the underlying dimension table. You use hierarchies to organize measures contained in a cube. All SSAS dimensions are groups of attributes based on columns from tables or views in a data source view. A DATABASE DIMENSION exists independently of a cube; an instance of a database dimension within a cube is called a CUBE DIMENSION. Dimensions can exist independently of a cube—you can use dimensions in multiple cubes, use them multiple times in a single cube, and link them between SSAS instances.

HIERARCHY RELATIONSHIPS

Defining relationships between hierarchy levels enables Analysis Services to establish more useful aggregations that, in turn, increase query performance. This can save memory during processing and increase performance—which is often important with large or complex cubes. When arranging attributes into user-defined hierarchies, you define relationships between hierarchy levels. The levels are connected in a many-to-one or one-to-one relationship, referred to as a natural relationship. EX: in a Calendar Time hierarchy, a Day level is related to the Month level, the Month level to the Quarter level, and so on. A NATURAL HIERARCHY is composed of attributes where each is a member property of the attribute below. EX: A CustomerGeography hierarchy with the levels Country-Region, State-Province, City, ZIP Code and Customer, is a natural hierarchy because of the relationships between the attributes. By contrast, a Marital Status-Gender hierarchy containing the levels Marital Status and Gender is non-natural, because marital status and gender do not have a hierarchical relationship to each other. For performance reasons, natural hierarchies are preferred and the Dimension Designer will warn you if you create a hierarchy that is non-natural. You can ignore this warning when it makes sense to use a non-natural hierarchy. EX: It is plausible that you would want to drill down from marital status to gender.

PARENT-CHILD HIERARCHIES

PARENT-CHILD HIERARCHIES. A. Exist when have SELF-REFERENCING dimension tables. B. Are UNBALANCED hierarchies. C. Are CHANGING DIMENSIONS. D. Have AGGREGATES calculated at RUN TIME. __________________ "changing dimensions", which means that rows can be modified without the need for processing. They are also the only dimension type you can write-enable. __________________ Because the parent-child hierarchy is calculated at run time, no aggregates are calculated when the cube is processed. __________________ Parent-Child Dimensions - http://aka.ms/Whhuwv

QUESTION: MOLAP dimensions provide better performance than ROLAP but which of the following is a reason for choosing ROLAP? ___ A. ROLAP is easier to implement than MOLAP. ___ B. When real-time OLAP is required. ___ C. Querying ROLAP generally involves simpler queries. ___ D. ROLAP is more efficient with small dimensions. [_]

QUESTION: MOLAP dimensions provide better performance than ROLAP but which of the following is a reason for choosing ROLAP? ___ A. ROLAP is easier to implement than MOLAP. ___ B. When real-time OLAP is required. ___ C. Querying ROLAP generally involves simpler queries. ___ D. ROLAP is more efficient with small dimensions. [ ANSWER: _________ ] ___________ MOLAP and ROLAP are different ways of physically storing the Analysis Services database cubes. The vast majority of Cubes I've been involved with use the MOLAP storage mode. The only exception to this that I am aware of is when there is a requirement to have real time updates within the cube. This can be achieved by having the current partitions as ROLAP. However the slice of the partition must be kept small and the hardware of the severs holding the cube and data warehouse must be high to cope with the additional resources required to serve a query request direct from the data warehouse. This is due to fast query performance which is down to optimized storage, multidimensional indexing and caching. Most Cubes are refreshed over night outside office hours so the longer time it takes to refresh data in MOLAP cube is generally not a deciding factor. SSAS cubes are generally excellent at handling large numbers of dimensions while minimizing data redundancy provided that there is; - 1) Effective use of hierarchies 2) Effective use of attribute relationships. 3) Optimized use of the Dimension Usage Tab by ideally basing the cube on the simplest possible relationships (star schema within the data warehouse) and joining Dimensions to Measure Groups using the simplest data types (surrogate keys) ROLAP is considered to be more scalable in handling large data volumes, especially models with dimensions with very high cardinality. Also the amount of time it takes to refresh the data within a ROLAP storage mode is shorter than ROLAP. ??????? On balance I've successfully delivered multi-billion record fact tables with fast query processing using MOLAP. Where the physical storage of the MOLAP cube was significantly less than the data warehouse. I tend to use partitioning to cope with scale when fact tables reach 100's millions of rows and I prefer to use pro-active caching to cope with updating the cube during the day when the business requires. Just for completeness HOLAP (Hybrid Online Analytical Processing) as the name suggests in a hybrid between MOLAP and ROLAP. Where HOLAP stores aggregations in MOLAP for fast query performance, and detailed data in ROLAP to optimize time of cube processing. Further relevant link; - http://social.msdn.microsoft.com/Forums/en-US/sqlanalysisservices/thread/105bb9d0-5069-4938-bf89-f16d04d76bcd/ For more details, refer to; http://msdn.microsoft.com/en-us/library/ms174915.aspx .

ROLAP

ROLAP This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement. Advantages: Can handle large amounts of data: The data size limitation of ROLAP technology is the limitation on data size of the underlying relational database. In other words, ROLAP itself places no limitation on data amount. Can leverage functionalities inherent in the relational database: Often, relational database already comes with a host of functionalities. ROLAP technologies, since they sit on top of the relational database, can therefore leverage these functionalities. Disadvantages: Performance can be slow: Because each ROLAP report is essentially a SQL query (or multiple SQL queries) in the relational database, the query time can be long if the underlying data size is large. Limited by SQL functionalities: Because ROLAP technology mainly relies on generating SQL statements to query the relational database, and SQL statements do not fit all needs (for example, it is difficult to perform complex calculations using SQL), ROLAP technologies are therefore traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by building into the tool out-of-the-box complex functions as well as the ability to allow users to define their own functions.

CUBE WIZARD, DIM. WIZARD & DIM. DESIGNER.

The Cube Wizard & Dimension Wizard CREATE ATTRIBUTES for a dimension. Other wizards and elements of the SSAS user interface may further modify these attributes. These default settings are sufficient in most situations, but you can use Dimension Designer to edit the attributes. You can REMOVE attributes from the dimension by right-clicking the attribute and clicking Delete. Removing attributes from a dimension does not affect the data source view, which can be used by multiple dimensions without forcing each dimension to use all the same attributes. You can RENAME an attribute to provide a more meaningful or user-friendly name than the dimension table. Do this by right-clicking the attribute in the Dimension Structure tab of the Dimension Designer. You can also set the NAME PROPERTY of the attribute in the Properties window or edit it directly if the Attributes pane is in the grid view. Having many attributes for a dimension can confuse users. You can organize attributes into DISPLAY FOLDERS to simplify browsing. Using folders only affects the way client applications display the dimension and has no other effects on hierarchies or attributes. After you have created display folders, DEPLOY the solution and RECONNECT to see the RESULTS on the Browser tab. Attributes can be used to create HIERARCHIES. EX: A Product dimension might have Product Category, Product Sub-Category, and Product Name attributes. Attributes are used to add extra detail to dimension members and do not need to be the basis for a hierarchy method. Therefore, in the previous example, you might have Product Description and Product Image attributes. These attributes are not used for a hierarchy, nor to sort or group members, but are often needed to provide additional detail to dimension members.

NON-LEAF DATA

You should also consider what to do with NON-LEAF DATA. In the example used here, the sales manager can also make sales. If this is the case, and you have data in the fact table associated to non-leaf members, you must change the dimension's Members With Data property to non-leaf data visible or non-leaf data hidden. Otherwise, processing the cube fails. If you use non-leaf data visible, the parent—in this case the sales manager—will also appear in the level below and seem to report to himself or herself. If you use non-leaf data hidden, the sales manager will not appear in the level below and, should you perform a sum on all employee sales, the value will not include the sales manager.


Set pelajaran terkait

Hematology: LabCE Erythrocyte Inclusions

View Set

Day 4 - Lymphatic's/mapping/pinpoint/spyphi

View Set

Commercial Pilot Oral - Human Factors

View Set

soc midterm- Latent Functions of Education

View Set

ENTR 187 - Entrepreneurship - Sac State - Dr. VanVo

View Set

Env Chem - Unit 2 Greenhouse Effect Review

View Set