Exam Prep 70-768 (Data Modeling)

Ace your homework & exams now with Quizwiz!

-- Subsets of data can be analyzed in Excel (subset option) -- connection string property: DataView=Sample, if used with other tool. -- should be defined for all tables of a DirectQuery Model otherwise when testing no sample data is available.

-- DirectQuery Mode Sample Subset

-- adding filter to table Orders: SUMX( FILTER(Orders, Orders[Price] > 1); Orders[Amount]) -- removing all filters from table Orders: SUMX( ALL(Orders); Orders[Amount]) -- ALL(column) can also be used to remove filter from single column -- sample: AllChannels:= CALCULATE( SUMX(Orders, Orders[Amount]), ALL(Orders[Channel])) CALCULATE(expression, filter1, filter2, ...)

-- Manipulatiing Filter context

-- use Profiler or Extended Events -- for Teradata or Oracle use trace monitoring tools for source database instead

-- Troubleshooting DirectQuery SQL Queries

-- used to add columns to DAX query expression -- not possible to return some but not all columns from table -- added columns are not part of filter context, cannot be used in the filter parameter of CALCAULATE or CALCULATETABLE or group by of SUMMARIZE -- syntax: ADDCOLUMNS(<tableexpression>, <name>, <expression>[, <name>, <expression>]...) -- sample: evaluate ADDCOLUMNS( 'Internet Sales', "Margin%", ('Internet Sales'[Sales Amount] - 'Internet Sales'[Total Product Cost]) / 'Internet Sales'[Total Product Cost] )

ADDCOLUMNS function

-- get information about non existing combinations -- syntax: ADDMISSINGITEMS(<showAllColumn>[, <showAllColumn>]..., <table>, <groupingColumn>[, <groupingColumn>]...[, filterTable]...) -- sample: EVALUATE FILTER( ADDMISSINGITEMS( 'City'[Sales Territory], 'Stock Item'[Color], SUMMARIZE( 'Sale', 'City'[Sales Territory], 'Stock Item'[Color], "Sale Count", 'Sale'[Sale Count] ), 'City'[Sales Territory], 'Stock Item'[Color] ), 'City'[Sales Territory] = "External" )

ADDMINSSINGITEMS function

-- used for non-measure calculated members -- ensures proper calculation for each intersecting measure in query -- sample: CREATE MEMBER CURRENTCUBE.[City].[Sales Territory].[All].West AS AGGREGATE( {[City].[Sales Territory].[Far West], [City].[Sales Territory].[Southwest]}), VISIBLE = 1, LANGUAGE = 1033 ; -- LANGUAGE controls format of value

AGGREGATE function

-- to exclude allmost all the filters of the table -- syntax: ALLEXCEPT(<table>,<column>[,<column>[,...]]) -- sample: CALCULATE( SUM(Orders[Amount]; ALLEXCEPT() Orders; Orders[Price]) -- also removes all not mentioned filters on related lookup tables (e.g. cities, channels)

ALLEXCEPT

-- create VisualTotals -- corresponding to the VISUALTOTALS function in MDX -- ALLSELECTED removes the last filter generated from context transition.

ALLSELECTED function

-- execute function MyFunction in assembly MyASSP like this: SELECT MyASSP.MyFunction([Measures].[A Measure]) ON COLUMNS FROM [My Cube];

ASSP sample

-- Model.bim property DirectQuery-Mode = On -- Default Off -- deployment still required -- but no processing takes place because no tables have to be loaded -- no Data stored on the SSAS server, only metadata that helps in converting DAX to SQL -- if DirectQuery-Mode is switched from Off to On Data is flushed and only metadata is kept. -- Multiple partitions on any table are not allowed (keeps DirectQuery Mode from getting activated)

Activate DirectQuery Mode

-- MembersWithData: can be set to NonLeafDataVisible or NonLeafDataHidden -- sample: -- NonLeafDataHidden means hide managers data if drilldown to see the managers employees (manager is hidden) = best practice -- MembersWithDataCaption if MembersWithData is set to NonLeafDataVisisble, caption for member when you drill down to children can be adjusted -> "* (Manager)" -- NamingTemplate: -- assign level names instead of (Level 01, Level 02 etc.) -- sample value: manager;sales *; -- RootMemberIf: -- ParentIsBlankSelfOrMissing (default) -- consider member as a parent if one of the three following cases fit: -- ParentIsBlank -- consider member as a parent only when the parent atribute column is blank -- ParentIsMissing -- consider member as a parent only when the parent column value is not found anywhere in the key attribute column -- ParentIsSelf -- consider member as a parent only when the parent atribute column matches the value in the key attribute column -- UnaryOperatorColumn: -- special design scenarios, such as financial account dimension, custom rollup behaviour allowed unary operators: +, -, *, /, ~ (to ignore value) sample: https://sqljoe.wordpress.com/2011/09/23/using-unary-operators-to-control-analysis-services-hierarchy-aggregations/

Additional Parent Attribute Properties

-- all errors should be resolved during ETL -- Error configuration Default setting: Default if this setting is kept any error during processing causes the operation to fail. Custom behaviour can be defined by setting to user defined. -- Error configuration can also be overriden on each processing request. -- Processing Mode Default: Regular partition can not be read by storage engine until Process Full operation is complete if set to "Lazy Aggregation" partition is available as soon as Process Data is complete. Process Index then runs in the background. Lack of control when aggregations are finally built. -- ProcessingPriority defines priority of building current partition relative to other partitions when ProessingMode is set to Lazy Aggregation -- Slice should be defined for ROLAP and proactive caching partitions, also for MOLAP partitions when a measure group contains asymmetrical partitions (one partition one year and another with multiple years)

Additional partition properties

-- Additive aggregate function Additive across all dimensions -- Semiadditive aggregate function Additive across some dimensions but not necessarily across all dimensions -- Sample: Inventory, add across some dimension but not across Date dimension (June, July, Year which is typically the month of december) -- Second Sample: Financial Reporting, sum up revenue and expenses but for assets and liabilities use last value -- Nonadditive aggregate functions Not aggregatable at all.

Additive, semiadditive, nonaddititve

-- main ones: -- SUM, AVERAGE, MIN, MAX -- operate on numeric or date types only -- A-Types -- sample: COUNTA -- operating on Any Type -- X-Types -- smaple: COUNTAX, COUNTX, SUMX, etc... -- aggregate expression (which is calculated for each row of a table) by a corresponding aggregate function -- base tables used as arguments for aggX and FILTER functions are filtered by filter context

Aggregate Functions

Sum additive Count additive Min semiadditive Max semiadditive DistinctCount nonadditive None nonadditive ByAccount semiadditive AverageOfChildren semiadditive FirstChild semiadditive LastChild semiadditive FirstNonEmpty semiadditive LastNonEmpty semiadditive

Aggregate Functions

-- SSAS does NOT create aggregations by default -- use Aggregation Design Wizard in the Cube Designer as part of initial deployment to production. -- determine which attributes should be considered for aggregation by following rules: -- FULL: requires SSAS to include Attribute in every aggregation or an attribute from a lower level. set only for a limited number of attribute, if any -- NONE: requires SSAS to exclude this attribute from all aggregations. Set this for infrequently used attributes -- UNRESTRICTED: Set this value manually for 5 to 10 of the most commonly used attributes. -- DEFAULT: applies default rule -- Aggregation Design Wizard counts row in Measure group and members in attributes, can be corrected if only subset of production data is available. -- select Performance gain reaches option 30% as aggregation option. -- determine which aggregates are useful for queries with profiler (Get Data from Aggregation event) -- manually create aggregation in extended aggregation view if aggregations are missing -- Usage Based Optimization Wirard can be used to design aggregations to improve query performance for the targeted queries

Aggregations

-- used if none of the MDX statements meet ones needs -- create and register an assembly on SSAS instance -- implemented in VB.NET or C# assembly (DLL) -- registered via SSMS on server or database level (right click assemblies) -- sample: inte rnal rate of return (IRR oder interner Zinsfuß)

Analysis Services stored procedure (ASSP)

-- Type: .NET -- File Name: full Filename of assembly -- Assembly Name: name -- Include Debug information: debugging feature in Visual Studio when executing a function in this assembly -- Permissions: -- Safe: assembly has access to internal computations only -- External Access: internal computations and external resources (file system, database) -- Unrestricted: No protections of the system -- Impersonation: sets execution context for external resources with following settings -- Use a specific Windows User Name and Password -- Use The Service Account -- Use the Credentials of the current User -- Anonymous: use IUSER_servername Windows login -- Default: Default to running the code under the Windows login for the current user

Assembly Registering properties

-- if value to be assigned does not require multiple expressions, use this assignment operation: ([Invoice Date].[Calendar Year].[CY2014], [Invoice Date].[Calendar].[Calendar Month].Members, [Measures].[Sales Quota]) = 4500000

Assignment of single value to all cells of a subcube

-- defines if attribute hierarchy is created -- set to false to convert attribute to member property

Attribute AttributeHierarchyEnabled property

-- defines if attribute hierarchy is visible -- if it's not visible it still can be used in a user defined hierarchy -- AttrbuteHierarchyEnabled must be set to TRUE

Attribute AttributeHierarchyVisible property

-- Gibt an, welche Aktion Analysis Services ausführt, wenn ein NULL-Wert in einem Datenelement festgestellt wird. -- possible values: -- Automatic Gibt an, dass NULL-Werte für das Cube-Datenelement oder das Dimension-Datenelement als ZeroOrBlank behandelt werden. -- Error NULL-Werte sind in diesem Datenelement unzulässig. -- Preserve Gibt an, dass der NULL-Wert beibehalten wird. -- UnknownMember Gibt an, dass der NULL-Wert als unbekanntes Element behandelt wird. -- ZeroOrBlank Gibt an, dass der NULL-Wert in 0 (für numerische Daten) oder in eine leere Zeichenfolge (für Daten in Zeichenfolgen) konvertiert wird.

Attribute Key Column or Name Column NullProcessing Property

-- RelationshipType: Rigid or Flexible Flexible (default): allows changes to occur without requiring to fully process the dimension Rigid: any change requires you to fully process the dimension -- Attribute: Parent Attribute -- Cardinality: One or Many (default) -- Name: Defaults to the Parent Attribute Name -- Visible: Determines whether a client application can access the parent attribute as a member property in an MDX query. Performance considerations: https://msdn.microsoft.com/en-us/library/dn749781.aspx.

Attribute Relationship Properties

-- RelationshipType property can be Rigid or Flexible -- Default: Flexible -- determines what happens when the parent for an attribute changes -- the impact of changing parent depends on processing operation performed and RelationshipType property -- flexible and PROCESS UPDATE: still needs to be defined

Attribute Relationship Types

-- Must be created between hierarchy levels in case of a natural hierarchy Cannot be created between hierarchy levels in case of an unnatural hierarchy (color - size)

Attribute Relationships

-- SQL Server Agent -> TMSL -- SSIS as part of workflow -> Analysis Services DDL Task -- AMO -> client application -- Powershell cmdlet

Automating Process Operations

-- Building: build artefacts into bin directory (preparation for deploying) -- Deploying: creates the defined objects in an instance of SSAS. -- Processing: copies the data from the underlying data sources into the cube objects

Building, Deploying, Processing

-- can take two forms: -- boolean expression CALCULATE( sum(Orders[Amount]); Cities[Continent] = "Europe") -- list of values for a column or a list of rows for a table CALCULATE(sum(Orders[Amount]); All(Cities[Continent])) CALCULATE(sum(Orders[Amount]); All(Cities)) -- also can be mixed: AllCitiesItalyAndFrance:=CALCULATE(SUM(Orders[Amount]); Cities[Country]="Italy" || Cities[Country]="France"; All(Cities[City]))

CALCULATE Fitler parameters

-- RELATED and RELATEDTABLE can be expressed by CALCULATE and CALCULATETABLE. sample: RELATED(Channels[Discount]) -> row context of current row in Orders table is propagated to Channels table equivalent: CALCULATE(VALUES(Channels[Discount])) -> row context of current row in Orders table is propagated to filter context in Channels table. If there is a many to one relationsship, VALUES returns exactly one row otherwise it returns multiple rows and an error is thrown and the expression fails.

CALCULATE and RELATED

-- controls aggregation of cell values -- if removed only queries that return values at the bottom level of each attribute hierarchy (leaf level) return values

CALCULATE command

-- are synonyms

CALCULATETABLE and RELATEDTABLE

-- check for existence of at least one value -- syntax: CONTAINS(<table>, <columnName>, <value>[, <columnName>, <value>]...) -- sample: ROW( "SalesExist", CONTAINS( 'Internet Sales', 'Internet Sales'[Unit Price], 564.99 ) )

CONTAINS functions

-- counts BLANKs in column -- syntax: COUNTBLANK(<column>)

COUNTBLANK function

-- counts only numbers, dates and text that can be converted to a number -- sample: count cities that have at least one sale (accumulated in Total Sales aggregate) City Sale Count:=COUNTX(VALUES(City[WWI City ID]);'Sale'[Total Sales]) -- syntax: COUNTX(<table>,<expression>)

COUNTX function

-- most operations run in parallel -- Query Thread Pool and Process Thread Pool is maintained -- formula engine is single threaded -- watch for Processing Pool Idle Threads if it is 0 for a long time increase ThreadPool\Process\MaxThreads property

CPU utilization

CREATE [ SESSION ] [HIDDEN] [ CALCULATED ] MEMBER CURRENTCUBE | Cube_Name.Member_Name AS MDX_Expression [,Property_Name = Property_Value, ...n] ......[,SCOPE_ISOLATION = CUBE] -- SCOPE_ISOLATION = CUBE means: run query scoped calculations before calculations defined in the cube -- SESSION = default, if not specified

CREATE MEMBER

-- FORE_COLOR -- value can be expression, controls color of calculated members value --ASSOCIATED_MEASURE_GROUP -- controls measure group under which displayed CREATE MEMBER CURRENTCUBE.[Measures].[Percent of Buying Group Sales Total] AS [Measures].[Sales Amount Without Tax] / ([Measures].[Sales Amount Without Tax],[Customer].[Buying Group].[All] ), FORMAT_STRING = "Percent", FORE_COLOR = IIF([Measures].[Percent of Buying Group Sales Total] > 0.35, 32768 /*Green*/, 0 /*Black*/), VISIBLE = 1 , ASSOCIATED_MEASURE_GROUP = 'Sale' ;

CREATE MEMBER sample

-- create sets that are commonly queried -- dynamic or static -- dynamic set factors in the query context

CREATE SET

-- only difference DYNAMIC instead of STATIC CREATE DYNAMIC SET CURRENTCUBE.[Top 3 Stock Items Dynamic] AS TOPCOUNT ( [Stock Item].[Stock Item].[Stock Item].Members, 3, [Measures].[Sales Amount Without Tax] ), DISPLAY_FOLDER = 'Stock Item Sets' ; CREATE STATIC SET CURRENTCUBE.[Top 3 Stock Items Static] AS TOPCOUNT ( [Stock Item].[Stock Item].[Stock Item].Members, 3, [Measures].[Sales Amount Without Tax] ), DISPLAY_FOLDER = 'Stock Item Sets' ;

CREATE SET sample

-- like SQL CROSSJOIN(<tableexp>, <tableexp>[, <tableexp>]...) -- sample: ADDCOLUMNS( CROSSJOIN( DISTINCT( 'Product'[Color] ), DISTINCT( 'Product'[Size] ) ), "Products", COUNTROWS( RELATEDTABLE( Product ) ) )

CROSSJOIN function

-- Calculated column used for: -- placing values on axis in pivot table (rows and columns, not values) -- categorizing numbers -- expressions that are strictly bound to current row -- Measures used for: -- profit percentage of a pivot table selection (cell) -- calculate ratios of product compared to all products but filter by year and region

Calcualted Columns, Measures

Member of the measures dimension, calculated by MDX expression

Calculated measure

-- defined by boolean expression -- defined in cell data tab of role designer -- example: NOT Measures.CurrentMember IS [Measures].[Profit]

Cell level security

-- with XMLA statement: <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"> <ClearCache> <Object> <DatabaseID>70-768-Ch1</DatabaseID> </Object> </ClearCache> </Batch> -- clearing of FileSystemCache also possible with RAMMAP tool: https://technet.microsoft.com/en-us/sysinternals/rammap.aspx

Clearing SSAS Cache (MD and Tabular)

-- dimension tables that are related to different fact tables, such as a Date dimension that is common to all analysis.

Conformed Dimensions

--completely different to MDX -- MDX: members, sets, hierarchies... -- DAX: tables, columns and relationsships... -- very different syntax to MDX

DAX

-- Integer, Real, Currency, Date, Boolean (True/False), String, Blob

DAX Data Types

-- Evaluation context -- Row Context -- Filter Context -- DAX functions might might use one or both of them -- some like CALCULATE or CALCULATETABLE generate interaction between them

DAX Evaluation Context

-- in case of the error "The result set of a query to external data source has exceeded the maximum allowed size of '1000000' rows." limit can be increased. -- MaxIntermediateRowSize (Default: 1.000.000) -- can be added to Msmdsrv.ini file, DAX section

DAX Query Properties

-- for grouping and adding new columns -- simplified syntax: SUMMARIZE ( Source, GroupByColumns, NewColumns ) -- sample: SUMMARIZE ( 'Internet Sales', Product[Color], "Sales", [Internet Total Sales] ) -- sample with ROLLUP and ISSUBTOTAL: -- group by is applied for Class and Color. Subtotal column is generated to indicate Color Rollup by boolean value: evaluate SUMMARIZE ( 'FactInternetSales', DimProduct[Class], ROLLUP ( DimProduct[Color] ), "Color Is Subtotal", ISSUBTOTAL( DimProduct[Color]), "Sales", [Internet Total Sales] )

DAX SUMMARIZE function

-- new in SSAS Tabular since 2016 -- syntax: SUMMARIZECOLUMNS( <groupBy_columnName> [, < groupBy_columnName >]..., [<filterTable>]...[, <name>, <expression>]...) -- sample: -- crossjoin generated for groupBy columns SUMMARIZECOLUMNS ( 'Date'[Calendar Year], 'Product'[Color], "Sales Amount", SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] ) ) -- sample with ROLLUP: SUMMARIZECOUMNS (Regions[State], ROLLUPADDISSUBTOTAL ( Sales[CustomerId], "IsCustomerSubtotal" ), Sales[Date], "Total Qty", SUM( Sales[Qty] )) -- ROLLUPADDISSUBTOTAL ( Sales[CustomerId], "IsCustomerSubtotal" ) creates two columns: Sales[CustomerId] and IsCustomerSubtotal

DAX SUMMARIZECOLUMNS

-- query data models and get results back -- EVALUATE clause to get data back from a DAX expression -- START AT only together with ORDER BY -- syntax: [DEFINE { MEASURE <tableName>[<name>] = <expression> } { VAR <name> = <expression>}] EVALUATE <table> [ORDER BY {<expression> [{ASC | DESC}]}[, ...] [START AT {<value>|<parameter>} [, ...]]]

DAX query syntax

-- use DAX query plans (gathered with Extended Events or Profiler) -- performance counters for tuning server's usage of memory and CPU resources -- primary target: reduce the time required by the formula engine -- look for CallbackDataId function in VertiPaq SE Query Begin events and try to change DAX so that this call back function is not called anymore -- look for Spool_Iterator which should be avoided -- too many VertiPaq queries -> try to replace FILTER with CALCULATE + Slicer or move filter to an outer group. -- VertiPaqResult sending high recordcount to next operator -> try adding filters or rearrange filters -- Slow or complex calculations -- avoid ISERROR and IFERROR because they force cell-by-cell-mode -- if testing for one row do not use COUNTROWS instead use HASONEVALUE or HASONEFILTER -- use ADDCOLUMNS instead of SUMMARIZE -- use ISEMPTY instead of ISBLANK -> this executes faster

DAX query tuning

-- The filter context is the set of filters applied to the data model before the evaluation of a DAX expression starts.

DAX: Filter Context

-- You have a row context whenever you iterate a table, either explicitly (using an iterator) or implicitly (in a calculated column): -- When you write an expression in a calculated column, the expression is evaluated for each row of the table, creating a row context for each row. -- When you use an iterator like FILTER, SUMX, AVERAGEX, ADDCOLUMNS, or any one of the DAX functions that iterate over a table expression.

DAX: Row Context

-- sample: AverageDiscountedSalesPerCustomer := AVERAGEX ( Customer, SUMX ( RELATEDTABLE ( Sales ), Sales[SalesAmount] * Customer[DiscountPct] ) ) -- Customer[DiscountPct] is the row context introduced by AVERAGEX over Customer. -- RELATEDTABLE also uses the row context introduced by AVERAGEX over Customer to return all sales for the current customer

DAX: Row Context sample

-- each attribute hierarchy's DEFAULTMEMBER is the All member -- can be changed by another chosen member or by MDX expression -- use with caution, because of implicit permanent filter

DEFAULTMEMBER

-- set expression is calculated by calculating like member and combining all results: Member expression syntax using a level expression Descendants(Member_Expression [ , Level_Expression [ ,Desc_Flag ] ] ) Member expression syntax using a numeric expression Descendants(Member_Expression [ , Distance [ ,Desc_Flag ] ] ) Set expression syntax using a level expression Descendants(Set_Expression [ , Level_Expression [ ,Desc_Flag ] ] ) Member expression syntax using a numeric expression Descendants(Set_Expression [ , Distance [ ,Desc_Flag ] ] )

DESCENDANTS Syntax variants

-- samples: Descendants([Geography].[Geo].[America],[Geography].[Geo].[Country],SELF) -> only countries: Mexico,... Descendants([Geography].[Geo].[America],[Geography].[Geo].[Country],AFTER) -> only cities: Los Angeles,... (Countries excluded) Descendants([Geography].[Geo].[America],[Geography].[Geo].[CITY],BEFORE) -> only countries + America (cities excluded) Descendants([Geography].[Geo].[America],[Geography].[Geo].[Country],BEFORE_AND_AFTER) -> America + cities (countries excluded) Descendants([Geography].[Geo].[America],[Geography].[Geo].[Country],SELF_AND_AFTER) -> countries and cities (America excluded) Descendants([Geography].[Geo].[America],[Geography].[Geo].[Country],SELF_AND_BEFORE) -> America + countries Descendants([Geography].[Geo].[America],,LEAVES) -> only cities

DESCENDANTS function

-- yields blank when division by 0 occurs -- syntax: DIVIDE(<numerator>, <denominator> [,<alternateresult>])

DIVIDE function

-- access time consuming processing operations: $System.discover_object_activity -- object space consumption: $System.discover_object_memory_usage

DMVs samples

-- defines where data is located and how to authenticate -- Impersonation Information: --Windows User and Password --Use Service Account --Use Credential of Current User --Inherit (uses impersonation information set in Data SourceImpersonationInfo database property)

Data Source Definition

-- specify which data to use for loading -- Abstraction Layer over the source database -- ideally "which data" should be implemented by view layer

Data Source View (DSV)

-- Grant users and define Database role in SSDT or SSMS -- Permissions: None Read Read and Process Process (insufficient to use SSMS to connect. Processing only can be done with a tool like SSIS) Administrator

Database Role

-- Text concat: & -- logical and, or: &&, ||

Dax Operators

-- build one dimension, build cube, deploy project, review results (in Excel) -- continue iteratively

Development approach

-- applies to cell level security only -- READ Permissions: -- role will be able to read cells defined by boolean expression -- READ-CONTINGENT permissions: applies also to any calculated measures related to disallowed measure: -- sample: NOT Measures.CurrentMember IS [Measures].[Profit] -> disallows also access to calculated measured related to [Measures].[Profit]

Difference Read-Contingent Permissions and Read Permissions

-- Descriptive columns about a row in the dimension table.

Dimension Attributes

-- Dimension is based on one or more dimension tables -- dimension creation methods: -- use existing table -- generate a time table in data source -- generate a time table on the server -- generate a non-time table in data source -- all attributes must be selected -- select AttributeHierarchyEnabled and Attribute Type properties

Dimension Creation Wizard

-- ValidFrom, ValidTo -- in Type 2 SCD scenarios

Dimension History columns

-- An optional column in a dimension table that has a foreign key relationship to another table in which information about ETL processes is maintained.

Dimension Lineage or Audit Key

-- The primary key in the source table for the dimension.

Dimension Natural or Business Key

-- CREATE ATTRIBUTE RELATIONSHIPS for one to many relationships Default: all attributes are related to the key attribute relationships are used to build indexes. Cross products between those two columns can be retrieved from index then. Furthermore aggregations created for Sales Territory can be used for queries requesting Country -- for each attribute relationship -- user-defined hierarchies define user defined hierarchies for natural hierarchies -- DON'T FORGET to create cascading attribute relationships for the attributes in the hierarchy -- hierarchies are materialized on disk

Dimension Optimization

-- retrieve data ffrom relational source, load data into dimension, build indexes -- can be configured by processing properties: -- ProcessingGroup: Default: ByAttribute (-> many SELECT DISTINCT statement on relational source) minimize the impact of multiple statement on relational source by using ByTable (-> single SQL statement to retrieve attributes) -- Proactive caching: Default: None configured the same way as for measure group partitions

Dimension Processing

-- can be triggered in SSMS or SSDT -- error configuratino can be overridden and batch settings can be defined or XMLA script generated -- different processing methods: Process Default Process Full Process Clear Process Data Process Add Prcess Index Process Update Process Structure Process Clear Structure

Dimension Processing Configuration

-- A primary key for the dimension table to uniquely identify each row. -- no business meaning. -- often defined as identity column -- use YYYYMMDD in date dimension tables ()

Dimension Surrogate Key

-- used for prototyping -- In real-world practice, the ETL process should prevent the insertion of a row into a fact table if a dimension member does not already exist. -- The addition of an Unknown member to a dimension is a common practice to handle data quality problems. -- When you manage missing dimension members for fact data in the ETL process, you should remove the Unknown member generated by SSAS. -- defined together with UnknownMemberName, which defines the Key of the member to display for the UnknownMember -- can be set to true to create Unknown member in dimension: -- The Unknown member that SSAS creates does not correspond to a row in your dimension table, but serves as a bucket for values for which a key in the foreign key column in the fact table is missing or invalid.

Dimension UnknownMember Property

-- define allowed members, denied members and default member for the role per attribute hierarchy -- defined in Dimension Data tab of role designer -- default member for the role overrides the default member for attribute hierarchy -- defined by set expressions -- enable Visual Totals checkbox for a hierarchy to reflect the filtered value

Dimension level security

-- good entrypoint http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/

Dimensional Modeling

-- Real-time access to data -- no administrative overhead to maintain model -- access to data that is larger than the available memory on SSAS server -- incoming queries are translated to SQL queries -- like ROLAP in BISM Multidimensional -- column store indexes can be built on SQL Server to compensate for negative performance impact.

DirectQuery Mode

-- Only one data source per model alowed -- data source types: SQL Server 2008 or later Azure SQL Database Azure SQL Data Warehouse Analytics Platform system (formerly know as Parallel Data Warehouse) Oracl 9i, 10g, 11g, 12g Teradata V2R6, V2

DirectQuery Mode Data Source Types

-- no calculated tables allowed -- some DAX functions are not supported in DirectQuery Mode -- no caching of query results

DirectQuery Restrictions

-- group attribute members into ranges -- set DiscretizationBucketCount and DiscretizationMethod property .. discretization methods: -- EqualAreas -- Clusters (only for numeric attribute values) -- define default name for group in Format property of NameColumn property

Discretization of attribute members

-- queried by schema rowset queries in MDX window --syntax: SELECT [DISTINCT] [TOP <n>] <select list> FROM $System.<schemaRowset> [WHERE <condition expression>] [ORDER BY <expression>[DESC|ASC]]

Dynamic Management Views (DMVs)

-- permissions dynamically determined at query time by DAX functions -- 1. USERNAME function -- row filter DAX sample to filter on City[Sales Territory]: =CONTAINS( SalesTerritoryPermissions, SalesTerritoryPermissions[UserLogin], USERNAME(), SalesTerritoryPermissions[FilterValue], City[Sales Territory] ) -- 2. CUSTOMDATA function -- CUSTOMDATA property can be set in connection string -- row filter DAX sample to filter on City[Sales Territory]: =IF( CUSTOMDATA()= "", FALSE(), City[Sales Territory] = CUSTOMDATA() )

Dynamic security

-- EARLIER, EARLIEST -- get data from previous row context -- sample: RunningTotal := SUMX(FILTER( Orders; Orders[OrderDate] <= EARLIER(Orders[OrderDate]))) -- EARLIEST gets the value of the outermost evaluation pass

EARLIER Function

-- IFERROR, ISERROR -- sample: IFERROR(SALES[Quantity] * Sales[Price], BLANK()) -- equivalent to: IF(ISERROR(SALES[Quantity] * Sales[Price]); BLANK(); SALES[Quantity] * Sales[Price])

Error Handling

-- 3 types of errors: -- conversion errors "1+1" + 0 -> cannot convert value "1+1" of type string to type real -- arithmetical operation errors 10/0 -> Infinity -7/0 -> -Infinity 0/0 -> Infinity (10/0) / (7/0) -> NaN but 1000 / (10/0) = 0 -- empty or missing values -- missing, blank values or empty cells are handled by special value BLANK --value BlANK can be obtained by BLANK() function IF ( Sales[DiscountPerc] = 0; BLANK(); Sales[Amount] * Sales[DiscountPerc]) -- propagation of BLANK occurs in most arthmetical operations: BLANK() / 3 = BLANK() BUT: 3 + BLANK() = 3 3 / BLANK() = Infinity 0 / BLANK() = NaN FALSE() && BLANK() = FALSE FALSE() || BLANK() = FALSE TRUE() && BLANK() = FALSE TRUE() || BLANK = TRUE BLANK() = 0 -> TRUE

Error Types

-- events can be stored in: -- Event_file (XEL file) -- Event_stream -- Ring_buffer -- for Direct Query there are two Events to monitor SQL queries that were translated from DAX: -- DirectQuery Begin -- DirectQuery End

Extended Events traceSQL

-- The FREEZE statement ensures that any statements subsequently added to the MDX script do not change the values assigned to the subcubes before. -- syntax: FREEZE [subcube]

FREEZE Statement

-- One or more optional columns in a fact table that represents a dimension value that is not stored in a separate table. -- no join is required

Fact Degenerate Dimension columns

-- One foreign key column for each dimension table that relates to the fact table. -- The combination of foreign keys represents the granularity, or level of detail, of the fact table.

Fact Foreign Key columns

-- same as for Dimension Lineage Key

Fact Lineage or Audit Key

-- One or more columns that contain numeric data that describes an event or business process. -- when there are no measures, this is called: -- factless fact table -- e.g. when you need to count occurrences of an event and have no other measurements related to the event. -- e.g. bridge table between fact and dimension table

Fact Measure columns

-- surrogate keys for dimension tables are a great idea. Surrogate keys for fact tables are not logically required but can be very helpful in the back room ETL processing

Fact Surrogate Key

-- defined by relationship between measure group and dimension object that is created from degenerate dimension

Fact dimension model

-- set of filters that define the active rows, that will be used for the calculation. -- defined by headers, rows, filters and slicers in Excel pivot table -- Filter context also can be tested in SSDT or SSMS

Filter Context

-- relationships directly affect filter context of involved tables

Filter context and related tables

-- CALCULATETABLE and FILTER samples: EVALUATE CALCULATETABLE( 'Sales Territory', 'Sales Territory'[Sales Territory Country] = "United States" ) evaluate -- references to multiple columns possible FILTER( 'Internet Sales', 'Internet Sales'[Ship Date] > 'Internet Sales'[Due Date] )

Filtering

-- GENERATE like CROSS APPLY (evaluate table2 for each row in table 1) -- syntax: GENERATE(<table1exp>, <table2exp>) -- sample: GENERATE( 'Product Category', RELATEDTABLE( 'Product Sub-category' ) ) -- sample2 (wth activated ship Date Relationship) GENERATE( 'Date', CALCULATETABLE( 'Internet Sales', USERELATIONSHIP( 'Internet Sales'[Ship Date], 'Date'[Date] ) ) ) -- GENERATEALL like OUTER APPLY (keep outer rows from table 1 when there are no rows in table 2 for a row of table 1) -- sample: EVALUATE GENERATEALL( 'Date', CALCULATETABLE( 'Internet Sales', USERELATIONSHIP( 'Internet Sales'[Ship Date], 'Date'[Date] ) ) )

GENERATE, GENERATEALL functions

-- set affinity between SSAS operation and specific logical processors -- restrict thread pools to specific processors -- only beneficial when server has more than 64 logical processors -- Default value: not defined -- Standard Edition supports 24 cores only -- Enterprise Edition up to 640 cores -- when SSAS service starts all affinity masks are automatically computed -- cann also be set in SSAS server properties: -- Thread Pool\ Command -- Thread Pool \ IOProcess (only used by MD) -- Thread Pool \ Parsing\ Long -- Thread Poiol\ Parsing \ Short -- Thread Pool \ Process -- Thread Pool \ Query -- Vertipaq \ Thread Pool (perfmorms scans for tabular model queries)

Group Affinity Masks

-- Never (default) -- OnlyChildWithNoName -- OnlyChildWithParentName -- NoName -- ParentName e.g. set OnlyChildWithParentName on all three lower levels (state, city, name) USA -> California -> San Francisco -> Chris Webb Vatican -> Vatican -> Vatican -> Pope gets: Vatican -> Pope

HideMemberIf User Defined Hierarchy Level property

-- direct filtering -> condition on column in CALCULATE or CALCULATETABLE ISFILTERED -> returns true -- indirectly filtering -> filtered by propagation of the filter context over relationships or over columns of same table ISCROSSFILTERED -> returns true syntax: ISFILTERED(<columnName>) ISCROSSFILTERED(<columnName>)

ISFILTERED, ISCROSSFILTERED functions

-- Default True: -- shows top level aggregation value for an unrelated dimension -- False: show no rows for an unrelated dimension

IgnoreUnrelatedDimension

-- Project Type Import From Powerpivot in Business Intelligence Group of project templates

Import From Microsoft Powerpivot

-- conceptually the same in Tabular and MD -- define one measure as goal and compare another measure to that goal

KPI

-- Delegation scenarios are limited to tabular models configured for Direct Query mode. -- for processing, ROLAP access or remote partitions it is always used the service account

Kerberos Constrained Delegation configuration

-- is the attribute that is referenced in the fact table. -- marked with a key in dimension designer -- Attribute default name is inherited from key column name -- Attribute name can be changed by renaming for avoiding technical names

Key Attribute

-- decode value in a lookup table -- can also be done with RELATED (if relationship exists) -- returns error if multiple different values are returned -- if there is one distinct value as result value, even for multiple return values, it returns this distinct value without error -- syntax: LOOKUPVALUE( <result_columnName>, <search_columnName>, <search_value>[, <search_columnName>, <search_value>]...) -- sample: EVALUATE ROW( "Country", LOOKUPVALUE( Geography[Country Region Name], Geography[Country Region Code], "CA" ) ) -- another sample: ROW( "Product", LOOKUPVALUE( 'Product'[Product Name], 'Product'[Color], "Silver", 'Product Category'[Product Category Name], "Accessories" ) )

LOOKUPVALUE function

-- A linked dimension is based on a dimension created and stored in another Analysis Services database of the same version and compatibility level -- can introduce performance problems -- alternative: -- Another way to think about building once and reusing your development work is to save the .dim files in source control and reuse it in other projects

Linked Dimension

-- Memory\HardMemoryLimit -- Default: 0 (between TotalMemoryLimit and Physical available memory) threshold at which server rejects any new requests -- Memory \ LowMemoryLimit -- Default: 65 (percentage of physical memory) -- first threshold beginning to clear infrequently used objects from memory -- Memory \ TotalMemoryLimit -- Default: 80 (percentage of physical memory) threshold for aggressively releasing memory must be set lower than HardMemoryLimit ALL properties are specified in percent of physical memory. Low and Total can be specified absolutely in bytes (when specified as > 100)

MD Memory Limits

-- if planned to use multiple partitions in a cube, remove the definition of the initial partition first and then add new partitions -- multiple tables (e.g. partitioned by date) can be specified as separate fact tables for one measure group to define multiple partitions for the measure group -- bind partitions to views instead of tables -- table-binding partitions: select whole table -- query-binding partitions: restrict partition by query -- Query binding partitions should be one of the last development steps before deploying -- specify where SSAS performs the processing and where the data is stored -- Query binding needs to ensure no overlapping source data. Each Query needs to return the same columns and each WHERE clause needs to return a unique set of rows.

MD Processing management

-- general goal: reduce the amount of time spent by storage engine. -- review structure of dimensions, aggregations and partitions

MD model optimization

-- add indexes to relational source tables -- Fact table: clustered ind index typically on date dimension key column -- Dimension: Unique clustered index -> key attribute column Nonclustered Index -> High cardinality attribute column

MD processing tuning

-- no query plans -- use SQL Profiler to analyze division of labor between engines for specific queries -- Windows Performance Monitor for general performance on the server

MD query monitoring

-- 1. MDX Parsing by Query Parser -- 2. populating axes by Formula Engine either from dimension cache or storage -- 3. column intersections are combined with WHERE clause to identify tuples to retrieve vom cube to compute cell data -- 4. logical and physical execution plan building. Sending physical execution plan to cache subsystem as a series of subcube operations. -- 5. subecube operations request the storage engine to return fact data from the followinng locations: -- storage engine cache (empty when SSAS service starts) -- aggregations store on disk -- fact data store on disk -- 6. calculation of calculated measures. If calculation comes from model, result is stored in formula engine cache

MD query processing sequence

-- uses expressions to define value, target, status and trend icon -- parent-child KPIs possible for weighted allocation for scorecard-style applications -- in status expression use KPIVALUE and KPIGOAL functions to define output somewhere between -1 and +1 KPISTATUS, KPITREND and KPIWEIGHT also can be used -- in trend expression current status is compared with the past and a value somewhere between -1 and +1 is returned: --sample: CASE WHEN IsEmpty(ParallelPeriod([Invoice Date].[Calendar].[Calendar Year], 1, [Invoice Date].[Calendar])) THEN 0 WHEN [Measures].[Sales Amount Without Tax] > (ParallelPeriod([Invoice Date].[Calendar].[Calendar Year], 1, [Invoice Date].[Calendar]), [Measures].[Sales Amount Without Tax]) THEN 1 WHEN [Measures].[Sales Amount Without Tax] < (ParallelPeriod([Invoice Date].[Calendar].[Calendar Year], 1, [Invoice Date].[Calendar]), [Measures].[Sales Amount Without Tax]) THEN -1 ELSE 0 END

MD: KPI

-- advantages: -- caching of calculation results on the server -- faster for subsequent queries -- created by updating MDX Script

MDX Calculations

-- default measure: implicit defined by cubes measure on Cube structure page of the cube designer: first measure of first measure group explicit defined by "Default measure" property of cube. (can not be determined by SSMS or Cube Browser of SSDT) -- Default member: implict defined as the All member at the top level of an attribute hierarchy explicit defined by the "Default member" property of the attribute hierarchy

MDX Default measure and Default member

-- Tabular query engine accepts MDX queries -- Excel Pivot Table can be used to query Tabular Model

MDX Support

1. FROM (including potential subselects) 2. WHERE 3. SETs and MEMBER location in the WITH clause 4. Build the list of tuples for all query axes (columns, rows, ...) in parallel, ignoring NON EMPTY and HAVING 5. Cell values for all axis intersections 6. Remove tuples from the axes as requested by NON EMPTY and HAVING for each axis.

MDX: Logical Execution Order

-- rollup up aggregated values within dimension by different members without overcounting values -- needs physically dimension table, fact bridge table, dimension bridge table (can be the fact table itself, if fact dimension is defined) and fact table -- logically bridge measure group and dimension must be created -- bridge measure group must share at least one dimension with fact measure group

Many-to-Many dimension model

-- collection of measures that come from the same fact table

Measure Group

-- Set granularity attribute in dimension usage tab of cube designer. -- Normally set as the one that is set as the key attribute of the dimension. -- Example: Sale (Fact), Forecast(Fact), Date (Dimension) Sale (granularity = day), Forecast (granularity = month) Configure dimension usage for Forecast measure group and the Date dimension by creating regular relationship between them and setting the granularity attribute to month attribute.

Measure Group Granularity

-- IgnoreUnrelatedDimension (Default True): Display top level aggregation value for all members in unrelated dimension -- Name

Measure Group properties (Basic)

-- AggregateFunction -- DataType -- DisplayFolder -- MeasureExpression: Defines an MDX expression that resolves the value for a measure at the leaf level before aggregation. -- Visible: if false you still can reference measure in MDX expression -- Description -- FormatString -- Name -- Source

Measure Properties

-- DisplayFolder -- Description -- Format -- Measurename for renaming -- Table Detail Position (defines Standard Fields to show up in Client Tools when double clicking Tablename)

Measure Properties

-- member time functions: -- Openingperiod (similar to Closingperiod) OpeningPeriod( [ Level_Expression [ , Member_Expression ] ] ) -- Parallelperiod: ParallelPeriod( [ Level_Expression [ ,Index [ , Member_Expression ] ] ] ) -- index > 0 -> lag ( in the past ) -- index < 0 -> future PARALLELPERIOD( [Invoice Date].[Calendar].[Calendar Year], -1, [Invoice Date].[Calendar].[CY2015-Jan]) -- yields [Invoice Date].[Calendar].[CY2016-Jan]

Member Time functions

-- Use all data -- Use no data (modeling tasks can be performed faster) -- Use a subset of data (to see effect of modeling changes) -- DirectQuery Partition (exactly one must be defined) -- Sample Partition (more than 1 can be defined) -- if more than one sample partition are defined, all results from each partition's query are combined into the sample set.

Modeling in DirectQuery Mode

-- CALCULATE with boolean filter is faster than FILTER -- sample: EuropeSales := CALCULATE(SUM(Orders[Amount]; Cities[Continent] = "Europe") -- eq: SUMX(FILTER(Orders; RELATED(Cities[Continent]) = "Europe"); Orders[Amount]

Modifying filter context for multiple tables

-- cell-level -- dimension

Multidimensional Security

-- applied at final step of logical query execution -- remove empty columns or rows

NON EMPTY Keyword

-- operates like a filter -- NONEMPTY([City].[City].members, [Measures].[Sales Amount Without Tax])

NONEMPTY function

-- Non-union memory access -- both MD and TAB are NUMA aware by default -- processor group: container for a set of up to 64 logical processors -- one NUMA node typically assigned to one processor group -- PerNumaMode property can be set to: -1: when less than 4 NUMA nodes -> one IOProcess thread pool 0: only one IOProcess thread pool 1: one thread pool per NUMA Node 2: one thread pool per logical processor

NUMA

-- A natural hierarchy is a hierarchy for which each attribute has a one-to-many relationship between dimension members on a parent level and dimension members on a child level.

Natural Hierarchy

-- Ancestor: Level syntax Ancestor(Member_Expression, Level_Expression) Numeric syntax Ancestor(Member_Expression, Distance) -- Parent, Children, Siblings, Prevmember, Nextmember

Navigation Functions

-- multiple query servers can be combined to an NLB cluster with one virtual IP Address -- keep one master database and use synchronisation to ensure each of the query servers has an exact copy of the master database

Network Load Balancing (NLB)

-- CALENDAR(<start_date>, <end_date>) -- CALENDARAUTO([fiscal_year_end_month]) -- DATEDIFF(<start_date>, <end_date>, <interval>) -- SUBSTITUTEWITHINDEX(<table>, <indexColumnName>, <indexColumnsTable>, [<orderBy_expression>, [<order>][, <orderBy_expression>, [<order>]]...]) -- ISONORAFTER(<scalar_expression>, <scalar_expression>sort_order] [,scalar_expression>, <scalar_expression>, [sort_order][,...]) -- PRODUCT(<column>) -- PRODUCTX(<table>, <expression>) -- DATATABLE (ColumnName1, DataType1, ColumnName2, DataType2..., {{Value1, Value2...}, {ValueN, ValueN+1...}...}) -- GROUPBY (<table>, [<groupBy_columnName1>], [<name>, <expression>]... ) -- ISEMPTY(<table_expression>) -- NATURALINNERJOIN(<leftJoinTable>, <rightJoinTable>) -- NATURALLEFTOUTERJOIN(<leftJoinTable>, <rightJoinTable>) -- SUMMARIZECOLUMNS( <groupBy_columnName> [, < groupBy_columnName >]..., [<filterTable>]...[, <name>, <expression>]...) -- UNION(<table_expression1>, <table_expression2> [,<table_expression>]...) -- VAR <name> = <expression> -- GEOMEAN(<column>) -- GEOMEANX(<table>, <expression>) -- MEDIAN(<column>) -- MEDIANX(<column>) -- PERCENTILE.EXC(<column>, <k>) -- PERCENTILE.INC(<column>, <k>) -- PERCENTILEX.EXC(<table>, <expression>, k) -- PERCENTILEX.INC(<table>, <expression>;, k) -- XIRR(<table>, <values>, <dates>, [guess]) -- XNPV(<table>, <values>, <dates>, <rate>) -- CONCATENATEX(<table>, <expression>, [delimiter])

New DAX function SSAS 2016

-- orderBy has 4 settings: -- Key: This value is the default for every attribute except the key attribute if you assigned it a name column in the Dimension Wizard because every attribute always has a key column, but optionally has a name column. -- Name: The sort order is set to Name for the key attribute by default when you specify a name column in the Dimension Wizard. Any attribute in a dimension can have separate columns assigned as the key column or name column, so you can define the sort order based on the name column if that is the case. -- AttributeKey: You can also define the sort order for an attribute based on the key column value of another attribute (OrderByAttribute property) , but only if an attribute relationship exists. -- AttributeName: This option is like the AttributeKey sort order, except that the ordering is based on the name column of the related attribute (OrderByAttribute property) instead of the key column.

OrderBy property

-- can be set in the Process Partition or Dimension dialog box under Change settings... -- Processing Order: Default: Parallel SSAS performs processing operations in parallel Maximum parallel tasks can be set from 1,2,4..., 128 should be set to 1,5 to 2 times the number of CPUs available on server. All tasks a treated as single transaction. If any task fails they all fail and transaction is rolled back. When Sequential is selected SSAS processes each object one after the other. Transaction Mode: Default: One transaction can be set ONLY when Sequential is selected for Processing Order Set to Seaparte transactions to rollback failed processing jobs on an individual basis. Writeback Table Options: Create (fails if writeback table already exists), Create always (writeback table overwritten), Use Existing (create new if none exists) -- Process Affected Objects: let SSAS automatically process objects that have a dependency on the object you have selected for processing in the current operation.

Parallel, Sequential and Writeback processing settings

-- set Attribute Usage property of Parent attribute to Parent -- child attribute is key attribute

Parent Child Dimension

-- can be defined to support scorcard applications -- at minimum a Parent KPI should define the status expression like this: (KPIStatus("Sales Goal") * KPIWeight("Sales Goal")) + (KPIStatus("Profitability") * KPIWeight("Profitability")) + (KPIStatus("Customer Growth") * KPIWeight("Customer Growth")) -- in the children KPIs a weight as hardcoded value or expression can be defined. -- The sum of the weights of all child KPIs should be 1

Parent KPI

-- created from dimension table that includes a foreign-key relationship to itself -- varying number of levels -- bad performance

Parent-Child dimension model

-- MD: Partitions are processed in parallel -- Tabular Partitions are processed serially

Partition Processing Comparison Tabular and MD

-- designation of MOLAP, HOLAP or ROLAP to the partiion PLUS a specification of how the partition responds to changes in the relational source, if at all. -- folowing storage options available: -- MOLAP detail data + aggregations in MD format processing only manually or on a scheduled basis. fast data access , no realtime data required -- Scheduled MOLAP Data is refreshed every 24 hours, no control over the schedule -- Automatic MOLAP replacement for infrastructure with SSIS or scheduled SQL Server agent job. PROACTIVE CACHING: when server detects changes to the source data, it waits for 10 seconds for activity to end in the source data but proceeds after 10 minutes if activity does not end. -- Medium-latency MOLAP same as Automatic MOLAP. It differs with regard to data latency. If data in MOLAP partition from which queries are answered is older than 4 hours the partition switches to ROLAP mode and asnwers queries directly from relational source until processing completes. -- Low-latency MOLAP same as Medium-latency MOLAP except that the data latency is 30 minutes. -- Real-time HOLAP: only aggregations stored in MD format. Detail data kept in relational source. Aggregations are updated when SSAS receives notifications of data change. During aggregatino processing, queries switch to ROLAP until processing is complete. -- Real-time ROLAP: detail data + aggregations are kept in realtional source. Use when data changes frequently and queries require current data. -- custom setting: use a specific combination of silence interval, silence interval override and data latency interval.

Partition Storage

-- in tbe Aggregations page of cube designer click Assign Aggregation Design, select a design form drop down list and then select the check box for each partition to which you want to assign the aggregation

Partition aggregations

-- Tabular: any table can be partitioned no effect on query performance -- Multidimensional: only measure groups can be partitioned can improve query performance in some cases

Partition comparison Multidimension and Tabular

-- processing can be started from SSDT or SSMS -- Process Default, Process Full, Process Data, Process Clear -- batch settings and error configuration can be overridden -- in SSMS XMLA script can be generated to control a single parallel processing operation or sequencing a serial processing operation

Partition processing configuration

-- Default: One partition per measure group -- Multiple partitions per measure group beneficial for both processing and query performance -- faster query performance through partition elimination (if partition does not contain the data requested SSAS does not read the partition) -- different aggregation levels can be defined for each partition. Example: Design higher level of aggregation for frequently-queried current data, but design fewer aggregations for less-frequently queried hostorical data -- storage mode can be specified by partition (MOLAP, HOLAP or ROLAP) -- partitions can be placed on seperate drives or even on seperate servers to distribute disk IO operations -- processing operations can also run faster -- separate current from historical data -- several partitions can be processed in parallel -> faster than serially or one big partition

Partitions

-- counters for both MD and Tabular begin with MSAS13 or MSOLAP$<instancename> -- ensure to test a single database in isolation -- therefore detach all database other than the one to test -- after testing reattach the databases previously detached

Performance Monitor

-- determine ratio of formula engine and storage engine to overall query time -- determine query tuning efforts by comparing these ratios -- storage engine time = sum(all Query Subcube events) -- formula engine time = Query Duration - storage engine time

Performance tuning SSAS MD queries

-- determine ratio of formula engine and storage engine to overall query time -- determine query tuning efforts by comparing these ratios -- storage engine time = sum (Vertipaq SE Query End subclass Vertipaq Scan)

Performance tuning SSAS Tabular queries

-- Key Error Action if fact table references a dimension key that does not yet exist in the dimension: Possible values: Discard record or Convert to Unknown -- Key Error Limit: perform action KeyErrorLimitAction (StopProcessing or StopLogging) if this count of errors is reached -- Key not found possible values: ReportAndContiune(Default), IgnoreError or ReportAndStop -- Duplicate Key possible values: ReportAndContiune, IgnoreError (Default) or ReportAndStop -- Null Key Converted to Unknown if Key Error Action set to Convert to Unknown this setting determines the effect on processing. ReportAndContiune, IgnoreError (Default) or ReportAndStop -- NULL key not allowed if Key Error action set to Discard Recrd, this setting determines the effect on processing ReportAndContiune(Default), IgnoreError or ReportAndStop -- Error Log Path: specify the path and file name for the error log which SSAS updates when you set Report and Continue or Report and stop for any error action

Processing Error Handling

-- Process Default SSAS determines the processing method to bring object to fully processed state. If unprocessed SSAS performs Process Full, if indexes or aggregations are not processed, SSAS performs Process Index -- Process Full the object is deleted and rebuilt. If object is a dimension Process Full must be performed on ALL measure groups related to the dimension -- Process Clear the data in the object an ANY of its dependencies are removed from model -- Process Data data is loaded either in to a dimension or a partition object. Deletes and rebuilds target object. NO indexes for dimensions or partitions and NO aggregations for partitions are rebuilt. When applied to dimension, ALL related partitions need to have Process Data applied. -- Process Add FAST operation can only be used in an XMLA script. When applied to dimension it compares data in relational table with members in dimension and adds members that are not found in dimension. Change of existing members is not handled. Aggregations and Indexes in partitions are preserved. When applied to measure group or partition a source table is specified or a query that contains only new rows. SSAS loads data into a new partition that it creates fully processes it and then merges the partition into the target partition -- Process Update can be applied ONLY to dimensions. Members are deleted if they no longer exist in source table, new members are added, members are updated to reflect changes found in relational data. If FLEXIBLE relationship types exist on any level of a hierarchy aggregations and indexes are dropped in all related partitions. -- Process Index normally used after using Process Data. Rleational source is not accessed during this operation. Builds bitmap indexes for dimension objects and indexes and aggregations for partition objects. -- Process Structure can be ONLY applied to cubes or mining structures. Processes only dimensions and cube but not contained mining models, if any. Shortens processing time if cube contains mining models. -- Process Clear Structure can ONLY be applied to mining structures. Used to clear training data from a mining structure

Processing methods

-- simplest: PROCESS FULL on database only feasible when there is enough time top process every dimension and partition and rebuild indexes and aggregations. -- not an option in most SSAS implementations -- process dimensions, measure groups and partitions separately -- dimension processing: can be shortened by performing Process Update only for dimensions that are known to have been changed and then run Process Default on cube to rebuild indexes and aggregations dropped for flexible realtionships. If dimension never has changes on existing rows and only adds new rows over time, Process Add should be used for fast processing. Process Data in combination with Process Index can be used for huge dimensions instead of Process Full when encountering processing issues that contains millions of rows. --partition processing: slower than dimension processing because of large data volumes. Process Full for new partitions needed or when a reload is necessary.

Processing strategy

-- Processing Option () -- Default: automatically processes objects if necessary after deployment -- Do Not Process: processing objects as seperate step -- Full: process completely after each deployment

Project Processing Option

-- defaults to Deploy Changes Only -- deployment of new or changed objects only -- Deploy All deploys all objects

Project Server Mode Option

-- defaults to False -- determines if deployment is transactional -- rollback deployment if processing fails

Project Transactional Deployment Option

-- When you set an attribute's AttributeHierarchyEnabled property to False, you can access the value of that attribute by using the PROPERTIES function.

Properties Function

-- split complex calculations into simpler parts -- syntax: DEFINE MEASURE tableName[columnName] = <expression> MEASURE tableName[columnName] = <expression> ... EVALUATE tableexpression -- sample: DEFINE MEASURE 'Product Category'[Subcategories Count] = COUNTROWS( RELATEDTABLE( 'Product Subcategory' ) ) MEASURE 'Product Category'[Products Count] = COUNTROWS( RELATEDTABLE( 'Product' ) ) EVALUATE ADDCOLUMNS( 'Product Category', "'Product Category'[SubCategories]", [Subcategories Count], "Products Count", 'Product Category'[Products Count] )

Query Scoped Measures

WITH MEMBER [Measures].[Average Sales Amount] AS [Measures].[Sales Amount Without Tax] / [Measures].[Quantity], FORMAT_STRING = "Currency"

Query Scoped calculations

-- scalar ranking function -- compares current Product Name over ALL Product Names -- syntax: RANKX(<table>, <expression>[, <value>[, <order>[, <ties>]]]) -- sample: EVALUATE CALCULATETABLE( addcolumns( summarize( Product, Product[Product Name] ), "Sales", 'Internet Sales'[Internet Total Sales], "Rank", RANKX( ALL( Product[Product Name] ), 'Internet Sales'[Internet Total Sales] ) ), Geography[English Country Region Name] = "United States" ) ORDER BY [Sales] DESC GO

RANKX function

-- return expressions when there is not table -- EVALUATE has to return table -- syntax: ROW(<name>, <expression>[[,<name>, <expression>]...]) -- sample: EVALUATE ROW( "Sales 2002", CALCULATE( SUM('Internet Sales'[Sales Amount]), 'Date'[Calendar Year] = 2002), "Sales 2003", CALCULATE( SUM('Internet Sales'[Sales Amount]), 'Date'[Calendar Year] = 2003) )

ROW Function

-- Both Parent-Child and Standard Hierarchy can be ragged -- User Hierarchy can be turned into a Ragged Hierarchy by using HideMemberIf property of user hierarchy levels

Ragged Hierarchy

-- create reference dimension to analyze measures for which a relationship does not exist directly in the fact table. -- instead fact table has relationship with another dimension which has a relationship with the reference dimension. -- reference dimension column must exist in intermediate dimension -- not recommended as a best practice because of performance issues -- better include foreign key for each dimension into the fact table

Reference Dimension model

-- when not meet the criteria for any other dimension model, consider it a Regular Dimension

Regular dimension model

-- RELATED -- returns value from the one-side of a relationship (can hop multiple relationships) -- returns BLANK if no row is found -- sample (calculated column in orders table) = if( RELATED('SalesTerritory'[SalesTerritoryCountry]) = "Germany"; "€"; "$") -- RELATEDTABLE -- access to the many side of one to many relationship. -- sample (calculated column in ProductCategory table) = COUNTROWS( RELATEDTABLE (DimProduct))

Relational Functions

-- ManyToOne -- OneToOne Composite Relationsships are not allowed. Instead calculated column has to be used.

Relationship Cardinality

-- single table (Default) filter type is one-directional (Table 1 filters Table 2) -- both tables filter type is bi-directional (Table 1 filters Table2 and vice versa) sample: city dimension table filters on sales (like in single table type relationship) and filtered date (by year) on sales filters city dimension table -- Default can be changed for Project in Model.bim file Properties in Solution Explorer -- Default for Visual Studio can be changed in Tools/Options...

Relationship Filter direction

-- Active (only one to be able to change) -- only ONE Active property between two tables can be set to TRUE. -- Relationship with Active set to false can be used by DAX function USEREALTIONSHIP (-> roleplaying dimension)

Relationship Properties

-- only for SSAS MD available -- if the processing of multiple large partitions need to speeded up multiple remote partitions on the processing server can be used -- remote partitions for a cube (called master database) are stored in dedicated SSAS database (secondary database) on remote SSAS instance -- master database: can use multiple secondary databases on one remote instance of SSAS

Remote Partitions

-- only one database dimension (.dim file) is mapped to multiple dimensions in the cube (cube dimension) -- different result in queries is the result of the join to a different column in the measure group table.

Role Playing Dimension model

-- is counterintuitive -- if user is member of two roles and one role grants permission to view data while another rolte removes permission, the permissions are combined and the user can view the data.

Role based security is additive

-- permissions: -- Full Control (Administrator) full control --Process Database process by SSIS and SSMS --Read Definition script database, view metadata without granting ability to modify the objects or read data

Role permissions

-- Expressions defined in calculated column is evaluated in row context which is automatically defined for earch row in table -- X-Functions activate row context for each of iterated rows

Row Context

-- restrict users to only see a portion of the model -- Row Filter security -- Dynamic security

Row Level Security

-- if CALCULATE and CALCULATETABLE are written in expression that is evaluated in an active row context, the row context is transformed into filter context made by all equivalent rows -- sample: -- calculated column: 1. CALCULATE( SUM(ORDERS[Quantitiy]) -> result same as column Quantity 2. SUM(ORDERS[Quantitiy]) -> result is the sum of all quantities of the table for all rows (row context not tranformed to filter context)

Row context and CALCULATE

-- only rows in tables can be filtered -- no measures, columns or perspectives -- Row are secured by DAX expressions like ='Stock Item'[Brand] = "Northwind" -- can be tested by "Analyze in Excel" and activate role(s) -- in connection string properties: EffectiveUserName=DomainName\UserName Roles=Role1,Role2,... -- in SSMS with additional connection properties when connecting to server -- Row filter changes take effect immediately, no reconnect necessary

Row filter security

-- overwrite -- only current data is tracked

SCD Type 1

-- add new row -- current data and historical data is tracked -- validFrom, validTo (closed open interval) -- validTo = 99991231 or NULL for row with current values -- validTo = date of update to expire the updated row

SCD Type 2

SCOPE(Subcube_Expression) [ MDX_Statement ] END SCOPE Subcube_Expression ::=(Auxiliary_Subcube [, Auxiliary_Subcube,...n]) Auxiliary_Subcube ::= Limited_Set | Root([dimension_name]) | Leaves([dimension_name]) Limited_Set ::= single_tuple | member | Common_Grain_Members | hierarchy.members | level.members | {} | Descendants ( Member , [level [ , SELF | AFTER | BEFORE | SELF_AND_AFTER | SELF_AND_BEFORE | SELF_BEFORE_AFTER | LEAVES ] )

SCOPE FULL syntax

-- used together with assignment operations in MDX script -- restrict calculations in MDX script to specific cells and specific points at which the SSAS engine calculates cell values. -- syntax(simplified) SCOPE(subcube) mdx statemnt (e.g. THIS = ...) END SCOPE

SCOPE statement

-- Progress Report Begin begins reading partition data if data is not in cache -- Progress Report Current -- Progress Report End -- Query Begin query received bei SSAS -- Query End access duration of query in duration column -- Calculate Non Empty Begin -- Calculate Non Empty End -- Get Data From Aggregation storage engine retrieves data from aggregation -- Get Data From Cache storage engine retrieves data from cache -- Query Cube Begin formula engine begins executing the query -- Query Cube End -- Query Dimension storage engine retrieves dimension members -- Serialize Results Begin after calculations are complete -- Serialize Results Current -- Serialize Results End results have been sent back to client -- Query Subcube the storage engine retrieves a subcube from cache or disk. -- Query Subcube Verbose

SQL Profiler MD Trace Events

-- Query Begin -- DAX Query subclass Plan Logical -- Vertipaq SE Query Begin -- Vertipaq SE Query End -- DAX Query subclass Plan Physical -- Query End

SQL Profiler Tabular Trace Events

-- aplies to MD only, because Tabular is optimized for memory storage and access -- DataDir: place for files storing database objects and metadata -- Default: Program Files\Microsoft SQL Server\MSAS13.<instance name>\OLAP\Data can be overriden for database by specifying alternate locations for cube partitions. When doing this each path must be added to AllowedBrowsingFolders seperated by pipes -- LogDir: log files, flight recorder and query logs when using usage-based optimization -- TempDir: temporary files created during processing. Can be placed on seperate volume if experiencing issues during processing.

SSAS Disk Layout

-- run SSAS MD and Tabular on dedicated server without SSQL, SSIS, SSRS -- if source warehouse is small SSAS MD can be run together with SSQL to reduce hardware costs -- SSAS Tabular in a standalone environment because of high memory requirements. -- separate querying from processing with SSAS Processing and Query Server -> process database and copy processed database to query server

SSAS instance placement

-- sample: Sales Group:= SWITCH( TRUE(), [Total Sales] < 1000000, "Under 1,000,000", [Total Sales] < 10000000, "Between 1,000,000 - 10,000,000", [Total Sales] < 20000000, "Between 10,000,000 - 20,000,000", "Over 20,000,000" )

SWITCH function

-- Role based -- only integrated security -- Roles are additive (unlike in SQL Server Database) Server administrator role + Database role (read only) yields Server administrator also in database. -- two Roles: - Server Administrator - Database role (adminstrative or read permission at database level)

Security

-- after deployment only members of this role has permissions in the database -- default: local administrators + SSAS Service Account local admins and SSAS service account can be removed by configuring server properties: Security \ BuiltinAdminsAreServerAdmins Security \ ServerAccountIsServerAdmin -- can also be changed in C:\Program Files\Microsoft SQL Server\MSAS13.SQL2016TAB\OLAP\Config\msmdsrv.ini

Server Administrator

-- set time functions: PeriodsToDate PeriodsToDate( [ Level_Expression [ ,Member_Expression ] ] ) -- sample: PERIODSTODATE( // The level that determines the period of time to traverse [Invoice Date].[Calendar].[Calendar Year], // The last member of the set to return [Invoice Date].[Calendar].[CY2016-Mar])

Set Time functions

-- create dimension and... -- add SCD attribute(s) -- add attribute for surrogate key (which must exist for a SCD type 2) and name as History attribute (stock item history) -- add attribute for business key and name it as business attribute (stock item) -- set AttributeHierarchyVisible property to false for the history attribute

Slowly changing dimension model (SCD)

-- add sorting to column

SortByColumn column property

-- MOLAP: -- performs fastest, needs most storage space, requires most time to process -- ROLAP: processing much faster than MOLAP, no data retrieval from datasource, only consistency checking, aggregations are stored in data source. MDX gets automatically converted in T-SQL (for SQL Server data source), slower query performance, good for hundreds of millions of rows in dimension table -- HOLAP: -- Detail data is kept in data source -- Aggregated data and indexes are loaded into MOLAP storage. Querie slow to resolve. Storage requirements not lower than MOLAP because space is needed to read all data and build aggregations and indexes. This data is disposed afterwards. -- !!!! HOLAP applies only to cube partitions, NOT for dimensions !!!!!

Storage Models

-- cube data and dimension data is always stored seperately -- different storage models for cube data and dimension data can be specified. -- cube data can be seperated into different partitions, each using different Storage Model

Storage Modes

-- SetToStr: SetToStr(Set_Expression) -- StrToMember: StrToMember(Member_Name [,CONSTRAINED] ) sample: WITH MEMBER [Measures].[Set Members] AS SETTOSTR(YTD([Invoice Date].[Calendar].CURRENTMEMBER)) SELECT [Measures].[Set Members] ON COLUMNS, { STRTOMEMBER("[Invoice Date].[Calendar].[CY2016-" + FORMAT(NOW(),"MMM") + "]") : STRTOMEMBER("[Invoice Date].[Calendar].[CY2016-" + FORMAT(NOW(),"MMM") + "]").NEXTMEMBER.NEXTMEMBER } ON ROWS FROM [Wide World Importers DW];

String functions

-- contains always the All member of a hierarchy if no members of that hierarchy are explicitly defined -- otherwise can contain the following elements: -- a single member -- some or all members of a single level of an attribute or user-defined hierarchy -- sample: ([Invoice Date].[Calendar Year].[CY2014], [Invoice Date].[Calendar].[Calendar Month].Members, [Measures].[Sales Quota])

Subcube expression

-- base measure -- target measure -- ration between base and targetb (Default 40% and 80%)

TM: KPI

-- limit number of rows returned -- TOPN does not garantee any sort order for the result->ORDER BY is necessary for ordering -- returned row depend on orderBy_expression (1 or 0= default = DESC) -- sample: EVALUATE TOPN( 5, SUMMARIZE( 'Stock Item', 'Stock Item'[Stock Item], "Sales", [Total Sales] ), [Sales] ) ORDER BY [Sales] DESC -- syntax: TOPN(<n_value>, <table>, <orderBy_expression>, [<order>[, <orderBy_expression>, [<order>]]...])

TOPN function

-- stored in memory just like any other column too -- unlike measures that are calculated at query time

Tabular Calculated Columns

-- built using DAX expression (Table function) -- sample to fill a Calculated Table for Dates: CALENDAR (DATE (2005, 1, 1), DATE (2015, 12, 31)) or CALENDARAUTO() -- latter searches for all used dates in all tables all columns and creates range of dates to include all of them -- both functions return DAX Table

Tabular Calculated Table

-- In-Memory: Changes in source tables are not automatically reflected in database. Either refresh whole model or only specific partitions. -- DirectQuery: Changes automatically are reflected in database.

Tabular Data Refresh

-- Server -- Database -- Cube Name -- Processing Default: process only objects that are not in a processed state Do not process: no processing after deployment Full: process all objects after deployment Transactional Deployment: True: Deployment and Processing = One transaction False: No Transaction

Tabular Deployment Properties

-- in-memory data management feature, also called XVelocity

Tabular Engine

-- in MDM (Multidemensional Model) there is a distinction between measures and calculated measures -- in TM all measures are calculations (at minimum an aggregate function applied to a numeric column) -- expressed in DAX (Data Analysis Expressions)

Tabular Measures

-- Memory\HardMemoryLimit -- Default: 0 (between TotalMemoryLimit and Physical available memory) threshold at which server rejects any new requests -- Memory \ LowMemoryLimit -- Default: 65 (percentage of physical memory) -- first threshold beginning to clear infrequently used objects from memory -- Memory \ TotalMemoryLimit -- Default: 80 (percentage of physical memory) threshold for aggressively releasing memory must be set lower than HardMemoryLimit -- VertipaqPagingPolicy -- Default: 1 (Paging enabled) -- enables or disables SSAS to page to disk -- VertipaqMemoryLimit -- Default: 60 (percentage of total physical memory) -- total memory SSAS server uses for storing all in-memory databases before paging to disk (or without paging, see VertipaqPagingPolicy) -- requires memory for processing operations, to a lesser extent query resolution, but enogh memory to store each database

Tabular Memory Limits

-- supports more data source types -- BISM Multidimensional only supports relational data sources -- BISM Tabular also supports Excel, Text, SSRS, SSAS, and others...

Tabular Model

-- set in server properties (Extended) -- VertiPaqPagingPolicy allow paging if available memory is low -- VertiPaqMemoryLimit expressed as a percentage of total memory when exceeded out of memory error occurs if VertiPaqPagingPolicy = 0 otherwise paging occurs.

Tabular Model Memory configuration

-- GOAL for processing and queries: reduce memory footprint -- directly related to size of the tables and dictionaries of each column (cardinality of column) -- remove columns that are not required for analysis or calculations -- remove rows that are not required for analysis, e.g. time periods in the past that do not contribute helpful insight -- eliminate key columns from fact tables -- split date/time column into separate date and time columns -- use ETL to consolidate snowflaked dimensions, lookup tables and junk dimensions -- eliminate calculated columns in large tables because no compression is applied to them -- use currency instead of decimal whenever possible -> only 4 decimal places -- monitor Rows Read/Sec in MSOLAP$TABULAR - Processing to determine if processing benefits from changes -- when relational source is partioned create matching partitions in the model partitions and ensure a one-to-one mapping between relational and model partitions. -- tune relational query. In tabular partition definitions you can use ORDER BY, CTEs and stored procedures. The query is not modified by SSAS during processing

Tabular Model optimization

-- partitions defined by Partition Manager -- if each partition defined by same source table, define filter for each partition -- for dynamic partition strategy use a script to define partitions. Execute SSIS package to execute SSAS DDL Task and schedule on regular basis.

Tabular Partitions

-- help users to more easily use the objects they need for analysis -- not a security feature

Tabular Perspectives

-- Process Add valid only for partition load new rows into a partition supported only by TMSL (Tabular Model Scripting Language) or TOM (Tabular Object Model) dictionary is updated and dependant objects are recalculated -- Process Clear valid for database, table and partition clear all data save space before doing process full -- Process Data valid only for table and partition loads data but makes no changes to dependant objects run Process Recalc after all tables have been loaded -- Process Default valid for database, table and partition determines state of each object. If unprocessed or partially processed brings to processed state by loading data. Performs Recalc only on objects invalidated by previous Process Data operation Perform Process Recalc after operating Process Default on Table or Partition level -- Process Defrag valid for database and table when you process or remove partitions independently, the dictionary is not rebuilt. Old Values (not used anymore) are not removed from dictionary. rebuilds table data and optimizes dictionary for all partitions. Does not require read operations on data source. at database level only can be executed by TMSL. -- Process Full valid for database, table and partition for database: every partition of every table is refreshed and dependent objects are recalculated. For new table one dictionary is created and at least one file per column. Multiple files if table is partitioned. -- Process Recalc valid for database only should always be executed after Process Data operation.

Tabular Processing Options

-- supported by TMSL if any operation fails original state of the model is preserved.

Tabular Processing Transaction Support

if model is deployed first time or if data changes dependent objects also need to be processed: -- Calculated columns -- Indexes -- Relationships (value, row number)

Tabular Processing of Dependent Objects

-- accessed by Model.bim properties -- Database Backup: create backup of workspace database as .ABF file each time model is saved (not possible for remote SSAS instance) -- Workspace database: name of workspace database (readonly) -- Workspace retention Unload From Memory (keeps database on disk, unloads from memory) Keep in Memory (keeps database in memory after unloading Visual Studio) Delete Workspace (removes database from disk and memory after unloading Visual Studio) -- Workspace Server use local machine if possible for best performance

Tabular SSDT Workspace Database Properties

-- configured in Model.bim DirectQuery-Mode Property -- In-Memory DirectQuery-Mode = False stores data on disk when SSAS Service is not running. When starting metadata of deployed databases is loaded into memory. When first user queries database, it is loaded into memory. When SSMS is used to connect to SSAS service, all databases are loaded into memory. Loaded database stay in memory until SSAS is stopped. -- DirectQuery DirectQuery-Mode = True

Tabular Storage Modes

-- 1. Query parser checks for MDX or DAX -- 2. MDX formula engine is invoked which sends DAX requests for measures to DAX formula engine -- 3. DAX formula engine generates query plan -- 4. Vertipaq storage engine processes query plan -- storage engine sends results back to formula engine -- shortterm Vertipaq cache is held to benefit multiple requests of same data in same query.

Tabular query processing sequence

-- natural as well as unnatural hierarchies are supported -- provide no performance benefits -- no ragged or parent-child hierarchies -- all columns used to create hierarchy must be in same table -- in Snowflake schema calculated measures have to be set up to build hierarchy: RELATED(ProductCategory[ProductCategoryDescription])

Tabular: Hierarchies

-- isolate to single user and query or processing to test -- CPU view can switched to logical processor view -- if only one processor is active when starting query or processing then formula engine is active -- if all processors are active storage engine is active

Task Manager

-- the following functions return a table with one date column -- applied with following pattern: CALCULATE([Total Sales], DATESYTD('Date'[Date])) -- PREVIOUSMONTH(<dates>) -> takes first date of column <dates> and returns all dates of the month before -- SAMEPERIODLASTYEAR(<dates>) -> subtracts one year from all dates -- TOTALYTD(<expression>,<dates>[,<filter>][,<year_end_date>]) -- returns dates table beginning from first of january until the current month of <dates> -- TOTALYTD([Total Sales], 'Date'[Date]) -- -> shortcut for CALCULATE([Total Sales], DATESYTD('Date'[Date]))

Time Intelligence functions

-- understand query execution architecture in detail: https://msdn.microsoft.com/en-us/library/dn749781.aspx

Tipp: Analysis Services MOLAP Performance Guide

-- at http://blogs.prodata.ie/post/Tool-for-baselining-MOLAP-Processing.aspx -- and at http://blogs.prodata.ie/post/Baselining-Tabular-Model-Processing-with-a-trace-file.aspx

Tipp: Baselining Tabular and MD Model Processing

-- https://msdn.microsoft.com/en-us/library/dn736073.aspx

Tipp: Cluster SSAS

-- Analysis DAX patterns -- http://www.daxpatterns.com/patterns/

Tipp: Dax Patterns

-- removing all the "grey cells" in the dimension usage matrix is not necessarily a "best practice" that you should follow in all cases. -> no complete Dimension usage matrix necessary. Fill gray boxes only when necessary.

Tipp: Dimension Usage Tab

-- very resource intensive

Tipp: Distinct Count

https://msdn.microsoft.com/en-us/library/jj874401.aspx

Tipp: Hardware Sizing Tabular solution

-- https://www.sqlbi.com/articles/understanding-dax-query-plans/

Tipp: Logical DAX query plan introduction

-- https://msdn.microsoft.com/en-us/library/hh226085.aspx.

Tipp: MD Operation Guide

-- Excel generates queries in MDX -- Power View generates queries in DAX -- SSRS can use both -- MDX returns cellset -- DAX returns tables

Tipp: MDX vs. DAX

-- https://msdn.microsoft.com/en-us/library/dn749781.aspx

Tipp: MOLAP Performance Guide

-- If there are no overlapping dimension between measure groups consider creating seperate cubes for each measure group

Tipp: Measure Groups and Cubes

-- https://msdn.microsoft.com/en-us/library/dn393915.aspx

Tipp: Performance Tuning of Tabular Model

-- considering hierarchical relationships -- sample: Ratio To Parent:= var salesAllDates= CALCULATE([Total Sales];ALL('Date'[Date])) var salesAllMonths= CALCULATE([Total Sales];ALL('Date'[Calendar Month Label])) var salesAllYears= CALCULATE([Total Sales];all('Date'[Calendar Year Label])) return IF(ISFILTERED('Date'[Date]); DIVIDE([Total Sales];salesAllDates); IF(ISFILTERED('Date'[Calendar Month Label]); DIVIDE([Total Sales]; salesAllMonths); IF(ISFILTERED('Date'[Calendar Year Label]); DIVIDE([Total Sales]; salesAllYears) ) ) )

Tipp: Ratio to Parent Measure

-- reference Columns always with tablenames and Measures without tablenames

Tipp: References for Measures and Columns

-- SQL Server 2008 R2 Analysis Services Operations Guide: https://msdn.microsoft.com/en-us/library/hh226085.aspx

Tipp: Relational Source optimization

-- the measures dimension is the only dimension that can be placed on the columns axis

Tipp: SSRS measures dimension placement on axes

https://msdn.microsoft.com/en-us/library/ms174609.aspx

Tipp: Setup of ByAccount semiadditive aggregation

http://www.sqlservercentral.com/stairway/

Tipp: Stairway series on SQL and BI themes

-- https://msdn.microsoft.com/en-us/library/jj874401.aspx

Tipp: Tabular: Hardware Sizing

https://www.blue-granite.com/blog/5-tips-fast-ssas-tabular-models.

Tipp: Ultra fast Tabular Models

https://www.microsoftpressstore.com/articles/article.aspx?p=2449192&seqNum=3

Tipp: Vertipaq Storage

-- new Name is xVelocity

Tipp: Vertipaq, xVelocity

--the intermediate measure group must refer to a measure group that contains a valid relationship with a dimension that relates via a regular relationship to the target measure group. d(m-n dimension)-m-d-m(intermediate measure group)-d-m(target measuregroup)

Tipp: cascading M-N relationships, intermediate measure group

http://www.sqlbi.com/articles/many2many/ http://www.sqlbi.com http://sqlblog.com/blogs/marco_russo http://sqlblog.com/blogs/alberto_ferrari http://www.sqlbi.com/articles/sqlbi-methodology/ Comparison of InMon, Kimball and SQLBI Methodology

Tipp: good links

EVALUATE CALCULATETABLE(CALCULATETABLE(T, innermost-condition), outermost-condition) -- filter context is evaluated starting from outermost-condition and innermost-condition modifies filter context of outermost-condition

Tipp: nested CALCULATETABLE functions

-- CommandBegin -- CommandEnd -- Progress Report Begin --Progress Report End -- Progress Report Error

Trace Events for processing

-- apply to dimensions and cube captions -- add translations to display captions for dimensions and attributes that are specific to a user's locale. -- You can also optionally bind an attribute to columns containing translated attribute member names

Translations

-- provide alternate metadata for object names and descriptions -- MD: -- translations can be defined in dimension and cube designer -- add column for select language and tpye in a caption for the dimension, each attribute name, hierarchy and each level -- contents of column can also be translated if a source column is available in source table -- cube translations are supported for cube, measure group, measures, calculated measures, cube dimensions, perspectives, KPIs, actions, named sets and calculated members

Translations

-- no user interface available -- export JSON template for selected language -- modify template manually -- sample: "name": "City", "translatedCaption": "", "translatedDescription": "" -- import JSON file back into model

Translations Tabular

-- combine rows -- syntax: UNION(<table_expression1>, <table_expression2> [,<table_expression>]...) -- sample: EVALUATE UNION( ROW( "Stock Item", "Black", "Sales Territory", "Southwest", "Calendar Year", "CY2016", "Sales", CALCULATE( 'Sale'[Total Sales], 'Stock Item'[Color] = "Black", 'City'[Sales Territory] = "Southwest", 'Date'[Calendar Year Label] = "CY2016", 'Customer'[Buying Group] = "Tailspin Toys", 'Stock Item'[Size] = "L" ) ), ROW( "Stock Item", "Blue", "Sales Territory", "Far West", "Calendar Year", "CY2016", "Sales", CALCULATE( 'Sale'[Total Sales], 'Stock Item'[Color] = "Blue", 'City'[Sales Territory] = "Far West", 'Date'[Calendar Year Label] = "CY2016", 'Customer'[Buying Group] = "Tailspin Toys", 'Stock Item'[Size] = "L" ) ) )

UNION function

-- Sales by Orderdate: SalesByOrderDate := SUM ( FactInternetSales[SalesAmount] ) -- Sales by Duedate: CALCULATE ( SUM ( FactInternetSales[SalesAmount] ), USERELATIONSHIP ( FactInternetSales[ShipDateKey], DimDate[DateKey] ) )

USEREALTIONSHIP DAX function (role playing)

-- activate inactive relationship that must be defined in the tabular model -- sample (calculated column Shipped Lines in DateTime table) CALCULATE(COUNTROWS(InternetSales), USERELATIONSHIP(InternetSales[ShippingDate], DateTime[Date]))

USERELATIONSHIP filter

-- objects that can be processed: entire database cube measure group partition dimension

Units of processable objects

-- An unnatural hierarchy is one for which there is the potential to have a many-to-many relationship between members on a parent and child levels. (color - size)

Unnatural Hiearchy

Returns a one-column table that contains the distinct values from the specified table or column in the current filter context. syntax:VALUES(<ColumnName>)

VALUES function

-- VALUES -- bound to filter context -- syntax: VALUES(<TableNameOrColumnName>) -- if used with column returns one column table with distinct values (might include BLANK value) -- DISTINCT -- syntax: DISTINCT(<column>) -- only difference to VALUES: returns no BLANK value

VALUES, DISTINCT functions

-- use encodings to reduce number of bits used to store values in column -- Value encoding column Unit Price Values: 212, 197, 214, 197, 214, 197, 194, 197, 216 stored as Unit "Price - 194": 18, 3, 20, 3, 20, 3, 0, 3, 22 -- dictionary encoding replaces values with dictionary and indexes Red, Red, White, Black... becomes 0 -> Red, 1 -> White, 2 -> Black 0, 0, 1, 2... -- Run length encoding (RLE) Q1 310 times, Q2 290 times becomes Quarter Q1 Start 1 Count 310, Quarter Q2 Start 311 Count 290, ...

Vertipaq compression

-- in most cases MD model read-only --for budgeting and forecasting user input is beneficial -- a client is necessary (like Excel) that support writeback -- existing data in a measure group is never changed -- writeback data is stored in a separate table and incorporated in query results.as if they are part of the cube data -- writeback can be converted permantly incorporated into the cube or discarded (SSMS) -- partition can be write-enabled only when measure group contains ONLY measures that use the Sum aggregate function -- security can be granted to a limited group of users -- leaf and nonleaf cells are handled differently -- leaf cells can be directly changed, non leaf cell values have to be allocated. Client can use UPADTE CUBE statement to distribute changes to all leaf cells (allocation)

Writeback partition

-- MD performance issues -- tuning strategy: start with tuning queries then optimize processing -- most common reason for slow MDX query is the use of cell by cell evaluation instead of bulk-mode evaluation. -- rewrite MDX slow query if you find any of the following constructs: -- 1. SET Alias: replace SET alias defined in WITH clase and used for example in a SUM function with the SET expression itself -- 2. Late binding functions: function applied to the CURRENTMEMBER function. This function must be evaluated in the context of the current row or column -- 3. ASSP function always evaluates in cell-by-cell-mode -- 4. Cell-level security -- look at performance counters Number of Cell-By-Cell Evalution Nodes and Number of Bul-Mode Evaluation Nodes

common MD MDX performance issues

-- member property is an attribute that is not used in a pivot table for placement on row or columns. -- set AttributeHierarchyEnabled property to false -- member properties are not displayed with other attributes in client applications. -- member properties can be referenced in MDX -- e.g. can be displayed in tool tip in Excel

convert attribute to member property

-- use Synchronize Databases Wizard in SSMS on target database node and pull from source -- use backup and restore (abf file) -- attach and detach database

copy SSAS database

-- contains data about the entities that a business user wants to analyze—typically a person, place, thing, or point in time.

dimension table

-- equivalent expressions: -- calculated column in table channel: COUNTROWS(CALCULATETABLE (Orders)) CALCULATE(COUNTROWS(Orders))

express CALCULATETABLE by CALCULATE

--A slowly changing dimension (SCD) is a dimension for which you implement specific types of columns and ETL techniques specifically to address how to manage table updates when data changes.

history tracking

-- note number of logical processors and processor groups on the server: use coreinfo tool: https://technet.microsoft.com/en-us/sysinternals/cc835722.aspx --list processors in descending order from left to right: 76543210 -- create bitmask per group: 00111110 -- use binary to hex converter to calculate hex value: 0x3E --update GroupAffinity property with hex value -- for multiple groups specifiy groups seperated by comma: 0x0, 0x3E, 0x3E, 0x0 0x0 to ignore specific processor group -- currently used thread pool settings and thread pool affinity masks are output to msmdsrv.log in log dir

how to set affinity mask

-- create hierarchies for attributes that contains thousands of members, so that each level is manageable within the user interface -- if there is no meaningful grouping consider creating alphabetical groups with new attribute -- another option is Discretization-Method and DiscretizationBucketCount properties to create artificial groupings -- Drillthrough limits: configure OLAP \ Query \ DefaultDrillthroughMaxRows server property -- client can limit drillthrough row count in DRILLTHROUGH MDX command: --sample: DRILLTHROUGH maxrows 1000 SELECT ([Date].[Calendar].[Calendar Month].&[2016]&[12]) ON 0 FROM [Wide World Importers DW] WHERE [Customer].[Buying Group].&[Tailspin Toys]

implement usability limits

-- GOAL: prevent memory usage from reaching TotalMemoryLimit -- cache warming -> diverse techniques to get data into cache before users query the cube -- memory management memory issues can occur with the following types of operations: -- Dimension processing -- Aggregation processing set AggregationMemoryLimitMin and AggregationMemoryLimitMax in msmdsrv.ini file to control memory for aggregations processing -- ROLAP dimension attribute stores

memory management

-- check Performance Counters Processing Pool Job Queue Length and Processing Pool Idle Non-I/O Threads -- if job queue is greater than 0 and Idle Threads zero increase ThreadPool \ Process \ MaxTrheads

optimize CPU utilization during tabular processing

-- leave all database permission check boxes cleared -- under membership enter windows users or groups to role -- under cubes tab click Read in Access drop-down for the cube

set READ permissions for users to cube

-- collection of members of same dimension and hierarchy -- samples Empty set: {} Single member: [Stock Item].[Color].&[Black] Multiple members: {[Stock Item].[Color].&[Blue],[Stock Item].[Color].&[Black]} Range of members: {[Invoice Date].[Calendar].[CY2013-Jan]:[Invoice Date].[Calendar].[CY2013-Mar]} Function: [Stock Item].[Color].members

sets

-- comparison and when to use start schema: http://sqljason.com/2011/05/when-and-how-to-snowflake-dimension.html

snowflake or star schema

-- optimize process data -- process multiple partitions as a group. SQL Server 2016 processes them in parallel by default -- process only partitions with current or changed data rather than all partitions and then perform one Process Recalc. -- for faster processing use Process Add instead of Process Data

tune tabular processing time

-- if Process Data operation is using too much memory try to -- break up object processing into multiple transactions -- reduce parallelism by using MaxParallelism property -- Process each object in separate transactions

tuning to reduce tabular processing memory consumption

-- variables can be used in DAX expressions -- to easier understand the logic -- any value can be stored: column, table or value -- can improve query performance -- sample: Internal > $15k of Total Sales:= VAR InternalTerritories = Filter( VALUES(City[Sales Territory]); City[Sales Territory] <> "External" && City[Sales Territory] <> "N/A" ) VAR InternalSalesOver15k = sumx( InternalTerritories; CALCULATE( [Total Sales]; Sale[Total Excluding Tax] > 15000 ) ) VAR InternalAllSales = SUMX( InternalTerritories; [Total Sales] ) return divide(InternalSalesOver15k;InternalAllSales)

variable declarations


Related study sets

Intro to Business CH. 14, 15, & 17

View Set

Digital Photography Unit 4: Post- Production

View Set

Elements 1-40: Latin Names & Origins

View Set

CHAPTER 36 ABDOMINAL & GENITOURINARY TRAUMA

View Set

Chapter 5: Flexible Feature Life Policies

View Set

CFA Level I - Quantitative Methods

View Set

Psychology of Personality Exam 3: chapter 15

View Set

Theory of Consumer Behavior - McGraw Hill Chapter 5

View Set