DP-600 Fabric Analytics Engineer Practice Set


You have a Fabric tenant. You notice that Fabric compute usage is causing performance issues. You need to increase the Fabric capacity unit size. What should you use?

Azure portal

You are planning a Fabric analytics solution for the following users: 2,000 Microsoft Power BI consumers without an individual Power BI Pro license. 32 Power BI modelers with an individual Power BI Pro license. 16 data scientists You need to recommend a Fabric capacity SKU. The solution must minimize costs. What should you recommend?

F64

You have a Fabric warehouse that contains two tables named FactSales and dimGeography. The dimGeography table has a primary key column named GeographyKey. The FactSales table has a foreign key column named GeographyKey. You create a Dataflow Gen2 query and add the tables as queries. You plan to use the Diagram view to visually transform the data. You need to join the two queries so that you retain all the rows in FactSales even if there are no matching rows for them in dimGeography. What should you do after you select FactSales?

Use Merge queries as new transformation with join kind set to Left outer and FactSales as the left table and dimGeography as the right table.

TREATAS()

applies the result of a table expression as filters to columns from an unrelated table

best load method for a large data source

copy tool in pipelines

CROSSFILTER()

defines the cross-filtering direction of a physical relationship

You have a Fabric tenant that contains a lakehouse. You are creating a notebook to explore the data in the lakehouse. You need to create a query to find the total number of records in the fact table for every individual product. The displayed results must be sorted in descending order. How should you structure the query?

df.groupBy("ProductKey").count().sort("count", ascending=False).show()
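The same group-count-sort logic can be sketched with the standard library for readers without a Spark session. The product keys below are hypothetical sample data; `Counter.most_common()` returns (key, count) pairs sorted by count, descending, mirroring the grouped count sorted in descending order.

```python
from collections import Counter

# Hypothetical product keys for fact-table rows (sample data, not from the exam).
fact_product_keys = ["P1", "P2", "P1", "P1", "P3", "P2"]

# most_common() sorts the (key, count) pairs by count, descending,
# like groupBy("ProductKey").count().sort("count", ascending=False).
product_counts = Counter(fact_product_keys).most_common()
```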

You have a Fabric lakehouse named Lakehouse1. You use a notebook in Lakehouse1 to explore customer data. You need to identify the rows of a DataFrame named df_customers in which any of the columns (axis 1 of the DataFrame) are NULL. Which statement should you run?

df_customers[df_customers.isnull().any(axis=1)]
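The filtering logic can be sketched without pandas: keep any row in which at least one column is NULL. The customer rows below are hypothetical, and `None` stands in for NULL.

```python
# Hypothetical customer rows; None stands in for NULL.
rows = [
    {"CustomerID": 1, "Name": "Ada", "City": "Oslo"},
    {"CustomerID": 2, "Name": None, "City": "Bergen"},
    {"CustomerID": 3, "Name": "Bo", "City": None},
]

# Keep a row if ANY of its columns is NULL -- the same test as
# df_customers.isnull().any(axis=1) across axis 1 (the columns).
rows_with_nulls = [r for r in rows if any(v is None for v in r.values())]
```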

type 2 slowly changing dimension (SCD)

method of retaining full history of values in a data warehouse

You have a Fabric tenant that has XMLA Endpoint set to Read Write. You need to use the XMLA endpoint to deploy changes to only one table from the data model. What is the main limitation of using XMLA endpoints for the Microsoft Power BI deployment process?

A PBIX file cannot be downloaded from the Power BI service.

V-ORDER in Lakehouse Delta lake table maintenance

Applies optimized sorting, encoding, and compression to Delta parquet files to enable fast read operations

You have a Fabric tenant that contains two lakehouses named Lakehouse1 and Lakehouse2. Lakehouse1 contains a table named FactSales that is partitioned by a column named CustomerID. You need to create a shortcut to the FactSales table in Lakehouse2. The shortcut must only connect to data for CustomerID 100. What should you do?

As you create the shortcut, select the CustomerID=100 folder under the FactSales folder in Tables.

You have a Fabric tenant that contains a lakehouse named Lakehouse1. You have an external Snowflake database that contains a table with 200 million rows. You need to use a data pipeline to migrate the database to Lakehouse1. What is the most performant (fastest) method for ingesting data this large (200 million rows) by using a data pipeline?

Data Pipeline (Copy data)

You have a Fabric tenant that contains a workspace named Workspace1. Workspace1 contains a lakehouse. The lakehouse contains a table named Customers and a Fabric notebook. You plan to use the notebook to profile the data. In the notebook, you set up the following DataFrame that references the Customers table. df = spark.sql("SELECT * FROM Customers") You need to profile the data in the DataFrame. The solution must minimize administrative effort. What should you do?

Display the DataFrame by running display(df), and then click the Inspect button.

You are planning a Fabric analytics solution. You need to recommend a licensing strategy to support 10 Microsoft Power BI report authors and 600 report consumers. The solution must use Dataflow Gen2 for data ingestion and minimize costs. Which Fabric license type should you recommend?

F64

You have a Fabric tenant. Your company has 1 TB of legacy accounting data stored in an Azure Data Lake Storage Gen2 account. The data is queried only once a year for a few ad-hoc reports that submit very selective queries. You plan to create a Fabric lakehouse or warehouse to store company sales data. Developers must be able to build reports from the lakehouse or warehouse based on the sales data. The developers must also be able to do ad-hoc analysis of the legacy data at the end of each year. You need to recommend which Fabric architecture to create and the process for integrating the accounting data into Fabric. The solution must minimize administrative effort and costs. What should you recommend?

Ingest the sales data into the Fabric lakehouse and set up a shortcut to the legacy accounting data in the storage account.

Type 2 slowly changing dimension (SCD)

Keeps the history of old data by adding new row

You have an Azure SQL database that contains a customer dimension table. The table contains two columns named CustomerID and CustomerCompositeKey. You have a Fabric workspace that contains a Dataflow Gen2 query that connects to the database. You need to use Dataflows Query Editor to identify which of the two columns contains non-duplicate values per customer. Which option should you use?

Column distribution - distinct values

You use Microsoft Power BI Desktop to connect to data stored in a CSV file. You need to use Power Query Editor to identify the percentage of valid records in a column before loading the data to a report. Which Power Query option should you use?

Column quality

You have a Fabric workspace that contains a Microsoft Power BI report named Sales. You plan to use Dataflow Gen2 to add an additional column to the report. The new column must be based on the unit price of a product. Any product that has a unit price that is greater than $1,000 must be labeled as High, while any product that has a unit price that is less than $1,000 must be labeled as Regular. What should you select on the Add column tab in Power Query Editor?

Conditional column

You have a Fabric workspace that contains a lakehouse named Lakehouse1. A user named User1 plans to use Lakehouse explorer to read Lakehouse1 data. You need to assign a workspace role to User1. The solution must follow the principle of least privilege. Which workspace role should you assign to User1?

Contributor

You have a Fabric tenant that contains a lakehouse named Lakehouse1. A notebook named Notebook1 is used to ingest and transform data from an external data source before loading the data to Lakehouse1. You need to ensure that the process meets the following requirements: Runs daily at 7:00 AM. Attempts to rerun the process two more times if a source file is unavailable. The solution must minimize development effort. What should you do?

Create a pipeline. Add Notebook1 to the pipeline and set the notebook activity's Retry setting to 2. Schedule the pipeline to run daily.

You have a Fabric workspace that contains a lakehouse named Lakehouse1. Lakehouse1 contains a table named FactSales that is ingested by using a Dataflow Gen2 query. There are several applied steps and transformations applied to FactSales during the ingestion process. You notice that due to the number of Power Query transformations, there are occasional timeout issues for the dataflow. You need to recommend a solution to prevent the timeout issues. You have already confirmed that the query cannot be further optimized and that changing the refresh time does not improve the timeout issues. Which additional action should you recommend?

Create a second dataflow that ingests the FactSales table with no additional transformations, and then connect the original dataflow to transform the FactSales data by using this second dataflow.

You have a Fabric workspace that contains a Microsoft Power BI report named Report1. Your organization does not currently have an enterprise data warehouse. You need to leverage dataflows to bring data into a Power BI semantic model. You notice that access to one of the data sources is restricted to narrow time windows. What should you do?

Create a staging dataflow that will only copy the data from the source as-is.

You have a Fabric workspace named Workspace1 that contains a lakehouse named Lakehouse1. You have write permissions to an Azure Data Lake Storage Gen2 account named storage1 that contains a folder named Folder1. You plan to delete a shortcut named Shortcut1 that points to a file named File1 stored in Folder1. You run the delete operation on the following path. Lakehouse1\Files\Shortcut1 What will occur after you run the delete operation?

Only Shortcut1 will be deleted.

You have a Fabric workspace that contains a data pipeline with a fact table and two dimension tables. The fact table contains customer data. One dimension table contains customer information and a column with Customer ID information, and the other dimension table contains calendar information and a column with Date ID information. You need to ensure that each customer's sales data is provisioned to their own Parquet file under the Parquet folder structure. Which data pipeline configuration should you implement?

Partition by customer ID on the fact table.

You have a Microsoft Power BI report page that takes longer than expected to display all its visuals. You need to identify which report element consumes most of the rendering time. The solution must minimize administrative effort and how long it takes to capture the rendering information of each element on the report page. What should you use?

Performance analyzer

You have a new Fabric tenant. You need to recommend a workspace architecture to meet best practices for content distribution and data governance. Which two actions should you recommend? Each correct answer presents part of the solution.

Place semantic models and reports in separate workspaces. Reuse shared semantic models for multiple reports.

You have a Microsoft Power BI report named Sales that uses a Microsoft Excel file as a data source. Data is imported as one flat table. The table contains the following columns: ProductID, ProductColor, ProductName, ProductCategory and SalesAmount. You need to create an optimal fact table data model by using a star schema. Which two columns should remain part of the new fact tables?

ProductID & SalesAmount

You have a Fabric tenant that contains a workspace named Workspace1. Workspace1 contains a warehouse that has a table named Orders. You have a Microsoft Power BI semantic model in Power BI Desktop that sources data from the Orders table. You need to enable incremental refresh for the table. Which two parameters should you create?

RangeStart and RangeEnd

You have a Microsoft Power BI semantic model that contains a fact table. The table contains more than 200 million rows. You need to add incremental refresh to the table to reduce the scheduled refresh load times. Which two parameters should you add to Power Query to enable incremental refresh?

RangeStart and RangeEnd

You have a Fabric workspace that contains a lakehouse named Lakehouse1. Lakehouse1 contains a Delta Parquet table named FactSales. You use a Describe command to review the history of FactSales and notice that you have over 1000 versions of the table, and the retention policy is six months. You need to reduce the size of the FactSales table and the number of files in the table. What should you configure on the table?

Run the VACUUM command under Maintenance.

You have a table named Sales that contains the following columns: Order_ID, Customer_ID, Product_ID, Quantity, and Sales_Date. You need to write a SQL statement to find the total number of products sold for each Product_ID in January 2024. What should you run?

SELECT Product_ID, SUM(Quantity) FROM Sales WHERE MONTH(Sales_Date) = 1 AND YEAR(Sales_Date) = 2024 GROUP BY Product_ID
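The query can be verified against sample data with Python's built-in `sqlite3`. SQLite has no `MONTH()`/`YEAR()` functions, so `strftime()` stands in for them here; the rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales (Order_ID, Customer_ID, Product_ID, Quantity, Sales_Date)")
con.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?)", [
    (1, 10, "A", 2, "2024-01-05"),
    (2, 11, "A", 3, "2024-01-20"),
    (3, 12, "B", 1, "2024-02-01"),  # February row: excluded by the filter
])

# SQLite dialect: strftime() replaces MONTH()/YEAR() from the answer above.
rows = con.execute("""
    SELECT Product_ID, SUM(Quantity)
    FROM Sales
    WHERE strftime('%m', Sales_Date) = '01' AND strftime('%Y', Sales_Date) = '2024'
    GROUP BY Product_ID
""").fetchall()
```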

You have an Azure SQL database that contains a table named inventory. The inventory table contains the following columns: item_id, category, stock_quantity, and last_stocked_date. You need to write a SQL statement that retrieves the latest last_stocked_date for each category, for which stock_quantity is less than 50. Which SQL statement should you run?

SELECT category, MAX(last_stocked_date) FROM inventory WHERE stock_quantity < 50 GROUP BY category

You have a Fabric tenant that contains a lakehouse named Lakehouse1. A SELECT query from a managed Delta table in Lakehouse1 takes longer than expected to complete. The table receives new records daily and must keep change history for seven days. You notice that the table contains 1,000 Parquet files that are each 1 MB. You need to improve query performance and reduce storage costs. What should you do from Lakehouse explorer?

Select Maintenance and run the OPTIMIZE command as well as the VACUUM command with a retention policy of seven days.

You have a Microsoft Power BI report that contains a table visual. The visual contains three DAX measures named Sales, Units, and Customers. You need to apply logic-based DAX formatting to the Sales measure. The solution must minimize administrative effort and prevent the modification of the other two measures. How should you apply the logic?

Use dynamic measure formatting.

You have a Fabric tenant that contains a lakehouse named Lakehouse1. You have forecast data stored in Azure Data Lake Storage Gen2. You plan to ingest the forecast data into Lakehouse1. The data is already formatted, and you do NOT need to apply any further data transformations. The solution must minimize development effort and costs. Which method should you recommend to efficiently ingest the data?

Use the Copy activity in a pipeline.

Type 4 slowly changing dimension (SCD)

Uses a separate history table and a surrogate key with the fact table to identify original and historic data

You have a DAX measure that contains the following code. Variance KPI = IF( [Variance] > 0.80, "Amazing!", IF( [Variance] > 0.60, "Good", "Bad" ) ) You need to optimize the measure so that it will calculate faster. Which code should you use?

VAR Calc = [Variance] RETURN SWITCH(TRUE(), Calc > 0.80, "Amazing!", Calc > 0.60, "Good", "Bad" )
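The optimization can be illustrated with a plain-Python analogue (this is not DAX; the function name and values are illustrative): the measure value is computed once and reused by every branch, just as `VAR Calc = [Variance]` avoids re-evaluating `[Variance]` in each nested `IF`.

```python
def variance_kpi(variance: float) -> str:
    """Python analogue of the optimized DAX: evaluate the measure once
    (VAR Calc = [Variance]) and branch in order (SWITCH(TRUE(), ...))."""
    calc = variance  # single evaluation, reused by every branch
    if calc > 0.80:
        return "Amazing!"
    if calc > 0.60:
        return "Good"
    return "Bad"
```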

USERELATIONSHIP()

activates different physical relationships between tables during a query execution

disadvantage of Type 2 slowly changing dimension (SCD)

adding new records can be a costly operation and is not recommended when attributes or columns might be added in the future

lakehouse shortcuts

allow users to reference data without copying it

You have a Fabric warehouse named Warehouse1. You discover a SQL query that performs poorly, and you notice that table statistics are out of date. You need to manually update the statistics to improve query performance for the table. Which column statistics should you update?

columns used in GROUP BY clauses

Type 6 slowly changing dimension (SCD)

combination of type 1 (keeps latest data, old data is overwritten), 2 (keeps the history of old data by adding new row) and 3 (adds new attribute to store changed value)

OPTIMIZE in Lakehouse Delta lake table maintenance

consolidates multiple small Parquet files into fewer, larger files

best load method for small data or a specific connector

dataflows

You have a Fabric tenant that contains a lakehouse. You plan to use a Fabric notebook and PySpark to read sales data and save the data as a Delta table named Sales. The table must be partitioned by Sales Year and Quarter. You load the sales data to a DataFrame named df that contains a Year column and a Quarter column. Which command should you run next?

df.write.mode("overwrite").format("delta").partitionBy("Year","Quarter").save("Tables/Sales")

You have a Fabric lakehouse that contains a managed Delta table named Product. You plan to analyze the data by using a Fabric notebook and PySpark. You load the data to a DataFrame by running the following code. df = spark.sql("SELECT * FROM Product") You need to display the top 100 rows from the DataFrame. Which PySpark command should you run?

display(df.limit(100))

You have a Microsoft Power BI report that contains a bar chart visual. You need to ensure that users can change the y-axis category of the bar chart by using a slicer selection. Which Power BI feature should you add?

field parameters

ALM toolkit

a free, open-source tool for managing Power BI datasets, with features such as database compare, code merging, deployment, source-control integration, and definition reuse

You have a Parquet file named Customers.parquet uploaded to the Files section of a Fabric lakehouse. You plan to use Data Wrangler to view basic summary statistics for the data before you load it to a Delta table. You open a notebook in the lakehouse. You need to load the data to a pandas DataFrame. Which PySpark code should you run to complete the task?

import pandas as pd df = pd.read_parquet("/lakehouse/default/Files/Customers.parquet")

You have a Fabric tenant that contains a workspace named Workspace1. Workspace 1 contains a lakehouse named Lakehouse1. You plan to use Microsoft SQL Server Management Studio (SSMS) to write SQL queries against Lakehouse1. Where can you find the SQL connection string for Lakehouse1?

in the Lakehouse settings under Copy SQL connection string

slowly changing dimensions (SCD)

keep track of changes in dimension values and are used to report historical data at any given point in time

Type 1 slowly changing dimension (SCD)

keeps latest data, old data is overwritten

disadvantage of Type 3 slowly changing dimension (SCD)

keeps limited history about changed data

You have a Microsoft Power BI visual. You use DAX query view to review the following code extracted from the visual. DEFINE VAR __DS0Core = SUMMARIZECOLUMNS('Company'[Manufacturer], "Sales", 'Metrics'[Sales], "ConstantSales", 'Metrics'[ConstantSales]) VAR __DS0PrimaryWindowed = TOPN(1001, __DS0Core, [Sales], 0, 'Company'[Manufacturer], 1) EVALUATE ROW( "AnalyticsLine", MEDIANX(KEEPFILTERS(__DS0Core), [Sales]) ) EVALUATE __DS0PrimaryWindowed ORDER BY [Sales] DESC, 'Company'[Manufacturer] You need to identify which type of analytics line was added to the visual. What should you identify?

median line

best practice analyzer

a tool in Tabular Editor that helps avoid slow refresh performance, usability problems, and other manageability issues

You have a Fabric warehouse named Warehouse1 that contains customer status information. You plan to implement a dimensional model in Warehouse1. The solution must meet the following requirements: Be able to perform point-in-time analysis. Whenever a customer's status changes, the change must be persisted in a table named DimCustomer, and a new row is added to include the timestamp of the status change. Which type of dimension should you choose for DimCustomer?

type 2 slowly changing dimension (SCD)

object-level security (OLS)

used to restrict user access to specified tables and columns

You have a Fabric workspace that contains a Microsoft Power BI report. You need to modify the column names in the Power BI report without changing the original names in the underlying Delta table. Which warehouse object should you create?

view

advantage of Type 1 slowly changing dimension (SCD)

with no historical data, maintenance is easy and data warehouse size is reduced

You have a Fabric warehouse. You write the following T-SQL statement to retrieve data from a table named Sales to display the highest sales amount for specific customers. SELECT TOP 10 CustomerKey , SalesAmount , ROW_NUMBER() OVER(ORDER BY SalesAmount DESC) AS Ranking FROM dbo.Sales WHERE CustomerKey IN (1, 2, 3) The following is an example of the expected result. CustomerKey|SalesAmount|Ranking 1|100|<value1> 2|100|<value2> What is the Ranking value (value1 and value2) for the first two records?
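The tie behavior can be checked against sample data with Python's built-in `sqlite3` (SQLite 3.25+ supports window functions; the rows are hypothetical). `ROW_NUMBER()` never repeats a value, so the two tied SalesAmount rows still receive the distinct ranks 1 and 2.

```python
import sqlite3  # SQLite 3.25+ supports window functions

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales (CustomerKey INT, SalesAmount INT)")
con.executemany("INSERT INTO Sales VALUES (?, ?)", [(1, 100), (2, 100), (3, 80)])

rows = con.execute("""
    SELECT CustomerKey, SalesAmount,
           ROW_NUMBER() OVER (ORDER BY SalesAmount DESC) AS Ranking
    FROM Sales
""").fetchall()

# ROW_NUMBER never repeats: the tied rows get distinct ranks 1 and 2
# (which row gets which rank is nondeterministic for ties).
tied_ranks = {r[2] for r in rows if r[1] == 100}
```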

1 and 2

You have a Fabric workspace that contains a set of Dataflow Gen2 queries. You plan to use the native Dataflow Gen2 refresh scheduler to configure the queries to refresh as often as possible. What is the fastest refresh interval that can be configured?

30 minutes

You publish a very large Microsoft Power BI semantic model to a Power BI workspace. The model refresh will take two hours. In Power BI Desktop, you limit the data you work with by using parameters. You need to update the definition of a measure. What can you use to update the measure definition without having to refresh the model in the Power BI service?

ALM Toolkit

You have a Fabric warehouse. You create a table named Dim_Customer by using the following code. CREATE TABLE dbo.Dim_Customer ( CustomerKey VARCHAR(255) NOT NULL, Name VARCHAR(255) NOT NULL, Email VARCHAR(255) NOT NULL ); You need to alter the table to add CustomerKey as a primary key. Which command should you run?

ALTER TABLE dbo.Dim_Customer add CONSTRAINT PK_Dim_Customer PRIMARY KEY NONCLUSTERED (CustomerKey) NOT ENFORCED

You have a semantic model that pulls data from an Azure SQL database and is synced via Fabric deployment pipelines to three workspaces named Development, Test, and Production. You need to reduce the size of the query requests sent to the Azure SQL database when full semantic model refreshes occur in the Development or Test workspaces. What should you do for the deployment pipeline?

Add a deployment parameter rule to filter the data.

You have an Azure SQL database. You have a Microsoft Power BI report connected to a semantic model that uses a DirectQuery connection to the database. You need to reduce the number of queries sent to the database when a user is interacting with the report by using filters and/or slicers. What should you do?

Add apply buttons to all the basic filters.

Type 3 slowly changing dimension (SCD)

Adds new attribute to store changed value

You have a Microsoft Power BI semantic model assigned to you for ownership and maintenance. You need to perform an audit on the model to identify and resolve potential performance or design issues. Which Tabular Editor tool should you use?

Best Practices Analyzer

You are profiling the data stored in a Fabric lakehouse. You run the following statement. df.describe().show() Which functions will be included in the results for the object data? Each correct answer presents a complete solution.

COUNT, MEAN, STD, MAX, TOP, UNIQUE, FREQ

You have a semantic model that contains a Calendar Dimension table and a Sales fact table. The tables have a 1-to-many relationship. From DAX studio, you discover a DAX measure that is performing slowly against the model. You plan to modify a filter in the measure to improve performance. Which measure provides the best performance for the model?

CALCULATE( [Sales], Calendar[Year] = 2023 )

You are profiling the data stored in a Fabric lakehouse. You run the following statement. df.describe().show() Which functions will be included in the results for the numeric data? Each correct answer presents a complete solution.

COUNT, MEAN, STD, MAX

You have a Fabric warehouse. You are writing a T-SQL statement to retrieve data from a table named Sales to display the highest sales amount for specific customers. SELECT CustomerKey , SalesAmount , **<target1>** OVER(ORDER BY SalesAmount DESC) AS Ranking FROM dbo.Sales WHERE CustomerKey IN (1, 2, 3) You need to ensure that after ties for SalesAmount, the next Sales amount increments the Ranking value by one. The following is an example of the expected result. CustomerKey|SalesAmount|Ranking 1|100|1 2|100|1 1|80|2 Which function should you use for <target1> in the T-SQL statement?

DENSE_RANK()
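The `DENSE_RANK()` behavior can be reproduced with Python's built-in `sqlite3` (window functions require SQLite 3.25+; the rows are hypothetical sample data matching the expected result above): ties share a rank, and the next distinct amount increments the rank by exactly one, with no gap.

```python
import sqlite3  # window functions require SQLite 3.25+

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales (CustomerKey INT, SalesAmount INT)")
con.executemany("INSERT INTO Sales VALUES (?, ?)", [(1, 100), (2, 100), (1, 80)])

rows = con.execute("""
    SELECT CustomerKey, SalesAmount,
           DENSE_RANK() OVER (ORDER BY SalesAmount DESC) AS Ranking
    FROM Sales
    ORDER BY SalesAmount DESC
""").fetchall()
# Ties share rank 1; the next distinct amount gets rank 2 (no gap).
```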

You are working with a large semantic model. You need to identify which columns have contributed the most to the model size so that you can focus design efforts on either removing them from the model or reducing their cardinality. Which external tool can you use to get information about the size of each table and column in a model?

DAX Studio

You are developing a large Microsoft Power BI semantic model that will contain a fact table. The table will contain 400 million rows. You plan to leverage user-defined aggregations to speed up the performance of the most frequently run queries. You need to confirm that the queries are mapped to aggregated data in the tables. Which two tools should you use? Each correct answer presents part of the solution.

DAX Studio & SQL Server Profiler

You have a Fabric tenant that contains a workspace named Workspace1. Workspace1 contains a warehouse and a Dataflow Gen2 query, which ingests the current year's order data from an Azure SQL database. A table named Orders in the source database has 20 years of data. The Orders table contains a column named OrderDateTime that has the DateTime data type. The OrderDateTime column contains values in the format MM/DD/YYYY, HH:MM:SS AM. For example: 01/05/2024, 05:30:00 PM. The Dataflow Gen2 query applies the following steps: Source: Set to the Azure SQL database. Navigation: Set to the Orders table. Split Column by position: Applied to the OrderDateTime column at position 11. This step generates the following two columns: OrderDateTime.1 and OrderDateTime.2. Filtered rows: Applied to OrderDateTime.1 to filter it to the current year. The dataflow takes longer than expected to refresh. For dataflow performance, what should you recommend?

For step 3, first apply the filter transformation to the OrderDateTime column to current year, and then apply the split by position transformation.

You are planning the configuration of a new Fabric tenant. You need to recommend a solution to ensure that reports meet the following requirements: Require authentication for embedded reports. Allow only read-only (live) connections against Fabric capacity cloud semantic models. Which two actions should you recommend performing from the Fabric admin portal? Each correct answer presents part of the solution.

From Tenant settings, disable Allow XMLA endpoints and Analyze in Excel with on-premises semantic models, and also disable Publish to web

You have a Fabric warehouse that contains the following tables: Sales (DateKey, ProductKey, SalesAmount) Product (ProductKey, ProductName) Date (DateKey, Day, Month, Year) There is a 1-to-many relationship between the Date and Sales tables on DateKey. There is a 1-to-many relationship between the Product and Sales tables on ProductKey. You begin to write the following SQL query to analyze the Sales data by ProductName and Year, but only for products that have a yearly SalesAmount of more than $10,000. SELECT p.ProductName, d.Year, SUM(s.SalesAmount) FROM Sales s LEFT JOIN Product p ON s.ProductKey = p.ProductKey LEFT JOIN Date d ON s.DateKey = d.DateKey You need to complete the query to meet the requirements. How should you complete the query?

GROUP BY p.ProductName, d.Year HAVING SUM(s.SalesAmount) > 10000
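The GROUP BY/HAVING logic can be checked with Python's built-in `sqlite3`. A single flattened table stands in for the joined Sales/Product/Date result, and the rows are hypothetical sample data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Flattened stand-in for the joined Sales/Product/Date result (sample data).
con.execute("CREATE TABLE Sales (ProductName TEXT, Year INT, SalesAmount INT)")
con.executemany("INSERT INTO Sales VALUES (?, ?, ?)", [
    ("Bike", 2023, 8000),
    ("Bike", 2023, 5000),    # Bike 2023 total = 13000 -> kept by HAVING
    ("Helmet", 2023, 4000),  # 4000 total -> removed by HAVING
])

# HAVING filters on the aggregated SUM, after GROUP BY; WHERE could not do this.
rows = con.execute("""
    SELECT ProductName, Year, SUM(SalesAmount)
    FROM Sales
    GROUP BY ProductName, Year
    HAVING SUM(SalesAmount) > 10000
""").fetchall()
```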

You have a Fabric lakehouse named Lakehouse1. You write the following T-SQL statement targeting the SQL analytics endpoint of Lakehouse1. SELECT CalendarYear , SalesAmount , **<target1>** (SalesAmount, **<target2>**, 0) OVER(ORDER BY CalendarYear) AS PreviousSalesAmount FROM dbo.Sales WHERE CalendarYear BETWEEN 2020 AND 2023 You need to compare the sales amount and ensure that the statement displays the value from the previous year in the PreviousSalesAmount column. The following is an example of the expected result. CalendarYear|SalesAmount|PreviousSalesAmount 2020|100|0 2021|200|100 2022|150|200 2023|500|150 Which function should you insert for <target1>, and which numeric value should you provide for <target2> in the statement?

LAG and 1
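The completed statement can be run against sample data with Python's built-in `sqlite3` (`LAG()` needs SQLite 3.25+; the rows below mirror the expected result in the question): `LAG(SalesAmount, 1, 0)` returns the previous row's value in CalendarYear order, defaulting to 0 for the first row.

```python
import sqlite3  # LAG() needs SQLite 3.25+

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales (CalendarYear INT, SalesAmount INT)")
con.executemany("INSERT INTO Sales VALUES (?, ?)",
                [(2020, 100), (2021, 200), (2022, 150), (2023, 500)])

# LAG(SalesAmount, 1, 0): value from 1 row back, with 0 as the default
# when there is no previous row (the first year).
rows = con.execute("""
    SELECT CalendarYear, SalesAmount,
           LAG(SalesAmount, 1, 0) OVER (ORDER BY CalendarYear) AS PreviousSalesAmount
    FROM Sales
    WHERE CalendarYear BETWEEN 2020 AND 2023
    ORDER BY CalendarYear
""").fetchall()
```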

You are designing a semantic model for a Microsoft Power BI report. You have a table named Employee that contains the following columns: EmployeeID, EmployeeName, and EmployeePosition. You have a table named Contract that contains the following columns: EmployeeID and ContractType. You plan to denormalize both tables and include the ContractType attribute. You need to ensure that all the rows in the Employee table are preserved and include any matching rows from the Contract table. Which type of join should you specify in the Power Query Merge queries transformation?

Left outer

You have a Fabric warehouse. You have an Azure SQL database that contains a fact table named Sales and a second table named ExceptionRecords. Both tables contain a unique key column named Record ID. You plan to ingest the Sales table into the warehouse. You need to use Dataflow Gen2 to configure a merge type to ensure that the Sales table excludes any records found in the ExceptionRecords table, and that query folding is maintained. Which applied steps should you use?

Merge (left anti join) applied step, and then the expand columns applied step

You have a Fabric workspace that contains a lakehouse named Lakehouse1. You need to create a data pipeline and ingest data into Lakehouse1 by using the Copy data activity. Which properties on the General tab are mandatory for the activity?

Name only

You have a Fabric workspace named Workspace1 that contains a data pipeline named Pipeline1. You plan to use the Office 365 Outlook activity to send an email message each time Pipeline1 experiences issues with pipeline connectors. You need to connect the Office 365 Outlook activity to each main pipeline activity. The solution must minimize the number of email messages sent by the activity. Which connection should you use for the Office 365 Outlook activity?

On fail

You have a Fabric workspace and a Microsoft Power BI semantic model that contains the following tables: Sales (ProductKey, CustomerKey, DateKey, SalesAmount) Product (ProductKey, ProductName, ProductCategory) Date (DateKey, Date, Month, Year) Customer (CustomerKey, CustomerName, CustomerCity) The Product table has a 1-to-many relationship with the Sales table based on ProductKey. The Customer table has a 1-to-many relationship with the Sales table based on CustomerKey. The Date table has a 1-to-many relationship with the Sales table based on DateKey. You need to create a Power BI report so that end users can use a single column chart to analyze SalesAmount by ProductCategory, Year, or CustomerCity. The solution must minimize development effort. What should you do?

Set up a Fields parameter with ProductCategory, Year, and CustomerCity. Use the Fields parameter in the visual.

You have a Fabric tenant that contains a workspace named Workspace1. Workspace1 contains a lakehouse, a data pipeline, a notebook, and several Microsoft Power BI reports. A user named User1 plans to use SQL to access the lakehouse to analyze data. User1 must have the following access: User1 must have read-only access to the lakehouse. User1 must NOT be able to access the rest of the items in Workspace1. User1 must NOT be able to use Spark to query the underlying files in the lakehouse. You need to configure access for User1. What should you do?

Share the lakehouse with User1 directly and select Read all SQL Endpoint data.

You have a Fabric workspace that contains a complex semantic model for a Microsoft Power BI report. You need to optimize the semantic model for analytical queries and use denormalization to reduce the model complexity and the number of joins between tables. Which tables should you denormalize?

Snowflaked dimension tables

You have a Fabric lakehouse named Lakehouse1 that contains a Dataflow Gen2 query. You have an Azure SQL database that contains a type 2 slowly changing dimension table named CustomerMaster. CustomerMaster contains the following columns: Customer ID (Number), EffectiveDate (Date), Address (Text), and Status (Text). You plan to ingest CustomerMaster into Lakehouse1. The solution must only keep the latest (unique) record per Customer ID. Which two applied steps should you use? Each correct answer presents part of the solution.

Sort on Customer ID and EffectiveDate (descending), then remove duplicates on the Customer ID column; Remove duplicates keeps the first occurrence, so the descending sort ensures the latest record survives.
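The keep-latest logic of the two applied steps can be sketched in plain Python (the sample rows are invented for illustration):

```python
# Hypothetical CustomerMaster rows from a type 2 SCD table
rows = [
    {"CustomerID": 1, "EffectiveDate": "2023-01-01", "Status": "Inactive"},
    {"CustomerID": 1, "EffectiveDate": "2024-06-01", "Status": "Active"},
    {"CustomerID": 2, "EffectiveDate": "2022-03-15", "Status": "Active"},
]

# Applied step 1: sort so the newest EffectiveDate comes first per customer
# (ISO date strings sort lexicographically in chronological order)
rows.sort(key=lambda r: (r["CustomerID"], r["EffectiveDate"]), reverse=True)

# Applied step 2: remove duplicates on Customer ID, keeping the first
# (i.e. latest) row encountered, mirroring Power Query's behavior
seen, latest = set(), []
for r in rows:
    if r["CustomerID"] not in seen:
        seen.add(r["CustomerID"])
        latest.append(r)

print(latest)  # one row per CustomerID, each with the latest EffectiveDate
```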

You have a Fabric workspace. The workspace contains a Dataflow Gen2 query that displays dimensional product information. The query table contains a column named Product ID/Name that is a concatenation of Product ID and Product Name values. You need to use an applied step in Microsoft Power Query Editor to create a new column for Product ID and Product Name. The solution must use a single command to create two new columns and remove the original combined (Product ID/Name) column. Which applied step should you use?

Split Column
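The Split Column step writes a single `Table.SplitColumn` call into the query's M code. A sketch of what it generates, assuming "/" is the delimiter (step, source, and column names are illustrative); the original combined column is replaced in place:

```m
#"Split Column" = Table.SplitColumn(
    Source,
    "Product ID/Name",
    Splitter.SplitTextByDelimiter("/"),
    {"Product ID", "Product Name"}
)
```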

You are developing a Microsoft Power BI semantic model. Two tables in the data model are not connected in a physical relationship. You need to establish a virtual relationship between the tables. Which DAX function should you use?

TREATAS()
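A hedged sketch of a virtual relationship: the Budget and Sales tables and their columns below are hypothetical and physically unrelated, and TREATAS applies Budget's region keys as a filter on Sales:

```dax
Sales for Budgeted Regions :=
CALCULATE (
    SUM ( Sales[SalesAmount] ),
    TREATAS ( VALUES ( 'Budget'[RegionKey] ), Sales[RegionKey] )
)
```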

You are developing a complex semantic model that contains more than 20 date columns. You need to conform the date format for all the columns as quickly as possible. What should you use?

Tabular Editor
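Tabular Editor's advanced scripting (C#) can apply one format string to every date column in a single pass, which is what makes it the fastest option here. A sketch, where "yyyy-MM-dd" is an assumed target format:

```csharp
// Tabular Editor advanced script: set a uniform format string
// on every DateTime column in the model
foreach (var column in Model.AllColumns
         .Where(c => c.DataType == DataType.DateTime))
{
    column.FormatString = "yyyy-MM-dd";
}
```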

PATH()

returns a delimited text string containing the identifiers of all the parents of the current row in a parent-child hierarchy, starting with the oldest ancestor
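A sketch of a calculated column over a hypothetical Employee table with a parent-child relationship (column names are illustrative):

```dax
-- e.g. returns "1|4|12" for employee 12, who reports to 4, who reports to 1
EmployeePath = PATH ( Employee[EmployeeKey], Employee[ParentEmployeeKey] )
```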

You have Azure Databricks tables and a Fabric lakehouse. You need to create a new Fabric artifact to combine data from both architectures. The solution must use data pipelines for the Azure Databricks data and shortcuts for the existing Fabric lakehouse. What Fabric artifact should you create?

a lakehouse

You have a Fabric warehouse. You have an Azure SQL database that contains two tables named ProductCategory and Product. Each table contains a column named ProductCategoryKey. You plan to ingest the tables into the warehouse using Dataflow Gen2. You need to merge the tables into a single table named Product. The combined table must contain all the rows from the Product table and the matching rows from the ProductCategory table. Which join configuration should you use?

a left outer join Product to ProductCategory
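The keep-all-left-rows semantics of the left outer join can be sketched in plain Python (sample data invented for illustration):

```python
# Invented sample data: ProductCategoryKey -> category name lookup
product_category = {1: "Bikes", 2: "Accessories"}
# Product rows as (ProductName, ProductCategoryKey); key 9 has no match
products = [("Road Bike", 1), ("Helmet", 2), ("Gloves", 9)]

# Left outer join: every Product row survives; unmatched keys get None,
# just as the unmatched rows get nulls in the Dataflow Gen2 merge
joined = [
    (name, key, product_category.get(key))  # .get returns None when no match
    for name, key in products
]

print(joined)
```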

You have a Fabric tenant that contains a lakehouse. On a local computer, you have a CSV file that contains a static list of company office locations. You need to recommend a method to perform a one-time copy to ingest the CSV file into the lakehouse. The solution must minimize administrative effort. Which method should you recommend?

a local file upload by using Lakehouse explorer

best load for complex data transformations

notebooks

You have a Fabric workspace named Workspace1. You plan to create a data pipeline to ingest data into Workspace1. You need to ensure that the pipeline activity supports parameterization. Which two activities support parameterization in the data pipeline UI? Each correct answer presents part of the solution.

notebooks & SQL stored procedures

You are designing a Microsoft Power BI semantic model that contains a measure named CompanyCosts. You need to restrict access to CompanyCosts and ensure that only a user named User1 can view the measure in reports. What should you implement?

object-level security (OLS)

Type 0 slowly changing dimension (SCD)

passive method in which the attribute always retains its original value; changes in the source are ignored

You have the following measure that you are reviewing as part of a model audit. RANKX( ALL( 'Product'[Product Name] ), [Sales],, DESC, Skip ) You need to identify what the measure is calculating. Which statement accurately describes the DAX measure?

ranks the product names by [Sales] in descending order, so the largest value gets rank 1; tied product names share the same rank, and the next rank after a tie is the tied rank plus the number of tied values (e.g. 1, 2, 2, 4)
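The Skip tie behavior can be simulated in a few lines of Python (the sample values are invented):

```python
def rankx_skip(values):
    """Rank values descending with RANKX's Skip tie behavior:
    tied values share a rank, and the rank after a tie jumps
    ahead by the number of tied values."""
    ordered = sorted(values, reverse=True)
    # the 1-based position of a value's first occurrence is its Skip rank
    return {v: ordered.index(v) + 1 for v in set(values)}

ranks = rankx_skip([100, 90, 90, 80])
print(ranks)  # {100: 1, 90: 2, 80: 4} -- rank 3 is skipped after the tie
```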

VACUUM in Lakehouse Delta lake table maintenance

removes old files no longer referenced by a Delta table log
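In a Fabric lakehouse, VACUUM can be issued as Spark SQL against a Delta table; the table name is illustrative, and 168 hours matches Delta's default 7-day retention threshold:

```sql
-- Remove unreferenced data files older than the retention window
VACUUM FactSales RETAIN 168 HOURS
```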

