DP-203

Box 1: PIVOT Box 2: CAST

You have an Apache Spark DataFrame named temperatures. A sample of the data is shown in the following table. You need to produce the following table by using a Spark SQL query. How should you complete the query? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. Values: - CAST - COLLATE - CONVERT - FLATTEN - PIVOT - UNPIVOT
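
A minimal Spark SQL sketch of the completed query, assuming illustrative month/temperature columns (the sample table is not reproduced here): PIVOT turns the month values into columns, and CAST shapes the aggregated temperature.

SELECT * FROM temperatures
PIVOT (
    CAST(AVG(temperature) AS DECIMAL(4, 1))       -- aggregate each pivoted cell, then cast it
    FOR month IN ('Jan' AS Jan, 'Feb' AS Feb, 'Mar' AS Mar)
);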

Box 1: CREATE EXTERNAL TABLE Box 2: OPENROWSET

You are building a database in an Azure Synapse Analytics serverless SQL pool. You have data stored in Parquet files in an Azure Data Lake Storage Gen2 container. Records are structured as shown in the following sample. {"id": 123, "address_housenumber": "19c", "address_line": "Memory Lane", "applicant1_name": "Jane", "applicant2_name": "Dev"} The records contain two applicants at most. You need to build a table that includes only the address fields. How should you complete the Transact-SQL statement?
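
A hedged sketch of the completed statement, assuming an external data source and Parquet file format already exist in the database (all object names and paths below are illustrative):

CREATE EXTERNAL TABLE dbo.Addresses
WITH (
    LOCATION = 'addresses/',              -- output folder for the new table's files
    DATA_SOURCE = applications_ds,        -- assumed external data source for the container
    FILE_FORMAT = parquet_format          -- assumed Parquet external file format
)
AS
SELECT id, address_housenumber, address_line
FROM OPENROWSET(
        BULK 'applications/*.parquet',    -- assumed path to the source Parquet files
        DATA_SOURCE = 'applications_ds',
        FORMAT = 'PARQUET'
     ) AS source;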

Box 1: DATEDIFF( Box 2: LAST

You are building an Azure Stream Analytics job to identify how much time a user spends interacting with a feature on a webpage. The job receives events based on user actions on the webpage. Each row of data represents an event. Each event has a type of either 'start' or 'end'. You need to calculate the duration between start and end events. How should you complete the query?
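
A sketch based on the documented Stream Analytics duration pattern, which pairs each 'end' event with the most recent 'start' event; the input and column names ([user], feature, Time, Event) are assumptions because the exhibit is not shown:

SELECT
    [user],
    feature,
    DATEDIFF(second,
             LAST(Time) OVER (PARTITION BY [user], feature
                              LIMIT DURATION(hour, 1)
                              WHEN Event = 'start'),
             Time) AS duration
FROM input TIMESTAMP BY Time
WHERE Event = 'end'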

Box 1: Partition Box 2: [TransactionDateID]

You are building an Azure Synapse Analytics dedicated SQL pool that will contain a fact table for transactions from the first half of the year 2020. You need to ensure that the table meets the following requirements: - Minimizes the processing time to delete data that is older than 10 years - Minimizes the I/O for queries that use year-to-date values How should you complete the Transact-SQL statement?
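
A hedged sketch of the kind of completed statement this points to: partitioning on [TransactionDateID] lets stale data be removed a partition at a time, and monthly date ranges keep year-to-date scans narrow. Column names and boundary values are assumptions:

CREATE TABLE dbo.FactTransaction
(
    TransactionKey    int   NOT NULL,
    TransactionDateID int   NOT NULL,
    CustomerID        int   NOT NULL,
    Amount            money NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(TransactionKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (TransactionDateID RANGE RIGHT FOR VALUES
               (20200101, 20200201, 20200301, 20200401, 20200501, 20200601))
);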

A. 40

You are designing a partition strategy for a fact table in an Azure Synapse Analytics dedicated SQL pool. The table has the following specifications: - Contains sales data for 20,000 products. - Uses hash distribution on a column named ProductID. - Contains 2.4 billion records for the years 2019 and 2020. Which number of partition ranges provides optimal compression and performance for the clustered columnstore index? A. 40 B. 240 C. 400 D. 2,400
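
The reasoning, briefly: a dedicated SQL pool spreads every table across 60 distributions, and a clustered columnstore index compresses best when each partition holds roughly 1 million rows per distribution, so 2,400,000,000 / (60 × 1,000,000) = 40 partitions.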

A. surrogate primary key B. effective start date E. effective end date

You are designing a slowly changing dimension (SCD) for supplier data in an Azure Synapse Analytics dedicated SQL pool. You plan to keep a record of changes to the available fields. The supplier data contains the following columns. Which three additional columns should you add to the data to create a Type 2 SCD? A. surrogate primary key B. effective start date C. business key D. last modified date E. effective end date F. foreign key
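
A minimal sketch of how the three added columns typically look in a Type 2 dimension; the table and column names are illustrative, not taken from the exhibit:

CREATE TABLE dbo.DimSupplier
(
    SupplierSK         int IDENTITY(1, 1) NOT NULL,  -- surrogate primary key
    SupplierBK         nvarchar(20)       NOT NULL,  -- existing business key from the source
    SupplierName       nvarchar(100)      NOT NULL,
    EffectiveStartDate date               NOT NULL,  -- when this version became current
    EffectiveEndDate   date               NULL       -- NULL (or 9999-12-31) for the current row
);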

D. Apache Parquet

You are designing an Azure Data Lake Storage solution that will transform raw JSON files for use in an analytical workload. You need to recommend a format for the transformed files. The solution must meet the following requirements: - Contain information about the data types of each column in the files. - Support querying a subset of columns in the files. - Support read-heavy analytical workloads. - Minimize the file size. What should you recommend? A. JSON B. CSV C. Apache Avro D. Apache Parquet

Yes Yes No

You have an Azure Data Lake Storage Gen2 account that contains a container named container1. You have an Azure Synapse Analytics serverless SQL pool that contains a native external table named dbo.Table1. The source data for dbo.Table1 is stored in container1. The folder structure of container1 is shown in the following exhibit. For each of the following statements, select Yes if the statement is true. Otherwise, select No. 1. When selecting all the rows in dbo.Table1, data from the mydata2.csv file will be returned. 2. When selecting all the rows in dbo.Table1, data from the mydata3.csv file will be returned. 3. When selecting all the rows in dbo.Table1, data from the _mydata4.csv file will be returned.

A. \DataSource\SubjectArea\YYYY\WW\FileData_YYYY_MM_DD.parquet

You are designing the folder structure for an Azure Data Lake Storage Gen2 account. You identify the following usage patterns: - Users will query data by using Azure Synapse Analytics serverless SQL pools and Azure Synapse Analytics serverless Apache Spark pools. - Most queries will include a filter on the current year or week. - Data will be secured by data source. You need to recommend a folder structure that meets the following requirements: - Supports the usage patterns - Simplifies folder security - Minimizes query times Which folder structure should you recommend? A. \DataSource\SubjectArea\YYYY\WW\FileData_YYYY_MM_DD.parquet B. \DataSource\SubjectArea\YYYY-WW\FileData_YYYY_MM_DD.parquet C. DataSource\SubjectArea\WW\YYYY\FileData_YYYY_MM_DD.parquet D. \YYYY\WW\DataSource\SubjectArea\FileData_YYYY_MM_DD.parquet E. WW\YYYY\SubjectArea\DataSource\FileData_YYYY_MM_DD.parquet

D. /{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.csv

You are designing the folder structure for an Azure Data Lake Storage Gen2 container. Users will query data by using a variety of services including Azure Databricks and Azure Synapse Analytics serverless SQL pools. The data will be secured by subject area. Most queries will include data from the current year or current month. Which folder structure should you recommend to support fast queries and simplified folder security? A. /{SubjectArea}/{DataSource}/{DD}/{MM}/{YYYY}/{FileData}_{YYYY}_{MM}_{DD}.csv B. /{DD}/{MM}/{YYYY}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv C. /{YYYY}/{MM}/{DD}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv D. /{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.csv

C. Create an external table that contains a subset of columns from the Parquet files.

You are implementing a batch dataset in the Parquet format. Data files will be produced by using Azure Data Factory and stored in Azure Data Lake Storage Gen2. The files will be consumed by an Azure Synapse Analytics serverless SQL pool. You need to minimize storage costs for the solution. What should you do? A. Use Snappy compression for the files. B. Use OPENROWSET to query the Parquet files. C. Create an external table that contains a subset of columns from the Parquet files. D. Store all data as string in the Parquet files.

D. Only CSV files that have file names beginning with "tripdata_2020".

You are performing exploratory analysis of the bus fare data in an Azure Data Lake Storage Gen2 account by using an Azure Synapse Analytics serverless SQL pool. You execute the Transact-SQL query shown in the following exhibit. What do the query results include? A. Only CSV files in the tripdata_2020 subfolder. B. All files that have file names beginning with "tripdata_2020". C. All CSV files that have file names that contain "tripdata_2020". D. Only CSV files that have file names beginning with "tripdata_2020".

Box 1: MAX Box 2: TumblingWindow Box 3: DATEDIFF

You are processing streaming data from vehicles that pass through a toll booth. You need to use Azure Stream Analytics to return the license plate, vehicle make, and hour the last vehicle passed during each 10-minute window. How should you complete the query?
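
A sketch built on the documented 'last event in the window' pattern, which uses all three answer values; the input name and columns are assumptions:

WITH LastInWindow AS
(
    SELECT MAX(Time) AS LastEventTime
    FROM Input TIMESTAMP BY Time
    GROUP BY TumblingWindow(minute, 10)
)
SELECT
    Input.LicensePlate,
    Input.Make,
    Input.Time
FROM Input TIMESTAMP BY Time
INNER JOIN LastInWindow
    ON DATEDIFF(minute, Input, LastInWindow) BETWEEN 0 AND 10
   AND Input.Time = LastInWindow.LastEventTime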

B. a materialized view

You build a data warehouse in an Azure Synapse Analytics dedicated SQL pool. Analysts write a complex SELECT query that contains multiple JOIN and CASE statements to transform data for use in inventory reports. The inventory reports will use the data and additional WHERE parameters depending on the report. The reports will be produced once daily. You need to implement a solution to make the dataset available for the reports. The solution must minimize query times. What should you implement? A. an ordered clustered columnstore index B. a materialized view C. result set caching D. a replicated table
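
A hedged sketch of the kind of materialized view that pre-computes the joins and aggregations once so the daily reports only add their own WHERE filters; every object name here is illustrative:

CREATE MATERIALIZED VIEW dbo.mvInventoryBase
WITH (DISTRIBUTION = HASH(ProductKey))
AS
SELECT
    p.ProductKey,
    w.Region,
    SUM(i.Quantity) AS OnHandQuantity,
    COUNT_BIG(*)    AS RowsInGroup        -- an aggregate is required in a materialized view definition
FROM dbo.FactInventory AS i
JOIN dbo.DimProduct    AS p ON i.ProductKey   = p.ProductKey
JOIN dbo.DimWarehouse  AS w ON i.WarehouseKey = w.WarehouseKey
GROUP BY p.ProductKey, w.Region;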

Type: Tumbling window Additional Properties: Recurrence: 30 minutes, Start time: 2021-01-01T00:00, Delay: 2 minutes

You build an Azure Data Factory pipeline to move data from an Azure Data Lake Storage Gen2 container to a database in an Azure Synapse Analytics dedicated SQL pool. Data in the container is stored in the following folder structure: /in/{YYYY}/{MM}/{DD}/{HH}/{mm} The earliest folder is /in/2021/01/01/00/00. The latest folder is /in/2021/01/15/01/45. You need to configure a pipeline trigger to meet the following requirements: ✑ Existing data must be loaded. ✑ Data must be loaded every 30 minutes. ✑ Late-arriving data of up to two minutes must be included in the load for the time at which the data should have arrived. How should you configure the pipeline trigger? Type: - Event - On-demand - Schedule - Tumbling window Additional Properties: - Prefix: /in/, Event: Blob created - Recurrence: 30 minutes, Start time: 2021-01-01T00:00 - Recurrence: 30 minutes, Start time: 2021-01-01T00:00, Delay: 2 minutes - Recurrence: 32 minutes, Start time: 2021-01-15T01:45

Transform data for the dimension tables by: Denormalizing to a second normal form For the primary key columns in the dimension tables, use: New IDENTITY columns

You have a Microsoft SQL Server database that uses a third normal form schema. You plan to migrate the data in the database to a star schema in an Azure Synapse Analytics dedicated SQL pool. You need to design the dimension tables. The solution must optimize read operations. What should you include in the solution? Transform data for the dimension tables by: - Maintaining to a third normal form - Normalizing to a fourth normal form - Denormalizing to a second normal form For the primary key columns in the dimension tables, use: - New IDENTITY columns - A new computed column - The business key column from the source system

Distribution: Round-robin Indexing: Heap Partitioning: None

You have a SQL pool in Azure Synapse. You plan to load data from Azure Blob storage to a staging table. Approximately 1 million rows of data will be loaded daily. The table will be truncated before each daily load. You need to create the staging table. The solution must minimize how long it takes to load the data to the staging table. How should you configure the table? Distribution: - Hash - Replicated - Round-robin Indexing: - Clustered - Clustered columnstore - Heap Partitioning: - Date - None
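
A minimal sketch of a staging table tuned for fast loads, following the general guidance that staging tables load fastest as round-robin heaps with no partitions (the column list is illustrative):

CREATE TABLE stg.DailySales
(
    SaleID   int            NOT NULL,
    SaleDate date           NOT NULL,
    Amount   decimal(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    HEAP
);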

Dim_Customer: Replicated Dim_Employee: Replicated Dim_Time: Replicated Fact_DailyBookings: Hash Distributed

You have a data model that you plan to implement in a data warehouse in Azure Synapse Analytics as shown in the following exhibit. All the dimension tables will be less than 2 GB after compression, and the fact table will be approximately 6 TB. The dimension tables will be relatively static with very few data inserts and updates. Which type of table should you use for each table? Dim_Customer: - Hash Distributed - Round-robin - Replicated Dim_Employee: - Hash Distributed - Round-robin - Replicated Dim_Time: - Hash Distributed - Round-robin - Replicated Fact_DailyBookings: - Hash Distributed - Round-robin - Replicated

C. [ManagerEmployeeKey] [int] NULL

You have a table in an Azure Synapse Analytics dedicated SQL pool. The table was created by using the following Transact-SQL statement. CREATE TABLE [dbo].[DimEmployee] ( [EmployeeKey] [int] IDENTITY(1, 1) NOT NULL, [EmployeeID] [int] NOT NULL, [FirstName] [varchar](100) NOT NULL, [LastName] [varchar](100) NOT NULL, [JobTitle] [varchar](100) NULL, [LastHireDate] [date] NULL, [StreetAddress] [varchar](500) NOT NULL, [City] [varchar](200) NOT NULL, [StateProvince] [varchar](50) NOT NULL, [PortalCode] [varchar](10) NOT NULL) You need to alter the table to meet the following requirements: - Ensure that users can identify the current manager of employees - Support creating an employee reporting hierarchy for your entire company - Provide fast lookup of the managers' attributes such as name & job title Which column should you add to the table? A. [ManagerEmployeeID] [smallint] NULL B. [ManagerEmployeeKey] [smallint] NULL C. [ManagerEmployeeKey] [int] NULL D. [ManagerName] [varchar] (200) NULL

- Create an empty table named SalesFact_Work that has the same schema as SalesFact - Switch the partition containing the stale data from SalesFact to SalesFact_Work - Drop the SalesFact_Work table

You have a table named SalesFact in an enterprise data warehouse in Azure Synapse Analytics. SalesFact contains sales data from the past 36 months and has the following characteristics: - Is partitioned by month - Contains one billion rows - Has clustered columnstore index At the beginning of each month, you need to remove data from SalesFact that is older than 36 months as quickly as possible. Which three actions should you perform in sequence in a stored procedure? - Switch the partition containing the stale data from SalesFact to SalesFact_Work - Truncate the partition containing the stale data - Drop the SalesFact_Work table - Create an empty table named SalesFact_Work that has the same schema as SalesFact - Execute a DELETE statement where the value in the Date column is more than 36 months ago - Copy the data to a new table by using CREATE TABLE AS SELECT (CTAS)
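
A hedged sketch of the three steps; the partition number, distribution column, and boundary values are assumptions, and the work table must match SalesFact's schema and partition boundaries for the switch to succeed:

-- 1. Empty work table with the same schema and partitioning as SalesFact
CREATE TABLE dbo.SalesFact_Work
WITH
(
    DISTRIBUTION = HASH(ProductKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20210101, 20210201, 20210301))  -- abbreviated; must mirror SalesFact
)
AS SELECT * FROM dbo.SalesFact WHERE 1 = 2;

-- 2. Metadata-only move of the stale partition (assumed to be partition 1)
ALTER TABLE dbo.SalesFact SWITCH PARTITION 1 TO dbo.SalesFact_Work PARTITION 1;

-- 3. Discard the stale data
DROP TABLE dbo.SalesFact_Work;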

A. Pipeline1 and Pipeline2 succeeded.

You have an Azure Data Factory instance that contains two pipelines named Pipeline1 and Pipeline2. Pipeline1 has the activities shown in the following exhibit. Stored procedure1 -> Set variable1 Pipeline2 has the activities shown in the following exhibit. Execute Pipeline1 -> Set variable1 You execute Pipeline2, and Stored procedure1 in Pipeline1 fails. What is the status of the pipeline runs? A. Pipeline1 and Pipeline2 succeeded. B. Pipeline1 and Pipeline2 failed. C. Pipeline1 succeeded and Pipeline2 failed. D. Pipeline1 failed and Pipeline2 succeeded.

D. an annotation

You have an Azure Data Factory that contains 10 pipelines. You need to label each pipeline with its main purpose of either ingest, transform, or load. The labels must be available for grouping and filtering when using the monitoring experience in Data Factory. What should you add to each pipeline? A. a resource tag B. a correlation ID C. a run group ID D. an annotation

df.write: .partitionBy Box 2: ("StoreID", "Year", "Month", "Day", "Hour") .mode("append"): .parquet("/Purchases")

You plan to develop a dataset named Purchases by using Azure Databricks. Purchases will contain the following columns: - ProductID - ItemPrice - LineTotal - Quantity - StoreID - Minute - Month - Hour - Year - Day You need to store the data to support hourly incremental load pipelines that will vary for each Store ID. The solution must minimize storage costs. How should you complete the code? df.write: - .bucketBy - .partitionBy - .range - .sortBy Box 2: - ("*") - ("StoreID", "Hour") - ("StoreID", "Year", "Month", "Day", "Hour") .mode("append") - .csv("/Purchases") - .json("/Purchases") - .parquet("/Purchases") - .saveAsTable("/Purchases")

D. zone-redundant storage (ZRS)

You plan to implement an Azure Data Lake Gen 2 storage account. You need to ensure that the data lake will remain available if a data center fails in the primary Azure region. The solution must minimize costs. Which type of replication should you use for the storage account? A. geo-redundant storage (GRS) B. geo-zone-redundant storage (GZRS) C. locally-redundant storage (LRS) D. zone-redundant storage (ZRS)

The files are moved to cool storage after 30 days The storage policy applies to container1/contoso.csv

You store files in an Azure Data Lake Storage Gen2 container. The container has the storage policy shown in the following exhibit. Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic. The files are ___________ after 30 days: - deleted from the container - moved to the archive storage - moved to cool storage - moved to hot storage The storage policy applies to ______________ : - container1/contoso.csv - container1/docs/contoso.json - container1/mycontoso/contoso.csv

Copy Behavior: Merge Files Sink File Type: Parquet

You use Azure Data Factory to prepare data to be queried by Azure Synapse Analytics serverless SQL pools. Files are initially ingested into an Azure Data Lake Storage Gen2 account as 10 small JSON files. Each file contains the same data attributes and data from a subsidiary of your company. You need to move the files to a different folder and transform the data to meet the following requirements: - Provide the fastest possible query times. - Automatically infer the schema from the underlying files. How should you configure the Data Factory copy activity?

D. Scale the SU count for the job up. F. Implement query parallelization by partitioning the data input.

A company has a real-time data analysis solution that is hosted on Microsoft Azure. The solution uses Azure Event Hub to ingest data and an Azure Stream Analytics cloud job to analyze the data. The cloud job is configured to use 120 Streaming Units (SU). You need to optimize performance for the Azure Stream Analytics job. Which two actions should you perform? A. Implement event ordering. B. Implement Azure Stream Analytics user-defined functions (UDF). C. Implement query parallelization by partitioning the data output. D. Scale the SU count for the job up. E. Scale the SU count for the job down. F. Implement query parallelization by partitioning the data input.

No

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are designing an Azure Stream Analytics solution that will analyze Twitter data. You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once. Solution: You use a hopping window that uses a hop size of 10 seconds and a window size of 10 seconds. Does this meet the goal? A. Yes B. No

No

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are designing an Azure Stream Analytics solution that will analyze Twitter data. You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once. Solution: You use a hopping window that uses a hop size of 5 seconds and a window size of 10 seconds. Does this meet the goal? A. Yes B. No

EventCategory: DimEvent ChannelGrouping: DimChannel TotalEvents: FactEvents

From a website analytics system, you receive data extracts about user interactions such as downloads, link clicks, form submissions, and video plays. The data contains the following columns. You need to design a star schema to support analytical queries of the data. The star schema will contain four tables including a date dimension. To which table should you add each column? EventCategory: - DimChannel - DimDate - DimEvent - FactEvents ChannelGrouping: - DimChannel - DimDate - DimEvent - FactEvents TotalEvents: - DimChannel - DimDate - DimEvent - FactEvents

Box 1: select Box 2: explode Box 3: alias

How should you complete the PySpark code? Values: - alias - array_union - createDataFrame - explode - select - translate

Box 1: LAG Box 2: LIMIT DURATION

You are building an Azure Stream Analytics query that will receive input data from Azure IoT Hub and write the results to Azure Blob storage. You need to calculate the difference in the number of readings per sensor per hour. How should you complete the query?
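
A sketch of the documented LAG pattern behind this answer; the sensor and column names are assumptions:

SELECT
    sensorId,
    reading - LAG(reading) OVER (PARTITION BY sensorId LIMIT DURATION(hour, 1)) AS readingDelta
FROM input TIMESTAMP BY readingTime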

Path Pattern: {date}/product.csv Date Format: YYYY-MM-DD

You are building an Azure Stream Analytics job that queries reference data from a product catalog file. The file is updated daily. The reference data input details for the file are shown in the Input exhibit. Path Pattern: product.csv Date Format: YYYY/MM/DD Time Format: HH You need to configure the Stream Analytics job to pick up the new reference data. What should you configure? Path Pattern: - {date}/product.csv - {date}/{time}/product.csv - product.csv - */product.csv Date Format: - MM/DD/YYYY - YYYY/MM/DD - YYYY-DD-MM - YYYY-MM-DD

C. Azure Stream Analytics

You are designing a statistical analysis solution that will use custom proprietary Python functions on near real-time data from Azure Event Hubs. You need to recommend which Azure service to use to perform the statistical analysis. The solution must minimize latency. What should you recommend? A. Azure Synapse Analytics B. Azure Databricks C. Azure Stream Analytics D. Azure SQL Database

First Week: Hot After One Month: Cool After One Year: Cool

You are designing an application that will store petabytes of medical imaging data. When the data is first created, the data will be accessed frequently during the first week. After one month, the data must be accessible within 30 seconds, but files will be accessed infrequently. After one year, the data will be accessed infrequently but must be accessible within five minutes. You need to select a storage strategy for the data. The solution must minimize costs. Which storage tier should you use for each time frame? First Week: - Archive - Cool - Hot After One Month: - Archive - Cool - Hot After One Year: - Archive - Cool - Hot

D. Azure Databricks

You are planning a solution to aggregate streaming data that originates in Apache Kafka and is output to Azure Data Lake Storage Gen2. The developers who will implement the stream processing solution use Java. Which service should you recommend using to process the streaming data? A. Azure Event Hubs B. Azure Data Factory C. Azure Stream Analytics D. Azure Databricks

Report 1: Parquet Report 2: Avro

You are planning the deployment of Azure Data Lake Storage Gen2. You have the following two reports that will access the data lake: - Report1: Reads three columns from a file that contains 50 columns. - Report2: Queries a single record based on a timestamp. You need to recommend in which format to store the data in the data lake to support the reports. The solution must minimize read times. What should you recommend for each report? To answer, select the appropriate options in the answer area. Report 1: - Avro - CSV - Parquet - TSV Report 2: - Avro - CSV - Parquet - TSV

Yes No Yes

The following code segment is used to create an Azure Databricks cluster. For each of the following statements, select Yes if the statement is true. Otherwise, select No. - The Databricks cluster supports multiple concurrent users. - The Databricks cluster minimizes costs when running scheduled jobs that execute notebooks. - The Databricks cluster supports the creation of a Delta Lake Table.

To minimize storage costs: Store the infrastructure logs in the Cool access tier and the application logs in the Archive access tier To delete logs automatically: Azure Blob storage lifecycle management rules

You have an Azure Data Lake Storage Gen2 account named account1 that stores logs as shown in the following table. You do not expect that the logs will be accessed during the retention periods. You need to recommend a solution for account1 that meets the following requirements: - Automatically deletes the logs at the end of each retention period - Minimizes storage costs What should you include in the recommendation? To minimize storage costs: - Store the infrastructure logs and the application logs in the Archive access tier - Store the infrastructure logs and the application logs in the Cool access tier - Store the infrastructure logs in the Cool access tier and the application logs in the Archive access tier To delete logs automatically: - Azure Data Factory pipelines - Azure Blob storage lifecycle management rules - Immutable Azure Blob storage time-based retention policies

1. Mount the Data Lake Storage onto DBFS. 2. Read the file into a data frame. 3. Perform transformations on the data frame. 4. Specify a temporary folder to stage the data. 5. Write the results to a table in Azure Synapse.

You have an Azure Data Lake Storage Gen2 account that contains a JSON file for customers. The file contains two attributes named FirstName and LastName. You need to copy the data from the JSON file to an Azure Synapse Analytics table by using Azure Databricks. A new column must be created that concatenates the FirstName and LastName values. You create the following components: ✑ A destination table in Azure Synapse ✑ An Azure Blob storage container ✑ A service principal Which five actions should you perform in sequence next in a Databricks notebook? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Five-year old data: Move to the cool storage Seven-year old data: Move to the archive storage

You have an Azure Data Lake Storage Gen2 container. Data is ingested into the container, and then transformed by a data integration application. The data is NOT modified after that. Users can read files in the container but cannot modify the files. You need to design a data archiving solution that meets the following requirements: ✑ New data is accessed frequently and must be available as quickly as possible. ✑ Data that is older than five years is accessed infrequently but must be available within one second when requested. ✑ Data that is older than seven years is NOT accessed. After seven years, the data must be persisted at the lowest cost possible. ✑ Costs must be minimized while maintaining the required availability. How should you manage the data? Five-year old data: - Delete the blob - Move to the archive storage - Move to the cool storage - Move to the hot storage Seven-year old data: - Delete the blob - Move to the archive storage - Move to the cool storage - Move to the hot storage

B. read-access geo-redundant storage (RA-GRS)

You have an Azure Data Lake Storage Gen2 container that contains 100 TB of data. You need to ensure that the data in the container is available for read workloads in a secondary region if an outage occurs in the primary region. The solution must minimize costs. Which type of data redundancy should you use? A. geo-redundant storage (GRS) B. read-access geo-redundant storage (RA-GRS) C. zone-redundant storage (ZRS) D. locally-redundant storage (LRS)

Data over five years old: Move to cool storage Data over seven years old: Move to archive storage

You have an Azure Data Lake Storage Gen2 service. You need to design a data archiving solution that meets the following requirements: - Data that is older than five years is accessed infrequently but must be available within one second when requested. - Data that is older than seven years is NOT accessed. - Costs must be minimized while maintaining the required availability. How should you manage the data? Data over five years old: - Delete the blob - Move to archive storage - Move to cool storage - Move to hot storage Data over seven years old: - Delete the blob - Move to archive storage - Move to cool storage - Move to hot storage

D. Create a pool in workspace1.

You have an Azure Databricks workspace named workspace1 in the Standard pricing tier. Workspace1 contains an all-purpose cluster named cluster1. You need to reduce the time it takes for cluster1 to start and scale up. The solution must minimize costs. What should you do first? A. Configure a global init script for workspace1. B. Create a cluster policy in workspace1. C. Upgrade workspace1 to the Premium pricing tier. D. Create a pool in workspace1.

D. MERGE

You have an Azure Databricks workspace that contains a Delta Lake dimension table named Table1. Table1 is a Type 2 slowly changing dimension (SCD) table. You need to apply updates from a source table to Table1. Which Apache Spark SQL operation should you use? A. CREATE B. UPDATE C. ALTER D. MERGE
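
A simplified Spark SQL sketch of the idea (table, key, and attribute names are assumptions): one MERGE can close out changed current rows and insert brand-new keys; inserting the new version of a changed key usually needs the staged-union variant of this pattern.

MERGE INTO Table1 AS tgt
USING updates AS src
  ON tgt.SupplierBK = src.SupplierBK AND tgt.IsCurrent = true
WHEN MATCHED AND tgt.SupplierName <> src.SupplierName THEN
  UPDATE SET IsCurrent = false, EndDate = current_date()            -- expire the old version
WHEN NOT MATCHED THEN
  INSERT (SupplierBK, SupplierName, StartDate, EndDate, IsCurrent)
  VALUES (src.SupplierBK, src.SupplierName, current_date(), null, true);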

HubA: Stream HubB: Stream Database1: Reference

You have an Azure SQL database named Database1 and two Azure event hubs named HubA and HubB. The data consumed from each source is shown in the following table. You need to implement Azure Stream Analytics to calculate the average fare per mile by driver. How should you configure the Stream Analytics input for each source? HubA: - Stream - Reference HubB: - Stream - Reference Database1: - Stream - Reference

A. Azure integration runtime

You have an Azure Storage account and a data warehouse in Azure Synapse Analytics in the UK South region. You need to copy blob data from the storage account to the data warehouse by using Azure Data Factory. The solution must meet the following requirements: ✑ Ensure that the data remains in the UK South region at all times. ✑ Minimize administrative effort. Which type of integration runtime should you use? A. Azure integration runtime B. Azure-SSIS integration runtime C. Self-hosted integration runtime

B. No

You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB. You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics. You need to prepare the files to ensure that the data copies quickly. Solution: You copy the files to a table that has a columnstore index. Does this meet the goal? A. Yes B. No

B. No

You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB. You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics. You need to prepare the files to ensure that the data copies quickly. Solution: You modify the files to ensure that each row is more than 1 MB. Does this meet the goal? A. Yes B. No

A. Yes

You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB. You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics. You need to prepare the files to ensure that the data copies quickly. Solution: You convert the files to compressed delimited text files. Does this meet the goal? A. Yes B. No

A. Yes

You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB. You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics. You need to prepare the files to ensure that the data copies quickly. Solution: You modify the files to ensure that each row is less than 1 MB. Does this meet the goal? A. Yes B. No

1. Add an Azure Stream Analytics Custom Deserializer Project (.NET) project to the solution 2. Add .NET deserializer code for Protobuf to the custom deserializer project 3. Add an Azure Stream Analytics Application project to the solution

You have an Azure Stream Analytics job that is a Stream Analytics project solution in Microsoft Visual Studio. The job accepts data generated by IoT devices in the JSON format. You need to modify the job to accept data generated by the IoT devices in the Protobuf format. Which three actions should you perform from Visual Studio in sequence?

B. SELECT Country, Count(*) AS Count FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, TumblingWindow(second, 10)

You have an Azure Stream Analytics job that receives clickstream data from an Azure event hub. You need to define a query in the Stream Analytics job. The query must meet the following requirements: ✑ Count the number of clicks within each 10-second window based on the country of a visitor. ✑ Ensure that each click is NOT counted more than once. How should you define the Query? A. SELECT Country, Avg(*) AS Average FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, SlidingWindow(second, 10) B. SELECT Country, Count(*) AS Count FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, TumblingWindow(second, 10) C. SELECT Country, Avg(*) AS Average FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, HoppingWindow(second, 10, 2) D. SELECT Country, Count(*) AS Count FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, SessionWindow(second, 5, 10)

D. Load the data by using PySpark.

You have an Azure Synapse Analytics Apache Spark pool named Pool1. You plan to load JSON files from an Azure Data Lake Storage Gen2 container into the tables in Pool1. The structure and data types vary by file. You need to load the files into the tables. The solution must maintain the source data types. What should you do? A. Use a Conditional Split transformation in an Azure Synapse data flow. B. Use a Get Metadata activity in Azure Data Factory. C. Load the data by using the OPENROWSET Transact-SQL command in an Azure Synapse Analytics serverless SQL pool. D. Load the data by using PySpark.

Box 1: blob Box 2: TYPE = HADOOP

You have an Azure Synapse Analytics dedicated SQL pool named Pool1 and an Azure Data Lake Storage Gen2 account named Account1. You plan to access the files in Account1 by using an external table. You need to create a data source in Pool1 that you can reference when you create the external table. How should you complete the Transact-SQL statement?

C. Switch the first partition from stg.Sales to dbo.Sales.

You have an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 contains a partitioned fact table named dbo.Sales and a staging table named stg.Sales that has the matching table and partition definitions. You need to overwrite the content of the first partition in dbo.Sales with the content of the same partition in stg.Sales. The solution must minimize load times. What should you do? A. Insert the data from stg.Sales into dbo.Sales. B. Switch the first partition from dbo.Sales to stg.Sales. C. Switch the first partition from stg.Sales to dbo.Sales. D. Update dbo.Sales from stg.Sales.
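
A hedged one-line sketch, assuming the data to overwrite sits in partition 1; dedicated SQL pools let a switch replace the rows already in the target partition:

ALTER TABLE stg.Sales SWITCH PARTITION 1 TO dbo.Sales PARTITION 1
WITH (TRUNCATE_TARGET = ON);   -- overwrites the existing rows in partition 1 of dbo.Sales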

D. ALTER INDEX ALL on table1 REBUILD

You have an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 contains a table named table1. You load 5 TB of data into table1. You need to ensure that columnstore compression is maximized for table1. Which statement should you execute? A. DBCC INDEXDEFRAG (pool1, table1) B. DBCC DBREINDEX (table1) C. ALTER INDEX ALL on table1 REORGANIZE D. ALTER INDEX ALL on table1 REBUILD

B. once per year

You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. Table1 contains the following: - One billion rows - A clustered columnstore index - A hash-distributed column named Product Key - A column named Sales Date that is of the date data type and cannot be null Thirty million rows will be added to Table1 each month. You need to partition Table1 based on the Sales Date column. The solution must optimize query performance and data loading. How often should you create a partition? A. once per month B. once per year C. once per day D. once per week

A. Add the managed identity to the Sales group. B. Use the managed identity as the credentials for the data load process. F. Create a managed identity.

You have an enterprise-wide Azure Data Lake Storage Gen2 account. The data lake is accessible only through an Azure virtual network named VNET1. You are building a SQL pool in Azure Synapse that will use data from the data lake. Your company has a sales team. All the members of the sales team are in an Azure Active Directory group named Sales. POSIX controls are used to assign the Sales group access to the files in the data lake. You plan to load data to the SQL pool every hour. You need to ensure that the SQL pool can load the sales data from the data lake. Which three actions should you perform? A. Add the managed identity to the Sales group. B. Use the managed identity as the credentials for the data load process. C. Create a shared access signature (SAS). D. Add your Azure Active Directory (Azure AD) account to the Sales group. E. Use the shared access signature (SAS) as the credentials for the data load process. F. Create a managed identity.

Input Type: Stream Function: Geospatial

You plan to create a real-time monitoring app that alerts users when a device travels more than 200 meters away from a designated location. You need to design an Azure Stream Analytics job to process the data for the planned app. The solution must minimize the amount of code developed and the number of technologies used. What should you include in the Stream Analytics job? Input Type: - Stream - Reference Function: - Aggregate - Geospatial - Windowing

Box 1: HASH Box 2: OrderDateKey

You plan to create a table in an Azure Synapse Analytics dedicated SQL pool. Data in the table will be retained for five years. Once a year, data that is older than five years will be deleted. You need to ensure that the data is distributed evenly across partitions. The solution must minimize the amount of time required to delete old data. How should you complete the Transact-SQL statement? Values: - CustomerKey - HASH - ROUND_ROBIN - REPLICATE - OrderDateKey - SalesOrderNumber

Replication Mechanism: Read-access geo-zone-redundant storage (RA-GZRS) Failover Process: Failover manually initiated by the customer

You plan to create an Azure Data Lake Storage Gen2 account. You need to recommend a storage solution that meets the following requirements: - Provides the highest degree of data resiliency - Ensures that content remains available for writes if a primary data center fails What should you include in the recommendation? Replication Mechanism: - Change feed - Zone-redundant storage (ZRS) - Read-access geo-redundant storage (RA-GRS) - Read-access geo-zone-redundant storage (RA-GZRS) Failover Process: - Failover initiated by Microsoft - Failover manually initiated by the customer - Failover automatically initiated by an Azure Automation job

B. Convert the files to Avro

You plan to implement an Azure Data Lake Storage Gen2 container that will contain CSV files. The size of the files will vary based on the number of events that occur per hour. File sizes range from 4 KB to 5 GB. You need to ensure that the files stored in the container are optimized for batch processing. What should you do? A. Convert the files to JSON B. Convert the files to Avro C. Compress the files D. Merge the files

B. Parquet

You plan to ingest streaming social media data by using Azure Stream Analytics. The data will be stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and PolyBase in Azure Synapse Analytics. You need to recommend a Stream Analytics data output format to ensure that the queries from Databricks and PolyBase against the files encounter the fewest possible errors. The solution must ensure that the files can be queried quickly and that the data type information is retained. What should you recommend? A. JSON B. Parquet C. CSV D. Avro

B. automated

You plan to perform batch processing in Azure Databricks once daily. Which type of Databricks cluster should you use? A. High Concurrency B. automated C. interactive

Ingest: Azure Data Factory Store: Azure Data Lake Storage Prepare and Train: Azure Databricks Model and Serve: Azure Synapse Analytics

A company plans to use Platform-as-a-Service (PaaS) to create the new data pipeline process. The process must meet the following requirements: Ingest: ✑ Access multiple data sources. ✑ Provide the ability to orchestrate workflow. ✑ Provide the capability to run SQL Server Integration Services packages. Store: ✑ Optimize storage for big data workloads. ✑ Provide encryption of data at rest. ✑ Operate with no size limits. Prepare and Train: ✑ Provide a fully-managed and interactive workspace for exploration and visualization. ✑ Provide the ability to program in R, SQL, Python, Scala, and Java. Provide seamless user authentication with Azure Active Directory. Model & Serve: ✑ Implement native columnar storage. ✑ Support for the SQL language ✑ Provide support for structured streaming. You need to build the data integration pipeline. Which technologies should you use? Ingest: - Logic Apps - Azure Data Factory - Azure Automation Store: - Azure Data Lake Storage - Azure Blob Storage - Azure files Prepare and Train: - HDInsight Apache Spark cluster - Azure Databricks - HDInsight Apache Storm cluster Model and Serve: - HDInsight Apache Kafka cluster - Azure Synapse Analytics - Azure Data Lake Storage

A. To the data flow, add a sink transformation to write the rows to a file in blob storage. B. To the data flow, add a Conditional Split transformation to separate the rows that will cause truncation errors.

You are creating an Azure Data Factory data flow that will ingest data from a CSV file, cast columns to specified types of data, and insert the data into a table in an Azure Synapse Analytics dedicated SQL pool. The CSV file contains three columns named username, comment, and date. The data flow already contains the following: ✑ A source transformation. ✑ A Derived Column transformation to set the appropriate types of data. ✑ A sink transformation to land the data in the pool. You need to ensure that the data flow meets the following requirements: ✑ All valid rows must be written to the destination table. ✑ Truncation errors in the comment column must be avoided proactively. ✑ Any rows containing comment values that will cause truncation errors upon insert must be written to a file in blob storage. Which two actions should you perform? A. To the data flow, add a sink transformation to write the rows to a file in blob storage. B. To the data flow, add a Conditional Split transformation to separate the rows that will cause truncation errors. C. To the data flow, add a filter transformation to filter out rows that will cause truncation errors. D. Add a select transformation to select only the rows that will cause truncation errors.

DimProduct is a Type 2 slowly changing dimension (SCD) The ProductKey column is a business key

You are creating dimensions for a data warehouse in an Azure Synapse Analytics dedicated SQL pool. You create a table by using the Transact-SQL statement shown in the following exhibit. Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic. DimProduct is a ___________ slowly changing dimension (SCD): - Type 0 - Type 1 - Type 2 The ProductKey column is ___________: - a surrogate key - a business key - an audit column

C. a dimension table for Employee E. a fact table for Transaction

You are designing a data mart for the human resources (HR) department at your company. The data mart will contain employee information and employee transactions. From a source system, you have a flat extract that has the following fields: ✑ EmployeeID ✑ FirstName ✑ LastName ✑ Recipient ✑ GrossAmount ✑ TransactionID ✑ GovernmentID ✑ NetAmountPaid ✑ TransactionDate You need to design a star schema data model in an Azure Synapse Analytics dedicated SQL pool for the data mart. Which two tables should you create? A. a dimension table for Transaction B. a dimension table for EmployeeTransaction C. a dimension table for Employee D. a fact table for Employee E. a fact table for Transaction

C. Type 2

You are designing a dimension table for a data warehouse. The table will track the value of the dimension attributes over time and preserve the history of the data by adding new rows as the data changes. Which type of slowly changing dimension (SCD) should you use? A. Type 0 B. Type 1 C. Type 2 D. Type 3

C. an IDENTITY column

You are designing a dimension table in an Azure Synapse Analytics dedicated SQL pool. You need to create a surrogate key for the table. The solution must provide the fastest query performance. What should you use for the surrogate key? A. a GUID column B. a sequence object C. an IDENTITY column

B. hash-distributed on PurchaseKey

You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the following columns. FactPurchase will have 1 million rows of data added daily and will contain three years of data. Transact-SQL queries similar to the following query will be executed daily. SELECT SupplierKey, StockItemKey, COUNT(*) FROM FactPurchase WHERE DateKey >= 20210101 AND DateKey <= 20210131 GROUP BY SupplierKey, StockItemKey Which table distribution will minimize query times? A. replicated B. hash-distributed on PurchaseKey C. round-robin D. hash-distributed on DateKey

B. hash-distributed on PurchaseKey

You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the following columns. FactPurchase will have 1 million rows of data added daily and will contain three years of data. Transact-SQL queries similar to the following query will be executed daily. SELECT SupplierKey, StockItemKey, IsOrderFinalized, COUNT(*) FROM FactPurchase WHERE DateKey >= 20210101 AND DateKey <= 20210131 GROUP BY SupplierKey, StockItemKey, IsOrderFinalized Which table distribution will minimize query times? A. replicated B. hash-distributed on PurchaseKey C. round-robin D. hash-distributed on IsOrderFinalized

D. TransactionMonth

You are designing a financial transactions table in an Azure Synapse Analytics dedicated SQL pool. The table will have a clustered columnstore index and will include the following columns: - TransactionType: 40 million rows per transaction type - CustomerSegment: 4 million rows per customer segment - TransactionMonth: 65 million rows per month - AccountType: 500 million rows per account type You have the following query requirements: - Analysts will most commonly analyze transactions for a given month. - Transactions analysis will typically summarize transactions by transaction type, customer segment, and/or account type. You need to recommend a partition strategy for the table to minimize query times. On which column should you recommend partitioning the table? A. CustomerSegment B. AccountType C. TransactionType D. TransactionMonth

Azure Stream Analytics input type: Azure Event Hub Azure Stream Analytics output type: Microsoft Power BI Aggregation query location: Azure Stream Analytics

You are designing a near real-time dashboard solution that will visualize streaming data from remote sensors that connect to the internet. The streaming data must be aggregated to show the average value of each 10-second interval. The data will be discarded after being displayed in the dashboard. The solution will use Azure Stream Analytics and must meet the following requirements: ✑ Minimize latency from an Azure Event hub to the dashboard. ✑ Minimize the required storage. ✑ Minimize development effort. What should you include in the solution? Azure Stream Analytics input type: - Azure Event Hub - Azure SQL Database - Azure Stream Analytics - Microsoft Power BI Azure Stream Analytics output type: - Azure Event Hub - Azure SQL Database - Azure Stream Analytics - Microsoft Power BI Aggregation query location: - Azure Event Hub - Azure SQL Database - Azure Stream Analytics - Microsoft Power BI

C. Microsoft.EventGrid

You need to trigger an Azure Data Factory pipeline when a file arrives in an Azure Data Lake Storage Gen2 container. Which resource provider should you enable? A. Microsoft.Sql B. Microsoft.Automation C. Microsoft.EventGrid D. Microsoft.EventHub

When User2 queries the YearlyIncome column, the values returned will be: - 0 When User 1 queries the BirthDate column, the values returned will be: - the values stored in the database

You have an Azure Synapse Analytics dedicated SQL pool that contains the users shown in the following table: Name | Role ------------------------- User 1 | Server Admin ------------------------- User 2 | db_datareader User 1 executes a query on the database, and the query returns the results shown in the following exhibit. SELECT c.name, tbl.name as table_name, typ.name as datatype, c.is_masked, c.masking_function FROM sys.masked_columns AS c INNER JOIN sys.tables as tbl ON c.[object_id] = tbl.[object_id] INNER JOIN sys.types typ ON c.user_type_id = typ.user_type_id WHERE is_masked = 1; User1 is the only user who has access to the unmasked data. Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic. When User2 queries the YearlyIncome column, the values returned will be: - a random number - the values stored in the database - XXXX - 0 When User 1 queries the BirthDate column, the values returned will be: - a random date - the values stored in the database - XXXX - 1900-01-01

B. month

You have an Azure Synapse Analytics dedicated SQL pool. You need to create a fact table named Table1 that will store sales data from the last three years. The solution must be optimized for the following query operations: - Show order counts by week. - Calculate sales totals by region. - Calculate sales totals by product. - Find all the orders from a given month. Which data should you use to partition Table1? A. product B. month C. week D. region

Box 1: CLUSTERED COLUMNSTORE INDEX Box 2: HASH([ProductKey])

You have an Azure Synapse Analytics dedicated SQL pool. You need to create a table named FactInternetSales that will be a large fact table in a dimensional model. FactInternetSales will contain 100 million rows and two columns named SalesAmount and OrderQuantity. Queries executed on FactInternetSales will aggregate the values in SalesAmount and OrderQuantity from the last year for a specific product. The solution must minimize the data size and query execution time. How should you complete the code?

D. Parquet

You have an Azure Synapse Analytics workspace named WS1 that contains an Apache Spark pool named Pool1. You plan to create a database named DB1 in Pool1. You need to ensure that when tables are created in DB1, the tables are available automatically as external tables to the built-in serverless SQL pool. Which format should you use for the tables in DB1? A. CSV B. ORC C. JSON D. Parquet

A. 24

You have an Azure Synapse workspace named MyWorkspace that contains an Apache Spark database named mytestdb. You run the following command in an Azure Synapse Analytics Spark pool in MyWorkspace. CREATE TABLE mytestdb.myParquetTable( EmployeeID int, EmployeeName string, EmployeeStartDate date) USING Parquet. You then use Spark to insert a row into mytestdb.myParquetTable. The row contains the following data. EmployeeName | EmployeeID | EmployeeStartDate Alice | 24 | 2020-01-25 One minute later you execute the following query from a serverless SQL pool in MyWorkspace. SELECT EmployeeID FROM mytestdb.dbo.myParquetTable WHERE EmployeeName = 'Alice'; What will be returned by the query? A. 24 B. an error C. a null value

B. Parquet

You have an Azure subscription that contains an Azure Blob Storage account named storage1 and an Azure Synapse Analytics dedicated SQL pool named Pool1. You need to store data in storage1. The data will be read by Pool1. The solution must meet the following requirements: - Enable Pool1 to skip columns and rows that are unnecessary in a query. - Automatically create column statistics. - Minimize the size of files. Which type of file should you use? A. JSON B. Parquet C. Avro D. CSV

Common.Data: Replicated Marketing.Web.Sessions: Hash Staging.Web.Sessions: Round-robin

You have an Azure subscription. You plan to build a data warehouse in an Azure Synapse Analytics dedicated SQL pool named pool1 that will contain staging tables and a dimensional model. Pool1 will contain the following tables. You need to design the table storage for pool1. The solution must meet the following requirements: - Maximize the performance of data loading operations to Staging.WebSessions. - Minimize query times for reporting queries against the dimensional model. Which type of table distribution should you use for each table? Values: - Hash - Replicated - Round-robin Answer Area: - Common.Data - Marketing.Web.Sessions - Staging.Web.Sessions

RANGE RIGHT FOR VALUES (20100101, 20110101, 20120101)

You have an enterprise data warehouse in Azure Synapse Analytics that contains a table named FactOnlineSales. The table contains data from the start of 2009 to the end of 2012. You need to improve the performance of queries against FactOnlineSales by using table partitions. The solution must meet the following requirements: ✑ Create four partitions based on the order date. ✑ Ensure that each partition contains all the orders placed during a given calendar year. How should you complete the T-SQL command?
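
A hedged sketch of the completed command: with RANGE RIGHT, each boundary value starts a new partition, so the three boundaries yield four partitions holding 2009, 2010, 2011, and 2012. Columns other than OrderDateKey are assumptions:

CREATE TABLE dbo.FactOnlineSales
(
    OnlineSalesKey int   NOT NULL,
    OrderDateKey   int   NOT NULL,
    SalesAmount    money NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(OnlineSalesKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20100101, 20110101, 20120101))
);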

C. DROP EXTERNAL TABLE [Ext].[Items] CREATE EXTERNAL TABLE [Ext].[Items] ( [ItemID] [int] NULL, [ItemName] nvarchar(50) NULL, [ItemType] nvarchar(20) NULL, [ItemDescription] nvarchar(250)) WITH ( LOCATION = '/Items/', DATA_SOURCE = AzureDataLakeStore, FILE_FORMAT = PARQUET, REJECT_TYPE = VALUE, REJECT_VALUE = 0);

You have an enterprise data warehouse in Azure Synapse Analytics. Using PolyBase, you create an external table named [Ext].[Items] to query Parquet files stored in Azure Data Lake Storage Gen2 without importing the data to the data warehouse. The external table has three columns. You discover that the Parquet files have a fourth column named ItemID. Which command should you run to add the ItemID column to the external table? A. ALTER EXTERNAL TABLE [Ext].[Items] ADD [ItemID] int; B. DROP EXTERNAL FILE FORMAT parquetfile1; CREATE EXTERNAL FILE FORMAT parquetfile1 WITH ( FORMAT_TYPE = PARQUET, DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'); C. DROP EXTERNAL TABLE [Ext].[Items] CREATE EXTERNAL TABLE [Ext].[Items] ( [ItemID] [int] NULL, [ItemName] nvarchar(50) NULL, [ItemType] nvarchar(20) NULL, [ItemDescription] nvarchar(250)) WITH ( LOCATION = '/Items/', DATA_SOURCE = AzureDataLakeStore, FILE_FORMAT = PARQUET, REJECT_TYPE = VALUE, REJECT_VALUE = 0); D. ALTER TABLE [Ext].[Items] ADD [ItemID] int;

Box 1: openrowset Box 2: openjson

You need to use the serverless SQL pool in WS1 to read the files. How should you complete the Transact-SQL statement?
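
A hedged sketch of the combined pattern for reading JSON files with a serverless SQL pool; the storage path and projected columns are assumptions because the case-study files are not shown:

SELECT parsed.*
FROM OPENROWSET(
        BULK 'https://account1.dfs.core.windows.net/container1/data/*.json',  -- assumed location
        FORMAT = 'CSV',
        FIELDTERMINATOR = '0x0b',
        FIELDQUOTE = '0x0b'
     ) WITH (jsonDoc nvarchar(max)) AS rows
CROSS APPLY OPENJSON(jsonDoc)
     WITH (
         id        int            '$.id',
         createdAt datetime2      '$.createdAt',
         body      nvarchar(1000) '$.body'
     ) AS parsed;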

A. replicated

You plan to create a dimension table in Azure Synapse Analytics that will be less than 1 GB. You need to create the table to meet the following requirements: - Provide the fastest query time. - Minimize data movement during queries. Which type of table should you use? A. replicated B. hash distributed C. heap D. round-robin

1. Create an external data source that uses the abfs location 2. Create an external file format and set the First_Row option 3. Use CREATE EXTERNAL TABLE AS SELECT (CETAS) and configure the reject options to specify reject values or percentages

You have data stored in thousands of CSV files in Azure Data Lake Storage Gen2. Each file has a header row followed by a properly formatted carriage return (\r) and line feed (\n). You are implementing a pattern that batch loads the files daily into a dedicated SQL pool in Azure Synapse Analytics by using PolyBase. You need to skip the header row when you import the files into the data warehouse. Before building the loading pattern, you need to prepare the required database objects in Azure Synapse Analytics. Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order. - Create a database scoped credential that uses Azure Active Directory Application and a Service Principal Key - Create an external data source that uses the abfs location - Use CREATE EXTERNAL TABLE AS SELECT (CETAS) and configure the reject options to specify reject values or percentages - Create an external file format and set the First_Row option
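
A hedged sketch of the three objects in this order; the credential, locations, and query source are illustrative, the exact shape of the final CETAS depends on the broader loading pattern, and FORMAT_OPTIONS (FIRST_ROW = 2) is what actually skips the header row:

CREATE EXTERNAL DATA SOURCE csv_source
WITH (TYPE = HADOOP,
      LOCATION = 'abfss://files@account1.dfs.core.windows.net',
      CREDENTIAL = adls_credential);       -- assumed database-scoped credential

CREATE EXTERNAL FILE FORMAT csv_skip_header
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2));

CREATE EXTERNAL TABLE dbo.CuratedData
WITH (LOCATION = 'curated/',
      DATA_SOURCE = csv_source,
      FILE_FORMAT = csv_skip_header,
      REJECT_TYPE = VALUE,
      REJECT_VALUE = 0)
AS SELECT * FROM ext.RawData;              -- assumed existing external table over the CSV files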

B. File1.csv and File4.csv only

You have files and folders in Azure Data Lake Storage Gen2 for an Azure Synapse workspace as shown in the following exhibit. You create an external table named ExtTable that has LOCATION='/topfolder/'. When you query ExtTable by using an Azure Synapse Analytics serverless SQL pool, which files are returned? A. File2.csv and File3.csv only B. File1.csv and File4.csv only C. File1.csv, File2.csv, File3.csv, and File4.csv D. File1.csv only

No Yes Yes

You have the following Azure Stream Analytics query. For each of the following statements, select Yes if the statement is true. Otherwise, select No. The query combines two streams of partitioned data: Yes/No The stream scheme key and count must match the output scheme: Yes/No Providing 60 streaming units will optimize the performance of the query: Yes/No

Source dataset type: Parquet Copy activity copy behavior: Preserve Hierarchy

You have two Azure Storage accounts named Storage1 and Storage2. Each account holds one container and has the hierarchical namespace enabled. The system has files that contain data stored in the Apache Parquet format. You need to copy folders and files from Storage1 to Storage2 by using a Data Factory copy activity. The solution must meet the following requirements: - No transformations must be performed. - The original folder structure must be retained. - Minimize time required to perform the copy activity. How should you configure the copy activity? Source dataset type: - Binary - Parquet - Delimited text Copy activity copy behavior: - Flatten Hierarchy - Merge Files - Preserve Hierarchy

1. Create an external data source 2. Create an external file format object 3. Create an external table

You need to build a solution to ensure that users can query specific files in an Azure Data Lake Storage Gen2 account from an Azure Synapse Analytics serverless SQL pool. Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order. - Create an external file format object - Create an external data source - Create a query that uses Create Table as Select - Create a table - Create an external table

Box 1: CASE Box 2: ELSE

You need to calculate the employee_type value based on the hire_date value. How should you complete the Transact-SQL statement?
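
A minimal sketch of the shape of the answer; the table name and the hire-date threshold are assumptions because the exhibit is not included:

SELECT *,
       CASE
           WHEN hire_date >= '2019-01-01' THEN 'New'
           ELSE 'Standard'
       END AS employee_type
FROM dbo.employees;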

1. Distribution 2. Partition

You need to create a partitioned table in an Azure Synapse Analytics dedicated SQL pool. How should you complete the Transact-SQL statement? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. Values: - Clustered Index - Collate - Distribution - Partition - Partition Function - Partition Scheme

Box 1: dept == 'ecommerce', dept == 'retail', dept == 'wholesale' Box 2: disjoint: false Box 3: ecommerce, retail, wholesale, all

You need to create an Azure Data Factory pipeline to process data for the following three departments at your company: Ecommerce, retail, and wholesale. The solution must ensure that data can also be processed for the entire company. How should you complete the Data Factory data flow script? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

D. as a Type 2 slowly changing dimension (SCD) table

You need to design an Azure Synapse Analytics dedicated SQL pool that meets the following requirements: - Can return an employee record from a given point in time. - Maintains the latest employee information. - Minimizes query complexity. How should you model the employee data? A. as a temporal table B. as a SQL graph table C. as a degenerate dimension table D. as a Type 2 slowly changing dimension (SCD) table

B. [CurrentProductCategory] [nvarchar] (100) NOT NULL, E. [OriginalProductCategory] [nvarchar] (100) NOT NULL,

You need to implement a Type 3 slowly changing dimension (SCD) for product category data in an Azure Synapse Analytics dedicated SQL pool. You have a table that was created by using the following Transact-SQL statement. Which two columns should you add to the table? A. [EffectiveEndDate] [datetime] NULL, B. [CurrentProductCategory] [nvarchar] (100) NOT NULL, C. [ProductCategory] [nvarchar] (100) NOT NULL, D. [EffectiveStartDate] [datetime] NOT NULL, E. [OriginalProductCategory] [nvarchar] (100) NOT NULL,

B. [CurrentProductCategory] [nvarchar] (100) NOT NULL, E. [OriginalProductCategory] [nvarchar] (100) NOT NULL,

You need to implement a Type 3 slowly changing dimension (SCD) for product category data in an Azure Synapse Analytics dedicated SQL pool. You have a table that was created by using the following Transact-SQL statement. Which two columns should you add to the table? Each correct answer presents part of the solution. A. [EffectiveStartDate] [datetime] NOT NULL, B. [CurrentProductCategory] [nvarchar] (100) NOT NULL, C. [EffectiveEndDate] [datetime] NULL, D. [ProductCategory] [nvarchar] (100) NOT NULL, E. [OriginalProductCategory] [nvarchar] (100) NOT NULL,

Columnar Format: Parquet JSON with a timestamp: Avro

You need to output files from Azure Data Factory. Which file format should you use for each type of output? Columnar Format: - Avro - GZip - Parquet - TXT JSON with a timestamp: - Avro - GZip - Parquet - TXT

D. event

You need to schedule an Azure Data Factory pipeline to execute when a new file arrives in an Azure Data Lake Storage Gen2 container. Which type of trigger should you use? A. on-demand B. tumbling window C. schedule D. event

