MS Official Practice Questions

Réussis tes devoirs et examens dès maintenant avec Quizwiz!

You have a table named sales in azure synapse analytics SQL pool with the following definition: CREATE TABLE Sales( ID [int] identity NOT NULL, Customer [int] NOT NULL, Amount [money] NOT NULL, Date [date] NOT NULL) WITH (DISTRIBUTION = HASH (CustomerKey), PARTITION ( [DATE] RANGE RIGHT FOR VALUES ('2018-01-01',' 2019-01-01',' 2020-01-01','2021-01-01'))); The table does not contain dates after the year 2020. You need to archive the oldest partition (with dates before January 1, 2018). The archive table does not exist, and the name should be SalesHistory. After archiving the old data, you need to add a partition into the sales table for the next year. Which for actions should you perform in sequence? Switch partition 1 of the sales table to partition 1 of the sales history table. Create a new partition in the sales history a table with this command: ALTER TABLE SalesHistory SPLIT RANGE ('2022-01-01'); Create the sales history table with the same column definitions, data distribution and partition boundaries as the sales table. Create a check constraint on the date column limiting the dates to smaller than '2018-01-01'. Create the sales history table with the same column definitions and data distribution as the sales table. Include one partition boundary as '2018-01-01'. Remove the boundary value '2018-01-01' in the Sales table by using: ALTER TABLE Sales MERGE RANGE ('2018-01-01'); Switch partition 1 of sales table to partition 0 of sales history table. Create a new partition in the sales table with this command: ALTER TABLE Sales SPLIT RANGE ('2022-01-01');

1. Create the sales history table with the same column definitions, data distribution and partition boundaries as the sales table. Create a check constraint on the date column limiting the dates to smaller than '2018-01-01'. 2. Switch partition 1 of the sales table to partition 1 of the sales history table. 3. Remove the boundary value '2018-01-01' in the Sales table by using: ALTER TABLE Sales MERGE RANGE ('2018-01-01'); 4. Create a new partition in the sales table with this command: ALTER TABLE Sales SPLIT RANGE ('2022-01-01'); 1. The physical structure of the Sales and SalesHistory table must be exactly the same. Partition switching is a metadata-only operation, so no actual data is moved or copied. The database engine must guarantee that all dates in a partition fall within the boundaries. This is only possible if the boundaries of the partitions that are involved in the switch process are equal . 2. By doing this you archive the data in the first partition. The data in the first partition of the Sales table is switched to the first partition of the SalesHistory Table. 3. The first partition in the Sales table is now empty, and it can be removed. MERGE RANGE removes a boundary value, and thus removes a partition. 4. SPLIT RANGE will create a new boundary, and thus a new partition. You should not create a check constraint on the SalesHistory table. You cannot create a check constraint on a date range in Azure Synapse Analytics. You should not switch partition 1 of Sales table to partition 0 of the SalesHistory Table. Partition 0 does not exist. You should not perform a partition split of the SalesHistory table. This is not a requirement. References - https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition https://learn.microsoft.com/en-us/sql/t-sql/statements/alter-table-transact-sql?view=azure-sqldw-latest https://learn.microsoft.com/en-us/sql/relational-databases/tables/unique-constraints-and-check-constraints?view=sql-server-ver15

You are a data engineer for an Azure Synapse Analytics. You want to import data to a database from a large pipe-separated file in Azure blob storage container. You need to create a connection to the container. How should you complete the T-SQL statement? CREATE EXTERNAL _________ BlobStorage with ( TYPE = _____________ , LOCATION = 'wabs://[email protected]' ); 1. - Data source - File format - Table 2. - Blob_storage - Hadoop

1. DATA SOURCE 2. HADOOP The CREATE EXTERNAL DATA SOURCE statement creates the data source from which to import data. You must set the TYPE parameter to HADOOP when accessing data from Azure blob storage. The LOCATION parameter specifies the location of the blob container. You should not set the TYPE property to BLOB_STORAGE. BLOB_STORAGE does not represent Azure Blob Storage. It simply designates the data source as one that will be used with BULK IMPORT or OPENROWSET. It cannot be used with external data sources such as Azure blob storage. An external data source is one that is not located in Azure SQL Database. You should not use CREATE EXTERNAL TABLE. This statement allows you to define the columns to represent the external data source. However, you must first define the external data source. References - https://learn.microsoft.com/en-us/training/paths/build-data-analytics-solutions-using-azure-synapse-serverless-sql-pools/ https://learn.microsoft.com/en-us/sql/relational-databases/polybase/polybase-queries?view=sql-server-2017 https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=azuresqldb-current&tabs=dedicated https://learn.microsoft.com/en-us/sql/relational-databases/polybase/polybase-configure-azure-blob-storage?view=sql-server-2017

You work as an Azure Data Engineer. You need to build Dim.Customer as an external table for an azure synapse analytics dedicated SQL pool with the following requirements: - It should be able to retrieve data from the </Customer/> directory. - It needs to use an external data source named Customer_ds. - It needs to use an external file format named Customer_ff. How should you complete the following SQL command? CREATE _______________________ [Dim].[Customer] ( CustomerID INT NOT NULL, CustomerName VARCHAR(100) CustomerEmail VARCHAR (100)) WITH ( LOCATION = ______________, ___________________ = Customer_ds, ___________________ = Customer_ff); Options : EXTERNAL TABLE DATA_SOURCE FILE_FORMAT '/Customer/' EXTERNAL DATA SOURCE

1. EXTERNAL TABLE 2. '/Customer/' 3. DATA_SOURCE 4. FILE_FORMAT The EXTERNAL TABLE reserved word allows you to create an external table followed by a table name. To retrieve data from the </Customer/> directory, you should set the Location property to </Customer/> in single quotes. You should use DATA_SOURCE to specify the name of the data source Customer_ds. You should use FILE_FORMAT to specify the file format Customer_ff. You should not use the EXTERNAL DATA SOURCE reserved word. This allows you to define a location and credential for an external data source to be used by external tables. Reference - https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables?tabs=hadoop https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=azure-sqldw-latest&preserve-view=true&tabs=dedicated https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-as-select-transact-sql?view=sql-server-linux-ver16&viewFallbackFrom=azure-sqldw-latest

CASE: You are designing an Azure SQL Database solution. You are designing a table named Sales that will contain sales records for the company. A column named SalesRep will represent the sales representative associated with the sale. The SalesRep column will be of type SYSNAME. Only the sales representative associated with a sale should be able to view sales data. You need to choose the security mechanism. Solution: You implement Always Encrypted. Does this solution meet the goal? Y or N

No

CASE: You are designing an Azure SQL Database solution. You are designing a table named Sales that will contain sales records for the company. A column named SalesRep will represent the sales representative associated with the sale. The SalesRep column will be of type SYSNAME. Only the sales representative associated with a sale should be able to view sales data. You need to choose the security mechanism. Solution: You implement Transparent Data Encryption (TDE). Does this solution meet the goal? Y or N

No

CASE: You are designing an Azure SQL Database solution. You are designing a table named Sales that will contain sales records for the company. A column named SalesRep will represent the sales representative associated with the sale. The SalesRep column will be of type SYSNAME. Only the sales representative associated with a sale should be able to view sales data. You need to choose the security mechanism. Solution: You implement column-level encryption (CLE). Does this solution meet the goal? Y or N

No

CASE: You are migrating an application and its on-premises SQL Server Enterprise Edition database to the cloud. application and database changes should be kept to a minimum during migration. You want to choose the exact amount of compute resources dedicated to the workload. Management overhead should be kept to a minimum. You need to choose an appropriate deployment and purchase model to meet your needs. Solution: You choose a single database deployment and vCore pricing model. Does this solution meet the goal? Y or N

No

CASE: You are migrating an application and its on-premises SQL Server Enterprise Edition database to the cloud. application and database changes should be kept to a minimum during migration. You want to choose the exact amount of compute resources dedicated to the workload. Management overhead should be kept to a minimum. You need to choose an appropriate deployment and purchase model to meet your needs. Solution: You choose an elastic pool deployment and eDTU pricing model. Does this solution meet the goal? Y or N

No

CASE: You are using Azure Stream Analytics (ASA) to process real-time defect detection events from your factory's production lines. You need to select the right built-in windowing function in your ASA job's SELECT query, so that you can: - Group events per line if they occur within specific time intervals between each other, but not exceeding maximum duration time set for the window. - Filter out periods of time when no defects are reported. - Count each event only once. Solution: In the ASA job query, you group events by using the hopping window. Does this solution meet the goal? Y or N

No

CASE: You are using Azure Stream Analytics (ASA) to process real-time defect detection events from your factory's production lines. You need to select the right built-in windowing function in your ASA job's SELECT query, so that you can: - Group events per line if they occur within specific time intervals between each other, but not exceeding maximum duration time set for the window. - Filter out periods of time when no defects are reported. - Count each event only once. Solution: In the ASA job query, you group events by using the sliding window. Does this solution meet the goal? Y or N

No

CASE: You are using Azure Stream Analytics (ASA) to process real-time defect detection events from your factory's production lines. You need to select the right built-in windowing function in your ASA job's SELECT query, so that you can: - Group events per line if they occur within specific time intervals between each other, but not exceeding maximum duration time set for the window. - Filter out periods of time when no defects are reported. - Count each event only once. Solution: In the ASA job query, you group events by using the tumbling window. Does this solution meet the goal? Y or N

No

CASE: You create a blob storage account. You must be able to administer the storage account via the REST API. You need to provide secure access to the storage account. Solution: You create a service shared access signature (SAS). Does this solution meet the goal? Y or N

No

CASE: You have a data pipeline in Azure Synapse Analytics that ingests data from multiple sources. You need to process personal information such as first name, last name, and social security number (SSN). Because the data comes from different sources, the data contains duplicate records and other issues such as the following: - In some of the records, the names are mixed cases (Customer), while in others, they appear uppercase (CUSTOMER). - The SSNs are sometimes hyphenated (123-45-6789), while others are missing hyphens (123436789). You need to remove duplicates from the input data and normalize the records. Solution: You transform the data using SQL pool with the SQL query below: with deduplicateData AS ( SELECT UPPER([FirstName]) as UpperFirstName, UPPER([LastName]) as UpperLastName, CAST([SSN] as INT) as NumberSSN FROM [CustomerDataExternalTable] ) SELECT * from deduplicateData GROUP BY UpperFirstName, UpperLastName, NumberSSN Does this solution meet the goal? Y or N

No

CASE: You have a data pipeline in Azure Synapse Analytics that ingests data from multiple sources. You need to process personal information such as first name, last name, and social security number (SSN). Because the data comes from different sources, the data contains duplicate records and other issues such as the following: - In some of the records, the names are mixed cases (Customer), while in others, they appear uppercase (CUSTOMER). - The SSNs are sometimes hyphenated (123-45-6789), while others are missing hyphens (123436789). You need to remove duplicates from the input data and normalize the records. Solution: You transform the data using Spark pool with the Scala Code below: val dedupedDF = inputDataFrame.dropDuplicates() Does this solution meet the goal? Y or N

No

CASE: You manage an Azure SQL Database containing a column with sensitive data. The column is used for joins by an application. You need to configure encryption for this database to protect the column. Solution: You configure Always Encrypted with a randomized type. Does this solution meet the goal? Y or N

No

CASE: You manage an Azure SQL Database containing a column with sensitive data. The column is used for joins by an application. You need to configure encryption for this database to protect the column. Solution: You configure dynamic data masking (DDM) with partial masking. Does this solution meet the goal? Y or N

No

CASE: You manage an Azure SQL Database containing a column with sensitive data. The column is used for joins by an application. You need to configure encryption for this database to protect the column. Solution: You configure dynamic data masking (DDM) with random masking. Does this solution meet the goal? Y or N

No

CASE: You use Azure Stream Analytics to stream real-time IoT sensor data for a brewery company. You need to use a window function with a fixed-size. Events should belong to a single window. Solution: You analyze the stream with a hopping window function. Does the solution meet the goal?

No

CASE: You use Azure Stream Analytics to stream real-time IoT sensor data for a brewery company. You need to use a window function with a fixed-size. Events should belong to a single window. Solution: You analyze the stream with a session window function. Does the solution meet the goal?

No

CASE: You use Azure Stream Analytics to stream real-time IoT sensor data for a brewery company. You need to use a window function with a fixed-size. Events should belong to a single window. Solution: You analyze the stream with a sliding window function. Does the solution meet the goal?

No

CASE: You work as an Azure Data Engineer. You need to build a Product dimension that is able to store historical changes to product sales prices with a flag marking the current price as Active. Solution: You implement slowly changing dimension (SCD) Type 3. Does this solution meet the goal? Y or N

No

CASE: You work as an Azure Data Engineer. You need to transform an incoming JSON file into a relational structure using Azure Data Factory. Solution: You use the pivot transformation in mapping data flow. Does this solution meet the goal? Y or N

No

CASE: You work as an Azure Data Engineer. You need to transform an incoming JSON file into a relational structure using Azure Data Factory. Solution: You use the unpivot transformation in mapping data flow. Does this solution meet the goal? Y or N

No

CASE: You work with Azure Synapse Analytics solution. You need to transform incoming stream based on the following criteria: - Rows prior to the year 2000 need to be assigned to an output stream named Processed. - Rows after the year 2000 need to be assigned to another stream named Unprocessed for further processing. Solution: You use the filter transformation in mapping data flow. Does this solution meet the goal?

No

CASE: You work with Azure Synapse Analytics solution. You need to transform incoming stream based on the following criteria: - Rows prior to the year 2000 need to be assigned to an output stream named Processed. - Rows after the year 2000 need to be assigned to another stream named Unprocessed for further processing. Solution: You use the lookup transformation in mapping data flow. Does this solution meet the goal?

No

CASE: Your company is developing an Azure Data Solution that uses Azure Blob Storage. A select set of Azure Active Directory (Azure AD) accounts should be provided access to storage during the initial test period. Access should expire automatically after 30 days. You want to avoid using the storage account key to provide access. You must be able to manage access specifics through role-based access control (RBAC). You need to configure authentication and access to the storage blob. Solution: You request a user delegation key and configure shared access key. Does this solution meet the goal? Y or N

No

CASE: Your company is developing an Azure Data Solution that uses Azure Blob Storage. A select set of Azure Active Directory (Azure AD) accounts should be provided access to storage during the initial test period. Access should expire automatically after 30 days. You want to avoid using the storage account key to provide access. You must be able to manage access specifics through role-based access control (RBAC). You need to configure authentication and access to the storage blob. Solution: You request a user delegation key and configure the account level shared access signature (SAS). Does this solution meet the goal? Y or N

No

CASE: Your company supports a software as a service (SaaS) application with a large number of customers. You are designing a support database to use elastic pools and elastic queries to retrieve customer information. Customer information is stored in table that includes values for CustomerID and RegionalID. You need to partition data to optimize queries by customer sorted by geographic location. The solution should minimize support costs. You need to recommend a partition strategy. Solution: You configure horizontal partitioning based on CustomerID. Does this solution meet the goal? Y or N

No

CASE: Your company supports a software as a service (SaaS) application with a large number of customers. You are designing a support database to use elastic pools and elastic queries to retrieve customer information. Customer information is stored in table that includes values for CustomerID and RegionalID. You need to partition data to optimize queries by customer sorted by geographic location. The solution should minimize support costs. You need to recommend a partition strategy. Solution: You configure vertical partitioning based on CustomerID. Does this solution meet the goal? Y or N

No

You work as an Azure Data Engineer. You have an Employee table in an Azure Synapse Analytics - dedicated SQL pool as shon in the exhibit. You need to store the ManagerID in the Employee table. Which statement should you use to meet the requirement? - ALTER TABLE Dim.Employee ADD ManagerID BIGINT NOT NULL - ALTER TABLE Dim.Employee ADD ManagerID VARCHAR(50) - ALTER TABLE Dim.Employee ADD ManagerID INT - ALTER TABLE Dim.Employee ADD ManagerID BIGINT

- ALTER TABLE Dim.Employee ADD ManagerID BIGINT Adding a [ManagerID] column to the employee table allows you to store the manager's employee ID. When an employee is a manager and does not have any superior manager, the entry can have a blank or NULL in it. EmployeeID and ManagerID need to be of the same data type. You should not use the following statement: - ALTER TABLE Dim.Employee ADD ManagerID BIGINT NOT NULL This statement cannot be used as it does not allow NULLs to be inserted for the ManagerID column. When an employee is a manager and does not have any superior manager, this will not allow a NULL value in ManagerID column. You should not use the following statements: - ALTER TABLE Dim.Employee ADD ManagerID VARCHAR(50) - ALTER TABLE Dim.Employee ADD ManagerID INT These statements cannot be used to meet the requirement to store employee manager ID. Both EmployeeID and ManagerID need to be of same data tyoe. References - https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview https://learn.microsoft.com/en-us/analysis-services/tabular-models/hierarchies-ssas-tabular?view=asallproducts-allversions

You are building a dimension table in an Azure Synapse SQL pool. The dimension has the following initial structure: CREATE TABLE dbo.[DimProduct] ( [ProductID] [int] NOT NULL. [Name] [nvarchar] (8) NULL, [Description] [nvarchar] (256) NULL, [Supplier] [nvarchar] (50) NULL, [isCurrent] [bit] NOT NULL, [HashKey] [char] (66) )WITH ( DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX) Some columns like the product description and the supplier may change over time and you need to keep these historical changes. You need to update the initial table structure to make this dimension a Type 2 slowly changing dimension (SCD). What should you do? - Add InsertedDate and ModifiedDate columns to represent when the product was changed. - Add StartDate and EndDate columns to represent the Product state at a given point in time. - Rename the Supplier columns to CurrentSupplier and add a new OriginalSupplier Column. - Remove the isCurrent column.

- Add StartDate and EndDate columns to represent the Product state at a given point in time. You should add StartDate and EndDate columns to represent the Product state at a given point in time. In a Type 2 SCD, Changes in a given product are added to a new row, with the StartDate updated to that moment and the optional isCurrent column updated as 1 (representing a True value). The previous record EndDate is also update, and the IsCurrent column is set to 0 (representing a False value). You should not remove the isCurrent column. This column can be used to easily filter the current Product state in a Type 2 SCD. You should not add InsertedDate and ModifiedDate columns to represent when the product was changed. These fields are usually used with a Type 1 SCD, where dimension columns are changed in the same row and the ModifiedDate field is also updated. You should not rename the Supplier column to CurrentSupplier and add a new OriginalSupplier column. These fields are usually used with a Type 2 SCD, where the dimension fields prefixed with Current are updated in the same row, keeping the original or previous value in another field. References - https://learn.microsoft.com/en-us/training/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types

CASE: Company A develops inventory management software. Its flagship product allows employees to scan product barcodes in stores and have that data delivered to a central repository. A website allows supervisors to view the data sent to the repository. Inventory data consists of the following fields: - UPC - Name - Description - Quantity - Store ID - Aisle Number - Price - Expiration Date Inventory Data is currently store in two Microsoft SQL Server databases. One database resides in California, and the other database resides in New York. Over 200 TB of total data is store across two databases. The scanners submit inventory data to an application server over HTTPS. A service on the application server then analyzes the data and sends it to the databases. Th new solution must allow processing of the inventory data in batches every hour. After the data is processed, it must be kept for at lest two years. It must be stored in such a way that parallel queries can be run against the data. Business stakeholders must be able to graphically visualize the data without writing any code. Data engineers must be able to geographically visualize the data by using python. The data engineers at your company are familiar with C#, Python, and SQL. Any recommended solution must take advantage of their existing skills. You need to design a solution for storing the data in the long term. Which resource should you use? - Azure Databricks - Azure Data Factory - Azure SQL Data Warehouse - Azure SQL Database

- Azure Databricks - Azure Data Factory - Azure SQL Data Warehouse - Azure SQL Database

CASE: Company A develops inventory management software. Its flagship product allows employees to scan product barcodes in stores and have that data delivered to a central repository. A website allows supervisors to view the data sent to the repository. Inventory data consists of the following fields: - UPC - Name - Description - Quantity - Store ID - Aisle Number - Price - Expiration Date Inventory Data is currently store in two Microsoft SQL Server databases. One database resides in California, and the other database resides in New York. Over 200 TB of total data is store across two databases. The scanners submit inventory data to an application server over HTTPS. A service on the application server then analyzes the data and sends it to the databases. Th new solution must allow processing of the inventory data in batches every hour. After the data is processed, it must be kept for at lest two years. It must be stored in such a way that parallel queries can be run against the data. Business stakeholders must be able to graphically visualize the data without writing any code. Data engineers must be able to geographically visualize the data by using python. The data engineers at your company are familiar with C#, Python, and SQL. Any recommended solution must take advantage of their existing skills. You need to design a solution to allow business stakeholders to visualize the data. What should you use? - Azure Databricks - Power BI - Stream Analytics - Azure Data Lake

- Azure Databricks - Power BI - Stream Analytics - Azure Data Lake

CASE: You work as an Azure Data Engineer. You need to design an analytical solution that meets the following requirements: - It should be able to analyze Twitter data feeds related to specific product launches. - It should be able to send Twitter data feeds for analysis every hour. Which solution should you use? - Azure Synapse Analytics - Azure Databricks - Azure Stream Analytics - Power BI

- Azure Stream Analytics You should use Azure Stream analytics. This allows you to analyze streaming datasets and integrate them with stream analytics engine to provide real-time analysis. You can configure Event Hub and a Stream Analytics job to meet the requirement. Azure Stream Analytics provides the best way to meet the requirements as it seamlessly connects between Azure services. You should not use Azure Synapse Analytics. This provides a single environment for analyzing big data and data warehouse solutions. You can integrate big data with azure data warehouse and analyze it together using Azure Synapse analytics. You should not use Azure Databricks. This allows you to build cloud-based Machine Learning solutions using apache spark. You could use Databricks, but you would need to configure Event Hub and Twitter APIs in Spark notebooks to meet the requirement. It requires additional scala coding to configure Event Hub and read twitter feeds. You should not use Power BI. This provides you the ability to connect with hundreds of available data sources and turn your data into insights. It allows you to build reports and dashboards using existing data. References - https://learn.microsoft.com/en-us/training/modules/ingest-data-streams-with-azure-stream-analytics/3-process-streaming-data-azure-stream-analytics?ns-enrollment-type=LearningPath&ns-enrollment-id=learn.data-ai.implement-data-streaming-with-asa https://learn.microsoft.com/en-us/training/modules/introduction-azure-synapse-analytics/2-what-happening-business?ns-enrollment-type=LearningPath&ns-enrollment-id=learn.wwl.realize-integrated-analytical-solutions-with-azure-synapse-analytics https://learn.microsoft.com/en-us/training/modules/explore-azure-databricks/ https://learn.microsoft.com/en-us/power-bi/fundamentals/power-bi-overview

CASE: You are building a modern data warehouse solution for your company. Consumer and sales data is currently stored in an Azure SQL Database, while the product catalog is maintained in a Cosmos DB backend. The marketing department also has access to the market research data, published weekly by a third-party vendor on their Amazon S3 storage in CSV format. The Marketing team wants to use PowerBI as a reporting tool and query against a single consolidated dataset in Azure Synapse Analytics. You have internal data scientists who can help with the data transformation and consolidation. Your company is using Cherwell as its service desk platform. You establish a bi-directional connection between your Azure subscription and Cherwell using a connector from the Azure Marketplace, but you have not used it yet. The new data warehouse solution must meet the following business requirements: - Unauthorized users should not be able to see the contact details of consumers in Azure Synapse Analytics. - System owners want to enforce the data retention policy and every month delete consumers who were not active for more than two years. The new Data warehouse solution must meet the following technical requirements: - Market research data must be copied over to Azure and retained in its original format, storing files in a year and month-based hierarchal structure. - Incidents with Azure Synapse Analytics based on Azure Monitor alerts need to be logged automatically in Cherwell using an existing Azure connection. You need to select an Azure resource to store and market research raw data. Which resource should you choose? - Azure Table Storage - Azure Data Lake Storage Gen2 (ADLSG2) - Azure Managed Disks - Azure Cosmos DB

- Azure Table Storage - Azure Data Lake Storage Gen2 (ADLSG2) - Azure Managed Disks - Azure Cosmos DB

You have a Synapse serverless SQL pool named pool1 to read parquet files stored in a Data Lake Store Gen 2 named census. The parquet files are stored in a container named censusdata and in the prefix/population. Pool1 has a scoped credential named census_credential configured with a managed identity. You plan to create an external table ith the following query: CREATE EXTERNAL TABLE demographic_external_table ( censusYear as INT, cityName as VAR_CHAR (120), stateName as VAR_CHAR (80), population as INT) WITH( LOCATION = '/population/', DATA_SOURCE = census_datasource, FILE_FORMAT = population_file_format ) You need to create the file format and data source for the external table. Which two queries should you run before creating the external table? Each correct answer presents part of the solution. - CREATE EXTERNAL FILE FORMAT population_file_format WITH ( FORMAT_FILE = DELIMITEDTEXT) - CREATE EXTERNAL DATA SOURCE census_datasource WITH ( LOCATION = 'https://census.dfs.core.windows.net/censusdata/population',TYPE = HADOOP, CREDENTIAL = census_credential) - CREATE EXTERNAL FILE FORMAT population_file_format WITH ( FORMAT_FILE = PARQUET) - CREATE EXTERNAL DATA SOURCE census_datasource WITH ( LOCATION = 'https://census.azuredatalakestore.net/webdfs/v1',TYPE = HADOOP, CREDENTIAL = census_credential)

- CREATE EXTERNAL DATA SOURCE census_datasource WITH ( LOCATION = 'https://census.dfs.core.windows.net/censusdata/population',TYPE = HADOOP, CREDENTIAL = census_credential) - CREATE EXTERNAL FILE FORMAT population_file_format WITH ( FORMAT_FILE = PARQUET) You should run the above two queries before creating the external table. You should create the external data source using the above https link. This location creates a data source for a native external table pointing to the censusdata container in the census storage account. You should use a URL with the https://<storage>.dfs.core.windows.net/<container> format to connect to a Data Lake Store Gen2. You should not create the external data source using the HADOOP type. You can create two types of external tables in Azure Synapse Analytics: Hadoop and native external table. The external table type depends on the external data source created and the SQL pool used. A serverless SQL pool only supports native external tables, so you need to omit the TYPE = HADOOP to create the external table for pool1. You should not the external file format with DELIMITEDTEXT as the format type. You should use this format type to read data stored in comma-separated values (CSV) files. References - https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables?tabs=native https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=azure-sqldw-latest&preserve-view=true&tabs=dedicated https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=azure-sqldw-latest&tabs=delimited

You work as a data engineer for a company that uses multiple data storage solutions. You plan to use Azure Data Factory to copy data between the different data storages using a Copy activity. The copy activity should copy the files in binary format and preserve the files' metadata. You plan to copy data between the following data storages: - Marketing: Data stored in ADLS G2 - Sales: Data stored in an Amazon S3 bucket - Legal: Data stored in Azure File Storage - IT: Data stored in Azure Blob Storage You need to identify which combinations of data storage can you use as sources and sinks in the Copy activity. Which three sources and sinks you should use? (Can be multiple) - Copy data from Legal to IT - Copy data from Legal to Sales - Copy data from Sales to Marketing - Copy data from Marketing to Legal - Copy data from IT to Sales

- Copy data from Legal to IT - Copy data from Sales to Marketing - Copy data from Marketing to Legal You should be able to copy data and preserve file metadata between the above data storages. All these data storage solutions support preserving files metadata during the Copy activity as a data source. To preserve file metadata, you should enable the Attributes option from the Preserve configuration. You should not use the Sales data storage as a sink to copy data. A sink storage represents the destination where the data is moving to. Amazon S3 buckets are only supported as a source for a Copy activity. References - https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-preserve-metadata https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview

You have an Azure data warehouse solution as shown in the exhibit. You need to implement a relationship between FactSales and DimCalendar tables in such a way that users can filter the sales amount based on sales date. Which should you do? - Create a one to many relationship between the [Date] column of the DimCalendar table and the [ShipDate] column of the FactSales table. - Create a one to many relationship between the [Date] column of the DimCalendar table and the [SaleDate] column of the FactSales table. - Create a many to one relationship between the [ShipDate] column of the FactSales table and the [Date] column of the DimCalendar table. - Create a one to many relationship between the [Date] column of the DimCalendar table and the [OrderDate] column of the FactSales table.

- Create a one to many relationship between the [Date] column of the DimCalendar table and the [SaleDate] column of the FactSales table. You should create a one to many relationship between the [Date] column of the DimCalendar table and the [SaleDate] column of the FactSales table. The Azure data warehouse shown in the exhibit is an example of star schema. DimCalendar represents the one side of the relationship while FactSales can have multiple entries for the same sales date. In order to filter the sales amount based on sales date, you need to choose the [SaleDate] column from the FactSales table and connect it with [Date] column of the DimCalendar table. You should not create a many to one relationship between the [ShipDate] column of the FactSales table and the [Date] column of the DimCalendar table. FactSales represents the many side of the relationship while DimCalendar can only have one entry for the same ship date. This filters out the sales amount based on the [Ship Date] column. You should not create a one to many relationship between the [Date] column of the DimCalendar table and the [OrderDate] column of the FactSales table. This filters out the sales amount based on the order date from the FactSales table. You should not create a one to many relationship between the [Date] column of the DimCalendar table and the [ShipDate] column of the FactSales table. This filters out the sales amount based on the ship date from the FactSales table. References - https://learn.microsoft.com/en-us/training/modules/design-multidimensional-schema-to-optimize-analytical-workloads/2-design-star-schema https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-create-and-manage-relationships

CASE: You are building a modern data warehouse solution for your company. Consumer and sales data is currently stored in an Azure SQL Database, while the product catalog is maintained in a Cosmos DB backend. The marketing department also has access to the market research data, published weekly by a third-party vendor on their Amazon S3 storage in CSV format. The Marketing team wants to use PowerBI as a reporting tool and query against a single consolidated dataset in Azure Synapse Analytics. You have internal data scientists who can help with the data transformation and consolidation. Your company is using Cherwell as its service desk platform. You establish a bi-directional connection between your Azure subscription and Cherwell using a connector from the Azure Marketplace, but you have not used it yet. The new data warehouse solution must meet the following business requirements: - Unauthorized users should not be able to see the contact details of consumers in Azure Synapse Analytics. - System owners want to enforce the data retention policy and every month delete consumers who were not active for more than two years. The new Data warehouse solution must meet the following technical requirements: - Market research data must be copied over to Azure and retained in its original format, storing files in a year and month-based hierarchal structure. - Incidents with Azure Synapse Analytics based on Azure Monitor alerts need to be logged automatically in Cherwell using an existing Azure connection. You need to enable the required protection of consumer contact details in Azure Synapse Analytics. What should you do? - Create a secret in Azure Key Vault - Enable Dynamic data masking (DDM). - Enable Transparent data encryption (TDE). - Enable Row-Level Security (RLS).

- Create a secret in Azure Key Vault - Enable Dynamic data masking (DDM). - Enable Transparent data encryption (TDE). - Enable Row-Level Security (RLS).

You work as a data engineer. You manage a SQL Sever Managed Instance named Database1. You need to build an external table in Database1 named Dim.Product from an oracle database using Polybase. Which three steps should you perform in sequence? - Create external file format - Create master key encryption - Create database scoped credential - Create external table Dim.Product - Create external data source

- Create database scoped credential - Create external data source - Create external table Dim.Product You first need to create a database scoped credential. This allows you to store an identity and a password to connect to an external data source. You should then create an external data source. You can define a location and credential for an external data source to be used for this table. You should not use create external file format. This allow you to define external file format, it is required when you external table needs to refer to Hadoop files. You should not use create master key encryption. This allows you to create a Master encryption key. References - https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-ver15&tabs=dedicated https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-as-select-transact-sql?view=sql-server-linux-ver16&viewFallbackFrom=azure-sqldw-latest https://learn.microsoft.com/en-us/sql/relational-databases/polybase/polybase-guide?view=sql-server-ver15

CASE: You are implementing a data warehouse in an Azure Synapse Analytics dedicated SQL pool. You plan to use a relational star schema to import the tables shown in the exhibit. The data warehouse needs to store more than 10,000 products and more than 100,000 sales orders data from the last five years. You plan to use the fact tables for large aggregations involving the sale quantity and revenue. You need to decide how to implement the dimensions and fact tables with optimal performance. Which two actions should you perform to meet the goal? Each correct answer presents part of the solution. - Recreate the foreign keys in the data warehouse to ensure table constraints. - Denormalize the order_items and orders tables to create the Sales fact table and add a clustered index. - Denormalize the order_index and orders tables to create the Sales fact table and add a columnstore index. - Denormalize the products, category, and sub_category tables to create the Products dimension.

- Denormalize the order_index and orders tables to create the Sales fact table and add a columnstore index. - Denormalize the products, category, and sub_category tables to create the Products dimension. You should denormalize the products, category, and sub_category tables to create the Products dimension. In relational star schema, you need to denormalize tables that represent business entities to create the dimension tables. If you keep the dimension table normalized, this would result in a snowflake schema instead. You should also denormalize the order_items and orders tables to create the Sales fact table and add a column-store index. You should denormalize the tables to create the fact table to complete the star schema and include a column-store index. Using a column store index significantly improves queries that involve aggregation in large fact tables, which is a really common feature of these tables. You should not recreate the foreign keys in the data warehouse to ensure tables constraints. Foreign keys are not supported in dedicated SQL pools. Foreign keys are also discarded in data warehouses because they might negatively affect query performance, especially in insert operations. You should not denormalize the order_items and orders tables to create the Sales fact table and add a clustered index. Although you need to denormalize the table to create the Sales fact table, for analytical workloads you should use a column-store index instead. References - https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview https://learn.microsoft.com/en-us/training/modules/design-multidimensional-schema-to-optimize-analytical-workloads/2-design-star-schema https://learn.microsoft.com/en-us/training/modules/design-multidimensional-schema-to-optimize-analytical-workloads/ https://learn.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview?view=sql-server-ver15

CASE: Company A develops inventory management software. Its flagship product allows employees to scan product barcodes in stores and have that data delivered to a central repository. A website allows supervisors to view the data sent to the repository. Inventory data consists of the following fields: - UPC - Name - Description - Quantity - Store ID - Aisle Number - Price - Expiration Date Inventory Data is currently store in two Microsoft SQL Server databases. One database resides in California, and the other database resides in New York. Over 200 TB of total data is store across two databases. The scanners submit inventory data to an application server over HTTPS. A service on the application server then analyzes the data and sends it to the databases. Th new solution must allow processing of the inventory data in batches every hour. After the data is processed, it must be kept for at lest two years. It must be stored in such a way that parallel queries can be run against the data. Business stakeholders must be able to graphically visualize the data without writing any code. Data engineers must be able to geographically visualize the data by using python. The data engineers at your company are familiar with C#, Python, and SQL. Any recommended solution must take advantage of their existing skills. You need to design a solution to allow data engineers to visualize the data. Which resource should you use? - Power BI - Stream Analytics - Azure Data Lake - Azure Databricks

- Power BI - Stream Analytics - Azure Data Lake - Azure Databricks

You implement an Azure SQL Database managed instance to support a business-critical application. You must design and implement disaster recovery strategy to ensure operations in case of a regional failure in the primary region. Your solution must support automatic failover configured through user-defined policy. You need to identify the best solution to meet your requirements. What should you recommend. - Geo-redundant storage (GRS) - Failover group - SQL Server replication - Geo-replication

- Failover group You should recommend a failover group. You can use a failover group to replicate a managed instance to geographically diverse locations, protecting against regional failure or unavailability. Failover can be manual or can be automatic through a user defined policy. Secondary instances support read-only access. You should not recommend geo replication. Even though failover group is based on the same underlying technology as geo replication, geo replication does not support Azure SQL Database managed instances. You should not recommend SQL Server replication. This does provide for replicated instances of SQL Server, but it does not provide for automated failover in case of failure. You should not recommend GRS. GRS provides disaster recovery for Azure storage but it does not replicate the SQL Sever instance. References - https://learn.microsoft.com/en-us/azure/azure-sql/database/auto-failover-group-sql-db?view=azuresql&tabs=azure-powershell https://learn.microsoft.com/en-us/azure/azure-sql/database/active-geo-replication-overview?view=azuresql https://learn.microsoft.com/en-us/azure/azure-sql/database/replication-to-sql-database?view=azuresql

You are setting up storage support for an application that uses Azure Data Lake Storage Gen 2. A primary concern is disaster recovery. You will need to implement a disaster recovery solution that helps to ensure recovery after a catastrophic event resulting in widespread failure throughout a region. What should you use? - Geo - replication - Geo - redundant storage (GRS) - Zone - redundant storage (ZRS) - Locally - redundant storage (LRS)

- Geo - redundant storage (GRS) You should implement GRS. With GRS, data remains available even if an entire datacenter becomes unavailable or if there is widespread regional failure. Alternately you could implement read-access geo-redundant storage (RA-GRS), which provides read-access to the data in alternate locations. You should not implement LRS or ZRS. LRS does not provide protection in case of a zone or regional failure. ZRS protects against zone failure but not regional failure. You should not implement geo-replication. This is a feature of Azure SQL Database and is not supported by Azure Data Lake Storage Gen2. Geo - replication replicates a full SQL Server instance to another location. References - https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction https://learn.microsoft.com/en-us/azure/storage/common/storage-redundancy https://www.skylinesacademy.com/blog/2019/7/31/azure-storage-replication https://learn.microsoft.com/en-us/azure/azure-sql/database/active-geo-replication-overview?view=azuresql

You are creating a solution that uses Azure Databricks to process data pipelines as Spark jobs. These jobs need to retrieve the data from an Azure Data Lake Storage Gen 2. You have the following requirements to deliver the data that is processed by these Spark jobs: - The data schema definition should be stored with the data. - Data compression and querying optimization should be supported. You need to use the best data format for this solution. Which data format would you use? - Parquet files - Zip files - JSON Files - CSV files

- Parquet files You should use parquet files. A parquet file includes the schema definition together with the data itself. It also supports data compression mechanisms that result in best query performance from Spark jobs when compared with JSON or CSV files. You should not use JSON or CSV files. These file formats do not support data compression or schema definition together with the data. You need to define the data schema separately or infer the schema during runtime, which results in suboptimal query performance. You should not use zip files. Spark does not support zip files as a compression method. You need to include additional steps before decompressing the data from zip files. References - https://learn.microsoft.com/en-us/azure/databricks/external-data/parquet https://learn.microsoft.com/en-us/azure/databricks/external-data/csv https://learn.microsoft.com/en-us/azure/databricks/external-data/json https://learn.microsoft.com/en-us/azure/databricks/external-data/zip-files

CASE: Company A develops inventory management software. Its flagship product allows employees to scan product barcodes in stores and have that data delivered to a central repository. A website allows supervisors to view the data sent to the repository. Inventory data consists of the following fields: - UPC - Name - Description - Quantity - Store ID - Aisle Number - Price - Expiration Date Inventory Data is currently store in two Microsoft SQL Server databases. One database resides in California, and the other database resides in New York. Over 200 TB of total data is store across two databases. The scanners submit inventory data to an application server over HTTPS. A service on the application server then analyzes the data and sends it to the databases. Th new solution must allow processing of the inventory data in batches every hour. After the data is processed, it must be kept for at lest two years. It must be stored in such a way that parallel queries can be run against the data. Business stakeholders must be able to graphically visualize the data without writing any code. Data engineers must be able to geographically visualize the data by using python. The data engineers at your company are familiar with C#, Python, and SQL. Any recommended solution must take advantage of their existing skills. You need to design a solution for storing the initial inventory. Which resource should you use? - Power BI - Event Hub - Azure SQL Data Warehouse - Azure Data Lake

- Power BI - Event Hub - Azure SQL Data Warehouse - Azure Data Lake

You are a data engineer for your company. You create an Azure synapse analytics table that uses round-robin distribution. You write the following query to retrieve store sales data: SELECT S.[Amount] AS [Sales], ST.[Name] From [FactSalesByStore] AS S JOIN [DimStore] AS ST on S.[StoreId] = ST.[StoreId] This query returns over 200,000 records, and it runs slowly. There are over 50,000 stores. You need to improve the performance of this query. Which of the following are true? (can be multiple) - Using hash distribution on StoreId for FactSalesByStore improves the performance of this query. - Using hash distribution on StoreId for DimStore improves the performance of the query. - Using replicated distribution for DimStore improves the performance of the query. - Using an outer join instead of an inner join improves the performance of the query.

- Using hash distribution on StoreId for FactSalesByStore improves the performance of this query. - Using hash distribution on StoreId for DimStore improves the performance of the query. - Using replicated distribution for DimStore improves the performance of the query. Using hash distribution on StoreId for FactSalesByStore improves the performance of the query. Hash distribution share data across compute nodes by placing all data that uses the same hash key on the same compute node. Using hash distribution on StoreId for DimStore improves the performance of the query. Hash distribution share data across compute nodes by placing all data that uses the same hash key on the same compute node. Using replicated distribution for DimStore improves the performance of the query. Replicated distribution copies the same data to all the compute nodes. This is useful when you want to read data from small fact tables. Using an outer join instead of an inner join does not improve the performance of the query. This would return more rows than necessary, and it would therefore not improve performance. References - https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute https://learn.microsoft.com/en-gb/training/paths/realize-integrated-analytical-solutions-with-azure-synapse-analytics/

You have a table in Azure Synapse Analytics dedicated SQL pool to represent the product categories for a retail company. The table was created using the following SQL Query: CREATE TABLE dbo.[DimProductCategory] ( [CategoryKey] [int] IDENTITY (1,1) NOT NULL, [CategoryID] [int] NOT NULL, [Name] [nvarchar] (25) NOT NULL, [EnglishDescription] [nvarchar] (256) NULL, [SpanishDescription] [nvarchar] (256) NULL) You need to add a column to the table to create a hierarchy of categories by using a self-referencing relationship. You should be able to filter all top-level categories where the parent category is defined as null. Which column should you add to the table? - [ParentCategoryId] [int] NOT NULL - [ParentCategoryId] [int] NULL - [ParentName] [nvarchar] (25) NULL - [ParentCategoryKey] [int] NULL

- [ParentCategoryKey] [int] NULL You should add the ParentCategoryKey column. The category key represents a surrogate key for this dimension table. Creating the self-referencing relationship based on the surrogate key guarantees that changes in the business key (CategoryId) do not interfere with the dimension hierarchy. You should not add the ParentCategoryId column. The category id represents the business key that the dimension table is based on. This would not guarantee that a possible change in the business key would not interfere with the dimension hierarchy. You should not add the ParentName column. The category name is not ia key field to create a self - referencing relationship. It might result in incorrect hierarchies in case a category has the same name. You should not add the ParentCategoryId column with a NOT NULL constraint. Despite the fact that the category id is not using the surrogate key, the not null constraint will remove the possibility to define a top level category as null, like a category that does not have a parent. References - https://learn.microsoft.com/en-us/analysis-services/multidimensional-models/parent-child-dimension?view=asallproducts-allversions

You are designing the folder structure for an Azure Data Lake Storage Gen2 container for a multi-national company. The container data will be queried from Azure Databricks and Azure Synapse Analytics serverless SQL pools for different businesses subjects. Each branch of the company needs to access the data from its own region. The queries usually include a range of dates to retrieve data from the current year or current month. You need to simplify folder security by region and subject, as well as optimize the queries performance. Which folder structure should you recommend? -/{region}/{subject}/{YYYY}/{MM}/{DD}/{subject}_{YYYY}{MM}{DD}.csv - {YYYY}/{MM}/{DD}//{region}/{subject}/{subject}_{YYYY}{MM}{DD}.csv -/{region}/{subject}/{DD}/{MM}/{YYYY}/{subject}_{YYYY}{MM}{DD}.csv -{subject}/{YYYY}/{MM}/{DD}/{region}/{subject}_{YYYY}{MM}{DD}.csv

-/{region}/{subject}/{YYYY}/{MM}/{DD}/{subject}_{YYYY}{MM}{DD}.csv This folder structure allows you to apply for POSIX security permissions at the region's tope level of the container, making it easy to grant permissions for each branch data. Partitioning the data from the subject and date from the year, month, and day sequence allow the queries to perform better nu requesting only the data they need. The other folder options make it harder to apply folder level security for region data because you need to apply for the security permission multiple times for the data partitions. References - https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices Optimize Azure Data Lake Storage Gen2 for performance

You have an Azure SQL Managed Instance. You plan to create a system - managed temporary table to manage record changes over time. You need to build the query to create the table. How should you complete the query? CREATE TABLE Department ( [Id] INT NOT NULL PRIMARY KEY CLUSTERED, [Name] VARCHAR(50) NOT NULL, [MangerId] INT NULL, [ParentId] INT NULL, [ValidFrom] DATETIME2 ______________ NOT NULL, [ValidTo] DATETIME2 _____________ NOT NULL, ____________________________________________) WITH(SYSTEM_VERSIONING = ON); 1. - DEFAULT CURRENT TIMESTAMP - GENERATED ALWAYS AS ROW END - GENERATED ALWAYS AS ROW START 2. - DEFAULT CURRENT TIMESTAMP - GENERATED ALWAYS AS ROW END - GENERATED ALWAYS AS ROW START 3. - [isCurrent] bit NOT NULL - PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)

1. GENERATED ALWAYS AS ROW START 2. GENERATED ALWAYS AS ROW END 3. PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo) You should define the date range columns with ValidFrom as GENERATED ALWAYS AS ROW START and ValidTo as GENERATED ALWAYS AS ROW END, with the PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo) constraint as SYSTEM_VERSIONING = ON to create a system-managed temporal table. A temporal table is designed to keep a full history of data changes and to allow easy point-in-time analysis. Every operation that changes the table data is stored in a historical table with the date range referred to that row. Using the ValidFrom and ValidTo columns respectively as start and end date results in the expected range for a temporal table. You should not define the date range fields with DEFAULT CURRENT_TIMESTAMP or add a [isCurrent] bit NOT NULL field. This table structure will not result in a system-managed temporal table. As similar structure might be used if you are creating a Type 2 Slow Changing Dimension (SCD) to represent this business entity in your data warehouse, with some different concepts like the historical values are stored in the same table but in different rows. References https://learn.microsoft.com/en-us/sql/relational-databases/tables/creating-a-system-versioned-temporal-table?view=sql-server-ver15 https://learn.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15 https://learn.microsoft.com/en-us/training/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types

You work as an Azure Data Engineer. You need to optimize the output files for the following workloads using Azure data factory. Which file format should you use for each workload? Analytical workloads optimized for heavy reads. ________________ Transactional workloads optimized for heavy writes. _______________ - Parquet - Avro - Txt

1. Parquet 2. Avro You should use the Parquet file for analytical workloads optimized for heavy reads. The parquet format is optimized for analytical workloads. Parquet file stores data in a columnar format, which is optimized for heavy reads. You should use the Avro file format for transactional workloads optimized for heavy writes. Avro is optimized for transactional workloads and stores data in JSON format, which is optimized for heavy writes. You should not use the Txt file format. Txt file format is not supported by Azure Data factory copy activity task. References - https://learn.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs https://learn.microsoft.com/en-us/azure/databricks/external-data/parquet https://learn.microsoft.com/en-us/azure/databricks/external-data/avro https://blog.clairvoyantsoft.com/big-data-file-formats-3fb659903271?gi=c226e22f6207

You work as an Azure Data Engineer. You have a data warehouse solution as shown in the exhibit. The data warehouse solution shown in the exhibit is an example of ________________________ The [Date] column in the DimCalendar table is connected to the [OrderDate] column of the FactOrder table using a ___________________________ 1. - snowflake scema - star schema 2. - many to one relationship - one to many relationship - one to one relationship

1. Star Schema 2. one to many relationship The data warehouse shown in the exhibit is an example of star schema. Star schema organizes tables in the form of dimensions and facts. Facts are directly connected to dimensions using relationships. In the star schema, the lowest level of dimensional attribute key is connected to the fact table using a common key. The [Date] column of the DimCalendar table is connected to the [OrderDate] column of the FactOrder table using a one to many relationship. The DimCalendar table represents one side of the relationship, while the FactOrder table can have multiple entries for the same date. In one to many relationships, the column in one side of the relationship can have only one instance of attribute value while the many side table can have multiple attribute values. You should not select the snowflake schema. A snowflake schema is a more normalized form of dimensional modeling. In a snowflake schema, dimensions are further normalized, and only the lowest level dimensions are connected to the fact table, while other related dimensions are interconnected using a common key. You should select the one to one relationship. In this kind of relationship, the columns on both sides can have only one instance of attribute value, meaning both sides only contain one instance attribute value. You should not select a many to one relationship. In this kind of relationship, the column in the many side of the relationship can have more than one instance of attribute value, while the other related table can have only one instance of that attribute value. References - https://learn.microsoft.com/en-us/training/modules/design-multidimensional-schema-to-optimize-analytical-workloads/2-design-star-schema https://learn.microsoft.com/en-us/training/modules/design-multidimensional-schema-to-optimize-analytical-workloads/ https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-create-and-manage-relationships

CASE: You are a data architect for your company. Your company manages data from customers all over the world. Each customer supplies your company with data in a variety of formats. Your company must transform the data after it receives it. The total size of all customer data is under one petabyte (PiB). You need to recommend a data storage solution for customer data. Solution: You recommend Azure Blob storage. Does this solution meet the goal? Y or N

Yes

CASE: You create a blob storage account. You must be able to administer the storage account via the REST API. You need to provide secure access to the storage account. Solution: You use the secondary access key. Does this solution meet the goal? Y or N

Yes

You are a data engineer for your company. You create an Azure SQL Data Warehouse instance to house data for 30 stores. You want to improve performance when accessing store data. The size of the data is less than 200 megabytes (MB). You need to write a TSQL statement to generate the store table. CREATE TABLE [dbo].[Store]( [StoreId] [varchar](20), [StoreName] [varchar](200), [StoreNumber] [int] , [StoreLocation] [varchar] (200) ) WITH ( CLUSTERED COLUMNSTORE INDEX , A = B ) A: - DISTRIBUTION - PARTITION B: - HASH - REPLICATE -ROUND ROBIN

DISTRIBUTION, REPLICATE You should use the following statement: CREATE TABLE [dbo].[Store]( [StoreId] [varchar](20), [StoreName] [varchar](200), [StoreNumber] [int] , [StoreLocation] [varchar] (200) ) WITH ( CLUSTERED COLUMNSTORE INDEX , DISTRIBUTION = REPLICATE ) This creates a replicated table. A replicated table is copied across all the compute nodes in a data warehouse. This improves the performance of queries for data in small table. In this scenario, the store table is small. It is less than 200 MB. You should not use the PARTITION table option. This creates a partitioned table. In this scenario, you should create a replicated table. You should not specify HASH as the distribution type. This uses hash distribution. Data is sharded across compute nodes by a column that you specify. You should not specify ROUND_ROBIN as the distribution type. This creates a round robin distributed table. A round-robin distribution shards data evenly. Query performance is better when using hash distribution. Reference - https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute https://learn.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-2017 https://learn.microsoft.com/en-gb/training/paths/realize-integrated-analytical-solutions-with-azure-synapse-analytics/

You are a data engineer for your company. You create an Azure SQL Data Warehouse instance to house inventory data for 1000 stores, starting with 100 GB of data and growing 5 GB every day. Each store has a distinct store number. You want to distribute the store data and ensure that the data is evenly distributed across compute nodes. You need to write a TSQL statement to generate the store table. How should you complete the statement? CREATE TABLE [dbo].[Store]( [StoreId] [ varchar](20), [StoreName][varchar](200), [StoreNumber][int]], [StoreLocation][varchar](200) ) WITH ( CLUSTERED COLUMNSTORE INDEX, A=B ) A: - DISTRIBUTION - PARTITION B: -HASH (StoreNumber) -REPLICATE -ROUND_ROBIN

DISTRIBUTION, ROUND ROBIN You should use the following statement: CREATE TABLE [dbo].[Store]( [StoreId] [ varchar](20), [StoreName][varchar](200), [StoreNumber][int]], [StoreLocation][varchar](200) ) WITH ( CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = ROUND_ROBIN ) This creates a round-robin-distributed table. Around robin distribution distributes data evenly across compute nodes. You should not use the PARTITION table option. This creates a partitioned table. In this scenario, you should create a replicated table. You should not specify REPLICATE as the distribution type. This uses replicated distribution which copies the same data across compute nodes. This is beneficial for small tables. You should not specify HASH (StoreNumber) as the distributed type. This creates a hash distributed table that uses StoreNumber as the distribution column. This allows the table to be distributed by the store. Because store number is unique per store and because there are 1000 stores, this would not guarantee that data is distributed evenly. References - https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute https://learn.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-2017 https://learn.microsoft.com/en-gb/training/paths/realize-integrated-analytical-solutions-with-azure-synapse-analytics/

You are a data engineer for your company. You create an Azure Synapse Analytics instance to house inventory data for 60 stores. There are approximately 10,000 products among the stores, with each distinct type of product having a different stock keeping unit (SKU). The inventory data will be captured ever Friday and these updates cause massive changes to inventory data. After the inventory, each store reads their inventory position for all SKUs. You need to write a T-SQL statement to generate the table. How should you complete the statement? CREATE TABLE [dbo].[StoreInventoryData]( [Sku] [varchar] (30) , [Price] [decimal] (38, 0) , [StoreId] [varchar] (20), [Quantity] [int] , [Date] [datetime] ) WITH ( CLUSTERED COLUMNSTORE INDEX , A = B ( C ) A : - DISTRIBUTION - PARTITION B : - HASH - REPLICATE - ROUND_ROBIN C: - Sku - StoreId

Distribution, Hash, StoreId You should use the following statement CREATE TABLE [dbo].[StoreInventoryData]( [Sku] [varchar] (30) , [Price] [decimal] (38, 0) , [StoreId] [varchar] (20), [Quantity] [int] , [Date] [datetime] ) WITH ( CLUSTERED COLUMNSTORE INDEX , DISTRIBUTION = HASH (StoreId) ) This creates a hash-distributed table that uses StoreId as the distribution column. This allows the table to be distributed by each store. Parallel queries can be run for different stores on different compute nodes. You should not use Sku as the distributed column. This would distribute data across every SKU, making parallel queries that read the inventory position for all SKUs in a given store slow. You should not use the PARTITION table option. this creates a partitioned table. In this scenario, you should create a distributed table. You should not specify REPLICATE as the distribution type. This uses replicated distribution which copies the same data across compute nodes. This is beneficial for small tables. You should not specify ROUND_ROBIN as the distribution type. This creates a round-robin distributed table. A round robin distribution shard data evenly. Performance is better when using hash distribution. Reference - https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute https://learn.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-2017 https://learn.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-2017

You are creating table for Azure Synapse Analytics. You need to choose the appropriate types of tables for various needs. Which table option should you use in the following cases? - Large fact table - Staging table with unknown data - Dimension table with size of 1 GB - Table with queries that scan a data range

Large fact table = Hash-distributed Staging table with unknown data = Round-robin Dimension table with size of 1 GB = Replicated Table with queries that scan a data range = Partitioned For a large fact table, you should create a hash-distributed table. Query performance improves when the table is joined with a replicated table or with a table that is distributed on the same column. This avoids data movement. For a staging table with unknown data, you should create a round-robin distributed table. The data will be evenly distributed across the nodes, and no distribution column needs to be chosen. For a small dimension table, you should use a replicated table. No data movement is involved when the table is joined with another table on any column. For a table that has queries scan a date range, you should create a partitioned table. Partition elimination can improve the performance of the scans when the scanned range is a small part of the table. You should not create a temporary table. A temporary table is only visible to the session that creates it and will be deleted when the session ends. References - https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/design-guidance-for-replicated-tables https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-temporary

CASE: You are a data architect for a grocery delivery company. The development team is building an app that allows grocery stores to deliver groceries. Individual grocery stores sell different types of grocery products. You want to implement a solution that does not restrict the product attributes that are used by each grocery store. You want to access product information by using OData queries. You need to create the appropriate data store. Solution: You create an Azure SQL Data Warehouse. Does this solution meet the goal?

No

CASE: You are a data architect for a grocery delivery company. The development team is building an app that allows grocery stores to deliver groceries. Individual grocery stores sell different types of grocery products. You want to implement a solution that does not restrict the product attributes that are used by each grocery store. You want to access product information by using OData queries. You need to create the appropriate data store. Solution: You create an Azure SQL Database. Does this solution meet the goal?

No

CASE: You are a data architect for a polling company. Each pollster submits data over Advanced Message Queuing Protocol (AMQP). You want to retrieve the data in real time so that you can extract relevant information, transform it, and then send it to Power BI. You need to implement the solution. Solution: You perform the following steps: - Create an Azure Databricks instance. - Create an Azure Automation runbook that extracts and queries data from Databricks. Does this solution meet the goal? Y or N

No

CASE: You are a data architect for a polling company. Each pollster submits data over Advanced Message Queuing Protocol (AMQP). You want to retrieve the data in real time so that you can extract relevant information, transform it, and then send it to Power BI. You need to implement the solution. Solution: You perform the following steps: - Create an Azure Relay service. - Create an Azure Function app that extracts and queries data from Azure Relay. Does this solution meet the goal? Y or N

No

CASE: You are a data architect for your company . You plan to import data into Azure SQL Data Warehouse. You want to import a dimension table named Manufacturer that has about 200 rows. The size of the table is about 100 KB. 90 percent of the manufacturers are located in the same postal code. This table will be used in joins for most queries . You need to recommend a partition strategy for the table. Solution: You use a hash-distributed table with the column that represents the postal code. Does this solution meet the goal? Y or N

No

CASE: You are a data architect for your company . You plan to import data into Azure SQL Data Warehouse. You want to import a dimension table named Manufacturer that has about 200 rows. The size of the table is about 100 KB. 90 percent of the manufacturers are located in the same postal code. This table will be used in joins for most queries . You need to recommend a partition strategy for the table. Solution: You use round-robin distribution. Does this solution meet the goal? Y or N

No

CASE: You are a data architect for your company. Your company manages data from customers all over the world. Each customer supplies your company with data in a variety of formats. Your company must transform the data after it receives it. The total size of all customer data is under one petabyte (PiB). You need to recommend a data storage solution for customer data. Solution: You recommend Azure Table storage. Does this solution meet the goal? Y or N

No

CASE: You are a data architect. You are designing a data solution to manage statistics for a world-wide sports league. You plan to store the data in an Azure table. Every team in the league has a unique name, and each team is part of a division. There are a total of four divisions. The statistics keep track of the winds and losses for each team in the division. Sample data is show below. The production table will contain over 4 million rows. You need to recommend an appropriate partition key. Solution: You choose Team as the partition key. Does this solution meet the goal? Y or N

No

CASE: You are a data engineer for an autonomous vehicle manufacturer. Each vehicle contains a transmitter that submits sensor data over Advanced Message Queuing Protocol (AMQP). You want to retrieve the sensor data in real time so that you can extract relevant information, transform it, and then send it to Power BI. You need to implement the solution. Solution: - Create an Azure Databricks instance. - Create an Azure Automation runbook that extracts and queried data from databricks Does this solution meet the goal? Y or N

No

CASE: You are a data engineer for an autonomous vehicle manufacturer. Each vehicle contains a transmitter that submits sensor data over Advanced Message Queuing Protocol (AMQP). You want to retrieve the sensor data in real time so that you can extract relevant information, transform it, and then send it to Power BI. You need to implement the solution. Solution: - Create an Azure Relay service. - Create an Azure Function app that extracts and queries data from Azure Relay. Does this solution meet the goal?

No

CASE: You are a data engineer. You are developing a data ingestion solution that ingests data from large pipe-delimited text files in an Azure Data Lake Storage account to Azure Synapse Analytics. You need to load the data. Solution: - Create an Azure Cosmos DB account and a linked server. - Create an external table that points to the Azure Cosmos DB account. - Load the data by running the BULK IMPORT statement Does this solution meet the goal? Y or N

No

CASE: You are a data engineer. You are developing a data ingestion solution that ingests data from large pipe-delimited text files in an Azure Data Lake Storage account to Azure Synapse Analytics. You need to load the data. Solution: - Create an Azure Databricks account and a linked server. - Create an external table that points to the Azure Databricks account. - Load the data by running the dbutils.fs.cp command. Does this solution meet the goal? Y or N

No

CASE: You are designing a real-time processing solution using Azure Steam Analytics. You need to ingest data from IoT sensors installed in driverless vehicles. You also need to support a 200 MB reference data input to correlate related static values. Solution: You use Azure Event Hubs for stream data input and Azure Cosmos DB for reference input. Does this solution meet the goal? Y or N

No

You are implementing a data engineering solution for your company. You plan to use Azure Table Storage to store receipts. Each receipt contains a date, a category, and a unique number. Over 50 percent of the receipts have one of the 10 most popular categories. Approximately five percent of the receipts have the same date. You need to define the row key statements, select Yes if you should perform the task. Otherwise, select No. Specify the date as a partition key? Specify the category as a partition key? Specify the unique number as the row key?

Specify the date as a partition key? - No Specify the category as a partition key? - Yes Specify the unique number as the row key? - Yes You should not specify the date as a partition key. A partition key allows you to create logical partitions for data. Values that have the same partition key share the same logical partition. Partition keys improve performance by allowing data to be scaled out horizontally. You should specify the category as a partition key. All receipts that have the same category share the same logical partition. When you query receipts by category, you can be sure that they all are retrieved from the same partition, which improves query performance. Because 50 percent of receipts are for the 10 most popular categories, this improves performance because most queries will be performed against those receipts. You should specify the unique number as the row key. A row key uniquely identifies an entity. References - https://learn.microsoft.com/en-us/rest/api/storageservices/designing-a-scalable-partitioning-strategy-for-azure-table-storage

CASE: You are a data architect for a grocery delivery company. The development team is building an app that allows grocery stores to deliver groceries. Individual grocery stores sell different types of grocery products. You want to implement a solution that does not restrict the product attributes that are used by each grocery store. You want to access product information by using OData queries. You need to create the appropriate data store. Solution: You create an Azure Cosmos DB account with the Table API. Does this solution meet the goal?

Yes

CASE: You are a data architect for a grocery delivery company. The development team is building an app that allows grocery stores to deliver groceries. Individual grocery stores sell different types of grocery products. You want to implement a solution that does not restrict the product attributes that are used by each grocery store. You want to access product information by using OData queries. You need to create the appropriate data store. Solution: You create an Azure Table. Does this solution meet the goal?

Yes

CASE: You are a data architect for a polling company. Each pollster submits data over Advanced Message Queuing Protocol (AMQP). You want to retrieve the data in real time so that you can extract relevant information, transform it, and then send it to Power BI. You need to implement the solution. Solution: You perform the following steps: - Create an Event Hub instance. - Create a Stream Analytics job that uses a query to extract data. Does this solution meet the goal? Y or N

Yes

CASE: You are a data architect for a polling company. Each pollster submits data over Advanced Message Queuing Protocol (AMQP). You want to retrieve the data in real time so that you can extract relevant information, transform it, and then send it to Power BI. You need to implement the solution. Solution: You perform the following steps: - Create an IoT Hub instance. - Create a Stream Analytics job that uses a query to extract data. Does this solution meet the goal? Y or N

Yes

CASE: You are a data architect for your company . You plan to import data into Azure SQL Data Warehouse. You want to import a dimension table named Manufacturer that has about 200 rows. The size of the table is about 100 KB. 90 percent of the manufacturers are located in the same postal code. This table will be used in joins for most queries . You need to recommend a partition strategy for the table. Solution: You use a replicated table. Does this solution meet the goal? Y or N

Yes

CASE: You are a data architect for your company. Your company manages data from customers all over the world. Each customer supplies your company with data in a variety of formats. Your company must transform the data after it receives it. The total size of all customer data is under one petabyte (PiB). You need to recommend a data storage solution for customer data. Solution: You recommend Azure Data Lake. Does this solution meet the goal? Y or N

Yes

CASE: You are a data architect. You are designing a data solution to manage statistics for a world-wide sports league. You plan to store the data in an Azure table. Every team in the league has a unique name, and each team is part of a division. There are a total of four divisions. The statistics keep track of the winds and losses for each team in the division. Sample data is show below. The production table will contain over 4 million rows. You need to recommend an appropriate partition key. Solution: You choose Division as the partition key. Does this solution meet the goal? Y or N

Yes

CASE: You are a data architect. You are designing a data solution to manage statistics for a world-wide sports league. You plan to store the data in an Azure table. Every team in the league has a unique name, and each team is part of a division. There are a total of four divisions. The statistics keep track of the winds and losses for each team in the division. Sample data is show below. The production table will contain over 4 million rows. You need to recommend an appropriate partition key. Solution: You choose Losses as the partition key. Does this solution meet the goal? Y or N

Yes

CASE: You are a data architect. You are designing a data solution to manage statistics for a world-wide sports league. You plan to store the data in an Azure table. Every team in the league has a unique name, and each team is part of a division. There are a total of four divisions. The statistics keep track of the winds and losses for each team in the division. Sample data is show below. The production table will contain over 4 million rows. You need to recommend an appropriate partition key. Solution: You choose Wins as the partition key. Does this solution meet the goal? Y or N

Yes

CASE: You are a data engineer for an autonomous vehicle manufacturer. Each vehicle contains a transmitter that submits sensor data over Advanced Message Queuing Protocol (AMQP). You want to retrieve the sensor data in real time so that you can extract relevant information, transform it, and then send it to Power BI. You need to implement the solution. Solution: - Create an Event Hub instance. - Create a Stream Analytics job that uses a query to extract data. Does this solution meet the goal?

Yes

CASE: You are a data engineer for an autonomous vehicle manufacturer. Each vehicle contains a transmitter that submits sensor data over Advanced Message Queuing Protocol (AMQP). You want to retrieve the sensor data in real time so that you can extract relevant information, transform it, and then send it to Power BI. You need to implement the solution. Solution: - Create an IoT Hub instance. - Create a Stream Analytics job that uses a query to extract data. Does this solution meet the goal? Y or N

Yes

CASE: You are a data engineer. You are developing a data ingestion solution that ingests data from large pipe-delimited text files in an Azure Data Lake Storage account to Azure Synapse Analytics. You need to load the data. Solution: - Create an external file format and an external data source. - Create an external table that uses the external data source. - Load the data from the external table. Does this solution meet the goal? Y or N

Yes

CASE: You are designing a real-time processing solution using Azure Steam Analytics. You need to ingest data from IoT sensors installed in driverless vehicles. You also need to support a 200 MB reference data input to correlate related static values. Solution: You use Azure Event Hubs for stream data input and Azure SQL Database for reference input. Does this solution meet the goal? Y or N

Yes

CASE: You are designing a real-time processing solution using Azure Steam Analytics. You need to ingest data from IoT sensors installed in driverless vehicles. You also need to support a 200 MB reference data input to correlate related static values. Solution: You use Azure IoT Hub for stream data input and Azure Blob storage for reference input. Does this solution meet the goal? Y or N

Yes

CASE: You are designing an Azure SQL Database solution. You are designing a table named Sales that will contain sales records for the company. A column named SalesRep will represent the sales representative associated with the sale. The SalesRep column will be of type SYSNAME. Only the sales representative associated with a sale should be able to view sales data. You need to choose the security mechanism. Solution: You implement row-level security (RLS). Does this solution meet the goal? Y or N

Yes

CASE: You are migrating an application and its on-premises SQL Server Enterprise Edition database to the cloud. application and database changes should be kept to a minimum during migration. You want to choose the exact amount of compute resources dedicated to the workload. Management overhead should be kept to a minimum. You need to choose an appropriate deployment and purchase model to meet your needs. Solution: You choose a managed instance deployment and vCore pricing mode. Does this solution meet the goal? Y or N

Yes

CASE: You are using Azure Stream Analytics (ASA) to process real-time defect detection events from your factory's production lines. You need to select the right built-in windowing function in your ASA job's SELECT query, so that you can: - Group events per line if they occur within specific time intervals between each other, but not exceeding maximum duration time set for the window. - Filter out periods of time when no defects are reported. - Count each event only once. Solution: In the ASA job query, you group events by using the session window. Does this solution meet the goal? Y or N

Yes

CASE: You create a blob storage account. You must be able to administer the storage account via the REST API. You need to provide secure access to the storage account. Solution: You create an account shared access signature (SAS). Does this solution meet the goal? Y or N

Yes

CASE: You create a blob storage account. You must be able to administer the storage account via the REST API. You need to provide secure access to the storage account. Solution: You use the primary access key. Does this solution meet the goal? Y or N

Yes

CASE: You have a data pipeline in Azure Synapse Analytics that ingests data from multiple sources. You need to process personal information such as first name, last name, and social security number (SSN). Because the data comes from different sources, the data contains duplicate records and other issues such as the following: - In some of the records, the names are mixed cases (Customer), while in others, they appear uppercase (CUSTOMER). - The SSNs are sometimes hyphenated (123-45-6789), while others are missing hyphens (123436789). You need to remove duplicates from the input data and normalize the records. Solution: You transform the data using Spark pool with the Python Code below: from pyspark.sql.functions import * dedupedDF = (dataFrame.select(col("*"), upper(col("firstName")).alias("upperFirstName"), upper(col("lastName")).alias("upperLastNAme"), regexp_replace(col("ssn"), "-","").alias("ssnOnlyNumber")) .dropDuplicates(["upperFirstName", "upperLastName", "ssnOnlyNumber"])) Does this solution meet the goal? Y or N

Yes

CASE: You manage an Azure SQL Database containing a column with sensitive data. The column is used for joins by an application. You need to configure encryption for this database to protect the column. Solution: You configure Always Encrypted with a deterministic type. Does this solution meet the goal? Y or N

Yes

CASE: You use Azure Stream Analytics to stream real-time IoT sensor data for a brewery company. You need to use a window function with a fixed-size. Events should belong to a single window. Solution: You analyze the stream with a tumbling window function. Does the solution meet the goal?

Yes

CASE: You work as an Azure Data Engineer. You need to build a Product dimension that is able to store historical changes to product sales prices with a flag marking the current price as Active. Solution: You implement slowly changing dimension (SCD) Type 2. Does this solution meet the goal? Y or N

Yes

CASE: You work as an Azure Data Engineer. You need to build a Product dimension that is able to store historical changes to product sales prices with a flag marking the current price as Active. Solution: You implement slowly changing dimension (SCD) Type 6. Does this solution meet the goal? Y or N

Yes

CASE: You work as an Azure Data Engineer. You need to transform an incoming JSON file into a relational structure using Azure Data Factory. Solution: You use the flatten transformation in mapping data flow. Does this solution meet the goal? Y or N

Yes

CASE: You work with Azure Synapse Analytics solution. You need to transform incoming stream based on the following criteria: - Rows prior to the year 2000 need to be assigned to an output stream named Processed. - Rows after the year 2000 need to be assigned to another stream named Unprocessed for further processing. Solution: You use the conditional split transformation in mapping data flow. Does this solution meet the goal?

Yes

CASE: Your company is developing an Azure Data Solution that uses Azure Blob Storage. A select set of Azure Active Directory (Azure AD) accounts should be provided access to storage during the initial test period. Access should expire automatically after 30 days. You want to avoid using the storage account key to provide access. You must be able to manage access specifics through role-based access control (RBAC). You need to configure authentication and access to the storage blob. Solution: You request a user delegation key and configure the user delegation shared access signature (SAS). Does this solution meet the goal? Y or N

Yes

CASE: Your company supports a software as a service (SaaS) application with a large number of customers. You are designing a support database to use elastic pools and elastic queries to retrieve customer information. Customer information is stored in table that includes values for CustomerID and RegionalID. You need to partition data to optimize queries by customer sorted by geographic location. The solution should minimize support costs. You need to recommend a partition strategy. Solution: You configure sharding by RegionalID. Does this solution meet the goal? Y or N

Yes


Ensembles d'études connexes

Yoost questions for chapters 21 and 22

View Set

Organ Systems Overview exercise 2 REVIEW

View Set

Ch. 11: The Cardiovascular System

View Set

Chapter 5: Elasticity: A Measure of Responsiveness

View Set

International Marketing MC Questions

View Set

3.11.M - Quiz: A Christmas Carol, Staves IV-V

View Set

Earth Science Final Exam - Part 1

View Set