DP-203: Data Engineering on Microsoft Azure

You want to use an output to write the results of a Stream Analytics query to a table named device-events in a dataset named realtime-data in a Power BI workspace named analytics workspace. What should you do? -Create only the workspace. The dataset and table will be created automatically. -Create the workspace and dataset. The table will be created automatically. -Create the workspace, dataset, and table before creating the output.

Create only the workspace. The dataset and table will be created automatically. -The dataset and table will be created dynamically by the output.

Which of the following descriptions best fits Delta Lake? -A Spark API for exporting data from a relational database into CSV files. -A relational storage layer for Spark that supports tables based on Parquet files. -A synchronization solution that replicates data between SQL Server and Spark clusters.

A relational storage layer for Spark that supports tables based on Parquet files. - Delta Lake provides a relational storage layer in which you can create tables based on Parquet files in a data lake.

You need to process a stream of sensor data, aggregating values over one minute windows and storing the results in a data lake. Which service should you use? -Azure SQL Database -Azure Cosmos DB -Azure Stream Analytics

Azure Stream Analytics -Azure Stream Analytics is a stream processing engine.

You need to run a notebook in the Azure Databricks workspace referenced by a linked service. What type of activity should you add to a pipeline? -Notebook -Python -Jar

Notebook -Use a Notebook activity to run an Azure Databricks notebook

Suppose a retailer's operations to update inventory and process payments are in the same transaction. A user is trying to apply a $30 store credit on an order from their laptop and is submitting the exact same order by using the store credit (for the full amount) from their phone. Two identical orders are received. The database behind the scenes is an ACID-compliant database. What will happen? -Both orders will be processed and use the in-store credit. -One order will be processed and use the in-store credit. The other order will update the remaining inventory for the items in the basket, but it won't complete the order. -One order will be processed and use the in-store credit, and the other order won't be processed.

One order will be processed and use the in-store credit, and the other order won't be processed. -When the second order determines that the in-store credit has already been used, it will roll back the transaction.

In which version of SQL Server was SSIS Projects introduced? -SQL Server 2008. -SQL Server 2012. -SQL Server 2016.

SQL Server 2012. -SSIS projects were introduced in SQL Server 2012; the project is the unit of deployment for SSIS solutions.

You want to aggregate event data by contiguous, fixed-length, non-overlapping temporal intervals. What kind of window should you use? -Sliding -Session -Tumbling

Tumbling -Tumbling windows define contiguous, fixed-length, non-overlapping windows.
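
For reference, a minimal sketch of a Stream Analytics query using a one-minute tumbling window, held in a Python string; the input/output aliases and field names are assumptions, not part of the question:

```python
# Hypothetical Stream Analytics query: one-minute tumbling windows per device.
# [sensor-input], [datalake-output], and the field names are illustrative.
asa_query = """
SELECT
    DeviceId,
    AVG(Temperature) AS AvgTemperature,
    System.Timestamp() AS WindowEnd
INTO [datalake-output]
FROM [sensor-input] TIMESTAMP BY EventTime
GROUP BY DeviceId, TumblingWindow(minute, 1)
"""
```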

You want to write code in a notebook cell that uses a SQL query to retrieve data from a view in the Spark catalog. Which magic should you use? -%spark -%pyspark -%sql

%sql - The %sql magic instructs Spark to interpret the code in the cell as SQL.
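
As a quick illustration, a Databricks notebook might register a view from a Python cell and then query it with the %sql magic in the next cell; the path and view name here are assumptions:

```python
# Python cell: load data and expose it as a view in the Spark catalog.
# `spark` is the SparkSession provided by the notebook; the path is illustrative.
df = spark.read.parquet("/data/sales")
df.createOrReplaceTempView("sales")

# The next cell would start with the %sql magic, for example:
# %sql
# SELECT Region, SUM(Amount) AS TotalAmount FROM sales GROUP BY Region
```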

What must you create in your Azure Synapse Analytics workspace as a target database for Azure Synapse Link for Azure SQL Database? -A serverless SQL pool -An Apache Spark pool -A dedicated SQL pool

A dedicated SQL pool -Use a dedicated SQL pool as a target database.

You want to connect to an Azure Databricks workspace from Azure Data Factory. What must you define in Azure Data Factory? -A global parameter -A linked service -A customer managed key

A linked service -An Azure Databricks linked service is required to connect to an Azure Databricks workspace.

You want to configure a private endpoint. You open up Azure Synapse Studio, go to the manage hub, and see that the private endpoints are greyed out. Why is the option not available? -Azure Synapse Studio doesn't support the creation of private endpoints. -A Conditional Access policy has to be defined first. -A managed virtual network hasn't been created.

A managed virtual network hasn't been created. -In order to create a private endpoint, you first must create a managed virtual network.

Which statement should you use to create a database in a SQL warehouse? -CREATE VIEW -CREATE SCHEMA -CREATE GROUP

CREATE SCHEMA - The CREATE SCHEMA statement is used to create a database.
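
A minimal sketch, run here from a notebook attached to a cluster in the workspace (in a SQL warehouse you would run the statement directly); the schema name is an assumption:

```python
# CREATE SCHEMA creates a database (CREATE DATABASE is an alias for it).
spark.sql("CREATE SCHEMA IF NOT EXISTS sales_db")
spark.sql("SHOW SCHEMAS").show()
```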

In which of the following table types should an insurance company store details of customer attributes by which claims will be aggregated? -Staging table -Dimension table -Fact table

Dimension table - Attributes of an entity by which numeric measures will be aggregated are stored in a dimension table.

Which feature of Azure Synapse Analytics enables you to transfer data from one store to another and apply transformations to the data at scheduled intervals? -Serverless SQL pool -Apache Spark pool -Pipelines

Pipelines -Pipelines provide a way to encapsulate one or more actions that can be applied to data as it is transferred from one data store to another.

What can cause a slower performance on join or shuffle jobs? -Data skew. -Enablement of autoscaling -Bucketing.

Data skew. -Data skew is an asymmetry in your job data: some partitions hold far more rows than others, so the tasks that process them run longer and slow down join and shuffle operations.

You use the RANK function in a query to rank customers in order of the number of purchases they have made. Five customers have made the same number of purchases and are all ranked equally as 1. What rank will the customer with the next highest number of purchases be assigned? -2 -6 -1

6 -RANK skips positions after ties. Five customers are tied at rank 1, so the customer with the next highest number of purchases is ranked 6 (DENSE_RANK would assign 2).
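
A hedged T-SQL sketch of the tie behavior, held in a Python string; the table and column names are assumptions:

```python
# RANK vs DENSE_RANK over purchase counts. With five customers tied at
# rank 1, RANK assigns 6 to the next customer; DENSE_RANK would assign 2.
rank_sql = """
SELECT CustomerName,
       COUNT(OrderId) AS Purchases,
       RANK()       OVER (ORDER BY COUNT(OrderId) DESC) AS PurchaseRank,
       DENSE_RANK() OVER (ORDER BY COUNT(OrderId) DESC) AS DensePurchaseRank
FROM dbo.Orders
GROUP BY CustomerName;
"""
```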

You need to share data visualizations, including charts and tables of data, with users in your organization. What should you create? -A table -A query -A dashboard

A dashboard -A dashboard can be used to share data visualizations with other users.

Which Azure service detects anomalies in account activities and notifies you of potential harmful attempts to access your account? -Azure Defender for Storage -Azure Storage Account Security Feature -Encryption in transit

Azure Defender for Storage -Microsoft Defender for Storage detects anomalies in account activity. It then notifies you of potentially harmful attempts to access your account.

Which data store is the least expensive choice when you want to store data but don't need to query it? -Azure Stream Analytics -Azure Databricks -Azure Storage

Azure Storage - Azure Storage offers a massively scalable object store for data objects and file system services for the cloud. If you create a Blob storage account, you can't directly query the data.

Which type of output should you use to ingest the results of an Azure Stream Analytics job into a dedicated SQL pool table in Azure Synapse Analytics? -Azure Synapse Analytics -Blob storage/ADLS Gen2 -Azure Event Hubs

Azure Synapse Analytics - An Azure Synapse Analytics output writes data to a table in an Azure Synapse Analytics dedicated SQL pool.

You can use either the REST API or the Azure client library to programmatically access a storage account. What is the primary advantage of using the client library? -Cost -Availability -Localization -Convenience

Convenience -Code that uses the client library is much shorter and simpler than code that uses the REST API. The client library handles assembling requests and parsing responses for you.

You want to create a visualization that updates dynamically based on a table in a streaming dataset in Power BI. What should you do? -Create a report from the dataset. -Create a dashboard with a tile based on the streaming dataset. -Export the streaming dataset to Excel and create a report from the Excel workbook.

Create a dashboard with a tile based on the streaming dataset. -A dashboard with a tile based on a streaming dataset updates dynamically as new data arrives.

You plan to use a Spark pool in Azure Synapse Analytics to query an existing analytical store in Azure Cosmos DB. What must you do? -Create a linked service for the Azure Cosmos DB database where the analytical store enabled container is defined. -Disable automatic pausing for the Spark pool in Azure Synapse Analytics. -Install the Azure Cosmos DB SDK for Python package in the Spark pool.

Create a linked service for the Azure Cosmos DB database where the analytical store enabled container is defined. -A linked service that connects to the Azure Cosmos DB account containing the analytical store enabled container is required.

You need to use Spark to analyze data in a parquet file. What should you do? -Load the parquet file into a dataframe. -Import the data into a table in a serverless SQL pool. -Convert the data to CSV format.

Load the parquet file into a dataframe. - You can load data from files in many formats, including parquet, into a Spark dataframe.
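
A minimal PySpark sketch; the file path is an assumption and `spark` is the session provided by the notebook:

```python
# Load a parquet file into a dataframe and inspect it.
df = spark.read.parquet("abfss://data@mydatalake.dfs.core.windows.net/sales.parquet")
df.printSchema()
df.show(10)
```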

Which type of transactional database system would work best for product data? -OLAP -OLTP

OLTP -OLTP systems support a large set of users, have quick response times, handle large volumes of data, are highly available, and are great for small or relatively simple transactions. An OLTP system would work best for transactional data like product data that's closely linked to inventory.

Which is the default distribution used for a table in Synapse Analytics? -HASH. -Round-Robin. -Replicated Table.

Round-Robin. -Round-Robin is the default distribution created for a table and delivers fast performance when used for loading data but may negatively impact larger queries.

From which of the following data sources can you use Azure Synapse Link for SQL to replicate data to Azure Synapse Analytics? -Azure Cosmos DB -SQL Server 2022 -Azure SQL Managed Instance

SQL Server 2022 -You can use Azure Synapse Link for SQL to replicate data from SQL Server 2022.

You are working on a project with a 3rd party vendor to build a website for a customer. The image assets that will be used on the website are stored in an Azure Storage account that is held in your subscription. You want to give read access to this data for a limited period of time. What security option would be the best option to use? -CORS Support -Storage Account -Shared Access Signatures

Shared Access Signatures - A shared access signature is a string that contains a security token that can be attached to a URI. Use a shared access signature to delegate access to storage objects and specify constraints, such as the permissions and the time range of access.

You're writing PySpark code to load data from an Azure Cosmos DB analytical store into a dataframe. What format should you specify? -cosmos.json -cosmos.olap -cosmos.sql

cosmos.olap -cosmos.olap is the appropriate format to read data from a Cosmos DB analytical store
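
A hedged sketch of reading an analytical store from a Synapse Spark notebook; the linked service and container names are assumptions:

```python
# cosmos.olap reads from the analytical store (cosmos.oltp would read
# from the transactional store instead).
df = (spark.read
      .format("cosmos.olap")
      .option("spark.synapse.linkedService", "CosmosDbLinkedService")
      .option("spark.cosmos.container", "sales-orders")
      .load())
df.show(5)
```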

You want to write code in a notebook cell that uses a SQL query to retrieve data from a view in the Spark catalog. Which magic should you use? -%%spark -%%pyspark -%%sql

%%sql -The %%sql magic instructs Spark to interpret the code in the cell as SQL.

How many access keys are provided for accessing your Azure storage account? -1 -2 -3 -4

2 -Each storage account has two access keys. This lets you follow the best-practice guideline of periodically replacing the key used by your applications without incurring downtime.

By default, how long are the Azure Data Factory diagnostic logs retained for? -15 days -30 days -45 days

45 days -The Azure Data Factory diagnostic logs are retained for 45 days.

You've created an Azure Databricks workspace in which you plan to use code in notebooks to process data. What must you create in the workspace? -A SQL Warehouse -A Spark cluster -A Windows Server virtual machine

A Spark cluster - A Spark cluster is required to process data using code in notebooks.

Which definition best describes Apache Spark? -A highly scalable relational database management system. -A virtual server with a Python runtime. -A distributed platform for parallel data processing using multiple languages.

A distributed platform for parallel data processing using multiple languages. - Spark provides a highly scalable distributed platform on which you can run code written in many languages to process data.

You plan to use Azure Synapse Link for SQL to replicate tables from SQL Server 2022 to Azure Synapse Analytics. What additional Azure resource must you create? -An Azure Storage account with an Azure Data Lake Storage Gen2 container -An Azure Key Vault containing the SQL Server admin password -An Azure Application Insights resource

An Azure Storage account with an Azure Data Lake Storage Gen2 container - An Azure Data Lake Storage Gen2 account is required to be used as a landing zone when using Azure Synapse Link for SQL Server 2022.

Which Azure data platform is commonly used to process data in an ELT framework? -Azure Data Factory -Azure Databricks -Azure Data Lake Storage

Azure Data Factory -Azure Data Factory is a cloud-integration service that orchestrates the movement of data between various data stores.

Which type of output should be used to ingest the results of an Azure Stream Analytics job into files in a data lake for analysis in Azure Synapse Analytics? -Azure Synapse Analytics -Blob storage/ADLS Gen2 -Azure Event Hubs

Blob storage/ADLS Gen2 -A Blob storage/ADLS Gen2 output writes data to files in a data lake.

Which T-SQL Statement loads data directly from Azure Storage? -LOAD DATA. -COPY. -INSERT FROM FILE.

COPY. -The T-SQL COPY Statement reads data from Azure Blob Storage or the Azure Data Lake and inserts it into a table within the SQL Pool.
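
A hedged sketch of the statement, held in a Python string; the table name, URL, and options are assumptions:

```python
# COPY loads files from Azure Storage directly into a dedicated SQL pool table.
copy_sql = """
COPY INTO dbo.StageSales
FROM 'https://mydatalake.blob.core.windows.net/data/sales/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2   -- skip a header row
);
"""
```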

Which Index Type offers the highest compression? -Columnstore. -Rowstore. -Heap.

Columnstore. - This is the default index type created for a table. It works on segments of rows that get compressed and optimized by column.

Which transformation in the Mapping Data Flow is used to route data rows to different streams based on matching conditions? -Lookup. -Conditional Split. -Select.

Conditional Split. -A Conditional Split transformation routes data rows to different streams based on matching conditions. The conditional split transformation is similar to a CASE decision structure in a programming language.

You've loaded a Spark dataframe with data that you now want to use in a Delta Lake table. What format should you use to write the dataframe to storage? -CSV -PARQUET -DELTA

DELTA -Storing a dataframe in DELTA format creates parquet files for the data and the transaction log metadata necessary for Delta Lake tables.
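
A minimal PySpark sketch; the example dataframe, output path, and table name are assumptions:

```python
# Tiny example dataframe; in practice this is your prepared data.
df = spark.createDataFrame([(1, "A"), (2, "B")], ["id", "category"])

# Writing in DELTA format produces parquet data files plus the
# _delta_log transaction log that Delta Lake tables require.
df.write.format("delta").mode("overwrite").save("/delta/demo")

# Optionally register the files as a catalog table for SQL access.
spark.sql("CREATE TABLE IF NOT EXISTS demo USING DELTA LOCATION '/delta/demo'")
```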

Which tool is used to perform an assessment of migrating SSIS packages to Azure SQL Database services? -Data Migration Assistant. -Data Migration Assessment. -Data Migration Service.

Data Migration Assistant. -The Data Migration Assistant is used to perform an assessment of migrating SSIS packages to Azure SQL Database services.

You need to use Spark to process data in files, preparing it for analysis. Which persona view should you use in the Azure Databricks portal? -Data Science and Engineering -Machine Learning -SQL

Data Science and Engineering -The Data Science and Engineering persona is optimized to help with data engineering tasks such as data processing.

Which definition of stream processing is correct? -Data is processed continually as new data records arrive. -Data is collected in a temporary store, and all records are processed together as a batch. -Data that is incomplete or contains errors is redirected to separate storage for correction by a human operator.

Data is processed continually as new data records arrive. -Stream processing is used to continually process new data as it arrives.

You want to create a data warehouse in Azure Synapse Analytics in which the data is stored and queried in a relational data store. What kind of pool should you create? -Serverless SQL pool -Dedicated SQL pool -Apache Spark pool

Dedicated SQL pool - A dedicated SQL pool defines a relational database in which data can be stored and queried.

To which Azure Synapse Studio hub would you go to create notebooks? -Data. -Develop. -Integrate.

Develop. - The Develop hub is where you manage SQL scripts, Synapse notebooks, data flows, and Power BI reports.

When using Spark Structured Streaming, a Delta Lake table can be which of the following? -Only a source -Only a sink -Either a source or a sink

Either a source or a sink - A Delta Lake table can be a source or a sink.

You have an Azure Cosmos DB for NoSQL account and an Azure Synapse Analytics workspace. What must you do first to enable HTAP integration with Azure Synapse Analytics? -Configure global replication in Azure Cosmos DB. -Create a dedicated SQL pool in Azure Synapse Analytics. -Enable Azure Synapse Link in Azure Cosmos DB.

Enable Azure Synapse Link in Azure Cosmos DB. -The first step in setting up HTAP integration is to enable Azure Synapse Link in Azure Cosmos DB.

You have an existing container in a Cosmos DB core (SQL) database. What must you do to enable analytical queries over Azure Synapse Link from Azure Synapse Analytics? -Delete and recreate the container. -Enable Azure Synapse Link in the container to create an analytical store. -Add an item to the container.

Enable Azure Synapse Link in the container to create an analytical store. -Before a container can be used for analytical queries, you need to enable Azure Synapse Link for the container, which creates an analytical store.

Which data processing framework will a data engineer use to ingest data onto cloud data platforms in Azure? -Online transaction processing (OLTP) -Extract, transform, and load (ETL) -Extract, load, and transform (ELT)

Extract, load, and transform (ELT) - ELT is a typical process for ingesting data from an on-premises database into the cloud.

Suppose you need to store profile and order information about your customers. You need to query the data to answer questions like "who are my top 100 customers?" and "how many customers live in a given geographic region?". True or false: blob storage is a good choice for this data? -True -False

False -Blobs are not appropriate for structured data that needs to be queried frequently. They have higher latency than memory and local disk and don't have the indexing features that make databases efficient at running queries.

Which version control software does Azure Data Factory integrate with? -Team Foundation Server. -Source Safe. -Git repositories.

Git repositories. -Azure Data Factory allows you to configure a Git repository with either Azure Repos or GitHub. Git is a version control system that allows for easier change tracking and collaboration.

The name of a storage account must be: -Unique within the containing resource group. -Unique within your Azure subscription. -Globally unique.

Globally unique. -The storage account name is used as part of the URI for API access, so it must be globally unique.

What distribution option would be best for a sales fact table that will contain billions of records? -HASH -ROUND_ROBIN -REPLICATE

HASH - Hash distribution provides good read performance for a large table by distributing records across compute nodes based on the hash key.
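
A hedged T-SQL sketch of a hash-distributed fact table, held in a Python string; the table and column names are assumptions:

```python
create_fact_sql = """
CREATE TABLE dbo.FactSales
(
    SaleId     BIGINT NOT NULL,
    ProductKey INT    NOT NULL,
    Amount     DECIMAL(18,2)
)
WITH
(
    DISTRIBUTION = HASH(ProductKey),  -- same key always lands on the same distribution
    CLUSTERED COLUMNSTORE INDEX       -- the default index type; best compression
);
"""
```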

In Azure Synapse Studio, where would you view the contents of the primary data lake store? -In the Integration section of the Monitor hub. -In the workspace tab of the Data hub. -In the linked tab of the Data hub.

In the linked tab of the Data hub. -The linked tab of the Data hub is where you can view the contents of the primary data lake store.

You're writing SQL code in a serverless SQL pool to query an analytical store in Azure Cosmos DB. What function should you use? -OPENDATASET -ROW -OPENROWSET

OPENROWSET -OPENROWSET is used to query external data, including analytical stores in Cosmos DB.
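
A hedged T-SQL sketch for a serverless SQL pool, held in a Python string; the account, database, container, and credential names are all assumptions:

```python
openrowset_sql = """
SELECT TOP 100 *
FROM OPENROWSET(
    PROVIDER = 'CosmosDB',
    CONNECTION = 'Account=my-cosmos-account;Database=my-database',
    OBJECT = 'my-container',
    SERVER_CREDENTIAL = 'my-cosmos-credential'
) AS [rows];
"""
```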

Mike is creating an Azure Data Lake Storage Gen2 account. He must configure this account to be able to process analytical data workloads for best performance. Which option should he configure when creating the storage account? -On the Basic tab, set the Performance option to Standard. -On the Basic Tab, set the Performance option to ON. -On the Advanced tab, set the Hierarchical Namespace to Enabled.

On the Advanced tab, set the Hierarchical Namespace to Enabled. -To get the best performance for analytical workloads in Data Lake Storage Gen2, set Hierarchical Namespace to Enabled on the Advanced tab when creating the storage account.

Which type of Azure Stream Analytics output should you use to support real-time visualizations in Microsoft Power BI? -Azure Synapse Analytics -Azure Event Hubs -Power BI

Power BI - A Power BI output creates a dataset with a table of streaming data in a Power BI workspace.

You plan to create an Azure Databricks workspace and use the SQL persona view in the Azure Databricks portal. Which of the following pricing tiers can you select? -Enterprise -Standard -Premium

Premium -Premium tier is required for the SQL persona.

Which feature commits the changes of Azure Data Factory work in a custom branch created with the main branch in a Git repository? -Repo. -Pull request. -Commit.

Pull request. -After a developer is satisfied with their changes, they create a pull request from their feature branch to the master or collaboration branch to get their changes reviewed by peers.

Which of the following workloads is best suited for Azure Databricks SQL? -Running Scala code in notebooks to transform data. -Querying and visualizing data in relational tables. -Training and deploying machine learning models.

Querying and visualizing data in relational tables. -Azure Databricks SQL is optimized for SQL-based querying and data visualization.

Which tool is used to create and deploy SQL Server Integration Packages on an Azure-SSIS integration runtime, or for on-premises SQL Server? -SQL Server Data Tools. -SQL Server Management Studio. -dtexec.

SQL Server Data Tools. -SQL Server Data Tools is typically used to create and deploy SQL Server Integration Services (SSIS) packages.

In what language can the Azure Synapse Apache Spark to Synapse SQL connector be used? -Python. -SQL. -Scala.

Scala. - The connector uses Scala to integrate Apache Spark pools with dedicated SQL pools in Azure Synapse Analytics.

You want to ingest data from a SQL Server database hosted on an on-premises Windows Server. What integration runtime is required for Azure Data Factory to ingest data from the on-premises server? -Azure-SSIS Integration Runtime -Self-Hosted Integration Runtime -Azure Integration Runtime

Self-Hosted Integration Runtime -A self-hosted integration runtime can run copy activities between a cloud data store and a data store in a private network. It also can dispatch transform activities against compute resources in an on-premises network or an Azure virtual network.

What type of data is a JSON file? -Structured -Semi-structured -Unstructured

Semi-structured -A JSON file contains semi-structured data. The data contains tags that make the organization and hierarchy of the data apparent.

Which transformation is used to load data into a data store or compute resource? -Window. -Source. -Sink.

Sink. -A Sink transformation allows you to choose a dataset definition for the destination output data. You can have as many sink transformations as your data flow requires.

In which phase of big data processing is Azure Data Lake Storage located? -Ingestion -Store -Model & Serve

Store -Store is the phase of a big data processing solution in which Azure Data Lake Storage resides.

Which of the following can be used to initialize the Blob Storage client library within an application? -An Azure username and password. -The Azure Storage account connection string. -A globally-unique identifier (GUID) that represents the application. -The Azure Storage account datacenter and location identifiers.

The Azure Storage account connection string. -A storage account connection string contains all the information needed to connect to Blob storage, most importantly the account name and the account key.

What is an example of a branching activity used in control flows? -The If-condition -Until-condition -Lookup-condition

The If-condition -An example of a branching activity is the If-condition activity, which is similar to an if statement in a programming language.

You have a managed catalog table that contains Delta Lake data. If you drop the table, what will happen? -The table metadata and data files will be deleted. -The table metadata will be removed from the catalog, but the data files will remain intact. -The table metadata will remain in the catalog, but the data files will be deleted.

The table metadata and data files will be deleted. -The life-cycle of the metadata and data for a managed table are the same.

Which SCD type would you use to keep history of changes in dimension members by adding a new row to the table for each change? -Type 1 SCD. -Type 2 SCD. -Type 3 SCD.

Type 2 SCD. -When a value changes, Type 2 SCD adds a new row for the entity with a start date, end date, and unique key that joins back to any transactions in the fact table within the effective date range.
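
A hedged T-SQL sketch of the Type 2 pattern, held in a Python string; the table, columns, and variables are assumptions:

```python
scd2_sql = """
-- Close out the current version of the dimension member...
UPDATE dim.Customer
SET EndDate = GETDATE(), IsCurrent = 0
WHERE CustomerBK = @CustomerBK AND IsCurrent = 1;

-- ...then insert a new row carrying the changed attribute values.
INSERT INTO dim.Customer (CustomerBK, CustomerName, City, StartDate, EndDate, IsCurrent)
VALUES (@CustomerBK, @CustomerName, @City, GETDATE(), NULL, 1);
"""
```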

What type of data is a video? -Structured -Semi-structured -Unstructured

Unstructured -A video might have an overall structure, but the data that forms the video itself is unstructured. Unstructured data often is delivered in file format.

The schema of what data type can be defined at query time? -Structured data -Azure Cosmos DB -Unstructured data

Unstructured data -The schema of unstructured data is typically defined at query time. This means that data can be loaded onto a data platform in its native format.

You need to write a query to return the total of the UnitsProduced numeric measure in the FactProduction table aggregated by the ProductName attribute in the FactProduct table. Both tables include a ProductKey surrogate key field. What should you do? -Use two SELECT queries with a UNION ALL clause to combine the rows in the FactProduction table with those in the FactProduct table. -Use a SELECT query against the FactProduction table with a WHERE clause to filter out rows with a ProductKey that doesn't exist in the FactProduct table. -Use a SELECT query with a SUM function to total the UnitsProduced metric, using a JOIN on the ProductKey surrogate key to match the FactProduction records to the FactProduct records and a GROUP BY clause to aggregate by ProductName.

Use a SELECT query with a SUM function to total the UnitsProduced metric, using a JOIN on the ProductKey surrogate key to match the FactProduction records to the FactProduct records and a GROUP BY clause to aggregate by ProductName. - To aggregate measures in a fact table by attributes in a dimension table, include an aggregate function for the measure, join the tables on the surrogate key, and group the results by the appropriate attributes.
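
A sketch of the query using the names from the question, held in a Python string:

```python
agg_sql = """
SELECT p.ProductName,
       SUM(f.UnitsProduced) AS TotalUnitsProduced
FROM FactProduction AS f
JOIN FactProduct AS p
    ON f.ProductKey = p.ProductKey   -- join on the surrogate key
GROUP BY p.ProductName;
"""
```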

When is it unnecessary to use import statements for transferring data between a dedicated SQL pool and an Apache Spark pool? -Use the integrated notebook experience from Azure Synapse Studio. -Use the PySpark connector. -Use token-based authentication.

Use the integrated notebook experience from Azure Synapse Studio. -Import statements are not needed because the connector is pre-loaded when you use the integrated notebook experience in Azure Synapse Studio.

Which of the following is a good analogy for the access keys of a storage account? -IP Address -REST Endpoint -Username and password -Cryptographic algorithm

Username and password -Possession of an access key identifies the account and grants you access. This is similar to login credentials like a username and password.

What would be the best approach to investigate if the data at hand is unevenly allocated across all distributions? -Grouping the data based on partitions and counting rows with a T-SQL query. -Using DBCC PDW_SHOWSPACEUSED to see the number of table rows that are stored in each of the 60 distributions. -Monitor query speeds by testing the same query for each partition.

Using DBCC PDW_SHOWSPACEUSED to see the number of table rows that are stored in each of the 60 distributions. -DBCC PDW_SHOWSPACEUSED returns the number of table rows that are stored in each of the 60 distributions.
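
For example (the table name is an assumption):

```python
# Returns rows and space used per distribution for the named table,
# which makes skew across the 60 distributions easy to spot.
dbcc_sql = 'DBCC PDW_SHOWSPACEUSED("dbo.FactSales");'
```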

To achieve improved query performance, which one would be the best data type for storing data that contains less than 128 characters? -VARCHAR(MAX) -VARCHAR(128) -NVARCHAR(128)

VARCHAR(128) -Limiting the size of the data type and not using size variability will provide the best performance.

Which Workload Management capability manages minimum and maximum resource allocations during peak periods? -Workload Isolation. -Workload Importance. -Workload Containment.

Workload Isolation. -Workload Isolation assigns maximum and minimum usage values for varying resources under load. These adjustments can be done live without having to take the SQL Pool offline.

Which workload management feature influences the order in which a request gets access to resources? -Workload classification. -Workload importance. -Workload isolation.

Workload importance. -Workload importance influences the order in which a request gets access to resources. On a busy system, a request with higher importance has first access to resources.

You need to use a parameter in a notebook. Which library should you use to define parameters with default values and get parameter values that are passed to the notebook? -notebook -argparse -dbutils.widgets

dbutils.widgets -Use the dbutils.widgets library to define and read parameters in an Azure Databricks notebook.
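
A minimal sketch in a Databricks notebook; the widget name and default value are assumptions:

```python
# Define a text widget with a default value, then read whatever value
# was passed when the notebook was run (e.g., from a pipeline activity).
dbutils.widgets.text("table_name", "sales")
table_name = dbutils.widgets.get("table_name")
df = spark.table(table_name)
```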

Which feature in alerts can be used to determine how an alert is fired? -Add rule. -Add severity. -Add criteria.

Add criteria. -The Add criteria feature enables you to determine how an alert is fired.

In Azure Data Factory authoring tool, where would you find the Copy data activity? -Move & Transform -Batch Service -Databricks

Move & Transform -The Move & Transform section contains activities that are specific to Azure Data Factory copying data and defining data flows.

What is one of the possible ways to optimize an Apache Spark Job? -Remove all nodes. -Remove the Apache Spark Pool. -Use bucketing.

Use bucketing. -Bucketing records how the data is bucketed and sorted as table metadata, which lets Spark avoid expensive shuffles when joining or aggregating on the bucketed columns.

What Transact-SQL function verifies if a piece of text is valid JSON? -JSON_QUERY -JSON_VALUE -ISJSON

ISJSON - ISJSON is a Transact-SQL function that verifies if a piece of text is valid JSON.
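
For example, held in a Python string; the sample values are illustrative:

```python
isjson_sql = """
SELECT doc,
       ISJSON(doc) AS IsValidJson   -- 1 for valid JSON, 0 otherwise
FROM (VALUES ('{"id": 1}'), ('not json')) AS t(doc);
"""
```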

Which ALTER DATABASE statement parameter allows a dedicated SQL pool to scale? -SCALE. -MODIFY -CHANGE.

MODIFY -MODIFY is used to scale a dedicated SQL pool.

What component of Azure Synapse analytics allows the different engines to share the databases and tables between Spark pools and SQL on-demand engine? -Azure Synapse Link. -Azure Synapse shared metadata. -Azure Synapse Spark pools.

Azure Synapse shared metadata. - The shared metadata gives the workspace SQL engines access to databases and tables created with Spark.

What happens when you obtain a BlobClient reference from BlobContainerClient with the name of a blob? -A new block blob is created in storage. -A BlobClient object is created locally. No network calls are made. -An exception is thrown if the blob does not exist in storage. -The contents of the named blob are downloaded.

A BlobClient object is created locally. No network calls are made. -Getting a blob reference does not make any calls to Azure Storage; it simply creates an object locally that can work with a stored blob.

A data analyst wants to analyze data by using Python code combined with text descriptions of the insights gained from the analysis. What should they use to perform the analysis? -A notebook connected to an Apache Spark pool. -A SQL script connected to a serverless SQL pool. -A KQL script connected to a Data Explorer pool.

A notebook connected to an Apache Spark pool. - A notebook enables you to interactively run Python code in an Apache Spark pool and embed notes using Markdown.

Which of the following descriptions best fits Delta Lake? -A Spark API for exporting data from a relational database into CSV files. -A relational storage layer for Spark that supports tables based on Parquet files. -A synchronization solution that replicates data between SQL pools and Spark pools.

A relational storage layer for Spark that supports tables based on Parquet files. - Delta Lake provides a relational storage layer in which you can create tables based on Parquet files in a data lake.

You create a dimension table for product data, assigning a unique numeric key for each row in a column named ProductKey. The ProductKey is only defined in the data warehouse. What kind of key is ProductKey? -A surrogate key -An alternate key -A business key

A surrogate key -A surrogate key uniquely identifies each row in a dimension table, irrespective of keys used in source systems.

Which Azure Data Factory component orchestrates a transformation job or runs a data movement command? -Linked Services -Datasets -Activities

Activities -Activities contain the transformation logic or the analysis commands of Azure Data Factory's work.

You need to compare approximate production volumes by product while optimizing query response time. Which function should you use? -COUNT -NTILE -APPROX_COUNT_DISTINCT

APPROX_COUNT_DISTINCT -APPROX_COUNT_DISTINCT returns an approximate count within 2% of the actual count while optimizing for minimal response time.
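
A hedged sketch, held in a Python string; the table and column names are assumptions:

```python
approx_sql = """
SELECT ProductName,
       APPROX_COUNT_DISTINCT(OrderId) AS ApproxOrderCount
FROM dbo.FactProduction
GROUP BY ProductName;
"""
```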

What Transact-SQL function is used to perform a HyperLogLog function? -APPROX_COUNT_DISTINCT -COUNT_DISTINCT_APPROX -COUNT

APPROX_COUNT_DISTINCT -The APPROX_COUNT_DISTINCT function is used to perform a HyperLogLog function.

Which role works with Azure Cognitive Services, Cognitive Search, and the Bot Framework? -A data engineer -A data scientist -An AI engineer

An AI engineer - Artificial intelligence (AI) engineers work with AI services such as Cognitive Services, Cognitive Search, and the Bot Framework.

Duplicating customer content for redundancy and meeting service-level agreements (SLAs) in Azure meets which cloud technical requirement? -Maintainability -High availability -Multilingual support

High availability -High availability duplicates customer content for redundancy and meets SLAs in Azure.

In a typical project, when would you create your storage account(s)? -At the beginning, during project setup. -After deployment, when the project is running. -At the end, during resource cleanup.

At the beginning, during project setup. -Storage accounts are stable for the lifetime of a project. It's common to create them at the start of a project.

You are moving data from an Azure Data Lake Gen2 store to Azure Synapse Analytics. Which Azure Data Factory integration runtime would be used in a data copy activity? -Azure-SSIS -Azure -Self-hosted

Azure -The Azure integration runtime is used when copying data between two Azure data platform technologies.

Which data platform technology is a globally distributed, multimodel database that can perform queries in less than a second? -Azure SQL Database -Azure Cosmos DB -Azure SQL Data Warehouse

Azure Cosmos DB -Azure Cosmos DB is a globally distributed, multimodel database that can offer subsecond query performance.

Which Azure Service is Azure Synapse Pipelines based on? -Azure Data Explorer. -Azure Stream Analytics. -Azure Data Factory.

Azure Data Factory. -Azure Synapse Pipelines is based on the Azure Data Factory service.

Which technology is typically used as a staging area in a modern data warehousing architecture? -Azure Data Lake. -Azure Synapse SQL Pools. -Azure Synapse Spark Pools.

Azure Data Lake. - Azure Data Lake Store Gen 2 is the technology that will be used to stage data before loading it into the various components of Azure Synapse Analytics.

Which Azure service is the best choice to manage and govern your data? -Azure Data Factory -Azure Purview -Azure Data Lake Storage

Azure Purview -Azure Purview is a unified data governance service that helps you manage and govern your on-premises, multi-cloud and software-as-a-service (SaaS) data.

What is a supported connector for built-in parameterization? -Azure Data Lake Storage Gen2 -Azure Synapse Analytics -Azure Key Vault

Azure Synapse Analytics - Azure Synapse Analytics is a supported connector for built-in parameterization for Linked Services in Azure Data Factory.

You want to use Azure Synapse Analytics to analyze operational data stored in a Cosmos DB for NoSQL container. Which Azure Synapse Link service should you use? -Azure Synapse Link for SQL -Azure Synapse Link for Dataverse -Azure Synapse Link for Azure Cosmos DB

Azure Synapse Link for Azure Cosmos DB - Azure Synapse Link for Azure Cosmos DB integrates with multiple Azure Cosmos DB APIs, including Azure Cosmos DB for NoSQL.

Which Azure Synapse Analytics component enables you to perform Hybrid Transactional and Analytical Processing? -Azure Synapse Pipeline. -Azure Synapse Studio. -Azure Synapse Link.

Azure Synapse Link. - Azure Synapse Link is the component that enables Hybrid Transactional and Analytical Processing.

Which of the following descriptions matches a hybrid transactional/analytical processing (HTAP) architecture. -Business applications store data in an operational data store, which is also used to support analytical queries for reporting. -Business applications store data in an operational data store, which is synchronized with low latency to a separate analytical store for reporting and analysis. -Business applications store operational data in an analytical data store that is optimized for queries to support reporting and analysis.

Business applications store data in an operational data store, which is synchronized with low latency to a separate analytical store for reporting and analysis. -An HTAP solution replicates operational data to an analytical store, enabling you to perform analytics and reporting without impacting the performance of the operational system.

How do column statistics improve query performance? -By keeping track of which columns are being queried. -By keeping track of how much data exists between ranges in columns. -By caching column values for queries.

By keeping track of how much data exists between ranges in columns. -Statistics track cardinality and range density, which lets the query optimizer choose the data access paths that return the fewest rows, making queries faster.

How does splitting source files help maintain good performance when loading into Synapse Analytics? -Optimized processing of smaller file sizes. -Compute node to storage segment alignment. -Reduced possibility of data corruptions.

Compute node to storage segment alignment. -SQL pools have 60 storage segments, and compute can also scale to 60 nodes, so optimizing for alignment of these two resources can dramatically decrease load times.

Which of the following describes a good strategy for creating storage accounts and blob containers for your application? -Create both your Azure Storage accounts and containers before deploying your application. -Create Azure Storage accounts in your application as needed. Create the containers before deploying the application. -Create Azure Storage accounts before deploying your app. Create containers in your application as needed.

Create Azure Storage accounts before deploying your app. Create containers in your application as needed. -Creating an Azure Storage account is an administrative activity and can be done prior to deploying an application. Container creation is lightweight and is often driven by run-time data which makes it a good activity to do in your application.

Suppose you have two video files stored as blobs. One of the videos is business-critical and requires a replication policy that creates multiple copies across geographically diverse datacenters. The other video is non-critical, and a local replication policy is sufficient. Which of the following options would satisfy both data diversity and cost sensitivity considerations? -Create a single storage account that makes use of Local-redundant storage (LRS) and host both videos from here. -Create a single storage account that makes use of Geo-redundant storage (GRS) and host both videos from here. -Create two storage accounts. The first account makes use of Geo-redundant storage (GRS) and hosts the business-critical video content. The second account makes use of Local-redundant storage (LRS) and hosts the non-critical video content.

Create two storage accounts. The first account makes use of Geo-redundant storage (GRS) and hosts the business-critical video content. The second account makes use of Local-redundant storage (LRS) and hosts the non-critical video content. -In general, increased diversity means an increased number of storage accounts. A storage account by itself has no financial cost. However, the settings you choose for the account do influence the cost of services in the account. Use multiple storage accounts to reduce costs.

You plan to use Azure Synapse Link for Dataverse to analyze business data in your Azure Synapse Analytics workspace. Where is the replicated data from Dataverse stored? -In an Azure Synapse dedicated SQL pool -In an Azure Data Lake Gen2 storage container. -In an Azure Cosmos DB container.

In an Azure Data Lake Gen2 storage container. - Azure Synapse Link for Dataverse replicates data to an Azure Data Lake Gen2 storage account.

Which hub is where you can grant access to Synapse workspace and resources? -Monitor hub. -Manage hub. -Integrate hub.

Manage hub. -You can grant access to the Synapse workspace and its resources in the Manage hub.

You require an Azure Synapse Analytics Workspace to access an Azure Data Lake Store using the benefits of the security provided by Azure Active Directory. What is the best authentication method to use? -Storage account keys. -Shared access signatures. -Managed identities.

Managed identities. -Managed identities provide Azure services with an automatically managed identity in Azure Active Directory. You can use the managed identity capability to authenticate to any service that supports Azure Active Directory authentication.

Which component enables you to perform code free transformations in Azure Synapse Analytics? -Studio. -Copy activity. -Mapping Data Flow.

Mapping Data Flow. -You can natively perform code-free data transformations using the Mapping Data Flow task.

Which language can be used to define Spark job definitions? -Transact-SQL -PowerShell -PySpark

PySpark -PySpark can be used to define Spark job definitions.

Which of the following statements is a benefit of materialized views? -Reducing the execution time for complex queries with JOINs and aggregate functions. -Increased resiliency benefits. -Increased high availability.

Reducing the execution time for complex queries with JOINs and aggregate functions. -Materialized views help to improve complex query performance. The more complex the query, the higher the potential for execution-time saving.
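
A hedged T-SQL sketch for a dedicated SQL pool, held in a Python string; the view, table, and column names are assumptions:

```python
mv_sql = """
CREATE MATERIALIZED VIEW dbo.mvSalesByRegion
WITH (DISTRIBUTION = HASH(Region))
AS
SELECT Region,
       SUM(Amount)  AS TotalAmount,
       COUNT_BIG(*) AS RowCnt
FROM dbo.FactSales
GROUP BY Region;
"""
```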

Blobs provide unstructured data storage. What does unstructured mean? -Blobs can't be organized or named. -There are no restrictions on the type of data you can store in blobs. -Blobs can't contain structured data, like JSON or XML.

There are no restrictions on the type of data you can store in blobs. -Blobs do not impose any structure on your data, meaning your application can store any type of data in a blob.

What feature of Delta Lake enables you to retrieve data from previous versions of a table? -Spark Structured Streaming -Time Travel -Catalog Tables

Time Travel -The Time Travel feature is based on the transaction log, which enables you to specify a version number or timestamp for the data you want to retrieve.
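
A minimal PySpark sketch; the path, version number, and timestamp are assumptions:

```python
# Read an earlier version of a Delta table by version number...
df_v0 = spark.read.format("delta").option("versionAsOf", 0).load("/delta/demo")

# ...or by timestamp.
df_jan = (spark.read.format("delta")
          .option("timestampAsOf", "2024-01-01")
          .load("/delta/demo"))
```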

When configuring network access to your Azure Storage Account, what is the default network rule? -To allow all connections from all networks -To allow all connection from a private IP address range -To deny all connections from all networks

To allow all connections from all networks -The default network rule is to allow all connections from all networks.

Which SCD type would you use to update the dimension members without keeping track of history? -Type 1 SCD. -Type 2 SCD. -Type 3 SCD.

Type 1 SCD. -When a value changes, Type 1 SCD will update the existing record without keeping history.

Which Dynamic Management View enables you to view the active connections against a dedicated SQL pool? -sys.dm_pdw_exec_requests. -sys.dm_pdw_dms_workers. -DBCC PDW_SHOWEXECUTIONPLAN.

sys.dm_pdw_exec_requests. -sys.dm_pdw_exec_requests enables you to view the active connections against a dedicated SQL pool.

