DP-900
A _________ is a virtual table based on the results of a SELECT query.
view
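A minimal sketch of a view, using Python's built-in sqlite3 module rather than an Azure service. The table name `orders` and the view name `big_orders` are made up for illustration.

```python
import sqlite3

# In-memory database; the table and view names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("Ana", 120.0), ("Ben", 35.5), ("Ana", 80.0)],
)

# The view stores the SELECT statement, not a copy of the rows.
conn.execute(
    "CREATE VIEW big_orders AS "
    "SELECT customer, total FROM orders WHERE total >= 50"
)

# Querying the view runs the underlying SELECT against the current table data.
view_rows = conn.execute(
    "SELECT customer, total FROM big_orders ORDER BY total"
).fetchall()
```

The view can be queried like a table, but it holds no data of its own.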
Transcribing videos is an example of _________________________ - Prescriptive analysis - Cognitive analysis - Diagnostic analysis - Descriptive analysis - Predictive analysis
Cognitive analysis
Which language would you use to query real-time log data in Azure Synapse Data Explorer? - SQL - Python - KQL
KQL
An Azure Data Factory pipeline can pass parameters to a notebook. Y/N
Yes
What three main types of workload can be found in a typical modern data warehouse?
- Streaming Data - Batch Data - Relational Data
Which type of non-relational data store supports a flexible schema, stores data as JSON files, and stores all the data for an entity in the same document? Select the correct option. - time series - graph - document - columnar
- document Keywords: non-relational database, flexible schema, JSON => document database. A document database is a type of non-relational database that is designed to store and query data as JSON-like documents.
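As a rough illustration of the flexible schema, the sketch below keeps two JSON documents with different fields side by side. It uses plain Python lists and the json module, not a real document database, and the field names are invented.

```python
import json

# Two documents in the same hypothetical "customers" collection.
# Each holds all the data for one entity, and the fields differ between them.
docs = [
    {"id": "1", "name": "Ana", "addresses": [{"city": "Lisbon"}, {"city": "Porto"}]},
    {"id": "2", "name": "Ben", "phone": "555-0100"},
]

# Documents are serialized as JSON text for storage and parsed again on read.
stored = [json.dumps(d) for d in docs]
loaded = [json.loads(s) for s in stored]

# A field-based query has to tolerate fields that some documents lack.
with_phone = [d["name"] for d in loaded if "phone" in d]
```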
You have an Azure Cosmos DB account that uses the Core (SQL) API. Which settings can you configure at the container level? Select two correct options. - the read region - the throughput - the partition key - the API
- the throughput - the partition key
A. JOIN B. WHERE C. SUM D. COUNT 1. Filter records. 2. Combine rows from multiple tables. 3. Calculate the total value of a numeric column. 4. Determine the number of rows retrieved.
1 - B 2 - A 3 - C 4 - D
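The four keywords can be seen together in one query. This is a sketch using sqlite3 with invented `customers` and `orders` tables, not Azure SQL syntax specifically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 40.0);
""")

# JOIN combines rows from both tables, WHERE filters records,
# COUNT returns the number of rows and SUM totals a numeric column.
row = conn.execute("""
    SELECT c.name, COUNT(*), SUM(o.amount)
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE c.name = 'Ana'
    GROUP BY c.name
""").fetchone()
```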
Which one of the following statements is a characteristic of a relational database? All columns in a table must be of the same data type A row in a table represents a single instance of an entity Rows in the same table can contain different columns
A row in a table represents a single instance of an entity
What type of analytics answers the question "what happened", such as a sales report for yesterday? Select the correct option. - Prescriptive - Descriptive - Predictive - Diagnostic - Cognitive
Descriptive
Which of the following components is used to get messages from Twitter clients to Azure? Select the correct option. - Event Hub - Azure Data Factory - Azure Blob - IoT Hub
Event Hub
_____________ is a term used by database professionals for a schema design process that minimizes data duplication and enforces data integrity.
Normalization
Blob storage provides what three access tiers?
The Hot tier The Cool tier The Archive tier
In ________ processing, each new piece of data is processed when it arrives.
stream
_________________ is the online Microsoft OLAP platform, which you can use to perform data analysis and manage huge volumes of information from different points of view.
Azure Synapse Analytics
_________________ API of Cosmos DB is used to work with data that has many different entities that share a relationship. - Table - MongoDB - Gremlin - CassandraDB
Gremlin The Gremlin API implements a graph database interface to Cosmos DB. A graph is a collection of data objects and directed relationships. Data is still held as a set of documents in Cosmos DB, but the Gremlin API enables you to perform graph queries over data. Using the Gremlin API you can walk through the objects and relationships in the graph to discover all manner of complex relationships.
You need to identify the DML (Data Manipulation Language) commands in the list below. Select three. Create Select Insert Update Delete Drop
Insert Update Delete
You need to use Transact-SQL to query files in Azure Data Lake Storage from an Azure Synapse Analytics data warehouse. What should you use to query the files? - Azure Data Factory - Azure Functions - PolyBase - Microsoft SQL Server Integration Services (SSIS)
PolyBase PolyBase enables your SQL Server instance to process Transact-SQL queries that read data from external data sources.
Which of the following Azure database services has almost 100% compatibility with SQL Server running in your own environment? Select the correct option. - Azure SQL Database - SQL Managed Instance - Azure SQL Database Elastic Pool - Table Storage
SQL Managed Instance SQL Server Managed Instance is a fully-managed database product that Azure offers that has very close compatibility to SQL Server running in your own environment. There are a few things that are not supported, but those are relatively rare.
Which one of the following tasks is the responsibility of a database administrator? Backing up and restoring databases Creating dashboards and reports Creating pipelines to process data in a data lake
Backing up and restoring databases
What type of analysis answers the question "Why did it happen?"
Diagnostic Analysis
What should you define in your data model to enable drill-up/down analysis? - A measure - A hierarchy - A relationship
A hierarchy
A data ingestion service that you can use to manage queues of event data, ensuring that each event is processed in order, exactly once.
Azure Event Hubs
What are the elements of an Azure Table storage key? - Table name and column name - Partition key and row key - Row number
Partition key and row key
Which type of analytics helps answer questions about what has happened in the past? Descriptive analytics Prescriptive analytics Predictive analytics Cognitive analytics
Descriptive analytics
Apart from the way in which batch processing and streaming processing handle data, there are other differences: Data size: _____________ processing is intended for individual records or micro batches consisting of few records. ________ processing is suitable for handling large datasets efficiently.
Stream Batch
Which of the components of Azure Synapse Analytics allows you to train AI models using AzureML? Select the correct option. - Synapse Studio - Synapse Spark - Synapse Pipelines
Synapse Spark
The numeric measures that will be aggregated by the various dimensions in the model are stored in _________ tables. Each row in a ________ table represents a recorded event that has numeric measures associated with it.
fact
You are creating a new notebook in Azure Databricks that will support R as the primary language but will also support Scala and SQL. Which switch should you use to switch between languages? A. % B. \\[] C. \\() D. @ E. Change the language at the time of creation of notebook
A. %
A ____________________ is a continuous flow of information, where continuous does not necessarily mean regular or constant.
data stream
What is a security principal? Select the correct option. - An object that represents a user, group, service, or managed identity that is requesting access to Azure resources - A named collection of permissions that can be granted to a service, such as the ability to use the service to read, write, and delete data. In Azure, examples include Owner and Contributor - A set of resources managed by a service to which you can grant access
- An object that represents a user, group, service, or managed identity that is requesting access to Azure resources Azure authentication uses security principals to help determine whether a request to access a service should be granted.
A. Blob containers B. File shares C. Tables 1 - network file shares such as you typically find in corporate networks. 2 - key-value storage for applications that need to read and write data values quickly. 3 - scalable, cost-effective storage for binary files.
1 - B 2 - C 3 - A
A. Azure Data Studio B. Azure Query editor C. SQL Server Data Tools 1. Query data while working within a Visual Studio project. 2. Query data located in a non-Microsoft platform. 3. Query data from within the Azure portal
1 - C 2 - A 3 - B
A. Azure SQL Database B. Azure SQL Managed Instance C. Azure SQL VM 1 - a virtual machine with an installation of SQL Server, allowing maximum configurability with full management responsibility. 2 - a hosted instance of SQL Server with automated maintenance, which allows more flexible configuration than Azure SQL DB but with more administrative responsibility for the owner. 3 - a fully managed platform-as-a-service (PaaS) database hosted in Azure
1 - C 2 - B 3 - A
When you must store binary information, images, sounds, video, or large data in different formats and schemas, you should use __________________.
Azure Blob Storage
Azure Blob Storage supports what three types of blobs?
Block blobs Page blobs Append blobs
What database style is ideal for data that heavily relies on relationships, such as the relationships between People, Places, and Things? Select the correct option. Graph Data Document DB Blob Storage Mongo DB
Graph Data Graph Data stores nodes and their relationships to each other (called edges). A graph is a collection of data objects and directed relationships. Data is still held as a set of documents in Cosmos DB, but the Gremlin API enables you to perform graph queries over data.
A ________________________ enables businesses to maximize the value of their data assets. They're responsible for exploring data to identify trends and relationships, designing and building analytical models, and enabling advanced analytics capabilities through reports and visualizations.
data analyst
To query Data Explorer tables, you can use __________________, a language that is specifically optimized for fast read performance - particularly with telemetry data that includes a timestamp attribute.
Kusto Query Language (KQL)
This data usually does not come from relational stores, since even if it could have some sort of internal organization, it is not mandatory. Good examples are XML and JSON files.
Semi-structured Data
You have a requirement to process data whenever it changes (in real time). Which processing strategy do you recommend? Batch Processing Stream Processing Analytical Processing
Stream Processing
Azure SQL Managed Instance supports user-created backups. Y/N
Yes
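Azure Analysis Services is used for transactional workloads. Y/N
No Azure Analysis Services is an analytical (OLAP) data modeling service. Transactional (OLTP) workloads belong in a relational database such as Azure SQL Database.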
A _________________ is a relational database in which the data is stored in a schema that is optimized for data analytics rather than transactional workloads. Commonly, the data from a transactional store is denormalized into a schema in which numeric values are stored in central _____________ tables, which are related to one or more ____________ tables that represent entities by which the data can be aggregated.
data warehouse fact dimension
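A small sketch of the fact and dimension idea, again with sqlite3 standing in for a real data warehouse. The names `fact_sales` and `dim_product` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table: one row per entity that the data can be aggregated by.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
-- Fact table: one row per recorded event, holding the numeric measures.
CREATE TABLE fact_sales (product_key INTEGER, quantity INTEGER, revenue REAL);
INSERT INTO dim_product VALUES (1, 'Bikes'), (2, 'Helmets');
INSERT INTO fact_sales VALUES (1, 2, 500.0), (1, 1, 250.0), (2, 3, 90.0);
""")

# Aggregate the measure in the fact table by an attribute of the dimension.
revenue_by_category = conn.execute("""
    SELECT d.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
```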
Which of the following is an example of unstructured data? An Employee table with columns EmployeeID, EmployeeName, and EmployeeDesignation Audio and Video files A table within a relational database
Audio and Video files
What is a good example of a paginated report? Select the correct option. - A drill-down report that allows you to explore the data from many angles - An invoice - A dashboard showing all the key metrics of the business, with live updates - A line chart showing CPU utilization of the virtual machine, at 6 second intervals
- An invoice A paginated report is one that is designed to be printed. It is static. An invoice is a good example of that.
A. Azure Database for MySQL B. Azure Database for MariaDB C. Azure Database for PostgreSQL 1 - a simple-to-use open-source database management system that is commonly used in Linux, Apache, and PHP (LAMP) stack apps. 2 - a hybrid relational-object database. You can store data in relational tables, but a _________ database also enables you to store custom data types, with their own non-relational properties. 3 - a newer database management system, created by the original developers of MySQL. The database engine has since been rewritten and optimized to improve performance. Offers compatibility with Oracle Database (another popular commercial database management system).
1 - A 2 - C 3 - B
A high-performance database and analytics service that is optimized for ingesting and querying batch or streaming data with a time-series element.
Azure Data Explorer
On Azure, large-scale data ingestion is best implemented by creating pipelines that orchestrate ETL processes. You can create and run pipelines using ____________________, or you can use the same pipeline engine in _______________________ if you want to manage all of the components of your data warehousing solution in a unified workspace.
Azure Data Factory Azure Synapse Analytics
You use Azure Table Storage as a non-relational data store. You need to optimize data retrieval. You should use ______________________________ as query criteria. A. only partition keys B. only row keys C. partition keys and row keys D. only properties
C. partition keys and row keys
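To see why both keys matter, here is a toy model of a table keyed by (partition key, row key). It is a plain Python dictionary, not the Azure Table Storage SDK, and the entity data is invented.

```python
# Entities keyed by (partition key, row key), modeled as a plain dictionary.
table = {
    ("building-1", "room-101"): {"temperature": 21.5},
    ("building-1", "room-102"): {"temperature": 22.0},
    ("building-2", "room-101"): {"temperature": 19.0},
}


def point_lookup(partition_key, row_key):
    """Fetch one entity directly by its full key, without scanning."""
    return table.get((partition_key, row_key))


def scan_by_property(name, value):
    """Filtering on a non-key property has to inspect every entity."""
    return [key for key, entity in table.items() if entity.get(name) == value]
```

A lookup with both keys goes straight to a single entity, while a property filter has to visit all of them, which is the reason for the answer above.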
Match the Cosmos DB API with the correct data storage format. 1. Core (SQL) API 2. Mongo DB API 3. Table API 4. Cassandra API 5. Gremlin API A. Key-Value B. Column-family C. Graph D. BSON E. JSON
Core (SQL) API - JSON Mongo DB API - BSON Table API - Key-Value Cassandra API - Column-family Gremlin API - Graph
Blob storage allows you to maintain your data inside ___________.
containers
You are designing an Azure Cosmos DB database that will support vertices and edges. Which Cosmos DB API should you include in the design? SQL Cassandra Gremlin Table
Gremlin The Azure Cosmos DB Gremlin API can be used to store massive graphs with billions of vertices and edges.
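Vertices and edges can be pictured with a tiny in-memory graph. This sketch is plain Python, not Gremlin syntax, and the vertex names and edge labels are made up.

```python
from collections import deque

# Directed, labeled edges as (source vertex, label, destination vertex).
edges = [
    ("ana", "knows", "ben"),
    ("ben", "knows", "cy"),
    ("ana", "works_at", "contoso"),
]


def out_neighbors(vertex, label):
    """Follow outgoing edges with the given label from one vertex."""
    return [dst for src, lbl, dst in edges if src == vertex and lbl == label]


def reachable(start, label):
    """Walk the graph breadth-first along edges with the given label."""
    seen, queue = [], deque([start])
    while queue:
        for nxt in out_neighbors(queue.popleft(), label):
            if nxt not in seen and nxt != start:
                seen.append(nxt)
                queue.append(nxt)
    return seen
```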
A ______________ defines SQL statements that can be run on command.
stored procedure
In a relational database, you model collections of entities from the real world as ______________.
tables
Which deployment option offers the best compatibility when migrating an existing SQL Server on-premises solution? Azure SQL Database (single database) Azure SQL Database (elastic pool) Azure SQL Managed Instance
Azure SQL Managed Instance
____________________ is a real-time stream processing engine that captures a stream of data from an input, applies a query to extract and manipulate data from the input stream, and writes the results to an output for analysis or further processing.
Azure Stream Analytics
Azure _________________ is a standalone service that offers the same high-performance querying of log and telemetry data as the Azure Synapse ________________ runtime in Azure Synapse Analytics.
Data Explorer
What is a data warehouse? A non-relational database optimized for read and write operations A relational database optimized for read operations A storage location for unstructured data files
A relational database optimized for read operations
What is an index? A structure that enables queries to locate rows in a table quickly A virtual table based on the results of a query A pre-defined SQL statement that modifies data
A structure that enables queries to locate rows in a table quickly
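One way to observe an index at work is to ask the database for its query plan. The sketch below uses sqlite3 and its `EXPLAIN QUERY PLAN` statement. Other engines expose plans differently, and the index name `idx_products_name` is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("CREATE INDEX idx_products_name ON products (name)")

# Ask SQLite how it would run a lookup on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM products WHERE name = ?", ("Widget",)
).fetchall()

# The last column of each plan row is a human-readable description.
plan_text = " ".join(str(row[-1]) for row in plan)
```

The plan text should mention the index by name, which indicates a search through the index rather than a full table scan.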
A SQL engine that is optimized for Internet-of-things (IoT) scenarios that need to work with streaming time-series data.
Azure SQL Edge
You use ________ statements to manipulate the rows in tables. These statements enable you to retrieve (query) data, insert new rows, or modify existing rows. You can also delete rows if you don't need them anymore.
Data Manipulation Language (DML)
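For comparison with the definition above, here is a sketch of the row-level statements run through sqlite3. The `products` table is hypothetical, and the CREATE TABLE line is DDL, included only so the DML has something to act on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: defines the table structure (not part of DML).
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# DML: insert new rows.
conn.execute("INSERT INTO products (id, name, price) VALUES (1, 'Widget', 2.50)")
conn.execute("INSERT INTO products (id, name, price) VALUES (2, 'Gadget', 9.99)")
# DML: modify an existing row.
conn.execute("UPDATE products SET price = 3.00 WHERE id = 1")
# DML: delete a row that is no longer needed.
conn.execute("DELETE FROM products WHERE id = 2")
# DML: retrieve (query) the remaining rows.
remaining = conn.execute("SELECT id, name, price FROM products").fetchall()
```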
________ seems similar to ETL at first glance but is better suited to big data scenarios since it leverages the scalability and flexibility of MPP engines like Azure Synapse Analytics, Azure Databricks, or Azure HDInsight.
ELT
______ is a traditional approach and has established best practices. It is more commonly found in on-premises environments since it was around before cloud platforms. It is a process that involves a lot of data movement, which is something you want to avoid in the cloud if possible due to its resource-intensive nature.
ETL
Imagine that you're part of a team that is analyzing house price data. The dataset that you receive contains house price information for several regions. Your team needs to report on how the house prices in each region have varied over the last few months. To achieve this, you need to ingest the data into Azure Synapse Analytics. You decide to use Azure Data Warehouse to perform the ingestion. Is this the right choice? Y/N
No To ingest the data into Azure Synapse Analytics, you need to use Azure Data Factory.
How is data in a relational table organized? Rows and Columns Header and Footer Pages and Paragraphs
Rows and Columns
Which SQL statement is used to query tables and return data? QUERY READ SELECT
SELECT
You need to identify the traits of transactional data in an Online Transaction Processing (OLTP) solution. Which of the following properties apply to OLTP? Schema on write Highly normalized Light write Heavy write Denormalized Schema on read
Schema on write Highly normalized Heavy write
This data is usually well organized and easy to understand. Data stored in relational databases is an example, where table rows and columns represent entities and their attributes.
Structured Data
Data with no explicit data model falls in this category. Good examples include binary file formats (such as PDF, Word, MP3, and MP4), emails, and tweets.
Unstructured Data
Azure SQL Database is an example of ________________ -as-a-service. A. platform B. infrastructure C. software D. application
A. platform
Objects in which things about data should be captured and stored are called: ____________. A. tables B. entities C. rows D. columns
B. entities
You have a requirement to process data once a month. Which processing strategy do you recommend? Batch Processing Stream Processing Analytical Processing
Batch Processing
Azure SQL Managed Instance supports cross-database queries. Y/N
Yes
You need to recommend a security solution for containers in Azure Blob storage. The solution must ensure that only read permissions are granted to a specific user for a specific container. What should you include in the recommendation? shared access signatures (SAS) an RBAC role in Azure Active Directory (Azure AD) public read access for blobs only access keys
shared access signatures (SAS) A shared access signature grants delegated, limited access to storage resources. You specify which resource it applies to (here, a single container), which permissions it grants (here, read only), and how long it remains valid.
A ___________ displays attribute members on rows and measures on columns. A simple ____________ is generally easy for users to understand, but it can quickly become difficult to read as the number of rows and columns increases.
table
Modern data warehousing architecture can vary, as can the specific technologies used to implement it; but in general, which elements are included in most implementations?
- Data ingestion and processing - Analytical data store - Analytical data model - Data visualization
Which among the following statements is true with respect to the ETL process? Select the correct option. - ETL process reduces the resource contention on the source systems - ETL process has very high load times - ETL process requires target systems to transform the data being loaded - ETL process has low load times
- ETL process has very high load times Extract, transform, and load (ETL) is a data pipeline used to collect data from various sources, transform the data according to business rules, and load it into a destination data store. Because the data volumes are large, the process usually takes time to execute.
Which of the statements below are true? - The ETL process has very high load times - The ELT process requires a target data store powerful enough to transform data - The ELT process requires target systems to transform the data being loaded
- The ETL process has very high load times - The ELT process requires a target data store powerful enough to transform data - The ELT process requires target systems to transform the data being loaded
You need to design and model a database by using a graphical tool that supports project-oriented offline database development. What should you use? - Microsoft SQL Server Data Tools (SSDT) - Microsoft SQL Server Management Studio (SSMS) - Azure Data Studio - Azure Databricks
- Microsoft SQL Server Data Tools (SSDT) SQL Server Data Tools (SSDT) is a modern development tool for building SQL Server relational databases, databases in Azure SQL, Analysis Services (AS) data models, Integration Services (IS) packages, and Reporting Services (RS) reports. Using SSDT, you can create an offline database project and implement schema changes by adding, modifying, or deleting the definitions of objects (represented by scripts) in the project, without a connection to a server instance.
Which objects can be added to a Microsoft Power BI dashboard? Select all correct options. - an image - a visualization from a report - a Microsoft PowerPoint slide - a report page - a dataflow - a text
- an image - a visualization from a report - a report page - a text
You manage an application that stores data in a shared folder on a Windows server. You need to move the shared folder to Azure Storage. Which type of Azure Storage should you use? - file - table - queue - blob
- file Keywords: Shared folder, windows server => use File storage Correct answer is option file
How much data can be stored in a single Table Storage account? Select the correct option. - 100 TB - 500 TB - 5 PB - Unlimited
500 TB
You are deploying a SaaS (Software as a Service) application that requires a relational database for Online Transaction Processing (OLTP). Which Azure service should you use to support the application? Azure Cosmos DB Azure Synapse Analytics Azure SQL Database Azure HDInsight
Azure SQL Database Azure SQL Database is a fully managed platform-as-a-service (PaaS) relational database, which makes it a fit for OLTP workloads. Azure Cosmos DB is non-relational, and Azure Synapse Analytics and Azure HDInsight target analytical workloads.
A platform-as-a-service (PaaS) solution that you can use to define streaming jobs that ingest data from a streaming source, apply a perpetual query, and write the results to an output.
Azure Stream Analytics
Apart from the way in which batch processing and streaming processing handle data, there are other differences: Data scope: _________ processing can process all the data in the dataset. __________ processing typically only has access to the most recent data received, or within a rolling time window (the last 30 seconds, for example).
Batch Stream
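The difference in data scope can be sketched in a few lines of plain Python. The readings, the 30-second window, and the function name `rolling_averages` are all assumptions made for illustration, and no streaming service is involved.

```python
from collections import deque

# (timestamp in seconds, value) pairs; invented sample readings.
readings = [(0, 20.0), (10, 21.0), (25, 23.0), (45, 22.0), (50, 24.0)]

# Batch: the whole dataset is available, so one pass covers all of it.
batch_average = sum(value for _, value in readings) / len(readings)


def rolling_averages(events, window_seconds=30):
    """Stream: emit an average per event, using only the recent window."""
    window = deque()
    results = []
    for ts, value in events:
        window.append((ts, value))
        # Drop events that are window_seconds or more older than the newest one.
        while window[0][0] <= ts - window_seconds:
            window.popleft()
        results.append(sum(v for _, v in window) / len(window))
    return results


stream_averages = rolling_averages(readings)
```

The batch result is a single figure over every reading, while each streaming result only reflects what arrived within the last 30 seconds.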
__________________________ focuses on moving and transforming data at rest.
Batch processing
The process of splitting an entity into more than one table to reduce data redundancy is called: _____________. A. deduplication B. denormalization C. normalization D. optimization
C. normalization
Which Azure Cosmos DB API should you use to work with data in which entities and their relationships to one another are represented in a graph using vertices and edges? - MongoDB API - Core (SQL) API - Gremlin API
Gremlin API
The airline company needs to update the prices of tickets based on customer feedback, fuel price, and other factors. It analyses the data which gives the new prices of tickets. What type of analysis does this come under? - Descriptive analytics - Diagnostic analysis - Cognitive analysis - Predictive analysis - Prescriptive analysis
Prescriptive analysis
Azure Files supports what two common network file sharing protocols?
Server Message Block (SMB) Network File System (NFS)
You need to query a table named Products in an Azure SQL database. Which three requirements must be met to query the table from the internet? Each correct answer presents part of the solution. (Choose three.) You must be assigned the Reader role for the resource group that contains the database. You must have SELECT access to the Products table. You must have a user in the database. You must be assigned the Contributor role for the resource group that contains the database. Your IP address must be allowed to connect to the database.
Your IP address must be allowed to connect to the database. You must have SELECT access to the Products table. You must have a user in the database.
In ________ processing, newly arriving data elements are collected and stored, and the whole group is processed together as a _________.
batch
Your company is designing a database that will contain session data for a website. The data will include notifications, personalization attributes, and products that are added to a shopping cart. Which type of data store will provide the lowest latency to retrieve the data? document key/value columnar graph
key/value A key/value store retrieves a value by its unique key in a single lookup, which gives the lowest latency for session data such as notifications, personalization attributes, and shopping-cart contents. A column-family database, by contrast, organizes data into rows and columns and is better suited to workloads such as sensor data, telemetry, web analytics, and weather and other time-series data.
A _____________________ collaborates with stakeholders to design and implement data-related workloads, including data ingestion pipelines, cleansing and transformation activities, and data stores for analytical workloads. They use a wide range of data platform technologies, including relational and non-relational databases, file stores, and data streams.
data engineer
A _________________ is a file store, usually on a distributed file system for high performance data access.
data lake
A ____________________ is responsible for the design, implementation, maintenance, and operational aspects of on-premises and cloud-based database systems. They're responsible for the overall availability and consistent performance and optimizations of databases.
database administrator
You are building a system that monitors the temperature throughout a set of office blocks and sets the air conditioning in each room in each block to maintain a pleasant ambient temperature. Your system has to manage the air conditioning in several thousand buildings spread across the country or region, and each building typically contains at least 100 air-conditioned rooms. What type of NoSQL datastore is most appropriate for capturing the temperature data to enable it to be processed quickly? - A column family database - Write the temperatures to a blob in Azure Blob storage - A key-value store
- A key-value store
Which Azure SQL offering supports automatic database scaling and automatic pausing of the database during inactive periods? Select the correct option. - Azure SQL Database serverless - Azure SQL managed instance - Azure SQL Database Hyperscale
- Azure SQL Database serverless SQL Database serverless is a new compute tier that optimizes price-performance and simplifies performance management for databases with intermittent and unpredictable usage. The serverless compute tier enjoys all the fully managed, built-in intelligence benefits of SQL Database and helps accelerate application development, minimize operational complexity and lower total costs.
Data ingestion is _______________________ - the process of transforming raw data into models containing meaningful information - analyzing data for anomalies - capturing raw data streaming from various sources and storing it
- capturing raw data streaming from various sources and storing it The purpose of data ingestion is to receive raw data and save it as quickly as possible. The data can then be processed and analyzed.
What are the characteristics of an Online Transaction Processing (OLTP) workload? Select three correct options. - denormalized data - light writes and heavy reads - normalized data - heavy writes and moderate reads - schema on write - schema on read
- normalized data - heavy writes and moderate reads - schema on write
A. Tables B. Indexes C. Views D. Keys 1. Create relationships. 2. Improve processing speed for data searches. 3. Store instances of entities as rows. 4. Display data from predefined queries.
1 - D 2 - B 3 - A 4 - C
Which kind of visualization should you use to analyze pass rates for multiple exams over time? - A pie chart - A scatter plot - A line chart
A line chart
A highly scalable storage service that is often used in batch processing scenarios, but which can also be used as a source of streaming data.
Azure Data Lake Storage Gen2
Match the types of activities to the appropriate Azure Data Factory activities. 1 - Control 2 - Data movement 3 - Data transformation A - Mapping Data Flow B - Copy C - Until
Control - Until Data Movement - Copy Data Transformation - Mapping Data Flow
Name the Blob storage access tier. The ______ tier has lower performance and incurs reduced storage charges compared to the Hot tier. Use the ______ tier for data that is accessed infrequently.
Cool
Which API should you use to store and query JSON documents in Azure Cosmos DB? - Core (SQL) API - Cassandra API - Table API
Core (SQL) API
You have data that consists of JSON-based documents. You need to store the data in an Azure environment that supports efficient non-key, field-based searching. You should use _______________________ as the data store. A. Azure Table Storage B. Azure Blob Storage C. Azure File Storage D. Azure Cosmos DB
D. Azure Cosmos DB
Microsoft __________________ is a platform for analytical data modeling and reporting that data analysts can use to create and share interactive data visualizations.
Power BI
Which tool should you use to import data from multiple data sources and create a report? - Power BI Desktop - Power BI Phone App - Azure Data Factory
Power BI Desktop
What type of analysis answers the question "What will happen?"
Predictive Analysis
True or False Any content uploaded to an Azure storage account is encrypted.
True
_________________________ supports role-based access control (RBAC) at the file and folder level. - Azure Queue storage - Azure Disk storage - Azure Data Lake storage - Azure Blob storage
- Azure Data Lake storage Azure Data Lake Storage implements an access control model that supports both Azure role-based access control (Azure RBAC) and POSIX-like access control lists (ACLs).
Which of the below options are true? - Azure Data Studio can be used to restore a database. - Microsoft SQL Server Management Studio (SSMS) enables users to create and use SQL notebooks. - Azure Data Studio can be used to query an Azure SQL Database from a device that runs macOS.
- Azure Data Studio can be used to restore a database. - Azure Data Studio can be used to query an Azure SQL Database from a device that runs macOS.
Which of the following statements are true of Azure SQL Database? Select all correct options. - Azure SQL Database allows independent provisioning of throughput and storage - Azure SQL Database has built-in high availability - Azure SQL Database includes a fully managed backup service - Azure SQL Database is fully compatible with Microsoft SQL Server
- Azure SQL Database allows independent provisioning of throughput and storage - Azure SQL Database has built in high availability - Azure SQL Database includes a fully managed backup service
Which are the services in Azure that can be used to ingest the data? Select two correct options. - Stream Analytics Job - Event hub - Synapse Analytics - Azure Databricks - IoT hub
- Event hub - IoT hub
_____________ is a cloud service from the creators of Apache Spark, combined with a great integration with the Azure platform.
Azure Databricks
You develop data engineering solutions for a company. You must integrate the company's on-premises Microsoft SQL Server data with Microsoft Azure SQL Database. Data must be transformed incrementally. You need to implement the data integration solution. Which tool should you use to configure a pipeline to copy data? A. Use the Copy Data tool with Blob storage linked service as the source B. Use Azure PowerShell with SQL Server linked service as a source C. Use Azure Data Factory UI with Blob storage linked service as a source D. Use the .NET Data Factory API with Blob storage linked service as the source
B. Use Azure PowerShell with SQL Server linked service as a source
Azure ________________ is an Azure-integrated version of the popular __________________ platform, which combines the Apache Spark data processing platform with SQL database semantics and an integrated management interface to enable large-scale data analytics.
Databricks
This kind of storage uses the principle of a hash table, also called a dictionary. No matter what the value content is, it must be just one value, and it can be matched with a unique key.
Key-value store
Which Power BI tool helps build paginated reports? Select the correct option. - Power BI Desktop - Power BI Report Builder - Power BI Server - Power BI Apps
Power BI Report Builder Power BI Report Builder is a tool that allows you to build paginated reports.
An open-source library that enables you to develop complex streaming solutions on Apache Spark based services, including Azure Synapse Analytics, Azure Databricks, and Azure HDInsight.
Spark Structured Streaming
You would like to consume data in Azure Databricks from different services. Can Azure Databricks consume data from Azure Event Hubs? Y/N
Yes
One final thing worth considering about analytical models is the creation of attribute __________________ that enable you to quickly drill-up or drill-down to find aggregated values at different levels in a dimension.
hierarchies
If you set up your SQL Database networking with "public endpoint", without any other actions, which type of user can connect to the database? Select the correct option. - Public users with the correct access username and password - No users - Private users - Admin users
- No users Even with a public endpoint, SQL Database needs to have its firewall configured to allow anyone in. By default, all access attempts are denied unless explicitly added to the firewall access list.
You have a large amount of data held in files in Azure Data Lake storage. You want to retrieve the data in these files and use it to populate tables held in Azure Synapse Analytics. Which processing option is most appropriate? - Use Azure Synapse Link to connect to Azure Data Lake storage and download the data - Synapse Spark pool - Synapse SQL pool
- Synapse SQL pool You can use PolyBase from a SQL pool to connect to the files in Azure Data Lake as external tables, and then ingest the data.
What is the primary purpose of an index on a relational data table? Select the correct option. - It is a simplified view of a table, returning the same data but fewer columns for instance - It is a child table that uses a foreign key to refer to the primary table - To speed up INSERT statements so data is written to the table faster - To speed up SELECT queries so that they return faster
- To speed up SELECT queries so that they return faster
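The effect of an index on a SELECT can be seen in a small sketch. It uses SQLite from the Python standard library as a stand-in for a relational database; the table, column, and index names are made up for illustration, and the exact plan text varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(f"customer{i % 100}", i * 1.5) for i in range(1000)],
)

query = "SELECT total FROM orders WHERE customer = 'customer7'"

def plan(sql):
    """Return the human-readable query plan SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)

# Without an index, the filter requires scanning the whole table.
plan_before = plan(query)

# With an index on the filtered column, SQLite can search instead of scan.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
plan_after = plan(query)

print(plan_before)
print(plan_after)
```

The trade-off is that every INSERT must also maintain the index, which is why an index speeds up reads rather than writes.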
Azure Data Factory supports a trigger that is scheduled at a predetermined time but can pretend it is running at another time. For instance, a job runs every day at NOON, but only processes the data received until midnight last night. What type of trigger is this called? - Time-series interval - Scheduled trigger - Tumbling window trigger - Manual trigger
- Tumbling window trigger
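The window arithmetic behind a tumbling window can be sketched in a few lines. This is not the Azure Data Factory API, only an illustration of fixed-size, contiguous, non-overlapping windows; the function name, origin date, and run time are assumptions.

```python
from datetime import datetime, timedelta

def tumbling_window(ts, start, size):
    """Return the (window_start, window_end) that contains ts.

    Windows are fixed-size, contiguous, and non-overlapping, counted from start.
    """
    index = (ts - start) // size  # timedelta // timedelta gives an int
    window_start = start + index * size
    return window_start, window_start + size

origin = datetime(2024, 1, 1)  # hypothetical start of the first window
day = timedelta(days=1)

# A run fires at noon on January 3 but processes the previous, complete day.
run_time = datetime(2024, 1, 3, 12, 0)
current_start, _ = tumbling_window(run_time, origin, day)
window = (current_start - day, current_start)

print(window)
```

The run time and the window it processes are decoupled, which is the property the question describes.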
When should you use a block blob, and when should you use a page blob? Select the correct option. - Use a block blob for active data stored using the Hot data access tier. Use a page blob for data stored using the Cool or Archive data access tiers. - Use a block blob for unstructured data that requires random access to perform reads and writes. Use a page blob for discrete objects that rarely change. - Use a block blob for discrete objects that change infrequently. Use a page blob for blobs that require random read and write access.
- Use a block blob for discrete objects that change infrequently. Use a page blob for blobs that require random read and write access.
Azure Data Studio is supported on ______________________________ platforms. - Windows, MacOS, & Linux - Windows & Linux - Windows - Windows, MacOS, Linux, iOS, & Android
- Windows, MacOS, & Linux Azure Data Studio is a desktop application; it is not available for iOS or Android.
Which statements are true? - You can use blob, table, queue, and file storage in the same Azure Storage account. - You implement Azure Data Lake Storage by creating an Azure Storage account. - When ingesting data from Azure Data Lake Storage across Azure regions, you will incur cost for bandwidth.
- You can use blob, table, queue, and file storage in the same Azure Storage account. - You implement Azure Data Lake Storage by creating an Azure Storage account. - When ingesting data from Azure Data Lake Storage across Azure regions, you will incur cost for bandwidth.
A key/value data store is best suitable for _____________________ - enforcing constraints - transactions - table joins - simple lookups
- simple lookups A key/value store associates each data value with a unique key. Most key/value stores only support simple query, insert, and delete operations. To modify a value (either partially or completely), an application must overwrite the existing data for the entire value. In most implementations, reading or writing a single value is an atomic operation. An application can store arbitrary data as a set of values. Any schema information must be provided by the application. The key/value store simply retrieves or stores the value by key.
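The operations described above can be sketched with a plain Python dictionary. This only illustrates the access pattern, not how any Azure service is implemented; the keys and values are made up.

```python
# A minimal in-memory key/value store sketch (hypothetical keys and values).
store = {}

# Insert: each value is stored under a unique key.
store["user:42"] = {"name": "Ana", "language": "pt-BR"}

# Query: a simple lookup by key; the store does not inspect the value.
profile = store["user:42"]

# Modify: the application overwrites the entire value, even for a small change.
updated = dict(profile, language="en-US")
store["user:42"] = updated

# Delete: remove a value by its key.
store["session:9"] = "temporary"
del store["session:9"]

print(store)
```

Nothing here filters or joins on the contents of a value, which is why this model suits simple lookups rather than constraints, transactions, or table joins.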
When you create an Azure SQL database, which account can always connect to the database? Select the correct option. - the server admin login account of the logical server - the Sys Admin (SA) account - the Azure Active Directory (Azure AD) administrator account - the Azure Active Directory (Azure AD) account that created the database
- the server admin login account of the logical server When you first deploy Azure SQL, you specify an admin login and an associated password for that login. This administrative account is called Server admin. You can use this account to always connect to the database.
What must you define to implement a pipeline that reads data from Azure Blob Storage? - A linked service for your Azure Blob Storage account - A dedicated SQL pool in your Azure Synapse Analytics workspace - An Azure HDInsight cluster in your subscription
A linked service for your Azure Blob Storage account
Which of the following Azure services uses a hierarchical namespace to store data? Select the correct option. - Azure Data Factory - Azure Data Lake Storage - Azure Synapse Analytics - Azure Cosmos DB
Azure Data Lake Storage A key mechanism that allows Azure Data Lake Storage Gen2 to provide file system performance at object storage scale and prices is the addition of a hierarchical namespace.
Which database service is the simplest option for migrating a LAMP application to Azure? Azure SQL Managed Instance Azure Database for MySQL Azure Database for PostgreSQL
Azure Database for MySQL
___________________ is essentially a way to create cloud-based network shares, such as you typically find in on-premises organizations to make documents and other files available to multiple users.
Azure Files
A data ingestion service that is similar to Azure Event Hubs, but which is optimized for managing event data from Internet-of-things (IoT) devices.
Azure IoT Hub
Which service would you use to continually capture data from an IoT Hub, aggregate it over temporal periods, and store results in Azure SQL Database? - Azure Cosmos DB - Azure Stream Analytics - Azure Storage
Azure Stream Analytics
Which Azure services can you use to create a pipeline for data ingestion and processing? - Azure SQL Database and Azure Cosmos DB - Azure Synapse Analytics and Azure Data Factory - Azure HDInsight and Azure Databricks
Azure Synapse Analytics and Azure Data Factory
____________________ is a NoSQL storage solution that makes use of tables containing key/value data items. Each item is represented by a row that contains columns for the data fields that need to be stored.
Azure Table Storage
Which Azure Data Factory component should you use to represent data that you want to ingest for processing? A. Linked services B. Datasets C. Pipelines D. Notebooks
B. Datasets
What are three characteristics of non-relational data? Each correct answer presents a complete solution. A. Forced schema on data structures B. Flexible storage of ingested data C. Entities are self-describing D. Entities may have different fields E. Each row has the exact same columns
B. Flexible storage of ingested data C. Entities are self-describing D. Entities may have different fields
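A short sketch of what "self-describing" and "different fields" mean in practice, using two made-up JSON documents held in the same collection:

```python
import json

# Two hypothetical customer documents; each carries its own field names.
docs = [
    '{"id": "1", "name": "Ana", "email": "ana@example.com"}',
    '{"id": "2", "name": "Ben", "phones": ["555-0100", "555-0101"]}',
]

customers = [json.loads(d) for d in docs]

# No shared schema is enforced: the field sets differ between entities.
fields = [sorted(c.keys()) for c in customers]

print(fields)
```

In a relational table both rows would need the same columns, with NULLs or a separate child table for the phone numbers.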
Which definition of stream processing is correct? - Data is processed continually as new data records arrive. - Data is collected in a temporary store, and all records are processed together as a batch. - Data is incomplete and cannot be analyzed.
Data is processed continually as new data records arrive.
You need to analyze data to know what occurred with the daily moving average inventories for the past 90 days of stock financial data. Which type of analytics will you perform? Diagnostic Analytics Cognitive Analytics Predictive Analytics Descriptive Analytics Prescriptive Analytics
Descriptive Analytics Descriptive analytics helps answer questions about what has happened, based on historical data. Descriptive analytics techniques summarize large datasets to describe outcomes to stakeholders.
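A moving average is a typical descriptive summary of historical data. The sketch below uses a made-up inventory series and a 3-day window rather than the 90 days in the question, purely to keep the numbers readable.

```python
from collections import deque

def moving_average(values, window):
    """Yield the mean of the last `window` values once enough have arrived."""
    recent = deque(maxlen=window)
    for value in values:
        recent.append(value)
        if len(recent) == window:
            yield sum(recent) / window

# Hypothetical daily inventory levels.
inventory = [100, 120, 110, 130, 150]
averages = list(moving_average(inventory, 3))

print(averages)
```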
The production workload is facing technical issues with one of the servers. You need to collect the logs and analyze the logs to determine the root cause of the issue. What type of analysis would you perform? - Predictive analysis - Cognitive analysis - Descriptive analytics - Prescriptive analysis - Diagnostic analysis
Diagnostic analysis
Match them up... 1 - Why did sales increase last month? 2 - How do I allocate my budget to buy different inventory items? 3 - The rise of COVID19 cases in a country. Descriptive analytics Prescriptive analytics Diagnostic analytics Cognitive analytics Predictive analytics
Diagnostic analytics - Why did sales increase last month Prescriptive analytics - How do I allocate my budget to buy different inventory items? Predictive analytics - The rise of COVID19 cases in a country
__________________ tables represent the entities by which you want to aggregate numeric measures - for example product or customer. Each entity is represented by a row with a unique key value. The remaining columns represent attributes of an entity - for example, products have names and categories, and customers have addresses and cities.
Dimension
How can you enable globally distributed users to work with their own local replica of a Cosmos DB database? - Create an Azure Cosmos DB account in each region where you have users. - Use the Table API to copy data to Azure Table Storage in each region where you have users. - Enable multi-region writes and add the regions where you have users.
Enable multi-region writes and add the regions where you have users.
You have data stored in ADLS in text format. You need to load the data in Azure Synapse Analytics in a table in one of the databases. Which are the two components that you should define in order to use Polybase? Internal format Internal source External format External source
External format External source 1. Create an external data source: external data sources are used to connect to storage accounts. 2. Create an external file format: this object defines the external data stored in Azure Blob Storage or Azure Data Lake Storage. Creating an external file format is a prerequisite for creating an external table.
Your company plans to load data from a customer relationship management (CRM) system to a data warehouse by using an extract, load, and transform (ELT) process. Where does data processing occur for each stage of the ELT process? 1 - Extract 2 - Load 3 - Transform A - An in-memory digestion tool B - The CRM system. C - The data warehouse.
Extract - The CRM system. Load - The data warehouse. Transform - The data warehouse.
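The ELT order can be sketched with SQLite standing in for the data warehouse. The CRM rows, table names, and cleaning rule are all hypothetical; the point is only that the raw data is loaded first and transformed afterwards, inside the warehouse.

```python
import sqlite3

# Extract: rows pulled from the source system (a made-up CRM export).
crm_rows = [("ana", " Ana@Example.com "), ("ben", "BEN@example.com")]

warehouse = sqlite3.connect(":memory:")

# Load: land the data in the warehouse unchanged.
warehouse.execute("CREATE TABLE staging_contacts (name TEXT, email TEXT)")
warehouse.executemany("INSERT INTO staging_contacts VALUES (?, ?)", crm_rows)

# Transform: clean the data with SQL that runs inside the warehouse.
warehouse.execute(
    "CREATE TABLE contacts AS "
    "SELECT name, lower(trim(email)) AS email FROM staging_contacts"
)

emails = [
    row[0]
    for row in warehouse.execute("SELECT email FROM contacts ORDER BY name")
]

print(emails)
```

In an ETL process the cleaning step would instead run in a separate engine before anything reached the warehouse.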
Name the Blob storage access tier. The ____ tier is the default. You use this tier for blobs that are accessed frequently. The blob data is stored on high-performance media.
Hot
In a data warehousing workload, data _______________ - from a single source is distributed to multiple locations - is used to train machine learning models - from multiple sources is combined in a single location - is added to a queue for multiple systems to process
In a data warehousing workload, data from multiple sources is combined in a single location. A data warehouse workload refers to all operations that take place in relation to a data warehouse. The depth and breadth of these components depend on the maturity level of the data warehouse. The data warehouse workload encompasses: - The entire process of loading data into the warehouse - Performing data warehouse analysis and reporting - Managing data in the data warehouse - Exporting data from the data warehouse
Match the types of data stores to the appropriate scenarios: 1 - Application Users and their Default Language 2 - Medical images and their associated metadata. 3 - Employee data that show relationship between employees. A - Graph B - Key/Value C - Column Family D - Object
Key/Value - Application Users and their Default Language Object - Medical images and their associated metadata. Graph - Employee data that show relationship between employees.
Which of the following statements is true about Azure SQL Database? Most database maintenance tasks are automated You must purchase a SQL Server license It can only support one database
Most database maintenance tasks are automated
You have a transactional workload running on a relational database system. You need to remove all DML anomalies which hamper the integrity of the databases. What would you do in such a scenario? - Remove relationships in the tables - De-normalize the tables as much as possible - Block all DML queries - Normalize the tables as much as possible
Normalize the tables as much as possible
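A small sketch of why normalization removes update anomalies, using SQLite and made-up tables. Because the customer's city is stored once and referenced by key, a single UPDATE keeps every order consistent.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);
INSERT INTO customers VALUES (1, 'Ana', 'Lisbon');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);
""")

# The city lives in exactly one row, so one UPDATE changes it everywhere.
db.execute("UPDATE customers SET city = 'Porto' WHERE id = 1")

cities = {
    row[0]
    for row in db.execute(
        "SELECT c.city FROM orders o JOIN customers c ON c.id = o.customer_id"
    )
}

print(cities)
```

Had the city been copied into each order row, an UPDATE that missed one row would leave two different cities for the same customer.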
Which of the below items are true? - Partition keys are used in Azure Cosmos DB to optimize queries. - Items contained in the same Azure Cosmos DB logical partition can have different partition keys. - The Azure Cosmos DB API is configured separately for each database in an Azure Cosmos DB account.
Partition keys are used in Azure Cosmos DB to optimize queries.
Apart from the way in which batch processing and streaming processing handle data, there are other differences: Analysis: ______________ processing is used for simple response functions, aggregates, or calculations such as rolling averages. You typically use __________ processing to perform complex analytics.
Stream batch
Why might you use Azure File storage? - To share files that are stored on-premises with users located at other sites. - To enable users at different sites to share files. - To store large binary data files containing images or other unstructured data.
To enable users at different sites to share files.
What should you do to an existing Azure Storage account in order to support a data lake for Azure Synapse Analytics? - Add an Azure Files share - Create Azure Storage tables for the data you want to analyze - Upgrade the account to enable hierarchical namespace and create a blob container
Upgrade the account to enable hierarchical namespace and create a blob container
Scaling Azure SQL Database (up or down) does not negatively affect applications using that database. The users might not even notice. Select the correct option. - Yes - No
Yes Scaling Azure SQL Database does not affect any applications or users using it at the time.
Apart from the way in which batch processing and streaming processing handle data, there are other differences: Performance: Latency is the time taken for the data to be received and processed. The latency for _________ processing is typically a few hours. __________ processing typically occurs immediately, with latency in the order of seconds or milliseconds.
batch Stream
The two main kinds of workloads are ______________ and _________________.
extract-transform-load (ETL) extract-load-transform (ELT)
You need to recommend a non-relational data store that is optimized for storing and retrieving text files, videos, audio streams, and virtual disk images. The data store must store data, some metadata, and a unique ID for each file. Which type of data store should you recommend? key/value columnar document object
object Object storage is optimized for storing and retrieving large binary objects (images, files, video and audio streams, large application data objects and documents, virtual machine disk images). Large data files are also popularly used in this model, for example, delimited file (CSV), parquet, and ORC. Object stores can manage extremely large amounts of unstructured data.
You use _________ statements to create, modify, and remove tables and other objects in a database (table, stored procedures, views, and so on).
Data Definition Language (DDL)
SQL statements are grouped into what three main logical groups?
Data Definition Language (DDL) Data Control Language (DCL) Data Manipulation Language (DML)
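As a memory aid, the three groups can be sketched as a keyword lookup. The keyword lists are deliberately short and follow the grouping used in this deck (SELECT counted as DML); the function name is made up.

```python
# Common leading keywords for each group (not an exhaustive list).
GROUPS = {
    "DDL": {"CREATE", "ALTER", "DROP"},
    "DML": {"SELECT", "INSERT", "UPDATE", "DELETE"},
    "DCL": {"GRANT", "DENY", "REVOKE"},
}

def classify(statement):
    """Return the logical group of a SQL statement based on its first keyword."""
    keyword = statement.strip().split()[0].upper()
    for group, keywords in GROUPS.items():
        if keyword in keywords:
            return group
    return "unknown"

print(classify("CREATE TABLE product (id INT)"))
print(classify("update product set price = 10"))
print(classify("GRANT SELECT ON product TO analyst"))
```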
A _____________ is a more sophisticated table. It allows for attributes also on columns and can auto-calculate subtotals.
matrix
Which of the following is based on column family database ? Gremlin Apache Cassandra SQL Table API
Apache Cassandra The most widely used column family database management system is Apache Cassandra. Azure Cosmos DB supports the column-family approach through the Cassandra API.
The technique that provides recommended actions that you should take to achieve a goal or target is called _____________ analytics. A. descriptive B. diagnostic C. predictive D. prescriptive
D. prescriptive
The act of increasing or decreasing the resources that are available for a service is called: _____________. A. computing B. provisioning C. networking D. scaling
D. scaling
What type of analysis answers the question "What happened?"
Descriptive Analysis
What type of analysis answers the question "How can we make it happen?"
Prescriptive Analysis
You would like to consume data in Azure Databricks from different services. Can Azure Databricks consume data from Azure Cosmos DB? Y/N
Yes
You need to use JavaScript Object Notation (JSON) files to provision Azure storage. What should you use? A. Azure portal B. Azure command-line interface (CLI) C. Azure PowerShell D. Azure Resource Manager (ARM) templates
D. Azure Resource Manager (ARM) templates
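For a sense of what such a JSON file looks like, the sketch below builds a minimal template for one storage account and serializes it. The account name is hypothetical, and the API version and SKU are assumptions that should be checked against the current template reference before use.

```python
import json

template = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "resources": [
        {
            "type": "Microsoft.Storage/storageAccounts",
            "apiVersion": "2022-09-01",  # assumed version, verify before use
            "name": "examplestorage001",  # hypothetical account name
            "location": "[resourceGroup().location]",
            "sku": {"name": "Standard_LRS"},
            "kind": "StorageV2",
        }
    ],
}

# The serialized text is what would be saved as the template file.
template_json = json.dumps(template, indent=2)

print(template_json)
```

The template is declarative: it describes the resource to provision, and the portal, CLI, or PowerShell can each be used to deploy it.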
Database administrators generally use _______ statements to manage access to objects in a database by granting, denying, or revoking permissions to specific users or groups.
Data Control Language (DCL)
An open-source data ingestion solution that is commonly used together with Apache Spark.
Apache Kafka
Which open-source distributed processing engine does Azure Synapse Analytics include? - Apache Hadoop - Apache Spark - Apache Storm
Apache Spark
Name the Blob storage access tier. The __________ tier provides the lowest storage cost, but with increased latency. The ____________tier is intended for historical data that mustn't be lost, but is required only rarely. Blobs in the ______________tier are effectively stored in an offline state.
Archive
____________________ is a global-scale non-relational (NoSQL) database system that supports multiple application programming interfaces (APIs), enabling you to store and manage data as JSON documents, key-value pairs, column-families, and graphs.
Azure Cosmos DB
You are designing an application that will store petabytes of medical imaging data. When the data is first created, the data will be accessed frequently during the first week. After one month, the data must be accessible within 30 seconds, but files will be accessed infrequently. After one year, the data will be accessed infrequently but must be accessible within five minutes. You need to select a storage strategy for the data. The solution must minimize costs. Which storage tier should you use for each time frame? Which storage tier should you use for data that is more than one year old? Archive Hot Cool
Cool After one year: Cool. The data must be accessible within five minutes, and blobs in the Archive tier are offline and can take hours to rehydrate.
Which role is most likely to use Azure Data Factory to define a data pipeline for an ETL process? Database Administrator Data Engineer Data Analyst
Data Engineer
You design a data ingestion and transformation solution by using Azure Data Factory service. You need to get data from an Azure SQL database. Which two resources should you use? Each correct answer presents part of the solution. A. Linked service B. Copy data activity C. Dataset D. Azure Databricks notebook
A. Linked service C. Dataset
_____________________ is an Azure service that enables you to define and schedule data pipelines to transfer and transform data. You can integrate your pipelines with other Azure services, enabling you to ingest data from cloud data stores, process the data using cloud-based compute, and persist the results in another data store.
Azure Data Factory
A company plans to use Apache Spark Analytics to analyze intrusion detection data. You need to recommend a solution to monitor network and system activities for malicious activities and policy violations. Reports must be produced in an electronic format and sent to management. The solution must minimize administrative efforts. What should you recommend? Azure Data Factory Azure Data Lake Azure Databricks Azure HDInsight
Azure Databricks Recommendation engines, churn analysis, and intrusion detection are common scenarios that many organizations are solving across multiple industries. They require machine learning, streaming analytics, and utilize massive amounts of data processing that can be difficult to scale without the right tools. Companies like Lennox International, E.ON, and renewables.AI are just a few examples of organizations that have deployed Apache Spark™ to solve these challenges using Microsoft Azure Databricks.
You need to suggest a telemetry data solution that supports the analysis of log files in real time. Which two Azure services should you include in the solution? Azure Databricks Azure Data Factory Azure Event Hubs Azure SQL DB
Azure Databricks Azure Event Hubs
____________________ provides a solution for enterprise-wide data governance and discoverability. You can use __________________ to create a map of your data and track data lineage across multiple data sources and systems, enabling you to find trustworthy data for analysis and reporting.
Azure Purview
You are designing a data storage solution for a database that is expected to grow to 50 TB. The usage pattern is singleton inserts, singleton updates, and reporting. Which storage solution should you use? Azure SQL Database elastic pools Azure SQL Data Warehouse Azure Cosmos DB that uses the Gremlin API Azure SQL Database Hyperscale
Azure SQL Database Hyperscale A Hyperscale database is an Azure SQL database in the Hyperscale service tier that is backed by the Hyperscale scale-out storage technology. A Hyperscale database supports up to 100 TB of data and provides high throughput and performance, as well as rapid scaling to adapt to the workload requirements. Scaling is transparent to the application: connectivity, query processing, and so on work like any other Azure SQL database.
____________ is the new name for Azure SQL Data Warehouse, but it extends it in many ways. It aims to be the comprehensive analytics platform, from data ingestion to presentation, bringing together one-click data exploration, robust pipelines, enterprise-grade database service, and report authoring.
Azure Synapse Analytics
You are designing an application that will store petabytes of medical imaging data. When the data is first created, the data will be accessed frequently during the first week. After one month, the data must be accessible within 30 seconds, but files will be accessed infrequently. After one year, the data will be accessed infrequently but must be accessible within five minutes. You need to select a storage strategy for the data. The solution must minimize costs. Which storage tier should you use for each time frame? Which storage tier should you use for data that is more than one month old? Archive Hot Cool
Cool First week: Hot. Optimized for storing data that is accessed frequently. After one month: Cool. Optimized for storing data that is infrequently accessed and stored for at least 30 days.
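The reasoning for this scenario can be sketched as a small decision function. The one-hour threshold is an assumption standing in for "Archive rehydration takes on the order of hours", and the function name is made up; real tier choices also depend on pricing and minimum retention periods.

```python
# Assumption for this sketch: reading an archived blob means waiting hours.
ARCHIVE_MIN_WAIT_SECONDS = 60 * 60

def choose_tier(accessed_frequently, max_wait_seconds):
    """Pick the cheapest tier that still meets the access requirement."""
    if accessed_frequently:
        return "Hot"
    if max_wait_seconds < ARCHIVE_MIN_WAIT_SECONDS:
        # Cool is an online tier, so data is readable without rehydration.
        return "Cool"
    return "Archive"

tiers = {
    "first week": choose_tier(True, 1),
    "after one month": choose_tier(False, 30),
    "after one year": choose_tier(False, 5 * 60),
}

print(tiers)
```

Archive only becomes the answer when the data can wait hours before it is read.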
You need to process data that is generated continuously and near real-time responses are required. You should use _________. A. batch processing B. scheduled data processing C. buffering and processing D. streaming data processing
D. streaming data processing
A company is developing a mission-critical line of business app that uses Azure SQL Database Managed Instance. You must design a disaster recovery strategy for the solution. You need to ensure that the database automatically recovers when full or partial loss of the Azure SQL Database service occurs in the primary region. What should you recommend? Failover-group Azure SQL Data Sync SQL Replication Active geo-replication
Failover-group Auto-failover groups is a SQL Database feature that allows you to manage replication and failover of a group of databases on a SQL Database server or all databases in a Managed Instance to another region (currently in public preview for Managed Instance). It uses the same underlying technology as active geo-replication. You can initiate failover manually or you can delegate it to the SQL Database service based on a user-defined policy.
_____________ is a feature of SQL Server and Azure SQL Database that enables you to run Transact-SQL queries that read data from external data sources. ______ makes these external data sources appear like tables in a SQL database. A. SSMS B. Data Tools C. Polybase D. Analysis Service
Polybase PolyBase is a feature of SQL Server and Azure SQL Database that enables you to run Transact-SQL queries that read data from external data sources. PolyBase makes these external data sources appear like tables in a SQL database. Using PolyBase, you can read data managed by Hadoop, Spark, and Azure Blob Storage, as well as other database management systems such as Cosmos DB, Oracle, Teradata, and MongoDB.
_______________ is a cloud service that lets you implement, manage, and monitor a cluster for Hadoop, Spark, HBase, Kafka, Storm, Hive LLAP, and ML Services in an easy and effective way.
Azure HDInsight
Data engineers can use ________________ to create a unified data analytics solution that combines data ingestion pipelines, data warehouse storage, and data lake storage through a single service.
Azure Synapse Analytics
Which single service would you use to implement data pipelines, SQL analytics, and Spark analytics? Azure SQL Database Microsoft Power BI Azure Synapse Analytics
Azure Synapse Analytics
A. Extract, Transform, Load (ETL) B. Extract, Load, Transform (ELT) 1. Optimize data privacy. 2. Provide support for Azure Data Lake.
1 - A 2 - B Extract, Transform, Load (ETL) is the correct approach when you need to filter sensitive data before loading the data into an analytical model. It is suitable for simple data models that do not require Azure Data Lake support. Extract, Load, Transform (ELT) is the correct approach when you need Azure Data Lake as the data store and must manage large volumes of data.
You need to recommend a storage solution for a sales system that will receive thousands of small files per minute. The files will be in JSON, text, and CSV formats. The files will be processed and transformed before they are loaded into a data warehouse in Azure Synapse Analytics. The files must be stored and secured in folders. Which storage solution should you recommend? A. Azure Data Lake Storage Gen2 B. Azure Cosmos DB C. Azure SQL Database D. Azure Blob storage
A. Azure Data Lake Storage Gen2 Azure provides several solutions for working with CSV and JSON files, depending on your needs. The primary landing place for these files is either Azure Storage or Azure Data Lake Store. Azure Data Lake Storage is optimized storage for big data analytics workloads. Because the files must be stored and secured in folders, the answer is Data Lake Storage.
Which two Azure data services support Apache Spark clusters? Each correct answer presents a complete solution. A. Azure Synapse Analytics B. Azure Cosmos DB C. Azure Databricks D. Azure Data Factory
A. Azure Synapse Analytics C. Azure Databricks
You are designing reports by using Microsoft Power BI. For which three scenarios can you create Power BI reports as paginated reports? Each correct answer presents a complete solution. A. a report that has a table visual with an ability to print all the data in the table B. a report that has a table visual with an ability to see all the data in the table C. a report with a repeatable header and footer D. a report that is formatted to fit well on a printed page E. a report that uses only Power BI visuals
A. a report that has a table visual with an ability to print all the data in the table C. a report with a repeatable header and footer D. a report that is formatted to fit well on a printed page When a Power BI report with a table visual that contains many rows is printed, only the records that are displayed are printed. All records print if you design the report as a paginated report by using Report Builder. Only paginated reports support repeatable headers and footers. You cannot create paginated reports by using Power BI visuals; you must use Report Builder instead.
You need to create a graph database. Which Azure data service should you use? A. Azure Table B. Azure Cosmos DB C. Azure Blob D. Azure File
B. Azure Cosmos DB Only Azure Cosmos DB supports creating graph databases. Azure Table Storage, Azure Blob Storage, and Azure File Storage do not support graph databases.
For which reason should you deploy a data warehouse? A. Record daily sales transactions. B. Perform sales trend analyses. C. Print sales orders. D. Search status of sales orders.
B. Perform sales trend analyses.
A company is designing a solution that uses Azure Databricks. The solution must be resilient to regional Azure datacenter outages. You need to recommend the redundancy type for the solution. What should you recommend? A. Read-access geo-redundant storage B. Locally-redundant storage C. Geo-redundant storage D. Zone-redundant storage
C. Geo-redundant storage
Databricks can process data held in many different types of storage, including Azure Blob storage, Azure Data Lake Store, Hadoop storage, flat files, SQL databases, and data warehouses, and Azure services such as Cosmos DB ? Y/N
Yes
You would like to consume data in Azure Databricks from different services. Can Azure Databricks consume data from Azure SQL Database? Y/N
Yes