DP-900

Ace your homework & exams now with Quizwiz!

What should you create in a data model to allow users to drill up and drill down in a report?

A hierarchy enables drill up and drill down in a dimension. A dimension enables navigation, but hierarchy is used to drill up and down in a dimension. Fact tables have values. Cubes are not created in Microsoft PowerBI.

Page blob

A page blob is organized as a collection of fixed size 512-byte pages. A page blob is optimized to support random read and write operations; you can fetch and store data for a single page if necessary. A page blob can hold up to 8 TB of data. Azure uses page blobs to implement virtual disk storage for virtual machines. Used for VHDs.

Which data model element represents the entities by which you want to aggregate measures in Microsoft Power BI?

Dimensions

Apache Spark and Hadoop

Hadoop and Spark, both developed by the Apache Software Foundation, are widely used open-source frameworks for big data architectures. Each framework contains an extensive ecosystem of open-source technologies that prepare, process, manage and analyze big data sets.27/05/2021

Which two attributes are characteristics of a transactional data workload?

Transactional databases are highly normalized and are optimized for CRUD operations.

Cosmos DB for Apache Cassandra

· Apache Cassandra data is stored as column-family structures (not every row needs to have the same columns) · Based on SQL syntax

Cosmos DB for Gremlin

· Gremlin stores data in graph structure. Queried using Graph.

Azure Stream Analytics

· Ingest data from an input, such as an Azure event hub, Azure IoT Hub, or Azure Storage blob container. · Process the data by using a perpetual query to select, project, and aggregate data values. · Write the results to an output, such as Azure Data Lake Gen 2, Azure SQL Database, Azure Synapse Analytics, Azure Functions, Azure event hub, Microsoft Power BI, or others.

Cosmos DB for Table

· Works with data in key-value tables (like Azure Table storage) · The Table API is queried by using OData and LINQ queries.

Azure Data Explorer

Data Explorer is a standalone service that can analyze data from IoT devices and more. For example, by outputting Azure Stream Analytics logs to Azure Data Explorer, you can complement Stream Analytics low latency alerts handling with Data Explorer's deep investigation capabilities.

Which part of the data pipeline is Azure Event Hubs used for?

Data ingestion (at the beginning, or "front door", of the pipeline).

Which service allows you to perform on-demand analysis of large volumes of data from text logs, websites and IoT devices by using a common querying language for all the data sources?

Data Explorer is used for the analysis of large amounts of text log data, websites, and IoT devices and uses a common querying language. Data Lake Storage Gen2 is a data source, Azure Stream Analytics is used to define streaming jobs, apply a perpetual query, and write the results to an output. Azure Cosmos DB stores data.

Which data integration service allows you to orchestrate data flow without coding?

Data Factory

Atomicity

Each transaction is treated as a single unit, which succeeds completely or fails completely. For example, a transaction that involved debiting funds from one account and crediting the same amount to another account must complete both actions. If either action can't be completed, then the other action must fail.

What should you use to process large amounts of data by using Apache Hadoop?

HDInsight. a. Provides clusters for Apache open-source big data processing technologies, including Spark, Hadoop, HBase, and Kafka. b. Data engineers can support big data analytics workloads that depend on multiple open-source technologies

SQL Server on Azure VMs

IaaS Fully compatible with on-premises physical and virtualized installations. Applications and databases can easily be "lift and shift" migrated without change. SQL Server instances are installed in a virtual machine. Each instance can support multiple databases. You must manage all aspects of the server, including operating system and SQL Server updates, configuration, backups, and other maintenance tasks. Use this option when you need to migrate or extend an on-premises SQL Server solution and retain full control over all aspects of server and database configuration. (Remember, the most important contrast between IaaS and PaaS is that IaaS gives administrators more full control over operating systems. This means more management required on behalf of administrators. )

Four common types of non-relational databases

Key-value, document, column-family, graph

Kusto Query Language

Language used to query real-time log data in Azure Synapse Data Explorer

Azure offers services for open-source databases. Which three open-source RELATIONAL database systems does it offer support for?

MariaDB MySQL PostgreSQL

Between an OLAP and OLTP system, which of the two would be denormalized?

OLAP. Denormalization would allow for duplicate values, but would also make queries perform faster. An OLTP system would need to be normalized because it uses the least amount of storage and optimizes for updates, inserts, and deletes.

Which type of processing system is used to support live and line-of-business applications?

OLTP (online transactional processing)

Azure SQL Managed Instance

PaaS Near-100% compatibility with SQL Server. Most on-premises databases can be migrated with minimal code changes by using the Azure Database Migration service' Each managed instance can support multiple databases. Additionally, instance pools can be used to share resources efficiently across smaller instances. Fully automated updates, backups, and recovery. Use this option for most cloud migration scenarios, particularly when you need minimal changes to existing applications.

Azure SQL Database

Paas You can provision a single database in a dedicated, managed (logical) server; or you can use an elastic pool to share resources across multiple databases and take advantage of on-demand scalability. Fully managed. Use this option for new cloud solutions, or to migrate applications that have minimal instance-level dependencies. Serverless options are available.

Which two types of file store data in columnar format?

Parquet and ORC

The name of a a hybrid relational-object database

PostgreSQL

Which service is managed and serverless, avoids the use of Windows Server licenses, and allows for each workload to have its own instance of the service being used?

SQL Database

What is the best option for migrating SQL Server applications to the cloud?

SQL Managed Instance

Which data service allows you to use every feature of Microsoft SQL Server in the cloud?

SQL Server on an Azure Virtual Machines running Windows

What is the native API in Cosmos DB, and what data format does it manage?

SQL is the native API in Cosmos DB. It manages data in the JSON format. You can use it to process many JSON files every minute, while keeping the data from the files accessible by using native queries.

Azure Event Hubs Azure Data Lake Store Gen 2, or Azure blob storage Azure SQL Database or Azure Synapse Analytics, or Azure Databricks PowerBI

Sinks for stream processing · Azure Event Hubs: Used to queue the processed data for further downstream processing. · Azure Data Lake Store Gen 2 or Azure blob storage: Used to persist the processed results as a file. · Azure SQL Database or Azure Synapse Analytics, or Azure Databricks: Used to persist the processed results in a database table for querying and analysis. · Microsoft Power BI: Used to generate real time data visualizations in reports and dashboards.

Azure Event Hub Azure IoT Hub Azure Data Lake Storage Gen 2 Apache Kafka

Sources for stream processing · Azure Event Hubs: A data ingestion service that you can use to manage queues of event data, ensuring that each event is processed in order, exactly once. · Azure IoT Hub: A data ingestion service that is similar to Azure Event Hubs, but which is optimized for managing event data from Internet-of-things (IoT) devices. · Azure Data Lake Store Gen 2: A highly scalable storage service that is often used in batch processing scenarios, but which can also be used as a source of streaming data. · Apache Kafka: An open-source data ingestion solution that is commonly used together with Apache Spark. You can use Azure HDInsight to create a Kafka cluster.

Consistency

Transactions can only take the data in the database from one valid state to another. To continue the debit and credit example above, the completed state of the transaction must reflect the transfer of funds from one account to the other.

True or False: Batch processing is used to handle complex analysis.

True!

Which two DML statements are used to modify the existing data in a table?

Update and merge

Data warehouse

A relational database that is optimized for read operations

Which type of database can be used for semi-structured data that will be processed by an Apache Spark pool in Azure Synapse Analytics?

Column-family

Which open-source database has built-in support for temporal data?

MariaDB

Cosmos DB for MongoDB

· MongoDB Query Language (MQL) uses object-oriented syntax · MongoDB stores data as Binary JSON (BSON)

Durability

When a transaction has been committed, it will remain committed. After the account transfer transaction has completed, the revised account balances are persisted so that even if the database system were to be switched off, the committed transaction would be reflected when it is switched on again.

Which two services allow you to pre-process a large volume of data by using Scala?

a serverless Apache Spark pool in Azure Synapse Analytics and Databricks

Which type of Azure Storage is the least expensive option that allows you to store and edit image files?

block blobs in the Archive tier. Append blobs cannot be updated.

PostgreSQL

hybrid relational-object database. You can store data in relational tables, but a PostgreSQL database also enables you to store custom data types, with their own non-relational properties. The database management system is extensible; you can add code modules to the database, which can be run by queries. Another key feature is the ability to store and manipulate geometric data, such as lines, circles, and polygons.

MariaDB

newer database management system, created by the original developers of MySQL. The database engine has since been rewritten and optimized to improve performance. MariaDB offers compatibility with Oracle Database (another popular commercial database management system). One notable feature of MariaDB is its built-in support for temporal data. A table can hold several versions of data, enabling an application to query the data as it appeared at some point in the past.

Azure Files

o Azure Files lets you create a cloud-based network share (this is what my company is moving towards! We won't have the network drives anymore, which are on premises!). o 100TB of data, across an unlimited number of files, max size of which can be 1 TB. 2000 people can be connected per file. II. Tiers o Standard tier: hard disk-based hardware in a data centre o Premium: solid-state disks · Server Message Block (SMB): · Network File System (NFS)

Partition in Azure Tables

o Grouping related rows based on a partition key. Rows with the same partition key will be stored together. o Partitions are independent from each other, and rows can be added/removed from a partition o Partition keys can be included in search criteria o RowKey is unique within a partition, not within a table.

Which type of database should you use to store sequential data in the fastest way possible?

time series

Cosmos DB for NoSQL

· Non-relational service for JSON document data · Uses SQL syntax to retrieve JSON documents

Cosmos DB for PostgreSQL

· PostgreSQL is a relational database management system that organizes tables of data · Highly scalable

MySQL

's the leading open source relational database for Linux, Apache, MySQL, and PHP (LAMP) stack apps. It's available in several editions; Community, Standard, and Enterprise. The Community edition is available free-of-charge, and has historically been popular as a database management system for web applications, running under Linux. Versions are also available for Windows. Standard edition offers higher performance, and uses a different technology for storing data. Enterprise edition provides a comprehensive set of tools and features, including enhanced security, availability, and scalability. The Standard and Enterprise editions are the versions most frequently used by commercial organizations, although these versions of the software aren't free.

Describe the four steps of a common architecture for an OLAP system

1. Data files may be stored in a central data lake for analysis. 2. An extract, transform, and load (ETL) process copies data from files and OLTP databases into a data warehouse that is optimized for read activity. Commonly, a data warehouse schema is based on fact tables that contain numeric values you want to analyze (for example, sales amounts), with related dimension tables that represent the entities by which you want to measure them (for example, customer or product), 3. Data in the data warehouse may be aggregated and loaded into an online analytical processing (OLAP) model, or cube. Aggregated numeric values (measures) from fact tables are calculated for intersections of dimensions from dimension tables. For example, sales revenue might be totaled by date, customer, and product. 4. The data in the data lake, data warehouse, and analytical model can be queried to produce reports, visualizations, and dashboards.

Azure SQL Edge

A SQL engine that is optimized for Internet-of-things (IoT) scenarios that need to work with streaming time-series data.

Append blobs

A block blob optimized to support append operations. You can only add blocks to the end of an append blob; updating or deleting existing blocks isn't supported. Each block can vary in size, up to 4 MB. The maximum size of an append blob is just over 195 GB.

Which type of data structure should you use to optimize create, read, update, and delete (CRUD) operations for data saved in a multi-column tabular format?

A relational database is the best option for CRUD operations and uses the least amount of storage space. A key/value store is used for simple lookups based on a single key to obtain a single value. A document database uses unstructured data such as JSON, and is optimized for retrieval, not CRUD operations. A graph database is used to store hierarchical data, such as organizational charts that have nodes and edges.

You need to recommend a solution that meets the following requirements: Encapsulates a business logic that can rename the products in a database Adds entries to tables What should you include in the recommendation?

A stored procedure can encapsulate any type of business logic that can be reused in the application. A stored procedure can modify existing data as well as add new entries to tables. A stored procedure can be run from an application as well as from the server. An inline function cannot be used to complete the task because it cannot modify nor create objects. It can be used to query a database. A view cannot be used to complete the task because it cannot modify nor create objects. It can be used to query a database. A table-valued function cannot be used to complete the task because it cannot modify or create objects. It can be used to query a database.

Which Azure Cosmos DB APIs are queried using SQL?

Apache Cassandra

Which two services allow you to pre-process a large volume of data by using Scala?

Azure Databricks A serverless Apache Spark pool in Azure Synapse Analytics

Which service is built on Apache Spark and is compatible with other cloud providers?

Azure Databricks. Remember, even though Synapse Analytics is also built on Spark, it is not available, say, on AWS. Databricks is available on other cloud providers.

Which RDMS has serverless options?

Azure SQL Database

Four relational database services in Azure

Azure SQL Database, Azure SQL Managed Instance, Azure SQL Virtual Machine, and Azure Edge

Which service allows you to perform near real-time analytics on the operational data stored in Azure Cosmos DB?

Azure Synapse

You have an Azure Cosmos DB service running the SQL API. One of the operational databases has a lot of transactions. Which service allows you to perform near real-time analytics on the operational data stored in Azure Cosmos DB?

Azure Synapse

Which data integration service allows you to orchestrate data flow without coding?

Azure data Factory

Block blob

Can contain up to 50,000 blocks, each up to 100MB. The block is the smallest amount of data that can be read or written as an individual unit. Block blobs are best used to store discrete, large, binary objects that change infrequently.

Azure DataLake Storage Gen2

Combines the scalability of blob storage and tiers, but also is a hierarchical file system. Used for big data. Hadoop, Azure Databricks, Azure Synapse Analytics can mount a distributed file system in Gen2. Storage account must first be upgraded to be a Hierarchical Namespace in order to create Gen2. You can't go back to flat namespace after doing this.

Isolation

Concurrent transactions cannot interfere with one another, and must result in a consistent database state. For example, while the transaction to transfer funds from one account to another is in-process, another transaction that checks the balance of these accounts must return consistent results - the balance-checking transaction can't retrieve a value for one account that reflects the balance before the transfer, and a value for the other account that reflects the balance after the transfer.


Related study sets

Laboratory related nongovernmental organizations

View Set

PrepU ch 39 oxygenation and perfusion

View Set

Test 2 Chapter 6: Effects on material misstatement increase or decrease part 2

View Set

****BA109 Module 6 Exam (Chapters 1-5)****

View Set

Skills USA Knowledge questions 2024 study guide

View Set

Med Surg Chapter 15: Cancer Study Questions, Lewis Chapter 15 Cancer, Chapter 15: Cancer

View Set

Writing - Organization and Form - Definition

View Set