Azure Data Fundamentals DP-900


Which Azure Cosmos DB API should you use for data in a graph structure?

Apache Gremlin. The Gremlin API is used for graph databases. The MongoDB API stores data in the BSON format. The Table API is used to retrieve key/value pairs. The Cassandra API is used to retrieve tabular data.

Blobs

Binary large objects for massive amounts of unstructured data.

Which job role is responsible for designing database solutions, creating databases, and developing stored procedures?

Database engineer.

What are two characteristics of Azure Table storage?

Each RowKey value is unique within a table partition. Items in the same partition are stored in row key order.

Archive tier

Lowest storage cost, increased latency. Can take hours for data to become available. Must rehydrate.

Which two DML statements are used to modify the existing data in a table?

UPDATE, MERGE
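A quick sketch of these DML statements using Python's built-in sqlite3 module (table and values are hypothetical; note that MERGE is a T-SQL statement — SQLite's closest analog is its INSERT ... ON CONFLICT upsert):

```python
import sqlite3

# Hypothetical table and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
conn.execute("INSERT INTO products VALUES (1, 10.0), (2, 20.0)")

# UPDATE modifies existing rows in place.
conn.execute("UPDATE products SET price = price + 5.0 WHERE id = 1")

# T-SQL MERGE can update, insert, and delete in one statement;
# SQLite's closest analog is the INSERT ... ON CONFLICT upsert.
conn.execute(
    "INSERT INTO products (id, price) VALUES (2, 25.0) "
    "ON CONFLICT(id) DO UPDATE SET price = excluded.price"
)

rows = conn.execute("SELECT price FROM products ORDER BY id").fetchall()
print(rows)  # [(15.0,), (25.0,)]
```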

What should you create to improve the performance of a query accessing a table that contains millions of rows?

An index.
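As a hedged illustration with sqlite3 (hypothetical orders table), creating an index switches the query plan from a full table scan to an index search:

```python
import sqlite3

# Hypothetical table with enough rows that a scan is meaningful.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i % 100) for i in range(10_000)])

query = "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"

# Without an index, the plan is a full table scan.
print(conn.execute(query).fetchall())

# With an index on the filtered column, the plan becomes an index search.
conn.execute("CREATE INDEX ix_orders_customer ON orders(customer_id)")
plan = conn.execute(query).fetchall()
print(plan)
```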

Apache Spark in Azure

A distributed processing framework, available in Azure Synapse Analytics, Azure Databricks, and Azure HDInsight.

Hot tier

the default for blob storage access. Use this for blobs that are accessed frequently. The blob data is stored on high-performance media.

durability

When a transaction has been committed, it will remain committed.

Which SQL clause can be used to copy all the rows from one table to a new table?

SELECT ... INTO
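SELECT ... INTO is T-SQL syntax; as a runnable sketch, SQLite's equivalent CREATE TABLE ... AS SELECT copies all rows into a new table (hypothetical sales table):

```python
import sqlite3

# Hypothetical source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.execute("INSERT INTO sales VALUES (1, 9.99), (2, 19.99)")

# T-SQL:   SELECT * INTO sales_copy FROM sales
# SQLite:  CREATE TABLE ... AS SELECT builds the new table from the result set.
conn.execute("CREATE TABLE sales_copy AS SELECT * FROM sales")

copied = conn.execute("SELECT * FROM sales_copy ORDER BY id").fetchall()
print(copied)  # [(1, 9.99), (2, 19.99)]
```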

Which three services can be used to ingest data for stream processing?

Azure Data Lake Storage. Azure Event Hubs. Azure IoT Hub.

Which type of Azure Storage is used for VHDs and is optimized for random read and write operations?

page blob. Page blobs are optimized for random access and used for VHDs. Append blobs cannot be updated. Block blobs are not used for VHDs.

Consistency

transactions can only take the data in the database from one valid state to another.

What should you create first for an integration process that copies data from Microsoft Excel files to Parquet files by using Azure Data Factory?

A linked service must be created first. Pipelines use existing linked services to load and process data. Datasets are the input and output, and activities can be defined as the data flow.

Page blobs

A page blob is organized as a collection of fixed-size 512-byte pages. Optimized to support random read and write operations; data can be fetched and stored for a single page if necessary. A page blob can hold up to 8 TB of data. Azure uses page blobs to implement virtual disk storage for virtual machines.

Azure Cosmos DB for NoSQL

Azure Cosmos DB for NoSQL is Microsoft's native non-relational service for working with the document data model. It manages data in JSON document format, and despite being a NoSQL data storage solution, uses SQL syntax to work with the data.
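A minimal Python sketch of the idea — JSON documents queried with a SQL-style predicate (documents and fields are hypothetical; in Cosmos DB the query would be written in its SQL syntax, e.g. SELECT * FROM c WHERE c.category = 'bike'):

```python
import json

# Hypothetical JSON documents, as a document database would store them.
docs = [json.loads(s) for s in (
    '{"id": "1", "category": "bike", "price": 500}',
    '{"id": "2", "category": "helmet", "price": 30}',
)]

# Cosmos DB SQL-syntax equivalent (illustrative):
#   SELECT * FROM c WHERE c.category = "bike"
bikes = [d for d in docs if d["category"] == "bike"]
print(bikes)  # [{'id': '1', 'category': 'bike', 'price': 500}]
```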

Azure Cosmos DB for Table

Azure Cosmos DB for Table is used to work with data in key-value tables, similar to Azure Table Storage. It offers greater scalability and performance than Azure Table Storage.

Append blobs

An append blob is a block blob optimized to support append operations. You can only add blocks to the end of an append blob; updating or deleting existing blocks isn't supported.

Which service can you use to perpetually retrieve data from a Kafka queue, process the data, and write the data to Azure Data Lake?

Azure Stream Analytics

Azure Cosmos DB for PostgreSQL

Azure Cosmos DB for PostgreSQL is a native PostgreSQL, globally distributed relational database that automatically shards data to help you build highly scalable apps. You can start building apps on a single-node server group, the same way you would with PostgreSQL anywhere else. As your app's scalability and performance requirements grow, you can seamlessly scale to multiple nodes by transparently distributing your tables. PostgreSQL is a relational database management system (RDBMS) in which you define relational tables of data.

XML

Extensible Markup Language. An older format that has largely been superseded by JSON.

Which two storage solutions can be mounted in Azure Synapse Analytics and used to process large volumes of data?

Azure Blob Storage. Azure Data Lake Storage.

Azure Cosmos DB for Apache Cassandra

Azure Cosmos DB for Apache Cassandra is compatible with Apache Cassandra, which is a popular open source database that uses a column-family storage structure. Column families are tables, similar to those in a relational database, except that it's not mandatory for every row to have the same columns. Can be queried using CQL, a SQL-like syntax.

Azure Cosmos DB for Apache Gremlin

Azure Cosmos DB for Apache Gremlin is used with data in a graph structure, in which entities are defined as vertices that form nodes in a connected graph. Nodes are connected by edges that represent relationships.
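A sketch of the vertex/edge model in plain Python (names and labels are hypothetical; a Gremlin traversal such as g.V('alice').out('worksIn') performs the same kind of walk):

```python
# Hypothetical vertices (nodes) and edges (relationships).
vertices = {
    "alice": {"label": "employee"},
    "bob": {"label": "employee"},
    "dev": {"label": "department"},
}
edges = [
    ("alice", "worksIn", "dev"),
    ("bob", "reportsTo", "alice"),
]

# A Gremlin traversal like g.V('alice').out('worksIn') follows outgoing
# edges from a vertex; the same walk over the structures above:
def out(vertex, edge_label):
    return [dst for src, lbl, dst in edges if src == vertex and lbl == edge_label]

print(out("alice", "worksIn"))  # ['dev']
```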

You need to aggregate and store multiple JSON files that contain records for sales transactions. The solution must minimize the development effort. Which storage solution should you implement?

Azure Cosmos DB has a SQL API that is optimized to store and process (transform) JSON files. The SQL API allows you to query the documents by using SQL-like language. There is no additional learning curve here to complete the task. Azure Files is used as storage for any type of file. There are no built-in methods to query the file and aggregate data. Blob storage allows you to store any type of data. You must use Azure Synapse Analytics or Azure Databricks to be able to query and aggregate the data. Azure SQL Database is a relational database that keeps data in tables. You must create a process that queries the JSON files and stores them in a relational format.

Which two services allow you to pre-process a large volume of data by using Scala? Each correct answer presents a complete solution.

Azure Databricks, a serverless Apache Spark pool in Azure Synapse Analytics

You need to replace an existing on-premises SMB shared folder with a cloud solution. Which storage option should you choose?

Azure Files. Azure Files allows you to create cloud-based network shares to make documents and other files available to multiple users.

Which service is managed and serverless, avoids the use of Windows Server licenses, and allows for each workload to have its own instance of the service being used?

Azure SQL Database. Azure SQL Database is a serverless platform as a service (PaaS) SQL instance. SQL Managed Instance is a PaaS service, but databases are maintained in the same SQL Managed Instance cluster. SQL Server on Azure Virtual Machines running Windows or Linux are not serverless options.

Stream processing technologies

Azure Stream Analytics: PaaS solution to define streaming jobs. Spark Structured Streaming: open-source library for developing streaming solutions on Apache Spark-based services, including Azure Synapse Analytics, Azure Databricks, and Azure HDInsight. Azure Data Explorer: high-performance database and analytics service optimized for ingesting and querying batch or streaming data with a time-series element.

Sinks (output)

Azure Event Hubs. Azure Data Lake Storage Gen2 or Azure Blob Storage. Azure SQL Database, Azure Synapse Analytics, or Azure Databricks. Power BI.

Sources for stream processing (ingestion)

Azure Event Hubs. Azure IoT Hub. Azure Data Lake Storage Gen2. Apache Kafka (used with Apache Spark).

What is a characteristic of batch processing?

Batch processing is used to execute complex analysis. Batch processing handles a large amount of data at a time. Batch processing is usually measured in minutes and hours.

Delimited text files

CSV, TSV

Parquet

Columnar data format. Contains row groups.

Which type of data store uses star schemas, fact tables, and dimension tables?

Data warehouses use fact and dimension tables in a star/snowflake schema. Relational databases do not use fact and dimension tables. Cubes are generated from a data warehouse rather than being a data store themselves. Data lakes store files.

Delta Lake

Delta Lake is an open-source storage layer that adds support for transactional consistency, schema enforcement, and other common data warehousing features to data lake storage. It also unifies storage for streaming and batch data, and can be used in Spark to define relational tables for both batch and stream processing. When used for stream processing, a Delta Lake table can be used as a streaming source for queries against real-time data, or as a sink to which a stream of data is written. The Spark runtimes in Azure Synapse Analytics and Azure Databricks include support for Delta Lake. Delta Lake combined with Spark Structured Streaming is a good solution when you need to abstract batch and stream processed data in a data lake behind a relational schema for SQL-based querying and analysis.

Azure Cosmos DB

Fully managed, serverless distributed database for applications of any size or scale, with support for both relational and non-relational workloads. Its APIs are compatible with popular open-source database engines.

Which is the best type of database to use for an organizational chart?

Graph. Graph databases are the best option for hierarchical data. Azure SQL Database is the best option for create, read, update, and delete (CRUD) operations and uses the least amount of storage space, but our solution does not require a database management system (DBMS). Object storage is the best option for file storage, not hierarchical databases. Table storage is not suited for files.

Which feature of transactional data processing guarantees that concurrent processes cannot see the data in an inconsistent state?

Isolation. Isolation in transactional data processing ensures that concurrent transactions cannot interfere with one another and must result in a consistent database state.

JSON

JavaScript Object Notation. Structured and semi-structured. Hierarchical document schema.
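A small example of a hierarchical JSON document parsed with Python's json module (fields are hypothetical):

```python
import json

# A hypothetical hierarchical JSON document.
text = '{"customer": {"name": "Ada", "orders": [{"id": 1, "total": 42.0}]}}'
doc = json.loads(text)

# Nested objects and arrays give the hierarchical document schema.
print(doc["customer"]["orders"][0]["total"])  # 42.0
```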

Which open-source database has built-in support for temporal data?

MariaDB

Which three open-source databases are available as platform as a service (PaaS) in Azure?

MariaDB, PostgreSQL, and MySQL, all offered through Azure Database services.

Azure Cosmos DB for MongoDB

MongoDB is a popular open source database in which data is stored in Binary JSON (BSON) format. Azure Cosmos DB for MongoDB enables developers to use MongoDB client libraries and code to work with data in Azure Cosmos DB. MongoDB Query Language (MQL) uses a compact, object-oriented syntax in which developers use objects to call methods. For example, a query can use the find method to query the products collection in the db object.
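The mongo shell form of such a query would be db.products.find({ "category": "bike" }); below is a runnable Python stand-in over plain dicts (documents and the helper function are hypothetical, for illustration only):

```python
# Hypothetical documents; real MongoDB stores these as BSON.
products = [
    {"_id": 1, "name": "mountain bike", "category": "bike"},
    {"_id": 2, "name": "helmet", "category": "gear"},
]

# Illustrative stand-in for the find method: return documents
# matching every field in the query object.
def find(collection, query):
    return [d for d in collection if all(d.get(k) == v for k, v in query.items())]

print(find(products, {"category": "bike"}))
```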

You design an application that needs to store data based on the following requirements: Store historical data from multiple data sources Load data on a scheduled basis Use a denormalized star or snowflake schema Which type of database should you use?

OLAP (online analytical processing). OLAP databases are used for snowflake schemas with historical data.

ORC

Optimized Row Columnar format. Organizes data into columns rather than rows.

Which two types of file store data in columnar format? Each correct answer presents a complete solution.

Parquet, ORC

Avro

Row-based format created by the Apache project. The header is JSON; the data is binary.

Which data service allows you to control the amount of RAM, change the I/O subsystem configuration, and add or remove CPUs?

SQL Server on Azure Virtual Machines.

You need to recommend a solution that meets the following requirements: Encapsulates business logic that can rename the products in a database. Adds entries to tables. What should you include in the recommendation?

a stored procedure. A stored procedure can encapsulate any type of business logic that can be reused in the application. A stored procedure can modify existing data as well as add new entries to tables. A stored procedure can be run from an application as well as from the server.

Block blobs

A set of blocks. Each block can vary in size, up to 100 MB. A block blob can contain up to 50,000 blocks, giving a maximum size of over 4.7 TB. The block is the smallest amount of data that can be read or written as an individual unit. Block blobs are best used to store discrete, large, binary objects that change infrequently.

Which type of database can be used for semi-structured data that will be processed by an Apache Spark pool in Azure Synapse Analytics?

column-family. Column-family databases are used to store semi-structured, tabular data comprising rows and columns. Azure Synapse Analytics Spark pools do not directly support graph or relational databases.

Isolation

concurrent transactions cannot interfere with one another and must result in a consistent database state.

Atomicity

each transaction is treated as a single unit, which succeeds completely or fails completely.
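Atomicity can be demonstrated with sqlite3 transactions (hypothetical accounts table): a failure mid-transaction rolls back every change in the unit:

```python
import sqlite3

# Hypothetical accounts table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (1, 100.0), (2, 50.0)")
conn.commit()

# A transfer is one atomic unit: both updates commit, or neither does.
try:
    with conn:  # opens a transaction; rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - 80 WHERE id = 1")
        raise RuntimeError("simulated failure mid-transaction")
        # the matching credit to account 2 never runs
except RuntimeError:
    pass

# The debit was rolled back, so balances are unchanged.
print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# [(100.0,), (50.0,)]
```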

Cool tier

lower performance and incurs reduced storage charges compared to the hot tier. Use for data accessed infrequently. Can start hot and move to cool.

Which two attributes are characteristics of an analytical data workload?

Optimized for read operations. Highly denormalized (data is combined into fewer, wider tables so it can be queried quickly).

