Microsoft Azure Data Fundamentals - DP-900

¡Supera tus tareas y exámenes ahora con Quizwiz!

One to One, One to Many, Many to One, and Many to Many (join) Tables

(see graphic to study this concept; it's just easier visually)

What should you create first for an integration process that copies data from Microsoft Excel files to Parquet files by using Azure Data Factory?

A linked service must be created first.

Which type of data structure should you use to optimize create, read, update, and delete (CRUD) operations for data saved in a multi-column tabular format?

A relational database is the best option for CRUD operations and uses the least amount of storage space.

What is query language?

A scripting or programming language designed as the format to submit a request or actions to the database. Notable query languages: SQL Graph SQL Kusto Xpath Gremlin

Which Azure Cosmos DB API below is queried by using a syntax based on SQL? Apache Cassandra This answer is correct. Apache Gremlin MongoDB Table

Apache Cassandra

Azure Databricks is built on:

Apache Spark and is compatible with other cloud providers

Transactions are "ACID" based:

Atomicity — each transaction is treated as a single unit of work, which succeeds completely or fails completely Consistency — transactions can only take the data in the database from one valid state to another Isolation — concurrent transactions cannot interfere with one another Durability — when a transaction has succeeded, the data changes are persisted in the database

Which Azure data service allows you to: store document, graph, and column-family databases?

Azure Cosmos DB can handle all the databases needed.

Which Azure Cosmos DB API allows you to implement a non-relational database and model nodes that have relationships between them?

Azure Cosmos DB for Apache Gremlin allows you to implement a non-relational database to model nodes and the relationships between them.

You need to aggregate and store multiple JSON files that contain records for sales transactions. The solution must minimize the development effort. What to use?

Azure Cosmos DB has a SQL API that is optimized to store and process (transform) JSON files

You need to aggregate and store multiple JSON files that contain records for sales transactions. The solution must minimize the development effort.

Azure Cosmos DB has a SQL API that is optimized to store and process (transform) JSON files.

Which service is managed and serverless, avoids the use of Windows Server licenses, and allows for each workload to have its own instance of the service being used?

Azure SQL Database is a serverless platform as a service (PaaS) SQL instance.

Which data service provides a fully managed database with close to 100 percent feature parity with Microsoft SQL Server?

Azure SQL Managed Instance is fully managed

Which service allows you to aggregate data over a specific time window before the data is written to a data lake?

Azure Stream Analytics

Which service can you use to perpetually retrieve data from a Kafka queue, process the data, and write the data to Azure Data Lake?

Azure Stream Analytics

Which native Azure service can you use to process data by running Apache Spark jobs?

Azure Synapse Analytics is an Azure native service built on Spark. Azure Data Lake Storage is used to store data. Azure Cosmos DB does not use a Spark platform.

You are processing data by using batches and streams. What should you use to ensure that transactional consistency is enforced during the processing?

Delta Lake is an open-source storage layer that adds support to Data Lake Storage for transactional consistency.

Which two types of applications are used in transactional systems?

Live applications are used in transactional data processing systems. LOB applications are a type of live application that processes business data.

Which open-source database has built-in support for temporal data?

MariaDB has built-in support for temporal data. It enables applications to query data as the data appeared in previous points in time.

You need to process many JSON files every minute, while keeping the data from the files accessible by using native queries. Which Azure Cosmos DB API should you use?

NoSQL

Which two types of file store data in columnar format?

ORC and Parquet are columnar data formats. Avro is row-based, and CSV is used for delimited data.

Which type of visual in Microsoft Power BI should you use to compare categorized values as the proportions of a total value?

Pie charts visually compare different values as a proportion of a total.

Which open-source database is a hybrid relational-object database?

PostgreSQL

Which data service allows you to use every feature of Microsoft SQL Server in the cloud?

SQL Server on an Azure Virtual Machines running Windows

Types of Cloud Computing

SaaS - Software as a Service (for customers) PaaS - Platform as a Service (for developers) IaaS - Infrastructure as a Service (for admins)

Cubes are generated from:

a data warehouse but are a tables themselves.

A view is:

a virtual table based on the results of a SELECT query. It allows you to view specific rows in one or more tables

Tables in relational databases do not need:

all the columns to contain values or require that there be only one foreign key.

Azure Synapse Analytics is:

an Azure native service built on Apache Spark.

A linked service must be created for:

an integration process that copies data from Microsoft Excel files to Parquet files by using Azure Data Factory

Azure Storage blobs can store

any type of data, such as structured, semi-structured, or unstructured data. It provides hot, cool, and archive storage tiers for different use cases It can be accessed from Hadoop (available through HDInsight) or Azure Synapse Analytics using PolyBase3.

A stored procedure can:

be run from an application as well as from the server.

Azure Data Lake Storage Gen1

can store any type of big data, such as structured, semi-structured, or unstructured data. It provides a Hadoop-compatible file system that enables scalable analytics on the cloud. It can be accessed from various Azure services and tools, such as Azure Databricks, Azure Data Factory, Azure HDInsight, Azure Machine Learning, or Azure Synapse Analytics.

UPDATE:

changes the value of the columns in a table.

JOIN is used to:

combine data from two tables based on a shared key.

Bar charts are used to:

compare different values for discrete categories.

An inline function cannot be used to:

complete the task because it cannot modify nor create objects. It can be used to query a database.

ELT - Extract Transform and Load process:

copies data from files and OLTP databases into a data warehouse that is optimize for read activity

Semi-structured data is:

data that does not follow the tabular data structure models associated with relational databases or other data table forms, but has some organizational properties that make it easier to process than unstructured data. It often contains tags or markers that separate the elements and enforce hierarchies. Examples of semi-structured data are JSON, XML, CSV, or HTML files12.

Unstructured data is

data that is a collection of many varied data types that are stored in their native formats. It is also referred to as qualitative data as it can be observed but not measured. It does not have a fixed structure or schema and is not easily searchable or analyzable. Examples of unstructured data are text documents, images, videos, audio files, social media posts, or web pages.

Notebooks are:

data that is arranged in pages, designed for easy consumption

Structured data is

data that is stored in a predefined format and is highly specific. It is also referred to as quantitative data as it can be measured. It is stored in rows and columns and is related to each other. Examples of structured data are data stored in Microsoft Excel, Google Docs, relational databases, or other data table forms.

A stored procedure:

defines SQL statements that can be run on command.

Scatter plots are used to:

determine a relationship or correlation between two numeric values

UNION:

displays the content of two sets of columns from two tables but is not based on a shared key.

SELECT and INSERT:

do not affect the existing data in a table.

SELECT - INTO:

does an insert into a table.

A stored procedure can:

encapsulate any type of business logic that can be reused in the application.

Line charts are used to:

examine trends, usually over time.

Data warehouses use:

fact and dimension tables in a star/snowflake schema.

Relational databases do not use:

fact and dimension tables.

HAVING is used to:

filter content from a GROUP BY command.

SELECT - HAVING:

filters data.

HBase cannot handle:

graph databases.

Azure SQL Database cannot:

handle column-family databases.

Analytical data workloads are:

highly denormalized and are optimized for read operations.

Data Lake Storage, Event Hubs, and IoT Hub are sources commonly used to:

ingest data for stream processing

INSERT - VALUES:

inserts values into a single row

Datasets — a _______________________grouping of data

logical

A stored procedure can:

modify existing data as well as add new entries to tables.

A view cannot:

modify nor create objects. It can be used to query a database.

A table-valued function cannot be used to:

modify or create objects. It can be used to query a database.

UPDATE and MERGE:

modify the existing data in a table.

HDInsight is used to:

process large amounts of data by using Hadoop.

Databricks is used for:

processing large amounts of data, which is supported by multiple cloud providers.

Data warehouses feature structured or semi-structured data for creating:

reports & analytics

A view, a function, and a stored procedure allow you to:

reuse the query definition for a complex query that selects data from multiple tables.

Data Factory is used to:

run ETL pipelines.

Data Factory is used to:

run ETL pipelines. Databricks is used for processing large amounts of data, which is supported by multiple cloud providers. HDInsight is used to process large amounts of data.

Creating an index allows you to:

search for data in a table more efficiently.

HBase on HDInsight can store

semi-structured or unstructured data using a NoSQL database model.

Azure Cosmos DB can store

semi-structured or unstructured data using various APIs and models, such as SQL (Core), MongoDB, Cassandra, Gremlin (graph), or Azure Table.

INTERSECT:

shows only values that exist in both tables.

A key/value store is used for:

simple lookups based on a single key to obtain a single value.

OLAP databases are used for:

snowflake schemas with historical data

A graph database is used to:

store hierarchical data, such as organizational charts that have nodes and edges.

Column-family databases are used to:

store unstructured, tabular data comprising rows and columns.

Databases ______________________data that can be quickly access and searched

structured

Azure Data Explorer can store

structured or semi-structured data using a columnar database model.

Block blobs are used for files and:

the Archive tier is the least expensive option. The Cool tier is not the least expensive option. Append blobs cannot be update

Each row in a table in a relational database has:

the same number of columns, can have a primary key build on multiple columns, and must have a data type.

A document database uses:

unstructured data such as JSON, and is optimized for retrieval, not CRUD operations.

Datastores feature

unstructured or semi- structured data to housing data

Which two services allow you to pre-process a large volume of data by using Scala?

Databricks and the Spark pool in Azure Synapse Analytics run data processing for large amounts of data by using Scala.

In a stream processing architecture, what can you use to persist the processed results as files?

Data Lake Storage Gen2 can be used to store files

Describe features of analytical workloads

Data files may be stored in a central data lake for analysis An extract, transform, and load (ETL) process copies data from files and OLTP databases into a data warehouse that is optimized for read activity Data in the data warehouse may be aggregated and loaded into an online analytical processing (OLAP) model, or cube The data in the data lake, data warehouse, and analytical model can be queried to produce reports and dashboards

Row Store

Data is organized in rows Traditional relational databases are row-stores Good for general purpose databases Suited for Online transaction processing (OLTP) Great when needing all possible columns in a row is important in a query Not the best at analytics or massive amounts of data

Column Store

Data is organized into columns Faster at aggerating values for analytics NoSQL store or SQL-Like databases Great for vast amount of data (millions of records) Suited for Online analytical processing (OLTP) Great when you only need a few columns

Which type of data store uses star schemas, fact tables, and dimension tables?

Data warehouses use fact and dimension tables in a star/snowflake schema

Online Transactional Processing (OLTP)is stored in a database that is optimized for online transactional processing (OLTP) operations that support applications:

Transactions can be read or write: Data is stored using transactions


Conjuntos de estudio relacionados

CFA Level 1 - Financial Reporting & Analysis

View Set

PN Nursing Care of Children Online Practice 2020 B with NGN

View Set

Contemporary Biology - Quiz 9-11

View Set

Bio/Ensci 251 Ch. 4 Practice Problems

View Set

Art 100 Ch 15 Christianity and the Formation of Europe

View Set

Astronomy Part 4 Reading Assignments

View Set