Microsoft Azure Data Fundamentals - DP-900
One to One, One to Many, Many to One, and Many to Many (join) Tables
(see graphic to study this concept; it's just easier visually)
What should you create first for an integration process that copies data from Microsoft Excel files to Parquet files by using Azure Data Factory?
A linked service must be created first.
Which type of data structure should you use to optimize create, read, update, and delete (CRUD) operations for data saved in a multi-column tabular format?
A relational database is the best option for CRUD operations and uses the least amount of storage space.
What is query language?
A scripting or programming language designed as the format to submit a request or actions to the database. Notable query languages: SQL Graph SQL Kusto Xpath Gremlin
Which Azure Cosmos DB API below is queried by using a syntax based on SQL? Apache Cassandra This answer is correct. Apache Gremlin MongoDB Table
Apache Cassandra
Azure Databricks is built on:
Apache Spark and is compatible with other cloud providers
Transactions are "ACID" based:
Atomicity — each transaction is treated as a single unit of work, which succeeds completely or fails completely Consistency — transactions can only take the data in the database from one valid state to another Isolation — concurrent transactions cannot interfere with one another Durability — when a transaction has succeeded, the data changes are persisted in the database
Which Azure data service allows you to: store document, graph, and column-family databases?
Azure Cosmos DB can handle all the databases needed.
Which Azure Cosmos DB API allows you to implement a non-relational database and model nodes that have relationships between them?
Azure Cosmos DB for Apache Gremlin allows you to implement a non-relational database to model nodes and the relationships between them.
You need to aggregate and store multiple JSON files that contain records for sales transactions. The solution must minimize the development effort. What to use?
Azure Cosmos DB has a SQL API that is optimized to store and process (transform) JSON files
You need to aggregate and store multiple JSON files that contain records for sales transactions. The solution must minimize the development effort.
Azure Cosmos DB has a SQL API that is optimized to store and process (transform) JSON files.
Which service is managed and serverless, avoids the use of Windows Server licenses, and allows for each workload to have its own instance of the service being used?
Azure SQL Database is a serverless platform as a service (PaaS) SQL instance.
Which data service provides a fully managed database with close to 100 percent feature parity with Microsoft SQL Server?
Azure SQL Managed Instance is fully managed
Which service allows you to aggregate data over a specific time window before the data is written to a data lake?
Azure Stream Analytics
Which service can you use to perpetually retrieve data from a Kafka queue, process the data, and write the data to Azure Data Lake?
Azure Stream Analytics
Which native Azure service can you use to process data by running Apache Spark jobs?
Azure Synapse Analytics is an Azure native service built on Spark. Azure Data Lake Storage is used to store data. Azure Cosmos DB does not use a Spark platform.
You are processing data by using batches and streams. What should you use to ensure that transactional consistency is enforced during the processing?
Delta Lake is an open-source storage layer that adds support to Data Lake Storage for transactional consistency.
Which two types of applications are used in transactional systems?
Live applications are used in transactional data processing systems. LOB applications are a type of live application that processes business data.
Which open-source database has built-in support for temporal data?
MariaDB has built-in support for temporal data. It enables applications to query data as the data appeared in previous points in time.
You need to process many JSON files every minute, while keeping the data from the files accessible by using native queries. Which Azure Cosmos DB API should you use?
NoSQL
Which two types of file store data in columnar format?
ORC and Parquet are columnar data formats. Avro is row-based, and CSV is used for delimited data.
Which type of visual in Microsoft Power BI should you use to compare categorized values as the proportions of a total value?
Pie charts visually compare different values as a proportion of a total.
Which open-source database is a hybrid relational-object database?
PostgreSQL
Which data service allows you to use every feature of Microsoft SQL Server in the cloud?
SQL Server on an Azure Virtual Machines running Windows
Types of Cloud Computing
SaaS - Software as a Service (for customers) PaaS - Platform as a Service (for developers) IaaS - Infrastructure as a Service (for admins)
Cubes are generated from:
a data warehouse but are a tables themselves.
A view is:
a virtual table based on the results of a SELECT query. It allows you to view specific rows in one or more tables
Tables in relational databases do not need:
all the columns to contain values or require that there be only one foreign key.
Azure Synapse Analytics is:
an Azure native service built on Apache Spark.
A linked service must be created for:
an integration process that copies data from Microsoft Excel files to Parquet files by using Azure Data Factory
Azure Storage blobs can store
any type of data, such as structured, semi-structured, or unstructured data. It provides hot, cool, and archive storage tiers for different use cases It can be accessed from Hadoop (available through HDInsight) or Azure Synapse Analytics using PolyBase3.
A stored procedure can:
be run from an application as well as from the server.
Azure Data Lake Storage Gen1
can store any type of big data, such as structured, semi-structured, or unstructured data. It provides a Hadoop-compatible file system that enables scalable analytics on the cloud. It can be accessed from various Azure services and tools, such as Azure Databricks, Azure Data Factory, Azure HDInsight, Azure Machine Learning, or Azure Synapse Analytics.
UPDATE:
changes the value of the columns in a table.
JOIN is used to:
combine data from two tables based on a shared key.
Bar charts are used to:
compare different values for discrete categories.
An inline function cannot be used to:
complete the task because it cannot modify nor create objects. It can be used to query a database.
ELT - Extract Transform and Load process:
copies data from files and OLTP databases into a data warehouse that is optimize for read activity
Semi-structured data is:
data that does not follow the tabular data structure models associated with relational databases or other data table forms, but has some organizational properties that make it easier to process than unstructured data. It often contains tags or markers that separate the elements and enforce hierarchies. Examples of semi-structured data are JSON, XML, CSV, or HTML files12.
Unstructured data is
data that is a collection of many varied data types that are stored in their native formats. It is also referred to as qualitative data as it can be observed but not measured. It does not have a fixed structure or schema and is not easily searchable or analyzable. Examples of unstructured data are text documents, images, videos, audio files, social media posts, or web pages.
Notebooks are:
data that is arranged in pages, designed for easy consumption
Structured data is
data that is stored in a predefined format and is highly specific. It is also referred to as quantitative data as it can be measured. It is stored in rows and columns and is related to each other. Examples of structured data are data stored in Microsoft Excel, Google Docs, relational databases, or other data table forms.
A stored procedure:
defines SQL statements that can be run on command.
Scatter plots are used to:
determine a relationship or correlation between two numeric values
UNION:
displays the content of two sets of columns from two tables but is not based on a shared key.
SELECT and INSERT:
do not affect the existing data in a table.
SELECT - INTO:
does an insert into a table.
A stored procedure can:
encapsulate any type of business logic that can be reused in the application.
Line charts are used to:
examine trends, usually over time.
Data warehouses use:
fact and dimension tables in a star/snowflake schema.
Relational databases do not use:
fact and dimension tables.
HAVING is used to:
filter content from a GROUP BY command.
SELECT - HAVING:
filters data.
HBase cannot handle:
graph databases.
Azure SQL Database cannot:
handle column-family databases.
Analytical data workloads are:
highly denormalized and are optimized for read operations.
Data Lake Storage, Event Hubs, and IoT Hub are sources commonly used to:
ingest data for stream processing
INSERT - VALUES:
inserts values into a single row
Datasets — a _______________________grouping of data
logical
A stored procedure can:
modify existing data as well as add new entries to tables.
A view cannot:
modify nor create objects. It can be used to query a database.
A table-valued function cannot be used to:
modify or create objects. It can be used to query a database.
UPDATE and MERGE:
modify the existing data in a table.
HDInsight is used to:
process large amounts of data by using Hadoop.
Databricks is used for:
processing large amounts of data, which is supported by multiple cloud providers.
Data warehouses feature structured or semi-structured data for creating:
reports & analytics
A view, a function, and a stored procedure allow you to:
reuse the query definition for a complex query that selects data from multiple tables.
Data Factory is used to:
run ETL pipelines.
Data Factory is used to:
run ETL pipelines. Databricks is used for processing large amounts of data, which is supported by multiple cloud providers. HDInsight is used to process large amounts of data.
Creating an index allows you to:
search for data in a table more efficiently.
HBase on HDInsight can store
semi-structured or unstructured data using a NoSQL database model.
Azure Cosmos DB can store
semi-structured or unstructured data using various APIs and models, such as SQL (Core), MongoDB, Cassandra, Gremlin (graph), or Azure Table.
INTERSECT:
shows only values that exist in both tables.
A key/value store is used for:
simple lookups based on a single key to obtain a single value.
OLAP databases are used for:
snowflake schemas with historical data
A graph database is used to:
store hierarchical data, such as organizational charts that have nodes and edges.
Column-family databases are used to:
store unstructured, tabular data comprising rows and columns.
Databases ______________________data that can be quickly access and searched
structured
Azure Data Explorer can store
structured or semi-structured data using a columnar database model.
Block blobs are used for files and:
the Archive tier is the least expensive option. The Cool tier is not the least expensive option. Append blobs cannot be update
Each row in a table in a relational database has:
the same number of columns, can have a primary key build on multiple columns, and must have a data type.
A document database uses:
unstructured data such as JSON, and is optimized for retrieval, not CRUD operations.
Datastores feature
unstructured or semi- structured data to housing data
Which two services allow you to pre-process a large volume of data by using Scala?
Databricks and the Spark pool in Azure Synapse Analytics run data processing for large amounts of data by using Scala.
In a stream processing architecture, what can you use to persist the processed results as files?
Data Lake Storage Gen2 can be used to store files
Describe features of analytical workloads
Data files may be stored in a central data lake for analysis An extract, transform, and load (ETL) process copies data from files and OLTP databases into a data warehouse that is optimized for read activity Data in the data warehouse may be aggregated and loaded into an online analytical processing (OLAP) model, or cube The data in the data lake, data warehouse, and analytical model can be queried to produce reports and dashboards
Row Store
Data is organized in rows Traditional relational databases are row-stores Good for general purpose databases Suited for Online transaction processing (OLTP) Great when needing all possible columns in a row is important in a query Not the best at analytics or massive amounts of data
Column Store
Data is organized into columns Faster at aggerating values for analytics NoSQL store or SQL-Like databases Great for vast amount of data (millions of records) Suited for Online analytical processing (OLTP) Great when you only need a few columns
Which type of data store uses star schemas, fact tables, and dimension tables?
Data warehouses use fact and dimension tables in a star/snowflake schema
Online Transactional Processing (OLTP)is stored in a database that is optimized for online transactional processing (OLTP) operations that support applications:
Transactions can be read or write: Data is stored using transactions