DP-900: Azure Data Fundamentals

¡Supera tus tareas y exámenes ahora con Quizwiz!

How do you create Azure Table Storage table?

Create resource>>StorageAccount>> Set settings and deploy>> select table in overview of that storage account>> + table

How can I get more control?

Install db on virtual machine in azure

What are the advantages of using Azure Table Storage?

It is simpler to scale. Same time to insert data in empty table or table with billions of entries. Can hold up to 500 TB of data. Row insertion is fast Data retrieval is fast if you specify partition and row keys Table can hold semi structured data.

Drawback to normalization?

It isn't as intuitive, because of all the joins to get data back together queries may be more expensive.

Some formats of semi structured data

JSON, Avro (row based format), ORC Optimized Row Columnar Format (columns rather than rows), Parquet another columnar format

Eventual Consistency

Least Consistent Changes will sync to all locations eventually. Could appear out of order.

What is a clustered index?

Like an index afore mentioned except the table is sorted by that index, no copy needed, table is sorted by the clustered index.

What are cognitive analytics?

ML recommendations based on a model. AI

What is Azure Cosmos DB?

Multi-model No SQL database management system. Manages data as partitioned set of documents. Which is a collection of fields identified by a key. It is highly scalable. Cosmos DB automatically allocates space in a container for your partitions and each partition can grow up to 10GB in size. Indexes created and maintained automatically

How is data in a relational database organized?

Rows and Columns

What are key characteristics of storing non relational data?

Store the data without structure. No schema imposed on the data. You store and consume it in its natural format.

Describe Streaming Data

Streaming data is continuous no start or stop. Examples would be IOT, logs, etc. Other examples: sensor data, event or IOT hub, blob storage for logs, apache kafka, netflix, youtube, course video.

How many ways can you classify data?

Structured (tabular, csv represents rows and columns) Semi-Structured (JSON, key-value, graph db) Unstructured (audio or video files)

What is provisioning?

The act of setting up the database is provisioning.

What is change feed option in Blob Storage Service?

The change feed for a blob provides an ordered, read-only, record of the updates made to the blob

`Can you combine analysis services with Synapse Analytics?

Yes, example use Synapse to clean and shave raw data into well processed data. Then use Analysis services to do detailed interrogation of data. Synapse is a more powerful processor, will process the data faster and spit out a smaller, meaner and cleaner data set that Analysis Services can use.

When to use block blob vs page blob

block blob - infrequent changes page blob - when require random read/write access

In Azure Databricks, how do you change the language a cell uses?

first line %language example: %python

What is Azure table storage key comprised of?

partition identifier + row key Items in same partition are stored in row key order

What are snapshots

read only version of a blob at a particular point in time

What is azure data lake storage?

repo for large quantity of raw data. Think of it as a staging point for your ingested data before it's massaged and converted into format suitable for analytics

You can define several levels of access in Azure for your data, is this true?

yes, read-only access, read-write access, owner privilege.

How does azure data services cost you?

(Base price covers infrastructure and licensing) + (Administration costs)

What is ELT process?

(ELT) a form of data processing that stands for extract, load then transform. Example of how this would be done Extract data and load it to data lake then perform transformations with databricks and move to synapse data warehouse Data is available before transformations

Azure Relational Database Options

(IAAS) Infrastructure as a service - (PAAS) Platform as a service (SAAS) Software as a service

What is IaaS and when should you use it?

(Iaas) Infrastructure as a service. IaaS is when you create a virtual infrastructure in the cloud that mirrors the way an on premises data center may work. You are responsible for day-to-day adminstrative operations such as configuration, backups patching and restoring data. Best for migrations where you want to lift your on premise system into the cloud without any changes. May have to make small configuration changes but that is it

What is SQL Managed Instance

(Isn't used much) Close to 100% compatability to on premise. Fully managed service (Azure manages) 4-80 vCores 32GB to 8TB

What are two kinds of data processing solutions?

(OLTP) Online Transaction Processing (OLAP) Online Analytics Processing

What is a security principle?

An object that represents a user, group, service, or managed identity that is requesting access to Azure resources.

How many access tiers does blob storage provide

3 Hot teir - default, used for blobs that are accessed frequently, high performance Cool Tier - lower performance, reduced storage charges compared to hot tier, used for data accessed infrequently. Can migrate cool tier back to hot tier. Archive Tier - lowest cost, increased latency. intended for historical data that mustn't be lost. Can switch to cool or hot tier, but the blob must be rehydrated first.

What is the availability of Azure Data Services?

99.99 (4 9s)

Eventual Consistency

A model for database consistency in which updates to the database will propagate through the system so that all data copies will be consistent eventually. Minimizes latency but will lead to temporary inconsistent data between locations.

What is Azure Data Factory?

An orchestration tool (like ssis) Data integration service retrieve data and convert it to format that you process.

What is Azure Synapse Analytics?

An analytics engine, designed to process large amounts of data very quickly. Ingest data from external sources such as flat files, Azure Data Lake, or other DB managment systems. Transform and aggregate this data into a format suitable for analytics processing, perform queries, reports, graphs, and charts.

What is a strong consistency option?

All writes are only visible to clients after changes are confirmed as written successfully to all replicas. Unavailable if you need to distribute your data across multiple global regions

A ___ is a collection of preset, ready-made visuals and reports that are shared with an entire organization

An App

What is Azure Databricks?

Apache spark environment running on azure to provide big data processing, streaming and machine learning

A collection of ready-made visuals prearranged in dashboards and reports is called what in PowerBi

App

What is the advantage of using multi-region replication with Cosmos DB

Availability is increased

What is a command-line utility optimized for transferring large files (and blobs) between your local computer and Azure File storage?

AzCopy

Use __________ for smaller volumes of data (a few terabytes) and multiple sources that can be correlated.

Azure Analysis Services

What are the roles in the world of data?

Azure DBA Role Data Engineer Data Analyst

What are several tools that you can use to provision services

Azure Portal Azure Command-line interface (cli) Azure Powershell (ARM) Azure Resource Manager Templates

What are the options for protecting Azure resources such as storage account, Azure cosmos DB, etc.

Azure Private Endpoint Firewalls and virtual networks Configure Authentication Configure Access Control Configure advanced security

What are currently available relational databases on azure?

Azure SQL Database Azure Database MySQL Azure Database PostgreSQL Azure Database MariaDB

Use ____________ for very high volumes of data (multi-terabyte to petabyte size) and very complex queries and aggregations.

Azure Synapse Analytics

_________ leverages a massively parallel processing (mpp) architecture.

Azure Synapse Analytics

What is Polybase in the above design?

Azure Synapse Analytics uses tech named polybase. Polybase enables you to retrieve data from relational and non-relational sources, such as delimited text files, Azure blob storage, data lake storage. You can save the data read in as sql tables within the synapse analytics service. Converts read data from all sources into sql tables in synapse.

What is the Azure service that implements the NoSQL key-value model?

Azure Table Storage

What is Azure Table Storage?

Azure Table Storage is a scalabel key-value store held in the cloud. You create a table using an Azure storage account. In Azure table storage items are referred to as rows and fields are known as columns.

What is Azure RBAC?

Azure role-based access control.

What is Azure Table Storage?

Azure table storage is a scalable key-value store held in the cloud. You can use it to store semi-structured data.

What is Batch Data?

Batch data includes CSV, TSV, Json, XML, Parquet, Blob files, another database, cache for offline viewing

What is batch data?

Batch data is any load of data that has a beginning and end, is not continuous.

Modern Data Warehouse

Big Data Collected at escalating volumes Higher velocities Greater variety of formats May be historical or streamed from source Have to use this data to provide insight

What is the cheapest way to store data in Azure?

Blob Storage

What are different types of blobs that Azure Blob Service supports?

Block Blobs - blocks of up to 100 mb grouped for a maximum size of over 4.7TB Page Blobs - collection of fixed size 512-byte pages. A page blob is optimized to support random read and write operations. Can hold up to 8 TB of data. Azure uses these to implement virtual disk storage for virtual machines Append Blobs - block blob optimized to support append operations. can only add blocks to end of append blob. Can't update or delete existing blocks. Each block can vary in size up to 4 MB, max size of append b lob is just over 195 GB

What should you do if you want to maintain and restore earlier versions of a blob.

Versioning

What is the purpose of data ingestion?

Capture data as quickly as possible

What is Azure SQL DB

Close to 100% compatibility to on premise Many options for provisioned and serverless DB Pay for performance or pay for hardware 2-80 vCores 5GB to 4TB Starting at $5 per month Uses SQL Server Engine Underneath

What are disadvantages of using Azure Table Storage?

Consistency needs to be given consideration as transactional updates across multiple entities aren't guaranteed. No referential integrity difficult to filter, queries based on non-key fields result in full table scans.

Azure Synapse Analytics includes a control node and a pool of compute nodes. Explain the architecture

Control node is brain of architecture. Its the front end that interacts with all applications. MPP engine runs on control node to optimize and coordinate parallel queries. When you submit a request the control node transforms it into smaller requests that run against distinct subsets of the data in parallel. Compute nodes provide the computational power. The data to be processed is distributed evenly across the nodes. Users and applications send processing requests to the control node, node sends queries to compute nodes. results are sent to control node and combined to result.

Give me example of Non-Relational DB options?`

Cosmos DB, Table Storage (up to 5PB), Blob (up to 5 PB, cheapest), File Storage.

SQL Database supports two purchasing models, what are they?

DTUs and vCore

Two important stages of data analytics

Data Ingestion and Data Processing

What are the tasks that are involved in an Analytical System?

Data Ingestion: Process of ingesting data, could be batch or stream data. Also could be structured, semi-structured or Unstructured data. Data Transformation/Processing: Process of organizing and cleaning data, may also add calculations. This results in a usable data set for analytics and insight. Data Querying: Query data for insight. Data Visualization: Allows you to understand data more intuitively, tool to better understand data and share data.

What is data?

Data is a collection of facts such as numbers, descriptions and observations used in decision making.

Difference between data lake and data warehouse

Data lake is raw unstructured data, data warehouse is structured and organized

What are Prescriptive Analytics?

Describe what to do to fix problem. We need to drop prices 10% to encourage customers to make bigger purchases since less people are outside.

What are Predictive Analytics?

Describe what will happen. Next time we have a pandemic we can expect revenue to drop 10% based on history.

What are Diagnostic Analytics?

Describes why it happened. For example revenue is down 10% year over year due to corona virus which resulted in less purchases industry wide.

What are the 5 types of Analytics?

Descriptive (What Happened), Diagnostic (Why it Happened), Predictive (What will Happen), Prescriptive (What should I do), Cognitive (Machine learning predictions based on model)

What are Descriptive Analytics?

Descriptive Analytics describe what happened. For example revenue is down 10% year over year.

Which consistency option of CosmosDB provides the lowest latency and least consistency?

Eventual Consistency

ETL

Extract, transform, load. Perform extract and transform logic before loading data to be available. Data is not available before transformations are performed

What is the overarching principle for network security in Azure SQL Datbase?

Grant minimum permissions to successfully complete task.

Why use SQL Server in a VM?

Guaranteed to be compatible to on premise sql server. No data limitations (run above 4 TB) Pay for server and licensing not per DB (could be pos/neg)

Some use cases for non relational data?

IOT and telematics Retail and marketing - microsoft uses cosmos DB for its own ecommerce platforms (windows store and xbox live) Gaming - requires low latency Web and Mobile Application - well suited for modeling social interactions and more

Scenarios where CosmosDB is Suitable?

IOT and telematics Retail and marketing Gaming Mobile applications

Session Option

If application makes number of changes, they'll be visible to that application and in order. Other apps may see old data, although any changes that are seen will be in order as they would be in consistent prefix

Why creating indexes make inserts or updates or deletes slow?

Index requires more memory which incurs additional processing charges for these statements.

What is Azure Blob Storage?

Store massive amounts of unstructured data in the cloud

How Azure handles Security?

Must have public network access and add IP to get through firewall. If you don't have public access no one can access. Only servers on azure network could access. TLS version 1.2 min Must check allow other azure services to access server if you want diff apps to access Can list ip range in firewall, or list ips individually Also can give access through the firewall through virtual network. Database permissions managed at database level not firewall and security level

What are use cases of Azure SQL Database?

Need modern stable sql server features Applications require high availability Systems w/variable load and need to scale quickly

If you set up your sql Database networking with "public endpoint" without any other actions, which type of user can connect to the database?

None, you must configure firewall to allow anyone in. Must update the firewall access list.

Limitations to Azure Data Services

Not all features of a database management system are available because azure manages many of them for you. You have no control over platform on which services run

What are NoSQL databases?

Not only SQL data stores. Generally fall into 4 categories - Key Value: very fast - Graph - Columnar - Document

What are ACID principles?

OLTP must adhere to ACID priciples Atomicity - each transaction is single unit fails completely or succeeds completely. Consistency - A consistent database should never lose or create data from original transaction. It is consistent through out. Isolation - Ensures that concurrent execution of transactions leaves the database in the same state it was obtained. Durability - Ensures once committed transaction remains committed even if there is a system failure or crash.

PowerBI consists of three main elements. What are those elements?

Power BI Desktop Power BI Service Power BI Mobile

Which power BI tool helps build paginated reports?

Power BI Report Builder

Which category (IaaS,PaaS,SaaS) does azure data services fall into?

PaaS

What is PaaS and when should you use it?

PaaS stands for Platform-as-a-service. Don't have to create the infrastructure, azure does this for you. You specify resources that you require and azure automatically creates the necessary virtual machines, networks and other devices for you. PaaS is more scalable allows you to scale without purchasing your own hardware.

Which component of an Azure Data Factory can be triggered to run data ingestion tasks?

Pipelines

What are point queries and range queries in Azure Table Storage?

Point Query - retrieves a single row Range Query - returns a range of rows

Which port doze Azure SQL Database use?

Port 1433, must be opened by your networks firewall if you want to connect Azure SQL Database server.

Common flow of activity in PowerBI

PowerBI Desktop (create report)>>(Share to) PowerBI Service and PowerBi Mobile

What is the common flow of powerBI

PowerBI Desktop create report Publish to PowerBI Service create visuals (reports,dasboards,etc.) Share dashboards with others View and interact through mobile apps

What is normalization?

Process of separating data into separate tables that can be connected. Normalization reduces the number of rows and storage as you have less duplicated information. It does require more joins and connection to get the data back together.

Use cases of Azure Table Storage?

Product Catalog Logging

What is change feed on a blob?

Provides ordered, read only, record of the updates made to a blob. You can use it to monitor blob changes.

Session

Read your own writes If application makes a number of changes they all be visible to the application in that order. Other apps may see old data though any changes that appear will appear in order. Sometimes known as read your own writes.

What are the benefits of normalization

Reduces data duplication Reduces common typos and error on inputs

What is Business Intelligence?

Refers to the technologies, applications, and practices for the collection, integration, analysis, and presentation of business information. The purpose of business intelligence is to support better decision making.

When you configure CosmosDB for replication what is the default behavior?

Region which created account supports write ops, replicas are all read-only.

Describe characteristics of relational data

Relational data is structured, has a schema, is rigid. Databases composed of tables with rows and columns. Data integrity based on keys, datatypes, relations. Components include: - tables - views - Primary: unique row - Foreign Keys: child parent relationship Schema - layout of the database including table names, column names and their data types Databases enforce integrity

What is replication and how many options we have when provisioning a storage account?

Replication - Azure storage account replicated 3x in region you specify as primary location for account Locally Redundant Storage (LRS) - Copies data synchronously three times within single physical location in region. LRS is least expensive, but isn't recommended for applications that require high availability Geo Redundant Storage (GRS) - copies data synchronously 3x in single physical location (primary). Then copies your data asynchronously to a single physical location in the secondary region, protects you against regional outages. Read-Access geo-redundant storage (RA-GRS) - Similar to GRS but you only have read access in secondary location. Used to recover from failure and reduce latency for customers that need read access to data in that area.

Reporting vs Visualization

Reporting - informational summaries Visualization - graphs and charts

What are the different APIs that Cosmos supports

SQL API Table API MongoDB API - document Cassandra API - columnar Gremlin API - graph

Synapse spark vs Synapse SQL Pool

SQL Pool for sql tables Spark for datascience AI stuff

Azure Synapse Analytics supports two computational models. What are those?

SQL Pools Spark Pools

What are Microsoft Azure's 4 Relational Databases?

SQL Server in VM SQL Managed Instance Azure SQL Database Azure SQL Database for MySQL, PostgreSQL or Maria DB

What are the 3 elements of a RBAC role?

Security principal, role definition, scope Security principal - an object that represents a user, group service principal, or managed identity that is requesting access to Azure resources. Role definition - owner: has full access - contributor: near full, can't delegate access to others - reader: can view existing Azure resources - user access administrator - lets you manage user access to Azure resources Scope - lists set of resources access applies to

Use cases for Azure Blob Storage

Serve images Store files Stream video or audio Store data for backup, restore, disaster, or archive Store data for analysis by an on-premises or Azure-hosted service

What are the features of Azure File Storage?

Share up to 100 TB of data in a single storage accfount max size of file is 1TB supports 2000 concurrent connections per shared file upload through azure portal, use tools such as AZCopy utility. Azure file sync to sync locally

What is the difference between analysis services and synapse analytics?

Significant overlap, but it's suited for processing on a smaller scale.

What is Azure HD Insight?

Similar to Synapse but the nodes are running the spark processing engine rather than Azure SQL Database

What are the options available when selecting Azure SQL Database?

Single Database Elastic Pool - multiple databases can share the same resources such as memory, data storage space. Resources are referred to as a pool. Helpful if you have databases whose resources vary over time. Managed Instance - runs fully controllable instance of sql server in the cloud.

Consistent Prefix

Slow right order Changes will appear in order though there may be a delay before they become visible.

Which feature should you enable if you want to recover a blob that has been removed or overwritten by accident?

Soft Delete

Non-Relational databases allow you to store data in a format that more closely matches the original structure. What is the disadvantage to this?

Some of the data is duplicated. This increases storage requirement and makes maintenance more complex.

In a Spark pool, nodes are replaced with ______?

Spark clusters

How many tiers are there for Azure File Storage?

Standard & Premium

What is bounded staleness option?

There is a lag between writing and then reading the updated data. You specify this staleness either as a period of time or number of previous versions the data will be inconsistent for.

Consistent Prefix Option

This option ensures that changes will appear in order though there may be a delay before they are visible

When to use Azure Synapse Analytics?

To perform complex queries and aggregations

Azure Table Storage is intended to support very large volumes of data up to several hundred TBs in size? T/F?

True

Azure data factory is essentially an extension of Azure Blob storage, organized as a near-infinite file system. True?

True

Cosmos DB guarantees less then 10 ms latencies for reads and rights at the 99th percentile around the world. T/F

True

Non relational systems such as azure cosmos db (non relational) support indexing when the sstructure of the indexed data can vary from record to record? T/F?

True

T/F. A container provides a convenient way of grouping blobs, you can organize blobs in hierarchy of folders inside a container.

True

T/F. Before you can use AZCopy you need to create a shared access signature?

True

You control who can read and write blobs inside a container at the container level? T/F?

True

You need to create a storage account before creating an Azure Table Storage? T/F?

True

What is shared Access Signature?

Use it to grant limited rights to resources in an azure storage account for a specified time period. Enables applications to access resources such as blobs and files, without requiring that they're authenticated first.

When might you use Polybase?

Use it when you are ingesting data with Azure Synapse

How to migrate from hot tier to archive after 6 months?

Use lifecycle management policies. Automatically move from hot>>cool>>archive as something ages

What does VLAN and ACL stand for?

VLAN - Virtual Local Area Network ACL - Access Control List

Building blocks of Power Bi

Visualization, datasets, reports, dashboards, tiles

What are the basic building blocks of powerBI?

Visualizations - visual representation of data, sometimes called visuals DataSets - collection of data that PBI uses to create visualization Reports - collection of visualization spanning 1+ pages Dashboards - single page collection a visualizations Tiles - single visualization

When use managed instance?

When all you want is lift and shift.

What is an index?

When you create an index in the database, you specify the column to be indexed. An index contains a copy of this column data in a sorted order with pointers to the corresponding rows in the table, when the user runs a query that specifies this column in the where clause the database system can use the index to fetch the data more quickly then scanning through the entire table row by row.

Strong

Writes are only visible to clients after changes are confirmed as written successfully to all replicas. Unavailable globally

Is a csv file an example of unstructured data?

Yes

What are potential disadvantages to SQL Server in a VM?

You have to do all updates, pick install your sql server version. You manage everything.

Bounded Staleness

lag between writing and reading the updated data. specify staleness as period of time, or number of previous versions the data will be inconsistent for

Consistency Spectrum

low latency, least consistent --> high latency, most consistent Eventual --> Strong


Conjuntos de estudio relacionados

APES: Air Pollution and Climate Change

View Set

Life and Health: Guaranteed Missed Questions

View Set

CompTIA A+ 1101 section 2 networking part 3

View Set

EAQ fluid and electrolyte balance

View Set

Convection and Mantle: Online Content, Practice and Quiz

View Set

CH3-Analyzing Transactions Using T-Accounts

View Set