DP-900

What tool would you use for real time visualization of streamed data?

Azure Stream Analytics (its output can feed a real-time Power BI dashboard)

What open source big data processing technologies does HDInsight support?

- Apache Spark
- Apache Hadoop
- Apache HBase
- Apache Kafka
- Apache Storm

Non-Relational Databases

- Key-value databases
- Document databases: key-value stores in which the value is a JSON document
- Column-family databases: store tabular data comprising rows and columns, where the columns can be divided into groups known as column families
- Graph databases: store entities as nodes, with links that define the relationships between them

Apache Avro

Apache Avro is a row-based file format in which each record contains a header that describes the structure of the data in the record. The header is stored as JSON and the data as binary. Good for compressing data and minimizing storage and network bandwidth.

Benefits of SQL Managed Instance

Automated backups, software update management, and other maintenance are handled by Azure, which reduces management overhead. Near-100% compatibility with on-premises SQL Server.

What tool ingests big data and deposits it into a Data Store?

Azure Data Factory

What data services are PaaS?

- Azure SQL Database
- Azure Cosmos DB
- Azure Database for open-source engines (MariaDB, PostgreSQL, MySQL)

What provides an interface to interact with Azure Synapse Analytics and allows the creation of interactive notebooks in which Spark code and markdown content can be combined?

Azure Synapse Studio

Data engineers use ___ to host data lakes: blob storage with a hierarchical namespace that enables files to be organized in folders in a distributed file system

Azure Storage

Azure Data Factory

Big data orchestration tool built around pipelines that are triggered manually, on a schedule, or by other events. A pipeline collects data from a source, transforms it through various tasks, and then writes it to a sink in another data store.

Azure Data Lake sits on what kind of storage?

Blob storage

Data sharding

Breaking up semi-structured data into partitions that can be processed by different compute resources for better performance. Example: break up the data by month of the year.
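
The month-based example on this card can be sketched in a few lines of Python. This is only an illustration of the idea, not how any Azure service implements sharding; the record fields (`id`, `order_date`, `amount`) and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample records; the field names are assumptions for illustration.
records = [
    {"id": 1, "order_date": date(2023, 1, 15), "amount": 20.0},
    {"id": 2, "order_date": date(2023, 1, 30), "amount": 35.5},
    {"id": 3, "order_date": date(2023, 2, 2), "amount": 12.0},
    {"id": 4, "order_date": date(2023, 3, 9), "amount": 50.0},
]


def shard_by_month(rows):
    """Group rows into shards keyed by the (year, month) of order_date."""
    shards = defaultdict(list)
    for row in rows:
        key = (row["order_date"].year, row["order_date"].month)
        shards[key].append(row)
    return dict(shards)


shards = shard_by_month(records)

# Each shard could now be handed to a separate worker; here each one is just totaled.
totals = {key: sum(r["amount"] for r in rows) for key, rows in shards.items()}
```

Because every record lands in exactly one shard, the per-shard totals can be computed independently and combined afterwards.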

Append blobs size limits and restrictions

- Blocks can only be added to the end of an append blob; existing blocks cannot be updated or deleted
- Each block can be up to 4 MB
- Maximum append blob size is just over 195 GB

OLTP (online transaction processing)

Captures a high volume of small transactions that arrive constantly and are stored as they happen. Requires fast access, so the data is normalized. Examples: SQL Server, Azure Database for PostgreSQL, MySQL, or MariaDB.

Cassandra is an example of a ___ model type database

Column

Apache Parquet is an example of what kind of data storage type?

Columnar, where data is stored as columns (for example, get all ages by reading a single column). Parquet is a columnar data file format created by Cloudera and Twitter. A Parquet file contains row groups, and the data for each column is stored together within the same row group. Each row group contains one or more chunks of data. The file includes metadata that describes the set of rows found in each chunk; an application can use this metadata to quickly locate the correct chunk for a given set of rows and then retrieve the data in the specified columns for those rows. Supports very efficient compression and encoding schemes.
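
The "get all ages from a single column" idea can be sketched in plain Python. This is not the Parquet format itself (no row groups, chunks, or metadata), only a contrast between row and column layouts; the sample names and values are made up.

```python
# Hypothetical sample data; the names and values are made up.
rows = [
    {"name": "Ana", "age": 31, "city": "Lima"},
    {"name": "Ben", "age": 45, "city": "Oslo"},
    {"name": "Caz", "age": 27, "city": "Pune"},
]


def to_columns(row_records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in row_records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every record must be visited to collect one field.
ages_from_rows = [record["age"] for record in rows]

# Columnar layout: the ages already sit together in one list.
ages_from_columns = columns["age"]
```

Keeping values of the same type next to each other is also what makes column-oriented files compress well.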

Data Definition Language (DDL)

Creates, modifies, and removes objects in a database, such as tables, stored procedures, and views. Statements: CREATE, ALTER, DROP, RENAME.

Data Manipulation Language (DML)

DBMS language that changes database content by inserting, updating, and deleting data. Statements: INSERT, UPDATE, DELETE.
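
The difference between DDL and DML can be seen in a short sketch. It uses SQLite only because it ships with Python; SQLite's dialect differs from T-SQL in places, and the `Product` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define and change the structure of objects.
cur.execute("CREATE TABLE Product (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
cur.execute("ALTER TABLE Product ADD COLUMN Price REAL")

# DML: change the rows stored in those objects.
cur.execute("INSERT INTO Product (Id, Name, Price) VALUES (1, 'Widget', 2.50)")
cur.execute("INSERT INTO Product (Id, Name, Price) VALUES (2, 'Gadget', 9.99)")
cur.execute("UPDATE Product SET Price = 3.00 WHERE Id = 1")
cur.execute("DELETE FROM Product WHERE Id = 2")
conn.commit()

remaining = cur.execute("SELECT Id, Name, Price FROM Product").fetchall()

# DDL again: remove the object entirely.
cur.execute("DROP TABLE Product")
tables_left = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
conn.close()
```

After the DML statements only the updated Widget row remains, and after DROP the table itself is gone.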

What tool do you use to check the compatibility of an on-premises SQL Server before migrating to SQL Managed Instance?

Data Migration Assistant (DMA). This tool analyzes your databases on SQL Server and reports any issues that could block migration to a managed instance.

Batch processing

Data is collected and stored, then processed on an interval. Meant for large volumes of data; high latency.

Stream processing

Data is processed in real time as it arrives. Resources must be scaled correctly, because data is lost if there is not enough capacity to process it.
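
The contrast between the two processing styles can be sketched with a running average. This is a toy illustration in plain Python, not an Azure API; the sensor readings are made-up values.

```python
# Hypothetical sensor readings; the values are made up for illustration.
readings = [21.0, 22.5, 19.5, 23.0]


def batch_average(collected):
    """Batch: wait until all data is stored, then process it in one pass."""
    return sum(collected) / len(collected)


def streaming_averages(source):
    """Stream: update the result as each reading arrives."""
    count = 0
    total = 0.0
    for value in source:
        count += 1
        total += value
        yield total / count


batch_result = batch_average(readings)
stream_results = list(streaming_averages(iter(readings)))
```

The batch function yields one answer after all the data is in, while the streaming generator produces an up-to-date answer after every reading.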

Types of analysis

- Descriptive: what happened
- Diagnostic: why it happened
- Predictive: what is going to happen, based on history
- Prescriptive: what should I do if I want this outcome?
- Cognitive: conclusions based on knowledge

All of these can feed into Power BI.

The Cosmos DB SQL API and MongoDB are examples of a ___ model type database

Document

Block blobs size and limit

- Each block can vary in size, up to 100 MB
- A block blob can consist of up to 50,000 blocks
- Maximum block blob size is just over 4.7 TB
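
The maximum sizes quoted on the append blob and block blob cards follow from multiplying the block size by the block count. The check below assumes the cards' MB, GB, and TB are binary units (MiB, GiB, TiB) and that the 50,000-block limit applies to both blob types; it uses the figures as stated on the cards, which may differ from current Azure limits.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_BLOCKS = 50_000  # assumed block count limit for both blob types

# Append blob: blocks of up to 4 MiB each.
append_blob_max_gib = 4 * MIB * MAX_BLOCKS / GIB

# Block blob, using the 100 MiB block size quoted on the card.
block_blob_max_tib = 100 * MIB * MAX_BLOCKS / TIB

print(f"Append blob: about {append_blob_max_gib:.1f} GiB")
print(f"Block blob:  about {block_blob_max_tib:.2f} TiB")
```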

What does ETL and ELT mean?

ETL: extract, transform, load. ELT: extract, load, transform. Both are methods for processing big data.
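
The two approaches differ only in where the transform step runs relative to the load step. The sketch below is a toy illustration in plain Python; the source records, the cleaning rules, and the `raw`/`clean` areas of the target are all hypothetical.

```python
# Hypothetical raw records; the field names are assumptions.
source = [{"name": " ana ", "amount": "10"}, {"name": "BEN", "amount": "25"}]


def extract():
    """Read raw records from the source system."""
    return [dict(row) for row in source]


def transform(rows):
    """Clean names and convert amounts to integers."""
    return [
        {"name": r["name"].strip().title(), "amount": int(r["amount"])}
        for r in rows
    ]


def etl():
    """ETL: transform in the pipeline, then load only the cleaned data."""
    target = []
    target.extend(transform(extract()))
    return target


def elt():
    """ELT: load the raw data first, then transform inside the target store."""
    target = {"raw": [], "clean": []}
    target["raw"].extend(extract())
    target["clean"] = transform(target["raw"])
    return target


etl_target = etl()
elt_target = elt()
```

Both paths end with the same cleaned rows; the ELT target additionally keeps the raw data, which is why ELT is commonly paired with data lakes.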

Page blob size and limits

- Organized as fixed-size 512-byte pages
- A page blob can hold up to 8 TB of data

Azure Synapse Analytics

Fully managed data warehouse with integral security at every level of scale at no extra cost. An end-to-end solution for data analytics that combines a high-performance SQL engine with data lake flexibility and open-source Apache Spark. Includes data pipelines for ingestion and transformation, and native support for log and telemetry analytics with Azure Synapse Data Explorer pools.

What data structure is good for knowing relationships between data nodes?

Graph, which breaks data into nodes and creates relationships between the nodes that can be queried
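
A minimal sketch of nodes, relationships, and a relationship query, in plain Python rather than a graph database. The people, the department, and the relationship names (`works_in`, `reports_to`) are hypothetical.

```python
# Hypothetical graph: nodes are people and departments, edges are named relationships.
edges = [
    ("Alice", "works_in", "Sales"),
    ("Bob", "works_in", "Sales"),
    ("Bob", "reports_to", "Alice"),
    ("Carol", "reports_to", "Bob"),
]


def neighbors(node, relationship):
    """Return the nodes reached from `node` by following `relationship` edges."""
    return [dst for src, rel, dst in edges if src == node and rel == relationship]


def management_chain(person):
    """Follow reports_to edges upward until no manager is found."""
    chain = []
    current = person
    while True:
        managers = neighbors(current, "reports_to")
        if not managers:
            return chain
        current = managers[0]
        chain.append(current)


carol_chain = management_chain("Carol")
sales_staff = [src for src, rel, dst in edges if rel == "works_in" and dst == "Sales"]
```

Questions such as "who is in Carol's management chain?" are answered by walking edges, which is the kind of query a graph database is built to make efficient.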

What tool would you use to migrate an existing Hadoop-based solution to the cloud?

HDInsight

What must you enable to allow a Gen2 storage account to be optimized as a data lake for HDInsight, Azure Databricks, Synapse Analytics, and more?

The hierarchical namespace must be enabled for the storage account (this is irreversible), and a blob container must be created.

Data lakehouse

Hybrid approach that combines features of data lakes and data warehouses. Raw data is stored as files in a data lake, and a relational storage layer abstracts the underlying files and exposes them as tables meant for querying.

Data Control Language (DCL)

Manages access to objects in a database by granting, denying, or revoking permissions for specific users or groups. Statements: GRANT, DENY, REVOKE. Example: GRANT SELECT, INSERT, UPDATE ON Product TO user1;

Azure Data Warehouse

Relational database built on SQL Server where data is stored in a schema optimized for data analytics rather than transactional workloads

What SQL solution is best for "Lift and Shift"?

SQL Server on an Azure VM (IaaS), or SQL Managed Instance if you don't want the overhead of managing a VM

What do PaaS data options offer?

- Scalability
- Native high availability
- No OS access
- Constant updates and patches

What two modes can SQL Database run in?

- Single Database: a single database that is charged per hour and can scale by adding more storage space, memory, and processing power
- Elastic Pool: you purchase a pool of resources (memory, storage space, processing power) on which you can run multiple databases; each database uses resources from the pool and releases them when they are no longer needed

Azure Data Lake

A means of storing and analyzing massive amounts of unstructured data. Runs on general-purpose v2 storage using block blobs, and is meant for high-performance data access.

T-SQL

Transact-SQL, Microsoft's dialect of SQL. Examples: CREATE TABLE, DROP TABLE.

What does Azure Databricks do in the ELT/ETL process?

Transforms and cleans the data and then deposits it into a structured data sink such as Azure Synapse Analytics

How many logical partitions can be in a Cosmos DB container and how big can they be?

Unlimited; each logical partition can hold up to 20 GB

partition key

Used to identify and separate semi-structured data into logical partitions

OLAP (online analytical processing)

Very large volumes of data stored in a data warehouse, such as Synapse, for analytics. Information is manipulated to create business intelligence in support of strategic decision making.

