Azure Data Fundamentals

Describe stream processing

Process that allows you to move data almost instantaneously from a source (e.g. a device) to a destination (e.g. a log file).

Describe batch data

Running a series of jobs to completion, one after another, as a single group. Batch processes are not interactive.

key-value store

Similar to a relational table, except that each row can have any number of columns

Analytical workloads

Typically read-only systems that store vast volumes of historical data or business metrics, such as sales performance and inventory levels

SaaS

Typically specific software packages that are installed and run on virtual hardware in the cloud

lifecycle management policy

automatically move a blob from Hot to Cool, and then to the Archive tier, as it ages and is used less frequently
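
The aging rule described above can be sketched as a small function. This is only an illustration of the idea, not the Azure policy format: the function name `choose_tier` and the 30-day and 90-day thresholds are assumptions chosen for the example.

```python
def choose_tier(days_since_last_modified, cool_after=30, archive_after=90):
    """Pick a storage tier from a blob's age.

    The thresholds are hypothetical defaults, not values mandated by Azure.
    """
    if days_since_last_modified >= archive_after:
        return "Archive"
    if days_since_last_modified >= cool_after:
        return "Cool"
    return "Hot"

# A blob moves Hot -> Cool -> Archive as it ages.
tiers = [choose_tier(days) for days in (5, 45, 120)]
```

In a real storage account the equivalent rule is declared once as a policy and applied by the service, rather than evaluated in application code.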

Bar and column charts

enable you to see how a set of variables changes across different categories

ELT

Extract, Load, and Transform. The process differs from ETL in that the data is stored before being transformed
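
A minimal sketch of the difference in ordering, assuming a hypothetical list of raw records and a `transform` helper invented for this example. In the ETL path only cleaned rows reach the store; in the ELT path the raw rows are stored first and transformed afterwards.

```python
# Hypothetical raw records; the field names are illustrative only.
raw_rows = [
    {"id": 1, "amount": " 10.50 "},
    {"id": 2, "amount": "7"},
]

def transform(row):
    """Clean one record: strip whitespace and convert the amount to a float."""
    return {"id": row["id"], "amount": float(row["amount"].strip())}

def etl(rows):
    """ETL: transform each row in flight, so only cleaned rows are stored."""
    return [transform(row) for row in rows]

def elt(rows):
    """ELT: store the raw rows first, then transform them from the store."""
    raw_store = list(rows)                               # load: raw data lands unchanged
    transformed = [transform(row) for row in raw_store]  # transform: runs after loading
    return raw_store, transformed

etl_store = etl(raw_rows)
elt_raw_store, elt_transformed = elt(raw_rows)
```

Both paths end with the same cleaned rows; the ELT path additionally keeps the untouched raw copy in the target store.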

Azure Database for MySQL

is a PaaS implementation of MySQL in the Azure cloud, based on the MySQL Community Edition

entity

is described as a thing about which information needs to be known or held.

Block blobs

is handled as a set of blocks. Each block can vary in size, up to 100 MB

Page blobs

is organized as a collection of fixed size 512-byte pages

Archive tier

provides the lowest storage cost, but with increased latency.

Describe a bar chart

representation of categorical data using rectangles

Describe a pie chart

representation of data using a circle divided into proportional slices

Avro

row-based format. It was created by Apache. Each record contains a header that describes the structure of the data in the record.

Power BI visualization

a visual representation of data, like a chart, a color-coded map, or other interesting things you can create to represent your data visually

scaling

act of increasing (or decreasing) the resources used by a service

data warehouse

also stores large quantities of data, but the data in a warehouse has been processed to convert it into a format for efficient analysis.

Atomicity

guarantees that each transaction is treated as a single unit, which either succeeds completely, or fails completely
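
A small sketch of atomicity using Python's built-in `sqlite3` module rather than an Azure service. The `accounts` table and the `transfer` helper are assumptions made for the example. The second transfer fails halfway through, and the credit that was already applied is rolled back with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move money between accounts as one all-or-nothing unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit is undone too.
        return False

succeeded = transfer(conn, "alice", "bob", 30)
overdraft = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```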

Synapse Spark pool

is a cluster of servers running Apache Spark to process data. You write your data processing logic using one of the four supported languages: Python, Scala, SQL, and C# (via .NET for Apache Spark). Spark pools support Azure Machine Learning through integration with the SparkML and AzureML packages.

Synapse SQL pool

is a collection of servers running Transact-SQL. Transact-SQL is the dialect of SQL used by Azure SQL Database, and Microsoft SQL Server. You write your data processing logic using Transact-SQL.

SQL Database server

is a logical construct that acts as a central administrative point for multiple single or pooled databases, logins, firewall rules, auditing rules, threat detection policies, and failover groups

Synapse Pipelines

is a logical grouping of activities that together perform a task.

NoSQL

is a rather loose term that simply means non-relational.

Azure Table Storage

is a scalable key-value store held in the cloud. You create a table using an Azure storage account.

Range index

is based on an ordered tree-like structure

transactional system (OLTP)

is often what most people consider the primary function of business computing; it records transactions

dot plot chart

is similar to a bubble chart and scatter chart, but can plot categorical data along the X-Axis

Provisioning

is the act of running a series of tasks that a service provider, such as Azure SQL Database, performs to create and configure a service

Hot tier

is the default. You use this tier for blobs that are accessed frequently

Azure Cosmos DB

schema-agnostic database that allows you to iterate on your application without having to deal with schema or index management. Automatically indexes every property for all items in your container without having to define any schema or configure secondary indexes

scatter chart

shows the relationship between two numerical values

Analysis

You typically use batch processing for performing complex analytics. Stream processing is used for simple response functions, aggregates, or calculations such as rolling averages.
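
A rolling average is a typical stream-style calculation because it needs only the most recent values. The sketch below assumes a hypothetical sequence of sensor readings and a window of two values.

```python
from collections import deque

def rolling_average(stream, window_size):
    """Yield the average of the last `window_size` values as each value arrives."""
    window = deque(maxlen=window_size)  # old values fall out automatically
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)

readings = [10, 20, 30, 40]  # hypothetical sensor readings
averages = list(rolling_average(readings, window_size=2))
```

Each result is available as soon as its reading arrives; nothing waits for the full dataset.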

Azure Database for PostgreSQL

a PaaS implementation of PostgreSQL in the Azure Cloud. This service provides the same availability, performance, scaling, security, and administrative benefits as the MySQL service.

Azure HDInsight

a big data processing service that provides the platform for technologies such as Spark in an Azure environment.

Change Feed

the change feed for a blob provides an ordered, read-only record of the updates made to that blob.

Azure Data Factory

a cloud-based data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale

Power BI datasets

a collection of data that Power BI uses to create its visualizations.

Power BI report

a collection of visualizations that appear together on one or more pages

column family database

a column family database can appear very similar to a relational database, but its real power lies in its denormalized approach to structuring sparse data

Cassandra API

an interface to Cosmos DB that is compatible with Cassandra, a column family database management system

distributed database

a database in which data is stored across different physical locations

Azure Cosmos DB

a multi-model NoSQL database management system

Snapshots

a read-only version of a blob at a particular point in time.

data lake

a repository for large quantities of raw data.

Azure Virtual Network

a representation of your own network in the cloud.

Azure Blob storage

a service that enables you to store massive amounts of unstructured data, or blobs, in the cloud

Power BI tile

a single visualization on a report or a dashboard

document key

a unique identifier for the document

view

a virtual table based on the result set of a query
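
A short sketch of a view using Python's built-in `sqlite3` module. The `orders` table and the `customer_totals` view are names invented for the example. The view stores no data of its own, so it reflects rows added after it was created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("ann", 20.0), ("ann", 5.0), ("ben", 12.0)],
)

# The view is a named query; selecting from it runs the query again.
conn.execute(
    "CREATE VIEW customer_totals AS "
    "SELECT customer, SUM(total) AS spent FROM orders GROUP BY customer"
)
query = "SELECT customer, spent FROM customer_totals ORDER BY customer"
before = conn.execute(query).fetchall()

conn.execute("INSERT INTO orders (customer, total) VALUES (?, ?)", ("ben", 8.0))
after = conn.execute(query).fetchall()
```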

Parquet

another columnar data format. It was created by Cloudera and Twitter. Contains row groups. Data for each column is stored together in the same row group. Each row group contains one or more chunks of data.

Describe cognitive analytics

applying human-like intelligence to tasks

Treemaps

are charts of colored rectangles, with size representing the relative value of each item

Cognitive analytics

attempts to draw inferences from existing data and patterns, derive conclusions based on existing knowledge bases, and then add these findings back into the knowledge base for future inferences--a self-learning feedback loop

unstructured data

data such as audio files, video files, and binary data files that might not have a specific structure

stream processing

each new piece of data is processed when it arrives

SQL Database managed instance

effectively runs a fully controllable instance of SQL Server in the cloud. You can install multiple databases on the same instance. You have complete control over this instance, much as you would for an on-premises server

Line charts

emphasize the overall shape of an entire series of values, usually over time

Spatial indices

enable efficient queries on geospatial objects such as points, lines, polygons, and multipolygons. They are used on correctly formatted GeoJSON objects; Points, LineStrings, Polygons, and MultiPolygons are currently supported

Azure File Storage

enables you to create file shares in the cloud, and access these file shares from anywhere with an internet connection

Azure Database Migration Service (DMS)

enables you to restore a backup of your on-premises databases directly to databases running in Azure Data Services

in-place updates

enabling an application to modify the values of specific fields in a document without rewriting the entire document

Consistency

ensures that a transaction can only take the data in the database from one valid state to another

Isolation

ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially

Data processing solutions

fall into one of two broad categories: analytical systems, and transaction processing systems

Describe diagnostic analytics

form of analytics that answers the question, "Why did it happen?"

Filled map

if you have geographical data, you can use a filled map to display how a value differs in proportion across a geography or region

Owner privilege

gives full access to the data, including managing security, such as adding new users and removing access for existing users.

Read/write access

gives users the ability to view and modify existing data.

Durability

guarantees that once a transaction has been committed, it will remain committed even if there's a system failure such as a power outage or crash

Cool tier

has lower performance and incurs reduced storage charges compared to the Hot tier

Prescriptive analytics

helps answer questions about what actions should be taken to achieve a goal or target

Descriptive analytics

helps answer questions about what has happened, based on historical data

Predictive analytics

helps answer questions about what will happen in the future

Diagnostic analytics

helps answer questions about why things happened

index

helps you search for data in a table

Gremlin API

implements a graph database interface to Cosmos DB.

Composite indexes

increase the efficiency when you are performing operations on multiple fields

Describe descriptive analytics

interpretation of historical data to better understand changes that have occurred

Append blobs

is a block blob optimized to support append operations

Synapse Studio

is a web user interface that enables data engineers to access all the Synapse Analytics tools.

Azure Databricks

is an Apache Spark environment running on Azure to provide big data processing, streaming, and machine learning

Azure Synapse Analytics

is an analytics engine. It's designed to process large amounts of data very quickly

Azure Database for MariaDB

is an implementation of the MariaDB database management system adapted to run in Azure. It's based on the MariaDB Community Edition.

MongoDB API

an interface to Cosmos DB that is compatible with MongoDB, another well-known document database with its own programmatic interface

Data Querying

looking in data for trends, or attempting to determine the cause of problems in your systems

Describe the concepts of data processing

manipulation of data by a computer

Read-only access

means the users can read data but can't modify any existing data or create new data.

modern data warehouse

might contain a mixture of relational and non-relational data, including files, social media streams, and Internet of Things (IoT) sensor data

Power BI dashboard

must fit on a single page, often called a canvas (the canvas is the blank backdrop in Power BI Desktop or the service, where you put visualizations).

Optimized Row Columnar format (ORC)

organizes data into columns rather than rows

clustered index

physically reorganizes a table by the index key. This arrangement can improve query performance further, because the RDBMS doesn't have to follow references from the index to find the corresponding data in the underlying table

Business Intelligence

refers to technologies, applications, and practices for the collection, integration, analysis, and presentation of business information. The purpose of business intelligence is to support better decision making

document database

represents the opposite end of the NoSQL spectrum from a key-value store. Each document has a unique ID, but the fields in the documents are transparent to the database management system.

Single Database

resource type creates a database in Azure SQL Database with its own set of resources and is managed via a server

Azure database administrator

responsible for the design, implementation, maintenance, and operational aspects of on-premises and cloud-based database solutions built on Azure data services and SQL Server

Elastic Pool

similar to Single Database, except that by default multiple databases can share the same resources, such as memory, data storage space, and processing power.

Azure Database for PostgreSQL single-server

single-server deployment option for PostgreSQL that provides benefits similar to those of Azure Database for MySQL. You choose from three pricing tiers: Basic, General Purpose, and Memory Optimized. Each tier supports different numbers of CPUs, memory, and storage sizes. You select one based on the load you expect to support

ETL

stands for Extract, Transform, and Load. The raw data is retrieved and transformed before being saved

Data Visualization

the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to spot and understand trends, outliers, and patterns in data

Soft delete

This feature enables you to recover a blob that has been removed or overwritten, by accident or otherwise.

Table API

This interface enables you to use the Azure Table Storage API to store and retrieve documents.

SQL API

This interface provides a SQL-like query language over documents, enabling you to identify and retrieve documents using SELECT statements

Azure Database for PostgreSQL Hyperscale

Hyperscale (Citus) is a deployment option that scales queries across multiple server nodes to support large database loads. Your database is split across nodes. Data is split into chunks based on the value of a partition key or sharding key. Consider using this deployment option for the largest PostgreSQL database deployments in the Azure Cloud.
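
The idea of splitting rows across nodes by a sharding key can be sketched as follows. This is a generic hash-partitioning illustration, not the algorithm Citus actually uses; the `shard_for` helper, the node count, and the customer keys are assumptions made for the example.

```python
import hashlib

def shard_for(key, node_count):
    """Map a sharding-key value to a node index using a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count

NODE_COUNT = 4  # hypothetical cluster size
customer_ids = ["c-001", "c-002", "c-003", "c-004", "c-005"]

# Group the keys by the node that would hold their rows.
placement = {}
for customer_id in customer_ids:
    placement.setdefault(shard_for(customer_id, NODE_COUNT), []).append(customer_id)
```

Because the hash is deterministic, every query for a given key can be routed to the same node.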

Disadvantages of batch

The time delay between ingesting the data and getting the results. All of a batch job's input data must be ready before a batch can be processed. Even minor data errors, such as typographical errors in dates, can prevent a batch job from running.

Azure Data Services

These services are a series of DBMSs managed by Microsoft in the cloud. Each data service takes care of the configuration, day-to-day management, software updates, and security of the databases that it hosts. All you do is create your databases under the control of the data service.

batch processing

Arriving data elements are collected into a group. The whole group is then processed at a future time
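
A minimal sketch of the collect-then-process pattern. The `batches` helper and the batch size are assumptions made for the example; a real batch job would more likely be triggered by a schedule or a data volume threshold.

```python
def batches(items, batch_size):
    """Collect arriving items into groups of `batch_size`; the last group may be smaller."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Each group is processed as a whole, here by summing it.
totals = [sum(batch) for batch in batches([1, 2, 3, 4, 5], batch_size=2)]
```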

Azure Analysis Services

Enables you to build tabular models to support online analytical processing (OLAP) queries

Versioning

enables you to maintain and restore earlier versions of a blob.

matrix

visual is a tabular structure that summarizes data

Data Size

Batch processing is suitable for handling large datasets efficiently. Stream processing is intended for individual records or micro batches consisting of few records.

PaaS

Allows you to specify the resources that you require (based on how large you think your databases will be, the number of users, and the performance you require), and Azure automatically creates the necessary virtual machines, networks, and other devices for you

Data Scope

Batch processing can process all the data in the dataset. Stream processing typically only has access to the most recent data received, or within a rolling time window

Describe differences between batch and stream data

Batch data requires a dataset to be collected over time; stream data depends on an analytic tool that consumes the data continuously.

data analyst

Enables businesses to maximize the value of their data assets, responsible for designing and building scalable models, cleaning and transforming data, and enabling advanced analytics capabilities through reports and visualizations.

IaaS

Enables you to create a virtual infrastructure in the cloud that mirrors the way an on-premises data center might work. You can create a set of virtual machines, connect them together using a virtual network, and add a range of virtual devices

Describe data visualization

Field that deals with the graphical representation of data.

Data

Is a collection of facts such as numbers, descriptions, and observations used in decision making.

Semi-structured data

Is information that doesn't reside in a relational database but still has some structure to it. Examples include documents held in JavaScript Object Notation (JSON) format.

Advantages of batch

Large volumes of data can be processed at a convenient time. It can be scheduled to run at a time when computers or systems might otherwise be idle, such as overnight, or during off-peak hours.

normalized

Splitting tables out into separate groups of columns

Performance

The latency for batch processing is typically a few hours. Stream processing typically occurs immediately, with latency in the order of seconds or milliseconds.

data engineer

collaborates with stakeholders to design and implement data-related assets that include data ingestion pipelines, cleansing and transformation activities, and data stores for analytical workloads

Azure Data Lake Storage

combines the hierarchical directory structure and file system semantics of a traditional file system with security and scalability provided by Azure

relational database

comprises a set of tables. A table can have zero (if the table is empty) or more rows. Each table has a fixed set of columns. You can define relationships between tables using primary and foreign keys, and you can access the data in tables using SQL
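
A short sketch of two related tables using Python's built-in `sqlite3` module. The `customers` and `orders` tables are invented for the example. The foreign key links each order to a customer, and the database rejects an order that points at a customer that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "total REAL NOT NULL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ann')")
conn.execute("INSERT INTO orders VALUES (10, 1, 19.99)")

# Follow the relationship with a join.
joined = conn.execute(
    "SELECT c.name, o.total FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchall()

# An order for a missing customer violates the foreign key.
try:
    conn.execute("INSERT INTO orders VALUES (11, 99, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```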

Data analytics

concerned with examining, transforming, and arranging data so that you can study it and extract useful information

container

when an item is stored in a Cosmos DB container, its content is projected as a JSON document, then converted into a tree representation.

Azure Data Factory

described as a data integration service

analytical system (OLAP)

designed to support business users who need to query data and gain a big picture view of the information held in a database

key influencer chart

displays the major contributors to a selected result or value

Data Ingestion

the process of capturing the raw data

Reporting

the process of organizing data into informational summaries to monitor how different areas of an organization are performing

Describe predictive analytics

the use of historical data, statistical models and machine learning techniques to identify the likelihood of future outcomes.

Data Transformation/Data Processing

cleaning the data to remove any questionable or invalid values, or performing aggregations such as calculating profit, margin, and other Key Performance Indicators (KPIs)

Describe ETL

type of data integration that extracts, transforms, and loads data in order to blend data from multiple sources

Describe ELT

type of data integration that takes advantage of the target system to do the data transformation

graph database

used to store and query information about complex relationships, contains nodes (information about objects), and edges (information about the relationships between objects).
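
Nodes and edges can be sketched with plain Python structures. The people, the `reports_to` label, and the `incoming` helper are assumptions made for the example; a graph database would provide its own query language for this kind of traversal.

```python
# Nodes hold information about objects.
nodes = {
    "ann": {"role": "manager"},
    "ben": {"role": "engineer"},
    "cat": {"role": "engineer"},
}

# Edges hold information about relationships: (source, label, target).
edges = [
    ("ben", "reports_to", "ann"),
    ("cat", "reports_to", "ann"),
]

def incoming(node, label):
    """Return the sources of all edges with `label` that point at `node`."""
    return sorted(src for src, lbl, dst in edges if dst == node and lbl == label)

direct_reports = incoming("ann", "reports_to")
```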

Describe prescriptive analytics

using machine learning techniques to help businesses decide on a course of action to take

non-relational system

you store the information for entities in collections or containers rather than relational tables. Two entities in the same collection can have a different set of fields rather than a regular set of columns found in a relational table
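
A minimal sketch of a collection whose entities do not share a fixed set of fields. The documents and the `ids_with_field` helper are hypothetical; only the idea of per-entity fields comes from the definition above.

```python
import json

# Two entities in the same collection with different fields.
collection = [
    {"id": "1", "name": "Ann", "email": "ann@example.com"},
    {"id": "2", "name": "Ben", "phones": ["555-0100", "555-0101"]},
]

def ids_with_field(documents, field):
    """Return the ids of the documents that define `field`."""
    return [doc["id"] for doc in documents if field in doc]

with_email = ids_with_field(collection, "email")
with_phones = ids_with_field(collection, "phones")

# Each entity serializes to its own self-describing JSON document.
serialized = [json.dumps(doc, sort_keys=True) for doc in collection]
```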

