Azure Data Fundamentals

what process is inherently a streaming process

data ingestion

common data engineering tools

SQL Server Management Studio

what comprises a relational database

a set of tables

what extensions act like once loaded in the database

built-in features

SQL does NOT provide this prompt.

"are you sure?"

examples of large, binary data objects stored as blobs

Images and video streams. Microsoft Azure virtual machines use blob storage for holding virtual machine disk images. These objects can be several hundreds of GB in size.

characteristics of analytical workloads

- read-only systems
- vast volumes of historical data or business metrics
- used for data analysis and decision making
- can be based on a snapshot at a given point in time, or a series of snapshots

ways that relational databases are used

- track inventories
- process ecommerce transactions
- manage huge amounts of mission-critical customer information
...and much more

the characters that start a comment in Transact-SQL

--
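A minimal illustration (the table name is hypothetical):

    -- This line is a comment; everything after the two hyphens is ignored
    SELECT * FROM Customers;  -- a comment can also follow a statement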

compare Data Lake data to Data Warehouse data

A data warehouse also stores large quantities of data, but the data in a warehouse has been processed to convert it into a format for efficient analysis. A data lake holds raw data, but a data warehouse holds structured information.

What report timing is needed for a typical large-scale business?

A typical large-scale business requires a combination of up-to-the-second data, and historical information.

enables you to connect virtual machines and Azure services together, in much the same way that you might use a physical network on-premises.

A virtual network

Apache Hive

Apache Hive provides interactive SQL-like facilities for querying, aggregating, and summarizing data. The data can come from many different sources. Queries are converted into tasks, and parallelized. Each task can run on a separate node in the HDInsight cluster, and the results are combined before being returned to the user.

Apache Kafka

Apache Kafka is a clustered streaming service that can ingest data in real time. It's a highly scalable solution that offers publish and subscribe features.

a highly efficient data processing engine that can consume and process large amounts of data very quickly. There are a significant number of libraries you can use to perform tasks such as SQL processing, aggregations, and to build and train machine learning models using your data.

Apache Spark

enables you to specify who, or what, can access your resources

Azure AD

what service provides superior security and ease of use over access key authorization?

Azure Active Directory (Azure AD) provides superior security and ease of use over access key authorization. Microsoft recommends using Azure AD authorization when possible to minimize potential security vulnerabilities inherent in using access keys.

You can control access to shares in Azure File Storage using authentication and authorization services available through what services?

Azure Active Directory Domain Services

A service you can use for storing semi-structured data

Azure Cosmos DB

acts as a hub holding clean business data

Azure Synapse services

The process of analyzing streaming data and data from the Internet is known as what?

Big Data analytics.

The term blob is an acronym for

Binary Large OBject.

used in SQL to remove rows

DELETE FROM
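A minimal sketch, assuming a hypothetical Customers table:

    -- Removes only the rows that match the WHERE criteria
    DELETE FROM Customers
    WHERE CustomerID = 1;
    -- Caution: with no WHERE clause, every row in the table is removed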

When you use this SQL statement, all the rows in that table are lost.

DROP
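For example (the table name is hypothetical):

    -- Removes the table and all the rows it contains, with no "are you sure?" prompt
    DROP TABLE Customers;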

the two performance models for Azure SQL Database

DTU and vCore

this type of SQL statement is used to create, modify, and remove tables and other objects in a database (table, stored procedures, views, and so on)

Data Definition Language (DDL)

the first part of any data warehousing solution

Data ingestion

the process used to load data from one or more sources into a data store. At that point, the data becomes available for use.

Data ingestion

a reason why organizations don't move their data to the cloud

Data is arguably one of the most valuable assets that an organization has, and some companies aren't willing or able to hand over responsibility for protecting this data to a third party.

What types of data do data warehouses have to handle?

Data warehouses have to handle big data. Big data is the term used for large quantities of data collected in escalating volumes, at higher velocities, and in a greater variety of formats than ever before. It can be historical (meaning stored) or real time (meaning streamed from the source). Businesses typically depend on their big data to help make critical business decisions.

It actively tracks failed logins from IP addresses. If there are multiple failed logins from a specific IP address within a period of time, the IP address is blocked from accessing any resources in the service for a short while.

DoSGuard

the name of the service designed to reduce Denial of Service (DoS) attacks

DoSGuard

How does Azure Synapse Analytics use a clustered architecture?

Each cluster has a control node that is used as the entry point to the system. When you run Transact-SQL statements or start Spark jobs from a notebook, the request is sent to the control node. The control node runs a parallel processing engine that splits the operation into a set of tasks that can be run concurrently. Each task performs part of the workload over a subset of the source data. Each task is sent to a compute node to actually do the processing. The control node gathers the results from the compute nodes and combines them into an overall result.

specifies the table to use

FROM clause

an example of how databases make it easy to store information so it's quick and easy to find

For example, an ecommerce system might use a database to record information about the products an organization sells, and the details of customers and the orders they've placed. A relational database provides a model for storing the data, and a query capability that enables you to retrieve data quickly.

After you've provisioned a resource, you'll often need to configure it to meet the needs of your applications and environment. Give examples of what you might need to configure.

For example, you might need to:
- set up network access, or open a firewall port, to enable your applications to connect to the resource
- enable network access to your resources while preventing accidental exposure of your resources to third parties
- use authentication and access control to protect the data managed by your resources

In Cosmos DB, can applications override the default consistency level for individual read operations?

Yes. However, they can't increase the consistency above that specified on the Default consistency page; they can only decrease it.

Azure Database for PostgreSQL Hyperscale (Citus)

Hyperscale (Citus) is a deployment option that scales queries across multiple server nodes to support large database loads. Your database is split across nodes. Data is split into chunks based on the value of a partition key or sharding key. Consider using this deployment option for the largest PostgreSQL database deployments in the Azure cloud.

when to use Azure SQL Database

If you don't want to incur the management overhead associated with running SQL Server on a virtual machine.

when you can benefit from combining Synapse Analytics with Analysis Services

If you have large amounts of ingested data that require preprocessing, you can use Synapse Analytics to read this data and manipulate it into a model that contains business information rather than a large amount of raw data. The scalability of Synapse Analytics gives it the ability to process and reduce many terabytes of data down into a smaller, succinct dataset that summarizes and aggregates much of this data. You can then use Analysis Services to perform detailed interrogation of this information, and visualize the results of these inquiries with Power BI.

when your connections have a connection policy of Redirect by default

If you're connecting from within another Azure service, such as a web application running under Azure App Service

tasks performed by the Azure SQL Database gateway

It validates all connections to the database servers, to ensure that they are from genuine clients. It encrypts all communications between a client and the database servers. It inspects each network packet sent over a client connection. The gateway validates the connection information in the packet, and forwards it to the appropriate physical server based on the database name that's specified in the connection string.

used to retrieve data from multiple tables

JOIN clause

What language do many document databases use to represent the document structure?

JSON (JavaScript Object Notation)

advantages of batch processing

Large volumes of data can be processed at a convenient time. It can be scheduled to run at a time when computers or systems might otherwise be idle, such as overnight, or during off-peak hours.

multi-tenancy

Multi-tenancy is an architecture in which a single instance of a software application serves multiple customers. Each customer is called a tenant. Tenants may be given the ability to customize some parts of the application, such as the color of the user interface (UI) or business rules, but they cannot customize the application's code.

uses information (data) from Azure Analysis Services or Databricks to generate reports

Power BI

Power BI components

Power BI consists of a Microsoft Windows desktop application called Power BI Desktop, an online SaaS (Software as a Service) service called the Power BI service, and mobile Power BI apps that are available on any device, with native mobile BI apps for Windows, iOS, and Android.

If your stored procedures and scripts depend on features that are restricted under a PaaS approach, what should you do?

Use SQL Server on a Virtual Machine.

Azure SQL Database options

- Single Database
- Elastic Pool
- Managed Instance

What is Databricks based on?

Spark

To connect to an Azure Database for MySQL server by using MySQL Workbench,

Start MySQL Workbench on your computer. On the Welcome page, select Connect to Database. In the Connect to Database dialog box, enter the following information on the Parameters tab:
- Stored connection: leave blank
- Connection Method: Standard (TCP/IP)
- Hostname: specify the fully qualified server name from the Azure portal
- Port: 3306
- Username: enter the server admin login username from the Azure portal, in the format <username>@<hostname>
- Password: select Store in Vault, and enter the administrator password specified when the server was created
Select OK to create the connection. If the connection is successful, the query editor will open.

standard language used to communicate with a relational database

Structured Query Language (SQL)

the language most relational databases support

Structured Query Language (SQL)

What makes Azure Databricks an ideal platform for performing complex data ingestion and analytics tasks?

The scalability

How is historical data commonly generated?

Historical data can be generated by batch processes at regular intervals, based on the live sales data that might be captured continually.

using the Azure portal to provision

This is the most convenient way to provision a service for most users. The Azure portal displays a series of service-specific pages that prompt you for the settings required, and validates these settings, before actually provisioning the service.

what the row ordering within a partition in Azure Table Storage enables

This scheme enables an application to quickly perform Point queries that identify a single row, and Range queries that fetch a contiguous block of rows in a partition.

Data Migration Assistant (DMA)

To check compatibility with an existing on-premises system, you can install Data Migration Assistant (DMA). This tool analyzes your databases on SQL Server and reports any issues that could block migration to a managed instance.

Two SQL keywords used to update records

UPDATE SET
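A minimal sketch, using hypothetical table and column names:

    UPDATE Customers
    SET Phone = '555-0100'    -- the new value
    WHERE CustomerID = 1;     -- only matching rows are changed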

How are a CosmosDB container and Azure Table Storage table different?

Unlike Azure Table storage, documents in a Cosmos DB partition aren't sorted by ID. Instead, Cosmos DB maintains a separate index.

how to centrally manage the identities of database users and other Microsoft services

With Azure Active Directory (AD) authentication, you can manage the identities of database users and other Microsoft services in one central location. Central ID management provides a single place to manage database users and simplifies permission management. You can use these identities and configure access to your relational data services.

How do you set up Azure AD?

You add users and other security principals (such as an application) to a security domain managed by Azure AD.

Other psql commands include:

\l to list databases. \dt to list the tables in the current database.

What is Azure Database for MySQL

a PaaS implementation of MySQL in the Azure cloud, based on the MySQL Community Edition

what does an index contain

a copy of the column's data in a sorted order, with pointers to the corresponding rows in the table

What type of service is Azure Data Factory?

a data integration service.

distributed database

a database in which data is stored across different physical locations. It may be held in multiple computers located in the same physical location (for example, a datacenter), or may be dispersed over a network of interconnected computers.

Examples of repositories for ingested data

- a file store
- a document database
- a relational database

A modern data warehouse might contain what?

a mixture of relational and non-relational data, including files, social media streams, and Internet of Things (IoT) sensor data.

Azure Cosmos DB is what type of database management system?

a multi-model NoSQL database management system

Azure Private Endpoint

a network interface that connects you privately and securely to a service powered by Azure Private Link. Private Endpoint uses a private IP address from your virtual network, effectively bringing the service into your virtual network. The service could be an Azure service such as Azure App Service, or your own Private Link Service.

Spark

a parallel-processing engine that supports large-scale analytics.

transaction

a small, discrete unit of work

entity

a thing about which information needs to be known or held

defines what a user or application can do with your resources after they've been authenticated.

access control

protection provided in addition to authentication and authorization

advanced data security

Apart from authentication and authorization, many services provide additional protection through what?

advanced security.

In SQL, this calculates a single result across a set of rows or an entire table

aggregate function
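An illustrative example, assuming a hypothetical Orders table:

    -- COUNT, SUM, AVG, MIN, and MAX are common aggregate functions
    SELECT COUNT(*) AS OrderCount,
           AVG(OrderTotal) AS AverageOrderValue
    FROM Orders;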

What is Azure Database for MariaDB?

an implementation of the MariaDB database management system adapted to run in Azure. It's based on the MariaDB Community Edition. The database is fully managed and controlled by Azure. Once you've provisioned the service and transferred your data, the system requires almost no additional administration.

what each row in a table represents

an instance of an entity

OLTP applications that use relational databases

banking solutions, online retail applications, flight reservation systems, and many online purchasing applications.

examples of visualizations

- bar charts
- line charts
- plot results on geographical maps
- pie charts

how analytics are generated

by aggregating the raw data into:
- summaries
- trends
- other business information

how you design a relational database

by creating a data model

Elastic Pool option for Azure SQL Database

by default multiple databases can share the same resources, such as memory, data storage space, and processing power. The resources are referred to as a pool. You create the pool, and only your databases can use the pool.

all rows in a table have the same what

columns

Documents in a Cosmos DB database are organized into what?

containers.

Inside an Azure storage account, you create blobs inside these

containers.

How SQL virtual machines are lift-and-shift

copy your on-premises solution directly to a virtual machine in the cloud. The system should work more or less exactly as before in its new location, except for some small configuration changes (changes in network addresses, for example) to take account of the change in environment.

examples of entities

customers, products, and orders

What are the two scenarios that you can implement in Azure to perform analysis of business data?

data can be left in its raw, ingested format, or the data can be processed and saved to a specially designed data store or data warehouse.

common tasks of analytical processing systems

- data ingestion
- data transformation
- data querying
- data visualization

What is the value of visualizing data?

data represented in tables of rows and columns, or as documents, isn't always intuitive. Visualizing the data can often be useful as a tool for examining it.

Azure Synapse Analytics combines what two data processes

data warehousing and Big Data analytics.

the default connectivity for Azure relational data services

disable access to the world

How fast transaction processing is supported. Give an example.

dividing data into small pieces. For example, if you're using a relational system each table involved in a transaction only contains the columns necessary to perform the transactional task. In the bank transfer example, a table holding information about the funds in the account might only contain the account number and the current balance. Other tables not involved in the transfer operation would hold information such as the name and address of the customer, and the account history.

How normalization speeds transaction throughput

enables a transactional system to cache much of the information required to perform transactions in memory

Isolation

ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially. A concurrent process can't see the data in an inconsistent state (for example, the funds have been deducted from one account, but not yet credited to another.)

primary activities of the data analyst

explore and analyze data to create visualizations and charts to enable organizations to make informed decisions.

maximum number of read replicas for Azure Database for PostgreSQL or MySQL

five

advantages of cloud computing

flexibility for enterprises, opportunities for saving time and money, and improving agility and scalability.

encryption in Azure SQL Database

for data in motion, it uses transport layer security
for data at rest, it uses transparent data encryption

What querying normalized tables involves

you frequently need to join the data held across several tables back together again

what Azure Data Studio provides

graphical user interface for managing many different database systems

Atomicity

guarantees that each transaction is treated as a single unit, which either succeeds completely, or fails completely. If any of the statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged. The system must guarantee this property in each and every situation, including power failures, errors, and crashes.

Durability

guarantees that once a transaction has been committed, it will remain committed even if there's a system failure such as a power outage or crash.

a common scenario for read replicas

have BI and analytical workloads use read replicas as the data source for reporting.

for which industries might a decision of cloud versus on-premise already be made?

highly regulated ones

What does a trend show?

how data changes over time.

When provisioning, what do you specify?

how much:
- disk space
- memory
- computing power
- network bandwidth

What option does Azure Database for PostgreSQL offer for ultra-high performance workloads?

hyperscale

When to use an Elastic Pool in Azure SQL Database

if you have databases with resource requirements that vary over time, and can help you to reduce costs. For example, your payroll database might require plenty of CPU power at the end of each month as you handle payroll processing, but at other times the database might become much less active. You might have another database that is used for running reports. This database might become active for several days in the middle of the month as management reports are generated, but with a lighter load at other times. Elastic Pool enables you to use the resources available in the pool, and then release the resources once processing has completed.

what happens as you add rows to a table in Azure Table Storage

it automatically manages the partitions in a table and allocates storage as necessary. You don't need to take any additional steps yourself.

what some columns are used to do

maintain relationships between tables

downside to normalization

makes querying more complex

primary tasks of the database administrator

managing databases, assigning permissions to users, storing backup copies of data, and restoring data in case of any failures.

What is SQL Server on a Virtual Machine suitable for?

migrations and applications requiring access to operating system features that might be unsupported at the PaaS level.

When creating a table, these are the items between the parentheses that specify the details of each column

- name
- data type
- whether the column must always contain a value (NOT NULL)
- whether the data in the column is used to uniquely identify a row (PRIMARY KEY)
Note: each table should have a primary key, although SQL doesn't enforce this rule.
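A hedged sketch of the syntax (all names are hypothetical):

    CREATE TABLE Customers
    (
        CustomerID   INT         NOT NULL PRIMARY KEY,  -- uniquely identifies a row
        CustomerName VARCHAR(50) NOT NULL,              -- must always contain a value
        Phone        VARCHAR(20)                        -- NULL values are allowed
    );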

Do Azure Table Storage tables have schemas

No. You can use tables to hold flexible datasets such as user data for web applications, address books, device information, or other types of metadata your service requires.

Cosmos DB manages data as a what?

partitioned set of documents.

what SQL Server Management Studio is used for

provides a graphical interface enabling you to:
- query data
- perform general database administration tasks
- generate scripts for automating database maintenance and support operations

Before you can use a service, you must do what?

provision an instance of that service.

what can you do with a view

query the view and filter the data in much the same way as a table

what SQL Server running on an Azure virtual machine effectively does.

replicates the database running on real on-premises hardware.

all services and Azure resources are collected into these

resource groups

what a table contains

rows

how to load extensions in Azure Database for PostgreSQL

run the CREATE EXTENSION command from the psql tool to load the packaged objects into your database
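For example, assuming the postgis extension is available on your server (extension availability varies by service tier and region):

    CREATE EXTENSION postgis;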

The act of increasing (or decreasing) the resources used by a service is called what

scaling

what an index helps you do

search for data in a table

areas in which extensions provide functionality not covered by the SQL standard

security management and programmability

three elements of a role assignment in Azure RBAC

- security principal
- role definition
- scope

Give examples of an organization's range of sources, for which you might be asked to determine how best to store this information, so that it can be analyzed quickly, and queried easily.

such as real-time data monitoring the status of production line machinery, product quality control data, historical production logs, product volumes in stock, and raw materials inventory data. This information is critical to the operation of the organization.

characteristics of data in a Data Lake

the data is raw and unprocessed, so it's very fast to load and update, but the data hasn't been put into a structure suitable for efficient analysis. A data lake acts as a staging point for your ingested data, before it's massaged and converted into a format suitable for performing analytics.

How is MariaDB different from MySQL?

the database engine has been rewritten and optimized to improve performance

latency

the time taken for the data to be received and processed.

use cases for Azure Table Storage

to hold flexible datasets such as user data for web applications, address books, device information, or other types of metadata your service requires

structured data

typically tabular data that is represented by rows and columns in a database.

What do the Spark libraries provided with Azure Synapse Analytics enable you to do?

read data from external sources, and also write out data in a variety of different formats if you need to save your results for further analysis.

the type of server to which you can replicate data from Azure Database for PostgreSQL

read-only replica

one advantage of RDBMS in the cloud

scalability

a query utility that runs from the command line and is also available in the Cloud Shell.

sqlcmd

What you must do to ingested data before you can process it

store it in a repository

You can change the default consistency for a Cosmos DB account using what?

the Default consistency page in the Azure portal.

What is provisioning?

the act of running a series of tasks that a service provider, such as Azure SQL Database, performs to create and configure a service.

the format of non-relational databases more closely resembles what?

the original data structure

Who was MariaDB created by?

the original developers of MySQL

what a security principal can represent

- user
- group
- service principal
- managed identity

speed at which transactional data must be accessed

very quickly

how to provision Azure SQL Database using the Azure Portal

video at: https://docs.microsoft.com/en-us/learn/modules/explore-provision-deploy-relational-database-offerings-azure/3-describe-provision-sql-database

primary activities of the data engineer

vital in working with data, applying data cleaning routines, identifying business rules, and turning data into useful information.

transactional system

what most people consider the primary function of business computing.

How do you control access to resources using Azure RBAC?

You create role assignments. A role assignment consists of three elements: a security principal, a role definition, and a scope.

how replicas are managed and billed

You manage them in a similar way to regular Azure Database for PostgreSQL servers. For each read replica, you're billed for the provisioned compute in vCores and storage in GB/month.

what happens if you have a connection policy of Redirect, and you lose connectivity

your application will have to reconnect through the gateway, when it might be directed to a different copy of the database running on another server in the cluster.

the firewall rule that enables all Azure services to pass through the server-level firewall rule and attempt to connect to a single or pooled database through the server

0.0.0.0

the port on which connections to your Azure Database for PostgreSQL server communicate

5432

provides the information needed for Data Factory to connect to a source or destination. For example, you can use an Azure Blob Storage one of these to connect a storage account to Data Factory, or the Azure SQL Database one of these to connect to a SQL database.

A linked service

How can Azure Data Factory incorporate Azure Databricks notebooks into a pipeline?

A pipeline can pass parameters to a notebook. These parameters can specify which data to read and analyze. The notebook can save the results, which the Azure Data Factory pipeline can use in subsequent activities.

Three elements of a role assignment in Role Based Access Control (RBAC)

A role assignment consists of three elements: a security principal, a role definition, and a scope.

enables you to store large volumes of data quickly and easily prior to analyzing it

Azure Data Lake storage

an Apache Spark environment running on Azure to provide big data processing, streaming, and machine learning.

Azure Databricks

Azure Synapse Analytics

Azure Synapse Analytics provides a suite of tools to analyze and process an organization's data. It incorporates SQL technologies, Transact-SQL query capabilities, and open-source Spark tools to enable you to quickly process very large amounts of data.

tools you can use to provision services

- Azure portal
- Azure command-line interface (CLI)
- Azure PowerShell
- Azure Resource Manager templates

Describe CosmosDB performance

Cosmos DB guarantees less than 10-ms latencies for both reads (indexed) and writes at the 99th percentile, all around the world. This capability enables sustained ingestion of data and fast queries for highly responsive apps.

SQL used to sort the data

ORDER BY clause
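A minimal example with hypothetical names:

    SELECT CustomerName, City
    FROM Customers
    ORDER BY City ASC, CustomerName DESC;  -- sort by city, then by name in reverse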

how to create a linked service

Azure Data Factory provides a graphical user interface for creating linked services.

the command terminator in psql

You can enter commands across several lines. The semi-colon character acts as the command terminator.
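For example, one statement entered across several lines in psql (the table name is hypothetical):

    SELECT customer_name
    FROM customers
    WHERE city = 'London';  -- the semi-colon ends and submits the command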

How to provision File storage in a storage account using the Azure portal

You provision File storage by creating one or more file shares in the storage account. In the Azure portal, select File shares on the Overview page for the account. Using the File shares page, create a new file share. Give the file share a name, and optionally set a quota to limit the size of files on the share. The total size of all files across all file shares in a storage account can't exceed 5120 GB. After you've created the file share, applications can read and write shared files using the file share.

Hive

a SQL-like query facility that you can use with an HDInsight cluster to examine data held in a variety of formats. You can use it to create, load, and query external tables, in a manner similar to PolyBase for Azure Synapse Analytics

Azure HDInsight

a big data processing service that provides the platform for technologies such as Spark in an Azure environment. It implements a clustered model that distributes processing across a set of computers. This model is similar to that used by Synapse Analytics, except that the nodes are running the Spark processing engine rather than Azure SQL Database. You can use it in conjunction with, or instead of, Azure Synapse Analytics.

data

a collection of facts such as numbers, descriptions, and observations used in decision making. You can classify data as structured, semi-structured, or unstructured.

A document in Azure CosmosDB is a what?

a collection of fields, identified by a key

examples of data that don't fit well into a relational database

a collection of music, video, or other media files

role definition

a collection of permissions that lists the operations that can be performed, such as read, write, and delete. A role definition can be built-in or custom.

what do you specify when creating an index

a column from the table
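A minimal sketch with hypothetical names:

    -- Builds a sorted copy of the Phone column, with pointers back to the table rows
    CREATE INDEX idx_customers_phone
    ON Customers (Phone);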

example of streaming data use case

a system that monitors a building for smoke and heat needs to trigger alarms and unlock doors to allow residents to escape immediately in the event of a fire.

view

a virtual table based on the result set of a query.
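An illustrative sketch, assuming a hypothetical Customers table with a City column:

    CREATE VIEW LondonCustomers AS
    SELECT CustomerID, CustomerName
    FROM Customers
    WHERE City = 'London';

    -- Query and filter the view in much the same way as a table
    SELECT * FROM LondonCustomers
    WHERE CustomerName LIKE 'A%';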

data role orientations

- business
- engineering
- research

version of SQL Server that can be run on Virtual Machines

full

where to find SQL Server Data Tools

in the Tools menu of Visual Studio

What are the key characteristics of Azure Data Lake?

- Data Lake Storage organizes your files into directories and subdirectories for improved file organization. Blob storage can only mimic a directory structure.
- Data Lake Storage supports the Portable Operating System Interface (POSIX) file and directory permissions to enable granular Role-Based Access Control (RBAC) on your data.
- Azure Data Lake Storage is compatible with the Hadoop Distributed File System (HDFS). Hadoop is a highly flexible and programmable analysis service, used by many organizations to examine large quantities of data. All Apache Hadoop environments can access data in Azure Data Lake Storage Gen2.

port on which Azure SQL Database communicates

1433

When a cluster of servers is used within in a single region, how is data stored?

A copy of all data is held in each server in the cluster.

Options for storing data in a document in Azure CosmosDB

A document can hold up to 2 MB of data, including small binary objects. If you need to store larger blobs as part of a document, use Azure Blob storage, and add a reference to the blob in the document.

two organizations that standardized SQL

American National Standards Institute (ANSI) in 1986
International Organization for Standardization (ISO) in 1987

enables you to build tabular models to support online analytical processing (OLAP) queries. You can combine data from multiple sources, including Azure SQL Database, Azure Synapse Analytics, Azure Data Lake store, Azure Cosmos DB, and many others. You use these data sources to build models that incorporate your business knowledge. A model is essentially a set of queries and expressions that retrieve data from the various data sources and generate results. The results can be cached in-memory for later use, or they can be calculated dynamically, directly from the underlying data sources.

Azure Analysis Services

enables you to dig deeply into the data that for example was stored using Azure Synapse services, and generate insights from the information

Azure Analysis Services

performs a detailed analysis of data that has been stored using Azure Synapse services

Azure Analysis Services

The data generated in Azure Synapse Analytics can be used as input to further analytical processing, using what?

Azure Analysis Services.

Common use cases for this service:
- Serving images or documents directly to a browser, in the form of a static website. Visit Static website hosting in Azure storage for detailed information.
- Storing files for distributed access
- Streaming video and audio
- Storing data for backup and restore, disaster recovery, and archiving
- Storing data for analysis by an on-premises or Azure-hosted service

Azure Blob Storage

a service that enables you to store massive amounts of unstructured data, or blobs, in the cloud

Azure Blob Storage

A service in which you can store unstructured data

Azure Blob storage

used as the basis for Azure Data Lake storage. You can use Azure Data Lake storage for performing big data analytics. For more information, visit Introduction to Azure Data Lake Storage Gen2.

Azure Blob storage

Azure Data Lake Store is compatible with what?

Azure Data Lake Store is compatible with the Hadoop Distributed File System (HDFS). You can run Hadoop jobs using Azure HDInsight (see below) that can read and write data in Data Lake Store efficiently.

a graphical utility for creating and running SQL queries from your desktop

Azure Data Studio

The most common options for processing data in Azure include what?

Azure Databricks, Azure Data Factory, Azure Synapse Analytics, and Azure Data Lake.

Synapse link

Azure Synapse Link for Azure Cosmos DB is a cloud-native hybrid transactional and analytical processing (HTAP) capability that enables you to run near real-time analytics over operational data stored in Azure Cosmos DB.

Synapse Link uses a feature of Cosmos DB named Cosmos DB Analytical Store. Cosmos DB Analytical Store contains a copy of the data in a Cosmos DB container, but organized as a column store. Column stores group data by column rather than by row. Column stores are a more optimal format for running analytical workloads that need to aggregate data down a column rather than across a row, such as generating sum totals, averages, maximum or minimum values for a column. Cosmos DB automatically keeps the data in its containers synchronized with the copies in the column store.

Azure Synapse Link enables you to run workloads that retrieve data directly from Cosmos DB and run analytics workloads using Azure Synapse Analytics. The data doesn't have to go through an ETL (extract, transform, and load) process because the data isn't copied into Synapse Analytics; it remains in the Cosmos DB analytical store. Business analysts, data engineers, and data scientists can now use Synapse Spark pools or Synapse SQL pools to run near real-time business intelligence, analytics, and machine learning pipelines. You can achieve this without impacting the performance of your transactional workloads on Azure Cosmos DB.

characteristics of an Azure Virtual Network

Azure ensures that each virtual network is isolated from other virtual networks created by other users, and from the Internet. Azure enables you to specify which machines (real and virtual), and services, are allowed to access resources on the virtual network, and which ports they can use.

using Azure PowerShell to provision resources

Azure provides a series of commandlets (Azure-specific commands) that you can use in PowerShell to create and manage Azure resources. Like the CLI, PowerShell is available for Windows, macOS, and Linux

helps you manage who has access to Azure resources, and what they can do with those resources

Azure role-based access control (Azure RBAC)

Data Lake storage, Blob storage, and File Storage, all require that you first create a what?

Azure storage account.

illustrate transaction atomicity with an example

Bank transfers are a good example; you deduct funds from one account and credit the equivalent funds to another account. If the system fails after deducting the funds, they must be reinstated in the original account (they mustn't be lost). You can then attempt to perform the transfer again. Similarly, you shouldn't be able to credit an account twice with the same funds.

price tier options for Azure Database for PostgreSQL or MySQL

- Basic. This tier is suitable for workloads that require light compute and I/O performance. Examples include servers used for development or testing, or small-scale, infrequently used applications.
- General Purpose. Use this pricing tier for business workloads that require balanced compute and memory with scalable I/O throughput. Examples include servers for hosting web and mobile apps and other enterprise applications.
- Memory Optimized. This tier supports high-performance database workloads that require in-memory performance for faster transaction processing and higher concurrency. Examples include servers for processing real-time data and high-performance transactional or analytical apps.
You can fine-tune the resources available for the selected tier. You can scale these resources up later, if necessary.

Blob redundancy options

Blobs are always replicated three times in the region in which you created your account, but you can also select geo-redundancy, which replicates your data in a second region (at additional cost).

Benefits of Azure Database for MariaDB

- Built-in high availability with no additional cost.
- Predictable performance, using inclusive pay-as-you-go pricing.
- Scaling as needed within seconds.
- Secured protection of sensitive data at rest and in motion.
- Automatic backups and point-in-time-restore for up to 35 days.
- Enterprise-grade security and compliance.

Creating SQL pools

By default, an on-demand SQL pool is created in each Azure Synapse Analytics workspace. You can then create additional pools, either on-demand or provisioned. Note: on-demand pools only allow you to query data held in external files. If you want to ingest and load the data into Synapse Analytics, you must create your own SQL pool.

How a JOIN defines the way two tables are related in a query

By:
- Specifying the column from each table to be used for the join. A typical join condition specifies a foreign key from one table and its associated primary key in the other table.
- Specifying a logical operator (for example, = or <>) to be used in comparing values from the columns.
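A hedged example, assuming hypothetical Orders and Customers tables related through a CustomerID column:

    SELECT o.OrderID, c.CustomerName
    FROM Orders AS o
    JOIN Customers AS c
        ON o.CustomerID = c.CustomerID;  -- foreign key matched to primary key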

most common DDL statements

CREATE: create a new object in the database, such as a table or a view.
ALTER: modify the structure of an object. For instance, altering a table to add a new column.
DROP: remove an object from the database.
RENAME: rename an existing object.

What are the editions of MySQL

Community: free, popular for web applications running under Linux, and some Windows
Standard: higher performance, using a different storage technology
Enterprise: comprehensive, used by commercial organizations, not free

Cosmos DB has been used by which of Microsoft's products for mission critical applications

Cosmos DB is a foundational service in Azure. Cosmos DB has been used by many of Microsoft's products for mission critical applications at global scale, including Skype, Xbox, Microsoft 365, Azure, and many others.

two main logical groups of SQL statements

Data Manipulation Language (DML)
Data Definition Language (DDL)

SQL used to manipulate the rows in a relational table

Data Manipulation Language (DML)

differences between batch and streaming data

Data scope: batch processing can process all the data in the dataset. Stream processing typically only has access to the most recent data received, or data within a rolling time window (the last 30 seconds, for example).
Data size: batch processing is suitable for handling large datasets efficiently. Stream processing is intended for individual records or micro batches consisting of few records.
Performance: the latency for batch processing is typically a few hours. Stream processing typically occurs immediately, with latency in the order of seconds or milliseconds. Latency is the time taken for the data to be received and processed.
Analysis: you typically use batch processing for performing complex analytics. Stream processing is used for simple response functions, aggregates, or calculations such as rolling averages.

How Azure Databricks supports structured stream processing

In this model, Databricks performs your computations incrementally, and continuously updates the result as streaming data arrives.

indicate how the rows in one table are connected with rows in the other to determine what data to return

JOIN clause

Configure authentication in a storage account

Many services include an access key that you can specify when you attempt to connect to the service. If you provide an incorrect key, you'll be denied access. In the Azure portal, you find the access key for an Azure Storage account by selecting Access Keys under Settings on the main page for the account. Many other services allow you to view the access key in the same way from the Azure portal. If your key is compromised, you can generate a new access key. Azure services actually provide two keys, labeled key1 and key2. An application can use either key to connect to the service. Any user or application that knows the access key for a resource can connect to that resource. However, access keys provide a rather coarse-grained level of authentication. Additionally, if you need to regenerate an access key (after accidental disclosure, for example), you may need to update all applications that connect using that key.

tools for the data analyst

Microsoft Power BI and SQL Server Reporting Services

what is Power BI?

Microsoft Power BI is a collection of software services, apps, and connectors that work together to turn your unrelated sources of data into coherent, visually immersive, and interactive insights. Whether your data is a simple Microsoft Excel workbook, or a collection of cloud-based and on-premises hybrid data warehouses, Power BI lets you easily connect to your data sources, visualize (or discover) what's important, and share that with anyone or everyone you want.

when you might need SQL Server in a VM instead of Managed Instance on Azure SQL Database

SQL Database managed instance supports linked servers, although some of the other advanced features required by the database might not be available. If you want a complete match, then running SQL Server on a virtual machine may be your only option, but you need to balance the benefits of complete functionality against the administrative and maintenance overhead required.

common database administrator tools

SQL Server database administrators use SQL Server Management Studio for most of their day-to-day database maintenance activities.
pgAdmin for PostgreSQL systems.
MySQL Workbench for MySQL.
There are also a number of cross-platform database administration tools available. One example is Azure Data Studio.

What type of utility is SSIS?

SSIS is an on-premises utility. However, Azure Data Factory allows you to run your existing SSIS packages as part of a pipeline in the cloud. This allows you to get started quickly without having to rewrite your existing transformation logic. The SSIS Feature Pack for Azure is an extension that provides components that connect to Azure services, transfer data between Azure and on-premises data sources, and process data stored in Azure. The components in the feature pack support transfer to or from Azure storage, Azure Data Lake, and Azure HDInsight. Using these components, you can perform large-scale processing of ingested data.

default security that is required and enforced on your Azure Database for MySQL server

SSL connection security

services that can read from and write to Data Lake Store directly.

Services such as Azure Data Factory, Azure Databricks, Azure HDInsight, Azure Data Lake Analytics, and Azure Stream Analytics can read and write Data Lake Store directly.

when to use a mapping in Azure Data Factory

Sometimes when ingesting data, the data you're bringing in can have different column names and data types to those required by the output. In these cases, you can use a mapping to transform your data from the input format to the output format. The mapping canvas for the Copy Data activity illustrates how the columns from the input data can be mapped to the data format required by the output.

data formats in Spark pools

Spark pools enable you to process data held in many formats, such as csv, json, xml, parquet, orc, and avro. Spark can be extended to support many more formats with external data sources.

Using the Azure command-line interface (CLI) to provision resources

The CLI provides a set of commands that you can run from the operating system command prompt or the Cloud Shell in the Azure portal. You can use these commands to create and manage Azure resources. The CLI is suitable if you need to automate service creation; you can store CLI commands in scripts, and you can run these scripts programmatically. The CLI can run on Windows, macOS, and Linux computers.

This tier has lower performance and incurs reduced storage charges compared to the Hot tier. Use this tier for data that is accessed infrequently. It's common for newly created blobs to be accessed frequently initially, but less so as time passes. In these situations, you can create the blob in the Hot tier, but migrate it to this tier later. You can migrate a blob from this tier back to the Hot tier.

The Cool tier.

used in front of a string to indicate that the string uses the Unicode character set.

The N character
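A minimal illustration (the table and value are hypothetical):

    INSERT INTO Customers (CustomerID, CustomerName)
    VALUES (2, N'José');  -- the N prefix marks the literal as Unicode data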

In CosmosDB, which page enables you to configure replication in more detail. You can replicate to multiple regions, and you select the regions to use. In this way, you can pick the regions that are closest to your consumers, to help minimize the latency of requests made by those consumers.

The Replicate data globally page

tools you can use to query an Azure SQL Database

- The query editor in the Azure portal
- The sqlcmd utility from the command line or the Azure Cloud Shell
- SQL Server Management Studio
- Azure Data Studio
- SQL Server Data Tools

How is Azure Synapse Analytics particularly suitable for ingesting your business's raw data into a data store for analytics using an ELT process

Using Apache Spark, and automated pipelines, Synapse Analytics can run parallel processing tasks across massive datasets, and perform big data analytics.

Here are the basic building blocks in Power BI

- Visualizations
- Datasets
- Reports
- Dashboards
- Tiles

you filter a SQL statement using this keyword

WHERE

apply this to DML statements to specify criteria; only rows that match these criteria will be selected, updated, or deleted.

WHERE clause
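A minimal sketch with hypothetical names:

    -- Only customers with a credit limit over 1000 are selected
    SELECT CustomerID, CustomerName
    FROM Customers
    WHERE CreditLimit > 1000;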

What happens when a user runs a query whose WHERE clause specifies a column that has an index on it?

The database management system can use the index to fetch the data more quickly than if it had to scan through the entire table row by row.

two ways to explore data in Azure Analysis Services

You can explore this data from within Analysis Services, or you can use a tool such as Microsoft Power BI to visualize the data presented by these models.

types of data you load using Data Factory

You can load static data, but you can also ingest streaming data. Loading data from a stream offers a real-time solution for data that arrives quickly or that changes rapidly. Using streaming, you can use Azure Data Factory to continually update the information in a data warehouse with the latest data.

limitations on cloud-based RDBMSs

You may find that there are some functional restrictions in place, and not every feature of your selected database management system may be available. These restrictions are often due to security issues. For example, some features might expose the underlying operating system and hardware to your applications. In these cases, you may need to rework your applications to remove any dependencies on these features.

how you create a SQL database in the cloud

You specify the resources that you require (based on how large you think your databases will be, the number of users, and the performance you require), and Azure automatically creates the necessary virtual machines, networks, and other devices for you

what you use the key SQL commands to do

You use:
- the CREATE TABLE command to create a table
- the INSERT statement to store data in a table
- the UPDATE statement to modify data in a table
- the DELETE statement to remove rows from a table
- the SELECT statement to retrieve data from a table
The example query below finds the details of every customer from the sample database.
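A hedged reconstruction of that example (the Customers table name is assumed):

    SELECT * FROM Customers;  -- retrieves every column of every row in the table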

How do you write Databricks applications?

You write Databricks applications using a Notebook. A notebook contains a series of steps (cells), each of which contains a block of code. For example, one cell might contain the code that connects to a data source, the next cell reads the data from that source and converts it into a model in-memory, the next cell plots a graph, and a final cell saves the data from the in-memory model to a repository.

Azure Files Storage is a fully managed service. What are the replication options?

Your shared data is replicated locally within a region, but can also be geo-replicated to a second region.

telematics

any technology that involves the long-distance transmission of digital information

Use cases for Azure SQL Database

The best option for low cost with minimal administration. It is not fully compatible with on-premises SQL Server installations. It is often used in new cloud projects where the application design can accommodate any required changes to your applications. Azure SQL Database is often used for:
- Modern cloud applications that need to use the latest stable SQL Server features.
- Applications that require high availability.
- Systems with a variable load, that need the database server to scale up and down quickly.

In JSON, the fields in a document are enclosed between what, and each field is prefixed with what?

between braces, { and }, and each field is prefixed with its name.
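A hedged illustration of such a document (the fields are hypothetical):

    {
        "ID": "1",
        "Name": "Mark Hanson",
        "Telephone": "555-0100"
    }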

This type of blob is handled as a set of blocks. Each block can vary in size, up to 100 MB. This type of blob can contain up to 50,000 blocks, giving a maximum size of over 4.7 TB. A block is the smallest amount of data that can be read or written as an individual unit. This type of blob is best used to store discrete, large, binary objects that change infrequently.

block blob

What is one notable feature of MariaDB

built-in support for temporal data. A table can hold several versions of data, enabling an application to query the data as it appeared at some point in the past.

who analytical systems are designed for

business users who need to query data and gain a big picture view of the information held in a database

What is data ingestion the process of?

capturing data

What analytical systems are concerned with

capturing raw data and using it to generate insights to make business decisions

describe the data engineer role

- collaborates with stakeholders to design and implement data-related assets that include data ingestion pipelines, cleansing and transformation activities, and data stores for analytical workloads
- uses a wide range of data platform technologies, including relational and nonrelational databases, file stores, and data streams
- is responsible for ensuring that the privacy of data is maintained within the cloud, and spanning from on-premises to the cloud data stores
- owns the management and monitoring of data stores and data pipelines to ensure that data loads perform as expected

after you provision a resource, what will you often need to do?

configure it to meet the needs of your applications and environment. For example, you might need to set up network access, or open a firewall port to enable your applications to connect to the resource

Examples of where ingested data might come from.

- control devices measuring environmental information such as temperature and pressure
- point-of-sale devices recording the items purchased by a customer in a supermarket
- financial data recording the movement of money between bank accounts
- weather data from weather stations
Some of this data might come from a separate OLTP system.

example of using insights generated by an analytical system

detailed insights for a manufacturing company might indicate trends enabling them to determine which product lines to focus on, for profitability.

What does a service provider set up during provisioning?

- disks
- memory
- CPUs
- networks
...and more, required to run the service

Other models, collectively known as NoSQL databases exist. These models store data in other structures, such as what?

documents, graphs, key-value stores, and column family stores.

describe the data analyst role

- enables businesses to maximize the value of their data assets
- is responsible for designing and building scalable models, cleaning and transforming data, and enabling advanced analytics capabilities through reports and visualizations
- processes raw data into relevant insights based on identified business requirements

credentials needed to connect to Azure Database for PostgreSQL

full server name and admin sign-in credentials

how to access the query editor in the Azure portal

go to the page for your database and select Query editor.

volume level of transactional systems

high, sometimes handling many millions of transactions in a single day

how many indexes can you create

many

The data types that most database management systems support

numeric types such as INT
string types such as VARCHAR (VARCHAR stands for variable length character data)

where to find the server name and the name of the default administrator account to connect to a PostgreSQL database

on the Overview page for the Azure Database for PostgreSQL instance in the Azure portal. Contact your administrator for the password.

where you can find connection information for Azure Database for MySQL

on the Overview page for your server

where you can find the server name and sign in information for Azure SQL Database.

on the server Overview page in the portal.

maximum number of clustered indexes allowed on a table

one

what rows have

one or more columns that define the properties of the entity, such as the customer name, or product ID.

The documents in a CosmosDB container are grouped together into what?

partitions. A partition holds a set of documents that share a common partition key. You designate one of the fields in your documents as the partition key.

advantages of SQL Server in the cloud

scalability: scale up or down (increase or decrease the size and number of resources) quickly, as the volume of data and the amount of work being done varies. Azure handles this scaling for you, and you don't have to manually add or remove virtual machines, or perform any other form of configuration.

which DBMSs support clustered indexes

some of them

What a join operation does

spans the relationships between tables, enabling you to retrieve the data from more than one table at a time.

To help ensure fast access, Azure Table Storage does what?

splits a table into partitions

three classifications of data

structured, semi-structured, and unstructured

describe the relationships between the Orders, Customers, and Products tables

the Orders table contains both a Customer ID and a Product ID. The Customer ID relates to the Customers table to identify the customer that placed the order, and the Product ID relates to the Products table to indicate what product was purchased.
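As a sketch, a query that follows these relationships might look like the following (the column names are assumed for illustration):

SELECT c.CustomerName, p.ProductName
FROM Orders AS o
JOIN Customers AS c ON o.CustomerID = c.CustomerID
JOIN Products AS p ON o.ProductID = p.ProductID;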

Where you can find the server, database, account name, and password for an Azure SQL Database

the Overview page for a database in the Azure portal: select Show database connection strings. The database connection string shown in the Azure portal does not include the password for the account. You must contact your database administrator for this information.
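A connection string for Azure SQL Database typically has a shape similar to the following sketch; the server, database, and user values here are placeholders, and you must substitute the password yourself:

Server=tcp:myserver.database.windows.net,1433;Initial Catalog=mydatabase;User ID=myadmin;Password={your_password};Encrypt=True;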

Azure File Storage exposes file shares using what

the Server Message Block 3.0 (SMB) protocol This is the same file sharing protocol used by many existing on-premises applications. These applications should continue to work unchanged if you migrate your file shares to the cloud. The applications can be running on-premises, or in the cloud. You can control access to shares in Azure File Storage using authentication and authorization services available through Azure Active Directory Domain Services.

ways to batch process

you can process data based on a scheduled time interval (for example, every hour), or it could be triggered when a certain amount of data has arrived, or as the result of some other event.

What tool can you use if you have existing MySQL, MariaDB, or PostgreSQL databases running on-premises, and you want to move the data to a database running the corresponding data services in Azure?

you can use the Azure Database Migration Service (DMS).

What are the key features of pgsql?

you can write stored procedures that run inside the database.
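As a minimal sketch, a stored function written in this language might look like the following (all names here are hypothetical):

CREATE FUNCTION get_order_count(cust_id INT) RETURNS BIGINT
LANGUAGE plpgsql
AS $$
BEGIN
    -- count the orders placed by the given customer
    RETURN (SELECT COUNT(*) FROM orders WHERE customer_id = cust_id);
END;
$$;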

To connect to Azure Database for PostgreSQL from Azure Data Studio,

you must first install the PostgreSQL extension. On the Extensions page, search for postgresql. Select the PostgreSQL extension, and then select Install. To use the extension to connect to PostgreSQL:
- In Azure Data Studio, go to the SERVERS sidebar, and select New Connection.
- In the Connection dialog box, in the Connection type drop-down list box, select PostgreSQL.
- Fill in the remaining fields using the server name, user name, and password for your PostgreSQL server:
-- Server Name: the fully qualified server name from the Azure portal.
-- User name: the user name you want to sign in with. This must be in the format shown in the Azure portal, <username>@<hostname>.
-- Password: the password for the account you're logging in with.
-- Database name: fill this in if you want the connection to specify a database.
-- Server Group: this option lets you assign this connection to a specific server group you create.
-- Name (optional): this option lets you specify a friendly name for your server.
- Select Connect to establish the connection.
After successfully connecting, your server opens in the SERVERS sidebar. You can expand the Databases node to connect to databases on the server and view their contents. Use the New Query command in the toolbar to create and run queries.

The result of normalization

your data is split into a large number of narrow, well-defined tables (a narrow table is a table with few columns), with references from one table to another.

number of rows a relational table can have

zero to many

examples of streaming data

- A financial institution tracks changes in the stock market in real time, computes value-at-risk, and automatically rebalances portfolios based on stock price movements. - An online gaming company collects real-time data about player-game interactions, and feeds the data into its gaming platform. It then analyzes the data in real time, and offers incentives and dynamic experiences to engage its players. - A real-estate website tracks a subset of data from consumers' mobile devices, and makes real-time recommendations of properties to visit based on their geo-location.

the main characteristics of a relational database

- All data is tabular. Entities are modeled as tables, each instance of an entity is a row in the table, and each property is defined as a column. - All rows in the same table have the same set of columns. - A table can contain any number of rows. - A primary key uniquely identifies each row in a table. No two rows can share the same primary key. - A foreign key references rows in another, related table. - For each value in the foreign key column, there should be a row with the same value in the corresponding primary key column in the other table.
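A hypothetical pair of table definitions illustrating a primary key and a foreign key (all names are placeholders):

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    CustomerName VARCHAR(50)
);

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);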

Things you can control with Role-Based Access Control

- Allow one user to manage virtual machines in a subscription and another user to manage virtual networks. - Allow a database administrator group to manage SQL databases in a subscription. - Allow a user to manage all resources in a resource group, such as virtual machines, websites, and subnets. - Allow an application to access all resources in a resource group.

services needed for Managed Instance option in Azure SQL Database

- Azure Storage for backups - Azure Event Hubs for telemetry - Azure Active Directory for authentication - Azure Key Vault for Transparent Data Encryption (TDE) - a couple of Azure platform services that provide security and supportability features.

steps for connecting to an Azure SQL Database

- Clients connect to a gateway that has a public IP address and listens on port 1433. - Depending on the effective connection policy, the gateway either redirects traffic to the database cluster, or acts as a proxy for the database cluster. - Inside the database cluster, traffic is forwarded to the appropriate Azure SQL database. Note: Azure SQL Database uses a clustered topology to provide high availability. Each server and database is transparently replicated to ensure that a server is always accessible, even in the event of a database or server failure.

list of roles and responsibilities of a data engineer

- Developing, constructing, testing, and maintaining databases and data structures. - Aligning the data architecture with business requirements. - Data acquisition. - Developing processes for creating and retrieving information from data sets. - Using programming languages and tools to examine the data. - Identifying ways to improve data reliability, efficiency, and quality. - Conducting research for industry and business questions. - Deploying sophisticated analytics programs, machine learning, and statistical methods. - Preparing data for predictive and prescriptive modeling. - Using data to discover tasks that can be automated.

The Configuration page for a storage account enables you to modify some general settings of the account. What can you do?

- Enable or disable secure communications with the service. By default, all requests and responses are encrypted by using the HTTPS protocol as they traverse the Internet. You can disable encryption if required, although this isn't recommended. - Switch the default access tier between Cool and Hot. - Change the way in which the account is replicated. - Enable or disable integration with Azure AD for requests that access file shares.

What does the hyperscale option for Azure Database for PostgreSQL support?

- Horizontal scaling across multiple machines. This option enables the service to add and remove computers as workloads increase and diminish. - Query parallelization across these servers. The service can split resource intensive queries into pieces which can be run in parallel on the different servers. The results from each server are aggregated back together to produce a final result. This mechanism can deliver faster responses on queries over large datasets. - Excellent support for multi-tenant applications, real time operational analytics, and high throughput transactional workloads

list the common roles and responsibilities of database administrators

- Installing and upgrading the database server and application tools. - Allocating system storage and planning storage requirements for the database system. - Modifying the database structure, as necessary, from information given by application developers. - Enrolling users and maintaining system security. - Ensuring compliance with database vendor license agreement. - Controlling and monitoring user access to the database. - Monitoring and optimizing the performance of the database. - Planning for backup and recovery of database information. - Maintaining archived data. - Backing up and restoring databases. - Contacting database vendor for technical support. - Generating various reports by querying from database as per need. - Managing and monitoring data replication. - Acting as liaison with users.

list of roles and responsibilities of a data analyst

- Making large or complex data more accessible, understandable, and usable. - Creating charts and graphs, histograms, geographical maps, and other visual models that help to explain the meaning of large volumes of data, and isolate areas of interest. - Transforming, improving, and integrating data from many sources, depending on the business requirements. - Combining the data result sets across multiple sources. For example, combining sales data and weather data provides a useful insight into how weather influenced sales of certain products such as ice creams. - Finding hidden patterns using data. - Delivering information in a useful and appealing way to users by creating rich graphical dashboards and reports.

disadvantages of batch processing

- The time delay between ingesting the data and getting the results. - All of a batch job's input data must be ready before a batch can be processed. This means data must be carefully checked. Problems with data, errors, and program crashes that occur during batch jobs bring the whole process to a halt. The input data must be carefully checked before the job can be run again. Even minor data errors, such as typographical errors in dates, can prevent a batch job from running.

other features of blobs

- Versioning. You can maintain and restore earlier versions of a blob. - Soft delete. This feature enables you to recover a blob that has been removed or overwritten, by accident or otherwise. - Snapshots. A snapshot is a read-only version of a blob at a particular point in time. - Change Feed. The change feed for a blob provides an ordered, read-only, record of the updates made to a blob. You can use the change feed to monitor these changes, and perform operations such as: -- Update a secondary index, synchronize with a cache, search-engine, or any other content-management scenarios. -- Extract business analytics insights and metrics, based on changes that occur to your objects, either in a streaming manner or batched mode. -- Store, audit, and analyze changes to your objects, over any period of time, for security, compliance or intelligence for enterprise data management. -- Build solutions to back up, mirror, or replicate object state in your account for disaster management or compliance. -- Build connected application pipelines that react to change events or schedule executions based on created or changed objects.

What threat protection provides

- adds security intelligence to your service. This intelligence monitors the service and detects unusual patterns of activity that could be harmful, or compromise the data managed by the service. - identifies potential security vulnerabilities and recommends actions to mitigate them.

advantages of a cloud-based RDBMS

- hosted and managed offsite - no capital expense - regular backups - only pay for what you use - easily connect globally - near-instant provision because everything is already configured

typical database administration tasks performed in the Azure portal

- increasing the database size - creating a new database - deleting an existing database - dynamically manage and adjust resources such as the data storage size and the number of cores available for the database processing.

experience and skills data engineers need

- programming - mathematics - computer science - soft skills to communicate data trends to others in the organization and to help the business make use of the data it collects.

The maximum size of a single file in Azure File Storage is what?

1 TiB but you can set quotas to limit the size of each share below this figure.

Describe each of the three elements of a role assignment in Role Based Access Control

1) A security principal is an object that represents a user, group, service, or managed identity that is requesting access to Azure resources. 2) A role definition, often abbreviated to role, is a collection of permissions. A role definition lists the operations that can be performed, such as read, write, and delete. Roles can be given high-level names, like owner, or specific names, like virtual machine reader. Azure includes several built-in roles that you can use, including: Owner - Has full access to all resources including the right to delegate access to others. Contributor - Can create and manage all types of Azure resources but can't grant access to others. Reader- Can view existing Azure resources. User Access Administrator - Lets you manage user access to Azure resources. You can also create your own custom roles. For detailed information, see Create or update Azure custom roles using the Azure portal on the Microsoft website. 3) A scope lists the set of resources that the access applies to. When you assign a role, you can further limit the actions allowed by defining a scope. This is helpful if, for example, you want to make someone a Website Contributor, but only for one resource group.
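For illustration, a role assignment combining these three elements can be created from the Azure CLI; the assignee and scope values below are placeholders:

az role assignment create --assignee "user@example.com" --role "Reader" --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"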

two routes data in the Data Lake can follow

1) converted into a normalized format suitable for analysis and stored using Azure Synapse Analytics services 2) use Azure Databricks to perform other forms of data preparation, such as additional transformations or cleaning to remove anomalies. You can then store the data using Azure Synapse Analytics service if required

two ways to process raw data

1) process each data item as it arrives (streaming) 2) buffer the raw data and process it in groups (batch processing)

Azure File Storage enables you to share up to how much data in a single storage account?

100 TB. This data can be distributed across any number of file shares in the account.

Azure File Storage supports up to how many concurrent connections per shared file?

2000

Azure aims to provide up to what rate of throughput for a single Standard file share?

300 MB/second for a single Standard file share, but you can increase throughput capacity by creating a Premium file share, for additional cost.

port over which connections to Azure Database for MySQL communicate

3306

How does a Shared Access Signature (SAS) work?

A SAS is a token that an application can use to connect to the resource. The application appends the token to the URL of the resource. The application can then send requests to read or write data using this URL and token. You can create a token that grants temporary access to the entire service, containers in the service, or individual objects such as blobs and files.
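For example, a request URL with an appended SAS token has roughly the following shape; the account, container, and blob names are placeholders, and elided token values are left as ...:

https://myaccount.blob.core.windows.net/mycontainer/myblob.txt?sv=...&st=...&se=...&sr=b&sp=r&sig=...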

A common approach to ingesting your business's raw data into a data store for analytics

A common approach that you can use with Azure Synapse Analytics is to: - extract the data from where it's currently stored, - load this data into an analytical data store, and then - transform the data, shaping it for analysis. This approach is known as ELT, for extract, load, and transform.

A common flow of work in Power BI

A common flow of work in Power BI begins in Power BI Desktop, where a report is created. That report is then published to the Power BI service and finally shared, so that users of Power BI Mobile apps can consume the information.

provides a convenient way of grouping related blobs together, and you can organize blobs in a hierarchy of folders, similar to files in a file system on disk. You control who can read and write blobs at this level

A container

a repository for large quantities of raw data. also, a staging point for your ingested data, before it's massaged and converted into a format suitable for performing analytics.

A data lake

represents the data that you want to ingest (input) or store (output) in Azure Data Factory. If it has a structure, a dataset specifies how the data is structured. Not all datasets are structured. Blobs held in Azure Blob storage are an example of unstructured data.

A dataset

Power BI dataset

A dataset is a collection of data that Power BI uses to create its visualizations. You can have a simple dataset that's based on a single table from a Microsoft Excel workbook, similar to what's shown in the following image. Datasets can also be a combination of many different sources, which you can filter and combine to provide a unique collection of data (a dataset) for use in Power BI. For example, you can create a dataset from three database fields, one website table, an Excel table, and online results of an email marketing campaign. That unique combination is still considered a single dataset, even though it was pulled together from many different sources. Filtering data before bringing it into Power BI lets you focus on the data that matters to you. For example, you can filter your contact database so that only customers who received emails from the marketing campaign are included in the dataset. You can then create visuals based on that subset (the filtered collection) of customers who were included in the campaign. Filtering helps you focus your data—and your efforts. An important and enabling part of Power BI is the multitude of data connectors that are included. Whether the data you want is in Excel or a Microsoft SQL Server database, in Azure or Oracle, or in a service like Facebook, Salesforce, or MailChimp, Power BI has built-in data connectors that let you easily connect to that data, filter it if necessary, and bring it into your dataset. After you have a dataset, you can begin creating visualizations that show different portions of it in different ways, and gain insights based on what you see. That's where reports come in.

the purpose for Azure File Storage

A file share enables you to store a file on one computer, and grant access to that file to users and applications running on other computers. This strategy can work well for computers in the same local area network, but doesn't scale well as the number of users increases, or if users are located at different sites.

This can automatically move a blob from Hot to Cool, and then to the Archive tier, as it ages and is used less frequently (based on the number of days since modification). This can also arrange to delete outdated blobs.

A lifecycle management policy

a Databricks notebook contains what?

A notebook contains cells, each of which contains a separate block of code. When you run a notebook, the code in each cell is passed to Spark in turn for execution. A cell might, for example, run a query and generate a graph.

a logical grouping of activities that together perform a task. The activities in a pipeline define actions to perform on your data. For example, you might use a copy activity to transform data from a source dataset to a destination dataset. You could include activities that transform the data as it is transferred, or you might combine data from multiple sources together. Other activities enable you to incorporate processing elements from other services. For example, you might use an Azure Function activity to run an Azure Function to modify and filter data, or an Azure Databricks Notebook activity to run a notebook that performs more advanced processing. They don't have to be linear. You can include logic activities that repeatedly perform a series of tasks while some condition is true using a ForEach activity, or follow different processing paths depending on the outcome of previous processing using an If Condition activity.

A pipeline

pipeline

A pipeline is a logical grouping of activities that performs a unit of work. Together, the activities in a pipeline perform a task. For example, a pipeline might contain a series of activities that ingests raw data from Azure Blob storage, and then runs a Hive query on an HDInsight cluster to partition the data and store the results in a Cosmos DB database.

Synapse pipeline

A pipeline is a logical grouping of activities that together perform a task. For example, a pipeline could contain a set of activities that ingest and clean log data, and then kick off a mapping data flow to analyze the log data. The pipeline allows you to manage the activities as a set instead of each one individually. You deploy and schedule the pipeline instead of the activities independently. The activities in a pipeline define actions to perform on your data. For example, you may use a copy activity to copy data from Azure Blob Storage into Azure Synapse using a SQL pool. Then, use a data flow activity or a notebook activity using a Spark pool to process and generate a machine learning model. Synapse pipelines use the same Data Integration engine used by Azure Data Factory. This gives you the power in Synapse Studio to create pipelines that can connect to over 90 sources from flat files, databases, or online services. You can create codeless data flows that let you do complex mappings and transformations on data as it flows into your analytic solutions. The example below shows a pipeline with three activities. The pipeline ingests data, and then uses a Spark notebook to generate a machine learning model. The Azure function at the end of the pipeline tests the machine learning model to validate it.

visualization

A visualization (sometimes also referred to as a visual) is a visual representation of data, like a chart, a color-coded map, or other interesting things you can create to represent your data visually. Power BI has all sorts of visualization types, and more are coming all the time. The following image shows a collection of different visualizations that were created in the Power BI service. Visualizations can be simple, like a single number that represents something significant, or they can be visually complex, like a gradient-colored map that shows voter sentiment about a certain social issue or concern. The goal of a visual is to present data in a way that provides context and insights, both of which would probably be difficult to discern from a raw table of numbers or text.

the four properties transactions must adhere to to ensure database consistency

ACID (Atomicity, Consistency, Isolation, Durability)
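As a sketch of atomicity, a Transact-SQL transaction groups statements so they succeed or fail as a single unit (the table and values here are hypothetical):

BEGIN TRANSACTION;
-- both updates commit together, or neither takes effect
UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
COMMIT TRANSACTION;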

Cosmos DB provides what to enable you to access documents using a set of well-known interfaces?

APIs

contains a list of resources, and the objects (users, computers, and applications) that are allowed to access those resources. When an object attempts to use a resource that is protected by this, if it's not in the list, it won't be given access

Access Control List (ACL)

defines what a user or application can do with your resources once they've been authenticated

Access control

requirements for data ingestion

- Accuracy: if you lose any data at this point, then any resulting information can be inaccurate, failing to represent the facts on which you might base your business decisions. - Speed: in a big data system, data ingestion has to be fast enough to capture the large quantities of data that may be heading your way, and have enough compute power to process this data in a timely manner.

default data security level for Azure storage accounts

All data held in an Azure Storage account is automatically encrypted.

What encryption is offered for Azure File Storage?

All data is encrypted at rest, and you can enable encryption for data in-transit between Azure File Storage and your applications.

API

An API is an Application Programming Interface. Database management systems (and other software frameworks) provide a set of APIs that developers can use to write programs that need to access data. The APIs will often be different for different database management systems.

What does provisioning a CosmosDB account provide?

An Azure Cosmos DB account by itself doesn't really provide any resources other than a few pieces of static infrastructure. Databases and containers are the primary resource consumers.

using Azure Resource Manager templates to provision resources

An Azure Resource Manager template describes the service (or services) that you want to deploy in a text file, in a format known as JSON (JavaScript Object Notation). The example here shows a template that you can use to provision an instance of Azure SQL Database. You send the template to Azure using the az deployment group create command in the Azure CLI, or New-AzResourceGroupDeployment command in Azure PowerShell.
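A heavily trimmed, hypothetical sketch of such a template, and the Azure CLI command that deploys it (the resource names, API version, and file name are placeholders):

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.Sql/servers/databases",
      "apiVersion": "2020-08-01-preview",
      "name": "myserver/mydatabase",
      "location": "eastus"
    }
  ]
}

az deployment group create --resource-group myResourceGroup --template-file template.json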

Power BI app

An app is a collection of preset, ready-made visuals and reports that are shared with an entire organization. Using an app is like microwaving a TV dinner or ordering a fast-food value meal: you just have to press a few buttons or make a few comments, and you're quickly served a collection of entrees designed to go together, all presented in a tidy, ready-to-consume package. For these software services, the Power BI service provides a collection of ready-made visuals that are pre-arranged on dashboards and reports for your organization. This collection of visuals is called an app. Apps get you up and running quickly, with data and dashboards that your organization has created for you. For example, when you use the GitHub app, Power BI connects to your GitHub account (after you provide your credentials) and then populates a predefined collection of visuals and dashboards in Power BI. There are apps for all sorts of online services; you can browse the available apps, listed in alphabetical order, by selecting the Get button in the Services box, and there are many apps to choose from.

Apache Storm

Apache Storm is a scalable, fault tolerant platform for running real-time data processing applications. Storm can process high volumes of streaming data using comparatively modest computational requirements. Storm is designed for reliability, so that events shouldn't be lost. Storm solutions can also provide guaranteed processing of data, with the ability to replay data that wasn't successfully processed the first time. Storm can interoperate with a variety of event sources, including Azure Event Hubs, Azure IoT Hub, Apache Kafka, and RabbitMQ (a message queuing service). Storm can also write to data stores such as HDFS, Hive, HBase, Redis, and SQL databases. You write a Storm application using the APIs provided by Apache.

It is optimized to support append operations. You can only add blocks to the end of one of these; updating or deleting existing blocks isn't supported. Each block can vary in size, up to 4 MB. The maximum size is just over 195 GB.

Append blobs

Describe how you might use the components of HDInsight in a data warehousing solution.

As well as Spark, HDInsight supports streaming technologies such as Apache Kafka, and the Apache Hadoop processing model. Note: Hadoop is an open source framework that breaks large data processing problems down into smaller chunks and distributes them across a cluster of servers, similar to the way in which Synapse Analytics operates. Hive is a SQL-like query facility that you can use with an HDInsight cluster to examine data held in a variety of formats. You can use it to create, load, and query external tables, in a manner similar to PolyBase for Azure Synapse Analytics.

comparison of Azure Analysis Services with Azure Synapse Analytics

Azure Analysis Services has significant functional overlap with Azure Synapse Analytics, but it's more suited for processing on a smaller scale. Use Azure Synapse Analytics for: - Very high volumes of data (multi-terabyte to petabyte sized datasets). - Very complex queries and aggregations. - Data mining, and data exploration. - Complex ETL operations. ETL stands for Extract, Transform, and Load, and refers to the way in which you can retrieve raw data from multiple sources, convert this data into a standard format, and store it. - Low to mid concurrency (128 users or fewer). Use Azure Analysis Services for: - Smaller volumes of data (a few terabytes). - Multiple sources that can be correlated. - High read concurrency (thousands of users). - Detailed analysis, and drilling into data, using functions in Power BI. - Rapid dashboard development from tabular data.

Azure Data Lake Storage is essentially an extension of what?

Azure Blob storage, organized as a near-infinite file system.

a data ingestion and transformation service that allows you to load raw data from many different sources, both on-premises and in the cloud. As it ingests the data, it can clean, transform, and restructure the data, before loading it into a repository

Azure Data Factory

a service that can ingest large amounts of raw, unorganized data from relational and non-relational systems, and convert this data into meaningful information. It provides a scalable and programmable ingestion engine that you can use to implement complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.

Azure Data Factory

Azure provides a collection of what services you can use to build a modern data warehouse solution?

Azure Data Factory, Azure Data Lake Storage, Azure Databricks, Azure Synapse Analytics, and Azure Analysis Services. You can use tools such as Power BI to analyze and visualize the data, generating reports, charts, and dashboards.

popular tools used with Azure to ingest data

Azure Data Factory, PolyBase, SQL Server Integration Services, and Azure Databricks.

Azure Data Lake Analytics

Azure Data Lake Analytics is an on-demand analytics job service that you can use to process big data. It provides a framework and set of tools that you use to analyze data held in Microsoft Azure Data Lake Store, and other repositories. You write jobs that contain queries to transform data and extract insights. You define a job using a language called U-SQL. This is a hybrid language that takes features from both SQL and C#, and provides declarative and procedural capabilities that you can use to process data.

Azure Data Lake Store security

Azure Data Lake Store provides granular security over data, using Access Control Lists. An Access Control List specifies which accounts can access which files and folders in the store. If you are more familiar with Linux, you can use POSIX-style permissions to grant read, write, and search access based on file ownership and group membership of users.

What is Azure Data Lake and what are its three main elements?

Azure Data Lake is a collection of analytics and storage services that you can combine to implement a big data solution. It comprises three main elements: Data Lake Store, Data Lake Analytics, and HDInsight.

How can you run on-premises SSIS packages from the cloud, or run Azure Data Factory pipelines from SSIS on-premises?

Azure Data Factory allows you to run your existing SSIS packages as part of a pipeline in the cloud. This allows you to get started quickly without having to rewrite your existing transformation logic. The SSIS Feature Pack for Azure is an extension that provides components that connect to Azure services, transfer data between Azure and on-premises data sources, and process data stored in Azure. The components in the feature pack support transfer to or from Azure storage, Azure Data Lake, and Azure HDInsight. Using these components, you can perform large-scale processing of ingested data.

Which two Azure relational databases are provisioned in a similar manner?

Azure Database for PostgreSQL and Azure Database for MySQL

Benefits of Azure Database for PostgreSQL

Azure Database for PostgreSQL is a highly available service. It contains built-in failure detection and failover mechanisms. Users of PostgreSQL will be familiar with the pgAdmin tool, which you can use to manage and monitor a PostgreSQL database. You can continue to use this tool to connect to Azure Database for PostgreSQL. However, some server-focused functionality, such as performing server backup and restore, is not available because the server is managed and maintained by Microsoft. Azure Database for PostgreSQL records information about the queries run against databases on the server, and saves it in a database named azure_sys. You can query the query_store.qs_view view to see this information, and use it to monitor the queries that users are running. This information can prove invaluable if you need to fine-tune the queries performed by your applications.
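For example, a simple way to inspect recent entries might be the following sketch (the view name comes from the text above; the row limit is arbitrary):

SELECT * FROM query_store.qs_view LIMIT 10;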

What is Azure Databricks?

Azure Databricks is an analytics platform optimized for the Microsoft Azure cloud services platform. Databricks is based on Spark, and is integrated with Azure to streamline workflows. It provides an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts. Databricks can process data held in many different types of storage, including Azure Blob storage, Azure Data Lake Store, Hadoop storage, flat files, SQL databases, and data warehouses, and Azure services such as Cosmos DB. Databricks can also process streaming data. For example, you could capture data being streamed from sensors and other devices.

What is Azure Databricks?

Azure Databricks is an analytics platform optimized for the Microsoft Azure cloud services platform. Designed with the founders of Apache Spark, Databricks is integrated with Azure to provide one-click setup, streamlined workflows, and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts. Databricks can process data held in many different types of storage, including Azure Blob storage, Azure Data Lake Store, Hadoop storage, flat files, databases, and data warehouses. Databricks can also process streaming data. Databricks uses an extensible architecture based on drivers. A driver is a piece of code that connects to a specific data source and enables you to read and write that source. A driver is typically provided as part of a library that you can load into the Databricks environment. Drivers are available for many Azure services, including Azure SQL Database, Azure Cosmos DB, Azure Blob storage, and Azure Data Lake storage, as well as many services and databases produced by third-parties, such as MySQL and PostgreSQL.

How you code Databricks

Azure Databricks provides a graphical user interface where you can define and test your processing step by step, before submitting it as a set of batch tasks.

Azure HDInsight

Azure HDInsight is a managed analytics service in the cloud. It's based on Apache Hadoop, a collection of open-source tools and utilities that enable you to run processing tasks over large amounts of data. HDInsight uses a clustered model, similar to that of Synapse Analytics. HDInsight stores data using Azure Data Lake storage. You can use HDInsight to analyze data using frameworks such as Hadoop Map/Reduce, Apache Spark, Apache Hive, Apache Kafka, Apache Storm, R, and more. Hadoop Map/Reduce uses a simple framework to split a task over a large dataset into a series of smaller tasks over subsets of the data that can be run in parallel, and the results then combined. You write your Map/Reduce code in a language such as Java, and then submit this code as a job to the Hadoop cluster. Hadoop Map/Reduce has largely been replaced by Spark, which offers a more advanced set of operations and a simpler interface. Like Map/Reduce jobs, Spark jobs are parallelized into a series of subtasks that run on the cluster. You can write Spark jobs as part of an application, or you can use interactive notebooks. These notebooks are the same as those that you can run from Azure Databricks. Spark includes libraries that you can use to read and write data in a wide variety of data stores (not just HDFS). For example, you can connect to relational databases such as Azure SQL Database, and other services such as Azure Cosmos DB.

Configure connectivity from private endpoints

Azure Private Endpoint is a network interface that connects you privately and securely to a service powered by Azure Private Link. Private Endpoint uses a private IP address from your VNet, effectively bringing the service into your VNet. The service could be an Azure service such as Azure Storage, Azure Cosmos DB, SQL, or your own Private Link Service. The Private endpoint connections page for a service allows you to specify which private endpoints, if any, are permitted access to your service. You can use the settings on this page, together with the Firewalls and virtual networks page, to completely lock down users and applications from accessing public endpoints to connect to your Cosmos DB account.

helps you manage who has access to Azure resources, and what they can do with those resources.

Azure Role-Based Access Control (RBAC)

relational databases offered as Azure PaaS

Azure SQL Database, Azure Database for PostgreSQL, Azure Database for MySQL, and Azure Database for MariaDB

an analytics engine designed to process large amounts of data very quickly. you can ingest data from external sources, such as flat files, Azure Data Lake, or other database management systems, and then transform and aggregate this data into a format suitable for analytics processing. You can perform complex queries over this data and generate reports, graphs, and charts.

Azure Synapse Analytics

what you should keep in mind about Azure Synapse Analytics consumption

Azure Synapse Analytics can consume a lot of resources. If you aren't planning on performing any processing for a while, you can pause the service. This action releases the resources in the pool to other users, and reduces your costs.

Azure Synapse Analytics

Azure Synapse Analytics is an integrated analytics service that allows organizations to gain insights quickly from all their data, at any scale, from both data warehouses and big data analytics systems. Azure Synapse is composed of the following elements: - Synapse SQL pool: a collection of servers running Transact-SQL. Transact-SQL is the dialect of SQL used by Azure SQL Database and Microsoft SQL Server. You write your data processing logic using Transact-SQL. - Synapse Spark pool: a cluster of servers running Apache Spark to process data. You write your data processing logic using one of the four supported languages: Python, Scala, SQL, and C# (via .NET for Apache Spark). Spark pools support Azure Machine Learning through integration with the SparkML and AzureML packages. - Synapse Pipelines: a logical grouping of activities that together perform a task. The activities in a pipeline define actions to perform on your data. For example, you might use a copy activity to transform data from a source dataset to a destination dataset. You could include activities that transform the data as it is transferred, or you might combine data from multiple sources together. - Synapse Link: this component allows you to connect to Cosmos DB. You can use it to perform near real-time analytics over the operational data stored in a Cosmos DB database. - Synapse Studio: a web user interface that enables data engineers to access all the Synapse Analytics tools. You can use Synapse Studio to create SQL and Spark pools, define and run pipelines, and configure links to external data sources.

running queries in Azure Synapse Analytics

Azure Synapse Analytics is designed to run queries over massive datasets. You can manually scale the SQL pool up to 60 nodes. You can also pause a SQL pool if you don't require it for a while. Pausing releases the resources associated with the pool. You aren't charged for these resources until you manually resume the pool. However, you can't run any queries until the pool is resumed. Resuming a pool can take several minutes.

How to provision Blob storage in a storage account using the Azure portal

Blobs are stored in containers, and you create containers after you've created a storage account. In the Azure portal, you can add a container using the features on the Overview page for your storage account. The Containers page enables you to create and manage containers. Each container must have a unique name within the storage account. You can also specify the access level. By default, data held in a container is only accessible by the container owner. You can set the access level to Blob to enable public read access to any blobs created in the container, or Container to allow read access to the entire contents of the container, including the ability to list all blobs. You can also configure role-based access control for a blob if you need a more granular level of security. Once you've provisioned a container, your applications can upload blobs into the container

Azure currently supports three different types of blob:

Block blobs. A block blob is handled as a set of blocks. Each block can vary in size, up to 100 MB. A block blob can contain up to 50,000 blocks, giving a maximum size of over 4.7 TB. The block is the smallest amount of data that can be read or written as an individual unit. Block blobs are best used to store discrete, large, binary objects that change infrequently. Page blobs. A page blob is organized as a collection of fixed size 512-byte pages. A page blob is optimized to support random read and write operations; you can fetch and store data for a single page if necessary. A page blob can hold up to 8 TB of data. Azure uses page blobs to implement virtual disk storage for virtual machines. Append blobs. An append blob is a block blob optimized to support append operations. You can only add blocks to the end of an append blob; updating or deleting existing blocks isn't supported. Each block can vary in size, up to 4 MB. The maximum size of an append blob is just over 195 GB.

default management of encryption keys for storage accounts.

By default, encryption is performed using keys managed and owned by Microsoft. If you prefer, you can provide your own encryption keys.

Describe the features of CosmosDB that make it easy to manage

Cosmos DB is a highly scalable database management system. Cosmos DB automatically allocates space in a container for your partitions, and each partition can grow up to 10 GB in size. Indexes are created and maintained automatically. There's virtually no administrative overhead. To ensure availability, all databases are replicated within a single region. This replication is transparent, and failover from a failed replica is automatic. Cosmos DB guarantees 99.99% high availability. Additionally, you can choose to replicate data across regions, at additional cost. This feature enables you to place copies of data anywhere in the world, and enable applications to connect to the copy of the data that happens to be the closest, reducing query latency. All replicas are synchronized, although there may be a small window while updates are transmitted and applied. The multi-master replication protocol supports five well-defined consistency choices - strong, bounded staleness, session, consistent prefix, and eventual.

Describe compliance and security characteristics of CosmosDB

Cosmos DB is certified for a wide array of compliance standards. Additionally, all data in Cosmos DB is encrypted at rest and in motion. Cosmos DB provides row level authorization and adheres to strict security standards.

How does indexing work in CosmosDB?

Cosmos DB maintains a separate index. This index contains not only the document IDs, but also tracks the value of every other field in each document. This index is created and maintained automatically. This index enables you to perform queries that specify criteria referencing any fields in a container, without incurring the need to scan the entire partition to find that data.
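For instance, with the SQL API, a query can filter on any document field and rely on the automatic index rather than a full scan; the container alias and field here are hypothetical:

SELECT * FROM c WHERE c.category = 'bikes'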

Data Lake Store

Data Lake Store provides a file system that can store near limitless quantities of data. It uses a hierarchical organization (like the Windows and Linux file systems), but you can hold massive amounts of raw data (blobs) and structured data. It is optimized for analytics workloads.

describe the clustered topology used by services such as Azure SQL Database

Each server and database is transparently replicated to ensure that a server is always accessible, even in the event of a database or server failure.

Cosmos DB enables you to specify how such inconsistencies should be handled. It provides what options?

Eventual. This option is the least consistent. It's based on the situation just described. Changes won't be lost, they'll appear eventually, but they might not appear immediately. Additionally, if an application makes several changes, some of those changes might be immediately visible, but others might be delayed; changes could appear out of order. Consistent Prefix. This option ensures that changes will appear in order, although there may be a delay before they become visible. In this period, applications may see old data. Session. If an application makes a number of changes, they'll all be visible to that application, and in order. Other applications may see old data, although any changes will appear in order, as they did for the Consistent Prefix option. This form of consistency is sometimes known as read your own writes. Bounded Staleness. There's a lag between writing and then reading the updated data. You specify this staleness either as a period of time, or number of previous versions the data will be inconsistent for. Strong: In this case, all writes are only visible to clients after the changes are confirmed as written successfully to all replicas. This option is unavailable if you need to distribute your data across multiple global regions. Eventual consistency provides the lowest latency and least consistency. Strong consistency results in the highest latency but also the greatest consistency. You should select a default consistency level that balances the performance and requirements of your applications.

Example of using Data Factory

For example, imagine a gaming company that collects petabytes of game logs that are produced by games in the cloud. The company wants to analyze these logs to gain insights into customer preferences, demographics, and usage behavior. It also wants to identify up-sell and cross-sell opportunities, develop compelling new features, drive business growth, and provide a better experience to its customers. To analyze these logs, the company needs to use reference data such as customer information, game information, and marketing campaign information that is in an on-premises data store. The company wants to utilize this data from the on-premises data store, combining it with additional log data that it has in a cloud data store. To extract insights, the company wants to process the joined data by using a Spark cluster in the cloud (using Azure HDInsight), and publish the transformed data into a cloud data warehouse such as Azure Synapse Analytics. The company can use the information in the data warehouse to generate and publish reports. It wants to automate this workflow, and monitor and manage it on a daily schedule. It also wants to execute it when files land in a blob store container. Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from the disparate data stores used by the gaming company. You can build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight, Azure Databricks, and Azure SQL Database. You can then publish the transformed data to Azure Synapse Analytics for business intelligence applications to consume.

Give examples of some of the problems encountered in a non-normalized database

For example, in a database holding customer information, how do you handle customers that have more than one address? Do you add columns to hold the details for each address? If so, how many of these columns should you add? If you allow for three addresses, what happens if a customer has only one address? What do you store in the spare columns? What then happens if you suddenly have a customer with four addresses? Similarly, what information do you store in an address (street name, house number, city, zip code)? What happens if a house has a name rather than a number, or is located somewhere that doesn't use zip codes?

benefits of Azure Database for MySQL

- High availability features built-in. - Predictable performance. - Easy scaling that responds quickly to demand. - Secure data, both at rest and in motion. - Automatic backups and point-in-time restore for the last 35 days. - Enterprise-level security and compliance with legislation. - Pay-as-you-go pricing. - Monitoring functionality to add alerts, and to view metrics and logs.

How Power BI matches your role

How you use Power BI might depend on your role on a project or a team. And other people, in other roles, might use Power BI differently, which is just fine. For example, you might view reports and dashboards in the Power BI service, and that might be all you do with Power BI. But your number-crunching, business-report-creating coworker might make extensive use of Power BI Desktop (and publish Power BI Desktop reports to the Power BI service, which you then use to view them). And another coworker, in sales, might mainly use her Power BI phone app to monitor progress on her sales quotas and drill into new sales lead details. You also might use each element of Power BI at different times, depending on what you're trying to achieve, or what your role is for a given project or effort. Perhaps you view inventory and manufacturing progress in a real-time dashboard in the service, and also use Power BI Desktop to create reports for your own team about customer engagement statistics. How you use Power BI can depend on which feature or service of Power BI is the best tool for your situation. But each part of Power BI is available to you, which is why it's so flexible and compelling.

How is Azure Synapse Analytics different than an ordinary SQL Server database?

However, unlike an ordinary SQL Server database engine, Azure Synapse Analytics can receive data from a wide variety of sources. To do this, Azure Synapse Analytics uses a technology named PolyBase. PolyBase enables you to retrieve data from relational and non-relational sources, such as delimited text files, Azure Blob Storage, and Azure Data Lake Storage. You can save the data read in as SQL tables within the Synapse Analytics service.
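A hedged sketch of defining an external table with PolyBase in Transact-SQL; the column list, data source, and file format names are placeholders and assume those objects were created beforehand:

CREATE EXTERNAL TABLE dbo.SalesStaging (
    SaleID INT,
    Amount DECIMAL(10, 2)
)
WITH (
    LOCATION = '/sales/',
    DATA_SOURCE = MyDataLakeSource,
    FILE_FORMAT = MyDelimitedFormat
);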

SQL keywords used to insert a record

INSERT INTO VALUES
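For example, a minimal sketch (the table and values are hypothetical):

INSERT INTO Customers (CustomerID, CustomerName)
VALUES (1, 'Contoso');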

SQL Server on Virtual Machines is what approach?

IaaS

two options when moving operations and databases to the cloud

IaaS and PaaS

how to run a query using sqlcmd

If the sign-in command succeeds, you'll see a 1> prompt. You can enter SQL commands, then type GO on a line by itself to run them
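A hypothetical session illustrating this flow; the server, database, and credentials are placeholders:

sqlcmd -S myserver.database.windows.net -d mydatabase -U myadmin -P <password>
1> SELECT COUNT(*) FROM Customers;
2> GO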

how distributed databases handle updates

If you require transactional consistency in this scenario, locks may be retained for a very long time, especially if there's a network failure between databases at a critical point in time. To counter this problem, many distributed database management systems relax the strict isolation requirements of transactions and implement "eventual consistency." In this form of consistency, as an application writes data, each change is recorded by one server and then propagated to the other servers in the distributed database system asynchronously. While this strategy helps to minimize latency, it can lead to temporary inconsistencies in the data. Eventual consistency is ideal where the application doesn't require any ordering guarantees. Examples include counts of shares, likes, or non-threaded comments in a social media system.

How to provision Data Lake storage in a storage account using the Azure portal

If you're provisioning a Data Lake storage, you must specify the appropriate configuration settings when you create the storage account. You can't configure Data Lake storage after the storage account has been set up. In the Azure portal, on the Advanced tab of the Create storage account page, in the Data Lake Storage Gen2 section, select Enabled for the Hierarchical namespace option. After the storage account has been created, you can add one or more Data Lake Storage containers to the account. Each container supports a directory structure for storing Data Lake files.

describe the relationship between document and an account in CosmosDB

In Cosmos DB, you organize your data as a collection of documents stored in containers. Containers are held in a database. A database runs in the context of a Cosmos DB account. You must create the account before you can set up any databases.

Power BI report

In Power BI, a report is a collection of visualizations that appear together on one or more pages. Just like any other report you might create for a sales presentation or write for a school assignment, a report in Power BI is a collection of items that are related to each other. The following image shows a report in Power BI Desktop—in this case, it's the second page in a five-page report. You can also create reports in the Power BI service. Reports let you create many visualizations, on multiple pages if necessary, and let you arrange those visualizations in whatever way best tells your story. You might have a report about quarterly sales, product growth in a particular segment, or migration patterns of polar bears. Whatever your subject, reports let you gather and organize your visualizations onto one page (or more).

Power BI tile

In Power BI, a tile is a single visualization on a report or a dashboard. It's the rectangular box that holds an individual visual. In the following image, you see one tile, which is also surrounded by other tiles. When you're creating a report or a dashboard in Power BI, you can move or arrange tiles however you want. You can make them bigger, change their height or width, and snuggle them up to other tiles. When you're viewing, or consuming, a dashboard or report—which means you're not the creator or owner, but the report or dashboard has been shared with you—you can interact with it, but you can't change the size of the tiles or their arrangement.

connect to a database and run a query in SQL Server Data Tools

In Visual Studio, on the Tools menu, select SQL Server, and then select New Query. In the Connect dialog box, enter the following information, and then select Connect: - Server name: the fully qualified server name, from the Overview page in the Azure portal - Authentication: SQL Server Authentication - Login: the user ID of the server admin account used to create the server - Password: the server admin account password - Database Name: your database name

Describe using a SQL Pool computational model in Azure Synapse Analytics

In a SQL pool, each compute node uses an Azure SQL Database and Azure Storage to handle a portion of the data. You submit queries in the form of Transact-SQL statements, and Azure Synapse Analytics runs them. However, unlike an ordinary SQL Server database engine, Azure Synapse Analytics can receive data from a wide variety of sources. To do this, Azure Synapse Analytics uses a technology named PolyBase. PolyBase enables you to retrieve data from relational and non-relational sources, such as delimited text files, Azure Blob Storage, and Azure Data Lake Storage. You can save the data read in as SQL tables within the Synapse Analytics service. You specify the number of nodes when you create a SQL pool. You can scale the SQL pool manually to add or remove compute nodes as necessary. Note: you can only scale a SQL pool when it's not running a Transact-SQL query.

how a Spark Pool computational model works in Azure Synapse Analytics

In a Spark pool, the nodes are replaced with a Spark cluster. You run Spark jobs comprising code written in notebooks, in the same way as Azure Databricks. You can write the code for a notebook in C#, Python, Scala, or Spark SQL (a different dialect of SQL from Transact-SQL). As with a SQL pool, the Spark cluster splits the work out into a series of parallel tasks that can be performed concurrently. You can save data generated by your notebooks in Azure Storage or Data Lake Storage. Note: Spark is optimized for in-memory processing. A Spark job can load and cache data into memory and query it repeatedly. In-memory computing is much faster than disk-based applications, but requires additional memory resources. You specify the number of nodes when you create the Spark cluster. Spark pools can have autoscaling enabled, so that pools scale by adding or removing nodes as needed. Autoscaling can occur while processing is active.
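
For instance, a notebook cell in a Synapse Spark pool can run Spark SQL directly by starting the cell with the %%sql magic. This minimal sketch assumes a hypothetical products table has already been registered in the cluster:

    %%sql
    SELECT Category, COUNT(*) AS ProductCount
    FROM products
    GROUP BY Category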

how a point query works efficiently

In a point query, when an application retrieves a single row, the partition key enables Azure to quickly home in on the correct partition, and the row key lets Azure identify the row in that partition. You might have hundreds of millions of rows, but if you've defined the partition and row keys carefully when you designed your application, data retrieval can be very quick. The partition key and row key effectively define a clustered index over the data.

things to consider when indexing

In a table that is read-only, or that contains data that is modified infrequently, additional indexes will improve query performance. If a table is queried infrequently, but subject to a large number of inserts, updates, and deletes (such as a table involved in OLTP), then creating indexes on that table can slow your system down, because each index must be maintained every time the data changes.
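
As a minimal Transact-SQL sketch (the table and column names are hypothetical), an index to speed up lookups of orders by customer might be created like this:

    -- Queries that filter on CustomerID can use this index
    -- instead of scanning the whole Orders table.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID);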

proxy (see image)

In computer networking, a proxy server is a server application or appliance that acts as an intermediary for requests from clients seeking resources from servers that provide those resources.[1] A proxy server thus functions on behalf of the client when requesting service, potentially masking the true origin of the request to the resource server. Instead of connecting directly to a server that can fulfill a requested resource, such as a file or web page for example, the client directs the request to the proxy server, which evaluates the request and performs the required network transactions. This serves as a method to simplify or control the complexity of the request,[2] or provide additional benefits such as load balancing, privacy, or security. Proxies were devised to add structure and encapsulation to distributed systems.[3]

IaaS

Infrastructure-as-a-Service: a virtual infrastructure in the cloud that mirrors the way an on-premises data center might work. You can create a set of virtual machines, connect them together using a virtual network, and add a range of virtual devices. In many ways, this approach is similar to the way in which you run your systems inside an organization, except that you don't have to concern yourself with buying or maintaining the hardware. You can run any software for which you have the appropriate licenses, so you're not restricted to any specific database management system. However, you're still responsible for many of the day-to-day operations, such as installing and configuring the software, patching, taking backups, and restoring data when needed. You can think of IaaS as a half-way house to fully managed operations in the cloud; you don't have to worry about the hardware, but running and managing the software is still very much your responsibility.

Cosmos DB is highly suitable for the following scenarios:

- IoT and telematics. These systems typically ingest large amounts of data in frequent bursts of activity. Cosmos DB can accept and store this information very quickly. The data can then be used by analytics services, such as Azure Machine Learning, Azure HDInsight, and Power BI. Additionally, you can process the data in real time using Azure Functions that are triggered as data arrives in the database.
- Retail and marketing. Microsoft uses Cosmos DB for its own e-commerce platforms that run as part of Windows Store and Xbox Live. It's also used in the retail industry for storing catalog data and for event sourcing in order processing pipelines.
- Gaming. The database tier is a crucial component of gaming applications. Modern games perform graphical processing on mobile/console clients, but rely on the cloud to deliver customized and personalized content like in-game stats, social media integration, and high-score leaderboards. Games often require single-millisecond latencies for reads and writes to provide an engaging in-game experience. A game database needs to be fast and able to handle massive spikes in request rates during new game launches and feature updates.
- Web and mobile applications. Azure Cosmos DB is commonly used within web and mobile applications, and is well suited for modeling social interactions, integrating with third-party services, and building rich personalized experiences. The Cosmos DB SDKs can be used to build rich iOS and Android applications using the popular Xamarin framework.

main aggregate functions in SQL

- MAX: returns the largest value in a column
- AVG: returns the average value, but only if the column contains numeric data
- SUM: returns the sum of all the values in the column, but only if the column is numeric
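
A brief SQL illustration (the Orders table and Price column are hypothetical):

    SELECT MAX(Price) AS HighestPrice,   -- largest value in the column
           AVG(Price) AS AveragePrice,   -- mean of the numeric values
           SUM(Price) AS TotalPrice      -- sum of the numeric values
    FROM Orders;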

common relational database management systems that use SQL include

Microsoft SQL Server, MySQL, PostgreSQL, MariaDB, and Oracle

Single Database SQL Server

Microsoft manages the server, so all you have to do is configure the database, create your tables, and populate them with your data

Azure File Storage is designed to support what scenarios?

- Migrate existing applications to the cloud. Many existing applications access data using file-based APIs, and are designed to share data using SMB file shares. Azure File Storage enables you to migrate your on-premises file or file share-based applications to Azure without having to provision or manage highly available file server virtual machines.
- Share server data across on-premises and cloud. Customers can now store server data such as log files, event data, and backups in the cloud to leverage the availability, durability, scalability, and geo-redundancy built into the Azure storage platform. With encryption in SMB 3.0, you can securely mount Azure File Storage shares from anywhere. Applications running in the cloud can share data with on-premises applications using the same consistency guarantees implemented by on-premises SMB servers.
- Integrate modern applications with Azure File Storage. By leveraging the modern REST API that Azure File Storage implements in addition to SMB 3.0, you can integrate legacy applications with modern cloud applications, or develop new file or file share-based applications.
- Simplify hosting High Availability (HA) workload data. Azure File Storage delivers continuous availability, so it simplifies the effort to host HA workload data in the cloud. The persistent handles enabled in SMB 3.0 increase availability of the file share, which makes it possible to host applications such as SQL Server and IIS in Azure with data stored in shared file storage.

What is the data storage architecture of most organizations?

Most organizations have multiple data stores, often with different structures and varying formats. They often have live, incoming streams of data, such as sensor data, that can be expensive to analyze. There's often also a plethora of useful information available outside of organizations; this information could be combined with local data to add insights and enrich understanding. By combining all local data with useful external information, it's often possible to gain insights into the data that weren't previously possible.

Spark notebooks in Azure Synapse Analytics

Notebooks also allow you to visualize data through graphs, and transform data as it's loaded. The data can then be used by Spark Machine Learning (SparkML) and Azure Machine Learning (AzureML) to train machine learning models that support artificial intelligence.

name for the work performed by transactional systems

Online Transactional Processing (OLTP)

connect to a database and run a query using SQL Server Management Studio

Open SQL Server Management Studio. When the Connect to Server dialog box appears, enter the following information:
- Server type: Database engine
- Server name: the fully qualified server name, from the Overview page in the Azure portal
- Authentication: SQL Server Authentication
- Login: the user ID of the server admin account used to create the server
- Password: the server admin account password
Select Connect. The Object Explorer window opens. To view the database's objects, expand Databases and then expand your database node. On the toolbar, select New Query to open a query window. Enter your SQL statements, and then select Execute to run queries and retrieve data from the database tables.

What is MariaDB compatible with?

Oracle Database

the process of directing and controlling other services, and connecting them together, to allow data to flow between them. Data Factory uses this to combine and automate sequences of tasks that use different services to perform complex operations.

Orchestration

why it's important to have a flexible approach to data ingestion into an Azure data store.

Organizations often have numerous, disparate data sources. To deliver a full cloud solution, Azure offers many ways to ingest data.

Azure RBAC built-in roles

- Owner: has full access to all resources, including the right to delegate access to others.
- Contributor: can create and manage all types of Azure resources, but can't grant access to others.
- Reader: can view existing Azure resources.
- User Access Administrator: lets you manage user access to Azure resources.

It is organized as a collection of fixed-size 512-byte pages. It is optimized to support random read and write operations; you can fetch and store data for a single page if necessary. Each one can hold up to 8 TB of data. Azure uses these to implement virtual disk storage for virtual machines.

Page blobs

What is partitioning in Azure Table Storage?

Partitioning is a mechanism for grouping related rows, based on a common property or partition key. Rows that share the same partition key will be stored together. Partitioning not only helps to organize data, it can also improve scalability and performance:
- Partitions are independent from each other, and can grow or shrink as rows are added to, or removed from, a partition. A table can contain any number of partitions.
- When you search for data, you can include the partition key in the search criteria. This helps to narrow down the volume of data to be examined, and improves performance by reducing the amount of I/O (reads and writes) needed to locate the data.

PaaS

Platform-as-a-Service. The service provider creates the virtual infrastructure, and installs and manages the database software.

a feature of SQL Server and Azure Synapse Analytics that enables you to run Transact-SQL queries that read data from external data sources. It makes these external data sources appear like tables in a SQL database. Using it, you can read data managed by Hadoop, Spark, and Azure Blob Storage, as well as other database management systems such as Cosmos DB, Oracle, Teradata, and MongoDB. It enables you to transfer data from an external data source into a table, and to copy data out of Azure Synapse Analytics or SQL Server to an external data source. You can also run queries that join tables in a SQL database with external data, enabling you to perform analytics that span multiple data stores. Azure Data Factory provides PolyBase support for loading data. For instance, Data Factory can directly invoke PolyBase on your behalf if your data is in a PolyBase-compatible data store.

PolyBase

the act of running a series of tasks that a service provider, such as Azure Cosmos DB, performs to create and configure a service. Behind the scenes, the service provider will set up the various resources (disks, memory, CPUs, networks, and so on) required to run the service. You'll be assigned these resources, and they remain allocated to you (and charged to you) until you delete the service. All you do is specify parameters that determine the size of the resources required (how much disk space, memory, computing power, and network bandwidth). These parameters are determined by estimating the size of the workload that you intend to run using the service. In many cases, you can modify these parameters after the service has been created, perhaps increasing the amount of storage space or memory if the workload is greater than you initially anticipated.

Provisioning

connection policy where the connection is established via the gateway, and all subsequent requests flow through the gateway. Each request could (potentially) be serviced by a different database in the cluster.

Proxy

default connection policy if you are connecting from outside Azure, such as an on-premises application

Proxy

the policy that means that after your application establishes a connection to the Azure SQL database through the gateway, all following requests from your application will go directly to the database rather than through the gateway

Redirect

In CosmosDB, resources are allocated in terms of what?

Resources are allocated in terms of the storage space required to hold your databases and containers, and the processing power required to store and retrieve data. Azure Cosmos DB uses the concept of Request Units per second (RU/s) to manage the performance and cost of databases. This measure abstracts the underlying physical resources that need to be provisioned to support the required performance.

common levels of access

- Restricted: no access.
- Read-only: users can read data, but can't modify any existing data or create new data.
- Read/write: gives users the ability to view and modify existing data.
- Owner: full access to the data, including managing security, such as adding new users and removing access from existing users.

To connect to Azure Database for PostgreSQL using psql,

Run the following command, making sure to replace the server name and admin name with the values from the Azure portal:
    psql --host=<server-name>.postgres.database.azure.com --username=<admin-user>@<server-name> --dbname=postgres
Enter your password when prompted. If your connection is successful, you'll see the prompt postgres=>. You can create a new database with the following SQL command:
    CREATE DATABASE "Adventureworks";
Inside psql, you can run the command \c Adventureworks to connect to the database.

four main DML statements

- SELECT: select/read rows from a table
- INSERT: insert new rows into a table
- UPDATE: edit/update existing rows
- DELETE: delete existing rows in a table
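
A minimal sketch of all four statements against a hypothetical Customers table:

    SELECT CustomerID, Name FROM Customers;                          -- read rows
    INSERT INTO Customers (CustomerID, Name) VALUES (1, 'Contoso');  -- insert a new row
    UPDATE Customers SET Name = 'Fabrikam' WHERE CustomerID = 1;     -- update an existing row
    DELETE FROM Customers WHERE CustomerID = 1;                      -- delete the row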

key words used to to accomplish almost everything that one needs to do with a database

SELECT, INSERT, UPDATE, DELETE, CREATE, and DROP

The APIs that Cosmos DB currently supports include:

- SQL API. This interface provides a SQL-like query language over documents, enabling you to identify and retrieve documents using SELECT statements.
- Table API. This interface enables you to use the Azure Table Storage API to store and retrieve documents. The purpose of this interface is to enable you to switch from Table Storage to Cosmos DB without requiring that you modify your existing applications.
- MongoDB API. MongoDB is another well-known document database, with its own programmatic interface. Many organizations run MongoDB on-premises. You can use the MongoDB API for Cosmos DB to enable a MongoDB application to run unchanged against a Cosmos DB database. You can migrate the data in the MongoDB database to Cosmos DB running in the cloud, but continue to run your existing applications to access this data.
- Cassandra API. Cassandra is a column family database management system that many organizations run on-premises. The Cassandra API for Cosmos DB provides a Cassandra-like programmatic interface; Cassandra API requests are mapped to Cosmos DB document requests. As with the MongoDB API, the primary purpose of the Cassandra API is to enable you to quickly migrate Cassandra databases and applications to Cosmos DB.
- Gremlin API. The Gremlin API implements a graph database interface to Cosmos DB. A graph is a collection of data objects and directed relationships. Data is still held as a set of documents in Cosmos DB, but the Gremlin API enables you to perform graph queries over data. Using the Gremlin API you can walk through the objects and relationships in the graph to discover all manner of complex relationships, such as "What is the name of the pet of Sam's landlord?" in the graph shown here (image).

business benefits of Managed Instance option in Azure SQL Database

SQL Database managed instance provides all the management and security benefits available when using Single Database and Elastic Pool. Managed instance deployment enables a system administrator to spend less time on administrative tasks because the SQL Database service either performs them for you or greatly simplifies those tasks. Automated tasks include operating system and database management system software installation and patching, dynamic instance resizing and configuration, backups, database replication (including system databases), high availability configuration, and configuration of health and performance monitoring data streams. Managed instance has near 100% compatibility with SQL Server Enterprise Edition, running on-premises. The SQL Database managed instance deployment option supports traditional SQL Server Database engine logins and logins integrated with Azure Active Directory (AD). Traditional SQL Server Database engine logins include a username and a password. You must enter your credentials each time you connect to the server. Azure AD logins use the credentials associated with your current computer sign-in, and you don't need to provide them each time you connect to the server.

What is SSIS?

SQL Server Integration Services (SSIS) is a platform for building enterprise-level data integration and data transformations solutions. You can use SSIS to solve complex business problems by copying or downloading files, loading data warehouses, cleaning and mining data, and managing SQL database objects and data. SSIS is part of Microsoft SQL Server. SSIS can extract and transform data from a wide variety of sources such as XML data files, flat files, and relational data sources, and then load the data into one or more destinations. SSIS includes a rich set of built-in tasks and transformations, graphical tools for building packages, and the Integration Services Catalog database, where you store, run, and manage packages. A package is an organized collection of connections, control flow elements, data flow elements, event handlers, variables, parameters, and configurations, that you assemble using either the graphical design tools that SQL Server Integration Services provides, or build programmatically. You then save the completed package to SQL Server, the Integration Services Package Store, or the file system. You can use the graphical SSIS tools to create solutions without writing a single line of code. You can also program the extensive Integration Services object model to create packages programmatically and code custom tasks and other package objects.

Azure Synapse Analytics supports what two computational models?

SQL pools and Spark pools.

used to perform tasks such as update data in a database, or retrieve data from a database

SQL statements

how to connect to a database using Azure Data Studio

Select Create a connection to open the Connection pane, then fill in the following fields using the server name, user name, and password for your Azure SQL server:
- Server name: the fully qualified server name. You can find the server name in the Azure portal, as described earlier.
- Authentication: SQL Login or Windows Authentication. Unless you're using Azure Active Directory, select SQL Login.
- User name: the server admin account user name. Specify the user name from the account used to create the server.
- Password: the password you specified when you provisioned the server.
- Database name: the name of the database to which you wish to connect.
- Server Group: if you have many servers, you can create groups to help categorize them. These groups are for convenience in Azure Data Studio, and don't affect the database or server in Azure.
Select Connect. If your server doesn't have a firewall rule allowing Azure Data Studio to connect, the Create new firewall rule form opens. Complete the form to create a new firewall rule. For details, see Create a server-level firewall rule using the Azure portal. After successfully connecting, your server is available in the SERVERS sidebar on the Connections page. You can now use the New Query command to create and run scripts of SQL commands.

How to create a table in Azure Table Storage

Sign into the Azure portal using your Azure account. On the home page of the Azure portal, select +Create a resource. On the New page, select Storage account - blob, file, table, queue. On the Create storage account page, enter the following details, and then select Review + create:
- Subscription: select your Azure subscription
- Resource group: select Create new, and specify the name of a new Azure resource group. Use a name of your choice, such as mystoragegroup
- Storage account name: enter a name of your choice for the storage account. The name must be unique, though
- Location: select your nearest location
- Performance: Standard
- Account kind: StorageV2 (general purpose v2)
- Replication: Read-access geo-redundant storage (RA-GRS)
- Access tier: Hot
On the validation page, click Create, and wait while the new storage account is configured. When the Your deployment is complete page appears, select Go to resource. On the Overview page for the new storage account, select Tables. On the Tables page, select + Table. In the Add table dialog box, enter testtable for the name of the table, and then select OK. When the new table has been created, select Storage Explorer. On the Storage Explorer page, expand Tables, and then select testtable. Select Add to insert a new entity into the table. Note: in Storage Explorer, rows are also called entities. In the Add Entity dialog box, enter your own values for the PartitionKey and RowKey properties, and then select Add Property. Add a String property called Name and set the value to your name. Select Add Property again, and add a Double property (this is numeric) named Age, and set the value to your age. Select Insert to save the entity.

deployment options for Azure Database for PostgreSQL

Single-server and Hyperscale

Spark pool autoscaling and shut down

Spark pools can have autoscaling enabled, so that pools scale by adding or removing nodes as needed. Also, Spark pools can be shut down with no loss of data since all the data is stored in Azure Storage or Data Lake Storage.

Spark pools in Synapse Analytics are especially suitable for the following scenarios:

- Data Engineering/Data Preparation. Apache Spark includes many language features to support preparation and processing of large volumes of data, so that it can be made more valuable and then consumed by other services within Synapse Analytics. This is enabled through the Spark libraries that support processing and connectivity.
- Machine Learning. Apache Spark comes with MLlib, a machine learning library built on top of Spark that you can use from a Spark pool in Synapse Analytics. Spark pools in Synapse Analytics also include Anaconda, a Python distribution with a variety of packages for data science, including machine learning. When combined with built-in support for notebooks, you have an environment for creating machine learning applications.

in-memory cluster computing with Spark pools

Spark pools provide the basic building blocks for performing in-memory cluster computing. A Spark job can load and cache data into memory and query it repeatedly. In-memory computing is much faster than disk-based applications. Spark pools in Azure Synapse are compatible with Azure Storage and Azure Data Lake Storage, so you can use Spark pools to process your data stored in Azure.

Azure Table Storage is an excellent mechanism for:

- Storing TBs of structured data capable of serving web-scale applications. Examples include product catalogs for eCommerce applications, and customer information, where the data can be quickly identified and ordered by a composite key. In the case of a product catalog, the partition key could be the product category (such as footwear), and the row key identifies the specific product in that category (such as climbing boots).
- Storing datasets that don't require complex joins, foreign keys, or stored procedures, and that can be denormalized for fast access. In an IoT system, you might use Azure Table Storage to capture device sensor data. Each device could have its own partition, and the data could be ordered by the date and time each measurement was captured.
- Capturing event logging and performance monitoring data. Event log and performance information typically contain data that is structured according to the type of event or performance measure being recorded. The data could be partitioned by event or performance measurement type, and ordered by the date and time it was recorded. Alternatively, you could partition data by date, if you need to analyze an ordered series of events and performance measures chronologically. If you want to analyze data by type and date/time, then consider storing the data twice, partitioned by type, and again by date. Writing data is fast, and the data is static once it has been recorded.

PolyBase

Synapse Analytics uses a technology called PolyBase to make external data look like SQL tables. You can run queries against these tables directly, or you can transfer the data into a series of SQL tables managed by Synapse Analytics for querying later. Synapse uses Azure Storage to manage your data while it's being processed.

Spark pools in Azure Synapse Analytics

Synapse Spark runs clusters based on Apache Spark rather than Azure SQL Database. You write your analytics jobs as notebooks, using code written in Python, Scala, C#, or Spark SQL (this is a different dialect from Transact-SQL). You can combine code written in multiple languages in the same notebook. Note: Spark pools and SQL pools can coexist in the same Azure Synapse Analytics instance.

Synapse Studio

Synapse Studio is a web interface that enables you to create pools and pipelines interactively. With Synapse Studio you can develop, test, and debug Spark notebooks and Transact-SQL jobs. You can monitor the performance of operations that are currently running, and you can manage the serverless or provisioned resources. All of these capabilities are accessed via the web-native Synapse Studio that allows for model management, monitoring, coding, and security.

uses for Synapse link

Synapse Link has a wide range of uses, including:
- Supply chain analytics and forecasting. You can query operational data directly and use it to build machine learning models. You can feed the results generated by these models back into Cosmos DB for near-real-time scoring, and use these assessments to successively refine the models and generate more accurate forecasts.
- Operational reporting. You can use Synapse Analytics to query operational data using Transact-SQL running in a SQL pool. You can publish the results to dashboards using the support provided for familiar tools such as Microsoft Power BI.
- Batch data integration and orchestration. With supply chains getting more complex, supply chain data platforms need to integrate with a variety of data sources and formats. The Azure Synapse data integration engine allows data engineers to create rich data pipelines without requiring a separate orchestration engine.
- Real-time personalization. You can build engaging ecommerce solutions that allow retailers to generate personalized recommendations and special offers for customers in real time.
- IoT maintenance. Industrial IoT innovations have drastically reduced downtimes of machinery and increased overall efficiency across all fields of industry. One such innovation is predictive maintenance analytics for machinery at the edge of the cloud. The historical operational data from IoT device sensors can be used to train predictive models, such as anomaly detectors. These anomaly detectors are then deployed back to the edge for real-time monitoring. Looping back allows for continuous retraining of the predictive models.

This tier provides the lowest storage cost, but with increased latency. This tier is intended for historical data that mustn't be lost, but is required only rarely. Blobs in this tier are effectively stored in an offline state. Typical reading latency for the Hot and Cool tiers is a few milliseconds, but for this tier, it can take hours for the data to become available. To retrieve a blob from this tier, you must change the access tier to Hot or Cool. The blob will then be rehydrated. You can read the blob only when the rehydration process is complete.

The Archive tier.

Azure provides several tools you can use to provision services:

- The Azure portal. This is the most convenient way to provision a service for most users. The Azure portal displays a series of service-specific pages that prompt you for the settings required, and validates these settings, before actually provisioning the service.
- The Azure command-line interface (CLI). The CLI provides a set of commands that you can run from the operating system command prompt or the Cloud Shell in the Azure portal. You can use these commands to create and manage Azure resources. The CLI is suitable if you need to automate service creation; you can store CLI commands in scripts, and you can run these scripts programmatically. The CLI can run on Windows, macOS, and Linux computers. For detailed information about the Azure CLI, read What is Azure CLI.
- Azure PowerShell. Many administrators are familiar with using PowerShell commands to script and automate administrative tasks. Azure provides a series of commandlets (Azure-specific commands) that you can use in PowerShell to create and manage Azure resources. You can find further information about Azure PowerShell online, at Azure PowerShell documentation. Like the CLI, PowerShell is available for Windows, macOS, and Linux.
- Azure Resource Manager templates. An Azure Resource Manager template describes the service (or services) that you want to deploy in a text file, in a format known as JSON (JavaScript Object Notation). The example below shows a template that you can use to provision an Azure Storage account:
    "resources": [
      {
        "type": "Microsoft.Storage/storageAccounts",
        "apiVersion": "2016-01-01",
        "name": "mystorageaccount",
        "location": "westus",
        "sku": { "name": "Standard_LRS" },
        "kind": "Storage",
        "properties": {}
      }
    ]
  You send the template to Azure using the az deployment group create command in the Azure CLI, or the New-AzResourceGroupDeployment command in Azure PowerShell. For more information about creating and using Azure Resource Manager templates to provision Azure resources, see What are Azure Resource Manager templates?

the basic parameters for configuring an Azure SQL Database for PostgreSQL or MySQL resource?

The Basics tab prompts for the following details:
- Subscription. Select your Azure subscription.
- Resource Group. Either pick an existing resource group, or select Create new to build a new one.
- Server Name. Each MySQL or PostgreSQL database must have a unique name that hasn't already been used by someone else. The name must be between 3 and 31 characters long, and can only contain lowercase letters, digits, and the "-" character.
- Data Source. Select None to create a new server from scratch. You can select Backup if you're creating a server from a geo-backup of an existing Azure Database for MySQL server.
- Location. Either select the region that is nearest to you, or the region nearest to your users.
- Version. The version of MySQL or PostgreSQL to deploy.
- Compute + storage. The compute, storage, and backup configurations for your new server. The Configure server link enables you to select the resources required to support your database workloads. These resources include the amount of computing power, memory, backups, and redundancy options (for high availability). There are three tiers:
  Basic. This tier is suitable for workloads that require light compute and I/O performance. Examples include servers used for development or testing, or small-scale, infrequently used applications.
  General Purpose. Use this pricing tier for business workloads that require balanced compute and memory with scalable I/O throughput. Examples include servers for hosting web and mobile apps and other enterprise applications.
  Memory Optimized. This tier supports high-performance database workloads that require in-memory performance for faster transaction processing and higher concurrency. Examples include servers for processing real-time data and high-performance transactional or analytical apps.
  You can fine-tune the resources available for the selected tier, and scale these resources up later, if necessary.
- Admin username. A sign-in account to use when you're connecting to the server. The admin sign-in name can't be azure_superuser, admin, administrator, root, guest, or public.
- Password. Provide a new password for the server admin account. It must contain from 8 to 128 characters. Your password must contain characters from three of the following categories: English uppercase letters, English lowercase letters, numbers (0-9), and non-alphanumeric characters (!, $, #, %, and so on).

Which page for a storage account enables you to modify some general settings of the account?

The Configuration page

IOPS

The Configure page displays the performance that General Purpose and Memory Optimized configurations provide in terms of IOPS. IOPS is an acronym for Input/Output Operations Per Second, and is a measure of the read and write capacity available using the configured resources.

Describe the control node and compute nodes in massive parallel processing architecture used in Azure Synapse Analytics

The Control node is the brain of the architecture. It's the front end that interacts with all applications. The MPP engine runs on the Control node to optimize and coordinate parallel queries. When you submit a processing request, the Control node transforms it into smaller requests that run against distinct subsets of the data in parallel. The Compute nodes provide the computational power. The data to be processed is distributed evenly across the nodes. Users and applications send processing requests to the control node. The control node sends the queries to compute nodes, which run the queries over the portion of the data that they each hold. When each node has finished its processing, the results are sent back to the control node where they're combined into an overall result.

You use this tier for blobs that are accessed frequently. The blob data is stored on high-performance media.

The Hot tier is the default.

three access tiers for blob storage

- The Hot tier is the default. You use this tier for blobs that are accessed frequently. The blob data is stored on high-performance media.
- The Cool tier. This tier has lower performance and incurs reduced storage charges compared to the Hot tier. Use the Cool tier for data that is accessed infrequently. It's common for newly created blobs to be accessed frequently initially, but less so as time passes. In these situations, you can create the blob in the Hot tier, but migrate it to the Cool tier later. You can migrate a blob from the Cool tier back to the Hot tier.
- The Archive tier. This tier provides the lowest storage cost, but with increased latency. The Archive tier is intended for historical data that mustn't be lost, but is required only rarely. Blobs in the Archive tier are effectively stored in an offline state. Typical reading latency for the Hot and Cool tiers is a few milliseconds, but for the Archive tier, it can take hours for the data to become available. To retrieve a blob from the Archive tier, you must change the access tier to Hot or Cool. The blob will then be rehydrated. You can read the blob only when the rehydration process is complete.

how to completely lock down users and applications from accessing public endpoints to connect to your Azure SQL Database account.

The Private endpoint connections page for a service allows you to specify which private endpoints, if any, are permitted access to your service. You can use the settings on this page, together with the Firewalls and virtual networks page, to completely lock down users and applications from accessing public endpoints to connect to your Azure SQL Database account.

Azure File Storage offers what two performance tiers?

The Standard tier uses hard disk-based hardware in a datacenter. The Premium tier uses solid-state disks, and offers greater throughput, but is charged at a higher rate.

Power BI canvas

The canvas (the area in the center of the Power BI service) shows you the available sources of data in the Power BI service. In addition to common data sources like Microsoft Excel files, databases, or Microsoft Azure data, Power BI can just as easily connect to a whole assortment of software services (also called SaaS providers or cloud services): Salesforce, Facebook, Google Analytics, and more.

specifications for table in Azure Table Storage

The columns in a table can hold numeric, string, or binary data up to 64 KB in size. A table can have up to 252 columns, apart from the partition and row keys. The maximum row size is 1 MB.

Describe how the query in the attached image retrieves all orders for Customer C1. The Orders table has an index on the Customer ID column.

The database management system can consult the index to quickly find all matching rows in the Orders table.

schema

The database schema of a database is its structure described in a formal language supported by the database management system (DBMS). The term "schema" refers to the organization of data as a blueprint of how the database is constructed (divided into database tables in the case of relational databases). The formal definition of a database schema is a set of formulas (sentences) called integrity constraints imposed on a database. These integrity constraints ensure compatibility between parts of the schema. All constraints are expressible in the same language. A database can be considered a structure in realization of the database language.[1] The states of a created conceptual schema are transformed into an explicit mapping, the database schema. This describes how real-world entities are modeled in the database. "A database schema specifies, based on the database administrator's knowledge of possible applications, the facts that can enter the database, or those of interest to the possible end-users."[2] The notion of a database schema plays the same role as the notion of theory in predicate calculus. A model of this "theory" closely corresponds to a database, which can be seen at any instant of time as a mathematical object. Thus a schema can contain formulas representing integrity constraints specifically for an application and the constraints specifically for a type of database, all expressed in the same database language.[1] In a relational database, the schema defines the tables, fields, relationships, views, indexes, packages, procedures, functions, queues, triggers, types, sequences, materialized views, synonyms, database links, directories, XML schemas, and other elements. A database generally stores its schema in a data dictionary. Although a schema is defined in text database language, the term is often used to refer to a graphical depiction of the database structure. In other words, schema is the structure of the database that defines the objects in the database. https://en.wikipedia.org/wiki/Database_schema

Azure Data Lake Analytics example

The example U-SQL block below reads data from a file named StockPrices.csv, which is held in a folder named StockMarket in Data Lake Storage. This is a text file that contains stock market information (tickers, and prices, and possibly other data), held in comma-separated format. The EXTRACT statement reads the file line by line and pulls out the data in the Ticker and Price fields (it skips the first line, where a CSV file typically holds field name information rather than data). The SELECT statement calculates the maximum price for each ticker. The OUTPUT statement stores the results to another file in Data Lake Storage. It's important to understand that the U-SQL code only provides a description of the work to be performed. Azure Data Lake Analytics determines how best to actually carry out this work. Data Lake Analytics takes the U-SQL description of a job, parses it to make sure it is syntactically correct, and then compiles it into an internal representation. Data Lake Analytics then breaks down this internal representation into stages of execution. Each stage performs a task, such as extracting the data from a specified source, dividing the data into partitions, processing the data in each partition, aggregating the results in a partition, and then combining the results from across all partitions. Partitioning is used to improve parallelization, and the processing for different partitions is performed concurrently on different processing nodes. The data for each partition is determined by the U-SQL compiler, according to the way in which the job retrieves and processes the data. A U-SQL job can output results to a single CSV file, partition the results across multiple files, or can write to other destinations. For example, Data Lake Analytics enables you to create custom outputters if you want to save data in a particular format (such as XML or HTML). You can also write data to the Data Lake Catalog. The catalog provides a SQL-like interface to Data Lake Storage, enabling you to create tables, and views, and run INSERT, UPDATE, and DELETE statements against these tables and views.
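
The U-SQL block itself isn't reproduced in this extract. A minimal sketch, reconstructed from the description above (the variable names, column types, and output path are assumptions), might look like this:

    @prices =
        EXTRACT Ticker string,
                Price  decimal
        FROM "/StockMarket/StockPrices.csv"
        USING Extractors.Csv(skipFirstNRows: 1);   // skip the header line

    @maxPrices =
        SELECT Ticker,
               MAX(Price) AS MaxPrice
        FROM @prices
        GROUP BY Ticker;

    OUTPUT @maxPrices
        TO "/StockMarket/MaxPrices.csv"
        USING Outputters.Csv();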

compare the focus of a data warehouse to a traditional relational database

The focus of a data warehouse is to provide answers to complex queries, unlike a traditional relational database, which is focused on transactional performance.

How to create an Azure Blob Storage block blob

The following steps assume you've created the storage account described in the previous unit. In the Azure portal, on the left-hand navigation menu, select Home. On the home page, select Storage accounts. On the Storage accounts page, select the storage account you created in the previous unit. On the Overview page for your storage account, select Storage Explorer. On the Storage Explorer page, right-click BLOB CONTAINERS, and then select Create blob container. In the New Container dialog box, give your container a name, accept the default public access level, and then select Create. In the Storage Explorer window, expand BLOB CONTAINERS, and then select your new blob container. In the blobs window, select Upload. In the Upload blob dialog box, use the files button to pick a file of your choice on your computer, and then select Upload. When the upload has completed, close the Upload blob dialog box. Verify that the block blob appears in your container.

give examples of the variety of types of information required in a linked service

The information a linked service contains varies according to the resource. For example, to create a linked service for Azure Blob Storage, you provide information such as the name of the Azure subscription that owns the storage account, the name of the storage account, and the information necessary to authenticate against the storage account. To create a linked service to a different resource, such as Azure SQL Database, you specify the database server name, the database name, and the appropriate credentials. For example, if you're reading and processing data from Azure Blob storage, you'd create an input dataset that uses a Blob Storage linked service to specify the details of the storage account. The dataset would specify which blob to ingest, and the format of the information in the blob (binary data, JSON, delimited text, and so on). If you're using Azure Data Factory to store data in a table in a SQL database, you would define an output dataset that uses a SQL Database linked service to connect to the database, and specifies which table to use in that database.

requirements of hosting an RDBMS on-premises

The organization is responsible for:
- maintaining the hardware and software
- applying patches
- backing up databases
- restoring them when necessary
- generally performing all the day-to-day management required to keep the platform operational
To scale, you must upgrade or add more servers, and then expand your database onto these servers. This can be a formidable task that requires you to take a database offline while the operation is performed.

advantages and disadvantages of Azure Table Storage

The primary advantages of using Azure Table Storage tables over other ways of storing data include: - It's simpler to scale. It takes the same time to insert data in an empty table, or a table with billions of entries. An Azure storage account can hold up to 500 TB of data. - A table can hold semi-structured data - There's no need to map and maintain the complex relationships typically required by a normalized relational database. - Row insertion is fast - Data retrieval is fast, if you specify the partition and row keys as query criteria There are disadvantages to storing data this way though, including: - Consistency needs to be given consideration as transactional updates across multiple entities aren't guaranteed - There's no referential integrity; any relationships between rows need to be maintained externally to the table - It's difficult to filter and sort on non-key data. Queries that search based on non-key fields could result in full table scans

What provides the processing engine for Databricks and how does it work?

The processing engine is provided by Apache Spark. Spark is a parallel-processing engine that supports large-scale analytics. You write application code that consumes data from one or more sources, and merge, reformat, filter, and remodel this data, and then store the results. Spark distributes the work across a cluster of computers. Each computer can process its data in parallel with the other computers. The strategy helps to reduce the time required to perform the work. Spark is designed to handle massive quantities of data. You can write the Spark application code using several languages, such as Python, R, Scala, Java, and SQL. Spark has a number of libraries for these languages, providing complex analytical routines that have been optimized for the clustered environment. These libraries include modules for machine learning, statistical analysis, linear and non-linear modeling, predictive analytics, and graphics.

What is the purpose of Azure Data Factory?

The purpose of Azure Data Factory is to retrieve data from one or more data sources, and convert it into a format that you can process. The data sources might present data in different ways, and contain noise that you need to filter out. Azure Data Factory enables you to extract the interesting data, and discard the rest. The interesting data might not be in a suitable format for processing by the other services in your warehouse solution, so you can transform it. For example, your data might contain dates and times formatted in different ways in different data sources. You can use Azure Data Factory to transform these items into a single uniform structure. Azure Data Factory can then write the ingested data to a data store for subsequent processing.

Why ingested data needs to be transformed or processed.

The raw data might not be in a format that is suitable for querying. The data might contain anomalies that should be filtered out, or it may require transforming in some way. For example, dates or addresses might need to be converted into a standard format. After data is ingested into a data repository, you may want to:
- do some cleaning operations
- remove any questionable or invalid data
- perform some aggregations, such as calculating profit, margin, and other key performance indicators (KPIs)

Azure Database for PostgreSQL single-server

The single-server deployment option for PostgreSQL provides similar benefits as Azure Database for MySQL. You choose from three pricing tiers: Basic, General Purpose, and Memory Optimized. Each tier supports different numbers of CPUs, memory, and storage sizes—you select one based on the load you expect to support.

when it may be better to use non-relational repositories that can store data in its original format, but that allow fast storage and retrieval access to this data

The structure of the data might be too varied to easily model as a set of relational tables. For example, the data might contain items such as video, audio, images, temporal information, large volumes of free text, encrypted information, or other types of data that aren't inherently relational. Additionally, the data processing requirements might not be best suited by attempting to convert this data into the relational format.

TiB

The tebibyte is a multiple of the unit byte for digital information. It is a member of the set of units with binary prefixes defined by the International Electrotechnical Commission (IEC). Its unit symbol is TiB. The prefix tebi (symbol Ti) represents multiplication by 2^40 (that is, 1024^4), therefore: 1 tebibyte = 2^40 bytes = 1,099,511,627,776 bytes = 1024 gibibytes, and 1024 TiB = 1 pebibyte (PiB). The tebibyte is closely related to the terabyte (TB), which is defined as 10^12 bytes = 1,000,000,000,000 bytes.

Describe big data and the capabilities of systems that must process it.

The term big data refers to data that is too large or complex for traditional database systems. Systems that process big data have to perform rapid data ingestion and processing; they must have capacity to store the results, and sufficient compute power to perform analytics over these results. Another option is to analyze operational data in its original location. This strategy is known as hybrid transactional analytical processing (HTAP). You can perform this style of analysis over data held in repositories such as Azure Cosmos DB using Azure Synapse Link.

Give examples of up-to-the-second data.

The up-to-the-second data might be used to help monitor real-time, critical manufacturing processes, where an instant decision is required. Other examples include streams of stock market data, where the current prices are required to make informed split-second buy or sell decisions.

Features of on-premises PostgreSQL not available in Azure Database for PostgreSQL

These features are mainly concerned with the extensions that users can add to a database to perform specialized tasks, such as writing stored procedures in various programming languages (other than pgsql, which is available), and interacting directly with the operating system. A core set of the most frequently used extensions is supported, and the list of available extensions is under continuous review.

tools you can use to connect to a PostgreSQL database and run queries.

These tools include the pgAdmin graphical user interface and the psql command-line utility. There are also a large number of third-party utilities you can use.

how relational databases handle concurrency

They need to manage concurrent users possibly attempting to access and modify the same data at the same time, processing the transactions in isolation while keeping the database consistent and recoverable. Many systems implement relational consistency and isolation by applying locks to data when it is updated. The lock prevents another process from reading the data until the lock is released. The lock is only released when the transaction commits or rolls back. Extensive locking can lead to poor performance, as applications wait for locks to be released.
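
As a minimal Transact-SQL sketch (the Accounts table is hypothetical), the statements inside a transaction either all commit or all roll back, and the rows they touch stay locked until then:

    BEGIN TRANSACTION;

    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;  -- rows locked here
    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;

    COMMIT TRANSACTION;  -- locks released; ROLLBACK would release them too

Under the default isolation level, other sessions that try to read or modify the locked rows are blocked until the commit or rollback completes.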

describe how a database index is like an index in a book

Think of an index over a table like an index at the back of a book. A book index contains a sorted set of references, with the pages on which each reference occurs. When you want to find a reference to an item in the book, you look it up through the index. You can use the page numbers in the index to go directly to the correct pages in the book. Without an index, you might have to read through the entire book to find the references you're looking for.

What does using a cluster of servers within a single region help achieve?

This approach helps to improve scalability and availability.

In CosmosDB, you designate one of the fields in your documents as the partition key. Why should you select a partition key that collects all related documents together?

This approach helps to reduce the amount of I/O (disk reads) that queries might need to perform when retrieving a set of documents for a given entity. For example, in a document database for an ecommerce system recording the details of customers and the orders they've placed, you could partition the data by customer ID, and store the customer and order details for each customer in the same partition. To find all the information and orders for a customer, you simply need to query that single partition: (see image)
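
As a minimal sketch of such a query using the SQL API (the container alias c, field name, and value are hypothetical), retrieving everything in one customer's partition looks like this:

    SELECT *
    FROM c
    WHERE c.customerId = 'C123'

Because customerId is the partition key here, Cosmos DB can route the query to a single partition instead of fanning it out across all of them.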

Configure connectivity to virtual networks and on-premises computers

To restrict connectivity, use the Firewalls and virtual networks page for a service, and choose Selected networks. Three further sections appear, labeled Virtual network, Firewall, and Exceptions. In the Virtual networks section, you can specify which virtual networks are allowed to route traffic to the service. When you create items such as web applications and virtual machines, you can add them to a virtual network. If these applications and virtual machines require access to your resource, add the virtual network containing these items to the list of allowed networks. If you need to connect to the service from an on-premises computer, in the Firewall section, add the IP address of the computer. This setting creates a firewall rule that allows traffic from that address to reach the service. The Exceptions setting allows you to enable access from any other of your services created in your Azure subscription.

To use your own encryption keys, what do you do?

To use your own keys, add them to Azure Key Vault. You then provide the details of the vault and key, or the URI of the key in the vault. All new data will be encrypted as it's written. Existing data will be encrypted using a process running in the background; this process may take a little time.

dialect of SQL that Microsoft uses

Transact-SQL (T-SQL)

popular dialects of SQL

- Transact-SQL (T-SQL). This version of SQL is used by Microsoft SQL Server and Azure SQL Database.
- pgSQL. This is the dialect, with extensions, implemented in PostgreSQL.
- PL/SQL. This is the dialect used by Oracle. PL/SQL stands for Procedural Language/SQL.

When to use SQL pools in Azure Synapse Analytics

Use SQL pools in Synapse Analytics for the following scenarios:
- Complex reporting. You can use the full power of Transact-SQL to run complex SQL statements that summarize and aggregate data.
- Data ingestion. PolyBase enables you to retrieve data from many external sources and convert it into a tabular format. You can reformat this data and save it as tables and materialized views in Azure Synapse.

how you add read replicas in Azure Database for PostgreSQL

Use the Replication page for a PostgreSQL server in the Azure portal

How do you generate Shared Access Signature (SAS) tokens?

Use the Shared access signature page in the Azure portal to generate SAS tokens. You specify the permissions (you could provide read-only access to a blob, for example), the period for which the SAS token is valid, and the IP address range of computers allowed to use the SAS token. The SAS token is encrypted using one of the access keys; you specify which key to use (key1 or key2).

these two things are used to restrict network communications by source and destination networks, protocols, and port numbers.

Virtual local area networks (VLANs) and access control lists (ACLs)

describe the data model in the image here

What do the columns marked PK mean? They are the primary key for the table.
What does the primary key indicate? The column (or combination of columns) that uniquely identifies each row.
What should every table have? A primary key.
What does the diagram show between tables? The relationships.
What do the lines connecting the tables indicate? The type of relationship.
What is the relationship from customers to orders? 1-to-many (one customer can place many orders, but each order is for a single customer).
What is the relationship between orders and products? Many-to-1 (several orders might be for the same product).
What are the columns marked FK? Foreign key columns.
What do foreign key columns reference? They reference, or link to, the primary key of another table, and are used to maintain the relationships between tables.
What does a foreign key also do? A foreign key also helps to identify and prevent anomalies, such as orders for customers that don't exist in the Customers table.
What are the foreign keys in this model? The Customer ID and Product ID columns in the Orders table, which link to the customer that placed the order and the product that was ordered.
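A minimal T-SQL sketch of this kind of model follows; the table and column names are hypothetical, chosen to mirror the description above:
CREATE TABLE Customers (
    CustomerID   int         NOT NULL PRIMARY KEY,  -- PK: uniquely identifies each row
    CustomerName varchar(50) NOT NULL
);
CREATE TABLE Products (
    ProductID   int         NOT NULL PRIMARY KEY,
    ProductName varchar(50) NOT NULL
);
CREATE TABLE Orders (
    OrderID    int NOT NULL PRIMARY KEY,
    -- FKs: each order links to one customer and one product; the database
    -- rejects an order that references a customer or product that doesn't exist.
    CustomerID int NOT NULL FOREIGN KEY REFERENCES Customers (CustomerID),
    ProductID  int NOT NULL FOREIGN KEY REFERENCES Products (ProductID)
);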

SQL pool

When you use Synapse SQL, your analytics workload runs using a SQL pool. In a SQL pool, the Control and Compute nodes in the cluster run a version of Azure SQL Database that supports distributed queries. You define your logic using Transact-SQL statements. You send your Transact-SQL statements to the control node, which splits up the work into queries that operate over a subset of the data, and then sends these smaller queries to the compute nodes. The data is split into chunks called distributions. A distribution is the basic unit of storage and processing for parallel queries that run on distributed data. Each of the smaller queries runs on one of the data distributions. The control and compute nodes use the Data Movement Service (DMS) to move data across the nodes as necessary to run queries in parallel and return accurate results.

Power BI dashboard

When you're ready to share a single page from a report, or a collection of visualizations, you create a dashboard. Much like the dashboard in a car, a Power BI dashboard is a collection of visuals from a single page that you can share with others. Often, it's a selected group of visuals that provide quick insight into the data or story you're trying to present. A dashboard must fit on a single page, often called a canvas (the canvas is the blank backdrop in Power BI Desktop or the service, where you put visualizations). Think of it like the canvas that an artist or painter uses—a workspace where you create, combine, and rework interesting and compelling visuals. You can share dashboards with other users or groups, who can then interact with your dashboards when they're in the Power BI service or on their mobile device.

How you add role assignments to a resource

You add role assignments to a resource in the Azure portal using the Access control (IAM) page. The Role assignments tab enables you to associate a role with a security principal, defining the level of access the role has to the resource.

How is PostgreSQL extensible?

You can add code modules to the database, which can be run by queries.

Update data in the Power BI service

You can also choose to update the dataset for an app, or other data that you use in Power BI. To set update settings, select the schedule update icon for the dataset to update, and then use the menu that appears. You can also select the update icon (the circle with an arrow) next to the schedule update icon to update the dataset immediately.

Single Database serverless configuration

You can also specify a serverless configuration. In this configuration, Microsoft creates its own server, which might be shared by a number of databases belonging to other Azure subscribers. Microsoft ensures the privacy of your database. Your database automatically scales, and resources are allocated or deallocated as required.

how Azure Table Storage helps to protect your data.

You can configure security and role-based access control to ensure that only the people or applications that need to see your data can actually retrieve it.

two ways to run a pipeline in Azure Data Factory

You can run a pipeline manually, or you can arrange for it to be run later using a trigger. A trigger enables you to schedule a pipeline to occur according to a planned schedule (every Saturday evening, for example), or at repeated intervals (every few minutes or hours), or when an event occurs such as the arrival of a file in Azure Data Lake Storage, or the deletion of a blob in Azure Blob Storage.

Single Database scaling options

You can scale the database if you need additional storage space, memory, or processing power. By default, resources are pre-allocated, and you're charged per hour for the resources you've requested

What are the two ways to process data in Azure Synapse Analytics?

You can select between two technologies to process data:
- Transact-SQL. This is the same dialect of SQL used by Azure SQL Database, with some extensions for reading data from external sources, such as databases, files, and Azure Data Lake storage. You can use these extensions to load data quickly, generate aggregations and other analytics, create tables and views, and store information using these tables and views. You can use the results for later reporting and processing.
- Spark. This is the same open-source technology used to power Azure Databricks. You write your analytical code using notebooks, in a programming language such as C#, Scala, Python, or SQL. The Spark libraries provided with Azure Synapse Analytics enable you to read data from external sources, and also write out data in a variety of different formats if you need to save your results for further analysis.

Ways to enter connection credentials for Azure SQL Database

You can set the Authorization type to SQL Server authentication and enter the user name and password that you set up when you created the database. Or you can select Active Directory password authentication and provide the credentials of an authorized user in Azure Active Directory. If Active Directory single sign-on is enabled, you can connect by using your Azure identity

Describe the RU per second (RU/s) used when provisioning Azure CosmosDB

You can think of a request unit as the amount of computation and I/O resources required to satisfy a simple read request made to the database. Microsoft gives a measure of approximately one RU as the resources required to read a 1-KB document with 10 fields, so a throughput of one RU per second (RU/s) will support an application that reads a single 1-KB document each second.
You can specify how many RU/s of throughput you require when you create a database or when you create individual containers in a database. If you specify throughput for a database, all the containers in that database share that throughput. If you specify throughput for a container, the container gets that throughput all to itself.
If you underprovision (by specifying too few RU/s), Cosmos DB will start throttling performance. Throttled requests are asked to retry later, when resources may be available to satisfy them. If an application makes too many attempts to retry a throttled request, the request could be aborted.
The minimum throughput you can allocate to a database or container is 400 RU/s. You can increase and decrease the RU/s for a container at any time. Allocating more RU/s increases the cost. However, once you allocate throughput to a database or container, you'll be charged for the resources provisioned, whether you use them or not.
Note: If you applied the Free Tier Discount to your Cosmos DB account, you get the first 400 RU/s for a single database or container for free. 400 RU/s is enough capacity for most small to moderate databases.

SQL Server in VM use case: dev environment

You can use SQL Server in a virtual machine to develop and test traditional SQL Server applications. With a virtual machine, you have the full administrative rights over the DBMS and operating system. It's a perfect choice when an organization already has IT resources available to maintain the virtual machines. These capabilities enable you to:
- Create rapid development and test scenarios when you do not want to buy on-premises non-production SQL Server hardware.
- Become lift-and-shift ready for existing applications that require fast migration to the cloud with minimal changes or no changes.
- Scale up the platform on which SQL Server is running, by allocating more memory, CPU power, and disk space to the virtual machine. You can quickly resize an Azure virtual machine without the requirement that you reinstall the software that is running on it.

to quit psql.

You can use the \q command

tools available to connect to MySQL that enable you to create and run scripts of SQL commands

You can use the mysql command-line utility, which is also available in the Azure Cloud Shell, or you can use graphical tools from the desktop such as MySQL Workbench. Currently there are no extensions available for connecting to MySQL from Azure Data Studio.

when you can create a new SQL database in Azure Data Studio

You can't create new SQL databases from a connection in Azure Data Studio if you're running SQL Database single database or elastic pools. You can only create new databases in this way if you're using SQL Database managed instance.

How to run a query against an Azure SQL Database from the Azure portal

You enter your SQL query in the query pane and then click Run to execute it. Any rows that are returned appear in the Results pane. The Messages pane displays information such as the number of rows returned, or any errors that occurred:

What you might be looking for when querying analytical data.

You may be looking for trends, or attempting to determine the cause of problems in your systems.

how to connect to a database using sqlcmd

You specify parameters that identify the server, database, and your credentials:
sqlcmd -S <server>.database.windows.net -d <database> -U <username> -P <password>

characters surround identifiers, such as the name of a table, database, column, or data type

[ and ]
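For example, a small T-SQL sketch with hypothetical names; the brackets let you use identifiers that contain spaces or clash with reserved words:
SELECT [Order ID], [Order Date]
FROM [Customer Orders]
WHERE [Order Date] >= '2020-01-01';  -- brackets quote the space in each name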

analytical processing diagram

[see image]

What is Azure Database for PostgreSQL?

a PaaS implementation of PostgreSQL in the Azure Cloud. This service provides the same availability, performance, scaling, security, and administrative benefits as the MySQL service.

Within a single region, Cosmos DB uses what?

a cluster of servers.

what Power BI is

a collection of software services, apps, and connectors that work together to turn your unrelated sources of data into coherent, visually immersive, and interactive insights

What type of database is PostgreSQL?

a hybrid relational-object database. You can store data in relational tables, but a PostgreSQL database also enables you to store custom data types, with their own non-relational properties

you combine the data from multiple tables using this

a join operation
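For example, a minimal T-SQL join over hypothetical Customers and Orders tables, matching each order with the customer who placed it:
SELECT c.CustomerName, o.OrderID
FROM Customers AS c
INNER JOIN Orders AS o
    ON o.CustomerID = c.CustomerID;  -- the join condition pairs rows across tables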

SQL database server

a logical construct that acts as a central administrative point for multiple single or pooled databases, logins, firewall rules, auditing rules, threat detection policies, and failover groups.

Azure Synapse Analytics leverages what processing architecture?

a massively parallel processing (MPP) architecture. This architecture includes a control node and a pool of compute nodes.

things data engineers must understand

a range of tools that enable you to create well-designed databases, optimized for the business processes that will be run
an understanding of:
- the architecture of the database management system
- the platform on which the system runs
- the business requirements for the data being stored in the database
SQL skills:
- create databases, tables, indexes, views, and the other objects required by the database
- interact with a database from the command line, e.g. use the sqlcmd utility to connect to Microsoft SQL Server and Azure SQL Database, and run ad-hoc queries and commands
As a SQL Server professional, your primary data manipulation tool might be Transact-SQL. As a data engineer you might use additional technologies, such as Azure Databricks and Azure HDInsight, to generate and test predictive models. If you're working in the non-relational field, you might use Azure Cosmos DB as your primary data store. To manipulate and query the data, you might use languages such as HiveQL, R, or Python.

What is an Azure Virtual Network

a representation of your own network in the cloud. A virtual network enables you to connect virtual machines and Azure services together, in much the same way that you might use a physical network on-premises. Azure ensures that each virtual network is isolated from other virtual networks created by other users, and from the Internet. Azure enables you to specify which machines (real and virtual), and services, are allowed to access resources on the virtual network, and which ports they can use.

what is T-SQL

a set of programming extensions from Microsoft that adds several features to the Structured Query Language (SQL), including transaction control, exception and error handling, row processing, and declared variables.
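A small sketch of two of those extensions, declared variables and error handling; the table and column names are hypothetical:
DECLARE @name varchar(50);  -- declared variable
BEGIN TRY
    SELECT @name = CustomerName FROM Customers WHERE CustomerID = 1;
    PRINT 'Found: ' + @name;
END TRY
BEGIN CATCH
    PRINT 'Lookup failed: ' + ERROR_MESSAGE();  -- exception and error handling
END CATCH;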

How did MySQL start out as

a simple-to-use open-source DBMS

what the relational model provides

a standard way of representing and querying data that could be used by any application

The overarching principle for network security of the Azure SQL Database offering

allow only the connection and communication that is necessary to allow the service to operate. All other ports, protocols, and connections are blocked by default. Virtual local area networks (VLANs) and access control lists (ACLs) are used to restrict network communications by source and destination networks, protocols, and port numbers.

where data might be held

an Excel spreadsheet, or in a collection of cloud-based and on-premises databases, or some other set of data sources

costs of indexes

an index might consume additional storage space, and each time you insert, update, or delete data in a table, the indexes for that table must be maintained. This additional work can slow down insert, update, and delete operations, and incur additional processing charges.
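For example, a minimal T-SQL sketch with hypothetical names:
-- Speeds up queries that filter on CustomerID, at the cost of extra storage
-- and extra maintenance work on every INSERT, UPDATE, and DELETE against Orders.
CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID);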

Hadoop

an open source framework that breaks large data processing problems down into smaller chunks and distributes them across a cluster of servers, similar to the way in which Synapse Analytics operates.

To what does the mathematical term "relation" refer?

an organized set of data held as a table

what a relational database is useful for storing

any information containing related data elements that must be organized in a rules-based, consistent way

when you can use a relational database

any time you can easily model your data as a collection of tables with a fixed set of columns

how you define the work performed by Azure Data Factory

as a pipeline of operations. A pipeline can run continuously, as data is received from the various data sources. You can create pipelines using the graphical user interface provided by Microsoft, or by writing your own code. The image below shows the pipeline editor in Azure Data Factory.

Replication is ______________, so there's likely to be a lag between a change made in one region, and that change becoming visible in other regions.

asynchronous

how replicas are updated in Azure Database for PostgreSQL

asynchronously, so there may be some lag between records being written at the master and becoming available across all replicas

benefits of the Elastic Pool option in Azure SQL Database

- automatic updates and patches
- scalability without costly manual upgrade
- high availability (99.99%)
- point-in-time restore
- replication to different regions (assurance and disaster recovery)
- advanced threat protection
- auditing
- encryption

Why is a clustered index even better than a regular one?

because the relational database management system doesn't have to follow references from the index to find the corresponding data in the underlying table

why relational databases are well-suited for OLTP systems

because they naturally support insert, update, and delete operations. A relational database can often be tuned to make these operations fast. Also, the nature of SQL makes it easy for users to perform ad-hoc queries over data.

what you model in a relational database

collections of entities from the real world as tables

two key aspects of the data analyst role

communication and visualization

The process of combining all of the local data sources is known as what?

data warehousing.

three key job roles dealing with data

- database administrators
- data engineers
- data analysts

why there are a variety of dialects of SQL

database vendors include their own proprietary extensions that are not part of the standard

how you control traffic flow through these items

by defining firewall rules

problems associated with storing each application's data in its own unique structure in the early days of databases

developers had to know a lot about the particular data structure to find the data they needed. data structures were:
- inefficient
- hard to maintain
- hard to optimize for delivering good application performance

Give examples of semi-structured data

- documents held in JavaScript Object Notation (JSON) format
- key-value stores
- graph databases

Describe the documents and fields in an Azure CosmosDB database.

each document can vary, and a field can contain child documents.

examples of instances of entities

each row in a customers table contains the data for a single customer, each row in a products table defines a single product, and each row in an orders table represents an order made by a customer.

Managed Instance in Azure SQL Database

effectively runs a fully controllable instance of SQL Server in the cloud. features:
- install multiple databases on a single instance
- automates backups, software patching, database monitoring, and other general tasks
- full control over security and resource allocation
- all communications are encrypted and signed using certificates, verified through certificate revocation lists

describe the atomicity property of transactions

either all operations in the sequence must be completed successfully, or if something goes wrong, all operations run so far in the sequence must be undone. Each database transaction has a defined beginning point, followed by steps to modify the data within the database. At the end, the database either commits the changes to make them permanent, or rolls back the changes to the starting point, when the transaction can be tried again
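A minimal T-SQL sketch of an atomic transfer between two hypothetical accounts (the table and values are illustrative):
BEGIN TRY
    BEGIN TRANSACTION;
    -- Both steps succeed together or fail together.
    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
    COMMIT TRANSACTION;   -- make both changes permanent
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION; -- something went wrong: undo all operations run so far
END CATCH;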

Azure File Storage

enables you to create files shares in the cloud, and access these file shares from anywhere with an internet connection.

Consistency

ensures that a transaction can only take the data in the database from one valid state to another. It should never lose or create data in a manner that can't be accounted for. In the bank transfer example described earlier, if you add funds to an account, there must be a corresponding deduction of funds somewhere, or a record that describes where the funds have come from if they have been received externally. You can't suddenly create (or lose) money.

what a skilled data analyst does

explores the data and uses it to determine trends and issues, and to gain other insights that might be of benefit to the company.

bundle multiple related SQL objects together in a single package that can be loaded or removed from your database with a single command.

extensions

used to provide the ability to extend the functionality of your Azure PostgreSQL database

extensions

"lift-and-shift" migration (image)

fast migration to the cloud with minimal changes The term lift-and-shift refers to the way in which you can move a database directly from an on-premises server to an Azure virtual machine without requiring that you make any changes to it. Applications that previously connected to the on-premises database can be quickly reconfigured to connect to the database running on the virtual machine, but should otherwise remain unchanged.

What are the features and limitations of Azure Database for MySQL?

features:
- high availability at no additional cost
- scalability as required, without the need to manage hardware, network components, virtual servers, software patches, and other underlying components
- you only pay for what you use
- automated backups
- point-in-time restore
- connection security to enforce firewall rules, and require SSL connections
- configurable lock modes, maximum number of connections, and timeouts
- global database
limitations:
- certain operations aren't available, primarily concerned with security and administration. Azure manages these aspects of the database server itself.

Examples of transactions

- financial, such as the movement of money between accounts in a banking system
- part of a retail system, tracking payments for goods and services from customers

Don't use Azure File Storage in what case?

for files that can be written by multiple concurrent processes simultaneously. Multiple writers require careful synchronization, otherwise the changes made by one process can be overwritten by another. The alternative solution is to lock the file as it is written, and then release the lock when the write operation is complete. However, this approach can severely impact concurrency and limit performance.

T-SQL proprietary extensions

for writing stored procedures and triggers (application code that can be stored in the database), and managing user accounts.

where you run SQL statements

from tools and utilities that connect to the appropriate database. The tooling available depends on the database management system you're using.

What does historical data provide to a business?

gives a business a more stabilized view of trends in performance. A manufacturing organization will require information such as the volumes of sales by product across a month, a quarter, or a year, to determine whether to continue producing various items, or whether to increase or decrease production according to seasonal fluctuations. This historical data can be generated by batch processes at regular intervals, based on the live sales data that might be captured continually.

storage option for social networking sites that store data about millions of users, along with photographs and other information about those users and others

graph database

When to use the Azure command-line interface (CLI) to provision resources

if you need to automate service creation; you can store CLI commands in scripts, and you can run these scripts programmatically

when to use the Managed Instance option in Azure SQL Database

if you want to lift-and-shift an on-premises SQL Server instance and all its databases to the cloud, without incurring the management overhead of running SQL Server on a virtual machine. SQL Database managed instance provides features not available with the Single Database or Elastic Pool options. If your system uses features such as linked servers, Service Broker (a message processing system that can be used to distribute work across servers), or Database Mail (which enables your database to send email messages to users), then you should use managed instance. To check compatibility with an existing on-premises system, you can install Data Migration Assistant (DMA). This tool analyzes your databases on SQL Server and reports any issues that could block migration to a managed instance.

Advanced security

implements threat protection and assessment. Threat protection adds security intelligence to your service. This intelligence monitors the service and detects unusual patterns of activity that could be harmful, or compromise the data managed by the service. Assessment identifies potential security vulnerabilities and recommends actions to mitigate them. You're charged an additional fee for this feature. The image below shows the Advanced security page for Azure storage. The corresponding page for other non-relational services, such as Cosmos DB, is similar.

what read replicas help with

improve the performance and scale of read-intensive workloads. Read workloads can be isolated to the replicas, while write workloads can be directed to the master. Note: they don't directly reduce the burden of write operations on the master. This feature isn't targeted at write-intensive workloads.

Where is structured data typically stored?

in a relational database such as SQL Server or Azure SQL Database

In a customer and address example, when might a document database be at a disadvantage over a relational database?

in a relational database you would only need to store the address information once. In the diagram below, Jay and Frances Adams both share the same address. In a document database, the address would be duplicated in the documents for Jay and Frances Adams. This duplication not only increases the storage required, but can also make maintenance more complex (if the address changes, you must modify it in two documents).

how you might you store customer and address information in a non-relational document database

in a single document

ways in which Azure Table Storage provides high-availability guarantees

in a single region. The data for each table is replicated three times within an Azure region. For increased availability, but at additional cost, you can create tables in geo-redundant storage. In this case, the data for each table is replicated a further three times in another region several hundred miles away. If a replica in the local region becomes unavailable, Azure will transparently switch to a working replica while the failed replica is recovered. If an entire region is hit by an outage, your tables are safe in a remote region, and you can quickly switch your application to connect to that remote region.

How You create Azure File storage

in a storage account.

What is the superficial similarity between a Cosmos DB container and a table in Azure Table storage?

in both cases, data is partitioned and documents (rows in a table) are identified by a unique ID within a partition.

where every application stored its data in the early days of databases

in its own unique structure

when streaming data is most beneficial

in most scenarios where new, dynamic data is generated on a continual basis, and for time-critical operations that require an instant, real-time response.

where the psql utility is available

in the Azure Cloud Shell. You can also run it from a command prompt on your desktop computer, but you must download and install the psql client.

type of visual representations of data

include charts, graphs, infographics, and other pictorial diagrams

how you connect data sources together and define queries that combine, filter, and aggregate data in Azure Analysis Services

includes a graphical designer

Microsoft Azure supports what non-relational data services,

including Azure File storage, Azure Blob storage, Azure Data Lake Store, and Azure Cosmos DB

name two structures that help to optimize data organization in a relational database

- indexes
- views

semi-structured data

information that doesn't reside in a relational database but still has some structure to it

What do Databricks workspaces enable?

interactive workspaces that enable collaboration between data scientists, data engineers, and business analysts.

How Azure Synapse Analytics enables you to repeatedly query the same data without the overhead of fetching and converting it each time

it enables you to store the data you have read in and processed locally, within the service. This approach enables you to repeatedly query the same data without the overhead of fetching and converting it each time. You can also use this data as input to further analytical processing, using Azure Analysis Services.

What is a key feature of PostgreSQL?

it has the ability to store and manipulate geometric data, such as lines, circles, and polygons

compare Azure Table Storage with the relational model

items are referred to as rows, and fields are known as columns. However, don't let this terminology confuse you by thinking that an Azure Table Storage table is like a table in a relational database. An Azure table enables you to store semi-structured data.
All rows in a table must have a key, but apart from that the columns in each row can vary. Unlike traditional relational databases, Azure Table Storage tables have no concept of relationships, stored procedures, secondary indexes, or foreign keys. Data will usually be denormalized, with each row holding the entire data for a logical entity.
For example, a table holding customer information might store the forename, last name, one or more telephone numbers, and one or more addresses for each customer. The number of fields in each row can be different, depending on the number of telephone numbers and addresses for each customer, and the details recorded for each address. In a relational database, this information would be split across multiple rows in several tables.
In this example, using Azure Table Storage provides much faster access to the details of a customer because the data is available in a single row, without requiring that you perform joins across relationships.

languages you can use to create Databricks scripts and query data

languages such as R, Python, and Scala.

categories of activities related to data

- managing
- controlling
- using

requirements of hosting an RDBMS in the cloud

many operations can be handled for you by the data center staff, in many cases with no (or minimal) downtime

what IaaS is best for

migrations and applications requiring operating system-level access

example of analytical report

monthly sales

sometimes you have to split an entity into what

more than one table

Migrating from the system running on-premises to an Azure virtual machine is no different than what?

moving the databases from one on-premises server to another

where you go to manage add role assignments in Azure RBAC

navigate to the Resource Group >> the Access control (IAM) page >> the Role assignments tab, which enables you to associate a role with a security principal, defining the level of access the role has to the resource.

unstructured data

nonnumeric information that is typically formatted in a way that is meant for human eyes and not easily understood by computers. For example, audio and video files, and binary data files might not have a specific structure

the name for splitting tables out into groups of separate columns

normalization
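As an illustration, a hedged T-SQL sketch with hypothetical tables: the address columns are split out of a wide customer table into their own table, linked back by a key.
CREATE TABLE Addresses (
    AddressID int          NOT NULL PRIMARY KEY,
    Street    varchar(100) NOT NULL,
    City      varchar(50)  NOT NULL
);
CREATE TABLE Customers (
    CustomerID   int         NOT NULL PRIMARY KEY,
    CustomerName varchar(50) NOT NULL,
    AddressID    int         NOT NULL FOREIGN KEY REFERENCES Addresses (AddressID)
);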

You write your Spark code using what?

notebooks. A notebook contains cells, each of which contains a separate block of code. When you run a notebook, the code in each cell is passed to Spark in turn for execution. The image below shows a cell in a workbook that runs a query and generates a graph.

what Azure Data Studio (currently) connects to

- on-premises SQL Server databases
- Azure SQL Database
- PostgreSQL
- Azure SQL Data Warehouse
- SQL Server Big Data Clusters
- ...others
You can also download and install extensions from third-party developers that connect to other systems, or provide wizards that help to automate many administrative tasks.

number of records supported in a standard SQL insert statement

one. Note: Some dialects, including T-SQL, allow you to specify multiple VALUES clauses to add several rows at a time, as shown in the sketch below:
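A sketch of such a multi-row insert in T-SQL (hypothetical table and values):
INSERT INTO Products (ProductID, ProductName)
VALUES
    (1, 'Widget'),
    (2, 'Gadget'),
    (3, 'Sprocket');  -- three rows added in a single statement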

SQL Server in VM use case: hybrid deployment (image)

optimized for migrating existing applications to Azure, or extending existing on-premises applications to the cloud in hybrid deployments where part of the operation runs on-premises, and part in the cloud. Your database might be part of a larger system that runs on-premises, although the database elements might be hosted in the cloud.

What is the query language for PostgreSQL?

pgSQL

What does a clustered index do?

physically reorganizes a table by the index key
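For example, a minimal T-SQL sketch (the index and table names are hypothetical):
-- Physically orders the rows of Orders by OrderDate. A table can have only
-- one clustered index, because its rows can be stored in only one order.
CREATE CLUSTERED INDEX IX_Orders_OrderDate ON Orders (OrderDate);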

default database name when you create an Azure Database for PostgreSQL server

postgres

the default management database created with Azure Database for PostgreSQL

postgres

what data visualization is key to doing

presenting large amounts of information in ways that are universally understandable or easy to interpret and spot patterns, trends, and correlations

on what do you define the relationship between two tables

primary and foreign keys

benefits of elastic pools

- provide a simple and cost-effective solution for managing the performance of multiple databases within a fixed budget
- provide compute and storage resources shared between all of the databases the pool contains
- databases within the pool only use the resources they need, when they need them, within configurable limits
- the price of a pool is based only on the amount of resources configured, and is independent of the number of databases it contains

why tables are so important in the relational database model

provide an intuitive, efficient, and flexible way to store and access structured information.

What you can do after analytical data has been ingested and transformed

query the data to analyze it

examples of information from a range of sources that might be stored in non-relational databases

real-time data monitoring the status of production line machinery, product quality control data, historical production logs, product volumes in stock, and raw materials inventory data

To ensure availability, Azure Blob storage provides this

redundancy.

compute

refers to the amount of processor power available, in terms of the size and number of CPUs allocated to the service.

what two categories of data are ingested using Azure Data Factory?

relational and non-relational

What type of database holds data that is represented by rows and columns?

relational database

what are PostgreSQL, MariaDB, and MySQL

relational database management systems tailored for different specializations

appeals of on-premises applications

reliable, secure, and allow enterprises to maintain close control.

enables you to bundle services and Azure resources together

resource groups. For example, you might create separate resource groups for each project on which you are working

What happens after provisioning is complete?

resources are assigned to you, and you're charged for them, until you delete the service

describe the Azure database administrator role

- responsible for the design, implementation, maintenance, and operational aspects of on-premises and cloud-based database solutions built on Azure data services and SQL Server
- ensures that data is available, protected from loss, corruption, or theft, and is easily accessible as needed
- responsible for the overall availability and consistent performance and optimizations of the database solutions
- works with stakeholders to implement policies, tools, and processes for backup and recovery plans to recover following a natural disaster or human-made error
- responsible for managing the security of the data in the database, granting privileges over the data, and granting or denying access to users as appropriate

What does Azure Data Factory do?

retrieve and format relational and non-relational data in a process known as ingestion. The formatted data is written to Azure Data Lake storage

How data retrieval of customer and address information is efficient in a document database

retrieving customer and address information is a matter of reading a single document

what "role" definition in Azure RBAC is often abbreviated as

role

what you create in Azure RBAC to control access to resources

role assignments

What two items implement network-based Access Control Lists (ACLs)?

- routers
- load balancers

how items in the same Azure Table Storage partition are stored

row key order. If an application adds a new row to a table, Azure ensures that the row is placed in the correct position in the table. In the example below, taken from an IoT scenario, the row key is a date and time value.

Some tools and applications require a connection string that identifies what

server, database, account name, and password

key-value store

similar to a relational table, except that each row can have any number of columns.

the model that Azure Table Storage implements

the NoSQL key-value model

where to configure parameters for Azure Database for MySQL

the Server parameters page in the Azure portal.

provisioning

the act of setting up an Azure service, such as a database server

how a range query works efficiently

the application searches for a set of rows in a partition, specifying the start and end point of the set as row keys. This type of query is also very quick, as long as you have designed your row keys according to the requirements of the queries performed by your application.

data processing

the conversion of raw data into meaningful information through a process

how data is stored in the NoSQL key-value model

the data for an item is stored as a set of fields, and the item is identified by a unique key.

what you need to connect to an Azure SQL Database

- the details of the server to connect to
- an Azure SQL Database account (a username and password) that has access to this server
- the name of the database to use on this server

Azure Data Lake Storage combines what two things?

the hierarchical directory structure and file system semantics of a traditional file system with security and scalability provided by Azure

To connect to a PostgreSQL database,

the name of the server, and the credentials for an account that has access rights to connect to the server

two elements of Azure Table Storage

the partition key that identifies the partition containing the row (as described above), and a row key that is unique to each row in the same partition

what "scope" limits in Azure RBAC

the set of resources that the access applies to. When you assign a role, you can further limit the actions allowed by defining this. This is helpful if, for example, you want to make someone a Website Contributor, but only for one resource group.

Any modern data warehouse solution must be able to provide access to what two types of data?

the streams of raw data, and the cooked business information derived from this data.

big data

the term used for large quantities of data collected in escalating volumes, at higher velocities, and in a greater variety of formats than ever before. It can be historical (meaning stored) or real time (meaning streamed from the source). Businesses typically depend on this to help make critical business decisions.

the chief strength of the relational database model

the use of tables

example of batch processing

the way that credit card companies handle billing. The customer doesn't receive a bill for each separate credit card purchase but one monthly bill for all of that month's purchases.

The default connectivity for Azure Cosmos DB and Azure Storage is to enable access to what?

the world at large. Although this level of access sounds risky, most Azure services mitigate this risk by requiring authentication before granting access.

Once a data storage service has been provisioned, you can then do what?

then configure the service to enable you to store and retrieve data, and to make it accessible to the users and applications that require it.

how roles (role definitions) in Azure RBAC can be named

they can be given high-level names, like owner, or specific names, like virtual machine reader

two elements of graph databases

they contain nodes (information about objects), and edges (information about the relationships between objects)

one of the main benefits of computer databases

they make it easy to store information so it's quick and easy to find

what configuration parameters are used for in Azure Database for PostgreSQL

they support fine-tuning of the database, and debugging of code in the database.

what advanced data security implements

threat protection and assessment

help to balance access latency and storage cost of blob storage

three access tiers

Each replica increases the cost of the Cosmos DB service. For example, if you replicate your account to two regions, your costs will be what?

three times that of a non-replicated account.

the name for the compute, storage, and backup level for an Azure SQL Database configuration.

tier (e.g. Basic, Standard, Premium)

The important part in Azure Table Storage design

to choose the partition and row keys carefully.

primary workload of relational database systems

to handle transactions

How Key Performance Indicators are used.

to measure growth and performance

Azure Synapse Analytics is used for what?

to read data from many sources, process this data, generate various analyses and models, and save the results.

What is the primary purpose of the Table, MongoDB, Cassandra, and Gremlin APIs for Azure CosmosDB?

to support existing applications. If you are building a new application and database, you should use the SQL API.

The simplest way to create a table in Azure Table Storage is

to use the Azure portal.

auditing

tracks database events and writes them to an audit log in your Azure storage account. benefits:
- helps maintain regulatory compliance
- understand database activity
- gain insight into discrepancies and anomalies that might indicate business concerns or suspected security violations

two broad categories of data processing solutions

- transaction processing systems
- analytical systems

what OLTP systems are focused on

transaction-oriented tasks that process a very large number of transactions per minute

what a transactional system records

transactions

How can you grant limited rights to resources in an Azure storage account for a specified time period?

use shared access signatures (SAS). This feature enables applications to access resources such as blobs and files, without requiring that they're authenticated first. You should only use SAS for data that you intend to make public.

graph databases

used to store and query information about complex relationships. it contains nodes (information about objects), and edges (information about the relationships between objects)

how you create an Azure Table Storage table

using an Azure storage account.

how you create blobs in Azure Blob Storage

using an Azure storage account, like Azure Table storage

two ways data can be ingested

using batch processing or streaming, depending on the nature of the data source.

How you write and run Spark code

using notebooks. A notebook is like a program that contains a series of steps (called cells). A notebook can contain cells that read data from one or more data sources, process the data, and write the results out to a data store. The scalability of Azure Databricks makes it an ideal platform for performing complex data ingestion and analytics tasks.

You can create Azure storage file shares how?

using the Azure portal. The following steps assume you've created the storage account described in unit 2.
1. In the Azure portal, on the hamburger menu, select Home.
2. On the home page, select Storage accounts.
3. On the Storage accounts page, select the storage account you created in unit 2.
4. On the Overview page for your storage account, select Storage Explorer.
5. On the Storage Explorer page, right-click FILE SHARES, and then select Create file share.
6. In the New file share dialog box, enter a name for your file share, leave Quota empty, and then select Create.
7. In the Storage Explorer window, expand FILE SHARES, select your new file share, and then select Upload. Tip: If your new file share doesn't appear, right-click FILE SHARES, and then select Refresh.
8. In the Upload files dialog box, use the files button to pick a file of your choice on your computer, and then select Upload.
9. When the upload has completed, close the Upload files dialog box, and verify that the file appears in the file share. Tip: If the file doesn't appear, right-click FILE SHARES, and then select Refresh.

Once you've created a storage account, how can you upload files to Azure File Storage

using the Azure portal or tools such as the AzCopy utility. You can also use the Azure File Sync service to synchronize locally cached copies of shared files with the data in Azure File Storage.

how to find the server name for an Azure SQL Database

using the Azure portal: go to the page for your database, and on the Overview page note the fully qualified server name in the Server name field.

storage options for a collection of music, video, or other media files

using unstructured storage, such as that available in Azure Blob storage

advanced threat protection

vulnerability assessments to:
- help detect and remediate potential security problems
- detect anomalous activities that indicate unusual and potentially harmful attempts to access or exploit your database
provides:
- continuous monitoring
- security alerts
- recommended actions on how to investigate and mitigate threats

what data analysts are responsible for understanding

what the data means

When using asynchronous replication to a cluster of servers in a region, how can data inconsistencies occur?

when data is written to one server, it might be read from another server before replication from the first server has occurred. In this case, an application can see old data.

when streaming data is processed

when it arrives

If you're provisioning Data Lake storage, when must you specify the appropriate configuration settings?

when you create the storage account. You can't configure Data Lake storage after the storage account has been set up.

when you cannot use the Elastic Pool or Single Database options in Azure SQL Database

when you need linked servers

can views join tables together?

yes
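For example, a minimal T-SQL sketch of a view that joins two hypothetical tables:
-- The view hides the join; users query it as if it were a single table.
CREATE VIEW CustomerOrders AS
SELECT c.CustomerName, o.OrderID, o.OrderDate
FROM Customers AS c
JOIN Orders AS o ON o.CustomerID = c.CustomerID;
GO
SELECT * FROM CustomerOrders WHERE CustomerName = 'Adams';  -- querying the view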

does T-SQL allow multi-row inserts

yes

Can you modify parameters you have provisioned after the service has been created?

yes, in many cases.

