DP-200: Implementing an Azure Data Solution

What is providing opportunities to perform different types of analysis on data?

the volume and variety of data

What is the role of consumers in data?

they both generate and use data

How does Azure Event Hubs provide authentication?

through a shared key

List some of the IoT capabilities

- Device-to-cloud messaging - Protocols: HTTPS, AMQP, AMQP over WebSockets - Protocols: MQTT, MQTT over WebSockets - Per-device identity - File upload from devices - Device Provisioning Service - Cloud-to-device messaging - Device twin and device management - IoT Edge

Where is unstructured data stored?

- as a file in Azure Blob storage, in Azure Data Lake for example - as NoSQL data in Azure Cosmos DB or Azure HDInsight

What are the specific learning objectives in the module "Survey the services on the Azure Data platform"?

- contrast structured data with nonstructured data - Explore common Azure data platform technologies and identify when to use them - List additional technologies that support the common data platform technologies

What should data professionals be able to explain to stakeholders?

- how the data landscape has changed - how roles and technologies are evolving - the key factors that are driving the changes - how an organization can benefit from embracing the changes

What does anyone working with data need to understand in order to generate value?

- how the landscape has changed - how roles and technologies are evolving

What data systems responsibilities must data engineers perform?

- maintain accuracy - security - high availability - compliance (e.g. General Data Protection Regulation, or industry standards such as PCI DSS (Payment Card Industry Data Security Standard)) - language or norms standards (local language and date format)

In a Stream Analytics job, what can you do after storing the data?

- run batch analytics in Azure HDInsight - send the output to a service like Event Hubs for consumption - use the Power BI streaming API to send the output to Power BI for real-time visualization

What three things does Azure Storage offer?

1) a very scalable object store for data objects and file system services in the cloud. 2) a messaging store for reliable messaging 3) it can act as a NoSQL store

How much faster is Spark than Hadoop? Why?

100 times because of the difference in storage (file system versus in-memory)

petabyte

2^50 bytes; 1,024 terabytes, or roughly a million gigabytes.

What uptime does Cosmos DB support?

99.999 percent (5 9s)

access control list (ACL)

A clearly defined list of permissions that specifies what actions an authenticated user may perform on a shared resource. An access control list (ACL) is a list of access control entries (ACE). Each ACE in an ACL identifies a trustee and specifies the access rights allowed, denied, or audited for that trustee. The security descriptor for a securable object can contain two types of ACLs: a DACL and a SACL.

What access control does Azure Data Lake support?

Azure Active Directory ACLs

What technologies both consume data and make decisions the way humans do?

AI & Machine Learning

How do security administrators control data access in Data Lake Storage?

Active Directory Security Groups

What range of data security and compliance features are provided in SQL Database?

- Advanced Threat Protection - SQL Database auditing - Data encryption - Azure Active Directory authentication - Multifactor authentication - Compliance certification

What is a sensor?

An electronic component that converts energy from one form into another, typically an electrical signal that can be understood by a computer.

What are three main functions of operating systems?

An operating system (OS) is system software that: - manages computer hardware - manages software resources - provides common services for computer programs.

antivirus software

Antivirus software was originally developed to detect and remove computer viruses, hence the name. However, with the proliferation of other kinds of malware, antivirus software started to provide protection from other computer threats. In particular, modern antivirus software can protect users from: - malicious browser helper objects (BHOs), - browser hijackers - ransomware - keyloggers - backdoors - rootkits - trojan horses - worms - malicious LSPs - dialers - fraud tools - adware - spyware. Some products also include protection from other computer threats, such as: - infected and malicious URLs - spam - scam and phishing attacks - online identity (privacy) - online banking attacks - social engineering techniques - advanced persistent threat (APT) - botnet DDoS attacks.

What is included in HDInsight?

- Apache Hadoop - Spark - Kafka - HBase - Storm - Interactive Query

What is Apache Hadoop?

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

Describe how you can build a data architecture that supports web and mobile applications.

As a data engineer, use the Azure Cosmos DB multimaster replication model to create a data architecture that supports web and mobile applications. Thanks to Microsoft performance commitments, these applications can achieve a response time of less than 10 ms anywhere in the world. By reducing the processing time of their websites, global organizations can increase customer satisfaction.

Describe how Azure reduces the complexity of building and deploying servers.

As a data engineer, you'll use a web user interface for simple deployments. For more complex deployments, you can create and automate powerful scripts. In less time than it takes you to read this module, you can set up a database that's globally distributed, sophisticated, and highly available. You spend less time setting up services, and you focus more on security and on deriving business value from your data.

What are the four configuration options for Azure Storage?

Azure Blob: A scalable object store for text and binary data Azure Files: Managed file shares for cloud or on-premises deployments Azure Queue: A messaging store for reliable messaging between application components Azure Table: A NoSQL store for no-schema storage of structured data

With ELT, data is immediately extracted and loaded into what large data repositories?

Azure Cosmos DB or Azure Data Lake Storage

In Azure Synapse Analytics, where can you use PolyBase to ingest and process data?

Azure Data Factory

What do you use to ingest data into your system?

Azure Data Factory, Storage Explorer, the AzCopy tool, PowerShell, or Visual Studio. If you use the File Upload feature to import file sizes above 2 GB, use PowerShell or Visual Studio. AzCopy supports a maximum file size of 1 TB and automatically splits data files that exceed 200 GB.

What is Azure Data Lake Storage?

Azure Data Lake Storage is a Hadoop-compatible data repository that can store any size or type of data.

HDInsight

Azure HDInsight is a managed, full-spectrum, open-source analytics service in the cloud for enterprises. You can use open-source frameworks such as Hadoop, Apache Spark, Apache Hive, LLAP, Apache Kafka, Apache Storm, R, and more. Azure HDInsight is a cloud distribution of Hadoop components. Azure HDInsight makes it easy, fast, and cost-effective to process massive amounts of data. You can use the most popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, R, and more. With these frameworks, you can enable a broad range of scenarios such as extract, transform, and load (ETL), data warehousing, machine learning, and IoT.

What tool provides a permissions model that uses role-based access control to set permissions and assign roles to users, groups, or applications?

Azure Resource Manager

What are common Microsoft data storage platforms?

- Azure SQL Database - Azure Synapse (formerly Azure SQL Data Warehouse) - SQL Server - SQL Server Analysis Services - others...

Which Azure service is a managed relational database service?

Azure SQL Database

What Azure technology do data engineers use to process streaming data and respond to data anomalies in real time?

Azure Stream Analytics

How does batch data get processed in Azure Stream Analytics?

Batch systems process groups of data that are stored in an Azure Blob store. They do this in a single job that runs at a predefined interval.

As a data engineer, why might you need to transform the data you extract?

Because the data source might have a different structure than the target destination, you'll transform the data from the source schema to the destination schema.

data at rest

Data at rest in information technology means inactive data that is stored physically in any digital form (e.g. databases, data warehouses, spreadsheets, archives, tapes, off-site backups, mobile devices etc.). Data at rest is subject to threats from hackers and other malicious threats.

What features in Azure IoT Hub enrich the relationship between your devices and your back-end systems?

Bidirectional communication capabilities mean that while you receive data from devices, you can also send commands and policies back to devices. Take advantage of this ability, for example, to update properties or invoke device management actions. Azure IoT Hub can also authenticate access between the IoT device and the IoT hub.

What are the built-in security groups in Active Directory Security Groups?

Built-in security groups include ReadOnlyUsers, WriteAccessUsers, and FullAccessUsers.

Give an example where Cosmos DB helps resolve a business problem.

Consider this example where Azure Cosmos DB helps resolve a business problem. Contoso is an e-commerce retailer based in Manchester, UK. The company sells children's toys. After reviewing Power BI reports, Contoso's managers notice a significant decrease in sales in Australia. Managers review customer service cases in Dynamics 365 and see many Australian customer complaints that their site's shopping cart is timing out. Contoso's network operations manager confirms the problem: the company's only datacenter is located in London, and the physical distance to Australia is causing delays. Contoso applies a solution that uses the Microsoft Australia East datacenter to provide a local version of the data to users in Australia. Contoso migrates their on-premises SQL Database to Azure Cosmos DB by using the SQL API. This solution improves performance for Australian users. The data can be stored in the UK and replicated to Australia to improve throughput times.

How performant is Azure Cosmos DB?

Currently, Azure Cosmos DB supports five-nines uptime (99.999 percent). It can support response times below 10 ms when it's provisioned correctly.

Give examples where you could use Azure Stream Analytics.

- Internet of Things (IoT) monitoring - web logs - remote patient monitoring - point of sale (POS) systems

In Azure Synapse Analytics, what service coordinates and transports data between compute nodes as necessary?

Data Movement Service (DMS)

What is descriptive analytics?

Descriptive analytics is a statistical method that is used to search and summarize historical data in order to identify patterns or meaning. For learning analytics, this is a reflective analysis of learner data and is meant to provide insight into historical patterns of behaviors and performance in online learning environments.

What scalability is supported in Azure SQL Database?

It provides online transaction processing (OLTP) that can scale on demand

When should you NOT use batch systems? Give a couple of examples.

Don't use batch systems for business intelligence systems that can't tolerate the predefined interval. For example, an autonomous vehicle can't wait for a batch system to adjust its driving. Similarly, a fraud-detection system must decline a questionable financial transaction in real time.

How do you limit traffic to only Azure services?

Enable the firewall

What inputs are provided to Stream Analytics jobs?

- Event Hubs - IoT Hub - Azure Storage

In the 'Event queuing & stream ingestion' step of data stream processing, what technologies are used?

- Event Hubs: for applications - IoT Hub: for IoT devices and gateways - Blobs: for streaming ingress and reference data

Where does Stream Analytics handle security in Event Hubs?

Event Hubs uses a shared key to secure the data transfer. Streaming data is generally discarded after the windowing operations finish. If you want to store the data, your storage device will provide security.

In Azure Synapse Analytics, what approach is used on bulk data?

Extract, Load, and Transform (ELT)

In the 'Storage, Presentation, & Action' step of data stream processing, what technologies are used?

For archiving for long term storage/batch analytics: - Data Lake - Cosmos DB - SQL DB/DW, - more... For automation to kick-off workflows: - Service Bus - Azure Functions - Event Hubs - more... For presentation: - Power BI - others...

Give an example of how Data Lake Storage can be used to store massive amounts of data for big-data analytics?

For example, Contoso Life Sciences is a cancer research center that analyzes petabytes of genetic data, patient data, and records of related sample data. Data Lake Storage Gen2 reduces computation times, making the research faster and less expensive.

Compare support for on-premises versus cloud environments.

Hundreds of vendors sell physical server hardware. This variety means server administrators might need to know how to use many different platforms. Because of the diverse skills required to administer, maintain, and support on-premises systems, organizations sometimes have a hard time finding server administrators to hire. Cloud systems are easy to support because the environments are standardized. When Microsoft updates a product, the update applies to all consumers of the product.

What is Azure Data Lake Storage available as?

Generation 1 (Gen1) or Generation 2 (Gen2). Data Lake Storage Gen1 users don't have to upgrade to Gen2, but they forgo some benefits.

What compute platforms can sit above Data Lake Storage?

- HDInsight - Hadoop - Azure Databricks

What security compliance certifications does Azure Cosmos DB meet?

- HIPAA - FedRAMP - SOCS - HITRUST

What is Azure Data Lake Storage compatible with?

Hadoop

How do you query data in HDInsight?

Hadoop supports Pig and HiveQL languages. In Spark, data engineers use Spark SQL.
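
A minimal PySpark sketch of the Spark SQL route, assuming a Spark cluster (for example HDInsight Spark) with Hive support and a hypothetical table named web_logs; the table and app names are illustrative only:

# Minimal PySpark sketch: query a (hypothetical) Hive table with Spark SQL.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hdinsight-sparksql-example")
    .enableHiveSupport()          # lets Spark SQL see Hive tables
    .getOrCreate()
)

# 'web_logs' is an assumed table name used only for illustration.
top_pages = spark.sql("""
    SELECT page, COUNT(*) AS hits
    FROM web_logs
    GROUP BY page
    ORDER BY hits DESC
    LIMIT 10
""")

top_pages.show()
spark.stop()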

How is security implemented in HDInsight?

Hadoop supports: - encryption - Secure Shell (SSH) - shared access signatures - Azure Active Directory security

Compare availability of on-premises versus cloud environments.

High-availability systems must be available most of the time. Service-level agreements (SLAs) specify your organization's availability expectations. System uptime can be expressed as three nines, four nines, or five nines. These expressions indicate system uptimes of 99.9 percent, 99.99 percent, or 99.999 percent. To calculate system uptime in terms of hours, multiply these percentages by the number of hours in a year (8,760). Availability: 99.9% uptime = 8,751.24 uptime hours and 8.76 downtime hours per year; 99.99% = 8,759.12 uptime hours and 0.88 downtime hours; 99.999% = 8,759.91 uptime hours and 0.09 downtime hours. For on-premises servers, the more uptime the SLA requires, the higher the cost. Azure duplicates customer content for redundancy and high availability. Many services and platforms use SLAs to ensure that customers know the capabilities of the platform they're using.
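
The downtime figures follow directly from the 8,760 hours in a year; this small Python check (illustrative only) reproduces them:

# Reproduce the SLA figures: uptime and downtime hours for a given uptime percentage.
HOURS_PER_YEAR = 8760

for uptime_pct in (99.9, 99.99, 99.999):
    uptime_hours = HOURS_PER_YEAR * uptime_pct / 100
    downtime_hours = HOURS_PER_YEAR - uptime_hours
    print(f"{uptime_pct}% uptime -> {uptime_hours:.2f} h up, {downtime_hours:.2f} h down")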

When is Stream Analytics a good solution?

If your organization must respond to data events in real time or analyze large batches of data in a continuous time-bound stream

How do data engineers query data in Data Lake Storage?

In Data Lake Storage Gen1, data engineers query data by using the U-SQL language. In Gen 2, use the Azure Blob Storage API or the Azure Data Lake System (ADLS) API.

In HDInsight, how do you process data?

In Hadoop, use Java and Python to process big data. Mapper consumes and analyzes input data. It then emits tuples that Reducer can analyze. Reducer runs summary operations to create a smaller combined result set. Spark processes streams by using Spark Streaming. For machine learning, use the 200 preloaded Anaconda libraries with Python. Use GraphX for graph computations. Developers can remotely submit and monitor jobs from Spark. Storm supports common programming languages like Java, C#, and Python.

Compare multilingual support for on-premises and cloud systems.

In on-premises SQL Server systems, multilingual support is difficult and expensive. One issue with multiple languages is the sorting order of text data. Different languages can sort text data differently. To address this issue, the SQL Server database administrator must install and configure the data's collation settings. But these settings can work only if the SQL database developers considered multilingual functionality when they were designing the system. Systems like this are complex to manage and maintain. Cloud systems often store data as a JSON file that includes the language code identifier (LCID). The LCID identifies the language that the data uses. Apps that process the data can use translation services such as the Bing Translator API to convert the data into an expected language when the data is consumed or as part of a process to prepare the data.

Describe how a data engineer can apply big-data analytics and AI solutions in the Healthcare industry.

In the healthcare industry, use Azure Databricks to accelerate big-data analytics and AI solutions. Apply these technologies to genome studies or pharmacy sales forecasting at a petabyte scale. Using Databricks features, you can set up your Spark environment in minutes and autoscale quickly and easily. Using Azure, you can collaborate with data scientists on shared projects and workspaces in a wide range of languages, including SQL, R, Scala, and Python. Because of native integration with Azure Active Directory and other Azure services, you can build diverse solution types. For example, build a modern data warehouse or machine learning and real-time analytics solutions.

What data security features are supported in Cosmos DB?

It supports data encryption, IP firewall configurations, and access from virtual networks. Data is encrypted automatically. User authentication is based on tokens, and Azure Active Directory provides role-based security.

What types of data does Azure SQL database support

It supports structured data such as relational data, and unstructured formats such as spatial and XML data.

What four types of NoSQL databases does the open-source world offer?

Key-value store: Stores key-value pairs of data in a table structure. Document database: Stores documents that are tagged with metadata to aid document searches. Graph database: Finds relationships between data points by using a structure that's composed of vertices and edges. Column database: Stores data based on columns rather than rows. Columns can be defined at the query's runtime, allowing flexibility in the data that's returned performantly.

If you create a storage account as a Blob store, can you query the data directly?

No. To query it, either move the data to a store that supports queries or set up the Azure Storage account for a data lake storage account.

What is a common platform for unstructured data?

NoSQL

Compare the computing environments of on-premises versus cloud solutions.

On-premises environments require physical equipment to execute applications and services. This equipment includes: - physical servers - network infrastructure - and storage. The equipment must have power, cooling, and periodic maintenance by qualified personnel. A server needs at least one operating system (OS) installed. It might need more than one OS if the organization uses virtualization technology. Cloud computing environments provide the physical and logical infrastructure to host services, virtual servers, intelligent applications, and containers for their subscribers. Different from on-premises physical servers, cloud environments require no capital investment. Instead, an organization provisions service in the cloud and pays only for what it uses. Moving servers and services to the cloud also reduces operational costs. Within minutes, an organization can provision anything from virtual servers to clusters of containerized apps by using Azure services. Azure automatically creates and handles all of the physical and logical infrastructure in the background. In this way, Azure reduces the complexity and cost of creating the services. On-premises servers store data on physical and virtual disks. On a cloud platform, storage is more generic. Diverse storage types include Azure Blob storage, Azure Files storage, and Azure Disk Storage. Complex systems often use each type of storage as part of their technical architecture. With Azure Disk Storage, customers can choose to have Microsoft manage their disk storage or to pay a premium for greater control over disk allocation.

Compare maintenance of on-premises versus cloud solutions.

On-premises systems require maintenance for the hardware, firmware, drivers, BIOS, operating system, software, and antivirus software. Organizations try to reduce the cost of this maintenance where it makes sense. In the cloud, Microsoft manages many operations to create a stable computing environment. This service is part of the Azure product benefit. Microsoft manages key infrastructure services such as: - physical hardware - computer networking - firewalls and network security - datacenter fault tolerance - compliance - physical security of the buildings. Microsoft also invests heavily to battle cybersecurity threats, and it updates operating systems and firmware for the customer. These services allow data engineers to focus more on data engineering and eliminating system complexity.

When data loads increase the processing time for on-premises data warehousing descriptive analytic solutions, what might organizations look to? Why?

Organizations that face this issue might look to a cloud-based alternative to reduce processing time and release business intelligence reports faster. But many organizations first consider scaling up on-premises servers. As this approach reaches its physical limits, they look for a solution on a petabyte scale that doesn't involve complex installations and configurations. The SQL Pools capability of Azure Synapse Analytics can meet this need.

Describe how Microsoft data engineers can create IoT solutions.

Over the last couple of years, hundreds of thousands of devices have been produced to generate sensor data. These are known as IoT devices. Using technologies like Azure IoT Hub, you can design a data solution architecture that captures information from IoT devices so that the information can be analyzed.

On what devices do data consumers view data?

- PCs - tablets - mobile devices

What classification of offering is Azure SQL Database?

Platform as a Service (PaaS)

In Azure Synapse Analytics, what technology removes the complexity of loading data?

PolyBase

What capability of Azure Synapse Analytics can meet the need for a petabyte-scale solution that doesn't involve complex installations and configurations?

SQL Pools

In what relational systems is structure defined at design time?

- SQL Server - Azure SQL Database - Azure SQL Data Warehouse (now Azure Synapse Analytics)

What authentication options are available with Azure Synapse Analytics?

- SQL Server authentication - Azure Active Directory

If your organization must respond to data events in real time or analyze large batches of data in a continuous time-bound stream, what technology is a good solution?

Stream Analytics

In the 'Stream analytics' step of data stream processing, what technologies are used?

- Stream Analytics - Machine Learning

How does streaming compare to nonstreaming systems in terms of volume and payload?

Streaming data is high volume and has a lighter payload than nonstreaming systems.

What language is used to query SQL Database?

T-SQL. This method benefits from a wide range of standard SQL features to filter, order, and project the data into the form you need.

By what can even the most experienced data engineer feel overwhelmed?

the range of data platform technologies in Microsoft Azure

byte

The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit of memory in many computer architectures. To disambiguate arbitrarily sized bytes from the common 8-bit definition, network protocol documents such as The Internet Protocol (RFC 791)(1981) refer to an 8-bit byte as an octet.

Over the past 30 years, what have we seen an exponential increase in?

The number of devices and software that generate data to meet business and user needs

Name the key features of Data Lake Storage.

- Unlimited scalability - Hadoop compatibility - Security support for both access control lists (ACLs) and POSIX compliance - An optimized Azure Blob File System (ABFS) driver that's designed for big-data analytics - Zone-redundant storage - Geo-redundant storage

Compare total cost of ownership of on-premises and cloud solutions.

The term total cost of ownership (TCO) describes the final cost of owning a given technology. In on-premises systems, TCO includes the following costs: - Hardware - Software licensing - Labor (installation, upgrades, maintenance) - Datacenter overhead (power, telecommunications, building, heating and cooling) It's difficult to align on-premises expenses with actual usage. Organizations buy servers that have extra capacity so they can accommodate future growth. A newly purchased server will always have excess capacity that isn't used. When an on-premises server is at maximum capacity, even an incremental increase in resource demand will require the purchase of more hardware. Because on-premises server systems are very expensive, costs are often capitalized. This means that on financial statements, costs are spread out across the expected lifetime of the server equipment. Capitalization restricts an IT manager's ability to buy upgraded server equipment during the expected lifetime of a server. This restriction limits the server system's ability to accommodate increased demand. In cloud solutions, expenses are recorded on the financial statements each month. They're monthly expenses instead of capital expenses. Because subscriptions are a different kind of expense, the expected server lifetime doesn't limit the IT manager's ability to upgrade to meet an increase in demand. Cloud systems like Azure track costs by subscriptions. A subscription can be based on usage that's measured in compute units, hours, or transactions. The cost includes hardware, software, disk storage, and labor. Because of economies of scale, an on-premises system can rarely compete with the cloud in terms of the measurement of the service usage. The cost of operating an on-premises server system rarely aligns with the actual usage of the system. In cloud systems, the cost usually aligns more closely with the actual usage. In some cases, however, those costs don't align. For example, an organization will be charged for a service that a cloud administrator provisions but doesn't use. This scenario is called underutilization. Organizations can reduce the costs of underutilization by adopting a best practice to provision production instances only after their developers are ready to deploy an application to production. Developers can use tools like the Azure Cosmos DB emulator or the Azure Storage emulator to develop and test cloud applications without incurring production costs.

Describe the work of SQL Server professionals in an on-premises environment.

They work with versions of on-premises SQL Server to meet the data requirements of their organization. They install and configure servers and services to provide the infrastructure to support a solution. These processes can take days to complete. In high-availability environments, the process can even take weeks.

In Azure Synapse Analytics, what language do you use to query data?

Transact-SQL

What query language do relational systems typically use?

Transact-SQL (T-SQL)

Compare scalability of on-premises versus cloud solutions.

When administrators can no longer scale up a server, they can instead scale out their operations. To scale an on-premises server horizontally, server administrators add another server node to a cluster. Clustering uses either a hardware load balancer or a software load balancer to distribute incoming network requests to a node of the cluster. A limitation of server clustering is that the hardware for each server in the cluster must be identical. So when the server cluster reaches maximum capacity, a server administrator must replace or upgrade each node in the cluster. Scalability in on-premises systems is complicated and time-consuming. But scalability in the cloud can be as simple as a mouse click. Typically, scalability in the cloud is measured in compute units. Compute units might be defined differently for each Azure product.

What are the benefits of lift and shift?

When moving to the cloud, many customers migrate from physical or virtualized on-premises servers to Azure Virtual Machines. This strategy is known as lift and shift. Server administrators lift and shift an application from a physical environment to Azure Virtual Machines without rearchitecting the application. The lift-and-shift strategy provides immediate benefits. These benefits include higher availability, lower operational costs, and the ability to transfer workloads from one datacenter to another. The disadvantage is that the application can't take advantage of the many features available within Azure. Consider using the migration as an opportunity to transform your business practices by creating new versions of your applications and databases. Your rearchitected application can take advantage of Azure offerings such as Cognitive Services, Bot Service, and machine learning capabilities.

How are the skills of data managers changing?

- Where you work: Your skills need to evolve from managing on-premises database server systems, such as SQL Server, to managing cloud-based data systems. If you're a SQL Server professional, over time you'll focus less on SQL Server and more on data in general. You'll be a data engineer. - Types of data and systems: SQL Server professionals generally work only with relational database systems. Data engineers also work with unstructured data and a wide variety of new data types, such as streaming data. - Skills needed: To master data engineering, you'll need to learn a new set of tools, architectures, and platforms. As a SQL Server professional, your primary data manipulation tool might be T-SQL. As a data engineer you might use additional technologies like Azure HDInsight and Azure Cosmos DB. To manipulate the data in big-data systems, you might use languages such as HiveQL or Python.

What is included in the data structure of relational database systems?

- the relational model - table structure - column width - data types

When is unstructured data organized?

With unstructured (NoSQL) data, each data element can have its own schema at query time.

Is role-based access control (RBAC) available in Gen1 and Gen2?

Yes

Does Azure Storage encrypt all data that's written to it?

Yes.

What failover support does Cosmos DB provide?

You can invoke a regional failover by using programming or the Azure portal. An Azure Cosmos DB database will automatically fail over if there's a regional disaster.

What does Hadoop store data in?

a file system (HDFS)

In Azure Synapse Analytics, what can you use to reduce data movement and improve performance?

a replicated table

BIOS

a set of computer instructions in firmware which control input and output operations.

POSIX

a set of formal descriptions that provide a standard for the design of operating systems, especially ones which are compatible with Unix.

What security and availability are supported with Azure SQL Database?

comprehensive (like other Azure database services)

In the Event Production step of data stream processing, what are common data sources?

- applications - IoT devices - gateways

What are sources of broadcasts of continuous event data?

- applications - sensors - monitoring devices - gateways

With ELT, when do data engineers begin transforming the data?

as soon as the load is complete

With unstructured (NoSQL) data, each data element can have its own schema when?

at query time

In what state does Data Lake Storage automatically encrypt data to protect data privacy?

at rest

At what data level does Azure Synapse Analytics support security?

at the column and row levels

Where does Stream Analytics handle security in Azure IoT Hub?

at the transport layer between the device and Azure IoT Hub. Streaming data is generally discarded after the windowing operations finish. If you want to store the data, your storage device will provide security.

What are the two tiers of IoT Hub?

- basic - standard

What features does Azure HDInsight support?

- batch processing - data warehousing - IoT - data science

What tools can be used to bulk copy data in Azure Synapse Analytics?

- bcp - SQLBulkCopy API - PolyBase

Why do relational systems react slowly to changes in data requirements?

because the structural database needs to change every time a data requirement changes. For example, when new columns are added, you might need to bulk-update all existing records to populate the new column throughout the table.

In relational database systems, when is the structure designed?

before any information is loaded into the system.

What services does Azure Event Hubs provide?

big-data streaming services.

Give examples of nonstructured data.

- binary files - audio files - image files

How does a data engineer set up data ingestion in Stream Analytics?

by configuring data inputs from first-class integration sources. First-class integration sources include: - Azure Event Hubs - Azure IoT Hub - Azure Blob storage

By understanding what can data engineers pick the right tool for the job?

by understanding the data types and capabilities of the data platform technologies.

How do you secure data in Azure Storage?

by using keys or shared access signatures.
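
A hedged sketch using the azure-storage-blob Python SDK to issue a short-lived, read-only shared access signature; the account, key, container, and blob names are placeholders:

# Sketch: generate a read-only SAS token for a single blob (azure-storage-blob v12).
from datetime import datetime, timedelta

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

sas_token = generate_blob_sas(
    account_name="mystorageacct",
    container_name="mycontainer",
    blob_name="report.csv",
    account_key="<account-key>",
    permission=BlobSasPermissions(read=True),        # read-only access
    expiry=datetime.utcnow() + timedelta(hours=1),   # short-lived token
)

url = f"https://mystorageacct.blob.core.windows.net/mycontainer/report.csv?{sas_token}"
print(url)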

How do you deploy Azure Cosmos DB?

by using several API models: - SQL API - MongoDB API - Cassandra API - Gremlin API - Table API

What aspect that sits above Data Lake Storage can vary?

compute

What are the main factors when comparing on-premises and cloud environment options?

- computing environment - licensing - maintenance - scalability - availability - support - multilingual support - total cost of ownership - lift and shift

Generally speaking, how are Azure SQL Database and SQL Server installed on a VM different?

configuration benefits

What are ways to query Azure Cosmos DB?

create stored procedures, triggers, and user-defined functions (UDFs). Or use the JavaScript query API. You'll also find other methods to query the other APIs within Azure Cosmos DB. For example, in the Data Explorer component, you can use the graph visualization pane.
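
As one concrete route beyond the JavaScript query API, the azure-cosmos Python SDK can run SQL API queries. A minimal sketch, assuming placeholder endpoint, key, database, container, and parameter values:

# Sketch: parameterized SQL API query with the azure-cosmos Python SDK (v4).
from azure.cosmos import CosmosClient

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("retail").get_container_client("products")

items = container.query_items(
    query="SELECT c.id, c.name FROM c WHERE c.category = @category",
    parameters=[{"name": "@category", "value": "toys"}],
    enable_cross_partition_query=True,
)

for item in items:
    print(item["id"], item["name"])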

How is structured data defined?

data architects define structures (schema) as they create data storage in databases.

With unstructured (NoSQL) data, each what can have its own schema at query time?

data element

How is real-time data processed in Azure Stream Analytics?

data is ingested from applications or IoT devices and gateways into an event hub or IoT hub. Then, the event hub or IoT hub streams the data into Stream Analytics for real-time analysis.

What can increase the processing time for on-premises data warehousing descriptive analytics solutions?

data loads

What does Azure Storage offer scalable object store for?

- data objects - file system services

What has been the effect on data professionals of the increase in the amount of data that systems and devices generate?

data professionals want to understand the new technologies, roles, and approaches to working with data

Applications, sensors, monitoring devices, and gateways broadcast continuous event data known as what?

data streams

What do organizations consider when traditional hardware and infrastructure components near the end of their life cycle?

digital transformation projects

What does Azure Synapse Analytics bring together?

- enterprise data warehousing - Big Data analytics

List the steps in the processing of data streams

event production --> event queuing and stream ingestion --> stream analytics --> storage, presentation, and action

What is an alternative approach to ETL?

extract, load, and transform (ELT)

What level of data security does Azure Storage provide?

fine-grained control over who has access to your data

In nonrelational systems, what does the structure definition point give you?

flexibility to use the same source data for different outputs.

How do you benefit from the multimodel architecture of Azure Cosmos DB?

from each model's inherent capabilities. For example, you can use MongoDB for semi-structured data, Cassandra for wide columns, or Gremlin for graph databases. When you move your data from SQL, MongoDB, or Cassandra to Azure Cosmos DB, applications that are built using the SQL, MongoDB, or Cassandra APIs will continue to operate.

What are two key characteristics of Azure Cosmos DB?

- globally distributed - multimodel

In Azure Synapse Analytics, what three types of distributed tables are supported that can be used to tune performance?

- hash - round-robin - replicated

What is Azure Event Hubs designed for?

high data throughput

Give an example of how ELT has more architectural flexibility than ETL?

how the marketing department needs to transform the data can be different than how the operations department needs to transform that same data

With batch processing in Stream Analytics, where can you store your data before you process it?

in Azure Storage

Where is nonstructured data stored?

in nonrelational systems, commonly called unstructured or NoSQL systems.

How can Azure customers see new technologies in the Azure platform prior to general availability?

in preview mode

Where do consumers generate and use data?

- in the workplace - during leisure time with social media applications

What is HBase?

Included in HDInsight. A NoSQL database built on Hadoop, commonly used for search engines. It offers automatic failover.

What is Storm?

Included in HDInsight. A distributed, real-time streaming analytics solution.

What is Kafka?

Included in HDInsight. An open-source platform that's used to compose data pipelines. It offers message queue functionality, which allows users to publish or subscribe to real-time data streams.

What is Apache Hadoop?

includes Apache Hive, HBase, Spark, and Kafka. Hadoop stores data in a file system (HDFS). Spark stores data in memory. This difference in storage makes Spark about 100 times faster.

Azure HDInsight provides technologies to help you do what?

ingest, process, and analyze big data

Who are the parties interested in data?

- internal management - investors - business partners - regulators - consumers

Once data has been transformed, where does it get loaded?

into the data warehouse

What is the disadvantage of the transformation stage?

it can take a long time, and tie up source system resources.

What price point is HDInsight at?

it is a low-cost solution

What is the function of the IoT hub?

It is the cloud gateway that connects IoT devices. It gathers data to drive business insights and automation.

What is the benefit of switching processes from ETL to ELT?

it reduces the resource contention on source systems.

What is the scalability option for Azure Synapse Analytics?

limitless

What do business stakeholders use data to do?

make business decisions

What do consumers use data to do?

make decisions such as what to buy, for example

How many data platform technologies does Azure provide to meet the needs of common data varieties?

many

Where can Stream Analytics route job output to?

many storage systems, including: - Azure Blob - Azure SQL Database - Azure Data Lake Storage - Azure Cosmos DB

What does SQL Pools use to quickly run queries across petabytes of data?

massively parallel processing (MPP)

What does Spark store data in?

memory

In Azure Synapse Analytics, what authentication is available for high-security environments?

multifactor authentication

What are the connectivity requirements for data consumers?

must be able to view data while connected and disconnected

In nonrelational systems, is the data structure defined at design time?

no

What subscription model allows customers to minimize costs, paying only for what they consume and only when they need it?

on-demand Azure subscription model

Where can data be located?

- on-premises - in the cloud

Compare licensing of on-premises versus cloud solutions.

On-premises: Each OS that's installed on a server might have its own licensing cost. OS and software licenses are typically sold per server or per CAL (Client Access License). As companies grow, licensing arrangements become more restrictive. Cloud: Cloud systems like Azure track costs by subscriptions. A subscription can be based on usage that's measured in compute units, hours, or transactions. The cost includes hardware, software, disk storage, and labor. Because of economies of scale, an on-premises system can rarely compete with the cloud in terms of the measurement of the service usage. The cost of operating an on-premises server system rarely aligns with the actual usage of the system. In cloud systems, the cost usually aligns more closely with the actual usage.

What are the two environment options for digital transformation projects?

- on-premises - cloud

In nonrelational systems, when is the data structure defined

only when the data is read

firmware

permanent software programmed into a read-only memory.

What does SQL Database offer?

- predictable performance for multiple resource types, service tiers, and compute sizes - almost no administration required - dynamic scalability with no downtime - built-in intelligent optimization - global scalability and availability - advanced security options. These capabilities let you focus on rapid app development and on speeding up your time to market. You no longer have to devote precious time and resources to managing virtual machines and infrastructure.

In nonrelational systems, what format is data typically loaded in?

raw

How can data be processed?

real-time or batch

Azure regions and zones

region - A set of datacenters deployed within a latency-defined perimeter and connected through a dedicated regional low-latency network. Availability Zone - Unique physical locations within a region. Each zone is made up of one or more datacenters equipped with independent power, cooling, and networking.

What does Microsoft do as data formats evolve?

releases new technologies to the Azure platform

What feature does Azure Resource Manager use to set permissions and assign roles to users, groups, or applications?

role-based access control (RBAC)

What are the key features of Azure Storage accounts?

scalable and secure, durable, and highly available. Azure handles your hardware maintenance, updates, and critical issues. It also provides REST APIs and SDKs for Azure Storage in various languages. Supported languages include .NET, Java, Node.js, Python, PHP, Ruby, and Go. Azure Storage also supports scripting in Azure PowerShell and the Azure CLI.

With unstructured (NoSQL) data, each data element can have its own what at query time?

schema

Where can you find more information about the APIs that are available in Azure Cosmos DB?

see Choose the appropriate API for Azure Cosmos DB storage.

What other type of data can nonrelational systems also support?

semistructured data such as JSON file formats

What does Azure Event Hubs allow customer to do?

send billions of requests per day
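
A minimal sketch of sending events with the azure-eventhub Python SDK (v5); the connection string and hub name are placeholders:

# Sketch: send a small batch of events to an event hub.
from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="<event-hubs-namespace-connection-string>",
    eventhub_name="telemetry",
)

with producer:
    batch = producer.create_batch()
    for reading in ('{"deviceId": "d1", "temp": 21.5}', '{"deviceId": "d2", "temp": 19.8}'):
        batch.add(EventData(reading))   # raises if the batch size limit is exceeded
    producer.send_batch(batch)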

What is the basic setup to process streaming data in Stream Analytics?

set up Stream Analytics jobs with input and output pipelines.

If you need to provision a data store that will store but not query data, your cheapest option is to do what?

set up a storage account as a Blob store. Blob storage works well with images and unstructured data, and it's the cheapest way to store data in Azure.

With SQL Pools, how is it possible to scale compute independently of storage?

storage is separated from the compute nodes

What do businesses do with data?

- store - interpret - manage - transform - process - aggregate - report

What can Azure data technologies do?

- store - transform - process - analyze - visualize

What is Data Lake Storage designed to do?

store massive amounts of data for big-data analytics.

In Azure Synapse Analytics, what can you apply in your applications using PolyBase?

- stored procedures - labels - views - SQL

What database programming features are included with Azure Cosmos DB?

stored procedures, triggers, and user-defined functions (UDFs)

The broadcasting of continuous event data is known as what?

streaming

What are the structure classifications of data?

- structured - unstructured - semi-structured - aggregated (?)

What are the two broad types of data?

structured and nonstructured

What is data structure designed in the form of?

tables

What types of analysis can be performed using the Big Data Analytics capability of Azure Synapse Analytics?

- techniques such as exploratory data analysis to identify initial patterns or meaning in the data - predictive analytics for forecasting - segmenting data

In what forms does data come?

- text - stream - audio - video - metadata

What is the base storage type within Azure?

the Azure Storage account

What is the Stream Analytics query language consistent with?

the SQL language. If you're familiar with the SQL language, you can start creating jobs.

What is the partitioned consumer model in Event Hubs integrated into?

the big-data and analytics services of Azure. These include: - Databricks - Stream Analytics - Azure Data Lake Storage - HDInsight

How can SQL Database ingest data?

through application integration from a wide range of developer SDKs. Allowed programming languages include .NET, Python, Java, and Node.js. Beyond applications, you can also ingest data through Transact-SQL (T-SQL) techniques and from the movement of data using Azure Data Factory.
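
One of many SDK routes is ODBC from Python. A minimal sketch with pyodbc, using placeholder server, database, credential, and table names:

# Sketch: insert a row into Azure SQL Database over ODBC (pyodbc).
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=salesdb;UID=appuser;PWD=<password>"
)
cursor = conn.cursor()
cursor.execute(
    "INSERT INTO dbo.Orders (OrderId, CustomerId, Amount) VALUES (?, ?, ?)",
    (1001, "C-42", 59.90),   # parameterized values avoid SQL injection
)
conn.commit()
conn.close()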

What is the objective of the module "Survey the services on the Azure Data platform"?

to be able to answer high-level questions about Azure data services.

What are options for ingesting data into Azure Cosmos DB?

- use Azure Data Factory - create an application that writes data into Azure Cosmos DB through its API - upload JSON documents directly - edit the document directly
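
For the "write through its API" option, a minimal sketch with the azure-cosmos Python SDK (SQL API); endpoint, key, database, and container names are placeholders:

# Sketch: write (upsert) a JSON document into Azure Cosmos DB via the SQL API.
from azure.cosmos import CosmosClient

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("retail").get_container_client("products")

container.upsert_item({
    "id": "toy-0042",
    "category": "toys",
    "name": "Wooden train set",
    "price": 24.99,
})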

In HDInsight, how do you ingest data?

use Hive to run ETL operations on the data you're ingesting. Or, orchestrate Hive queries in Azure Data Factory.

In Azure Synapse Analytics, what technologies can you combine to load data fast?

use PolyBase with additional Transact-SQL constructs such as CREATE TABLE and SELECT.

How do you define job transformations in Stream Analytics?

use a simple, declarative Stream Analytics query language. The language lets you use simple SQL constructs to write complex temporal queries and analytics.
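
To give a feel for those temporal constructs, the snippet below simply holds an illustrative Stream Analytics query in a Python string; the input, output, and column names are assumed:

# Illustrative Stream Analytics query: 30-second tumbling-window average per device.
# '[iothub-input]', '[powerbi-output]', and the columns are placeholder names.
saql = """
SELECT
    deviceId,
    AVG(temperature) AS avgTemperature
INTO [powerbi-output]
FROM [iothub-input] TIMESTAMP BY eventTime
GROUP BY deviceId, TumblingWindow(second, 30)
"""
print(saql)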

What features does Data Lake Storage Gen 2 provide?

Users take advantage of Azure Blob storage, a hierarchical file system, and performance tuning that helps them process big-data analytics solutions. Developers can access data through either the Blob API or the Data Lake file API. Data Lake Storage Gen2 can also act as a storage layer for a wide range of compute platforms, including Azure Databricks, Hadoop, and Azure HDInsight, but data doesn't need to be loaded into the platforms.
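
A minimal sketch of writing a file through the Data Lake file API with the azure-storage-file-datalake Python SDK; the account, key, file system, and path are placeholders:

# Sketch: create and write a small file in a Data Lake Storage Gen2 file system.
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",
    credential="<account-key>",
)
file_system = service.get_file_system_client("raw")

data = b"id,value\n1,42\n"
file_client = file_system.create_file("landing/sample.csv")
file_client.append_data(data, offset=0, length=len(data))
file_client.flush_data(len(data))   # commit the appended data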

How does Event Hubs scale out your data system?

using a partitioned consumer model.

When should you use Azure Cosmos DB?

when you need a NoSQL database of the supported API model, at planet scale, and with low latency performance.

When should you use SQL Database?

- when you need to scale up and scale down OLTP systems on demand - when your organization wants to take advantage of Azure security and availability features - when you want to avoid the risks of capital expenditures and of increasing operational spending on complex on-premises systems - when you want more flexibility than an on-premises SQL Server solution because you can provision and configure it in minutes - when you want your solution to be backed up by the Azure service-level agreement (SLA)

What do you use Azure Storage as the storage basis for?

when you're provisioning a data platform technology such as Azure Data Lake Storage and HDInsight. But you can also provision Azure Storage for standalone use. For example, you provision an Azure Blob store either as standard storage in the form of magnetic disk storage or as premium storage in the form of solid-state drives (SSDs).

If your organization must respond to data events in real time or analyze large batches of data in a continuous time-bound stream, what must your organization decide?

whether to work with streaming data or batch data

How is pay-for-compute implemented in Azure Synapse Analytics?

you can pause and resume the compute layer. This means you pay only for the computation you use

What are the basics extraction steps of a data engineer?

you'll extract raw data from a structured or unstructured data pool and migrate it to a staging data repository.

