Explore Core Data Concepts
BLANK is the process of capturing raw data streaming from various sources and storing it
Data Ingestion
What are some examples of uses for a key-value database?
Data caching (for example, data collected from a website), session management, product management
Which job role is tasked with managing and organizing data while also monitoring for trends or inconsistencies that will impact business goals? It's a highly technical position that requires experience and skills in areas like programming, mathematics, and computer science, along with the soft skills to communicate data trends to others in the organization and to help the business make use of the data it collects.
Data engineers
What task is the process of capturing raw data?
Data ingestion
What are the typical tasks of an analytical system?
Data ingestion, Data transforming, data processing, data query, data visualizations
Why are distributed databases widely used in many organizations?
Data is stored across different physical locations, or it may be held on multiple computers in the same physical location (for example, a datacenter). The trade-off is that an update may take time to propagate across a distributed database spanning multiple locations.
After data is ingested and transformed, what is the task of querying and analyzing it called?
Data query
In what four categories do batch and streaming data processing differ?
Data scope, Data size, Performance, Analysis
What is semi-structured data?
Data whose structure is defined within the data itself by fields; the fields don't have to be the same for every entity
What task involves looking at raw data, reviewing its formatting, and eliminating anomalies?
Data transformation
What task presents data visually so that it is easier to examine?
Data visualization
What are the disadvantages of a key-value database?
Consistency needs to be considered, because transactional updates across multiple entities aren't guaranteed; any relationships between rows have to be handled outside the table; it is difficult to sort on non-key data
Give an example of a non-relational database
Document database
In a BLANK, each document has a unique ID, but the fields in the documents are transparent to the database management system
Document database
Which NoSQL database does this describe? Each document has a unique ID; a document may be a binary file; keys are used to locate individual documents; documents are organized by tags, metadata, or collections; an update changes only the part of the document that needs to change
Document store
What is unstructured data?
Data that does not naturally contain fields. Examples: video, audio, and media streams. It is often processed to categorize or identify "structures", frequently in combination with machine learning or Cognitive Services capabilities that "extract data" from it.
What is semi-structured data?
Data that doesn't reside in a relational database but still has some structure to it. Examples: JSON (JavaScript Object Notation) documents, key-value stores, graph databases
Which ACID property does this describe? It guarantees that once a transaction has been committed, it will remain committed even if there's a system failure such as a power outage or crash.
Durability
What does atomic mean?
A sequence of operations is atomic if either every operation in the sequence completes successfully or, if something goes wrong, all operations run so far in the sequence are undone.
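A minimal sketch of atomic behavior, using Python's built-in sqlite3 module as a stand-in for any transactional database. The `accounts` table, the `transfer` helper, and the no-negative-balance rule are assumptions made up for illustration; they are not part of the study set.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            # Hypothetical business rule: a balance may not go negative.
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

ok = transfer(conn, "a", "b", 30)       # both updates are committed together
failed = transfer(conn, "a", "b", 500)  # the deduction is undone by the rollback
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After the failed transfer, the deduction that had already run is rolled back, so the balances are the same as they were after the first, successful transfer.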
What are some examples of streaming data?
A financial institution tracks changes in the stock market in real time, computes value-at-risk, and automatically rebalances portfolios. An online gaming company collects real-time data about player-game interactions, feeds the data into its gaming platform, and analyzes it in real time to give players incentives to stay engaged. A real-estate website tracks a subset of data from consumers' mobile devices and makes real-time recommendations of properties to visit based on their geo-location.
Which Azure service model is described below? Azure enables you to create a virtual infrastructure in the cloud that works much like an on-premises data center. You can create a set of virtual machines, connect them using a virtual network, and add a range of virtual devices. You don't have to purchase or maintain the hardware, but you're still responsible for day-to-day operations such as installing and configuring the software, taking backups, and restoring data as needed.
IaaS
Which service model is described below? You can run any software for which you have appropriate licenses. It is best for migrations and for applications requiring operating system-level access. SQL virtual machines are "lift and shift": you can copy your on-premises solution directly to a virtual machine in the cloud.
IaaS
Which cloud service model is best for migrations and applications requiring operating system-level access?
IaaS
Where is the Azure SQL Database service located?
In the cloud
IaaS - stands for
Infrastructure as a service
What does IaaS stand for?
Infrastructure as a service
What is IaaS?
Infrastructure as a service (IaaS) is an instant computing infrastructure, provisioned and managed over the internet. It's one of the four types of cloud services, along with software as a service (SaaS), platform as a service (PaaS), and serverless.
What are the common non-relational database use cases?
IoT (Internet of Things) and telematics; retail and marketing; gaming; web and mobile applications
Which non-relational database use case does this describe? These systems typically ingest large amounts of data in frequent bursts of activity. Non-relational databases can store this information very quickly. The data can then be used by analytics services such as Azure Machine Learning, Azure HDInsight, and Microsoft Power BI. Additionally, you can process the data in real time using Azure Functions that are triggered as data arrives in the database.
IoT and telematics.
Which ACID property does this fall under? While a transfer is in progress, no other transaction may see the state in which the funds have been deducted from one account but not yet credited to the other.
Isolation
Which ACID property is this? It ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.
Isolation
What are some examples of when batch processing is done?
Batch processing can run at a scheduled time, at a "low" or "idle" point chosen by business decision makers, or when a certain amount of data has arrived.
What is an index?
A structure that helps you search for data in a table
Why do indexes come at a cost?
It increases storage space and must be maintained
What are the common ways to store semi-structured data?
JSON (JavaScript Object Notation). Avro: an open source project that provides data serialization and data exchange services for Apache Hadoop; these services can be used together or independently. Avro facilitates the exchange of big data between programs written in any language. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. ORC: the Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data. ... It was designed to overcome limitations of the other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data. It organizes data in columns rather than rows. Parquet: Apache Parquet is a free and open-source column-oriented data storage format of the Apache Hadoop ecosystem. It is similar to the other columnar-storage file formats available in Hadoop, namely RCFile and ORC, and is compatible with most of the data processing frameworks in the Hadoop environment.
The term JSON stands for BLANK ; it's the format used by JavaScript applications to store data in memory, but can also be used to read and write documents to and from files.
JavaScript Object Notation
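As a small illustration of JSON holding semi-structured data, the sketch below parses two customer entities whose fields differ, using Python's standard json module. The customer records are invented sample data.

```python
import json

# Two entities in the same collection; each defines only the fields it needs.
document = """
[
  {"id": 1, "name": "Ana", "email": "ana@example.com"},
  {"id": 2, "name": "Ben", "phones": ["555-0100", "555-0101"]}
]
"""

customers = json.loads(document)  # text -> Python lists and dicts in memory
fields = [sorted(customer.keys()) for customer in customers]

# Writing the data back out and reading it again preserves the structure.
round_trip = json.loads(json.dumps(customers))
```

The field names travel with each entity, which is why the two customers can carry different fields without any schema change.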
A BLANK can ingest large amounts of data rapidly
Key-value store
Give some characteristics and examples of IoT (Internet of Things) and telematics workloads.
Large amounts of data arriving in frequent bursts of activity; data processed in real time; Microsoft Azure services such as Azure Cosmos DB, analytics services, and Power BI. Examples of IoT devices: an alarm, a Traeger grill
What are the advantages of batch processing?
Large volumes of data can be processed, and the processing can be scheduled to run at a time when computers or systems are idle, such as overnight.
What is NoSQL?
A loose term for non-relational databases, which store data in forms such as documents rather than tables. They fall into four types: column family (wide column), document store, graph, and key-value store.
As the head of a sales team, would a manager prefer to see daily transactions (transactional info) or a monthly report that identifies trends and insights (analytical info)? Why?
A monthly report. The purpose of an analytical workload is to capture trends and insights, which lets the sales manager see the bigger picture.
BLANK data is an all-encompassing term that means anything not structured as a set of tables
Non-relational
What database does not impose a schema on data? Instead, they focus on the data itself rather than how to structure it. This approach means that you can store information in a natural format, that mirrors the way in which you would consume, query and use it.
Non-relational databases
What is the process of splitting an entity into more than one table called?
Normalization
What is the name of the process where typically the end result is that your data is split into a large number of narrow, well defined tables?
Normalization
Splitting a table into separate groups of columns is called what?
Normalization
What is unstructured data?
Data that is neither structured nor semi-structured. For example, audio and video files and binary data values might not have a specific structure.
What applications are focused on transaction-oriented tasks that process a large number of transactions per minute?
OLTP
Banking solutions, online retail applications, flight reservation systems, and online purchasing applications are examples of what type of relational database workload?
OLTP
Work performed by transactional systems is often referred to as what?
OLTP (Online Transaction Processing)
What are the benefits of an on-premises data management system?
Personal control of data security; low operational expenditure
Why is transactional info an integral part of analytical info?
Each relies on the other: analytical information is derived from the transactional data the business records, so how efficiently that data is handled is key.
What does OLTP stand for?
Online Transaction Processing
A what uniquely identifies each row in a table. No two rows share the same what?
Primary key
Azure SQL Database and virtual databases describe what service model?
PaaS
Which service model is described below? Rather than creating a virtual infrastructure and installing and managing the database software yourself, you specify the resources that you require (based on how large you think your database will be, the number of users, and the performance you require), and Azure automatically creates the necessary virtual machines, networks, and other devices for you. You can usually scale up or down (increase or decrease the size and number of resources).
PaaS
To help ensure fast access, Azure Table Storage splits a table into BLANK. BLANK is a mechanism for grouping related rows, based on a common property or partition key. Rows that share the same partition key will be stored together. Partitioning not only helps to organize data, it can also improve scalability and performance.
Partitions, Partitioning
PaaS - stands for what?
Platform as a service
What does PaaS stand for?
Platform as a service
What is the act of setting up the database service called?
Provisioning
What are the three different levels to access data?
Read-only, read/write, and owner privilege
What are the characteristics of a graph store?
Models relationships between entities; consists of nodes and edges; enables an application to perform queries over those relationships efficiently; simple to query; can perform complex analyses; accessed through an API (Application Programming Interface) with a standard language for graph databases
Which non-relational database use case does this describe? Microsoft uses Cosmos DB for its own ecommerce platforms that run as part of Windows Store and Xbox Live. It's also used in the retail industry for storing catalog data and for event sourcing in order processing pipelines.
Retail and marketing
SQL Server on Azure virtual machines and virtualized machines describe what service model?
IaaS
What is a benefit of PaaS compared with an on-premises system?
Scalability: you can scale up or out without having to worry about procuring your own hardware
What are the benefits of a data management system on the cloud?
Scalability; hardware is maintained for you; software is maintained for you; low capital expenditure
What are the characteristics of key-value storage such as Azure Table Storage?
A scalable key-value store in the cloud. You create a table using an Azure storage account. Items are referred to as rows, and fields are known as columns. It enables you to store semi-structured data: unlike in a traditional relational database, rows do NOT have to have the same number of columns. All rows require a key, but apart from that the columns in each row can vary. An Azure table has no concept of relationships, foreign keys, stored procedures, or secondary indexes. For example, Azure Table Storage provides much faster access to the details of a customer because the data is available in a single row, without requiring that you perform joins across relationships.
BLANK data is data that contains fields. The fields don't have to be the same in every entity. You only define the fields that you need on a per-entity basis.
Semi-structured
JSON documents, key-value stores, and graph databases are examples of what class of data?
Semi-structured
What does SaaS stand for?
Software as a service
What are the benefits of non-relational databases?
They can store large amounts of data with little structure. They are useful for data that contains video, audio, images, temporal information, large volumes of free text, encrypted information, or other types of data that aren't inherently relational.
Give an example of Retail and marketing
Storing catalog data
Data ingestion is what type of processing?
Streaming
What is the difference between batch and streaming data?
Streaming: processing data as it arrives. Batch processing: buffering the data and processing it in groups.
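The difference can be sketched with two small Python generators. The function names and the running-total and per-batch-sum calculations are illustrative assumptions, not a reference to any particular Azure service.

```python
from typing import Iterable, Iterator, List


def process_stream(readings: Iterable[float]) -> Iterator[float]:
    """Streaming: handle each reading as soon as it arrives."""
    running_total = 0.0
    for value in readings:
        running_total += value
        yield running_total  # a result is available after every item


def process_batches(readings: Iterable[float], batch_size: int) -> Iterator[float]:
    """Batch: buffer readings and process each group in one go."""
    buffer: List[float] = []
    for value in readings:
        buffer.append(value)
        if len(buffer) == batch_size:
            yield sum(buffer)
            buffer = []
    if buffer:  # process whatever is left at the end
        yield sum(buffer)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
stream_results = list(process_stream(data))                # one result per reading
batch_results = list(process_batches(data, batch_size=2))  # one result per group
```

The streaming version produces five results, one per reading, while the batch version produces three, one per buffered group.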
In what type of processing is each new piece of data processed when it arrives?
Streaming data
What type of processing would be used for time-critical operations? Give an example.
Stream processing. Example: a system that monitors a building for smoke and heat needs to trigger alarms and unlock doors immediately so that residents can escape.
Tabular data in which every row has the same set of columns is an example of what class of data?
Structured
What class of data is typically stored in a relational database such as SQL Server or Azure SQL Database (which runs in the cloud)?
Structured
What type of workload suits a document store database?
Systems for viewing, editing, and sharing documents; data that doesn't have to be structured (semi-structured); streamlining business processes
Why are database systems that process transactional workloads inherently complex?
Many different things happen at once. Many systems implement relational consistency and isolation by applying locks to data when it's updated; a lock prevents another process from reading that data until the lock is released.
What are the disadvantages of batch processing?
There is a time delay between ingesting the data and getting the results. All of a batch's inputs must be ready before the batch can be processed, so they must be checked carefully; any issue with the data could create delays.
What is the purpose of normalizing data?
To reduce storage, avoid data duplication, improve data quality
What is the primary role of relational databases?
Transaction processing
Recording transactions, such as the movement of money between bank accounts or sales in a retail system, is an example of what type of processing system?
Transactional system
What are the two elements in a key value data store?
Two elements: a key and a value. The key identifies the item; the value holds the data for the item.
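A minimal in-memory sketch of the key-value model. The `KeyValueStore` class and the `session:42` key are hypothetical; the point is that the store treats each value as an opaque block of bytes, and only the application knows it is JSON.

```python
import json
from typing import Dict, Optional


class KeyValueStore:
    """Toy key-value store: the key identifies the item, the value is opaque."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value  # insert or overwrite by key

    def get(self, key: str) -> Optional[bytes]:
        return self._items.get(key)  # simple lookup by key only

    def delete(self, key: str) -> None:
        self._items.pop(key, None)


store = KeyValueStore()
payload = json.dumps({"user": "ana", "cart": ["sku-1"]}).encode("utf-8")
store.put("session:42", payload)

raw = store.get("session:42")             # the store hands back bytes
session = json.loads(raw.decode("utf-8"))  # the application interprets them
store.delete("session:42")
missing = store.get("session:42")
```

Because the store never looks inside the value, it can only find items by key, which is why sorting or filtering on non-key data is difficult in this model.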
What can you typically do in the Azure portal?
Typical configuration tasks such as increasing the database size, creating a new database, and deleting an existing database
What is structured data?
Typically tabular data that is represented by rows and columns in a database. Databases that hold tables in this form are relational databases (the mathematical term "relation" refers to an organized set of data held as a table). Each row in a table has the same set of columns. Example: a table with customer ID and contact info
Audio and video files are examples of what class of data?
Unstructured
What is the purpose of a view?
Views are created to simplify queries and to combine relational data from one or more tables into a single view
Which non relational database case would this fall under? A non-relational database such as Azure Cosmos DB is commonly used within web and mobile applications, and is well suited for modeling social interactions, integrating with third-party services, and for building rich personalized experiences. The Cosmos DB SDKs (software development kits) can be used to build rich iOS and Android applications using the popular Xamarin framework.
Web and mobile applications
Which service model is described below? Azure handles the scaling for you, and you don't have to manually add or remove virtual machines or perform any other form of configuration. Azure offers several such solutions for relational databases, including Azure SQL Database, Azure Database for PostgreSQL, and Azure Database for MySQL. These services run managed versions of the database management systems on your behalf: you just connect to them, create your databases, and upload your data. There may be restrictions, often due to security issues.
PaaS
What are the reasons why batch processing may NOT be effective?
When real-time results are required, or when data arrives in small amounts (for example, a financial stock ticker)
What can you do with IaaS?
You can create a set of virtual machines, connect them using a virtual network, and add a range of virtual devices. In many ways, this approach is similar to running your systems inside an organization, except that you don't have to worry about buying or maintaining the hardware. However, you're still responsible for the day-to-day operations. It's a way to house and manage systems in the cloud.
What are the two options when moving your operations and databases to the cloud?
You can select the IaaS approach, or PaaS
What is data?
A collection of facts such as numbers, descriptions, and observations used in decision making. It is classified as structured, semi-structured, or unstructured.
Some relational database management systems also support a what? A what physically reorganizes a table by the index key. In database management systems that support them, a table can only have a single what?
clustered index
The partition key and row key effectively define a what?
clustered index over the data.
The most widely used BLANK database management system is Apache Cassandra. Azure Cosmos DB supports the column-family approach through the Cassandra API.
column family
Which database fits the description below? Workload: low latency; data that is easy to access and easy to move around. Examples: census bureau data, minute-by-minute breakdown tables
Column-family (wide-column) database
Which database fits the description? Data is stored in cells grouped into columns rather than as rows of data.
Column-family (wide-column) database
Which database has the characteristics below? Organizes data into rows and columns; stores structured data in groups of related columns (column families); can read a single column family without reading through all the data; scalable
Column-family (wide-column) store
In a relational database, all rows in the same table have the same number of what?
columns
The rows in a table have one or more columns that define the properties of the entity, such as customer name and address. All rows in the same table have the same number of what?
columns
For what reasons is batch processing effective?
It suits connections to a mainframe system and vast amounts of data, and the results don't have to be available in real time
Which SQL command is used for each of the following? Creating a table; storing data in a table; modifying data in a table; removing rows from a table; retrieving data from a table
CREATE, INSERT, UPDATE, DELETE, SELECT
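The five commands can be tried end to end with Python's built-in sqlite3 module. This is only a sketch: the `customers` table and its rows are invented sample data, and SQLite stands in for whichever relational database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define a table
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# INSERT: store rows in the table
conn.execute("INSERT INTO customers (id, name, city) VALUES (1, 'Ana', 'Lisbon')")
conn.execute("INSERT INTO customers (id, name, city) VALUES (2, 'Ben', 'Leeds')")

# UPDATE: modify existing rows
conn.execute("UPDATE customers SET city = 'Porto' WHERE id = 1")

# DELETE: remove rows
conn.execute("DELETE FROM customers WHERE id = 2")

# SELECT: retrieve data
rows = conn.execute("SELECT id, name, city FROM customers").fetchall()
```

Only the updated row for customer 1 remains, because customer 2 was deleted before the final SELECT ran.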
What type of processing is simply the conversion of raw data to meaningful info through a process?
data processing
How is data stored? What does it depend on?
It depends on whether the data is structured, semi-structured, or unstructured
Most BLANK will ingest large volumes of data more rapidly than a relational database, but aren't as optimal as a key-value store for this type of processing. The focus of a document database is its query capabilities.
document databases
In a document store, an application can retrieve documents by using the BLANK. The key is a unique identifier for the document. Some document databases create the document key automatically. Others enable you to specify an attribute of the document to use as the key. The application can also query documents based on the value of one or more fields. Some document databases support indexing to facilitate fast lookup of documents based on one or more indexed fields.
document key
A BLANK does not require that all documents have the same structure. This free-form approach provides a great deal of flexibility. Applications can store different data in documents as business requirements change.
document store
Product catalogs, content management, and inventory management are examples of uses for what type of database?
document store
A BLANK is a thing about which information needs to be known or held
entity
A table contains rows, and each row represents a single instance of a what?
entity
In a relational database, you model characteristics of a what as tables
entity
The what columns reference or link to the primary key of another table, and are used to maintain the relationships between tables.
foreign key
What is allowed with owner privilege access?
Full access to the data, including managing security: adding new users and removing access for existing users
Which type of database suits the workloads below? Organizational hierarchies (high level and low level); social graphs
graph
Which database has a collection of nodes and edges?
graph
Which database is mostly used for social networks?
graph
Azure Cosmos DB supports WHAT using the Gremlin API. The Gremlin API is a standard language for creating and querying graphs.
graph databases
What is Azure Table Storage?
A service that implements the NoSQL key-value model. In this model, the data for an item is stored as a set of fields, and the item is identified by a unique key.
Where does Azure SQL Database provide database services?
in the cloud
A what helps you search for data in a table.
index
When you create a what, you specify a column from a table, and the what contains a copy of this data in sorted order, with pointers to the corresponding rows in the table. When the user runs a query that specifies this column in the WHERE clause, the database management system can use it to fetch the data more quickly than if it had to scan through the entire table row by row.
index
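One way to see an index at work is to ask SQLite for its query plan before and after creating one. This sketch uses Python's built-in sqlite3 module; the `customers` table, the generated rows, and the index name `idx_customers_city` are assumptions for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [(f"customer-{i}", f"city-{i % 50}") for i in range(1000)],
)

query = "SELECT name FROM customers WHERE city = ?"


def plan(sql, params):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the plan text


plan_before = plan(query, ("city-7",))  # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")
plan_after = plan(query, ("city-7",))   # the WHERE column is now indexed
```

Comparing `plan_before` with `plan_after` shows the planner switching from scanning the whole table to searching through the index on the column named in the WHERE clause.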
A typical relational database contains other structures that help to optimize data organization and improve the speed of access - two types of structures are what?
index and view
What is this referring to? It might consume additional storage space, and it must be maintained. The additional work can slow down insert, update, and delete operations and incur additional processing charges.
indexes
More advanced non-relational systems support BLANK, in a similar manner to an index in a relational database. Queries can then use the BLANK to identify and fetch data based on non-key fields.
indexing
What are edges?
information about the relationships between objects
You can combine the data from multiple tables in a query using a what? A what? spans the relationships between tables, enabling you to get data from more than one table at a time.
join operation
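A short sketch of a join that follows a foreign key back to the primary key it references, again using Python's built-in sqlite3 module. The `customers` and `orders` tables and their rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(id),"  # foreign key column
    " total REAL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)],
)

# The join spans the relationship between the two tables.
rows = conn.execute(
    "SELECT c.name, o.total"
    " FROM customers AS c"
    " JOIN orders AS o ON o.customer_id = c.id"
    " ORDER BY o.id"
).fetchall()
```

Each result row combines a customer name from one table with an order total from the other, matched on the key columns.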
What is the simplest type of NoSQL database?
Key-value
What type of NoSQL database is described below? Holds a single serialized value for each key; supports simple query, insert, and delete operations; optimized for applications performing simple lookups; stores data that does not require complex joins, foreign keys, or stored procedures; a good way to capture event logging and performance data; writing data is fast; data is static
key value
You are building a system that monitors the temperature throughout a set of office blocks, and sets the air conditioning in each room in each block to maintain a pleasant ambient temperature. Your system has to manage the air conditioning in several thousand buildings spread across the country or region, and each building typically contains at least 100 air-conditioned rooms. What type of NoSQL data store is most appropriate for capturing the temperature data to enable it to be processed quickly?
Key-value: it can store large amounts of data quickly
Azure Table storage is an example of a BLANK . Cosmos DB also implements a key-value store using the Table API.
key-value store
Give an example of a gaming workload for which non-relational databases are used.
Leaderboards, which must be fast and handle massive spikes in activity
The characteristics below describe what kind of database? Stores large amounts of data with little structure; useful for data that contains video, audio, images, temporal information, large volumes of free text, encrypted information, or other types of data that aren't inherently relational
Non-relational
What type of database enables you to store data in a format that more closely matches the original structure?
Non-relational
In a BLANK database, each entity only has the fields it needs, and these fields can vary between different entities.
non relational database
NoSQL is a rather loose term that simply means BLANK
non-relational
In a blank system you store the information for entities in collections or containers rather than relational tables
non-relational system
The term BLANK means that the database management system just sees the value as an unstructured block. Only the application understands how the data in the value is structured and what fields it contains. The opposite of BLANK is transparent. If the data is transparent, the database management system understands how the fields in the data are organized. A BLANK table is an example of a transparent structure.
opaque, opaque, relational
What is the purpose of an index?
optimize search inquiry for faster data retrieval, reduces the amt of pages that need to be read to retrieve the data in a SQL statement, data is retrieved by joining tables together in a query
The key in an Azure Table Storage table comprises two elements: the BLANK, which identifies the partition containing the row, and a row key that is unique to each row in the same partition.
partition key
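The two-part key can be modeled in memory as a dictionary of partitions, each holding rows by row key. This is a toy illustration only: `PartitionedTable` and its methods are hypothetical names and are not the Azure Table Storage SDK.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple


class PartitionedTable:
    """Toy model of a table addressed by (partition key, row key)."""

    def __init__(self) -> None:
        # partition key -> {row key -> columns}
        self._partitions: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(dict)

    def upsert(self, partition_key: str, row_key: str, **columns: Any) -> None:
        # Rows sharing a partition key are stored together.
        self._partitions[partition_key][row_key] = columns

    def get(self, partition_key: str, row_key: str) -> Optional[Dict[str, Any]]:
        # Point lookup: both parts of the key are needed.
        return self._partitions.get(partition_key, {}).get(row_key)

    def query_partition(self, partition_key: str) -> List[Tuple[str, Dict[str, Any]]]:
        # Rows come back ordered by row key within the partition.
        return sorted(self._partitions.get(partition_key, {}).items())


table = PartitionedTable()
table.upsert("building-1", "room-101", temperature=21.5)
table.upsert("building-1", "room-100", temperature=22.0, humidity=40)
table.upsert("building-2", "room-100", temperature=19.0)

one_row = table.get("building-1", "room-101")
row_keys = [row_key for row_key, _ in table.query_partition("building-1")]
```

Note that the two rows in `building-1` carry different columns, and that the same row key (`room-100`) can appear in two partitions because it only has to be unique within one.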
Every table should have what? . The what indicates the column (or combination of columns) that uniquely identify each row.
primary key
Data organized in rows and columns is held in what type of database?
relational
A database that holds customer info, where a customer can have more than one address, is what type of database?
relational
What type of databases are used to track inventories, process ecommerce transactions, manage huge amounts of mission-critical customer information?
relational database
E-commerce systems and, as one of the major use cases, OLTP (online transaction processing) are common uses of what?
relational database
A BLANK database restructures the data into a fixed format that is designed to answer specific queries. When data needs to be ingested very quickly, or the query is unknown and unconstrained, a BLANK database can be less suitable than a non-relational database.
relational
Give an example of an analytical workload?
report on monthly sales
Non-relational data generally falls into which two categories?
Semi-structured and unstructured
Most relational databases support what? You use what to create tables, to insert, update, and delete rows in tables, and to query tables.
SQL
You can define relationships between tables using primary keys (PK) and foreign keys (FK), and you can access the data in tables using what?
SQL
In what type of processing is data processed as individual pieces rather than a batch at a time?
streaming
In a relational database, all data is what? Entities are modeled as tables, each row represents an instance of an entity, and each property of the entity is a column.
tabular
What does latency mean?
the time taken for the data to be received and processed
What is allowed in read-only access?
users can read data but can't modify any existing data or create new data
What is allowed in read/write access?
users can view and modify existing data, and create new data
A what is a virtual table based on the result set of a query.
view
What type of structure are the characteristics below referring to? You can think of a view as a window on specific rows in an underlying table You can query the view and filter the data in much the same way as a table A view can also join tables together
view
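A view can be sketched with Python's built-in sqlite3 module. The `customers` table, the sample rows, and the view name `lisbon_customers` are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "Lisbon"), (2, "Ben", "Leeds"), (3, "Caio", "Lisbon")],
)

# The view stores the query, not a copy of the rows.
conn.execute(
    "CREATE VIEW lisbon_customers AS"
    " SELECT id, name FROM customers WHERE city = 'Lisbon'"
)

# Query the view in much the same way as a table.
rows = conn.execute("SELECT name FROM lisbon_customers ORDER BY name").fetchall()

# A later change to the underlying table is visible through the view.
conn.execute("INSERT INTO customers VALUES (4, 'Dana', 'Lisbon')")
rows_after = conn.execute("SELECT name FROM lisbon_customers ORDER BY name").fetchall()
```

Because the view is a window onto the underlying table, the row added after the view was created still appears when the view is queried again.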
Give an example of a relational database
A database holding customer info
What are nodes?
A graph contains nodes (information about objects)
What type of table has few columns, with references from one table to another?
A narrow table
Explain nodes and edges?
A node represents an entity. An edge represents the relationships or connections between entities.
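Nodes and edges can be sketched with plain Python dictionaries and a breadth-first traversal. The people, the `manages` relationship, and the `reports_under` helper are invented for illustration; a real graph database would expose this through its own query language.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

# Nodes: information about entities.
nodes: Dict[str, Dict[str, str]] = {
    "ana": {"title": "Director"},
    "ben": {"title": "Manager"},
    "caio": {"title": "Engineer"},
    "dana": {"title": "Engineer"},
}

# Edges: (from node, to node, relationship).
edges: List[Tuple[str, str, str]] = [
    ("ana", "ben", "manages"),
    ("ben", "caio", "manages"),
    ("ben", "dana", "manages"),
]

# Index the edges by their starting node so traversal is cheap.
outgoing: Dict[str, List[str]] = defaultdict(list)
for source, target, relationship in edges:
    if relationship == "manages":
        outgoing[source].append(target)


def reports_under(manager: str) -> List[str]:
    """Follow 'manages' edges to find everyone below a node (breadth-first)."""
    found: List[str] = []
    queue = deque(outgoing.get(manager, []))
    while queue:
        person = queue.popleft()
        found.append(person)
        queue.extend(outgoing.get(person, []))
    return found


everyone_under_ana = reports_under("ana")
```

The traversal walks relationships of any depth without joins, which is the kind of query graph stores are built to answer.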
Give an example of a disadvantage of a non-relational database.
In a relational database, an address is stored once, so an address change only has to be made in one place. In a document database, the address may be duplicated across documents (for example, in two documents). This duplication increases storage space and makes maintenance more complex: for an address change, you would have to modify both documents.
What is a view of a table?
A view is a virtual table based on the result set of a query
A what references rows in another, related table? Each value in it should match a value in the corresponding primary key (PK) column of the other table.
Foreign key
Transactional databases must follow the properties known by what acronym?
ACID
Data ingestion, data transformation, data querying, and data visualization are typical tasks of what type of system?
Analytical
What type of system supports business users who need to query data and gain a BIG PICTURE view of the info held in a database?
Analytical
What type of system captures data and uses it to generate insights?
Analytical
Data processing systems often fall into what two categories?
Analytical and Transaction processing systems
What are the characteristics of analytical workloads?
Analytical workloads typically use read-only systems that store vast volumes of historical data or business metrics, such as sales performance and inventory levels. They are used for data analysis and decision making on business information. They work from a snapshot of the data at a given point in time, or a series of snapshots, because business decision makers want the bigger picture.
Either all operations in the sequence must complete successfully or, if something goes wrong, all operations run so far in the sequence must be undone. A sequence of operations with this property is called what?
Atomic
A transaction is a sequence of operations that is what? Either all operations in the sequence must complete successfully or, if something goes wrong, all operations run so far in the sequence must be undone.
Atomic
What guarantees that each transaction is treated as a single unit that either succeeds completely or fails completely?
Atomicity
ACID stands for what?
Atomicity, Consistency, Isolation, Durability
Using Azure, where can you store unstructured data such as video or audio files?
Azure Blob storage
BLANK is a multi-model database and supports key-value, document, graph, and column-family models
Azure Cosmos DB
BLANK provides a graph database service via the Gremlin API on a fully managed database service designed for any scale.
Azure Cosmos DB
BLANK provides the Table API for applications that are written for Azure Table storage and that need premium capabilities like: Turnkey global distribution. Dedicated throughput worldwide (when using provisioned throughput). Single-digit millisecond latencies at the 99th percentile. Guaranteed high availability. Automatic secondary indexing.
Azure Cosmos DB
Using Azure, where can you store semi structured data such as docs?
Azure Cosmos DB
Give an example of web and mobile applications for which non-relational databases are used.
Applications built on Azure Cosmos DB that integrate with third-party services
Non-relational systems such as BLANK support indexing even when the structure of the indexed data can vary from record to record.
Azure Cosmos DB (a non-relational database management system available in Azure)
Where can you dynamically manage and adjust resources such as the data storage size and the number of cores available for database processing? These tasks would require the support of a system administrator if you were running the database on-premises.
Azure portal
You can manage Azure SQL Database using what?
Azure portal
Votes counted when completed is an example of what type of processing?
Batch
In what type of processing is data collected into a group, with the whole group then processed at a future time?
Batch
Describe performance in batch and streaming data
Batch: latency is typically a few hours. Streaming: processing typically occurs immediately, with latency in the order of seconds or milliseconds.
Describe analysis in batch and streaming data
Batch: typically used for complex analytics. Streaming: simple response functions, aggregates, or calculations such as rolling averages.
Describe data size in batch and streaming data
Batch: suitable for handling large datasets efficiently. Streaming: intended for individual records or micro-batches of a few records.
Describe data scope in batch and streaming data
Batch: can process all the data in the dataset. Streaming: typically only has access to the most recent data received, or to data within a rolling time window (for example, the last 30 seconds).
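The rolling time window mentioned for streaming can be sketched as a small class that keeps only recent events. `RollingAverage`, the 30-second window, and the sample timestamps are assumptions chosen for illustration; timestamps are plain numbers of seconds.

```python
from collections import deque
from typing import Deque, Tuple


class RollingAverage:
    """Average of the values seen within the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, float]] = deque()  # (timestamp, value)
        self._total = 0.0

    def add(self, timestamp: float, value: float) -> float:
        """Record a new event and return the average over the current window."""
        self._events.append((timestamp, value))
        self._total += value
        # Drop events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_value = self._events.popleft()
            self._total -= old_value
        return self._total / len(self._events)


window = RollingAverage(window_seconds=30)
stream = [(0, 10.0), (10, 20.0), (25, 30.0), (40, 40.0)]
averages = [window.add(timestamp, value) for timestamp, value in stream]
```

By the time the event at 40 seconds arrives, the events at 0 and 10 seconds have left the window, so the stream processor no longer has access to them, whereas a batch job over the same dataset could read all four.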
What does the acronym Blob stand for?
Binary Large Object
What are the three ways data is generally structured?
Structured, semi-structured, and unstructured
Which ACID property does this example show? In a bank transfer, if funds are added to one account, there must be a corresponding deduction of funds somewhere else, or a record that describes where the funds came from. Money can't suddenly be lost or created.
Consistency
What ensures that a transaction can only take the data in the database from one valid state to another?
Consistency
BLANK supports several common models of non-relational database, including key-value stores, graph databases, document databases, and column family stores.
Cosmos DB