AWS Databases - AWS Certified Database Specialty - Exam

What are data management tasks?

1) Data ingestion - getting data into the target system is a major task. Understand the options, tools, and constraints available for the particular database setup. Refactoring a database is also a major task. 2) Backup/restore - a major operational component, and each service has options. Know the backup requirements: is it a scheduled or manual backup? Should the backups be encrypted or plain text? How much time do you have for backup and restore? Can it be done with the database online, or do you have to take the database offline? Having the proper data protection in place can make or break a business when disaster strikes, so getting backup and restore set up right is one of the most important tasks of your database implementation. 3) Automation - consistent, reliable automation helps reduce the chance of introducing errors that comes with performing data management tasks manually. Other AWS services, as well as native and third-party tools, are available to help you automate data management tasks. Data management decisions often affect the performance, cost, and complexity of your solution. Be aware of how data management can help you bring those constraints into the proper balance for your implementation.

What does it mean to be a database specialist?

1) Know what database to use in a given scenario. 2) Know how to use that database, requiring both breadth and depth. 3) Breadth of knowledge allows you to choose the best technology. 4) Depth of knowledge allows you to use that database in the best possible way, letting you set up secure and robust systems and handle managed tasks, geographical restrictions, and data size constraints. 5) Know what not to use in certain use cases.

What are the database design constraints and requirements?

1) Time - knowing the time constraints and requirements: query performance SLAs, backup/restore windows, and uptime SLAs. 2) Cost - cost components, incremental per-unit costs, and allocation of costs to the right users. Accounting for costs and providing appropriate audit data is also important, particularly in multitenant situations where you need to allocate cost back to different users of your application. Cost and performance are closely tied together, and being able to define that relationship helps you understand the incremental cost for adding resources to boost performance, as well as the incremental cost for adding more data to your database. 3) Security - encryption, credentials, access control, encryption keys. Security constraints are nonnegotiable. Knowing what is and is not supported for your database can help you avoid a partial implementation that has to be scrapped because the proper security wasn't considered from the beginning. Encryption is built into some services, while others need to be configured to enable that feature. You'll also have different options when it comes to encryption keys and access policies, all of which can affect the other constraints you're working with. 4) Complexity - ease of implementation and maintenance. In meeting the other constraints, how complex is your solution? Is there anything you can do to simplify the design or the number of services involved in your solution? For example, some backup operations can involve additional AWS services, while others happen automatically within the database service itself. You may be able to increase performance, but it could add a few more layers of complexity to your database implementation.

The AWS Certified Database ‑ Specialty exam separates the content into the following five domains:

1) workload‑specific database design 2) deployment and migration 3) management and operations 4) monitoring and troubleshooting 5) database security

What are characteristics of AWS CloudHSM (hardware security module)?

AWS CloudHSM is the service to use if you require a single-tenant hardware security module to manage your keys or if you need to interact directly with the HSM. CloudHSM launches into your own VPC so you have total control over access to the device. You pay an hourly charge for each CloudHSM you launch. This is different from the previous generation, where you had to pay an upfront fee for each device. Now you simply pay an hourly charge for each device you use. With CloudHSM, AWS automates the hardware provisioning, patching, and backups of the underlying HSMs so you can interact with your cluster as one logical HSM. You can add and remove HSMs from your cluster on demand, and CloudHSM will automatically load balance requests and securely duplicate keys to all HSMs in the cluster.

What are characteristics of AWS Web Application Firewall (WAF) ?

AWS Web Application Firewall, or WAF, allows you to filter traffic with rules based on any part of the web request. You can use managed rules, which are designed to protect against common threats including SQL injection, and are automatically updated as new issues emerge. You can use WAF with CloudFront, Application Load Balancers, or API Gateway. Depending on your database service or application, WAF may be a good addition to filter incoming requests. Amazon GuardDuty is a threat detection service that continually monitors events across multiple AWS data sources, such as AWS CloudTrail, Amazon VPC Flow Logs, and DNS logs. GuardDuty uses machine learning to establish a baseline for your normal account activity and assigns threats to a category and severity. You can even integrate findings into services like Lambda to automatically take actions for remediation or prevention. This adds another layer of protection around your account activity to help avoid vulnerabilities.

Describe the characteristics of AWS Database Migration Service (DMS).

AWS provides some tools to help make migrating to the cloud easier. Over 300,000 databases have been migrated using AWS Database Migration Service. With AWS DMS, migration starts while the original database stays live. It takes care of replicating any changes that occur in the source database during the migration process, so you don't have to worry about missing anything, and you can migrate with virtually no downtime. It's also highly resilient and self-healing, so if there is an interruption, it automatically restarts the process and continues migration from where it stopped. You can view metrics to track the progress of your migration, and you can also use AWS DMS for replication tasks across regions or to help build databases in test environments. AWS DMS supports most commercial and open-source databases. You can perform homogeneous migrations such as Oracle to Oracle, as well as heterogeneous migrations between different database platforms such as Oracle to Amazon Aurora.
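
As a rough sketch of how you might track migration progress programmatically, the snippet below uses boto3 to read replication task status and per-table statistics from DMS. The task ARN, region, and printed fields are illustrative assumptions, not part of the course material.

```python
import boto3

# Hypothetical task ARN and region; substitute your own values.
TASK_ARN = "arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"

dms = boto3.client("dms", region_name="us-east-1")

# Describe the replication task to check its status and overall progress.
tasks = dms.describe_replication_tasks(
    Filters=[{"Name": "replication-task-arn", "Values": [TASK_ARN]}]
)
for task in tasks["ReplicationTasks"]:
    stats = task.get("ReplicationTaskStats", {})
    print(task["Status"], stats.get("FullLoadProgressPercent"))

# Per-table statistics show how many rows have been loaded for each table.
table_stats = dms.describe_table_statistics(ReplicationTaskArn=TASK_ARN)
for t in table_stats["TableStatistics"]:
    print(t["SchemaName"], t["TableName"], t["FullLoadRows"])
```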

What are characteristics of AWS regions?

AWS regions are geographically separated around the world. In the history of AWS, there have only been a few outages that took out a service in an entire region. For example, S3 was down in the North Virginia region for 4‑5 hours in 2017, and EC2 was down for 1‑5 hours in the Sydney region in 2016. While regional outage mitigation is one reason to go multi‑region, a more common case for operating across regions is to get data closer to your users. A multi‑region architecture is going to cost the most and be the most complex. So unless you have a specific requirement for multi‑region, you can usually stick with multi‑AZ and achieve the necessary availability for your database system.

What are some characteristics of Amazon Aurora?

Amazon Aurora is a cloud-native relational database. It is designed to solve RDBMS challenges by leveraging cloud features. For read-intensive applications, Aurora can have up to 15 low-latency read replicas across three availability zones. Each Aurora database cluster can also grow to up to 128 TB of database storage. To make your data durable, Aurora stores six copies of your data across three availability zones and continuously backs up data to S3. You can also enable global database, which allows a single Aurora database to span multiple AWS regions. Aurora is MySQL or PostgreSQL compatible, so there's no new language or specialized tools to learn. Much of the database administration is built in and automatic, so you won't need an army of DBAs just to keep the lights on. Finally, Aurora is about one-tenth the cost of commercial databases with similar performance and features. You can also utilize Aurora Serverless, which automatically scales up and down based on usage so you don't have to worry about over or underprovisioning your database instances.

What are the current available Amazon RDS database engines?

Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, SQL Server

What service does Amazon CloudWatch provide?

Amazon CloudWatch provides visibility into your cloud resources and operations. You can collect and store logs, as well as view built-in and custom metrics. CloudWatch includes dashboards to visualize your metrics, alarms to trigger alerts and actions based on certain conditions of your metrics, insights to run queries across CloudWatch data and determine the health of applications, and events to respond to operational changes. If you're using RDS, you can utilize RDS Performance Insights to analyze and tune your database performance. For a given database instance, it shows load over time and the top queries, users, and hosts that are generating that load. You can use the interactive dashboard to change time ranges and drill into your data to identify performance issues. Redshift provides a similar dashboard that you can access from your cluster menu in the AWS console to gain insights into CPU, RAM, storage used, connections, query runtime, and query load.
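
A minimal sketch of working with CloudWatch from code using boto3: it publishes a custom metric and then reads a built-in RDS metric for the last hour. The namespace "MyApp/Database" and instance identifier "mydb-prod" are hypothetical.

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Publish a custom metric (namespace and dimension values are made up).
cloudwatch.put_metric_data(
    Namespace="MyApp/Database",
    MetricData=[{
        "MetricName": "OrphanedConnections",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": "mydb-prod"}],
        "Value": 3,
        "Unit": "Count",
    }],
)

# Read a built-in RDS metric (CPUUtilization) averaged over 5-minute periods.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "mydb-prod"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)
for point in stats["Datapoints"]:
    print(point["Timestamp"], point["Average"])
```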

What are some automation solutions for migrations and database creation that Amazon offers?

AWS Database Migration Service and the AWS Schema Conversion Tool. You also have the option to use native database tools, and AWS Snowmobile or Snowball can speed up the transfer of large datasets. AWS CloudFormation can help you represent your infrastructure as code so you can automate database deployments and ensure consistency across your database environments.

What are some characteristics of AWS Key Management Service?

AWS Key Management Service, or KMS, is a service you will see listed for use with most database services to assist with data encryption. KMS is a managed service that provides you centralized control over the lifecycle and permissions of your encryption keys. KMS uses hardware security modules, or HSMs, that meet many industry standards for secure key generation and storage. You can control which users have access to administer each key and which users have access to use those keys. You can also enable automatic yearly rotation of master keys. Don't worry, you won't have to bulk re-encrypt everything when a key rotates. KMS automatically keeps previous versions of keys to use for decryption of data encrypted under an old version of the key, then all new encryption requests are encrypted under the newest version of the key. As a managed service, KMS dynamically scales to meet your demand, is extremely durable (eleven 9s), and is integrated with many AWS services for ease of use, including CloudTrail so you can audit who used what keys, when, and where they were used. When you generate a master key in KMS, you can import your own key or have KMS generate it for you. You'll then decide which IAM users and roles can use the key and which can administer the key. Integrated AWS services generate a data encryption key to encrypt the data in the service. KMS uses envelope encryption to encrypt the data encryption key with a master key from KMS. You can choose to let AWS manage the customer master key (CMK) used to encrypt the data encryption key for the service, or you can manage the CMK used for that envelope encryption.
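
A minimal envelope-encryption sketch with boto3, assuming a hypothetical CMK alias "alias/my-app-key": KMS generates a data key, your application encrypts data locally with the plaintext copy and stores only the encrypted copy of the key, then asks KMS to unwrap it when the data needs to be decrypted.

```python
import boto3

kms = boto3.client("kms", region_name="us-east-1")

# "alias/my-app-key" is a hypothetical key alias in your account.
resp = kms.generate_data_key(KeyId="alias/my-app-key", KeySpec="AES_256")

plaintext_key = resp["Plaintext"]        # use this to encrypt data locally, then discard it
encrypted_key = resp["CiphertextBlob"]   # store this alongside the encrypted data

# Later, to decrypt the data, ask KMS to unwrap the stored data key.
restored = kms.decrypt(CiphertextBlob=encrypted_key)
assert restored["Plaintext"] == plaintext_key
```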

What is an availability zone?

An availability zone is an isolated set of resources within an AWS region that are separated such that a fire or power outage in one availability zone should not affect other availability zones. AZs are connected by high-speed networks, so running a database across multiple AZs should not add significant latency. Most of the available managed database solutions are multi-AZ by default, since that configuration meets most production requirements for high availability.

Describe the optimization options for read-heavy and write-heavy databases.

Another performance consideration is to determine if your database is read-heavy or write-heavy, meaning, are the majority of query operations performing reads of existing data or are queries writing new information to the database? For read-heavy databases, the option to add additional read replicas can help lighten the load on your database. For write-heavy databases, you may employ sharding to distribute the load across machines. Let's take a look at each of those strategies, keeping in mind that there is some overhead associated with both read replicas and sharding, so be sure to test and verify the benefits for your particular application. You can think of a read replica as the same data copied to multiple machines. A read replica contains up-to-date information for the database and is only used to process read queries, so nothing is written directly to these instances; they're all updated from the same source database. For example, if your overall system is experiencing 1000 queries per second and you have four read replicas to service the load, then each read replica only needs to handle about 250 queries per second, instead of a single instance handling all 1000 per second. Depending on your database performance, distributing a smaller query volume across more instances can lead to faster response times than letting a single instance handle all of the load. For some database services, adding read replicas is a fairly simple process, so understand how the database service you are using handles read replicas if that is something you may need. You can think of sharding as copying different data to multiple machines. Sharding involves splitting the database across different machines based on a shard key in order to distribute the load. The idea is that if you have 100 queries trying to write to the database, those writes can be spread across the shards so that no single machine has to handle all of them.
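
To make the sharding idea concrete, here is a minimal, database-agnostic routing sketch in Python: a hash of the shard key decides which shard receives a given write. The shard names are placeholders; real systems often use consistent hashing or a lookup directory so adding shards doesn't remap most keys.

```python
import hashlib

# Hypothetical shard endpoints; in practice these would be database hosts or clusters.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for(key: str) -> str:
    """Map a shard key (such as a customer ID) to one of the shards."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Writes for different customers land on different machines,
# so each shard sees only a fraction of the total write load.
print(shard_for("customer-1001"))
print(shard_for("customer-1002"))
```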

What are the factors in adjusting your database cost model?

As you scale up or down, you may need to adjust your pricing strategy for a particular database service. For example, reserved instances and savings plans are good options for more continuous workloads. Whether or not you use managed services can also be a cost factor. In some cases, managed services may cost more on your AWS bill, but you may spend less on resources to support your AWS infrastructure.

Name the Amazon database service offerings.

Aurora: relational database built for the cloud. ElastiCache: boost database performance. DynamoDB: operate your database at any scale. Keyspaces: high-performance managed Cassandra. DocumentDB: high-throughput managed MongoDB. Redshift: data warehousing in the cloud. Neptune: graph database built for the cloud. Quantum Ledger Database (QLDB): immutable system of record. Timestream: managed time series database. Use the right database for the type of data you'll be working with and the access patterns you expect. Amazon RDS lets you utilize a variety of managed relational database engines with less effort than doing it all yourself. Amazon Aurora is the relational database engine that leverages performance and cost optimizations for the cloud, and ElastiCache is the caching service you can use to boost database performance. DynamoDB can scale to handle very large transaction loads, while Keyspaces and DocumentDB are options if you're looking to move Cassandra or MongoDB workloads to the cloud. Redshift enables data warehousing in the cloud, and Neptune is available if you need a graph database. Finally, QLDB can help you create an immutable, verifiable system of record, and Timestream can increase performance and lower costs for working with your time series data.

What are some characteristics of ElastiCache?

Boost database performance: when you need sub-millisecond response times or need to ease the load off of databases and applications, you can use Amazon ElastiCache. ElastiCache works as an in-memory data store and cache and can scale up to hold large datasets. It sits between your requests and other back-end sources like your database or application server, allowing faster in-memory speed for retrieving values. ElastiCache is fully managed, so the hardware provisioning, software patching, setup, configuration, and monitoring of your clusters is taken care of for you. ElastiCache is compatible with both Redis and Memcached. Each engine offers some different features, so depending on your requirements, you may choose one over the other. Redis supports more advanced data types and operations such as lists, sets, and sorting. Redis also allows persistence and can run in multiple availability zones. Memcached is better suited for object caching and is simple to implement and use. Memcached also scales out to hold much larger datasets.
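
As an illustration of the cache-aside pattern with ElastiCache for Redis, here is a short sketch using the redis-py client. The endpoint hostname, key format, and TTL are assumptions for the example.

```python
import json
import redis  # pip install redis

# Hypothetical ElastiCache for Redis endpoint; TLS assumed enabled on the cluster.
cache = redis.Redis(host="my-cache.abc123.use1.cache.amazonaws.com", port=6379, ssl=True)

def get_user(user_id, db_lookup, ttl_seconds=300):
    """Cache-aside read: try the cache first, fall back to the source database."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)          # fast in-memory hit
    user = db_lookup(user_id)              # slower call to the source database
    cache.setex(key, ttl_seconds, json.dumps(user))  # cache the result with a TTL
    return user
```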

List the solutions AWS has created to migrate your databases to the cloud.

Building and maintaining test environments for your databases is another task that can present some logistical challenges. We'll cover some strategies to automate your deployments, as well as requirements that can put constraints on your test environment design. When migrating from a source to a target database, one basic approach could be: make a backup of your source database, restore that backup to your target database, use replication to catch up on what happened on the source in between taking the backup and loading the backup to the target, then switch traffic over from the source to the target database. While that sounds easy enough, there can be significant challenges when trying to execute this in real life. For example, taking a full backup can be taxing on your source database, so you'll need to make sure that normal operations can proceed at an acceptable level of performance. Do you have enough disk space to store a full backup? How long will it take to transfer the backup from the source to the target database? Will your migration require any downtime? If so, can you fit that downtime into a planned maintenance window? AWS provides some tools to help make migrating to the cloud easier. Over 300,000 databases have been migrated using AWS Database Migration Service.

What are some characteristics of Amazon CloudTrail?

CloudTrail records actions taken in your AWS account. This allows you to gain visibility into what is going on across the various services you are using. You can set up alerts when certain actions happen in your account and even configure automated responses when certain events occur. CloudTrail can provide valuable information for several different use cases. First, compliance auditing. You can configure CloudTrail to capture all events in your AWS account and prevent users from changing CloudTrail events or turning off the service, ensuring you have a complete record of actions when it comes time to audit. Next, operational troubleshooting. When something goes wrong, you can work backwards to narrow down what might have caused the problem or if actions in one AWS service had unintentional side effects on another AWS service. Third, security analysis. You can identify user behavior patterns over time, then detect any unusual behavior that may indicate unauthorized infrastructure changes or excessive resource usage. Finally, not only can you detect unusual or unauthorized behavior in your account, you can create automation to respond to those events and apply any necessary remediation. CloudTrail tracks who is performing actions in your account, whether that is a user, a role, or another AWS service. CloudTrail records what they are doing in terms of the actual API call that was made to an AWS service, and when that call was made.

What are some characteristics of Amazon Redshift?

Data warehousing in the cloud: data warehouses hold data that is needed to perform comprehensive analytics for organizations, and they operate on a much larger scale than other databases within the organization. When dealing with data warehouses, some common pain points arise. Size, scale, and storage: the sheer volume of data needed for data warehouse activities exceeds the maximum operating capacity of some database engines, so a data warehouse needs to have a storage system capable of scaling up to handle very large amounts of data. Performance: operating on extremely large datasets can magnify problems that are sometimes hidden when working with smaller amounts of data. A data warehouse needs to be able to query and analyze the data fast enough to provide value to the organization. For example, if you're trying to make decisions each day based on insights from your data warehouse, but it takes a week for the reports to complete, you'll always be acting on stale information. Cost: this goes back to providing value. If it costs more to run your data warehouse than you can gain in value from using the data warehouse, then why bother creating one in the first place? Ideally, cost should also be predictable and correspond to your actual usage. Security: many organizations are subject to certain audit and compliance requirements that don't go away just because you're using a data warehouse. A security breach on a data warehouse could have a much larger impact than other database compromises due to the volume and variety of data contained in the data warehouse. Finally, management: how many resources are you going to need to dedicate to managing your data warehouse? Amazon Redshift is designed to address these pain points by leveraging the scale, flexibility, performance, and cost savings of the cloud.

How do you resolve common database issues?

Depending on your database type, monitoring won't always take the form of looking at CPU utilization and disk I/O. For example, managed serverless databases like DynamoDB provide performance in terms of read request units and write request units. You can choose to use an on‑demand mode where Dynamo will automatically scale up and down to match your load or provisioned mode where you specify a certain read and write unit capacity. There are cost and performance differences between the two options, so you'll have to determine which option is best for you based on your application requirements and budget. For services like RDS, Redshift, and ElastiCache, you'll be looking for spikes in resources to determine if a certain resource is being taxed such as CPU, RAM or disk. When using these instance‑based database services, you'll need to be aware of errors such as full disk error if you run out of storage, network errors such as running out of IP addresses in a subnet when trying to launch new instances, database status messages, including those that are nonrecoverable where you'll have to spin up a new database instance to solve the problem, and native error messages that are unique to the database engine you're using. Finally, with all database types, you may need to troubleshoot access problems such as timeouts, missing certificates, and other credential problems that prevent applications or users from accessing the database.

What are characteristics of Virtual Private Cloud (VPC) ?

For database services that are instance‑based, you'll be working with instances inside of a Virtual Private Cloud, or VPC. Security groups belong to a VPC and are assigned to instances as a virtual firewall to control inbound and outbound traffic for that instance. You can use the same security group in different subnets in the same VPC, and the same subnet can also have different security groups. If you're using instance‑based databases, there are several networking components involved to get traffic to your instances, so let's do a brief overview so you understand what is involved. First, you'll have an internet or VPN gateway. Next, a router and a route table, followed by a network access control list. Finally, you'll reach the security group and your instance.

What are some characteristics of network access control lists (NACL)?

For finer-grained control of instance access, you'll likely use a combination of security groups and network access control lists. Let's compare the two. Security groups operate on the instance level, whereas network access control lists operate on the subnet level. Security groups provide only allow rules, while NACLs can provide both allow and deny rules. Security groups evaluate all rules before allowing traffic, whereas NACLs process rules in numeric order. A security group is stateful, meaning that return traffic is automatically allowed regardless of any rules, whereas a NACL is stateless and return traffic must be explicitly allowed by rules. A security group applies to an instance only if it is associated with that security group, whereas a NACL automatically applies to all instances in the subnets associated with the NACL, so you can't accidentally forget to assign it. AWS Key Management Service helps you maintain encryption keys for encrypting your data, while CloudTrail, Web Application Firewall, and GuardDuty can help with account auditing and avoiding vulnerabilities. Additional ways you can manage security include IAM users and roles, security groups, and network controls.
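
A small boto3 sketch of the security-group side of this, with hypothetical group IDs: it allows an application tier's security group to reach a PostgreSQL instance on port 5432, relying on the stateful behavior described above for the return traffic.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Hypothetical group IDs: sg-0db... protects the database instance,
# sg-0app... is attached to the application tier.
ec2.authorize_security_group_ingress(
    GroupId="sg-0db1111111111111",
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5432,
        "ToPort": 5432,
        "UserIdGroupPairs": [{"GroupId": "sg-0app222222222222"}],
    }],
)
# Because security groups are stateful, response traffic from the
# database back to the application is allowed automatically.
```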

What are considerations for database performance based on the service type?

For instance‑based database services, simply increasing your instance type or size adds additional CPU, RAM, network, and disk capacity. In some cases, this strategy of increasing instance size is about as complex as you need to get in order to meet your performance needs. For many managed services, you can choose from on‑demand capacity mode, which will automatically scale up and down to meet demand, or you can select provisioned capacity mode, which helps you optimize for price when you have predictable loads.
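
For instance-based services like RDS, scaling up can be as simple as changing the instance class. A hedged boto3 sketch, with a hypothetical instance identifier and class:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Scale a hypothetical instance "mydb-prod" to a larger instance class.
# ApplyImmediately=False defers the change to the next maintenance window.
rds.modify_db_instance(
    DBInstanceIdentifier="mydb-prod",
    DBInstanceClass="db.r5.2xlarge",
    ApplyImmediately=False,
)
```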

Describe characteristics of the database storage model.

For some storage types, you pay for allocated space, while others only charge for space actually used. For each database type and environment, be sure to understand which type of space you are dealing with and provision your disk space accordingly. Disk type and usage is another factor for cost. The disk type you use should be based on your performance requirements. Utilizing slower, less expensive storage for less frequently accessed data can help control costs. Some database services like Timestream automatically manage this for you. Backups are another area of cost that can be easy to overlook, but again, everything you store or allocate is going to have an associated charge. Some services back up to S3, which saves you money over using other disk types to store backups. You should also have a backup lifecycle plan in place to prevent holding onto backups for longer than they are useful to your application.
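
If your backups are exported to S3, a lifecycle rule is one way to implement the backup lifecycle plan mentioned above. A sketch with boto3, assuming a hypothetical bucket, prefix, and illustrative retention periods:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and prefix for database backup exports:
# move backups to cheaper storage after 30 days and delete them after 365.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-db-backups",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "expire-old-backups",
            "Filter": {"Prefix": "backups/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }]
    },
)
```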

What are some characteristics of Amazon Neptune?

Graph database built for the cloud: to understand graph databases, let's look at a simple example. Suppose you like a particular movie. You may know some other people who like the movie as well, and there may be other people you don't know who also like the movie. A social networking graph could look at multiple data points and relationships in determining what or who you may want to connect with based on your current connections. This pattern of using nodes and edges versus tables and keys allows easier exploration of patterns than having to create nested queries and complex joins in a relational database. Graph databases are a good fit for creating knowledge or identity graphs, fraud detection, recommendation engines, and, of course, social networking, to name just a few. Amazon Neptune is a fully managed graph database that enables you to query billions of relationships in milliseconds. Like the other managed services we've seen, you get high availability with multiple read replicas that can scale up and down, instance monitoring and repair, fault-tolerant, self-healing storage, continuous incremental backups with point-in-time restore, and encryption. Neptune also supports open graph APIs for both Gremlin and SPARQL so you can use existing tools and code when migrating to Neptune.

What are some characteristics of Amazon DocumentDB?

High Throughput Managed MongoDB Amazon DocumentDB is very much like the other services we've covered in that it's a managed, highly available, low latency, fault tolerant, self‑healing service. You can easily scale the cluster up and down, there are continuous backups, and you can encrypt the data with keys you generate using Amazon Key Management Service. The key feature is that DocumentDB is MongoDB compatible. This allows you to use existing MongoDB tools and application code with a managed cloud service. DocumentDB makes it easy to store, query, and index JSON data. This allows you to persist the data using the same document model format used in application code without having to perform transformations every time you read or write data.

What are some characteristics of Amazon Keyspaces?

High-performance managed Cassandra: Amazon Keyspaces is an Apache Cassandra-compatible database service. Amazon Keyspaces has a lot of the same features as DynamoDB, so let's go through those. Keyspaces is a managed serverless service, so you don't have to install, maintain, or patch software and servers. You can also choose from on-demand capacity mode, which will automatically scale up and down to meet demand, or you can select provisioned capacity mode, which helps you optimize for price when you have predictable loads. Whether you choose on-demand or provisioned mode, you'll get consistent, single-digit millisecond response times whether you have small or large datasets or high or low query volume. Storage is also managed, so you have virtually unlimited space, and continuous backups enable point-in-time recovery, so if you decide you want to see what your data looked like 10 days ago, you can do that. Data is also encrypted by default. Keyspaces is a good choice if you're looking to run your Cassandra workloads in the cloud.

What role do access patterns play in determining a database service?

How often will the data be accessed? Frequent real-time access? Daily batch? Long-term storage? Will multiple users access it? Is there a need for immediate versus eventual consistency? Does the solution need to be highly available? Are there defined performance SLAs? Multi-AZ vs. multi-region? What granularity of security do you need? What about access control? What are your security and encryption requirements?

What are some characteristics of Quantum Ledger Database?

Immutable system of record: historically, ledgers were used to keep track of information that needed a verifiable history of changes. For example, a financial account could just show you the current balance, but what if the balance was different than you expected? You'd want to know what happened to arrive at that current balance, or what the history was. While far from foolproof, older ledgers employed a few measures to preserve data integrity. For example, writing in pen instead of pencil made it difficult to remove or change an entry. The books were usually locked up in a safe when not in use. The handwriting style of different individuals could sometimes, but not always, be detected, so if an unauthorized person tried to change the books after hours, it could be easy to detect. Modern systems no longer rely on handwritten books as systems of record, but how can you solve the problem of data integrity in cloud systems? Some systems create audit logs using a traditional database. These can be error prone, hard to scale, and they are not immutable. Given the right conditions, the data could be changed or deleted without detection. Blockchain solves the immutability problem, but adds additional overhead since it operates on a decentralized model best suited to maintain a single record for multiple untrusted parties. Amazon Quantum Ledger Database, or QLDB, is a fully managed ledger database owned by a central trusted authority. By using a central trusted authority instead of a decentralized model, QLDB can perform operations much faster than other blockchain networks, making it easier and less resource intensive to scale, in addition to providing predictable completion times for operations. As a managed serverless service, you get automatic scaling, high availability, and continuous backups. QLDB has a document data model.

What are characteristics of a balanced solution?

Implementation speed, database performance, cost, security, complexity, and the ability to use multiple database types.

What are the tradeoffs when designing an index strategy?

Indexes are another key part of database performance. Again, depending on the database type, you'll have different indexing options, but the general idea of indexes is to identify common operations and make those faster to execute by performing some work ahead of time in building and maintaining an index. For most systems, indexes will also take up more disk space, so that is a trade‑off you may have to consider when creating indexes.

What are some of the cost factors for database design?

Larger instance size equals a higher cost. Overprovisioning instance sizes can significantly add to costs over time. For database services that don't have instances to size, some charge by reads and writes. Others provide options such as on‑demand modes that scale up and down automatically and charge you based on actual use or provisioned modes where you have a fixed amount of capacity that you pay for by the hour. The disk type you use should be based on your performance requirements. Utilizing slower, less expensive storage for less frequently accessed data can help control costs.

What are some characteristics of Amazon Timestream database?

Managed Time Series Database There's a lot of interesting data that happens over time, for example, stock prices, clickstream activity, computer metrics like CPU usage, and data gathered by IoT devices such as temperature or motion. Many organizations will collect time series data from multiple sources in order to monitor operations and gain insights into how services are used. This type of data can have multiple measurements per second, resulting in a very high volume of data. Trying to use a relational database to manage and analyze time series data often results in slow performance and high costs over time. Amazon Timestream is a managed serverless time series database. It can automatically scale to handle trillions of events per day up to 1000 times faster and for as little as 1/10 of the cost of relational databases. Each data point has a timestamp and one or more attributes like price, temperature or percentage utilization. The schema is dynamically created based on the attributes of incoming time series data, so you don't have to worry about migrating database tables if the data you collect adds additional attributes. Encryption is built in. Timestream also manages your data lifecycle, keeping more recent data in memory while shifting older data to less expensive storage. This helps manage the cost of keeping historical data for analytics. The query engine lets you access recent and historical data together using SQL and built‑in time series analytics functions. Timestream is easy to integrate with other services, so you can build dashboards or perform additional analytics by feeding the data into other services.
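
A minimal sketch of writing a time series data point with boto3, assuming a hypothetical Timestream database and table; the dimensions and measure values are illustrative.

```python
import time
import boto3

tsw = boto3.client("timestream-write", region_name="us-east-1")

# Hypothetical database and table names; dimensions describe the data source.
tsw.write_records(
    DatabaseName="iot_metrics",
    TableName="sensor_readings",
    Records=[{
        "Dimensions": [
            {"Name": "device_id", "Value": "sensor-42"},
            {"Name": "site", "Value": "warehouse-1"},
        ],
        "MeasureName": "temperature",
        "MeasureValue": "21.7",
        "MeasureValueType": "DOUBLE",
        "Time": str(int(time.time() * 1000)),  # milliseconds since the epoch
        "TimeUnit": "MILLISECONDS",
    }],
)
```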

What administrative options do you have for a managed database service?

On the other end of the spectrum, you could use a fully managed database service, where you simply use the database and the service adapts to your usage patterns. Scaling servers up and down, performing updates and patches, high availability, monitoring, and even backups are all built into the service and managed for you. While you can still provide certain configuration values for your database, you won't be able to control every aspect of how the database operates. However, this arrangement requires much lower maintenance from you because the service is handling many of the operational and maintenance activities.

What are some characteristics of DynamoDB?

Operate your database at any scale: Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It's a fully managed, multiregion, multimaster, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications. DynamoDB is a NoSQL database service. It is not a relational database. This gives you more flexibility in how you structure your data. You can set up a table with different attributes, or columns, add items, or rows, then later on add additional attributes to new items without having to redefine the table schema and migrate the data. DynamoDB is a key-value database, meaning you have a unique key, for example user ID, and some value associated with that key, attributes like name, address, and employer. Different items may have different attributes, but they can all be stored in the same table. DynamoDB is also a document database, meaning you can store HTML, XML, or JSON documents as items in tables and then read attributes from those documents. DynamoDB delivers single-digit millisecond performance at any scale. Some databases start to slow down as table size or query volume increases. DynamoDB is designed to maintain consistent performance no matter how large your table becomes or how many queries are issued per second. DynamoDB operates on a massive scale, handling over 10 trillion requests per day, peaking at more than 20 million requests per second over petabytes of data. DynamoDB is fully managed, meaning the underlying hardware, provisioning, patching, and networking is taken care of for you. If you need to run in multiple AWS regions, that's supported natively. This helps distribute your traffic to geographic locations closer to your users, as well as mitigate the effects of a regional failure.
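
A short boto3 sketch of the key-value model described above, assuming a hypothetical "Users" table whose partition key is "user_id": items with different attributes live in the same table and are read back by their unique key.

```python
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("Users")  # hypothetical table with partition key "user_id"

# Items in the same table can carry different attributes.
table.put_item(Item={"user_id": "u-100", "name": "Ana", "employer": "Example Corp"})
table.put_item(Item={"user_id": "u-101", "name": "Raj", "address": "12 Main St"})

# Key-value read by the unique key.
resp = table.get_item(Key={"user_id": "u-100"})
print(resp.get("Item"))
```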

What is RPO?

RPO, or Recovery Point Objective, represents how much data you can afford to lose or recreate. In other words, what is the maximum gap of data you are willing to have between a recovery point and a disaster? For example, if the database went down at 9:00 a.m. and you wanted data from up to 8:30 a.m. to be included in the restored database, your RPO would be 30 minutes. In other words, your restored database should only be missing the previous 30 minutes of data before the disaster.

What are characteristics of backup and recovery?

RTO and RPO requirements influence both your backup and restore strategy. In this example, the backup would need to take less than 30 minutes to complete to maintain an RPO of 30 minutes, and the restore would need to take less than an hour to maintain an RTO of 1 hour. Different backup and restore methods may not meet your time requirements, so you may need to look at options like replication and warm or hot standbys instead of simply restoring a new instance from a backup. RTO and RPO will dictate your backup frequency and, to some extent, the location, or locations, where you will store those backups.

What is RTO?

RTO, or Recovery Time Objective, represents how quickly you need to recover from a disaster. For example, if something happened to your database at 9:00 a.m. and you wanted it operational again by 10:00 a.m., then your RTO would be 1 hour.

What are the objectives of database security domain?

Security needs to be built into your design and is too important to simply be an afterthought, thrown together after everything else is working. The security objectives cover several areas, including encrypt data at rest and in transit, evaluate auditing solutions, determine access control and authentication mechanisms, recognize potential security vulnerabilities within database solutions. Different database services have different options for encryption, auditing, and access control, so you'll need to be aware of your options in order to meet any specific security requirements you may have. You can use AWS databases to implement solutions that are compliant with several financial, privacy, and government standards. Cloud does not equal less secure. Another component of security is monitoring and detecting potential threats and vulnerabilities, and AWS has several tools to make that easier for you.

What are the objectives of workload specific database design domain?

Select appropriate database services for specific types of data and workloads, Determine strategies for disaster recovery and high availability. Design database solutions for performance, compliance, and scalability. Compare the costs of database solutions. These objectives get into knowing everything about everything in order to select the appropriate database service or services to use based on the type of data involved and other implementation requirements, such as performance, compliance, scalability, and cost. These objectives will help you know the breadth of services and their different capabilities so you can design the best possible solution.

What is the purpose of single-instance databases?

Single instances reduce management overhead and cost. They are a good fit for nonproduction/functional environments.

What are characteristics of Amazon RDS, which allows for an approach in between a totally unmanaged and a totally managed administrative solution?

Somewhere in the middle are services like RDS, which handle some aspects of the operations and maintenance but also give you several options for controlling database engine settings, high availability configuration, and selecting maintenance windows for updates. You're also responsible for log file retention, disk space, and monitoring. Wherever you end up on the spectrum of maintenance responsibilities, be sure to become familiar with management options for the AWS database services you use. If you're running in RDS, you'll have specific configuration available to manage your log files. CloudTrail is a service that logs calls to AWS APIs. Some organizations utilize CloudTrail logs to help audit database activity. CloudWatch and CloudWatch Events can be used in a variety of ways for monitoring your database and logging activity. You can also leverage services such as Simple Queue Service (SQS), Simple Notification Service (SNS), and Simple Storage Service (S3) to customize how you manage database log files.
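
As an example of the log-management responsibility mentioned above, the boto3 sketch below lists and downloads RDS log files for a hypothetical instance "mydb-prod"; in practice you might ship the contents to S3 or CloudWatch Logs.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# List the log files available on a hypothetical instance "mydb-prod".
logs = rds.describe_db_log_files(DBInstanceIdentifier="mydb-prod")
for f in logs["DescribeDBLogFiles"]:
    print(f["LogFileName"], f["Size"])

# Download a portion of the first log file for inspection.
portion = rds.download_db_log_file_portion(
    DBInstanceIdentifier="mydb-prod",
    LogFileName=logs["DescribeDBLogFiles"][0]["LogFileName"],
)
print(portion["LogFileData"][:500])
```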

What are some AWS storage products you can use when applying a performance optimization strategy?

Speaking of disk, for database types that allow you to choose disk type, there are definite performance implications. Let's look at the different disk types available. Elastic Block Store, or EBS, Elastic File System, or EFS, and Instance Store. We'll talk about the features of each, but again, depending on the database engine you're using and your application, some of these may perform better than others. Elastic Block Store, or EBS, is the most common type of storage for EC2 instances, with some instance types operating on EBS only. With EBS, you can create an SSD‑based volume or an HDD‑based volume. SSD‑based volumes are divided into two categories. First, general purpose. These are a good mix of price and performance for many types of transactional data, and are the default EBS volume type for EC2 instances. Based on volume size, you will be allocated a certain number of IOPS that are burstable beyond the limit for a short period of time, but may be throttled down to your allocated limit. Second, provisioned IOPS. These are best for latency sensitive transactional workloads, where throttling IOPS would be problematic for your app, like a transactional database. To get the highest performance from provisioned IOPS, the volume must be attached to an EBS optimized instance type. HDD‑based volumes are also divided into two categories. First, throughput optimized. This is good for frequently accessed throughput intensive workloads with large data sets and large IO sizes. Second, Cold HDD. This is the lowest cost volume designed for less frequently accessed workloads with large cold data sets. Elastic File System, or EFS, is a managed NFS file system. It's highly available and scalable, and automatically grows and shrinks as you add and remove files. This automatic scaling allows you to pay only for what you actually use

List additional considerations that will impact the service pricing model?

The actual database engine you choose to run also affects costs. Are you running an open source or commercial database engine? What is the license cost? Is that engine providing you unique features to justify the cost? Network transfer can also incur charges, so it's important to understand if that's a factor with your database service and optimize your design to minimize those network transfer costs. Finally, pricing models are constantly evolving, and over the years, AWS has introduced several different ways to lower your operating costs. Some built‑in tools to help you optimize costs include AWS Budgets, AWS Cost & Usage Report, and AWS Cost Explorer. Regular use of these tools helps you stay informed of the best pricing options for your overall usage and can result in significant savings to your monthly AWS bill.

How is CloudTrail data stored?

The resulting who, what, and when information is packaged into a JSON file and stored in an S3 bucket. CloudTrail writes the logs to an S3 bucket in batches, so there can be a delay of several minutes between the time an event occurs and the time you can view it. You can view CloudTrail events from the CloudTrail dashboard in the AWS console. You can also download the event logs as a ZIP file from the S3 bucket. Amazon Athena allows you to query the S3 bucket directly with a SQL-like syntax to analyze events. Amazon CloudWatch integrates with CloudTrail so you can set up CloudWatch Logs to monitor CloudTrail events. You can also create rules that look for certain CloudTrail events, then trigger a CloudWatch event when a CloudTrail event matches a rule. The CloudWatch event can then invoke several different targets, for example, an SNS topic, which could deliver an email notification about the event, or a Lambda function, which could take some corrective action. Depending on the database services you use, CloudTrail can be a good addition to your monitoring and troubleshooting strategy.
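
Besides the console and Athena, you can also query recent management events programmatically. A hedged boto3 sketch that looks for a specific API call over the last week; the event name is just an example.

```python
import boto3
from datetime import datetime, timedelta

cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

# Look for recent calls to a specific API, e.g. RDS instance deletions.
events = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "DeleteDBInstance"}],
    StartTime=datetime.utcnow() - timedelta(days=7),
    EndTime=datetime.utcnow(),
    MaxResults=50,
)
for e in events["Events"]:
    print(e["EventTime"], e.get("Username"), e["EventName"])
```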

What is the benefit of multi-Availability Zones (AZ)?

There is not a single point of failure. Clustering multiple machines to serve a single database mitigates having single points of failure. Multi-AZ deployments provide both clustering, with multiple instances supporting the database, and physical separation of those instances.

What are the objectives of deployment and migration domain?

This domain covers the following objectives: automate database solution deployments, determine data preparation and migration strategies, execute and validate data migration. Automation is an important way to reduce deployment time, increase repeatability, and eliminate mistakes as you deploy database solutions. Using tools like CloudFormation can help you develop robust automation with AWS database deployments. As systems first move to the cloud or evolve within the cloud, you'll find that you need to perform a fair amount of data preparation, migration, and validation, so understanding best practices and available tools for those tasks is an important part of being a database specialist.
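
A minimal sketch of automating a database deployment with boto3 and CloudFormation, assuming a hypothetical template stored in S3 that declares the database resources (for example an AWS::RDS::DBInstance); the stack name, template URL, and parameters are placeholders.

```python
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

# Launch a stack from a hypothetical template. The same template can be
# reused to create identical test and production database environments.
cfn.create_stack(
    StackName="orders-db-test",
    TemplateURL="https://s3.amazonaws.com/my-templates/orders-db.yaml",
    Parameters=[
        {"ParameterKey": "Environment", "ParameterValue": "test"},
        {"ParameterKey": "DBInstanceClass", "ParameterValue": "db.t3.medium"},
    ],
)

# Wait until the deployment finishes before running any validation steps.
cfn.get_waiter("stack_create_complete").wait(StackName="orders-db-test")
```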

What are the objectives of monitoring and troubleshooting domain?

This domain has the following objectives: determine monitoring and alerting strategies, troubleshoot and resolve common database issues, optimize database performance. Monitoring is a key part of your database solution. It's what allows you to measure performance, spot slowdowns and anomalies, and alert the appropriate resources when things go wrong. The objectives for this domain will help you be aware of common issues for each database service, how to solve those issues, and different options for monitoring, alerting, and optimization.

What are the objectives of management and operations domain?

This domain is all about keeping things up and running, as well as having the appropriate backup and restore strategies in place when failures occur. Objectives for this domain are: determine maintenance tasks and processes, determine backup and restore strategies, manage the operational environment of a database solution.

So why not just always implement an active‑active multi‑region database solution?

Well, first, it may not be supported for a given database service. Second, you may not have time to implement such a solution. Third, you may not have enough budget to pay for that solution. And fourth, it may be overly complex for the database deployment.

What is the best approach to designing a database solution?

When you begin your database design by defining a realistic RTO and RPO for your database deployment, it's easier to balance your availability requirements with the constraints of time, cost, and complexity. For example, a test database may not really have an RPO or RTO because all of the data is generated. And if you need a new one, you can just make a new one. You don't need to preserve any transactions that are created during the lifetime of the database, so there is very little value in spending the resources to replicate this database across regions. Other mission‑critical systems, on the other hand, where every second the system is down could cost the organization thousands of dollars, would need to have a very low RTO and RPO, usually measured in seconds. In these cases, it does make sense to invest in the infrastructure that can support a near‑instantaneous failover in the event of a disaster. If disaster recovery and failover speed are critical elements of your database deployment, be sure to select a database service that can support those RPO and RTO requirements. Some managed services, like Amazon Aurora Global Database, can support an RPO of 1 second and an RTO of less than 1 minute. Other services may not be able to achieve those recovery times, so be sure to understand the capabilities of the database service when making your decision.

What administrative options do you have for an unmanaged database service?

Within AWS, you have a range of options for how this is managed. For example, you can provision EC2 machines and create your own machine image for your database engine, handle clustering and routing using load balancers, implement your monitoring system of choice, and manage updates and patches. This is similar to how you would manage assets in a data center. The biggest difference is that you would be managing cloud resources using AWS APIs instead of physical resources in a data center. This arrangement provides you the greatest control over database configuration and how and when updates are performed on your cluster, since most maintenance and operational tasks are your responsibility.

What is Identity and Access Management (IAM)?

Identity and Access Management, or IAM, is the core security construct for all AWS services. The core components are policies, users, groups, and roles. At a basic level, policies define what actions can or cannot be taken. You can then assign policies to users, groups, and roles to manage account permissions. When you create an IAM user, you can decide what credentials you want to allow for that user. You can allow a username and password for console access, as well as AWS access keys for programmatic access. You can also configure an MFA device for the user or require them to set one up when they log in. As an IAM user administrator, you can disable or reset passwords, as well as generate, deactivate, or delete AWS access keys. You may also use AWS Security Token Service, or STS, to request temporary limited-privilege credentials for IAM users. These tools can help you manage user access as your organization changes or if a user account is compromised. An IAM role is similar to an IAM user in that it is an AWS identity with a name and policies to determine what actions it can and cannot do. However, roles do not have permanent credentials, and they cannot make direct requests to AWS services. IAM roles are meant to delegate access to a trusted entity without having to share long-term access keys with that entity. When a trusted entity assumes a role, that entity gains a temporary ability to perform actions authorized by the role.
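
A brief boto3 sketch of role assumption with STS, using a hypothetical read-only auditing role: the temporary credentials returned by STS are used to create a client that acts as that role.

```python
import boto3

sts = boto3.client("sts")

# Assume a hypothetical read-only database auditing role to get
# temporary credentials instead of sharing long-term access keys.
resp = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/db-audit-readonly",
    RoleSessionName="audit-session",
    DurationSeconds=3600,
)
creds = resp["Credentials"]

# Use the temporary credentials to call another service as that role.
rds = boto3.client(
    "rds",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print([db["DBInstanceIdentifier"] for db in rds.describe_db_instances()["DBInstances"]])
```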

What tools does AWS provide for heterogeneous migration?

Heterogeneous migrations move between different database platforms, such as Oracle to Amazon Aurora. For heterogeneous migrations, you can first make use of the AWS Schema Conversion Tool, which will determine the complexity of the conversion and perform much of the conversion automatically, including PL/SQL and T-SQL code. If there are any code fragments that cannot be automatically converted to the target language, it'll flag those for review. For homogeneous migrations, you can also make use of native schema export tools available for your database engine. You can use a VPN or set up a Direct Connect connection between your data center and AWS to securely transfer your data.

What is the Snow Family of products used for?

The Snow Family is used to securely migrate data into or out of the AWS cloud. AWS Snowmobile is a 45-foot ruggedized shipping container that can transfer up to 100 petabytes of data. It's delivered to your data center where you can then load your data, then it's driven back to AWS where your data is imported into S3. Snowmobile uses multiple layers of security and encryption for secure transfer and full chain of custody of your data. If you don't have quite that much data to transfer, but still need a physical device, you can use AWS Snowball. Snowball is about the size of a desktop computer case and can transfer up to 80 terabytes of data. You can also chain multiple Snowball devices together for larger transfers. Simply load it with data and contact the designated shipping carrier in your country who will pick it up and take it to AWS. Once your data is transferred to AWS, if you need to perform any extract, transform, and load (ETL) activities, you can use AWS Glue. Glue is a serverless data preparation service. You can use it to catalog multiple AWS datasets without moving the data. This allows you to search and query the data using multiple AWS database services. It can also run ETL jobs as new data arrives and provides both visual and code-based interfaces to make data preparation easy.

