ISMV4 Module 10, 11, & 12

Local Replication: VM Snapshot

A VM snapshot preserves the state and data of a VM at a specific point in time (PIT). The state includes the VM's power state, for example powered-on, powered-off, or suspended. The data includes all the files that make up the VM, including disks, memory, and other devices such as virtual network interface cards. A VM snapshot is useful for quick restore of a VM.

Causes of Information Unavailability

Application failure (for example, due to catastrophic exceptions caused by bad logic). Data loss. Infrastructure component failure (for example, due to power failure or disaster). Data center or site down (for example, due to power failure or disaster). Refreshing IT infrastructure.

Graceful Degradation

The application maintains limited functionality even when some of its modules or supporting services are not available. The unavailability of certain application components or modules should not bring down the entire application.
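A minimal sketch of graceful degradation, assuming a hypothetical page-rendering function that depends on an optional recommendation service. The names `fetch_recommendations` and `render_product_page` are illustrative only and do not come from the course material.

```python
def fetch_recommendations(user_id):
    # Hypothetical supporting service; here it always simulates an outage.
    raise ConnectionError("recommendation service unavailable")


def render_product_page(user_id, product):
    page = {"product": product, "recommendations": [], "degraded": False}
    try:
        page["recommendations"] = fetch_recommendations(user_id)
    except ConnectionError:
        # The core page is still served; only the optional module is dropped.
        page["degraded"] = True
    return page


page = render_product_page(42, "storage array")
```

The page is still returned with its core content, and the `degraded` flag records that one module was skipped.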

Persistent Application State Model

Application state information is stored outside memory, in a data repository. If an instance fails, the state information is still available in the repository.
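A minimal sketch of the persistent state model, assuming a JSON file stands in for the shared data repository. The class names `StateRepository` and `AppInstance` are hypothetical; a real deployment would more likely use a database or a distributed cache.

```python
import json
import os
import tempfile


class StateRepository:
    """File-backed stand-in for a shared data repository (an assumption)."""

    def __init__(self, path):
        self.path = path

    def _load_all(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def load(self, session_id):
        return self._load_all().get(session_id, {})

    def save(self, session_id, state):
        data = self._load_all()
        data[session_id] = state
        with open(self.path, "w") as f:
            json.dump(data, f)


class AppInstance:
    """Keeps no session state in its own memory."""

    def __init__(self, repo):
        self.repo = repo

    def add_to_cart(self, session_id, item):
        state = self.repo.load(session_id)
        state.setdefault("cart", []).append(item)
        self.repo.save(session_id, state)

    def cart(self, session_id):
        return self.repo.load(session_id).get("cart", [])


repo_path = os.path.join(tempfile.mkdtemp(), "state.json")
instance_a = AppInstance(StateRepository(repo_path))
instance_a.add_to_cart("s1", "disk")
del instance_a  # simulate the failure of the first instance

instance_b = AppInstance(StateRepository(repo_path))
recovered_cart = instance_b.cart("s1")
```

Because the cart lives in the repository rather than in the instance, the second instance can pick up the session after the first one is gone.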

Resilient Application Overview

Applications have to be designed to deal with IT resource failures to guarantee the required availability. Fault-resilient applications have logic to detect and handle transient fault conditions to avoid application downtime. Examples of key application design strategies for improving availability: graceful degradation of application functionality, retry logic in application code, and a persistent application state model.

Dynamic Disk Sparing

Automatically replaces a failed drive with a spare drive to protect against data loss. Multiple spare drives can be configured to improve availability.

Business Continuity Traits

The BC process enables continuous availability of information and services in the event of a failure, to meet the required SLA. BC involves various proactive and reactive countermeasures. It is important to automate the BC process to reduce manual intervention. The goal of a BC solution is to ensure information availability.

Key backup components are:

Backup client, backup server, storage node, and backup device (backup target).

Local Replication: VM Snapshot Example

Child virtual disks store all the changes that are made to the parent VM after snapshots are created. When committing snapshot 3, the data on child virtual disk files 1 and 2 is committed before the data on child virtual disk 3 is committed to the parent virtual disk file. After the data is committed, child virtual disks 1, 2, and 3 are deleted. However, when rolling back to snapshot 1, child disk file 1 is retained and snapshots 2 and 3 are discarded.

VM clone

A clone is a copy of an existing virtual machine (parent VM). The clone VM's MAC address is different from that of the parent VM. Clones are typically deployed when many identical VMs are required, which reduces the time required to deploy a new VM.

Local Replication: Clone

Cloning provides the ability to create fully populated point-in-time copies of LUNs within a storage system, or to create a copy of an existing VM. For a clone of a storage volume, an initial synchronization is performed between the source LUN and the replica (clone). Changes made to both the source and the replica can be tracked at some predefined granularity.

BC - Analyze

Collect information on data profiles, business processes, infrastructure support, dependencies, and frequency of using business infrastructure. Conduct a business impact analysis. Identify critical business processes and assign recovery priorities. Perform risk analysis for critical functions and create mitigation strategies. Perform cost-benefit analysis for available solutions based on the mitigation strategy. Evaluate options.

Link Aggregation

Combines links between two switches, and also between a switch and a node. Enables network traffic failover in the event of a link failure in the aggregation.

Journal Volume

Contains all the data that has changed on the production volume since the replication session started.

Remote Replication: Multisite

Data from the source site is replicated to multiple remote sites for DR purposes. Disaster recovery protection is always available if any one-site failure occurs. This mitigates the risk in two-site replication, where there is no DR protection after a source or remote site failure.

BC - Design & Develop

Define the team structure and assign individual roles and responsibilities; for example, different teams are formed for activities such as emergency response and infrastructure and application recovery. Design data protection strategies and develop infrastructure. Develop contingency solutions and emergency response procedures. Detail recovery and restart procedures.

BC - Establish Objectives

Determine BC requirements. Estimate the scope and budget to achieve the requirements. Select a BC team that includes subject matter experts from all areas of the business, whether internal or external. Create BC policies.

Virtual Tape Library

Disks are emulated and presented as tapes to the backup software. Does not require any additional modules or changes in the legacy backup software. Provides better performance and reliability than physical tape. Does not require the usual maintenance tasks that are associated with a physical tape drive, such as periodic cleaning and drive calibration.

Multipathing

Enables a compute system to use multiple paths for transferring data to a LUN. Enables failover by redirecting I/O from a failed path to another active path. Performs load balancing by distributing I/O across active paths.
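A minimal sketch of path failover and load balancing, assuming simple round-robin path selection. The class `MultipathDevice` and the path names are hypothetical, and real multipathing software uses richer policies than round-robin.

```python
class MultipathDevice:
    """Round-robin I/O path selection that skips failed paths."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.failed = set()
        self._next = 0

    def fail_path(self, path):
        self.failed.add(path)

    def select_path(self):
        # Try each path at most once per I/O, starting from the next in turn.
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path not in self.failed:
                return path
        raise IOError("no active path to LUN")


lun = MultipathDevice(["hba0:port0", "hba1:port1"])
balanced = [lun.select_path() for _ in range(4)]

lun.fail_path("hba0:port0")
after_failure = [lun.select_path() for _ in range(3)]
```

While both paths are healthy the I/Os alternate between them; once one path fails, every I/O is redirected to the surviving path.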

Elastic Load Balancing

Enables dynamic distribution of application and client I/O traffic. Dynamically scales resources (VM instances) to meet traffic demands. Provides fault tolerance by detecting unhealthy VM instances and automatically redirecting I/Os to other healthy VM instances.

Disk Library

Enhanced backup and recovery performance. No inherent offsite capability. A disk-based backup appliance includes features such as deduplication, compression, encryption, and replication to support business objectives.

Dell EMC PowerPath

Host-based multipathing software. Provides path failover and load-balancing functionality. Automatically detects and recovers from host-to-array path failures. PowerPath/VE software enables optimizing virtual environments with PowerPath multipathing features.

BC - Implement

Implement risk management and mitigation procedures that include backup, replication, and management of resources. Prepare the DR sites that can be utilized if a disaster affects the primary data center. The DR site could be one of the organization's own data centers or could be a cloud. Implement redundancy for every resource in a data center to avoid single points of failure.

CDP Appliance

Intelligent hardware platform that runs the CDP software. Manages both local and remote replication. The appliance can also be virtual, where the CDP software runs inside VMs.

Write Splitter

Intercepts writes from the compute system to the production volume and splits each write into two copies. Can be implemented at the compute, fabric, or storage system.
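A minimal sketch of how a write splitter could feed a CDP journal, assuming in-memory dictionaries stand in for volumes and a sequence number stands in for a timestamp. The classes `WriteSplitter` and `CDPAppliance` are hypothetical and do not reflect any vendor's implementation.

```python
class CDPAppliance:
    """Keeps a journal of every write it receives from the splitter."""

    def __init__(self):
        self.journal = []  # entries are (sequence, block, data)

    def receive(self, seq, block, data):
        self.journal.append((seq, block, data))

    def image_at(self, seq, baseline):
        # Replay journaled writes up to the requested point in time.
        image = dict(baseline)
        for s, block, data in self.journal:
            if s <= seq:
                image[block] = data
        return image


class WriteSplitter:
    """Sends one copy of each write to production and one to the appliance."""

    def __init__(self, production, appliance):
        self.production = production
        self.appliance = appliance
        self.seq = 0

    def write(self, block, data):
        self.seq += 1
        self.production[block] = data
        self.appliance.receive(self.seq, block, data)
        return self.seq


production = {0: "a"}
baseline = dict(production)  # volume contents when the session started
appliance = CDPAppliance()
splitter = WriteSplitter(production, appliance)

splitter.write(0, "b")
checkpoint = splitter.write(1, "x")
splitter.write(0, "corrupt")

restored = appliance.image_at(checkpoint, baseline)
```

Replaying the journal only up to `checkpoint` yields the volume image as it was before the corrupting write.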

Hypervisor-based CDP

Protects a single VM or multiple VMs, locally or remotely. Enables restoring a VM to any PIT. The virtual appliance runs on a hypervisor. The write splitter is embedded in the hypervisor.

VMware FT

Provides continuous availability for applications in the event of a server failure. Creates a live shadow instance of a VM that is in virtual lockstep with the primary instance. FT eliminates even the smallest chance of data loss or disruption.

Fault Detection and Retry Logic

Refers to a mechanism that implements logic in the application code to improve availability. The logic detects a service that is temporarily down and retries it, which may result in successful restoration of the service.
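A minimal sketch of fault detection and retry logic, assuming exponential backoff between attempts. The helper `call_with_retry`, the `TransientError` exception, and the delay values are hypothetical choices for illustration.

```python
import time


class TransientError(Exception):
    """Raised by a service that is temporarily down."""


def call_with_retry(operation, max_attempts=3, base_delay=0.01):
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up: the fault does not look transient
            # Wait longer after each failed attempt before retrying.
            time.sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


def flaky_service():
    # Fails on the first two calls, then recovers.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporarily down")
    return "ok"


result = call_with_retry(flaky_service)
```

The caller sees a successful result even though the first two attempts failed. Limiting the number of attempts keeps a permanent fault from being retried forever.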

Tape Library

Tapes are portable and can be used for long-term offsite storage. Must be stored in locations with a controlled environment. Not optimized to recognize duplicate content. Data integrity and recoverability are major issues with tape-based backup media.

Train, Test, Assess, and Maintain

Train the employees who are responsible for backup and replication of business-critical data on a regular basis or whenever there is a modification in the BC plan. Train employees on emergency response procedures when disasters are declared. Train the recovery team on recovery procedures based on contingency scenarios. Perform damage-assessment processes and review recovery plans. Test the BC plan regularly to evaluate its performance and identify its limitations. Assess the performance reports and identify limitations. Update the BC plans and recovery/restart procedures to reflect regular changes within the data center.

Compute Clustering

Two or more compute systems/hypervisors are clustered to provide high availability and load balancing. A service running on a failed compute system moves to another compute system. Two common clustering implementations are active/active and active/passive.

Storage Virtualization

A virtual volume is created using a virtualization appliance. Each I/O to the volume is mirrored to the LUNs on the storage systems. The virtual volume is continuously available to the compute system, even if one of the storage systems is unavailable due to failure.

Remote Replication: Synchronous

A write is committed to both the source and the remote replica before it is acknowledged to the compute system. Enables restarting business operations at a remote site with zero data loss. Provides near-zero RPO.

An availability zone is

A location with its own set of resources, isolated from other zones. A zone can be an entire data center or a part of the data center. Enables running multiple service instances within and across zones to survive data center or site failure. If there is an outage, the service should seamlessly fail over across the zones.

Definition: Disaster Recovery (DR)

A part of BC process, which involves a set of policies and procedures for restoring IT infrastructure, including data that is required to support ongoing IT services, after a natural or human-induced disaster occurs.

Definition: Data Replication

A process of creating an exact copy (replica) of the data to ensure business continuity in the event of a local outage or disaster. Replicas are used to restore and restart operations if data loss occurs. Data can be replicated to one or more locations based on the business requirements.

Network Fault Tolerance Mechanisms

A short network interruption could impact many services running in a data center environment. So, the network infrastructure must be fully redundant and highly available, with no single points of failure.

Definition: Recovery-in-place

A term that refers to running a VM directly from the backup device, using a backed-up copy of the VM image instead of restoring that image file. Eliminates the need to transfer the image from the backup device to the primary storage before the VM is restarted. Provides an almost instant recovery of a failed VM. Requires a random-access device, such as a disk-based backup target, to work efficiently. Reduces the RTO and the network bandwidth needed to restore VM files.

Remote Replication: Asynchronous

A write is committed to the source and immediately acknowledged to the compute system. Data is buffered at the source and sent to the remote site periodically. Application write response time is not dependent on the latency of the link. The replica is behind the source by a finite amount (finite RPO).
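A minimal sketch contrasting the synchronous and asynchronous write paths, assuming dictionaries stand in for the source and remote volumes. The classes `SyncReplicator` and `AsyncReplicator` are hypothetical, and link latency is not modeled.

```python
class SyncReplicator:
    """Acknowledges a write only after source and replica both hold it."""

    def __init__(self):
        self.source = {}
        self.replica = {}

    def write(self, block, data):
        self.source[block] = data
        self.replica[block] = data  # remote commit happens before the ack
        return "ack"


class AsyncReplicator:
    """Acknowledges after the source commit and ships the data later."""

    def __init__(self):
        self.source = {}
        self.replica = {}
        self.buffer = []

    def write(self, block, data):
        self.source[block] = data
        self.buffer.append((block, data))  # buffered at the source
        return "ack"

    def transfer(self):
        # Periodic transfer of buffered writes to the remote site.
        for block, data in self.buffer:
            self.replica[block] = data
        self.buffer.clear()


sync = SyncReplicator()
sync.write(0, "a")
sync_in_step = sync.replica == sync.source

async_rep = AsyncReplicator()
async_rep.write(0, "a")
async_rep.write(1, "b")
lag_before_transfer = len(async_rep.buffer)
async_rep.transfer()
lag_after_transfer = len(async_rep.buffer)
```

The buffered writes that have not yet been transferred are what would be lost if the source site failed, which is why asynchronous replication has a finite, nonzero RPO.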

Definition: Fault Tolerance

Ability of an IT system to continue functioning in the event of a failure.

Optimize Load Balancing:

Adjust I/O paths to dynamically rebalance your application environment for peak performance

Impact of Information Unavailability

An IT service outage, due to information unavailability, results in loss of productivity, loss of revenue, poor financial performance, and damages to reputation.

Definition: Backup

An additional copy of production data, which is created and retained for the sole purpose of recovering lost or corrupted data.

Definition: NDMP

An open-standard, TCP/IP-based protocol that is designed for backup in a NAS environment. Data can be backed up using NDMP regardless of the operating system or platform. Backup data is sent directly from the NAS to the backup device, so it is no longer necessary to transport data through application servers. Backs up and restores data while preserving the security attributes of the file system (NFS and CIFS) and maintaining data integrity.

Data Migration

Another use for a replica is data migration. Data migrations are performed for various reasons such as migrating from a smaller capacity LUN to one of a larger capacity.

Recovery Operation

Mnemonic: BBB DSB (only the first three B's need to be known).
(1) Backup client requests the backup server for data restore.
(2) Backup server scans the backup catalog to identify the data to be restored and the client that will receive the data.
(3) Backup server instructs the storage node to load the backup media in the backup device.
(4) Data is then read and sent to the backup client.
(5) Storage node sends restore metadata to the backup server.
(6) Backup server updates the backup catalog.

BC vs Disaster Recovery

BC comes before a failure (ensuring uptime); disaster recovery covers the steps taken afterward to recover.

BC Planning Lifecycle

BC planning must follow a disciplined approach like any other planning process. Organizations today dedicate specialized resources to develop and maintain BC plans. From the conceptualization to the realization of the BC plan, a lifecycle of activities can be defined for the BC process. The BC planning lifecycle includes five stages: establish objectives; analyze; design and develop; implement; and train, test, assess, and maintain.

Replicas are created for various purposes which include the following:

Can act as a source for backup. Can be used to restart business operations or to recover the data. Used for running decision-support activities. Used for testing applications. Used for data migration.

heartbeat

Clustering uses a heartbeat mechanism to determine the health of each node in the cluster. The exchange of heartbeat signals, which usually happens over a private network, enables participating cluster members to monitor one another's status.
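A minimal sketch of heartbeat-based health checking, assuming a node is suspected as failed when no heartbeat has been seen within a fixed timeout. The class `HeartbeatMonitor`, the node names, and the integer clock are hypothetical.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat time seen from each cluster node."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def record(self, node, now):
        self.last_seen[node] = now

    def failed_nodes(self, now):
        # A node is suspect when its last heartbeat is older than the timeout.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=3)
monitor.record("node-a", now=0)
monitor.record("node-b", now=0)
monitor.record("node-a", now=4)  # node-b stays silent

suspects = monitor.failed_nodes(now=5)
```

Only the silent node is reported, which is the signal a cluster would use to trigger failover.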

Continuous Near-zero RPO

Consistency: ensures the usability of a replica. The replica must be consistent with the source.

Cumulative Backup:

It copies the data that has changed since the last full backup.

Storage Fault Tolerance Mechanisms

Data centers comprise storage systems with a large number of disk drives and solid-state drives. These storage systems support various applications and services running in the environment.

Automate Failover/Recovery

Define failover and recovery rules that route application requests to alternative resources in the event of component failures or user errors

Backup Granularity

The different granularity levels are full backup, incremental backup, and cumulative backup.
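A minimal sketch of how the three granularity levels differ in what they copy, assuming each file carries a last-modified time in arbitrary integer ticks. The function `select_files` and the sample file names are hypothetical.

```python
def select_files(files, level, last_full, last_backup):
    """Return the names of files a backup at the given level would copy.

    files maps a file name to its last-modified time.
    """
    if level == "full":
        return sorted(files)
    if level == "incremental":
        cutoff = last_backup  # changed since the last backup of any kind
    elif level == "cumulative":
        cutoff = last_full    # changed since the last full backup
    else:
        raise ValueError(f"unknown level: {level}")
    return sorted(name for name, mtime in files.items() if mtime > cutoff)


files = {"a.db": 1, "b.log": 5, "c.cfg": 8}
# Assumed history: full backup at t=2, incremental backup at t=6.
full = select_files(files, "full", last_full=2, last_backup=6)
incremental = select_files(files, "incremental", last_full=2, last_backup=6)
cumulative = select_files(files, "cumulative", last_full=2, last_backup=6)
```

The cumulative backup copies more than the incremental one, but a restore then needs only the last full backup plus the latest cumulative backup rather than the whole chain of incrementals.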

Cloud-Based Backup: Backup as a Service

Enables consumers to procure backup services on demand through a self-service portal. Provides the capability to perform backup and recovery at any time, from anywhere. Reduces the backup management overhead. Transforms CAPEX into OPEX through pay-per-use or subscription-based pricing. Enables organizations to meet long-term retention requirements. Backing up to the cloud ensures regular and automated backup of data. Gives consumers the flexibility to select a backup technology based on their current requirements.

Restartability

Enables restarting business operations using the replicas.

Recoverability

Enables restoration of data from the replicas to the source if data loss occurs.

Erasure Coding

Provides space-optimal data redundancy to protect against data loss from multiple drive failures.

Fault Isolation

Fault isolation limits the scope of a fault to a local area so that the other areas of a system are not impacted by the fault. It does not prevent failure of a component but ensures that the failure does not impact the overall system.

Fast Recovery and Restart

For critical applications, replicas can be taken at short, regular intervals. This enables fast recovery from data loss. If a complete failure of the source LUN occurs, the replication solution enables restarting the production operation on the replica. This approach reduces the RTO.

NIC Teaming

Groups NICs so that they appear as a single, logical NIC to the operating system or hypervisor. Provides network traffic failover in the event of a NIC/link failure. Distributes network traffic across NICs.

Importance of Business Continuity

Mnemonic HAD: high-risk data, application dependency, data protection laws.

IA = Calculate

IA = Uptime / (Uptime + Downtime)

Image-Based Backup

Image-based backup makes a copy of the virtual drive and configuration that are associated with a particular VM. The backup is saved as a single entity called a VM image. Enables quick restoration of a VM. Supports recovery at VM level and file level. No agent is required inside the VM to perform the backup. Backup processing is offloaded from the VMs to a proxy server.

Agent-Based Backup

In this approach, an agent or client is installed on a virtual machine or a physical compute system. The agent streams the backup data to the backup device.

Measurement of Information Availability

Information availability relies on the availability of both physical and virtual components of a data center.

Incremental Backup:

It copies the data that has changed since the last backup.

MTBF: How do you calculate?

MTBF = Total uptime / Number of failures

MTTR: Calculate MTTR

MTTR = Total downtime / Number of failures
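A short worked example of the MTBF, MTTR, and information availability formulas, using hypothetical figures (990 hours of uptime, 10 hours of downtime, 5 failures) that are not taken from the course material.

```python
def mtbf(total_uptime, failures):
    # Mean time between failures
    return total_uptime / failures


def mttr(total_downtime, failures):
    # Mean time to repair
    return total_downtime / failures


def availability(uptime, downtime):
    # IA = Uptime / (Uptime + Downtime)
    return uptime / (uptime + downtime)


# Hypothetical observation period
uptime_hours = 990.0
downtime_hours = 10.0
failures = 5

mtbf_hours = mtbf(uptime_hours, failures)
mttr_hours = mttr(downtime_hours, failures)
ia = availability(uptime_hours, downtime_hours)

# Dividing uptime and downtime by the same failure count gives an
# equivalent form: IA = MTBF / (MTBF + MTTR).
ia_from_means = mtbf_hours / (mtbf_hours + mttr_hours)
```

With these figures MTBF is 198 hours, MTTR is 2 hours, and IA is 0.99 (99 percent) by either form.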

Compute Cluster Example

Multiple hypervisors running on different systems are clustered. Provides continuous availability of services running on VMs

Continuous Data Protection (CDP)

Network-based replication solution. Provides the ability to restore data and VMs to any previous PIT. Supports heterogeneous compute and storage platforms. Supports both local and remote replication. Data can also be replicated to more than two sites (multisite). Supports WAN optimization techniques to reduce bandwidth requirements.

Backup Operation

On the certification exam (drag and drop); mnemonic: B B B B B S S B.
(1) Backup server initiates the scheduled backup process.
(2) Backup server retrieves backup-related information from the backup catalog.
(3a) Backup server instructs the storage node to load the backup media in the backup device.
(3b) Backup server instructs the backup clients to send the data to be backed up to the storage node.
(4) Backup clients send data to the storage node and update the backup catalog on the backup server.
(5) Storage node sends data to the backup device.
(6) Storage node sends metadata and media information to the backup server.
(7) Backup server updates the backup catalog.

Standardize Path Management:

Optimize I/O paths in physical and virtual environments (PowerPath/VE) and cloud deployments

Implementing Redundancy at Component-Level

Organizations should follow stringent guidelines to implement fault tolerance in their data centers for uninterrupted services. The underlying IT infrastructure components (compute, storage, and network) should be highly available and the single points of failure at the component level should be avoided.

Recovery Point Objectives (RPO)

Point-in-time to which data must be recovered. (How much data loss)

Definition: Business Continuity (BC)

Process that prepares for, responds to, and recovers from a system outage that can adversely affect business operations.

VMware HA

Provides high availability for applications running in virtual machines. If there is a fault in a physical compute system, the affected VMs are automatically restarted on other compute systems.

Point-in-Time (PIT) Nonzero RPO

Recoverability/restartability: the replica can restore data to the source device, or business operations can be restarted from the replica.

Local Replication: Storage System-Based Snapshot - RoW

Redirects new writes that are destined for the source LUN to a reserved LUN in the storage pool. The replica (snapshot) still points to the source LUN. All reads from the replica are served from the source LUN.
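A minimal sketch of redirect-on-write behavior, assuming dictionaries stand in for the source LUN and the reserved LUN. The class `RowSnapshotVolume` is hypothetical and ignores details such as metadata maps and snapshot deletion.

```python
class RowSnapshotVolume:
    """Source LUN with one redirect-on-write snapshot."""

    def __init__(self, source_blocks):
        self.source = dict(source_blocks)  # left untouched after the snapshot
        self.reserved = {}                 # new writes land in the reserved LUN

    def write(self, block, data):
        # The write is redirected instead of overwriting the source block.
        self.reserved[block] = data

    def read_production(self, block):
        # Production sees the redirected data when it exists.
        return self.reserved.get(block, self.source.get(block))

    def read_snapshot(self, block):
        # The snapshot still points to the source LUN.
        return self.source.get(block)


volume = RowSnapshotVolume({0: "old", 1: "keep"})
volume.write(0, "new")

production_view = [volume.read_production(b) for b in (0, 1)]
snapshot_view = [volume.read_snapshot(b) for b in (0, 1)]
```

The snapshot keeps the point-in-time image without any data being copied at write time, while production reads see the latest data.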

Definition: Single Point of Failure

Refers to any individual component or aspect of an infrastructure whose failure can make the entire system or service unavailable.

Remote Replication

Refers to replicating data to remote locations (locations can be geographically dispersed). Data can be synchronously or asynchronously replicated. Helps to mitigate the risks associated with regional outages. Enables organizations to replicate data to the cloud for DR purposes.

Local Replication

Refers to replicating data within the same location: within a data center in compute-based replication, or within a storage system in storage system-based replication. Typically used for operational restore of data if there is a data loss.

Information Availability can be defined in terms of:

Reliability Timeliness

Consistency

Replica must be consistent with the source so that it is usable for both recovery and restart operations.

Testing Platform

Replicas are also used for testing new applications or upgrades.

Decision-Support Activities

Running reports using the data on the replicas greatly reduces the I/O burden on the production device.

Definition: Information Availability (IA)

The ability of an IT infrastructure to function according to business requirements and customer expectations, during its specified time of operation.

PIT replica

The data on the replica is an identical image of the production at some specific timestamp.

Continuous replica

The data on the replica is in-sync with the production data always. The objective with any continuous replication is to reduce the RPO to zero or near-zero.

Primary Storage-Based Backup

This backup approach backs up data directly from the primary storage system to the backup target without requiring additional backup software. Eliminates the backup impact on application servers. Improves backup and recovery performance to meet SLAs.

Recovery Time Objectives (RTO)

Time within which systems and applications must be recovered. (How fast is recovery)

Fault tolerance protects an IT system or a service against the following types of unavailability:

Transient unavailability: It occurs once for a short time and then disappears. For example, an online transaction times out but works fine when a user retries the operation. Intermittent unavailability: It is a recurring unavailability that is characterized by an outage, then availability again, then another outage, and so on. Permanent unavailability: It exists until the faulty component is repaired or replaced. Examples of permanent unavailability are network link outage, application issues, and manufacturing defects.

Alternative Source for Backup

Under normal backup operations, data is read from the production LUNs and written to the backup device. This places an extra burden on the production infrastructure because production LUNs are simultaneously involved in production operations and in servicing data for backup operations.

Eliminating Single Points of Failure

Single points of failure can be avoided by implementing fault tolerance mechanisms such as redundancy. Implement redundancy at the component level (compute, network, and storage). Implement multiple availability zones to avoid single points of failure at the data center (site) level. It is important to have high availability mechanisms that enable automated application/service failover.

In active/active clustering

the nodes in a cluster are all active participants and run the same service for their clients. The active/active cluster balances requests for service among the nodes. If one of the nodes fails, the surviving nodes take the load of the failed one. This method enhances both the performance and the availability of a service.

In active/passive clustering,

the service runs on one or more active nodes while the passive node waits for a failover. If the active node fails, the service that had been running on the active node is failed over to the passive node. Active/passive clustering does not provide the performance improvement that active/active clustering does.
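A minimal sketch of active/passive failover, assuming a two-node cluster in which the passive node takes over when the active node is reported as failed. The class `ActivePassiveCluster` and the node names are hypothetical.

```python
class ActivePassiveCluster:
    """Two-node cluster: one node serves, the other waits for failover."""

    def __init__(self, active, passive):
        self.active = active
        self.passive = passive

    def node_failed(self, node):
        # Fail over only if the failed node is the one running the service.
        if node == self.active and self.passive is not None:
            self.active, self.passive = self.passive, None

    def handle(self, request):
        return f"{self.active} served {request}"


cluster = ActivePassiveCluster(active="node-1", passive="node-2")
before = cluster.handle("req-1")

cluster.node_failed("node-1")
after = cluster.handle("req-2")
```

Requests keep being served after the failure, but by a single node at a time, so the cluster gains availability without gaining throughput.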
