Domain 7: Disaster Recovery
The organization can take the following steps to better ensure the continuity of its outsourcing
- make the ability of such companies to reliably assure continuity of products and services part of any work proposals
- make sure that business continuity planning is included in contracts with such companies, and that their responsibilities and levels of service are clearly spelled out
- draw up realistic and reasonable service levels that the outsourced firm will meet during an incident
- if possible, have the outsourcing companies take part in the BCP awareness programs, training, and testing
BCP should include backup solutions for the following
- network and computer equipment
- voice and data communications resources
- human resources
- transportation of equipment and personnel
- environment issues (HVAC)
- data and personnel security issues
- supplies (paper, forms, cabling, and so on)
- documentation
service bureau
A company that has additional space and capacity to provide applications and services such as call centers. A company pays a monthly subscription fee to a service bureau for this space and service
archive bit
A file attribute that can be checked (set to "on") or unchecked (set to "off") to indicate whether the file needs to be archived. The operating system sets a file's archive bit when the file is created or changed.
hot site
A separate and fully equipped facility where the company can move immediately after a disaster and resume business. The most expensive option.
cold site
A separate facility that does not have any computer equipment, but is a place where employees can move after a disaster. The least expensive option, but it takes the most time and effort to get up and functioning after a disaster.
warm site
A separate facility with computer equipment that requires installation and configuration
Before committing to a specific vendor for software backups, consider the following
- Can the media be accessed in the necessary timeframe?
- Is the facility closed on weekends and holidays, and does it only operate during specific hours of the day?
- Are access control mechanisms tied to an alarm and/or the police station?
- Does the facility have the capability to protect the media from a variety of threats?
- What is the availability of a bonded transport service?
- Are there any geographical environmental hazards such as floods, earthquakes, tornadoes, and so on that might affect the facility?
- Does the facility have a fire detection and suppression system?
- Does the facility provide temperature and humidity monitoring and control?
- What type of physical, administrative, and logical access controls are used?
High availability - redundancy
Commonly built into the network at a routing protocol level. Routing protocols are configured so if one link goes down or gets congested, then traffic is routed over a different network link. Redundant hardware can also be available so if a primary device goes down, the backup component can be swapped out and activated
reciprocal agreements
Company A agrees to allow company B to use its facilities if company B is hit by a disaster, and vice versa. Cheaper than offsite choices, but may not be the most practical.
executive succession planning
Determining an organization's line of succession for senior leadership
Where are MTD, RTO, and RPO values derived?
During the business impact analysis (BIA), whose purpose is to apply criticality values to specific business functions, resources, and data types.
Important issues when deciding to participate in a reciprocal agreement with another company
- How long will the facility be available to the company in need?
- How much assistance will the staff supply in integrating the two environments and in ongoing support?
- How quickly can the company in need move into the facility?
- What are the issues pertaining to interoperability?
- How many of the resources will be available to the company in need?
- How will differences and conflicts be addressed?
- How do change control and configuration management take place?
- How often can drills and testing take place?
- How can critical assets of both companies be properly protected?
electronic vaulting
Makes copies of files as they are modified and periodically transmits them to an offsite backup site. Transmission is carried out in batches, not in real time.
mutual aid agreement
More than two organizations agree to help one another in case of an emergency
Who is responsible for defining what gets backed up and how often?
Operations team
Difference between preventive measures and recovery strategies
Preventive mechanisms are put into place not only to reduce the possibility that the company will experience a disaster, but also, if a disaster does hit, to lessen the amount of damage that will take place. For example, the company cannot stop a car from plowing into and taking out a transformer that it relies on for power, but it can have a separate power feed from a different transformer in case that happens.
Recovery strategies are processes for rescuing the company after a disaster takes place. These processes integrate mechanisms such as establishing alternate sites for facilities, implementing emergency response procedures, and possibly activating the preventive mechanisms that have already been implemented.
RPO and RTO
RPO looks backward from the incident (how much recent data can be lost). RTO looks forward from the incident (how long recovery may take).
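As a rough illustration of the before/after split, the sketch below measures one incident against both objectives. The timestamps, the 4-hour RPO, and the 8-hour RTO are made-up values, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup, incident):
    """Time between the last good backup and the incident (what the RPO bounds)."""
    return incident - last_backup


def outage_duration(incident, restored):
    """Time between the incident and restored service (what the RTO bounds)."""
    return restored - incident


# Hypothetical objectives and timeline, chosen only for illustration.
rpo = timedelta(hours=4)
rto = timedelta(hours=8)
last_backup = datetime(2024, 1, 10, 6, 0)
incident = datetime(2024, 1, 10, 9, 30)
restored = datetime(2024, 1, 10, 15, 0)

loss = data_loss_window(last_backup, incident)   # looks backward from the incident
outage = outage_duration(incident, restored)     # looks forward from the incident

rpo_met = loss <= rpo
rto_met = outage <= rto
print(f"data loss window {loss}, RPO met: {rpo_met}")
print(f"outage {outage}, RTO met: {rto_met}")
```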
Difference between RTO (Recovery Time Objective) and MTD (maximum tolerable downtime)
The RTO is shorter than the MTD, because the MTD represents the time after which an inability to recover significant operations will mean severe and perhaps irreparable damage to the organization's reputation or bottom line. A company can be out for a certain period of time (the RTO) and still get back on its feet. If the company cannot get production up and running within the MTD window, it is sinking too fast to properly recover.
software escrow
Storing of the source code of software with a third-party escrow agent. The software source code is released to the licensee if the licensor (software vendor) files for bankruptcy or fails to maintain and update the software product as promised in the software license agreement.
contingency company
Supplies services and materials temporarily to an organization that is experiencing an emergency
work recovery time (WRT)
The difference between the MTD and the RTO: the time left over after the RTO before reaching the maximum tolerable downtime.
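The relationship above is simple subtraction, WRT = MTD - RTO. A minimal sketch follows, assuming both values are expressed as durations. The function name and the sample figures are illustrative only.

```python
from datetime import timedelta


def work_recovery_time(mtd, rto):
    """Return WRT = MTD - RTO, the time left after the RTO before the MTD is hit."""
    if rto >= mtd:
        # The notes state that the RTO is smaller than the MTD.
        raise ValueError("RTO must be shorter than MTD")
    return mtd - rto


# Hypothetical figures: a 24-hour MTD and an 8-hour RTO leave 16 hours of WRT.
wrt = work_recovery_time(timedelta(hours=24), timedelta(hours=8))
print(wrt)
```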
nondisaster
A disruption in service that has a significant but limited impact on the conduct of business processes at a facility. The solution could be hardware, software, or file restoration.
difference between hot site and redundant site
A hot site is a subscription service. A redundant site is owned and maintained by the company.
catastrophe
A major disruption that destroys the facility altogether. It requires both a short-term solution (an offsite facility) and a long-term solution (rebuilding).
remote journaling
A method of transmitting data offsite that usually moves only the journal or transaction logs to the offsite facility, not the actual files. These logs contain the deltas that have taken place to the individual files.
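To make the idea of shipping deltas rather than files concrete, here is a toy sketch that rebuilds current state from an older offsite copy plus a journal. The account names, the balances, and the (key, delta) journal format are assumptions made for illustration, not a real journaling format.

```python
def apply_journal(snapshot, journal):
    """Replay (key, delta) journal entries on top of a copy of the snapshot."""
    restored = dict(snapshot)  # leave the offsite copy itself untouched
    for key, delta in journal:
        restored[key] = restored.get(key, 0) + delta
    return restored


# Hypothetical offsite copy and the journal entries received since it was taken.
offsite_snapshot = {"acct-1": 100, "acct-2": 50}
journal = [("acct-1", -30), ("acct-2", 20), ("acct-1", 5)]

current_state = apply_journal(offsite_snapshot, journal)
print(current_state)
```

Only the three small journal entries had to cross the wire, not the full data set.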
full backup
All data is backed up and saved to some type of storage media. The archive bit is cleared.
disaster
An event that causes the entire facility to be unusable for a day or longer. It usually requires the use of an alternate processing facility and restoration of software and data from offsite copies.
incremental process
Backs up all files that have changed since the last full or incremental backup, then sets the archive bit to 0.
differential process
Backs up the files that have been modified since the last full backup. The archive bit value does not change.
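The archive-bit behavior of the full, incremental, and differential processes can be compared with a small simulation. This is a sketch only: files are modeled as a dictionary mapping a name to its archive bit, and the file names and schedule are invented.

```python
def modify(files, name):
    """The operating system sets the archive bit when a file is created or changed."""
    files[name] = True


def full_backup(files):
    """Copy every file, then clear every archive bit."""
    copied = sorted(files)
    for name in files:
        files[name] = False
    return copied


def incremental_backup(files):
    """Copy files whose archive bit is set, then clear those bits."""
    copied = sorted(name for name, bit in files.items() if bit)
    for name in copied:
        files[name] = False
    return copied


def differential_backup(files):
    """Copy files whose archive bit is set and leave the bits unchanged."""
    return sorted(name for name, bit in files.items() if bit)


names = ["a.txt", "b.txt", "c.txt"]

# Schedule 1: a full backup followed by two differential backups.
diff_files = dict.fromkeys(names, True)
full_backup(diff_files)
modify(diff_files, "a.txt")
diff_day1 = differential_backup(diff_files)
modify(diff_files, "b.txt")
diff_day2 = differential_backup(diff_files)  # still includes a.txt

# Schedule 2: a full backup followed by two incremental backups.
incr_files = dict.fromkeys(names, True)
full_backup(incr_files)
modify(incr_files, "a.txt")
incr_day1 = incremental_backup(incr_files)
modify(incr_files, "b.txt")
incr_day2 = incremental_backup(incr_files)   # only the newest change

print(diff_day1, diff_day2)
print(incr_day1, incr_day2)
```

Each differential set grows until the next full backup, while each incremental set holds only the changes made since the previous backup of either kind.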
shadow sets
data can be stored as images on two or more disks
disk mirroring
each disk would have a corresponding mirrored disk that contains the exact same information
disk shadowing
Ensures the availability of data and provides a fault-tolerant solution by duplicating hardware and maintaining more than one copy of the information. Data is dynamically created and maintained on two or more identical disks.
disadvantages to disk shadowing
expensive solution because two or more hard drives are used to hold the exact same data
advantages of a redundant site
- full availability
- ready to go
- under the organization's complete control
High availability - failover
If there is a failure that cannot be handled through normal means, processing is "switched over" to a working system.
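A minimal sketch of the switch-over decision follows, assuming an ordered list of systems and a health check supplied by the caller. The system names and the dictionary-based health check are hypothetical stand-ins for real monitoring.

```python
def pick_active(systems, is_healthy):
    """Return the first healthy system in priority order, failing over as needed."""
    for name in systems:
        if is_healthy(name):
            return name
    raise RuntimeError("no working system available")


# Hypothetical health status: the primary has failed, the standby is up.
health = {"primary": False, "standby": True}

active = pick_active(["primary", "standby"], lambda name: health[name])
print(active)
```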
warm and cold site advantages
- less expensive
- available for longer time frames because of the reduced costs
- practical for proprietary hardware or software use
disadvantages of a redundant site
one of the most expensive backup facility options
redundant sites
one site is equipped and configured exactly like the primary site, which serves as a redundant environment
Where should two copies of the company's operating system software and critical applications be stored?
One copy onsite and one at an offsite location. Both should be tested periodically and re-created when new versions are rolled out.
warm and cold site disadvantages
- operational testing not usually available
- resources for operations not immediately available
hot site advantages
- ready for operation within hours
- highly available
- usually used for short-term solutions, but available for longer stays
- annual testing available
advantages to disk shadowing
- can reduce or replace the need for periodic offline manual backup operations
- can boost read operation performance
- can carry out multiple read requests in parallel, because multiple paths are provided to the duplicated data
High Availability (HA)
Refers to measures that can be implemented to prevent the entire system from failing if some components of the system fail. The system in question can be a database, a network, an application, a power supply, etc.
fault tolerance
the ability for a system to respond to unexpected failures or system crashes as the backup system immediately and automatically takes over with no loss of service
Recovery Point Objective (RPO)
The acceptable amount of data loss, measured in time. It represents the earliest point in time at which data must be recovered. The higher the value of the data, the more funds or other resources can be put into place to ensure a smaller amount of data is lost in the event of a disaster.
tertiary sites
the backup to the backup facility
tape vaulting
The data is sent over a serial line to a backup tape system at the offsite facility. The company that maintains the offsite facility maintains the systems and changes out tapes when necessary. Data can be quickly backed up and retrieved when necessary.
Recovery Time Objective (RTO)
The maximum tolerable time to restore an organization's information system following a disaster, representing the length of time that the organization is willing to attempt to function without its information system. In short: "how much time do we have to get everything up and working again?"
asynchronous replication
The primary and secondary data volumes can be out of sync. Synchronization may take place in seconds, hours, or days, depending on the technology in place.
synchronous replication
the primary and secondary repositories are always in sync, which provides true real-time duplication
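The contrast between the two replication modes can be shown with a toy in-memory model. This is a sketch under simplifying assumptions: the ReplicatedStore class is hypothetical, dictionaries stand in for storage volumes, and an explicit sync() call stands in for whatever scheduling a real product uses.

```python
class ReplicatedStore:
    """Toy primary/secondary pair with synchronous or asynchronous replication."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.secondary = {}
        self.pending = []  # writes not yet applied to the secondary

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            # Synchronous: the secondary is updated as part of the same write.
            self.secondary[key] = value
        else:
            # Asynchronous: the write is queued and shipped later.
            self.pending.append((key, value))

    def sync(self):
        """Apply queued writes to the secondary (the delayed catch-up step)."""
        for key, value in self.pending:
            self.secondary[key] = value
        self.pending.clear()

    def in_sync(self):
        return self.primary == self.secondary


sync_store = ReplicatedStore(synchronous=True)
sync_store.write("order-1", "paid")

async_store = ReplicatedStore(synchronous=False)
async_store.write("order-1", "paid")
async_before = async_store.in_sync()  # secondary lags behind the primary
async_store.sync()
async_after = async_store.in_sync()   # caught up after the batch is applied

print(sync_store.in_sync(), async_before, async_after)
```

If the primary were lost before sync() ran, the queued writes would be the data at risk in the asynchronous case.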
reliability
The probability that a system performs the necessary function for a specified period under defined conditions. High reliability allows for high availability.
resiliency
the system continues to function, albeit in a degraded fashion, when a fault is encountered
hot site disadvantages
- very expensive
- limited hardware and software choices
High availability - clustered
when there is an overarching piece of software monitoring each server and carrying out load balancing
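The monitoring-plus-load-balancing role can be sketched as a round-robin dispatcher that skips servers marked unhealthy. The Cluster class, the server names, and the choice of round-robin are assumptions made for illustration. Real cluster software uses its own health checks and balancing policies.

```python
from itertools import cycle


class Cluster:
    """Toy cluster manager: tracks server health and balances requests round-robin."""

    def __init__(self, servers):
        self.health = {name: True for name in servers}
        self._ring = cycle(servers)

    def mark(self, name, healthy):
        """Record the result of a health check for one server."""
        self.health[name] = healthy

    def route(self):
        """Return the next healthy server, skipping any that are down."""
        for _ in range(len(self.health)):
            name = next(self._ring)
            if self.health[name]:
                return name
        raise RuntimeError("no healthy servers in the cluster")


cluster = Cluster(["s1", "s2", "s3"])
cluster.mark("s2", False)  # monitoring has detected that s2 is down

routed = [cluster.route() for _ in range(4)]
print(routed)
```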