security+ lesson 17
incident response, forensics, and retention policy
The incident response process emphasizes containment, eradication, and recovery. These aims are not entirely compatible with forensics. Digital forensics describes techniques to collect and preserve evidence in a way that demonstrates there has been no tampering or manipulation. Forensics procedures are detailed and time-consuming, whereas the aims of incident response are usually urgent. If an investigation must use forensic collection methods so that evidence is retained, this must be specified early in the response process. Retention policy is also important for retrospective incident handling, or threat hunting. A retention policy for historic logs and data captures sets the period over which these are retained. You might discover indicators of a breach months or years after the event. Without a retention policy to keep logs and other digital evidence, it will not be possible to make any further investigation.
metadata: web
When a client requests a resource from a web server, the server returns the resource plus headers setting or describing its properties. Also, the client can include headers in its request. One key use of headers is to transmit authorization information, in the form of cookies. Headers describing the type of data returned (text or binary, for instance) can also be of interest. The contents of headers can be inspected using the standard tools built into web browsers. Header information may also be logged by a web server.
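For quick checks outside the browser, response headers can also be retrieved with a short script. A minimal sketch using only the Python standard library; the URL is a placeholder for whatever resource is under investigation:

import urllib.request

# HEAD request returns just the headers that describe the resource
req = urllib.request.Request("https://example.com/", method="HEAD")
with urllib.request.urlopen(req) as resp:
    for name, value in resp.getheaders():
        print(f"{name}: {value}")   # e.g., Content-Type, Set-Cookie, Server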
incident response versus disaster recovery and business continuity
You should distinguish specific incident response planning from other types of planning for disaster recovery and business continuity: Disaster recovery plan—a disaster can be seen as a special class of incident where the organization's primary business function is disrupted. Disaster recovery requires considerable resources, such as shifting processing to a secondary site. Disaster recovery will involve a wider range of stakeholders than less serious incidents. Business continuity plan (BCP)—this identifies how business processes should deal with both minor and disaster-level disruption. During an incident, a system may need to be isolated. Continuity planning ensures that there is processing redundancy supporting the workflow, so that when a server is taken offline for security remediation, processing can failover to a separate system. If systems do not have this sort of planned resilience, incident response will be much more disruptive. Continuity of Operations Planning (COOP)—this terminology is used for government facilities, but is functionally similar to business continuity planning. In some definitions, COOP refers specifically to backup methods of performing mission functions without IT support.
network data sources: sflow
sFlow, developed by InMon and published as an informational IETF standard (tools.ietf.org/html/rfc3176), uses sampling to measure traffic statistics at any layer of the OSI model for a wider range of protocol types than the IP-based NetFlow. sFlow can also capture the entire packet header for samples.
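Because sFlow samples rather than records every frame, observed counts are scaled up by the sampling rate to estimate totals. A tiny worked example; the 1-in-256 rate and the counts are illustrative only:

sampling_rate = 256        # one frame exported for every 256 seen
sampled_frames = 1_200
sampled_bytes = 900_000

print("estimated frames:", sampled_frames * sampling_rate)
print("estimated bytes:", sampled_bytes * sampling_rate)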
quarantine
If mitigating techniques are not successful, or the results are uncertain, the endpoint will require careful management before being integrated back onto the network. If further evidence needs to be gathered, the best approach may be to quarantine or sandbox the endpoint or suspect process/file. This allows for analysis of the attack or tool and collection of evidence using digital forensic techniques.
network data sources: protocol analyzer output
A SIEM will store details from sensors at different points on the network. Information captured from network packets can be aggregated and summarized to show overall protocol usage and endpoint activity. The contents of packets can also be recorded for analysis. Recording the full data of every packet—referred to as retrospective network analysis (RNA)—is too costly for most organizations. Typically, packet contents are only retained when indicators from the traffic are correlated as an event. The SIEM software will provide the ability to pivot from the event or alert summary to the underlying packets. Detailed analysis of the packet contents can help to reveal the tools used in an attack. It is also possible to extract binary files such as potential malware for analysis.
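Where full packet data has been retained, it can be summarized programmatically before pivoting to individual payloads. A sketch using the third-party scapy library, assuming a capture saved to a hypothetical file named capture.pcap:

from collections import Counter
from scapy.all import rdpcap, IP

packets = rdpcap("capture.pcap")          # load the retained capture
talkers = Counter()
for pkt in packets:
    if IP in pkt:
        talkers[(pkt[IP].src, pkt[IP].dst)] += 1

# Top conversations by packet count, a starting point for deeper analysis
for (src, dst), count in talkers.most_common(5):
    print(f"{src} -> {dst}: {count} packets")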
network data sources: netflow/ipfix
A flow collector is a means of recording metadata and statistics about network traffic rather than recording each frame. Network traffic and flow data may come from a wide variety of sources (or probes), such as switches, routers, firewalls, web proxies, and so forth. Flow analysis tools can provide features such as: Highlighting of trends and patterns in traffic generated by particular applications, hosts, and ports. Alerting based on detection of anomalies, flow analysis patterns, or custom triggers. Visualization tools that enable you to quickly create a map of network connections and interpret patterns of traffic and flow data. Identification of traffic patterns revealing rogue user behavior, malware in transit, tunneling, applications exceeding their allocated bandwidth, and so forth. Identification of attempts by malware to contact a handler or command & control (C&C) channel. NetFlow is a Cisco-developed means of reporting network flow information to a structured database. NetFlow has been redeveloped as the IP Flow Information Export (IPFIX) IETF standard (tools.ietf.org/html/rfc7011). A particular traffic flow can be defined by packets sharing the same characteristics, referred to as keys, such as IP source and destination addresses and protocol type. A selection of keys is called a flow label, while traffic matching a flow label is called a flow record. You can use a variety of NetFlow monitoring tools to capture data for point-in-time analysis and to diagnose any security or operational issues the network is experiencing. There are plenty of commercial NetFlow suites, plus products offering similar functionality to NetFlow. The SiLK suite (tools.netsa.cert.org/silk/) and nfdump/nfsen (nfsen.sourceforge.net/) are examples of open-source implementations. Another popular tool is Argus (openargus.org). This uses a different data format to NetFlow, but the client tools can read and translate NetFlow data.
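The flow label and flow record concepts can be illustrated in a few lines of code. A sketch that groups packet summaries by a simple label of source address, destination address, and protocol; the sample records are invented:

from collections import defaultdict

packets = [   # illustrative packet summaries, not real capture data
    {"src": "10.0.0.5", "dst": "203.0.113.9", "proto": "TCP", "bytes": 1500},
    {"src": "10.0.0.5", "dst": "203.0.113.9", "proto": "TCP", "bytes": 400},
    {"src": "10.0.0.7", "dst": "198.51.100.2", "proto": "UDP", "bytes": 80},
]

flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
for p in packets:
    key = (p["src"], p["dst"], p["proto"])     # the flow label (keys shared by the packets)
    flows[key]["packets"] += 1                 # matching packets accumulate into one flow record
    flows[key]["bytes"] += p["bytes"]

for key, stats in flows.items():
    print(key, stats)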
incident eradication and recovery
After an incident has been contained, you can apply mitigation techniques and controls to eradicate the intrusion tools and unauthorized configuration changes from your systems. Eradicating malware, backdoors, and compromised accounts from individual hosts is not the last step in incident response. You should also consider a recovery phase where the goal is restoration of capabilities and services. This means that hosts are fully reconfigured to operate the business workflow they were performing before the incident. An essential part of recovery is the process of ensuring that the system cannot be compromised through the same attack vector (or failing that, that the vector is closely monitored to provide advance warning of another attack). Eradication of malware or other intrusion mechanisms and recovery from the attack will involve several steps: Reconstitution of affected systems—either remove the malicious files or tools from affected systems or restore the systems from secure backups/images. If reinstalling from baseline template configurations or images, make sure that there is nothing in the baseline that allowed the incident to occur; if so, update the template before rolling it out again. Reaudit security controls—ensure they are not vulnerable to another attack, whether a repeat of the same attack or a new one that the attacker could launch using information gained about your network. Ensure that affected parties are notified and provided with the means to remediate their own systems. For example, if customers' passwords are stolen, they should be advised to change the credentials for any other accounts where that password might have been reused (reuse is not good practice, but it is common). Finally, if your organization is subjected to a targeted attack, be aware that one incident may be very quickly followed by another.
application log files
An application log file is simply one that is managed by the application rather than the OS. The application may use Event Viewer or syslog to write event data using a standard format, or it might write log files to its own application directories in whatever format the developer has selected.
dns event logs
A DNS server may log an event each time it handles a request to convert between a domain name and an IP address. DNS event logs can hold a variety of information that may supply useful security intelligence, such as: The types of queries a host has made to DNS. Hosts that are in communication with suspicious IP address ranges or domains. Statistical anomalies such as spikes or consistently large numbers of DNS lookup failures, which may point to computers that are infected with malware, misconfigured, or running obsolete or faulty applications.
web/http access logs
Web servers are typically configured to log HTTP traffic that encounters an error or traffic that matches some predefined rule set. Most web servers use the common log format (CLF) or W3C extended log file format to record the relevant information. The status code of a response can reveal quite a bit about both the request and the server's behavior. Codes in the 400 range indicate client-based errors, while codes in the 500 range indicate server-based errors. For example, repeated 403 ("Forbidden") responses may indicate that the server is rejecting a client's attempts to access resources they are not authorized to. A 502 ("Bad Gateway") response could indicate that communications between the target server and its upstream server are being blocked, or that the upstream server is down. In addition to status codes, some web server software also logs HTTP header information for both requests and responses. This can provide you with a better picture of the makeup of each request or response, such as cookie information and MIME types. Another header field of note is the User-Agent field, which identifies the type of application making the request. In most cases, this is the version of the browser that the client is using to access a site, as well as the client's operating system. However, this can be misleading, as even a browser like Microsoft Edge includes Google Chrome and Safari version strings in its User-Agent field. Therefore, the User-Agent field may not be a reliable indicator of the client's environment.
voip and call managers and session initiation protocol (sip) traffic
Many VoIP systems use the Session Initiation Protocol (SIP) to identify endpoints and set up calls. The call content is transferred using a separate protocol, typically the Real-time Transport Protocol (RTP). VoIP protocols are vulnerable to most of the same vulnerabilities and exploits as web communications. Both SIP and RTP should use the secure protocol forms, where endpoints are authenticated and communications protected by Transport Layer Security (TLS). The call manager is a gateway that connects endpoints within the local network and over the Internet. The call manager is also likely to implement a media gateway to connect VoIP calls to cellphone and landline telephone networks. SIP produces similar logs to SMTP, typically in the common log format. A SIP log will identify the endpoints involved in a call request, plus the type of connection (voice only or voice with video, for instance), and status messaging.
When handling requests, the call manager and any other intermediate servers add their IP address in a Via header, similar to per-hop SMTP headers. Inspecting the logs might reveal evidence of a man-in-the-middle attack where an unauthorized proxy is intercepting traffic. VoIP systems connected to telephone networks are also targets for toll fraud. The call manager's access log can be audited for suspicious connections.
dump files
System memory contains volatile data. A system memory dump creates an image file that can be analyzed to identify the processes that are running, the contents of temporary file systems, registry data, network connections, cryptographic keys, and more. It can also be a means of accessing data that is encrypted when stored on a mass storage device.
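Returning to the web/HTTP access logs described above, repeated client and server errors can be surfaced with a short script. A minimal sketch, assuming common log format entries saved to a hypothetical file named access.log:

import re
from collections import Counter

# CLF: host ident authuser [date] "request" status size
pattern = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "([^"]*)" (\d{3}) \S+')

errors = Counter()
with open("access.log") as log:
    for line in log:
        m = pattern.match(line)
        if m and m.group(3).startswith(("4", "5")):
            errors[(m.group(1), m.group(3))] += 1    # (client address, status code)

for (client, status), count in errors.most_common(10):
    print(f"{client} received {status} x{count}")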
metadata: email
An email's Internet header contains address information for the recipient and sender, plus details of the servers handling transmission of the message between them. When an email is created, the mail user agent (MUA) creates an initial header and forwards the message to a mail delivery agent (MDA). The MDA should perform checks that the sender is authorized to issue messages from the domain. Assuming the email isn't being delivered locally at the same domain, the MDA adds or amends its own header and then transmits the message to a message transfer agent (MTA). The MTA routes the message to the recipient, with the message passing via one or more additional MTAs, such as SMTP servers operated by ISPs or mail security gateways. Each MTA adds information to the header. Headers aren't exposed to the user by most email applications, which is why they're usually not a factor in an average user's judgment. You can view and copy headers from a mail client via a message properties/options/source command. MTAs can add a lot of information in each received header, such as the results of spam checking. If you use a plaintext editor to view the header, it can be difficult to identify where each part begins and ends. Fortunately, there are plenty of tools available to parse headers and display them in a more structured format. One example is the Message Analyzer tool, available as part of the Microsoft Remote Connectivity Analyzer (testconnectivity.microsoft.com/tests/o365). This will lay out the hops that the message took more clearly and break out the headers added by each MTA.
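Python's standard email module can also parse a saved message source and walk the headers added along the delivery path. A minimal sketch, assuming the message source has been saved to a hypothetical file named suspect.eml:

from email import policy
from email.parser import BytesParser

with open("suspect.eml", "rb") as f:
    msg = BytesParser(policy=policy.default).parse(f)

# Each MTA prepends a Received header, so reading them bottom-up follows the path
for hop, received in enumerate(reversed(msg.get_all("Received", [])), start=1):
    print(f"Hop {hop}: {received}")

print("From:", msg["From"])
print("Return-Path:", msg["Return-Path"])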
incident response plan
An incident response plan (IRP) lists the procedures, contacts, and resources available to responders for various incident categories. The CSIRT should develop profiles or scenarios of typical incidents (DDoS attack, virus/worm outbreak, data exfiltration by an external adversary, data modification by an internal adversary, and so on). This will guide investigators in determining priorities and remediation plans. A playbook (or runbook) is a data-driven standard operating procedure (SOP) to assist junior analysts in detecting and responding to specific cyberthreat scenarios, such as phishing attempts, SQL injection data exfiltration, connection to a block-listed IP range, and so on. The playbook starts with a SIEM report and query designed to detect the incident and identify the key detection, containment, and eradication steps to take. Incident categories and definitions ensure that all response team members and other organizational personnel have a common base of understanding of the meaning of terms, concepts, and descriptions. The categories, types, and definitions might vary according to industry. For a listing of the US federal agency incident categories, you can visit us-cert.cisa.gov/sites/default/files/publications/Federal_Incident_Notification_Guidelines.pdf. One challenge in incident management is to allocate resources efficiently. This means that identified incidents must be assessed for severity and prioritized for remediation. There are several factors that can affect this process: Data integrity—the most important factor in prioritizing incidents will often be the value of data that is at risk. Downtime—another very important factor is the degree to which an incident disrupts business processes. An incident can either degrade (reduce performance) or interrupt (completely stop) the availability of an asset, system, or business process. If you have completed an asset inventory and a thorough risk assessment of business processes (showing how assets and computer systems assist each process), then you can easily identify critical processes and quantify the impact of an incident in terms of the cost of downtime. Economic/publicity—both data integrity and downtime will have important economic effects, both in the short term and the long term. Short-term costs involve incident response itself and lost business opportunities. Long-term economic costs may involve damage to reputation and market standing. Scope—the scope of an incident (broadly the number of systems affected) is not a direct indicator of priority. A large number of systems might be infected with a type of malware that degrades performance, but is not a data breach risk. This might even be a masking attack as the adversary seeks to compromise data on a single database server storing top secret information. Detection time—research has shown that more than half of data breaches are not detected for weeks or months after the intrusion occurs, while in a successful intrusion data is typically breached within minutes. This demonstrates that the systems used to search for intrusions must be thorough and the response to detection must be fast. Recovery time—some incidents require lengthy remediation as the system changes required are complex to implement. This extended recovery period should trigger heightened alertness for continued or new attacks.
firewall config changes
Analysis of an attack should identify the vector exploited by the attacker. This analysis is used to identify configuration changes that block that attack vector. A configuration change may mean the deployment of a new type of security control, or altering the settings of an existing control to make it more effective. Historically, many organizations focused on ingress filtering rules, designed to prevent local network penetration from the Internet. In the current threat landscape, it is imperative to also apply strict egress filtering rules to prevent malware that has infected internal hosts by other means from communicating out to C&C servers. Egress filtering can be problematic in terms of interrupting authorized network activity, but it is an essential component of modern network defense. Some general guidelines for configuring egress filtering are: Allow only authorized application ports and, if possible, restrict the destination addresses to authorized Internet hosts. Where authorized hosts cannot be identified or a default deny is too restrictive, use URL and content filtering to try to detect malicious traffic over authorized protocols. Restrict DNS lookups to your own or your ISP's DNS services or authorized public resolvers, such as Google's or Quad9's DNS services. Block access to "known bad" IP address ranges, as listed on don't route or peer (DROP) filter lists. Block access from any IP address space that is not authorized for use on your local network. Block all Internet access from host subnets that do not need to connect to the Internet, such as most types of internal server, workstations used to manage industrial control systems (ICSs), and so on. Even within these rules, there is a lot of scope for threat actors to perform command signaling and exfiltration. For example, cloud services, such as content delivery networks and social media platforms, can be used to communicate scripts and malware commands and to exfiltrate data over HTTPS (rhinosecuritylabs.com/aws/hiding-cloudcobalt-strike-beacon-c2-using-amazon-apis).
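Egress rules can be prototyped as data before they are translated into firewall syntax. A minimal sketch of the checking logic; the permitted ports, resolver, destination networks, and test connections are purely illustrative:

import ipaddress

ALLOWED_PORTS = {443}                                     # authorized application ports
ALLOWED_RESOLVERS = {ipaddress.ip_address("9.9.9.9")}     # authorized public resolver (Quad9)
ALLOWED_NETS = [ipaddress.ip_network("203.0.113.0/24")]   # authorized Internet hosts

def egress_permitted(dst_ip: str, dst_port: int) -> bool:
    ip = ipaddress.ip_address(dst_ip)
    if dst_port == 53:                                    # restrict DNS lookups
        return ip in ALLOWED_RESOLVERS
    if dst_port not in ALLOWED_PORTS:
        return False
    return any(ip in net for net in ALLOWED_NETS)

print(egress_permitted("203.0.113.10", 443))   # True
print(egress_permitted("198.51.100.7", 25))    # False, unauthorized port

In practice the policy would be enforced by the firewall itself; the value of sketching it first is agreeing on the default-deny behavior before writing vendor-specific rules.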
adversarial artificial intelligence
Artificial Intelligence (AI)-type systems are used extensively for user and entity behavior analytics (UEBA). A UEBA is trained on security data from customer systems and honeypots. This allows the AI to determine features of malicious code and account activity and to recognize those features in novel data streams. To make use of UEBA, host event data and network traffic is streamed to a cloud-based analytics service. An attacker with undetected persistent access to the network, but with a low probability of effecting lateral movement or data exfiltration, may be in a position to inject traffic into this data stream with a long-term goal of concealing tools that could achieve actions on objectives. The attacker may use his or her own AI resources as a means of generating samples, hence adversarial AI. Manipulated samples could also be uploaded to public repositories, such as virustotal.com. For example, ML algorithms are highly sensitive to noise. This is demonstrated in image recognition cases, where given a doctored image of a turtle, an AI will identify it as a rifle (theregister.com/2017/11/06/mit_fooling_ai). To a human observer, the image appears to be that of a perfectly ordinary turtle. Similar techniques might be used to cause an AI to miscategorize an attack tool as a text editor. Successful adversarial attacks mostly depend on knowledge of the algorithms used by the target AI. This is referred to as a white box attack. Keeping those algorithms secret forces the adversarial AI to use black box techniques, which are more difficult to develop. Algorithm secrecy is security by obscurity, however, and difficult to ensure. Other solutions include generating adversarial examples and training the system to recognize them. Another option is to develop a filter that can detect and block adversarial samples as they are submitted. A Microsoft presentation at BlackHat illustrates some of the techniques that can be used to mitigate adversarial AI (i.blackhat.com/us-18/Thu-August-9/us-18-Parikh-Protecting-the-Protector-Hardening-Machine-Learning-Defenses-Against-Adversarial-Attacks.pdf).
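The evasion idea can be demonstrated against a toy classifier rather than a real UEBA product. A sketch using scikit-learn, where synthetic data stands in for security telemetry and the attacker is assumed to have white box knowledge of the model's weights:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
model = LogisticRegression().fit(X, y)          # stand-in for the trained detector

malicious = X[model.predict(X) == 1][0]         # a sample the model currently flags
direction = -model.coef_[0]                     # white box: push against the learned weights
direction /= np.linalg.norm(direction)

perturbed = malicious.copy()
while model.predict([perturbed])[0] == 1:       # add small perturbations until the label flips
    perturbed += 0.05 * direction

print("perturbation size:", round(float(np.linalg.norm(perturbed - malicious)), 3))
print("label after perturbation:", model.predict([perturbed])[0])

A black box attacker would have to probe the model with repeated queries instead of reading its weights directly, which is why keeping the algorithms and training data secret raises the cost of this kind of attack.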
mitre att&ck
As an alternative to the life cycle analysis implied by a kill chain, the MITRE Corporation's Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) matrices provide access to a database of known TTPs. This freely available resource (attack.mitre.org) tags each technique with a unique ID and places it in one or more tactic categories, such as initial access, persistence, lateral movement, or command and control. The sequence in which attackers may deploy any given tactic category is not made explicit. This means analysts must interpret each attack life cycle from local evidence. The framework makes TTPs used by different adversary groups directly comparable, without assuming how any particular adversary will run a campaign at a strategic level. There is a matrix for enterprise, which can also be viewed as TTPs directed against Linux, macOS, and Windows hosts, and a second matrix for mobile. For example, Drive by Compromise is given the ID T1189 and categorized as an Initial Access tactic that can target Windows, Linux, and macOS hosts. Clicking through to the page accesses information about detection methods, mitigation methods, and examples of historic uses and analysis.
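In day-to-day analysis, local detections are often tagged with ATT&CK technique IDs so that activity can be compared across incidents and adversary groups. A sketch of that idea using a tiny hand-built lookup; only T1189 comes from the text, and the structure is illustrative rather than any official ATT&CK API:

attack_techniques = {
    "T1189": {"name": "Drive-by Compromise", "tactic": "Initial Access"},
}

observed_events = [
    {"host": "ws-042", "technique": "T1189"},   # hypothetical detection
]

for event in observed_events:
    info = attack_techniques.get(event["technique"], {"name": "unknown", "tactic": "unknown"})
    print(f'{event["host"]}: {event["technique"]} {info["name"]} ({info["tactic"]})')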
incident containment
As incidents cover such a wide range of different scenarios, technologies, motivations, and degrees of seriousness, there is no standard approach to containment or incident isolation. Some of the many complex issues facing the CIRT are: What damage or theft has occurred already? How much more could be inflicted and in what sort of time frame (loss control)? What countermeasures are available? What are their costs and implications? What actions could alert the attacker to the fact that the attack has been detected? What evidence of the attack must be gathered and preserved? When an incident has been identified, classified, and prioritized, the next phase of incident response is containment. Containment techniques can be classed as either isolation-based or segmentation-based.
security orchestration, automation, and response
Automation is the action of scripting a single activity, while orchestration is the action of coordinating multiple automations (and possibly manual activity) to perform a complex, multistep task. In the case of security orchestration, automation, and response (SOAR), this task is principally incident response, though the technologies can also be used for tasks such as threat hunting. SOAR is designed as a solution to the problem of the volume of alerts overwhelming analysts' ability to respond, measured as the mean time to respond (MTTR). A SOAR may be implemented as a standalone technology or integrated with a SIEM—often referred to as a next-gen SIEM. The basis of SOAR is to scan the organization's store of security and threat intelligence, analyze it using machine/deep learning techniques, and then use that data to automate and provide data enrichment for the workflows that drive incident response and threat hunting. It can also assist with provisioning tasks, such as creating and deleting user accounts, making shares available, or launching VMs from templates, to try to eliminate configuration errors. The SOAR will use technologies such as cloud and SDN/SDV APIs, orchestration tools, and cyberthreat intelligence (CTI) feeds to integrate the different systems that it is managing. It will also leverage technologies such as automated malware signature creation and user and entity behavior analytics (UEBA) to detect threats. An incident response workflow is usually defined as a playbook. A playbook is a checklist of actions to perform to detect and respond to a specific type of incident. A playbook should be made highly specific by including the query strings and signatures that will detect a particular type of incident. A playbook will also account for compliance factors, such as whether an incident must be reported as a breach plus when and to whom notification must be made. Where a playbook is implemented with a high degree of automation from a SOAR system, it can be referred to as a runbook, though the terms are also widely used interchangeably. The aim of a runbook is to automate as many stages of the playbook as possible, leaving clearly defined interaction points for human analysis. These interaction points should try to present all the contextual information and guidance needed for the analyst to make a quick, informed decision about the best way to proceed with incident mitigation. Rapid7 has produced an ebook demonstrating the uses of SOAR (rapid7.com/info/security-orchestration-and-automation-playbook/?x=d67w-U). A white paper by Demisto provides a useful overview of the role of SOAR across different organizations (cdn2.hubspot.net/hubfs/5003120/Content%20Downloads/White%20Papers/Demisto%20-%20State%20of%20SOAR.pdf).
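A runbook is essentially structured data that captures the detection query, the automated steps, and the points where an analyst must decide. A sketch of one way to represent that; the scenario, query string, and steps are invented for illustration and do not correspond to any particular SOAR product:

phishing_playbook = {
    "scenario": "phishing attempt",
    "detection_query": 'subject CONTAINS "password reset" AND sender NOT IN allowlist',
    "steps": [
        {"action": "quarantine the reported message", "automated": True},
        {"action": "detonate URLs and attachments in a sandbox", "automated": True},
        {"action": "decide whether to reset the user's credentials", "automated": False},
        {"action": "notify the affected user and record the incident", "automated": True},
    ],
}

for step in phishing_playbook["steps"]:
    mode = "auto" if step["automated"] else "ANALYST DECISION"
    print(f'[{mode}] {step["action"]}')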
network data sources: bandwidth monitor
Bandwidth usage can be a key indicator of suspicious behavior, if you have reliable baselines for comparison. Unexpected bandwidth consumption could be evidence of a data exfiltration attack, for instance. Bandwidth usage can be reported by flow collectors. Firewalls and web security gateways are also likely to support bandwidth monitoring and alerting.
content filter config changes: update or revoke certificates
Compromise of the private key associated with a digital certificate, or the ability to present spoofed certificates as trusted, is a critical security vulnerability because it allows an attacker to impersonate trusted resources and potentially gain unauthorized access to secure systems. Remove compromised root certificates—if an attacker has managed to install a root certificate, the attacker can make malicious hosts and services seem trusted. Suspicious root certificates must be removed from the client's cache. Revoke certificates on compromised hosts—if a host is compromised, the private key it used for digital signatures or digital envelopes is no longer safe. The certificate associated with the key should be revoked using the Key Compromise reason code. The certificate can be rekeyed with a new key pair but the same subject and expiry information.
security information and event management
Coupled with an attack framework, notification will provide a general sense of where to look for or expect indicators of malicious activity. Incident analysis is greatly facilitated by a security information and event management (SIEM) system. A SIEM parses network traffic and log data from multiple sensors, appliances, and hosts and normalizes the information to standard field types.
correlation
The SIEM can then run correlation rules on indicators extracted from the data sources to detect events that should be investigated as potential incidents. You can also filter or query the data based on the type of incident that has been reported. Correlation means interpreting the relationship between individual data points to diagnose incidents of significance to the security team. A SIEM correlation rule is a statement that matches certain conditions. These rules use logical expressions, such as AND and OR, and operators, such as == (matches), < (less than), > (greater than), and in (contains). For example, a single-user logon failure is not a condition that should raise an alert. Multiple user logon failures for the same account, taking place within the space of one hour, is more likely to require investigation and is a candidate for detection by a correlation rule:
Error.LogonFailure > 3 AND LogonFailure.User AND Duration < 1 hour
As well as correlation between indicators observed on the network, a SIEM is likely to be configured with a threat intelligence feed. This means that data points observed on the network can be associated with known threat actor indicators, such as IP addresses and domain names. AI-assisted analysis enables more sophisticated alerting and detection of anomalous behavior.
retention
A SIEM can enact a retention policy so that historical log and network traffic data is kept for a defined period. This allows for retrospective incident and threat hunting, and can be a valuable source of forensic evidence.
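The logon-failure correlation rule shown above can be expressed directly in code. A minimal sketch that counts failures per account inside a one-hour window; the event list is invented, and a real SIEM would express the same logic in its own rule syntax:

from datetime import datetime, timedelta

failed_logons = [   # (timestamp, account) pairs extracted from normalized log data
    (datetime(2023, 5, 1, 9, 0), "jsmith"),
    (datetime(2023, 5, 1, 9, 10), "jsmith"),
    (datetime(2023, 5, 1, 9, 20), "jsmith"),
    (datetime(2023, 5, 1, 9, 25), "jsmith"),
    (datetime(2023, 5, 1, 14, 0), "akhan"),
]

WINDOW = timedelta(hours=1)
THRESHOLD = 3

recent = {}
for ts, user in sorted(failed_logons):
    recent[user] = [t for t in recent.get(user, []) if ts - t <= WINDOW] + [ts]
    if len(recent[user]) > THRESHOLD:
        print(f"ALERT: {len(recent[user])} logon failures for {user} within one hour")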
content filter config changes: dlp
Data loss prevention (DLP) performs a similar function, but instead of user access it mediates the copying of tagged data to restrict it to authorized media and services. An attack may reveal the necessity of investing in DLP as a security control if one is not already implemented. If DLP is enabled and configured in the correct way to enforce policy, the attacker may have been able to circumvent it using a backdoor method that the DLP software cannot scan. Alternatively, the attacker may have been able to disguise the data so that it was not recognized.
cyber kill chain attack framework
Effective incident response depends on threat intelligence. Threat research provides insight into adversary tactics, techniques, and procedures (TTPs). Insights from threat research can be used to develop specific tools and playbooks to deal with event scenarios. A key tool for threat research is a framework to use to describe the stages of an attack. These stages are often referred to as a cyber kill chain, following the influential white paper Intelligence-Driven Computer Network Defense commissioned by Lockheed Martin (lockheedmartin.com/content/dam/lockheed-martin/rms/documents/cyber/LM-White-Paper-Intel-Driven-Defense.pdf). The Lockheed Martin kill chain identifies the following phases: Reconnaissance—in this stage the attacker determines what methods to use to complete the phases of the attack and gathers information about the target's personnel, computer systems, and supply chain. Weaponization—the attacker couples payload code that will enable access with exploit code that will use a vulnerability to execute on the target system. Delivery—the attacker identifies a vector by which to transmit the weaponized code to the target environment, such as via an email attachment or on a USB drive. Exploitation—the weaponized code is executed on the target system by this mechanism. For example, a phishing email may trick the user into running the code, while a drive-by download would execute on a vulnerable system without user intervention. Installation—this mechanism enables the weaponized code to run a remote access tool and achieve persistence on the target system. Command and control (C2 or C&C)—the weaponized code establishes an outbound channel to a remote server that can then be used to control the remote access tool and possibly download additional tools to progress the attack. Actions on objectives—in this phase, the attacker typically uses the access he has achieved to covertly collect information from target systems and transfer it to a remote system (data exfiltration). An attacker may have other goals or motives, however.
metadata: file
File metadata is stored as attributes. The file system tracks when a file was created, accessed, and modified. A file might be assigned a security attribute, such as marking it as read-only or as a hidden or system file. The ACL attached to a file showing its permissions represents another type of attribute. Finally, the file may have extended attributes recording an author, copyright information, or tags for indexing/searching. In Linux, the ls command can be used to report file system metadata.
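The same attributes can be read from a script as well as from ls. A minimal sketch using the Python standard library; the path is a placeholder for the file under investigation:

import os
import stat
from datetime import datetime

info = os.stat("evidence/report.docx")    # hypothetical file of interest

print("size:", info.st_size, "bytes")
print("modified:", datetime.fromtimestamp(info.st_mtime))
print("accessed:", datetime.fromtimestamp(info.st_atime))
print("metadata change (ctime):", datetime.fromtimestamp(info.st_ctime))
print("permissions:", stat.filemode(info.st_mode))    # e.g., -rw-r--r--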
incident identification
Identification is the process of collating events and determining whether any of them should be managed as incidents or as possible precursors to an incident; that is, an event that makes an incident more likely to happen. There are multiple channels by which events or precursors may be recorded: Using log files, error messages, IDS alerts, firewall alerts, and other resources to establish baselines and identifying those parameters that indicate a possible security incident. Comparing deviations to established metrics to recognize incidents and their scopes. Manual or physical inspections of site, premises, networks, and hosts. Notification by an employee, customer, or supplier. Public reporting of new vulnerabilities or threats by a system vendor, regulator, the media, or other outside party. It is wise to provide for confidential reporting so that employees are not afraid to report insider threats, such as fraud or misconduct. It may also be necessary to use an "out-of-band" method of communication so as not to alert the intruder that his or her attack has been detected.
first responder
When a suspicious event is detected, it is critical that the appropriate person on the CIRT be notified so that they can take charge of the situation and formulate the appropriate response. This person is referred to as the first responder. This means that employees at all levels of the organization must be trained to recognize and respond appropriately to actual or suspected security incidents. A good level of security awareness across the whole organization will reduce the incidence of false positives and negatives. For the most serious incidents, the entire CIRT may be involved in formulating an effective response.
analysis and incident identification
When notification has taken place, the CIRT or other responsible person(s) must analyze the event to determine whether a genuine incident has been identified and what level of priority it should be assigned. Analysis will depend on identifying the type of incident and the data or resources affected (its scope and impact). At this point, the incident management database should have a record of the event indicators, the nature of the incident, its impact, and the incident investigator responsible. The next phase of incident management is to determine an appropriate response.
endpoint config changes
If endpoint security is breached, there are several classes of vector to consider for mitigation: Social engineering—if the malware was executed by a user, use security education and awareness to reduce the risk of future attacks succeeding. Review permissions to see if the account could be operated with a lower privilege level. Vulnerabilities—if the malware exploited a software fault, either install the patch or isolate the system until a patch can be developed. Lack of security controls—if the attack could have been prevented by endpoint protection/A-V, host firewall, content filtering, DLP, or MDM, investigate the possibility of deploying them to the endpoint. If this is not practical, isolate the system from being exploited by the same vector. Configuration drift—if the malware exploited an undocumented configuration change (shadow IT software or an unauthorized service/port, for instance), reapply the baseline configuration and investigate configuration management procedures to prevent this type of ad hoc change. Weak configuration—if the configuration was correctly applied, but was exploited anyway, review the template to devise more secure settings. Make sure the template is applied to similar hosts.
communication plan and stakeholder management
Incident response policies should establish clear lines of communication, both for reporting incidents and for notifying affected parties as the management of an incident progresses. It is vital to have essential contact information readily available. You must prevent the inadvertent release of information beyond the team authorized to handle the incident. Status and event details should be circulated on a need-to-know basis and only to trusted parties identified on a call list.
incident response process
Incident response policy sets the resources, processes, and guidelines for dealing with security incidents. Incident management is vital to mitigating risk. As well as controlling the immediate or specific threat to security, effective incident management preserves an organization's reputation. Incident response follows a well-structured process, such as that set out in the NIST Computer Security Incident Handling Guide special publication (nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-61r2.pdf). The following are the principal stages in an incident response life cycle: Preparation—make the system resilient to attack in the first place. This includes hardening systems, writing policies and procedures, and setting up confidential lines of communication. It also implies creating incident response resources and procedures. Identification—from the information in an alert or report, determine whether an incident has taken place, assess how severe it might be (triage), and notify stakeholders. Containment—limit the scope and magnitude of the incident. The principal aim of incident response is to secure data while limiting the immediate impact on customers and business partners. Eradication—once the incident is contained, remove the cause and restore the affected system to a secure state by wiping a system and applying secure configuration settings. Recovery—with the cause of the incident eradicated, the system can be reintegrated into the business process that it supports. Applying patches and updates to a system to help prevent future incidents is important as well. This recovery phase may involve restoration of data from backup and security testing. Systems must be monitored more closely for a period to detect and prevent any reoccurrence of the attack. The response process may have to iterate through multiple phases of identification, containment, eradication, and recovery to effect a complete resolution. Lessons learned—analyze the incident and responses to identify whether procedures or systems could be improved. It is imperative to document the incident. The outputs from this phase feed back into a new preparation phase in the cycle. Incident response is likely to require coordinated action and authorization from several different departments or managers, which adds further levels of complexity.
isolation-based containment
Isolation involves removing an affected component from whatever larger environment it is a part of. This can be everything from removing a server from the network after it has been the target of a DoS attack, to placing an application in a sandbox VM outside of the host environments it usually runs on. Whatever the circumstances may be, you'll want to make sure that there is no longer an interface between the affected component and your production network or the Internet. A simple option is to disconnect the host from the network completely, either by pulling the network plug (creating an air gap) or disabling its switch port. This is the least stealthy option and will reduce opportunities to analyze the attack or malware. If a group of hosts is affected, you could use routing infrastructure to isolate one or more infected virtual LANs (VLANs) in a black hole that is not reachable from the rest of the network. Another possibility is to use firewalls or other security filters to prevent infected hosts from communicating. Finally, isolation could also refer to disabling a user account or application service. Temporarily disabling users' network accounts may prove helpful in containing damage if an intruder is detected within the network. Without privileges to access resources, an intruder will not be able to further damage or steal information from the organization. Applications that you suspect may be the vector of an attack can be much less effective to the attacker if the application is prevented from executing on most hosts.
logging platforms
Log data from network appliances and hosts can be aggregated by a SIEM either by installing a local agent to collect and parse the log data or by using a forwarding system to transmit logs directly to the SIEM server. Also, organizations may not operate a SIEM, but still use a logging platform to aggregate log data in a central location.
syslog
Syslog (tools.ietf.org/html/rfc3164) provides an open format, protocol, and server software for logging event messages. It is used by a very wide range of host types. For example, syslog messages can be generated by Cisco routers and switches, as well as servers and workstations. It usually uses UDP port 514. A syslog message comprises a PRI code, a header containing a timestamp and host name, and a message part. The PRI code is calculated from the facility and a severity level. The message part contains a tag showing the source process plus content. The format of the content is application dependent. It might use space- or comma-delimited fields or name/value pairs, such as JSON data. RFC 5424 (tools.ietf.org/html/rfc5424) adjusts the structure slightly to split the tag into app name, process ID, and message ID fields, and to make them part of the header.
rsyslog and syslog-ng
There have been two updates to the original syslog implementation: Rsyslog uses the same configuration file syntax, but can work over TCP and use a secure connection. Rsyslog can use more types of filter expressions in its configuration file to customize message handling. Syslog-ng uses a different configuration file syntax, but can also use TCP/secure communications and more advanced options for message filtering.
journalctl
In Linux, text-based log files of the sort managed by syslog can be viewed using commands such as cat, tail, and head. Most modern Linux distributions now use systemd to initialize the system and to start and manage background services. Rather than writing events to syslog-format text files, logs from processes managed by systemd are written to a binary-format journal managed by journald. Events captured by journald can be forwarded to syslog. To view events in the journal directly, you can use the journalctl command to print the entire journal log, or you can issue various options with the command to filter the log in a variety of ways, such as matching a service name or only printing messages matching the specified severity level.
nxlog
NXlog (nxlog.co) is an open-source log normalization tool. One principal use for it is to collect Windows logs, which use an XML-based format, and normalize them to a syslog format.
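The PRI code mentioned in the syslog description above packs the facility and severity into one number (PRI = facility * 8 + severity), so both values can be recovered from it. A small sketch with an illustrative message:

import re

message = "<34>Oct 11 22:14:15 host1 su: 'su root' failed for user on /dev/pts/8"

pri = int(re.match(r"<(\d+)>", message).group(1))
facility, severity = divmod(pri, 8)     # PRI = facility * 8 + severity

print("facility:", facility)            # 4 = security/authorization messages
print("severity:", severity)            # 2 = critical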
network, os, and security log files
Log file data is a critical resource for investigating security incidents. As well as the log format, you must also consider the range of sources for log files and know how to determine what type of log file will best support any given investigation scenario.
system and security logs
One source of security information is the event log from each network server or client. Systems such as Microsoft Windows, Apple macOS, and Linux keep a variety of logs to record events as users and software interact with the system. The format of the logs varies depending on the system. Information contained within the logs also varies by system, and in many cases, the type of information that is captured can be configured. When events are generated, they are placed into log categories. These categories describe the general nature of the events or what areas of the OS they affect. The five main categories of Windows event logs are: Application—events generated by applications and services, such as when a service cannot start. Security—audit events, such as a failed logon or access to a file being denied. System—events generated by the operating system and its services, such as storage volume health checks. Setup—events generated during the installation of Windows. Forwarded Events—events that are sent to the local log from other hosts.
network logs
Network logs are generated by appliances such as routers, firewalls, switches, and access points. Log files will record the operation and status of the appliance itself—the system log for the appliance—plus traffic and access logs recording network behavior, such as a host trying to use a port that is blocked by the firewall, or an endpoint trying to use multiple MAC addresses when connected to a switch.
authentication logs
Authentication attempts for each host are likely to be written to the security log. You might also need to inspect logs from the servers authorizing logons, such as RADIUS and TACACS+ servers or Windows Active Directory (AD) servers.
vulnerability scan output
A vulnerability scan report is another important source when determining how an attack might have been made. The scan engine might log or alert when a scan report contains vulnerabilities. The report can be analyzed to identify vulnerabilities that have not been patched or configuration weaknesses that have not been remediated. These can be correlated to recently developed exploits.
content filter config changes: mdm
Mobile Device Management (MDM) provides execution control over apps and features of smartphones. Features include GPS, camera, and microphone. As with DLP, an intrusion might reveal a vector that allowed the threat actor to circumvent enrollment or a misconfiguration in the MDM's policy templates.
metadata: mobile
Mobile phone metadata comprises call detail records (CDRs) of incoming, outgoing, and attempted calls and SMS text time, duration, and the opposite party's number. Metadata will also record data transfer volumes. The location history of the device can be tracked by the list of cell towers it has used to connect to the network. If you are investigating a suspected insider attack, this metadata could prove a suspect's whereabouts. Furthermore, AI-enabled analysis (or patient investigation) can correlate the opposite party numbers to businesses and individuals through other public records. CDRs are generated and stored by the mobile operator. The retention period for CDRs is determined by national and state laws, but is typically around 18 months. CDRs are directly available for corporate-owned devices, where you can request them from the communications provider as the owner of the device. Metadata for personally owned devices would only normally be accessible by law enforcement agencies by subpoena or with the consent of the account holder. An employment contract might require an employee to give this consent for bring your own device (BYOD) mobiles used within the workplace. Metadata such as current location and time is also added to media such as photos and videos, though this is true for all types of computing device. When these files are uploaded to social media sites, they can reveal more information than the uploader intended.
application allow lists and block lists
One element of endpoint configuration is an execution control policy that defines applications that can or cannot be run. An allow list (or approved list) denies execution unless the process is explicitly authorized. A block list (or deny list) generally allows execution, but explicitly prohibits listed processes. You will need to update the contents of allow lists and block lists in response to incidents and as a result of ongoing threat hunting and monitoring. Threat hunting may also provoke a strategic change. For example, if you rely principally on explicit denies, but your systems are subject to numerous intrusions, you will have to consider adopting a "least privileges" model and using a deny-unless-listed approach. This sort of change has the potential to be highly disruptive however, so it must be preceded by a risk assessment and business impact analysis. Execution control can also be tricky to configure effectively, with many opportunities for threat actors to evade the controls. Detailed analysis of the attack might show the need for changes to the existing mechanism, or the use of a more robust system.
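Conceptually, the two list types differ only in their default action, as a short sketch makes clear; the process names are invented:

APPROVED = {"winword.exe", "excel.exe", "teams.exe"}    # allow (approved) list
BLOCKED = {"mimikatz.exe", "psexec.exe"}                # block (deny) list

def allow_list_permits(process: str) -> bool:
    return process in APPROVED       # deny unless explicitly authorized

def block_list_permits(process: str) -> bool:
    return process not in BLOCKED    # allow unless explicitly prohibited

for proc in ("winword.exe", "unknown_tool.exe"):
    print(proc, "allow list:", allow_list_permits(proc),
          "| block list:", block_list_permits(proc))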
cyber incident response team
Preparing for incident response means establishing the policies and procedures for dealing with security breaches and the personnel and resources to implement those policies. One of the first challenges lies in defining and categorizing types of incidents. An incident is generally described as an event where security is breached or there is an attempted breach. NIST describes an incident as "the act of violating an explicit or implied security policy." In order to identify and manage incidents, you should develop some method of reporting, categorizing, and prioritizing them (triage), in the same way that troubleshooting support incidents can be logged and managed. As well as investment in appropriate detection and analysis software, incident response requires expert staffing. Large organizations will provide a dedicated team as a single point-of-contact for the notification of security incidents. This team is variously described as a cyber incident response team (CIRT), computer security incident response team (CSIRT), or computer emergency response team (CERT). Incident response might also involve or be wholly located within a security operations center (SOC). However it is set up, the team needs a mixture of senior management decision-makers (up to director level) who can authorize actions following the most serious incidents, managers, and technicians who can deal with minor incidents on their own initiative. Another important consideration is availability. Incident response will typically require 24/7 availability, which will be expensive to provide. It is also worth considering that members of the CIRT should be rotated periodically to preclude the possibility of infiltration. For major incidents, expertise and advice from other business divisions will also need to be called upon: Legal—it is important to have access to legal expertise, so that the team can evaluate incident response from the perspective of compliance with laws and industry regulations. It may also be necessary to liaise closely with law enforcement professionals, and this can be daunting without expert legal advice. Human Resources (HR)—incident prevention and remediation actions may affect employee contracts, employment law, and so on. Incident response requires the right to intercept and monitor employee communications. Marketing—the team is likely to require marketing or public relations input, so that any negative publicity from a serious incident can be managed. Some organizations may prefer to outsource some of the CIRT functions to third-party agencies by retaining an incident response provider. External agents are able to deal more effectively with insider threats.
siem dashboards
SIEM dashboards are one of the main sources of automated alerts. A SIEM dashboard provides a console to work from for day-to-day incident response. Separate dashboards can be created to suit many different purposes. An incident handler's dashboard will contain uncategorized events that have been assigned to their account, plus visualizations (graphs and tables) showing key status metrics. A manager's dashboard would show overall status indicators, such as number of unclassified events for all event handlers.
(Figure: the SGUIL console in Security Onion. A SIEM can generate huge numbers of alerts that need to be manually assessed for priority and investigation. Screenshot courtesy of Security Onion, securityonion.net.)
sensitivity and alerts
One of the greatest challenges in operating a SIEM is tuning the system sensitivity to reduce false positive indicators being reported as an event. This is difficult firstly because there isn't a simple dial to turn for overall sensitivity, and secondly because reducing the number of rules that produce events increases the risk of false negatives. A false negative is where indicators that should be correlated as an event and raise an alert are ignored. The correlation rules are likely to assign a criticality level to each match. For example: Log only—an event is produced and added to the SIEM's database, but it is automatically classified. Alert—the event is listed on a dashboard or incident handling system for an agent to assess. The agent classifies the event and either dismisses it to the log or escalates it as an incident. Alarm—the event is automatically classified as critical and a priority alarm is raised. This might mean emailing an incident handler or sending a text message.
sensors
A sensor is a network tap or port mirror that performs packet capture and intrusion detection. One of the key uses of a SIEM is to aggregate data from multiple sensors and log sources, but it might also be appropriate to configure dashboards that show output from a single sensor or source host.
communication plan
Secure communication between the trusted parties of the CIRT is essential for managing incidents successfully. It is imperative that adversaries not be alerted to detection and remediation measures about to be taken against them. It may not be appropriate for all members of the CSIRT to be informed about all incident details. The team requires an "out-of-band" or "off-band" communication method that cannot be intercepted. Using corporate email or VoIP runs the risk that the adversary will be able to intercept communications. One obvious method is cell phones but these only support voice and text messaging. For file and data exchange, there should be a messaging system with end-to-end encryption, such as Off-the-Record (OTR), Signal, or WhatsApp, or an external email system with message encryption (S/MIME or PGP). These need to use digital signatures and encryption keys from a system that is completely separate from the identity management processes of the network being defended.
segmentation-based containment
Segmentation-based containment is a means of achieving the isolation of a host or group of hosts using network technologies and architecture. Segmentation uses VLANs, routing/subnets, and firewall ACLs to prevent a host or group of hosts from communicating outside the protected segment. As opposed to completely isolating the hosts, you might configure the protected segment as a sinkhole or honeynet and allow the attacker to continue to receive filtered (and possibly modified) output over the C&C channel to deceive him or her into thinking the attack is progressing successfully. Analysis of the malware code by reverse engineering it could provide powerful deception capabilities. You could intercept the function calls made by malware to allow the adversary to believe an attack is proceeding while building detailed knowledge of their tactics and (hopefully) identity. Attribution of the attack to a particular group will allow an estimation of adversary capability.
content filter config changes
The limitations of a basic packet filtering firewall (even if it is stateful) mean that some sort of content filtering application proxy may provide better security. These types of appliances are usually referred to as secure web gateways (SWGs). A SWG mediates user access to Internet services, with the ability to block content from regularly updated URL/domain/IP block lists and perform intrusion detection/prevention on traffic based on matching content in application layer protocol headers and payloads. If a SWG is already in place, an attacker may have found a way to circumvent it via some sort of backdoor. The network configuration should be checked and updated to ensure that all client access to the Internet must pass through the SWG. Another possibility is that the attacker is using a protocol or C&C method that is not filtered. The SWG should be updated with scripts and data, domains and IP addresses, that will block the exploit.
incident response exercises
The procedures and tools used for incident response are difficult to master and execute effectively. You do not want to be in the situation where first-time staff members are practicing them in the high-pressure environment of an actual incident. Running test exercises helps staff develop competencies and can help to identify deficiencies in the procedures and tools. Training on specific incident response scenarios can take three forms: Tabletop—this is the least costly type of training. The facilitator presents a scenario and the responders explain what action they would take to identify, contain, and eradicate the threat. The training does not use computer systems; the scenario data is presented as flashcards. Walkthroughs—in this model, a facilitator presents the scenario as for a tabletop exercise, but the incident responders demonstrate what actions they would take in response. Unlike a tabletop exercise, the responders perform actions such as running scans and analyzing sample files, typically on sandboxed versions of the company's actual response and recovery tools. Simulations—a simulation is a team-based exercise, where the red team attempts an intrusion, the blue team operates response and recovery controls, and a white team moderates and evaluates the exercise. This type of training requires considerable investment and planning.
trend analysis
Trend analysis is the process of detecting patterns or indicators within a data set over a time series and using those patterns to make predictions about future events. A trend is difficult to spot by examining each event in a log file. Instead, you need software to visualize the incidence of types of event and show how the number or frequency of those events changes over time. Trend analysis can apply to frequency, volume, or statistical deviation: Frequency-based trend analysis establishes a baseline for a metric, such as the number of NXDOMAIN (name not found) DNS log events per hour of the day. If the frequency exceeds (or in some cases undershoots) the threshold for the baseline, then an alert is raised. Volume-based trend analysis can be performed with simpler indicators. For example, one simple metric for determining threat level is log volume. If logs are growing much faster than they were previously, there is a good chance that something needs investigating. Volume-based analysis also applies to network traffic. You might also measure endpoint disk usage. Client workstations don't usually need to store data locally, so if a host's disk capacity has suddenly diminished, it could be a sign that it is being used to stage data for exfiltration. Statistical deviation analysis can show when a data point should be treated as suspicious. For example, a cluster graph might show activity by standard users and privileged users, invoking analysis of behavioral metrics of what processes each type runs, which systems they access, and so on. A data point that appears outside the two clusters for standard and administrative users might indicate some suspicious activity by that account.
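A frequency baseline and deviation check can be computed with the standard statistics module. A minimal sketch in which the hourly counts of failed DNS lookups are invented for illustration:

import statistics

baseline = [42, 51, 38, 47, 55, 40, 44, 49]   # failed lookups per hour from historic logs
current_hour = 210                            # latest observation

mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)
z_score = (current_hour - mean) / stdev

if z_score > 3:    # more than three standard deviations above the baseline
    print(f"ALERT: {current_hour} failures this hour (z={z_score:.1f}, baseline {mean:.0f}/hour)")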
stakeholder management
Trusted parties might include both internal and external stakeholders. It is not helpful for an incident to be publicized in the press or through social media outside of planned communications. Ensure that parties with privileged information do not release this information to untrusted parties, whether intentionally or inadvertently. You need to consider obligations to report the attack. It may be necessary to inform affected parties during or immediately after the incident so that they can perform their own remediation. It may be necessary to report to regulators or law enforcement. You also need to consider the marketing and PR impact of an incident. This can be highly damaging and you will need to demonstrate to customers that security systems have been improved.
the diamond model of intrusion analysis
The Diamond Model of Intrusion Analysis suggests a framework to analyze an intrusion event (E) by exploring the relationships between four core features: adversary, capability, infrastructure, and victim. These four features are represented by the four vertices of a diamond shape. Each event may also be described by meta-features, such as date/time, kill chain phase, result, and so on. Each feature is also assigned a confidence level (C), indicating data accuracy or the reliability of a conclusion or assumption assigned to the value by analysis.
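A sketch of how an analyst might record an event against these four features, with a simple dataclass standing in for a real analysis tool; every field value below is invented:

from dataclasses import dataclass, field

@dataclass
class DiamondEvent:
    adversary: str
    capability: str
    infrastructure: str
    victim: str
    meta: dict = field(default_factory=dict)          # date/time, kill chain phase, result...
    confidence: dict = field(default_factory=dict)    # per-feature confidence levels

event = DiamondEvent(
    adversary="unknown, tracked internally as GROUP-X",
    capability="credential-stealing malware",
    infrastructure="C2 server at 203.0.113.25",
    victim="finance file server",
    meta={"phase": "command and control", "timestamp": "2023-05-01T09:20Z"},
    confidence={"adversary": "low", "infrastructure": "high"},
)
print(event)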