Module 4 - Detection and Response (6) - Google Cybersecurity Certificate
YARA-L
A computer language used to create rules for searching through ingested log data.
Splunk
A data analysis platform. Splunk Enterprise provides SIEM solutions that let you search, analyze, and visualize security data. First, it collects data from different sources. That data gets processed and stored in an index. Then it can be accessed in a variety of ways, like through search.
Anomaly-based Analysis
A detection method that identifies abnormal behavior. There are two phases to anomaly-based analysis: a training phase and a detection phase. In the training phase, a baseline of normal or expected behavior must be established. Baselines are developed by collecting data that corresponds to normal system behavior. In the detection phase, the current system activity is compared against this baseline. Activity that happens outside of the baseline gets logged, and an alert is generated.
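The two phases can be illustrated with a brief sketch. This is a hypothetical, simplified example (the login counts and the three-standard-deviation threshold are made-up assumptions), not how any particular detection tool implements anomaly-based analysis:

```python
import statistics

# Training phase: build a baseline from historical, "normal" activity.
# The numbers here are hypothetical login counts per hour.
training_counts = [12, 15, 11, 14, 13, 16, 12, 15]
baseline_mean = statistics.mean(training_counts)
baseline_stdev = statistics.stdev(training_counts)

def is_anomalous(count, threshold=3):
    """Detection phase: flag activity that deviates too far from the baseline."""
    return abs(count - baseline_mean) > threshold * baseline_stdev

print(is_anomalous(14))   # within the baseline -> False
print(is_anomalous(90))   # far outside the baseline -> True, would generate an alert
```

In a real deployment the baseline would cover many metrics and be refreshed over time; the point here is only the train-then-compare structure described above.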
Signature Analysis
A detection method used to find events of interest.
JavaScript Object Notation (JSON)
A file format that is used to store and transmit data. JSON is known for being lightweight and easy to read and write. It is used for transmitting data in web technologies and is also commonly used in cloud environments. JSON syntax is derived from JavaScript syntax. If you are familiar with JavaScript, you might recognize that JSON contains components from JavaScript, including:
Key-value pairs
Commas
Double quotes
Curly brackets
Square brackets
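As a hedged illustration of these components, here is a hypothetical alert entry (the field names and values are made up) parsed with Python's standard json module:

```python
import json

# A hypothetical alert entry illustrating JSON components:
# curly brackets enclose an object, square brackets enclose an array,
# keys and string values use double quotes, and commas separate items.
entry = ('{"Alert": "Malware", "Alert code": 1090, "severity": 10, '
         '"groups": ["Administrators", "Users"]}')

parsed = json.loads(entry)     # parse the JSON string into a Python dict
print(parsed["Alert"])         # -> Malware
print(parsed["groups"][0])     # -> Administrators
```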
CEF (Common Event Format)
A log format that uses key-value pairs to structure data and identify fields and their corresponding values. The CEF syntax is defined as containing the following fields:

CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension

Fields are all separated with a pipe character |. However, anything in the Extension part of the CEF log entry must be written in a key-value format. Syslog is a common method used to transport logs like CEF. When syslog is used, a timestamp and hostname are prepended to the CEF message. Here is an example of a CEF log entry that details malicious activity relating to a worm infection:

Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.2 dst=2.1.2.2 spt=1232

Here is a breakdown of the fields:
Syslog Timestamp: Sep 29 08:26:10
Syslog Hostname: host
Version: CEF:1
Device Vendor: Security
Device Product: threatmanager
Device Version: 1.0
Signature ID: 100
Name: worm successfully stopped
Severity: 10
Extension: This field contains data written as key-value pairs. There are two IP addresses, src=10.0.0.2 and dst=2.1.2.2, and a source port number, spt=1232.

This log entry describes how a security application called threatmanager successfully stopped a worm from spreading from the internal network at 10.0.0.2 to the external network at 2.1.2.2 through port 1232. A high severity level of 10 is reported.

Note: The extension fields and the syslog prefix are optional additions to a CEF log.
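To make the field breakdown concrete, here is a minimal Python sketch that splits the sample entry above on the pipe character. It is illustrative only: a production CEF parser would also need to handle escaped pipes and other edge cases this sketch ignores:

```python
# Split the sample CEF entry from above into its fields.
raw = ("Sep 29 08:26:10 host "
       "CEF:1|Security|threatmanager|1.0|100|worm successfully stopped|10|"
       "src=10.0.0.2 dst=2.1.2.2 spt=1232")

# Separate the syslog prefix (timestamp + hostname) from the CEF message.
syslog_prefix, cef_message = raw.split("CEF:", 1)
fields = ("CEF:" + cef_message).split("|")

version, vendor, product, dev_version, sig_id, name, severity, extension = fields
# The extension is written as key=value pairs separated by spaces.
ext = dict(pair.split("=") for pair in extension.split())

print(name)        # worm successfully stopped
print(ext["src"])  # 10.0.0.2
```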
Log
A record of events that occur within an organization's systems.
Key-value Pairs
A set of data that represents two linked items: a key and its corresponding value. A key-value pair consists of a key followed by a colon, and then followed by a value. An example of a key-value pair is "Alert": "Malware". Note: For readability, it is recommended that key-value pairs contain a space before or after the colon that separates the key and value.
Signature-based Analysis
A signature is a pattern that is associated with malicious activity. Signatures can contain specific patterns like a sequence of binary numbers, bytes, or even specific data like an IP address. Previously, you explored the Pyramid of Pain, which is a concept that prioritizes the different types of indicators of compromise (IoCs) associated with an attack or threat, such as IP addresses, tools, tactics, techniques, and more. IoCs and other indicators of attack can be useful for creating targeted signatures to detect and block attacks. Different types of signatures can be used depending on which type of threat or attack you want to detect. For example, an anti-malware signature contains patterns associated with malware. This can include malicious scripts that are used by the malware. IDS tools will monitor an environment for events that match the patterns defined in this malware signature. If an event matches the signature, the event gets logged and an alert is generated.
Advantages:
Ability to detect new and evolving threats: Unlike signature-based analysis, which uses known patterns to detect threats, anomaly-based analysis can detect unknown threats.
Analyze Data
After the data is collected and normalized, SIEM tools analyze and correlate the data to identify common patterns that indicate unusual activity.
Suricata Log Types
Alert logs
Network telemetry logs
Log Protection
Along with management and retention, the protection of logs is vital in maintaining log integrity. It's not unusual for malicious actors to modify logs in attempts to mislead security teams and to even hide their activity. Storing logs in a centralized log server is a way to maintain log integrity. When logs are generated, they get sent to a dedicated server instead of getting stored on a local machine. This makes it more difficult for attackers to access logs because there is a barrier between the attacker and the log location.
Custom Rules
Although Suricata comes with pre-written rules, it is highly recommended that you modify or customize the existing rules to meet your specific security requirements. There is no one-size-fits-all approach to creating and modifying rules. This is because each organization's IT infrastructure differs. Security teams must extensively test and modify detection signatures according to their needs. Creating custom rules helps to tailor detection and monitoring. Custom rules help to minimize the amount of false positive alerts that security teams receive. It's important to develop the ability to write effective and customized signatures so that you can fully leverage the power of detection technologies.
Security Information and Event Management (SIEM)
An application that collects and analyzes log data to monitor critical activities in an organization. First, SIEM tools collect and process enormous amounts of data generated by devices and systems from all over an environment. Not all data is the same. As you already know, devices generate data in different formats. This can be challenging because there is no unified format to represent the data. SIEM tools make it easy for security analysts to read and analyze data by normalizing it. Raw data gets processed so that it's formatted consistently and only relevant event information is included. Finally, SIEM tools index the data so it can be accessed through search. All of the events across all the different sources can be accessed at your fingertips.
Network-based Intrusion Detection System
An application that collects and monitors network traffic and network data. NIDS software is installed on devices located at specific parts of the network that you want to monitor. The NIDS application inspects network traffic from different devices on the network. If any malicious network traffic is detected, the NIDS logs it and generates an alert. Using a combination of HIDS and NIDS to monitor an environment can provide a multi-layered approach to intrusion detection and response. HIDS and NIDS tools provide a different perspective on the activity occurring on a network and the individual hosts that are connected to it. This helps provide a comprehensive view of the activity happening in an environment.
Intrusion Detection System (IDS)
An application that monitors activity and alerts on possible intrusions.
Host-based Intrusion Detection System
An application that monitors the activity of the host on which it's installed. A HIDS is installed as an agent on a host. A host is also known as an endpoint, which is any device connected to a network like a computer or a server. Typically, HIDS agents are installed on all endpoints and used to monitor and detect security threats. A HIDS monitors internal activity happening on the host to identify any unauthorized or abnormal behavior. If anything unusual is detected, such as the installation of an unauthorized application, the HIDS logs it and sends out an alert. In addition to monitoring inbound and outbound traffic flows, HIDS can have additional capabilities, such as monitoring file systems, system resource usage, user activity, and more.
Endpoint
Any device connected to a network.
Square Brackets
Are used to enclose an array, which is a data type that stores data in a comma-separated ordered list. Arrays are useful when you want to store data as an ordered collection, for example: ["Administrators", "Users", "Engineering"].
Suricata Configuration Files
Before detection tools are deployed and can begin monitoring systems and networks, you must properly configure their settings so that they know what to do. A configuration file is a file used to configure the settings of an application. Configuration files let you customize exactly how you want your IDS to interact with the rest of your environment. Suricata's configuration file is suricata.yaml, which uses the YAML file format for syntax and structure.
Comma Separated Values (CSV)
CSV (Comma Separated Values) uses commas to separate data values. In CSV logs, the position of the data corresponds to its field name, but the field names themselves might not be included in the log. It's critical to understand what fields the source device (like an IPS, firewall, scanner, etc.) is including in the log. Here is an example:

2009-11-24T21:27:09.534255,ALERT,192.168.2.7,1041,x.x.250.50,80,TCP,ALLOWED,1:2001999:9,"ET MALWARE BTGrab.com Spyware Downloading Ads",1
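Since the field names may not appear in the log itself, a parser has to supply them. Here is a hedged Python sketch that maps assumed field names (hypothetical, based on what such a device might document) onto the example entry above:

```python
import csv
import io

# Because CSV logs often omit a header row, the field names below are
# assumed based on what a hypothetical source device might document.
field_names = ["timestamp", "action", "src_ip", "src_port", "dst_ip",
               "dst_port", "protocol", "verdict", "sig_id", "message", "count"]

log_line = ('2009-11-24T21:27:09.534255,ALERT,192.168.2.7,1041,x.x.250.50,80,'
            'TCP,ALLOWED,1:2001999:9,'
            '"ET MALWARE BTGrab.com Spyware Downloading Ads",1')

# csv.reader correctly handles the quoted message field, which itself
# contains no commas here but could in other entries.
row = next(csv.reader(io.StringIO(log_line)))
event = dict(zip(field_names, row))
print(event["action"])   # ALERT
print(event["message"])  # ET MALWARE BTGrab.com Spyware Downloading Ads
```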
Commas
Commas are used to separate data. For example: "Alert": "Malware", "Alert code": 1090, "severity": 10.
Log Ingestion
Data is required for SIEM tools to work effectively. SIEM tools must first collect data using log ingestion. Log ingestion is the process of collecting and importing data from log sources into a SIEM tool. Data comes from any source that generates log data, like a server. In log ingestion, the SIEM creates a copy of the event data it receives and retains it within its own storage. This copy allows the SIEM to analyze and process the data without directly modifying the original source logs. The collection of event data provides a centralized platform for security analysts to analyze the data and respond to incidents. This event data includes authentication attempts, network activity, and more.
Log Details
Date
Time
Location
Action
Names

Here is an example of an authentication log:

Login Event [05:45:15] User1 Authenticated successfully

Logs contain information and can be adjusted to contain even more information. Verbose logging records additional, detailed information beyond the default log recording. Here is the same log recorded as verbose:

Login Event [2022/11/16 05:45:15.892673] auth_performer.cc:470 User1 Authenticated successfully from device1 (192.168.1.2)
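The difference between default and verbose output can be sketched with Python's standard logging module. The formats, source file name, and line number below simply mirror the hypothetical example above:

```python
import logging

# A default-style format versus a verbose format that adds the
# source file and line number.
default_fmt = logging.Formatter("[%(asctime)s] %(message)s")
verbose_fmt = logging.Formatter("[%(asctime)s] %(filename)s:%(lineno)d %(message)s")

# Build a log record by hand so the example is self-contained; the
# file name and line number mirror the verbose example above.
record = logging.LogRecord(
    name="auth", level=logging.INFO, pathname="auth_performer.cc",
    lineno=470, msg="User1 Authenticated successfully", args=None, exc_info=None)

print(default_fmt.format(record))
print(verbose_fmt.format(record))  # includes auth_performer.cc:470
```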
Suricata Format Type
EVE JSON: Extensible Event Format JSON (JavaScript Object Notation)
Curly Brackets
Enclose an object, which is a data type that stores data in a comma-separated list of key-value pairs. Objects are often used to describe multiple properties for a given key. JSON log entries start and end with a curly bracket. In this example, "User" is the key whose object value contains multiple properties:

"User": {
  "id": "1234",
  "name": "user",
  "role": "engineer"
}
Normalize the Data
Event data that's been collected becomes normalized. Normalization converts data into a standard format so that data is structured in a consistent way and becomes easier to read and search. While data normalization is a common feature in many SIEM tools, it's important to note that SIEM tools vary in their data normalization capabilities.
Issues with Overlogging
From a security perspective, it can be tempting to log everything. This is the most common mistake organizations make. Just because it can be logged, doesn't mean it needs to be logged. Storing excessive amounts of logs can have many disadvantages with some SIEM tools. For example, overlogging can increase storage and maintenance costs. Additionally, overlogging can increase the load on systems, which can cause performance issues and affect usability, making it difficult to search for and identify important events.
Chronicle
Google Cloud's SIEM, which stores security data for search, analysis, and visualization. First, data gets forwarded to Chronicle. This data then gets normalized, or cleaned up, so it's easier to process and index. Finally, the data becomes available to be accessed through a search bar. In Chronicle, you can search for events using the Search field. You can also use Procedural Filtering to apply filters to a search to further refine the search results. For example, you can use Procedural Filtering to include or exclude search results that contain specific information relating to an event type or log source. There are two types of searches you can perform to find events in Chronicle: a Unified Data Model (UDM) Search or a Raw Log Search.
Syslog Log Example
Here is an example of a syslog entry that contains all three components: a header, followed by structured-data, and a message: <236>1 2022-03-21T01:11:11.003Z virtual.machine.com evntslog - ID01 [user@32473 iut="1" eventSource="Application" eventID="9999"] This is a log entry!
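As an illustration, the entry above can be split into its components with a minimal regular expression. This is a sketch, not a full RFC 5424 parser (it ignores escaping rules and other edge cases):

```python
import re

# The sample syslog entry from above: header, structured-data, message.
entry = ('<236>1 2022-03-21T01:11:11.003Z virtual.machine.com evntslog - ID01 '
         '[user@32473 iut="1" eventSource="Application" eventID="9999"] '
         'This is a log entry!')

# Capture, in order: PRI, version, timestamp, hostname, application,
# process ID, message ID, structured-data, and the free-form message.
match = re.match(
    r'<(\d+)>(\d+) (\S+) (\S+) (\S+) (\S+) (\S+) \[(.*?)\] (.*)', entry)
pri, version, timestamp, hostname, app, procid, msgid, structured, message = match.groups()

print(hostname)  # virtual.machine.com
print(message)   # This is a log entry!
```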
Disadvantages:
High rate of false positives: Any behavior that deviates from the baseline can be flagged as abnormal, including non-malicious behaviors. This leads to a high rate of false positives.

Pre-existing compromise: If an attacker is present during the training phase, their malicious behavior will be included in the baseline. This can lead to missing a pre-existing attacker.
Raw Log Search
If you can't find the information you are searching for through the normalized data, using a Raw Log Search will search through the raw, unparsed logs. You can perform a Raw Log Search by typing your search, clicking on "Search," and selecting "Raw Log Search." Because it is searching through raw logs, it takes longer than a structured search. In the Search field, you can perform a Raw Log Search by specifying information like usernames, filenames, hashes, and more. Chronicle will retrieve events that are associated with the search. Pro tip: Raw Log Search supports the use of regular expressions, which can help you narrow down a search to match on specific patterns.
UDM (Continued)
Know that all UDM events contain a set of common fields, including:

Entities: Entities are also known as nouns. All UDM events must contain at least one entity. This field provides additional context about a device, user, or process that's involved in an event. For example, a UDM event that contains entity information includes the details of the origin of an event, such as the hostname, the username, and the IP address of the event.

Event metadata: This field provides a basic description of an event, including what type of event it is, timestamps, and more.

Network metadata: This field provides information about network-related events and protocol details.

Security results: This field provides the security-related outcome of events. An example of a security result is antivirus software detecting and quarantining a malicious file by reporting "virus detected and quarantined."

Here's an example of a simple UDM search that uses the event metadata field to locate events relating to user logins:

metadata.event_type = "USER_LOGIN"

metadata.event_type = "USER_LOGIN": The UDM field metadata.event_type contains information about the event type. This includes information like timestamp, network connection, user authentication, and more. Here, the event type specifies USER_LOGIN, which searches for events relating to authentication.

Using just the metadata fields, you can quickly start searching for events. As you continue practicing searching in Chronicle using UDM Search, you will encounter more fields. Try using these fields to form specific searches to locate different events.
Advantages:
Low rate of false positives: Signature-based analysis is very efficient at detecting known threats because it is simply comparing activity to signatures. This leads to fewer false positives. Remember that a false positive is an alert that incorrectly detects the presence of a threat.
Log Types
Network - Network logs are generated by network devices like firewalls, routers, or switches.
System - System logs are generated by operating systems like Chrome OS™, Windows, Linux, or macOS®.
Application - Application logs are generated by software applications and contain information relating to the events occurring within the application, such as a smartphone app.
Security - Security logs are generated by various devices or systems such as antivirus software and intrusion detection systems. Security logs contain security-related information such as file deletion.
Authentication - Authentication logs are generated whenever authentication occurs, such as a successful login attempt into a computer.
Log Retention
Organizations might operate in industries with regulatory requirements. For example, some regulations require organizations to retain logs for set periods of time, and organizations can implement log retention practices in their log management policy. Organizations that operate in the following industries might need to modify their log management policy to meet regulatory requirements:

Public sector industries, which are subject to regulations like the Federal Information Security Modernization Act (FISMA)
Healthcare industries, which are subject to regulations like the Health Insurance Portability and Accountability Act of 1996 (HIPAA)
Financial services industries, which are subject to regulations such as the Payment Card Industry Data Security Standard (PCI DSS), the Gramm-Leach-Bliley Act (GLBA), and the Sarbanes-Oxley Act of 2002 (SOX)
Rules of Suricata
Rules, or signatures, are used to identify specific patterns, behaviors, and conditions of network traffic that might indicate malicious activity. The terms rule and signature are often used interchangeably in Suricata. Security analysts use signatures, or patterns associated with malicious activity, to detect and alert on specific malicious activity. Rules can also be used to provide additional context and visibility into systems and networks, helping to identify potential security threats or vulnerabilities.

Suricata uses signature analysis, which is a detection method used to find events of interest. Signatures consist of three components:

Action: The first component of a signature. It describes the action to take if network or system activity matches the signature. Examples include: alert, pass, drop, or reject.

Header: The header includes network traffic information like source and destination IP addresses, source and destination ports, protocol, and traffic direction.

Rule options: The rule options provide you with different options to customize signatures.

Note: Rule order refers to the order in which rules are evaluated by Suricata. Rules are processed in the order in which they are defined in the configuration file. However, Suricata processes rules in a different default order: pass, drop, reject, and alert. Rule order affects the final verdict of a packet, especially when conflicting actions such as a drop rule and an alert rule both match the same packet.
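Putting the three components together, a complete signature looks like the following sketch. This rule is hypothetical: the message text, content match, and sid value are made up for illustration:

```
alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"Hypothetical HTTP GET alert"; content:"GET"; http_method; sid:1000001; rev:1;)
```

Here, alert is the action; http $HOME_NET any -> $EXTERNAL_NET any is the header (protocol, source address and port, traffic direction, destination address and port); and everything inside the parentheses is the rule options. Custom rules commonly use sid values of 1000000 or above to avoid colliding with pre-written rulesets.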
Collect and Aggregate Data
SIEM tools collect event data from various data sources.
Pipes
SPL also uses the pipe character | to separate the individual commands in the search. It's also used to chain commands together so that the output of one command becomes the input of the next command. This is useful because you can refine data in various ways to get the results you need using a single search. Here is an example of two commands that are piped together:

index=main fail | chart count by host

index=main fail: This is the beginning of the search command that tells Splunk to retrieve events from an index named main for events containing the search term fail.
|: The pipe character separates and chains the two commands index=main fail and chart count by host. This means that the output of the first command is used as the input of the second command.
chart count by host: This command tells Splunk to transform the search results by creating a chart according to the count, or number, of events. The argument by host tells Splunk to list the events by host, which are the names of the devices the events come from. This command can be helpful in identifying hosts with excessive failure counts in an environment.
Disadvantages:
Signatures can be evaded: Signatures are unique, and attackers can modify their attack behaviors to bypass the signatures. For example, attackers can make slight modifications to malware code to alter its signature and avoid detection.

Signatures require updates: Signature-based analysis relies on a database of signatures to detect threats. Each time a new exploit or attack is discovered, new signatures must be created and added to the signature database.

Inability to detect unknown threats: Signature-based analysis relies on detecting known threats through signatures. Unknown threats, such as new malware families or zero-day attacks (exploits that were previously unknown), can't be detected.
Search Processing Language (SPL)
Splunk's query language. Here is an example of a basic SPL search that queries an index for a failed event:

index=main fail

index=main: This is the beginning of the search command that tells Splunk to retrieve events from an index named main. An index stores event data that's been collected and processed by Splunk.
fail: This is the search term. This tells Splunk to return any event that contains the term fail.

Knowing how to effectively use SPL has many benefits. It helps shorten the time it takes to return search results. It also helps you obtain the exact results you need from various data sources. SPL supports many different types of searches that are beyond the scope of this reading. If you would like to learn more about SPL, explore Splunk's Search Reference.
Syslog
Syslog is a standard for logging and transmitting data. It can be used to refer to any of its three different capabilities:

Protocol: The syslog protocol is used to transport logs to a centralized log server for log management. It uses port 514 for plaintext logs and port 6514 for encrypted logs.

Service: The syslog service acts as a log forwarding service that consolidates logs from multiple sources into a single location. The service works by receiving and then forwarding any syslog log entries to a remote server.

Log format: The syslog log format is one of the most commonly used log formats that you will be focusing on. It is the native logging format used in Unix® systems. It consists of three components: a header, structured-data, and a message.
UDM Search (Unified Data Model)
The UDM Search is the default search type used in Chronicle. You can perform a UDM search by typing your search, clicking on "Search," and selecting "UDM Search." Through a UDM Search, Chronicle searches security data that has been ingested, parsed, and normalized. A UDM Search retrieves search results faster than a Raw Log Search because it searches through indexed and structured data that's normalized in UDM. A UDM Search retrieves events formatted in UDM, and these events contain UDM fields. There are many different types of UDM fields that can be used to query for specific information from an event. Discussing all of these UDM fields is beyond the scope of this reading, but you can learn more about UDM fields by exploring Chronicle's UDM field list.
Telemetry
The collection and transmission of data for analysis.
Header
The header contains details like the timestamp; the hostname, which is the name of the machine that sends the log; the application name; and the message ID.

Timestamp: The timestamp in this example is 2022-03-21T01:11:11.003Z, where 2022-03-21 is the date in YYYY-MM-DD format. T separates the date and the time. 01:11:11.003 is the time in 24-hour format, including the milliseconds (003). Z indicates the timezone, which is Coordinated Universal Time (UTC).
Hostname: virtual.machine.com
Application: evntslog
Message ID: ID01
Message
The message contains a detailed log message about the event. Here, the message is: This is a log entry!.
What to Log
The most important aspect of log management is choosing what to log. Organizations are different, and their logging requirements can differ too. It's important to consider which log sources are most likely to contain the most useful information depending on your event of interest. This might involve configuring log sources to reduce the amount of data they record, such as excluding excessive verbosity. Some information, including but not limited to phone numbers, email addresses, and names, is personally identifiable information (PII), which requires special handling and, in some jurisdictions, might not be allowed to be logged.
Priority (PRI)
The priority (PRI) field indicates the urgency of the logged event and is contained within angle brackets. In this example, the priority value is <236>. Generally, the lower the priority level, the more urgent the event is. Note: Syslog headers can be combined with JSON and XML formats. Custom log formats also exist.
Log Management
The process of collecting, storing, analyzing, and disposing of log data.
Log Analysis
The process of examining logs to identify events of interest.
Structured-data
The structured-data portion of the log entry contains additional logging information. This information is enclosed in square brackets and structured in key-value pairs. Here, there are three keys with corresponding values: [user@32473 iut="1" eventSource="Application" eventID="9999"].
Log Forwarders
There are many ways SIEM tools can ingest log data. For instance, you can manually upload data or use software to help collect data for log ingestion. Manually uploading data may be inefficient and time-consuming because networks can contain thousands of systems and devices. Hence, it's easier to use software that helps collect data. A common way that organizations collect log data is to use log forwarders. Log forwarders are software that automate the process of collecting and sending log data. Some operating systems have native log forwarders. If you are using an operating system that does not have a native log forwarder, you would need to install a third-party log forwarding software on a device. After installing it, you'd configure the software to specify which logs to forward and where to send them. For example, you can configure the logs to be sent to a SIEM tool. The SIEM tool would then process and normalize the data. This allows the data to be easily searched, explored, correlated, and analyzed. Note: Many SIEM tools utilize their own proprietary log forwarders. SIEM tools can also integrate with open-source log forwarders. Choosing the right log forwarder depends on many factors such as the specific requirements of your system or organization, compatibility with your existing infrastructure, and more.
Suricata Features
There are three main ways Suricata can be used:

Intrusion detection system (IDS): As a network-based IDS, Suricata can monitor network traffic and alert on suspicious activities and intrusions. Suricata can also be set up as a host-based IDS to monitor the system and network activities of a single host, like a computer.

Intrusion prevention system (IPS): Suricata can also function as an intrusion prevention system (IPS) to detect and block malicious activity and traffic. Running Suricata in IPS mode requires additional configuration, such as enabling IPS mode.

Network security monitoring (NSM): In this mode, Suricata helps keep networks safe by producing and saving relevant network logs. Suricata can analyze live network traffic and existing packet capture files, and can create and save full or conditional packet captures. This can be useful for forensics, incident response, and testing signatures. For example, you can trigger an alert and capture the live network traffic to generate traffic logs, which you can then analyze to refine detection signatures.
Suricata Log Files
There are two log files that Suricata generates when alerts are triggered:

eve.json: The eve.json file is the standard Suricata log file. This file contains detailed information and metadata about the events and alerts generated by Suricata, stored in JSON format. For example, events in this file contain a unique identifier called flow_id, which is used to correlate related logs or alerts to a single network flow, making it easier to analyze network traffic. The eve.json file is used for more detailed analysis and is considered to be a better file format for log parsing and SIEM log ingestion.

fast.log: The fast.log file is used to record minimal alert information, including basic IP address and port details about the network traffic. The fast.log file is used for basic logging and alerting. It is considered a legacy file format and is not suitable for incident response or threat hunting tasks.

The main difference between the eve.json file and the fast.log file is the level of detail recorded in each. The fast.log file records basic information, whereas the eve.json file contains additional verbose information.
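Because eve.json stores one JSON object per line, it can be processed with ordinary JSON tooling. Here is a hedged Python sketch using two fabricated records (real eve.json events carry many more fields) to show how flow_id correlates events from the same network flow:

```python
import json

# Two fabricated eve.json-style records; real events contain many more fields.
eve_lines = [
    '{"flow_id": 1234, "event_type": "alert", "src_ip": "10.0.0.2", "dest_port": 80}',
    '{"flow_id": 1234, "event_type": "http", "src_ip": "10.0.0.2", "dest_port": 80}',
]

events = [json.loads(line) for line in eve_lines]

# Correlate records belonging to the same network flow via flow_id,
# then pick out just the alert events.
flow = [e for e in events if e["flow_id"] == 1234]
alerts = [e for e in flow if e["event_type"] == "alert"]
print(len(alerts))  # 1
```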
Logs Contain
Time stamps
System characteristics
Action
Double Quotes
Used to enclose text data, which is also known as a string, for example: "Alert": "Malware". Data that contains numbers is not enclosed in quotes, like this: "Alert code": 1090.
eXtensible Markup Language (XML)
XML (eXtensible Markup Language) is a language and a format used for storing and transmitting data. XML is a native file format used in Windows systems. XML syntax uses the following:
Tags
Elements
Attributes
Attributes
XML elements can also contain attributes. Attributes are used to provide additional information about elements. Attributes are included as the second part of the tag itself and must always be quoted using either single or double quotes. For example:

<EventData>
  <Data Name='SubjectUserSid'>S-2-3-11-160321</Data>
  <Data Name='SubjectUserName'>JSMITH</Data>
  <Data Name='SubjectDomainName'>ADCOMP</Data>
  <Data Name='SubjectLogonId'>0x1cf1c12</Data>
  <Data Name='NewProcessId'>0x1404</Data>
</EventData>

In the first <Data> line of this example, the attribute Name='SubjectUserSid' describes the data enclosed in the tag, S-2-3-11-160321.
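As an illustration, Python's standard xml.etree.ElementTree module can read both the attributes and the enclosed data from an entry like the one above (the snippet repeats two of the hypothetical <Data> elements):

```python
import xml.etree.ElementTree as ET

# Two of the hypothetical <Data> elements from the example above.
xml_entry = """
<EventData>
  <Data Name='SubjectUserSid'>S-2-3-11-160321</Data>
  <Data Name='SubjectUserName'>JSMITH</Data>
</EventData>
"""

root = ET.fromstring(xml_entry)
# Map each element's Name attribute to the text enclosed by its tags.
fields = {data.get("Name"): data.text for data in root.findall("Data")}
print(fields["SubjectUserName"])  # JSMITH
```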
Elements
XML elements include both the data contained inside of a tag and the tags themselves. All XML entries must contain at least one root element. Root elements contain other elements that sit underneath them, known as child elements. Here is an example:

<Event>
  <EventID>4688</EventID>
  <Version>5</Version>
</Event>

In this example, <Event> is the root element and contains two child elements, <EventID> and <Version>. There is data contained in each respective child element.
Tags
XML uses tags to store and identify data. Tags are pairs that must contain a start tag and an end tag. The start tag encloses data with angle brackets, for example <tag>, whereas the end of a tag encloses data with angle brackets and a forward slash like this: </tag>.
Wildcard
A special character that can be substituted with any other character. A wildcard is usually symbolized by an asterisk character (*). Wildcards match characters in string values. In Splunk, the wildcard that you use depends on the command that you are using it with. Wildcards are useful because they can help find events that contain data that is similar but not entirely identical. Here is an example of using a wildcard to expand the search results for a search term:

index=main fail*

index=main: This command retrieves events from an index named main.
fail*: The wildcard after fail represents any character. This tells Splunk to search for all possible endings that contain the term fail. This expands the search results to return any event that contains the term fail, such as "failed" or "failure".

Pro tip: Double quotations are used to specify a search for an exact phrase or string. For example, if you want to only search for events that contain the exact phrase login failure, you can enclose the phrase in double quotations: "login failure". This search will match only events that contain the exact phrase login failure, and not other events that contain the words failure or login separately.
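The general idea of wildcard matching can be sketched outside of Splunk with Python's fnmatch module. The event strings below are made up, and this illustrates the concept rather than SPL's exact matching behavior:

```python
import fnmatch

# Hypothetical event strings; the asterisk in "fail*" matches any run of
# characters (including none), so "fail", "failed ...", and "failure ..." all match.
events = ["failed login", "failure to connect", "login success", "fail"]
matches = [e for e in events if fnmatch.fnmatch(e, "fail*")]
print(matches)  # ['failed login', 'failure to connect', 'fail']
```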