10 - Data Privacy and Protection
Digital rights management (DRM): Authorized viewers
A DRM file can also be locked to a particular type of software running on a general computing host, such as a customized PDF viewer or video player that prevents copying by other applications running on the same device. These use the same cryptographic mechanisms as hardware players, building a hash value to identify each computer, but protecting the software against abuse by other programs installed on the computer can be difficult.
General Data Protection Regulation (GDPR)
A European Union regulation for the protection of personal data in the European Union. This regulation also regulates the flow of personal data outside the EU. Its main objective is to protect the privacy of citizens of the EU and unify the data protection rules of the EU's member nations. Processing by police and criminal justice authorities is covered by a separate law enforcement directive rather than by the GDPR, and national security activities fall outside its scope.
icacls command
A Windows command that displays or modifies discretionary access control lists (DACLs) on specified files, and applies stored DACLs to files in specified directories. If complex permissions have been configured, the output shows them as a comma-separated list of individual permissions. It can also be used to set and remove permissions. You can use it as part of a script (it can be invoked from PowerShell) to determine whether any permissions have been changed from a baseline configuration.
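The baseline check described above could be scripted along the following lines. This is a minimal Python sketch that does not call icacls itself. It assumes each access control entry (ACE) has already been captured as a string in the `PRINCIPAL:(code)(code)` form that icacls prints, with the file path stripped off. The function names `parse_ace` and `diff_against_baseline` are hypothetical.

```python
import re

# Matches one ACE string such as "BUILTIN\Users:(OI)(CI)(RX)".
# Assumption: the line holds only the principal and its parenthesized codes
# (the file path that icacls prints before the first ACE has been removed).
ACE_PATTERN = re.compile(r"^(?P<principal>[^:]+):(?P<tokens>(?:\([^)]*\))+)$")


def parse_ace(line: str) -> tuple[str, tuple[str, ...]]:
    """Split an ACE into its principal and a flat tuple of permission/inheritance codes."""
    match = ACE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized ACE: {line!r}")
    codes: list[str] = []
    for group in re.findall(r"\(([^)]*)\)", match.group("tokens")):
        # Complex permissions appear as a comma-separated list inside one pair of parentheses.
        codes.extend(code.strip() for code in group.split(","))
    return match.group("principal"), tuple(codes)


def diff_against_baseline(baseline: list[str], current: list[str]) -> dict[str, set]:
    """Report ACEs that were added to, or removed from, the baseline configuration."""
    base = {parse_ace(line) for line in baseline}
    now = {parse_ace(line) for line in current}
    return {"added": now - base, "removed": base - now}


if __name__ == "__main__":
    baseline = ["NT AUTHORITY\\SYSTEM:(I)(F)", "BUILTIN\\Users:(I)(RX)"]
    current = ["NT AUTHORITY\\SYSTEM:(I)(F)", "BUILTIN\\Users:(I)(M)"]
    # Users were upgraded from read and execute (RX) to modify (M), so both sets are non-empty.
    print(diff_against_baseline(baseline, current))
```

Comparing whole ACEs as sets ignores ACE order. Order matters for effective permissions, so a stricter check would compare the lists in sequence.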
Service level agreement (SLA)
A contractual agreement setting out the detailed terms under which a service is provided. This can include terms for security access controls and risk assessments plus processing requirements for confidential and private data.
Digital rights management (DRM): Authorized players
Content can be locked to a particular type of device, such as a games console or a TV from an authorized vendor. The device will use a cryptographic key to identify itself as an authenticated playback device. Internet access to a licensing server may be required so that the device can update its activation status, revoke compromised keys, and check that its firmware has not been tampered with.
Data classification
Entails analyzing the data that the organization retains, determining its importance and value, and then assigning it to a category.
icacls permission: F
Full Access
The Committee of Sponsoring Organizations of the Treadway Commission (COSO)
Provides guidance on a variety of governance-related topics including fraud, controls, finance, and ethics. This framework defines risk and related common terminology, lists key components of risk management strategies, and supplies direction and criteria for enhancing risk management practices.
icacls permission: R
Read-only access
Data Sovereignty
Refers to a jurisdiction preventing or restricting processing and storage from taking place on systems that do not physically reside within that jurisdiction. Compliance may demand certain concessions on your part, such as using location-specific storage facilities in a cloud service. For example, GDPR protections are extended to any EU citizen while they are within EU or EEA (European Economic Area) borders.
The Federal Information Security Management Act (FISMA)
Requires federal organizations to adopt information assurance controls. It mandates the documentation of system information, the use of risk assessment, the use of security controls, and the adoption of continuous monitoring.
Purpose Limitation
Personal data can only be collected and processed for a defined purpose to which the data subject has given explicit consent. Purpose limitation also restricts the ability to transfer data to third parties. Tracking consent statements and keeping data usage in compliance with the consent granted is a significant management task. In organizations that process large amounts of personal data, technical tools that perform tagging and cross-referencing of personal data records will be required.
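The tagging and cross-referencing idea can be pictured with a small Python sketch. Each record carries the set of purposes its data subject consented to, and processing is allowed only when the requested purpose is in that set. The class, the functions, and the purpose labels such as `"third_party_sharing"` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalRecord:
    """A personal data record tagged with the purposes its data subject consented to."""
    subject_id: str
    data: dict[str, str]
    consented_purposes: set[str] = field(default_factory=set)


def may_process(record: PersonalRecord, purpose: str) -> bool:
    """Allow processing only for a purpose covered by the recorded consent."""
    return purpose in record.consented_purposes


def records_for_purpose(records: list[PersonalRecord], purpose: str) -> list[PersonalRecord]:
    """Cross-reference a batch of records and keep only those consented for this purpose."""
    return [record for record in records if may_process(record, purpose)]


if __name__ == "__main__":
    alice = PersonalRecord("s1", {"email": "alice@example.com"}, {"order_fulfilment"})
    bob = PersonalRecord("s2", {"email": "bob@example.com"},
                         {"order_fulfilment", "third_party_sharing"})
    # Only records whose subjects consented to third-party sharing may be transferred.
    shareable = records_for_purpose([alice, bob], "third_party_sharing")
    print([record.subject_id for record in shareable])
```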
Digital rights management (DRM)
A family of technologies designed to mitigate the risks of customers and clients distributing unauthorized copies of content they have received. There are both hardware and software approaches.
DLP Data Define: Statistical/lexicon
A further refinement of partial document matching is to use machine learning to analyze a range of data sources. The policy engine scans a range of source documents and performs statistical analysis to create a "vocabulary" or lexicon of the way sensitive data, such as a patient's medical notes or a patent application, is written.
What is EDM?
An Exact Data Match (EDM) is a database of strings of actual private data converted to fingerprints through a hash process. A data loss prevention (DLP) policy enforcer can match these fingerprints in user documents and messages and take the appropriate enforcement action.
Linux permission: Read (r)
The ability to access and view the contents of a file or list the contents of a directory.
Linux permission: Execute (x)
The ability to run a script, program, or other software file, or the ability to access a directory, execute a file from that directory, or perform a task on that directory, such as file search.
Linux permission: Write (w)
The ability to save changes to a file, or create, rename, and delete files in a directory (also requires execute).
DLP Remediation Mechanism: Alert only
The copying is allowed, but the management system records an incident and may alert an administrator.
Data classification: Confidential (or restricted)
The information is highly sensitive, for viewing only by approved persons within the organization (and possibly by trusted third parties under NDA).
Data classification: Secret
The information is too valuable to allow any risk of its capture. Viewing is severely restricted.
DLP Remediation Mechanism: Tombstone
The original file is quarantined and replaced with one describing the policy violation and how the user can release it again.
Data Minimization
The principle that data should only be processed and stored if that is necessary to perform the purpose for which it is collected. To prove compliance with the principle of data minimization, each process that uses personal data should be documented. The workflow can supply evidence of why processing and storage of a particular field or data point is required. Data minimization also affects the data retention policy: it is necessary to track how long a data point has been stored since it was collected and whether continued retention supports a legitimate processing function. Another impact is on test environments, where the principle forbids the use of real data records.
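The retention tracking described above can be sketched as a periodic review. Each stored data point records when it was collected and which documented purpose justifies it. A data point is flagged if it has no documented purpose, or if it has been held longer than that purpose allows. The `DataPoint` type, the `flag_for_review` function, and the per-purpose limits are hypothetical illustrations.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class DataPoint:
    field_name: str
    collected_on: date
    purpose: Optional[str]  # documented processing purpose, or None if none is recorded


def flag_for_review(points: list[DataPoint],
                    retention_limits: dict[str, timedelta],
                    today: date) -> list[DataPoint]:
    """Return data points whose continued retention is not supported by a documented purpose."""
    flagged = []
    for point in points:
        # A missing purpose, or a purpose with no documented limit, cannot justify retention.
        limit = retention_limits.get(point.purpose) if point.purpose else None
        if limit is None or today - point.collected_on > limit:
            flagged.append(point)
    return flagged


if __name__ == "__main__":
    limits = {"billing": timedelta(days=365)}  # hypothetical retention limit
    points = [
        DataPoint("card_last4", date(2023, 6, 1), "billing"),
        DataPoint("card_last4", date(2022, 1, 1), "billing"),
        DataPoint("shoe_size", date(2023, 12, 1), None),
    ]
    for point in flag_for_review(points, limits, date(2024, 1, 1)):
        print(point)
```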
Deidentification Controls
The process used to prevent someone's personal identity from being revealed. For example, data produced during human subject research might be deidentified to preserve the privacy of research participants. Biological data may be deidentified in order to comply with HIPAA regulations, which define and stipulate patient privacy requirements.
DLP Remediation Mechanism: Block
The user is prevented from copying the original file but retains access to it. The user may or may not be alerted to the policy violation, but it will be logged as an incident by the management engine.
Data classification: Unclassified (public)
There are no restrictions on viewing the data. Presents no risk to an organization if it is disclosed but does present a risk if it is modified or not available.
Geographic Access Requirements
This falls into two different scenarios. First, storage locations might have to be carefully selected to mitigate data sovereignty issues. Most cloud providers allow a choice of data centers for processing and storage, ensuring that information is not illegally transferred from a particular privacy jurisdiction without consent. Second, employees might need access from multiple geographic locations. Cloud-based file and database services can apply constraint-based access controls to validate the user's geographic location before authorizing access.
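A constraint-based location check can be reduced to a deny-by-default lookup. The following is a minimal Python sketch that assumes the client's country code has already been resolved by a trusted geolocation service. The policy table, the resource names, and the `authorize` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str
    resource: str
    country_code: str  # assumed to come from a trusted geolocation lookup of the client


# Hypothetical policy: ISO 3166-1 alpha-2 country codes permitted for each resource.
GEO_POLICY: dict[str, frozenset[str]] = {
    "hr-records": frozenset({"DE", "FR", "IE"}),
    "sales-reports": frozenset({"DE", "FR", "IE", "GB", "US"}),
}


def authorize(request: AccessRequest) -> bool:
    """Grant access only if the request originates from a permitted location."""
    # Deny by default: a resource with no policy entry has no permitted locations.
    allowed = GEO_POLICY.get(request.resource, frozenset())
    return request.country_code in allowed


if __name__ == "__main__":
    print(authorize(AccessRequest("alice", "hr-records", "DE")))
    print(authorize(AccessRequest("alice", "hr-records", "US")))
```

This check is only as trustworthy as the location signal it receives. A real service would combine it with authentication and other conditions rather than rely on it alone.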
Data classification: Top-Secret
This is the highest level of classification.
Encryption: Data in Use
This is the state when data is present in volatile memory, such as system RAM or CPU registers and cache. Examples of types of data that may be in use include documents open in a word processing application, database data that is currently being modified, event logs being generated while an operating system is running, and more. When a user works with data, that data usually needs to be decrypted as it goes from at rest to in use. The data may stay decrypted for an entire work session, which puts it at risk. Secure processing mechanisms such as Intel Software Guard Extensions (software.intel.com/en-us/sgx/details) are able to encrypt data as it exists in memory, so that an untrusted process cannot decode the information. This uses a secure enclave and requires a hardware root of trust.
Encryption: Data in Transit (or Data in Motion)
This is the state when data is transmitted over a network. Examples of types of data that may be in transit include website traffic, remote access traffic, data being synchronized between cloud repositories, and more. In this state, data can be protected by a transport encryption protocol, such as TLS or IPsec.
Data Ownership: Data Custodian
This role handles managing the system on which the data assets are stored. This includes responsibility for enforcing access control, encryption, and backup/recovery measures.
Data Ownership: Data Steward
This role is primarily responsible for data quality. This involves tasks such as ensuring data is labeled and identified with appropriate metadata and that data is collected and stored in a format and with values that comply with applicable laws and regulations.
Data Ownership: Privacy officer
This role is responsible for oversight of any PII/SPI/PHI assets managed by the company. Ensures that the processing and disclosure of private/personal data complies with legal and regulatory frameworks, such as purpose limitation/consent, data minimization, data sovereignty, and data retention.
Encryption: Data at Rest
This state means that the data is in persistent storage media. Examples of types of data that may be at rest include financial information stored in databases, archived audiovisual media, operational policies and other management documents, system configuration data, and more. In this state, it is usually possible to encrypt the data, using techniques such as whole disk encryption, database encryption, and file- or folder-level encryption.
Data Loss Prevention (DLP): Policy server
To configure classification, confidentiality, and privacy rules and policies, log incidents, and compile reports.
Data Loss Prevention (DLP): Endpoint agents
To enforce policy on client computers, even when they are not connected to the network.
Data Loss Prevention (DLP): Network agents
To scan communications at network borders and interface with web and messaging servers to enforce policy.
Data sharing and use agreement
Under privacy regulations such as GDPR or HIPAA, personal data can only be collected for a specific purpose. Datasets can be subject to pseudonymization or deidentification to remove personal data, but there is a risk of reidentification if they are combined with other data sources. A data sharing and use agreement is a legal means of mitigating this risk. It can specify terms for the way a dataset can be analyzed and proscribe the use of reidentification techniques.
What is the process for reidentifying tokenized data?
Use the token server to look up the original value of the token.
Data classification: Classified (private/internal use only/official use only)
Viewing is restricted to authorized persons within the owner organization or to third parties under a non-disclosure agreement (NDA).
Watermarking
When the file is provisioned to the customer, the content server embeds a watermark. This could be a visible watermark using an identifying feature of the customer, or it could be a digital watermark, also called a forensic watermark, encoded in the file. A digital watermark can defeat attempts at removal by cropping pages or images in the file. If the file is subsequently misused (by posting it to a file sharing site or reusing commercial photography on a different website, for instance), a search tool can locate it, and the copyright owner can attempt enforcement action.
icacls permission: W
Write access
Linux user: o
all other users / world
Deidentification: Aggregation/Banding
Another deidentification technique is to generalize the data, such as substituting a specific age with a broader age band.
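Banding is simple enough to show directly. This Python sketch replaces an exact age with the fixed-width band that contains it. The function name `age_band` and the default band width of 10 years are illustrative choices, not requirements.

```python
def age_band(age: int, width: int = 10) -> str:
    """Replace an exact age with the band of `width` years that contains it."""
    if age < 0 or width <= 0:
        raise ValueError("age must be non-negative and width must be positive")
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


if __name__ == "__main__":
    # Hypothetical records: the exact age is generalized before the dataset is shared.
    records = [{"id": 1, "age": 37}, {"id": 2, "age": 62}]
    print([{**record, "age": age_band(record["age"])} for record in records])
```

Wider bands give stronger privacy but less analytical precision. The width is a tradeoff chosen per dataset.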
The Health Insurance Portability and Accountability Act (HIPAA)
Establishes several rules and regulations regarding healthcare in the United States. With the rise of electronic medical records, these standards have been implemented to protect the privacy of patient medical information through restricted access to medical records and regulations for sharing medical records. Visit hhs.gov for more information.
True or false? Public information does not have any required security attributes.
False—while confidentiality is not an issue for publicly available information, integrity and availability are.
Data Retention: Short Term
Files and records that change frequently might need to be retained for version control. Short-term retention is also important in recovering from security incidents. Consider the scenario where a backup is made on Monday, a file is infected with a virus on Tuesday, and when that file is backed up later on Tuesday, the copy made on Monday is overwritten. This means that there is no good means of restoring the uninfected file. The short-term retention period is determined by how often the youngest media sets are overwritten.
Linux user: g
Group Account
Non-disclosure agreement (NDA)
Legal basis for protecting information assets. Used between companies and employees, between companies and contractors, and between two companies. If the employee or contractor breaks this agreement and does share such information, they may face legal consequences. Useful because they deter employees and contractors from violating the trust that an employer places in them.
chown command
Linux command used to change the owner (and optionally the group) of a file or directory.
What is the effect of the following command: chmod 644 sql.log
Sets read and write permission for the owner and read permission for group and world on the file sql.log.
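To see how each octal digit expands, the following Python sketch converts a three-digit mode such as 644 into rwx notation for owner, group, and world. It then cross-checks the result against the standard library's `stat.filemode`. The helper name `octal_to_symbolic` is hypothetical.

```python
import stat


def octal_to_symbolic(mode: str) -> str:
    """Expand a three-digit octal permission string (e.g. "644") into rwx notation."""
    if len(mode) != 3 or any(digit not in "01234567" for digit in mode):
        raise ValueError("expected three octal digits")
    out = []
    for digit in mode:  # owner (u), then group (g), then world (o)
        bits = int(digit)
        out.append("r" if bits & 4 else "-")  # 4 = read
        out.append("w" if bits & 2 else "-")  # 2 = write
        out.append("x" if bits & 1 else "-")  # 1 = execute
    return "".join(out)


if __name__ == "__main__":
    print(octal_to_symbolic("644"))
    # stat.filemode adds a leading file-type character ("-" for a regular file).
    print(stat.filemode(stat.S_IFREG | 0o644))
```

From Python, `os.chmod("sql.log", 0o644)` would apply the same permissions as the shell command.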
icacls permission: RX
Read and execute access
DLP Data Define: Policy Template
Contains dictionaries optimized for data points in a regulatory or legislative schema. A DLP solution will contain a number of templates designed for regulations such as HIPAA and GDPR. For example, Microsoft's US PII template can match Individual Taxpayer Identification Numbers (ITINs), Social Security Numbers (SSNs), and passport numbers.
icacls permission: M
Modify access
Data Loss Prevention (DLP)
A method of inspecting data and keeping sensitive data from leaving the allowed perimeter. Network-based DLP is concerned with data passing over some kind of perimeter gateway device, such as through email, instant messaging, and Web 2.0 applications. Key features: It is configurable with automated remediation. From a financial perspective, this can significantly reduce the expense associated with remediation. Automated remediation may differ depending on the kind of activity involved. For instance, when an email violates policy, the remediation options might include encrypting, quarantining, or blocking the message and/or notifying the sender. The majority of these functions could also be provided by a protected email product. It is able to transfer data to a safe location if the data is found in an unprotected area. It removes the need for manual user lookups by integrating with an LDAP server or Active Directory. This feature is common among all DLP vendors.
DLP Data Define: Classification
A rule might be based on a confidentiality classification tag or label attached to the data. Data could be tagged manually or using automated detection tools. The DLP solution might support other types of label, such as discrete data type, retention policy, and so on.
Data Ownership: Data Owner
A senior (executive) role with ultimate responsibility for maintaining the confidentiality, integrity, and availability of the information asset. Responsible for labeling the asset (such as determining who should have access and determining the asset's criticality and sensitivity) and ensuring that it is protected with appropriate controls (access control, backup, retention, and so forth). Typically selects a steward and custodian and directs their actions and sets the budget and resource allocation for sufficient controls.
DLP Data Define: Dictionary
A set of patterns that should be matched. Terms could be made from keywords or regex pattern matches. The use of patterns can be tuned within a rule to reduce false positives. For example, you might require a minimum number of instances of a pattern, look for the incidence of two or more patterns in close proximity, or define patterns with different confidence levels.
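The tuning options above can be illustrated with a small Python sketch. It shows one regex dictionary entry (an SSN-like `###-##-####` pattern), a minimum instance count, and a keyword-proximity check that raises the confidence level. The patterns, the thresholds, and the `evaluate` function are hypothetical. Real DLP products have their own rule syntax.

```python
import re

# Hypothetical dictionary entry: an SSN-like pattern (###-##-####) on word boundaries.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Supporting keyword; a nearby keyword raises confidence that a match really is an SSN.
KEYWORD_PATTERN = re.compile(r"\b(?:SSN|social security)\b", re.IGNORECASE)


def evaluate(text: str, min_instances: int = 2, proximity: int = 50) -> str:
    """Classify text as 'no match', 'low confidence', or 'high confidence'."""
    matches = list(SSN_PATTERN.finditer(text))
    if len(matches) < min_instances:
        return "no match"  # too few instances to trigger the rule
    keyword_positions = [k.start() for k in KEYWORD_PATTERN.finditer(text)]
    # High confidence only if some pattern match lies within `proximity` characters of a keyword.
    near = any(abs(m.start() - pos) <= proximity for m in matches for pos in keyword_positions)
    return "high confidence" if near else "low confidence"


if __name__ == "__main__":
    print(evaluate("SSN list: 123-45-6789, 987-65-4321"))
```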
Protected Health Information (PHI)
A type of sensitive information regulated by the Health Insurance Portability and Accountability Act (HIPAA). HIPAA lists 18 identifiers that link health information to an individual: names; dates; phone numbers; geographic data; fax numbers; Social Security numbers; email addresses; medical record numbers; account numbers; health plan beneficiary numbers; certificate/license numbers; vehicle identifiers and serial numbers, including license plates; web URLs; device identifiers and serial numbers; Internet Protocol (IP) addresses; full-face photos and comparable images; biometric identifiers; and any other unique identifying number or code.
DLP Data Define: Document Matching
A whole document can be matched using a fingerprint, but it is quite easy to modify a file so that it no longer matches the fingerprint. To compensate for this risk, partial document matching creates a series of hashes for overlapping parts of the document. These hashes can match content that has been copied from the document or used in a different order in another file.
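Partial document matching can be sketched with word "shingles". The sketch hashes every overlapping run of k words in the protected document and measures what fraction of those hashes reappear in a candidate file. Copied passages still match even if they are moved to a different position. The five-word window, SHA-256, and the function names are assumptions for illustration. A DLP product's actual fingerprinting scheme will differ.

```python
import hashlib


def shingle_hashes(text: str, k: int = 5) -> set[str]:
    """Hash every overlapping run of k words (a "shingle") in the text."""
    words = text.lower().split()
    if len(words) < k:
        windows = [words] if words else []  # short text: treat it as a single shingle
    else:
        windows = [words[i:i + k] for i in range(len(words) - k + 1)]
    return {hashlib.sha256(" ".join(window).encode()).hexdigest() for window in windows}


def containment(source: str, candidate: str, k: int = 5) -> float:
    """Fraction of the source document's shingles that also appear in the candidate."""
    source_hashes = shingle_hashes(source, k)
    if not source_hashes:
        return 0.0
    return len(source_hashes & shingle_hashes(candidate, k)) / len(source_hashes)


if __name__ == "__main__":
    protected = "the quarterly results will not be disclosed before the board meeting on friday"
    leaked = "fyi: the quarterly results will not be disclosed before the board meeting"
    print(round(containment(protected, leaked), 2))
```

A policy would then alert or block when the containment score exceeds some threshold. That threshold would also be a tuning choice.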
DLP Remediation Mechanism: Quarantine
Access to the original file is denied to the user (or possibly any user). This might be accomplished by encrypting the file in place or by moving it to a quarantine area in the file system.
icacls permission: N
No Access
Access Controls
Can be applied to any type of data or software resource but is most strongly associated with network, file system, and database security. With file system security, each object in the file system has an ACL associated with it. The ACL contains a list of accounts (principals) allowed to access the resource and the permissions they have over it. Each record in the ACL is called an access control entry (ACE). The order of ACEs in the ACL is important in determining effective permissions for a given account. ACLs can be enforced by a file system that supports permissions, such as NTFS, ext3/ext4, or ZFS.
Deidentification: Data Masking
Can mean that all or part of the contents of a field are redacted, for example by substituting each character with "x". A field might be partially redacted to preserve metadata for analysis purposes. For example, in a telephone number, the dialing prefix might be retained but the subscriber number redacted. Masking can also use techniques that preserve the original format of the field. It is an irreversible deidentification technique.
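The telephone number example can be sketched as a format-preserving partial mask. The function below keeps the first few digits (treated here as the dialing prefix) and separators, and replaces every remaining digit with "x". The name `mask_phone` and the default of keeping three digits are hypothetical choices. Real prefix lengths vary by numbering plan.

```python
def mask_phone(number: str, keep_prefix: int = 3, mask_char: str = "x") -> str:
    """Redact the subscriber part of a phone number while keeping its prefix and format."""
    out = []
    digits_seen = 0
    for ch in number:
        if ch.isdigit():
            digits_seen += 1
            out.append(ch if digits_seen <= keep_prefix else mask_char)
        else:
            out.append(ch)  # keep separators so the field's original format is preserved
    return "".join(out)


if __name__ == "__main__":
    print(mask_phone("555-867-5309"))
```

The masked digits are discarded rather than stored anywhere, which is why masking cannot be reversed.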
Data Life Cycle
The stages are: classification of data as it is created or collected; security of data as it is stored, including access controls and backup/recovery procedures; management of data as it is distributed to data consumers; and retention or destruction of data.
Data Retention: Long Term
Data may need to be stored to meet legal requirements or to follow company policies or industry standards. Any data that must be retained in a version past the oldest sets should be moved to archive storage.
Which two non-technical controls for data privacy and protection have been omitted from the following list? Classification, ownership, retention, data types, retention standards, confidentiality, legal requirements, data minimization, non-disclosure agreement (NDA).
Data sovereignty and purpose limitation. Data sovereignty refers to a jurisdiction preventing or restricting processing and storage from taking place on systems that do not physically reside within that jurisdiction. Purpose limitation means that private/personal data can only be collected for a defined purpose to which the data subject gives explicit consent.
Sensitive Personal Information (SPI)
Personal information that may cause harm to the individual if it is made public. Includes personal information that reveals: a consumer's Social Security, driver's license, state identification card, or passport number; account log-in, financial account, debit card, or credit card numbers in combination with any required security or access code, password, or credentials allowing access to an account; precise geolocation; racial or ethnic origin, religious or philosophical beliefs, or union membership; and the contents of a consumer's mail, email, and text messages, unless the business is the intended recipient of the communication.
Interconnection security agreement (ISA)
Defined by NIST SP 800-47. Any federal agency interconnecting its IT system to a third party must create an ISA to govern the relationship. It sets out a security risk awareness process and commits the agency and supplier to implementing security controls.
icacls permission: D
Delete access
The Sarbanes-Oxley Act (SOX)
Dictates requirements for the storage and retention of documents relating to an organization's financial and business operations, including the type of documents to be stored and their retention periods. It is relevant for any publicly traded company with a market value of at least $75 million.
Personally Identifiable Information (PII)
Information which can be used to distinguish or trace an individual's identity, such as their name, social security number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother's maiden name, etc.
The Gramm-Leach-Bliley Act (GLBA)
Institutes requirements that help protect the privacy of an individual's financial information that is held by financial institutions and others, such as tax preparation companies. The privacy standards and rules created under the act safeguard private information and set penalties in the event of a violation. GLBA also requires a coherent risk management and information security process.
Deidentification: Reidentification
It is important to note that given sufficient contextual information, a data subject can be reidentified, so great care must be taken when applying deidentification algorithms to data distributed to different sources. A reidentification attack combines a deidentified dataset with other data sources, such as public voter records, to discover the identities of data subjects. Running such an attack against your own data is one way to discover how secure the deidentification method used is.
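A linkage attack of this kind amounts to a join on quasi-identifiers. In this Python sketch, a "deidentified" dataset still carries ZIP code, birth date, and sex. Those fields are matched against a public list that also has names, and any unique match reveals the subject's diagnosis. All records, names, and field names below are hypothetical toy data.

```python
# Hypothetical toy data: a "deidentified" dataset that still carries quasi-identifiers.
deidentified = [
    {"zip": "10001", "birth_date": "1971-04-18", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "10002", "birth_date": "1985-09-02", "sex": "M", "diagnosis": "asthma"},
]
# Hypothetical public source (e.g. a voter roll) holding names plus the same attributes.
public_records = [
    {"name": "A. Example", "zip": "10001", "birth_date": "1971-04-18", "sex": "F"},
    {"name": "B. Sample", "zip": "10003", "birth_date": "1990-01-30", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link(deid_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for deidentified rows that match exactly one public record."""
    index: dict[tuple, list[str]] = {}
    for person in public_rows:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    reidentified = []
    for row in deid_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match reidentifies the data subject
            reidentified.append((names[0], row["diagnosis"]))
    return reidentified


if __name__ == "__main__":
    print(link(deidentified, public_records))
```

Generalizing the quasi-identifiers (for example, by banding dates or truncating ZIP codes) makes unique matches less likely. That is why techniques such as aggregation are often applied alongside removing direct identifiers.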
chmod command
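Linux command used to modify permissions. It can be used in symbolic mode or absolute (octal) mode. In symbolic mode, the command works as follows: chmod g+w,o-x home adds write permission for the group and removes execute permission for other users (world) on home.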
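Symbolic mode can be understood as bit operations on the numeric mode. This Python sketch applies a simplified symbolic specification, written as comma-separated clauses with no spaces such as `g+w,o-x`, to an octal mode. It deliberately supports only the u/g/o/a classes, the + - = operators, and r/w/x. Omitted class letters (which chmod resolves using the umask), X, setuid/setgid, and the sticky bit are not handled. The function name `apply_symbolic` is hypothetical.

```python
import re

# Permission bits for each class/permission pair, matching chmod's u/g/o and r/w/x letters.
BITS = {
    "u": {"r": 0o400, "w": 0o200, "x": 0o100},
    "g": {"r": 0o040, "w": 0o020, "x": 0o010},
    "o": {"r": 0o004, "w": 0o002, "x": 0o001},
}
CLAUSE = re.compile(r"^([ugoa]+)([+\-=])([rwx]*)$")


def apply_symbolic(mode: int, spec: str) -> int:
    """Apply a simplified chmod symbolic spec such as "g+w,o-x" to a numeric mode."""
    for clause in spec.split(","):
        match = CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        who, op, perms = match.groups()
        classes = "ugo" if "a" in who else who  # "a" means all three classes
        for cls in classes:
            bits = sum(BITS[cls][p] for p in set(perms))
            if op == "+":
                mode |= bits  # add the listed permissions
            elif op == "-":
                mode &= ~bits  # remove the listed permissions
            else:
                # "=" replaces that class's permissions with exactly the listed ones.
                mode = (mode & ~sum(BITS[cls].values())) | bits
    return mode


if __name__ == "__main__":
    print(oct(apply_symbolic(0o755, "g+w,o-x")))
```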
Data Retention Standards
Detailed policies must be created to show when and how to dispose of distinct types of data. It is important to include legal counsel in drawing up your organization's data retention policies, as not meeting requirements can bring about unwanted liability.
Deidentification: Tokenization
Means that all or part of data in a field is replaced with a randomly generated token. The token is stored with the original value on a token server or token vault, separate from the production database. An authorized query or app can retrieve the original value from the vault, if necessary, so tokenization is a reversible technique. Tokenization is used as a substitute for encryption, because from a regulatory perspective an encrypted field is the same value as the original data.
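A token vault can be pictured as a two-way lookup table kept apart from the production data. This Python sketch keeps the table in memory and generates tokens with the standard `secrets` module. A real token server would be a separate, access-controlled service. The class and method names are hypothetical.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a token server; a real vault would be a separate, hardened service."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to the same token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = secrets.token_hex(8)  # random, so it has no mathematical link to the value
        while token in self._token_to_value:  # guard against an (unlikely) collision
            token = secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Reidentification is a lookup; access control for authorized callers is omitted here.
        return self._token_to_value[token]


if __name__ == "__main__":
    vault = TokenVault()
    token = vault.tokenize("4111111111111111")  # the production database stores only the token
    print(token, vault.detokenize(token))
```

Reusing one token per value lets tokenized columns still be joined. The tradeoff is that anyone who can see the tokens can tell when two records share the same underlying value.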
California Consumer Privacy Act (CCPA)
Offers people living in the state of California a degree of control over the data they generate and how internet companies and other entities use it. Internet companies, law enforcement agencies and marketing firms have used data containing consumers' location, online habits and recent purchases to transform their business and policy decisions for a good part of the last two decades. Companies often collect, buy, sell, share and transfer consumer data with little to no regulation throughout the United States.
Secure Disposal of Data Policy
Once the retention period for data has expired, the data must be disposed of, typically using some sort of secure erase process. Secure data disposal procedures are also important when it comes to repurposing or releasing computer equipment, storage devices, and cloud storage services and sites.
Linux user: u
Owner User
DLP Data Define: Exact Data Match (EDM)
Pattern matching can generate large numbers of false positives. EDM instead uses a structured database of string values to match. Each record in the source can contain a number of fields. The source is then converted to an index, which uses hashed forms of the strings (fingerprints) so that they can be loaded to the policy engine without raising confidentiality or privacy issues. The rules engine can be configured to match as many fields as required from each indexed record. For example, a travel company might have a list of actual passport numbers of their customers. It is not appropriate to load these numbers into a DLP filter, so they could use EDM to match fingerprints of the numbers instead.
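The indexing and matching steps can be sketched in Python. Each source record is reduced to a set of salted SHA-256 fingerprints of its fields, so the index holds no plaintext. A document matches when enough fingerprints from a single record appear among the fingerprints of its words. The salt, the normalization rule (uppercase, whitespace removed), the word tokenizer, and the function names are all assumptions. Commercial EDM implementations define their own schemas and hashing.

```python
import hashlib
import re

SALT = b"hypothetical-per-deployment-salt"  # assumption: a secret salt shared with the policy engine


def fingerprint(value: str) -> str:
    """Normalize a field value and return its salted SHA-256 fingerprint."""
    normalized = re.sub(r"\s+", "", value).upper()
    return hashlib.sha256(SALT + normalized.encode()).hexdigest()


def build_index(records: list[dict[str, str]], fields: list[str]) -> list[set[str]]:
    """Convert source records into sets of fingerprints; the plaintext values are not kept."""
    return [{fingerprint(record[f]) for f in fields} for record in records]


def matches(document: str, index: list[set[str]], required_fields: int = 1) -> bool:
    """True if at least `required_fields` fingerprints from one indexed record occur in the document."""
    doc_prints = {fingerprint(token) for token in re.findall(r"[A-Za-z0-9]+", document)}
    return any(len(record & doc_prints) >= required_fields for record in index)


if __name__ == "__main__":
    customers = [{"passport": "X1234567", "surname": "Rivera"}]  # hypothetical source data
    index = build_index(customers, ["passport", "surname"])
    print(matches("Rivera, passport X1234567", index, required_fields=2))
```

Passport numbers have low entropy, so if the salt leaked, their fingerprints could be brute-forced. The salt needs the same protection as the source data.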