Design and Apply Data Security Technologies and Strategies
Encryption and Key Management
Security in the cloud would not be possible without encryption, and encrypted data is only as secure as its keys. Kerckhoffs's principle states that a cryptosystem should be secure even if everything about the system, except the key, is public knowledge. Keys should be assigned the highest classification level available in an organization and should be protected at all stages of their lifecycle. Here are some hints (a key-generation sketch follows the list):
- Create strong, random keys
- Store keys in a secure manner
- Use keys securely
- Only share keys using secure features
- Archive keys that are no longer used but might be needed to retrieve archived data
- Destroy keys that are no longer needed in a secure manner
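The first few hints can be shown briefly in code. Below is a minimal Python sketch, assuming the third-party cryptography package is installed, that creates a strong random key and uses it to encrypt and decrypt data:

```python
# Minimal sketch: create a strong, random key and use it securely.
# Assumes the third-party 'cryptography' package (pip install cryptography).
import secrets
from cryptography.fernet import Fernet

raw_key = secrets.token_bytes(32)    # 256 bits from a CSPRNG, never from random()

fernet_key = Fernet.generate_key()   # Fernet: AES-128-CBC plus an HMAC-SHA256 check
cipher = Fernet(fernet_key)

token = cipher.encrypt(b"sensitive data")
plaintext = cipher.decrypt(token)    # raises InvalidToken if the data was tampered with
```

Storage, sharing, archival, and destruction of keys are the harder parts in practice and usually belong in a KMS or HSM rather than in application code.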
Encryption in the cloud can be implemented at a variety of layers.
- Storage-level encryption: Data is encrypted by the CSP as it is written to storage, protecting the data if the physical devices are stolen. The CSP manages the keys and therefore may potentially be able to read the data.
- Volume-level encryption: Data is encrypted by the consumer as it is written to volumes connected to VMs. This protects the data from theft and prevents the CSP from reading the data, though an attacker can still read the data if they gain access to the instance.
- Object-level encryption: Objects are encrypted as they are written to storage, though the CSP may control the keys and may have access to the data. It is recommended that the customer implement their own encryption and key management (see the sketch after this list).
- File-level encryption: Data is encrypted by an application, such as a Microsoft Office or Adobe application. The keys are managed by the application, usually through a password that the user enters or automated through IRM.
- Application-level encryption: Data is encrypted by the application before it is written to storage. Many SaaS offerings provide a bring-your-own-key (BYOK) capability.
- Database-level encryption: The database file itself may be encrypted, or transparent encryption (provided by the database management system) can encrypt specific columns or the entire database. The keys are managed by the consumer.
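As a concrete illustration of customer-managed encryption at the object level, here is a hedged Python sketch; it again assumes the cryptography package, and the upload call is a hypothetical placeholder for a provider SDK:

```python
# Sketch: encrypt an object client-side so the CSP only ever stores ciphertext.
from cryptography.fernet import Fernet

def encrypt_object(key: bytes, path: str) -> bytes:
    with open(path, "rb") as f:
        return Fernet(key).encrypt(f.read())

key = Fernet.generate_key()                       # customer-managed key
ciphertext = encrypt_object(key, "report.pdf")
# upload("bucket", "report.pdf.enc", ciphertext)  # hypothetical provider call
```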
Hashing
Hashing, also known as one-way encryption, passes data of any length through a hash function that produces a fixed-length string of characters, or hash value. The same hash value will always be produced when passing in the same data. This is helpful for verifying the integrity of data, making sure that the data was not changed in any way.
Hashing can also be used to verify the source of a message. The sender calculates the hash of the data and encrypts the hash value with their private key. The receiver uses the sender's public key to decrypt the hash value and compares it to their own calculation of the hash value of the data.
In a cloud environment, hashing can be used to verify that data such as backups or emails has not been altered. File integrity monitoring also uses hashes to see if important files have been changed.
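A minimal standard-library Python sketch of integrity verification, with illustrative file names: the hash recorded when a backup is created is compared against a fresh calculation later.

```python
import hashlib

def sha256_of(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):  # stream large files
            digest.update(chunk)
    return digest.hexdigest()

baseline = sha256_of("backup.tar")            # recorded at backup time
assert sha256_of("backup.tar") == baseline    # re-checked later, e.g. by a FIM job
```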
Masking
Data masking is when specific data is hidden for specific use cases. An example is viewing a stored credit card on a website: only the last four digits are visible, even though the website needs the full credit card number to make purchases.
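A masking rule like this is often a simple display-layer transformation; a minimal Python sketch with an invented card number:

```python
def mask_card(number: str, visible: int = 4) -> str:
    digits = number.replace(" ", "").replace("-", "")
    return "*" * (len(digits) - visible) + digits[-visible:]

print(mask_card("4111 1111 1111 1234"))  # ************1234
```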
Tokenization
Tokenization is when sensitive data is represented by nonsensitive data. The most common use of tokenization is for credit card numbers. An organization securely passes the credit card information to a tokenization service; the service creates a token and passes it back to the organization. The organization can later present the token to the tokenization service to retrieve the sensitive data.
An implementation of tokenization may look like this (a minimal sketch follows the list):
- The user inputs sensitive information into an application
- The app securely sends the data to the tokenization service
- The sensitive data is securely stored in a database and a token is created that represents the data
- The token is then sent to the original app and is stored by the app
- When the sensitive data is needed, the app sends the token with authentication data to the tokenization service.
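The flow above can be reduced to a token vault. This Python sketch is illustrative only; a real tokenization service adds caller authentication, encryption of the vault, and a network boundary between itself and the app:

```python
import secrets

class TokenVault:
    """Toy token vault: maps random tokens to the sensitive values they replace."""
    def __init__(self):
        self._vault = {}                    # token -> sensitive value

    def tokenize(self, sensitive: str) -> str:
        token = secrets.token_urlsafe(16)   # random; reveals nothing about the data
        self._vault[token] = sensitive
        return token

    def detokenize(self, token: str) -> str:
        return self._vault[token]           # real services authenticate the caller

vault = TokenVault()
token = vault.tokenize("4111111111111234")  # the app stores only this token
original = vault.detokenize(token)          # presented back when the data is needed
```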
Data Loss Prevention (DLP)
Data loss prevention is a system that includes controls for detection, prevention, and correction. Detection identifies where sensitive data is stored and used. Prevention enforces storage and sharing policies on the sensitive data. Correction alerts on policy violations so they can be remediated.
A typical DLP system has three components:
- Discovery: helps an organization find, identify, organize, and inventory data. The tool will usually perform a network scan to find file shares, storage area networks, databases, and other storage locations (a toy discovery sketch follows the list).
- Monitoring: enables the organization to identify how the data is being used and to prevent unauthorized use. The monitoring can take place at different data states.
  - At-rest monitoring can spot policy violations such as sensitive data being stored in unauthorized locations. Monitoring can also take place within databases to detect sensitive data stored in inappropriate columns.
  - In-motion monitoring can spot sensitive data being moved across the network. The monitoring tools need to be placed at the right points within the network to see the data in transit; for example, if a user sets up a VPN tunnel to move data, a network monitoring solution cannot decrypt and inspect that traffic. Agents installed on the systems that store the data can prevent such a scenario.
  - In-use monitoring watches user endpoints to detect policy violations. This monitoring is done through an agent installed on the endpoint.
- Enforcement: creates alerts when policy violations have occurred. An agent installed on the endpoint can, for example, prevent a user from mounting a USB drive that could be used to exfiltrate data.
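At its core, the discovery component is pattern scanning across storage locations. The toy Python sketch below makes illustrative assumptions (the share path, file types, and single regex); real DLP engines add validation such as Luhn checks, many more patterns, and data fingerprinting:

```python
import re
from pathlib import Path

CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")  # roughly card-shaped numbers

def discover(root: str):
    for path in Path(root).rglob("*.txt"):            # illustrative: text files only
        text = path.read_text(errors="ignore")
        for match in CARD_PATTERN.finditer(text):
            yield path, match.group()

for location, hit in discover("/srv/fileshare"):      # hypothetical share path
    print(f"possible card number in {location}: {hit}")
```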
Deploying DLP within the cloud can be a challenge; in service models such as SaaS and PaaS, organizations lack the ability to install agents. A cost-benefit analysis should be performed to determine the right DLP solution.
Data Obfuscation
Data obfuscation is used when sensitive data is needed in a situation like testing an app: the tester should not be able to view the sensitive data but still needs realistic data to test with. There are multiple ways to perform data obfuscation (several are sketched in code after the list):
- Substitution: sensitive data is swapped out for other data, often randomly. Testers can use online data sets such as a random person generator.
- Shuffling: data is moved around. A tester can either shuffle the letters in a name, so Steve becomes Etsve, or shuffle data within a database, so Sally's address becomes Bob's.
- Value variance: performs mathematical changes to data. An algorithm can go through a data set and set a variance on each value, such as +/- 100.
- Deletion: replaces the data with null values, also known as nullification.
- Encryption: this method obfuscates the data but may not be very useful for testing. Homomorphic encryption is an emerging field that allows encrypted data to be processed without decryption.
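Several of these techniques are simple enough to sketch in a few lines of Python; the sample values are invented:

```python
import random

def substitute(_name: str) -> str:
    return random.choice(["Alex Doe", "Sam Roe", "Pat Poe"])  # swap in fake data

def shuffle_letters(name: str) -> str:
    letters = list(name)
    random.shuffle(letters)                                   # "Steve" -> "Etsve"
    return "".join(letters)

def vary(value: int, variance: int = 100) -> int:
    return value + random.randint(-variance, variance)        # value variance +/-100

print(substitute("Steve"), shuffle_letters("Steve"), vary(52000))
```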
Another form of obfuscation that can be used, especially in the cloud, is pseudo-anonymization or pseudonymization. This is when a dataset has sensitive data (such as PII) removed and replaced with an index value. The nonsensitive data can be uploaded and stored in the cloud without PII. When the data is retrieved, the index value is used to restore the PII.
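A hedged Python sketch of pseudonymization, with invented field names: the PII-to-index mapping stays on premises, and only the indexed record goes to the cloud.

```python
import uuid

lookup = {}   # index -> PII; this mapping never leaves the organization

def pseudonymize(record: dict) -> dict:
    index = str(uuid.uuid4())
    lookup[index] = {"name": record.pop("name"), "email": record.pop("email")}
    record["subject_id"] = index
    return record                                       # safe to store in the cloud

def reidentify(record: dict) -> dict:
    return {**record, **lookup[record["subject_id"]]}   # restore PII on retrieval

cloud_copy = pseudonymize({"name": "Sally", "email": "s@example.com", "score": 7})
```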
Data De-identification
Personal details can be used to identify people from data in two ways. The first is through direct identifiers, which are PII of individuals. The second is through indirect identifiers, where pieces of information are combined to identify an individual. Data de-identification is when these identifiers are anonymized so that individuals cannot be identified. Naturally, direct identifiers are simpler to anonymize than indirect identifiers.
Most privacy regulations require data anonymization for any PII that is used outside of production systems and environments.
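Direct identifiers can simply be removed or replaced, while one common way to handle indirect identifiers is generalization: coarsening values until combinations no longer single out a person. A hedged Python sketch with invented fields:

```python
def generalize(record: dict) -> dict:
    decade = (record["age"] // 10) * 10
    return {
        "age_band": f"{decade}-{decade + 9}",   # 37 -> "30-39"
        "zip_prefix": record["zip"][:3],        # keep only a coarse region
        "diagnosis": record["diagnosis"],       # non-identifying payload kept as-is
    }

print(generalize({"age": 37, "zip": "90210", "diagnosis": "flu"}))
```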