Recently at work, I came across an interesting method to handle encryption at scale called envelope encryption.
It increases security and eases the management of encryption keys. It is also a pattern highly recommended by PCI-DSS (the Payment Card Industry Data Security Standard, which governs credit card processing), and it results in much stronger data privacy and protection of Personally Identifiable Information (PII).
There are three states in which data can be encrypted:
- At Rest - On hardware storage such as a disk or your devices
- In Transit - While moving between locations, such as server to server through API calls
- In Use - While it's being processed by a server (a newer concept that is still being researched)
We will deal primarily with encryption at rest, for which envelope encryption is a popular recommended pattern.
So What is Envelope Encryption? 🤔
This is a type of encryption that involves encrypting your data with a Data Encryption Key (DEK), then encrypting the DEK with a Customer Master Key (CMK). You then store both the encrypted data and the encrypted DEK alongside each other in the database. This practice of using a wrapping key to encrypt data keys is known as envelope encryption.
As mentioned, there are two keys you need to understand before we look at how the encryption process takes place:
- Customer Master Key (CMK)
- Data Encryption Key (DEK)
Customer Master Keys/Root Keys/Key Encryption Keys (CMK)
These are symmetric keys used to encrypt, decrypt, and re-encrypt data. They can also generate Data Encryption Keys that you can use outside of the KMS. They should follow these practices:
- Access to them must be restricted to as few endpoints as possible
- Access to them should be secured through access control lists (ACLs)
- They must be stored in a secure location such as a KMS or a Hardware Security Module (HSM), to comply with FIPS 140-2
Systems like Google Cloud Key Management Service organize keys in a hierarchy.
Data Encryption Keys (DEK)
Data keys are encryption keys you can use to encrypt data, including large amounts of data and other data encryption keys. Unlike CMKs, which can't be downloaded, data keys are returned to you for use outside of the KMS. Some best practices for DEKs:
- Generate DEKs locally
- When stored, always ensure DEKs are encrypted at rest
- For easy access, store the DEK near the data that it encrypts
- Generate a new DEK every time you write the data. This means you don't need to rotate the DEKs
- Do not use the same DEK to encrypt data from two different users
- Use a strong algorithm such as 256-bit Advanced Encryption Standard (AES)
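A few of the DEK practices above can be sketched in Python. This is only a minimal illustration: the helper name `new_dek` and the variable names are hypothetical, and it covers key generation only, not the encryption or wrapping of the key.

```python
import secrets

DEK_BYTES = 32  # 256 bits, sized for use with AES-256


def new_dek() -> bytes:
    """Generate a fresh data encryption key locally from the OS CSPRNG."""
    return secrets.token_bytes(DEK_BYTES)


# A new key for every write, and never the same key for two users
# (the user names here are hypothetical).
dek_alice_write_1 = new_dek()
dek_alice_write_2 = new_dek()
dek_bob_write_1 = new_dek()

assert len(dek_alice_write_1) * 8 == 256
assert len({dek_alice_write_1, dek_alice_write_2, dek_bob_write_1}) == 3
```

Because every write gets its own key, there is no long-lived DEK that would need rotating.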
With both keys in place, encryption proceeds as follows:
- An API request is sent to the KMS to generate a data key using the CMK
- The KMS returns a response with the plaintext data key and the encrypted data key (encrypted using the CMK)
- The data is encrypted using the plaintext data key
- The plaintext data key is removed from memory
- The encrypted data and the encrypted data key are packaged together as an envelope and stored
Decryption reverses the process:
- The encrypted data key is extracted from the envelope
- An API request is sent to the KMS with the encrypted data key, which carries information about the CMK to be used for decryption
- The KMS returns a response with the plaintext data key
- The encrypted data is decrypted using the plaintext data key
- The plaintext data key is removed from memory
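The encryption and decryption steps above can be sketched end to end in Python. This is a rough illustration under loud assumptions, not a real implementation: `InMemoryKMS`, `WrappedKey`, `Envelope`, and the `toy_*` functions are all hypothetical names, and the toy cipher (an HMAC-SHA256 keystream plus a tag) only stands in for AES-256 so that the example runs on the standard library alone. In practice you would use a vetted cipher and a real KMS client.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


# --- Toy authenticated cipher: a stand-in for AES-256, NOT for production ---
def _subkey(key: bytes, label: bytes) -> bytes:
    return hashlib.sha256(label + key).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    blocks = [
        hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + body, hashlib.sha256).digest()
    return nonce + tag + body


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, tag, body = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered ciphertext")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


# --- Hypothetical in-memory KMS: the CMK never leaves this object ---
@dataclass(frozen=True)
class WrappedKey:
    cmk_id: str  # tells the KMS which CMK to decrypt with
    blob: bytes  # the DEK encrypted under that CMK


class InMemoryKMS:
    def __init__(self) -> None:
        self._cmks = {}

    def create_cmk(self, cmk_id: str) -> None:
        self._cmks[cmk_id] = secrets.token_bytes(32)

    def generate_data_key(self, cmk_id: str):
        """Return (plaintext DEK, DEK encrypted under the CMK)."""
        dek = secrets.token_bytes(32)
        return dek, WrappedKey(cmk_id, toy_encrypt(self._cmks[cmk_id], dek))

    def decrypt(self, wrapped: WrappedKey) -> bytes:
        return toy_decrypt(self._cmks[wrapped.cmk_id], wrapped.blob)


@dataclass(frozen=True)
class Envelope:
    encrypted_data: bytes
    encrypted_dek: WrappedKey


def envelope_encrypt(kms: InMemoryKMS, cmk_id: str, data: bytes) -> Envelope:
    dek, wrapped = kms.generate_data_key(cmk_id)  # KMS returns both key forms
    encrypted = toy_encrypt(dek, data)            # encrypt data with plaintext DEK
    del dek  # drop our reference; Python cannot guarantee the bytes are wiped
    return Envelope(encrypted, wrapped)           # package as an envelope


def envelope_decrypt(kms: InMemoryKMS, envelope: Envelope) -> bytes:
    dek = kms.decrypt(envelope.encrypted_dek)     # KMS returns the plaintext DEK
    data = toy_decrypt(dek, envelope.encrypted_data)
    del dek
    return data


kms = InMemoryKMS()
kms.create_cmk("customer-records")
envelope = envelope_encrypt(kms, "customer-records", b"name=Jane Doe")
assert envelope_decrypt(kms, envelope) == b"name=Jane Doe"
```

Note that the caller only ever holds the plaintext DEK briefly, while the CMK stays inside the KMS object for the whole round trip.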
How is Envelope Encryption Different From Other Encryption Patterns? 🤔
Every service you build requires encryption at some point. This could be passwords or PII in a database, credentials for an external service, or even files in a filesystem.
You can easily handle some of these situations with a configuration file, but configuration files pose their own risks:
- Proper planning is needed to keep the data secure
- Multiple formats are in use, such as YAML, JSON, and XML
- Exact storage locations may be hard-coded in the app, making deployment potentially problematic
- Parsing of the config files can be problematic
You can encrypt data using a symmetric key, but this approach suffers from one major issue: key management.
You need to find a way to get the key to the party with whom you are sharing data, and if someone gets their hands on a symmetric key, they can decrypt everything encrypted with that key.
You can encrypt data using asymmetric encryption, which is considered a standard nowadays. Some of its cons are:
- It is a slow process, which makes it unsuitable for encrypting or decrypting data in bulk
- If you lose your private key, the messages you have received can no longer be decrypted
- If an attacker obtains your private key, they can read all of your messages
Envelope encryption addresses these problems. Some of the benefits it offers are:
- A combination of benefits from symmetric and asymmetric encryption: The data is encrypted using a DEK, which follows symmetric encryption. The DEK is in turn encrypted by a CMK. When the CMK is an asymmetric key pair, encrypted DEKs can be shared and decrypted only by those with access to the CMK's private key, mitigating the key exchange problem of symmetric algorithms. When the CMK is a symmetric key, as described earlier, it stays inside the KMS, so only callers the KMS authorizes can decrypt DEKs.
- Easier key management: Multiple DEKs can be encrypted under a single root key, which eases the management of keys in a KMS. You can also do more secure key maintenance by rotating your root keys, instead of rotating and re-encrypting all of your DEKs.
- Data key protection: Because we encrypt the data key with the CMK, we don't have to worry about where the encrypted data key is stored. Thus, we can safely store the encrypted data key alongside the encrypted data.
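To illustrate the key management point, here is a hedged sketch of one way a root key rotation could play out: the DEK is decrypted with the old CMK and re-encrypted with the new one, while the encrypted data itself is never touched. The names `toy_wrap`, `toy_unwrap`, `Envelope`, and `rewrap` are hypothetical, the XOR-with-HMAC wrap is only a toy stand-in for a real key-wrapping algorithm, and in a real deployment this step would typically happen inside the KMS so the CMKs never leave it.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, replace


def toy_wrap(cmk: bytes, dek: bytes) -> bytes:
    """Toy key wrap for 32-byte DEKs (illustration only, NOT for production)."""
    nonce = secrets.token_bytes(16)
    pad = hmac.new(cmk, nonce, hashlib.sha256).digest()
    return nonce + bytes(d ^ p for d, p in zip(dek, pad))


def toy_unwrap(cmk: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    pad = hmac.new(cmk, nonce, hashlib.sha256).digest()
    return bytes(b ^ p for b, p in zip(body, pad))


@dataclass(frozen=True)
class Envelope:
    encrypted_data: bytes  # opaque here; produced earlier with the DEK
    wrapped_dek: bytes     # the DEK encrypted under a CMK


def rewrap(envelope: Envelope, old_cmk: bytes, new_cmk: bytes) -> Envelope:
    """Move an envelope to a new CMK without touching the encrypted data."""
    dek = toy_unwrap(old_cmk, envelope.wrapped_dek)
    return replace(envelope, wrapped_dek=toy_wrap(new_cmk, dek))


old_cmk = secrets.token_bytes(32)
new_cmk = secrets.token_bytes(32)
dek = secrets.token_bytes(32)

envelope = Envelope(b"<large encrypted payload>", toy_wrap(old_cmk, dek))
rotated = rewrap(envelope, old_cmk, new_cmk)

assert rotated.encrypted_data == envelope.encrypted_data  # data untouched
assert toy_unwrap(new_cmk, rotated.wrapped_dek) == dek    # same DEK, new wrapper
```

The work done per record is proportional to the size of the DEK, not the size of the data, which is why this kind of maintenance stays cheap even for very large datasets.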
Key Management Systems & Why it Works at Scale? 🤔
The biggest reason envelope encryption and KMSs work at scale is performance. As mentioned before, asymmetric encryption is typically slow, while symmetric encryption is very fast but makes key management the issue.
So with envelope encryption, a large quantity of data is quickly encrypted with symmetric encryption using a random key. Then only that small key is encrypted with the CMK, using asymmetric encryption or a key held inside the KMS. This gives the key-management benefits of asymmetric encryption with the performance of symmetric encryption.
Key Management Systems like AWS KMS, Azure Key Vault, and Google Cloud Key Management Service give you a fully managed service to store and manage encryption keys. They use envelope encryption internally, and they’re used by default in a lot of services that support encryption in cloud infrastructure providers like AWS, GCP, Azure, and others.
An ideal key management system should be highly available, control access to the master keys, audit key usage, and manage the key lifecycle.
With these characteristics, and by using envelope encryption internally, Key Management Systems are well suited to handling encryption at scale.
Envelope Encryption is one of the most trusted application security design patterns used at scale. It is the default encryption method used in services like AWS S3, GCP, and others.
Hopefully, this enables you to understand how you can encrypt/decrypt a large amount of data using the envelope encryption method at scale in a more trusted setup.
Thanks for reading! I really hope that you find this article useful. I invite you to participate in the discussion in the comments below, I'm always interested to know your thoughts and happy to answer any questions you might have in your mind. If you think this post was useful, please like the post to help promote this piece to others.
This article leans heavily on the following material:
- Google Cloud Data Encryption - Jayendra's Cloud Certification Blog
- AWS KMS concepts - AWS
- AWS KMS and Envelope Encryption - Manish Pandit
- Cloud Architecture Pattern: Envelope Encryption (or Digital Envelope) with Public Cloud Providers Part 1 - Nilay Parikh
- AWS KMS Envelope Encryption - Chirag Modi
- Protecting data with envelope encryption - IBM
- Envelope encryption - GCP
- Encryption at rest in Google Cloud - GCP
Did you find this article valuable?
Support Rohit Jacob Mathew by becoming a sponsor. Any amount is appreciated!