Key Management

Miscellaneous notes
For background information about the specifics of dm-crypt, LUKS, and LVM crypt, see Disk encryption formats.

Smart card and TPM support is mostly unrelated to any of the notes below; perhaps we can use them to encrypt locally stored keys (e.g. for hard disk encryption, or for the locally-held disk image keys).

Please ignore my time estimates and use your own judgment. I have been very bad at estimates so far (estimating too little, of course); I'm using this as an opportunity to practice.

Change log

 * Apr 8: Added support for creating random "admin passphrases" and their escrow for disk encryption, to better support remote users who don't have access to a rescue CD.
 * Apr 9: Added virt-manager and virt-{clone,install,image} support to the action list and recommendations.
 * Apr 15-21:
 * Made terminology more consistent.
 * Added Disk encryption formats.
 * Added "Handling use cases" sections.
 * Added a Links section.
 * Added optional random passphrase generation to the initial key escrow operation, and a note that the file format should support storing more than one secret.
 * Added a note about escrowing dm-crypt keys vs. dm-crypt passphrases.
 * Expanded a note about storing the "simple" secret for virtualization guest disk image encryption, rather than the data encryption key.
 * Removed the suggestion that keys would automatically propagate among virt-manager instances, replaced by a prohibition of reading the key from a host.
 * Removed a question about key recovery features supported by RHCS; the question how to integrate remains.
 * Apr 27: Added a note to limit the number of escrow packets per management client to avoid a denial of service.
 * May 15: Added a note that the server certificate must be easily available to system administrators.

Disk Encryption Key Escrow
The goal is to allow recovery from, e.g., a lost passphrase for an encrypted drive of a company computer, by storing the necessary data about the encryption "centrally" and allowing authorized people to use it to access the encrypted data.

The threat model assumes attacks against the encrypted drives by unauthorized users, and attacks against the central data storage. Users deleting or corrupting data they have legitimate access to are out of scope (in the typical case, the user can overwrite their own hard drive or smash it with a hammer).

In this section, a user is someone who legitimately has access to a passphrase or key that allows using a volume; recovery is the operation by which an administrator restores a user's access.

Features
If the volume format uses separate data encryption keys and key encryption keys, the system should store raw data encryption keys rather than passphrases that allow access to the data encryption keys (by deriving a key encryption key from the passphrase). Unlike a passphrase, a data encryption key can be used to help with data restoration if the area of the disk that stores password-protected keys is corrupted. Further, if the data encryption key is escrowed, the escrow does not need to be updated when the user wishes to use a different passphrase. Finally, the LUKS encrypted volume format only supports a limited number of passphrase slots, which could be used up by legitimate users, leaving no slot for a recovery passphrase.

Storing data encryption keys rather than passphrases means that it must be possible to boot the computer into a rescue mode without knowing the passphrase for the root volume. To simplify handling of remote users that forgot their passwords, an additional random LUKS password can be set up and escrowed together with the data encryption key; during recovery the random password is revealed to the user (perhaps letting the user change the password), and the data encryption key is then used to generate and set a new random password.
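The random "admin passphrase" could be generated with the system CSPRNG; a minimal sketch, where the alphabet and length are illustrative choices (not part of the design) and the alphabet avoids characters that are hard to read out over the phone:

```python
import secrets
import string

# Alphabet without visually ambiguous characters (0/O, 1/l/I), so the
# passphrase can be reliably dictated to a remote user.
ALPHABET = "".join(c for c in string.ascii_letters + string.digits
                   if c not in "0O1lI")

def generate_admin_passphrase(length: int = 20) -> str:
    """Generate a random passphrase suitable for escrow alongside
    the data encryption key."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

The passphrase would be added to a free LUKS key slot and stored in the same escrow packet as the data encryption key.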

The system should allow on-line as well as off-line recovery data transfer: on-line is necessary for seamless system management (e.g. automated set up of many computers, server-initiated key changes), off-line is useful for cases when the company network is not available (or difficult to set up, e.g. during installation or recovery), or to allow recovery for remote employees (by sending recovery data over e-mail and the password necessary to use it over phone).

Off-line mode can also be used without any management server, allowing e.g. individuals to print the recovery data and store it in a safe; this will increase the number of potential users and contributors in the Linux community.

Client functionality

 * Escrow the data encryption key for a volume, given its passphrase
 * Using the passphrase, get the data encryption key, create an escrow packet and send it to the server or store it in a file.
 * Optionally generate and add a random volume passphrase, and store it in the escrow packet as well.
 * A combined "create a volume and escrow packet" operation is not really necessary - a script that supplies the volume passphrase twice is easy to write.


 * Add/replace a volume passphrase, given an escrow packet
 * Extract the data encryption key from the packet (asking for a packet passphrase if necessary), use it to add/replace a volume passphrase.
 * A variant of this generates and sets up a random volume passphrase, and stores it in another escrow packet.


 * Set up a volume, given an escrow packet
 * Extract the data encryption key from the packet (asking for a packet passphrase if necessary), use it to set up or mount the volume. This is intended to replace the usual low-level tools that ask for the volume passphrase, e.g.  ; the volume would be mounted using   as usual.


 * Replace the data encryption key.
 * (Requires underlying volume support for on-line re-encryption; this is planned for LVM.) Generate a new data encryption key, create an escrow packet for it, then start on-line re-encryption. If this operation is initiated by a management server, the server can then poll for re-encryption completion.

Server functionality

 * Make the certificate used for encrypting escrow packets available to system administrators and to management clients during their enrollment process.
 * Store escrow packets
 * The internal storage should be encrypted (using a server's master key).
 * Must support both on-line (client connection) and off-line (upload by administrator) operation.
 * On-line clients should have the ability to mark an escrow packet for a volume "obsolete" if a newer packet is stored on the server.
 * Limit the number of packets per client to, say, 1000, to avoid a denial of service attack.
 * Eventually we can support key splitting (storing parts of the volume data encryption keys on separate servers, requiring N of the servers to supply their part to recover the volume data encryption key); an attacker would have to compromise the master keys of N servers in order to access any volume data encryption key.


 * Make escrow packets available for recovery
 * The returned escrow packet must be encrypted (using a one-time password, or the client's machine private key). A one-time password mechanism must be supported, because the machine private key is not available when attempting to recover the root partition.
 * Must support off-line operation (make a packet available for download). On-line operation (client asking for recovery data directly) is probably not necessary.
 * The operation should require manual administrator intervention (if the recovery is not initiated by an administrator, it should require administrator approval). A possibility is to encrypt escrow packets with a public master key when storing them, and to ask for the passphrase of the private master key when reading them and preparing them for recovery: this means that a compromise of the server does not compromise any of the escrowed keys until the private master key passphrase is captured by an attacker.


 * Manage key lifetimes: depending on defined policy, ask clients to replace their data encryption keys and re-encrypt the volumes regularly.
 * Provide an administration interface:
 * List stored recovery data (e.g. by machine, machine group?)
 * Decrypt and view a recovery packet
 * Necessary to tell the user an escrowed random passphrase. Access to this functionality should be controlled exactly like making an escrow packet available for recovery.
 * Mark packets as obsolete
 * Schedule immediate data encryption key expiration/re-encryption (if a key compromise is suspected)
 * Delete escrow packets (a specific packet that should never have been stored, or obsolete data encryption keys e.g. after a few years, corresponding to the company's document retention policy)
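The key-splitting option mentioned above (N servers must each supply their part) can be illustrated in its simplest all-of-N form with XOR shares; a real deployment would use a threshold scheme such as Shamir's secret sharing, so this sketch is only a conceptual stand-in:

```python
import secrets

def split_key(key: bytes, n: int) -> list[bytes]:
    """Split 'key' into n XOR shares; all n shares are needed to
    reconstruct the key, and any subset of fewer than n shares
    reveals nothing about it."""
    shares = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    last = key
    for s in shares:
        last = bytes(a ^ b for a, b in zip(last, s))
    shares.append(last)
    return shares

def join_key(shares: list[bytes]) -> bytes:
    """XOR all shares back together to recover the key."""
    key = bytes(len(shares[0]))
    for s in shares:
        key = bytes(a ^ b for a, b in zip(key, s))
    return key
```

Each share would be encrypted with a different server's master key, so an attacker has to compromise every participating server to recover the volume data encryption key.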

Handling of use cases
See Disk_encryption_key_escrow_use_cases.

Implementation details
The escrow packet format is a collection of name-value pairs: machine identification (host name, perhaps machine certificate identifier), volume identification (UUID, label, perhaps a /dev/disk/by-id, /dev/disk/by-path link), volume encryption mechanism/type of stored key (LUKS, LVM, dm-crypt key, dm-crypt passphrase, LUKS passphrase), encryption parameters (necessary for e.g. raw dm-crypt, which does not store them in the volume), and the data encryption key. There's little reason to favor any particular representation over another; the KMIP data format can be used (to help integrate the data in a larger key management system, assuming KMIP gains traction). It should be possible to store more than one "secret" in a single file (e.g. both a data encryption key and a LUKS passphrase): if this is not supported by KMIP directly, we can simply store two KMIP packets in the file.
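To make the name-value structure concrete, a packet might look like the sketch below. This is only an illustration of the structure; a real implementation would emit KMIP (and then encrypt the result), and all field names here are hypothetical:

```python
import json

def make_escrow_packet(hostname: str, volume_uuid: str,
                       mechanism: str, secrets_: dict) -> bytes:
    """Serialize an escrow packet as name-value pairs. JSON stands in
    for the KMIP encoding here purely for readability. 'secrets_' may
    hold more than one secret, e.g. both a data encryption key and a
    LUKS passphrase."""
    packet = {
        "host": hostname,
        "volume-uuid": volume_uuid,
        "mechanism": mechanism,   # e.g. "LUKS", "dm-crypt key"
        "secrets": secrets_,
    }
    return json.dumps(packet).encode()

def parse_escrow_packet(data: bytes) -> dict:
    return json.loads(data)
```

The plaintext form above would never be written to disk; see the encryption requirements below.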

dm-crypt can be used with a passphrase or by specifying the data encryption key directly. Escrowing the data encryption key should be supported; for user-friendliness, escrowing the passphrase should be supported as well: it's better to tell the user their passphrase than to tell them they'll have to use a 32-character key. On the other hand, the user's passphrase might be used in other contexts and storing it could be undesirable. In any case, the vast majority of newly created encrypted volumes will use LUKS (or perhaps LVM crypt) instead of raw dm-crypt, so dm-crypt support is not essential.

The escrow packets should never be stored in plaintext. If they are created to be stored at a server, they should be encrypted using the server's public key (using CMS); support for recovery packets encrypted using a machine private key is possible, but probably not essential.

For operation without a server, or for recovery, the packets should be encrypted using a passphrase. CMS specifies a way to encrypt data using a password instead of a public-private key pair, but that is not supported by either NSS or OpenSSL; a home-grown system - or  - can be used if adding support for password-based encryption to one of the crypto libraries turns out to be infeasible.
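Whatever library ends up doing the encryption, the password-based variant would derive the packet-wrapping key with something like PBKDF2 (which both NSS and OpenSSL do implement, even though their CMS code does not expose password-based encryption). A stdlib sketch of the derivation step only, with illustrative parameters; the actual packet encryption (e.g. AES) is left to the crypto library:

```python
import hashlib
import secrets

def derive_wrapping_key(passphrase: str, salt=None,
                        iterations: int = 100_000):
    """Derive a 256-bit wrapping key from a passphrase. The salt and
    iteration count must be stored with the packet so the same key can
    be re-derived during recovery."""
    if salt is None:
        salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt,
                              iterations, dklen=32)
    return key, salt
```

A one-time password handed over the phone would go through the same derivation.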

(It seems that signing the escrow packet (by the client when creating it, or by the server when returning it) is not necessary: direct client access to the server would be authenticated by the client's machine certificate, which is equivalent to signing the packet, and fake recovery data can at worst lead to a failed attempt to mount the volume.)

See also Disk encryption key escrow in IPA.

Affected packages

 * A new library and command-line tool: manipulating the volume data encryption keys, escrow packets, and communicating with the server
 * : Needs a better API to allow data encryption key manipulation
 * LVM: volume encryption support with more features than LUKS is planned
 * anaconda, pykickstart, system-config-kickstart: for creating escrow packets during installation (allowing the machine to register itself to the management system and escrow the keys in )
 * The escrow packet for each volume would probably be stored on the volume, making it the responsibility of the  script to use it and destroy it.


 * FirstAidKit: key recovery (using an USB stick at minimum, perhaps connecting to the server)
 * NSS or OpenSSL: perhaps enhancements to CMS to support password encryption
 * Management client: key escrow during enrollment, responding to key replacement requests, perhaps key recovery (if the client can start without mounting the partition in question)
 * The management server: See "Server functionality" above.

Can start now

 * KMIP packet library (3d): create/parse (a strictly specified subset of) KMIP data (4h+4h), encryption/decryption, both using certificates/private keys and passwords (1d+1d)
 * A single KMIP library will be shared with other key management applications.


 * libcryptsetup interface enhancement (6h): Get data encryption key given passphrase, add/replace passphrase given data encryption key, set up volume given data encryption key (2h+2h+2h, not counting upstream discussion)
 * Minimal volume data encryption key escrow library, supporting LUKS (4h): Get data encryption key from volume, add/replace passphrase, add/replace random passphrase and escrow it, set up volume (1h+1h+1h+1h)
 * Command-line key escrow interface (1d)
 * Support for key escrow during installation and in kickstart files (1d)
 * Support for key recovery in FirstAidKit (1d)
 * dm-crypt support in key escrow library (10h): Get data encryption key from /etc/crypttab, add/replace data encryption key in /etc/crypttab, set up volume (4h+4h+2h)

Depends on LVM encryption support

 * LVM support in key escrow library (4d): Get data encryption key from volume, add/replace LVM key, set up volume, replace data encryption key and start re-encryption, check re-encryption status (1d+1d+1d+1d)
 * LVM programmatic interface, its capabilities and various key management modes are not defined yet.

Depends on system management architecture and infrastructure

 * Management client plug-in (1d)
 * Management server component:
 * Store escrow packets (1d)
 * Provide an interface for clients (1d)
 * Manage key lifetimes (1d)
 * Administration interface (28h): List clients, list escrow packets per client, show packet contents, mark specified packets obsolete, schedule immediate re-encryption, delete packets (4h+4h+8h+4h+4h+4h)

Recommendations

 * Implement the basic stand-alone features (full-featured command-line client, FirstAidKit) for LUKS volumes, to make sure we have a basic escrow solution for RHEL6.
 * Not sure about kickstart support: probably should try to include it even if management support is not ready. If management client enrollment from kickstart is possible, kickstart support for escrow is necessary.
 * Plan to implement the management client and server components when the architecture is defined.
 * Plan to support LVM encryption when it is ready.
 * dm-crypt support has probably lower priority (anaconda UI support strongly steers customers to LUKS).

Virtualization guest disk image encryption
The goal is to encrypt each disk image using a separate key, to manage access to information stored in the images even if they are all stored on a single storage device, if there are many possible guests (that do not correspond to accounts on the storage device), or if traffic on the connection between hosts (used here to mean "virtualization hosts" = "nodes") and storage can be read by others.

Guest disk image encryption is not directly related to disk encryption key escrow: escrow deals with keys that are available inside the guest, not outside, and there is no need to recover guest image keys (as long as the management infrastructure is usable). The only caveat is that when backing up virtualization guest disk images, it will be important to back up the management server's key database as well.

Features
Unlike escrow, where users want to choose their own passphrases and change them at will, the encryption of disk images is completely managed by tools. It therefore makes more sense to handle whichever secret is easiest to use rather than always the data encryption key (e.g. use the qcow2 password, which is currently supported in qemu, instead of patching qemu to support manipulating the data encryption key directly). The data encryption key can be stored in addition, to help with data recovery if a header is corrupted. An unqualified "key" below refers to this easiest-to-use secret.

Each volume has its own key: per-pool keys would not provide the required host isolation, and per-guest keys would make it difficult to mount a volume from a different guest (e.g. when sharing a cluster file system among guests).

The key is stored on the management server, and also on the hosts that currently contain guests that access this volume (storing the key on the hosts is necessary to support guest autostart). Hosts store the keys together with configuration of guests that use the keys, to ensure the keys are erased when guests are deleted or migrated away.

Without a management server, each user account used for managing guests (e.g. running virt-manager or virt-install) has a separate key store.
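The host-side rule that keys live and die with the guest configuration can be sketched as follows; the store layout and reference counting are illustrative assumptions, not a defined interface:

```python
class HostKeyStore:
    """Per-host store mapping volume keys to the guests that use them,
    so that a key is erased as soon as the last guest referencing its
    volume is deleted or migrated away."""

    def __init__(self):
        self._keys = {}    # volume UUID -> key bytes
        self._users = {}   # volume UUID -> set of guest names

    def add_guest(self, guest: str, volume_keys: dict):
        """Store keys for the volumes a newly defined guest uses."""
        for vol, key in volume_keys.items():
            self._keys[vol] = key
            self._users.setdefault(vol, set()).add(guest)

    def remove_guest(self, guest: str):
        """Delete a guest configuration; erase keys no longer used."""
        for vol in list(self._users):
            self._users[vol].discard(guest)
            if not self._users[vol]:
                del self._users[vol]
                del self._keys[vol]

    def key_for(self, vol: str):
        return self._keys.get(vol)
```

A real store would additionally keep the keys encrypted on disk (see "Implementation details" below on master keys) and securely wipe them on deletion.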

Host functionality

 * Create a volume, given a specific key.
 * The host does not store the key after creating the volume.
 * If possible, the host should recognize encrypted volumes, and refuse to use them if a key is not supplied (this might be difficult, e.g. without out-of-band information it is ambiguous whether an image that starts with a LUKS header should be treated as an encrypted volume, or as a raw volume that is encrypted inside the guest; dm-crypt does not even have a header).


 * "Create"/define a guest configuration, given necessary keys.
 * Store the keys persistently, delete them when deleting the guest configuration.
 * Use the keys when starting the guest.


 * Re-encrypt a volume, given an old and new key.
 * Can be done "off-line" by copying the volume, as long as the volume is not used during the process.
 * "On-line" re-encryption requires currently unimplemented LVM crypt, it will probably make it impossible to migrate any guest that uses the volume to a different host during the process.
 * This can include converting between cleartext and encrypted volumes.

(The hosts should not provide a way to get the encryption key from the host over the network, to avoid the risk of escalating an unauthorized connection to a host into a disk image data compromise; perhaps extracting the key by users with local root access could be supported for disaster recovery.)
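As noted above, recognizing encrypted volumes is ambiguous in general, but a host can at least detect a LUKS header by its magic bytes. A sketch (LUKS stores the magic at the start of the header; raw dm-crypt has no header to detect):

```python
LUKS_MAGIC = b"LUKS\xba\xbe"   # first 6 bytes of a LUKS header

def looks_like_luks(path: str) -> bool:
    """Return True if the image begins with a LUKS header. This cannot
    distinguish a host-encrypted volume from a raw volume that the
    guest encrypted internally; out-of-band metadata in the
    virtualization management system is still needed to decide how to
    treat the image."""
    with open(path, "rb") as f:
        return f.read(len(LUKS_MAGIC)) == LUKS_MAGIC
```

This check only lets the host refuse to use a volume that is probably encrypted when no key is supplied; it does not resolve the ambiguity described above.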

Management server functionality

 * Store keys for each managed volume (not for each guest), supply them to hosts where necessary.
 * Manage key lifetimes: depending on defined policy, generate new keys and ask hosts to re-encrypt when an old key expires.
 * All guest configuration stored on hosts must be updated afterwards.
 * Probably should allow re-encryption after migration (immediately or within a specified time interval) to prevent access to data from the old host.


 * Provide an administration interface:
 * A simple "encrypt this volume" check box during volume creation.
 * May or may not expose the specific cipher options in the UI.
 * Schedule immediate volume re-encryption (e.g. on suspected key compromise).
 * May support conversion between encrypted and cleartext volumes.

Handling of use cases
See Virt_guest_disk_image_encryption_use_cases.

Implementation details
Possible encrypted image formats: dm-crypt, LUKS or the future LVM crypt (all work on volumes, but can be backed by a file as well), and qcow2 (built-in AES encryption). LUKS support is probably not necessary: using raw dm-crypt and storing volume metadata in the virtualization management system works just as well.

The key packet must contain the encryption mechanism name (differentiating between "the key" and the data encryption key, if both are used), parameters (for dm-crypt or other formats that do not store the information) and the key. There's little reason to favor any particular representation over another; the KMIP data format can be used (to help integrate the data in a larger key management system, assuming KMIP gains traction).

Both hosts and the management server should take reasonable care to store the keys encrypted (e.g. using a master key to encrypt the key storage), but in this case the required ease of use and necessity to automatically start managed hosts and guests will probably require lower security (storing the master key on disk must be supported, along with asking for it on startup or perhaps using the TPM or a smart card).

Key packet transfer is assumed to be protected by using TLS to communicate between the management server and the hosts; this, in turn, assumes per-host machine private keys. If possible, the packets should be transferred separately from the XML configuration (e.g. by defining more fields in the  RPC protocol); otherwise it would be difficult or costly to securely wipe all memory used by the XML encoding/decoding code to store the key packet.

Affected packages

 * The virtualization management server
 * qemu, cryptsetup, LVM: perhaps enhancements to better support automatic encrypted volume setup
 * the RPC compiler used for libvirt: perhaps some modification to ensure memory containing the key packet is wiped

Can start now

 * KMIP packet library (3d): create/parse (a strictly specified subset of) KMIP data (4h+4h), encryption/decryption, both using certificates/private keys and passwords (1d+1d)
 * A single KMIP library will be shared with other key management applications.


 * Minimal encryption support in libvirt (20h): transfer key data with guest configuration, store keys locally, create encrypted qcow2 volumes, supply passwords when starting guests (8h+4h+2h+6h)
 * Modify XDR compiler used by libvirt to wipe key memory (6h)
 * Support for encrypted volumes in virsh (1d)
 * Support for encrypted volumes in virt-manager and python-virtinst (36h): local key storage, supply keys with relevant operations, get keys from hosts, "encrypt this" checkbox when creating volumes, support in, support in  , support in   (4h+8h+4h+4h+4h+8h+4h)
 * Support for dm-crypt volumes in libvirt (8h): create encrypted volumes, set up dm-crypt volumes when starting guests (4h+4h)

Depends on LVM encryption support

 * Support for LVM encrypted volumes in libvirt (3d): Create encrypted volumes, set up encrypted volumes when starting guests, replace key and re-encrypt/check re-encryption status (1d+1d+1d)

Depends on system management architecture and infrastructure

 * Management server component:
 * Key storage in management server (1d)
 * Supply keys with other libvirt operations (1d)
 * Manage key lifetimes (1d)
 * Administration interface (2d): "Encrypt this volume" checkbox in volume creation, detailed cipher configuration, immediate re-encryption (4h+8h+4h)

Recommendations

 * Implement qcow2 support in libvirt and virsh, preparing for the management server component.
 * Add support to virt-manager and python-virtinst; will not support managing the same host from more than one computer.
 * Plan to implement management server component when the architecture is defined.
 * Plan to support LVM encryption when it is ready.
 * dm-crypt support probably has lower priority (LVM will be more flexible, and most people don't need the detailed cipher control).

General asymmetric key management
The goal is to simplify or automate certificate and private key setup in various applications.

Opinion: Easy setup of e.g. company-wide certificate authorities is definitely desirable. Other than the machine private key, I'm not sure how much value generic private key management has. If application/service-specific private keys are deployed on a large enough scale to warrant management software, the scale probably warrants configuration management/mass configuration software, which should be able to transfer private keys along with other configuration files; the problem then becomes integrating configuration management with a CA rather than mass key deployment to applications.

Client operations

 * Add a certificate authority chain.
 * To a specific application, or "everywhere".


 * Remove a certificate authority chain.
 * From a specific application, or "everywhere".


 * Update/replace a local certificate revocation list (?)

(If private key management is required:)
 * Configure an application to use a particular private key.
 * The key is generated and signed by the CA "on the server", no manual administrator work on the client (e.g. entering any passphrases) is necessary.
 * The private key can be stored unencrypted, with a password that is stored in application configuration, or the application can prompt for a password: the application-specific code must know which modes are supported by the application in question.
 * Difficult to do in general, because some application config files are rather complex; at worst we can specify expected key file names, overwrite the key and restart the application.


 * Replace an application's private key.
 * The operation can contain a certificate for the key being replaced, to avoid accidents.

Server operations

 * Add/remove a certificate authority chain
 * To a specific application, or "everywhere"; probably will apply on all machines in a "group"

(If private key management is required:)
 * Generate private keys, sign them by a CA, send them to clients.
 * Manage key lifetimes: generate new certificates or private keys, ask clients to use them
 * Should generate new certificates or private keys some time before old certificates expire, to allow seamless transition.


 * FIXME: what kind of administration interface is necessary?
 * Only key lifetime management?
 * Integrate with HA clustering, and automatically deploy the correct keys to all server instances?

Handling of use cases
For above-described operations on certificate authority chains:
 * A system administrator initiates the action (either on a system management server, which forwards the request to the desired clients, or manually on each client using a command-line client), optionally specifying a single affected application.
 * The client-side tool loads the relevant plugins and performs the action.

Specifics of private key management depend on the way the CA and configuration management software are integrated or connected.

Implementation
All operations are initiated from the server, the client is assumed to be identified by some generic management application; if this provides a machine certificate, TLS can be used to protect keys in transit.

Standard transfer formats (DER PKCS#7 for certificates, PKCS#12 for certificate + private key) can be used for data transfer; additional client and service identification can be tracked by the server or provided for logging.

Implementation would consist of a simple framework (receiving/sending certificates, API) and application-specific plugins. The planned shared NSS database would be one of the plugins, presumably affecting all NSS applications.

If this mechanism is used to manage the client machine certificates, it needs to be forgiving of expired machine certificates (otherwise the client could be unable to refresh the machine certificate because the machine certificate needs refreshing); perhaps expired machine private keys should be accepted if an immediate refresh is performed, as long as the expired private key was not revoked due to a suspected compromise. Similar concerns arise in connection with clients that are not running for a long time, especially with virtual machines.
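The "forgiving of expired machine certificates" rule might be expressed as a small policy check; the grace period and the compromise flag below are illustrative assumptions, and the open question of very-long-offline clients (e.g. dormant virtual machines) is what the grace period would have to accommodate:

```python
from datetime import datetime, timedelta

GRACE = timedelta(days=30)   # illustrative grace period after expiry

def may_refresh(not_after: datetime, now: datetime,
                revoked_for_compromise: bool) -> bool:
    """Decide whether an expired machine certificate may still be used
    to authenticate an immediate certificate refresh. Revocation for
    suspected compromise always disqualifies the key; otherwise the
    certificate is accepted within a grace period after expiry."""
    if revoked_for_compromise:
        return False
    return now <= not_after + GRACE
```

Whether a fixed grace period is acceptable, or whether dormant clients need an out-of-band re-enrollment path instead, is part of the open question above.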

FIXME: How could this integrate with the Red Hat Certificate System?

FIXME: The certificate/key management interface should cooperate with, or at least not fight, the generic configuration management platform.

Affected packages

 * A new library and command-line client: plugin interface and basic utilities.
 * Library plugins: shipped with the library, with applications they control, or separately from both.
 * Management client: using the library.
 * Management server: initiating operations.

Can start now

 * Common library/plugin interface (12h): plugin interface, certificate/key conversion between commonly used formats (4h+8h)
 * Command-line client (1d)
 * Application-specific plugins (4h each, assuming there are 5 or more plugins)

Depends on system management architecture and infrastructure

 * Management client plug-in (1d)
 * Management server component:
 * Certificate/key storage (1d)
 * CA certificate management (1d)
 * Private key management (1d)
 * Key lifetime management (1d)
 * Administration interface
 * Functionality is not clear.

Recommendations

 * Determine whether private key management is necessary (have I missed an important use case?)
 * Implement the library, command-line client and a few plugins, publicize in community to try and get plugin contributions
 * Plan to implement management client and server components.

General symmetric key management
PKCS defines formats for certificates and private keys, but there seems to be no established standard for symmetric key formats. EKMI[1] seems rather limited; it was set up to standardize the format used by a single open source (Java) project, and the author of the project has left the group. KMIP is more general, and people from various large corporations have contributed to the document, but its future is not quite clear.

Both proposed symmetric key management protocols focus on "smart applications" that use the server as a largely passive key database: in addition to key storage, the major function of the database is to provide key usage policies and track key usage. The applications must already know who they are connecting with and which key (identified by a name or a unique ID) is necessary, and they must implement the key usage policies themselves.

Opinion: It's not clear which RHEL applications require any central management for symmetric keys: keys used in RHEL servers are usually long-lived, or already have a key management protocol. A generic symmetric key management server for "smart applications" probably makes sense only as a service provided for an enterprise-wide application environment ("middleware infrastructure" - e.g. managing keys for communication between SOA components), not for RHN/IPA-like "management of clients".

"Management of clients"
For "management of clients" we can build something mostly similar to asymmetric key management, using a different format for key transfer (perhaps a KMIP subset).

Are there any significant users of symmetric cryptography in RHEL? I could find:
 * NTP
 * IPSec (but IKE is more scalable than manually managed keys)
 * Kerberos service principal keytabs
 * WEP keys (but NetworkManager stores data primarily per user, not per machine)

In most symmetric applications it is difficult to replace a key without breaking connections (even a perfectly timed key switch would probably require application restart).

"Smart applications"
KMIP quite directly implies a database schema, and specifies the server's operations on the data. This leaves implementing the server, providing client libraries, and providing a mechanism to define key usage by applications; the mechanism should be a component of a larger system for connecting the applications.

Recommendations

 * Do nothing for RHEL/system management.
 * Follow JBoss plans to provide a server for "smart applications", help if necessary.