VoodooPad Document Encryption

1. Introduction

As documented in the Cryptography Overview, VoodooPad supports two modes of document encryption:

  • Full document encryption: All resources within a document are fully encrypted and unreadable without the document password.
  • Single page encryption: User-specified pages are encrypted and unreadable without a per-page password; multiple passwords may be used for different pages, and the document may contain both encrypted and unencrypted data.

This specification defines the file formats, keying mechanisms, and algorithm parameters that comprise VoodooPad Document Encryption (VDE), supporting both full-document and single-page encryption.

2. Cryptographic Primitives

Encryption and authentication of document data are performed using the ETM-AEAD cipher, defined in the ETM-AEAD-v1 specification.

ETM-AEAD combines an unauthenticated block cipher (AES-CBC with PKCS#7 padding) with encrypt-then-MAC HMAC authentication; it was designed to use primitives (AES-CBC and HMAC) available in all platform cryptography libraries.

2.1 Algorithm Parameters

The ETM-AEAD algorithm specifies the use of AES-CBC and HMAC; the choice of AES key size, HMAC secure hashing algorithm, and keying mechanisms are left to the implementor.

In this section, we specify the exact algorithms, parameters, and keying mechanisms used by VoodooPad Document Encryption (VDE) in conjunction with ETM-AEAD.

  • AES Key Size: 256 bits.
  • HMAC and HKDF Hash Algorithm: SHA-256
  • Key Derivation: PBKDF2 and HKDF.

These choices are based on their industry prevalence, availability of implementation, and NIST recommendation(s) and standards applicable to VoodooPad's use-cases.

2.2 Key Generation and Derivation

VDE's keying mechanisms are designed to conform to NIST's Recommendation for Password-Based Key Derivation (SP-800-132) and Recommendation for Key Derivation Using Pseudorandom Functions (SP-800-108). VDE currently supports derivation or generation of keying material via three mechanisms:

  • Password-Based Key Derivation: A User-supplied Password (USP) may be used to derive keying material via a NIST-approved password-based key derivation mechanism.
  • Non-Password Key Derivation: Additional sub-keys may be generated from existing keying material via a NIST-approved key derivation mechanism, or through the use of disjoint (non-overlapping) segments of an existing key (as specified in NIST SP-800-108 Section 7.3, Converting Keying Material to Cryptographic Keys).
  • Random Keys: The platform's secure PRNG may be used to generate keying material. As per the recommendations of NIST SP-800-132 Section 5.4, Option 2A, the resulting key must be encrypted and authenticated when stored.

Any single VDE key MUST be used for one and only one purpose, i.e., one of:

  • Key derivation of one or more sub-keys using a single key derivation mechanism.
  • Encryption
  • MAC

2.2.1 Password-Based Key Derivation

VDE password-based key derivation is performed using PBKDF2-HMAC. Since PBKDF2 operates on a byte-level representation of a password, a stable normalization mechanism is required to produce a byte-stable representation of a given password's string representation.

When performing password-based key derivation, Unicode TR15's Normalization Process for Stabilized Strings must be used to generate a Stable Password Encoding (*SPE*) suitable for use with PBKDF2:

  • The password string is validated to ensure that it does not contain any Unicode code points for which the General_Category attribute is Unassigned, as per the latest version of the Unicode Standard supported by VoodooPad. If such invalid code points are found, the password is rejected as invalid.
  • The password is normalized as per Unicode Normalization Form D, defined in Unicode TR15.
  • The normalized Unicode password is encoded as a UTF-8 byte string, generating a stable password encoding. Neither a trailing NUL nor a Unicode byte-order-mark are included in the stable password encoding.
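
The three steps above can be sketched in Python as follows. This is a minimal illustration, not VoodooPad's implementation: the function name stable_password_encoding is hypothetical, and the "latest version of the Unicode Standard supported by VoodooPad" is approximated by whatever Unicode version the Python unicodedata module bundles.

```python
import unicodedata


def stable_password_encoding(password: str) -> bytes:
    """Return a byte-stable encoding of `password`, or raise ValueError."""
    for ch in password:
        # "Cn" is the General_Category value for unassigned code points,
        # as known to the Unicode version bundled with this Python build.
        if unicodedata.category(ch) == "Cn":
            raise ValueError(f"unassigned code point U+{ord(ch):04X} in password")
    # Normalization Form D (canonical decomposition).
    normalized = unicodedata.normalize("NFD", password)
    # UTF-8 without a byte-order mark and without a trailing NUL.
    return normalized.encode("utf-8")


composed = "caf\u00e9"      # precomposed e-acute
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent
assert stable_password_encoding(composed) == stable_password_encoding(decomposed)
```

Both spellings of the same visible password yield identical bytes, which is the property PBKDF2 depends on.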

Given the normalized, UTF-8 encoded password, key derivation may then be performed via PBKDF2-HMAC:

  • Input: The stable password encoding serves as the input secret for PBKDF2.
  • Salt: A randomly generated 256-bit value (via the platform's secure PRNG) serves as the PBKDF2 salt. This meets the recommended minimum requirement of 128-bit salt as specified in NIST SP-800-132.
  • Rounds: PBKDF2 is executed with an enforced minimum of 40k rounds, resulting in an approximately 1 second runtime on an iPhone 4S.
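
A minimal Python sketch of these parameters is shown below. The helper name derive_master_key and the default of SHA-512 are assumptions made for illustration (SHA-512 follows the PBKDF2-SHA512 usage described in Section 3.1.1); the requested output length is tied to the digest size so that only one native PBKDF2 block is ever derived.

```python
import hashlib
import os

MIN_ROUNDS = 40_000   # enforced minimum stated above
SALT_BYTES = 32       # 256-bit salt


def derive_master_key(spe: bytes, salt: bytes, rounds: int = MIN_ROUNDS,
                      hash_name: str = "sha512") -> bytes:
    """Derive one native PBKDF2-HMAC block from a stable password encoding."""
    if rounds < MIN_ROUNDS:
        raise ValueError("PBKDF2 round count is below the enforced minimum")
    if len(salt) != SALT_BYTES:
        raise ValueError("salt must be 256 bits")
    # Request exactly one native block: dklen equals the digest size.
    dklen = hashlib.new(hash_name).digest_size
    return hashlib.pbkdf2_hmac(hash_name, spe, salt, rounds, dklen)


salt = os.urandom(SALT_BYTES)   # platform CSPRNG
key = derive_master_key(b"correct horse battery staple", salt)
```

The salt and round count are public parameters and would need to be stored alongside the encrypted data so the key can be re-derived.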

A future update to this specification may include support for automatic upwards adjustment of PBKDF2 rounds based on the results of local device calibration, to help ensure that VDE password-based key derivation keeps pace with hardware advances. This feature has been excluded from this release, as further consideration must be given to the result of local calibration when sharing documents across hardware with widely disparate performance profiles, such as a desktop computer and a mobile device.

Additionally, we plan to investigate the use of scrypt, which is designed to be both computationally and memory intensive, increasing the costs involved in GPU or specialized ASIC-based password cracking.

Implementation Note: Safely Deriving Two Keys

It is important to note that we intentionally do not derive more data than is "natively" available via PBKDF2-HMAC's block size; to do so would increase our total work cost, but not necessarily that of an attacker. To understand why, one must look at how PBKDF2 produces its output.

To derive additional blocks, PBKDF2 runs the full key derivation once per block, with the block's index (1, 2, ...) appended to the salt. Derivation of later blocks is not dependent on the output of earlier derivations, which means that each block of output can be derived independently of the others.

If we used PBKDF2-HMAC-SHA256 and requested 512 bits of output, PBKDF2 would thus be run twice, concatenating the 256 bits generated from each block of output to produce the final 512-bit result. This would double our workload by doubling the number of key derivations that must be run, requiring that we decrease the total number of PBKDF2 iterations to keep the CPU cost within acceptable bounds.

An attacker, however, does not necessarily need both of the keys. If we use only the first 256 bits for an encryption key, and the latter 256 bits for an authentication key, the attacker can simply skip the derivation of the second block, which yields only the authentication key and is not needed to perform decryption. As a result, the attacker would do half as much work as a legitimate user, a significant improvement in the runtime of a brute-force password cracker.

As such, we're careful to request no more than the native block size when using PBKDF2 key derivation in VDE.
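
The independence of PBKDF2 output blocks can be checked directly. The sketch below re-implements a single PBKDF2-HMAC-SHA256 block by hand (pbkdf2_block is a hypothetical helper, and the password, salt, and iteration count are arbitrary demonstration values) and compares it with the 512-bit output of Python's hashlib.

```python
import hashlib
import hmac
import struct


def pbkdf2_block(password: bytes, salt: bytes, iterations: int, index: int) -> bytes:
    """Compute one PBKDF2-HMAC-SHA256 output block; `index` is 1-based."""
    # U_1 = HMAC(password, salt || INT_32_BE(index))
    u = hmac.new(password, salt + struct.pack(">I", index), hashlib.sha256).digest()
    t = int.from_bytes(u, "big")
    for _ in range(iterations - 1):
        # U_j = HMAC(password, U_{j-1}); the block is the XOR of all U_j.
        u = hmac.new(password, u, hashlib.sha256).digest()
        t ^= int.from_bytes(u, "big")
    return t.to_bytes(32, "big")


password, salt, iterations = b"password", b"salt" * 8, 1000
full = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, 64)

# Each 256-bit half can be computed without ever touching the other.
assert full[:32] == pbkdf2_block(password, salt, iterations, 1)
assert full[32:] == pbkdf2_block(password, salt, iterations, 2)
```

An attacker who only needs the first half therefore pays for one block, while the legitimate user pays for two.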

2.2.2 Non-Password Key Derivation

Non-password key derivation is used to derive multiple keys from an existing source of cryptographically strong keying material; appropriate source material includes keys generated from a secure PRNG or derived from PBKDF2.

There are two mechanisms that may be used in VDE to derive multiple keys from an existing source of cryptographically strong keying material:

  • Non-overlapping segments of an existing key may be used to derive 2 or more keys, limited by the size of the original keying material.
  • HKDF may be used to derive any number of sub-keys.

Both mechanisms are used in different contexts; the use of non-overlapping segments is primarily useful for allowing derivation of both an encryption key and an HMAC secret for use with ETM-AEAD.

Extraction of Disjoint Key Material (DKM)

Given input key material of length n, two or more keys (or other types of secret parameters, such as a secret IV) may be derived from the input key by extracting disjoint (**non-overlapping**) segments of the key material, not to exceed n in total length. This conforms to the recommendations in NIST SP-800-108 Section 7.3, Converting Keying Material to Cryptographic Keys.

For example, to derive both an encryption and authentication key from an existing 512-bit key:

  • The first 256 bits may be used directly as a 256-bit symmetric encryption key.
  • The latter 256 bits may be used directly as a 256-bit authentication (e.g., HMAC) secret.
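
As a small illustration, the split can be written in Python as below. The helper name split_disjoint is hypothetical; it simply walks the source material left to right so that no byte is handed out twice.

```python
import os


def split_disjoint(key_material: bytes, *lengths: int) -> list[bytes]:
    """Cut `key_material` into consecutive, non-overlapping segments."""
    if any(n <= 0 for n in lengths):
        raise ValueError("segment lengths must be positive")
    if sum(lengths) > len(key_material):
        raise ValueError("requested segments exceed the source key material")
    segments, offset = [], 0
    for n in lengths:
        segments.append(key_material[offset:offset + n])
        offset += n   # the next segment starts where this one ended
    return segments


source = os.urandom(64)                            # 512-bit source key
enc_key, mac_key = split_disjoint(source, 32, 32)  # encryption key, HMAC secret
```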

Note the following requirements, as per NIST SP-800-108:

  • The segments from which keys are derived must not overlap; that is, no element of the source keying material may be used for more than a single sub-key.
  • As implied by the first requirement, the total length of the derived sub-keys must not exceed the total length of the source keying material.

HMAC-based Extract-and-Expand Key Derivation Function (HKDF)

HKDF may be used to derive an arbitrary number of sub-keys from existing cryptographically strong keying material; appropriate source material includes keys generated from a secure PRNG or derived from PBKDF2.

VDE extends the HKDF specification with the following implementation requirements:

  • A non-empty salt value is required; as per the recommendations in RFC 5869, the salt length MUST be equal to the HKDF hash function's output length, and a single salt MAY be re-used when deriving multiple keys from a single set of source keying material. The salt value may either be randomly generated, in which case it may be considered a public value, or it may be derived using a VDE-approved KDF.
  • The HKDF input key material must be generated using a VDE-approved key generation or derivation mechanism. Additionally, any non-derived keys (such as randomly generated keys) used as input key material must be protected using an authenticated encryption scheme — see also Section 2.2.3, Generated Random Keys.
  • A non-empty info parameter must be supplied to HKDF; this may be any arbitrary set of octets, but the value MUST be unique for a given sub-key instantiation so as to guarantee that sub-key values are not re-used.
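
The salt and info requirements above can be sketched on top of a plain RFC 5869 HKDF-SHA256, written here with Python's hmac module. The function name vde_hkdf, the 32-byte default output length, and the two info strings in the usage lines are illustrative assumptions, not values defined by this specification.

```python
import hashlib
import hmac
import os

HASH_LEN = hashlib.sha256().digest_size   # 32 bytes


def vde_hkdf(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF-SHA256 with the additional salt and info checks described above."""
    if len(salt) != HASH_LEN:
        raise ValueError("salt length must equal the hash output length")
    if not info:
        raise ValueError("a non-empty info parameter is required")
    if not 0 < length <= 255 * HASH_LEN:
        raise ValueError("invalid output length")
    # Extract: PRK = HMAC(salt, IKM)
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i)
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


ikm = os.urandom(64)    # stand-in for existing strong keying material
salt = os.urandom(32)   # one salt may be shared across sub-keys
key_a = vde_hkdf(ikm, salt, b"example-key-a")
key_b = vde_hkdf(ikm, salt, b"example-key-b")
assert key_a != key_b   # distinct info values yield distinct sub-keys
```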

2.2.3 Generated Random Keys

A VDE key may be generated directly from a cryptographically strong (P)RNG; the resulting key must be protected via an authenticated encryption scheme.

3. VDE Serialization

This section defines VDE's key and data serialization formats and on-disk storage mechanisms. VDE item serialization is optimized for use in file-based storage of atomically updated byte streams, in a manner suitable for use via VoodooPad's existing document format. VDE serialization leverages the ETM-AEAD algorithm and serialization mechanism as defined by the ETM-AEAD v1 Specification.

VDE item serialization is designed such that:

  • When full document encryption is used, a new Data Protection Key (*DPK*) must be generated randomly for every encryption operation; the DPK is to be protected with a key securely derived from the document's current password. This ensures that access to a previous version of a document does not expose decryption keys that may be used to decrypt later versions of that document.
  • All data necessary to perform decryption and authentication of a file — including the DPK — is stored within that file; no interdependent state exists across multiple files. This is necessary to support the use of asynchronous distributed file synchronization systems, such as Dropbox, where atomicity is only guaranteed at the level of individual files.
  • Any metadata that may require updating is appended to the serialized output, ensuring that it can be replaced with additional metadata without requiring rewriting of the entire file.

3.1 Full Document Encryption

To implement full encryption of all document data, it is necessary that encryption and authentication be performed on all document resources. The following is an itemized list of resource types within a VoodooPad document:

  • Document metadata, located at the top-level of the document bundle:
    • tags.plist: Page tags.
    • properties.plist: Document-level properties and preferences.
    • storeinfo.plist: Document format metadata, including the document's UUID, compatibility version, and legacy VoodooPad encryption metadata.
    • collections.plist: Page collections (represents the relationship between pages and hierarchical collections).
    • vde.plist: Encryption session and parameter information for full-document cryptography (refer to section 3.1.1).
  • Page data, located in a pool of subdirectories within pages/[a-z0-9]/:
    • <page uuid>.plist: Per-page metadata
    • <page uuid>: Page data

The session data in vde.plist contains only non-private key derivation parameters, and is both unencrypted and unauthenticated. All other resources are encrypted and authenticated using the Item Encryption mechanism defined in Section 3.3.

Warning: Locally-generated unique resource identifiers (UUIDs) are exposed to potential attackers via the document's page resource file names. While these identifiers are randomly generated and should not expose significant information about a document's contents, we still consider this an information leak to be addressed in a future (and necessarily non-backwards-compatible) iteration of the VoodooPad document format.

In addition to the document resources described above, VoodooPad maintains a local cache of document data. This data is stored within a document-unique subdirectory of ~/Library/Caches/com.flyingmeat.VoodooPad5; the document-unique cache directory name is derived using the mechanism specified in Appendix B.

The document-specific cache directory contains:

  • sk.index: A SearchKit-generated index over all pages within a VoodooPad document.
  • store.vpsqlite: An SQLite-based cache of document and page metadata.

The sk.index search index is encrypted and authenticated using the Item Encryption mechanism defined in Section 3.3.

The store.vpsqlite index is encrypted and authenticated with SQLite SEE's implementation of AES-128-CCM. For an overview of SQLite SEE, refer to VoodooPad Crypto Overview, Appendix A.

3.1.1 Initial Keying and Encryption

To perform full document encryption, multiple keys are first derived from the user-supplied document password:

  • A 512-bit Document Master Key (*DMK*) is derived from the user-supplied password via PBKDF2-SHA512, using a randomly generated salt (see Section 2.2.1). The DMK is only used as input to HKDF to generate additional subkeys.
  • Two sub-keys are generated from the DMK via HKDF-SHA256, using the DMK as the HKDF input key material, a randomly generated salt (see Section 2.2.2), and an ASCII, non-NUL-terminated HKDF info parameter:
    • Info 'MK-SUBKEY' - Used to encrypt and authenticate randomly generated data protection sub-keys.
    • Info 'MK-SQLITE' - Used with SQLite SEE to encrypt and authenticate the document data cache.
  • Data Protection Keys (*DPK*) are used to perform encryption and authentication of document data (other than SQLite). These keys are randomly generated, never re-used, and encrypted and authenticated using the MK-SUBKEY key.
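
The derivation chain for the DMK and its two sub-keys might look like the following sketch. The function names are hypothetical, and the 256-bit sub-key length is an assumption, since this section does not state the sub-key size. Generation and wrapping of the DPK are omitted because they depend on ETM-AEAD, which is defined in a separate specification.

```python
import hashlib
import hmac
import os


def hkdf_sha256_one_block(ikm: bytes, salt: bytes, info: bytes) -> bytes:
    """HKDF-SHA256 (RFC 5869) restricted to a single 32-byte output block."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()              # extract
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()   # expand, T(1)


def derive_document_keys(spe: bytes, pbkdf2_salt: bytes, hkdf_salt: bytes,
                         iterations: int = 40_000) -> dict[str, bytes]:
    # 512-bit DMK: exactly one native PBKDF2-HMAC-SHA512 block.
    dmk = hashlib.pbkdf2_hmac("sha512", spe, pbkdf2_salt, iterations, 64)
    # The DMK is used only as HKDF input; each info string yields one sub-key.
    return {
        "MK-SUBKEY": hkdf_sha256_one_block(dmk, hkdf_salt, b"MK-SUBKEY"),
        "MK-SQLITE": hkdf_sha256_one_block(dmk, hkdf_salt, b"MK-SQLITE"),
    }


keys = derive_document_keys(b"example password", os.urandom(32), os.urandom(32))
```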

The public parameters necessary to perform key derivation of the DMK and its subkeys are stored in an unencrypted and unauthenticated binary property list, vde.plist, located at the top-level of the VoodooPad document directory. This property list contains a top-level dictionary, and the structure shall be defined as follows (expressed in plist pseudo-code):

{
    "compat_version" : NSNumber
    "feature_version" : NSNumber

    "kdf" : {
        "pbkdf2_salt" : NSData
        "pbkdf2_iterations" : NSNumber
        "hkdf_salt" : NSData
    }
}

  • The compatibility version is serialized as an NSNumber value with the key crypto.compat_version. The current compatibility version is 1, and will be incremented should an incompatible change to the vde.plist crypto serialization be adopted. As such, implementations MUST reject unknown compatibility versions.
  • The feature version is serialized as an NSNumber value with the key crypto.feature_version. The current feature version is 1, and may be incremented to denote the adoption of new features; implementations MUST reject feature versions that are less than the specified compatibility version.
  • The DMK's PBKDF2 salt is serialized as an NSData value, with the key crypto.kdf.pbkdf2_salt.
  • The DMK's PBKDF2 iteration count is serialized as an NSNumber value with the key crypto.kdf.pbkdf2_iterations. If a document specifies an iteration count of less than 40k, the document should be immediately re-keyed with a newly derived DMK, using a PBKDF2 iteration count of 40k or more. The minimum PBKDF2 iteration count may be increased in future updates.
  • The HKDF salt used to derive MK-SUBKEY and MK-SQLITE is serialized as an NSData value, with the key crypto.kdf.hkdf_salt.
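
For illustration, the property list can be produced and validated with Python's plistlib. This sketch follows the key layout of the pseudo-code above (a top-level dictionary with a nested "kdf" dictionary); whether the on-disk keys carry a crypto. prefix, as the field descriptions suggest, is not settled by this section, so treat the exact key paths and the function names as assumptions.

```python
import os
import plistlib

SUPPORTED_COMPAT_VERSION = 1
MIN_PBKDF2_ITERATIONS = 40_000


def build_vde_plist(pbkdf2_salt: bytes, pbkdf2_iterations: int, hkdf_salt: bytes) -> bytes:
    """Serialize the public KDF parameters as a binary property list."""
    return plistlib.dumps({
        "compat_version": 1,
        "feature_version": 1,
        "kdf": {
            "pbkdf2_salt": pbkdf2_salt,          # bytes are written as plist data
            "pbkdf2_iterations": pbkdf2_iterations,
            "hkdf_salt": hkdf_salt,
        },
    }, fmt=plistlib.FMT_BINARY)


def load_vde_plist(data: bytes) -> tuple[dict, bool]:
    """Return (kdf parameters, needs_rekey) or raise ValueError."""
    params = plistlib.loads(data)
    if params["compat_version"] != SUPPORTED_COMPAT_VERSION:
        raise ValueError("unknown compatibility version")
    if params["feature_version"] < params["compat_version"]:
        raise ValueError("feature version is below the compatibility version")
    kdf = params["kdf"]
    # A low iteration count is not fatal, but signals that a re-key is due.
    return kdf, kdf["pbkdf2_iterations"] < MIN_PBKDF2_ITERATIONS


blob = build_vde_plist(os.urandom(32), 40_000, os.urandom(32))
kdf, needs_rekey = load_vde_plist(blob)
```

Because vde.plist is unauthenticated, a reader should treat every value in it as untrusted input.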

These keys and parameters are then used to perform encryption of all resource types within the VoodooPad Document:

  • A new, random DPK is generated for each resource to be encrypted (see Section 2.2.3, Generated Random Keys). The DPK — protected by the MK-SUBKEY key — is used as the VDE Item Encryption key (see Section 3.3) to perform encryption and authentication of the document resource.
  • The MK-SQLITE key is provided to SQLite, and is used to encrypt the store.vpsqlite document index.

3.1.2 Document Re-keying

Upon requesting a password change, a new set of keys is derived and the parameters are saved to vde.plist, as documented in Section 3.1.1. These new keys are then used to re-key or re-encrypt all document resources:

  • For each resource, the per-resource DPK is decrypted using the document's previous MK-SUBKEY key, and then re-encrypted using the new MK-SUBKEY key. The new DPK is written out to the encrypted resource file. Refer to Section 3.3, VDE Item Encryption, for more information on the serialization mechanism.
  • The MK-SQLITE key is provided to SQLite, and is used to perform immediate re-encryption of the store.vpsqlite document index.

The re-keying mechanism is intended to be resilient — with additional user input — to early termination and file-based synchronization issues. Should a document resource be found that cannot be decrypted with the current MK-SUBKEY key, the user must be prompted for a previous password (or passwords) with which the item may have been encrypted. If provided, the item can be immediately re-keyed as normal. If not provided, the resource should be treated as temporarily unavailable.

3.2 Single Page Encryption

Single page encryption supports the use of a per-page (rather than per-document) password, used to encrypt per-page resources within the document; the following is an itemized list of per-page resource types within a VoodooPad document, and their usage and encryption requirements:

  • Page data, located in a pool of subdirectories within pages/[a-z0-9]/:
    • <page uuid>.plist: Per-page metadata
    • <page uuid>: Page data

In addition, locally cached data is stored within a document-unique subdirectory of ~/Library/Caches/com.flyingmeat.VoodooPad5. Each document stores cache files within a directory stored in this location, keyed to the document. The document-specific cache directory contains:

  • sk.index: A SearchKit-generated index over all pages within a VoodooPad document.
  • store.vpsqlite: An SQLite-based cache of document and page metadata.

All page resources are encrypted using the Item Encryption mechanism defined in Section 3.3, using a keying mechanism similar to that of full-document encryption:

  • A 512-bit Page Master Key (*PMK*) is derived from the user-supplied password via PBKDF2-SHA512, using a randomly generated salt (see Section 2.2.1). The PMK is only used as input to HKDF to generate an additional subkey.
  • A single sub-key, MK-SUBKEY, is generated from the PMK via HKDF-SHA256, using the PMK as the HKDF input key material, a randomly generated salt (see Section 2.2.2), and an ASCII, non-NUL-terminated HKDF info parameter of 'MK-SUBKEY'.
  • Page Protection Keys (*PPK*) are used to perform encryption and authentication of the actual page data. These keys are randomly generated, never re-used, and encrypted and authenticated using the MK-SUBKEY key.

These keys and parameters are then used to perform encryption of all per-page resources within the VoodooPad Document:

  • A new, random PPK is generated for each resource to be encrypted (see Section 2.2.3, Generated Random Keys). The PPK — protected by the MK-SUBKEY key — is used as the VDE Item Encryption key (see Section 3.3) to perform encryption and authentication of the resource being encrypted.

Encrypted pages are excluded from the document search index. This prevents the exposure of encrypted data in the unencrypted search index.

Encrypted pages are included in the document's SQLite index in ~/Library/Caches, which – barring the use of full-document encryption – will be stored unencrypted and unauthenticated.

The following per-page data is exposed as unauthenticated plaintext through this index:

  • The page UUID.
  • The page name, both internal and display names.
  • Modification and creation dates.
  • Data type.
  • A version number, which starts at 1 and is incremented as changes to the item are saved.
  • Categories the page belongs to.
  • Page alias names.
  • User set page metadata ("Page Meta" in the inspector palette).

In addition, unique resource identifiers (UUIDs) are exposed to potential attackers via the document's page resource file names. These are considered a minor information leak, and will be addressed in a future and necessarily non-backwards-compatible iteration of the VoodooPad document format.

3.3 VDE Item Encryption

A single VDE item is comprised of three serialized elements:

  • At the beginning of a VDE item is a file header. This identifies the file as a VDE-encrypted item, and provides compatibility and feature versions defining the behavior of the original encoder.
  • Following the header is the item's ETM-AEAD data. This section uses the ETM-AEAD serialization format as defined by the ETM-AEAD v1 Specification.
  • The last file section is the VDE_SESSION footer. This section is always placed at the trailing end of the VDE item data, and contains the necessary session parameters to perform decryption and/or derivation of the keying material used for the decryption and authentication of the ETM-AEAD-protected data.

The VDE item serialization format is presented in ABNF notation, as specified in RFC 5234. Terminal rules are specified in Appendix A; these are binary encoded, and all multi-byte integers MUST be stored in little-endian format.

A VDE item is encoded as follows:

vde_item = vde_header
           *OCTET       ; Opaque and implementation-defined padding.
           encrypted_data
           *OCTET       ; Opaque and implementation-defined padding.
           vde_session

The VDE item header structure is encoded as follows:

vde_header = file_magic         ; The VDE file header
             file_version
             vde_encrypted_sect
             vde_session_sect

file_magic = 'vpvde'                ; File 'magic', used to detect VP-VDE files
file_version = record_version       ; The compatibility and feature versions for this VDE
                                    ; item. The current compatibility version is 1.

vde_encrypted_sect = section        ; The location and length of the ETM-AEAD data,
                                    ; encoded as per the optional serialization
                                    ; mechanism defined in the ETM-AEAD-v1 Specification

vde_session_sect = section          ; The location and length of the vde_session data.
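
As a rough illustration of the header layout, the fixed-size fields can be packed and parsed with Python's struct module. The names VDE_HEADER, pack_header, and unpack_header are hypothetical. The sketch carries section offsets as raw integers and does not interpret what they are relative to.

```python
import struct

FILE_MAGIC = b"vpvde"
# "<" selects little-endian with no alignment padding:
# 5-byte magic, UINT8 compat, UINT8 feature, then two (offset, length) UINT64 pairs.
VDE_HEADER = struct.Struct("<5sBBQQQQ")


def pack_header(encrypted_sect: tuple[int, int], session_sect: tuple[int, int],
                compat: int = 1, feature: int = 1) -> bytes:
    """Encode file_magic, file_version, and the two section descriptors."""
    return VDE_HEADER.pack(FILE_MAGIC, compat, feature, *encrypted_sect, *session_sect)


def unpack_header(buf: bytes) -> dict:
    """Decode and validate a header found at the start of `buf`."""
    magic, compat, feature, e_off, e_len, s_off, s_len = VDE_HEADER.unpack_from(buf, 0)
    if magic != FILE_MAGIC:
        raise ValueError("not a VDE item")
    if compat != 1:
        raise ValueError("unknown compatibility version")
    if feature < compat:
        raise ValueError("feature version is below the compatibility version")
    return {
        "feature": feature,
        "encrypted_sect": (e_off, e_len),   # (offset, length) of the ETM-AEAD data
        "session_sect": (s_off, s_len),     # (offset, length) of vde_session
    }


header = pack_header((39, 100), (139, 50))
info = unpack_header(header)
```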

The final file section contains the VDE item session data, encoded as follows:

vde_session = session_version           ; Session data required to decrypt and
                                        ; authenticate this VDE item.
              session_params


session_version = record_version        ; The compatibility and feature versions for this
                                        ; VDE session data. Session data may be updated
                                        ; independently of the vde_header, and is thus
                                        ; independently versioned.
                                        ; The current compatibility version is 1.

session_params = mk_params              ; Keys and parameters used to encrypt this
                                        ; document: the *DPK* used for
                                        ; encryption/authentication, and the *PBKDF2* 
                 dpk                    ; and *HKDF* parameters required to derive the 
                                        ; *PMK* or *DMK* used to protect the *DPK*.


dpk = byte_array                        ; A data protection key, encoded as per the
                                        ; ETM-AEAD-v1 specification.
                                        ;
                                        ; This must be encrypted and authenticated
                                        ; using the *MK-SUBKEY*.

mk_params = pbkdf2_params               ; Parameters required for derivation of a Document
            hkdf_params                 ; or Page Master Key, and its HKDF-derived subkeys.
                                        ;
                                        ; When using full-document encryption, this serves
                                        ; as a persistent copy of the `vde.plist`-defined 
                                        ; *DMK* PBKDF2 and HKDF parameters. If the
                                        ; document is rekeyed and vde.plist overwritten
                                        ; without rekeying an individual resource, these
                                        ; cached parameters may be used -- in combination
                                        ; with the correct user-supplied password -- to
                                        ; recover the *DMK* used to protect this item.
                                        ;
                                        ; When using per-page encryption, these parameters
                                        ; serve as the canonical source for the *PMK*
                                        ; PBKDF2 and HKDF parameters.


hkdf_params = hkdf_salt                 ; HKDF parameters  
hkdf_salt = byte_array                  ; The HKDF salt used to derive subkeys.


pbkdf2_params = pbkdf2_iters            ; PBKDF2 parameters.
                pbkdf2_salt
pbkdf2_iters = UINT32                   ; PBKDF2 iterations.
pbkdf2_salt = byte_array                ; PBKDF2 salt.
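
The session footer can be sketched the same way, reading the rules above in order: session_version, then pbkdf2_iters, pbkdf2_salt, hkdf_salt, and finally dpk, with every byte_array prefixed by a little-endian UINT32 length (see Appendix A). The function names are hypothetical, and the dpk is treated as an opaque, already-wrapped blob.

```python
import struct


def pack_byte_array(data: bytes) -> bytes:
    """byte_array = byte_length (UINT32, little-endian) followed by the bytes."""
    return struct.pack("<I", len(data)) + data


def unpack_byte_array(buf: bytes, pos: int) -> tuple[bytes, int]:
    (n,) = struct.unpack_from("<I", buf, pos)
    start, end = pos + 4, pos + 4 + n
    if end > len(buf):
        raise ValueError("truncated byte_array")
    return buf[start:end], end


def pack_session(iters: int, pbkdf2_salt: bytes, hkdf_salt: bytes,
                 wrapped_dpk: bytes, compat: int = 1, feature: int = 1) -> bytes:
    return (struct.pack("<BB", compat, feature)   # session_version
            + struct.pack("<I", iters)            # pbkdf2_iters
            + pack_byte_array(pbkdf2_salt)
            + pack_byte_array(hkdf_salt)
            + pack_byte_array(wrapped_dpk))       # dpk, already encrypted


def unpack_session(buf: bytes) -> dict:
    compat, feature = struct.unpack_from("<BB", buf, 0)
    if compat != 1:
        raise ValueError("unknown session compatibility version")
    (iters,) = struct.unpack_from("<I", buf, 2)
    pbkdf2_salt, pos = unpack_byte_array(buf, 6)
    hkdf_salt, pos = unpack_byte_array(buf, pos)
    dpk, pos = unpack_byte_array(buf, pos)
    return {"pbkdf2_iters": iters, "pbkdf2_salt": pbkdf2_salt,
            "hkdf_salt": hkdf_salt, "dpk": dpk}


blob = pack_session(40_000, b"\x01\x02", b"\x03", b"\x04\x05\x06")
session = unpack_session(blob)
```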

Appendix A: ABNF Common Rules

This appendix defines a set of shared ABNF rules used in the definition of the VDE serialization formats. It extends the Core ABNF rules as defined in RFC 5234.

Encoding Notes:

  • All multi-byte integer rules require little endian encoding.
  • All string constants assume direct 8-bit ASCII encoding, without a trailing NUL.
  • Concatenation of ABNF terminals is performed directly; there is no padding of VDE serialization data unless otherwise specified.

ABNF Common Rules:

UINT8   = OCTET                 ; Unsigned 8-bit Integer
UINT16  = 2OCTET                ; Unsigned 16-bit Integer
UINT32  = 4OCTET                ; Unsigned 32-bit Integer
UINT64  = 8OCTET                ; Unsigned 64-bit Integer

record_version = record_compat_version  ; A compatibility and feature version pair. These
                                        ; specify the earliest serialization implementation
                 record_feature_version ; that can successfully decode the record, and the
                                        ; feature level of the encoder that produced the
                                        ; record, respectively.
                                        ;
                                        ; The feature version MUST be greater than or equal
                                        ; to the compatibility version.
                                        ;
                                        ; Implementors MUST ignore unknown feature_version
                                        ; values.
                                        ; Implementors MUST reject unknown compat_version
                                        ; values.

record_compat_version = UINT8    ; A record compatibility version number
record_feature_version = UINT8   ; A record feature version.

section = section_offset        ; An offset and length to a data section.
          section_length

section_offset = UINT64         ; Section offset in bytes, relative to the start of the
                                ; enclosing section declaration
section_length = UINT64         ; Section length in bytes.


byte_array = byte_length        ; Variable length byte array
             bytes

byte_length = UINT32            ; Length, in bytes, of a variable length byte array
bytes = 0*4294967295OCTET       ; Byte storage for a variable length byte array. Length
                                ; may not exceed the maximum value of byte_length (UINT32_MAX).

Appendix B: Computing Document-Unique Cache Directory Paths

VoodooPad maintains a local cache of document metadata (including a search index) in document-unique subdirectories of ~/Library/Caches/com.flyingmeat.VoodooPad5.

While the mechanism used to derive the document-unique path name is subject to change in future releases, we've included a description of this mechanism to aid in the examination of the document encryption implementation.

The relative document-unique cache directory name is computed as follows:

  • The VoodooPad document's absolute path is normalized as per the mechanism described in Apple's -[NSString stringByStandardizingPath].
  • The path is then normalized as per Unicode Normalization Form D, defined in Unicode TR15.
  • The normalized Unicode path is encoded as a NUL-terminated UTF-8 byte string, generating a stable path encoding. A Unicode byte-order-mark is not included in the stable path encoding.
  • A SHA-1 hash is computed over the normalized UTF-8 byte string; the resulting hash is hex-encoded as ASCII, using the lowercase [0-9a-f] alphabet.
  • A prefix of cache- is prepended to the ASCII-encoded SHA-1 hash, producing the local document-unique cache directory name.
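
The steps above can be approximated in Python as shown below. Two points are assumptions rather than facts from this appendix: os.path.normpath with os.path.expanduser is only a rough stand-in for -[NSString stringByStandardizingPath], and the text does not say whether the terminating NUL is included in the hashed bytes, so the sketch exposes that choice as a parameter. The function name is hypothetical.

```python
import hashlib
import os.path
import unicodedata


def cache_directory_name(document_path: str, include_nul: bool = True) -> str:
    """Approximate the document-unique cache directory name."""
    # Stand-in for stringByStandardizingPath: expand "~" and collapse "." and "..".
    standardized = os.path.normpath(os.path.expanduser(document_path))
    # Normalization Form D, then UTF-8 without a byte-order mark.
    encoded = unicodedata.normalize("NFD", standardized).encode("utf-8")
    if include_nul:
        encoded += b"\x00"   # assumption: the NUL terminator is part of the hash input
    # hexdigest() uses the lowercase [0-9a-f] alphabet.
    return "cache-" + hashlib.sha1(encoded).hexdigest()


name = cache_directory_name("/Users/example/Documents/Notes.vpdoc")
```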