VoodooPad Document Encryption


1. Introduction

As documented in the Cryptography Overview, VoodooPad supports two modes of document encryption: full-document encryption and single-page encryption.



This specification defines the file formats, keying mechanisms, and algorithm parameters that comprise VoodooPad Document Encryption (VDE), supporting both full-document and single-page encryption.


2. Cryptographic Primitives

Encryption and authentication of document data are performed using the `ETM-AEAD` cipher, defined in the ETM-AEAD-v1 specification. ETM-AEAD combines an unauthenticated block cipher mode (AES-CBC with PKCS#7 padding) with encrypt-then-MAC HMAC authentication; it was designed to use primitives (AES-CBC and HMAC) available in all platform cryptography libraries.

2.1 Algorithm Parameters

The ETM-AEAD algorithm specifies the use of AES-CBC and HMAC; the choice of AES key size, HMAC hash algorithm, and keying mechanisms is left to the implementor.

In this section, we specify the exact algorithms, parameters, and keying mechanisms used by VoodooPad Document Encryption (VDE) in conjunction with ETM-AEAD.



These choices are based on industry prevalence, availability of implementations, and the NIST recommendations and standards applicable to VoodooPad's use cases.


2.2 Key Generation and Derivation

VDE's keying mechanisms are designed to conform to NIST's Recommendation for Password-Based Key Derivation (SP-800-132) and Recommendation for Key Derivation Using Pseudorandom Functions (SP-800-108). VDE currently supports derivation or generation of keying material via three mechanisms: password-based key derivation (Section 2.2.1), non-password key derivation (Section 2.2.2), and generated random keys (Section 2.2.3).



Any single VDE key MUST be used for one and only one purpose, i.e., one of:



2.2.1 Password-Based Key Derivation

VDE password-based key derivation is performed using PBKDF2-HMAC. Since PBKDF2 operates on a byte-level representation of a password, a stable normalization mechanism is required so that every equivalent string representation of a given password produces the same byte sequence.


When performing password-based key derivation, Unicode TR15's Normalization Process for Stabilized Strings must be used to generate a Stable Password Encoding (SPE) suitable for use with PBKDF2:
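The normalization step might be sketched as follows in Python. This is only an illustration: the function name `stable_password_encoding` and the default choice of NFKC as the normalization form are assumptions, not values taken from this specification. As we understand TR15, the process for stabilized strings rejects input containing unassigned code points before normalizing, which is what the check below approximates using the Unicode version bundled with the interpreter.

```python
import unicodedata


def stable_password_encoding(password: str, form: str = "NFKC") -> bytes:
    """Return a byte-stable UTF-8 encoding of a password (illustrative only).

    The normalization form is an assumption; substitute the form that the
    specification actually requires.
    """
    for ch in password:
        # "Cn" is the general category of unassigned code points.
        if unicodedata.category(ch) == "Cn":
            raise ValueError("password contains an unassigned code point")
    return unicodedata.normalize(form, password).encode("utf-8")


# Precomposed and decomposed spellings yield identical bytes.
composed = stable_password_encoding("caf\u00e9")
decomposed = stable_password_encoding("cafe\u0301")
```

With this approach, a password typed on two keyboards that emit different but canonically equivalent code point sequences still derives the same key.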



Given the normalized, UTF-8 encoded password, key derivation may then be performed via PBKDF2-HMAC:
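A minimal sketch of this derivation step, using Python's standard `hashlib.pbkdf2_hmac`, is shown here. The hash function (SHA-512), the 16-byte salt, and the iteration count are assumptions chosen for illustration; the function name `derive_master_key` is likewise hypothetical.

```python
import hashlib
import os


def derive_master_key(spe: bytes, salt: bytes, iterations: int) -> bytes:
    """Derive keying material from a Stable Password Encoding (illustrative).

    dklen is left unset so that exactly one native PBKDF2 block
    (64 bytes for HMAC-SHA512) is produced.
    """
    return hashlib.pbkdf2_hmac("sha512", spe, salt, iterations)


# Assumed parameters, for illustration only.
example_salt = os.urandom(16)
example_key = derive_master_key(b"example password", example_salt, 100_000)
```

Leaving `dklen` unset requests only the hash's native output length, which is consistent with the implementation note on safely deriving two keys.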



A future update to this specification may include support for automatic upwards adjustment of PBKDF2 rounds based on the results of local device calibration, to help ensure that VDE password-based key derivation keeps pace with hardware advances. This feature has been excluded from this release, as further consideration must be given to how locally calibrated values behave when documents are shared across hardware with widely disparate performance profiles, such as a desktop computer and a mobile device.


Additionally, we plan to investigate the use of scrypt, which is designed to be both computationally and memory intensive, increasing the costs involved in GPU or specialized ASIC-based password cracking.


Implementation Note: Safely Deriving Two Keys

It is important to note that we intentionally do not derive more data than is "natively" available via PBKDF2-HMAC's block size (the output length of the underlying HMAC); to do so would increase our total work cost, but not necessarily that of an attacker. To understand why, one must look at how PBKDF2 produces its output.


To derive additional blocks, PBKDF2 runs the full key derivation once per block, with the block's index (1, 2, ...) appended to the salt. Derivation of later blocks is not dependent on the output of earlier derivations, which means that each block of output can be derived independently of the others.


If we used PBKDF2-HMAC-SHA256 and requested 512 bits of output, PBKDF2 would thus be run twice, concatenating the 256 bits generated from each block of output to produce the final 512-bit result. This would double our work load by doubling the number of key derivations that must be run, requiring that we decrease the total number of PBKDF2 iterations to keep the CPU cost within acceptable bounds.


An attacker, however, does not necessarily need both of the keys. If we use only the first 256 bits for an encryption key, and the latter 256 bits for an authentication key, the attacker can simply skip the second block derivation, which produces only the authentication key and is not needed to perform decryption. As a result, the attacker would need to do only half as much work per password guess as a legitimate user -- a significant improvement in the runtime of a brute-force password cracker.


As such, we're careful to request no more than the native block size when using PBKDF2 key derivation in VDE.
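The block independence described in this note can be checked directly. The sketch below computes a single PBKDF2-HMAC-SHA256 block by hand and compares it against `hashlib.pbkdf2_hmac`; the password, salt, and iteration count are arbitrary example values, and `pbkdf2_block` is a helper written for this illustration.

```python
import hashlib
import hmac
import struct


def pbkdf2_block(password: bytes, salt: bytes, iterations: int, index: int) -> bytes:
    """Compute one PBKDF2-HMAC-SHA256 output block; index is 1-based."""
    # U_1 = HMAC(password, salt || INT_32_BE(index))
    u = hmac.new(password, salt + struct.pack(">I", index), "sha256").digest()
    block = int.from_bytes(u, "big")
    for _ in range(iterations - 1):
        # U_j = HMAC(password, U_{j-1}); the block is the XOR of all U_j.
        u = hmac.new(password, u, "sha256").digest()
        block ^= int.from_bytes(u, "big")
    return block.to_bytes(32, "big")


password, salt, iterations = b"correct horse", b"example-salt", 1000

# Requesting 64 bytes costs two full derivations...
full = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=64)
# ...but the first 32 bytes are obtainable with just one.
first = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)
second = pbkdf2_block(password, salt, iterations, 2)
```

Because `first` equals the leading half of `full`, and `second` can be computed without ever computing `first`, an attacker who needs only one of the two halves pays for only one derivation.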


2.2.2 Non-Password Key Derivation

Non-password key derivation is used to derive multiple keys from an existing source of cryptographically strong keying material; appropriate source material includes keys generated from a secure PRNG or derived from PBKDF2.


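There are two mechanisms that may be used in VDE to derive multiple keys from an existing source of cryptographically strong keying material: extraction of disjoint key material (DKM) and the HMAC-based Extract-and-Expand Key Derivation Function (HKDF).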



The two mechanisms are used in different contexts; the extraction of non-overlapping segments is primarily useful for deriving both an encryption key and an HMAC secret for use with ETM-AEAD.


Extraction of Disjoint Key Material (DKM)

Given input key material of length n, two or more keys (or other types of secret parameters, such as a secret IV) may be derived from the input key by extracting disjoint (non-overlapping) segments of the key material, not to exceed n in total length. This conforms to the recommendations in NIST SP-800-108 Section 7.3, Converting Keying Material to Cryptographic Keys.


For example, to derive both an encryption and authentication key from an existing 512-bit key:
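A minimal Python sketch of disjoint extraction follows. The helper name `split_disjoint` is hypothetical, and assigning the first half to encryption and the second half to authentication is an assumption made for illustration.

```python
from typing import List


def split_disjoint(key_material: bytes, *lengths: int) -> List[bytes]:
    """Extract non-overlapping segments from key material (illustrative)."""
    if any(n <= 0 for n in lengths):
        raise ValueError("segment lengths must be positive")
    if sum(lengths) > len(key_material):
        raise ValueError("requested segments exceed the input key length")
    segments, offset = [], 0
    for n in lengths:
        segments.append(key_material[offset:offset + n])
        offset += n
    return segments


# A stand-in 512-bit (64-byte) input key; a real key would come from
# PBKDF2 or a secure random number generator.
input_key = bytes(range(64))
encryption_key, authentication_key = split_disjoint(input_key, 32, 32)
```

The length check enforces the constraint that the extracted segments together never exceed the length of the input key material.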



Note the following requirements, as per NIST SP-800-108:



HMAC-based Extract-and-Expand Key Derivation Function (HKDF)

HKDF may be used to derive an arbitrary number of sub-keys from existing cryptographically strong keying material; appropriate source material includes keys generated from a secure PRNG or derived from PBKDF2.
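For reference, HKDF as defined in RFC 5869 can be sketched with Python's standard `hmac` module. The choice of SHA-256 as the default hash and the `info` labels in the usage lines are assumptions for illustration; they are not the values required by VDE.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes, hash_name: str = "sha256") -> bytes:
    """HKDF-Extract: condense input keying material into a pseudorandom key."""
    if not salt:
        # RFC 5869: an absent salt is treated as HashLen zero bytes.
        salt = bytes(hashlib.new(hash_name).digest_size)
    return hmac.new(salt, ikm, hash_name).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int, hash_name: str = "sha256") -> bytes:
    """HKDF-Expand: stretch a pseudorandom key into `length` bytes of output."""
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]


# Hypothetical usage: two sub-keys separated by distinct info labels.
prk = hkdf_extract(b"example-hkdf-salt", b"\x01" * 64)
subkey_a = hkdf_expand(prk, b"example-label-a", 32)
subkey_b = hkdf_expand(prk, b"example-label-b", 32)
```

Distinct `info` values yield independent sub-keys from the same source key, which is how a single master key can serve several single-purpose keys.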


VDE extends the HKDF specification with the following implementation requirements:



2.2.3 Generated Random Keys

A VDE key may be generated directly from a cryptographically strong (P)RNG; the resulting key must be protected via an authenticated encryption scheme.
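As a small illustration, Python's `secrets` module draws from the operating system's cryptographically strong random source. The 64-byte key length and the function name `generate_key` are assumptions; wrapping the resulting key with an authenticated encryption scheme is not shown.

```python
import secrets


def generate_key(length: int = 64) -> bytes:
    """Generate a random key from the OS CSPRNG (length is an assumption)."""
    return secrets.token_bytes(length)


random_key = generate_key()
```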


3. VDE Serialization

This section defines VDE's key and data serialization formats and on-disk storage mechanisms. VDE item serialization is optimized for use in file-based storage of atomically updated byte streams, in a manner suitable for use via VoodooPad's existing document format. VDE serialization leverages the ETM-AEAD algorithm and serialization mechanism as defined by the ETM-AEAD v1 Specification.


VDE item serialization is designed such that:



3.1 Full Document Encryption

To implement full encryption of all document data, it is necessary that encryption and authentication be performed on all document resources. The following is an itemized list of resource types within a VoodooPad document:


The session data in vde.plist contains only non-private key derivation parameters, and is both unencrypted and unauthenticated. All other resources are encrypted and authenticated using the Item Encryption mechanism defined in Section 3.3.


Warning: Locally-generated unique resource identifiers (UUIDs) are exposed to potential attackers via the document's page resource file names. While these identifiers are randomly generated and should not expose significant information about a document's contents, we still consider this an information leak to be addressed in a future (and necessarily non-backwards-compatible) iteration of the VoodooPad document format.


In addition to the document resources described above, VoodooPad maintains a local cache of document data. This data is stored within a document-unique subdirectory of ~/Library/Caches/com.flyingmeat.VoodooPad5; the document-unique cache directory name is derived using the mechanism specified in Appendix B.

The document-specific cache directory contains:



The sk.index search index is encrypted and authenticated using the Item Encryption mechanism defined in Section 3.3.


The store.vpsqlite index is encrypted and authenticated with SQLite SEE's implementation of AES-128-CCM. For an overview of SQLite SEE, refer to VoodooPad Crypto Overview, Appendix A.


3.1.1 Initial Keying and Encryption

To perform full document encryption, multiple keys are first derived from the user-supplied document password:



The public parameters necessary to perform key derivation of the DMK and its subkeys are stored in an unencrypted and unauthenticated binary property list, vde.plist, located at the top level of the VoodooPad document directory. This property list contains a top-level dictionary, structured as follows (expressed in plist pseudo-code):


{

    "compat_version" : NSNumber

    "feature_version" : NSNumber


    "kdf" : {

        "pbkdf2_salt" : NSData

        "pbkdf2_iterations" : NSNumber

        "hkdf_salt" : NSData

    }

}
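To make the structure concrete, the following Python sketch writes and reads a binary property list with the same keys using the standard `plistlib` module, which maps `bytes` to NSData and `int` to NSNumber. The salt lengths, iteration count, and version values used here are placeholders chosen for illustration, and the function names are hypothetical.

```python
import os
import plistlib


def build_vde_plist(pbkdf2_salt: bytes, pbkdf2_iterations: int, hkdf_salt: bytes,
                    compat_version: int = 1, feature_version: int = 1) -> bytes:
    """Serialize public KDF parameters as a binary plist (illustrative)."""
    return plistlib.dumps(
        {
            "compat_version": compat_version,
            "feature_version": feature_version,
            "kdf": {
                "pbkdf2_salt": pbkdf2_salt,
                "pbkdf2_iterations": pbkdf2_iterations,
                "hkdf_salt": hkdf_salt,
            },
        },
        fmt=plistlib.FMT_BINARY,
    )


def parse_vde_plist(data: bytes) -> dict:
    """Parse the plist and check that the expected keys are present."""
    params = plistlib.loads(data)
    kdf = params["kdf"]
    for key in ("pbkdf2_salt", "pbkdf2_iterations", "hkdf_salt"):
        if key not in kdf:
            raise ValueError("missing KDF parameter: " + key)
    return params


# Placeholder values; real parameters are chosen by the implementation.
encoded = build_vde_plist(os.urandom(16), 100_000, os.urandom(16))
decoded = parse_vde_plist(encoded)
```

Since the file is unauthenticated, a reader should treat every value as untrusted input, for example by bounding the iteration count before running PBKDF2.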



These keys and parameters are then used to perform encryption of all resource types within the VoodooPad Document:



3.1.2 Document Re-keying

Upon requesting a password change, a new set of keys is derived and the corresponding parameters are saved to vde.plist, as documented in Section 3.1.1. These new keys are then used to re-key or re-encrypt all document resources:



The re-keying mechanism is intended to be resilient -- with additional user input -- to early termination and file-based synchronization issues. Should a document resource be found that cannot be decrypted with the current MK-SUBKEY key, the user must be prompted for a previous password (or passwords) with which the item may have been encrypted. If provided, the item can be immediately re-keyed as normal. If not provided, the resource should be treated as temporarily unavailable.
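The recovery flow described above can be outlined as control flow. In this Python sketch, `try_decrypt` and `derive_subkey` are hypothetical caller-supplied callables standing in for item decryption and password-based key derivation, and the three status strings are labels invented for this illustration.

```python
from typing import Callable, Iterable, Optional, Tuple


def unlock_item(
    item: object,
    current_subkey: bytes,
    previous_passwords: Iterable[str],
    derive_subkey: Callable[[str, object], bytes],
    try_decrypt: Callable[[object, bytes], Optional[bytes]],
) -> Tuple[str, Optional[bytes]]:
    """Decrypt an item, falling back to user-supplied previous passwords."""
    plaintext = try_decrypt(item, current_subkey)
    if plaintext is not None:
        return "current", plaintext
    for password in previous_passwords:
        # Derive the old key from the password and the item's own parameters.
        old_subkey = derive_subkey(password, item)
        plaintext = try_decrypt(item, old_subkey)
        if plaintext is not None:
            # The caller should now re-key the item under the current key.
            return "needs_rekey", plaintext
    # No password worked: treat the resource as temporarily unavailable.
    return "unavailable", None
```

Returning a status rather than raising lets the caller keep the rest of the document usable while one resource remains locked.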


3.2 Single Page Encryption

Single page encryption supports the use of a per-page (rather than per-document) password, used to encrypt per-page resources within the document; the following is an itemized list of per-page resource types within a VoodooPad document, along with their usage and encryption requirements:



In addition, locally cached data is stored within a document-unique subdirectory of ~/Library/Caches/com.flyingmeat.VoodooPad5, as described in Section 3.1. The document-specific cache directory contains:



All page resources are encrypted using the Item Encryption mechanism defined in Section 3.3, using a keying mechanism similar to that of full-document encryption:



These keys and parameters are then used to perform encryption of all per-page resources within the VoodooPad Document:



Encrypted pages are excluded from the document search index. This prevents the plaintext of encrypted pages from being exposed via the unencrypted search index.


Encrypted pages are included in the document's SQLite index in ~/Library/Caches, which – barring the use of full-document encryption – will be stored unencrypted and unauthenticated.


The following per-page data is exposed as unauthenticated plaintext through this index:



In addition, unique resource identifiers (UUIDs) are exposed to potential attackers via the document's page resource file names. These are considered a minor information leak, and will be addressed in a future and necessarily non-backwards-compatible iteration of the VoodooPad document format.


3.3 VDE Item Encryption


A single VDE item consists of three serialized elements: a header (vde_header), the encrypted data (encrypted_data), and the session data (vde_session).



The VDE item serialization format is presented in ABNF notation, as specified in RFC 5234. Terminal rules are specified in Appendix A; these are binary encoded, and all multi-byte integers MUST be stored in little-endian format.


A VDE item is encoded as follows:


vde_item = vde_header

           *OCTET       ; Opaque and implementation-defined padding.

           encrypted_data

           *OCTET       ; Opaque and implementation-defined padding.

           vde_session


The VDE item header structure is encoded as follows:


vde_header = file_magic         ; The VDE file header

             file_version

             vde_encrypted_sect

             vde_session_sect


file_magic = 'vpvde'                ; File 'magic', used to detect VP-VDE files

file_version = record_version       ; The compatibility and feature versions for this VDE

                                    ; item. The current compatibility version is 1.


vde_encrypted_sect = section        ; The location and length of the ETM-AEAD data,

                                    ; encoded as per the optional serialization

                                    ; mechanism defined in the ETM-AEAD-v1 Specification


vde_session_sect = section          ; The location and length of the vde_session data.
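As an illustration of the header layout, the Python sketch below parses a `vde_header` with the standard `struct` module. It assumes the fields are packed back to back with no padding, in the order given by the ABNF, with the compatibility version preceding the feature version; the function name and returned dictionary keys are hypothetical. Section offsets are returned as stored, without interpreting what they are relative to.

```python
import struct

VDE_MAGIC = b"vpvde"
SUPPORTED_COMPAT_VERSION = 1

# file_magic (5 bytes), record_version (2 x UINT8), two sections
# (each a UINT64 offset and a UINT64 length), all little-endian.
HEADER = struct.Struct("<5sBBQQQQ")


def parse_vde_header(data: bytes) -> dict:
    """Parse the fixed-size VDE item header (illustrative layout)."""
    if len(data) < HEADER.size:
        raise ValueError("truncated VDE header")
    magic, compat, feature, enc_off, enc_len, ses_off, ses_len = HEADER.unpack_from(data)
    if magic != VDE_MAGIC:
        raise ValueError("not a VDE item")
    if compat != SUPPORTED_COMPAT_VERSION:
        # Unknown compatibility versions must be rejected.
        raise ValueError("unsupported compatibility version")
    # Unknown feature versions are ignored rather than rejected.
    return {
        "feature_version": feature,
        "encrypted_sect": (enc_off, enc_len),
        "session_sect": (ses_off, ses_len),
    }


example_header = HEADER.pack(VDE_MAGIC, 1, 3, 39, 128, 167, 60)
parsed_header = parse_vde_header(example_header)
```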


The final file section contains the VDE item session data, encoded as follows:


vde_session = session_version           ; Session data required to decrypt and

                                        ; authenticate this VDE item.

              session_params


session_version = record_version        ; The compatibility and feature versions for this

                                        ; VDE session data. Session data may be updated

                                        ; independently of the vde_header, and is thus

                                        ; independently versioned.

                                        ; The current compatibility version is 1.


session_params = mk_params              ; Keys and parameters used to encrypt this

                                        ; document: the *DPK* used for

                                        ; encryption/authentication, and the *PBKDF2* 

                 dpk                    ; and *HKDF* parameters required to derive the 

                                        ; *PMK* or *DMK* used to protect the *DPK*.


dpk = byte_array                        ; A data protection key, encoded as per the

                                        ; ETM-AEAD-v1 specification.

                                        ;

                                        ; This must be encrypted and authenticated

                                        ; using the *MK-SUBKEY*


mk_params = pbkdf2_params               ; Parameters required for derivation of a Document

            hkdf_params                 ; or Page Master Key, and its HKDF-derived subkeys.

                                        ;

                                        ; When using full-document encryption, this serves

                                        ; as a persistent copy of the `vde.plist`-defined 

                                        ; *DMK* PBKDF2 and HKDF parameters. If the

                                        ; document is rekeyed and vde.plist overwritten

                                        ; without rekeying an individual resource, these

                                        ; cached parameters may be used -- in combination

                                        ; with the correct user-supplied password -- to

                                        ; recover the *DMK* used to protect this item.

                                        ;

                                        ; When using per-page encryption, these parameters

                                        ; serve as the canonical source for the *PMK*

                                        ; PBKDF2 and HKDF parameters.


hkdf_params = hkdf_salt                 ; HKDF parameters  

hkdf_salt = byte_array                  ; The HKDF salt used to derive subkeys.


pbkdf2_params = pbkdf2_iters            ; PBKDF2 parameters.

                pbkdf2_salt

pbkdf2_iters = UINT32                   ; PBKDF2 iterations.

pbkdf2_salt = byte_array                ; PBKDF2 salt.
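Read in rule order, the session data could be decoded as sketched below in Python. The sketch assumes the fields are packed contiguously with no padding and that the compatibility version precedes the feature version; the function names and dictionary keys are hypothetical. The `dpk` is returned still in its protected ETM-AEAD form.

```python
import struct
from typing import Tuple


def read_byte_array(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Read a UINT32-length-prefixed byte array; return (bytes, next offset)."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("byte_array extends past end of buffer")
    return buf[start:end], end


def parse_vde_session(buf: bytes) -> dict:
    """Decode vde_session = session_version session_params (illustrative)."""
    compat, feature = struct.unpack_from("<BB", buf, 0)
    if compat != 1:
        raise ValueError("unsupported session compatibility version")
    # mk_params = pbkdf2_params hkdf_params, followed by dpk.
    (iterations,) = struct.unpack_from("<I", buf, 2)
    pbkdf2_salt, offset = read_byte_array(buf, 6)
    hkdf_salt, offset = read_byte_array(buf, offset)
    dpk, offset = read_byte_array(buf, offset)
    return {
        "feature_version": feature,
        "pbkdf2_iterations": iterations,
        "pbkdf2_salt": pbkdf2_salt,
        "hkdf_salt": hkdf_salt,
        "dpk": dpk,
    }
```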


Appendix A: ABNF Common Rules

This appendix defines a set of shared ABNF rules used in the definition of the VDE serialization formats. It extends the Core ABNF rules as defined in RFC 5234.


Encoding Notes:



ABNF Common Rules:


UINT8   = OCTET                 ; Unsigned 8-bit Integer

UINT16  = 2OCTET                ; Unsigned 16-bit Integer

UINT32  = 4OCTET                ; Unsigned 32-bit Integer

UINT64  = 8OCTET                ; Unsigned 64-bit Integer


record_version = record_compat_version  ; A compatibility and feature version pair. These

                                        ; specify the earliest serialization implementation

                 record_feature_version ; that can successfully decode the record, and the

                                        ; feature level of the encoder that produced the

                                        ; record, respectively.

                                        ;

                                        ; The feature version MUST be greater than or equal

                                        ; to the compatibility version.

                                        ;

                                        ; Implementors MUST ignore unknown feature_version

                                        ; values.

                                        ; Implementors MUST reject unknown compat_version

                                        ; values.


record_compat_version = UINT8    ; A record compatibility version number

record_feature_version = UINT8   ; A record feature version.


section = section_offset        ; An offset and length to a data section.

          section_length


section_offset = UINT64         ; Section offset in bytes, relative to the start of the

                                ; enclosing section declaration

section_length = UINT64         ; Section length in bytes.


byte_array = byte_length        ; Variable length byte array

             bytes


byte_length = UINT32            ; Length, in bytes, of a variable length byte array

bytes = 0*4294967295OCTET       ; Byte storage for a variable length byte array. Length

                                ; may not exceed the maximum value of byte_length (UINT32_MAX).
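The `byte_array` rule can be illustrated with a small Python encoder and decoder built on the standard `struct` module. The function names are hypothetical; the little-endian UINT32 length prefix follows the encoding requirement stated in Section 3.3.

```python
import struct

UINT32_MAX = 0xFFFFFFFF


def encode_byte_array(payload: bytes) -> bytes:
    """Encode payload as byte_length (UINT32, little-endian) followed by bytes."""
    if len(payload) > UINT32_MAX:
        raise ValueError("payload exceeds UINT32_MAX bytes")
    return struct.pack("<I", len(payload)) + payload


def decode_byte_array(encoded: bytes) -> bytes:
    """Decode a byte_array that starts at the beginning of `encoded`."""
    (length,) = struct.unpack_from("<I", encoded, 0)
    if 4 + length > len(encoded):
        raise ValueError("declared length exceeds available data")
    return encoded[4:4 + length]


example_array = encode_byte_array(b"abc")
```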


Appendix B: Computing Document-Unique Cache Directory Paths

VoodooPad maintains a local cache of document metadata (including a search index) in document-unique subdirectories of ~/Library/Caches/com.flyingmeat.VoodooPad5.


While the mechanism used to derive the document-unique path name is subject to change in future releases, we've included a description of this mechanism to aid in the examination of the document encryption implementation.


The relative document-unique cache directory name is computed as follows: