After Plausible Labs’ acquisition of VoodooPad, a cryptography audit was performed and VoodooPad’s document encryption implementation was found to use weak or improperly employed cryptographic primitives. The discovered issues include weak key derivation, use of known-weak ciphers, predictable RNG seeding, and improper IV re-use.
As a result, an attacker with access to an encrypted document may be able to decrypt the document’s contents without access to the original encryption password.
To resolve these issues, we’ve invested heavily in a complete redesign and rewrite of VoodooPad’s encryption implementation; as of VoodooPad 5.1.4:
- All VoodooPad releases now ship with an encryption implementation based on industry standards and best practices.
- VoodooPad will display a warning upon opening an insecurely encrypted document, and will optionally perform an in-place upgrade of the document’s encryption.
- We have published complete technical specifications documenting VoodooPad’s new encryption implementation; refer to “Additional Resources” below.
Due to the sweeping implementation changes that were required in VoodooPad’s document storage:
- Encrypted documents produced in VoodooPad 5.1.4 and later will not be readable in earlier releases of VoodooPad.
- We are initially releasing VoodooPad 5.1.4 as a public beta to allow for further testing while allowing affected customers to upgrade immediately. We do recommend that affected customers upgrade now.
I’m a VoodooPad user but I don’t use encryption. I didn’t even know VoodooPad had encryption! Do I have anything to worry about?
Nope! This issue only applies to documents using encryption.
I have some encrypted VoodooPad documents. How could this impact me?
Unfortunately, anyone who is able to gain access to your VoodooPad-encrypted documents can potentially decrypt any document that was encrypted with a previous VoodooPad release, even if they do not know the document password.
When using a cloud file or backup service, encrypted VoodooPad documents could be decrypted by anyone with access to your account (including the cloud service provider).
For documents stored locally on your computer, an attacker would require access to your computer, or access to your local files via another means — such as unencrypted backups.
OK — So how do I secure my documents?
The first step is to upgrade to the VoodooPad 5.1.4 Beta release. If you’re a Mac App Store customer, you’ll need to download the beta from the provided link instead of the App Store, but you won’t need to purchase a separate license. If you are an iOS user, reading securely encrypted documents will also require VoodooPad 5.1.4 for iOS, available as a free upgrade via the App Store.
Next, open your encrypted documents (or documents containing encrypted pages) with VoodooPad 5.1.4 on Mac OS X — VoodooPad will offer to upgrade the documents immediately in-place. An unmodified copy of your document will also be placed in the Trash — you may wish to empty the trash after you’ve verified that your document has been upgraded successfully.
Lastly, be aware that cloud services like Dropbox may store backup copies of files, and those backups may include an insecurely encrypted version of your document. You can request that Dropbox permanently delete a file, but just to be safe, we recommend saving a local backup of the file on your own computer first.
What if I’m using VoodooPad 4 or earlier?
Previous releases of VoodooPad used a different design for document and page encryption; unfortunately, this was also found to use weak or improperly employed cryptographic primitives. We recommend that all customers upgrade to VoodooPad 5.
Discounted upgrade pricing is available to direct-purchase customers via the Plausible Store. For Mac App Store customers, Apple does not support discount upgrade pricing via the Mac App Store – if you previously purchased VoodooPad 4 through the Mac App Store, please contact us directly to arrange for upgrade pricing.
Additional Resources
We’ve published additional technical details on the design and implementation of VoodooPad 5.1.4’s document encryption, including:
- VoodooPad Cryptography Overview – A high-level technical overview of how VoodooPad utilizes cryptography and the overall design of VoodooPad’s document encryption.
- VoodooPad Cryptography Specification – The concrete specification of VoodooPad’s document encryption, including file formats, keying, ciphers, algorithms, and parameters.
- VoodooPad Encrypt-then-MAC AEAD Specification – Defines VoodooPad’s ETM-AEAD composition of AES-CBC, PKCS#7, and HMAC used for all authenticated encryption.
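As a small illustration of one building block named in the list above, here is a minimal sketch of generic PKCS#7 padding in C. It is not taken from VoodooPad or its specifications; the function names `pkcs7_pad` and `pkcs7_unpad` are hypothetical, and the padding check is not constant-time. In an Encrypt-then-MAC design the MAC would be verified before any padding is inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append PKCS#7 padding to buf[0..len). `cap` is the buffer capacity.
 * Returns the padded length, or 0 if the arguments are invalid. */
size_t pkcs7_pad(unsigned char *buf, size_t len, size_t cap, size_t block) {
    if (block == 0 || block > 255) return 0;
    size_t pad = block - (len % block);          /* always in 1..block */
    if (pad > cap || len > cap - pad) return 0;  /* not enough room */
    memset(buf + len, (int)pad, pad);            /* N bytes, each of value N */
    return len + pad;
}

/* Validate and strip PKCS#7 padding.
 * Returns 1 and stores the unpadded length on success, 0 on malformed input. */
int pkcs7_unpad(const unsigned char *buf, size_t len, size_t block, size_t *out_len) {
    if (block == 0 || block > 255 || len == 0 || len % block != 0) return 0;
    size_t pad = buf[len - 1];
    if (pad == 0 || pad > block) return 0;
    for (size_t i = len - pad; i < len; i++) {
        if (buf[i] != pad) return 0;             /* every padding byte must equal N */
    }
    *out_len = len - pad;
    return 1;
}
```

Note that an input whose length is already a multiple of the block size still receives a full block of padding, which is what makes the padding unambiguous to remove.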
Seven years ago, we founded Plausible Labs as a worker-owned cooperative with the aim of building a company focused on sustainability, operating for the benefit of its owner-employees, and standing as a vibrant example of an alternative to the venture capital model in Silicon Valley.
I’m incredibly happy to announce that starting next year, for the first time in the co-operative’s history, our entire team will be focused on self-directed, self-funded product development. I look forward to sharing more about what we’re working on over the coming months, and the results of our labors over the coming years.
This has been a long road — founded with a $10,000 personal loan, we’ve neither borrowed money nor received investment since. The success of the co-operative has instead been entirely the product of the hard work of our employee-owners; including our current team of Terri Kramer, Chris Campbell, Mike Ash, and Rebecca Bratburd — all of whom I could not be more proud to work with. Relying heavily on consulting work, it took enormous dedication and stamina on everyone’s part to continue to invest our income, where we could, in our long-term goal of becoming a full-time product company.
We’ve accomplished a great deal to be proud of: early projects such as PLJukebox, a re-implementation of Apple’s Cover Flow that helped fund our initial growth; our consulting work on products such as comiXology’s mobile Comics applications; and our ongoing stewardship of open source projects like PLCrashReporter, a library that powers most of the crash reporting services available for Apple’s platforms today.
Now we’ll demonstrate what Plausible Labs can do with time, resources, and a singular focus.
Yesterday we announced Plausible Crash Recovery, a working crash recovery system built on top of PLCrashReporter. Upon a crash, the recovery implementation steps backwards from the crashing function, restoring non-volatile register state and returning nil to the original caller.
Despite the fact that it was released on April 1st and was, indeed, a prank, it does actually work. That doesn’t mean that using it is a good idea, though, and today I figured I’d explain how it works, why you don’t want to use it as-is, and where the underlying technology might actually be applicable.
If you haven’t already checked out the source and played with it yourself, give it a go. You can try plugging in some different crashing bugs of your own, and see how it behaves.
As a fair warning, I’m going to sacrifice some precision in my explanations below for the sake of overall clarity; there are a lot of details and edge cases that must be accounted for when implementing a crash reporter (or in this case, a crash recovery system), and if you’re interested in digging in further, feel free to stop by and chat with us on the freenode #plcrashreporter IRC channel.
I know a number of folks assumed — like many of the other April Fools absurdities — that Plausible Crash Recovery didn’t actually work. Despite the fact that we bolted on a goofy UI, the code works as advertised, as befits a proper hack. The “restoration” UI, despite being totally unnecessary, even shows the actual steps taken to restore thread state; the only exception being the “Reticulating Splines” step – that part I made up.
The last time I was privy to a fun April Fools Day prank was back in the 90s, when some co-workers implemented a local man-in-the-middle attack on common stock ticker sites, proxying and adjusting the returned data for our .com to show a precipitous fall (or rise, I don’t recall which). What was fun about the prank wasn’t the actual effect it had on people — as I recall, nobody was seriously fooled, which was probably a good thing for all involved.
Rather, what made my co-workers’ prank fun (for me, anyway) was that it was a good hack. It was the kind of wacky technical implementation that you can do when you decide that it’s OK to break the rules and see what neat ideas come of it. It reminded me of the ethos that drove the fabled MacHack conferences, the source of gems such as Quinn “The Eskimo!”’s 2002 “Best Hack” winner, Firestarter, which demonstrated that just plugging in a FireWire cable was enough to allow DMA writes to the target computer’s video buffer (in this case, displaying flames at the bottom of the screen).
So when it struck me that PLCrashReporter actually had the tooling necessary to implement a bad clone of a bad Visual Basic feature, actually implementing Crash Recovery seemed like a good April 1st hack — in the classic meaning of a hack.
Of course, like most hacks, the fact that it mostly works doesn’t mean you should actually use it.
Rolling Back Time
I often think of PLCrashReporter itself as a “time machine debugger” — it ideally provides a view into the past that can be used to reconstruct the state of the process and debug a long-past failure. Crash Recovery takes this time machine metaphor much further — using PLCrashReporter’s async-safe stack unwinding to step backwards from the crashing function, restoring non-volatile register state and returning nil to the original caller.
To understand how this works, we first need to understand the state that represents a thread of execution, and what parts of it must be rewound to return to the original caller — as well as what can’t be rewound, but really should be.
For any given process, if you were to pause it at a moment in time (say, when a crash occurs), the crashed function’s execution state would be fully encapsulated in:
- Global State (including the heap, file descriptors, memory mappings, etc …)
- Thread Stack
- CPU Register State
If we want to restart execution in the crashed function’s caller, we need to work backwards from the current process state to restore as much of the caller’s previous state as we can.
Global state includes (but certainly isn’t limited to) the heap, file descriptors, shared data structures, and even the process’ current working directory. Any part of global state that is changed during execution of the crashing function represents modified state that must be rolled back if we wish to perfectly restore the thread to its pre-crashed state.
Unfortunately, restoring global state is a non-starter: there’s no way for us to know what was changed. For example, if the crashed thread has corrupted the heap (or it was already corrupt), we can’t restore the heap to a non-corrupted state, and the application will likely just crash again. However, there are plenty of crashes that don’t involve corruption or mutation of global state, in which case we don’t need to restore any global state to allow execution to continue in the caller.
In other words, we can actually recover from a large number of common crashes despite not having the facility to roll back global state. Of course, the less mutable shared state you use, the more recoverable your crashes are. Funny as that might sound in the context of an April Fools’ hack, that’s actually the principle behind the “fail fast” semantics often supported in functional programming languages. A failed thread can simply be discarded if there’s no chance it will leave behind partially modified shared state, and the process state will remain fully consistent.
The thread’s stack maintains state for each called subroutine via a series of stack frames. At the time of the crash, the current stack frame is represented via the following state:
- The stack pointer points to the current top of the stack. Any new stack allocations will likewise adjust the stack pointer.
- The frame pointer usually (depending on the architecture, calling conventions, and emitted code) points to a fixed stack allocation at a fixed offset from the caller’s original stack pointer; this allocation is used to store the caller’s return address and original frame pointer.
- The return address is the address to which the called function should return upon completion. Depending on the architecture and calling conventions, this may be stored in a register (as it is on ARM), or may be stored on the stack via the frame pointer (x86).
To restore the caller’s original stack, as well as to determine the code address at which we should restart execution to simulate a return, we need to derive the caller’s stack pointer, frame pointer, and *return address* from the current thread’s stack state. If we’re able to successfully determine those original values, then we’ll have successfully restored the stack, as well as the execution address.
Of course, if the crashed function smashes any of this data, or the caller’s stack frame, or some other critical data on the stack, we can’t actually recover reliably; should we try, we’ll likely just trigger a secondary crash.
While that’s the basic premise, the actual process of performing the stack unwinding is a bit tricky. On some architectures (including 32-bit iOS and 32-bit Mac OS X), the frame pointer is almost always stored in a fixed register, and can be easily fetched from the crashed thread’s register state. The caller’s original stack state can simply be directly fetched or computed from the frame pointer register.
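To make the frame-pointer case concrete, here is a minimal sketch in C that unwinds one frame over a simulated stack. It is a model, not PLCrashReporter code: the stack is an array of words, "addresses" are array indices, and the names `frame_state` and `unwind_one_frame` are hypothetical. It assumes the common layout in which the frame pointer points at a two-word record holding the caller's frame pointer followed by the return address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t pc;  /* address at which execution resumes */
    size_t   sp;  /* stack pointer, as an index into the simulated stack */
    size_t   fp;  /* frame pointer, as an index into the simulated stack */
} frame_state;

/* Derive the caller's state from the callee's frame pointer.
 * The stack grows toward lower indices, so older frames sit at higher ones.
 * Returns false if the frame record looks implausible (for example, smashed). */
bool unwind_one_frame(const uint64_t *stack, size_t nwords,
                      const frame_state *callee, frame_state *caller) {
    /* The frame record is two words: [fp] = saved fp, [fp + 1] = return address. */
    if (callee->fp >= nwords || nwords - callee->fp < 2) return false;

    uint64_t saved_fp = stack[callee->fp];
    /* The caller's frame must be older than (above) the callee's frame record. */
    if (saved_fp <= callee->fp || saved_fp >= nwords) return false;

    caller->fp = (size_t)saved_fp;
    caller->pc = stack[callee->fp + 1];  /* resume at the return address */
    caller->sp = callee->fp + 2;         /* stack as it was once the record is popped */
    return true;
}
```

The sanity check on the saved frame pointer is the model's version of the caveat above: if the crashed function overwrote its own frame record, the walk stops rather than following garbage.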
On other architectures, however, things aren’t so simple. On Mac OS X x86-64, for example, there is no requirement that the frame pointer be saved in a machine register. Instead, additional unwind data is provided by the compiler; this data defines how the caller’s state may be restored from the current thread state: values may be computed from existing registers, existing stack values, as fixed offsets, or through a variety of other mechanisms. This relates to how we restore register state, and we’ll cover how this works in more detail below.
The crashed thread’s register state represents the processor’s execution state at the time of the crash. During execution, the crashed function may have overwritten some of the caller’s register values; since the crashed function will never have the opportunity to restore those overwritten values, restoring the caller’s state will require that we somehow determine:
- Which registers are expected by the caller to have been preserved (i.e., callee-preserved registers).
- Which of those registers have been modified and require restoration.
- How to actually restore the original values for those registers.
To answer the first question, we simply need to look at the platform’s defined calling conventions. For Apple’s platforms, these are defined in the iOS ABI Function Call Guide and Mac OS X ABI Function Call Guide. The calling conventions define callee-preserved and caller-preserved registers:
- Callee-preserved registers (or, non-volatile registers) must be preserved by the called function, if it overwrites the caller’s original register value(s). These are the registers that we must restore, if they’ve been overwritten.
- Caller-preserved registers (or, volatile registers) must be preserved by the calling function, if it requires later access to those values. These registers may be freely overwritten, and do not need to be restored prior to returning to the caller.
This answers the first question, but we’re left with a conundrum: when execution stops in the middle of a crashing function, how do we know which non-volatile registers have been modified, and how do we know how to restore their original values?
Unfortunately, on Apple’s 32-bit platforms (ARM and i386), the answer is that we don’t. This information is not available, and we simply have to restore what stack state we can and hope that’s enough. Surprisingly, this actually works fairly often. It is, of course, a terrible idea, and one of many good reasons why “crash recovery” ought to be considered a hack, and not an actually usable product.
On Apple’s 64-bit platforms (x86-64 and ARM64), however, this information is provided via the same *unwind data* that allows us to pop the thread’s stack frame; we can interpret the unwind data at crash time to perform non-volatile register restoration.
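The distinction between the two register classes can be sketched in C. The example below uses the x86-64 general-purpose registers and the callee-preserved set from the System V calling convention (rbx, rsp, rbp, r12 through r15); the enum, the function names, and the `have` array marking which registers the unwind data supplied a value for are all hypothetical, and this is not how PLCrashReporter represents register state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    GPR_RAX, GPR_RCX, GPR_RDX, GPR_RBX, GPR_RSP, GPR_RBP, GPR_RSI, GPR_RDI,
    GPR_R8,  GPR_R9,  GPR_R10, GPR_R11, GPR_R12, GPR_R13, GPR_R14, GPR_R15,
    GPR_COUNT
} x86_64_gpr;

/* Callee-preserved (non-volatile) registers: the caller expects these
 * to hold their original values when the called function returns. */
bool is_callee_preserved(x86_64_gpr reg) {
    switch (reg) {
    case GPR_RBX: case GPR_RSP: case GPR_RBP:
    case GPR_R12: case GPR_R13: case GPR_R14: case GPR_R15:
        return true;
    default:
        return false;
    }
}

/* Copy recovered values into the thread's register set, but only for
 * callee-preserved registers that the unwind data supplied a value for.
 * Volatile registers are left alone. Returns the number restored. */
size_t restore_callee_preserved(uint64_t regs[GPR_COUNT],
                                const uint64_t recovered[GPR_COUNT],
                                const bool have[GPR_COUNT]) {
    size_t restored = 0;
    for (int r = 0; r < GPR_COUNT; r++) {
        if (have[r] && is_callee_preserved((x86_64_gpr)r)) {
            regs[r] = recovered[r];
            restored++;
        }
    }
    return restored;
}
```

On the 32-bit platforms described above, the `have` array would effectively be empty for everything except the stack and frame pointers, which is exactly why recovery there is a gamble.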
Leveraging Unwind Data
Background: Exception Unwinding
We’ve already established that on 32-bit Apple platforms, our ability to unwind the stack is limited due to the lack of unwinding data. The reason for this actually has to do with how exceptions are handled on the platform. On 32-bit Apple systems, when a try/catch/finally block is declared, the current thread’s state is actually saved via setjmp() (or equivalent functionality), and pushed onto a per-thread stack of exception handlers; when it comes time to find an exception handler, the stack is popped until a matching handler is reached, and the equivalent of longjmp() is used to re-load that thread state, resuming execution.
This approach has two downsides. First, there’s no way for a debugger or crash reporter to use the exception unwinding information to unwind arbitrary intermediate frames. The only time exception unwinding information is available is when a catch block is executed, and in that case, it’s only possible to restore the specifically saved thread state. Second, there is the issue of runtime cost. At each try/catch/finally block, the thread state must be saved and pushed onto a stack, even if it’s never used.
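The setjmp()-based scheme can be sketched in a few lines of C. This is a toy model of the idea, not Apple's runtime: the handler stack is a single global rather than per-thread, and `throw_exception`, `risky`, and `run_guarded` are hypothetical names.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_HANDLERS 16

/* Stack of saved thread states, one per active "try" block. */
static jmp_buf handler_stack[MAX_HANDLERS];
static size_t handler_depth = 0;
static volatile int thrown_code = 0;

/* "throw": pop the innermost handler and reload its saved state. */
static void throw_exception(int code) {
    if (handler_depth == 0) abort();  /* no handler: uncaught exception */
    thrown_code = code;
    handler_depth--;
    longjmp(handler_stack[handler_depth], 1);
}

static int risky(int fail) {
    if (fail) throw_exception(42);
    return 7;
}

/* Returns risky()'s result, or the negated exception code if it threw. */
int run_guarded(int fail) {
    assert(handler_depth < MAX_HANDLERS);
    /* "try": the state is saved up front, whether or not anything is thrown. */
    if (setjmp(handler_stack[handler_depth]) == 0) {
        handler_depth++;
        int result = risky(fail);
        handler_depth--;  /* normal exit: discard the unused handler */
        return result;
    }
    /* "catch": reached via longjmp() after the handler was popped. */
    return -thrown_code;
}
```

Both downsides are visible here: `run_guarded(0)` pays for the setjmp() even though nothing is thrown, and the only state anyone can get back to is the one saved at the guarded block itself.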
The alternative approach, and what is used on Apple’s 64-bit platforms, is the use of so-called zero-cost exceptions. Rather than recording thread state at runtime, the compiler builds a lookup table that covers *all code* in an executable. This table defines how to accurately unwind a single frame from any valid instruction address, as well as providing language/runtime-specific definitions of where try/catch/finally blocks are defined, and how to handle them.
As a result, it’s not necessary to do any work at runtime if an exception is not thrown; hence the name “zero-cost exceptions”. If an exception is thrown, the language/exception runtime must consult the lookup table to correctly unwind the stack.
As it turns out, this is exactly the same information that debuggers, crash reporters, and evil crash recovery hacks need to perform their own stack unwinding.
Interpreting the Unwind Data
To correctly unwind a frame in our crash recovery system, we need to actually interpret the unwind data, and extract the rules necessary to calculate, load, or otherwise restore the caller’s original register and stack state.
Conceptually, it helps to think of the unwind data as being stored as a two-column table; each row represents an instruction address within the binary (the first column) and the unwind instructions necessary to restore the caller’s state (the second column). To perform the unwind operation, we first need to find the row that represents the instruction at which the crash occurred, and then apply any restoration rules defined at that row.
In reality, such a direct encoding of the unwind table would be prohibitively enormous. To solve that, complex encoding schemes are used to minimize duplication and maximize data re-use; on Apple platforms, these are DWARF and Apple’s own Compact Unwind.
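The conceptual table lookup can be sketched as a binary search in C. This models the idealized table only, assuming rows sorted by address where each row covers everything up to the start of the next one; the `unwind_row` type and `find_unwind_row` function are hypothetical, and neither DWARF nor Compact Unwind stores its data in this literal form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t start;  /* first instruction address covered by this row */
    uint32_t rule;   /* opaque identifier for the row's unwind instructions */
} unwind_row;

/* Find the row covering `pc`: the last row whose start address is <= pc.
 * Rows must be sorted by `start`. Returns NULL if pc precedes every row. */
const unwind_row *find_unwind_row(const unwind_row *rows, size_t count, uint64_t pc) {
    size_t lo = 0, hi = count;  /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rows[mid].start <= pc) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    /* lo is now the index of the first row that starts after pc. */
    return lo == 0 ? NULL : &rows[lo - 1];
}
```

Storing only the addresses at which the rules change, rather than one row per instruction, is a first small step toward the deduplication that the real encodings take much further.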
DWARF is a (mostly) platform architecture-neutral standard for defining debugging information, including unwind data. To add support for a new architecture in PLCrashReporter’s DWARF implementation, it’s generally sufficient to simply add a mapping between DWARF register numbers defined by the platform vendor, and the actual registers they represent; interpreting the format operates entirely in terms of these abstract register numbers.
The encoding is capable of representing almost any possible set of unwind rules; the lookup tables and restoration rules are implemented as a versatile interpreted bytecode, including a Turing-complete set of DWARF expression opcodes. Amusingly, this aspect of DWARF has been used to implement in-process arbitrary code execution without native code. If you’re curious, you can peruse PLCrashReporter’s DWARF expression interpreter source here.
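Once the bytecode has been interpreted, the simplest DWARF rules boil down to "compute a base address from a register, then load saved registers from fixed offsets". The C sketch below applies rules of that flattened form to simulated memory. Everything here is an illustrative assumption: the `sim_memory`, `reg_rule`, and `unwind_rules` types are hypothetical, only two rule kinds are modeled, and the register numbers follow the x86-64 System V DWARF numbering (3 = rbx, 6 = rbp, 7 = rsp, 16 = return address).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { DW_RBX = 3, DW_RBP = 6, DW_RSP = 7, DW_RA = 16, NUM_REGS = 17 };

/* Simulated memory: `count` 8-byte words starting at address `base`. */
typedef struct { uint64_t base; const uint64_t *words; size_t count; } sim_memory;

typedef enum { RULE_SAME_VALUE, RULE_AT_CFA_OFFSET } rule_kind;
typedef struct { rule_kind kind; int64_t offset; } reg_rule;

typedef struct {
    unsigned cfa_reg;     /* base address ("CFA") = callee[cfa_reg] + cfa_offset */
    int64_t  cfa_offset;
    reg_rule regs[NUM_REGS];
} unwind_rules;

static bool read_u64(const sim_memory *m, uint64_t addr, uint64_t *out) {
    if (addr < m->base || (addr - m->base) % 8 != 0) return false;
    uint64_t idx = (addr - m->base) / 8;
    if (idx >= m->count) return false;
    *out = m->words[idx];
    return true;
}

/* Compute the caller's registers from the callee's registers and the rules. */
bool apply_unwind_rules(const unwind_rules *rules, const sim_memory *mem,
                        const uint64_t callee[NUM_REGS], uint64_t caller[NUM_REGS]) {
    if (rules->cfa_reg >= NUM_REGS) return false;
    uint64_t cfa = callee[rules->cfa_reg] + (uint64_t)rules->cfa_offset;
    for (unsigned r = 0; r < NUM_REGS; r++) {
        if (rules->regs[r].kind == RULE_SAME_VALUE) {
            caller[r] = callee[r];  /* register was not touched by the callee */
        } else if (!read_u64(mem, cfa + (uint64_t)rules->regs[r].offset, &caller[r])) {
            return false;           /* saved value lies outside readable memory */
        }
    }
    caller[DW_RSP] = cfa;  /* the CFA is the caller's stack pointer at the call site */
    return true;
}
```

For a conventional rbp-based frame, the rules would be CFA = rbp + 16, rbp saved at CFA - 16, and the return address saved at CFA - 8.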
While enormously useful (and necessary!), the versatility of DWARF comes at the cost of the encoding’s conciseness; this is what Apple set out to address with their non-standard Compact Unwind encoding.
Apple’s Compact Unwind encoding is architecture-specific, non-portable, and is unique to Apple. Whereas DWARF can represent almost any set of rules necessary to perform unwinding, Apple took a different approach with the compact unwind encoding — it’s only capable of representing a limited set of unwinding rules, but these rules cover all (or just about all) of the code constructs actually emitted by the compiler. In exchange for these limitations, the compact unwind encoding is, well, compact; it’s much smaller than the corresponding DWARF representation, which means appreciably smaller binaries.
The Compact Unwind encoding can’t represent the full range of unwinding rules that may be required, and as such, it’s used in concert with DWARF. At link time, any DWARF rules that can be represented using the compact unwind encoding will be converted by ld, and the corresponding DWARF data will be discarded.
Since the original DWARF data is discarded, this means that correct crash reporting (and, in the case of Crash Recovery, frame unwinding) requires both DWARF and Compact Unwind support. You can find PLCrashReporter’s Compact Unwind implementation (for x86-64 and ARM64) here.
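The need for both decoders shows up in how a compact unwind entry is dispatched. The C sketch below checks the mode field of a 32-bit x86-64 encoding and decides which unwinder would handle it. The constant values are written from my recollection of the UNWIND_X86_64_MODE_* definitions in Apple's <mach-o/compact_unwind_encoding.h> and should be verified against that header; the `CU_` names, the `unwind_source` type, and `choose_unwinder` are hypothetical and not part of any Apple or PLCrashReporter API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror Apple's x86-64 compact unwind mode values; verify before use. */
enum {
    CU_X86_64_MODE_MASK       = 0x0F000000,
    CU_X86_64_MODE_RBP_FRAME  = 0x01000000,  /* conventional rbp-based frame */
    CU_X86_64_MODE_STACK_IMMD = 0x02000000,  /* frameless, stack size in the encoding */
    CU_X86_64_MODE_STACK_IND  = 0x03000000,  /* frameless, stack size read from code */
    CU_X86_64_MODE_DWARF      = 0x04000000   /* not representable: use DWARF instead */
};

typedef enum { UNWIND_VIA_COMPACT, UNWIND_VIA_DWARF, UNWIND_UNSUPPORTED } unwind_source;

/* Decide which decoder is needed for a given compact unwind encoding. */
unwind_source choose_unwinder(uint32_t encoding) {
    switch (encoding & CU_X86_64_MODE_MASK) {
    case CU_X86_64_MODE_RBP_FRAME:
    case CU_X86_64_MODE_STACK_IMMD:
    case CU_X86_64_MODE_STACK_IND:
        return UNWIND_VIA_COMPACT;
    case CU_X86_64_MODE_DWARF:
        return UNWIND_VIA_DWARF;
    default:
        return UNWIND_UNSUPPORTED;  /* no usable unwind information in this sketch */
    }
}
```

A function whose rules fit the compact form never reaches the DWARF path, while one that does not fit can only be unwound if a DWARF interpreter is also available.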
Applying the Unwind Changes
As part of our work to support crash reporting on 64-bit platforms, we had already implemented full DWARF and Compact Unwind support in PLCrashReporter, including the APIs necessary to represent register modifications across stack frames; we implemented this with the eventual goal of including non-volatile register state for *all* frames in all threads in the crash report.
We had to do very little to implement the Crash Recovery system; it was a simple matter of calling directly into our unwinding APIs from our signal handler, and applying the computed register results to the ucontext_t containing the signal thread state. If you return directly from a signal handler, any changes made to the ucontext_t within the signal handler will be applied to the target thread. By modifying the ucontext_t, we’re able to update the stack pointer, the frame pointer, and any non-volatile registers. In addition, by setting the instruction pointer, we actually cause the thread to resume in the crashed function’s caller upon return from the signal handler.
Since it’s just a little bit of glue on top of PLCrashReporter’s existing async-safe APIs, the Crash Recovery code only took about a day to write; if you’d like to take a look, the signal handler additions can be found here.
Returning Nil to the Caller
Having implemented unwinding, the last thing we needed to do was set the return value to nil. I have to admit we cheated a bit here; we ignored floating point and structure return types.
To handle pointer return types (including Objective-C objects), we simply set the return value register to 0x0. This handles most return values, but in the case where structures are returned on the stack, or special handling is required for floating point, you’ll see unexpected results.
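The final step can be modeled in C without touching a real ucontext_t, whose machine-specific register fields differ between operating systems and architectures and are deliberately not shown. In this sketch, `sim_context` is a hypothetical stand-in for the saved thread state that a signal handler receives, and `caller_state` stands in for the output of the unwinder; neither type exists in PLCrashReporter.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the register state carried by the signal handler's context. */
typedef struct {
    uint64_t pc;   /* instruction pointer */
    uint64_t sp;   /* stack pointer */
    uint64_t fp;   /* frame pointer */
    uint64_t ret;  /* return value register (for example, rax on x86-64) */
} sim_context;

/* The caller state computed by unwinding one frame from the crash site. */
typedef struct {
    uint64_t return_address;
    uint64_t sp;
    uint64_t fp;
} caller_state;

/* Rewrite the saved context so that, when the handler returns, the thread
 * continues in the caller as if the crashed function had returned nil. */
void resume_in_caller_with_nil(sim_context *ctx, const caller_state *caller) {
    ctx->sp  = caller->sp;              /* pop the crashed function's frame */
    ctx->fp  = caller->fp;
    ctx->pc  = caller->return_address;  /* continue just after the call */
    ctx->ret = 0;                       /* pointer-sized nil return value */
}
```

Only the integer return register is set, which mirrors the shortcut described above: a caller expecting a floating point or structure return would read whatever happened to be left behind.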
While the Crash Recovery implementation is an interesting technical exploration of what’s possible, it would be a terrible idea to actually use it as a blanket “fix” for crashes, even if it worked absolutely perfectly. The nature of a crash is such that the current process state is, by default, undefined; if it were defined, it wouldn’t have crashed. Blindly attempting to proceed can do worse than crash; data corruption and deadlocks are entirely likely.
That doesn’t mean that this avenue of exploration is bereft of value, however. For example, if we extended the PLCrashReporter APIs to directly support the idea of “patch and continue”, we could support some fairly common operations that currently require custom implementations for each architecture and platform in runtime VMs, such as trapping “optimistic” error handling cases. Managed code could use this mechanism to omit NULL or divide-by-zero checks from generated machine code, instead trapping the signals, verifying that the failure occurred within managed code, and converting the signal into a language-level, stack-unwinding NullPointerException or DivideByZero exception.
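The "verify that the failure occurs within managed code" step might look something like the following C sketch. It is purely hypothetical: no such API exists in PLCrashReporter, the `code_range` and `fault_action` types are invented for illustration, and treating any fault address below 4096 as a NULL dereference is an assumption about the platform's unmapped first page.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t start, end; } code_range;  /* covers [start, end) */

typedef enum {
    FAULT_FATAL,                    /* not ours: report the crash as usual */
    FAULT_NULL_POINTER_EXCEPTION,   /* convert to a language-level exception */
    FAULT_DIVIDE_BY_ZERO_EXCEPTION
} fault_action;

#define NULL_PAGE_LIMIT 4096u  /* assumed size of the unmapped page at address 0 */

/* Decide how a runtime should treat a trapped signal, given the faulting
 * instruction address and (for memory faults) the address that was accessed. */
fault_action classify_fault(int signo, uint64_t fault_pc, uint64_t fault_addr,
                            const code_range *managed, size_t count) {
    int in_managed = 0;
    for (size_t i = 0; i < count; i++) {
        if (fault_pc >= managed[i].start && fault_pc < managed[i].end) {
            in_managed = 1;
            break;
        }
    }
    if (!in_managed) return FAULT_FATAL;  /* never paper over native crashes */

    if ((signo == SIGSEGV || signo == SIGBUS) && fault_addr < NULL_PAGE_LIMIT)
        return FAULT_NULL_POINTER_EXCEPTION;
    if (signo == SIGFPE)
        return FAULT_DIVIDE_BY_ZERO_EXCEPTION;
    return FAULT_FATAL;
}
```

The important property is the default: anything that cannot be positively attributed to an omitted check in generated code is still treated as a real crash.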
A more aggressive avenue of exploration is the idea of emergency “hot patches” deployable directly from a crash reporting service. If your shipped application is unexpectedly crashing across your entire user-base with a call to CFRelease(NULL), and you know it’s safe to work around the issue, a crash reporting service could support feeding a PLCrashReporter-based hotpatch to your application, working around the issue until you could actually ship a release.
After all, there’s no upside to having customers being frustrated and continuing to submit crash reports for a known issue.
It’s not clear that we’ll see any of these ideas — or any of the others floating around our heads — in an actual shipping product, but it’s mighty fun to hack something out and see what they might look like.