Ruby + GCM Nonce Reuse: When your language sets you up to fail…

A couple of hours ago, Mike Santillana posted to oss-security about a rather interesting find in Ruby’s OpenSSL library. The flaw is subtle, so much so that it’s unlikely anyone would notice it, and a seemingly insignificant choice determines whether your code is affected. When performing AES-GCM encryption, if you set the key first and then the IV, you are fine; set the IV first and you’re in trouble.

Depending on the order you set properties, you can introduce a critical flaw into your application.

If you set the IV before the key, the cipher will use an empty (all zeros) nonce. We all (hopefully) know just how bad nonce reuse in GCM mode can be; in case you don’t recall, there’s a great paper on the topic. The short version is that reusing a nonce under the same key can expose the relationship between plaintexts and let an attacker forge messages, so you are in serious trouble.

The Issue

Here’s the code that demonstrates the issue (based on code from Mike’s post, with some changes to better demonstrate the issue):
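The original listing isn’t reproduced here, so what follows is a minimal sketch of that kind of demonstration rather than Mike’s exact code. The method names, the key, and the message are hypothetical placeholders; the only thing that matters is where the IV is set relative to the key. Whether the first two results repeat depends on the Ruby and OpenSSL versions in use, so no output is claimed.

```ruby
require "openssl"

# Placeholder 32-byte key, for illustration only; never hard-code real keys.
KEY = "k" * 32
MESSAGE = "This is a test message."

# Cases 1 and 2: the random IV is requested BEFORE the key is set.
# On affected versions the random IV is silently replaced by all zeros.
def encrypt_iv_before_key(plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.random_iv   # IV set first...
  cipher.key = KEY   # ...then the key
  cipher.update(plaintext) + cipher.final
end

# Case 3: an all-zero IV is set explicitly (12 bytes is GCM's default IV size).
def encrypt_null_iv(plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = KEY
  cipher.iv = "\0" * 12
  cipher.update(plaintext) + cipher.final
end

results = [
  encrypt_iv_before_key(MESSAGE),
  encrypt_iv_before_key(MESSAGE),
  encrypt_null_iv(MESSAGE)
]

# Print each ciphertext as hex so repeats are easy to spot.
results.each { |ct| puts ct.unpack("H*").first }
puts(results.uniq.size == 1 ? "all identical: affected" : "ciphertexts differ")
```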

Each time this is called, the random IV (the nonce, in this case) should produce a unique ciphertext, yet that isn’t what happens:

In the first two cases, a random IV is used (cipher.random_iv), which should produce unique ciphertext every time it’s called; in the third case, we explicitly set a null IV. Notice that all three produce the same output. What’s happening is that the nonce is all zeros in all three cases: the random nonce isn’t being used at all, and thus the ciphertext is repeated each time the same message is encrypted. The fact that a null IV is used when no value is supplied is actually documented; it just so happens that setting an IV prior to setting the key is effectively the same as not setting one at all.

A bug 5 years in the making…

The cause of this issue is a test case that was found five years ago:

ruby -e 'require "openssl";OpenSSL::Cipher::AES128.new("ECB").update "testtesttesttest"'

What this code did was trigger a segmentation fault due to performing an update before the key was set. The workaround that was added was to initialize the Cipher with a null key to prevent the crash. At the time, this change wasn’t seen as being significant:

Processing data by Cipher#update without initializing key (meaningless usage of Cipher object since we don’t offer a way to export a key) could cause SEGV.

This ‘fix’ set the stage for the current issue. Setting a placeholder key that was meant to be overwritten changed how OpenSSL’s aes_gcm_init_key behaves: instead of preserving a previously set IV, it overwrites the IV with the default null value. This isn’t exactly obvious behavior, and can only be seen by careful examination of the OpenSSL code.

So this is less a single bug and more a set of odd behaviors that combine in a specific way to create a particularly nasty issue.
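To make that interaction easier to follow, here is a toy model of the state involved. It is an assumption-laden simplification, not OpenSSL’s or Ruby’s actual code: the class ToyGcmContext, its fields, and the preload_null_key flag are all hypothetical, and no real encryption happens. It only tracks which IV would end up in use when a key is already present at the time the IV is set.

```ruby
# Toy model only: a hypothetical, heavily simplified stand-in for the cipher
# context. It mimics the ordering behavior described above and nothing else.
class ToyGcmContext
  NULL_IV = ("\0" * 12).freeze

  attr_reader :active_iv

  # preload_null_key models the crash workaround: a null key is installed
  # as soon as the context is created.
  def initialize(preload_null_key:)
    @saved_iv = NULL_IV    # IV remembered for later key changes
    @active_iv = NULL_IV   # IV the cipher would actually use
    @key = nil
    self.key = "\0" * 16 if preload_null_key
  end

  def iv=(iv)
    # Without a key, the IV is remembered until the key arrives.
    # With a key already present, it is applied but NOT remembered.
    @saved_iv = iv unless @key
    @active_iv = iv
  end

  def key=(key)
    @key = key
    @active_iv = @saved_iv   # re-apply whatever IV was remembered
  end
end

iv = "A" * 12

# IV first, then key, with the null key preloaded: the IV is lost.
ctx = ToyGcmContext.new(preload_null_key: true)
ctx.iv = iv
ctx.key = "k" * 16
puts ctx.active_iv == ToyGcmContext::NULL_IV

# Key first, then IV: the IV survives.
ctx = ToyGcmContext.new(preload_null_key: true)
ctx.key = "k" * 16
ctx.iv = iv
puts ctx.active_iv == iv
```

In this model, removing the preloaded null key makes either order work, which is why the earlier crash fix matters so much here.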

APIs That Lead To Failure

OpenSSL is notorious for its complicated API – calling it ‘developer unfriendly’ would be a massive understatement. I greatly appreciate the work that the OpenSSL developers do, don’t get me wrong – but seeing issues due to mistakes and misunderstandings of how OpenSSL works is quite common. Even in simple cases, it’s easy to make mistakes that lead to security issues.

It’s clear that this issue is just the latest in a long line caused by a complex and difficult-to-understand API, one that makes what appears to be a simple change have far greater impact than anticipated. The result is another API that you have to understand in great detail to use safely.

To make it clear, this isn’t the only case where order of operations can lead to failure.

The Workaround

The workaround is to just update code to move the call to cipher.random_iv to some point after the key is set – but this is something that shouldn’t matter. There are discussions going on now to determine how to correct the issue.
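As a rough sketch of the safe ordering, assuming AES-256-GCM and a 32-byte key: the helper names encrypt_key_first and decrypt_key_first and the hash layout are hypothetical choices for illustration, not part of Ruby’s OpenSSL library. The key is assigned first, and only then is the random IV generated.

```ruby
require "openssl"

# Encrypts with AES-256-GCM, setting the key BEFORE generating the IV.
# Returns the IV and tag with the ciphertext; all three are needed to decrypt.
def encrypt_key_first(key, plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = key          # key first...
  iv = cipher.random_iv     # ...then the IV
  ciphertext = cipher.update(plaintext) + cipher.final
  { iv: iv, tag: cipher.auth_tag, ciphertext: ciphertext }
end

# Decrypts a message produced by encrypt_key_first, using the same order.
def decrypt_key_first(key, message)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.decrypt
  cipher.key = key
  cipher.iv = message[:iv]
  cipher.auth_tag = message[:tag]
  cipher.update(message[:ciphertext]) + cipher.final
end

key = OpenSSL::Random.random_bytes(32)
first = encrypt_key_first(key, "attack at dawn")
second = encrypt_key_first(key, "attack at dawn")

# With a fresh random IV each time, the two ciphertexts should not match.
puts first[:ciphertext] != second[:ciphertext]
puts decrypt_key_first(key, first)
```

Returning the IV from the encrypting method also makes it harder to forget that the IV has to travel with the ciphertext.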

Ruby’s OpenSSL library is a core library that’s widely used and performs security-critical tasks in countless applications. For a flaw like this to go unnoticed is more than a little worrying. It’s vital that languages and libraries make mistakes as hard to make as possible; in this case, they failed.