On the need for an open Security Journal

The information security industry – and, even more so, the hacking community – are prolific producers of incredibly valuable research; yet much of it is lost to most of those who need to see it. Unlike academic research, which is typically published in journals (with varying degrees of openness), most research conducted within the community is presented at a conference – and occasionally with an accompanying blog post. There is no journal, no central source that this knowledge goes to; if you aren’t at the right conference, or don’t follow the right people on Twitter, there’s a great chance you’ll never know it happened.

This creates many issues; here I will cover a few:

Citing Prior Research

In most conference presentations, seeing prior research cited is the exception, not the rule; this is not because presenters are trying to take all of the credit for themselves, but is a symptom of a larger issue: they are likely unaware of what has been done before them. Unlike other fields, where it’s clear that research builds on the work of those who have come before, in information security the basis of research is unclear at best and totally lost at worst.

This leads to work being recreated, as happened recently in the case of TIME / HEIST – where the same discovery was made and presented by two groups nearly four years apart. In this case, for one reason or another, the HEIST researchers were unaware of the TIME research, and presented it as a new discovery. This clearly demonstrates two major problems:

  • Researchers aren’t seeing prior research.
  • Research is not being published in a way that makes it easy to find.

When Brandon Wilson and I were working on verifying and extending the BadUSB research, we were well aware of the work done by SR Labs and clearly disclosed the role their work played in what we released. What we should have cited, though, was the work of a number of others who had performed research on similar devices – such as the work on SD cards – which we weren’t aware of at the time we began our work. In that case, there’s a blog post and a recorded talk (which is far better than most research gets), though it still wasn’t something we had seen.

By not citing prior work, we not only deny credit to those who moved the state of the art forward, we continually reinvent the wheel – instead of building on the full knowledge of those who came before us, we recreate the basic results again. Instead of iterating and improving, we miss the insights and advantages of learning from others, we repeat mistakes that were solved in the past, and we waste time rediscovering what is already known.

There is also the issue of finding the latest research on a topic. When sources are properly cited, it’s possible to follow the chain back to the founding research and forward to the latest work; because this is so rarely done in work produced by the community, it’s nearly impossible to find the latest results, to see the impact of the research you do, or to see what’s been done to validate it. Without these connections, a great deal of insight is completely lost.

This is also a very significant issue for those performing academic research: failing to cite sources is considered misconduct, yet without a way to clearly identify existing research, it is difficult to impossible to cite the work that the community and industry do. This widens the gap between academic and applied information security. Some criticize academic researchers for being out of touch with the rest of the field – a major part of that is that we make it nearly impossible for them to be anything else.

Losing History

Perhaps the greatest cost of not having a central knowledge store is that much research is lost completely – the blogs of independent researchers are notoriously unstable, often disappearing and taking all of their content with them. Sometimes we are lucky and the content has been reproduced on a mailing list or archived in the Wayback Machine, though in too many cases it is truly gone.

Countless hours are invested every year, and there is at least one conference every week of the year – with material that may never be presented or recorded again. Only those that attended are exposed to it, so it exists only in the memory of a few select people.

There was a time that a person could go to enough conferences, read enough blogs, follow enough mailing lists to keep up with the majority of new research – those days have long since passed. Today, it is impossible for any individual to remain truly abreast of new research.

Steps Forward & Backwards

In the past, zines such as Phrack helped share the great deal of knowledge being produced, though now, with years between releases, Phrack is far from able to keep up. PoC||GTFO has been a real step forward and has helped some, publishing a few issues per year and collecting papers presented at conferences. Still, its highly humorous tone, irregular schedule, and the level of effort required to release a single issue raise questions about its suitability as the solution to this problem.

The Proposal: An Open Journal

On August 21st I tweeted a poll asking a simple question:

If there was an open, semi-formal journal, would you submit your papers, talks, research for publication?

This poll was seen 14,862 times, received 55 retweets, and numerous replies; there were 204 votes, which break down like this:

  • Yes: 42%
  • Maybe: 27%
  • No: 7%
  • Why is this needed? 24%

The last number is the most interesting to me: to many of us the issues are clear and of increasing importance; to others, less so. When I posted this poll I knew that number would be interesting, but at 24% it’s more significant than I expected. There are, of course, academic journals available, though they are not suited to the needs of the community, nor entirely appropriate for the research being published. This shows the deep cultural gap between academics and practitioners, and why purely academic journals haven’t been able to address these needs.

In the replies, a number of questions and concerns were raised, which reveal some interesting issues:

  • I don’t write formal papers, I don’t know what would be expected.
  • I can present at a conference, but formal papers make me uncomfortable.
  • Do I have to pay to have the work published?
  • How would this be funded?

There are a number of interesting things here; the most significant, to me, is that the publishing model used for academic journals clearly doesn’t work for the community. This is often independent work with little to no funding, so there are no grants, no assistants to help with the paper, and no familiarity with the somewhat unique world of academic journals.

For such an effort to succeed, a number of objectives would have to be met:

  • No cost to publish.
  • No cost to access.
  • Simple submission requirements to minimize pain and learning curve.
  • Cooperation with conferences to encourage submissions.
  • Regular publication schedule.
  • Volunteers willing to coordinate and edit.
  • Community control, no corporate interference or control.
  • All rights should remain with the authors, with license granted for publishing.
  • 100% non-profit.

In my view, a new organization needs to be created, with the goal of becoming an IRS-recognized non-profit, with a board of directors elected by the community to guide the creation and publication of this journal. Funding should come from organization members and corporate sponsors, with a strong editorial independence policy to ensure that sources of funding cannot interfere with editorial decisions or with what is published.

The journal should strive for sufficient peer review and editorial quality to ensure that it is recognized as a legitimate source of trustworthy information, and as a permanent archive of the collected knowledge of the industry. Access to this information should be free to all, so that knowledge spreads and is not locked behind a paywall or left to perish – unknown and unseen. The journal should strive to be as complete as possible, working with researchers, companies, and conferences to collect and publish as much high-quality research as possible.

Publication in this journal should be a matter of pride for authors, something they advertise as an achievement.

Path Forward

Moving forward with this will require the help and support of many people – it is not a simple task, and there are many complications on the path to success. But as the industry and community grow, it’s clear to many that a solution to this problem is needed. The knowledge being produced needs to be collected and made easy to find, easy to cite, and freely available to all who seek it.

TLS Certificates from the Top Million Sites

Thanks to the recent WoSign / StartCom issues with misused or flawed certificates, there was a question about how many of the certificates issued were actually active – so this seemed to be a good time to run a scan against the Alexa top 1 million sites to see what I could find.

Scan information: The scan was run with an 8-second timeout – so any server that couldn’t complete a handshake within 8 seconds isn’t counted. The scanner attempted to connect to the domain on port 443 and, if that failed, then attempted to connect to the “www” subdomain. No certificate validation was performed. The scan didn’t attempt any other ports or subdomains, so it’s far from a complete picture.
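
For illustration, the per-domain logic looks roughly like this – a hypothetical Ruby sketch, not the actual scanner code used for this run:

require 'openssl'
require 'socket'
require 'timeout'

# Try the bare domain on 443, fall back to the "www" subdomain, give up after
# 8 seconds, and skip certificate validation entirely.
def fetch_certificate(domain)
  [domain, "www.#{domain}"].each do |host|
    begin
      return Timeout.timeout(8) do
        ctx = OpenSSL::SSL::SSLContext.new
        ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE   # no validation performed
        sock = TCPSocket.new(host, 443)
        ssl = OpenSSL::SSL::SSLSocket.new(sock, ctx)
        ssl.hostname = host                           # send SNI
        ssl.connect
        cert = ssl.peer_cert
        ssl.close
        sock.close
        cert
      end
    rescue StandardError
      next                                            # timeout or handshake failure
    end
  end
  nil
end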

Of the top 1 million sites, 700,275 respond with a certificate on port 443; here are the top certificate authorities identified:

  • Comodo: 192,646
  • GeoTrust Inc: 85,964
  • GoDaddy.com, Inc: 45,609
  • GlobalSign: 42,111
  • Let’s Encrypt: 38,190
  • Symantec Corporation: 27,612
  • DigiCert Inc: 27,588
  • cPanel, Inc: 24,195
  • thawte, Inc: 18,640
  • Starfield Technologies, Inc: 11,411

The first thing you may notice looking at that list is that Comodo is the overwhelming leader, accounting for 28% of all certificates returned. Part of the reason for this is the partnerships they have: they are the certificate provider for CloudFlare, which offers TLS support to all customers (including those on its free plan), as well as for various hosting providers. Here’s a breakdown:

  • CloudFlare: 73,339
  • HostGator: 9,499
  • BlueHost: 4,410
  • Others: 19,904

It’s quite exciting to see Let’s Encrypt not only in the top 10, but in the top 5 – for an organization that runs on less than $3M, they have done a remarkable job of making an impact on the industry.

As for the two that started it all, StartCom has 10,221 and WoSign comes in at 3,965 – though, based on discussions with someone who has access to Google’s internal crawl logs, this is far from a complete picture of how widely their certificates are used. As this scan only included the root domain (or the www subdomain in case of an error) on port 443, it captures only part of the picture.

As for invalid certificates (self-signed & otherwise untrusted), certificates with no issuer organization actually made it into the top 10, with 25,954; they are followed by localhost with 13,450 and Parallels Panel with 11,450 certificates.

The raw data for the scan is available here and was produced with a beta feature in YAWAST.

Thanks to Kenn White for the graphic!

Ruby + GCM Nonce Reuse: When your language sets you up to fail…

A couple of hours ago, Mike Santillana posted to oss-security about a rather interesting find in Ruby’s OpenSSL library; in this case, the flaw is subtle – so much so that it’s unlikely that anyone would notice it – and a seemingly insignificant choice determines whether your code is affected. When performing AES-GCM encryption, if you set the key first and then the IV, you are fine – set the IV first, and you’re in trouble.

Depending on the order you set properties, you can introduce a critical flaw into your application.

If you set the IV before the key, it’ll use an empty (all-zero) nonce. We all (hopefully) know just how bad nonce reuse in GCM mode can be – in case you don’t recall, there’s a great paper on the topic. The short version is: if you reuse a nonce, you are in serious trouble.
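
As a small illustration of why (a hedged sketch, not the attack from the paper): with a reused nonce the GCM keystream repeats, so XORing two ciphertexts produced under the same key and nonce reveals the XOR of the two plaintexts – no key required.

require 'openssl'

key   = OpenSSL::Random.random_bytes(16)
nonce = OpenSSL::Random.random_bytes(12)

encrypt = lambda do |msg|
  c = OpenSSL::Cipher.new('aes-128-gcm')
  c.encrypt
  c.key = key
  c.iv  = nonce     # deliberately reusing the same nonce
  c.update(msg) + c.final
end

c1 = encrypt.call('attack at dawn!!')
c2 = encrypt.call('attack at dusk!!')

# Identical keystreams cancel out: c1 XOR c2 == p1 XOR p2.
p c1.bytes.zip(c2.bytes).map { |a, b| a ^ b }.pack('C*')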

The Issue

Here’s code that demonstrates the issue (based on code from Mike’s post, with some changes to make the problem clearer):
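
The snippet below is a reconstruction of that pattern rather than the exact code from the post – the key, messages, and helper name are illustrative, and the identical outputs assume a Ruby/OpenSSL build affected by the bug.

require 'openssl'

KEY = OpenSSL::Random.random_bytes(16)

def encrypt(message, iv)
  cipher = OpenSSL::Cipher.new('aes-128-gcm')
  cipher.encrypt
  if iv == :random
    cipher.random_iv                # random IV, set BEFORE the key...
  else
    cipher.iv = iv                  # ...or an explicit IV, also set before the key
  end
  cipher.key = KEY                  # on affected builds, this resets the IV to all zeros
  (cipher.update(message) + cipher.final + cipher.auth_tag).unpack1('H*')
end

puts encrypt('attack at dawn', :random)      # case 1: random IV
puts encrypt('attack at dawn', :random)      # case 2: random IV
puts encrypt('attack at dawn', "\x00" * 12)  # case 3: explicit null IV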

Each time this is called, a unique ciphertext should be produced thanks to the random IV (or nonce, in this case) – yet that isn’t what happens.

In the first two cases a random IV is used (cipher.random_iv), which should produce a unique ciphertext on every call; in the third case, we explicitly set a null IV – and if you notice, all three produce the same output. What’s happening is that the nonce is all zeros in all three cases – the random nonce isn’t being used at all, and thus the ciphertext repeats each time the same message is encrypted. The fact that a null IV is used when no value is supplied is actually documented – it just so happens that setting an IV prior to setting the key is effectively the same as not setting one at all.

A bug 5 years in the making…

The root of this issue is a crash found five years ago, triggered by this test case:

ruby -e 'require "openssl";OpenSSL::Cipher::AES128.new("ECB").update "testtesttesttest"'

What this code did was trigger a segmentation fault, due to performing an update before the key was set. The workaround that was added was to initialize the Cipher with a null key to prevent the crash. At the time, this change wasn’t seen as significant:

Processing data by Cipher#update without initializing key (meaningless usage of Cipher object since we don’t offer a way to export a key) could cause SEGV.

This ‘fix’ set the stage for this issue to come up. Setting a key that was meant to be overwritten caused a change in behavior in OpenSSL’s aes_gcm_init_key – instead of preserving a previously set IV, the IV is instead overwritten with the default null value. This isn’t exactly obvious behavior, and can only be seen by careful examination of the OpenSSL code.

So this is less a single bug, and more of a combination of odd behaviors that combine in a specific way to create a particularly nasty issue.

APIs That Lead To Failure

OpenSSL is notorious for its complicated API – calling it ‘developer unfriendly’ would be a massive understatement. I greatly appreciate the work that the OpenSSL developers do, don’t get me wrong – but seeing issues due to mistakes and misunderstandings of how OpenSSL works is quite common. Even in simple cases, it’s easy to make mistakes that lead to security issues.

This issue is just the latest in a long line caused by a complex and difficult-to-understand API, one that lets what appears to be a simple change have far greater impact than anticipated. The result is another API that you have to understand in great detail to use safely.

To be clear, this isn’t the only case where order of operations can lead to failure.

The Workaround

The workaround is simply to update your code so that the call to cipher.random_iv comes after the key is set – though this is ordering that shouldn’t matter. There are discussions going on now to determine how to correct the underlying issue.
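
As a minimal sketch of the safe ordering (the key and message here are illustrative):

require 'openssl'

key = OpenSSL::Random.random_bytes(16)

cipher = OpenSSL::Cipher.new('aes-128-gcm')
cipher.encrypt
cipher.key = key                  # key first...
iv = cipher.random_iv             # ...then the IV, which is now actually used
ciphertext = cipher.update('attack at dawn') + cipher.final
tag = cipher.auth_tag             # keep the IV and tag alongside the ciphertext

puts [iv, ciphertext, tag].map { |v| v.unpack1('H*') }.join(' / ')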

Ruby’s OpenSSL library is a core library that’s widely used, and performs security critical tasks in countless applications. For a flaw like this to go unnoticed is more than a little worrying. It’s vital that languages and libraries make mistakes as hard as possible – in this case, they failed.

Testing for SWEET32 with YAWAST

Testing for SWEET32 isn’t simple – when the vulnerability was announced, some argued that the best approach was to treat any TLS server that supports a 3DES cipher suite as vulnerable. The problem is, it’s not that simple. On my employer’s corporate blog, I wrote about practical advice for dealing with SWEET32, and pointed out that there are ways to address the vulnerability – some of them quite simple.

The easy options:

  • Disable the following cipher suites:
    • SSL_DH_DSS_WITH_3DES_EDE_CBC_SHA
    • SSL_DH_RSA_WITH_3DES_EDE_CBC_SHA
    • SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA
    • SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA
    • TLS_RSA_WITH_3DES_EDE_CBC_SHA
    • TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA
    • TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA
    • TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA
    • TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA
  • Limit HTTP Keep-Alive to something reasonable (Apache & Nginx default to 100 requests)

If you don’t need to worry about users on old platforms, it’s easy – just disable the 3DES suites and you’re done. If you do need to worry about customers using Windows XP, disabling the 3DES cipher suites may not be an option, as the only other cipher suites they support use RC4. In that case, you need to limit Keep-Alive sessions.

In cases where 3DES suites are enabled, but reasonable / default limits are in place, SWEET32 isn’t an issue – as there’s no way to issue enough requests within the same session (and thus the same encryption key) to be able to exploit it.

Testing with YAWAST

To test for this, I added a new feature to YAWAST that connects to the server using a 3DES suite and issues up to 10,000 HEAD requests, to see how many requests the server will allow in a single session. If the server allows all 10,000 requests, there’s a good chance that it is vulnerable to SWEET32. This determines that the server is likely vulnerable without the massive data transfer and time required to actually prove that it is.
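
For illustration, a simplified version of that check might look like the following – a hypothetical helper, not YAWAST’s implementation, and it assumes the local OpenSSL build still offers 3DES suites:

require 'openssl'
require 'socket'

# Negotiate a 3DES suite, then count how many HEAD requests the server answers
# over a single TLS session (up to a limit).
def tdes_session_request_count(host, limit = 10_000)
  count = 0
  ctx = OpenSSL::SSL::SSLContext.new
  ctx.ciphers = '3DES'                   # only offer 3DES-based suites
  ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE
  sock = TCPSocket.new(host, 443)
  ssl = OpenSSL::SSL::SSLSocket.new(sock, ctx)
  ssl.hostname = host
  ssl.connect

  limit.times do
    ssl.write("HEAD / HTTP/1.1\r\nHost: #{host}\r\nConnection: keep-alive\r\n\r\n")
    loop do                              # read response headers until the blank line
      line = ssl.gets
      raise EOFError, 'server closed the session' if line.nil?
      break if line == "\r\n"
    end
    count += 1
  end
  count
rescue EOFError, OpenSSL::SSL::SSLError, Errno::ECONNRESET
  count                                  # session ended early – return what we got
ensure
  sock&.close
end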

Testing for SWEET32 with YAWAST looks like this:

Here I’m using YAWAST to run an ssl scan with the --tdessessioncount parameter, which instructs YAWAST to perform the SWEET32 test. You can see that the TLS session was ended after 100 requests (Connection terminated after 100 requests (TLS Reconnected)) – a clear indication that the server isn’t vulnerable. Had the server actually been vulnerable, this message would have been displayed:

[V] TLS Session Request Limit: Connection not terminated after 10,000 requests; possibly vulnerable to SWEET32

The use of 10,000 requests is somewhat arbitrary – it was selected because it’s far above the defaults used by the most common servers, yet small enough that the test completes in a reasonable time. It could be argued that it should be larger, to reduce false positives; it could also be argued that it’s excessive, since if a server allows 5,000 requests it’s a reasonable bet that no limit is being enforced. This value may change over time based on community feedback, but for now it’s a safe value that strikes a good balance.

Vulnerable or Possibly Vulnerable?

Unfortunately the only way to confirm that a server is truly vulnerable is to perform the attack, or at least a large part of it. There will be false positives using this technique, but that’s the cost of performing a quick test (that doesn’t involve transferring tens or hundreds of gigabytes of traffic).

When looking at the results, it’s important to understand that they indicate that the server is likely vulnerable – they are not a confirmation of vulnerability.

Developers: Placing Trust in Strangers

Much has been said, especially recently, about the mess of dependencies that modern applications have – and for those of us working in application security, there is good reason to be concerned about how these dependencies are handled. While adding a new feature to YAWAST, I found I needed a new dependency: ssllabs.rb.

While most Ruby dependencies are delivered via Gems, ssllabs.rb is a little different – it pulls directly from Github:
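
The Gemfile entry uses Bundler’s git source; the pattern looks roughly like this (the repository owner and commit shown below are placeholders, not the actual values):

# Gemfile
# Pulls whatever is currently at HEAD of the default branch:
gem 'ssllabs', git: 'https://github.com/<owner>/ssllabs.rb.git'

# Pinning to a specific commit at least makes the trust explicit and stable:
# gem 'ssllabs', git: 'https://github.com/<owner>/ssllabs.rb.git', ref: '<commit-sha>'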

This got me thinking about a pattern I’m seeing more often, across multiple languages: developers pulling dependencies directly from other people’s GitHub accounts. There are hundreds of thousands of examples in Ruby alone – and that doesn’t include various build scripts or other places this is done, or all the other languages. To make matters worse, this is generally a pull from HEAD rather than from a specific revision.

Absolute Trust

Pulling in a dependency from another developer, especially from HEAD, places a remarkable amount of trust in them – there’s no release process, no opportunity for code signing (which is missing from most dependency systems anyway), nothing but faith in that developer.

But it’s not just faith in the developer – it’s faith in the strength of their password, in the security of their computer, in the security of their email account, faith that they have two-factor authentication enabled; I could keep going. If the developer being pulled from is compromised in any of these ways, it’s a trivial matter to commit a change as them. Once committed, that change will be picked up the next time anyone who uses it builds (or updates) – giving an attacker code running on their machines, code that will most likely end up in production.

I’ve written before about the difficulty of spotting malicious code – an apparently simple refactoring commit can easily hide the insertion of malicious code, and it’s likely that it wouldn’t be noticed quickly.

Once a developer’s GitHub account or computer is compromised, it’s a simple matter to commit something that appears innocent but adds a backdoor – something that could be picked up by hundreds or thousands of those that depend on the code before the developer is able to recover access, detect the change, and address the malicious code. Depending on the attack vector, it could take days for the developer to regain full access – a large window in which malicious code is being deployed. That’s assuming, of course, that the developer even notices: many popular dependencies have been inactive for months or even years.

A larger problem…

This is, of course, just one front in a larger problem: external dependencies are rarely reviewed, almost never signed, and the dependencies they in turn bring in are almost always ignored. Far too many development shops have no idea what they are actually pushing to production – or what issues that code brings with it.

There are so many opportunities to attack dependency management – thankfully few attacks are seen in the wild, but I don’t take too much comfort in that.