On November 6th, 2015, a blog post was published with the title "What Do WebLogic, WebSphere, JBoss, Jenkins, OpenNMS, and Your Application Have in Common? This Vulnerability". The post describes how several popular applications are exploitable via a Java deserialization vulnerability discovered by two security researchers, Gabriel Lawrence and Chris Frohoff. The vulnerability involves combining classes from the JDK and Apache Commons Collections, a popular Java library, in a way that allows remote code execution if the application deserializes untrusted data.
The blog post stresses that Apache Commons Collections (ACC) is vulnerable, but it's not that simple. While it's true that ACC classes are used in the exploit payload, the application must deserialize attacker-controlled data to be vulnerable. This is like letting an angry gorilla into your house. The exploitable ACC classes are like a room with a glass menagerie. If the gorilla happens to trundle into your weird glass collection room and breaks everything, is the problem that you have a room full of tiny glass figures? Maybe. Is the problem that you let an angry gorilla into your house? Definitely.
The request for a CVE for this was denied because it "is not a vulnerability in the library". Mr. Frohoff himself, one of the original researchers to find the issue, said it "is not a vulnerability in the library" and the problem is how developers treat user-supplied data.
As people started to get excited, descriptions of the issue's severity started to escalate:
A Java application or library with the Apache Commons Collections library in its classpath may be coerced into executing arbitrary Java functions or bytecode, regardless if the application directly uses ACC to deserialize data from an untrusted source or not.
While this is strictly true, the way ACC is emphasized is confusing. This description puts all of the blame on ACC, and makes it sound like the vulnerability has nothing to do with the real issue of deserializing untrusted objects. Someone reading this might think their app is vulnerable just by having ACC classes in their .jar.
The good folks at Apache have already proposed a fix which makes the class mentioned in the original blog post not deserializable by default and allows the developer to specifically re-enable it. A similar fix was used for a recent Groovy deserialization vulnerability, CVE-2015-3253 ("Potential Remote Code Execution via Java Object Deserialization"). Unfortunately, as people started digging into the issue, they found all kinds of classes that can behave maliciously when you let an attacker control their construction and internal state. It's not clear how much effort library maintainers should put into making their classes safe when app developers do something dangerous. ACC is so popular that it might make sense for maintainers to bend over backwards to harden classes against deserialization.
We are always looking at and researching the latest vulnerabilities in open source. This gives us a good intuition as to what vulnerable code and insecure practices look like. We also collect all of the Gems, Node packages, and Java libraries we can get our hands on and are constantly developing and improving our analysis tools. With our experience, dataset and tools, we can ask some interesting questions that allow us to find novel vulnerabilities. Actually, it's working well enough that the problem so far hasn't been in finding vulnerabilities, but in coordinating with developers to patch and disclose at scale. Why is all of this relevant? Simple. The real story here isn't about ACC, it's that WebLogic, WebSphere, JBoss, Jenkins, and OpenNMS were deserializing untrusted data. If these big, popular projects are doing it, there is bound to be more.
To find additional vulnerable libraries, we analyzed about 4,000 of the most popular open source Java libraries our customers use. Of these, we found 41 that deserialize data in some way and also use the affected version of ACC. That means about 1% of the sample is potentially vulnerable. If we extrapolate this percentage to Maven, which has ~124,000 unique artifacts, that's roughly 1,240 libraries. Since our sample set contains only the latest versions and is biased towards popular libraries, realistically the final number of distinct, potentially vulnerable artifacts in Maven is probably closer to 300 - 400. Of course, this doesn't mean they are all vulnerable. Our next step is to more carefully analyze each library to see if it actually deserializes untrusted data. If we find any new vulnerabilities, we will first coordinate with the developers to publish a fix before public disclosure.
While it might be satisfying as a researcher to cry "havoc" and let slip the dogs of war, it puts many users at risk and is a missed opportunity to educate developers and improve security practices. The Apache people might not have been motivated to change their library without a public disclosure, but certainly WebLogic, WebSphere, JBoss, Jenkins, and OpenNMS would (or should) have.
To understand how to deserialize safely, it helps to understand how the attacks work. The technical write-up for CVE-2015-3253 is a good place to start. It gives a step-by-step breakdown of a deserialization vulnerability. Once you understand the concepts, they can be easily generalized to apply to other specific attacks. If you're in a hurry, here's the bottom line: deserializing untrusted data means an attacker can get your app to construct an object of any class on the app's classpath with arbitrary internal state. In other words, the attacker has complete control of member variables. An attacker may be able to combine the normal, benign behaviors of several classes to execute a malicious payload. The classes used to construct the serialized payload are sometimes called gadgets, similar to gadgets in return-oriented programming.
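To make that concrete, here is a minimal and deliberately harmless sketch of the idea. The class names (GadgetDemo, CacheEntry) and the "clean up" behavior are hypothetical and invented for illustration; they are not taken from ACC or from any real exploit. The point is only that a class's own readObject code runs during deserialization, with whatever field values were in the byte stream, before the calling code ever gets a chance to check the type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class GadgetDemo {
    // Records what the "gadget" would have done, so the demo stays harmless.
    static final List<String> SIDE_EFFECTS = new ArrayList<>();

    // Hypothetical gadget: a benign-looking class whose readObject acts on its own state.
    static class CacheEntry implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String path;

        CacheEntry(String path) {
            this.path = path;
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            // A real gadget might delete a file or invoke a method here.
            // Whoever wrote the serialized bytes chose the value of path.
            SIDE_EFFECTS.add("would clean up " + path);
        }
    }

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for bytes that arrive from an untrusted source.
        byte[] payload = serialize(new CacheEntry("/tmp/demo"));
        try {
            // The application expects a String and casts accordingly.
            String expected = (String) deserialize(payload);
        } catch (ClassCastException e) {
            // The cast fails, but only after readObject has already run.
        }
        System.out.println(SIDE_EFFECTS); // prints [would clean up /tmp/demo]
    }
}
```

Real gadget chains are more elaborate, since they link the behaviors of several classes together, but each link relies on this same property.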
This leads to the obvious solution: never deserialize untrusted data. If you need to move data around, there are plenty of alternatives to serialized Java objects, e.g. Gson, which allows for arbitrary object serialization to JSON without needing annotations.
If you simply must deserialize objects you can't completely control, consider using look-ahead class verification. Several libraries exist to do this, such as SerialKiller. Normally, an object is deserialized before the type is checked. With look-ahead verification, the serialized data is inspected to ensure it contains only allowed types before anything is instantiated. This allows your code to abort deserialization before anything malicious is executed.
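The look-ahead idea can be sketched with nothing but the JDK by subclassing ObjectInputStream and overriding resolveClass, which is called with each class descriptor in the stream before an instance of that class is created. This is a minimal illustration, not SerialKiller's implementation or API; the names LookAheadDemo, AllowListObjectInputStream, Greeting, Unexpected and readAllowed are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LookAheadDemo {
    static final List<String> SIDE_EFFECTS = new ArrayList<>();

    // The only type this hypothetical application expects to receive.
    static class Greeting implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;
        Greeting(String text) { this.text = text; }
    }

    // Stand-in for a gadget: its readObject has a side effect.
    static class Unexpected implements Serializable {
        private static final long serialVersionUID = 1L;
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            SIDE_EFFECTS.add("Unexpected.readObject ran");
        }
    }

    // Rejects any class descriptor that is not on the allow list.
    static class AllowListObjectInputStream extends ObjectInputStream {
        private final Set<String> allowed;

        AllowListObjectInputStream(InputStream in, Set<String> allowed) throws IOException {
            super(in);
            this.allowed = allowed;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
            if (!allowed.contains(desc.getName())) {
                throw new InvalidClassException(desc.getName(), "class is not on the allow list");
            }
            return super.resolveClass(desc);
        }
    }

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object readAllowed(byte[] data, Set<String> allowed) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new AllowListObjectInputStream(new ByteArrayInputStream(data), allowed)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Set<String> allowed = new HashSet<>();
        allowed.add(Greeting.class.getName());

        Greeting ok = (Greeting) readAllowed(serialize(new Greeting("hello")), allowed);
        System.out.println("accepted: " + ok.text);

        try {
            readAllowed(serialize(new Unexpected()), allowed);
        } catch (InvalidClassException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("side effects: " + SIDE_EFFECTS);
    }
}
```

In this sketch the rejected class's readObject never runs, because the stream aborts as soon as it meets a descriptor that is not on the list. An allow list is used rather than a deny list because, as noted earlier, people keep finding new classes that misbehave when deserialized. Note that the list has to cover every class that appears in the stream, including the types of nested fields and serializable superclasses.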