A few weeks ago, I wrote about the recent Apache Commons Collections deserialization vulnerability in "Let's Calm Down About Apache Commons Collections." I said we were going to look for other libraries that were also vulnerable. In this post, I publish the findings and conclusions. Then I geek out by excitedly describing plans and ideas for future research.
The original criteria for an interesting library were that it must 1.) reference Apache Commons Collections (ACC) and 2.) do some kind of deserialization. In this case, we defined deserialization as calling any of these methods:
Satisfying the first criterion means the ACC classes would be on the class path and the ACC exploit should work. The second criterion is a compromise because actual vulnerability requires the code to be deserializing untrusted or user-supplied data, which is extremely difficult to determine with static analysis. Limiting to any kind of deserialization is a rough approximation but at least it shouldn't exclude anything which would ultimately be vulnerable.
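To make the two criteria concrete, here is a minimal sketch of what such a filter could look like. It is not the analysis used for this post. It assumes that a class file "references" something whenever the corresponding internal name appears in its bytes (class names are stored in the constant pool as plain strings), and it assumes that "deserialization" means any reference to `java.io.ObjectInputStream`. The class name `DeserializationScan` and both marker strings are hypothetical choices for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class DeserializationScan {
    // Assumption: internal names that hint at each criterion.
    static final String ACC_MARKER = "org/apache/commons/collections";
    static final String DESER_MARKER = "java/io/ObjectInputStream";

    /** Naive byte search: true if the ASCII needle occurs anywhere in haystack. */
    static boolean contains(byte[] haystack, String needle) {
        byte[] n = needle.getBytes(StandardCharsets.US_ASCII);
        outer:
        for (int i = 0; i + n.length <= haystack.length; i++) {
            for (int j = 0; j < n.length; j++) {
                if (haystack[i + j] != n[j]) continue outer;
            }
            return true;
        }
        return false;
    }

    /** True if some class in the jar mentions ACC and some class mentions ObjectInputStream. */
    static boolean isInteresting(File jar) throws IOException {
        boolean referencesAcc = false;
        boolean deserializes = false;
        try (JarFile jf = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements() && !(referencesAcc && deserializes)) {
                JarEntry entry = entries.nextElement();
                if (!entry.getName().endsWith(".class")) continue;
                try (InputStream in = jf.getInputStream(entry)) {
                    byte[] bytes = in.readAllBytes(); // requires Java 9+
                    referencesAcc |= contains(bytes, ACC_MARKER);
                    deserializes |= contains(bytes, DESER_MARKER);
                }
            }
        }
        return referencesAcc && deserializes;
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + " -> " + isInteresting(new File(path)));
        }
    }
}
```

A scan like this over-approximates on purpose: it flags any jar that mentions both markers, whether or not the data being deserialized is untrusted.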
Using these two criteria, we generated a list of the most popular open source libraries in our dataset. The next step was to manually triage each library to see if it was actually vulnerable, i.e. whether it deserialized untrusted data. As we started to investigate, it quickly became clear our approach wouldn't work. Most of the time, it wasn't possible to look at a library and tell if it was vulnerable, because the deserialization behavior was almost always generic. Whether that behavior was used safely or misused depended entirely on how an application used the library. Our method would probably have worked better if we had been analyzing applications rather than libraries.
Researcher protip: Ever notice people sometimes use the word "methodology" rather than "method" to sound smarter? The suffix "-ology" denotes a subject of study. Methodology is thus the study of methods. However, the meanings of words are determined by usage. Enough people use "methodology" as "method" to change the definition, and that's fine; modern English is but a corruption of Old English. But you can subtly show off your Greek and Latin knowledge by using the original meaning.
It should be emphasized that the libraries themselves are not vulnerable, but they contain building blocks that could be exploited in a vulnerable application. These libraries are also probably used in many applications. Developers who use them should be aware of the risk and should check carefully whether they're deserializing untrusted data. As I pointed out in my previous post, the real underlying issue is that many established, popular, and well-maintained applications were still deserializing user-supplied data.
Below is the list of libraries and their affected versions. The list is not exhaustive; it covers only the popular ones:
|Library|Affected version|
|---|---|
|Apache Directory API All|1.0.0-M31|
|Apache Directory API All|1.0.0-M32|
|Apache Jena - Fuseki Server Standalone Jar|2.0.0|
|Apache Jena - Fuseki Server Standalone Jar|2.3.0|
|Spring XD DIRT|1.0.3.RELEASE|
|Spring XD DIRT|1.0.4.RELEASE|
|Webx All-in-one Bundle|3.2.3|
|Webx All-in-one Bundle|3.0.14|
|Commons BeanUtils Core|1.8.3|
|Commons BeanUtils Core|1.8.2|
|Apache Hadoop Common|2.6.2|
|Apache Hadoop Common|2.5.2|
|OpenJPA Utilities Library|2.3.0|
|OpenJPA Utilities Library|2.2.2|
|Apache Commons Collections|4|
|HBase - Common|0.98.9-hadoop1|
|HBase - Common|0.98.7-hadoop1|
|Apache Directory Shared LDAP|0.9.11|
|Apache MyFaces JSF-2.2 Core Impl|1.2.5|
|Apache MyFaces JSF-2.2 Core Impl|2.2.7|
|HBase - Server|0.98.10.1-hadoop2|
|HBase - Server|0.98.7-hadoop2|
|Apache Commons BeanUtils|1.9.2|
|Apache Commons BeanUtils|1.9.1|
|Apache Crunch Core|0.13.0|
|ApacheDS MVCC BTree implementation|1.0.0-M7|
|ESAPI (only Base64.decodeToObject())|2.1.0|
|ESAPI (only Base64.decodeToObject())|2.0.1|
|OpenJPA Aggregate Jar|2.3.0|
|OpenJPA Aggregate Jar|2.2.2|
An important but often neglected aspect of research is that failure leads to progress. In this case, our method was flawed, but exploring the problem allowed us to build up intuition and create interesting ideas for further research. Our next project will be to analyze libraries for exploitable classes (aka gadgets). A tool like this could take the ACC library as input and would ideally output gadgets which could be combined to make exploit payloads. This way, developers can use the tool on their own projects to find classes they can consider hardening against deserialization attacks.
There seem to be two types of gadgets: sources and sinks. A source is a class with properties that allow it to work as an entry point. It must implement Serializable and should call a method on a member variable whose type is an interface. This variable can be replaced with a dynamic proxy during deserialization. The proxy would redirect method calls on that object to the sink gadget. The sink gadget contains the payload, or some potentially malicious behavior that's influenced by its internal state. In other words, it does something dangerous like invoking methods via reflection, dynamically defining classes, setting system properties, executing shell commands, or performing network or file system I/O. What it does must also depend on its member variables. For example, it's not enough to use reflection to always execute the same hard-coded method. It should execute a method that's determined by looking at member variables.
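The source and sink shapes described above can be illustrated with a deliberately harmless sketch. Everything here is hypothetical and made up for illustration: `Handler`, `ReflectiveHandler` (sink-like: the method it invokes is chosen by a serialized field), `Entry` (source-like: Serializable, and it calls a method on an interface-typed field while being deserialized), and the `looksLikeSource` heuristic. The heuristic assumes a source has a custom `readObject`, which is only one way a method could end up being called during deserialization, and reflection alone cannot confirm that the call actually happens. The dynamic proxy step is omitted; the sink is plugged directly into the source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class GadgetSketch {
    /** Hypothetical callback interface held by the source. */
    interface Handler { Object handle(Object input) throws Exception; }

    /** Sink-like: which method gets invoked depends on serialized state. */
    static class ReflectiveHandler implements Handler, Serializable {
        private static final long serialVersionUID = 1L;
        private final String methodName;
        ReflectiveHandler(String methodName) { this.methodName = methodName; }
        @Override public Object handle(Object input) throws Exception {
            Method m = input.getClass().getMethod(methodName); // chosen by a field, not hard-coded
            return m.invoke(input);
        }
    }

    /** Source-like: calls a method on an interface-typed field during readObject. */
    static class Entry implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Handler handler;
        private final String value;
        transient Object result;
        Entry(Handler handler, String value) { this.handler = handler; this.value = value; }
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            try {
                result = handler.handle(value); // behavior is controlled by the serialized data
            } catch (Exception e) {
                throw new IOException(e);
            }
        }
    }

    /** Crude heuristic: Serializable, custom readObject, and an interface-typed instance field. */
    static boolean looksLikeSource(Class<?> c) {
        if (!Serializable.class.isAssignableFrom(c)) return false;
        try {
            c.getDeclaredMethod("readObject", ObjectInputStream.class);
        } catch (NoSuchMethodException e) {
            return false;
        }
        for (Field f : c.getDeclaredFields()) {
            int mod = f.getModifiers();
            if (!Modifier.isStatic(mod) && !Modifier.isTransient(mod) && f.getType().isInterface()) {
                return true;
            }
        }
        return false;
    }

    /** Serializes and immediately deserializes an object in memory. */
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Entry copy = (Entry) roundTrip(new Entry(new ReflectiveHandler("toUpperCase"), "hello"));
        System.out.println(copy.result);                               // HELLO
        System.out.println(looksLikeSource(Entry.class));              // true
        System.out.println(looksLikeSource(ReflectiveHandler.class));  // false
    }
}
```

Deserializing the `Entry` is enough to trigger the reflective call, and the method that runs is whatever name was stored in the stream. A real gadget finder would need bytecode analysis to see which methods are actually invoked; the field-type check here is only a first-pass filter.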
Detecting Java deserialization vulnerabilities is tricky, to say the least. Of course, the real problem is allowing deserialization of untrusted data, but that isn't always obvious. This is a blind spot, and it leaves us in a bad state as attackers shift their efforts away from traditional targets such as end users and individual servers and direct them upstream, toward developers themselves and the open source code and libraries they use. Hopefully, a tool like this could help developers and researchers better understand and identify deserialization weaknesses.