Nov 18, 2022

Anatomy of a Stored Cross-site Scripting Vulnerability in Apache Spark

By Florian Walter

One of the services that Veracode offers is a consultation with an Application Security Consultant – a seasoned software developer and application security expert. In the context of a consultation, my team works with the software engineers of Veracode’s customers to understand and, ideally, remediate security flaws found by the Veracode tool suite.

There is a well-defined difference between a security flaw (a defect that can lead to a vulnerability) and a vulnerability (an exploitable condition within code that an attacker can use to attack the application). Because we work with potentially dozens of different customer applications every week, we usually have a strong gut feeling for when a security flaw might constitute an exploitable vulnerability and should receive extra attention.

During one of our consultations, Veracode Static Analysis discovered a set of similar Cross-site Scripting (XSS) flaws in what turned out to be third-party JavaScript files belonging to Apache Spark. After some manual investigation, we confirmed that these flaws did indeed constitute a vulnerability and reported a summary of our research to the Apache Spark Security Team. We also provided concrete remediation guidance.

The set of virtually identical XSS vulnerabilities was tracked as CVE-2022-31777. Now that a patch that fixes the vulnerabilities has been released, we can safely share some details about their nature.

Details of the XSS Vulnerability

A Stored XSS vulnerability occurs when an application stores untrusted data (e.g., in a database or in log files) and later sends it back in an HTTP response that a web browser renders, without proper encoding. Stored XSS is especially useful for an attacker since a single malicious payload may be returned to many users.
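To make the store-then-render pattern concrete, here is a minimal sketch in plain JavaScript. It is an illustration only: storedEntries, storeEntry, renderUnsafe, renderSafe, and escapeHtml are hypothetical names and do not come from Apache Spark.

```javascript
// Minimal model of stored XSS. All names are hypothetical, not Spark's.
const storedEntries = [];

// Step 1: the application stores untrusted input verbatim.
function storeEntry(untrusted) {
  storedEntries.push(untrusted);
}

// Step 2 (vulnerable): stored data is concatenated into HTML as-is.
function renderUnsafe() {
  return "<pre>" + storedEntries.join("\n") + "</pre>";
}

// Encode the characters that are significant in HTML text and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Step 2 (safe): stored data is encoded before it is placed into HTML.
function renderSafe() {
  return "<pre>" + storedEntries.map(escapeHtml).join("\n") + "</pre>";
}

storeEntry("INFO job started");
storeEntry("<script>alert('Hacked through Logs..')</script>");

console.log(renderUnsafe()); // the <script> element reaches the browser intact
console.log(renderSafe());   // the payload arrives as inert text
```

Every user who later requests the rendered page receives the same stored entry, which is why one injected payload can reach many victims.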

In our case, an attacker must inject a malicious XSS payload into the Apache Spark application logs, i.e., trigger a logging event that contains the payload (this may be achieved in numerous ways). Afterwards, the attacker must wait for a user (likely an admin in this case) to visit the UI that renders the logs and fetch the malicious log entry, at which point the payload executes in the user’s web browser.

The precise data-flow of the vulnerability is as follows: 

  1. log-view.js sends a GET request to the /log endpoint.

  2. The /log endpoint, defined in WorkerWebUI.scala, returns the result of logPage.renderLog().

  3. The renderLog() function in LogPage.scala calls getLog() within the same class, which reads part of the log and returns it. The log data is sent back as part of the HTTP response.

  4. The response is rendered by log-view.js without proper encoding.
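The four steps above can be approximated with a simplified model. This is a sketch under stated assumptions, not Spark's code: logFile, getLog, handleLogRequest, and loadNew are hypothetical stand-ins, and string characters stand in for bytes. The point is only that the payload passes through every hop unchanged.

```javascript
// Simplified model of the data flow. Every name here is hypothetical.
// "Server side": the log is a growing string, read in ranges.
let logFile = "22/11/18 10:00:00 INFO demo-app started\n";

function getLog(offset, byteLength) {
  // Returns the requested slice of the log without any encoding.
  return logFile.slice(offset, offset + byteLength);
}

function handleLogRequest(offset, byteLength) {
  // Stands in for the /log endpoint: the raw slice becomes the response body.
  return getLog(offset, byteLength);
}

// "Client side": remembers how far it has read and asks only for newer data.
let endOffset = 0;
function loadNew() {
  const body = handleLogRequest(endOffset, 1024);
  endOffset += body.length;
  return body; // a real UI would now insert this into the page
}

const firstChunk = loadNew();

// An attacker triggers a log event that contains the payload.
logFile += "ERROR bad input: <script>alert('Hacked through Logs..')</script>\n";

const secondChunk = loadNew();
console.log(secondChunk.includes("<script>")); // true
```

Because neither the server-side read nor the response applies any encoding, the client is the last place where the data can be made safe before it reaches the DOM.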

In log-view.js, the returned log data is added to the DOM via jQuery’s prepend() and append() methods, which allow raw HTML to be rendered. This makes these methods unsafe for untrusted data without proper encoding. 

One of the vulnerabilities in log-view.js is the following:

[Screenshot: vulnerable code in log-view.js]

The vulnerable part is the last line, which injects cleanData (derived from data, which holds the log data returned by the /log endpoint) into the DOM (note that .log-content selects a div defined in LogPage.scala). If the returned log data contains the attacker’s malicious payload, the payload is rendered in the DOM and executed.

The other XSS vulnerabilities follow the exact same pattern.

Exploitation - Proof of Concept

First, we need to find a way to inject our PoC payload (we chose <script>alert('Hacked through Logs..')</script>) into the logs of one of the applications connected to the vulnerable Apache Spark instance. How to achieve this depends on the connected applications, but it is often trivial, since a lot of user-controlled data usually ends up in logs.

The screenshot below shows the application page for our “demo-app” as viewed in Apache Spark. We will inject our XSS payload into the application logs of “demo-app” and then view them in the Spark log UI.

[Screenshot: the “demo-app” application page in the Spark UI]

After clicking on the “stderr” button, the following page is rendered:

[Screenshot: the stderr log page]

Now, say that there is an admin that keeps this page open in a browser tab as she constantly wants to keep an eye on the error logs of this application. At the same time, the attacker injects our PoC payload into the logs of this app.

Once the admin comes back to this browser tab, she will click the “Load New” button to fetch new log entries. This is the moment when our XSS payload will be fetched and executed.

[Screenshot: the stderr log page after clicking “Load New”, with the payload executing]

As can be seen in the above screenshot, our XSS payload executes. At this point, our application logs contain the following:

[Screenshot: the application logs containing the XSS payload]

This demonstrates that an XSS payload injected into the logs of an application submitted to Apache Spark may execute, if rendered in the log UI.

Remediating the XSS Vulnerability

Working with the Apache Spark Security Team, we established that the Spark log UI does not need to render log data as HTML, which means that we can treat the untrusted log data as text and HTML-encode it. For this, we can leverage document.createTextNode(untrusted), which yields a text node that the browser displays literally instead of parsing as markup.

Based on that, we suggested changing the vulnerable line of code to the following:

$("pre", ".log-content").prepend(document.createTextNode(cleanData));

We also created an example that clearly demonstrates how prepend() (the same applies to append()) is unsafe to use with untrusted data (without proper encoding) and how document.createTextNode() can protect against XSS.
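For readers who want to try the idea themselves, the following is a minimal sketch of the text-node approach. It is not the example referred to above and not Spark's code: prependAsText is a hypothetical helper, and it uses the plain DOM method insertBefore() rather than jQuery so that it has no dependencies. The browser-only part is guarded so the snippet also loads in environments without a DOM.

```javascript
// Hypothetical helper: inserts untrusted text at the top of an element as a
// text node, so the browser never parses it as markup.
function prependAsText(element, untrusted, doc) {
  const node = doc.createTextNode(String(untrusted));
  element.insertBefore(node, element.firstChild);
  return node;
}

// Browser-only usage; skipped where no DOM exists (e.g., Node.js).
if (typeof document !== "undefined" && document.body) {
  const pre = document.createElement("pre");
  document.body.appendChild(pre);

  const payload = "<script>alert('Hacked through Logs..')</script>";
  prependAsText(pre, payload, document);

  console.log(pre.children.length);         // 0: no element was created from the payload
  console.log(pre.textContent === payload); // true: the payload is shown verbatim
}
```

By contrast, passing the same string directly to an HTML-parsing insertion method would create a script element from it, which is the behavior the fix avoids.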

Being Vigilant with Stored XSS Payloads

MITRE has rated this Apache Spark vulnerability as medium severity (CVSS3: 5.4). Organizations using Apache Spark should make sure they are using a patched version (vulnerable versions are 3.2.1 and earlier, and 3.3.0). 

This disclosed vulnerability, as well as the thousands of others that are published every year, underlines the importance of proactive and continuous application scanning. Be sure to regularly evaluate your applications with Static and Dynamic Analysis, as well as Software Composition Analysis. When appropriate, augment automated scans with Manual Penetration Tests.

Florian is a member of Veracode’s Application Security Consulting team. He holds a master’s degree in Informatics, with specializations in software engineering and cybersecurity. Prior to joining Veracode, Florian mostly worked as a software developer in the FinTech space. In his spare time, he likes to travel, play chess, and kickbox.