Take WASC Data With a Grain of Salt

The Web Application Security Consortium (WASC) just published statistics on the prevalence of various web application vulnerabilities. The list was compiled from 31,373 automated assessments performed during 2006 by four contributing companies, with the methodology around data collection described as follows:

The scans include a combination of raw scan results and results that have been manually validated to remove false positive results. The statistics do not include the results of any purely manual security audits (aka human assessments).

As with any statistical data, the results of this study should be digested with a healthy dose of skepticism and a solid understanding of the sampling bias. Take, for example, a political tracking poll conducted by phone during normal business hours. The results of the poll will only account for the opinions of voters with publicly listed phone numbers who happen to be home during the day (and who don't screen their calls to weed out tracking polls). The sampling bias of the WASC study is that it only accounts for the findings of automated web application scanners. As a result, it primarily reflects the capabilities and limitations of these scanners, not the general state of web application security, as one might reasonably expect from a WASC publication.

Keeping this bias in mind, what does this data really tell us, beyond the fact that automated vulnerability scanners find a lot of XSS? Does it give us true visibility into the actual prevalence and distribution of vulnerabilities in custom web applications? My answer is no.

Let's look at a sample of the prevalence data:

Those numbers just don't pass the "giggle test." The category that stands out the most in that list is Insufficient Authorization, a very common vulnerability in my experience. It's highly unlikely that only four of the applications contain authorization-related vulnerabilities. All this does is highlight the limitations of automated web app scanners.

What about Cross-Site Request Forgery? That doesn't show up at all on the list, despite the fact that the vast majority of web applications are vulnerable to it (even Jeremiah agrees on this point). It's not on the list because it isn't something the automated scanners can detect with any degree of accuracy. For the same reason, several categories on the OWASP Top Ten aren't even represented, such as Buffer Overflows and Denial of Service.

Now let's talk about false positives. The methodology clearly states that the data is a mixture of raw scan output and manually validated results. Since the results are presented in aggregate, it is impossible to derive real meaning from the figures without insight into the following information:

    • Which results came from which product
    • Which results have been manually validated
    • The historical false positive rates, by category, for each product
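To make the point concrete, here is a minimal sketch of how those three pieces of information would be combined. Everything in it is an assumption for illustration: the class and method names are hypothetical, and the counts and false positive rates are invented, not taken from the WASC data. It simply discounts the unvalidated portion of a product's findings by that product's historical false positive rate.

```java
// Hypothetical illustration; the numbers below are invented, not WASC data.
final class FalsePositiveAdjustment {

    // Estimates real findings in one category for one product:
    // validated findings count in full, raw findings are discounted
    // by the product's historical false positive rate (0.0 to 1.0).
    static double estimateTruePositives(int validated, int unvalidated, double fpRate) {
        if (validated < 0 || unvalidated < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        if (fpRate < 0.0 || fpRate > 1.0) {
            throw new IllegalArgumentException("fpRate must be between 0 and 1");
        }
        return validated + unvalidated * (1.0 - fpRate);
    }

    public static void main(String[] args) {
        // Two made-up products that each report 1,000 findings in the same category.
        // Product A: mostly validated results, low assumed false positive rate.
        double a = estimateTruePositives(800, 200, 0.10);
        // Product B: raw output only, high assumed false positive rate.
        double b = estimateTruePositives(0, 1000, 0.60);

        System.out.printf("Product A estimate: %.0f of 1000%n", a);
        System.out.printf("Product B estimate: %.0f of 1000%n", b);
    }
}
```

Two identical headline numbers can represent very different amounts of real risk, and an aggregate figure gives the reader no way to tell which case applies.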

There is also a lack of clarity around the definition of "one vulnerability." Consider this code snippet:

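// Echoes every request parameter back to the client without HTML-encoding it,
// so each parameter name and value is a reflected XSS vector.
Map<String, String[]> params = request.getParameterMap();
PrintWriter pw = response.getWriter();
for (String key : params.keySet())
  for (String value : params.get(key))
    pw.println(key + "=" + value + " ");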

An automated scanner might report that as 100 different XSS vulnerabilities, one for each parameter that it fuzzed. However, there is only one actual flaw in the code. This is a simplistic example, but I suspect the inflated XSS numbers are partly due to this type of accounting.
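The accounting difference can be sketched as follows. This is only an illustration under stated assumptions: the Finding class and its fields are hypothetical and do not reflect any real scanner's report format, and grouping by category and URL is a rough heuristic for "same flaw," not a method the WASC study is known to use.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical scanner finding; field names are illustrative only.
final class Finding {
    final String category;   // e.g. "XSS"
    final String url;        // page where the issue was observed
    final String parameter;  // fuzzed parameter that triggered it

    Finding(String category, String url, String parameter) {
        this.category = category;
        this.url = url;
        this.parameter = parameter;
    }
}

final class FindingCounter {

    // Raw accounting: every (category, url, parameter) report counts once.
    static int rawCount(List<Finding> findings) {
        return findings.size();
    }

    // Collapsed accounting: findings sharing a category and URL are assumed
    // to stem from one flaw in the code. This is a heuristic, not ground truth.
    static int collapsedCount(List<Finding> findings) {
        Set<String> keys = new HashSet<>();
        for (Finding f : findings) {
            keys.add(f.category + "|" + f.url);
        }
        return keys.size();
    }

    public static void main(String[] args) {
        // Simulate a scanner fuzzing 100 parameters against one echo page.
        List<Finding> findings = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            findings.add(new Finding("XSS", "/echo", "param" + i));
        }
        // Prints: raw=100 collapsed=1
        System.out.println("raw=" + rawCount(findings)
                + " collapsed=" + collapsedCount(findings));
    }
}
```

Which of the two numbers a vendor contributes changes the reported prevalence of XSS by two orders of magnitude in this example, even though the application is the same.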

In conclusion, here are the key takeaways from this list, after accounting for all of the weaknesses inherent to the methodology and the data itself:

    • Automated web app scanners find a lot of XSS and SQL Injection
    • Automated web app scanners are ineffective at finding vulnerabilities that require some understanding of higher-level logic, e.g. Insufficient Authorization or CSRF
    • Including raw scan results from a category of products that are notorious for high false positive rates makes the resulting statistics even less meaningful
    • The many-to-one mapping of vulnerabilities to actual instances of flawed code artificially inflates the prevalence of certain categories

In other words, this study provides minimal value to a veteran pen tester, and is misleading to just about anyone else.

Comments (3)

Dennis | April 11, 2007 9:15 am

Really good analysis. I'm always amazed when people take these reports and just regurgitate the stats with no context. Thanks Chris.

Michael Sutton | April 11, 2007 9:59 am

Chris,

I appreciate your critical review of the WASC Web Application Security Statistics. You are absolutely correct that the statistics 'reflect the capabilities and limitations of [the] scanners [used]'. You are also correct that these numbers include a sampling bias. We certainly did not intend for these statistics to be taken as gospel truth that all websites will have a like distribution of vulnerabilities. We do however strongly believe that web application vulnerabilities are a growing problem that will not go away unless we do something about them. In describing the purpose of the project we did our very best to explain that this is an emerging initiative with limitations, not a flawless piece of scientific data. Having competing firms band together to share data to provide to the community is a positive step forward. We have reasonable stats on COTS software but very little to provide insight into what we're seeing in custom web applications. The WASC Web Application Security Statistics project is a first step, not the end of the journey. I hope that other vendors will join the initiative going forward to both provide additional data to remove statistical biases and to continue to critique the initiative as you have done here. I know that Veracode conducts assessments of web applications and hope that Veracode will consider joining this initiative to ensure that we are able to provide increasingly accurate statistics.

Regards,
Michael Sutton
WASC Web Application Security Statistics Project Leader

CEng | April 11, 2007 1:14 pm

@Michael: Thanks for weighing in. I can certainly understand and appreciate the effort involved in pulling together this amount of data and to that extent I believe this project is a step in the right direction. My motivation behind writing this post was not to malign the project itself, but rather to provide some much-needed context and to help others interpret the data. I really don't think your site points out the limitations and inaccuracies of the data set very clearly. For example, it states that "statistical biases will be lessened as more entities contribute to the initiative," which really still doesn't address the underlying problem. That statement implies that the statistical deficiencies stem from not having a large or diverse enough sample set. However, the size of the data set and the number of vendors are not the issue; it's the data itself. Case in point, I can't think of any upside to commingling raw scan data with validated results. It just doesn't make sense -- it pollutes and obscures the data that is meaningful to most readers. It's a given that the stats shouldn't be "taken as gospel truth that all websites will have a like distribution of vulnerabilities," and I don't think anyone would reasonably expect that. My point is that these numbers aren't even in the ballpark.
