I would like to share the results of my scan and review of the HTTP response headers of the Alexa Top 1,000,000 Sites as they relate to security. I was mostly curious about which sites were using Content Security Policy (CSP), but I ended up becoming more interested in all of the various modern-day security headers that sites specify. The results were pretty impressive and I certainly learned a lot from them.
To gather a legitimately sized sample set, I decided to send HTTP and HTTPS requests to every site in the Alexa Top 1,000,000 Sites list. I followed redirects to their final destination but did not crawl any additional pages beyond the first non-redirect page returned. I used Python's coroutine-based networking library gevent along with Kyoto Cabinet as my data store. By splitting my request workers into three processes, each handling about 2,000 concurrent requests, I was able to scan and store the results of the entire Alexa list in three to four hours.

Out of the one million sites in the list, I gathered a total of 1,253,735 valid responses. My handling of responses was deliberately unforgiving: if any error occurred, whether a connection error, a Unicode decoding error, or anything else that could cause problems, the response data was simply dropped. Only responses whose headers and content could be cleanly JSON-serialized were included in this data set.

Two custom headers were sent with every request: a User-Agent string corresponding to Firefox 16, and a Connection: close header. In the future I will conduct additional experiments using Google Chrome's User-Agent string. Due to time constraints, only the response headers were reviewed; while some of these security mechanisms can also be specified in the meta-tag elements of the HTML body, those were not included in the analysis.
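As a rough illustration of the approach, here is a minimal sketch of the fetch-and-filter loop. It is not my actual crawler: it uses the standard library's concurrent.futures in place of gevent for portability, and the exact User-Agent string, timeout, and record shape are assumptions.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed header set matching the methodology described above.
REQUEST_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20100101 Firefox/16.0",
    "Connection": "close",
}

def keep_if_serializable(record):
    """The 'unforgiving' filter: drop any record that cannot be JSON-serialized."""
    try:
        json.dumps(record)
        return record
    except (TypeError, ValueError, UnicodeDecodeError):
        return None

def fetch(url, timeout=10):
    """Fetch one site, following redirects; return a record or None on any error."""
    try:
        req = urllib.request.Request(url, headers=REQUEST_HEADERS)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            record = {
                "url": resp.geturl(),           # final URL after redirects
                "headers": dict(resp.headers),  # response headers only
            }
        return keep_if_serializable(record)
    except Exception:
        return None  # connection error, decode error, anything else: drop it

def scan(urls, concurrency=100):
    """Fan the URL list out across workers and keep only valid responses."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return [r for r in pool.map(fetch, urls) if r is not None]
```

The real run split this across three processes at roughly 2,000 concurrent requests each; the thread pool above is just the simplest stand-in for that fan-out.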
One thing that really surprised me was how many sites use either incorrect header names or invalid header values. In some cases the header names are completely wrong; for example, http://www.openthegraph.com/ returned a CORS header of "allow-access-control-origin" instead of "Access-Control-Allow-Origin". Other incorrect header names included serx-frame-options (instead of x-frame-options), access-control-allow-methodsaccess-control-allow-methods, access-control-allow-methods', access-control-allow-headers', and so on.
A total of 1,253,735 HTTP and HTTPS responses were analyzed. Out of the 1.25 million results there were 17,692 security-relevant headers on a total of 16,109 unique URLs. Some URLs responded with multiple security headers. Testing was done to determine the usage of the following security-relevant headers:
The purpose of the X-Frame-Options header is to protect against clickjacking attacks by enforcing which sites are allowed to frame the requested resource. This header is by far the most widely adopted, with a whopping 12,812 sites using it correctly to protect their resources. With X-Frame-Options there are three possible values (depending on the browser): SAMEORIGIN, DENY, and Allow-From. Currently, Firefox only implements the SAMEORIGIN and DENY values. Chrome and IE9 allow an origin list to be configured by including it in the Allow-From directive. What surprised me the most about the results was not only the wild variation in the values developers chose, but how permissive Google Chrome and IE9 are in accepting values; they seem to almost fail open.
What I found absolutely amazing was that out of the 30 valid Allow-From values, 29 came from Craigslist's domains. Of the 217 invalid values, most were attempting to mix SAMEORIGIN with Allow-From, which causes Chrome and IE9 to simply fail open and allow any site to frame the resource, effectively nullifying any security benefit of setting the header. A large number of sites simply set "GOFORIT" as a value, which again causes Chrome and IE9 to allow the resource to be framed. Other sites specified the directive as "Allow From", with a space instead of a hyphen, again causing the browsers to fail open and allow any site to frame the resource.
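To make the buckets above concrete, here is a small classifier along the lines of what my analysis did. It is a sketch, not my exact tallying code; in particular, the case-insensitive matching and the decision to treat any mixed or unrecognized value as invalid are my own assumptions.

```python
def classify_xfo(value):
    """Bucket an X-Frame-Options value as described in the text.

    Anything that is not exactly SAMEORIGIN, DENY, or an Allow-From
    directive is treated as invalid -- the values that cause some
    browsers to fail open and permit framing.
    """
    upper = value.strip().upper()
    if upper == "SAMEORIGIN":
        return "sameorigin"
    if upper == "DENY":
        return "deny"
    if upper.startswith("ALLOW-FROM"):
        return "allow-from"
    # e.g. "GOFORIT", "Allow From ..." (space, not hyphen),
    # or "SAMEORIGIN, ALLOW-FROM ..." mixes
    return "invalid"
```

Note that "Allow From" with a space falls straight into the invalid bucket, matching the fail-open behavior observed above.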
The Cross-Origin Resource Sharing (CORS) specification was developed to meet the growing need to safely allow third-party sites access to the response data of specific resources. CORS headers are defined per-resource and can limit when and how data is accessed by third-party origins. The primary header for allowing third-party origins is the Access-Control-Allow-Origin header. Out of 1.25 million responses, only 2,539 had this header defined in some way:
There appears to be a misconception about what is allowed in the Access-Control-Allow-Origin value. I noticed eight types of syntax being used when only two are actually valid:
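A hedged sketch of the distinction: the two valid forms are the "*" wildcard and a single serialized origin with an explicit scheme (scheme://host[:port]). The checker below encodes that reading and is my own illustration, not the exact validator used in the analysis (it also ignores the special "null" origin).

```python
from urllib.parse import urlsplit

def is_valid_acao(value):
    """Return True only for the two valid Access-Control-Allow-Origin forms:
    the "*" wildcard, or a single serialized origin with an explicit scheme."""
    v = value.strip()
    if v == "*":
        return True
    parts = urlsplit(v)
    # A serialized origin is scheme://host[:port] with nothing after the host:
    # no path (not even a trailing slash), no query, no fragment.
    return (bool(parts.scheme) and bool(parts.netloc)
            and parts.path == "" and not parts.query and not parts.fragment)
```

Under these rules, bare hostnames ("example.com"), full URLs with paths, and space-separated lists of origins all fail, which covers most of the invalid syntaxes I observed.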
A common use of CORS headers is to allow third-party sites not only to read responses, but also to send cookies with the request so they may access protected resources. By default, CORS requests will not send cookies. For a request to be sent with cookies and the response to be readable by the browser, three things are required:

1. The requested resource must define a single, valid origin with a scheme.
2. The resource must respond with an "Access-Control-Allow-Credentials" header with the value set to true.
3. The client XMLHttpRequest object must have its "withCredentials" property set to true.

I noticed that both Firefox and Chrome would send cookies when withCredentials was set to true even when the HTTP response was not readable. Another common misconception is that you can use the Access-Control-Allow-Credentials header with a true value alongside a resource that returns an Access-Control-Allow-Origin: * wildcard value. While the browser will indeed make the request to the target page and possibly send cookies (if withCredentials is set), it will not be able to read the response. Chrome alerts the user with the console error below:
In total there were 94 URLs out of the 2,076 wildcard values that also had Access-Control-Allow-Credentials set to true.
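Those 94 URLs are getting the worst of both worlds: credentials allowed, responses unreadable. A server-side sketch of doing it correctly, per the three requirements above, looks something like the following. The allow-list and origin are hypothetical; the key point is echoing back a single specific origin rather than "*" when credentials are in play.

```python
# Hypothetical allow-list of trusted third-party origins.
ALLOWED_ORIGINS = {"https://app.example.com"}

def cors_headers(request_origin):
    """Build response headers for a credentialed CORS response.

    Echoes back a single allowed origin; never "*" when
    Access-Control-Allow-Credentials is true, since browsers refuse
    to expose the response in that combination.
    """
    if request_origin not in ALLOWED_ORIGINS:
        return {}  # no CORS headers: the browser blocks the cross-origin read
    return {
        "Access-Control-Allow-Origin": request_origin,  # single origin, with scheme
        "Access-Control-Allow-Credentials": "true",
        "Vary": "Origin",  # the response differs per origin; keep caches honest
    }
```

On the client side, the XMLHttpRequest still needs withCredentials set to true for cookies to be sent at all.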
The Strict-Transport-Security header is a relative newcomer to the field. A server providing this header instructs the browser to connect over HTTPS for any requests going forward. When a user moves to a potentially insecure network, HSTS ensures their connections (provided the max-age attribute value is still valid) are forced over HTTPS. There are two directives a site can specify in the Strict-Transport-Security response. The first is the max-age directive, which determines how long the browser should keep the target site in its known HSTS list. The second is the includeSubDomains directive, which tells the browser to include any subdomains in its HSTS list with the specified max-age value. Note that the max-age directive is required, whereas the includeSubDomains directive is not.

When reviewing the numbers I was shocked to see how many sites were setting max-age to zero, which effectively tells the browser to remove the requested site from its list of HSTS sites. After looking at the URLs, it turned out that the majority were related to www.etsy.com. After asking our contact at Etsy why this was the case, we were told it was due to their SSL opt-in policy: if a member with a valid user account enables full-site SSL, they get a longer, non-zero max-age value. Etsy took this approach as a fail-safe to ensure their services were not impacted while they continue to monitor usage and set larger values.

However, this ability for a sub-resource to set a domain-wide (and even subdomain-wide) policy is somewhat concerning. I re-read the specification a number of times and never saw this concern raised that a sub-resource can override the domain policy. To confirm, I once again asked my friend at Mozilla, who had the lead developer of HSTS check the relevant source and determine that this is indeed the case. One thing to keep in mind is that the HSTS specification is in draft form and may change.
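The two directives above make the header easy to parse mechanically. Here is a small sketch of a parser in the spirit of my analysis (the exact tolerance for whitespace and optional quoting around the max-age value is an assumption on my part):

```python
import re

def parse_hsts(value):
    """Parse a Strict-Transport-Security value into (max_age, include_subdomains).

    Returns None when the required max-age directive is missing or
    malformed, since the header is invalid without it.
    """
    max_age = None
    include_sub = False
    for directive in value.split(";"):
        d = directive.strip()
        m = re.fullmatch(r'max-age\s*=\s*"?(\d+)"?', d, re.IGNORECASE)
        if m:
            max_age = int(m.group(1))
        elif d.lower() == "includesubdomains":
            include_sub = True
    if max_age is None:
        return None
    return (max_age, include_sub)
```

A parsed max_age of 0 is the "remove me from your HSTS list" case discussed above, so it deserves its own bucket when tallying results.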
A total of 980 URLs had a valid STS header and value. The invalid value count below is not included in this total.
Keep in mind that only 9 sites actually have their max-age set to 0; the other 206 values came from www.etsy.com's sub-resources, because my crawler was accessing each of the individual shop URLs as an unauthenticated user. There were, however, a number of unique sites setting their max-age to a rather low value. In this case I considered a short max-age to be anything less than 8,000 seconds. A lot of sites set their value to 500 seconds, which is quite useless in terms of protection. By the time you shut down your laptop, go to your local Starbucks, and connect to their Wi-Fi, the HSTS value will have expired and the browser will behave as if it had never been set.
The moment we've all been waiting for; well, at least me, as this was the entire reason I conducted this analysis. I'm a pretty big fan of CSP and I can only see it getting more and more popular as time goes on. To me it is pretty much what the web world needs; we just need to make sure people implement it properly. If my results so far have shown you anything, it should be how often people use invalid values, even for headers that require only a single directive. Content Security Policy was started at Mozilla and has since grown into its own W3C specification. It was designed to limit the ability and impact of cross-site scripting attacks. Much like the other security headers, the X-Content-Security-Policy-* and X-WebKit-CSP headers are defined per-resource and do not apply to the entire site. There is one caveat to the results presented here: all requests were made using a User-Agent header corresponding to Firefox 16. This may skew the results towards more X-Content-Security-Policy results than X-WebKit-CSP results if server administrators respond differently depending on the requesting User-Agent header.

The specification is quite clear: if you wish to allow inline scripts or eval'd script code, you need to supply the unsafe-inline and unsafe-eval options, respectively. However, in my testing, not specifying anything meant inline scripts and eval were automatically blocked, and adding unsafe-inline and unsafe-eval to the script-src directive did absolutely nothing. The only time I was allowed to execute either was if I explicitly set an "options inline-script" or "options eval-script" as its own directive. After speaking with a friend over at Mozilla, he informed me that they are currently working on implementing the unsafe-eval and unsafe-inline directives. The reason "options eval-script" and "options inline-script" work is that Mozilla's CSP implementation was created before the W3C specification.
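Given the engine split described above, a site at the time of writing had to emit both vendor-prefixed headers to cover Gecko and WebKit. A minimal server-side sketch follows; the policy string and CDN host are hypothetical, not drawn from any site in the data set.

```python
# Hypothetical policy: same-origin by default, scripts from self plus
# one assumed CDN host. Note no unsafe-inline / unsafe-eval, so inline
# scripts and eval are blocked (and pre-spec Gecko would instead expect
# the older "options inline-script" / "options eval-script" syntax to
# re-enable them).
POLICY = "default-src 'self'; script-src 'self' https://cdn.example.com"

def csp_headers(policy=POLICY):
    """Return both vendor-prefixed CSP header variants for one policy."""
    return {
        "X-Content-Security-Policy": policy,  # Gecko (Firefox)
        "X-WebKit-CSP": policy,               # WebKit (Chrome, Safari)
    }
```

Sending both variants sidesteps the User-Agent-dependent behavior that may have skewed my Firefox-only crawl.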
It should be noted that any website allowing these directives significantly reduces the protection CSP offers against XSS attacks, since an attacker can simply inject their script inline into the target resource. Since CSP is rather new, only 79 URLs were found to be using it, of which 32 included the inline-script or eval-script options.
Time and time again I was surprised at how many of the security headers were incorrectly specified. Making security decisions is never a task one should take lightly; depending on the organization, it can even be a time-consuming affair to get a simple header added to a particular resource. To do it incorrectly is just a waste of time, and at times it is potentially dangerous, such as setting max-age to 0 in the Strict-Transport-Security header. I will continue to monitor these sites and see what improvements they make. If you're curious about the security headers list, feel free to download it here. As for testing the various security headers, feel free to check out my test cases.