Oct 18, 2015

Using CPEs for Open-Source Vulnerabilities? Think Again

By Sean Kinzer

As a Customer Success Engineer, I spend a lot of time doing product demos and helping customers with configurations and customizations. In demos I am often asked something along the lines of, “I was trying tool 'x' or tool 'y', which uses CPEs and the NVD. What do you think of that?” The other day I was asked the same question over email, so I thought I would share my reply (edited for this blog, of course). This post is about why you should think again if you are relying on CPEs for open-source software security.

For the record, we know and very much respect some of the teams building free open-source tools on top of CPEs, and this post is not intended as a cheap jab at those tools. Rather, it points out that some of the underlying technology they rely on was not designed for the problem at hand, which results in very significant false positives and false negatives. We have some interesting ideas about how we can help those free open-source tools become much better and help the community raise the bar across the industry.

What exactly is a CPE and what were they designed for?

CPE stands for Common Platform Enumeration and is "a standardized method of describing and identifying classes of applications, operating systems, and hardware devices present among an enterprise's computing assets". CPE was developed by the good people at MITRE and in November 2014 moved to the US government under NIST. The original MITRE site describes the key use case, software inventory management, as follows:

A software inventory management product vendor uses CPE Names to tag data elements within their product’s data model. These data elements may directly represent the individual software products that exist on an end system (e.g., a laptop, desktop, or server).

The format for a CPE is:

cpe:/{part}:{vendor}:{product}:{version}:{update}:{edition}:{language}
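
To make the format concrete, here is a minimal Python sketch that splits a CPE name of this form into its named fields. It only assumes the simple "cpe:/" URI form shown above; the newer CPE 2.3 formatted-string binding and the finer escaping rules are not handled, and the name parse_cpe22 is my own.

```python
from urllib.parse import unquote

# Field names of the "cpe:/" URI form, in order.
CPE22_FIELDS = ("part", "vendor", "product", "version", "update", "edition", "language")


def parse_cpe22(uri: str) -> dict:
    """Split e.g. 'cpe:/a:adobe:air:18.0.0.199' into a dict of named fields.

    Trailing fields may be omitted; missing or empty ones come back as ''
    (meaning "any"), while '-' means "not applicable" and is kept as-is.
    Percent-encoded characters such as %26 ('&') are decoded.
    """
    prefix = "cpe:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a cpe:/ URI: {uri!r}")
    components = uri[len(prefix):].split(":")
    if len(components) > len(CPE22_FIELDS):
        raise ValueError(f"too many components in {uri!r}")
    # Pad omitted trailing components so every field is present.
    components += [""] * (len(CPE22_FIELDS) - len(components))
    return {name: unquote(value) for name, value in zip(CPE22_FIELDS, components)}


print(parse_cpe22("cpe:/a:adobe:airsdk%26_compiler:18.0.0.180")["product"])  # airsdk&_compiler
```

Note how little the result says about what kind of software this is: beyond "part", every field is a free-form string.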

The key here is IT asset management, i.e. hardware, operating systems, and the applications that get installed on them, like Adobe Flash or Microsoft Office, but not software components like Log4j or OmniAuth. In most cases the "language" field is not even filled in, and it describes a human language (such as en-us) rather than a programming language anyway, so nothing in a CPE distinguishes libraries from different technologies that end up mapped to the same CPE.

For completeness, here is the first CPE configuration the NVD lists for CVE-2015-6682, a vulnerability in Adobe Flash Player:

Configuration 1
  AND
    OR
      * cpe:/a:adobe:airsdk%26_compiler:18.0.0.180 and previous versions
      * cpe:/a:adobe:air_sdk:18.0.0.199 and previous versions
      * cpe:/a:adobe:air:18.0.0.199 and previous versions
    OR
      * cpe:/o:apple:mac_os_x:-
      * cpe:/o:microsoft:windows:-
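
The AND/OR structure of that configuration is easier to see as code. Below is a minimal Python sketch of how a tool might evaluate such a configuration against a machine's inventory. The Leaf/Node classes, the inventory dictionary, and the purely numeric version comparison are my own simplifying assumptions, not the NVD's data format or matching algorithm; "and previous versions" is modelled as an inclusive upper bound and "-" as "any version".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


def version_key(version: str) -> Tuple[int, ...]:
    # Assumes purely numeric dotted versions, e.g. "18.0.0.199" -> (18, 0, 0, 199).
    return tuple(int(piece) for piece in version.split("."))


@dataclass
class Leaf:
    part: str
    vendor: str
    product: str
    max_version: Optional[str]  # "and previous versions" bound; None stands for "-"


@dataclass
class Node:
    op: str  # "AND" or "OR"
    children: List[Union["Node", Leaf]]


# Installed software: (part, vendor, product) -> version string.
Inventory = Dict[Tuple[str, str, str], str]


def matches(node: Union[Node, Leaf], inventory: Inventory) -> bool:
    if isinstance(node, Leaf):
        installed = inventory.get((node.part, node.vendor, node.product))
        if installed is None:
            return False  # product not present at all
        if node.max_version is None:
            return True  # any installed version satisfies this leaf
        return version_key(installed) <= version_key(node.max_version)
    results = [matches(child, inventory) for child in node.children]
    return all(results) if node.op == "AND" else any(results)


# The configuration shown above (%26 decoded to '&').
CVE_2015_6682 = Node("AND", [
    Node("OR", [
        Leaf("a", "adobe", "airsdk&_compiler", "18.0.0.180"),
        Leaf("a", "adobe", "air_sdk", "18.0.0.199"),
        Leaf("a", "adobe", "air", "18.0.0.199"),
    ]),
    Node("OR", [
        Leaf("o", "apple", "mac_os_x", None),
        Leaf("o", "microsoft", "windows", None),
    ]),
])

host = {("a", "adobe", "air"): "18.0.0.160", ("o", "microsoft", "windows"): "-"}
print(matches(CVE_2015_6682, host))  # True
```

This works well for installed products on an asset, which is exactly the use case CPEs were designed for.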

CPEs are distributed in the official CPE Dictionary, a large XML file published by NIST.

Because CPEs are distributed alongside CVEs in the NVD, on the surface they look like an attractive option for building a tool to identify vulnerable software libraries. Indeed, in the early days of our company the founders experimented with them for a short time before writing them off entirely, having decided they would never be able to provide the level of accuracy or completeness needed.

Why don't CPEs work for software components?

The root of the problem is that a CPE for a software component is only useful if it can be generated predictably and is globally unique, because it has to be matched against a central database which is in turn mapped to known vulnerabilities. The two fundamental limiting issues are:

  • No central control over the naming of open-source components (i.e. names are neither unique nor predictable)
  • The pace and manner in which components are created makes a central dictionary impractical

At the time of writing (October 15th, 2015) there are 685,450 unique libraries (not counting different versions) across RubyGems, Maven, Node.js, Packagist, PyPI, NuGet, CPAN, and Bower. That is 685,450 software libraries, from just some of the most popular languages and not including C/C++, that you would have to define in a central government database. Today there are only 106,234 CPE entries, which represents around 15% of the population of open-source libraries used by developers. However, a CPE entry can specify an exact version of a technology, and after removing different versions of the same CPE I was left with 13,547 items. CPEs that represent hardware or operating systems, indicated by the "part" field, can also be removed, leaving 9,874 CPEs, or 1.44% of the open-source library population for the ecosystems I mentioned. Of course, the real statistics are even worse, because not all of the "application" CPEs represent third-party libraries; many are used for actual applications such as Adobe Flash Player.

CPE names are created on an as-needed basis: a CPE is only generated when a CVE is released and the vulnerable target does not already have one. This means that, to a CPE-based tool, the absence of a matching CPE name is indistinguishable from the absence of any known issues.
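
The two reduction steps in that count (dropping versions, then dropping hardware and operating-system entries) and the resulting percentages can be reproduced with a short Python sketch. The function names are my own, the sample CPE names are illustrative, and the counts are simply the ones quoted above; the sketch does not download or parse the real CPE dictionary.

```python
LIBRARY_POPULATION = 685_450  # unique libraries across the registries listed above


def unique_application_products(cpe_names):
    """Collapse CPE names to distinct (vendor, product) pairs for applications only.

    Ignores the version and later fields, then drops hardware ("h") and
    operating-system ("o") entries. Assumes every name has at least the
    part, vendor and product components.
    """
    products = set()
    for name in cpe_names:
        part, vendor, product = name[len("cpe:/"):].split(":")[:3]
        if part == "a":
            products.add((vendor, product))
    return products


def share_of_population(cpe_count, population=LIBRARY_POPULATION):
    return cpe_count / population


sample = [  # illustrative input only
    "cpe:/a:pivotal:spring_framework:4.0.3",
    "cpe:/a:adobe:air:18.0.0.199",
    "cpe:/a:adobe:air:18.0.0.180",
    "cpe:/a:adobe:air_sdk:18.0.0.199",
    "cpe:/o:microsoft:windows:-",
]
print(len(unique_application_products(sample)))  # 3

for label, count in [("all CPE entries", 106_234),
                     ("version-less CPEs", 13_547),
                     ("version-less application CPEs", 9_874)]:
    print(f"{label}: {share_of_population(count):.2%}")
# all CPE entries: 15.50%
# version-less CPEs: 1.98%
# version-less application CPEs: 1.44%
```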

I will show how these fundamental issues manifest themselves with actual examples next, but it's worth pointing out that the general problems described above are exactly the ones already solved by package managers like Bundler, Maven, Composer, and npm. Each of those package management systems works by using a unique coordinate system: a developer registers to get a coordinate at release time, and a central repository does the lookups on behalf of the build tool. There are still ways to attack those systems, and some of my colleagues in our R&D team will be releasing a paper about that soon, but for the most part it works.
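
As a rough illustration of why the registry approach works, here is a toy Python model of a per-ecosystem registry that hands out unique coordinates. It is not how npm, RubyGems, or Maven Central are actually implemented; the class name and the owners "alice" and "bob" are hypothetical. It only captures the two properties that matter here: a name belongs to whoever registered it first, and lookups are exact rather than guessed.

```python
class ToyRegistry:
    """Toy per-ecosystem registry handing out unique (name, version) coordinates."""

    def __init__(self, ecosystem: str):
        self.ecosystem = ecosystem
        self._owners = {}    # package name -> owner who first published it
        self._releases = {}  # (name, version) -> owner

    def publish(self, owner: str, name: str, version: str) -> str:
        # First come, first served: the name is reserved for its first owner.
        first_owner = self._owners.setdefault(name, owner)
        if first_owner != owner:
            raise PermissionError(f"{name!r} is already owned by {first_owner!r}")
        if (name, version) in self._releases:
            raise ValueError(f"{name} {version} has already been published")
        self._releases[(name, version)] = owner
        return f"{self.ecosystem}:{name}@{version}"

    def lookup(self, name: str, version: str) -> str:
        # Exact lookup on the coordinate; no guessing from vendor or product names.
        return self._releases[(name, version)]


npm = ToyRegistry("npm")
print(npm.publish("alice", "file", "0.2.2"))  # npm:file@0.2.2
```

Because the ecosystem is part of the coordinate, an npm package called "file" can never be confused with anything else called "file".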

Below are some examples of my manual construction of CPEs for various libraries and my attempts to match them to vulnerabilities. There are many ways to assign vendor and product values when building a CPE entry; my methodology was simple, and I describe it for each ecosystem as I go.

Java (Maven)

I chose Spring Core 4.0.3.RELEASE to test creating a CPE for a Maven artifact. I used the GAV (groupId, artifactId, version) coordinates from the pom.xml snippet to fill in the vendor, product, and version fields:

<dependency>
        <groupId>org.springframework</groupId>   <!-- org.springframework for vendor -->
        <artifactId>spring-core</artifactId>     <!-- spring-core for product -->
        <version>4.0.3.RELEASE</version>         <!-- 4.0.3.RELEASE for version -->
</dependency>

I built a CPE looking like this: cpe:/a:org.springframework:spring-core:4.0.3.RELEASE
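
The naive mapping I used can be written down as a small Python sketch. It assumes the input is a single dependency element like the one above; the function name is my own, and real pom.xml files (with properties, parent POMs, and managed versions) are considerably more involved.

```python
import xml.etree.ElementTree as ET


def naive_cpe_from_dependency(xml_text: str) -> str:
    """Naive mapping: groupId -> vendor, artifactId -> product, version -> version."""
    dependency = ET.fromstring(xml_text)
    # Index children by local tag name so a Maven namespace such as
    # {http://maven.apache.org/POM/4.0.0} does not get in the way.
    fields = {child.tag.split("}")[-1]: (child.text or "").strip() for child in dependency}
    return "cpe:/a:{}:{}:{}".format(fields["groupId"], fields["artifactId"], fields["version"])


snippet = """<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-core</artifactId>
    <version>4.0.3.RELEASE</version>
</dependency>"""
print(naive_cpe_from_dependency(snippet))  # cpe:/a:org.springframework:spring-core:4.0.3.RELEASE
```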

Unfortunately, this does not provide the correct information, because the vendor/distributor in this case is Pivotal, not springframework. The artifactId cannot be used for the product either, because the groupId identifies the product (Spring Framework), while spring-core is just one of its submodules. Even though the vendor information is not in the manifest file, I decided to construct the CPE with the correct vendor anyway, in case it could be derived some other way, and came up with: cpe:/a:pivotal:springframework:4.0.3.RELEASE. The closest CPE entry to mine was cpe:/a:pivotal:spring_framework:4.0.3, and searching the NVD showed that CVE-2014-3625 was associated with it:

Directory traversal vulnerability in Pivotal Spring Framework 3.0.4 through 3.2.x before 3.2.12, 4.0.x before 4.0.8, and 4.1.x before 4.1.2 allows remote attackers to read arbitrary files via unspecified vectors, related to static resource handling.
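
A CPE-based check only has the framework name and version to go on. The Python sketch below encodes the affected ranges from that description as half-open intervals; nothing in it knows which Spring module (spring-core or spring-webmvc) is actually on the classpath, so spring-core 4.0.3.RELEASE gets flagged. The range encoding, the function names, and the numeric-only version parsing (which drops suffixes like ".RELEASE") are my own simplifications.

```python
import re


def parse_version(version: str):
    """'4.0.3.RELEASE' -> (4, 0, 3): keep only the leading numeric components."""
    numeric = re.match(r"\d+(?:\.\d+)*", version).group(0)
    return tuple(int(piece) for piece in numeric.split("."))


# Affected ranges from the CVE-2014-3625 description, as [low, high) pairs.
AFFECTED = [
    ((3, 0, 4), (3, 2, 12)),  # 3.0.4 through 3.2.x before 3.2.12
    ((4, 0, 0), (4, 0, 8)),   # 4.0.x before 4.0.8
    ((4, 1, 0), (4, 1, 2)),   # 4.1.x before 4.1.2
]


def in_affected_range(version: str) -> bool:
    # Only the framework-level version is checked; the module name never enters into it.
    v = parse_version(version)
    return any(low <= v < high for low, high in AFFECTED)


print(in_affected_range("4.0.3.RELEASE"))  # True, even for spring-core
```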

Something told me that the entirety of Spring Framework did not share a directory traversal vulnerability. After going through each of the GitHub commits for this vulnerability, I discovered that the fix was restricted to the spring-webmvc module of Spring Framework, confirming my hunch. Even though I was only using the spring-core module, I was led to believe that a vulnerability in the spring-webmvc module existed in my code.

My example reveals that the relationship between open-source libraries and CPE entries is not one-to-one; multiple libraries can map to a single CPE entry. If I continued to use CPEs to catalog the libraries in my code, I would end up with very little information about my actual code, because matches would be made on the groupId shared by every module of a component rather than on the module I chose to use. Not only are potentially very different modules lumped into the same identity, but version checking is nearly useless as well: the versions in the GAV and manifest file belong to the specific artifactId, not the groupId, so verifying versions with CPEs will again lead to misinformation. I know what you are thinking: "Why not just use the artifactId for the product field?" Doing this would result in CPE collisions, because different groupIds can have the same artifactId and the same vendor. Apache is a perfect example, with groupIds such as org.apache.ws.xmlschema, org.apache.ws.commons, org.apache.ws.commons.schema, org.apache.servicemix.bundles.xmlschema, and org.apache.ws.schema, which all share the xmlschema artifactId.
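
The collision is easy to demonstrate with a short Python sketch that builds product-level CPE names using the artifactId as the product. The vendor value "apache" and the function name are my own assumptions for illustration; the groupIds are the ones listed above.

```python
from collections import defaultdict

# groupIds named above that all publish an artifact with artifactId "xmlschema".
APACHE_XMLSCHEMA = [
    ("org.apache.ws.xmlschema", "xmlschema"),
    ("org.apache.ws.commons", "xmlschema"),
    ("org.apache.ws.commons.schema", "xmlschema"),
    ("org.apache.servicemix.bundles.xmlschema", "xmlschema"),
    ("org.apache.ws.schema", "xmlschema"),
]


def artifact_id_cpe_collisions(coordinates, vendor):
    """Group Maven coordinates by the CPE produced when artifactId is the product.

    Returns only the CPE names that more than one groupId maps onto.
    """
    groups_by_cpe = defaultdict(list)
    for group_id, artifact_id in coordinates:
        groups_by_cpe[f"cpe:/a:{vendor}:{artifact_id}"].append(group_id)
    return {cpe: groups for cpe, groups in groups_by_cpe.items() if len(groups) > 1}


for cpe, groups in artifact_id_cpe_collisions(APACHE_XMLSCHEMA, "apache").items():
    print(cpe, "<-", len(groups), "different groupIds")
# cpe:/a:apache:xmlschema <- 5 different groupIds
```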
 

Node.js

Another issue presents itself when a library written in one language is discovered and the same name exists for some other technology. I decided to construct CPEs from the dependencies of a Node.js project and came across a 'file' dependency, which provides "higher level file and path operations". Most Node.js CPEs simply use the package name for both the vendor and product fields, so I did as well: cpe:/a:file:file:0.2.2. To manually check that the project was not using a vulnerable version of this package, I searched CVE Details for cpe:/a:file:file: and selected the first result. I found that versions of cpe:/a:file:file: earlier than 4.12 are susceptible to CVE-2004-1304:

Stack-based buffer overflow in the ELF header parsing code in file before 4.12 allows attackers to execute arbitrary code via a crafted ELF file.

Well, that doesn't seem right: the library in question has absolutely nothing to do with parsing ELF headers. Oh, and Node.js itself was released in 2009, while this CVE is from 2004. If the problem isn't obvious yet, let me make it clear: trying to match open-source libraries against thousands of different technologies, using only the library name and version, will inevitably lead to inaccuracies.
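
To make that failure mode concrete, here is a minimal Python sketch contrasting a name-and-version lookup, which is effectively all a CPE match gives you, with one that also requires the package ecosystem to match. The single advisory record is modelled on CVE-2004-1304; its "ecosystem" value is a hypothetical annotation of mine, since the CPE itself carries no such field, and the numeric-only version comparison is a simplification.

```python
def version_tuple(version: str):
    # Numeric dotted versions only, e.g. "4.12" -> (4, 12).
    return tuple(int(piece) for piece in version.split("."))


ADVISORIES = [
    # Modelled on CVE-2004-1304; "ecosystem" is a hypothetical annotation.
    {"id": "CVE-2004-1304", "name": "file", "fixed_in": "4.12", "ecosystem": "unix-utility"},
]


def naive_match(name: str, version: str):
    """Name + version only, the way a cpe:/a:file:file lookup effectively works."""
    return [a["id"] for a in ADVISORIES
            if a["name"] == name and version_tuple(version) < version_tuple(a["fixed_in"])]


def qualified_match(ecosystem: str, name: str, version: str):
    """Same check, but the package ecosystem must match as well."""
    return [a["id"] for a in ADVISORIES
            if a["ecosystem"] == ecosystem and a["name"] == name
            and version_tuple(version) < version_tuple(a["fixed_in"])]


print(naive_match("file", "0.2.2"))             # ['CVE-2004-1304'] (false positive)
print(qualified_match("npm", "file", "0.2.2"))  # []
```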

Ruby Gems

Ruby Gems run into the same issue as Node.js packages, since their CPEs use the library name in both the vendor and product fields, so I thought perhaps the CPE entries were wrong and the author could be used instead (I was wrong). Using the author or owner in the vendor field leads to problems, because many people can contribute to a single library and some libraries are published without a legitimate author or owner. My favorite example is a gem called 'a', created by someone named 'Author'; using the author here gives very little distinguishing information about the component.

Instead, I tried matching the activesupport 4.1.10 gem with a CPE, taking the name and version from the Gemfile.lock (or .gemspec) file and using the gem name twice: cpe:/a:activesupport:activesupport:4.1.10. Searching the NVD showed no vulnerabilities associated with that CPE. Fortunately, SRC:CLR does not use CPEs for matching, so I knew that CVE-2015-3226 and CVE-2015-3227 were vulnerabilities in this component. The CPEs tagged to these two vulnerabilities were all different versions of Rails: cpe:/a:rubyonrails:ruby_on_rails. The CVE descriptions for both vulnerabilities state that the issue is "in Active Support in Ruby on Rails", which shows that the problem of grouping different libraries together is not restricted to Maven. If I were using this gem and version in a non-Rails project, I would likely never discover these vulnerabilities. Had cpe:/a:activesupport:activesupport:4.1.10 been used for the CVEs, the vulnerability might have been correctly identified. Of course, if the Node.js activesupport package were also to have a CVE associated with it, collisions would occur and result in misinformation.
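
For reference, here is a minimal Python sketch of that extraction step. It assumes the usual Gemfile.lock layout, where resolved gems appear under "specs:" indented by four spaces and their dependency constraints are indented further (and are therefore skipped). The regular expression, the function names, and the sample lockfile are my own; platform-specific versions (e.g. with a -java suffix) are returned as-is.

```python
import re

# A resolved spec line: exactly four spaces, gem name, version in parentheses.
SPEC_LINE = re.compile(r"^ {4}(\S+) \(([^)]+)\)$")


def resolved_gems(lockfile_text: str) -> dict:
    """Return {gem name: version} for the resolved specs in a Gemfile.lock."""
    gems = {}
    for line in lockfile_text.splitlines():
        match = SPEC_LINE.match(line)
        if match:
            gems[match.group(1)] = match.group(2)
    return gems


def naive_gem_cpe(name: str, version: str) -> str:
    # The approach described above: gem name as both vendor and product.
    return f"cpe:/a:{name}:{name}:{version}"


LOCKFILE = """GEM
  remote: https://rubygems.org/
  specs:
    activesupport (4.1.10)
      i18n (~> 0.7)
    i18n (0.7.0)

DEPENDENCIES
  activesupport (= 4.1.10)
"""
gems = resolved_gems(LOCKFILE)
print(naive_gem_cpe("activesupport", gems["activesupport"]))
# cpe:/a:activesupport:activesupport:4.1.10
```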
 

Conclusion

CPEs are great for "identifying classes of applications, operating systems, and hardware devices present among an enterprise's computing assets". Open-source libraries do not fall into any of those categories and require a different means of identification. They come from a wide variety of sources, include countless versions, and continue to grow at a rapid pace, which makes identifying and inventorying your code's third-party components quite difficult. When it comes to finding vulnerabilities in your open-source libraries, accurate identification must be coupled with reliable vulnerability sources. In case the fact that the NVD primarily uses CPEs to identify vulnerabilities didn't already raise a red flag for you, I will explain why relying on the NVD is a poor choice in my next blog post.

Sean is part of the customer success team at Veracode. He helps address customer issues and handles our support desk.