Our mission is to help the world's developers build software, safely. We have a lot of areas to tackle and a lot of features to build, but we started the journey by helping developers know what third-party code they are using, what it does and which components have vulnerabilities, because we think that is one of the most pressing security problems facing software development today. This post is about how we track and identify vulnerabilities and the information we put in the advisories.
My colleague Sean Kinzer recently wrote two excellent posts, "Using CPEs for Open-Source vulnerabilities? Think Again" and "Why Relying On the NVD is Not Good For Open-Source Security Tools". I recommend reading those posts if you haven’t already.
We believe that there are three important parts to identifying open-source component vulnerabilities:
- Disclosed vulnerabilities
- Un-disclosed vulnerabilities
- Vulnerable parts
Disclosed vulnerabilities (Signatures)
As hackers and security researchers have turned their attention to open-source components, the number of disclosures has risen. In a typical week we see between five and ten relevant issues released through the NVD system but, as Sean points out in the posts referenced above, that is only a subset of the pool of disclosed issues. Most developers simply monkey-patch the component in situ, or update it and push the update to the binary distribution sites, without ever notifying the US government-run database.
We have a team of dedicated in-house security researchers and are constantly improving our back-end tools and processes, but here are some of the data sources that we track today:
- National Vulnerability Database List
- Public Product Advisory Lists
- Product Announcement Lists
- Private Security Lists
- GitHub commit logs for libraries (yes you read that correctly)
- Bug Trackers for libraries (yes you read that correctly)
For each potential issue that comes across our sights, we first decide whether it is relevant. When it’s marked as relevant, a researcher analyzes it to determine what the issue is and what it affects, and then turns it into something we call an artifact. In creating that artifact we tear down the advisory to really understand it and identify the root cause. This often means creating working exploits that we share with users. We determine whether a fix is available and whether there are potential workarounds, and we identify the vulnerable methods (see below). We also attach information about known exploits, such as Metasploit modules, to help drive prioritization in remediation.
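To make the idea of an artifact a little more concrete, here is a minimal sketch of what such a record could hold. The class name, field names, the `priority_hint` helper and the example component are all assumptions made for illustration; this is not our actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Artifact:
    """Hypothetical record for one researched vulnerability (illustrative only)."""
    title: str
    component: str                        # identifier of the affected library
    root_cause: str                       # what the advisory teardown found
    affected_versions: List[str]
    fixed_version: Optional[str] = None   # None when no fix is available yet
    workarounds: List[str] = field(default_factory=list)
    vulnerable_methods: List[str] = field(default_factory=list)
    known_exploits: List[str] = field(default_factory=list)

    def priority_hint(self) -> str:
        """Rough, assumed heuristic: public exploits raise the urgency."""
        if self.known_exploits:
            return "urgent"
        return "scheduled" if self.fixed_version else "monitor"


# A made-up artifact for a made-up component.
example = Artifact(
    title="XML entity expansion denial of service",
    component="com.example:xml-helper",
    root_cause="entity expansion is enabled by default",
    affected_versions=["1.0", "1.1"],
    fixed_version="1.2",
    vulnerable_methods=["com.example.xml.Parser.parse"],
)
print(example.priority_hint())
```

The useful property of a record like this is that the fix, the workarounds, the vulnerable methods and the exploit information travel together, so prioritization can be driven from a single place.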
This is manually intensive work and we will be announcing a research bounty program in the coming months. If you want to look at some great examples of completed research artifacts:
Un-disclosed vulnerabilities (Algorithms)
We know that the vast majority of open-source component security issues are not yet disclosed. We know that because we have been doing a lot of work using data science and machine learning to examine all of the components we know about and uncover those issues. We aren’t quite ready to talk about all of the details of how we do this, but at a high level our architecture collects public open-source components when our customers use our system. We collect this open-source code using a system we call Librarian, which tracks all versions along with their binaries and source code. Using this large data set we are able to look for brand-new or similar issues, explore our hunches and check that patches have been applied.
- Let's say a new vulnerability was published in a Java component where XML Entity Expansion is not turned off by default, leaving the component that uses the XML parser open to a denial of service attack. We can look across all other libraries to see if they have the same issue. Hint: there are a LOT.
- Let's say we see a vulnerable version of a C library reported as being used in a JAR. We can look across all the other libraries and see if the same vulnerable C library is being used elsewhere. Hint: there are a LOT.
- Let's say one library is determined to have a vulnerability. We can look across all other libraries and see if they are using that library transitively. Hint: there are a LOT.
- Let's say a fix was applied to a vulnerable library and the vendors say that that fix was also applied to a specific version range. We can look across all the versions and see if the fix was indeed applied. Hint: there are a LOT.
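The transitive case in the list above can be sketched as a walk over a dependency map. The library names and the `DEPENDS_ON` data are made up, and this is only an illustration of the idea under those assumptions, not a description of how Librarian works.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical direct-dependency map: library -> libraries it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "web-framework": ["json-binder", "logging-api"],
    "json-binder": ["xml-helper"],
    "report-tool": ["web-framework"],
    "logging-api": [],
    "xml-helper": [],
}


def transitive_dependents(vulnerable: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every library that depends on `vulnerable`, directly or transitively."""
    # Invert the map so we can walk from a library to the libraries that use it.
    used_by: Dict[str, List[str]] = {}
    for lib, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, []).append(lib)

    affected: Set[str] = set()
    queue = deque([vulnerable])
    while queue:
        current = queue.popleft()
        for user in used_by.get(current, []):
            if user not in affected:   # visit each library once
                affected.add(user)
                queue.append(user)
    return affected


print(sorted(transitive_dependents("xml-helper", DEPENDS_ON)))
# ['json-binder', 'report-tool', 'web-framework']
```

Only `json-binder` uses the vulnerable library directly in this toy data, yet two more libraries inherit it, which is why the transitive view matters.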
This is obviously “special sauce” and one of the things that makes us unique, so in the spirit of transparency I am giving just a small hint about what we are doing and where we are headed. Look for a lot more about this in the coming months. Honestly we have to rethink the disclosure process first!
Vulnerable parts
When we first built our minimum viable product (also called a prototype), all we did was identify whether people were using vulnerable components. After a little while we noticed that, despite our telling people that they were using high-risk vulnerable components, they weren’t fixing them, and we couldn’t fathom why. We dived in with our early adopters, who often told us that when we alerted them they looked into it but found that they weren’t using the vulnerable part of the vulnerable component, or weren’t using it in a way that made them vulnerable. Luckily for us, several members of the team have built commercial static code analysis tools in the past, so we knew exactly how to solve that problem.
Today we build a call graph of the user's custom code, which shows all the paths that their code takes. We do this by shallow cloning the code to the agent, so that the source code never leaves the user's network under any circumstances. Each vulnerability artifact is annotated with the vulnerable methods that our research team has determined, and the list of vulnerable methods is passed down to the agent for matching.
It turns out that developers typically call the vulnerable methods of vulnerable components only about 25% of the time. That means that if you only identify vulnerable components, you get roughly three false positives for every true positive, and we all know that developers hate false positives.
So now that you know what we do behind the scenes and how we work under the hood, let’s take a quick tour of how it's used and what the interface looks like.
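As a rough illustration of the matching step, the sketch below assumes the call graph has already been built and is available as a simple caller-to-callees map. The method names, the `CALL_GRAPH` data and the `find_vulnerable_path` helper are hypothetical, and the real agent may well match differently.

```python
from typing import Dict, List, Optional, Set

# Hypothetical call graph of the user's code: caller -> methods it calls.
CALL_GRAPH: Dict[str, List[str]] = {
    "app.Main.main": ["app.Importer.load", "app.Report.render"],
    "app.Importer.load": ["com.example.xml.Parser.parse"],
    "app.Report.render": ["com.example.xml.Writer.write"],
}


def find_vulnerable_path(
    call_graph: Dict[str, List[str]],
    entry_point: str,
    vulnerable_methods: Set[str],
) -> Optional[List[str]]:
    """Depth-first search for a call path from entry_point to a vulnerable method."""
    stack: List[List[str]] = [[entry_point]]
    seen: Set[str] = set()
    while stack:
        path = stack.pop()
        method = path[-1]
        if method in vulnerable_methods:
            return path                # the code can reach the vulnerable method
        if method in seen:
            continue                   # already explored, also guards against recursion
        seen.add(method)
        for callee in call_graph.get(method, []):
            stack.append(path + [callee])
    return None                        # component is present but its vulnerable part is unused


# The list of vulnerable methods would come from the artifact.
path = find_vulnerable_path(CALL_GRAPH, "app.Main.main", {"com.example.xml.Parser.parse"})
print(" -> ".join(path) if path else "vulnerable methods not reachable")
```

Returning the whole path rather than a yes or no answer is a deliberate choice in this sketch: showing a developer exactly how their code reaches the vulnerable method is far more convincing than a bare alert.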
You will notice interesting widgets:
- Repos Using Exploitable Vulnerable Methods - calculated by looking at repositories where your custom code calls the vulnerable methods of vulnerable components
- Vulnerable Components - the total number of vulnerable components across your repositories
- Repositories with known exploits
- Vulnerability Severity breakdown - broken down by high, medium and low risk
- Vulnerabilities by language
- Out-of-date components with vulnerabilities
Note: We are adding WebSockets soon so this will be updated in real time (no need for a page refresh), and we'll be adding a lot more stats and graphs. If there is data you really want now, just let us know.
Here you can see the various views that may be of interest if you want to understand what components you have.
The Repository List View
Quickly see the repositories that contain vulnerable components.
The Vulnerabilities List View
You first see the graph at the top of the page, which gives you a quick overview and lets you do some high-level filtering. You can scroll through the entire list of vulnerabilities in that organization if you wish (we lazy load them using React) or use the search and filters. For instance, type "denial of service" in the issues search and we show you only those vulnerabilities. You will notice that in this view you can also sort to see the vulnerabilities that have known exploits.
The Vulnerability Details View
You can click into any vulnerability and see the Vulnerability Details. In the screenshot above you can see information about the issue and how to fix it. There is a lot of detail here, which I will leave for a future post, but some highlights are important to cover.
- First you will notice that we tell you which version the issue was fixed in AND if that version is the subject of other vulnerabilities. This is a very powerful feature that avoids developers being sent on wild goose chases.
- We also show you the component's vulnerability history so you can easily see what version is free of known vulnerabilities.
- We also show you if you are using the vulnerable methods.
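The first two highlights above can be pictured with a small sketch. It assumes a component's vulnerability history is available as an ordered list of versions, each with its known issues. The `HISTORY` data, the issue names and the `upgrade_advice` helper are invented for illustration and do not reflect how the product computes this.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical vulnerability history, ordered from oldest to newest version.
HISTORY: List[Tuple[str, List[str]]] = [
    ("1.0", ["ISSUE-A", "ISSUE-B"]),
    ("1.1", ["ISSUE-B"]),   # fixes ISSUE-A but still carries ISSUE-B
    ("1.2", ["ISSUE-C"]),
    ("1.3", []),            # no known vulnerabilities
]


def upgrade_advice(
    history: List[Tuple[str, List[str]]], current: str, issue: str
) -> Dict[str, object]:
    """Find where `issue` is fixed, what else affects that version, and the first clean version."""
    versions = [version for version, _ in history]
    start = versions.index(current)
    fixed_in: Optional[str] = None
    other_issues: List[str] = []
    first_clean: Optional[str] = None
    # Only look at versions newer than the one currently in use.
    for version, issues in history[start + 1:]:
        if fixed_in is None and issue not in issues:
            fixed_in, other_issues = version, list(issues)
        if first_clean is None and not issues:
            first_clean = version
    return {
        "fixed_in": fixed_in,
        "other_issues_in_fixed_version": other_issues,
        "first_clean_version": first_clean,
    }


print(upgrade_advice(HISTORY, "1.0", "ISSUE-A"))
```

In this toy data the issue is fixed in 1.1, but 1.1 is itself affected by another issue, so the version worth upgrading to is 1.3. That gap is exactly the wild goose chase the details view is meant to prevent.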
And of course there is even more. Each vulnerability has its own page with everything we know about the issue including the CVSS score, links to known exploits and other references about it.