Mean Time to Resolve (MTTR): Best Practices for DevOps

Customers who have embraced DevOps often ask me for the best metrics to measure their program. I always advocate focusing on policy compliance as the number one metric for understanding your risk, as this provides a succinct measurement of the security of your applications.

However, if you are looking to measure and motivate development teams, policy compliance doesn’t give you the granularity to introduce gamification or incentives. Policy compliance is very black and white; you either are compliant (good!) or you are not (bad!). So, when talking to customers about motivating teams in the spirit of continuous improvement, I like to bring up Mean Time to Resolve (MTTR).

What is MTTR?

I’ve also seen this as “Mean Time to Repair,” “Mean Time to Recovery,” or “Mean Time to Respond.” I personally like “resolve” as it indicates that the security finding has been closed, which is aligned with how we compute this metric.

You often see MTTR in association with DevOps and its tenet of making work visible and measurable, and thus improvable. This is why I bring it up with our users: knowing how long it takes to resolve a security finding helps organizations make program improvements that move the needle on the overall metric of policy compliance.

How MTTR is Calculated at Veracode

The standard definition for MTTR is along the lines of the following:
Corrective maintenance time / Total number of corrective maintenance actions.

When it came time to implement MTTR in our new analytics feature, we initially interpreted this as:
(Finding closed date – Finding first found date) for each finding, summed and divided by the total number of findings.
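The initial interpretation can be sketched in a few lines of Python. The finding dates below are made-up sample data, and `naive_mttr` is a hypothetical helper for illustration, not part of Veracode Analytics.

```python
from datetime import date

# Hypothetical closed findings: (first_found_date, closed_date).
closed_findings = [
    (date(2024, 1, 1), date(2024, 1, 11)),  # 10 days
    (date(2024, 2, 1), date(2024, 2, 3)),   # 2 days
    (date(2024, 3, 1), date(2024, 3, 31)),  # 30 days
]

def naive_mttr(findings):
    """Mean of (closed date - first found date), in days, across closed findings."""
    if not findings:
        raise ValueError("MTTR is undefined when there are no closed findings")
    total_days = sum((closed - found).days for found, closed in findings)
    return total_days / len(findings)

print(naive_mttr(closed_findings))  # 14.0
```

This treats every finding as having exactly one open period, which is the assumption that breaks down for the findings discussed next.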

Sounds good at face value, but when it comes to Veracode’s security findings, implementing this exact calculation gets a bit tricky.

MTTR for Static Analysis (SAST) Findings

Since customers are primarily using Static Analysis as part of their development pipelines, we’ll first focus on these findings to ensure the calculation makes sense. For Static Analysis findings, each finding can be open and closed many times depending on the code that is scanned. This happens regularly through the development cycle, but once you are focused on a release candidate or production application, this measurement takes on new importance.

MTTR for Software Composition Analysis (SCA) Findings

MTTR is calculated from the date the finding was first found. For SCA, that date is determined a little differently: it is the date when one of the following events occurs:

A Veracode scan detects a library with a vulnerability.
A CVE is published for a vulnerability within a library that is present in any sandbox within an application, regardless of whether you have promoted the sandbox or evaluated it against a policy.
If the vulnerable library is removed and later re-added, the first found date resets to the date the library was re-added.
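The first found rules above can be sketched as a small function. This is a minimal sketch under our own reading of the rules, not Veracode's implementation: we assume a library vulnerability finding comes into existence on the later of (a) the date a scan detected the library after its most recent (re-)addition and (b) the CVE publication date. The function `sca_first_found` and its parameters are hypothetical.

```python
from datetime import date
from typing import Sequence

def sca_first_found(library_added_dates: Sequence[date], cve_published: date) -> date:
    """Hypothetical first found date for an SCA finding.

    library_added_dates: dates on which a scan detected the library after each
    time it was (re-)added to the application.
    """
    # Removing and re-adding the library resets the clock, so only the most
    # recent addition counts.
    latest_add = max(library_added_dates)
    # The finding exists only once the library is present AND the CVE is public,
    # so whichever of those happened later is the first found date.
    return max(latest_add, cve_published)

# Library already present when the CVE is published: the CVE date wins.
print(sca_first_found([date(2024, 1, 5)], date(2024, 3, 1)))   # 2024-03-01
# Library added after the CVE is public: the detection date wins.
print(sca_first_found([date(2024, 4, 10)], date(2024, 3, 1)))  # 2024-04-10
# Library removed and re-added: the re-add date resets first found.
print(sca_first_found([date(2024, 4, 10), date(2024, 6, 2)], date(2024, 3, 1)))  # 2024-06-02
```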

This nuance ensures that MTTR for SCA reflects the unique nature of open-source vulnerabilities and their lifecycle within an application.

Delivering a Meaningful MTTR

In Veracode Analytics, we focus on the most recent time a finding was first found and the most recent time that the finding was closed. We always look at the policy context for calculating MTTR. This ensures clear communication with development teams on what is important and what needs to be fixed.

While MTTR can be calculated on a per-sandbox basis, attempting to calculate it across all sandboxes leads to misleading data because of flaw matching.

If a flaw is open in Sandbox 1, closed in Sandbox 3 because it wasn’t present, and mitigated in Sandbox 17, what is the current state of that flaw? Does the most recent scan, regardless of sandbox or policy, represent the “current” state, or does it just represent a scan that was performed? This is why limiting MTTR to the policy context is important, since there is a level of control over the scans performed at the policy level.
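The per-context logic can be sketched as follows. This is a minimal, hypothetical illustration rather than Veracode's implementation: we assume each scan reports whether a finding is open or closed, tagged with the context the scan ran in, and we measure Days to Resolve from the most recent time the finding (re)appeared to the scan that next closed it, using only scans from one context (the policy context by default). The `Event` layout, the context labels, and `days_to_resolve` are all assumptions.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical scan history for one SAST finding:
# (scan date, scan context, finding state after that scan), in chronological order.
Event = Tuple[date, str, str]

history: List[Event] = [
    (date(2024, 1, 2), "policy", "open"),
    (date(2024, 1, 5), "sandbox-1", "open"),
    (date(2024, 1, 9), "policy", "closed"),    # fixed...
    (date(2024, 1, 20), "sandbox-3", "closed"),
    (date(2024, 2, 1), "policy", "open"),      # ...then reintroduced
    (date(2024, 2, 6), "policy", "closed"),
]

def days_to_resolve(events: List[Event], context: str = "policy") -> Optional[int]:
    """Days from the start of the most recent open period to the scan that closed it.

    Returns None if the finding is open (or was never seen) in the given context,
    because only closed findings have a Days to Resolve value.
    """
    scoped = [(d, state) for d, ctx, state in events if ctx == context]
    last_opened: Optional[date] = None
    last_closed: Optional[date] = None
    previous = "closed"
    for d, state in scoped:
        if state == "open" and previous == "closed":
            last_opened = d  # finding (re)appeared: start of the latest open period
        elif state == "closed" and previous == "open":
            last_closed = d  # finding disappeared: end of that open period
        previous = state
    if previous != "closed" or last_opened is None or last_closed is None:
        return None
    return (last_closed - last_opened).days

print(days_to_resolve(history))               # 5
print(days_to_resolve(history, "sandbox-1"))  # None
```

Scoped to the policy context, the finding took 5 days to resolve; asked of `sandbox-1`, where the flaw is still open, the same question has no answer, which illustrates why mixing contexts produces confusing results.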

“Resolved” means both fixed (also known as “remediated” or no longer present in the scan) as well as mitigated, where someone has documented a compensating control for the finding and that control has been approved. This means that if a finding has an associated approved mitigation, the most recent time it was found could also be the exact same time it was resolved since the mitigation will immediately close the finding.
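The effect of an approved mitigation on Days to Resolve can be sketched as follows. The `Finding` fields and helper functions are hypothetical, and the model is deliberately simple: we assume a finding resolves on the earlier of its fix date and the moment an approved mitigation takes effect, and that a mitigation approved before the latest detection closes the finding at detection.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Finding:
    last_found: date                               # most recent time the finding was found
    fixed_on: Optional[date] = None                # no longer present in a scan
    mitigation_approved_on: Optional[date] = None  # compensating control approved

def resolved_on(f: Finding) -> Optional[date]:
    """Date the finding was resolved (fixed or mitigated), or None if still open."""
    candidates = []
    if f.fixed_on is not None:
        candidates.append(f.fixed_on)
    if f.mitigation_approved_on is not None:
        # An approved mitigation closes the finding immediately, so a mitigation
        # approved before the latest detection resolves it at detection time.
        candidates.append(max(f.last_found, f.mitigation_approved_on))
    return min(candidates) if candidates else None

def days_to_resolve(f: Finding) -> Optional[int]:
    resolved = resolved_on(f)
    return None if resolved is None else (resolved - f.last_found).days

fixed = Finding(last_found=date(2024, 3, 1), fixed_on=date(2024, 3, 8))
mitigated = Finding(last_found=date(2024, 3, 1), mitigation_approved_on=date(2024, 1, 10))
still_open = Finding(last_found=date(2024, 3, 1))

print(days_to_resolve(fixed))       # 7
print(days_to_resolve(mitigated))   # 0
print(days_to_resolve(still_open))  # None
```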

The final nuance to MTTR is to compare the speed of addressing policy-impacting findings vs. general security debt. Veracode’s policy is regularly used as a sieve to ensure clear communication with development teams on what is important and what needs to be fixed, as opposed to what is simply additional information. If the policy is used correctly, you should see that policy-impacting findings are resolved faster than all other findings. If this isn’t the case, then the development team isn’t using the policy to prioritize work.
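A quick check of whether the policy is working as a sieve might look like the following. The data is made up, and the `affects_policy` flag is a hypothetical stand-in for however your findings are marked as policy-impacting.

```python
# Hypothetical closed findings: (days to resolve, affects_policy)
closed_findings = [
    (3, True), (5, True), (4, True),
    (30, False), (45, False), (12, False),
]

def mttr(days):
    """Mean of an iterable of Days to Resolve values."""
    days = list(days)
    if not days:
        raise ValueError("MTTR is undefined for an empty set of findings")
    return sum(days) / len(days)

policy_mttr = mttr(d for d, affects_policy in closed_findings if affects_policy)
other_mttr = mttr(d for d, affects_policy in closed_findings if not affects_policy)

print(f"policy-impacting: {policy_mttr:.1f} days, other: {other_mttr:.1f} days")
if policy_mttr >= other_mttr:
    print("Policy-impacting findings are not resolved faster; "
          "the policy may not be driving prioritization.")
```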

The ‘Average’ Approach to MTTR

MTTR is by nature a calculated value: as the word “mean” in its name indicates, we are actually performing an average.

“Days to Resolve” is a dimension on a finding, and it is only populated when the finding is in a closed state. A finding is a flaw that Veracode has tracked across many scans through flaw matching. Incidentally, this is why we separate “Scan Explore” from “Findings Explore” in the Analytics feature: a scan represents a point in time, while a finding spans time.

When we look at MTTR, we are inherently looking at a group of findings and their “Days to Resolve” values, then taking the average: the total time to resolve divided by the number of findings.
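As a minimal sketch with made-up values, MTTR over a group of findings averages only the findings that actually have a Days to Resolve value; open findings are skipped rather than counted as zero. The `mean_time_to_resolve` helper is hypothetical.

```python
from typing import List, Optional, Sequence

# Hypothetical "Days to Resolve" values for a group of findings.
# None means the finding is still open, so it has no Days to Resolve.
days_to_resolve: List[Optional[int]] = [4, None, 10, 1, None, 7]

def mean_time_to_resolve(values: Sequence[Optional[int]]) -> float:
    """Average Days to Resolve over closed findings only."""
    closed = [v for v in values if v is not None]
    if not closed:
        raise ValueError("no closed findings, so MTTR is undefined")
    return sum(closed) / len(closed)

print(mean_time_to_resolve(days_to_resolve))  # 5.5
```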

Measuring MTTR for Your Organization

A customer recently asked me why the MTTR he saw for his entire organization differed from the average of the MTTRs for his three business units.

For each application, you have n findings in a closed state, each with a Time to Resolve. When we look at the Mean Time to Resolve measure, we are actually providing the average Time to Resolve for the dimension selected. So, when you look at a single application and see “Days to Resolve,” you are actually seeing the average across those n findings.

Therefore:

$$\text{Average Time to Resolve} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\text{Sum of Time to Resolve for each finding}}{\text{Number of findings}}$$

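The question, then, is why

$$\frac{A + B + C}{3} \neq Z, \quad \text{where } A = \frac{\sum_{i \in \text{BU}_1} x_i}{n_1},\; B = \frac{\sum_{i \in \text{BU}_2} x_i}{n_2},\; C = \frac{\sum_{i \in \text{BU}_3} x_i}{n_3},\; Z = \frac{\sum_{i=1}^{n_A} x_i}{n_A}$$

Here $n_1$, $n_2$, and $n_3$ are the numbers of closed findings in each business unit (BU), and $n_A = n_1 + n_2 + n_3$ is the number of closed findings across all three BUs together.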

When you look at this mathematically and think about the order of operations, you will see that Z completes the sum $\sum x_i$ over every finding before dividing by the total number of findings. Averaging A, B, and C reverses that order: it divides within each BU first and only then adds the results. That is why the average of the BU-level MTTRs (A, B, and C) can differ drastically from the MTTR of all the BUs together (Z); it is simply not the organization-wide mean.

In short, averaging the BU averages gives every BU equal weight, even though BUs with more findings should carry more weight.
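Rewriting Z in terms of the BU-level averages makes the weighting explicit. Since each BU's total Time to Resolve equals its MTTR times its number of findings (for example, $\sum_{i \in \text{BU}_1} x_i = n_1 A$), the organization-wide MTTR is a weighted average of A, B, and C:

$$Z = \frac{n_1 A + n_2 B + n_3 C}{n_1 + n_2 + n_3} = \frac{n_1}{n_A} A + \frac{n_2}{n_A} B + \frac{n_3}{n_A} C$$

The simple average $\frac{A + B + C}{3}$ instead gives each BU a fixed weight of $\frac{1}{3}$. The two coincide when every BU has the same number of closed findings, but in general they differ.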

MTTR Calculation Example

Here is an example:

BU A contains 2 closed flaws that each took 1 day to close. The MTTR for BU A = (1 + 1)/2 = 1

BU B contains 200 closed flaws that each took 20 days to close. The MTTR for BU B = (200 × 20)/200 = 20

If we then add those two MTTRs and divide by two, we get (1 + 20)/2 = 10.5, which is not the MTTR for all flaws across the two BUs. That number is misleading because it gives the two flaws of BU A as much weight as the 200 flaws of BU B. To get the MTTR for all flaws across the two BUs, we must add the Time to Resolve of every flaw and then divide by the total number of flaws: (2 × 1 + 200 × 20)/202 = 4,002/202 ≈ 19.8 days.
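The two-BU example can be reproduced in a few lines of Python. The dictionary layout and the `mttr` helper are illustrative assumptions; the flaw counts and durations are the ones from the example (BU A: 2 flaws at 1 day each, BU B: 200 flaws at 20 days each).

```python
# Time to Resolve (days) for every closed flaw, grouped by business unit.
bu_days = {
    "A": [1] * 2,     # 2 flaws, 1 day each
    "B": [20] * 200,  # 200 flaws, 20 days each
}

def mttr(days):
    """Sum first, then divide by the number of findings."""
    return sum(days) / len(days)

# Misleading: the average of per-BU averages gives each BU equal weight.
average_of_averages = sum(mttr(days) for days in bu_days.values()) / len(bu_days)

# Correct: pool every flaw, sum all Time to Resolve values, divide by the total count.
all_days = [d for days in bu_days.values() for d in days]
pooled_mttr = mttr(all_days)

print(average_of_averages)    # 10.5
print(round(pooled_mttr, 2))  # 19.81
```

Pooling first is equivalent to weighting each BU's MTTR by its share of the findings, which is why BU B's 200 flaws dominate the organization-wide number.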

Using MTTR the Right Way

Across our customer base, we see a wide variety in MTTR. A lot of this is tied to the type of application and its criticality to the organization. Metrics and KPIs provide information, but it is up to the AppSec leadership to use the information and make data-driven decisions – both in running the day-to-day operations of the AppSec program and in managing the understanding of risk for the organization as a whole.

Final Thoughts

MTTR is a powerful metric for driving continuous improvement and motivating development teams. By understanding how it is calculated and tailoring it to your organization’s needs, you can use MTTR to improve your security posture and reduce risk.

By Tim Jarrett

Tim Jarrett is Vice President of Product Management at Veracode. A Grammy Award-winning product professional with almost 30 years of experience building and marketing software, he joined Veracode in 2008 and has a Bacon number of 3. He has spoken on DevSecOps at numerous events including the Gartner Security and Risk Summit, RSA Conference, BSides SF, DevOpsDays NYC, ApacheCon Europe, and BrisTech, as well as on webcasts for DevOps.com, Black Hat, SC Magazine, Dark Reading, and the SANS Institute. He can be found on X and Threads as @tojarrett.
