Most engineering teams have dashboards: deployment counts, pipeline pass rates, story points completed, sprint velocity. The numbers get reviewed in weekly standups, reported to leadership, and used to demonstrate that the DevOps program is working.
The problem is that most of these metrics measure activity, not outcomes. And in a large engineering organisation, activity metrics are very good at making things look fine when they aren't.
Fifty deployments a week sounds impressive until you find out half of them were hotfixes for incidents caused by the previous deployments. Five deployments a week from a team with a solid testing culture and a change failure rate under 2% is a better engineering operation by almost any measure that matters. The number of deployments tells you how busy the pipeline is. It doesn't tell you whether the engineering organisation is actually getting better at delivering software.
The Four Numbers That Actually Tell the Story
The DORA research program, run over several years across thousands of engineering organisations by the DevOps Research and Assessment team (now part of Google Cloud), identified four metrics that consistently predict software delivery performance and organisational outcomes. These aren't theoretical. They're the metrics that, when tracked together, tell you whether a DevOps program is producing real improvement or just producing more activity.
The four metrics cover different parts of the delivery cycle. Deployment frequency: how often does code actually reach production? Lead time: how long does it take to get there from the moment it's written? Change failure rate: what share of deployments cause a problem in production? Mean time to recovery: when something breaks, how quickly does the team get back to normal? Tracked separately, each of these has obvious ways to game it. Tracked together, they're much harder to fake, because improving all four simultaneously requires real engineering discipline rather than metric optimisation.
The DORA research showed consistently that the highest-performing engineering organisations score well on all four at once. They can deploy frequently without high failure rates because they've built the recovery capability that makes frequent deployment feel safe rather than reckless.
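As a rough illustration of how the four numbers can be derived from the same set of deployment records, here is a minimal Python sketch. The `Deployment` record, its field names, and the `dora_summary` function are hypothetical, and the sample data is invented for the example. It also assumes that recovery time is measured from the failed deployment to the moment service is restored, which is one of several reasonable definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; real data would come from CI/CD and incident tooling.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_summary(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Summarise the four metrics for one reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    failures = [d for d in deployments if d.failed]
    lead_times = [(d.deployed_at - d.committed_at).total_seconds() / 3600
                  for d in deployments]
    recoveries = [(d.restored_at - d.deployed_at).total_seconds() / 60
                  for d in failures if d.restored_at is not None]
    return {
        "deployments_per_week": len(deployments) / (period_days / 7),
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        # Reported as 0.0 when no failed deployment has a recorded recovery.
        "mean_time_to_recovery_minutes":
            sum(recoveries) / len(recoveries) if recoveries else 0.0,
    }


# Invented sample: four deployments in one week, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
deployments = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=4)),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=6),
               failed=True, restored_at=t0 + timedelta(days=2, hours=6, minutes=30)),
    Deployment(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=8)),
]
summary = dora_summary(deployments, period_days=7)
print(summary)
```

Computing all four from one record set is a deliberate choice: it keeps the numbers on the same footing, so a jump in one can be read against the other three.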
Why Deployment Frequency Gets Optimised in Isolation
Deployment frequency is the easiest DevOps metric to improve and the easiest to game. Split your releases into smaller chunks and the number goes up. Automate your pipeline and the number goes up. Neither of these things is wrong — smaller releases and automated pipelines are genuinely good — but the number going up doesn't tell you whether anything important got better.
Going from monthly to daily deployments is a real achievement. But if the change failure rate jumped from 2% to 15% in the same period, the team didn't improve — they just moved the instability from the release cycle into the daily operation. Instead of one difficult release per month, they now have multiple incidents per week. The engineers are busier, the customers are more affected, and the dashboard showing deployment frequency looks fantastic.
This is the pattern that produces velocity theater — the appearance of DevOps maturity without the underlying reliability that makes high deployment frequency genuinely valuable. Leadership sees the deployment frequency number and concludes the program is working. The engineers handling the incidents know it isn't.
The Mean Time to Recovery Problem
Mean time to recovery is the metric that gets the least attention and is arguably the most important one in terms of what it reveals about an organisation's engineering culture.
A team with a low MTTR has invested in observability: they can see what's happening in their systems and identify the source of a problem quickly. They have runbooks and playbooks that make incident response structured rather than chaotic. They have practised the recovery process enough that it doesn't require everyone to scramble.
When a team with poor MTTR hits an incident, the first hour usually goes to figuring out what actually broke. The second hour goes to understanding why. Then someone needs to get approval before touching production, which costs another thirty minutes if the right people are reachable. Then comes the actual fix, which might be twenty minutes of work. By the time service is restored, the outage has lasted nearly three hours, and only twenty minutes of that was engineering work. The rest was organisational friction and insufficient observability.
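Adding up the phases of that incident makes the imbalance concrete. The short Python sketch below uses only the durations given in the paragraph above; the phase labels and variable names are illustrative.

```python
# Phases of the incident described above: (label, minutes, is_hands_on_fix).
phases = [
    ("figure out what broke", 60, False),
    ("understand why", 60, False),
    ("wait for approval", 30, False),
    ("apply the fix", 20, True),
]

total_minutes = sum(minutes for _, minutes, _ in phases)
fix_minutes = sum(minutes for _, minutes, is_fix in phases if is_fix)
fix_share = fix_minutes / total_minutes

print(f"Outage: {total_minutes} min, hands-on fix: {fix_minutes} min "
      f"({fix_share:.0%} of the total)")
```

In this breakdown the outage runs 170 minutes and the fix itself accounts for roughly 12% of it, so shaving time off the fix would barely move the total.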
Improving MTTR requires investment in three things: observability tooling that makes the system's internal state visible during an incident, incident response processes that reduce the coordination overhead, and organisational autonomy that allows engineers to make recovery decisions without excessive escalation. None of these show up in a deployment pipeline.
Lead Time Is Where the Business Actually Feels the Difference
Lead time is where engineering performance becomes tangible to the rest of the business. When a high-priority customer request comes in, how many days pass before it reaches production? When a security patch needs to go out urgently, what's standing between the fix and deployment? Most organisations that haven't specifically measured this are surprised to discover the actual number — because the waiting time between steps is invisible when you're inside the process.
The organisations that have genuinely improved lead time have done specific work to identify and remove the waiting time in that process. A common example is automated testing that runs immediately on commit rather than during scheduled QA windows. A pipeline that only runs on a schedule introduces waiting time that has nothing to do with the quality of the work: code sits ready to deploy while the clock ticks toward the next window.
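One way to make that waiting time visible is to record a timestamp at each hand-off and look at the gaps between them. The following Python sketch assumes such timestamps are available; the stage names, the `stage_durations` helper, and the sample dates are all hypothetical.

```python
from datetime import datetime


def stage_durations(events: list[tuple[str, datetime]]) -> list[tuple[str, float]]:
    """Hours elapsed between consecutive events, labelled 'from -> to'."""
    ordered = sorted(events, key=lambda event: event[1])
    return [
        (f"{name_a} -> {name_b}", (time_b - time_a).total_seconds() / 3600)
        for (name_a, time_a), (name_b, time_b) in zip(ordered, ordered[1:])
    ]


# Invented timeline for one change.
events = [
    ("committed", datetime(2024, 3, 4, 10, 0)),
    ("review approved", datetime(2024, 3, 4, 16, 0)),
    ("tests passed", datetime(2024, 3, 5, 2, 0)),   # nightly scheduled test run
    ("deployed", datetime(2024, 3, 7, 14, 0)),      # next scheduled release window
]

durations = stage_durations(events)
total_hours = sum(hours for _, hours in durations)
slowest = max(durations, key=lambda item: item[1])

for label, hours in durations:
    print(f"{label}: {hours:.0f} h")
print(f"Total lead time: {total_hours:.0f} h, longest gap: {slowest[0]}")
```

In this invented timeline, 60 of the 76 hours are spent waiting for the release window after the tests have already passed, which is exactly the kind of gap that stays invisible from inside the process.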
Feature flags go further — they let code ship to production without activating, which means deployment and release become separate decisions made at separate times. The deployment window stops being a constraint because deployment is no longer the same moment as the customer experiencing the change.
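The separation between deployment and release can be sketched in a few lines of Python. The `FeatureFlags` class below is a hypothetical in-memory stand-in (real systems typically read flags from a configuration service or a dedicated flag platform), and the `bulk_discount` flag and `checkout_total` function are invented for the example.

```python
class FeatureFlags:
    """Hypothetical in-memory flag store, standing in for a real flag service."""

    def __init__(self) -> None:
        self._enabled: set[str] = set()

    def enable(self, name: str) -> None:
        self._enabled.add(name)

    def disable(self, name: str) -> None:
        self._enabled.discard(name)

    def is_enabled(self, name: str) -> bool:
        return name in self._enabled


def checkout_total(prices: list[float], flags: FeatureFlags) -> float:
    total = sum(prices)
    # The new behaviour is deployed but stays dormant until the flag is on.
    if flags.is_enabled("bulk_discount") and len(prices) >= 3:
        total *= 0.9
    return round(total, 2)


flags = FeatureFlags()
basket = [10.0, 20.0, 30.0]

before_release = checkout_total(basket, flags)  # code is live, feature is not
flags.enable("bulk_discount")                   # the release decision, made later
after_release = checkout_total(basket, flags)

print(before_release, after_release)
```

The same deployed code produces both results. Only the flag changes, and turning it off again is a rollback that needs no new deployment.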
Change Failure Rate Reveals the Quality of the Testing Strategy
A high change failure rate is usually a testing problem, but not always in the obvious direction. The instinct when change failure rate is high is to add more tests. Sometimes that's right. More often, the problem is that the tests that exist are testing the wrong things.
A test suite nobody runs is not a safety net. It's a compliance checkbox. When a four-hour test run sits between a developer and their deployment, the pressure to skip it under deadline builds quickly — and it gets skipped. The tests that actually prevent production incidents are the ones that run fast enough to be part of the normal workflow, not the ones that exist in theory but get bypassed whenever things get busy.
A change failure rate that's stubbornly high despite extensive test coverage is often pointing at the alignment between what the tests verify and what production actually looks like. The fix is usually not more tests. It's better tests — tests that are faster to run, harder to skip, and more accurate in predicting production behaviour.
How to Use These Metrics Without Turning Them Into Targets
There's a well-known problem with any metric, commonly known as Goodhart's law: once a measure becomes a target, it ceases to be a good measure. Teams optimise for the number rather than the underlying outcome the number was supposed to represent.
The way to avoid this with DORA metrics is to track them as a diagnostic set rather than as individual performance targets. No single number gets optimised in isolation. Changes are evaluated against all four metrics, and improvement in one that corresponds to degradation in another gets treated as a problem rather than a win.
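A minimal Python sketch of that rule, under the assumption that each metric is summarised as a single number per period. The metric keys, the `review_change` and `is_win` helpers, and the lead time and recovery figures are hypothetical; the deployment frequency and change failure rate values echo the monthly-to-daily example earlier in the article.

```python
# For each metric, whether a higher value is better (True) or worse (False).
HIGHER_IS_BETTER = {
    "deployments_per_week": True,
    "lead_time_hours": False,
    "change_failure_rate": False,
    "time_to_recovery_minutes": False,
}


def review_change(before: dict[str, float], after: dict[str, float]) -> dict[str, list[str]]:
    """Sort the four metrics into those that improved and those that degraded."""
    improved: list[str] = []
    degraded: list[str] = []
    for metric, higher_is_better in HIGHER_IS_BETTER.items():
        delta = after[metric] - before[metric]
        if delta == 0:
            continue  # unchanged metrics are neither a win nor a warning
        got_better = (delta > 0) == higher_is_better
        (improved if got_better else degraded).append(metric)
    return {"improved": improved, "degraded": degraded}


def is_win(result: dict[str, list[str]]) -> bool:
    """A change counts as a win only if something improved and nothing degraded."""
    return bool(result["improved"]) and not result["degraded"]


# Monthly releases versus daily releases, with the failure rate jumping to 15%.
before = {"deployments_per_week": 0.25, "lead_time_hours": 720.0,
          "change_failure_rate": 0.02, "time_to_recovery_minutes": 120.0}
after = {"deployments_per_week": 5.0, "lead_time_hours": 24.0,
         "change_failure_rate": 0.15, "time_to_recovery_minutes": 120.0}

result = review_change(before, after)
print(result, "win" if is_win(result) else "needs attention")
```

Here deployment frequency and lead time both improve, yet the change is flagged because the change failure rate got worse. A real review would also want thresholds so that ordinary noise is not treated as a regression.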
The other protection is transparency: making the metrics visible to the whole team rather than only to leadership. Engineers who see their own lead time, change failure rate, and mean time to recovery as a continuous picture of their own performance tend to make better decisions than engineers whose numbers are only reported upward and acted on through management interventions.
What the Numbers Are Trying to Tell You
Numbers on a dashboard only matter if they're connected to something real. These four are. When deployment frequency is high and change failure rate is low, customers get new features without the service falling over. When mean time to recovery is short, an outage that might have lasted three hours gets resolved in twenty minutes and most customers never notice. When lead time is measured in hours rather than weeks, the team that fixed a bug on Monday isn't still waiting for the next release window on Friday.
That's the connection worth keeping in front of the engineering organisation. Not "these metrics matter because the DORA research says so" but "the customers on the other end of this system feel the difference between a team that has built these capabilities and one that hasn't."
The business results follow from that. Faster iteration. Lower incident costs. A reputation for reliability that compounds over time and makes it easier to hire the kind of engineers who want to work in a high-performing environment rather than one that's perpetually fighting fires.
The measurement framework is what makes it possible to tell the difference between a DevOps program that's actually improving and one that's just keeping the team busy. Without it, progress is a feeling. With it, it's a number that either moves in the right direction or tells you something needs to change.