What is Change Failure Rate?
Change failure rate is the ratio of deployments that result in a failure to the total number of deployments. The key is to define what counts as a failure. That definition will be unique to you, your team, and your service, and it will probably change over time as your team improves. A common mistake is to look at the total number of failures instead of the change failure rate. The problem is that this encourages the wrong behavior. Our goal is to ship changes quickly, and if you're only looking at the total number of failures, your natural response is to reduce the number of deployments so that you have fewer incidents.
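To make the distinction between total failures and the rate concrete, here is a minimal sketch in Python. It assumes a hypothetical `Deployment` record with `rolled_back` and `caused_incident` flags, and an assumed default definition of failure (any rollback or incident). Because your definition of failure will differ, the predicate is passed in as the `is_failure` parameter rather than hard-coded.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Deployment:
    # Hypothetical deployment record; these fields are illustrative assumptions.
    name: str
    rolled_back: bool = False
    caused_incident: bool = False


def default_is_failure(d: Deployment) -> bool:
    # Assumed definition of failure: any rollback or incident counts.
    return d.rolled_back or d.caused_incident


def change_failure_rate(
    deployments: Iterable[Deployment],
    is_failure: Callable[[Deployment], bool] = default_is_failure,
) -> float:
    """Failed deployments divided by total deployments (0.0 if there are none)."""
    deployments = list(deployments)
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if is_failure(d))
    return failures / len(deployments)


# Team that deploys often: 100 deployments, 5 of them rolled back.
frequent = [Deployment(f"f{i}", rolled_back=(i < 5)) for i in range(100)]
# Team that deploys rarely: 10 deployments, 2 of them caused incidents.
infrequent = [Deployment(f"s{i}", caused_incident=(i < 2)) for i in range(10)]

for label, deps in [("frequent", frequent), ("infrequent", infrequent)]:
    failures = sum(default_is_failure(d) for d in deps)
    print(f"{label}: {failures} failures, CFR = {change_failure_rate(deps):.0%}")
# frequent: 5 failures, CFR = 5%
# infrequent: 2 failures, CFR = 20%
```

The team that deploys less often has fewer total failures but a change failure rate four times higher, which is exactly why counting failures alone rewards shipping less.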
The problem with this, as we mentioned earlier, is that each deployment then becomes so large that when a failure does happen, its impact is high, which results in a worse customer experience. What you want instead is for each failure to be so small and so well understood that it's not a big deal. In practice, the key is to get the developer involved in production, ideally doing the deployment themselves. That way, when there is a failure, the developer understands the impact of their change and can learn from it, creating a critical feedback loop that helps ensure this type of incident doesn't happen again.