

Don Brown

November 11th, 2021

Using Sleuth at Sleuth

So after being in the industry for so long, now I'm doing a startup, Sleuth, which is a tool that I'm a co-founder of. As you would expect from a tool that helps you with the metrics, we track the metrics, the DORA metrics. We track our deployment frequency, our change failure rate, our change lead time, all those sorts of things. But what does it mean to us? Because we're a small team, and because I've learned from my previous experiences, the first two metrics in particular, change lead time and deploy frequency, are something I don't really worry too much about or focus on, because I already know that we deploy a lot. Metrics, again, are a tool to help you track your progress toward a goal. We don't really have a goal around deploying more frequently or deploying quicker because we're already pretty quick. Between four full-time developers, we're deploying seven or eight times a day on average, which is pretty good.

And our change lead time is measured in hours. Yeah, it'd be nice. Sometimes we do get it down to minutes, but it's pretty quick. So from a DORA metrics standpoint, those two we keep an eye on just as a curiosity, but we don't really use them to drive toward a goal to improve, because we don't have one. However, that's also a function of our size. Because we're a small company and we have a pretty small set of customers, not a huge set yet, the cost of a failure isn't as high as it was at, say, Atlassian, where if I pushed a bad change out to HipChat, it affected hundreds of thousands of customers and caused all kinds of chaos.

At our smaller size, the cost is a little bit lower, but I fully anticipate that as we become more successful and add more customers and more people depend on us, the cost of failure gets higher. And this is where I think our next phase of growth, our company goals so to speak, will be focusing on the change failure rate and the mean time to recovery. Actually, not even so much the mean time to recovery but probably mainly the change failure rate, because the whole trick with change failure rate is knowing what is a failure. And in order to know what a failure is, you need to be able to measure that and track that. As we mature as a company and add more customers and add more impact, we will get better and better about knowing when our deployments are working and when they're failing. And I anticipate that as we mature, particularly in our incident management process, we will be paying more attention to that stat, as well as the mean time to recovery, particularly as we onboard more people to handle incidents.

So again, the metrics aren't really important in and of themselves; they're all in service of a goal. The DORA metrics are really key to this, but again, not in the way you think. It's not that you say, "What are the four metrics? How do I make those metrics better? My goal is to improve them by 20%." That's the wrong way to think about it. The right way to think about it, I think, is: what is the experience I want to create? I want to create an experience where my developers understand customer pain, have empathy, and are able to act and improve and deliver changes on that. And when you make that your goal, then the metrics become a way to know how well you're progressing toward that goal.

For example, as in my example on Twitter where I was talking about hearing that there's a customer problem, fixing it, and delivering it within 10, 20 minutes. You look back at creating that experience and that's DORA metrics through and through. Your change frequency is super high if you're delivering it that quick. Your change lead time is 10, 20 minutes. Crazy low. Your failure rate: when you're delivering that quickly, and if they're single changes, you're going to have a very, very low change failure rate, because you understand your change completely, because that's the only change going out. And then your mean time to recovery. Well, if there is a problem, that whole change is still in your head; you just delivered it right now. You can be like, "Oh yes, I totally made a mistake. I'm going to go fix it." Fix it, ship it out within a few minutes, and again, people are happy.
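To make those four numbers concrete, here is a minimal Python sketch of how they could be computed from a log of deploys. It is only an illustration under stated assumptions: the `Deploy` record, its fields, and the `dora_summary` function are hypothetical names, not Sleuth's implementation or API, and the sample data is made up. In particular, the `failed` flag assumes you have already decided what counts as a failure, which is the hard part.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deploy:
    """Hypothetical record of one deploy (field names are assumptions)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # assumes "failure" is already defined
    recovered_at: Optional[datetime] = None  # when a failed deploy was fixed


def dora_summary(deploys: list[Deploy], days: float) -> dict:
    """Compute the four DORA metrics over a window of `days` days."""
    if not deploys or days <= 0:
        raise ValueError("need at least one deploy and a positive window")

    count = len(deploys)
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    # Only failures that have been resolved contribute to recovery time.
    recoveries = [d.recovered_at - d.deployed_at
                  for d in failures if d.recovered_at is not None]

    return {
        "deploys_per_day": count / days,
        "mean_lead_time": sum(lead_times, timedelta()) / count,
        "change_failure_rate": len(failures) / count,
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


# Made-up sample: four small changes in one day, one of which failed
# and was fixed ten minutes after it shipped.
start = datetime(2021, 11, 11, 9, 0)
minutes = lambda m: timedelta(minutes=m)
sample = [
    Deploy(start, start + minutes(10)),
    Deploy(start + minutes(60), start + minutes(80)),
    Deploy(start + minutes(120), start + minutes(135), failed=True,
           recovered_at=start + minutes(145)),
    Deploy(start + minutes(180), start + minutes(195)),
]
summary = dora_summary(sample, days=1)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The sketch uses plain averages to stay short. A real tool would likely look at medians or percentiles and would need a much richer definition of failure than a single boolean.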

So it's not that you start with the metrics and say, "How do I make these better?" You start with the experience, the developer experience, the customer experience you want to create. And then how do you know how well you're doing with it? Well, that's where the metrics can come in.
