Code Coverage: Is 90-95% really necessary?

October 21, 2021

There is a pyramid for testing. The size of each layer of the pyramid represents how much time you should spend on that kind of test, or how many of them there should be. So at the bottom, there should be a ton of them; at the top, there should be very, very few.

At the very top of the testing pyramid are smoke tests. These are tests that you run to verify that you didn't break anything massive. It should be one, two, three tests at most. And ideally, you can run these tests against production. In fact, against any environment, because what I like to do is deploy to an environment, say staging, run the smoke tests to verify that staging is not broken, and then deploy to the next environment and run the smoke tests again. Usually that's staging and production. When I was at Atlassian, we would deploy production to EU-west, US-west, US-west-2, US-east, US-central, all those different regions.
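To make that concrete, here's a minimal sketch of what a smoke test can look like. The /healthcheck endpoint and the BASE_URL environment variable are hypothetical names I'm using for illustration, not anything from a real system; the idea is that the same test runs against whichever environment you just deployed to.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class SmokeTest {

    // Hypothetical: BASE_URL is supplied per environment (staging,
    // production, EU-west, ...) so the same test runs after every deploy.
    private static final String BASE_URL = System.getenv("BASE_URL");

    @Test
    void healthCheckReturns200() throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(BASE_URL + "/healthcheck"))
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // One coarse assertion: the service is up and serving requests.
        assertEquals(200, response.statusCode());
    }
}
```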

So, at the very top, very, very few. Next, end-to-end tests. Those are the ones we talked about before, the browser-based tests. In the middle, integration tests. An integration test is kind of like a unit test, but it also hits the database. It goes through key things like: can I create a page? Can I delete a page? And you're able to look at the database to see if it worked. It's called an integration test because you're not just testing the code, you're also testing the data sources, and you might even be testing multiple layers of code.
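As a sketch of what that looks like, here's a hypothetical "can I create a page?" integration test. I'm assuming an in-memory H2 database on the classpath and a made-up pages table, and I've inlined the insert that a real page service would perform, just to keep the sketch self-contained; a real one would call your actual code against a test data source.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class CreatePageIntegrationTest {

    @Test
    void creatingAPageWritesARow() throws Exception {
        // In-memory H2 stands in for the real data source in this sketch.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:test")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE pages (id INT AUTO_INCREMENT PRIMARY KEY, "
                         + "title VARCHAR(255))");
            }

            // Exercise "create a page" (inlined here; normally your service code).
            try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO pages (title) VALUES (?)")) {
                ps.setString(1, "Hello");
                ps.executeUpdate();
            }

            // The integration part: look at the database to see if it worked.
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SELECT COUNT(*) FROM pages WHERE title = 'Hello'")) {
                rs.next();
                assertEquals(1, rs.getInt(1));
            }
        }
    }
}
```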

Down at the bottom, where you should have the most, are unit tests. These should have zero dependencies on anything. If you have an add function, you pass in two plus two and assert that the response is four. You're not going out to the database. You're not going out to the internet. You're not hitting a microservice to make this happen. You're just testing that function.
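That's the whole layer in one small example: a toy add function and a JUnit test with no dependencies on anything.

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class MathUtilsTest {

    // The function under test: no database, no network, no other services.
    static int add(int a, int b) {
        return a + b;
    }

    @Test
    void twoPlusTwoIsFour() {
        // Pure in-memory check: pass in 2 + 2, assert the response is 4.
        assertEquals(4, add(2, 2));
    }
}
```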

So, when Sweeku talks about needing to have 90 to 95% test coverage, code coverage, they're probably talking about unit tests, because those are super duper fast, really easy to create, easy to maintain, and not flaky. And I'll be honest with you: I'm not a fan of setting targets for code coverage, saying, "We need to have 100% test coverage. We need to have 90, 95." Generally speaking, I'm against that. But it depends. It depends on what you're doing.

Cubicart asked, "Doesn't it depend on what functionality you deliver?" Yeah, it depends on what kind of code this is. For example, let's say I was writing an algorithm which parses arbitrary HTML and extracts some data from it. In that case, 100% test coverage is the bare minimum. This is a very complicated piece of code that needs to handle all kinds of different edge cases. Tests should be hitting lines of code like 10, 20 times. So it's 1,000% test coverage. I don't know if that's really a thing, but if it were, that's what I mean: you should just be testing the hell out of this thing.
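Here's a hedged sketch of what "testing the hell out of it" can look like. The extractTitle parser below is a deliberately naive stand-in I made up for illustration; the real point is the parameterized test throwing a breadth of hostile inputs at it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;
import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;

class HtmlTitleExtractorTest {

    // Made-up stand-in for the parser under test. A real extractor would
    // be far more involved; the point here is the breadth of inputs.
    static String extractTitle(String html) {
        Matcher m = Pattern.compile("(?i)<title>(.*?)</title>")
                           .matcher(html == null ? "" : html);
        return m.find() ? m.group(1) : null;
    }

    @ParameterizedTest
    @ValueSource(strings = {
        "<html><head><title>ok</title></head></html>", // well-formed
        "<title>unclosed",                              // broken markup
        "",                                             // empty input
        "<<<>>> \uD83D\uDE00 not html at all",          // garbage and Unicode
        "<TITLE>SHOUTING</TITLE>"                       // odd casing
    })
    void neverBlowsUpOnHostileHtml(String html) {
        // Whatever comes in, the extractor must not throw.
        assertDoesNotThrow(() -> extractTitle(html));
    }
}
```

And this list of inputs is just a starting point; for a real parser you'd keep adding cases until every line is being hit many times over.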

The same goes for a protocol. If I'm creating a protocol for how service A talks to service B, and I want to test how service B takes those messages and interprets them, you want to test the hell out of that thing. You want to make sure that all possible edge cases are completely covered from the unit-test standpoint, because that's so core to your system. Where people often get really screwed up is when they start saying, "Well, I need 100% test coverage, period." And then what you see them do is this: you'll have a function that takes three arguments and calls four different services, internal services, not microservices, to do something and return something.

So, it's a four-line function, and they'll say, "I need to have unit test coverage." So they create a mock for service A, a mock for service B, a mock for service C, and a mock for service D just to call this function. They call the function with three arguments, mocks are driving all four lines of code, and it returns what the mocks returned. At that point, you're not testing anything useful. You're wasting your time. The classic one in Java land is testing getters and setters. If I set a value, can I get it back? Well, yeah, it's a getter and a setter. That's all they do. There is no other logic happening. Why are you writing a unit test that covers that? So code in the middle that's just gluing one layer to another, it's called glue code for exactly that reason, often doesn't need to be tested either. Getters and setters? Absolutely not.
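Here's a sketch of that anti-pattern, trimmed to two hypothetical services instead of four to keep it short. Every line of the glue function is driven by what the Mockito mocks were told to return, so the assertion only proves that the mocking library works.

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

class GlueCodeTest {

    // Hypothetical internal services this glue function calls.
    interface ServiceA { String fetch(String id); }
    interface ServiceB { String enrich(String in); }

    // The glue function under "test": it only wires services together.
    static String glue(ServiceA a, ServiceB b, String id) {
        String raw = a.fetch(id);
        return b.enrich(raw);
    }

    @Test
    void testThatTestsNothing() {
        ServiceA a = mock(ServiceA.class);
        ServiceB b = mock(ServiceB.class);
        when(a.fetch("42")).thenReturn("raw");
        when(b.enrich("raw")).thenReturn("done");

        // The mocks drive every line; we're asserting that Mockito
        // returned what we told it to return. Coverage: 100%. Value: ~0.
        assertEquals("done", glue(a, b, "42"));
    }
}
```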

There are a number of cases where you're not going to be doing anything that's worth testing at that level. But something like a parser or a protocol implementation should be tested massively. A validator, for example. Let's say you have a function that validates an email address. Maybe it's some regular expressions, only 10 lines of code, not a big deal. You should have a ton of unit tests for that, because you should be passing in all kinds of different values. You should be giving it all kinds of crazy things: null values, different character sets, different string lengths, different characters. You should throw in some 32-bit Unicode characters. You should be doing whatever the hell you can to break that thing. And you're doing two things: one, you're making sure that what you wrote is bulletproof, very important. But two, you're documenting what that code or that protocol does.
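A sketch of what that kind of test battery might look like, with a made-up 10-line validator. The exact regex is a placeholder, not a recommended email validator; the breadth of hostile inputs is the point.

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

class EmailValidatorTest {

    // Hypothetical validator under test; the regex is a stand-in.
    static boolean isValidEmail(String s) {
        return s != null && s.matches("^[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+$");
    }

    @Test
    void acceptsAnOrdinaryAddress() {
        assertTrue(isValidEmail("alice@example.com"));
    }

    @Test
    void rejectsTheHostileStuff() {
        assertFalse(isValidEmail(null));                       // null value
        assertFalse(isValidEmail(""));                         // empty string
        assertFalse(isValidEmail("no-at-sign"));               // missing @
        assertFalse(isValidEmail("a".repeat(100_000) + "@x")); // huge input
        assertFalse(isValidEmail("\uD83D\uDE00@example.com")); // astral-plane Unicode
    }
}
```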

I wouldn't say unit tests are the best documentation strategy, but I would say they are a documentation strategy. Someone else comes along later; you're long gone, you've moved on to a better job and you're making that $200,000 you always wanted, whatever the number is. A new developer comes in, looks at this, and says, "What the hell is this supposed to do?" If they have a good set of unit tests, they can see, "Ah, it shouldn't take null values. It shouldn't take empty strings. It shouldn't take this, and here are the tests to prove it." They're going to have a lot more confidence. If they look at a piece of code that looks really complicated and there's no test: what's it supposed to do? I don't know. What's it not supposed to do? I have no idea. Can it take null values? Can it take big strings? Can it take gigabytes of data? I don't know. There's no docs on this shit.

So you're looking at a particular implementation in there and you're thinking, "Well, this seems wrong. Maybe it's right. I don't know." And when you don't know, your risk goes higher. When your risk goes higher, then when you do make a change and go to production, the chances of it breaking in production are higher. Going back to what we were talking about before with the four metrics: your change failure rate is now going to go through the roof, because you have no idea if what you're doing is going to break something. You're trying to go faster, but now you're breaking things more, you have no tests to cover it, and you don't know what it's supposed to do. You're going to shoot yourself in the foot. So that is a long, long-ass answer to: yes, you need end-to-end tests. And sometimes, yes, you do need 90 to 95% coverage, perhaps 1,000% coverage. But sometimes you really don't.