To me, there’s something thrilling about debugging. Other developers might find this strange.
Every bug, every software defect, is a mystery waiting to be unravelled. I get to play detective; I get to be the detective. It is an opportunity to engage some unknown opponent in a battle of wits. Sure, as with all the best mysteries, a bug will probably reveal itself through some grotesque calling card: some functionality that causes your app to stall, or some exception that crashes your Karaoke software just before a key change. But that’s fine; it’s what makes the mystery worth solving.
Roughly speaking, fixing a bug comes in two halves: finding the error, and correcting its behaviour. This post is about the first of those: tracking down the guilty lines of code — a sort-of crime tale, with the usual array of suspects, plot-twists, and moments of revelation. Tracking down the bug is the fun part.
In many cases, tracking down the bug is also the hard part. The key, however, is to follow a process, which can be used to find any-and-all bugs. (This isn’t so true about the fixing stage, which can range from changing a single character, to — on very rare occasions — rewriting the whole software.)
If you’re struggling for a process of your own, feel free to take mine. It’s not particularly original — much of it has been yoinked from a number of places — but it’s served me well over the years.
So, imagine there’s a new bug in town. Your job is to find it. Grab your magnifying glass and dust off your deerstalker — you have a mystery to solve.
The Priority Question
Hold up Sherlock! Before diving in, there’s one question you should answer: does this particular bug need fixing at this particular time? Is it really top of the hit list? There’s little worse than debugging on uncertain ground — you’ll spend half the time questioning yourself, wondering whether to throw in the towel.
By contrast, it’s reassuring to know that you’re working on the right problem. Your efforts will be appreciated. Should you take longer than expected — which is quite possible — you won’t struggle to justify your time. You probably won’t even have to.
How do you know whether this bug takes priority? It usually boils down to one thing: how many people are shouting about it.
Some bugs give you no choice in the matter. They cause a storm, sending your Support lines off the hook, and overwhelming your Twitter feed. This breed of bug tends to have been recently introduced, hitting your entire user base at once. It is always a priority.
Otherwise — during times of peace, as it were — you’ll have to weigh up the benefits. If a bug often comes up when talking to customers, or it’s causing a certain amount of pain, then it’s worth looking into. Mitch Duncan, on Simple Programmer, refers to a Risk Assessment Matrix, which — dry name aside — is a neat system.
Whatever your reason, first ensure you’re investigating the right bug. You’ll feel better for it.
Stay Calm
Here’s an open secret about debugging: it’s all in the mindset. In order to truly corner that bug, keep a clear head.
This isn’t easy. New problems come hand-in-hand with stress, made worse by bosses and users breathing down your neck. That’s a natural reaction, but try to move past it — debugging under duress has nasty consequences.
For one thing, stress leads to dead-ends and false fixes. A stressed developer will miss clues that they would otherwise have spotted. They will look for the quick fix, pointing their finger at any symptom they come across.
(Also, stress leads to high blood pressure. That’s not good. No bug is worth a heart attack.)
To overcome this, you might have to work some self-inflicted Jedi mind tricks. For a start, try focusing on the problem, rather than any external factors. This can help keep things in perspective.
For particularly hairy bugs, I ‘allow’ myself a huge amount of time — say two weeks — knowing that it won’t take this long. With this in mind, I’m more relaxed and, inevitably, it takes less time to fix the problem.
Find a way to enjoy the challenge. As I’ve already admitted, I imagine that I’m a detective on the hunt. Sometimes I’ll catch myself wondering who will play me in the inevitable movie (possibly a ‘Catch Me If You Can’-style Tom Hanks). “He just seems like a nice guy,” I’ll think … and that’s when I have to rein it in.
Try to relax. Concentrate on the issue at hand. Keep Calm And Debug.
Assume Responsibility
Now that you’re calm, I’m going to risk boiling your blood slightly. The bug is your fault.
Okay, this might not literally be true. Still, it’s valuable to assume responsibility for the problem. There are two reasons for this.
First, assuming responsibility is an exercise in humility. It means you’re putting the problem before anything else. You’re in the mindset of fixing the issue, rather than looking good.
Second, it probably was your fault. Or the fault is in your codebase, even if you personally didn’t put it there. Yes, you might suspect an issue with, say, a third-party library, but until you know for sure, it doesn’t bear thinking about.
This is all good news. Responsibility is an excellent thing. It puts you in control. It means you have a chance to fix the problem.
One point to make here — accepting responsibility does not mean finger pointing. We are out to catch the bug, not the programmer who introduced it. After all, who hasn’t been that programmer at some time?
So, are you calm and brimming with humility? Good. It’s time to get started.
Reproduce the Problem
Every bug leaves a whiff. But that’s not enough — what you need is a trail. How do you find this? There’s really only one way: reproduce the problem.
Ideally, every bug report will contain simple step-by-step instructions, which consistently produce the broken behaviour. Jon Skeet calls these instructions a recipe. If it’s good enough for Jon Skeet, it’s good enough for me.
Once you can reproduce a bug, then you’re in business. You can trace that bug from start to finish. You can see all its effects and quirks for yourself. You can experiment with ideas. The entire mystery is laid out in front of you.
Unfortunately, a recipe isn’t always enough. Some bugs can be evasive, only showing up at apparently random times — popping their beady eyes up on those rare occasions, just to prove their existence (and to drive you crazy).
In these cases, don’t resort to guessing. Yes, you might be able to patch the problem up, but you won’t have the confidence of a true fix. It’ll still keep you awake at night.
Instead, your job is now simple: do whatever it takes to recreate that bug. A good place to start is simply talking to the bug reporter, and maybe watching them use the software.
In these cases — and this is a general theme of debugging — you’re looking for differences. This can come in many forms. Are you only using a feature in a specific way? As developers — just like users — we have our habits.
In the classic book The Pragmatic Programmer, Andy Hunt describes exactly this situation. He worked on a graphics application which, according to one tester, crashed when using a particular brush. The developer responsible refused to acknowledge this; he knew the software, and he had never seen that crash. It was only when he watched the tester, and witnessed the crash, that he spotted his mistake: he only ever used a left-to-right brush stroke when testing. This wasn’t enough to reproduce the bug.
Another common difference can be in your setup. Recently, I’ve seen a problem where a single dialog refused to work for one particular user. They would click a button and … nothing. After a bit of head-scratching — and fully aware we were clutching at straws — we tried the dialog on their second monitor. Suddenly it came alive. This quickly led us to the difference: the user had their monitors switched around. Bingo — a recreatable bug.
Break things, pull things out, it doesn’t matter — just reproduce that bug. Once you have a trail, you have a chance.
Simplify
Each bug comes surrounded by distractions — a duff musician in an otherwise-tuneful band. To help track it down, you’ll need to simplify the problem. You’ll need to remove those other band members.
In essence, the rest of bug hunting is just a form of simplification. You distill everything down — your recipe, your code — until you’re left with just the problem itself.
A good start is reducing the steps in your recipe. Does the bug still occur if you skip, say, step 5, and don’t resize that particular window? This will show you which steps are key to the problem, remove any red herrings, and speed up your tests. It’s a big bag of wins.
Sometimes the bug only appears with a particular file or dataset. What is different (there’s that word again) about this data? Try removing items — text, models, whatever the file contains — until you have just enough to reproduce the problem. As you do this — ruling out certain areas, focusing in on others — you’ll get a feel for the problem.
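Here’s a rough sketch of that removal process in Python. It drops one item at a time, and keeps the removal only if the bug still shows up. The names `minimise` and `still_fails` are my own inventions for illustration; `still_fails` stands in for whatever check reproduces your bug.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def minimise(items: List[T], still_fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily remove items while the bug still reproduces.

    `still_fails` is a hypothetical stand-in for your own reproduction
    check: it should return True when the given data triggers the bug.
    """
    if not still_fails(items):
        raise ValueError("the full dataset does not reproduce the bug")

    current = list(items)
    index = 0
    while index < len(current):
        candidate = current[:index] + current[index + 1:]
        if still_fails(candidate):
            # The bug survives without this item, so it was a distraction.
            current = candidate
        else:
            # Removing this item hides the bug, so keep it and move on.
            index += 1
    return current


# Made-up example: the "bug" appears whenever both 3 and 7 are present.
data = [1, 2, 3, 4, 5, 6, 7, 8]
culprits = minimise(data, lambda d: 3 in d and 7 in d)
print(culprits)  # [3, 7]
```

This is the simplest possible approach, and it runs the check once per item. For large files you would probably want to remove bigger chunks first.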
Once you’ve shortened your recipe, it’s time to start thinking about your code. It should be possible at this stage to home in on the guilty code, be it through unit tests, the (sometimes derided) debugger, or even good old-fashioned print statements.
When you’re closing in on the culprit, you can still turn to simplification. For example, can you reproduce the problem in a smaller system? Again, I’ll defer to Jon Skeet, who recommends writing a tiny console app. Sometimes the bug will still occur, sometimes it won’t. Either way, you learn some valuable information.
Doing all this will challenge your assumptions. By really putting your code through the wringer, you’ll be forced to review your preconceptions and, possibly, misconceptions.
Throughout the whole debugging process, you should stay on the lookout for any chance to narrow the picture slightly. Every time you simplify, your job becomes a hell of a lot easier.
Binary Chop
It’s not always possible to think your way through a bug hunt. Sometimes, your bug will be hidden in a swamp of code, without any clues as to what’s going wrong.
In these cases, I tend to reach for something less intellectually taxing: the binary chop. As the name suggests, this is brawn over brains. It’s a form of simplification, where you repeatedly chop your problem-space in half, until you’re left with the source of your pains.
Here’s a typical scenario: you have a sequence of things — lines of code, or builds, or whatever — with a known start and end point. At the start point, everything is fine; by the end point, things have broken. In other words, somewhere in that sequence is your bug. The binary chop will tell you where.
First, confirm that your start point is working and your end point is broken. (This may sound like a waste of time, but there’s little worse than binary chopping only to find that your starting assumptions were wrong.)
Now, check the halfway point. Is your system in a working or broken state? If it’s working, you can ignore everything before that halfway point. The halfway point becomes your new start point. By contrast, if the halfway point is broken, then it becomes your new end point.
Repeat this process, until you have found the cause of your woes.
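The steps above can be sketched in a few lines of Python. This is a minimal version, assuming that once things break they stay broken for every later step. The names `binary_chop` and `is_broken` are hypothetical; `is_broken` is whatever test tells you the system has gone wrong at a given point.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def binary_chop(steps: Sequence[T], is_broken: Callable[[T], bool]) -> T:
    """Return the first step at which the system is broken.

    Assumes the first step works, the last step is broken, and that once
    things break they stay broken for every later step.
    """
    start, end = 0, len(steps) - 1

    # Confirm the starting assumptions before doing any chopping.
    if is_broken(steps[start]):
        raise ValueError("start point is already broken")
    if not is_broken(steps[end]):
        raise ValueError("end point is not broken")

    # Invariant: steps[start] works, steps[end] is broken.
    while end - start > 1:
        middle = (start + end) // 2
        if is_broken(steps[middle]):
            end = middle    # broken: this becomes the new end point
        else:
            start = middle  # working: this becomes the new start point
    return steps[end]


# Made-up example: builds 1 to 100, where build 63 introduced the bug.
builds = list(range(1, 101))
first_bad = binary_chop(builds, lambda build: build >= 63)
print(first_bad)  # 63
```

A hundred builds take about seven checks. If your sequence happens to be a series of Git commits, `git bisect` automates the same search for you.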
The best thing about binary chopping is that it’s completely unbiased, and is unaffected by any incorrect theories. I can’t tell you the number of times that I’ve suspected one particular cause, only to find that the problem was a missing semicolon, or something equally impossible to predict.
Any time your problem is hidden in a known space, look to the top of your bug catcher’s toolbox. That’s where you’ll want to keep your binary chopper.
Refactor With a Vengeance
Most codebases have their rough areas — the bad patches of code that we clean-cut folk don’t dare tread. Whenever these areas are brought up in conversation, they cause us embarrassment and any number of coughing fits. But, so long as they work as expected — which is never actually very well — then we can carry on as if that bit of code doesn’t exist. Head in sand. There be no dragons.
Unfortunately, there’ll come a day when that area of the codebase — that bringer of shame — goes wrong. And, when that happens, it will be your job to fix it. That’s the downside of the whole assuming responsibility thing.
So what do you do when confronting this tangled mess? Refactor.
Refactoring is the act of restructuring code, without changing its behaviour. In other words, it’s another word for tidying up.
The beauty of refactoring is that, as you tidy the code, you begin to understand it. These changes don’t have to be huge. Just by renaming a few variables, or separating things out into functions, you’ll spot patterns in the code. As you demystify this part of the codebase, you’ll feel more in control.
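To give a feel for how small these changes can be, here’s a made-up before-and-after in Python. Every name in it is invented for illustration, and it assumes each order line is a (price, quantity) pair. The behaviour is the same in both versions; only the names and structure have changed.

```python
# Before: it works, but it is hard to see what it is doing.
def calc(d):
    t = 0
    for x in d:
        if x[1] > 0:
            t += x[0] * x[1]
    return t


# After: same behaviour, with names that explain the intent.
def line_total(price, quantity):
    return price * quantity


def order_total(order_lines):
    return sum(
        line_total(price, quantity)
        for price, quantity in order_lines
        if quantity > 0
    )


# A quick check that the tidy-up has not changed the behaviour.
order = [(2.5, 4), (10.0, 0), (1.0, 3)]
assert calc(order) == order_total(order)
```

After the rename, a question like “why do we skip lines with a zero quantity?” practically asks itself. That’s exactly the sort of question that leads to a bug.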
Refactoring is a huge subject, so I won’t go into it here. For more information, you can’t do better than the classic (and aptly-named) book Refactoring, by Martin Fowler. Clean Code by Uncle Bob is another good one.
I also highly recommend watching the Two Minutes To Better Code demonstration by Llewellyn Falco and Woody Zuill. It’s no exaggeration to say that this video had a huge effect on my own programming style. The principal idea is that you’re looking to make things better, not necessarily good. Good comes eventually.
One more thing — refactoring is only valuable if it’s focused on the right area of the code. There’s no point wasting hours tidying something that has nothing to do with your bug. As you clean the code, focus on understanding how things work. Then, once you’ve wrung enough information, get back on the debugging wagon.
Reset Your Thinking
Sometimes, with a particularly sneaky bug, I’ll hit a point where I’ve started to guess. A sprinkle of breakpoints here, a dash of print statements there — I’ve placed my fortunes firmly in the laps of the programming gods. In short, I’ve run out of ideas. This is my cue to reset my thinking.
The best approach is to just step away from your problem. Going for a walk sometimes helps. Doing chores is a good one. Not only does it freshen your mind up, it also acts as a context shift, and it allows your subconscious mind to start beavering away. And your subconscious mind is pretty smart.
Another thing to try is simply talking the problem through with someone. A fresh pair of eyes (and a fresh mind) will often see something that you’ve missed. They may be able to suggest a different approach, or a particular insight, which sets you off in the right direction. They might also spot that glaring typo which is really bringing your system down.
But what if it’s 4 in the morning, and your only company is your pet hamster? Try asking the hamster. I’m serious. The simple act of verbalising your thoughts, and justifying your decisions, will force you to question your assumptions.
This is sometimes called Rubber Ducking, based on another tale from The Pragmatic Programmer. It tells of a developer who would carry (you guessed it) a rubber duck around with him. If another developer came to ask a question, they were told to “ask the duck”. Allegedly, this would usually lead to a moment of enlightenment.
It’s a nice story. My one problem with that term is this: rubber ducks — or hamsters, for that matter — don’t ask questions back. If yours is, then it’s definitely time to step away from the keyboard.
The benefit of talking to a colleague is that you start to think like them. Certain people make excellent Rubber Ducks, because they ask sensible — almost predictable — questions. Of course, don’t tell them they’re an excellent Rubber Duck. They’ll likely take it personally.
With that in mind, a better approach might be to compose — but not necessarily send — an email to your best rubber duck. As you do, picture that person looking up at you, slightly weary-eyed, and asking you that killer question. Now imagine, what is that question?
Find the Root Cause
Some bugs are like criminal masterminds: your own Moriarty, buried deep in the shadows, pulling the strings. These bugs will only ever show their symptoms, say a null exception when a user clicks OK. Don’t be fooled. Each symptom is just a henchman, out doing the dirty work of the real bug.
It can be tempting to just patch up that initial symptom, and swiftly move on. Resist that temptation — it leads to bloated software, covered in sticking plasters. Worse, when you only fix symptoms, you actually hide any trails to the real problem. Covering up that null exception doesn’t tell you anything. Sometimes you have to leave the henchman on the street, so that he can show you to his boss.
Instead, you should keep digging until you understand the root cause. Ask yourself: what’s the real mystery here? Why was there a null exception at that particular point? This will often mean experimentation, and can end up taking you to all areas of the code.
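Here’s a small, entirely hypothetical illustration in Python, where the equivalent of a null exception is a TypeError on None. Every name is invented for the example. The plastered version makes the exception go away; the fixed version deals with the reason the lookup failed in the first place.

```python
# Hypothetical example: every name here is invented for illustration.
users = {"alice": {"display_name": "Alice"}}


def find_user(name):
    # The real bug lives here: the lookup is case-sensitive,
    # so "Alice" does not match the stored key "alice".
    return users.get(name)


def greet(name):
    # Symptom: raises TypeError when find_user returns None.
    return "Hello, " + find_user(name)["display_name"]


def greet_with_plaster(name):
    # Sticking plaster: the exception is gone, but so is the trail.
    user = find_user(name)
    if user is None:
        return ""
    return "Hello, " + user["display_name"]


def find_user_fixed(name):
    # Root-cause fix: normalise the name before looking it up.
    return users.get(name.lower())


def greet_fixed(name):
    return "Hello, " + find_user_fixed(name)["display_name"]


print(repr(greet_with_plaster("Alice")))  # ''
print(greet_fixed("Alice"))               # Hello, Alice
```

With the plaster in place, Alice silently gets an empty greeting and nobody is any the wiser. The henchman is off the street, and the boss is still at large.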
Sometimes, as you dig, you will realise that you were right first time, and your original sticking plaster was correct. Being right first time isn’t the important part; what matters is that you now understand why. In the wise words of Max Kanat-Alexander, Make It Never Come Back.
Occasionally a root cause can’t be found — you’ll have to add that sticking plaster. This is always an open-ended solution to your tale. The culprit has fled. File these under “Unsolved Cases”, and expect them to return to fight again.
Otherwise, when you’ve solved all your mysteries, then you have likely found the real bug. It’s time to sit back, and think about how Benedict Cumberbatch — or, indeed, Tom Hanks — will play you in the movie.
Oh, and then you have to fix the damn thing. I’ll have to leave that one to you.