Fixing bugs

Depending on the quality of your code, you’ll spend more or less time in front of your computer fixing bugs. No matter how brilliant you are, sooner or later you will notice that a program you wrote does not behave the way you expected. You can reduce the frequency with unit-tested code, code reviews, experience and tools (like FindBugs). That lowers the probability of producing bugs dramatically, but like the function 1/x, which never touches the x-axis, the probability will never fall to zero.

Accept the “impossible” scenario

When you notice a bug, either because you find exception stack traces in your log files or because users report strange, unexpected behavior, one of two scenarios may come to mind:
1.    You can identify the problem instantly because the exception thrown tells you what happened or because you have seen related issues before. You know where and why the bug occurs and you know how to fix it.
2.    The other category of bug is the one that makes you think “It is impossible, that cannot be!”. A NullPointerException is thrown where you are almost certain that object XY can NEVER be null at this point. Impossible!!!
Resist that thought: it is wrong, and it won’t help you fix the problem. The program is telling you right now that it failed. The impossible happened right in front of you!

Information is King

When you face a bug from category two and don’t immediately know where its root cause lies, you usually need to gather information, for example about the state of the objects involved. Don’t hesitate to add that information to your log or to extend the exception message. If necessary, make your code ugly to gain that information, then deploy and execute it. It doesn’t matter: you’ll wash that code smell away after you have found the bug. Context-based logging might help here. Many developers print almost anything on DEBUG level; however, when you lower the log level to DEBUG in your live environment, the system will slow down dramatically under load because of IO. So you might consider context-based logging, which sets the log level on a per-user basis (e.g. when one specific user faces problem XY).
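
One way to approximate per-user log levels is a filter on top of java.util.logging. The following is only a minimal sketch under assumptions: the class name PerUserDebugFilter, the thread-local user id and the enable/disable methods are hypothetical and not part of any library, and it assumes each request is handled on a single thread that knows the current user.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Filter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

/** Hypothetical filter: INFO and above always pass, finer levels only for selected users. */
class PerUserDebugFilter implements Filter {

    // Assumption: the request-handling code sets the user id at the start of a request.
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Users for whom detailed logging is currently switched on.
    private final Set<String> debugUsers = ConcurrentHashMap.newKeySet();

    static void setCurrentUser(String userId) { CURRENT_USER.set(userId); }

    static void clearCurrentUser() { CURRENT_USER.remove(); }

    void enableDebugFor(String userId) { debugUsers.add(userId); }

    void disableDebugFor(String userId) { debugUsers.remove(userId); }

    @Override
    public boolean isLoggable(LogRecord record) {
        // Regular messages are never suppressed.
        if (record.getLevel().intValue() >= Level.INFO.intValue()) {
            return true;
        }
        // Detailed messages pass only while a selected user is being served.
        String user = CURRENT_USER.get();
        return user != null && debugUsers.contains(user);
    }
}
```

To use it, the logger (and its handlers) would have to be set to Level.FINE and the filter installed with logger.setFilter(...); the filter then drops the detailed records for everyone except the selected users.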

Win back control

Either way, you have to communicate with your system at runtime, and a good way to do this is the JMX API. With JMX you can relatively easily add an MBean that holds the parameters you collected to hunt down the bug. Many third-party libraries also provide MBeans that let you see internal parameters or even change them (Hibernate Statistics, for instance). When you are using a Staged Event-Driven Architecture with several ExecutorServices, you might be interested in the size of the task queue or the number of running threads: add an MBean to see their current values (or change them).
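
As an illustration of the executor case, here is a minimal sketch of a standard MBean around a ThreadPoolExecutor. The names ExecutorMonitoring, ExecutorStats and the domain com.example.app are made up for this example; only the javax.management and java.util.concurrent calls come from the JDK.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.ThreadPoolExecutor;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class ExecutorMonitoring {

    /** Management interface: JMX derives the attribute names from the getters and setters. */
    public interface ExecutorStatsMBean {
        int getQueueSize();
        int getActiveThreads();
        int getMaxThreads();
        void setMaxThreads(int maxThreads);
    }

    /** Standard MBean: the class name plus "MBean" must equal the interface name. */
    public static class ExecutorStats implements ExecutorStatsMBean {
        private final ThreadPoolExecutor executor;

        public ExecutorStats(ThreadPoolExecutor executor) {
            this.executor = executor;
        }

        @Override public int getQueueSize() { return executor.getQueue().size(); }

        @Override public int getActiveThreads() { return executor.getActiveCount(); }

        @Override public int getMaxThreads() { return executor.getMaximumPoolSize(); }

        // Writable attribute: lets you resize the pool from a JMX console.
        @Override public void setMaxThreads(int maxThreads) {
            executor.setMaximumPoolSize(maxThreads);
        }
    }

    /** Registers the MBean on the platform MBean server under a hypothetical name. */
    static ObjectName register(ThreadPoolExecutor executor, String stageName) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("com.example.app:type=ExecutorStats,name=" + stageName);
        server.registerMBean(new ExecutorStats(executor), name);
        return name;
    }
}
```

After ExecutorMonitoring.register(executor, "worker") the attributes QueueSize, ActiveThreads and MaxThreads should show up in a JMX client such as JConsole. Note that setMaximumPoolSize rejects invalid values with an IllegalArgumentException, so a bad input from the console fails loudly instead of silently breaking the pool.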

Know your enemy

All the information you gathered is useless when you can’t put it into the right context. So you have to know your system. Sounds easy. But is it? Many projects are the sum of the work of different programmers: code from programmers who are currently in your team, code from in-house libraries, code from programmers who left the building before it collapsed. When you join an existing team with several thousand lines of code, this can really be an issue. In that case you have to communicate: ask for information, ask for help. Sadly this isn’t the end of the story. Almost every project relies on third-party libraries: IO, database and caching, for instance. And if the worst comes to the worst, it is good to know the internals of those libraries as well.

I can see dead people

Information can only be collected while the program is running. When the program stops making progress, you have a problem. Perhaps there is a deadlock that prevents the execution of your program. Most deadlocks can be found via jstack. In short, jstack prints the stack traces of all live threads, including held monitors and detected deadlocks. It helps you understand which monitor is held by which thread. jstack requires the process id of your (running) Java application. You can retrieve the PID by typing jps -m. Then you can type jstack -l PID > PATH_TO_STORE_YOUR_LOGFILE_INCLUDING_FILENAME to store the thread dump in a file.
When a deadlock occurs, the system often hangs and cannot be shut down properly. In that case you must terminate the program regardless of the consequences. This can be done via the Task Manager on Windows or with the kill command (don’t forget the -9 option) on Linux.
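
jstack looks at the JVM from the outside. As a complement (a different technique, not what jstack does internally), the JDK’s ThreadMXBean can report deadlocked threads from inside the running application, for example from a watchdog thread. The following is only a sketch; the class name DeadlockReport and the method describeDeadlocks are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class DeadlockReport {

    /** Returns a description of all deadlocked threads, or an empty string if none are found. */
    static String describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Returns null when no threads are deadlocked.
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            return "";
        }
        StringBuilder report = new StringBuilder();
        // Ask for locked monitors and synchronizers so the report shows who holds what.
        for (ThreadInfo info : threads.getThreadInfo(ids, true, true)) {
            if (info != null) { // a thread may have ended in the meantime
                report.append(info);
            }
        }
        return report.toString();
    }
}
```

Since ThreadInfo.toString() abbreviates long stack traces, a full jstack dump remains the more complete source once such a check has fired.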

Last words

I’m pretty sure I didn’t cover all aspects of the topic. And, to be honest, it is impossible to do so. Because there are so many libraries and so many different ways of solving a problem, there is no magic trick that helps you fix or avoid bugs. However, I recommend that you keep your code simple. The more complex and sophisticated your code gets, the more difficult it will be to fix bugs (KISS principle). Always remember this when you feel like reinventing the wheel again.

One does not simply fix a bug

Definition of done

Consider the following scenario: you have to develop feature X, which is part of system Z. It works properly and your product owner gives his/her “ok” to it. But as time goes by, new features have to be implemented in Z, say feature Y. This feature depends on X, and after you have implemented it, you have to work on X again. Now you must explain to your product manager and, with some bad luck, your product owner why you have to work on a feature which was previously labeled as “done”. They might ask you why you have to touch a feature which they thought you would never have to touch again.

The first thing that might come to their minds is to blame you, the programmer: you didn’t work cleanly and your code has too much coupling. Honestly, this was the first thought I had when thinking about this situation. On second thought, however, I realized that systems have become more and more complex. You can almost never guarantee that a feature, once implemented, works until the end of time without being touched again (to be honest, you can strike the word “almost”). This is not meant as an excuse for lame programmers who haven’t learned how to decouple objects and systems. To be clear: it is still a bad sign for your coding skills when you have to change classes across the whole system whenever a new feature has to be implemented, but sometimes you have no other choice.

So how do you answer the question from above? There are two possibilities:
–    Option A: never, ever commit to the phrase “this feature is done” without the addendum “at the current time and with the current scope”. PM and PO won’t like to hear that, but anything else is just lying.
–    Option B: your code really sucks and you made a mistake while implementing feature X. It is time to be honest: admit that you made a mistake and get yourself a copy of a book about design patterns and system architecture.

Happy coding!