Depending on the quality of your code, you will spend more or less time in front of your computer fixing bugs. No matter how brilliant you are, sooner or later a program you wrote will not behave the way you expected. You can reduce how often that happens with unit-tested code, code reviews, experience and tools such as FindBugs. That lowers the probability of producing bugs dramatically, but like the function 1/x, which approaches the x-axis without ever touching it, the probability will never fall to zero.
Accept the “impossible” scenario
When you notice a bug, either because you find exception stack traces in your log files or because users report strange, unexpected behavior, one of two scenarios may come to mind:
1. You can identify the problem instantly, because the exception thrown tells you what happened or because you have seen a related issue before. You know where and why the bug occurs and you know how to fix it.
2. The other category of bug is the one that makes you think “It is impossible, that cannot be!”. A NullPointerException is thrown where you are almost certain that object XY can NEVER be null at this point. Impossible!!!
Resist that thought: it is wrong, and it won’t help you fix the problem. The program is telling you right now that it failed. The impossible happened right in front of you!
Information is King
When you face a bug from category two and cannot tell right away where its root cause lies, you usually need to gather information, for example about the state of the objects involved. Don’t hesitate: add that information to your log or extend the exception message. If necessary, make your code ugly to get that information, then deploy and run it. It doesn’t matter, because you will wash that code smell away once you have found the bug. Context-based logging might help here. Many developers print almost everything at DEBUG level, but if you switch your live environment to DEBUG, the system will slow down dramatically under load because of the extra I/O. So consider context-based logging, which lets you set the log level on a per-user basis (e.g. when one specific user runs into problem XY).
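To make the per-user idea concrete, here is a minimal sketch that uses only the JDK. The class ContextLog, its method names and the user ids are hypothetical and not taken from any logging library; a real project would more likely hook this into the filter mechanism of its logging framework. DEBUG output is written only when the user bound to the current thread has been flagged.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of any logging library.
final class ContextLog {
    // Users for whom DEBUG output is currently switched on.
    private final Set<String> debugUsers = ConcurrentHashMap.newKeySet();
    // The user whose request is handled by the current thread.
    private final ThreadLocal<String> currentUser = new ThreadLocal<>();
    // Where accepted lines go (a file appender, System.out, ...).
    private final Consumer<String> sink;

    ContextLog(Consumer<String> sink) {
        this.sink = sink;
    }

    void enableDebugFor(String userId)  { debugUsers.add(userId); }
    void disableDebugFor(String userId) { debugUsers.remove(userId); }

    void setCurrentUser(String userId)  { currentUser.set(userId); }
    void clearCurrentUser()             { currentUser.remove(); }

    // The Supplier keeps message construction (and its cost) off the
    // hot path for every user who is not being debugged.
    void debug(Supplier<String> message) {
        String user = currentUser.get();
        if (user != null && debugUsers.contains(user)) {
            sink.accept("DEBUG [" + user + "] " + message.get());
        }
    }

    public static void main(String[] args) {
        ContextLog log = new ContextLog(System.out::println);
        log.enableDebugFor("user-4711");

        log.setCurrentUser("user-0815");
        log.debug(() -> "cart holds 1 item");   // dropped: user not flagged

        log.setCurrentUser("user-4711");
        log.debug(() -> "cart holds 3 items");  // written to the sink
        log.clearCurrentUser();
    }
}
```

In a web application the calls to setCurrentUser and clearCurrentUser would typically sit at the start and end of request handling, so that everything logged in between carries the right context.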
Win back control
Either way, you have to communicate with your system at runtime, and you should do this using the JMX API. With JMX it is relatively easy to add an MBean that exposes the parameters you collected to hunt down the bug. Many third-party libraries also provide MBeans that let you inspect internal parameters or even change them (Hibernate statistics, for instance). If you are using a staged event-driven architecture with several ExecutorServices, you might be interested in the size of the task queue or the number of running threads: add an MBean to see their current values (or change them).
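As a rough illustration of the executor case, the following sketch registers a standard MBean for a ThreadPoolExecutor with the platform MBean server. The class names, the ObjectName domain com.example.debug and the pool name are made up for this example; only the javax.management and java.util.concurrent calls are real JDK API.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class ExecutorJmxSketch {

    // Standard MBean convention: a public interface named <ImplClass>MBean.
    // Getters become readable attributes, setters make them writable.
    public interface ExecutorStatsMBean {
        int getQueueSize();
        int getActiveThreads();
        int getMaxThreads();
        void setMaxThreads(int maxThreads);
    }

    public static final class ExecutorStats implements ExecutorStatsMBean {
        private final ThreadPoolExecutor executor;

        public ExecutorStats(ThreadPoolExecutor executor) { this.executor = executor; }

        @Override public int getQueueSize()     { return executor.getQueue().size(); }
        @Override public int getActiveThreads() { return executor.getActiveCount(); }
        @Override public int getMaxThreads()    { return executor.getMaximumPoolSize(); }
        @Override public void setMaxThreads(int maxThreads) {
            executor.setMaximumPoolSize(maxThreads);
        }
    }

    // Registers the MBean; the domain "com.example.debug" is an assumption.
    static ObjectName register(ThreadPoolExecutor executor, String poolName) {
        try {
            ObjectName name =
                new ObjectName("com.example.debug:type=ExecutorStats,name=" + poolName);
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(new ExecutorStats(executor), name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException("could not register MBean", e);
        }
    }

    // Reads an attribute the way a JMX client such as JConsole would.
    static Object readAttribute(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException("could not read " + attribute, e);
        }
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            2, 4, 30, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        ObjectName name = register(pool, "stage-1");
        System.out.println("queue size: " + readAttribute(name, "QueueSize"));
        pool.shutdown();
    }
}
```

Once registered, the attributes show up in any JMX console attached to the process, so you can watch the queue grow or raise the thread limit without redeploying.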
Know your enemy
All the information you gathered is useless if you can’t put it into the right context. So you have to know your system. Sounds easy. But is it? Many projects are the sum of the work of different programmers: code from programmers who are currently on your team, code from in-house libraries, code from programmers who left the building before it collapsed. When you join an existing team with several thousand lines of code, this can really be an issue. In that case you have to communicate: ask for information, ask for help. Sadly this isn’t the end of the story. Almost every project relies on third-party libraries, for I/O, database access and caching for instance. And if worst comes to worst, it is good to know the internals of those libraries as well.
I can see dead people
Information can only be collected while the program is running. When the program stops making progress, you have a problem. Perhaps there is a deadlock that prevents the execution of your program. Most deadlocks can be found with jstack. In short, jstack prints the stack traces of all live threads, including held monitors and detected deadlocks. It helps you understand which monitor is held by which thread. jstack requires the process id of your (running) Java application. You can retrieve the PID by typing jps -m. Then you can type jstack -l PID > PATH_TO_STORE_YOUR_LOGFILE_INCLUDING_FILENAME to store the thread dump in a file.
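The same deadlock information is available from inside the JVM through ThreadMXBean, which can be handy when you cannot get a shell on the machine. The sketch below is only an illustration: it deliberately provokes a deadlock between two hypothetical worker threads (daemon threads, so the JVM can still exit) and then asks the JVM which threads are stuck. The class and thread names are invented for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class DeadlockSketch {

    // Starts a daemon thread that takes 'first', waits until both workers
    // hold their first lock, then tries to take 'second'.
    static void lockInOrder(String name, Object first, Object second,
                            CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) { // blocks forever: the other worker holds it
                    System.out.println(name + " acquired both locks");
                }
            }
        }, name);
        t.setDaemon(true);
        t.start();
    }

    // Provokes a deadlock and polls the JVM until it reports one.
    // Returns an empty array if nothing is detected within about 5 seconds.
    static ThreadInfo[] provokeAndDetect() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        lockInOrder("worker-1", lockA, lockB, bothHoldFirst);
        lockInOrder("worker-2", lockB, lockA, bothHoldFirst); // opposite order

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = threads.findDeadlockedThreads(); // null if none found
            if (ids != null) {
                return threads.getThreadInfo(ids, true, true);
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new ThreadInfo[0];
    }

    public static void main(String[] args) {
        for (ThreadInfo info : provokeAndDetect()) {
            System.out.println(info.getThreadName() + " waits for "
                + info.getLockName() + " held by " + info.getLockOwnerName());
        }
    }
}
```

In production code you would drop the provoking part and call only the detection loop, for example from a scheduled health check that logs the result.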
When a deadlock occurs, the system often hangs and cannot be shut down properly. In that case you must terminate the program regardless of the consequences. This can be done via the Task Manager under Windows or with the kill command (don’t forget the -9 option, which sends SIGKILL) under Linux.
I’m pretty sure I didn’t cover all aspects of the topic. And, to be honest, it is impossible to do so. Because there are so many libraries and so many different ways of solving a problem, there is no magic trick that helps you fix or avoid bugs. However, I recommend that you keep your code simple. The more complex and sophisticated your code gets, the more difficult it will be to fix bugs (the KISS principle). Always remember this when you feel like reinventing the wheel again.