Good Logging Practices

published: 2009-09-16

Nearly everybody does some kind of logging in their application, but few do it in a well structured way. I'd like to point out some practices that I consider helpful in order to get the most out of your logging. This post will also work as a handout for my talk about logging at the herbstcampus conference. So here we go:

Know your audience. If you look at the log files of many applications you quickly realize who the intended audience is: Developers. The log is full with technical detail, but not much of it makes sense to anybody else. The groups you probably should have on your list are: Developers on the search for bugs, performance issues, Administrators trying to install, start or shutdown your application and Support personal trying to detect and analyze problems or trying to verify that a problem did not occur. Once you realize there are different audiences you'll understand what kind of information is helpfull in the logs.
Use a framework. The major options are: Log4j, SLF4J, JUL, Commons Logging or Logback. Each one has it's pros and cons, but they are all better then System.out.println. They allow you to configure at deploy or runtime, what gets logged where and in what format. If you don't want to do any research, and don't have special requirements, use SLF4J.
Logging must be rock solid. If a logging causes problems. Throw it out. Don't do experiments like remote logging, if you can avoid it.
Define rules, when to use which Log Level: I propose the following:
- DEBUG: Not important, except if you are researching a very specific problem.
- INFO: General information like startup, shutdown, version and configuration of an application. Information about important decisions in the code or changes in the configuration.
- WARN: Things that are fishy, but don't represent a problem in themself, but might indicate a problem when appearing very often. E.g. a failed log in attempt is such a candidate. If it happens sometime it is not a problem, but if it happens a lot somebody might want to look into that.
- ERROR: This should only be used when a problem occurs that somebody needs to look into. Typically but not necessarily relating to an exception in the code. Use this when only a limited part of the application is affected.
- FATAL: If something goes completely wrong, rendering the application unable to recover this log level is appropriate, very likely accompanied by a shutdown of the application or the session.
Of course your rules need some tayloring for your context, but absolutely need rules, and every developer must know and understand those.

Log everything you log exactly once: When analyzing log files I often see log messages that always come in groups. This happens often when exceptions get caught, logged, and rethrown. Don't do that. Log at the place where you actually do handle the exception. This is the only way to log an exception exactly once and also being able to decide which log level to use.
Include useful information and data: The information should typically include answers to the following questions
- Why are you logging this, typically the root cause of an exception.
- What is the affected use case? Did it happen during creation of an order or when importing product data from a remote site?
- What effect has it on the use case that was executing? Does the user need to repeat the action, or did it succeed? Did the complete batch fail, or just a single step?
- What instance of the use case was effected? For example an order id.
In order to make all this information available to your logs MDC (Mapped Diagnostic Context) is tremendously helpful. If you don't know MDC, follow the link and read about it NOW. MDC exists also in SLF4J and Log4J and should be easy to port for other frameworks as well.

Use more then one logging hierachy: The link contains more information about why. The simple idea is, that you can seperate different aspects you want to log by using different prefixes. This becomes especially usefull when adhering to the following points
Log the time needed for stuff which might be intersting for tuning: Of course this includes the time used for access to remote resources (e.g. webservices or databases) and execution of complex algorithms, but possibly also the time spend in the various layers in the application. You can use different log level for three categories, a separate logging hierarchy for separating performance logging from other stuff. Performance problems a notorious difficult to hunt down, especially when they appear over time. You'll love this logs, when performance becomes an issue.
Log stuff that worked, not only what failed: Distributed systems become common place more and more. When debugging distributed system one often has the situation that requests just disappear, and it can be extremely challenging to decide which component failed. After all not everybody does such a fine logging as us, right? So I recommend the following log events:
- RECEIVED: The application received a request through the UI, a Webservice or any kind of interface offered.
- SEND: The application send a request to some other component.
- GOT ANSWER:Â The application got a reply to a request it send.
- ANSWERED: The application answered to a request it received.

Log how often stuff happens: One possible reason for decreasing performance is a changed usage pattern. So logging how often per day, hour or minute a certain kind of use case is triggered can help tremendously.

While obviously I recommend a lot of logging, it isn't free of cost. So when doing some logging you should be sure to avoid it in tight loops that get executed millions of times. Also MDC keeps a reference to everything you put into, so make sure to clear it when you are done and think twice before putting complex objects in. And finally think about data privacy. A lot of people might have access to the log files. They shouldn't find passwords, credit cards or similar information their.

Probably somewhere around point number 8 you might have started to think: "Man this guy is crazy. I'll have more highly repetitive logging code than anything else in my application!" And if you do everything by hand you are probably right. But if you have cleanly structured code where you can deduct use cases, application layers, access to remote connections and so on from the method signature and the packages involved, most of these logging statement can get injected using AOP. So what is left is to create a framework that handles all this.

Since I couldn't find an Open Source Solution I started my own: Meta Logga. It is still in an extremely early stage, definitely not ready to be used, but possibly ready to understand where I am heading.

If you agree with me that this is indeed something that would be useful you can help in many extremely easy ways. Comment on this post, tweet it, dzone it, blog about or in any other way make this post more visible to others. All this will help me to keep motivated, thus it will help the project.

Talks

Wan't to meet me in person to tell me how stupid I am? You can find me at the following events:

« The Perfect Office « » Your Perspective is Biased »