[caption id="attachment_349" align="alignleft" width="225" caption="Support beams and wires of a bridge"]Support beams and wires of a bridge[/caption]

When reading the specification of a piece of software to be written, you are bound to find some non-functional requirements. Among these there will be, or at least should be, supportability. But what the heck does that mean? How do you install supportability? Let me present some ideas for what you can do to improve supportability.

Let your application log in a well-defined, reliable way, to a location that is easily accessible. Flat files on a server qualify. If you must log on a client, consider implementing a way to automatically transfer the log file to a support person. If you log into a database, make sure the support person can access it easily.
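As a rough illustration of what "well-defined and reliable" could look like, here is a minimal sketch using Python's standard `logging` module. The logger name `myapp`, the function name `configure_logging`, the line format and the rotation limits are all assumptions made for this example, not recommendations from any specification.

```python
import logging
from logging.handlers import RotatingFileHandler


def configure_logging(log_path, level=logging.INFO):
    """Attach a rotating file handler with a fixed, easy-to-parse line format."""
    logger = logging.getLogger("myapp")  # "myapp" is a hypothetical application name
    logger.setLevel(level)
    logger.propagate = False  # keep our lines out of the root logger's handlers

    # Remove handlers from an earlier call so lines are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    # Rotate at roughly 10 MB and keep five old files (assumed limits),
    # so the log cannot silently fill up the disk.
    handler = RotatingFileHandler(
        log_path, maxBytes=10_000_000, backupCount=5, encoding="utf-8"
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Every line then starts with a timestamp, a level and the logger name, which makes the file easy to search for a support person who has never seen the application before.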

Make your application easy to shut down and start. This sounds trivial, but it is easy to break this ability. Consider the following checklist:

  • What happens when you start two different versions of your application against the same database? A nicely supportable application should notice this and react accordingly.
  • What happens when you stop your application while it is processing a request?
  • If you have batches or batch-like processes, what will happen to them when you try to stop your application while a batch runs? Do you have the 5 hours until the batch finishes? Or will the batch stop and roll back, so you have to wait an hour for the rollback to happen? Or will it stop nicely within a minute and pick up its work automatically after restart?
  • Are you stuffing stuff into a database or a queue? What happens when the queue or the database gets started after the application?
  • Do you receive messages from a queue? What happens when the first message arrives before your application is available? What happens when you receive a message that you already processed? What happens when you receive a message for which you already processed a later message?
  • How long does it take to shut down and restart an instance of your application when it is under full load? If this takes more than a few minutes, is it possible to stop and restart only parts of your application?
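To make the batch item on this checklist a bit more concrete, here is a minimal sketch of a batch that stops between two items and resumes where it left off after a restart. The names `stop_requested`, `run_batch`, `load_checkpoint` and `save_checkpoint`, as well as the JSON checkpoint file, are hypothetical and chosen only for this example. It also assumes that processing one item is short and safe to repeat, because an item can be processed twice if the application dies between the work and the checkpoint.

```python
import json
import os
import threading

# Set this from a signal handler or an admin command to ask the batch to stop.
stop_requested = threading.Event()


def load_checkpoint(path):
    """Return the index of the next unprocessed item (0 if no checkpoint exists)."""
    if not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8") as f:
        return json.load(f)["next_index"]


def save_checkpoint(path, next_index):
    """Write the checkpoint via a temp file so a crash cannot leave it half-written."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)  # atomic rename


def run_batch(items, process, checkpoint_path):
    """Process items in order; return True if finished, False if stopped early."""
    start = load_checkpoint(checkpoint_path)
    for index in range(start, len(items)):
        if stop_requested.is_set():
            return False  # stop between items; the checkpoint marks where to resume
        process(items[index])
        save_checkpoint(checkpoint_path, index + 1)
    return True
```

One way to wire this up is to register a handler with `signal.signal(signal.SIGTERM, ...)` that calls `stop_requested.set()`. The batch then ends after the current item instead of running for hours or triggering a long rollback, and the next start continues from the checkpoint.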

Make the state of your application visible for support personnel. Most applications just report arbitrary errors when a component of the system is down. It's up to the supporters to guess whether it is the database, a queue, a web service or the application itself that is causing the problem. Identify the resources your application depends on. Write a check which tests all these resources, and make this check available, for example as a special health check webpage.
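Such a check could be sketched roughly like this. The function names `run_health_checks` and `render_health_page`, the JSON layout and the use of HTTP status 503 for "something is down" are assumptions for illustration. How each individual resource is actually probed depends on your system and is left to the callables you pass in.

```python
import json


def run_health_checks(checks):
    """Run each named check and report the status per resource.

    `checks` maps a resource name to a zero-argument callable that raises
    an exception when the resource is unavailable.
    """
    report = {}
    for name, check in checks.items():
        try:
            check()
            report[name] = {"ok": True, "detail": ""}
        except Exception as exc:  # one failing check must not hide the others
            report[name] = {"ok": False, "detail": f"{type(exc).__name__}: {exc}"}
    return report


def render_health_page(report):
    """Return (HTTP status code, JSON body) for a health check page."""
    healthy = all(entry["ok"] for entry in report.values())
    body = json.dumps(
        {"healthy": healthy, "resources": report}, indent=2, sort_keys=True
    )
    return (200 if healthy else 503), body
```

A call like `run_health_checks({"database": ping_database, "queue": ping_queue})`, with `ping_database` and `ping_queue` being your own probe functions, tells the supporter at a glance which resource is down instead of leaving them to guess from arbitrary errors.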

Put components that are likely to fail behind some kind of buffer. Your database might be so important for the application that this doesn't work for it. But if you are posting stuff to a queue (or web service or ...), consider using a local queue as a buffer, so your application can work as usual even when the target queue isn't available.
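The idea can be sketched in a few lines. `BufferedSender` and its parameters are hypothetical names for this example, and the buffer here lives only in memory, so its content is lost when the application stops. A real local queue would most likely need to be persistent.

```python
from collections import deque


class BufferedSender:
    """Keep messages in a local buffer until the remote target accepts them."""

    def __init__(self, send, max_buffered=10_000):
        self._send = send  # callable that raises when the target is unavailable
        self._buffer = deque()
        self._max_buffered = max_buffered  # assumed limit to bound memory use

    def post(self, message):
        """Accept a message locally, then try to deliver everything buffered."""
        if len(self._buffer) >= self._max_buffered:
            raise OverflowError("local buffer is full")
        self._buffer.append(message)
        self.flush()

    def flush(self):
        """Deliver buffered messages in order; return how many are still waiting."""
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except Exception:
                break  # target unavailable; keep the message and retry later
            self._buffer.popleft()
        return len(self._buffer)
```

The application only ever calls `post`, which succeeds while the target is down. Calling `flush` periodically, for example from a timer, drains the buffer once the target is back, and its return value is a handy number to show on a health check page.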

Last but not least: Document your application. The Agile Manifesto values working software over comprehensive documentation. It doesn't say you don't need documentation. And I'd say the documentation for servicing your application might be the most important one. The normal user who uses your application every day will figure out a way to get along. If not, he will call you or your boss. But the poor support person has to support dozens of applications, and since your application just works, he'll encounter it only a few times a year. He will know nothing about it except the stuff documented in the manual. So make sure there are instructions on how to interpret the logs, how to shut down and restart the application, how to analyze the internal state of the application and what happens when some connected component fails.

Have you noticed something? The vague non-functional requirement 'supportability' turned into a nice set of very functional requirements. You can attach a price to each of them, decide which pieces you'll really need and measure whether they really work. And I claim this works with all the much-hated non-functional requirements.


Want to meet me in person to tell me how stupid I am? You can find me at the following events: