Tips for Testing Database Code
Almost everybody understands that source code belongs into version control. Many people understand we need to test our code. Quite a few do that automatically.
But everything seems to change when it comes to databases. A lot of stuff in and around databases goes untested. Heck some of the scripts don’t even live in version control. If you can’t believe that, because you haven’t seen it, you are a damn lucky bastard.
At least one reason for this state of affairs is obviously: It hurts. Database tests tend to be slow, interdependent and hard to maintain. Granted. But you know what: If it hurts, you should do it more often. It will teach you what exactly the things are that cause pain when testing and eventually you’ll find approaches to make it less painful. Here are some ideas I found helpful when testing database related code:
- Give every developer her own database. This forces you to find a way to set up the database fast, easy and reliable. If your application lives in a single user/schema/namespace it is sufficient for each developer to have his own user/schema/namespace in a single database. For this to work though …
- … the application should be user/schema/namespace agnostic. It makes it much easier to create multiple instances one a single server.
- Let the application live in a single user/schema/namespace. If you have multiple interdependent namespaces (e.g. for modules) you’ll have a hard time making them agnostic of the names.
- Have separate instances for CI, Demos, QA and so on. Actually ideally it should be trivial to create a fresh instance.
- Stay away from any tool that comes with its own repository. If have seen about a dozen of such tools and although some looked promising in the beginning, they all completely failed to integrate with other tools on the development side of things. Examples of such tools are tools for code generation from UML or ER models and ETL tools.
The previous points where about the setup of your database and infrastructure. Lets have a look at the code:
- Favor a proper Language (like Java, C, PHP …) over database specific languages like T-SQL or PL/SQL. If you are wondering why, make a comparison between your favorite DB language and your all purpose language. For which do you get the better IDE, Unit testing support, code analysis, code coverage and so on. Reconsider your main language if it doesn’t win in that comparison.
- Have a small layer that does all interaction with the database. Make sure no SQL or other database dependent code leaks out. Inject that layer as a dependency into anybody in need of it. This will allow you to test almost everything without worrying about the database. Only the tests for that integration layer actually needs a database for testing.
- Favor database independent SQL or a library abstracting away the differences of various databases. Back in the time people claimed they needed that in case they have to switch database vendors, which never happened. Now it does. See below.
The next points will deal with the actual tests:
- Consider an in-memory-database for testing. Many databases can run in an in-memory-mode. They are great for testing, because you can throw them away after the test, and they are way faster then any database writing to disk. This of course is only possible when you work with a database system that can run as a in-memory-database or if your code is database independent. Hence the previous point.
- If you can’t use your database as in memory database, consider putting it on a RAM disk. We got a huge performance gain for our tests with this approach. A solid state disk might be the next best thing although I’m not sure how it would react to the heavy load of continuous database tests.
- Make test failure messages so explicit that you don’t have to look into the database for analyzing test failures.
- Use code for setting up your test data. Make it nice and concise. If you need a row in a table without special requirements for its values you should be able to create that with a single trivial line of code, no matter how many foreign keys the table contains. I.e. you should have a little DSL for your test data. Doing it with plain code will enable all the refactoring power of your IDE for your tests. For load and performance tests other approaches like loading production data or large amounts of generated data might be suitable.
- Make sure your tests clean up after them selves. There are two approaches I found usable in most cases:
- Recreate the schema for every test. This is slow but really safe.
- Do a rollback after each test. This only works when there is no commit inside the test. The mean thing is: If a test tries to rollback, but fails because there was a commit inside the test some completely different test will fail. It can be a really frustrating task to find the bad test in such a case.
We covered the testing of database related code inside your application. But there is another kind of code we need to deal with: Scripts for deploying your application (or upgrading it)
- The scripts that change your database schema from one version to the next are source code just like everything else. Therefore they belong under version control and should get tested continuously.
- The testing process is really simple: Create a database as it looks now. Apply your change scripts and verify it looks as desired.
- For verifying the resulting schema it is useful to have a script that creates your target database (or at least the schema) from scratch, so you compare it with the result of the test.
- For performance reasons you might want to test this with an empty database first.
- But don’t forget to run it on an instance with realistic data as well. Hint: adding a column to huge tables can take loooooong.
Happy testing everybody.






[...] Update: Jens Schauder published a post about Tips for Testing Database Code. [...]
Hi Jens,
“Stay away from any tool that comes with its own repository” – even though you give some explanation, I am still not sure what kind of tools (and what repository) you talk about. Do you mean things like liquibase.org? Could you please elaborate on this?
–
Cheers,
Tomek Kaczanowski
@Tomek No, Liquibase is fine AFAIK. From their home page:
“All database changes are stored in a human readable yet trackable form and checked into source control.”
This is really important. There are modeling tools that have their own repository/database as part of the application. The effect is that you can’t use the myriads of tools we have for text based files, like diff, version control, editors and so on.
Imagine a situation where your for example java code is checked into subversion. Now there is no problem to obtain a consistent revision of that code. If your database model is stored as sql, xml or similar in the same subversion repository, you can get a consistent copy of both parts.
Now imagine data model is stored in a modeling tool instead. You have no link to your svn model. you can’t branch/tag properly. Even if you just take the complete repository of the modeling tool and put it in svn you still have
a) a manual process, bound to fail in many instances
b) no reasonable way of doing a meaningful diff between two versions
c) a big challange when you want to create a real database for testing based on a specific revision of the model..
Flyway can help you too. I wrote some time ago in my blog: http://alexsotob.blogspot.com/2011/12/far-away-long-ago-glowing-dim-as-ember.html
Very interesting read. I, too, advocate the need for database testing. I also like to challenge myself in writing more efficient SQL statements; I’m constantly wondering if I can do something better.
“Stay away from any tool that comes with its own repository” – that’s one thing I strongly disagree with.
UML tools are useful, as long as they are properly integrated into your IDE.
Database management tools are also useful, as long as you don’t maintain your database in the database itself, but just re-create the database each time from your SQL source, and keep the SQL files in source control as any other source file. It’s a lot harder to test and develop your database code without a tool
I also disagree about “Favor a proper Language (like Java, C, PHP …) over database specific languages” – it all depends on the job at hand. What you shouldn’t do is spread your business logic across different technologies. Putting all persistence-related logic into SQL, and keeping the actual logic which manipulates data in Java, C# or whatever your poison is sometimes not possible – operations which are simple but need to be performed on huge amounts of data are impractical in a non-SQL language. In which case you have two options: you delimit the long-running operations very clearly, implement them in SQL and wrap them neatly into functions in your Java/C#/whatever code, or, if most of your business logic lends itself to such an implementation, you make your non-SQL application a thin wrapper around the SQL application. Either way, you still don’t need to give up maintaining your application’s source in a source control system, and can still apply mostly all the techniques you described – including driving transactions from your non-SQL application.
@Anonymous Coward
I don’t think our opinions are so far apart. Modeling tools are helpful. But everything that results in an artifact that eventually becomes part of a deployment should if at all possible reside in you version control system, be text based and the process to create the final artifact should be completely scriptable.
As far as languages are concerned: I agree there are jobs that should be done using PL/SQL or T-SQL or whatever your RDBMSs equivalent is. But for the majority of application logic the lack of proper tool support is just to cumbersome.
For our db stored procedure testing, we have insert scripts that insert a known set of data into an empty database.
The procs are run with a known set of parameter (XML file keeps track of the proc name, parameters and expected output) and the returned data is compared to benchmark data.
Many tests can exists to handle the different parameter options.
It has taken us a long time to get to this point. This is just one small part of our database testing but it works pretty good.
@bushfoot
how do you deal with changes in your schema? I once worked with a similar approach, but we finally ditched it because it became just impossible to maintain the scripts for inserting data.
Nice entry. It has several good points in it but I liked especially this one:
“Stay away from any tool that comes with its own repository. If have seen about a dozen of such tools and although some looked promising in the beginning, they all completely failed to integrate with other tools on the development side of things. ”
I have used one such tool (it was a database design tool) in one large project I worked on a year ago. The tool did indeed look promising but in the end it was a real nightmare since all changes to the db schema would have go into that tool as well. But since we also had separate SQL change scripts for each version of the software, the we were simply adding the same stuff for two places. The tool itself did support schema import but it was always a very cumbersome process. And to make things worse, we had only one license for that tool. So it became my unpleasant duty to merge changes back to the tool. It ended up to a total nightmare.
The bad thing is that most of these tool Vendors have very good sales departments. Also, since those tools feels pretty promising, you might end up shooting yourself in the foot pretty easily. People should not be afraid of doing things in the “old school” way. It does indeed have some serious advantages.
Hi
Did you know that you can this all aotomatically (without the manual scripts) using SQL Source Control – http://www.red-gate.com/products/sql-development/sql-source-control/
Free 28-day trial if anyone is interested in taking a look.
[...] Тема выпуска: Как использовать и тестировать базы данных [...]
[...] addthis_share = [];}In this blog post, Jens Schauder shares some tips based on experience to improve the testing of database code. People seem to understand the need to test their code, but apparently this is not yet the case for [...]
I have found Liquibase to be an invaluable tool for managing database scripts. It maintains a table which tracks which scripts have/haven’t been applied to avoid applying a script twice. If a script has changed, liquibase can tell by checking a hashsum. If a script has changed it will either fail if a rollback is not possible or rollback/reapply when a rollback script is provided.
There are some points in this post I am disagree with. Please, keep in mind that my opinion is based on my personal experience, mostly with huge Oracle databases (banking and money transfers).
Having test-database for every developer is a good idea in theory, but it creates huge amount of extra work. All of those databases must be synchronized constantly and have to hold identical data. If a production database holds terabytes of data, so should every test-database otherwise testing is pointless. In performance critical applications solutions that work well on small amount of data generally perform terribly in the real world. So usually I saw the following installment – developers have their own copy of production database, so does the QA team and so on. These approach proved to work well.
Second point that I can’t agree with is a choice of language. Personally, if I had a choice of programming language I would certainly go with mentioned criterions “For which do you get the better IDE, Unit testing support, code analysis, code coverage and so on”. But it’s not my opinion that matters in the real world. When I had to deal with hundreds of database requests per second, that could not be cached in anyway and required direct communication with the data, the build-in language wins hands down over proper languages (PL/SQL in Oracle). Yes, it’s far from perfect, but it does its job better. Some modules of the application are working better with the proper languages, but when it comes to the data, the built-in language has no competitors.
Jens asked: “@bushfoot
how do you deal with changes in your schema? I once worked with a similar approach, but we finally ditched it because it became just impossible to maintain the scripts for inserting data.”
Well, you build your database in a CI (continuous integration) environment. If it does not build, someone has to change the insert scripts.
For this to work, you must go one step further than Jens’s first item: “Give every developer her own database.” This must be “Give every developer the means to create her own database.”
By this, people changing the database schema and writing tests can work in an agile way.
Of course, there’s more to all this: You need to keep your schema trackable – so for each and every SQL statement, you need a unit test that checks whether they still run. It helps if you have DAO layer (that contains all SQL statements); and it helps very much if you have a schema comparison tool (we use Redgate) so taht you can check in a “schema snapshot” and then check later what your developers changed.
We run a multi-million LOC software with (currently) more than 250 installations – and yet we allow every developer to change any part of the database schema in every (even minor) release … so yes, all this “agility on the database” does indeed work
Harald M.
@Harald I agree
@Yulya I choose the word ‘favor’ for a reason. There are reasons where PL/SQL or similar languages are the correct choice. Very large databases certainly might qualify.
I don’t agree with the approach to production data. Yes you need a database with production data (or comparably data) for testing performance and such. But that doesn’t mean every developer needs to use that kind of database all the time.
Anyway there are lots of great points in the comments and also from a discussion I had after doing a talk about this topic at OOP 2012 so I guess I have to do a follow up on the topic. Stay tuned
Very good topic, explained very clearly.thank you for good post.