Interfaces for Databases

published: 2009-01-14

I wrote before about the maturity (or lack thereof) of the art of software development in the database realm. So did others.
One thing I noted is that there is no discussion on how to break down a database into modules or packages.

So today instead of mainly complaining, I'going to make a proposal on how to manage you database in a modular fashion.

One of the strengths of databases is the concept of foreign keys, a declarative way of enforcing relationships.
But when it comes to modularity, foreign keys are the source of all evils. They link two tables with each other and there is no standard way to remove the dependency. In object oriented computing languages this would be done with interfaces.

Instead of Class A directly referencing Class B it would reference a nInterface IB and Class B would implement that interface. We now can replace Class B and Instances of Class A won't notice. How can we achieve this with databases?

Probably the first idea that comes to mind is â€œviews!â€. Views are basically select statements with the name. They do hide dependencies to the underlying tables so that is a start, but they do not correspond to interfaces for two reasons.

First: The dependency between a table and a view is just the opposite way than the dependency between a class and an interface. This means we can't reference views in our code and simply swap the underlying table.

Second: At least Oracle (and I guess this is true for most databases) can't use a view a foreign key relationship.
So views don't help that much.

Let's look once more at classes, interfaces. One thing to realize is that in the example above there isÂ a dependency between Instances of A and B but not between their classes. The dependency between the instances is created at runtime. How does that help with databases? Typically classes are identified with tables and instances with rows in tables and that works just fine for ORM tools.

But I want to propose a different mapping. How about considering the create script of the table â€œthe classâ€ and the actual table in a database a instance. If you choose this approach, everything falls into place. The interface becomes a create script that just creates the most essential columns of a table, possibly only the primary key. You now can have different implementing tables / scripts, which either add columns for your normal production version of the table or a more simple version with only a couple of fields. Or you create no additional fields at all.

Obviously this makes organizing the scripts more complex in the first step. What are the benefits? Actually this can help in many ways.

You can use it for testing. With database testing a common problem is the setup of test data. Since everything is connected with foreign keys you need to create tons of master data in order to just create a single row that you actually want to use in your test. Know you replace all tables referenced by the table you want to use in the test with dummy implementations, featuring just the primary key, so no further dependencies and the setup can happen easily, possibly automatically.

You can use it for a modular application. Imagine an application that references some kind of 'Order'. One customer wants an extremely simple order with just one line item with a fixed quantity of 1. For another customer it is the average order with multiple line items, different units of quantities (string is sold by the meter, other stuff is sold by piece ...). The third customer has extremely complex orders with rules governing what combinations of line items can go into a single order. You know this kind of order when you ever configured a car online. The forth customer has a different kind of order, but that doesn't matter because it is a third party system and your application has just to maintain a reference.

Normally you need a table structure that combines all possible implementations, making it more complex than even the most complex implementation that you actually need. With the proposed approach you'll have four different implementations and can choose the one you need for each deployment.

So how do we actually do this? Well that is the real challenge. Right now I don't know of any tool supporting this. One could use Hibernate with a dynamically created Configuration. But there are things in a database that you probably don't want to control via hibernate (e.g. partitions and tablespaces). You could create your own little script assembly tool, store DDL scripts in a directory structure, that has branches for each implementation variant. A little configuration file would define what implementation variant to use for which object. In any case we would need to stop developing directly in a database. We should work on the file system and deploy into a database even in development instead.

I guess there is plenty of room for more ideas and of course tools. What do you think? Is this an approach that could pay off? Is this old stuff and I just reinvented the wheel? Or is all this completely stupid and should be handled differently?

Talks

Wan't to meet me in person to tell me how stupid I am? You can find me at the following events:

« The Ribbon Sucks « » Applying Rules of Graphical Design to Judge the Beauty of Code »