Project Review time (The database)

I recently finished up on a project which has seen some troubled times. I think it is very important to stop and think about what did work, and what didn't work after a project is finished (or near finished), and get some pen to paper so that the memories do not fade (along with key learning's).

Starting right from core of the application, the database. Right from the onset, the client was driving the database design. We put up a few questions and concerns, but as the client was very insistent that we use this design as it was the same design used for another similar project, our hands were tied. In retrospect, this was a critical mistake, we really should have done some analysis on the types of issues we could have and explained to the client what our concerns were on a factual basis. I have never seen this before, but if the database is over normalised it can get quite complicated, and bit you very hard when you start look at performance.

Logical Model Vs Physical Model.
When designing a database, we use a variety of modeling techniques or stages to come to the final design, we build the databases logical model based on business, and the problem domain at hand. Once this design is complete, we can start building our Physical data model which will become our database. To explore this a little deeper, lets consider the good old sample of a person. In a business world there are many types of people. Executives, managers, workers, assistant , etc this could describe as the logic model. However, a physical data model, should probably not go and create table for each of these types of people, after all they are all people. We would create a person table, which would have a PersonType field to denote Executives, managers, workers, assistant.

Inheritance in the database
This was a major issue for our project, which bit us time and time again. We had a few areas which used inheritance in the database. Not only are we tightly coupling the database to the problem domain, we also have issues when retrieving the data. Consider the example above, if you wanted to get all executives, easy, just query the executive table. what if you wanted to all people above the age of 40? Now you need to left join or union all the tables because a person over 40 might come from the executive table, or the manager table or assistant table. Now think about the fact we might have a senior executive and junior executive, and this has been broken out into separate tables, which inherit from executive. How complex do you think the query will be now? Not only was this tightly coupled to the problem domain, it caused massive issues for the junior developers, who had little understanding of inheritance, and took months of explaining to describe how to get the entity type. Anyone not versed in programming concepts such as the BA, reporting guys, DBA, struggled to understand the whole inheritance thing. This lack of flexibility also bit us later when we wanted to change anything, such as add a new type. instead of adding a row such a with a relational database, we needed to create a table to define the type, meaning we needed to rebuild out entire application, and refactor all the code to handle the new type.
The purpose of all these tables is so that each type can store information that is relevant only to itself. so in the example, a person table would contain generic info such as name, title, age, salary and the specific table would contain specific information so executive table would contain bonus amount, and senior executive might have a hasCarSpot flag. This is excellent, because you do not get any null values in your database, as why would an assistant need a hasCarSpot flag, when would an assistant ever need a hasCarSpot? Well what about if half way through your project, we find a connected assistant who's father or mother is a highflyer that has insisted their child to have a car spot? Now we need to take out the hasCarSpot from senior executive table and add it to the person table since we decided anyone can have a car spot. Now the senior executive table is a hollow shell. When someone looks at this table they will simply ask why. We can not remove this table since there is a tone of code build around a senior executive type, and all we can do is hope someone does not delete this empty table some day.

To Normalise or de-Normalise, that is the question
As with any relational database, there are many entities which relate to each other, and to reduce data redundancy we normalise the database but sometimes, this can come at the cost of performance, as we need to use joins to retrieve our data, which can slow down access. Not only that, as more relationships are added, it becomes harder to conceptualise what is actually happening in the database, and how everything relates, for example, when you need to do a intersect between 3 or 4 tables, it becomes hard to see what relates to what. This may actually sound pretty simple, but consider when you have 7 or 8 places which require this kind of thinking to get to your end point. I think a good example here is resourcing. On any project, if you start to struggle for time, you can always add more resources, but there comes a point when more resources actually slows you down. The same can be said for relationships and the complexity it brings to a database.

As you can see, we had a few issues at the core of the application. These issues then resurfaced time, and time again at different points in the project. I think simplicity is a beautiful thing, just because something is simple in design, it does not mean it has not been thought out and thoroughly planned. I think simple designs are often confused with "simple" designs that have not been thought through, and hence people find a need to overly complicate things to ensure they do not fall into this category.

The database was the critical component within the application which is the hardest to change and had the most impact, over the longest period of time. I sugest you get as many people to review the design and take into consideration how it will be used and more importantly how it will be maintained. It may have a thousand tables, but it doesn't mean it is impressive.

Comments

Post a Comment

Popular posts from this blog

What good looks like!!

A microservice journey - part 2: what type of micro service are you?

Validation Rules