During my career, there is one thing that always kept my attention. I saw so many very different data environments.
What do I mean by this?
The most common architecture I see applied to a Business Intelligence project involves a staging layer and then the data warehouse layer. One layer to drop all that was extracted from sources which will be cleaned and get new keys according to business rules and then will rest in the data warehouse layer for consumption.
This is the most basic structure but usually, I didn’t see exactly this structure in the different places I had the chance to work. It can either have a pre-staging layer or a reporting layer that would exist after the data warehouse layer. These layers can exist because of several different reasons: performance, availability, consistency, scalability, restrict access among others.
Although these different reasons are very valid, usually, it also comes with a cost to the ETL process: time.
My experience showed me that usually, the structures get so big and very dependent between each other, that it takes
What should be the rule here?
I can’t tell if a decision is wrong or correct whenever I see these cases. Obviously that the architect made a decision with a context and to the best of his knowledge. I can only tell my preference and my principle whenever I designed a data warehouse: keep it simple. Fewer layers mean fewer environments to manage and maintain. Fewer layers mean, generally, faster ETL processes.
I know, of course, that every case is a case and should be analyzed individually. I’m not saying that all companies should go for the 2 database architecture (staging and data warehouse).
I’m saying that every architecture should be carefully designed and improved during time. In so many ways, the data environment is the same as a living environment, therefore, constant care is mandatory.