During my career, for several times I had the chance to either build a new Data Warehouse for a company or to work with their existing one. Because of that, I understand what should be taken into consideration when people take care of their Data Warehouse and I wrote my 6 golden rules for a great environment.
Keep it simple
This is probably the most complicated rule but represents perhaps 75% of the development time.
Focus on simple steps and keep the data as granular possible. Clean and granular data enables the business to do whatever it wants or needs. It also enables developers to focus on improvements rather than maintenance.
Keep it clean
“Write code that is easy to delete, not easy to extend“
I’m a big fan of this sentence and I believe it’s not only applicable to coding. A Data Warehouse should be like this also.
How many times there was a flow that was redone but the old tables kept existing? How many times there is an SP called “legacy” or “old” and nobody deletes it?
That’s why I believe that a Data Warehouse should be cleaned at least once a year so old objects give space to others and keep the environment easier for everybody.
Keep it automated
Automation. Automation and more automation. Can’t stress more on this even if I wanted. A Data Warehouse should have flows and processes that are automated.
“New entity? Very well, no problem. Just add it to the Excel file and the flow will pick it up.”
Everything should happen inside a framework and automatically. If a developer spends too much time adding some config, changing some SP because of a new entity, editing a package so data ends in Data Warehouse, then I think the whole environment is bad.
Promote it
My belief is that business owns the environment. The sole existence of a Data Warehouse is to enable the business to do its job, therefore, the environment should be promoted everywhere and every time.
Another important matter is that if the business doesn’t have the data, may end up using data coming from uncertified sources. This is one of the greatest risks I see for a company as the Data Governance ends up being at risk.
Monitor it
As everything, an environment can only be at its better shape if people know exactly how it has been progressing.
The growth of tables can impact the performance of a Data Warehouse. Deployments and errors also can impact, that’s why I truly believe monitoring is crucial to have a successful environment.
Keep improving it
Never settle and never stop improving it. It can get faster, it can get simpler therefore never stop looking for ways to improve it.
Either new technologies come into the market or a flow is growing but never settle for the state of a Data Warehouse.