DBT (data build tool) is open-source software that has revolutionised the transformation step of the ETL (or rather ELT) process for SQL.
Why we use DBT at Archetix
Originally, we had two main motivations:
- the unnecessary complexity of model orchestration and
- the difficulty of documenting individual models.
Orchestration
Orchestrating data models has historically been a more complex task than necessary.
There are many ways to approach this task:
- Orchestration by time, with models scheduled at fixed times such as 12:00, 12:05 and 12:10. Sooner or later you find you need something in between and end up scheduling models by the minute. Once you need to run something between the models at 12:01 and 12:02, you have to redo the whole orchestration.
- Orchestration using your own custom script. This can turn out either very well or very badly.
- Using pub/sub or other messaging tools. This is one of the most robust solutions, but unfortunately building it is unnecessarily complex.
- Other tools can help with orchestration, but they are mostly either enterprise-grade solutions such as Airflow or tools that primarily serve other purposes.
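DBT takes a different route from the approaches above: each model reads its inputs through `ref('other_model')`, and DBT derives a dependency graph (a DAG) from these references and runs the models in an order that respects it. The sketch below imitates that idea in Python. It assumes a hypothetical dictionary mapping model names to their SQL text and uses a regular expression to find `ref()` calls; it is a simplified illustration, not how DBT is implemented internally (DBT renders the Jinja properly).

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: name -> SQL using DBT's ref() syntax.
MODELS = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "orders_enriched": """
        select o.*, c.segment
        from {{ ref('stg_orders') }} o
        join {{ ref('stg_customers') }} c using (customer_id)
    """,
    "daily_revenue": "select order_date, sum(amount) from {{ ref('orders_enriched') }} group by 1",
}

# Matches ref('name') or ref("name") inside the SQL text.
REF_PATTERN = re.compile(r"""ref\(\s*['"]([\w.]+)['"]\s*\)""")


def dependencies(models: dict[str, str]) -> dict[str, set[str]]:
    """Map each model to the set of models it references via ref()."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def run_order(models: dict[str, str]) -> list[str]:
    """Return an order in which every model runs after all of its parents."""
    graph = dependencies(models)
    unknown = {dep for deps in graph.values() for dep in deps} - set(graph)
    if unknown:
        raise ValueError(f"ref() to unknown models: {sorted(unknown)}")
    # TopologicalSorter expects node -> predecessors; iterating static_order()
    # raises graphlib.CycleError if the refs form a cycle.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for model in run_order(MODELS):
        print(model)
```

Adding a model between two existing ones only requires a new `ref()`: the order is recomputed from the references, and there is no schedule to redo.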
Documentation
DBT encourages us to document every model, and it lets us generate documentation that a business user can understand.
Unfortunately, DBT cannot write the documentation for us. It “only” generates documentation from the information we specify.
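Because DBT only renders the descriptions we write, it can help to check which models are still undocumented. The sketch below assumes the layout of DBT's `target/manifest.json` artifact (a `nodes` object whose entries carry `resource_type`, `name`, `description` and `columns`); field names can differ between DBT versions, so treat them as assumptions to verify. The embedded manifest and the `archetix_demo` project name are hypothetical stand-ins for the real file.

```python
import json

# Hypothetical, heavily trimmed manifest. A real one could be loaded with
# `with open("target/manifest.json") as f: json.load(f)` after e.g. `dbt docs generate`.
MANIFEST = json.loads("""
{
  "nodes": {
    "model.archetix_demo.stg_orders": {
      "resource_type": "model", "name": "stg_orders",
      "description": "Orders cleaned from the raw source.",
      "columns": {"order_id": {"description": "Primary key."},
                  "amount": {"description": ""}}
    },
    "model.archetix_demo.daily_revenue": {
      "resource_type": "model", "name": "daily_revenue",
      "description": "", "columns": {}
    },
    "test.archetix_demo.not_null_stg_orders_order_id": {
      "resource_type": "test", "name": "not_null_stg_orders_order_id",
      "description": "", "columns": {}
    }
  }
}
""")


def documentation_gaps(manifest: dict) -> dict[str, list[str]]:
    """Return model name -> missing items ('model' and/or column names)."""
    gaps: dict[str, list[str]] = {}
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, ...
        missing = []
        if not node.get("description", "").strip():
            missing.append("model")
        for column, meta in node.get("columns", {}).items():
            if not meta.get("description", "").strip():
                missing.append(column)
        if missing:
            gaps[node["name"]] = missing
    return gaps


if __name__ == "__main__":
    for model, missing in documentation_gaps(MANIFEST).items():
        print(f"{model}: missing descriptions for {', '.join(missing)}")
```

As far as we know, only columns declared in the YAML files appear in the manifest, so columns nobody has declared are not caught by this check.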
Other advantages of DBT
DBT can generate a data lineage graph (see picture), which helps us understand the data models, how they are related and how they build on each other.
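Data lineage is essentially the dependency graph read in both directions: which models a given model builds on (upstream) and which models build on it (downstream). The sketch below walks such a graph with a breadth-first search. It assumes a hypothetical mapping from each model to its direct parents, similar in spirit to the parent map DBT keeps in its manifest; the model names are illustrative only.

```python
from collections import deque

# Hypothetical lineage: model -> direct parents (the models it ref()s).
PARENTS = {
    "stg_orders": [],
    "stg_customers": [],
    "orders_enriched": ["stg_orders", "stg_customers"],
    "daily_revenue": ["orders_enriched"],
    "customer_ltv": ["orders_enriched", "stg_customers"],
}


def _walk(start: str, edges: dict[str, list[str]]) -> set[str]:
    """Breadth-first search collecting every node reachable from start."""
    seen: set[str] = set()
    queue = deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(edges.get(node, []))
    return seen


def upstream(model: str, parents: dict[str, list[str]]) -> set[str]:
    """All models that `model` is built from, directly or indirectly."""
    return _walk(model, parents)


def downstream(model: str, parents: dict[str, list[str]]) -> set[str]:
    """All models affected if `model` changes (parents inverted into children)."""
    children: dict[str, list[str]] = {name: [] for name in parents}
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children.setdefault(parent, []).append(child)
    return _walk(model, children)


if __name__ == "__main__":
    print("daily_revenue builds on:", sorted(upstream("daily_revenue", PARENTS)))
    print("changing stg_customers affects:", sorted(downstream("stg_customers", PARENTS)))
```

DBT exposes the same idea on the command line through its graph operators, for example `dbt run --select +daily_revenue` (the model plus everything upstream) or `dbt run --select stg_customers+` (the model plus everything downstream).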
DBT allows data professionals to work with data following DevOps principles, without which modern software development could hardly exist.
It lets us use Git for versioning, write tests for our code, and more.
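DBT's built-in generic tests such as `unique` and `not_null` are declared in YAML and compiled into SQL queries that select the rows violating the rule; a test passes when the query returns no rows. The sketch below mimics that pass/fail logic in Python on hypothetical in-memory rows, purely to show what the two tests check. In DBT itself the checks run as SQL in the warehouse, and the helper names here are illustrative, not part of DBT.

```python
from collections import Counter
from typing import Any

Row = dict[str, Any]

# Hypothetical rows of a stg_orders model.
ROWS: list[Row] = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": None},
    {"order_id": 2, "customer_id": 11},
]


def not_null_failures(rows: list[Row], column: str) -> list[Row]:
    """Rows where the column is NULL (what a not_null test selects)."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows: list[Row], column: str) -> list[Any]:
    """Non-NULL values that appear more than once (what a unique test selects)."""
    counts = Counter(row.get(column) for row in rows if row.get(column) is not None)
    return [value for value, n in counts.items() if n > 1]


def run_test(name: str, failures: list) -> bool:
    """Like DBT, treat a test as passing when it finds zero failing records."""
    passed = not failures
    print(f"{'PASS' if passed else 'FAIL'} {name} ({len(failures)} failures)")
    return passed


if __name__ == "__main__":
    run_test("not_null_stg_orders_order_id", not_null_failures(ROWS, "order_id"))
    run_test("unique_stg_orders_order_id", unique_failures(ROWS, "order_id"))
    run_test("not_null_stg_orders_customer_id", not_null_failures(ROWS, "customer_id"))
```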
DBT also makes it possible to send alerts reporting whether our data models were built successfully or whether a problem occurred.
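After each invocation, DBT writes a `target/run_results.json` artifact and exits with a non-zero code if something failed, which gives a convenient hook for alerting. The sketch below assumes the artifact's `results` list with `unique_id`, `status` and `message` fields (verify them against your DBT version) and turns it into an alert text. The embedded results are hypothetical, and `send_alert` is a hypothetical placeholder for whatever channel you use.

```python
import json

# Statuses treated as problems: models report "error", tests "fail" or "error".
# "warn" is included by choice; drop it to alert only on hard failures.
# "skipped" is left out because it usually follows an upstream error that is reported anyway.
PROBLEM_STATUSES = {"error", "fail", "warn"}

# Hypothetical, trimmed run_results.json content.
RUN_RESULTS = json.loads("""
{
  "results": [
    {"unique_id": "model.archetix_demo.stg_orders", "status": "success", "message": "SELECT 120"},
    {"unique_id": "model.archetix_demo.daily_revenue", "status": "error", "message": "Division by zero"},
    {"unique_id": "test.archetix_demo.unique_stg_orders_order_id", "status": "fail", "message": "Got 1 result"}
  ]
}
""")


def summarize(run_results: dict) -> tuple[bool, str]:
    """Return (ok, message) describing how the DBT run went."""
    results = run_results.get("results", [])
    problems = [r for r in results if r.get("status") in PROBLEM_STATUSES]
    if not problems:
        return True, f"DBT run OK: no errors, failures or warnings in {len(results)} nodes."
    lines = [f"DBT run finished with {len(problems)} problem(s):"]
    for r in problems:
        lines.append(f"- {r['unique_id']}: {r['status']} ({r.get('message') or 'no message'})")
    return False, "\n".join(lines)


def send_alert(text: str) -> None:
    """Hypothetical placeholder: replace with e-mail, Slack, Teams, ..."""
    print(text)


if __name__ == "__main__":
    ok, text = summarize(RUN_RESULTS)
    send_alert(text)
```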
DBT has a large and growing community. Because DBT lets you install packages, much as you would in JavaScript or Python, you may well discover one that makes your work fundamentally easier.
Disadvantages
One of the few drawbacks of DBT is that it is a relatively young tool: its first stable version (1.0) was released in December 2021. As a result, many packages may not behave exactly as you expect.
Of course, DBT also needs to be hosted somewhere: we can choose DBT Cloud, or host DBT ourselves in our own cloud or on-premises.
Do you have any questions on this topic? We will be happy to answer them!