dbt principle

Staging

File and structure

The staging layer is where our journey begins. This is the foundation of our project, where we bring all the individual components we’re going to use to build our more complex and useful models into the project.

  • Subdirectories based on the source system: At the staging layer, we should divide folders by source system. This makes it easy to run all models for a given source whenever that source's data refreshes. Source systems also tend to share similar loading methods and properties between tables, and this allows us to operate on those similar sets easily.
  • Subdirectories based on business grouping: We should not group models this way at the staging layer; it is too early to impose business groupings.

Models

At staging, we should apply only the transformations marked ✅ below and avoid those marked ❌:

  • ✅ Renaming
  • ✅ Type casting
  • ✅ Basic computations (e.g. cents to dollars)
  • ✅ Categorizing (using conditional logic to group values into buckets or booleans, for example with CASE WHEN statements)
  • ❌ Joins: the goal of staging models is to clean and prepare individual source-conformed concepts for downstream usage. We’re creating the most useful version of a source system table, which we can use as a new modular component for our project. In our experience, joins are almost always a bad idea here, because they create immediate duplicated computation and confusing relationships that ripple downstream. There are occasional exceptions, though (refer to base models for more info).
  • ❌ Aggregations: aggregations entail grouping, and we’re not doing that at this stage. Remember that staging models are the place to create the building blocks you’ll use throughout the rest of your project. If we start changing the grain of our tables by grouping in this layer, we’ll lose access to source data that we’ll likely need at some point. We just want to get our individual concepts cleaned and ready for use, and will handle aggregating values downstream.
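
To make the rules above concrete, here is a minimal sketch of what a staging model could look like. Everything specific in it is an assumption for illustration only: the file path, the source name stripe, the table payment, and all column names are hypothetical, and the source is assumed to be declared in a sources YAML file. It shows the four allowed transformations and contains no joins or aggregations, so the grain stays at one row per source row.

```sql
-- Hypothetical path: models/staging/stripe/stg_stripe__payments.sql
-- The source ('stripe', 'payment') and every column name are illustrative assumptions.

with source as (

    -- Select from exactly one source table: no joins at this layer
    select * from {{ source('stripe', 'payment') }}

),

renamed as (

    select
        -- Renaming
        id as payment_id,
        orderid as order_id,
        paymentmethod as payment_method,

        -- Type casting
        cast(created as date) as created_date,

        -- Basic computation: cents to dollars
        amount / 100.0 as amount_dollars,

        -- Categorizing: bucket values and derive a boolean
        case
            when paymentmethod in ('credit_card', 'debit_card') then 'card'
            else 'other'
        end as payment_type,
        case
            when status = 'success' then true
            else false
        end as is_successful

    from source

)

-- No group by: one output row per source row
select * from renamed
```

Keeping the file under a folder named after its source system (stripe in this sketch) is what lets all models for that source be selected and run together when the source refreshes.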