101
what and why
dbt embraces modular SQL (relates to [[Data Model for Modularity]]) and treats analytics and transformation code as data assets.
what: dbt is a transformation workflow to
- modularize and centralize analytics code
- collaborate on data models: version, test, and document data models and transformation queries
features:
- most important feature:
	- compile SQL files with Jinja into pure SQL scripts, then determine the order of execution; no DML or DDL needs to be written by hand (see the sketch after this list)
- other features:
	- document your models and their fields
	- test your models
	- manage packages
	- load seed files (static data)
	- snapshot data at a point in time
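a minimal sketch of what this looks like in practice (model and column names here are made up, not from a real project): a model is just a select statement, and `{{ ref() }}` is how dbt infers dependencies and the build order.

```sql
-- models/marts/orders_enriched.sql (illustrative model name)
-- dbt compiles the Jinja below into plain SQL and wraps it in the DDL/DML
-- needed to materialize the result, so none of that is written by hand.
{{ config(materialized='table') }}

select
    o.order_id,
    o.order_date,
    c.customer_id,
    c.first_name
from {{ ref('stg_orders') }} as o            -- ref() declares a dependency on another model
left join {{ ref('stg_customers') }} as c
    on o.customer_id = c.customer_id
```

when you run `dbt run`, dbt uses the `ref()` calls to build the referenced models before this one, which is how the DAG gets derived from plain SQL files.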
dbt projects
dbt enforces a high-level folder structure. the following items are must-haves in a dbt project (see the layout sketch below):
- dbt_project.yml file (project configuration file)
- models directory
- snapshots directory
the models and snapshots directories contain dbt resources.
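a rough sketch of that minimal layout (project and file names are illustrative):

```
my_dbt_project/            # illustrative project name
├── dbt_project.yml        # project configuration file
├── models/                # model .sql files (dbt resources)
│   └── stg_orders.sql
└── snapshots/             # snapshot .sql files (dbt resources)
    └── orders_snapshot.sql
```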
resource configs and properties
==reference doc==: Resource configs and properties
what: (in most cases, generally true)
- properties describe resources
- configurations control how dbt builds these resources in the warehouse (see the sketch below)
config / properties files include:
- dbt_project.yml
- profiles.yml ← for dbt Core users only; contains the information dbt needs to connect to the data platform. often one profile per warehouse in use (most organizations only have one profile)
- properties.yml ← to declare properties for dbt resources
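to illustrate the config side: a configuration can also be set inline in the model file itself via `{{ config() }}` (model and column names below are made up); descriptions and tests, i.e. properties, would live in a .yml file instead.

```sql
-- models/marts/fct_orders.sql (illustrative)
-- config() controls how dbt builds this resource in the warehouse:
-- here it is materialized incrementally and merged on order_id.
{{ config(materialized='incremental', unique_key='order_id') }}

select
    order_id,
    customer_id,
    order_date,
    amount
from {{ ref('stg_orders') }}

{% if is_incremental() %}
-- on incremental runs, only process rows newer than what is already built
where order_date > (select max(order_date) from {{ this }})
{% endif %}
```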
building your DAGs in dbt
refer to [[dbt - build your DAG]]
references
children's notes
LIST
FROM #programming_tools/dbt
SORT file.mtime DESC
Organize and structure dbt project + naming convention:
- [[dbt style guide]]
External resources:
useful links
official documents: docs.getdbt.com
-
Medium: Ultimate guide to dbt -> visualization similar to Miro board = dbt canvas
-
Example dbt projects:
-
dbt and BigQuery:
- Getting started with DBT cloud: most detailed
- Getting started with BigQuery and dbt, the easy way
- Example project: Google Analytics, Big Query, DBT
- Best practices → [[The marriage of BQ and dbt]]
-
CI/CD example with dbt:
-
use cases and best practices:
using dbt
during data modeling tasks:
- getting started with dbt core
- https://gitlab.com/data-engineering11/dbt-tutorial
- https://github.com/dbt-labs/jaffle_shop
- Data modeling techniques for more modularity –> [[Data Model for Modularity]]
- medium: How we mastered dbt, a true story -> good read, practical view
refactoring SQL for modularity course:
- Course: https://courses.getdbt.com/courses/take/refactoring-sql-for-modularity/lessons/27999659-welcome-to-the-refactoring-course
- Course note: [[Course Note - Refactoring SQL for Modularity]]
Sample project: Running data pipeline with bigquery and dbt
- To read: Self-service BI with dbt
- To learn: What is dbt semantic layer (or semantic layer in general)?
useful packages
- dbt-metalog = https://medium.com/indiciumtech/dbt-metalog-your-metadatas-catalog-for-dbt-32eed2234b0e
- dbt-utils = macros that can be (re)used across dbt projects (see the sketch below)
- dbt-expectations is an awesome package for adding Great Expectations-style tests to your dbt project.
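a small sketch of reusing a package macro in a model (assumes dbt_utils is declared in packages.yml and installed with `dbt deps`; source and column names are made up):

```sql
-- models/staging/stg_payments.sql (illustrative)
select
    -- generate_surrogate_key() is a dbt_utils macro that hashes the listed
    -- columns into a deterministic surrogate key
    {{ dbt_utils.generate_surrogate_key(['payment_id', 'payment_method']) }} as payment_key,
    payment_id,
    payment_method,
    amount
from {{ source('stripe', 'payments') }}   -- assumes this source is declared in a .yml file
```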
It’s worth mentioning that we didn’t define any Airflow operator representing the dbt model on our own; we used dbt-airflow-factory, which automatically translates your dbt project into Airflow tasks and supports the gateway between the staging and presentation layers.
→ More at: Build modern data platform in 4 months for volt.io
learn dbt custom macros
- benefits of macros in dbt
- basic tutorials:
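a minimal sketch of a custom macro and how a model would call it (macro, model, and column names are made up; the macro pattern mirrors the classic cents-to-dollars example):

```sql
-- macros/cents_to_dollars.sql (illustrative)
-- a macro is reusable Jinja that expands into SQL wherever it is called
{% macro cents_to_dollars(column_name, decimals=2) %}
    round({{ column_name }} / 100.0, {{ decimals }})
{% endmacro %}
```

```sql
-- models/staging/stg_payments.sql (illustrative usage)
select
    payment_id,
    {{ cents_to_dollars('amount_cents') }} as amount_usd
from {{ ref('raw_payments') }}
```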
to get started with dbt:
- What is dbt (medium)
- Business logic in dbt
- Using Base model in dbt - Best practice
- Learn dbt the easy way (core concept, summary)
dbt + static websites in GCP
dbt tips and tricks
- package: dbt-column-lineage
- suggested packages by dbt:
	- dbt_codegen
	- dbt_utils
	- dbt_project_evaluator
	- dbt_expectations
	- dbt_audit_helper
	- dbt_artifact
	- dbt_meta_testing