Entity Mapping
This page provides a detailed explanation of how Etleap dbt Schedules work.
dbt Builds
Etleap uses the dbt build command to trigger the execution of dbt models in your data warehouse.
Apart from running all dbt models that are referenced by the selector of your dbt Schedule, the build command also runs dbt tests, snapshots and seeds.
Relevant dbt Entities
The dbt selector and model entities are relevant to Etleap dbt Schedules. In addition, dbt sources must be defined in your dbt project to make use of Etleap’s end-to-end data pipelines.
Selectors
A selector specifies which dbt models are run during each build of a dbt Schedule. Each dbt Schedule references one dbt selector. Selectors simplify referencing dbt resources (like models) in your project.
The schedule will include all models that are referenced directly by the assigned selector, as well as parent models (as models can reference other models in their definition).
To enable this, you must define your selectors in the selectors.yml file of your dbt project. You can find more about setting up selectors in the dbt documentation.
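As a rough illustration, a selectors.yml file might look like the sketch below. The selector name nightly_orders and the tag nightly are hypothetical placeholders; the definition uses dbt's tag selection method, so it selects every model that carries that tag.

```yaml
# selectors.yml (minimal sketch; all names are hypothetical)
selectors:
  - name: nightly_orders            # the selector a dbt Schedule would reference
    description: Models tagged for the nightly build
    definition:
      method: tag                   # select resources by tag
      value: nightly                # assumed tag applied to the relevant models
```

A dbt Schedule that references this selector would build the models tagged nightly, along with their parent models as described above.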
Models
dbt models are queries that define tables or views in your data warehouse. They are run during a dbt build. dbt projects usually specify many models. Models can be referenced by selectors, which Etleap’s dbt Schedules use when running builds. Models can reference other dbt models in their definition and can also reference dbt sources.
Add your models to the models/ directory of your dbt project.
Models are typically written in SQL.
You can read more about models in the dbt documentation.
Sources
dbt sources define tables or views in your data warehouse that have been created outside of dbt. Sources allow your models to interact with existing data in your warehouse, for example data ingested by Etleap pipelines. Etleap maps destination tables of existing data pipelines to dbt sources. This allows for dependency scheduling of pipelines and enables end-to-end data pipelines.
You specify your sources in the same .yml file in the models/ directory in which you declared your models.
For more information about dbt source definitions, see the dbt documentation.
dbt Source to Pipeline Mapping
Sources allow your models to interact with existing tables or views in your warehouse. These tables can be destination tables from Etleap pipelines that ingest data into your data warehouse. Etleap automatically maps the specified dbt sources in your dbt project to existing Etleap pipelines. These mappings can be utilized to schedule pipeline runs based on a dependent dbt Schedule.
To map an existing pipeline to a dbt source of a dbt Schedule, Etleap checks the following:
- The schema and table name of the pipeline’s destination and dbt source must be the same. Note that schema is an optional property in the source.yml file, and that it defaults to the source name property.
- The hostname and port of the pipeline’s destination connection and the dbt Schedule’s connection must be the same.
- The pipeline destination’s database has to be the same as that of the dbt source. The dbt source’s database is the optional database property in the .yml file if specified, and assumed to be that of the dbt Schedule’s connection otherwise.
A dbt source specification in your .yml file can look like this:
sources:
  - name: <table_schema>
    database: <connection_database_name>  # optional
    schema: <table_schema>  # optional
    tables:
      - name: <table_name>
Only pipelines that the user has access to are matched to dbt sources and included in a dbt Schedule.
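To make the matching rules above concrete, here is a minimal sketch that relies on the defaults. The schema name analytics and the table name orders are hypothetical. Because schema is omitted, it falls back to the source name; because database is omitted, the database of the dbt Schedule’s connection is assumed.

```yaml
# Source definition in a .yml file under models/ (sketch; names are hypothetical)
sources:
  - name: analytics        # doubles as the schema, since no schema property is set
    tables:
      - name: orders       # must equal the pipeline's destination table name
```

Under these assumptions, this source would map to a pipeline whose destination is the table orders in the schema analytics, on the same hostname, port and database as the dbt Schedule’s connection.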