
Pipeline Refreshes

When a pipeline is refreshed, Etleap either extracts all the data from the source again or reprocesses the previously extracted data without re-ingesting it from the source. During this process, Etleap creates a new version of the pipeline that runs alongside the current pipeline version.

Pipeline refreshes can be triggered for various reasons, including:

  1. Incompatible script changes that require the data to be transformed from scratch.
  2. Re-establishing consistency between the source and destination by capturing deletes, for some source types.

How are Refreshes Triggered?

Pipeline refreshes can be triggered from the pipeline’s Overview page. On that page, you can either trigger a manual refresh or set a periodic refresh schedule.

The refresh options on the pipeline overview page.

Manual Refreshes

To trigger a pipeline refresh immediately, open the three-dot menu and click Refresh. Confirm your choice by clicking Refresh Now.

Triggering a refresh manually.

Setting a Refresh Schedule

To configure a pipeline refresh schedule, choose Hourly, Daily, Weekly, or Monthly and click Update Schedule. Regular refreshes are useful for source types where rows deleted in the source would otherwise remain in the destination.

Refresh schedule.

Incompatible Script Changes

Some script changes require all of the data to be reprocessed. These types of script changes will automatically trigger a pipeline refresh when they are applied. When you update the script in the Wrangler, Etleap will indicate whether a refresh will be triggered before you confirm the script change.

A script change that requires a refresh.

Bulk Refreshes

You can also start refreshes for many pipelines at once on the pipeline list page. When refreshing pipelines in bulk, you have the option to keep the already extracted data.

The pipeline list page.
Note

If a refresh is already in progress, Etleap will cancel it and trigger a new one.
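The cancel-then-restart behavior in this note means at most one refresh is active per pipeline, and the newest one wins. The Python sketch below models only that rule; `PipelineRefreshState`, `Refresh`, and `trigger_refresh` are hypothetical names for illustration, not part of any Etleap API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Refresh:
    refresh_id: int
    cancelled: bool = False


@dataclass
class PipelineRefreshState:
    """Hypothetical model: at most one refresh is active per pipeline."""
    next_id: int = 1
    active: Optional[Refresh] = None
    history: List[Refresh] = field(default_factory=list)

    def trigger_refresh(self) -> Refresh:
        # A refresh that is already in progress is cancelled before the new one starts.
        if self.active is not None:
            self.active.cancelled = True
        refresh = Refresh(self.next_id)
        self.next_id += 1
        self.active = refresh
        self.history.append(refresh)
        return refresh


state = PipelineRefreshState()
first = state.trigger_refresh()
second = state.trigger_refresh()
print(first.cancelled, second.cancelled, state.active.refresh_id)  # True False 2
```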

Keeping Extracted Data

Depending on how a refresh is triggered, Etleap can keep the existing extracted data and only redo the transformation and load steps. The table below shows which refresh types keep extracted data.

Refresh Type                  Keep Extracted Data
Refresh schedule              No
Incompatible script change    Yes
Manual refresh                No
Bulk refresh                  User decides
Choosing whether to keep extracted data when refreshing pipelines in bulk
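The table above amounts to a small decision rule. The Python sketch below encodes it; `RefreshType` and `keeps_extracted_data` are hypothetical names, and raising an error when a bulk refresh has no explicit choice is an assumption made for illustration.

```python
from enum import Enum
from typing import Optional


class RefreshType(Enum):
    # Hypothetical enum mirroring the rows of the table.
    REFRESH_SCHEDULE = "refresh schedule"
    INCOMPATIBLE_SCRIPT_CHANGE = "incompatible script change"
    MANUAL = "manual refresh"
    BULK = "bulk refresh"


def keeps_extracted_data(refresh_type: RefreshType, user_choice: Optional[bool] = None) -> bool:
    """Return True if the refresh reuses previously extracted data."""
    if refresh_type is RefreshType.INCOMPATIBLE_SCRIPT_CHANGE:
        return True
    if refresh_type is RefreshType.BULK:
        # Assumption: a bulk refresh always carries the user's keep/discard choice.
        if user_choice is None:
            raise ValueError("bulk refreshes need an explicit keep/discard choice")
        return user_choice
    # Scheduled and manual refreshes extract everything from the source again.
    return False
```

For example, `keeps_extracted_data(RefreshType.BULK, user_choice=True)` returns `True`, while `keeps_extracted_data(RefreshType.MANUAL)` returns `False`.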

What happens during a Pipeline Refresh?

When your pipeline starts a refresh:

  1. Etleap reprocesses all available data from the source and loads it to a separate location in the destination, so that your current destination table can continue to be updated.

    1. For warehouse destinations, the data processed for the refresh is loaded to a temporary table, with a name like n8FJv3aC.
    2. For S3 lake destinations, the data will be loaded to a separate path under the next version number.
  2. The current location in your destination continues to receive new data, with a few caveats:

    1. If the refresh was triggered before the pipeline completed its initial load, the initial load stops and only the refresh is processed.
    2. If the refresh was caused by a script change, the existing location continues to be updated with data processed by the old version of the script, while the refresh location is updated with data processed by the new version.
    3. If another schema change is made to the old version of the script while the refresh is using the new version of the script (see item 2 above), data stops flowing to the current location and only the refresh is processed. This is because a newer version of the script already exists for the refresh, so the script used for the current location cannot be updated.
  3. Once all of the data in the source has been processed and loaded to the refresh location, the refresh completes. For warehouse destinations, the destination table is atomically replaced by the temporary table.
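This page does not say how the warehouse replacement is performed. The sketch below only illustrates the general pattern of loading into a temporary table and swapping it in within a single transaction, using Python's built-in `sqlite3` module as a stand-in for a warehouse. The table names, columns, and SQL are assumptions; real warehouses have their own rename or swap statements.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the swap
# transaction below is opened and closed explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Current destination table; row 2 has since been deleted in the source.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "open"), (2, "open")])

# The refresh reprocesses all source data into a temporary table
# (the random-looking name mirrors the n8FJv3aC example above).
tmp = "n8FJv3aC"
conn.execute(f"CREATE TABLE {tmp} (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(f"INSERT INTO {tmp} VALUES (?, ?)", [(1, "shipped")])

# Swap inside one transaction: either both statements take effect or neither does.
conn.execute("BEGIN")
try:
    conn.execute("DROP TABLE orders")
    conn.execute(f"ALTER TABLE {tmp} RENAME TO orders")
    conn.execute("COMMIT")
except sqlite3.Error:
    conn.execute("ROLLBACK")
    raise

rows = conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
print(rows)  # [(1, 'shipped')]
```

Because the refresh rebuilds the table from the source, the row deleted in the source (id 2) is gone after the swap.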

While a pipeline refresh is in progress, you can see its status on the pipeline’s Overview page.

The overview page showing that there is a refresh in progress. The yellow card represents the refresh, and it will show the progress of the refresh.

The refresh’s activities are shown in the Activities (Refresh) tab.

The refresh activities on the pipeline activity page.
Note

Once the pipeline refresh is complete, the refreshing pipeline version becomes the current pipeline version and only the activity history for the latest refresh is shown in the UI.

A refresh may also need to extract, transform, and reload a substantial amount of data. To make this more efficient, Etleap combines all extractions so that transformation and loading are done only once.
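The benefit of combining extractions is that the transform and load steps run once over the merged data instead of once per extraction. The Python sketch below shows that idea in the abstract; `refresh_with_combined_extractions`, `to_cents`, and the row format are hypothetical and not taken from Etleap.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def refresh_with_combined_extractions(
    extractions: Iterable[List[Row]],
    transform: Callable[[List[Row]], List[Row]],
    load: Callable[[List[Row]], None],
) -> int:
    """Merge every extraction first, then transform and load the result once."""
    combined: List[Row] = []
    for batch in extractions:
        combined.extend(batch)
    transformed = transform(combined)  # one transformation pass, not one per extraction
    load(transformed)                  # one load, not one per extraction
    return len(transformed)


# Usage: three rows spread over two extractions, transformed and loaded once.
transform_calls: List[int] = []
loads: List[List[Row]] = []


def to_cents(rows: List[Row]) -> List[Row]:
    transform_calls.append(len(rows))
    return [{**r, "amount_cents": r["amount"] * 100} for r in rows]


count = refresh_with_combined_extractions(
    [[{"amount": 1}], [{"amount": 2}, {"amount": 3}]], to_cents, loads.append
)
print(count, transform_calls, len(loads))  # 3 [3] 1
```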

Warning

Extractions are not combined when refreshing the following types of pipelines and connections:

  1. Replace mode pipelines, since each extraction contains all of the data.
  2. History retaining pipelines, to avoid re-writing the historical record.
  3. Sharded JDBC connections, to avoid losing the sharding information.
  4. Mixpanel pipelines, due to Mixpanel’s API configuration.
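The four exceptions above can be read as a single check. The Python sketch below expresses it; `PipelineInfo` and its fields are hypothetical attributes chosen to mirror the list, not an Etleap data model.

```python
from dataclasses import dataclass


@dataclass
class PipelineInfo:
    # Hypothetical attributes mirroring the four exceptions listed above.
    replace_mode: bool = False
    retains_history: bool = False
    sharded_jdbc: bool = False
    source_type: str = ""


def combines_extractions_on_refresh(p: PipelineInfo) -> bool:
    """Return True if a refresh of this pipeline would combine extractions."""
    if p.replace_mode:               # each extraction already contains all of the data
        return False
    if p.retains_history:            # avoid rewriting the historical record
        return False
    if p.sharded_jdbc:               # avoid losing the sharding information
        return False
    if p.source_type == "Mixpanel":  # Mixpanel's API configuration
        return False
    return True
```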