Skip to Content

Update Mode

Pipelines in Update mode fetch new and updated records from the source and apply these to the destination table. Deleted records are also fetched and deleted from the destination table if the source supports fetching deleted records.

Loading image...
New, updated, and deleted records are extracted from the source and loaded to the destination table.

Pipelines in update mode must specify a primary key and a last-updated timestamp column in the destination table. This is done in the last step of a pipeline setup, as shown below.

In this example, the source is a MySQL table where Etleap uses SQL queries to extract data. In order to extract changes, the table must have a primary key and a last-updated column as shown on the left in the screenshot below. In the destination settings on the right we are providing the corresponding columns (“mappings”) in the destination table, which is defined by the output of the pipeline’s script. The pipeline must have both a primary key and a last-updated mapping in order to be in Update mode.

Loading image...
For update mode pipelines you set the primary key and last-updated timestamp columns in the last setup step.

Primary Key

A primary key is a unique identifier for a record. This can either be a single column or a combination of multiple columns in the destination table. For a given primary key value there exists only one row in the destination table at any time with that value.

When loading a set of records to the destination table, the primary key value is used to determine whether each record constitutes a new or updated record: if the primary key does not already exist in the destination table, then a new row is inserted, otherwise, the existing row is updated (assuming that the last-updated timestamp has progressed - see below).

Last-Updated Timestamp

A last-updated timestamp is a date-time value that indicates when the record was last modified. When two records have the same primary key value, the record with the most recent last-updated timestamp is loaded to the destination.

When a record is loaded that already exists in the destination table, the last-updated timestamps of the loaded and existing records are compared. If the loaded record has a later last-updated timestamp than the existing record, the row is updated, otherwise, the row in the destination is kept. This ensures data integrity when record updates arrive out of order.

Deleted Records

For sources that support fetching deleted record primary keys, such as Salesforce and CDC-enabled databases, update-mode pipelines will delete the corresponding rows in the destination table.

When records are deleted in a source but the source doesn’t support fetching deletes, the destination table may still contain these records. To work around this, you may specify a refresh schedule for the pipeline. The refresh fetches the entire contents of the source from scratch, effectively removing records deleted since the initial pipeline load or last refresh. This ensures that deleted records only exist in the destination table for as long as the refresh schedule’s interval.