Validation Checks
Validation checks are rules applied to Wrangler output columns. Each row is evaluated against these rules before it is loaded to the destination. Rows that fail a check are treated as parsing errors and are not loaded.
How Validation Checks Work
Validation checks run as part of the pipeline’s transformation stage, after all Wrangler transforms have been applied. Each check evaluates a condition against the transformed data and flags rows that do not pass.
There are three types of validation checks:
- No Null Values: rejects rows where a specified column is null.
- Values Must Be Unique: rejects rows with duplicate values in a specified column.
- Custom Rule: rejects rows that do not satisfy a boolean expression you define.
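The three check types above can be illustrated with a small Python sketch. This is not how Etleap implements the checks; it only models their described behavior on in-memory rows. The function names, the sample columns (`id`, `email`, `amount`), and the choice to flag every row that shares a duplicated value are assumptions made for illustration.

```python
from collections import Counter
from typing import Any, Callable

Row = dict[str, Any]


def no_null_values(rows: list[Row], column: str) -> set[int]:
    """Return the indexes of rows whose value in `column` is null (None)."""
    return {i for i, row in enumerate(rows) if row.get(column) is None}


def values_must_be_unique(rows: list[Row], column: str) -> set[int]:
    """Return the indexes of rows whose value in `column` occurs more than once.

    Assumption: every row sharing a duplicated value is flagged, and nulls
    are ignored. The product may handle either case differently.
    """
    counts = Counter(
        row.get(column) for row in rows if row.get(column) is not None
    )
    return {
        i for i, row in enumerate(rows) if counts.get(row.get(column), 0) > 1
    }


def custom_rule(rows: list[Row], predicate: Callable[[Row], bool]) -> set[int]:
    """Return the indexes of rows for which the boolean expression is false."""
    return {i for i, row in enumerate(rows) if not predicate(row)}


# Hypothetical sample data standing in for Wrangler output.
rows = [
    {"id": 1, "email": "a@example.com", "amount": 10},
    {"id": 2, "email": None, "amount": 5},
    {"id": 2, "email": "c@example.com", "amount": -3},
]

null_failures = no_null_values(rows, "email")
duplicate_failures = values_must_be_unique(rows, "id")
rule_failures = custom_rule(rows, lambda row: row["amount"] >= 0)

print(null_failures, duplicate_failures, rule_failures)
```

In this sample, the second row fails the null check, the second and third rows fail the uniqueness check, and the third row fails the custom rule.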
For a step-by-step walkthrough of adding checks to a pipeline, see Validate Data Before Loading.
Combining Checks on a Column
Multiple validation checks can be applied to the same column. For example, a column can have both a No Null Values check and a Custom Rule check. Each check is evaluated independently, so a single row can fail more than one validation check at a time.
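As a rough illustration of independent evaluation, the sketch below runs two checks on a hypothetical `age` column and records every check each row fails. The check names, the column, and the rule `age >= 18` are invented for this example. It also assumes that a null value does not satisfy the custom rule; how a Custom Rule actually treats nulls is not stated here.

```python
from typing import Any, Callable

Row = dict[str, Any]
Check = Callable[[Row], bool]  # returns True when the row passes

# Two hypothetical checks on the same column, "age".
checks: dict[str, Check] = {
    "No Null Values": lambda row: row.get("age") is not None,
    # Assumption: a null value is treated as not satisfying the rule.
    "Custom Rule (age >= 18)": lambda row: (
        row.get("age") is not None and row["age"] >= 18
    ),
}


def failed_checks(row: Row) -> list[str]:
    """Evaluate every check on its own and list the ones the row fails."""
    return [name for name, check in checks.items() if not check(row)]


rows = [{"age": 30}, {"age": 12}, {"age": None}]
results = [failed_checks(row) for row in rows]

for row, failures in zip(rows, results):
    print(row, "->", failures or "passes all checks")
```

No check short-circuits another: the row with a null `age` is reported under both checks rather than only the first one it fails.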
Pipeline Refreshes
Adding, modifying, or removing a validation check requires a pipeline refresh. During the refresh, Etleap reprocesses the data to apply the updated checks.
Failures and Parsing Errors
When a row fails a validation check, it is surfaced as a parsing error. This means:
- The row is not loaded to the destination.
- The failure counts toward the pipeline’s parsing error threshold. If the threshold is exceeded, the pipeline’s behavior depends on your parsing error settings.
- You receive a notification about the failure, and the failure appears in the Dashboard and on the pipeline’s Overview page.
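The consequences listed above can be sketched as a simple partition of rows into those that load and those that are held back. This is only a model under stated assumptions: the threshold is represented as a maximum fraction of failing rows (`MAX_ERROR_FRACTION`, a hypothetical name and value), and a row that fails several checks is counted once. The actual threshold setting and counting rules may differ.

```python
from typing import Any, Callable

Row = dict[str, Any]
Check = Callable[[Row], bool]  # returns True when the row passes

# Hypothetical threshold: the largest tolerated fraction of failing rows.
MAX_ERROR_FRACTION = 0.25


def split_rows(
    rows: list[Row], checks: list[Check]
) -> tuple[list[Row], list[Row]]:
    """Separate rows that pass every check from rows that fail at least one."""
    loaded: list[Row] = []
    held_back: list[Row] = []
    for row in rows:
        if all(check(row) for check in checks):
            loaded.append(row)
        else:
            held_back.append(row)  # modeled as a parsing error, not loaded
    return loaded, held_back


# Hypothetical sample data and checks.
rows = [
    {"sku": "A1", "price": 9.5},
    {"sku": None, "price": 4.0},
    {"sku": "C3", "price": -1.0},
    {"sku": "D4", "price": 2.0},
]
checks: list[Check] = [
    lambda row: row["sku"] is not None,  # No Null Values on "sku"
    lambda row: row["price"] >= 0,       # Custom Rule on "price"
]

loaded, held_back = split_rows(rows, checks)
error_fraction = len(held_back) / len(rows)
threshold_exceeded = error_fraction > MAX_ERROR_FRACTION

print(f"loaded={len(loaded)} held_back={len(held_back)}")
print(f"threshold exceeded: {threshold_exceeded}")
```

With this sample data, two of the four rows are held back, which exceeds the assumed 25% limit.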
For guidance on resolving validation failures, see Reviewing and Resolving Failures.