Add Metadata

Quite often, you need to analyze not only the raw data but also information about the source files that contain it. For example, you may want to tell apart records that came from different files, or monitor how files are added to the source.

Such “data about data” is called metadata, and you can add it in the Wrangler. To do so, click “+Add Script Step” in the Wrangler and scroll to the Metadata section.

Add the Byte Offset Within the File

The byte offset shows how far a record or row is from the beginning of the file, measured in bytes.

Adding the byte offset within the file
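
To make the idea of a byte offset concrete, here is a minimal Python sketch that computes the offset of each line in a small in-memory file. It is an illustration only, not Etleap's implementation: the function name `line_byte_offsets` is hypothetical, and the sketch assumes that the offset points at the first byte of each line.

```python
def line_byte_offsets(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, line) pairs; offset is the number of bytes before the line.

    Assumption for illustration: the offset marks the first byte of each line.
    """
    offsets = []
    position = 0
    # keepends=True keeps the line terminators, so their bytes are counted too.
    for line in data.splitlines(keepends=True):
        offsets.append((position, line))
        position += len(line)
    return offsets


sample = b"id,name\n1,Ann\n2,Bo\n"
rows = line_byte_offsets(sample)
for offset, line in rows:
    print(offset, line)
```

Because the offset grows with every row in a file, it can also serve as a tiebreaker when you need to restore the original row order within one file.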

Add the File Modification Time

For file-based sources like S3 or SFTP, this is the “last modified at” timestamp that you can see in your file manager. The column is populated with date-time values at millisecond precision. For non-file sources, this is the time at which Etleap extracted the data.

Adding the file modification timestamp (last modified) column
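
For a local file, a comparable “last modified” value with millisecond precision can be read as in the sketch below. This only illustrates what such a timestamp looks like; the function names are hypothetical, and the use of UTC is an assumption, since the time zone is not specified here.

```python
import os
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millisecond_timestamp(mtime_ns: int) -> datetime:
    """Convert nanoseconds since the Unix epoch to a UTC datetime.

    The value is truncated to whole milliseconds (UTC is an assumption).
    """
    return EPOCH + timedelta(milliseconds=mtime_ns // 1_000_000)


def last_modified(path: str) -> datetime:
    """Read the modification time of a local file, truncated to milliseconds."""
    return to_millisecond_timestamp(os.stat(path).st_mtime_ns)


example = to_millisecond_timestamp(1_700_000_000_123_456_789)
print(example.isoformat(timespec="milliseconds"))
```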

Add the File Path

The file path can be added as a string column. For file-based sources like S3 or SFTP, this is the file name including the parent folder(s) in the source. For S3, it does not include the bucket name. For databases, the file path shows the input table name. For all others, it shows the entity name (the item you selected when deciding what to Wrangle).

Adding the file path column
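
As an illustration of the S3 case, the sketch below derives a path without the bucket name from a full S3 URI. The URI, the function name `path_without_bucket`, and the decision to drop the leading slash are assumptions made for this example; the exact format of the value Etleap produces may differ.

```python
from urllib.parse import urlparse


def path_without_bucket(s3_uri: str) -> str:
    """Return the folder(s) and file name of an S3 URI, without the bucket."""
    parsed = urlparse(s3_uri)
    # parsed.netloc holds the bucket name; parsed.path holds the object key.
    return parsed.path.lstrip("/")


# Hypothetical bucket and key, for illustration only.
example_path = path_without_bucket("s3://my-bucket/exports/2024/orders.csv")
print(example_path)
```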
Note

This metadata comes in handy when you need to ensure data consistency. You can check whether all file names follow the expected naming convention; if they do not, a file may be named incorrectly or the file name filter may be defined incorrectly. Additionally, file names often include a timestamp, so you can check when the files were created and whether any are missing.
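
The consistency checks described in this note can be sketched in Python once the file path values have been exported. The naming convention (`exports/orders_YYYY-MM-DD.csv`), the one-file-per-day expectation, and the function name `audit_file_paths` are all hypothetical and chosen only for this example.

```python
import re
from datetime import date, timedelta

# Hypothetical naming convention: exports/orders_YYYY-MM-DD.csv
PATTERN = re.compile(r"^exports/orders_(\d{4})-(\d{2})-(\d{2})\.csv$")


def audit_file_paths(paths):
    """Return (unexpected_paths, missing_dates) for a list of file paths."""
    unexpected = []
    seen = set()
    for path in paths:
        match = PATTERN.match(path)
        if match is None:
            unexpected.append(path)
            continue
        try:
            seen.add(date(int(match[1]), int(match[2]), int(match[3])))
        except ValueError:
            # The digits matched the pattern but do not form a real date.
            unexpected.append(path)
    missing = []
    if seen:
        # Assumes one file per day between the first and last date seen.
        day, last = min(seen), max(seen)
        while day <= last:
            if day not in seen:
                missing.append(day)
            day += timedelta(days=1)
    return unexpected, missing


paths = [
    "exports/orders_2024-01-01.csv",
    "exports/orders_2024-01-02.csv",
    "exports/orders_2024-01-04.csv",
    "exports/Orders-20240103.csv",
]
unexpected, missing = audit_file_paths(paths)
print(unexpected, missing)
```

In this sample, the misnamed file is reported as unexpected, and its date shows up as missing because no correctly named file covers it.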