
Partition Output by Values

Partitions data in your data lake using column values. For more information, see Data Partitioning.

Example

id  path       event_type  region
0   /logout    click       east1
1   /home      pageload    east1
2   /user      pageload    east1
3   /home      pageload    west1
4   /logout    logout      west2
5   /settings  click       east1

id  path       partitions
0   /logout    click, east1
5   /settings  click, east1

id  path       partitions
1   /home      pageload, east1
2   /user      pageload, east1

id  path       partitions
3   /home      pageload, west1

id  path       partitions
4   /logout    logout, west2
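
The grouping in this example can be reproduced with a short Python sketch. It is an illustration only, not the transform's actual implementation: the function name partition_rows and the in-memory row format are hypothetical. It groups rows by the values of the selected columns and drops those columns from each row, as the example shows.

```python
from collections import defaultdict

# Sample rows from the example above.
rows = [
    {"id": 0, "path": "/logout", "event_type": "click", "region": "east1"},
    {"id": 1, "path": "/home", "event_type": "pageload", "region": "east1"},
    {"id": 2, "path": "/user", "event_type": "pageload", "region": "east1"},
    {"id": 3, "path": "/home", "event_type": "pageload", "region": "west1"},
    {"id": 4, "path": "/logout", "event_type": "logout", "region": "west2"},
    {"id": 5, "path": "/settings", "event_type": "click", "region": "east1"},
]


def partition_rows(rows, partition_columns):
    """Group rows by their partition column values (hypothetical helper).

    The partition columns are removed from each row, because their values
    are carried by the partition key instead.
    """
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in partition_columns)
        remaining = {k: v for k, v in row.items() if k not in partition_columns}
        partitions[key].append(remaining)
    return dict(partitions)


partitions = partition_rows(rows, ["event_type", "region"])
for key, members in partitions.items():
    print(", ".join(key), "->", members)
```

Each distinct combination of event_type and region becomes one group, which matches the four output tables above.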

Configuration

Column

Select the columns that the transformation output data will be partitioned by.

Any columns selected in this step will be removed from the output data and encoded in the path of the output files.

Operation

Leave the Operation value as Partition Output for this transformation.

Screenshot of configuring the Partition Output transform

Effect on Output Data

When this step is added to a transformation script, the output of that transformation is partitioned by the selected columns. When more than one column is selected as a partition key, the order in which the columns are specified in the transform determines the order in which the data is partitioned.

For example, if the event_type and region columns are selected as partition keys, in that order, the files loaded to the Data Lake destination will be organized into the following directory structure:

s3://bucket-name/output_path/v{version}/{event_type}/{region}/{load_date}/{is_deleted}

There will be one directory for each distinct combination of values across the selected partition columns. If the Data Lake connection is configured with a Glue database, then the tables created in Glue will contain a partition for each combination of column values in the selected columns.
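
As a rough sketch of how the column order maps onto the directory layout shown above, the following Python fills in the path template for one combination of values. The function name partition_prefix is hypothetical, and the values used for version, load_date, and is_deleted are made-up placeholders; the exact way the platform encodes each value in the path is an assumption here.

```python
def partition_prefix(base, version, partition_values, load_date, is_deleted):
    """Build a directory prefix following the template
    {base}/v{version}/{partition values...}/{load_date}/{is_deleted}.

    Hypothetical helper: partition_values must be given in the same
    order as the columns selected in the transform.
    """
    parts = [
        base.rstrip("/"),
        f"v{version}",
        *partition_values,
        load_date,
        str(is_deleted),
    ]
    return "/".join(parts)


base = "s3://bucket-name/output_path"

# Columns selected as event_type, then region.
first = partition_prefix(base, 1, ["click", "east1"], "2024-01-01", "false")

# Selecting region before event_type swaps the directory nesting.
second = partition_prefix(base, 1, ["east1", "click"], "2024-01-01", "false")

print(first)
print(second)
```

Changing the order of the selected columns changes only the nesting of the directories, not the set of distinct combinations.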

See Data Partitioning for more information.

Key Considerations

  • This transform can only be added to a Wrangler script once.
  • This transform is only valid for S3 Data Lake destinations.