Iceberg Table Maintenance

Etleap automatically performs two types of maintenance operations on your Iceberg tables:

  1. Compaction.
  2. Snapshot Expiry.

[Figure: Maintenance activities for a batch-based Iceberg pipeline]
[Figure: Maintenance activities for a streaming-based Iceberg pipeline]

Compaction

Compaction rewrites the data files and delete files of an Iceberg table, combining them into fewer, larger files. This reduces the number of files the Iceberg reader needs to scan and improves the performance of your Iceberg queries.
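As a rough illustration of how many small files can be combined into fewer, larger ones, the sketch below greedily groups files up to a target size. The function name, the 512 MB target, and the greedy strategy are assumptions made for this example; they do not describe how Etleap or Iceberg actually plan a compaction.

```python
def plan_compaction_groups(file_sizes_mb, target_mb=512):
    """Group file sizes so each group stays at or below target_mb where possible.

    Hypothetical helper: target_mb is an assumed value, not an Etleap setting.
    """
    groups, current, current_size = [], [], 0
    # Largest files first, so small files fill the remaining space in a group.
    for size in sorted(file_sizes_mb, reverse=True):
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


groups = plan_compaction_groups([100, 200, 300, 50, 400])
print(len(groups), "output files planned from 5 input files")
```

Each group would be rewritten as one output file, so a reader scans fewer files after compaction than before.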

Etleap runs compactions on Iceberg tables when there have been more than 10,000 row operations (inserts, updates, or deletes) on that table since the previous compaction.
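The trigger rule above can be sketched as a simple threshold check. The function and constant names are hypothetical. Only the 10,000 threshold and the kinds of row operations counted come from the text; how Etleap tracks these counts internally is not documented here.

```python
# Threshold from the documentation; all names here are illustrative only.
ROW_OPERATION_THRESHOLD = 10_000


def should_compact(inserts: int, updates: int, deletes: int) -> bool:
    """Return True when row operations since the last compaction exceed the threshold."""
    row_operations = inserts + updates + deletes
    # "More than 10,000" is read as a strict comparison.
    return row_operations > ROW_OPERATION_THRESHOLD


print(should_compact(inserts=6_000, updates=3_000, deletes=1_000))  # False: exactly 10,000
print(should_compact(inserts=6_000, updates=3_000, deletes=1_001))  # True: 10,001
```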

Find more information on compaction in the Iceberg spec.

Snapshot Expiry

Snapshot expiry removes old versions of the table from the Iceberg metadata, and cleans up any data files that are no longer referenced by the remaining snapshots. It does not affect any data in the snapshots that are kept, so all data in the active version of the table remains queryable.

This prevents bloating of the Iceberg table’s metadata and reduces both read and write times to the table. However, it does limit how far back you can time travel within the table’s history.

Etleap runs snapshot expiry regularly whenever rows in the table have changed. The two latest snapshots on the table’s main branch are always retained; any older snapshots are expired to limit the disk space used by the table.
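A minimal sketch of the retention rule, assuming each snapshot carries a commit timestamp: the two most recent snapshots are kept and the rest are marked for expiry. The `Snapshot` class and `split_for_expiry` function are hypothetical names for this illustration, not part of Etleap or the Iceberg API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at_ms: int  # commit time in epoch milliseconds (assumed field)


def split_for_expiry(main_snapshots, keep_latest=2):
    """Return (retained, expired) lists for the main branch, newest first."""
    ordered = sorted(main_snapshots, key=lambda s: s.committed_at_ms, reverse=True)
    return ordered[:keep_latest], ordered[keep_latest:]


history = [Snapshot(1, 100), Snapshot(2, 200), Snapshot(3, 300), Snapshot(4, 400)]
retained, expired = split_for_expiry(history)
print([s.snapshot_id for s in retained])  # [4, 3]
print([s.snapshot_id for s in expired])   # [2, 1]
```

With this rule, time travel is only possible to the retained snapshots. In the example, snapshots 1 and 2 are no longer reachable after expiry.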

Find more information on snapshot expiry in the Iceberg spec.

Equality and Position Deletes for Update-Mode Pipelines

To ensure fast data ingestion while maintaining excellent query performance on your main table, Etleap uses a dual-branch architecture:

  • real-time branch: Receives high-frequency streaming writes from your pipeline. This branch prioritizes ingestion speed using equality deletes.

  • main branch: Contains the data you query in your warehouse. Etleap automatically converts data from the real-time branch to this optimized main branch using position deletes for better query performance.
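To illustrate the difference between the two delete styles, the sketch below resolves equality deletes (delete every row whose key matches a value) into position deletes (delete the row at a given position in a given file). It is a simplified model built on assumptions: files are plain Python lists, the key column is named `id`, and sequence numbers, which the Iceberg spec uses to scope which data files a delete applies to, are ignored. It does not show Etleap's actual conversion process.

```python
def to_position_deletes(data_files, equality_deletes, key="id"):
    """Resolve equality deletes into (file_path, row_position) pairs.

    data_files: dict mapping a file path to its list of row dicts (assumed layout).
    equality_deletes: set of key values whose rows should be deleted.
    """
    positions = []
    for path, rows in data_files.items():
        for pos, row in enumerate(rows):
            # An equality delete forces this comparison on every row read;
            # a position delete already records the answer.
            if row[key] in equality_deletes:
                positions.append((path, pos))
    return positions


files = {
    "a.parquet": [{"id": 1}, {"id": 2}],
    "b.parquet": [{"id": 2}, {"id": 3}],
}
print(to_position_deletes(files, {2}))  # [('a.parquet', 1), ('b.parquet', 0)]
```

Writing an equality delete is cheap because the writer does not need to locate the affected rows, which suits high-frequency ingestion. Resolving it to positions ahead of time moves that work out of the query path.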

Snapshot expiry maintenance will always preserve the two most recent snapshots on the main branch, along with any snapshots on the real-time branch that have not yet been converted. All other snapshots will be removed.
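The dual-branch retention rule can be sketched as follows. The `BranchSnapshot` class, its `converted` flag, and the branch labels are hypothetical modeling choices for this example; the source does not describe how Etleap tracks conversion state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchSnapshot:
    snapshot_id: int
    branch: str              # "main" or "real-time" (labels assumed for this sketch)
    committed_at_ms: int
    converted: bool = False  # only meaningful for real-time snapshots


def snapshots_to_retain(snapshots):
    """Return the set of snapshot IDs that expiry must preserve."""
    main = sorted(
        (s for s in snapshots if s.branch == "main"),
        key=lambda s: s.committed_at_ms,
        reverse=True,
    )
    # Rule 1: the two most recent snapshots on the main branch.
    keep = {s.snapshot_id for s in main[:2]}
    # Rule 2: real-time snapshots not yet converted to the main branch.
    keep |= {
        s.snapshot_id
        for s in snapshots
        if s.branch == "real-time" and not s.converted
    }
    return keep


snaps = [
    BranchSnapshot(1, "main", 100),
    BranchSnapshot(2, "main", 200),
    BranchSnapshot(3, "main", 300),
    BranchSnapshot(10, "real-time", 250, converted=True),
    BranchSnapshot(11, "real-time", 350, converted=False),
]
print(sorted(snapshots_to_retain(snaps)))  # [2, 3, 11]
```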

To enable this feature for your organization, please contact support@etleap.com.