Iceberg Connection Setup
This feature is currently in preview
Iceberg connections may contain bugs and rough edges, and the properties they require may change without notice. Please also familiarize yourself with the considerations at the end of this document.
This page details the manual setup steps required to configure an Iceberg connection in Etleap.
Etleap creates and manages Iceberg tables by writing files to S3 and interacting with your AWS Glue catalog. To set up an Iceberg connection in Etleap, you will need to provide the following properties:
- IAM Role: An AWS IAM role that grants Etleap permission to read from and write to S3 and to manage tables in AWS Glue.
- S3 Bucket Name and Path Prefix: The location in S3 that Iceberg metadata and data files will be written to.
- AWS Glue Catalog: The name and region of the Glue catalog that will store the definitions of your Iceberg tables.
- Warehouse Connection (Optional): A Redshift or Snowflake connection that Etleap will create external Iceberg tables in.
Step 1. Create an S3 Bucket and Glue Database
It’s important that all the AWS resources below are created in the same AWS account and region. The URLs below assume that us-east-1 is your preferred region, but you can use any region you like.
1. Go here to create a new S3 bucket to store the Iceberg tables. If there is an existing bucket you would like to use, you may skip this step.
   - Give it a memorable name and leave the other fields with their default values.
2. Go here to create a Glue database. If there is an existing Glue database you would like to use, you may skip this step.
   - Click Add database, specify a name and, optionally, a description. Leave the other fields blank.
   - Click Create database.
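If you prefer to script this step, the same resources can be created with the AWS SDK. Below is a minimal boto3 sketch, assuming us-east-1 and placeholder names for the bucket and database; substitute your own values.

```python
import boto3

REGION = "us-east-1"
BUCKET = "my-etleap-iceberg-bucket"   # placeholder name
GLUE_DATABASE = "etleap_iceberg"      # placeholder name

# Create the S3 bucket (in us-east-1, create_bucket takes no
# CreateBucketConfiguration; other regions require one).
s3 = boto3.client("s3", region_name=REGION)
s3.create_bucket(Bucket=BUCKET)

# Create the Glue database that will hold the Iceberg table definitions.
glue = boto3.client("glue", region_name=REGION)
glue.create_database(
    DatabaseInput={
        "Name": GLUE_DATABASE,
        "Description": "Iceberg tables managed by Etleap",
    }
)
```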
Step 2. Create an IAM Role
1. Create an IAM role for Etleap to assume in the AWS console here.
2. Select Trusted Entity Type AWS account and then choose Another AWS account.
3. Check the box that says Require External ID.
4. For the Account ID and External ID, enter the IDs provided in the instructions dropdown within the Role section of the Etleap Iceberg connection setup page.
5. Skip adding permissions for now. Click Next until you reach the Review page.
6. Enter a name for the role and click Create role.
7. Find the role you created. In the Permissions tab, under Permissions policies, click Add permissions and pick Create inline policy.
8. Click JSON, paste the policy below, and make the following replacements:
- ICEBERG_BUCKET_NAME is the bucket created in step 1.1.
- AWS_ACCOUNT_ID is your 12-digit account ID.
- GLUE_DATABASE_NAME is the Glue database created in step 1.2.
- AWS_REGION is the region the resources have been created in.
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"], "Resource": ["arn:aws:s3:::ICEBERG_BUCKET_NAME", "arn:aws:s3:::ICEBERG_BUCKET_NAME/*"] }, { "Effect": "Allow", "Action": [ "glue:GetTable", "glue:GetDatabase", "glue:CreateTable", "glue:UpdateTable", "glue:DeleteTable" ], "Resource": [ "arn:aws:glue:AWS_REGION:AWS_ACCOUNT_ID:catalog", "arn:aws:glue:AWS_REGION:AWS_ACCOUNT_ID:database/GLUE_DATABASE_NAME", "arn:aws:glue:AWS_REGION:AWS_ACCOUNT_ID:table/GLUE_DATABASE_NAME/*" ] } ] }
9. Click Next and give the policy a meaningful name, e.g. etleap_iceberg.
10. Click Create policy.
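For reference, the console steps above can also be scripted with boto3. This is a sketch, not the authoritative procedure: the role name and policy file are placeholders, and the Account ID and External ID must be the values from the Etleap setup page. The trust policy mirrors what the console wizard generates for Another AWS account with Require External ID.

```python
import json
import boto3

ETLEAP_ACCOUNT_ID = "000000000000"  # from the Etleap setup page
EXTERNAL_ID = "your-external-id"    # from the Etleap setup page

# Trust policy equivalent to choosing "Another AWS account" with
# "Require External ID" checked in the console wizard.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{ETLEAP_ACCOUNT_ID}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
        }
    ],
}

iam = boto3.client("iam")
iam.create_role(
    RoleName="etleap-iceberg",  # placeholder role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the inline policy from step 8, saved locally with the
# placeholders already replaced.
with open("etleap_iceberg_policy.json") as f:
    iam.put_role_policy(
        RoleName="etleap-iceberg",
        PolicyName="etleap_iceberg",
        PolicyDocument=f.read(),
    )
```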
Step 3. Configure a Data Warehouse (Optional)
You can configure the Iceberg connection to make tables accessible in either Redshift or Snowflake. Both require a one-time setup to configure access to S3 and Glue.
Redshift
Follow these steps to configure an external Iceberg schema in Redshift.
When an Iceberg connection is set up to point to Redshift, the Iceberg tables Etleap creates will be accessible in Redshift. In addition, Etleap will create and refresh a materialized view to provide a type-1 view of the Iceberg table.
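For orientation, the statement involved looks roughly like the sketch below, here submitted through the Redshift Data API. The workgroup, database, role ARN, and Glue database name are placeholders, and the linked guide remains the authoritative reference.

```python
import boto3

client = boto3.client("redshift-data", region_name="us-east-1")

# Register the Glue database as an external schema in Redshift so the
# Iceberg tables become queryable.
client.execute_statement(
    WorkgroupName="my-workgroup",  # use ClusterIdentifier=... for a provisioned cluster
    Database="dev",
    Sql="""
        CREATE EXTERNAL SCHEMA IF NOT EXISTS etleap_iceberg
        FROM DATA CATALOG
        DATABASE 'GLUE_DATABASE_NAME'
        REGION 'us-east-1'
        IAM_ROLE 'arn:aws:iam::AWS_ACCOUNT_ID:role/my-redshift-glue-role';
    """,
)
```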
Snowflake
Follow these steps to configure an external volume and catalog integration in Snowflake.
When an Iceberg connection is set up to point to Snowflake, Etleap will create and refresh an external Iceberg table in Snowflake. In addition, Etleap will create a type-1 view on top of the external Iceberg table.
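For orientation, the objects the linked guide describes look roughly like the sketch below, here submitted with the Snowflake Python connector. Every identifier, ARN, and the S3 path is a placeholder; follow the linked steps for the exact values.

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...", role="ACCOUNTADMIN"
)
cur = conn.cursor()

# External volume pointing at the S3 location from step 1.
cur.execute("""
    CREATE EXTERNAL VOLUME IF NOT EXISTS etleap_iceberg_volume
      STORAGE_LOCATIONS = ((
        NAME = 'iceberg-s3'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://ICEBERG_BUCKET_NAME/PATH_PREFIX/'
        STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::AWS_ACCOUNT_ID:role/snowflake-s3-role'
      ))
""")

# Catalog integration pointing at the Glue database from step 1.
cur.execute("""
    CREATE CATALOG INTEGRATION IF NOT EXISTS etleap_glue_catalog
      CATALOG_SOURCE = GLUE
      CATALOG_NAMESPACE = 'GLUE_DATABASE_NAME'
      TABLE_FORMAT = ICEBERG
      GLUE_AWS_ROLE_ARN = 'arn:aws:iam::AWS_ACCOUNT_ID:role/snowflake-glue-role'
      GLUE_CATALOG_ID = 'AWS_ACCOUNT_ID'
      GLUE_REGION = 'us-east-1'
      ENABLED = TRUE
""")
```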
Step 4. Create an Iceberg Connection in Etleap
Create a new connection in Etleap and select Iceberg as the type. Complete the connection setup with the following properties:
- The IAM role created in step 2.
- The S3 bucket you created in step 1.1.
- The Glue database you created in step 1.2.
- Optionally, the Redshift or Snowflake connection you created in step 3.
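Before saving the connection, you can optionally sanity-check the role from your own account with IAM's policy simulator. A sketch with placeholder ARNs:

```python
import boto3

iam = boto3.client("iam")

# Representative S3 and Glue actions from the inline policy; replace
# the placeholder ARNs with your real values.
checks = [
    ("s3:PutObject", "arn:aws:s3:::ICEBERG_BUCKET_NAME/some/key"),
    ("glue:CreateTable",
     "arn:aws:glue:AWS_REGION:AWS_ACCOUNT_ID:table/GLUE_DATABASE_NAME/example"),
]
for action, resource in checks:
    result = iam.simulate_principal_policy(
        PolicySourceArn="arn:aws:iam::AWS_ACCOUNT_ID:role/etleap-iceberg",
        ActionNames=[action],
        ResourceArns=[resource],
    )
    decision = result["EvaluationResults"][0]["EvalDecision"]
    print(f"{action}: {decision}")  # expect "allowed"
```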
Key Considerations
This feature is currently in preview. As a result, the following caveats apply.
Iceberg Connection Limitations
- Setting, unsetting, or changing the warehouse connection property on an existing Iceberg connection is not currently supported.
Iceberg Pipeline Limitations
Currently only the following are supported as sources for Iceberg pipelines:
- CDC-enabled databases.
- S3 sources when the “Trigger transformations through events” pipeline source option is enabled.
- Event Streams.
Only the following transforms are supported for Iceberg pipelines:
- Change type of column.
- Convert all column names from CamelCase to snake_case.
- Extract Root-Level Fields from JSON Object.
- Flatten JSON Object.
- Interpret Column as Date/DateTime.
- Interpret Column as Timestamp.
- Rename Columns.
- Split JSON Array to Rows.
- Split Column on Symbol.
Splitting columns on positions is not currently supported.
Parsing errors result in a pipeline stoppage; please reach out to support for details on the parsing error.
CDC Iceberg Pipeline Limitations
- Automatic schema changes are not currently supported for CDC Iceberg destinations. Schema change notifications will still be sent, and can be resolved by updating the script manually in the Wrangler.
- Replace-mode is not currently supported for CDC Iceberg pipelines.
Event-Triggered Iceberg Pipeline Limitations
- Replace-mode is not currently supported for event-driven Iceberg pipelines.
- Only the following source file formats are supported for Iceberg pipelines:
Event Stream Iceberg Pipeline Limitations
- Schema changes are not currently detected for event stream Iceberg pipelines. New columns can be added to the event body by manually updating the script in the Wrangler.
- File and folder selection during pipeline creation is only used for deciding what sample to show in the Wrangler. When the pipeline is running, this selection is ignored and all events sent to the webhook listed on the source connection will be processed.
- Only events sent to the webhook within the last 365 days will be processed by the pipeline. Refreshing a pipeline will remove any data sent more than 365 days before the refresh was started.
- Pipelines that load to Iceberg can only be created from event streaming connections that were created after 2025-07-17.