Connection Setup
Etleap needs write access to the S3 bucket that you want to load data to. To this end, you can create an AWS IAM role that’s set up to have minimal access, i.e. read and write access to the destination (aka data lake) bucket, read access to the intermediate data bucket, and permission to manage tables in your Glue Data Catalog database.
If you don’t already have a Glue Data Catalog database you would like to use for Etleap to write table schemas to, go to the Glue databases section of the AWS console and click Add Database.
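If you prefer to script this step instead of using the console, the sketch below creates the database only when it does not already exist. It assumes you pass in a Glue client such as the one returned by `boto3.client("glue", region_name=...)`; `get_database`, `create_database`, and `EntityNotFoundException` belong to boto3's Glue client, while the helper name `ensure_glue_database` is hypothetical.

```python
from typing import Any


def ensure_glue_database(glue_client: Any, name: str, description: str = "") -> bool:
    """Create the Glue Data Catalog database `name` unless it already exists.

    Hypothetical helper (not part of Etleap). `glue_client` is assumed to be
    a boto3 Glue client. Returns True if the database was created and False
    if it already existed.
    """
    try:
        # get_database raises EntityNotFoundException when the database is missing.
        glue_client.get_database(Name=name)
        return False
    except glue_client.exceptions.EntityNotFoundException:
        pass

    # Only include a description when one was given.
    database_input = {"Name": name}
    if description:
        database_input["Description"] = description
    glue_client.create_database(DatabaseInput=database_input)
    return True
```

Checking first makes the script safe to re-run: a second call leaves an existing database untouched instead of failing.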
Here’s how to set up the role and the policy so we can load into the data lake:
- Go to the IAM section of the AWS console.
- Click Create role, and then select Another AWS account.
- Under Account ID, for hosted deployments, enter 223848809711. For VPC deployments, enter the AWS account ID where the instance is deployed. Contact support if you have trouble finding the account ID.
- Check the box that says Require External ID, and enter the ID provided in the instructions dropdown in the Role section of the connection setup page (step 2).
- Click Next until you reach the Review page, i.e. don’t add any permissions at this time.
- Enter a role name.
- Click Create role. You’ve now created a role without any access to your AWS resources.
- Find the role in the list, and click it.
- Copy the “Role ARN”, and enter it into the Role section on the Etleap setup page (step 2).
- In the “Permissions” tab, click the “Add inline policy” link.
- Click the “JSON” tab.
- In the statement below, substitute:
  - Your bucket name for <DESTINATION_BUCKET>.
  - The subfolder (key prefix) in the bucket that you want Etleap to use for <CUSTOM_PREFIX>.
  - The intermediate bucket name set up previously for <INTERMEDIATE_BUCKET>.
  - The region your Glue Data Catalog database is in for <REGION>.
  - Your 12-digit AWS account ID for <AWS_ID>.
  - The name of your Glue Data Catalog database for <GLUE_DATABASE>.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::<INTERMEDIATE_BUCKET>/output/*"]
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<INTERMEDIATE_BUCKET>",
      "Condition": {
        "StringLike": {
          "s3:prefix": "output*"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::<DESTINATION_BUCKET>/*"]
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<DESTINATION_BUCKET>",
      "Condition": {
        "StringLike": {
          "s3:prefix": "<CUSTOM_PREFIX>*"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:CreateTable",
        "glue:UpdateTable",
        "glue:DeleteTable",
        "glue:BatchCreatePartition",
        "glue:GetPartitions"
      ],
      "Resource": [
        "arn:aws:glue:<REGION>:<AWS_ID>:table/<GLUE_DATABASE>/*",
        "arn:aws:glue:<REGION>:<AWS_ID>:database/<GLUE_DATABASE>",
        "arn:aws:glue:<REGION>:<AWS_ID>:catalog"
      ]
    }
  ]
}
- Paste the edited policy into the text box, then click “Review policy”.
- For “Name”, enter anything you like, e.g. “etleap_minimal_access_policy”.
- Click “Create policy”.
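Choosing Another AWS account and checking Require External ID gives the role a trust policy that lets the trusted account assume it only when the caller presents the external ID. The sketch below builds a policy of that general shape so you can compare it with the Trust relationships tab of your role; the console's exact output may differ slightly, and the helper `build_trust_policy` and the sample external ID are hypothetical.

```python
import json
import re


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Return a cross-account trust policy that requires an external ID.

    Hypothetical helper: `trusted_account_id` is the value entered under
    Account ID (223848809711 for hosted deployments) and `external_id` is
    the ID shown on the Etleap connection setup page.
    """
    if not re.fullmatch(r"\d{12}", trusted_account_id):
        raise ValueError("AWS account IDs are 12 digits")
    if not external_id:
        raise ValueError("external ID must not be empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Trusts the account as a whole; that account decides which
                # of its own principals may call AssumeRole.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # AssumeRole succeeds only if the caller passes this external ID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_trust_policy("223848809711", "example-external-id"), indent=2))
```

The external ID condition is what prevents another Etleap customer from pointing their connection at your role ARN: without your external ID, their AssumeRole call is denied.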
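Substituting the placeholders by hand is easy to get wrong, for example by leaving one in place or pasting an account ID with the wrong number of digits. The sketch below fills the `<NAME>` placeholders of the policy above from a dictionary, refuses to continue if any are left without a value, and parses the result as JSON before you paste it. The helper name `fill_policy_template` and the sample values are hypothetical.

```python
import json
import re

# Matches placeholders such as <DESTINATION_BUCKET> in the policy template.
PLACEHOLDER = re.compile(r"<([A-Z_]+)>")


def fill_policy_template(template: str, values: dict) -> dict:
    """Substitute <NAME> placeholders and return the parsed policy.

    Hypothetical helper: raises ValueError if a placeholder has no value,
    if AWS_ID is not 12 digits, or if the result is not valid JSON.
    """
    if "AWS_ID" in values and not re.fullmatch(r"\d{12}", values["AWS_ID"]):
        raise ValueError("AWS_ID must be a 12-digit account ID")

    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise ValueError(f"no value for placeholders: {', '.join(missing)}")

    filled = PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
    # json.JSONDecodeError is a subclass of ValueError.
    return json.loads(filled)


if __name__ == "__main__":
    sample = '{"Resource": "arn:aws:s3:::<DESTINATION_BUCKET>/*"}'
    print(json.dumps(fill_policy_template(sample, {"DESTINATION_BUCKET": "my-lake"}), indent=2))
```

In practice you would pass the full policy text as `template` and supply all six values listed in the substitution step.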
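The two `s3:ListBucket` statements in the policy are limited by a `StringLike` condition on `s3:prefix`, where `*` matches any run of characters and `?` matches exactly one, case-sensitively. The sketch below approximates that matching to show which list prefixes the `output*` condition admits. Because the pattern is a plain string prefix, it also admits prefixes such as `outputs/`, not only `output/`. The `string_like` function is a hypothetical simplification of IAM's matching, not an AWS API.

```python
import re


def string_like(pattern: str, value: str) -> bool:
    """Approximate IAM's case-sensitive StringLike operator.

    `*` matches zero or more characters and `?` matches exactly one;
    every other character must match literally. Hypothetical helper.
    """
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


if __name__ == "__main__":
    for prefix in ["output/", "output/2024/", "outputs/", "input/", "Output/"]:
        print(f"{prefix!r:16} allowed={string_like('output*', prefix)}")
```

If you want listing restricted to a single folder, ending <CUSTOM_PREFIX> with a slash (for example `data/`) makes the resulting `data/*` pattern exclude sibling prefixes such as `data-old/`.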