Azure Blob Storage
Source Setup
To create an Azure Blob Storage connection, you need an App registration in Microsoft Entra ID with federated credentials and permission to read Storage data.
Etleap will ask for the following information:
- From Entra ID (these can be found in the Overview section of your App registration):
  - Application (client) ID
  - Directory (tenant) ID
- From Azure Storage:
  - Storage account
  - Container name
  - Base directory
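As a rough illustration of how the three Azure Storage values fit together, the sketch below combines them into the HTTPS prefix under which the blobs live. It assumes the public-cloud endpoint suffix blob.core.windows.net and assumes the base directory is treated as a path prefix inside the container; the function name `blob_prefix_url` is hypothetical and is not part of Etleap.

```python
from urllib.parse import quote


def blob_prefix_url(storage_account: str, container: str, base_directory: str = "") -> str:
    """Build the URL prefix that the three storage settings point at (illustrative only)."""
    # Assumption: public Azure cloud. Sovereign clouds use a different host suffix.
    url = f"https://{storage_account}.blob.core.windows.net/{quote(container)}"
    # Normalise the base directory so "raw", "/raw", and "/raw/" give the same result.
    directory = base_directory.strip("/")
    if directory:
        url += "/" + quote(directory) + "/"
    return url


if __name__ == "__main__":
    # Placeholder values, not real resources.
    print(blob_prefix_url("examplestorageacct", "landing", "/raw/2024/"))
```

An empty base directory yields the container root, which matches reading every blob in the container.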
Step 1. Role Assignments to your App Registration
- In the Azure portal, go to the resource you want to give Etleap access to (this can be your storage account).
- Go to Access Control (IAM) → Add → Add role assignment.
- Select Storage Blob Data Reader and click Next.
- Click + Select members and add your app.
- Finally, click Review + assign to finish role assignment.
If you want to create multiple connections for different storage accounts, you can instead choose your subscription as the resource in the first bullet of this step, so you do not have to repeat the role assignment for each storage account.
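The difference between assigning the role on one storage account and on the whole subscription comes down to the scope of the role assignment. The sketch below builds the two Azure resource ID strings involved; the helper name `role_assignment_scope` and the placeholder IDs are hypothetical, and the portal derives these scopes for you when you follow the steps above.

```python
from typing import Optional


def role_assignment_scope(
    subscription_id: str,
    resource_group: Optional[str] = None,
    storage_account: Optional[str] = None,
) -> str:
    """Return the Azure resource ID used as the scope of a role assignment."""
    # Subscription-wide scope: the role applies to every storage account in it.
    scope = f"/subscriptions/{subscription_id}"
    # Narrow the scope to a single storage account when both names are given.
    if resource_group and storage_account:
        scope += (
            f"/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
        )
    return scope


if __name__ == "__main__":
    sub = "00000000-0000-0000-0000-000000000000"  # placeholder subscription ID
    print(role_assignment_scope(sub))                        # subscription scope
    print(role_assignment_scope(sub, "example-rg", "acct"))  # single storage account
```

A subscription scope is broader than most connections need, so it trades least privilege for convenience.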
Step 2. Add Federated Credentials to your App Registration
- In Microsoft Entra, go to App Registration → Certificates & Secrets → Federated credentials → Add credential.
- In the Federated credential scenario dropdown, choose Other issuer.
- In the issuer field, enter https://cognito-identity.amazonaws.com
- In the value field, enter the Amazon Cognito identifier that Etleap creates for your organization. You can find it in the Authentication section of the Azure Blob Storage connection setup page in Etleap.
- Choose a name for your credential.
- Change the audience value to the one provided on the connection setup page and click Add.
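For reference, the fields entered above can be summarized as a single record. The sketch below uses the property names of the Microsoft Graph federated identity credential resource (name, issuer, subject, audiences) and assumes that the portal's value field corresponds to subject. The function name and the placeholder strings are hypothetical; the real identifier and audience come from the Etleap connection setup page.

```python
import json

# Issuer from the setup steps above.
COGNITO_ISSUER = "https://cognito-identity.amazonaws.com"


def federated_credential(name: str, subject: str, audience: str) -> dict:
    """Collect the values entered in the 'Add credential' form into one record."""
    if not (name and subject and audience):
        raise ValueError("name, subject, and audience are all required")
    return {
        "name": name,
        "issuer": COGNITO_ISSUER,
        # Assumption: the portal's "value" field maps to "subject".
        "subject": subject,
        # The audience is stored as a list, even when there is only one value.
        "audiences": [audience],
    }


if __name__ == "__main__":
    record = federated_credential(
        name="etleap-blob-storage",
        subject="<identifier-from-etleap-setup-page>",  # placeholder
        audience="<audience-from-etleap-setup-page>",   # placeholder
    )
    print(json.dumps(record, indent=2))
```

Writing the values down in this shape makes it easier to compare what was entered in Entra ID against what the Etleap setup page shows.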
What Data is Available?
Any data stored on the Azure Blob Storage server that your app has access to can be ingested through Etleap as long as it is one of the Supported Types.
How Data is Updated
Depending on your use case, you can configure the pipeline to use any of the modes supported for file sources.
Files that Change
When the Later files add to previous files option is used during pipeline creation for Azure Blob Storage sources, Etleap also provides the choice of reprocessing files that have changed.
- Files do not change means that Etleap will not check whether a file has changed in the source and will only process files with new paths.
- Files sometimes change means that Etleap will check whether files that were already processed have changed. If a file has changed, Etleap fetches the new file, removes the old file’s data from the destination, and adds the changed data. (Note: old data is only removed from warehouse destinations.) This option adds a new column, file_path_hash, to the destination table, which Etleap uses to identify records that came from an old file.
To detect new and changed files, Etleap uses file metadata from Azure Blob Storage including modification timestamps and file sizes. New files are detected by their path while changed files are detected by comparing the file’s modification time and size with the cached values from the previous version of the file.
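The detection rule described above can be sketched in a few lines. This is a minimal illustration under assumptions, not Etleap's implementation: the `FileMeta` type, the `classify` function, and the in-memory cache keyed by path are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FileMeta:
    path: str
    last_modified: datetime
    size: int


def classify(
    listing: list[FileMeta],
    cache: dict[str, FileMeta],
    detect_changes: bool = True,
) -> tuple[list[str], list[str]]:
    """Split a source listing into (new paths, changed paths)."""
    new, changed = [], []
    for meta in listing:
        previous = cache.get(meta.path)
        if previous is None:
            # A path that has never been seen is a new file.
            new.append(meta.path)
        elif detect_changes and (previous.last_modified, previous.size) != (
            meta.last_modified,
            meta.size,
        ):
            # Same path, but modification time or size differs from the cached values.
            changed.append(meta.path)
    return new, changed


if __name__ == "__main__":
    t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    t2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
    cache = {"a.csv": FileMeta("a.csv", t1, 10), "b.csv": FileMeta("b.csv", t1, 20)}
    listing = [FileMeta("a.csv", t1, 10), FileMeta("b.csv", t2, 25), FileMeta("c.csv", t2, 5)]
    print(classify(listing, cache))
```

Setting `detect_changes=False` mirrors the Files do not change option, where only files with new paths are picked up. Because the comparison relies on metadata only, a file rewritten with identical size and modification time would not be flagged in this sketch.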