Data Format
Etleap supports two output formats for your data lake: CSV and Parquet. You choose the format in the last step of pipeline setup. Parquet files are compressed with Snappy.
Due to limitations in AWS Glue, the CSV output format is not supported when the pipeline's destination connection has a Glue database defined.
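The two rules above (CSV or Parquet only, and no CSV when a Glue database is defined) can be expressed as a pre-flight check. This is a minimal sketch for illustration: `OutputConfig`, `validate_output_config` and the `glue_database` field are hypothetical names, not part of any Etleap API.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical representation of a pipeline's data lake output settings.
# This is NOT an Etleap API; it only mirrors the rules described above.
@dataclass
class OutputConfig:
    output_format: str                   # "CSV" or "PARQUET"
    glue_database: Optional[str] = None  # Glue database on the destination connection, if any


SUPPORTED_FORMATS = {"CSV", "PARQUET"}


def validate_output_config(cfg: OutputConfig) -> None:
    """Raise ValueError if the configuration breaks the documented rules."""
    fmt = cfg.output_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {cfg.output_format}")
    # CSV output is not supported when the destination defines a Glue database.
    if fmt == "CSV" and cfg.glue_database:
        raise ValueError("CSV output is not supported with a Glue database; use Parquet")
```

With this sketch, `OutputConfig("PARQUET", glue_database="analytics")` and `OutputConfig("CSV")` both pass, while `OutputConfig("CSV", glue_database="analytics")` raises `ValueError`.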
Parquet Type Mappings
The following table shows how Etleap data types map to Parquet physical and logical types when loading to a data lake destination.
| Etleap Type | Parquet Physical Type | Parquet Logical Type | Notes |
|---|---|---|---|
| INT | int64 | None | |
| BIGINT | binary | None | Values can exceed the int64 range |
| BOOLEAN | boolean | None | |
| NUMBER(s,p) | fixed_len_byte_array | DECIMAL(s,p) | |
| NUMBER | double | None | |
| DATE | int32 | DATE | |
| DATETIME | int64 | TIMESTAMP_MILLIS | Milliseconds are used because AWS Athena and Glue do not currently support microsecond resolution for timestamps |
| STRING | binary | STRING | |
| JSON | binary | JSON | |
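The DATE and DATETIME rows follow the Parquet specification's definitions: DATE is an int32 count of days since 1970-01-01, and TIMESTAMP_MILLIS is an int64 count of milliseconds since the Unix epoch. The sketch below shows that arithmetic in Python. It is an illustration, not Etleap's implementation, and treating naive datetimes as UTC is an assumption made here.

```python
from datetime import date, datetime, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_parquet(d: date) -> int:
    """DATE: int32 number of days since 1970-01-01."""
    return (d - EPOCH_DATE).days


def datetime_to_timestamp_millis(dt: datetime) -> int:
    """TIMESTAMP_MILLIS: int64 milliseconds since the Unix epoch.

    Sub-millisecond precision (microseconds) is dropped.
    Assumption: naive datetimes are interpreted as UTC.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - EPOCH
    # timedelta keeps seconds and microseconds non-negative, so this integer
    # arithmetic floors toward the earlier millisecond, even before 1970.
    return (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000
```

For example, `datetime_to_timestamp_millis(datetime(2024, 1, 1, 0, 0, 0, 123456, tzinfo=timezone.utc))` returns `1704067200123`: the trailing 456 microseconds are dropped, which is the precision loss the DATETIME note describes.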
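The NUMBER(s,p) row stores DECIMAL values in a fixed_len_byte_array. Per the Parquet specification, the bytes hold the unscaled integer (the value multiplied by 10 to the power of the scale) as big-endian two's complement. Below is a minimal sketch in which `precision` is the total digit count and `scale` is the number of digits after the decimal point. Sizing the array to the smallest width that fits the precision is an assumption made here; the specification only requires the width to be large enough.

```python
from decimal import Decimal


def min_bytes_for_precision(precision: int) -> int:
    """Smallest byte width whose signed range holds every `precision`-digit unscaled value."""
    n = 1
    # A signed n-byte integer holds magnitudes up to 2**(8n-1) - 1,
    # and the largest unscaled value is 10**precision - 1.
    while 2 ** (8 * n - 1) < 10 ** precision:
        n += 1
    return n


def encode_decimal(value: Decimal, precision: int, scale: int) -> bytes:
    """Encode `value` as its unscaled integer in big-endian two's complement."""
    unscaled = value.scaleb(scale)  # shift the decimal point right by `scale` digits
    if unscaled != unscaled.to_integral_value():
        raise ValueError(f"{value} has more than {scale} fractional digits")
    unscaled_int = int(unscaled)
    if abs(unscaled_int) >= 10 ** precision:
        raise ValueError(f"{value} does not fit in DECIMAL({precision},{scale})")
    width = min_bytes_for_precision(precision)
    return unscaled_int.to_bytes(width, byteorder="big", signed=True)
```

For example, `encode_decimal(Decimal("123.45"), 10, 2)` stores the unscaled value 12345 in 5 bytes, `00 00 00 30 39` in hex, and a 38-digit precision needs a 16-byte array.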