
File Types

Etleap supports ingesting a wide selection of file types. This page lists the most commonly used file types and formats that Etleap supports. The Wrangler can infer the file type and format from a sample of the input files.
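As a rough illustration of what content-based detection can look like, the sketch below guesses a format from the first bytes of a sample. This page does not describe the Wrangler's actual algorithm, so this is not Etleap's logic. The magic-byte prefixes for Avro, Parquet, gzip, zip and legacy .xls files are standard. The text heuristics and the function name `sniff_format` are hypothetical. Because .xlsx files are zip archives, a prefix check alone reports them as "zip".

```python
# Hypothetical sketch: guess a file's format from a byte sample.
# NOT Etleap's detection logic; the text heuristics below are illustrative assumptions.

# Well-known "magic number" prefixes of binary formats.
MAGIC_PREFIXES = [
    (b"Obj\x01", "avro"),                          # Avro object container file
    (b"PAR1", "parquet"),                          # Parquet files start (and end) with PAR1
    (b"\x1f\x8b", "gzip"),                         # gzip stream
    (b"PK\x03\x04", "zip"),                        # zip archive (.xlsx files are zip archives too)
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "xls"),  # OLE2 container used by legacy .xls
]


def sniff_format(sample: bytes) -> str:
    """Return a best-guess format name for the first bytes of a file."""
    for prefix, name in MAGIC_PREFIXES:
        if sample.startswith(prefix):
            return name
    # Fall back to simple text heuristics (assumption); utf-8-sig drops a leading BOM.
    text = sample.decode("utf-8-sig", errors="replace").lstrip()
    if text.startswith("["):
        return "json_array"
    if text.startswith("{"):
        return "json_lines"
    if text.startswith("<"):
        return "xml"
    first_line = text.split("\n", 1)[0]
    if "\t" in first_line:
        return "tsv"
    if "," in first_line:
        return "csv"
    return "text"


if __name__ == "__main__":
    print(sniff_format(b"PAR1\x15\x04"))          # parquet
    print(sniff_format(b'{"a": 1}\n{"a": 2}\n'))  # json_lines
    print(sniff_format(b"id,name\n1,foo\n"))      # csv
```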

Avro

Etleap supports reading any Avro file, produced by any software, as long as the Avro schema in the file complies with the Apache Avro specification.

The Wrangler automatically detects this file format and uses the Parse as Avro transform.

Compressed Files

Compressed file formats such as .zip and .gz will be read and interpreted based on the data format inside the file.

Note

Etleap currently supports only compressed files that contain a single file, not compressed archives that contain multiple files. If a compressed archive contains multiple files, Etleap will arbitrarily select one of them, which may lead to incomplete data in the destination.
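Because Etleap picks an arbitrary file from a multi-file archive, it can be worth checking archives yourself before ingesting them. The sketch below uses Python's standard `zipfile` module to list the regular files in a .zip archive and to reject archives that hold more than one. The helper names `zip_member_files` and `single_file_in_zip` are hypothetical. The sketch covers .zip only; a bundle such as .tar.gz would need the `tarfile` module instead.

```python
import io
import zipfile
from typing import BinaryIO, List, Union


def zip_member_files(source: Union[str, BinaryIO]) -> List[str]:
    """Return the names of regular files (directory entries excluded) in a .zip archive."""
    with zipfile.ZipFile(source) as archive:
        return [info.filename for info in archive.infolist() if not info.is_dir()]


def single_file_in_zip(source: Union[str, BinaryIO]) -> str:
    """Return the name of the only file in the archive; raise if there are zero or several."""
    members = zip_member_files(source)
    if len(members) != 1:
        raise ValueError(f"expected exactly one file, found {len(members)}: {members}")
    return members[0]


if __name__ == "__main__":
    # Build a two-file archive in memory to demonstrate the check.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("data/part1.csv", "id\n1\n")
        archive.writestr("data/part2.csv", "id\n2\n")
    buffer.seek(0)
    print(zip_member_files(buffer))  # ['data/part1.csv', 'data/part2.csv']
```

Running `single_file_in_zip` on that two-file archive raises `ValueError`, which is the case the note above warns about.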

CSV and TSV

A file with the extension .csv or .tsv whose data is split into rows by newlines and into columns by commas (CSV) or tabs (TSV).

The first transformation step will be Parse CSV.
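To make the row and column structure concrete, here is a small sketch using Python's standard `csv` module, not Etleap's parser. It also shows why a real CSV parser matters: in a quoted field such as `"Smith, Jane"`, the comma is data, not a column separator. The helpers `delimiter_for` and `parse_delimited` are hypothetical, and choosing the delimiter from the file extension is an assumption.

```python
import csv
import io
from typing import List


def delimiter_for(filename: str) -> str:
    """Pick a delimiter from the file extension: tab for .tsv, comma otherwise."""
    return "\t" if filename.lower().endswith(".tsv") else ","


def parse_delimited(text: str, delimiter: str = ",") -> List[List[str]]:
    """Split text into rows on newlines and into columns on the delimiter, honoring quotes."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return list(reader)


if __name__ == "__main__":
    print(parse_delimited('id,name\n1,"Smith, Jane"\n', delimiter_for("people.csv")))
    # [['id', 'name'], ['1', 'Smith, Jane']]
    print(parse_delimited("id\tname\n1\tJane\n", delimiter_for("people.tsv")))
    # [['id', 'name'], ['1', 'Jane']]
```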

Excel

Files with the extensions .xls or .xlsx are recognized as Excel files in the file picker. You can choose the sheets from which you want to extract data. Unless additional sheets are selected, Etleap will extract data from only the first sheet. Additionally, if a glob or regex pattern is used and matches an Excel file, only the first sheet is selected.

The first transformation step will be Parse as Excel.

JSON Arrays

A text file containing a JSON array of JSON objects (e.g. [{"foo":"bar"}, {"foo":"foo"}]). This format is particularly useful when input files contain multi-line JSON objects.

The first transformation step will be Parse as JSON array.

For more details see: JSON Parsing
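The sketch below uses Python's standard `json` module to parse a JSON array whose objects span several lines. It then re-serializes the array with one object per line, the layout that the JSON Data format expects. This is an illustration under that assumption, not an Etleap feature. `parse_json_array` and `to_json_lines` are hypothetical helper names.

```python
import json
from typing import Any, Dict, List

# A JSON array whose objects span several lines.
ARRAY_TEXT = """[
  {"foo": "bar",
   "nested": {"n": 1}},
  {"foo": "foo"}
]"""


def parse_json_array(text: str) -> List[Dict[str, Any]]:
    """Parse text that must be a JSON array of JSON objects."""
    data = json.loads(text)
    if not isinstance(data, list) or not all(isinstance(item, dict) for item in data):
        raise ValueError("expected a JSON array of JSON objects")
    return data


def to_json_lines(records: List[Dict[str, Any]]) -> str:
    """Re-serialize records as compact JSON, one object per line."""
    return "".join(json.dumps(record, separators=(",", ":")) + "\n" for record in records)


if __name__ == "__main__":
    print(to_json_lines(parse_json_array(ARRAY_TEXT)), end="")
    # {"foo":"bar","nested":{"n":1}}
    # {"foo":"foo"}
```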

JSON Data

Any file containing lines of JSON objects delimited by newlines. Both flat and nested JSON objects are supported.

The first transformation steps will be Split data repeatedly on newline and Flatten nested JSON object in data.

For more details see: JSON Parsing

Note

For this file format, each line must contain exactly one JSON object; multi-line JSON objects are not supported.
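The two steps named above, splitting on newlines and then flattening nested objects, can be sketched in Python as follows. This is a hypothetical illustration, not Etleap's implementation. In particular, joining nested keys with a dot (`user.name`) is an assumption, and Etleap's actual column naming may differ. The sketch also shows why multi-line objects fail: each physical line of such an object is only a fragment and does not parse on its own.

```python
import json
from typing import Any, Dict, List


def split_lines(text: str) -> List[Dict[str, Any]]:
    """Parse newline-delimited JSON: one complete object per non-blank line."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            # A multi-line object lands here: each physical line is only a fragment.
            raise ValueError(f"line {line_number} is not a complete JSON value: {exc}") from exc
        if not isinstance(value, dict):
            raise ValueError(f"line {line_number} is not a JSON object")
        records.append(value)
    return records


def flatten(obj: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested objects into one level; dotted key names are an assumption."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))  # recurse into nested objects
        else:
            flat[name] = value  # scalars and arrays are kept as-is
    return flat


if __name__ == "__main__":
    data = '{"id": 1, "user": {"name": "Ann", "address": {"city": "Oslo"}}}\n{"id": 2}\n'
    for record in split_lines(data):
        print(flatten(record))
    # {'id': 1, 'user.name': 'Ann', 'user.address.city': 'Oslo'}
    # {'id': 2}
```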

Other Plain-Text Formats

Any other plain-text file, such as .txt. The Wrangler will select the most suitable default transformation step.

Parquet

Etleap supports reading any Parquet file, produced by any software, as long as the Parquet schema in the file complies with the Apache Parquet specification. This includes files that use compression codecs such as Snappy.

The Wrangler automatically detects this file format and uses the Parse as Parquet transform.

Parquet Type Mapping

The table below describes how Parquet types are mapped to Etleap types.

| Parquet Logical Type | Parquet Physical Type | Etleap Type | Notes |
|---|---|---|---|
| (none) | boolean | BOOLEAN | |
| (none) | int32 | BIGINT | |
| (none) | int64 | BIGINT | |
| (none) | int96 | STRING | Base64 encoding of the binary value |
| (none) | float | DOUBLE | |
| (none) | double | DOUBLE | |
| (none) | fixed_len_byte_array | STRING | Base64 encoding of the binary value |
| (none) | binary | STRING | Base64 encoding of the binary value |
| STRING | binary | STRING | Interpreted as UTF-8 |
| DATE | int32 | DATE | Date string in the form “YYYY-MM-DD” |
| TIMESTAMP: precision MILLIS | int64 | DATETIME | Date string in the form “YYYY-MM-DDTHH:mm:ss.SZ” |
| TIMESTAMP: precision MICROS | int64 | DATETIME | Date string in the form “YYYY-MM-DDTHH:mm:ss.SZ” |
| TIMESTAMP: precision NANOS | int64 | DATETIME | Date string in the form “YYYY-MM-DDTHH:mm:ss.SZ” |
| TIME: precision MILLIS | int32 | STRING | Time string in the format “HH:mm:ss.sss” |
| TIME: precision MICROS | int64 | STRING | Time string in the format “HH:mm:ss.sss” |
| TIME: precision NANOS | int64 | STRING | Time string in the format “HH:mm:ss.sss” |
| INTERVAL | fixed_len_byte_array(12) | STRING | String representation of the interval in the format “HH:mm:ss.sss” |
| INT: signed, precision 8, 16, 32, 64 | int32 (precision 8, 16, 32), int64 (precision 64) | BIGINT | |
| INT: unsigned, precision 8, 16, 32 | int32 | BIGINT | |
| INT: unsigned, precision 64 | int64 | DECIMAL(20,0) | An unsigned 64-bit integer is too wide to fit into a standard (signed) BIGINT type |
| DECIMAL: precision <= 9 | int32 | DECIMAL(precision, scale) | |
| DECIMAL: 9 < precision <= 18 | int64 | DECIMAL(precision, scale) | |
| DECIMAL: any precision | fixed_len_byte_array, binary | DECIMAL(precision, scale) | |
| ENUM | binary | STRING | Interpreted as a UTF-8 string |
| UUID | fixed_len_byte_array(16) | STRING | String formatting of the UUID: 00112233-4455-6677-8899-aabbccddeeff |
| JSON | binary | STRING | String containing the JSON object |
| BSON | binary | STRING | String representation of the BSON object: {key=>value, ...} |
| LIST | - | STRING | JSON array containing the objects or primitive values |
| MAP | - | STRING | JSON object containing the key-value pairs |
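A few rows of the table can be checked with Python's standard library. The sketch below Base64-encodes a raw binary value and formats 16 UUID bytes in the canonical hyphenated form shown in the UUID row. It also works through the integer arithmetic behind two rules. First, the unsigned 64-bit maximum (18446744073709551615, 20 digits) exceeds the signed BIGINT maximum (9223372036854775807), hence DECIMAL(20,0). Second, int32 and int64 can hold every unscaled value of up to 9 and 18 decimal digits respectively, which matches the DECIMAL precision thresholds. The helper names are hypothetical, and the use of the standard Base64 alphabet with padding is an assumption.

```python
import base64
import uuid

BIGINT_MAX = 2**63 - 1   # largest signed 64-bit integer (BIGINT, and the int64 physical type)
UINT64_MAX = 2**64 - 1   # largest unsigned 64-bit integer, Parquet INT(64, unsigned)
INT32_MAX = 2**31 - 1    # largest value of the int32 physical type


def binary_to_string(raw: bytes) -> str:
    """Base64-encode a raw binary value (standard alphabet with padding is an assumption)."""
    return base64.b64encode(raw).decode("ascii")


def uuid_to_string(raw: bytes) -> str:
    """Format a 16-byte fixed_len_byte_array UUID in canonical hyphenated form."""
    return str(uuid.UUID(bytes=raw))


def max_decimal_precision(max_value: int) -> int:
    """Largest precision p such that every p-digit unscaled value (up to 10**p - 1) fits."""
    precision = 0
    while 10 ** (precision + 1) - 1 <= max_value:
        precision += 1
    return precision


if __name__ == "__main__":
    print(binary_to_string(b"\x00\x01\xff"))  # AAH/
    print(uuid_to_string(bytes.fromhex("00112233445566778899aabbccddeeff")))
    # 00112233-4455-6677-8899-aabbccddeeff
    print(UINT64_MAX > BIGINT_MAX, len(str(UINT64_MAX)))  # True 20
    print(max_decimal_precision(INT32_MAX))   # 9
    print(max_decimal_precision(BIGINT_MAX))  # 18
```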

XML

Files with the .xml extension whose content follows the XML specification.

The first transformation step will be Parse XML.
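Before uploading, you can confirm that a file is well-formed XML with Python's standard `xml.etree.ElementTree` module. This is a local pre-check sketched on the assumption that malformed files are worth catching early. `check_well_formed` is a hypothetical helper, not part of Etleap.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str) -> str:
    """Return the root element's tag, or raise ValueError if the text is not well-formed XML."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc
    return root.tag


if __name__ == "__main__":
    print(check_well_formed("<orders><order id='1'/></orders>"))  # orders
    try:
        check_well_formed("<orders><order></orders>")
    except ValueError as exc:
        print(exc)  # prints the parser's error message
```

Python's documentation warns that `xml.etree.ElementTree` is not secure against maliciously constructed data, so run a check like this only on files you trust.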