A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Apache-2.0 License
Published by yruslan over 2 years ago
- Improved the performance of SparkUtils.flattenSchema() for dataframes that have arrays (see the sketch after this list). Array size metadata is used to determine the maximum number of array elements, making it much faster for dataframes produced from mainframe files.
- Added maxElements and minElements to Spark schema metadata for OCCURS fields. This allows knowing the maximum number of elements in arrays when flattening the schema.
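A minimal sketch (not from the original notes) of flattening an OCCURS-generated array schema with SparkUtils.flattenSchema(); the copybook and data paths are placeholders, and the import path follows the Cobrix documentation.

```scala
import org.apache.spark.sql.SparkSession
import za.co.absa.cobrix.spark.cobol.utils.SparkUtils

val spark = SparkSession.builder().appName("flatten-arrays").getOrCreate()

// Hypothetical copybook and data locations.
val df = spark.read
  .format("cobol")
  .option("copybook", "/copybooks/record.cpy")
  .load("/data/mainframe_file")

// OCCURS arrays carry maxElements/minElements metadata, so flattening does not
// need to scan the data to find the largest array size.
val flatDf = SparkUtils.flattenSchema(df)
flatDf.printSchema()
```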
Published by yruslan almost 3 years ago
Published by yruslan about 3 years ago
- Fixed conversion of the broken pipe (¦) character from EBCDIC.

Published by yruslan about 3 years ago
- is_bdw_big_endian - specifies if the BDW is big-endian (false by default).
- bdw_adjustment - specifies how the value of a BDW differs from the block payload. For example, if the size in BDW headers includes the BDW record itself, use .option("bdw_adjustment", "-4") (see the sketch after this list).
- is_record_sequence and is_xcom are deprecated. Use .option("record_format", "V") instead.
- record_format = D can be used for ASCII text files.
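A hedged sketch (assuming a SparkSession named spark and placeholder paths) that combines the BDW-related options for a variable-blocked file, and shows the record_format option that replaces the deprecated flags.

```scala
// Variable-blocked (VB) file: records are wrapped in BDW + RDW headers.
val dfVb = spark.read
  .format("cobol")
  .option("copybook", "/copybooks/record.cpy")   // placeholder path
  .option("record_format", "VB")
  .option("is_bdw_big_endian", "true")
  .option("bdw_adjustment", "-4")                // BDW value includes the 4-byte BDW header itself
  .load("/data/vb_file")

// Variable-length (V) file: use record_format instead of the deprecated
// is_record_sequence / is_xcom flags.
val dfV = spark.read
  .format("cobol")
  .option("copybook", "/copybooks/record.cpy")
  .option("record_format", "V")
  .load("/data/v_file")
```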
.option("schema_retention_policy", "keep_original")
.Published by yruslan over 3 years ago
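For context (an assumption based on Cobrix documentation, not stated in the note above): schema_retention_policy controls whether the copybook's root GROUP is collapsed in the output schema ("collapse_root", the default) or preserved ("keep_original"). A minimal sketch, assuming a SparkSession named spark and placeholder paths:

```scala
val df = spark.read
  .format("cobol")
  .option("copybook", "/copybooks/record.cpy")
  .option("schema_retention_policy", "keep_original")  // keep the copybook's root group in the schema
  .load("/data/mainframe_file")
```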
Published by yruslan over 3 years ago

- is_record_sequence = true
- Added the ability to read from multiple input paths (use .option("paths", inputPaths.mkString(","))), as shown in the sketch below. This is a workaround implementation, since adding support for multiple paths in load() would require a big rewrite of spark-cobol from a data source to a data format.
- .option("improved_null_detection", "false")
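A hedged sketch of the multiple-paths workaround (assuming a SparkSession named spark; the paths and copybook location are placeholders). load() is called without an argument because the inputs come from the "paths" option.

```scala
val inputPaths = Seq("/data/mainframe_file_1", "/data/mainframe_file_2")

val df = spark.read
  .format("cobol")
  .option("copybook", "/copybooks/record.cpy")
  .option("paths", inputPaths.mkString(","))   // comma-separated list of input paths
  .load()
```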