A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Apache-2.0 License
Published by yruslan over 2 years ago
- Improved the performance of SparkUtils.flattenSchema() for dataframes that have arrays (see the sketch after this list). Array size metadata is used to determine the maximum number of array elements, making it much faster for dataframes produced from mainframe files.
- Added maxElements and minElements to Spark schema metadata for OCCURS fields. This allows knowing the maximum number of elements in arrays when flattening the schema.
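A minimal sketch (not from the original notes) of flattening an OCCURS-generated array schema with SparkUtils.flattenSchema(); the copybook and data paths are placeholders, and the import path follows the Cobrix documentation.

```scala
import org.apache.spark.sql.SparkSession
import za.co.absa.cobrix.spark.cobol.utils.SparkUtils

val spark = SparkSession.builder().appName("flatten-arrays").getOrCreate()

// Hypothetical copybook and data locations.
val df = spark.read
  .format("cobol")
  .option("copybook", "/copybooks/record.cpy")
  .load("/data/mainframe_file")

// OCCURS arrays carry maxElements/minElements metadata, so flattening does not
// need to scan the data to find the largest array size.
val flatDf = SparkUtils.flattenSchema(df)
flatDf.printSchema()
```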
Published by yruslan almost 3 years ago
Published by yruslan about 3 years ago
- Fixed conversion of the broken pipe (¦) character from EBCDIC.

Published by yruslan about 3 years ago
- is_bdw_big_endian - specifies if the BDW is big-endian (false by default).
- bdw_adjustment - specifies how the value of a BDW differs from the block payload. For example, if the size in BDW headers includes the BDW record itself, use .option("bdw_adjustment", "-4") (see the sketch after this list).
- is_record_sequence and is_xcom are deprecated. Use .option("record_format", "V") instead.
- record_format = D can be used for ASCII text files.
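A hedged sketch (assuming a SparkSession named spark and placeholder paths) that combines the BDW-related options for a variable-blocked file, and shows the record_format option that replaces the deprecated flags.

```scala
// Variable-blocked (VB) file: records are wrapped in BDW + RDW headers.
val dfVb = spark.read
  .format("cobol")
  .option("copybook", "/copybooks/record.cpy")   // placeholder path
  .option("record_format", "VB")
  .option("is_bdw_big_endian", "true")
  .option("bdw_adjustment", "-4")                // BDW value includes the 4-byte BDW header itself
  .load("/data/vb_file")

// Variable-length (V) file: use record_format instead of the deprecated
// is_record_sequence / is_xcom flags.
val dfV = spark.read
  .format("cobol")
  .option("copybook", "/copybooks/record.cpy")
  .option("record_format", "V")
  .load("/data/v_file")
```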
.option("schema_retention_policy", "keep_original")
.Published by yruslan over 3 years ago
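For context (an assumption based on Cobrix documentation, not stated in the note above): schema_retention_policy controls whether the copybook's root GROUP is collapsed in the output schema ("collapse_root", the default) or preserved ("keep_original"). A minimal sketch, assuming a SparkSession named spark and placeholder paths:

```scala
val df = spark.read
  .format("cobol")
  .option("copybook", "/copybooks/record.cpy")
  .option("schema_retention_policy", "keep_original")  // keep the copybook's root group in the schema
  .load("/data/mainframe_file")
```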
Published by yruslan over 3 years ago

- is_record_sequence = true
- Added the ability to read from multiple input paths (use .option("paths", inputPaths.mkString(","))), as shown in the sketch below. This is a workaround implementation, since adding support for multiple paths in load() would require a big rewrite of spark-cobol from a data source to a data format.
- .option("improved_null_detection", "false")
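A hedged sketch of the multiple-paths workaround (assuming a SparkSession named spark; the paths and copybook location are placeholders). load() is called without an argument because the inputs come from the "paths" option.

```scala
val inputPaths = Seq("/data/mainframe_file_1", "/data/mainframe_file_2")

val df = spark.read
  .format("cobol")
  .option("copybook", "/copybooks/record.cpy")
  .option("paths", inputPaths.mkString(","))   // comma-separated list of input paths
  .load()
```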