cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

APACHE-2.0 License

Stars
136
Committers
28


cobrix - Minor bugfix release

Published by yruslan over 2 years ago

  • #474 Fix numeric decoder of unsigned DISPLAY format. The decoder is now stricter and does not allow sign
    overpunching for unsigned numbers.
  • #477 Fixed NotSerializableException when using non-default logger implementations
    (Thanks @joaquin021).
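
To illustrate what a stricter unsigned DISPLAY decoder means in practice, here is a minimal sketch. It is not Cobrix's actual implementation; the object and method names are hypothetical. It relies only on the EBCDIC zoned-decimal convention that plain digits are bytes 0xF0 to 0xF9, while a sign-overpunched last digit uses a different zone (for example 0xC3 or 0xD3).

```scala
// Hypothetical sketch, not the Cobrix decoder.
object UnsignedDisplaySketch {
  /** Decodes an unsigned EBCDIC DISPLAY (zoned decimal) field.
    * Returns None for an empty field or if any byte is not a plain
    * digit (0xF0 to 0xF9), which rejects sign-overpunched bytes.
    * Overflow for very long fields is not handled in this sketch. */
  def decodeUnsigned(bytes: Array[Byte]): Option[Long] = {
    val unsignedBytes = bytes.map(_ & 0xFF)
    val allPlainDigits = unsignedBytes.forall(b => b >= 0xF0 && b <= 0xF9)
    if (unsignedBytes.nonEmpty && allPlainDigits)
      Some(unsignedBytes.foldLeft(0L)((acc, b) => acc * 10 + (b - 0xF0)))
    else
      None
  }
}
```

With this sketch, the bytes F1 F2 F3 decode to 123, while F1 F2 C3 (an overpunched positive sign on the last digit) is rejected instead of being read as 123.
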
cobrix - Minor feature release

Published by yruslan over 2 years ago

  • Improved schema flattening method SparkUtils.flattenSchema() for dataframes that have arrays. Array size metadata is used to determine maximum array elements, making it much faster for dataframes produced from mainframe files.
  • #324 Allow removing of FILLERs from AST when parsing using 'parseSimple()'. The signature of the method has
    changed. The boolean arguments now reflect more clearly what they do.
  • #466 Added maxElements and minElements to Spark schema metadata for
    array fields created from fields with OCCURS. This allows knowing the maximum number of elements in arrays when flattening the schema.
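
The array size metadata mentioned above can be read back from a Spark schema. The following is a rough sketch that assumes the metadata key is named maxElements (as in the note above) and holds a long value. The object and method names are hypothetical, and only top-level fields are inspected.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper, not part of Cobrix.
object ArrayMetadataSketch {
  /** Returns the declared maximum element count for each top-level array
    * field that carries "maxElements" metadata. Nested structs are not
    * traversed in this sketch. */
  def maxArraySizes(schema: StructType): Map[String, Long] =
    schema.fields.collect {
      case f if f.dataType.isInstanceOf[ArrayType] && f.metadata.contains("maxElements") =>
        f.name -> f.metadata.getLong("maxElements")
    }.toMap
}
```

For a dataframe df, ArrayMetadataSketch.maxArraySizes(df.schema) would give the upper bound for each such array without scanning the data, which is why a flattening routine can avoid a pass over the dataset.
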
cobrix - Minor bugfix release

Published by yruslan almost 3 years ago

cobrix - Minor bugfix release

Published by yruslan almost 3 years ago

  • #451 Fixed COMP-9 (Cobrix extension for little-endian binary fields).
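
As a reminder of what little-endian binary decoding involves, the sketch below decodes a 4-byte signed integer in both byte orders using the JDK's ByteBuffer. It is an illustration only and does not reproduce how Cobrix handles COMP-9; the object and method names are hypothetical.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical sketch, not the Cobrix decoder.
object LittleEndianSketch {
  /** Decodes a 4-byte signed integer stored least significant byte first. */
  def decodeInt32LE(bytes: Array[Byte]): Int = {
    require(bytes.length == 4, "expected exactly 4 bytes")
    ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt
  }

  /** Decodes the same bytes as big-endian, the usual mainframe order. */
  def decodeInt32BE(bytes: Array[Byte]): Int = {
    require(bytes.length == 4, "expected exactly 4 bytes")
    ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getInt
  }
}
```

The bytes 01 00 00 00 decode to 1 as little-endian and to 16777216 as big-endian, so choosing the wrong byte order silently produces wrong values rather than an error.
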
cobrix - Minor bugfix release

Published by yruslan almost 3 years ago

  • #435 Fixed 'INDEXED BY' clause followed by multiple identifiers.
  • #437 Added support for '@' characters inside identifier names.
cobrix - Minor feature release

Published by yruslan almost 3 years ago

  • #430 Added support for 'twisted' RDW headers when big-endian or little-endian RDWs use unexpected RDW bytes.
cobrix - Minor feature release

Published by yruslan about 3 years ago

  • #420 Add experimental support for fixed blocked (FB) record format.
  • #422 Fixed decoding of 'broken pipe' (¦) character from EBCDIC.
  • #424 Fixed an ASCII reader corner case.
cobrix - Feature Release

Published by yruslan about 3 years ago

  • #412 Add support for variable block (VB aka VBVR) record format.
    Options to adjust BDW settings are added:
    • is_bdw_big_endian - specifies if BDW is big-endian (false by default)
    • bdw_adjustment - specifies how the value of a BDW differs from the block payload size. For example, if the size in BDW headers includes the BDW itself, use .option("bdw_adjustment", "-4").
    • Options is_record_sequence and is_xcom are deprecated. Use .option("record_format", "V") instead.
  • #417 Multisegment ASCII text files are now directly supported using record_format = D.
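
The BDW options above might be combined in a read like the following sketch. The option names is_bdw_big_endian, bdw_adjustment, and record_format come from the notes above. The record format value "VB" is an assumption based on the format name, and the object name, method name, and path parameters are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper; requires the spark-cobol data source on the classpath.
object VariableBlockReadSketch {
  def readVariableBlock(spark: SparkSession,
                        copybookPath: String,
                        dataPath: String): DataFrame =
    spark.read
      .format("cobol")
      .option("copybook", copybookPath)
      .option("record_format", "VB")        // assumed value for variable block files
      .option("is_bdw_big_endian", "true")  // the default is false per the notes above
      .option("bdw_adjustment", "-4")       // BDW size includes the 4-byte BDW itself
      .load(dataPath)
}
```

Code that still sets is_record_sequence or is_xcom would, per the deprecation note, switch to .option("record_format", "V") instead.
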
cobrix - Feature Release

Published by yruslan about 3 years ago

  • #405 Fix extracting records that contain redefines of the top level GROUPs.
  • #406 Use 'collapse_root' retention policy by default. This is a breaking
    change. To restore the original behavior, add .option("schema_retention_policy", "keep_original").
  • #407 The layout positions summary generated by the parser now contains level
    numbers for root level GROUPs. This is a breaking change if you have unit tests that depend on the formatting of the layout
    positions output.
cobrix - Minor feature release

Published by yruslan over 3 years ago

  • #397 Fix skipping of empty lines when reading ASCII files with is_record_sequence = true
  • #394 Added the ability to specify multiple paths to read data from (use .option("paths", inputPaths.mkString(","))). This is a workaround implementation, since adding support for multiple paths in load() would require a big rewrite of spark-cobol from data source to data format.
  • #372 Added an option to better handle null values in DISPLAY formatted data: .option("improved_null_detection", "false")
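
A read that uses the multiple-paths workaround could look like the sketch below. The paths option and the mkString(",") call are taken from the note above. Calling load() without arguments is an assumption that follows from the paths being passed as an option, and the object name, method name, and parameters are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper; requires the spark-cobol data source on the classpath.
object MultiPathReadSketch {
  def readMany(spark: SparkSession,
               copybookPath: String,
               inputPaths: Seq[String]): DataFrame =
    spark.read
      .format("cobol")
      .option("copybook", copybookPath)
      .option("paths", inputPaths.mkString(","))  // comma-separated list of paths
      .load()                                     // assumed: no path argument here
}
```

Because the paths are joined with commas, a path that itself contains a comma would presumably be split incorrectly, so such paths are best avoided with this option.
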
Package Rankings
Top 18.07% on Repo1.maven.org