parquet-dotnet

Fully managed Apache Parquet implementation

MIT License

parquet-dotnet - 4.16.0

Published by github-actions[bot] about 1 year ago

New

  • Markdown documentation has been fully migrated to GitHub Pages. It was becoming slightly unmanageable, and recent GitHub updates made markdown files look awful. I also kind of wanted to try Writerside by JetBrains, and publish docs with pride ;) @aloneguid
  • The class deserializer now skips class properties that are missing from the source parquet file instead of throwing an exception. Thanks to @greenlynx in #361.
  • Column statistics can now be read at zero cost, without reading the column data. Thanks to @mirosuav in #252, #368.
  • Support for DELTA_BINARY_PACKED encoding on write. This encoding is now the default when writing INT32 and INT64 columns. Most of the work was done by @ee-naveen in #382.
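
To give a feel for why DELTA_BINARY_PACKED suits integer columns, here is a minimal Python sketch of the core idea: store the first value, then the differences between neighbours, shifted by the smallest difference so that they fit in few bits. This is a simplified, single-block illustration and not the library's implementation; the real Parquet encoding also splits values into blocks and miniblocks and writes headers as varints. The function names `delta_encode` and `delta_decode` are hypothetical.

```python
def delta_encode(values):
    """Simplified single-block delta encoding (illustrative only)."""
    if not values:
        raise ValueError("need at least one value")
    first = values[0]
    # Differences between consecutive values.
    deltas = [b - a for a, b in zip(values, values[1:])]
    # Subtracting the minimum delta makes every stored number non-negative.
    min_delta = min(deltas) if deltas else 0
    adjusted = [d - min_delta for d in deltas]
    # Bits needed to bit-pack the largest adjusted delta.
    bit_width = max(adjusted).bit_length() if adjusted else 0
    return first, min_delta, bit_width, adjusted


def delta_decode(first, min_delta, adjusted):
    """Rebuild the original values from the encoded parts."""
    out = [first]
    for a in adjusted:
        out.append(out[-1] + a + min_delta)
    return out


first, min_delta, bit_width, adjusted = delta_encode([100, 101, 103, 106, 110])
restored = delta_decode(first, min_delta, adjusted)
```

In this example the four deltas need only 2 bits each once the minimum delta is subtracted, which is where the space saving on slowly changing INT32 and INT64 columns comes from.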

Improvements

  • IronCompress was updated to v1.5.1 by @aloneguid.

Fixes

  • Fixed precision issues when writing DateTime as milliseconds by @spanglerco in #312.
  • In DataColumnWriter, RecyclableMemoryStream wasn't used in a particular case, and MemoryStream was initialized directly instead. Thanks to @itayfisz in #373.
  • Bitpacked Hybrid decoder was failing on columns containing exactly one value.

parquet-dotnet - 4.15.0

Published by github-actions[bot] over 1 year ago

Bugs Fixed

  • Strings must be null by default (#360). Thanks @waf!

New Stuff

  • You can mark a schema field as required (non-nullable) using the [ParquetRequired] attribute.
  • ParquetSerializer validates class schema against actual file schema on deserialization and throws a helpful exception, like System.IO.InvalidDataException : property 'Id' is declared as Id (System.String?) but source data has it as Id (System.String) (you can spot the difference in nullability here).
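
The kind of check described above can be pictured with a short Python sketch. It is only a conceptual illustration under stated assumptions, not the ParquetSerializer code: the `Field` type and the `validate` function are hypothetical, and the error text simply mirrors the example message quoted in the note.

```python
from typing import NamedTuple


class Field(NamedTuple):
    """Hypothetical schema field: name, type name, and nullability."""
    name: str
    type_name: str
    nullable: bool

    def describe(self):
        # A trailing '?' marks a nullable field, as in the quoted message.
        suffix = "?" if self.nullable else ""
        return f"{self.name} ({self.type_name}{suffix})"


def validate(declared, actual):
    """Raise if a declared field differs from the same-named field in the file."""
    actual_by_name = {f.name: f for f in actual}
    for d in declared:
        a = actual_by_name.get(d.name)
        # Fields absent from the file are ignored in this sketch.
        if a is not None and d != a:
            raise ValueError(
                f"property '{d.name}' is declared as {d.describe()} "
                f"but source data has it as {a.describe()}"
            )


class_schema = [Field("Id", "System.String", True)]
file_schema = [Field("Id", "System.String", False)]
```

Calling `validate(class_schema, file_schema)` raises an error because the two `Id` fields differ only in nullability, which is the situation the exception message is designed to make visible.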

parquet-dotnet - 4.14.0

Published by github-actions[bot] over 1 year ago

  • Added support for reading legacy collections of array primitives serialized via the legacy ParquetConvert class or some other legacy system, thanks to @PablitoCBR. This work was effectively taken from his PR and integrated more natively into this library. Thank you very much!

  • Fixed deserializing parquet generated by the Azure Data Explorer non-native writer by @mcbos in #357.

  • Reworked the build pipeline to separate the build and release stages.

  • Release notes now come from a handcrafted file, and the latest version's notes are cut out with grep/head/tail on release. This improves the release notes experience, as autogenerated notes are often of sub-par quality.

parquet-dotnet - 4.13.0

Published by github-actions[bot] over 1 year ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.12.0...4.13.0

parquet-dotnet - 4.12.0

Published by github-actions[bot] over 1 year ago

What's Changed

New Contributors

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.11.3...4.12.0

parquet-dotnet - 4.11.3

Published by github-actions[bot] over 1 year ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.11.2...4.11.3

parquet-dotnet - 4.11.2

Published by github-actions[bot] over 1 year ago

What's Changed

New Contributors

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.11.1...4.11.2

parquet-dotnet - 4.11.1

Published by github-actions[bot] over 1 year ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.11.0...4.11.1

parquet-dotnet - 4.11.0

Published by github-actions[bot] over 1 year ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.10.1...4.11.0

parquet-dotnet - 4.10.1

Published by github-actions[bot] over 1 year ago

parquet-dotnet - 4.10.0

Published by github-actions[bot] over 1 year ago

This release has a low-level breaking change, but you will only notice it if you work with low-level Thrift metadata. See the detailed reason and description for this change. In some cases you should see a 10x performance increase thanks to a new handcrafted Thrift compiler.

This release introduces no functional changes; it is performance only.

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.9.2...4.10.0

parquet-dotnet - 4.9.2

Published by github-actions[bot] over 1 year ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.9.1...4.9.2

parquet-dotnet - 4.9.1

Published by github-actions[bot] over 1 year ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.9.0...4.9.1

parquet-dotnet - 4.9.0

Published by github-actions[bot] over 1 year ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.8.1...4.9.0

parquet-dotnet - 4.8.1

Published by github-actions[bot] over 1 year ago

What's Changed

This release fully supports integration with the Microsoft.Data.Analysis package, and the docs are available.

There's also a sample Jupyter notebook in C# demonstrating it in action.

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.8.0...4.8.1

parquet-dotnet - 4.8.0

Published by github-actions[bot] over 1 year ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.7.1...4.8.0

parquet-dotnet - 4.7.1

Published by github-actions[bot] over 1 year ago

parquet-dotnet - 4.7.0

Published by github-actions[bot] over 1 year ago

What's Changed

  • Data Column now exposes definition and repetition levels #288
  • ParquetSerializer correctly interprets nulls at various levels of nested structures #282

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.6.2...4.7.0

parquet-dotnet - 4.6.2

Published by github-actions[bot] over 1 year ago

What's Changed

This release has massive performance improvements for ParquetSerializer compared to ParquetConvert (marked as "Legacy" in tests):

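| Method             | Mean     | Error     | StdDev   | Ratio | RatioSD | Gen0    | Gen1    | Allocated | Alloc Ratio |
|--------------------|----------|-----------|----------|-------|---------|---------|---------|-----------|-------------|
| Serialise_Legacy   | 483.0 us | 78.04 us  | 4.28 us  | 1.00  | 0.00    | 73.2422 | 7.8125  | 301.23 KB | 1.00        |
| Deserialise_Legacy | 881.5 us | 224.63 us | 12.31 us | 1.83  | 0.03    | 39.0625 | 12.6953 | 161.13 KB | 0.53        |
| Serialise          | 417.0 us | 140.59 us | 7.71 us  | 0.86  | 0.02    | 66.8945 | 0.4883  | 273.25 KB | 0.91        |
| Deserialise        | 241.7 us | 72.25 us  | 3.96 us  | 0.50  | 0.01    | 34.6680 | 0.2441  | 142 KB    | 0.47        |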
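
The Ratio and Alloc Ratio columns appear to be relative to the Serialise_Legacy row, so they do not directly show how much faster the new deserializer is than the legacy one. The short Python sketch below recomputes the ratios from the Mean and Allocated values in the table and derives the like-for-like speedups; the variable names are just for illustration.

```python
# Mean times (microseconds) and allocations (KB) copied from the table.
mean_us = {
    "Serialise_Legacy": 483.0,
    "Deserialise_Legacy": 881.5,
    "Serialise": 417.0,
    "Deserialise": 241.7,
}
allocated_kb = {
    "Serialise_Legacy": 301.23,
    "Deserialise_Legacy": 161.13,
    "Serialise": 273.25,
    "Deserialise": 142.0,
}

baseline = "Serialise_Legacy"

# Each ratio is the row's value divided by the baseline row's value.
ratio = {m: round(v / mean_us[baseline], 2) for m, v in mean_us.items()}
alloc_ratio = {m: round(v / allocated_kb[baseline], 2) for m, v in allocated_kb.items()}

# Like-for-like speedups: legacy time divided by new time.
serialise_speedup = round(mean_us["Serialise_Legacy"] / mean_us["Serialise"], 2)
deserialise_speedup = round(mean_us["Deserialise_Legacy"] / mean_us["Deserialise"], 2)
```

Compared like for like, the biggest gain in this benchmark is in deserialization, where the new path takes well under a third of the legacy time, while serialization improves more modestly.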

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.6.1...4.6.2

parquet-dotnet - 4.6.1

Published by github-actions[bot] over 1 year ago

What's Changed

Full Changelog: https://github.com/aloneguid/parquet-dotnet/compare/4.6.0...4.6.1